Abstract Linear Algebra
Tim Smits
Contents

1 Vector Spaces
  1.1 Basic Definitions
  1.2 Operations on Vector Spaces
  1.3 Linear Independence
  1.4 Bases and Dimension
  1.5 Finite Dimensional Vector Spaces
  1.6 Zorn's Lemma

2 Linear Transformations
  2.1 Basic Definitions
  2.2 Dimension Counting
  2.3 The Vector Space HomF(V, W)
  2.4 The Matrix of a Linear Transformation
  2.5 Invertibility
  2.6 Change of Basis and Similarity

3 Diagonalization
  3.1 Basic Definitions
  3.2 Properties of the Characteristic Polynomial
  3.3 Diagonalization
Introduction
These notes arose from a math 115A course I was a teaching assistant for at UCLA in
fall 2018. They contain, in my opinion, what all students should learn in a first course on
abstract vector spaces. At UCLA, math 115A is taken by students from various disciplines.
Many students have a hard time adjusting to the abstract nature of the course. These notes
try to provide many abstract examples to make the transition easier, as well as some non-standard examples of how the language of linear algebra can be applied to solve (hopefully) interesting problems. There may be many typos. Let me know if any are
found!
Chapter 1
Vector Spaces
Linear algebra arose out of trying to solve systems of linear equations. In a first course, one learns that the proper way to think about solutions to a system of n linear equations in n variables is to view them as solutions to a certain matrix equation of the form Ax = b in Rn, and then develops the necessary theory of matrices and the tools needed to solve these equations.
The goal of abstract linear algebra is to capture the special properties of Rn and of matrices
that made the theory useful in the first place, and expand it to work in larger settings. This
will lead to the abstract definitions of vector spaces and linear transformations, which will
be the main objects of study for us.
1. For all x, y ∈ V , x + y = y + x.
5. For all x ∈ V , 1 · x = x.
The elements of V are called vectors, and it is understood that we write cx to mean c · x.
From the axioms above, one can deduce the usual algebraic rules are true in vector spaces:
Proposition 1. Let V be a vector space. For x, y, z ∈ V , and a ∈ F , the following hold:
1. If x + y = x + z, then y = z.
2. The vectors 0 and −x are unique.
3. 0 · x = a · 0 = 0.
4. (−a)x = −(ax).
The proofs of the above all follow quickly from the vector space and field axioms, and are
left as an exercise.
Definition 1.1.2. For a vector space V and W ⊂ V, we call W a subspace of V if W is a vector space under the same operations as in V.
1.2 Operations on Vector Spaces
A natural question to ask is what operations can we do on vector spaces to create new vector
spaces? Below are some examples:
Proposition 5. Let V and W be vector spaces. Then V × W is a vector space with the
operations of componentwise addition and scalar multiplication.
Proof. Exercise.
The vector space V × W is sometimes called the external direct sum of V and W and
is commonly denoted V ⊕ W . However to avoid confusion with the definition below, we’ll
keep the notation V × W . There is a way to “add” vector spaces, but only if they are both
subspaces of some common vector space, so that addition of vectors makes sense.
Definition 1.2.1. Let W1 and W2 be subspaces of a vector space V. The sum of W1 and W2, denoted W1 + W2, is defined as W1 + W2 = {w1 + w2 : w1 ∈ W1, w2 ∈ W2}. Further, if W1 ∩ W2 = {0}, then we call W1 + W2 the internal direct sum of W1 and W2 and denote this W1 ⊕ W2.
The difference between external and internal direct sums is that in the latter case, both
spaces live internally inside a larger vector space to begin with. In an external direct sum, we
create a larger vector space in which copies of V and W can be identified, namely we identify
V with the subspaces {(v, 0) : v ∈ V } = V × {0} and W with {(0, w) : w ∈ W } = {0} × W .
Whether a sum is internal or external is to be understood from context; both will just be referred to as a direct sum.
Proof. It’s clear that 0 ∈ W1 + W2 since 0 ∈ W1 and 0 ∈ W2 . If x, y ∈ W1 + W2 , then
x = w1 + w2 and y = w1′ + w2′ for w1, w1′ ∈ W1 and w2, w2′ ∈ W2. Therefore x + y = (w1 + w1′) + (w2 + w2′), and w1 + w1′ ∈ W1 because W1 is a subspace of V, and w2 + w2′ ∈ W2 by the same reasoning. The proof that W1 + W2 is closed under scalar multiplication is
similar.
The difference between being a sum of subspaces and a direct sum of subspaces is the
following:
Proposition 7. Suppose V = W1 +W2 for some subspaces W1 , W2 . Then V = W1 ⊕W2 ⇐⇒
every vector x in V can be written uniquely as x = w1 + w2 for w1 ∈ W1 and w2 ∈ W2 .
Proof. If V = W1 ⊕ W2, and x has two different representations as a sum of the above form, write x = w1 + w2 and x = w1′ + w2′ for some w1, w1′ ∈ W1 and w2, w2′ ∈ W2. Then w1 − w1′ = w2′ − w2, and the left hand side lives in W1 while the right hand side lives in W2. This says w1 − w1′ ∈ W1 ∩ W2 = {0}, so w1 = w1′. Similarly w2 = w2′, so the representation is unique. Conversely, suppose that any vector x ∈ V can be written uniquely as x = w1 + w2 for some w1 ∈ W1 and w2 ∈ W2. Then clearly, V = W1 + W2. If x ∈ W1 ∩ W2, we can write x = x + 0 by taking w1 = x and w2 = 0. Similarly, we can write x = 0 + x by taking w1 = 0 and w2 = x. By uniqueness, this says x = 0, so that W1 ∩ W2 = {0} says V = W1 ⊕ W2.
Example 1.2.2. In R2 , set X = {(x, y) : y = 0} and Y = {(x, y) : x = 0}. Then R2 = X⊕Y .
Note these subspaces are simply the x and y axes. In R3 , set V = {(x, y, z) : z = 0}
and W = {(x, y, z) : x = 0}. Then R3 = V + W , but the sum is not direct because
V ∩ W = {(x, y, z) : x = z = 0}.
Example 1.2.3. Let F be a field not of characteristic 2, and let Symn (F ), Skewn (F ) ⊂
Mn (F ) be the subspaces of symmetric and skew-symmetric matrices respectively. Then
Mn(F) = Symn(F) ⊕ Skewn(F). Any matrix A ∈ Mn(F) can be written A = (1/2)(A + At) + (1/2)(A − At), so Mn(F) = Symn(F) + Skewn(F), and if A ∈ Symn(F) ∩ Skewn(F), we have A = At and A = −At, so that 2At = 0 says At = 0, hence A = 0 and the sum is direct.
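As a quick sanity check of this decomposition (an illustrative sketch, not part of the original notes; the matrix size and entries below are arbitrary choices), one can verify A = (1/2)(A + At) + (1/2)(A − At) numerically with Python:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-5, 5, size=(3, 3)).astype(float)  # an arbitrary 3x3 real matrix

    S = (A + A.T) / 2  # symmetric part of A
    K = (A - A.T) / 2  # skew-symmetric part of A

    print(np.allclose(S, S.T))    # True: S is symmetric
    print(np.allclose(K, -K.T))   # True: K is skew-symmetric
    print(np.allclose(S + K, A))  # True: the two parts sum back to A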
There’s one more common operation on subspaces that we’ll study, although it is quite
a bit more abstract.
Definition 1.2.4. Let V be a vector space, and W ⊂ V be a subspace. For a vector v ∈ V ,
we define the coset of v, denoted v + W , to be v + W = {v + w : w ∈ W }, the set of
translates of v by elements of W .
Example 1.2.5. Let V = R2 , W = {(x, 0) : x ∈ R}, and v = (0, 1). What set is v + W ?
Elements of the coset v + W look like (0, 1) + w for different choices of vectors w ∈ W . Since
an arbitrary w ∈ W looks like (a, 0) for some a ∈ R, such elements look like (a, 1) for some
a ∈ R. For any choice of a the vector (a, 0) is in W , so we see that v + W = {(a, 1) : a ∈ R}.
The point of cosets is that they give us a way of partitioning the vector space V: as an equality of sets, we have V = ∪_{v ∈ V} (v + W). We'll use these cosets to construct a new vector
space. Let V /W = {v + W : v ∈ V }. We can define addition and scalar multiplication
operations on V /W as follows:
Proposition 8. V/W is a vector space, where the operations are given by (v + W) + (v′ + W) = (v + v′) + W and c · (v + W) = c · v + W.
Proof. Exercise.
Definition 1.2.6. The set V /W with the operations of addition and scalar multiplication
as given above is known as the quotient space of V by W .
The idea behind the quotient space is that it “crushes” the subspace W to the 0 vector.
This can be seen from the following:
Proposition 9. Two cosets v + W and v′ + W are equal in V/W if and only if v − v′ ∈ W. In particular, v + W = 0 + W in V/W if and only if v ∈ W.
Proof. Exercise.
Example 1.2.7. Consider V = R2 and W = {(x, 0) : x ∈ R}, the x-axis. For any vector
v = (a, b), we have that v + W = {(a + x, b) : x ∈ R} is the horizontal line through the
vector v. The quotient space V /W “crushes” each of these horizontal lines to a single point,
namely the intersection of this line with the y-axis: in the quotient space, we have the
equality (a, b) + W = (0, b) + W because (a, b) − (0, b) = (a, 0) ∈ W . We see that points in
V /W can be “identified” with points on the y-axis, so that one can “picture” V /W as the
y-axis.
Span(S) is sometimes referred to as the subspace generated by S. If V = Span(S), then
we call S a generating set for V . Observe that any subspace of V containing S must
contain Span(S), and therefore Span(S) is the smallest subspace of V containing S.
Example 1.3.3. In Sym2(F), any symmetric matrix is of the form
[ a  b ]
[ b  c ]
for some a, b, c ∈ F. We see that Sym2(F) is spanned by the three matrices
[ 1  0 ]   [ 0  1 ]   [ 0  0 ]
[ 0  0 ],  [ 1  0 ],  [ 0  1 ].
Example 1.3.4. Let V = R3 . Then V = Span{(1, 0, 0), (0, 1, 0), (0, 0, 1)}. We may also
write V = Span{(1, 1, 0), (0, 1, 1), (1, 0, 1)}, or V = Span{(1, 1, 1), (1, 0, 0), (0, 1, 0), (0, 1, 1)}.
A spanning set need not be unique, nor must any spanning set have the same cardinality.
The above example shows that a spanning set may contain “redundant” information. In
the third spanning set above, notice that (1, 1, 1) is already contained in Span{(1, 0, 0), (0, 1, 0), (0, 1, 1)},
so removing it from S does not change Span(S). We give this condition a name:
Example 1.3.6. In the above example, the set {(1, 0, 0), (0, 1, 0), (0, 0, 1)} is linearly inde-
pendent. The set {(1, 1, 1), (1, 0, 0), (0, 1, 0), (0, 1, 1)} is linearly dependent.
Example 1.3.7. In C∞(R), the vectors sin(x) and cos(x) are linearly independent: if c1 sin(x) + c2 cos(x) = 0 for all x, plugging in x = 0 and x = π/2 shows that c1 = c2 = 0. Similarly if r ≠ s, the functions erx and esx are linearly independent.
Since linear dependence is defined in terms of a finite quantity, an easy definition of linear
independence that handles the case of S being infinite is as follows:
Proposition 11. Let V be a vector space and S ⊂ V . Then S is linearly independent if and
only if all finite subsets of S are linearly independent.
Proposition 12. Let V be a vector space and S = {v1 , . . . , vn } for some vi ∈ V . Then
S is linearly dependent if and only if v1 = 0 or there exists 1 ≤ k < n such that vk+1 ∈
Span({v1 , . . . , vk }).
Proof. The backwards direction is immediate, so suppose that S is linearly dependent. Then c1v1 + . . . + cnvn = 0 for some ci not all 0. Set k = max{i : ci ≠ 0}, which exists since some coefficient is non-zero and there are finitely many. Notice that this says ci = 0 for all k < i ≤ n. If k = 1, this says c1 is the only non-zero coefficient, so c1v1 = 0 gives v1 = 0. Otherwise, k > 1, so c1v1 + . . . + cnvn = c1v1 + . . . + ckvk = 0. Since ck ≠ 0, this says vk ∈ Span({v1, . . . , vk−1}), so we are done.
This gives a method of checking if a set is linearly independent that works well for sets
of small size. For example, to check if {v1, v2, v3} is linearly independent one just needs to check that v2 ∉ Span({v1}) and v3 ∉ Span({v1, v2}). For sets of larger size, we will later
develop more efficient methods. We end the section with an extremely useful proposition.
Proposition 13. Let S ⊂ V be linearly independent and v ∈ V . Then S ∪ {v} is linearly
dependent if and only if v ∈ Span(S).
Proof. If S ∪ {v} is linearly dependent, then there are s1, . . . , sn ∈ S and c1, . . . , cn+1 ∈ F not all 0 such that c1s1 + . . . + cnsn + cn+1v = 0. Necessarily cn+1 ≠ 0, otherwise the linear independence of S forces all ci = 0. Then solving for v gives v = −(1/cn+1)(c1s1 + . . . + cnsn), so v ∈ Span(S). Conversely, if v ∈ Span(S) then v = c1s1 + . . . + cnsn for some si ∈ S and ci ∈ F. Then c1s1 + . . . + cnsn − v = 0 is a non-trivial linear dependence relation among elements of S ∪ {v}, so S ∪ {v} is linearly dependent.
Theorem 1.3.8. Let S ⊂ V be linearly independent. If v ≠ 0 is in Span(S), then v = c1v1 + . . . + cnvn for unique distinct vectors vi ∈ S and unique scalars ci ≠ 0 in F.
Proof. Suppose that v has two different representations using vectors in S. Write v =
c1s1 + . . . + cnsn and v = d1t1 + . . . + dmtm for some ci, dj ≠ 0 in F and si, tj ∈ S, where we may assume none of the si are the same and none of the tj are the same. Subtracting shows c1s1 + . . . + cnsn − d1t1 − . . . − dmtm = 0. If {s1, . . . , sn} ≠ {t1, . . . , tm}, then there is some i
such that si is not equal to any of tj . Since S is linearly independent, this forces ci = 0, since
there is no other term in the sum that can be grouped with ci si . This is a contradiction, so
n = m and {s1 , . . . , sn } = {t1 , . . . , tm }. Relabeling as necessary, we may assume that si = ti
so that the above can be written as (c1 − d1 )s1 + . . . + (cn − dn )sn = 0, so ci = di for all i
and therefore such a representation is unique.
Definition 1.4.1. A basis of a vector space V is a linearly independent spanning set. The
dimension of V is the cardinality of a basis of V .
Perhaps in more familiar terms, the above says that every vector space has a basis. The
fact that the dimension of a vector space is actually well defined is a fairly non-trivial result,
but the proof is a rather technical set theoretic argument that is unenlightening, so for our
purposes it will be taken for granted.
Proposition 14. Let B and B 0 be two bases of a vector space V . Then |B| = |B 0 |.
Dimension is one of the most useful ideas in linear algebra: it gives us a notion of size for
a vector space, and being able to translate questions about vector spaces into statements
about integers makes them easier to understand. At this stage, linear algebra branches off
in two directions: the study of infinite dimensional vector spaces, and the study of finite
dimensional vector spaces, the latter of which we will focus the majority of our attention on.
The above proof that every vector space has a basis is non-constructive – it tells us one must
exist but gives us no way of finding one. In the finite dimensional case, we actually have a
constructive method for finding bases of a vector space.
Proof. We may assume that the vi are non-zero, otherwise remove them. Let m be the largest integer such that there is an m element subset B = {s1, . . . , sm} of S that is linearly independent. As {vi} is linearly independent for any i, and S has at most k elements, B must exist and 1 ≤ m ≤ k. Then Span(S) = Span(B). To see this, we show that vi ∈ Span(B) for all i. If vi ∉ B, then B ∪ {vi} is a linearly dependent subset by the maximality of m, so there are c1, . . . , cm+1 ∈ F not all 0 such that c1s1 + . . . + cmsm + cm+1vi = 0. By linear independence of the elements of B, necessarily cm+1 ≠ 0, so we can solve for vi in terms of s1, . . . , sm, giving vi ∈ Span(B) as desired.
Theorem 1.5.2. Let S = {v1 , . . . vk } be a linearly independent subset of V . Then there exist
vectors w1 , . . . , wm ∈ V such that {v1 , . . . , vk , w1 , . . . , wm } is a basis of V .
Corollary 1.5.3. Let W ⊂ V be a subspace. Then there exists a subspace W′ ⊂ V such that V = W ⊕ W′.
In linear algebra, it’s not uncommon to be interested in finding a basis with some particular
choices of basis vectors, so the extension result is quite useful. The following is a translation
of the above two results using the language of dimension.
2. If S spans V , then k ≥ n
Proof. Items 1 and 2 are immediate corollaries of the above two results. To prove 3, if S is linearly independent and S doesn't span V, then there is v ∈ V such that S ∪ {v} is linearly independent. But then this says n + 1 ≤ n, a contradiction. Therefore S spans V. Conversely, if S is not linearly independent, we may trim S to a basis B of V with |B| < |S| = n; since every basis of V has n elements, this says n < n, a contradiction.
Example 1.5.5. In F n , the vectors ei where ei is the vector that is 1 in the i-th coordinate
and 0 elsewhere form a basis. It's easy to see that if c1e1 + . . . + cnen = 0, then (c1, . . . , cn) =
(0, . . . , 0) so ci = 0, and it’s obvious this is a spanning set. This is an n-dimensional F -vector
space.
Example 1.5.6. In Mn (F ), the matrices Eij where Eij is the matrix with (i, j)-th entry
equal to 1 and 0 elsewhere form a basis – the argument is the same as above. This is an
n2 -dimensional F -vector space.
Example 1.5.7. In Pn (F ), the set {1, x, . . . , xn } is a basis. It’s clear that this is a spanning
set, so it remains to see linear independence. If c0 + c1x + . . . + cnxn = 0 in Pn(F), then in particular, this holds true for all x ∈ F. The left hand side is a polynomial of degree at most n, so if it were not the zero polynomial it would have at most n roots, while the right hand side vanishes everywhere. This is only possible if all coefficients are 0. This is an (n + 1)-dimensional F-vector space.
Example 1.5.8. The space of all polynomials with coefficients in F , P (F ) is infinite dimen-
sional: any finite set of polynomials has a maximal degree m, so their F -span is contained
in Pm (F ). This says no finite subset of P (F ) is a spanning set, so it is infinite dimensional
as an F -vector space.
Example 1.5.9. In R3, the set {(1, 0, 1), (1, 1, 0), (0, 1, 1)} is a basis, because it is a linearly independent set of three vectors in the 3-dimensional space R3: one can check by hand that (0, 1, 1) ∉ Span({(1, 0, 1), (1, 1, 0)}).
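This kind of check can also be done by machine; the sketch below (not part of the original notes) puts the three vectors into the rows of a matrix and asks numpy for its rank and determinant, which detect linear independence.

    import numpy as np

    M = np.array([[1, 0, 1],
                  [1, 1, 0],
                  [0, 1, 1]], dtype=float)  # rows are the candidate basis vectors

    print(np.linalg.matrix_rank(M))  # 3: the rows are linearly independent
    print(np.linalg.det(M))          # 2.0: non-zero, so the rows form a basis of R3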
Example 1.5.10. Let V = {(x, y, z) ∈ R3 : x − 2y + z = 0 and 2x − 3y + z = 0}. Then V
is a subspace of R3 , and has basis {(1, 1, 1)}.
Example 1.5.11. The dimension of a vector space depends on the underlying field. As a
C-vector space, Cn has dimension n with basis vectors ej for 1 ≤ j ≤ n. However, C is a
2-dimensional R-vector space: any complex number z is of the form z = a + bi for real a, b,
so {1, i} is a basis. The vectors ej, iej for 1 ≤ j ≤ n form a basis of Cn as a 2n-dimensional
R-vector space.
Example 1.5.12. For a ≠ 0 in R, the set {1, x − a, (x − a)2, . . . , (x − a)n} is a basis for Pn(R): if c0 + c1(x − a) + . . . + cn(x − a)n = 0 for all x, plugging in x = a shows c0 = 0, and taking derivatives and repeating the argument shows ci = 0. This shows linear independence, and since a basis of Pn(R) has n + 1 elements, this is a spanning set. Every polynomial p(x) can therefore be written in the form p(x) = c0 + c1(x − a) + . . . + cn(x − a)n. One can solve for the coefficients ci by taking derivatives as necessary and plugging in x = a, to see ck = p(k)(a)/k!, recovering the usual Taylor expansion around x = a.
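The coefficient formula ck = p(k)(a)/k! is easy to check symbolically; the sketch below (not from the notes) uses sympy with an arbitrary cubic polynomial and the arbitrary choice a = 2.

    import sympy as sp

    x = sp.symbols('x')
    a = 2                          # arbitrary expansion point
    p = 3*x**3 - x**2 + 4*x - 7    # arbitrary cubic polynomial

    # c_k = p^(k)(a) / k!
    coeffs = [sp.diff(p, x, k).subs(x, a) / sp.factorial(k) for k in range(4)]

    # rebuilding p in the basis {1, x - a, (x - a)^2, (x - a)^3} recovers the original polynomial
    q = sum(c * (x - a)**k for k, c in enumerate(coeffs))
    print(sp.expand(q - p))        # 0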
To illustrate why dimension is useful, we prove a quick result, which helps us understand a
vector space by understanding its subspaces.
Proposition 15. Let W ⊂ V be a subspace. Then dim(W ) ≤ n. If dim(W ) = n, then
W =V.
Proof. First we show that W is finite dimensional. If W = {0}, we are done. Otherwise,
pick w1 ≠ 0 in W. If W = Span({w1}), we are done, otherwise there is w2 ∈ W with w2 ∉ Span({w1}), so {w1, w2} is linearly independent. Continue choosing vectors w3, . . . , wk in this way such that {w1, . . . , wk} is a linearly independent subset of W. Since W ⊂ V, it's also a linearly independent subset of V, so this process must stop after at most n steps, and
the termination of this process is equivalent to saying that W = Span({w1 , . . . , wk }). This
says {w1 , . . . , wk } is a basis of W , and we have k ≤ n. If k = n, these vectors are actually a
basis of V as well, so W = V .
Example 1.5.13. Let W ⊂ R3 be a subspace. Then dim(W ) = 0, 1, 2, 3. If dim(W ) = 0,
then W = {0}, and if dim(W ) = 3, then W = R3 . If dim(W ) = 1, then W = Span({v})
for some v ∈ W , i.e. W is the line through the origin in the direction of v. If dim(W ) = 2,
we have W = Span({v1 , v2 }) for some vectors v1 , v2 . Let v = v1 × v2 , so x · v = 0 for all
x ∈ W . This defines the equation of a plane with normal vector v1 × v2 , so that subspaces of
R3 are either {0}, R3 , lines through the origin or planes through the origin. The dimensions
of these objects should hopefully match your own geometric intuition.
Example 1.5.14. Set V = (Z/pZ)2 , which is a 2-dimensional Z/pZ-vector space with basis
vectors (1̄, 0̄) and (0̄, 1̄). What are all the subspaces of V ? If W ⊂ V is a subspace, we
have dim(W ) = 0, 1, 2. If dim(W ) = 0 then W = {0}, and if dim(W ) = 2 then W = V . If
dim(W ) = 1, then W = Span({v}) for some non-zero vector v. There are a total of p2 −1 such
vectors v, and each of the p − 1 non-zero multiples of v spans the same subspace of V. Since the non-zero vectors of the 1-dimensional subspaces partition the non-zero vectors of V, we conclude there are (p2 − 1)/(p − 1) = p + 1 different 1-dimensional subspaces of V, for a total of p + 3 subspaces.
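The count p + 1 can be confirmed by brute force for a small prime; in the sketch below (p = 5 is an arbitrary choice, not from the notes), each 1-dimensional subspace is recorded as the frozenset of its elements.

    p = 5  # any small prime

    nonzero = [(a, b) for a in range(p) for b in range(p) if (a, b) != (0, 0)]

    def span(v):
        """All scalar multiples of v in (Z/pZ)^2, i.e. the subspace Span({v})."""
        return frozenset(((c * v[0]) % p, (c * v[1]) % p) for c in range(p))

    lines = {span(v) for v in nonzero}        # distinct 1-dimensional subspaces
    print(len(lines), (p**2 - 1) // (p - 1))  # both print 6 = p + 1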
We end with some useful dimension counting results:
Proposition 16. Let V be a vector space and let W, W1, W2 be subspaces.
(a) dim(W1 + W2 ) = dim(W1 ) + dim(W2 ) − dim(W1 ∩ W2 ).
Theorem 1.6.7. Every vector space has a basis.
Proof. Let V be a vector space over some field F . The idea of the proof is as follows: use
Zorn’s lemma to show that V contains a maximal linearly independent subset B of V (in
the sense that there is no linearly independent subset S with B ⊊ S), and then show that
B must be a basis of V .
If V = {0}, then by definition V = Span(∅), and the empty set is linearly independent.
Now suppose that V ≠ {0} and let P = {S ⊂ V : S is linearly independent} be the set of all linearly independent subsets of V, with an ordering on P given by inclusion. Then P ≠ ∅, because there exists v ≠ 0 in V, so {v} is a linearly independent subset of V. We now check
the conditions of Zorn's lemma. Suppose that C ⊂ P is a chain, and write C = {Sα}α∈I for some indexing set I. Set M = ∪_{α∈I} Sα. The claim is that M is an upper bound of C that is an element of P. The first statement is immediate by definition: for any Sα ∈ C, we have Sα ⊂ ∪_{α∈I} Sα, so Sα ≤ M. Therefore, we only need to check that M is a linearly independent subset of V, so that M ∈ P, letting Zorn's lemma kick in.
Suppose that M is not linearly independent, then there are vectors s1 , . . . , sn where
si ∈ Sαi for some Sαi and scalars c1, . . . , cn ∈ F not all 0 such that c1s1 + · · · + cnsn = 0. As C
is totally ordered, one of the sets Sα1 , . . . , Sαn must contain the others, so each of the vectors
si live in some common set, which we denote Sα . This says there is a non-trivial dependence
relation among vectors in Sα , contradicting that Sα is linearly independent (because Sα lives
in P !). Therefore, M is a linearly independent subset of V . By Zorn’s lemma, P contains a
maximal element with respect to inclusion, say B.
To finish up, we need to show that B spans V . Suppose otherwise, then there is some
v ∈ V such that v ∉ Span(B). This says that B ∪ {v} is a linearly independent subset of V with B ⊊ B ∪ {v}, contradicting the maximality of B. Therefore B spans V, and we are done.
done.
It’s important to note that the proof only shows that a basis exists – it gives absolutely
zero indication of what one is. The proof technique of using Zorn’s lemma is a rather stan-
dard one for proving existence theorems in mathematics (especially in algebra) and is worth
understanding.
For vector spaces spanned by finite sets, you saw in lecture that it’s not too hard to
show that any two bases have the same number of elements. This allows us to define the
dimension of a vector space. What happens if the vector space has a basis of infinitely many
elements? The dimension of a vector space is still well defined, but this now becomes a fairly
non-trivial result. Instead of talking about the number of elements in a basis, we have to
talk about the cardinality of the basis, and if you know anything about set theory, there are
many different “sizes” of infinite sets which is what causes complications. The proof is a
rather technical set theoretic argument that is unenlightening, so we will take it for granted.
Proposition 17. Let B and B 0 be two bases of a vector space V . Then |B| = |B 0 |.
This gives us the following definition that works for any vector space:
Definition 1.6.8. Let V be a vector space. The dimension of V is defined as the cardinality of a basis of V. V is said to be infinite dimensional if its dimension is not finite.
If you're familiar with the notion of countability, the following are examples of infinite dimensional vector spaces of different “sizes”:
Example 1.6.9. The vector space Q[x] is infinite dimensional as a Q-vector space, because
the span of any finite set of polynomials has bounded degree. The set {1, x, x2 , . . .} is a basis
of Q[x] as a Q-vector space, and so Q[x] has countable dimension.
Example 1.6.10. R is infinite dimensional as a Q-vector space, because any finite dimen-
sional vector space over Q must be countable, and R is not countable. It turns out that R
has uncountable dimension as a Q-vector space (but this is much harder to show).
Chapter 2
Linear Transformations
A general philosophy is that to study algebraic structures, one needs to study not just the objects but also the structure preserving maps between them. There is no simple explanation for why this is the right approach, but historically it has been very productive. When studying how
to solve systems of linear equations in Rn , one is naturally led to matrix equations of the
form Ax = b. A matrix A defines a function T : Rn → Rn given by T (x) = Ax. This func-
tion T respects the structure of Euclidean space, in the sense that T (x + y) = T (x) + T (y)
and T (cx) = cT (x) for all x, y ∈ Rn and c ∈ R. Since vector spaces are nothing more than
abstracted versions of Euclidean space, we should look at abstract analogues of matrices, i.e.
functions that preserve the vector space structure.
Unless otherwise stated, throughout the handout V is a finite dimensional vector space of
dimension n over a field F . The letter T will always denote a linear transformation.
Note that necessarily a linear transformation satisfies T (0) = 0. We also see by induction
that for any finite collection of vectors v1 , . . . , vn and scalars c1 , . . . , cn ∈ F we have T (c1 v1 +
. . . + cn vn ) = c1 T (v1 ) + . . . + cn T (vn ).
Definition 2.1.2. The kernel ker(T ) is defined by ker(T ) = {x ∈ V : T (x) = 0}. The
image Im(T ) is defined by Im(T ) = {T (x) : x ∈ V }.
The kernel and image of T are two important subspaces of V and W respectively, and we
can translate set theoretic statements about injectivity and surjectivity into the language of
linear algebra.
Proof. Since T is linear, we have T (0) = T (0 + 0) = T (0) + T (0), so 0 = T (0) gives
0 ∈ ker(T). If x, y ∈ ker(T) then T(x + y) = T(x) + T(y) = 0 by linearity, so x + y ∈ ker(T). Similarly, if c ∈ F, T(cx) = cT(x) = 0 so cx ∈ ker(T), giving that ker(T) is a subspace of V. Since T(0) = 0, this says 0 ∈ Im(T). If x, y ∈ Im(T) then there are u, v ∈ V such that x = T(u) and
y = T (v). Then x + y = T (u) + T (v) = T (u + v) so x + y ∈ Im(T ). Finally, if x = T (u) then
cx = cT (u) = T (cu) so cx ∈ Im(T ) which says Im(T ) is a subspace of W .
Proof.
(a) Suppose that T is injective. If x ∈ ker(T), then T(x) = 0 = T(0), so injectivity says x = 0, giving ker(T) = {0}. Conversely, if ker(T) = {0} and T(x) = T(y), then T(x − y) = 0 says x − y ∈ ker(T), so x − y = 0, i.e. x = y, so T is injective.
(b) If T is surjective, then for every w ∈ W there is x ∈ V such that T(x) = w, which is precisely the same as saying W = Im(T). On the other hand, if W = Im(T) then for all w ∈ W there is x ∈ V with w = T(x), so T is surjective.
Example 2.1.3. For any vector space V , the identity transformation idV : V → V given
by idV (x) = x is linear.
Example 2.1.4. For any field F and a ∈ F , the map T : F → F given by T (x) = ax is a
linear transformation by the field axioms.
Example 2.1.6. For any matrix A ∈ Mm×n (F ), the map T : F n → F m given by T (x) = Ax
is a linear transformation, since A(x + y) = Ax + Ay and A(cx) = c(Ax) by how matrices
work.
Example 2.1.7. In P(R), the maps D(p)(x) = p′(x) and I(p)(x) = ∫_0^x p(t) dt are linear operators on P(R) by calculus. D is not injective, because any constant polynomial has derivative 0, but D is surjective since D(∫_0^x p(t) dt) = p(x) by the fundamental theorem of calculus. The operator I is injective but not surjective, because nothing maps to the constant polynomial p(x) = 1.
Example 2.1.8. The map D : C ∞ ([0, 1]) → C ∞ ([0, 1]) given by D(f )(x) = f (x) − f 0 (x) is
a linear transformation. Saying f ∈ ker(D) is the same as saying f 0 (x) = f (x), so ker(D)
is precisely the set of functions that satisfy the differential equation f = f 0 . From calculus,
we know the only such functions are of the form cex for c ∈ R, so ker(D) = Span({ex }) is a
1-dimensional subspace of C ∞ ([0, 1]).
Example 2.1.9. The map T : Mn (F ) → Mn (F ) given by T (A) = A − At is linear. ker(T )
is the set of matrices with A = At , i.e. ker(T ) = Symn (F ). Any matrix in Im(T ) is of the
form A − At for some A, which is skew-symmetric, so Im(T ) ⊂ Skewn (F ). If F does not
have characteristic 2, for any skew-symmetric matrix B, we have T (B) = B − B t = 2B, so
T((1/2)B) = B says Im(T) = Skewn(F).
Example 2.1.10. Let F ∞ be the sequence space of elements of F . That is, F ∞ = {(a1 , a2 , . . .) :
ai ∈ F }. Define maps R : F ∞ → F ∞ by R((a1 , a2 , . . .)) = (a2 , a3 , . . .) and L((a1 , a2 , . . .)) =
(0, a1 , a2 , . . .), the right and left shift operators respectively. Then both R and L are linear
operators on F ∞ .
Example 2.1.11. Let W ⊂ V be a subspace. The map π : V → V/W given by π(v) = v + W is a linear transformation, called the quotient map.
Example 2.1.12. Suppose V = W ⊕ U for some subspaces W, U of V . The projection
πW of V onto W along U is defined by πW (x) = w where x = w + u for unique w ∈ W
and u ∈ U . Then πW is linear, and ker(πW ) = U and Im(πW ) = W . If we assume V is
finite dimensional, for any subspace W there is U such that V = W ⊕ U . This then says
that any subspace W is the kernel of some linear transformation, namely πU where U is the
complement of W in V . Similarly, W appears as the image of πW .
A natural question is given a linear operator T : V → V and a subspace W of V , when
does T restrict to a linear operator on W ? Necessarily, if T restricts to an operator on W
we must have T (W ) ⊂ W , and actually this is sufficient: if T (W ) ⊂ W , then for x, y ∈ W
we have T (x + y) = T (x) + T (y) since x, y ∈ V and T (cx) = cT (x) for c ∈ F . We give such
subspaces a name:
Definition 2.1.13. Given a linear operator T : V → V , a subspace W ⊂ V is called
T-invariant if T (W ) ⊂ W . The restriction of T to W , denoted by T |W , is the linear
transformation T |W (x) = T (x) for all x ∈ W .
Example 2.1.14. Let V = W ⊕ U , and consider πW . Then W is πW -invariant, and πW |W
is the identity map on W .
Example 2.1.15. The map T : Mn (F ) → Mn (F ) given by T (A) = A − At is Symn (F )-
invariant. The restriction T |Symn (F ) is simply the 0 map.
Proof. Let y ∈ Im(T ). Then y = T (x) for some x ∈ V , and we may write x = c1 v1 +. . .+cn vn
for some ci ∈ F . Then y = T (x) = T (c1 v1 + . . . + cn vn ) = c1 T (v1 ) + . . . + cn T (vn ), so
y ∈ Span({T(v1), . . . , T(vn)}), which says this is a spanning set of Im(T). If further we
assume that T is injective, if c1 T (v1 ) + . . . + cn T (vn ) = 0, then T (c1 v1 + . . . + cn vn ) = 0, so
c1 v1 + . . . + cn vn ∈ ker(T ). Since T is injective, this says c1 v1 + . . . + cn vn = 0, and since the
vectors vi are linearly independent this says ci = 0, i.e. that {T (v1 ), . . . , T (vn )} is linearly
independent and therefore a basis of Im(T ), so that rank(T ) = n.
Proof. Pick a basis {v1, . . . , vk} of ker(T) and extend this to a basis {v1, . . . , vk, w1, . . . , wℓ} of V. The above shows that Im(T) = Span({T(w1), . . . , T(wℓ)}), so it is sufficient to prove that this spanning set is linearly independent, and hence a basis of Im(T). Suppose c1T(w1) + . . . + cℓT(wℓ) = 0. Then T(c1w1 + . . . + cℓwℓ) = 0, so c1w1 + . . . + cℓwℓ ∈ ker(T). We may then write c1w1 + . . . + cℓwℓ = a1v1 + . . . + akvk for some ai ∈ F, so c1w1 + . . . + cℓwℓ − a1v1 − . . . − akvk = 0. Since the vectors wi, vj are a basis of V, this says all ci = 0 and all ai = 0, so that {T(w1), . . . , T(wℓ)} is a basis as desired.
Sometimes dim(ker(T )) is referred to as the nullity of T , hence the name of the theorem,
but this terminology is not commonly used outside of linear algebra textbooks. As an immediate corollary of the rank-nullity theorem, we get the following result, analogous to the corresponding statement for functions between finite sets of the same size:
Corollary 2.2.3. Let V and W be vector spaces of the same dimension. Then T : V → W
is injective ⇐⇒ T is surjective ⇐⇒ T is bijective.
Proof. T is injective if and only if ker(T) = {0}, so by rank-nullity this says n = rank(T) + 0; since dim(W) = n, this gives Im(T) = W, so T is surjective. Similarly if T is surjective, rank(T) = n so rank-
nullity says n = n + dim(ker(T )) so dim(ker(T )) = 0 gives ker(T ) = {0} and therefore T is
injective.
We give some examples to illustrate how the rank-nullity theorem is used to compute images
and kernels of linear transformations.
Example 2.2.4. Let T : R3 → R3 be given by T(x, y, z) = (x + y + 2z, 2x + 2y + 4z, 2x + 3y + 5z). Then T is a linear transformation, and Im(T) = Span({(1, 2, 2), (1, 2, 3), (2, 4, 5)}) = Span({(1, 2, 2), (1, 2, 3)}). The latter set is a basis for Im(T), so that rank(T) = 2, i.e. Im(T) is a plane in R3. By rank-nullity the kernel of T is 1-dimensional, so it must be a line. Which line is it? Representing T as the matrix
A =
[ 1  1  2 ]
[ 2  2  4 ]
[ 2  3  5 ]
one sees that any vector orthogonal to the rows of A is contained in the kernel. Taking the cross product of the first and third rows shows (−1, −1, 1) ∈ ker(T), so that ker(T) = Span({(−1, −1, 1)}).
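The same rank and kernel computations can be reproduced exactly with sympy (a small sketch, not part of the notes, using the matrix A above):

    import sympy as sp

    A = sp.Matrix([[1, 1, 2],
                   [2, 2, 4],
                   [2, 3, 5]])

    print(A.rank())       # 2, so Im(T) is a plane in R3
    print(A.nullspace())  # [Matrix([[-1], [-1], [1]])], a basis for ker(T)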
nothing to do with entries off the diagonal, we see that the matrices Eij with i ≠ j along with the matrices −E11 + Eii for 2 ≤ i ≤ n form a spanning set for ker(T), and therefore a basis, because there are n2 − 1 of them.
Example 2.2.6. The map T : P(R) → P(R) defined by T(p) = 5p′′ + 3p′ is surjective. Let q be a polynomial of degree n. Restricting T to Pn+1(R) defines a map T′ : Pn+1(R) → Pn(R) with T′(p) = T(p). By rank-nullity, rank(T′) + dim(ker(T′)) = n + 2. If p ∈ ker(T′), then 5p′′ + 3p′ = 0 says 5p′′ = −3p′. Since deg(p′′) = deg(p′) − 1, this is impossible unless both p′ and p′′ are 0, i.e. p is a constant. This says ker(T′) = Span({1}), so ker(T′) is 1-dimensional, and rank(T′) = n + 1 says T′ is surjective. In particular, q = T(p) for some p, so T is surjective.
Example 2.2.8. Let dim(V ) = n and dim(W ) = m with n < m. Then there is no surjective
linear transformation T from V to W : by rank-nullity, rank(T ) + dim(ker(T )) = n, so
rank(T) = n − dim(ker(T)) ≤ n < m says Im(T) ≠ W. Similarly, if n > m there is no
injective linear transformation from V to W . This says if n < m then an m-dimensional
vector space is “larger” than an n-dimensional vector space, which hopefully matches with
your intuition.
Proposition 22. HomF (V, W ) is a subspace of F(V, W ).
Proof. If T, U ∈ HomF (V, W ) recall that by definition, we have (T + U )(v) = T (v) + U (v)
and (cT )(v) = cT (v). To check that HomF (V, W ) is a subspace, we need to check that it
is non-empty, and that for T, U linear transformations and c ∈ F that T + U and cT are
linear transformations. Note that the 0 function T (x) = 0 for all x ∈ V is certainly linear.
If x, y ∈ V then (T + U )(x + y) = T (x + y) + U (x + y) = T (x) + U (x) + T (y) + U (y) =
(T + U )(x) + (T + U )(y). Next, for c ∈ F we have (T + U )(cx) = T (cx) + U (cx) =
cT (x) + cU (x) = c(T (x) + U (x)) = c(T + U )(x), so T + U is a linear transformation which
says T + U ∈ HomF(V, W). For x, y ∈ V, c, k ∈ F, we have (cT)(x + y) = cT(x + y) = c(T(x) + T(y)) = cT(x) + cT(y) = (cT)(x) + (cT)(y), and (cT)(kx) = cT(kx) = c(kT(x)) = k(cT(x)) = k(cT)(x). This
says cT is a linear transformation, so cT ∈ HomF (V, W ) so that HomF (V, W ) is a subspace
of F(V, W ).
In most linear algebra books the space HomF (V, W ) is denoted as L(V, W ) and EndF (V )
as L(V ), but the above notation is more common elsewhere in mathematics. One of the rea-
sons why finite dimensional vector spaces are so easy to study is that linear transformations
between V and W are the same things as functions defined on a basis of V . This reduces
much of the study of linear algebra to studying functions defined on a finite set. This is
stated precisely in the following form.
Theorem 2.3.3. Let V be finite dimensional, and let B be a basis of V. Then HomF(V, W) ≅ F(B, W). In other words, every linear transformation is determined uniquely by what it does on a basis of V.
Proof. Suppose that T : V → W is a linear transformation, and let B = {v1 , . . . , vn } be a
basis of V . For x ∈ V , we may uniquely write x = c1 v1 + . . . + cn vn , so T (x) = c1 T (v1 ) + . . . +
cn T (vn ) by linearity. This defines a function f : B → W by f (vi ) = T (vi ). Now suppose
we have a function f : B → W . Define Tf : V → W by Tf (c1 v1 + . . . + cn vn ) = c1 f (v1 ) +
. . . + cn f (vn ). We need to check that Tf is a linear transformation, and that it is the only
transformation that agrees with f on B. Write x = c1 v1 +. . .+cn vn and y = d1 v1 +. . .+dn vn .
Then Tf(x + y) = Tf((c1 + d1)v1 + . . . + (cn + dn)vn) = (c1 + d1)f(v1) + . . . + (cn + dn)f(vn) =
c1 f (v1 )+. . .+cn f (vn )+d1 f (v1 )+. . .+dn f (vn ) = Tf (x)+Tf (y). For c ∈ F , we have Tf (cx) =
Tf ((cc1 )v1 +. . .+(ccn )vn ) = (cc1 )f (v1 )+. . .+(ccn )f (vn ) = c(c1 f (v1 )+. . .+cn f (vn )) = cTf (x),
which shows that Tf is linear. Finally, suppose there is some other linear transformation
T′ : V → W such that T′(vi) = f(vi). As mentioned above, this says for any x ∈ V, T′(x) = T′(c1v1 + . . . + cnvn) = c1T′(v1) + . . . + cnT′(vn) = c1f(v1) + . . . + cnf(vn) = Tf(x), i.e. T′ = Tf, so Tf is the only linear transformation with this property.
Putting this all together, this says the map G : F(B, W) → HomF(V, W) with G(f) = Tf is a bijection: it is injective because if G(f) = G(g), then Tf(x) = Tg(x) for all x ∈ V. This then says Tf(vi) = f(vi) = g(vi) = Tg(vi) for all vi, so that f = g because they agree on all elements
of B. It is surjective because T ∈ HomF (V, W ) defines a map f : B → W by f (vi ) = T (vi )
and by definition we have G(f ) = T . It remains to show that G is linear, however this is
clear because G(f + g) = Tf +g = Tf + Tg because Tf +g (vi ) = (f + g)(vi ) = f (vi ) + g(vi ) =
Tf (vi ) + Tg (vi ) so Tf +g and Tf + Tg agree on B and therefore on all of V . Similarly we see
Tcf = cTf for c ∈ F , so G is linear and we are done.
Theorem 2.3.4. Two finite dimensional vector spaces are isomorphic if and only if they
have the same dimension.
Proof. Suppose that V, W are finite dimensional with V ≅ W. Then by definition, there is a bijective linear transformation T : V → W. By rank-nullity, rank(T) + dim(ker(T)) = dim(V), and since T is a bijection this says Im(T) = W, so rank(T) = dim(W) and dim(ker(T)) = 0, i.e. dim(V) = dim(W). Now suppose that V and W are vector spaces of the same dimension. Let B = {v1, . . . , vn} be a basis of V and B′ = {w1, . . . , wn} be a basis of W. Define f : B → W by f(vi) = wi. The previous theorem gives us a linear transformation Tf : V → W. Write x = c1v1 + . . . + cnvn. Then if Tf(x) = 0, this says c1Tf(v1) + . . . + cnTf(vn) = c1w1 + . . . + cnwn = 0, so all ci = 0 because the wi are linearly independent. This says x = 0, so ker(Tf) = {0}. Thus, Tf is injective and therefore bijective, so V ≅ W.
Example 2.4.2. Let x = (1, 3) ∈ R2 and let β = {e1 , e2 } be the standard basis of R2 .
Then x = e1 + 3e2 so [x]β = (1, 3). Set γ = {(1, 1), (1, −1)}. Then x = 2(1, 1) − (1, −1) so
[x]γ = (2, −1). With α = {(1, 3), (1, 0)} we have [x]α = (1, 0).
Example 2.4.3. Let β = {1, x, x2} be the standard basis of P2(R). Then with p(x) = 4 − 3x + 3x2, we have [p(x)]β = (4, −3, 3). If γ = {1, x, (3/2)x2 − (1/2)}, we have [p(x)]γ = (5, −3, 2), as 5 − 3x + 2((3/2)x2 − (1/2)) = p(x).
Example 2.4.5. View C as an R-vector space with basis β = {1, i}. Then x = 3 + 5i has
[x]β = (3, 5). As a C-vector space, C has basis γ = {1}, so [x]γ = 3 + 5i.
Proof. Let x, y ∈ V with x = c1 v1 + . . . + cn vn and y = d1 v1 + . . . + dn vn . Then x + y =
(c1 + d1 )v1 + . . . + (cn + dn )vn . We have Cβ (x + y) = (c1 + d1 , . . . , cn + dn ) = (c1 , . . . , cn ) +
(d1 , . . . , dn ) = Cβ (x) + Cβ (y). For any k ∈ F , we have kx = kc1 v1 + . . . + kcn vn , so Cβ (kx) =
(kc1 , . . . , kcn ) = k(c1 , . . . , cn ) = kCβ (x). This proves that Cβ is linear. If Cβ (x) = 0, this
says that x = 0v1 + . . . + 0vn = 0. This says ker(Cβ) = {0}, so Cβ is injective and therefore bijective, giving V ≅ Fn.
Coordinates are one of the best ideas in mathematics, and in linear algebra this is no
different. Coordinates give us a way of viewing a vector in an abstract vector space as a
more concrete n-tuple of elements of F . In fact, we can do more: using coordinates, we can
associate to every linear transformation T : V → W a matrix [T ]γβ ∈ Mm×n (F ). This reduces
the study of linear maps from V to W , and therefore linear algebra as a whole, to studying
Mm×n (F ).
The definition of the matrix of T says that [T (x)]γ = [T ]γβ [x]β , and so one can then recover
the actual vector T (x) by setting up the corresponding linear combination of basis vectors
in γ.
Example 2.4.9. Let α = a + bi ∈ C. View C as an R-vector space with the standard basis β = {1, i}, and consider the linear transformation T : C → C defined by T(x) = αx, the multiplication by α map. Then
[T]β =
[ a  −b ]
[ b   a ]
This says any complex number a + bi can be thought of as the matrix
[ a  −b ]
[ b   a ].
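This identification respects multiplication: the matrix of the product of two complex numbers is the product of their matrices. The sketch below (sample values are arbitrary, not from the notes) checks this with numpy.

    import numpy as np

    def as_matrix(z):
        """Matrix of the 'multiplication by z' map on C, viewed as R^2 with basis {1, i}."""
        return np.array([[z.real, -z.imag],
                         [z.imag,  z.real]])

    z, w = 2 + 3j, -1 + 4j  # arbitrary complex numbers
    print(np.allclose(as_matrix(z) @ as_matrix(w), as_matrix(z * w)))  # True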
Example 2.4.10. Let D : P3(R) → P2(R) be the derivative map, and let β = {1, x, x2, x3} and γ = {1, x, x2} be the standard bases of P3(R) and P2(R). Then
[D]γβ =
[ 0  1  0  0 ]
[ 0  0  2  0 ]
[ 0  0  0  3 ]
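To illustrate how such a matrix is used (a small sketch, not part of the notes; the polynomial is an arbitrary choice), multiplying [D]γβ against the β-coordinates of a polynomial produces the γ-coordinates of its derivative.

    import numpy as np

    D = np.array([[0, 1, 0, 0],
                  [0, 0, 2, 0],
                  [0, 0, 0, 3]])  # [D] with respect to beta = {1, x, x^2, x^3} and gamma = {1, x, x^2}

    p = np.array([5, -1, 2, 4])   # beta-coordinates of 5 - x + 2x^2 + 4x^3
    print(D @ p)                  # [-1  4 12]: gamma-coordinates of -1 + 4x + 12x^2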
Example 2.4.11. Set V = Span({sin(x), cos(x)}) ⊂ C∞(R) and define T : V → V by T(f) = 3f + 2f′ − f′′. With β = {sin(x), cos(x)}, we see that
[T]β =
[ 4  −2 ]
[ 2   4 ]
because T(sin(x)) = 4 sin(x) + 2 cos(x) and T(cos(x)) = −2 sin(x) + 4 cos(x). Using row reduction, one can check the only solution to [T]β x = 0 is x = 0. This says no non-trivial linear combination of sin(x) and cos(x) is a solution to the differential equation 3f + 2f′ − f′′ = 0.
Example 2.4.12. Let T : V → V be linear and suppose W is a T -invariant subspace. Let
U be a complement of W in V, so V = W ⊕ U. If {w1, . . . , wk} and {u1, . . . , uℓ} are bases of W and U, then β = {w1, . . . , wk, u1, . . . , uℓ} is a basis of V. Since T(W) ⊂ W, we may write T(wi) = c1iw1 + . . . + ckiwk, so [T(wi)]β = (c1i, . . . , cki, 0, . . . , 0). This gives that [T]β is a block matrix of the form
[ A  B ]
[ O  C ]
where A is the k × k matrix [cij], O is the (n − k) × k zero matrix, and B and C are some matrices of size k × (n − k) and (n − k) × (n − k) respectively. Therefore having a T-invariant subspace allows one to decompose the matrix of T into an easier to work with block form.
The results of this section, namely that there is a correspondence between matrices and linear transformations, can be summed up in the theorem below.
Lemma 2.4.13. Let T, U : V → W be linear transformations, and c ∈ F . Then [T + U ]γβ =
[T ]γβ + [U ]γβ and [cT ]γβ = c[T ]γβ .
Proof. By definition, the i-th column of [T +U ]γβ is equal to [(T +U )(vi )]γ = [T (vi )+U (vi )]γ =
[T (vi )]γ + [U (vi )]γ by linearity of the map Cγ . However, this is clearly also the i-th column
of the matrix [T ]γβ + [U ]γβ so [T + U ]γβ = [T ]γβ + [U ]γβ . Similarly, the i-th column of the matrix
[cT ]γβ is given by [(cT )(vi )]γ = [cT (vi )]γ = c[T (vi )]γ , which is again the i-th column of the
matrix c[T ]γβ .
Theorem 2.4.14. HomF(V, W) ≅ Mm×n(F). So in particular, dim(HomF(V, W)) = mn.
Proof. Define F : HomF (V, W ) → Mm×n (F ) by F (T ) = [T ]γβ . Suppose that F (T ) = F (U ).
Then [T ]γβ = [U ]γβ , so in particular the columns of these matrices are the same so [T (vi )]γ =
[U(vi)]γ for all i. Translating back to the actual vectors says T(vi) = U(vi), i.e. T = U, so F is injective. For a matrix A ∈ Mm×n(F) with columns x1, . . . , xn, write xi = (a1i, . . . , ami). Then
define f : B → W by f (vi ) = a1i w1 + . . . + ami wm . This defines a linear transformation
T : V → W with T (vi ) = f (vi ) so in coordinates, [T (vi )]γ = [f (vi )]γ = xi . This then says
[T ]γβ = A, so that F is surjective, so F is a bijection. By the above lemma, F is linear, so F
is an isomorphism as desired. The dimension result then follows immediately.
2.5 Invertibility
Recall that a function f : X → Y is said to be invertible if there is g : Y → X such that
f ◦ g = idY and g ◦ f = idX , and we denote g = f −1 . From set theory, f is invertible if
and only if f is a bijection. For linear transformations T : V → W and S : W → Z, we
denote the composition S ◦ T by ST , and clearly then T is an isomorphism if and only if T is
invertible. Since linear transformations correspond to matrices, we make a similar definition.
Proof. For x, y ∈ V we have ST (x + y) = S(T (x + y)) = S(T (x) + T (y)) = ST (x) + ST (y),
and for c ∈ F , we also have ST (cx) = S(T (cx)) = S(cT (x)) = cST (x) since S, T are
linear. Suppose that T is invertible with inverse T−1. For w, w′ ∈ W, T−1(w + w′) is the vector that maps to w + w′ under T. Since T is linear, T(T−1(w) + T−1(w′)) = w + w′, so T−1(w + w′) = T−1(w) + T−1(w′). Similarly we see for c ∈ F that T−1(cw) = cT−1(w) so
T −1 is linear.
Proof. Let α = {v1, . . . , vn} and β = {w1, . . . , wk}. By definition, the i-th column of [ST]γα is given by [ST(vi)]γ. The i-th column of [S]γβ[T]βα is [S]γβ[T(vi)]β, so it's sufficient to check these expressions are equal. Write T(vi) = c1iw1 + . . . + ckiwk. Then ST(vi) = S(c1iw1 + . . . + ckiwk) = c1iS(w1) + . . . + ckiS(wk). Applying Cγ then gives [ST(vi)]γ = c1i[S(w1)]γ + . . . + cki[S(wk)]γ, which we then recognize as saying [ST(vi)]γ = [S]γβ[T(vi)]β as desired.
that [T ]γβ is invertible. Then the columns of [T ]γβ are linearly independent: if not, there
are c1 , . . . , cn ∈ F not all 0 such that c1 [T (v1 )]γ + . . . + cn [T (vn )]γ = 0, i.e. there is a
non-trivial solution to [T ]γβ x = 0. However, this is impossible because multiplying by the
inverse of [T ]γβ on the left shows that if the above holds then necessarily x = 0. This says
that the vectors [T (vi )]γ in F n are linearly independent, and as the coordinate mapping
is an isomorphism this then implies that the vectors wi = T (vi ) are linearly independent
vectors in W , and therefore are a basis of W . Define a linear transformation S : W → V
by S(wi ) = vi and extend linearly. By definition, ST (vi ) = S(wi ) = vi , so ST = idV , and
similarly T S(wi ) = T (vi ) = wi , so T S = idW so that T is invertible as desired.
We showed above that checking the invertibility of a linear operator T is the same as
checking the invertibility of its corresponding matrix. We remind the reader of some of many
equivalent conditions for checking the latter:
(a) A is invertible.
(d) A is row-equivalent to In .
(e) det(A) ≠ 0.
(f ) The augmented matrix [A|I] is row equivalent to [I|B] for some non-zero matrix B.
Pick bases β, γ of V , and consider the identity operator idV , along with the corresponding
matrix [idV ]γβ . This matrix satisfies [x]γ = [idV (x)]γ = [idV ]γβ [x]β for all x ∈ V , or in other
words, multiplication by [idV ]γβ converts the coordinates of the vector x from the basis β to
the basis γ.
Definition 2.6.1. Let V be a vector space with basis β = {v1, . . . , vn}, and let γ = {v1′, . . . , vn′} be another basis. The change of basis matrix from β to γ, denoted Sβγ, is the matrix [idV]γβ. Explicitly, Sβγ is the matrix whose i-th column is [vi]γ.
Since idV is invertible, this says Sβγ is invertible, and has inverse matrix [idV]βγ = Sγβ.
Theorem 2.6.2. Let β, γ be two bases of a finite dimensional vector space V, and let T ∈ GL(V). Then [T]γ = Sβγ[T]βSγβ, and [T]γSβγ = Sβγ[T]β. In other words, the following diagram commutes:

               [T]β
      [x]β -----------> [T(x)]β
       |                   |
      Sβγ                 Sβγ
       |                   |
       v                   v
      [x]γ -----------> [T(x)]γ
               [T]γ
Proof. Since composition of linear transformations corresponds to multiplication of their corresponding matrices, we see [T]γ = [idV ◦ T ◦ idV]γγ = [idV]γβ[T]β[idV]βγ = Sβγ[T]βSγβ. Since Sβγ is invertible with inverse Sγβ, multiplying on the right by Sβγ gives [T]γSβγ = Sβγ[T]β as desired.
Example 2.6.3. Let β = {e1, e2} be the standard basis of R2 and γ = {(1, 1), (1, 2)} be another basis. The change of basis matrix Sγβ is
Sγβ =
[ 1  1 ]
[ 1  2 ]
To compute Sβγ, we take the inverse to find
Sβγ =
[  2  −1 ]
[ −1   1 ]
To compute [e1]γ, we see [e1]γ = Sβγ[e1]β = Sβγ e1 = (2, −1), so that (1, 0) = 2(1, 1) − (1, 2).
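The inverse and the coordinate computation in this example can be replayed with numpy (a small sketch, not part of the notes):

    import numpy as np

    S_gamma_beta = np.array([[1, 1],
                             [1, 2]], dtype=float)  # columns are [(1,1)]_beta and [(1,2)]_beta
    S_beta_gamma = np.linalg.inv(S_gamma_beta)

    print(S_beta_gamma)                         # the inverse [[2, -1], [-1, 1]]
    print(S_beta_gamma @ np.array([1.0, 0.0]))  # [ 2. -1.] = coordinates of e1 in gamma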
Example 2.6.4. Let β = {1, x, x2} and γ = {1, x, (3/2)x2 − (1/2)} be bases of P2(R). Then
Sγβ =
[ 1  0  −1/2 ]
[ 0  1    0  ]
[ 0  0   3/2 ]
and one can compute
Sβγ =
[ 1  0  1/3 ]
[ 0  1   0  ]
[ 0  0  2/3 ]
Let T : P2(R) → P2(R) be defined by T(f)(x) = xf′(x). Then
[T]β =
[ 0  0  0 ]
[ 0  1  0 ]
[ 0  0  2 ]
and the change of basis formula says
[T]γ =
[ 0  0  1 ]
[ 0  1  0 ]
[ 0  0  2 ]
Example 2.6.5. Let T : R3 → R3 be given by T(x, y, z) = (2z, −2x + 3y + 2z, −x + 3z). Let β = {e1, e2, e3} be the standard basis of R3, and let γ = {(2, 1, 1), (1, 0, 1), (0, 1, 0)} be another basis. Then
[T]β =
[  0  0  2 ]
[ −2  3  2 ]
[ −1  0  3 ]
We see
Sγβ =
[ 2  1  0 ]
[ 1  0  1 ]
[ 1  1  0 ]
and
Sβγ =
[  1  0  −1 ]
[ −1  0   2 ]
[ −1  1   1 ]
so the change of basis formula says
[T]γ =
[ 1  0  0 ]
[ 0  2  0 ]
[ 0  0  3 ]
With respect to the new basis γ, this says that T acts along each γ-direction by scaling. Having a basis where an operator is diagonal is extremely useful, as it allows one to easily compute values of compositions. For example, to compute Tn(1, 2, 3), we compute [Tn(1, 2, 3)]γ = [Tn]γ[(1, 2, 3)]γ = [T]nγ[(1, 2, 3)]γ. We have [(1, 2, 3)]γ = Sβγ[(1, 2, 3)]β = (−2, 5, 4), so [Tn(1, 2, 3)]γ = [T]nγ(−2, 5, 4) = (−2, 5 · 2n, 4 · 3n). This says Tn(1, 2, 3) = −2(2, 1, 1) + 5 · 2n(1, 0, 1) + 4 · 3n(0, 1, 0) = (−4 + 5 · 2n, −2 + 4 · 3n, −2 + 5 · 2n).
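The whole computation can be replayed numerically; the sketch below (not part of the notes, with the arbitrary choice n = 4) checks that conjugating by the change of basis matrices diagonalizes [T]β and that the diagonal form reproduces Tn(1, 2, 3).

    import numpy as np

    T_beta = np.array([[ 0, 0, 2],
                       [-2, 3, 2],
                       [-1, 0, 3]], dtype=float)
    S_gamma_beta = np.array([[2, 1, 0],
                             [1, 0, 1],
                             [1, 1, 0]], dtype=float)  # columns are the gamma vectors
    S_beta_gamma = np.linalg.inv(S_gamma_beta)

    print(np.round(S_beta_gamma @ T_beta @ S_gamma_beta))  # diag(1, 2, 3)

    n, x = 4, np.array([1, 2, 3], dtype=float)
    direct = np.linalg.matrix_power(T_beta, n) @ x
    via_gamma = S_gamma_beta @ (np.diag([1.0**n, 2.0**n, 3.0**n]) @ (S_beta_gamma @ x))
    # both give (76, 322, 78) = (-4 + 5*2^n, -2 + 4*3^n, -2 + 5*2^n) for n = 4
    print(direct, np.allclose(direct, via_gamma))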
Example 2.6.6. Let β = {1, x, x2, x3} be the standard basis of P3(R) and let γ = {1, C(x, 1), C(x, 2), C(x, 3)} be the basis of binomial coefficient polynomials, where C(x, k) = x(x − 1) · · · (x − k + 1)/k!. One can check that
Sβγ =
[ 1  0  0  0 ]
[ 0  1  1  1 ]
[ 0  0  2  6 ]
[ 0  0  0  6 ]
Then [x3]γ = Sβγ[x3]β = (0, 1, 6, 6), so x3 = C(x, 1) + 6C(x, 2) + 6C(x, 3). As an application, Σ_{k=1}^{n−1} k3 = Σ_{k=1}^{n−1} (C(k, 1) + 6C(k, 2) + 6C(k, 3)) = C(n, 2) + 6C(n, 3) + 6C(n, 4) = (n(n − 1)/2)2, which follows from the identity Σ_{k=1}^{n−1} C(k, r) = C(n, r + 1), easily proven by induction.
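A quick brute-force check of the resulting sum formula (a sketch, not part of the notes; n = 10 is an arbitrary choice):

    n = 10
    lhs = sum(k**3 for k in range(1, n))  # 1^3 + 2^3 + ... + (n-1)^3
    rhs = (n * (n - 1) // 2) ** 2
    print(lhs, rhs, lhs == rhs)           # 2025 2025 True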
Definition 2.6.7. For A, B ∈ Mn (F ), we say that A and B are similar and write A ∼ B
if there exists P ∈ GLn (F ) such that A = P BP −1 .
Since Sγβ = (Sβγ )−1 , this says that for any choice of bases β, γ of V the matrices [T ]β and
[T ]γ are similar. The following observation is easy to verify:
We showed that changing the basis of V from β to γ yields similar matrices [T ]β and [T ]γ .
The converse is true as well:
Proposition 28. Let A, B ∈ Mn (F ), with A ∼ B. Then tr(A) = tr(B) and det(A) =
det(B).
Definition 2.6.10. Let V be finite dimensional. For T ∈ GL(V ) we define the trace of T
and the determinant of T to be the quantities tr([T ]β ) and det([T ]β ) for any choice of basis
β of V .
Example 2.6.11. The matrices
A =
[ 1  2 ]
[ 3  4 ]
and
B =
[ 0   2 ]
[ 1  −3 ]
are not similar, because tr(A) = 5 while tr(B) = −3. However, det(A) = det(B) = −2.
Recall that the rank of a linear transformation T was defined as the dimension of its image. We can also similarly define rank in terms of a matrix representation of T:
Chapter 3
Diagonalization
After developing the basic theory of linear transformations, we turn our attention to the
study of linear operators. As we saw, a change of basis of V from β to β′ corresponds to conjugation of the matrix [T]β by some invertible change of basis matrix Sββ′. A natural
question is if there is a “best” basis β to pick so that [T ]β will be as easy to understand
as possible. The best we could hope for in general is that [T ]β is a diagonal matrix, so the
question becomes if it is possible to find a basis β of V such that [T ]β is diagonal. Answering
this question will be the primary purpose of this handout.
Throughout this document, V will denote a vector space over an arbitrary field F and
T : V → V will denote a linear operator.
Our first order of business is to determine what the possible eigenvalues of a linear
operator are. When V is finite dimensional, this is quite easily done using the theory of
determinants. We remind the reader of the following definition:
Elementary properties of the determinant show that similar matrices have the same
determinant, so the above definition actually is independent of a choice of basis and so the
notation makes sense.
Proof. Pick a basis β of V . Suppose that λ ∈ F is an eigenvalue of T with eigenvector v.
Then (λ · IV − T )(v) = 0, i.e. the operator λ · IV − T is not invertible, and from the theory
of matrices, this says [λ · IV − T ]β is not invertible. Therefore det([λ · IV − T ]β ) = 0, so by
definition this says that det(λ · IV − T ) = 0. Conversely, if det(λ · IV − T ) = 0, then λ · IV − T
is not invertible, so there is some non-zero vector v in the kernel of λ · IV − T, i.e. a non-zero vector v such that T(v) = λv, so that λ is an eigenvalue of T.
The above says that checking if λ is an eigenvalue of T is equivalent to finding a root of
the polynomial det(x · IV − T ). We give this polynomial a name:
Definition 3.1.4. The polynomial pT (x) = det(x · IV − T ) is called the characteristic
polynomial of T .
Restated in the new definition, we have the following:
Theorem 3.1.5. Let V be an n dimensional vector space. λ ∈ F is an eigenvalue of
T : V → V if and only if λ is a root of the characteristic polynomial pT (x).
Notice that since the determinant of a linear operator does not depend on a choice of basis,
the characteristic polynomial is independent of such a choice as well, and so it is well defined.
Before moving on, we make a few remarks: notice that the definition of an eigenvalue
depends on which field we view V as a vector space over. If F is algebraically closed,
then every linear operator T : V → V has an eigenvalue, because then the characteristic
polynomial of T necessarily has a root in F . If F is not algebraically closed, it may be possible
for an operator to not have an eigenvalue. Additionally, the definition of an eigenvalue makes
sense for infinite dimensional vector spaces, even if our criterion for easily finding eigenvalues
only works for finite dimensional vector spaces. Some of the examples below will illustrate
this.
Example 3.1.6. Let T : R3 → R3 be given by T(x, y, z) = (y, −5x + 4y + z, −x + y + z). Then with β the standard basis, we see
[T]β =
[  0  1  0 ]
[ −5  4  1 ]
[ −1  1  1 ]
Then pT(x) = (x − 1)(x − 2)2, so T has eigenvalues 1 and 2. One can check T has eigenvectors v1 = (1, 1, 2) and v2 = (1, 2, 1) corresponding to the eigenvalues 1 and 2 respectively.
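These computations can be confirmed with sympy (a small sketch, not part of the notes): the characteristic polynomial factors as claimed, and the returned eigenvectors are scalar multiples of (1, 1, 2) and (1, 2, 1).

    import sympy as sp

    x = sp.symbols('x')
    A = sp.Matrix([[ 0, 1, 0],
                   [-5, 4, 1],
                   [-1, 1, 1]])

    print(sp.factor(A.charpoly(x).as_expr()))  # factors as (x - 1)*(x - 2)**2
    print(A.eigenvects())                      # eigenvalue 1: eigenvector proportional to (1, 1, 2);
                                               # eigenvalue 2: eigenvector proportional to (1, 2, 1)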
Example 3.1.7. Let T : R2 → R2 be a counterclockwise rotation by some angle θ ∈ (0, 2π) with θ ≠ π. Then T has no eigenvectors, because no vector v ∈ R2 is scaled along the same direction by T. Explicitly, with β = {e1, e2} the standard basis of R2, one can check that
[T]β =
[ cos(θ)  −sin(θ) ]
[ sin(θ)   cos(θ) ]
The characteristic polynomial of T is given by pT(x) = x2 − 2 cos(θ)x + 1, and since cos(θ) ≠ ±1 the quadratic formula says this has no real roots.
Example 3.1.8. Write V = W ⊕ W′ for some subspaces W, W′ of V. Let P = πW be the projection onto W, so that P2 = P. If (λ, v) is an eigenpair for P, then λ2v = P2v = Pv = λv, so that λ2 = λ says that λ = 0, 1 are the only possible eigenvalues of P. For any non-zero w ∈ W, P(w) = w, and for non-zero w′ ∈ W′, P(w′) = 0, so w and w′ are eigenvectors corresponding to 1 and 0 respectively.
Example 3.1.9. Let T : C ∞ (R) → C ∞ (R) be the derivative map, T (f ) = f 0 . An eigenvector
of T is a function f such that f 0 = λf for some λ ∈ R. From calculus, we know that such
functions are of the form ceλt for some c ∈ R. This says the exponential functions ceλt
are eigenvectors of the derivative operator with eigenvalues λ ∈ R. This is of fundamental
importance in the theory of linear differential equations.
Example 3.1.10. Let L : F ∞ → F ∞ be the left shift operator, i.e. L((a1 , a2 , . . .)) =
(a2 , a3 , . . .). An eigenvector of L is a sequence (a1 , a2 , . . .) such that (a2 , a3 , . . .) = λ(a1 , a2 , . . .)
for some λ ∈ F. This says a2 = λa1, a3 = λa2 = λ2a1, and by induction, that an = λn−1a1, i.e. every eigenvector of L is a geometric sequence. Conversely, let σ = {an} be a geometric sequence, that is, a sequence defined by an = crn−1 for some c, r ∈ F with c ≠ 0. Then L(σ) = rσ, so σ is an eigenvector of L with eigenvalue r. The eigenvectors of L are therefore exactly the non-zero geometric sequences.
33
3.3 Diagonalization
We now use the theory of eigenvalues to answer our main question: when does V have a basis consisting of eigenvectors of T, i.e. when is T diagonalizable?
In order to determine when such a basis exists, we will utilize the following key result: eigenvectors corresponding to distinct eigenvalues are linearly independent. More precisely, if v1, . . . , vn are eigenvectors of T with distinct eigenvalues λ1, . . . , λn, then {v1, . . . , vn} is linearly independent.
Proof. We prove this by induction. If n = 1 the statement follows immediately since {v1 }
is linearly independent. Assume that the statement is true for any collection of n − 1
eigenvectors that correspond to distinct eigenvalues. Suppose that c1 v1 + . . . + cn vn = 0,
so that applying T says c1 λ1 v1 + . . . + cn λn vn = 0. Multiply the first equation by λn and
subtract to see c1 (λ1 − λn )v1 + . . . + cn−1 (λn−1 − λn )vn−1 = 0. By induction hypothesis,
the vectors {v1 , . . . , vn−1 } are linearly independent, and since all eigenvalues are distinct this
forces ci = 0 for 1 ≤ i ≤ n − 1. This then immediately gives cn = 0, so that {v1 , . . . , vn } is
linearly independent. By induction, we are done.
Corollary 3.3.3. Let V be an n dimensional vector space and T a linear operator. If T has
n distinct eigenvalues, then T is diagonalizable. If pT (x) factors into distinct linear factors
in F [x], then T is diagonalizable.
Proof. If T has n distinct eigenvalues, then the associated eigenvectors are a set of n linearly
independent vectors in V, hence a basis of eigenvectors, so T is diagonalizable. Saying that pT(x) splits into distinct linear factors is the same as saying that T has n distinct eigenvalues.
Example 3.3.4. The converse to the above statement is not necessarily true. For example,
the identity operator IV is diagonalizable, but has characteristic polynomial (x − 1)n .
Proposition 33. Let V be an n dimensional vector space and T a linear operator. Then if T is diagonalizable, pT(x) factors into a product of (not necessarily distinct) linear factors in F[x].
34
Proof. Suppose that T is diagonalizable. Let β be a basis of V consisting of eigenvectors of T, with corresponding eigenvalues λ1, . . . , λn. Then [T]β is the diagonal matrix with diagonal entries λ1, . . . , λn, and det(xIn − [T]β) = (x − λ1) · · · (x − λn) is a product of linear factors in F[x].
Example 3.3.5. The converse of the above statement is not necessarily true. Let T : R2 → R2 be given by T(x, y) = (y, 0). Then it's easy to see pT(x) = x2, so the only eigenvalue of T is 0. However, T is not diagonalizable: every eigenvector of T lies in ker(T), and rank-nullity says dim(ker(T)) = 1, so there is no possible basis of eigenvectors for T.
The above examples show that the characteristic polynomial alone is not strong enough to detect whether an operator is diagonalizable: nothing can be concluded when the characteristic polynomial has repeated roots. This leads us to the following definitions, which will end up giving a test for diagonalizability. For an eigenvalue λ of T, the eigenspace of λ is Eλ = ker(λ · IV − T), the geometric multiplicity of λ is dim(Eλ), and the algebraic multiplicity of λ is the multiplicity of λ as a root of pT(x). The key fact relating them is that the geometric multiplicity never exceeds the algebraic multiplicity:
Proof. Suppose that dim(Eλ) = k. Pick a basis {v1, . . . , vk} of Eλ and extend to a basis β = {v1, . . . , vk, w1, . . . , wm} of V. Then [T]β is a block upper triangular matrix with diagonal blocks λ · Ik and B and upper right block A, where the block below the diagonal is the m × k zero matrix, and A, B have dimensions k × m and m × m respectively. Then x · In − [T]β is block upper triangular with diagonal blocks (x − λ) · Ik and x · Im − B, so pT(x) = det(x · In − [T]β) = det((x − λ) · Ik) det(x · Im − B) = (x − λ)k g(x) where g(x) = det(x · Im − B). This says (x − λ)k divides pT(x), so the algebraic multiplicity of λ is at least k as desired.
Theorem 3.3.7. Let V be an n dimensional vector space. Let T have distinct eigenvalues
λ1 , . . . , λk with algebraic multiplicities e1 , . . . , ek , so pT (x) = (x − λ1 )e1 · · · (x − λk )ek , and
e1 + . . . + ek = n. Then T is diagonalizable if and only if V = Eλ1 ⊕ . . . ⊕ Eλk .
Corollary 3.3.8. Let T be as above. Then T is diagonalizable if and only if for each
eigenvalue λi of T the algebraic multiplicity and geometric multiplicity of λi are equal.
35
Proof. If the algebraic and geometric multiplicity of λi are equal for all i, this says dim(Eλ1 ⊕
. . . ⊕ Eλk ) = e1 + . . . + ek = n, so Eλ1 ⊕ . . . ⊕ Eλk = V says T is diagonalizable. Conversely
if dim(Eλi) < ei for some i, then dim(Eλ1 ⊕ . . . ⊕ Eλk) < n, so Eλ1 ⊕ . . . ⊕ Eλk ≠ V says T
is not diagonalizable.
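The corollary translates directly into a computational test. Below is a minimal sketch of that test using sympy (the code and the helper name is_diagonalizable are illustrations, not part of the notes); note that sympy finds roots over C, so this checks diagonalizability over the complex numbers.

import sympy as sp

def is_diagonalizable(A: sp.Matrix) -> bool:
    n = A.shape[0]
    x = sp.symbols('x')
    p = (x * sp.eye(n) - A).det()                  # characteristic polynomial p_T(x)
    for lam, alg_mult in sp.roots(p, x).items():   # roots with algebraic multiplicities
        E_lam = (lam * sp.eye(n) - A).nullspace()  # basis of the eigenspace E_lam
        if len(E_lam) != alg_mult:                 # compare geometric and algebraic multiplicity
            return False
    return True

print(is_diagonalizable(sp.Matrix([[0, 1], [0, 0]])))  # False (Example 3.3.5: p_T(x) = x^2)
print(is_diagonalizable(sp.eye(3)))                    # True  (the identity, Example 3.3.4)

sympy's built-in Matrix.is_diagonalizable performs essentially the same comparison.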
36
Chapter 4
Inner Product Spaces
When we started our study of vector spaces, we had a goal in mind: find objects that
generalized the algebraic structure on Euclidean space Rn . However, if the ultimate goal of
linear algebra is to fully generalize Euclidean space, there’s something major that still hasn’t
been abstracted: the geometry of Rn . The definition of an abstract vector space V does not
include notions of length, distance, or angles, and therefore no concept of geometry. In order
for a vector space to truly “act” Euclidean, we need to add more structure.
An inner product space is a pair (V, h−, −i), i.e. a vector space V with a choice of inner
product on V . From the conjugate symmetry of the inner product, we deduce the following
basic properties:
Proposition 35. Let V be an inner product space. Then the following hold:
37
(d) hx, xi = 0 if and only if x = 0.
Proposition 36. Let V be an inner product space. Then the norm k·k satisfies the following
properties:
(c) (Cauchy-Schwarz) For all x, y ∈ V, |hx, yi| ≤ kxkkyk, and equality holds if and only if one of x, y is a scalar multiple of the other.
(d) (Triangle inequality) For all x, y ∈ V, kx + yk ≤ kxk + kyk, and equality holds if and only if one of x, y is a non-negative real multiple of the other.
Proof.
Example 4.1.3. If V has an inner product h−, −i then for any subspace W of V , h−, −i is
still an inner product on W .
38
Example 4.1.4. Set V = Rn and let · be the usual dot product, (a1, . . . , an) · (b1, . . . , bn) = a1b1 + . . . + anbn. This makes V a real inner product space. If instead V = Cn, we define the dot product to be (a1, . . . , an) · (b1, . . . , bn) = a1b̄1 + . . . + anb̄n, conjugating the second vector. This makes V a complex inner product space. As an example in C2, we have (1 + 2i, 3 − i) · (2, i) = 2(1 + 2i) − i(3 − i) = 1 + i.
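A quick numerical check of this computation (a numpy sketch, not part of the notes; note that numpy's own np.vdot conjugates its first argument, so the argument order must be swapped to match the convention above):

import numpy as np

a = np.array([1 + 2j, 3 - 1j])
b = np.array([2 + 0j, 1j])

print(np.sum(a * np.conj(b)))   # a1*conj(b1) + a2*conj(b2) = (1+1j)
print(np.vdot(b, a))            # same value: np.vdot conjugates its first argument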
Example 4.1.5. If V is a finite dimensional F -vector space, we can always give V the
structure of an inner product space as follows. Say that dimF (V ) = n, and fix an isomorphism
ϕ : V → F n . Define an inner product h−, −i on V by hv, wi = ϕ(v) · ϕ(w), where the dot
product on the right hand side happens in F n .
Example 4.1.6. Set V = C([a, b], C) and define hf, gi = ∫_a^b f(t)ḡ(t) dt. Then calculus says this makes V an inner product space. In C([−π, π], C) with f = 1 + 2x and g = cos(x), one can check that kf k = √(2π + 8π3/3), kgk = √π, and that hf, gi = 0.
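These values can be confirmed symbolically (a sympy sketch, not part of the notes; since f and g are real-valued the conjugate plays no role here):

import sympy as sp

x = sp.symbols('x')
f = 1 + 2 * x
g = sp.cos(x)

norm_f = sp.sqrt(sp.integrate(f * f, (x, -sp.pi, sp.pi)))   # sqrt(2*pi + 8*pi**3/3)
norm_g = sp.sqrt(sp.integrate(g * g, (x, -sp.pi, sp.pi)))   # sqrt(pi)
print(sp.simplify(norm_f), norm_g, sp.integrate(f * g, (x, -sp.pi, sp.pi)))   # the last value is 0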
Example 4.1.7. Let V = Mn(C) and define hA, Bi = tr(B∗A), where (A∗)ij = Āji is the conjugate transpose of A. Then by linearity of tr and the definition of B∗, one sees that this defines an inner product. For any A ∈ Mn(C), we see that the ij-th entry of A∗A is simply the i-th row of A∗ dotted with the j-th column of A. In particular, if vi is the i-th column of A, then (A∗A)ii = kvik2, so that kAk = √(kv1k2 + . . . + kvnk2) where v1, . . . , vn are the columns of A. In M2(C), let A be the matrix with rows (1, 2) and (3, 4), and B the matrix with rows (i, 0) and (1 + i, −i). Then we see kAk = √30, kBk = 2, and hA, Bi = 3.
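A numerical check of these values (a numpy sketch, not part of the notes):

import numpy as np

A = np.array([[1, 2], [3, 4]], dtype=complex)
B = np.array([[1j, 0], [1 + 1j, -1j]])

print(np.trace(B.conj().T @ A))                  # <A, B> = tr(B* A) = (3+0j)
print(np.sqrt(np.trace(A.conj().T @ A)).real)    # ||A|| = sqrt(30) ~ 5.477
print(np.sqrt(np.trace(B.conj().T @ B)).real)    # ||B|| = 2.0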
4.2 Orthogonality
In Rn, one of the most important properties of the dot product was that it was able to measure the angle between two vectors: this was detected by the quantity (x · y)/(kxkkyk). For a general inner product space V, it doesn't make sense to define general angles, since the expression hx, yi/(kxkkyk) may be a complex number. However, we may still make sense of orthogonality.
39
Definition 4.2.1. Vectors x, y ∈ V are called orthogonal if hx, yi = 0. A subset S of V is
called orthogonal if any two vectors in S are orthogonal, and S is called orthonormal if S
is orthogonal and kxk = 1 for all x ∈ S.
Since inner products have a notion of orthogonality, the Pythagorean theorem is still true: if x, y ∈ V are orthogonal, then kx + yk2 = kxk2 + kyk2.
Proof. kx+yk2 = hx+y, x+yi = hx, xi+hx, yi+hy, xi+hy, yi = hx, xi+hy, yi = kxk2 +kyk2
because hx, yi = hy, xi = 0 by assumption.
One of the reasons that we impose the extra structure of an inner product is that inner product spaces become very nice to work with: orthogonality makes linear independence easy to check, as well as finding the coordinates of a vector with respect to a basis.
Proposition 37. Let {v1 , . . . , vk } be an orthogonal subset of non-zero vectors. Then {v1 , . . . , vk }
is linearly independent.
Proof. Suppose that x = c1v1 + . . . + ckvk = 0. Taking an inner product with vi says 0 = hx, vii = cihvi, vii = cikvik2 by orthogonality. Since vi ≠ 0, this forces ci = 0 for each i, so {v1, . . . , vk} is linearly independent. Note that the same computation shows that if x = c1v1 + . . . + ckvk is any vector, then ci = hx, vii/kvik2.
In particular, the above says that if we have a basis β for V consisting of orthogonal
vectors, then finding the coordinates [x]β is reduced to an inner product computation. If V
is finite dimensional, is it always possible to find an orthonormal basis? The answer is yes,
and follows from a more general result.
40
As an immediate corollary to the Gram-Schmidt process, we get the following:
Corollary 4.2.4. If V is a finite dimensional inner product space, then V has an orthonor-
mal basis.
Proof. Apply the Gram-Schmidt process to a basis of V to get a basis of orthogonal vectors.
Then normalize.
Example 4.2.5. Set V = R3 and β = {(1, 1, 1), (0, 1, 1), (0, 0, 1)} = {w1, w2, w3}, which is a basis of R3. To construct an orthogonal basis, set v1 = (1, 1, 1). Then v2 = w2 − (hw2, v1i/kv1k2)v1 = w2 − (2/3)v1 = (−2/3, 1/3, 1/3), and v3 = w3 − (hw3, v1i/kv1k2)v1 − (hw3, v2i/kv2k2)v2 = w3 − (1/3)v1 − (1/2)v2 = (0, −1/2, 1/2). This produces an orthogonal basis, so normalizing each vector will give an orthonormal basis. We see kv1k = √3, kv2k = √(2/3), and kv3k = 1/√2. Then {(1/√3, 1/√3, 1/√3), (−√(2/3), 1/√6, 1/√6), (0, −1/√2, 1/√2)} is an orthonormal basis of R3.
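Here is a minimal Gram-Schmidt sketch in numpy, run on the basis from this example (the helper gram_schmidt is an illustration, not part of the notes):

import numpy as np

def gram_schmidt(vectors):
    # Returns an orthogonal list of vectors spanning the same subspace as `vectors`.
    basis = []
    for w in vectors:
        v = w - sum(np.dot(w, u) / np.dot(u, u) * u for u in basis)
        basis.append(v)
    return basis

w1, w2, w3 = np.array([1., 1., 1.]), np.array([0., 1., 1.]), np.array([0., 0., 1.])
orthogonal = gram_schmidt([w1, w2, w3])
for v in orthogonal:
    print(np.round(v, 4))                                             # (1,1,1), (-2/3,1/3,1/3), (0,-1/2,1/2)
print([np.round(v / np.linalg.norm(v), 4) for v in orthogonal])       # the orthonormal basis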
Example 4.2.6. Set V = P2(R), which may be viewed as a subspace of C([−1, 1]) with the inner product hf, gi = ∫_{-1}^{1} f(x)g(x) dx. Let β = {1, x, x2} = {w1, w2, w3} be the standard basis of V. To produce an orthonormal basis, we use Gram-Schmidt. The vectors 1 and x are already orthogonal, so we may take v1 = 1 and v2 = x. Then v3 = w3 − (hw3, v1i/kv1k2)v1 − (hw3, v2i/kv2k2)v2 = x2 − 1/3, so {1, x, x2 − 1/3} is an orthogonal basis. We compute k1k = √2, kxk = √(2/3), and kx2 − 1/3k = √(8/45). This produces an orthonormal basis {1/√2, √(3/2) x, √(5/8)(3x2 − 1)}. These are, up to normalization, the first three Legendre polynomials, which have applications in physics. Repeating this process with the basis β = {1, x, . . . , xn} of Pn(R) allows one to compute the n-th Legendre polynomial.
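The same computation can be carried out symbolically; the sketch below (sympy, not part of the notes) runs Gram-Schmidt on {1, x, x2, x3} with this inner product and recovers multiples of the first four Legendre polynomials:

import sympy as sp

x = sp.symbols('x')

def inner(f, g):
    return sp.integrate(f * g, (x, -1, 1))

orthogonal = []
for w in [sp.Integer(1), x, x**2, x**3]:
    v = w - sum(inner(w, u) / inner(u, u) * u for u in orthogonal)
    orthogonal.append(sp.expand(v))

print(orthogonal)   # [1, x, x**2 - 1/3, x**3 - 3*x/5]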
Definition 4.2.7. Let S ⊂ V be a subset. The orthogonal complement of S, denoted
S ⊥ is defined by S ⊥ = {v ∈ V : hv, xi = 0 for all x ∈ S}.
It’s an easy verification that S ⊥ is always a subspace of V . When S itself is a subspace, we
have the following decomposition:
Theorem 4.2.8. Let W ⊂ V be a finite dimensional subspace. Then V = W ⊕ W ⊥ .
Proof. Let {w1, . . . , wk} be an orthonormal basis of W. Given x ∈ V, we will try to find a vector w ∈ W such that x = w + (x − w) with x − w ∈ W⊥. Write w = c1w1 + . . . + ckwk. If x − w ∈ W⊥,
then necessarily, 0 = hx − w, wi i = hx − c1 w1 − . . . − ck wk , wi i = hx, wi i − ci kwi k2 . Since
kwi k = 1, this says ci = hx, wi i, so this choice of coefficients gives us the vector w that works.
This says V = W + W ⊥ . If w ∈ W ∩ W ⊥ , then hw, wi = 0 so that w = 0 says the sum is
direct.
An immediate consequence is the following dimension formula:
Corollary 4.2.9. If V is finite dimensional and W ⊂ V is a subspace, then dim(V) = dim(W) + dim(W⊥).
Using this, we easily get the following:
Proposition 39. Let V be finite dimensional and W ⊂ V a subspace. Then (W⊥)⊥ = W.
41
Proof. Set n = dim(V). If w ∈ W, then hw, ui = 0 for every u ∈ W⊥, so w ∈ (W⊥)⊥; thus W ⊂ (W⊥)⊥ and dim(W) ≤ dim((W⊥)⊥). This then says n = dim(W) + dim(W⊥) ≤ dim((W⊥)⊥) + dim(W⊥) = n, so that dim(W) = dim((W⊥)⊥) gives W = (W⊥)⊥.
Example 4.2.11. Let V = R4 and W = Span{v1, v2} where v1 = (1, 2, 3, −4) and v2 = (−5, 4, 3, 2). If x = (x, y, z, t) is in W⊥, we see that Ax = 0, where A is the 2 × 4 matrix with rows (1, 2, 3, −4) and (−5, 4, 3, 2). Using row reduction, one can easily compute W⊥ = ker(A) = Span{(−3, −9, 7, 0), (10, 9, 0, 7)}.
Given x ∈ V, write x = w + w′ with w ∈ W and w′ ∈ W⊥ as in the theorem above; the vector w is called the orthogonal projection of x onto W, denoted PW(x). The orthogonal projection PW(x) has the property that it is the vector in W that is closest to x: for any u ∈ W, kx − PW(x)k ≤ kx − uk.
Example 4.2.14. Let V = R3, and set v = (1, 2, 3). What's the minimal distance from v to a point on the plane W : x + 2y + z = 0? A basis of W can be easily computed as {(−2, 1, 0), (−1, 0, 1)} = {w1, w2}. Running Gram-Schmidt gives an orthogonal basis {v1, v2} = {(−2, 1, 0), (−1/5, −2/5, 1)}. The minimal distance to the plane is given by the quantity kv − PW(v)k. One can check that PW(v) = (5/3)v2 = (−1/3, −2/3, 5/3), so v − PW(v) = (4/3, 8/3, 4/3), which has length 4√6/3.
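A numerical check of this example (a numpy sketch, not part of the notes):

import numpy as np

v = np.array([1., 2., 3.])
v1 = np.array([-2., 1., 0.])            # orthogonal basis of the plane x + 2y + z = 0
v2 = np.array([-1/5, -2/5, 1.])

proj = (v @ v1) / (v1 @ v1) * v1 + (v @ v2) / (v2 @ v2) * v2   # P_W(v)
print(np.round(proj, 4))                                       # [-1/3, -2/3, 5/3]
print(np.round(v - proj, 4))                                   # [4/3, 8/3, 4/3]
print(np.linalg.norm(v - proj), 4 * np.sqrt(6) / 3)            # both ~3.266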
Example 4.2.15. Set W = P2(R) viewed as a subspace of V = C([−1, 1]) with the inner product hf, gi = ∫_{-1}^{1} f(x)g(x) dx. With f(x) = ex, which polynomial p(x) of degree at most 2 minimizes the quantity ∫_{-1}^{1} (ex − p(x))2 dx, and what is this value? Equivalently, what is the minimizer of kex − p(x)k? We saw before that an orthonormal basis of P2(R) with respect to this inner product is given by {1/√2, √(3/2) x, √(5/8)(3x2 − 1)} (the normalized Legendre polynomials), so the minimizer is just the orthogonal projection of ex onto W. This is given by p(x) = hex, 1/√2i(1/√2) + hex, √(3/2) xi √(3/2) x + hex, √(5/8)(3x2 − 1)i √(5/8)(3x2 − 1) = (15e/4 − 105/(4e))x2 + (3/e)x + (33/(4e) − 3e/4).
Numerically, the actual minimal value of the integral is ≈ .00144.
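The projection and the minimal value can be checked symbolically (a sympy sketch, not part of the notes); projecting onto the orthogonal basis {1, x, x2 − 1/3} should reproduce the same polynomial:

import sympy as sp

x = sp.symbols('x')

def inner(f, g):
    return sp.integrate(f * g, (x, -1, 1))

f = sp.exp(x)
p = sum(inner(f, u) / inner(u, u) * u for u in [sp.Integer(1), x, x**2 - sp.Rational(1, 3)])

print(sp.expand(p))                 # mathematically equal to (15e/4 - 105/(4e))x^2 + (3/e)x + (33/(4e) - 3e/4)
print(sp.N(inner(f - p, f - p)))    # ~ 0.00144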
42
If V is finite dimensional, then we have seen that V ≅ V∗. However, this isomorphism is not “natural” in the sense that it requires picking a basis of V. When V is an inner product space, though, there is a natural identification:
Theorem 4.3.2 (Riesz Representation Theorem). Let V be a finite dimensional inner product space. Then the map Φ : V → V∗ given by Φ(v) = ϕv, where ϕv(x) = hx, vi, is a bijection. It is linear if F = R, and conjugate-linear if F = C.
Proof. First, we check how Φ interacts with the vector space operations. For x, y ∈ V, we have Φ(x + y) = ϕx+y. For any z ∈ V, we have ϕx+y(z) = hz, x + yi = hz, xi + hz, yi = ϕx(z) + ϕy(z), so that ϕx+y = ϕx + ϕy. This then says that Φ(x + y) = Φ(x) + Φ(y). Similarly, for any c ∈ F we have ϕcx(z) = hz, cxi = c̄hz, xi, so Φ(cx) = c̄Φ(x): over R the map Φ is linear, while over C it is conjugate-linear. Now suppose that Φ(x) = 0. This says that ϕx(z) = 0 for all z ∈ V, i.e. hz, xi = 0 for all z ∈ V. Picking z = x, we get kxk2 = 0, so that x = 0. This says that Φ is injective, and since dim V = dim V∗ (also as real vector spaces), we conclude that Φ is a bijection.
The Riesz Representation Theorem says the structure of the dual space of a finite dimen-
sional inner product space is very rigid: for any linear functional ϕ ∈ V ∗ , the surjectivity
of the map Φ in the above proof says there is a vector v ∈ V such that ϕ = h−, vi. This is
very important in functional analysis (where it holds in a more general setting), but for our
purposes, we will only need it for the following:
Definition 4.3.3. Let V be a finite dimensional inner product space. The adjoint of a
linear operator T : V → V , denoted T ∗ is defined via the relation hT (x), yi = hx, T ∗ (y)i for
all x, y ∈ V .
Proposition 40. The adjoint T ∗ of a linear operator T exists and is unique, and furthermore
T ∗ ∈ HomF (V, V ).
Proof. Define ϕy (x) = hT (x), yi. Then ϕy (x + z) = hT (x + z), yi = hT (x) + T (z), yi =
hT (x), yi + hT (z), yi = ϕy (x) + ϕy (z). Similarly, ϕy (cx) = cϕy (x) for x, z ∈ V and c ∈ F ,
so ϕy (x) is a linear functional. By the Riesz Representation Theorem, ϕy (x) = hx, y 0 i for
some y 0 ∈ V . Define a map T ∗ : V → V by T ∗ (y) = y 0 . By definition T ∗ satisfies the
desired property. If there is another function S : V → V such that hT (x), yi = hx, S(y)i for
all x, y, this says hx, T ∗ (y)i = hx, S(y)i for all x, y so that T ∗ = S. Finally, it remains to
show linearity. We see hx, T ∗ (y + z)i = hT (x), y + zi = hT (x), yi + hT (x), zi = hx, T ∗ (y)i +
hx, T ∗ (z)i = hx, T ∗ (y) + T ∗ (z)i for all x, y, z ∈ V . This says T ∗ (y + z) = T ∗ (y) + T ∗ (z).
Similarly one can check T ∗ (cy) = cT ∗ (y), so that T ∗ ∈ HomF (V, V ).
Although it may not be clear from the above definition, the point of the adjoint is that it is an operation on linear operators analogous to taking the conjugate transpose of a matrix. The following properties make this more clear:
Proposition 41. Let S, T ∈ HomF (V, V ). The following hold:
(a) (S + T )∗ = S ∗ + T ∗
(b) (cT)∗ = c̄T∗
43
(c) (T ∗ )∗ = T
(d) I ∗ = I
(e) (ST )∗ = T ∗ S ∗
Proof. All the above properties can be proved using a similar approach to the one in the
proposition above by pulling the adjoint through the inner product. We omit the proofs.
Proposition 42. Let V be a finite dimensional inner product space, and let β be an or-
thonormal basis of V . Then [T ∗ ]β = [T ]∗β .
Proof. Let β = {v1, . . . , vn} be an orthonormal basis for V. Set [T]β = [aij]. Then T(vi) = a1iv1 + . . . + anivn, so aji = hT(vi), vji. This says ([T]∗β)ij = āji = hvj, T(vi)i = hT∗(vj), vii = ([T∗]β)ij, so that [T∗]β = [T]∗β.
Geometrically, the relationship between T ∗ and T is as follows:
Theorem 4.3.4. Let V be a finite dimensional inner product space, and let T : V → V be
a linear operator. Then ker(T ∗ ) = Im(T )⊥ and Im(T ∗ ) = ker(T )⊥ .
Proof. Let x ∈ ker(T ∗ ), so that T ∗ (x) = 0. Then for any y ∈ V , hy, T ∗ (x)i = 0. Pulling the
adjoint through the inner product says hT (y), xi = 0 for all y, so that ker(T ∗ ) ⊂ Im(T )⊥ .
Similarly, if x ∈ Im(T )⊥ this says hx, T (y)i = 0 for all y ∈ V so that hT ∗ (x), yi = 0 for all
y ∈ V. This says T∗(x) = 0, so that Im(T)⊥ ⊂ ker(T∗) says ker(T∗) = Im(T)⊥. Applying the first statement to T∗ in place of T (and using (T∗)∗ = T) gives ker(T) = Im(T∗)⊥, and taking orthogonal complements of both sides gives the second statement.
Example 4.3.5. Let T : C2 → C2 be given by T(z1, z2) = (z1 − 2iz2, 3z1 + iz2), where C2 is equipped with the usual dot product. Then the standard basis β = {e1, e2} is orthonormal. We see that [T]β is the matrix with rows (1, −2i) and (3, i), so that [T∗]β = [T]∗β is the matrix with rows (1, 3) and (2i, −i).
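One can check numerically that the conjugate transpose really does satisfy the defining relation of the adjoint (a numpy sketch, not part of the notes):

import numpy as np

T = np.array([[1, -2j], [3, 1j]])   # [T]_beta
T_star = T.conj().T                 # [T*]_beta = conjugate transpose

def inner(x, y):                    # <x, y> = x1*conj(y1) + x2*conj(y2)
    return np.sum(x * np.conj(y))

rng = np.random.default_rng(0)
x = rng.standard_normal(2) + 1j * rng.standard_normal(2)
y = rng.standard_normal(2) + 1j * rng.standard_normal(2)
print(np.isclose(inner(T @ x, y), inner(x, T_star @ y)))   # True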
Example 4.3.6. Let T : M2 (R) → M2 (R) be the transpose map, T (A) = At . Equip
M2 (R) with the inner product hA, Bi = tr(B t A). With respect to this inner product, the
standard basis β = {E11, E12, E21, E22} is orthonormal. Then [T∗]β = [T]∗β = [T]tβ. We see that [T]β is the 4 × 4 matrix with rows (1, 0, 0, 0), (0, 0, 1, 0), (0, 1, 0, 0), (0, 0, 0, 1). This matrix is symmetric, so T∗ = T.
Example 4.3.7. Let V ⊂ C∞(R) be the vector space of infinitely differentiable functions that are 1-periodic, i.e. f(x + 1) = f(x) for all x ∈ R. Give V an inner product structure by hf, gi = ∫_0^1 f(t)g(t) dt. Let D : V → V be the derivative map. To compute the adjoint of D, we use the definition. For f, g ∈ V, hD(f), gi = ∫_0^1 f′(t)g(t) dt. Integrating by parts and using the 1-periodicity of f and g, the latter integral equals −∫_0^1 f(t)g′(t) dt = hf, −D(g)i. This says D∗ = −D.
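A quick symbolic check of the integration by parts step for two specific 1-periodic functions (a sympy sketch, not part of the notes):

import sympy as sp

t = sp.symbols('t')
f = sp.sin(2 * sp.pi * t)
g = sp.cos(2 * sp.pi * t)

lhs = sp.integrate(sp.diff(f, t) * g, (t, 0, 1))      # <D(f), g>
rhs = sp.integrate(f * (-sp.diff(g, t)), (t, 0, 1))   # <f, -D(g)>
print(lhs, rhs, sp.simplify(lhs - rhs) == 0)          # pi, pi, True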
44
4.4 The Spectral Theorem
We will now return to diagonalizability. We previously saw what conditions are necessary
for a linear operator on V to be diagonalizable, i.e. for V to have a basis of eigenvectors for
T . If V is an inner product space, a natural question is when can we find an orthonormal
basis of eigenvectors? The Spectral theorem gives a precise answer.
Definition 4.4.1. A linear operator T : V → V is called normal if T T ∗ = T ∗ T . T is called
self-adjoint if T = T ∗ .
Proposition 43. Suppose that T : V → V is normal. Then if v is an eigenvector of T with eigenvalue λ, then v is an eigenvector of T∗ with eigenvalue λ̄.
Proof. It's easy to check that since T is normal, then so is T − cIV for any c ∈ F. Since T(v) = λv, this says 0 = k(T − λIV)(v)k2 = h(T − λIV)(v), (T − λIV)(v)i = hv, (T∗ − λ̄IV)(T − λIV)(v)i = hv, (T − λIV)(T∗ − λ̄IV)(v)i = h(T∗ − λ̄IV)(v), (T∗ − λ̄IV)(v)i = k(T∗ − λ̄IV)(v)k2. This says T∗(v) = λ̄v as desired.
Proposition 44. Suppose that T : V → V is normal. Then if λ1 , λ2 are distinct eigenvalues
of T with eigenvectors v1 and v2 respectively, then v1 and v2 are orthogonal.
Proof. Suppose T(v1) = λ1v1 and T(v2) = λ2v2. Then hT(v1), v2i = λ1hv1, v2i. On the other hand, hT(v1), v2i = hv1, T∗(v2)i = hv1, λ̄2v2i = λ2hv1, v2i by the above proposition. Since λ1 ≠ λ2, this forces hv1, v2i = 0.
Theorem 4.4.2 (Complex Spectral Theorem). Let V be a finite dimensional complex inner
product space. Then a linear operator T : V → V is normal if and only if there is an
orthonormal basis for V consisting of eigenvectors of T .
Proof. First suppose that T is normal. We prove that T is orthogonally diagonalizable by
induction on the dimension of V . If dim(V ) = 1 then this is obvious, because any non-zero
vector is an eigenvector, so just normalize. Now suppose that any normal operator on an
n−1 dimensional complex inner product space is orthogonally diagonalizable. If dim(V ) = n
and T : V → V is a normal operator, because C is algebraically closed T has an eigenvector,
say v. Set U = Span({v}) and write V = U ⊕ U ⊥ . Note that because T is normal, both
T, T ∗ are U -invariant. If x ∈ U ⊥ , then for y ∈ U , we have hy, T (x)i = hT ∗ (y), xi = 0 because
T ∗ (y) ∈ U . This says T (x) ∈ U ⊥ so that T is U ⊥ -invariant. Similarly, T ∗ is U ⊥ -invariant.
Then we may write T (x) = T |U (u) + T |U ⊥ (u0 ) for x = u + u0 with u ∈ U and u0 ∈ U ⊥ . We
now show that T|U⊥ is a normal operator on U⊥: for x, y ∈ U⊥ we have hT|U⊥(x), yi = hT(x), yi = hx, T∗(y)i = hx, T∗|U⊥(y)i, so the adjoint of T|U⊥ is T∗|U⊥, and these commute because TT∗ = T∗T. By the induction hypothesis applied to U⊥, which has dimension n − 1, there is an orthonormal basis β′ of U⊥ consisting of eigenvectors of T|U⊥ (hence of T), and since U ⊥ U⊥, adding the unit vector v/kvk to β′ produces an orthonormal basis of V consisting of eigenvectors of T.
45
Conversely, suppose that T is orthogonally diagonalizable. Let β = {v1, . . . , vn} be an orthonormal basis of eigenvectors with eigenvalues λi. Since β is orthonormal, Proposition 42 says [T∗]β = [T]∗β, which is diagonal with entries λ̄1, . . . , λ̄n, so T∗(vi) = λ̄ivi. Then (T∗T)(vi) = T∗(λivi) = λiλ̄ivi = |λi|2vi. On the other hand, (TT∗)(vi) = T(λ̄ivi) = |λi|2vi. Then T∗T and TT∗ agree on a basis of V, so they are equal, which shows T is normal as desired.
We now move onto the Spectral Theorem for operators on real inner product spaces.
In the complex case, we were able to make the argument work because the fundamental
theorem of algebra says every linear operator over a complex vector space has an eigenvalue,
which led to a decomposition V = U ⊕ U⊥. The key part of the proof was that the normality of T guaranteed that it restricted to normal operators on U and U⊥, allowing the induction to kick in. If
V is a real inner product space, this no longer remains true, as we have seen that a rotation
by some angle in R2 has no real eigenvalue. If we can find a class of normal operators that
are guaranteed to have a real eigenvalue, then the same argument as above goes through.
As it turns out, the key to this is self-adjointness: every eigenvalue of a self-adjoint operator is real.
Proof. Write T(v) = λv with v non-zero. Then hT(v), vi = λhv, vi = λkvk2. On the other hand, because T is self-adjoint we can write hT(v), vi = hv, T(v)i = hv, λvi = λ̄kvk2. Since v is non-zero, this says λ = λ̄, so that λ is real.
Theorem 4.4.3. (Real Spectral Theorem) Let V be a finite dimensional real inner product
space. Then a linear operator T : V → V is self-adjoint if and only if there is an orthonormal
basis for V consisting of eigenvectors of T .
Proof. Viewing [T]β (for an orthonormal basis β) as a complex matrix, the characteristic polynomial pT of T has a complex root by the fundamental theorem of algebra. Since T is self-adjoint, the above says this root is real, so that T has an eigenvector. Since a self-adjoint operator is normal, we can run the same argument as in the complex case and the proof still goes through, so that T is orthogonally diagonalizable. Conversely, if β is an orthonormal basis of eigenvectors of T, then [T]β is a real diagonal matrix, so [T∗]β = [T]tβ = [T]β says T∗ = T.
46
The proof of the Spectral Theorem tells us how to orthogonally diagonalize an operator when it is possible. If V = U ⊕ U⊥, running Gram-Schmidt on bases of U and U⊥ gives orthogonal bases of these spaces, and then the union is an orthogonal basis of V, so after normalizing, an orthonormal basis. Suppose T is normal with distinct eigenvalues λ1, . . . , λk. In the proof of the Spectral Theorem, we may instead run the argument with U = Eλ1 (the invariance condition is still true). Then since Eλi ⊥ Eλ1 for i ≠ 1, this says Eλ2 ⊕ . . . ⊕ Eλk ⊂ U⊥, so that Eλ2 ⊕ . . . ⊕ Eλk = U⊥ for dimensional reasons. By inductively applying the above observation, this says that running Gram-Schmidt on each eigenspace Eλi and taking the union of these orthogonal bases gives an orthogonal basis for V consisting of eigenvectors of T, and then normalizing gives an orthonormal basis.
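For a real symmetric matrix, numpy's eigh routine returns exactly this kind of orthonormal eigenbasis; the sketch below (not part of the notes) verifies this on a small example:

import numpy as np

A = np.array([[2., 1., 1.],
              [1., 2., 1.],
              [1., 1., 2.]])            # symmetric, so self-adjoint with respect to the dot product

eigenvalues, Q = np.linalg.eigh(A)      # columns of Q are orthonormal eigenvectors
print(np.round(eigenvalues, 6))                         # [1. 1. 4.]
print(np.allclose(Q.T @ Q, np.eye(3)))                  # True: the eigenbasis is orthonormal
print(np.allclose(Q.T @ A @ Q, np.diag(eigenvalues)))   # True: the matrix of T in this basis is diagonal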
Example 4.4.7. Let T : R3 → R3 be given by T(x, y, z) = (−2z, −x + 2y − z, x + 3z), so that [T]β is the matrix with rows (0, 0, −2), (−1, 2, −1), (1, 0, 3), with β the standard basis. We see that T is diagonalizable with eigenvalues 1, 2, and bases of the eigenspaces E1 and E2 are given by {(2, 1, −1)} and {(0, 1, 0), (−1, −1, 1)} respectively. However, T is not self-adjoint because [T]β is not symmetric, so the Spectral Theorem says that T is not orthogonally diagonalizable. What goes wrong? An orthogonal basis of E2 is given by {(0, 1, 0), (1, 0, −1)}. However, (2, 1, −1) · (0, 1, 0) = 1 ≠ 0. Since any eigenvector v ∈ E2 is of the form (c2, c1, −c2) for c1, c2 ∈ R, we see that (2, 1, −1) · (c2, c1, −c2) = c1 + 3c2 is 0 only when c1 = −3c2, i.e. the eigenvector is a multiple of (1, −3, −1). Therefore it's impossible to find two linearly independent eigenvectors orthogonal to (2, 1, −1), so that T cannot be orthogonally diagonalizable. Explicitly, with U = E1, we see that [T∗]β = [T]tβ and T∗(2, 1, −1) = (−2, 2, −8) ∉ U, so T∗ is not U-invariant and the argument in the proof of the Spectral Theorem cannot continue. Since all the eigenvalues of T are real, we see that even viewed as an operator on C3, the only eigenvectors in E2 that are orthogonal to (2, 1, −1) lie in the C-span of (1, −3, −1), so again it is not possible to find two linearly independent eigenvectors orthogonal to (2, 1, −1). This then says that T is not normal when viewed as an operator on C3, and therefore not as an operator on R3, because the matrix of T∗ is the same in either case.
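A numerical check that this T is indeed not normal (a numpy sketch, not part of the notes):

import numpy as np

A = np.array([[0., 0., -2.],
              [-1., 2., -1.],
              [1., 0., 3.]])            # [T]_beta with beta the standard basis

print(np.allclose(A @ A.T, A.T @ A))    # False: T T* != T* T, so T is not normal

eigenvalues, V = np.linalg.eig(A)       # T is still diagonalizable, just not orthogonally
print(np.round(eigenvalues, 6))         # the eigenvalues are 1, 2, 2 (in some order)
print(np.linalg.matrix_rank(V))         # 3: the eigenvectors do form a basis of R^3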
47