Abstract Linear Algebra
Tim Smits
Contents

1 Vector Spaces
  1.1 Basic Definitions
  1.2 Operations on Vector Spaces
  1.3 Linear Independence
  1.4 Bases and Dimension
  1.5 Finite Dimensional Vector Spaces
  1.6 Zorn's Lemma

2 Linear Transformations
  2.1 Basic Definitions
  2.2 Dimension Counting
  2.3 The Vector Space HomF(V, W)
  2.4 The Matrix of a Linear Transformation
  2.5 Invertibility
  2.6 Change of Basis and Similarity

3 Diagonalization
  3.1 Basic Definitions
  3.2 Properties of the Characteristic Polynomial
  3.3 Diagonalization
Introduction
These notes arose from a math 115A course I was a teaching assistant for at UCLA in
fall 2018. They contain, in my opinion, what all students should learn in a first course on
abstract vector spaces. At UCLA, math 115A is taken by students from various disciplines.
Many students have a hard time adjusting to the abstract nature of the course. These notes
try to provide many abstract examples to make the transition easier, as well as some non-standard examples of how the language of linear algebra can be applied to solve (hopefully) interesting problems. There may be many typos. Let me know if any are
found!
Chapter 1
Vector Spaces
Linear algebra arose out of trying to solve systems of linear equations. In a first course, one learns that the proper way to think about solutions to a system of n linear equations in n variables is to view them as solutions to a certain matrix equation of the form Ax = b in Rn, and then develops the necessary theory of matrices and the tools needed to solve these equations.
The goal of abstract linear algebra is to capture the special properties of Rn and of matrices
that made the theory useful in the first place, and expand it to work in larger settings. This
will lead to the abstract definitions of vector spaces and linear transformations, which will
be the main objects of study for us.
1. For all x, y ∈ V , x + y = y + x.
5. For all x ∈ V , 1 · x = x.
The elements of V are called vectors, and it is understood that we write cx to mean c · x.
From the axioms above, one can deduce the usual algebraic rules are true in vector spaces:
Proposition 1. Let V be a vector space. For x, y, z ∈ V , and a ∈ F , the following hold:
1. If x + y = x + z, then y = z.
2. The vectors 0 and −x are unique.
3. 0 · x = a · 0 = 0.
4. (−a)x = −(ax).
The proofs of the above all follow quickly from the vector space and field axioms, and are
left as an exercise.
Definition 1.1.2. For a vector space V and W ⊂ V, we call W a subspace of V if W is a vector space under the same operations as in V.
1.2 Operations on Vector Spaces
A natural question to ask is what operations can we do on vector spaces to create new vector
spaces? Below are some examples:
Proposition 5. Let V and W be vector spaces. Then V × W is a vector space with the
operations of componentwise addition and scalar multiplication.
Proof. Exercise.
The vector space V × W is sometimes called the external direct sum of V and W and
is commonly denoted V ⊕ W . However to avoid confusion with the definition below, we’ll
keep the notation V × W . There is a way to “add” vector spaces, but only if they are both
subspaces of some common vector space, so that addition of vectors makes sense.
Definition 1.2.1. Let W1 and W2 be subspaces of a vector space V. The sum of W1 and W2, denoted W1 + W2, is defined as W1 + W2 = {w1 + w2 : w1 ∈ W1, w2 ∈ W2}. Further, if W1 ∩ W2 = {0}, then we call W1 + W2 the internal direct sum of W1 and W2 and denote this W1 ⊕ W2.
The difference between external and internal direct sums is that in the latter case, both
spaces live internally inside a larger vector space to begin with. In an external direct sum, we
create a larger vector space in which copies of V and W can be identified, namely we identify
V with the subspaces {(v, 0) : v ∈ V } = V × {0} and W with {(0, w) : w ∈ W } = {0} × W .
Whether a sum is internal or external is to be understood from context; both will just be referred to as a direct sum.
Proof. It’s clear that 0 ∈ W1 + W2 since 0 ∈ W1 and 0 ∈ W2 . If x, y ∈ W1 + W2 , then
x = w1 + w2 and y = w1′ + w2′ for w1, w1′ ∈ W1 and w2, w2′ ∈ W2. Therefore x + y = (w1 + w1′) + (w2 + w2′), and w1 + w1′ ∈ W1 because W1 is a subspace of V, and w2 + w2′ ∈ W2 by the same reasoning. The proof that W1 + W2 is closed under scalar multiplication is
similar.
The difference between being a sum of subspaces and a direct sum of subspaces is the
following:
Proposition 7. Suppose V = W1 +W2 for some subspaces W1 , W2 . Then V = W1 ⊕W2 ⇐⇒
every vector x in V can be written uniquely as x = w1 + w2 for w1 ∈ W1 and w2 ∈ W2 .
Proof. If V = W1 ⊕ W2, and x has two different representations as a sum of the above form, write x = w1 + w2 and x = w1′ + w2′ for some w1, w1′ ∈ W1 and w2, w2′ ∈ W2. Then w1 − w1′ = w2′ − w2, and the left hand side lives in W1 while the right hand side lives in W2. This says w1 − w1′ ∈ W1 ∩ W2 = {0}, so w1 = w1′. Similarly w2 = w2′, so the representation is unique. Conversely, suppose that any vector x ∈ V can be written uniquely as x = w1 + w2 for some w1 ∈ W1 and w2 ∈ W2. Then clearly, V = W1 + W2. If x ∈ W1 ∩ W2, we can write x = x + 0 by taking w1 = x and w2 = 0. Similarly, we can write x = 0 + x by taking w1 = 0 and w2 = x. By uniqueness, this says x = 0, so that W1 ∩ W2 = {0} says V = W1 ⊕ W2.
Example 1.2.2. In R2 , set X = {(x, y) : y = 0} and Y = {(x, y) : x = 0}. Then R2 = X⊕Y .
Note these subspaces are simply the x and y axes. In R3 , set V = {(x, y, z) : z = 0}
and W = {(x, y, z) : x = 0}. Then R3 = V + W , but the sum is not direct because
V ∩ W = {(x, y, z) : x = z = 0}.
Example 1.2.3. Let F be a field not of characteristic 2, and let Symn (F ), Skewn (F ) ⊂
Mn (F ) be the subspaces of symmetric and skew-symmetric matrices respectively. Then
Mn(F) = Symn(F) ⊕ Skewn(F). Any matrix A ∈ Mn(F) can be written A = (1/2)(A + At) + (1/2)(A − At), so Mn(F) = Symn(F) + Skewn(F), and if A ∈ Symn(F) ∩ Skewn(F), we have A = At and A = −At, so that 2At = 0 says At = 0, hence A = 0 and the sum is direct.
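As a quick sanity check of this decomposition (an illustrative sketch, not part of the original notes; the matrix size and entries below are arbitrary choices), one can verify A = (1/2)(A + At) + (1/2)(A − At) numerically with Python:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-5, 5, size=(3, 3)).astype(float)  # an arbitrary 3x3 real matrix

    S = (A + A.T) / 2  # symmetric part of A
    K = (A - A.T) / 2  # skew-symmetric part of A

    print(np.allclose(S, S.T))    # True: S is symmetric
    print(np.allclose(K, -K.T))   # True: K is skew-symmetric
    print(np.allclose(S + K, A))  # True: the two parts sum back to A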
There’s one more common operation on subspaces that we’ll study, although it is quite
a bit more abstract.
Definition 1.2.4. Let V be a vector space, and W ⊂ V be a subspace. For a vector v ∈ V ,
we define the coset of v, denoted v + W , to be v + W = {v + w : w ∈ W }, the set of
translates of v by elements of W .
Example 1.2.5. Let V = R2 , W = {(x, 0) : x ∈ R}, and v = (0, 1). What set is v + W ?
Elements of the coset v + W look like (0, 1) + w for different choices of vectors w ∈ W . Since
an arbitrary w ∈ W looks like (a, 0) for some a ∈ R, such elements look like (a, 1) for some
a ∈ R. For any choice of a the vector (a, 0) is in W , so we see that v + W = {(a, 1) : a ∈ R}.
The point of cosets is that they give us a way of partitioning the vector space V: as an equality of sets, we have V = ∪_{v ∈ V} (v + W). We'll use these cosets to construct a new vector
space. Let V /W = {v + W : v ∈ V }. We can define addition and scalar multiplication
operations on V /W as follows:
Proposition 8. V/W is a vector space, where the operations are given by (v + W) + (v′ + W) = (v + v′) + W and c · (v + W) = c · v + W.
Proof. Exercise.
Definition 1.2.6. The set V /W with the operations of addition and scalar multiplication
as given above is known as the quotient space of V by W .
The idea behind the quotient space is that it “crushes” the subspace W to the 0 vector.
This can be seen from the following:
Proposition 9. Two cosets v + W and v′ + W are equal in V/W if and only if v − v′ ∈ W. In particular, v + W = 0 + W in V/W if and only if v ∈ W.
Proof. Exercise.
Example 1.2.7. Consider V = R2 and W = {(x, 0) : x ∈ R}, the x-axis. For any vector
v = (a, b), we have that v + W = {(a + x, b) : x ∈ R} is the horizontal line through the
vector v. The quotient space V /W “crushes” each of these horizontal lines to a single point,
namely the intersection of this line with the y-axis: in the quotient space, we have the
equality (a, b) + W = (0, b) + W because (a, b) − (0, b) = (a, 0) ∈ W . We see that points in
V /W can be “identified” with points on the y-axis, so that one can “picture” V /W as the
y-axis.
Span(S) is sometimes referred to as the subspace generated by S. If V = Span(S), then
we call S a generating set for V . Observe that any subspace of V containing S must
contain Span(S), and therefore Span(S) is the smallest subspace of V containing S.
Example 1.3.3. In Sym2(F), any symmetric matrix is of the form
[ a  b ]
[ b  c ]
for some a, b, c ∈ F. We see that Sym2(F) is spanned by the three matrices
[ 1  0 ]   [ 0  1 ]   [ 0  0 ]
[ 0  0 ],  [ 1  0 ],  [ 0  1 ].
Example 1.3.4. Let V = R3 . Then V = Span{(1, 0, 0), (0, 1, 0), (0, 0, 1)}. We may also
write V = Span{(1, 1, 0), (0, 1, 1), (1, 0, 1)}, or V = Span{(1, 1, 1), (1, 0, 0), (0, 1, 0), (0, 1, 1)}.
A spanning set need not be unique, nor must any spanning set have the same cardinality.
The above example shows that a spanning set may contain “redundant” information. In
the third spanning set above, notice that (1, 1, 1) is already contained in Span{(1, 0, 0), (0, 1, 0), (0, 1, 1)},
so removing it from S does not change Span(S). We give this condition a name:
Example 1.3.6. In the above example, the set {(1, 0, 0), (0, 1, 0), (0, 0, 1)} is linearly inde-
pendent. The set {(1, 1, 1), (1, 0, 0), (0, 1, 0), (0, 1, 1)} is linearly dependent.
Example 1.3.7. In C∞(R), the vectors sin(x) and cos(x) are linearly independent: if c1 sin(x) + c2 cos(x) = 0 for all x, plugging in x = 0 and x = π/2 shows that c1 = c2 = 0. Similarly if r ≠ s, the functions erx and esx are linearly independent.
Since linear dependence is defined in terms of a finite quantity, an easy definition of linear
independence that handles the case of S being infinite is as follows:
Proposition 11. Let V be a vector space and S ⊂ V . Then S is linearly independent if and
only if all finite subsets of S are linearly independent.
Proposition 12. Let V be a vector space and S = {v1 , . . . , vn } for some vi ∈ V . Then
S is linearly dependent if and only if v1 = 0 or there exists 1 ≤ k < n such that vk+1 ∈
Span({v1 , . . . , vk }).
Proof. The backwards direction is immediate, so suppose that S is linearly dependent. Then c1v1 + . . . + cnvn = 0 for some ci not all 0. Set k = max{i : ci ≠ 0}, which exists since some coefficient is non-zero and there are finitely many. Notice that this says ci = 0 for all k < i ≤ n. If k = 1, this says c1 is the only non-zero coefficient, so c1v1 = 0 gives v1 = 0. Otherwise, k > 1, so c1v1 + . . . + cnvn = c1v1 + . . . + ckvk = 0. Since ck ≠ 0, this says vk ∈ Span({v1, . . . , vk−1}), so we are done.
This gives a method of checking if a set is linearly independent that works well for sets
of small size. For example, to check if {v1, v2, v3} is linearly independent one just needs to check that v2 ∉ Span({v1}) and v3 ∉ Span({v1, v2}). For sets of larger size, we will later
develop more efficient methods. We end the section with an extremely useful proposition.
Proposition 13. Let S ⊂ V be linearly independent and v ∈ V . Then S ∪ {v} is linearly
dependent if and only if v ∈ Span(S).
Proof. If S ∪ {v} is linearly dependent, then there are s1, . . . , sn ∈ S and c1, . . . , cn+1 ∈ F not all 0 such that c1s1 + . . . + cnsn + cn+1v = 0. Necessarily cn+1 ≠ 0, otherwise the linear independence of S forces all ci = 0. Then solving for v gives v = −(1/cn+1)(c1s1 + . . . + cnsn), so v ∈ Span(S). Conversely, if v ∈ Span(S) then v = c1s1 + . . . + cnsn for some si ∈ S and ci ∈ F. Then c1s1 + . . . + cnsn − v = 0 is a non-trivial linear dependence relation among elements of S ∪ {v}, so S ∪ {v} is linearly dependent.
Theorem 1.3.8. Let S ⊂ V be linearly independent. If v ≠ 0 is in Span(S), then v = c1v1 + . . . + cnvn for unique distinct vectors vi ∈ S and unique scalars ci ≠ 0 in F.
Proof. Suppose that v has two different representations using vectors in S. Write v =
c1s1 + . . . + cnsn and v = d1t1 + . . . + dmtm for some ci, dj ≠ 0 in F and si, tj ∈ S, where we may assume none of the si are the same and none of the tj are the same. Subtracting shows c1s1 + . . . + cnsn − d1t1 − . . . − dmtm = 0. If {s1, . . . , sn} ≠ {t1, . . . , tm}, then there is some i
such that si is not equal to any of tj . Since S is linearly independent, this forces ci = 0, since
there is no other term in the sum that can be grouped with ci si . This is a contradiction, so
n = m and {s1 , . . . , sn } = {t1 , . . . , tm }. Relabeling as necessary, we may assume that si = ti
so that the above can be written as (c1 − d1 )s1 + . . . + (cn − dn )sn = 0, so ci = di for all i
and therefore such a representation is unique.
Definition 1.4.1. A basis of a vector space V is a linearly independent spanning set. The
dimension of V is the cardinality of a basis of V .
Perhaps in more familiar terms, the above says that every vector space has a basis. The
fact that the dimension of a vector space is actually well defined is a fairly non-trivial result,
but the proof is a rather technical set theoretic argument that is unenlightening, so for our
purposes it will be taken for granted.
Proposition 14. Let B and B 0 be two bases of a vector space V . Then |B| = |B 0 |.
Dimension is one of the most useful ideas in linear algebra: it gives us a notion of size for
a vector space, and being able to translate questions about vector spaces into statements
about integers makes them easier to understand. At this stage, linear algebra branches off
in two directions: the study of infinite dimensional vector spaces, and the study of finite
dimensional vector spaces, the latter of which we will focus the majority of our attention on.
The above proof that every vector space has a basis is non-constructive – it tells us one must
exist but gives us no way of finding one. In the finite dimensional case, we actually have a
constructive method for finding bases of a vector space.
Proof. We may assume that the vi are non-zero, otherwise remove them. Let m be the largest integer such that there is an m element subset B = {s1, . . . , sm} of S that is linearly independent. As {vi} is linearly independent for any i, and S has at most k elements, B must exist and 1 ≤ m ≤ k. Then Span(S) = Span(B). To see this, we show that vi ∈ Span(B) for all i. If vi ∉ B, then B ∪ {vi} is a linearly dependent subset by the maximality of m, so there are c1, . . . , cm+1 ∈ F not all 0 such that c1s1 + . . . + cmsm + cm+1vi = 0. By linear independence of the elements of B, necessarily cm+1 ≠ 0, so we can solve for vi in terms of s1, . . . , sm, giving vi ∈ Span(B) as desired.
Theorem 1.5.2. Let S = {v1 , . . . vk } be a linearly independent subset of V . Then there exist
vectors w1 , . . . , wm ∈ V such that {v1 , . . . , vk , w1 , . . . , wm } is a basis of V .
Corollary 1.5.3. Let W ⊂ V be a subspace. Then there exists a subspace W′ ⊂ V such that V = W ⊕ W′.
In linear algebra, it’s not uncommon to be interested in finding a basis with some particular
choices of basis vectors, so the extension result is quite useful. The following is a translation
of the above two results using the language of dimension.
2. If S spans V , then k ≥ n
Proof. Items 1 and 2 are immediate corollaries of the above two results. To prove 3, if S is linearly independent and S doesn't span V, then there is v ∈ V such that S ∪ {v} is linearly independent. But then this says n + 1 ≤ n, a contradiction. Therefore S spans V. Conversely, if S is not linearly independent, we may trim S to a basis B of V with |B| < |S| = n; since every basis of V has n elements, this says n < n, a contradiction.
Example 1.5.5. In F n , the vectors ei where ei is the vector that is 1 in the i-th coordinate
and 0 elsewhere form a basis. It's easy to see that if c1e1 + . . . + cnen = 0, then (c1, . . . , cn) =
(0, . . . , 0) so ci = 0, and it’s obvious this is a spanning set. This is an n-dimensional F -vector
space.
Example 1.5.6. In Mn (F ), the matrices Eij where Eij is the matrix with (i, j)-th entry
equal to 1 and 0 elsewhere form a basis – the argument is the same as above. This is an
n2 -dimensional F -vector space.
Example 1.5.7. In Pn (F ), the set {1, x, . . . , xn } is a basis. It’s clear that this is a spanning
set, so it remains to see linear independence. If c0 + c1x + . . . + cnxn = 0 in Pn(F), then in particular, this holds true for all x ∈ F. The left hand side is a polynomial of degree at most n, so if it were not the zero polynomial it would have at most n roots, while the right hand side vanishes everywhere. This is only possible if all coefficients are 0. This is an (n + 1)-dimensional F-vector space.
Example 1.5.8. The space of all polynomials with coefficients in F , P (F ) is infinite dimen-
sional: any finite set of polynomials has a maximal degree m, so their F -span is contained
in Pm (F ). This says no finite subset of P (F ) is a spanning set, so it is infinite dimensional
as an F -vector space.
Example 1.5.9. In R3, the set {(1, 0, 1), (1, 1, 0), (0, 1, 1)} is a basis, because it is a linearly independent set of three vectors in the 3-dimensional space R3: one can check by hand that (0, 1, 1) ∉ Span({(1, 0, 1), (1, 1, 0)}).
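This kind of check can also be done by machine; the sketch below (not part of the original notes) puts the three vectors into the rows of a matrix and asks numpy for its rank and determinant, which detect linear independence.

    import numpy as np

    M = np.array([[1, 0, 1],
                  [1, 1, 0],
                  [0, 1, 1]], dtype=float)  # rows are the candidate basis vectors

    print(np.linalg.matrix_rank(M))  # 3: the rows are linearly independent
    print(np.linalg.det(M))          # 2.0: non-zero, so the rows form a basis of R3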
Example 1.5.10. Let V = {(x, y, z) ∈ R3 : x − 2y + z = 0 and 2x − 3y + z = 0}. Then V
is a subspace of R3 , and has basis {(1, 1, 1)}.
Example 1.5.11. The dimension of a vector space depends on the underlying field. As a
C-vector space, Cn has dimension n with basis vectors ej for 1 ≤ j ≤ n. However, C is a
2-dimensional R-vector space: any complex number z is of the form z = a + bi for real a, b,
so {1, i} is a basis. The vectors ej, iej for 1 ≤ j ≤ n form a basis of Cn as a 2n-dimensional
R-vector space.
Example 1.5.12. For a ≠ 0 in R, the set {1, x − a, (x − a)2, . . . , (x − a)n} is a basis for Pn(R): if c0 + c1(x − a) + . . . + cn(x − a)n = 0 for all x, plugging in x = a shows c0 = 0, and taking derivatives and repeating the argument shows ci = 0. This shows linear independence, and since a basis of Pn(R) has n + 1 elements, this is a spanning set. Every polynomial p(x) can therefore be written in the form p(x) = c0 + c1(x − a) + . . . + cn(x − a)n. One can solve for the coefficients ci by taking derivatives as necessary and plugging in x = a, to see ck = p(k)(a)/k!, recovering the usual Taylor expansion around x = a.
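The coefficient formula ck = p(k)(a)/k! is easy to check symbolically; the sketch below (not from the notes) uses sympy with an arbitrary cubic polynomial and the arbitrary choice a = 2.

    import sympy as sp

    x = sp.symbols('x')
    a = 2                          # arbitrary expansion point
    p = 3*x**3 - x**2 + 4*x - 7    # arbitrary cubic polynomial

    # c_k = p^(k)(a) / k!
    coeffs = [sp.diff(p, x, k).subs(x, a) / sp.factorial(k) for k in range(4)]

    # rebuilding p in the basis {1, x - a, (x - a)^2, (x - a)^3} recovers the original polynomial
    q = sum(c * (x - a)**k for k, c in enumerate(coeffs))
    print(sp.expand(q - p))        # 0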
To illustrate why dimension is useful, we prove a quick result, which helps us understand a
vector space by understanding its subspaces.
Proposition 15. Let W ⊂ V be a subspace. Then dim(W ) ≤ n. If dim(W ) = n, then
W =V.
Proof. First we show that W is finite dimensional. If W = {0}, we are done. Otherwise,
pick w1 ≠ 0 in W. If W = Span({w1}), we are done, otherwise there is w2 ∈ W with w2 ∉ Span({w1}), so {w1, w2} is linearly independent. Continue choosing vectors w3, . . . , wk in this way such that {w1, . . . , wk} is a linearly independent subset of W. Since W ⊂ V, it's also a linearly independent subset of V, so this process must stop after at most n steps, and
the termination of this process is equivalent to saying that W = Span({w1 , . . . , wk }). This
says {w1 , . . . , wk } is a basis of W , and we have k ≤ n. If k = n, these vectors are actually a
basis of V as well, so W = V .
Example 1.5.13. Let W ⊂ R3 be a subspace. Then dim(W ) = 0, 1, 2, 3. If dim(W ) = 0,
then W = {0}, and if dim(W ) = 3, then W = R3 . If dim(W ) = 1, then W = Span({v})
for some v ∈ W , i.e. W is the line through the origin in the direction of v. If dim(W ) = 2,
we have W = Span({v1 , v2 }) for some vectors v1 , v2 . Let v = v1 × v2 , so x · v = 0 for all
x ∈ W . This defines the equation of a plane with normal vector v1 × v2 , so that subspaces of
R3 are either {0}, R3 , lines through the origin or planes through the origin. The dimensions
of these objects should hopefully match your own geometric intuition.
Example 1.5.14. Set V = (Z/pZ)2 , which is a 2-dimensional Z/pZ-vector space with basis
vectors (1̄, 0̄) and (0̄, 1̄). What are all the subspaces of V ? If W ⊂ V is a subspace, we
have dim(W ) = 0, 1, 2. If dim(W ) = 0 then W = {0}, and if dim(W ) = 2 then W = V . If
dim(W ) = 1, then W = Span({v}) for some non-zero vector v. There are a total of p2 −1 such
vectors v, and each of the p − 1 non-zero multiples of v spans the same subspace of V. Since the non-zero vectors of the 1-dimensional subspaces partition the non-zero vectors of V, we conclude there are (p2 − 1)/(p − 1) = p + 1 different 1-dimensional subspaces of V, for a total of p + 3 subspaces.
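The count p + 1 can be confirmed by brute force for a small prime; in the sketch below (p = 5 is an arbitrary choice, not from the notes), each 1-dimensional subspace is recorded as the frozenset of its elements.

    p = 5  # any small prime

    nonzero = [(a, b) for a in range(p) for b in range(p) if (a, b) != (0, 0)]

    def span(v):
        """All scalar multiples of v in (Z/pZ)^2, i.e. the subspace Span({v})."""
        return frozenset(((c * v[0]) % p, (c * v[1]) % p) for c in range(p))

    lines = {span(v) for v in nonzero}        # distinct 1-dimensional subspaces
    print(len(lines), (p**2 - 1) // (p - 1))  # both print 6 = p + 1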
We end with some useful dimension counting results:
Proposition 16. Let V be a vector space and let W, W1, W2 be subspaces.
(a) dim(W1 + W2 ) = dim(W1 ) + dim(W2 ) − dim(W1 ∩ W2 ).
Theorem 1.6.7. Every vector space has a basis.
Proof. Let V be a vector space over some field F . The idea of the proof is as follows: use
Zorn’s lemma to show that V contains a maximal linearly independent subset B of V (in
the sense that there is no linearly independent subset S with B ⊊ S), and then show that
B must be a basis of V .
If V = {0}, then by definition V = Span(∅), and the empty set is linearly independent.
Now suppose that V ≠ {0} and let P = {S ⊂ V : S is linearly independent} be the set of all linearly independent subsets of V, with an ordering on P given by inclusion. Then P ≠ ∅, because there exists v ≠ 0 in V, so {v} is a linearly independent subset of V. We now check
the conditions of Zorn's lemma. Suppose that C ⊂ P is a chain, and write C = {Sα}α∈I for some indexing set I. Set M = ∪_{α∈I} Sα. The claim is that M is an upper bound of C that is an element of P. The first statement is immediate by definition: for any Sα ∈ C, we have Sα ⊂ ∪_{α∈I} Sα, so Sα ≤ M. Therefore, we only need to check that M is a linearly independent subset of V, so that M ∈ P, letting Zorn's lemma kick in.
Suppose that M is not linearly independent, then there are vectors s1 , . . . , sn where
si ∈ Sαi for some Sαi and scalars c1, . . . , cn ∈ F not all 0 such that c1s1 + · · · + cnsn = 0. As C
is totally ordered, one of the sets Sα1 , . . . , Sαn must contain the others, so each of the vectors
si live in some common set, which we denote Sα . This says there is a non-trivial dependence
relation among vectors in Sα , contradicting that Sα is linearly independent (because Sα lives
in P !). Therefore, M is a linearly independent subset of V . By Zorn’s lemma, P contains a
maximal element with respect to inclusion, say B.
To finish up, we need to show that B spans V . Suppose otherwise, then there is some
v ∈ V such that v ∉ Span(B). This says that B ∪ {v} is a linearly independent subset of V with B ⊊ B ∪ {v}, contradicting the maximality of B. Therefore B spans V, and we are done.
done.
It’s important to note that the proof only shows that a basis exists – it gives absolutely
zero indication of what one is. The proof technique of using Zorn’s lemma is a rather stan-
dard one for proving existence theorems in mathematics (especially in algebra) and is worth
understanding.
For vector spaces spanned by finite sets, you saw in lecture that it’s not too hard to
show that any two bases have the same number of elements. This allows us to define the
dimension of a vector space. What happens if the vector space has a basis of infinitely many
elements? The dimension of a vector space is still well defined, but this now becomes a fairly
non-trivial result. Instead of talking about the number of elements in a basis, we have to
talk about the cardinality of the basis, and if you know anything about set theory, there are
many different “sizes” of infinite sets which is what causes complications. The proof is a
rather technical set theoretic argument that is unenlightening, so we will take it for granted.
Proposition 17. Let B and B 0 be two bases of a vector space V . Then |B| = |B 0 |.
This gives us the following definition that works for any vector space:
Definition 1.6.8. Let V be a vector space. The dimension of V is defined as the cardinality of a basis of V. V is said to be infinite dimensional if its dimension is not finite.
If you're familiar with the notion of countability, the following are examples of infinite dimensional vector spaces of different “sizes”:
Example 1.6.9. The vector space Q[x] is infinite dimensional as a Q-vector space, because
the span of any finite set of polynomials has bounded degree. The set {1, x, x2 , . . .} is a basis
of Q[x] as a Q-vector space, and so Q[x] has countable dimension.
Example 1.6.10. R is infinite dimensional as a Q-vector space, because any finite dimen-
sional vector space over Q must be countable, and R is not countable. It turns out that R
has uncountable dimension as a Q-vector space (but this is much harder to show).
Chapter 2
Linear Transformations
A general philosophy is that to study algebraic structures, one needs to study not just the objects but also the structure preserving maps between them. There is no simple explanation for why this is the right approach, but historically it has been very productive. When studying how
to solve systems of linear equations in Rn , one is naturally led to matrix equations of the
form Ax = b. A matrix A defines a function T : Rn → Rn given by T (x) = Ax. This func-
tion T respects the structure of Euclidean space, in the sense that T (x + y) = T (x) + T (y)
and T (cx) = cT (x) for all x, y ∈ Rn and c ∈ R. Since vector spaces are nothing more than
abstracted versions of Euclidean space, we should look at abstract analogues of matrices, i.e.
functions that preserve the vector space structure.
Unless otherwise stated, throughout the handout V is a finite dimensional vector space of
dimension n over a field F . The letter T will always denote a linear transformation.
Note that necessarily a linear transformation satisfies T (0) = 0. We also see by induction
that for any finite collection of vectors v1 , . . . , vn and scalars c1 , . . . , cn ∈ F we have T (c1 v1 +
. . . + cn vn ) = c1 T (v1 ) + . . . + cn T (vn ).
Definition 2.1.2. The kernel ker(T ) is defined by ker(T ) = {x ∈ V : T (x) = 0}. The
image Im(T ) is defined by Im(T ) = {T (x) : x ∈ V }.
The kernel and image of T are two important subspaces of V and W respectively, and we
can translate set theoretic statements about injectivity and surjectivity into the language of
linear algebra.
Proof. Since T is linear, we have T (0) = T (0 + 0) = T (0) + T (0), so 0 = T (0) gives
0 ∈ ker(T). If x, y ∈ ker(T) then T(x + y) = T(x) + T(y) = 0 by linearity, so x + y ∈ ker(T). Similarly, if c ∈ F, T(cx) = cT(x) = 0 so cx ∈ ker(T), giving that ker(T) is a subspace of V. Since T(0) = 0, this says 0 ∈ Im(T). If x, y ∈ Im(T) then there are u, v ∈ V such that x = T(u) and
y = T (v). Then x + y = T (u) + T (v) = T (u + v) so x + y ∈ Im(T ). Finally, if x = T (u) then
cx = cT (u) = T (cu) so cx ∈ Im(T ) which says Im(T ) is a subspace of W .
Proof.
(a) Suppose that T is injective. If x ∈ ker(T), then T(x) = 0 = T(0), so injectivity says x = 0, giving ker(T) = {0}. Conversely, if ker(T) = {0} and T(x) = T(y), then T(x − y) = 0 says x − y ∈ ker(T), so x − y = 0, i.e. x = y, so T is injective.
(b) If T is surjective, then for every w ∈ W there is x ∈ V such that T(x) = w, which is precisely the same as saying W = Im(T). On the other hand, if W = Im(T) then for all w ∈ W there is x ∈ V with w = T(x), so T is surjective.
Example 2.1.3. For any vector space V , the identity transformation idV : V → V given
by idV (x) = x is linear.
Example 2.1.4. For any field F and a ∈ F , the map T : F → F given by T (x) = ax is a
linear transformation by the field axioms.
Example 2.1.6. For any matrix A ∈ Mm×n (F ), the map T : F n → F m given by T (x) = Ax
is a linear transformation, since A(x + y) = Ax + Ay and A(cx) = c(Ax) by how matrices
work.
Example 2.1.7. In P(R), the maps D(p)(x) = p′(x) and I(p)(x) = ∫_0^x p(t) dt are linear operators on P(R) by calculus. D is not injective, because any constant polynomial has derivative 0, but D is surjective since D(∫_0^x p(t) dt) = p(x) by the fundamental theorem of calculus. The operator I is injective but not surjective, because nothing maps to the constant polynomial p(x) = 1.
Example 2.1.8. The map D : C ∞ ([0, 1]) → C ∞ ([0, 1]) given by D(f )(x) = f (x) − f 0 (x) is
a linear transformation. Saying f ∈ ker(D) is the same as saying f 0 (x) = f (x), so ker(D)
is precisely the set of functions that satisfy the differential equation f = f 0 . From calculus,
we know the only such functions are of the form cex for c ∈ R, so ker(D) = Span({ex }) is a
1-dimensional subspace of C ∞ ([0, 1]).
Example 2.1.9. The map T : Mn (F ) → Mn (F ) given by T (A) = A − At is linear. ker(T )
is the set of matrices with A = At , i.e. ker(T ) = Symn (F ). Any matrix in Im(T ) is of the
form A − At for some A, which is skew-symmetric, so Im(T ) ⊂ Skewn (F ). If F does not
have characteristic 2, for any skew-symmetric matrix B, we have T (B) = B − B t = 2B, so
T((1/2)B) = B says Im(T) = Skewn(F).
Example 2.1.10. Let F ∞ be the sequence space of elements of F . That is, F ∞ = {(a1 , a2 , . . .) :
ai ∈ F }. Define maps R : F ∞ → F ∞ by R((a1 , a2 , . . .)) = (a2 , a3 , . . .) and L((a1 , a2 , . . .)) =
(0, a1 , a2 , . . .), the right and left shift operators respectively. Then both R and L are linear
operators on F ∞ .
Example 2.1.11. Let W ⊂ V be a subspace. The map π : V → V/W given by π(v) = v + W is a linear transformation, called the quotient map.
Example 2.1.12. Suppose V = W ⊕ U for some subspaces W, U of V . The projection
πW of V onto W along U is defined by πW (x) = w where x = w + u for unique w ∈ W
and u ∈ U . Then πW is linear, and ker(πW ) = U and Im(πW ) = W . If we assume V is
finite dimensional, for any subspace W there is U such that V = W ⊕ U . This then says
that any subspace W is the kernel of some linear transformation, namely πU where U is the
complement of W in V . Similarly, W appears as the image of πW .
A natural question is given a linear operator T : V → V and a subspace W of V , when
does T restrict to a linear operator on W ? Necessarily, if T restricts to an operator on W
we must have T (W ) ⊂ W , and actually this is sufficient: if T (W ) ⊂ W , then for x, y ∈ W
we have T (x + y) = T (x) + T (y) since x, y ∈ V and T (cx) = cT (x) for c ∈ F . We give such
subspaces a name:
Definition 2.1.13. Given a linear operator T : V → V , a subspace W ⊂ V is called
T-invariant if T (W ) ⊂ W . The restriction of T to W , denoted by T |W , is the linear
transformation T |W (x) = T (x) for all x ∈ W .
Example 2.1.14. Let V = W ⊕ U , and consider πW . Then W is πW -invariant, and πW |W
is the identity map on W .
Example 2.1.15. The map T : Mn (F ) → Mn (F ) given by T (A) = A − At is Symn (F )-
invariant. The restriction T |Symn (F ) is simply the 0 map.
Proof. Let y ∈ Im(T ). Then y = T (x) for some x ∈ V , and we may write x = c1 v1 +. . .+cn vn
for some ci ∈ F . Then y = T (x) = T (c1 v1 + . . . + cn vn ) = c1 T (v1 ) + . . . + cn T (vn ), so
y ∈ Span({T(v1), . . . , T(vn)}), which says this is a spanning set of Im(T). If further we
assume that T is injective, if c1 T (v1 ) + . . . + cn T (vn ) = 0, then T (c1 v1 + . . . + cn vn ) = 0, so
c1 v1 + . . . + cn vn ∈ ker(T ). Since T is injective, this says c1 v1 + . . . + cn vn = 0, and since the
vectors vi are linearly independent this says ci = 0, i.e. that {T (v1 ), . . . , T (vn )} is linearly
independent and therefore a basis of Im(T ), so that rank(T ) = n.
Proof. Pick a basis {v1, . . . , vk} of ker(T) and extend this to a basis {v1, . . . , vk, w1, . . . , wℓ} of V. The above shows that Im(T) = Span({T(w1), . . . , T(wℓ)}), so it is sufficient to prove that this spanning set is linearly independent, and hence a basis of Im(T). Suppose c1T(w1) + . . . + cℓT(wℓ) = 0. Then T(c1w1 + . . . + cℓwℓ) = 0, so c1w1 + . . . + cℓwℓ ∈ ker(T). We may then write c1w1 + . . . + cℓwℓ = a1v1 + . . . + akvk for some ai ∈ F, so c1w1 + . . . + cℓwℓ − a1v1 − . . . − akvk = 0. Since the vectors wi, vj are a basis of V, this says all ci = 0 and all ai = 0, so that {T(w1), . . . , T(wℓ)} is a basis as desired.
Sometimes dim(ker(T )) is referred to as the nullity of T , hence the name of the theorem,
but this terminology is not commonly used outside of linear algebra textbooks. As an immediate corollary of the rank-nullity theorem, we get the following result, analogous to the corresponding statement for functions between finite sets of the same size:
Corollary 2.2.3. Let V and W be vector spaces of the same dimension. Then T : V → W
is injective ⇐⇒ T is surjective ⇐⇒ T is bijective.
Proof. T is injective if and only if ker(T) = {0}, so by rank-nullity this says n = rank(T) + 0; since dim(W) = n, this gives Im(T) = W, so T is surjective. Similarly if T is surjective, rank(T) = n so rank-
nullity says n = n + dim(ker(T )) so dim(ker(T )) = 0 gives ker(T ) = {0} and therefore T is
injective.
We give some examples to illustrate how the rank-nullity theorem is used to compute images
and kernels of linear transformations.
Example 2.2.4. Let T : R3 → R3 be given by T(x, y, z) = (x + y + 2z, 2x + 2y + 4z, 2x + 3y + 5z). Then T is a linear transformation, and Im(T) = Span({(1, 2, 2), (1, 2, 3), (2, 4, 5)}) = Span({(1, 2, 2), (1, 2, 3)}). The latter set is a basis for Im(T), so that rank(T) = 2, i.e. Im(T) is a plane in R3. By rank-nullity the kernel of T is 1-dimensional, so it must be a line. Which line is it? Representing T as the matrix
A =
[ 1  1  2 ]
[ 2  2  4 ]
[ 2  3  5 ]
one sees that any vector orthogonal to the rows of A is contained in the kernel. Taking the cross product of the first and third rows shows (−1, −1, 1) ∈ ker(T), so that ker(T) = Span({(−1, −1, 1)}).
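The same rank and kernel computations can be reproduced exactly with sympy (a small sketch, not part of the notes, using the matrix A above):

    import sympy as sp

    A = sp.Matrix([[1, 1, 2],
                   [2, 2, 4],
                   [2, 3, 5]])

    print(A.rank())       # 2, so Im(T) is a plane in R3
    print(A.nullspace())  # [Matrix([[-1], [-1], [1]])], a basis for ker(T)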
nothing to do with entries off the diagonal, we see that the matrices Eij with i ≠ j along with the matrices −E11 + Eii for 2 ≤ i ≤ n form a spanning set for ker(T), and therefore a basis, because there are n2 − 1 of them.
Example 2.2.6. The map T : P(R) → P(R) defined by T(p) = 5p′′ + 3p′ is surjective. Let q be a polynomial of degree n. Restricting T to Pn+1(R) defines a map T′ : Pn+1(R) → Pn(R) with T′(p) = T(p). By rank-nullity, rank(T′) + dim(ker(T′)) = n + 2. If p ∈ ker(T′), then 5p′′ + 3p′ = 0 says 5p′′ = −3p′. Since deg(p′′) = deg(p′) − 1, this is impossible unless both p′ and p′′ are 0, i.e. p is a constant. This says ker(T′) = Span({1}), so ker(T′) is 1-dimensional, and rank(T′) = n + 1 says T′ is surjective. In particular, q = T(p) for some p, so T is surjective.
Example 2.2.8. Let dim(V ) = n and dim(W ) = m with n < m. Then there is no surjective
linear transformation T from V to W : by rank-nullity, rank(T ) + dim(ker(T )) = n, so
rank(T) = n − dim(ker(T)) ≤ n < m says Im(T) ≠ W. Similarly, if n > m there is no
injective linear transformation from V to W . This says if n < m then an m-dimensional
vector space is “larger” than an n-dimensional vector space, which hopefully matches with
your intuition.
Proposition 22. HomF (V, W ) is a subspace of F(V, W ).
Proof. If T, U ∈ HomF (V, W ) recall that by definition, we have (T + U )(v) = T (v) + U (v)
and (cT )(v) = cT (v). To check that HomF (V, W ) is a subspace, we need to check that it
is non-empty, and that for T, U linear transformations and c ∈ F that T + U and cT are
linear transformations. Note that the 0 function T (x) = 0 for all x ∈ V is certainly linear.
If x, y ∈ V then (T + U )(x + y) = T (x + y) + U (x + y) = T (x) + U (x) + T (y) + U (y) =
(T + U )(x) + (T + U )(y). Next, for c ∈ F we have (T + U )(cx) = T (cx) + U (cx) =
cT (x) + cU (x) = c(T (x) + U (x)) = c(T + U )(x), so T + U is a linear transformation which
says T + U ∈ HomF(V, W). For x, y ∈ V, c, k ∈ F, we have (cT)(x + y) = cT(x + y) = c(T(x) + T(y)) = cT(x) + cT(y) = (cT)(x) + (cT)(y), and (cT)(kx) = cT(kx) = c(kT(x)) = k(cT(x)) = k(cT)(x). This
says cT is a linear transformation, so cT ∈ HomF (V, W ) so that HomF (V, W ) is a subspace
of F(V, W ).
In most linear algebra books the space HomF (V, W ) is denoted as L(V, W ) and EndF (V )
as L(V ), but the above notation is more common elsewhere in mathematics. One of the rea-
sons why finite dimensional vector spaces are so easy to study is that linear transformations
between V and W are the same things as functions defined on a basis of V . This reduces
much of the study of linear algebra to studying functions defined on a finite set. This is
stated precisely in the following form.
Theorem 2.3.3. Let V be finite dimensional, and let B be a basis of V. Then HomF(V, W) ≅ F(B, W). In other words, every linear transformation is determined uniquely by what it does on a basis of V.
Proof. Suppose that T : V → W is a linear transformation, and let B = {v1 , . . . , vn } be a
basis of V . For x ∈ V , we may uniquely write x = c1 v1 + . . . + cn vn , so T (x) = c1 T (v1 ) + . . . +
cn T (vn ) by linearity. This defines a function f : B → W by f (vi ) = T (vi ). Now suppose
we have a function f : B → W . Define Tf : V → W by Tf (c1 v1 + . . . + cn vn ) = c1 f (v1 ) +
. . . + cn f (vn ). We need to check that Tf is a linear transformation, and that it is the only
transformation that agrees with f on B. Write x = c1 v1 +. . .+cn vn and y = d1 v1 +. . .+dn vn .
Then Tf(x + y) = Tf((c1 + d1)v1 + . . . + (cn + dn)vn) = (c1 + d1)f(v1) + . . . + (cn + dn)f(vn) =
c1 f (v1 )+. . .+cn f (vn )+d1 f (v1 )+. . .+dn f (vn ) = Tf (x)+Tf (y). For c ∈ F , we have Tf (cx) =
Tf ((cc1 )v1 +. . .+(ccn )vn ) = (cc1 )f (v1 )+. . .+(ccn )f (vn ) = c(c1 f (v1 )+. . .+cn f (vn )) = cTf (x),
which shows that Tf is linear. Finally, suppose there is some other linear transformation
T′ : V → W such that T′(vi) = f(vi). As mentioned above, this says for any x ∈ V, T′(x) = T′(c1v1 + . . . + cnvn) = c1T′(v1) + . . . + cnT′(vn) = c1f(v1) + . . . + cnf(vn) = Tf(x), i.e. T′ = Tf, so Tf is the only linear transformation with this property.
Putting this all together, this says the map G : F(B, W) → HomF(V, W) with G(f) = Tf is a bijection: it is injective because if G(f) = G(g), then Tf(x) = Tg(x) for all x ∈ V. This then says Tf(vi) = f(vi) = g(vi) = Tg(vi) for all vi, so that f = g because they agree on all elements
of B. It is surjective because T ∈ HomF (V, W ) defines a map f : B → W by f (vi ) = T (vi )
and by definition we have G(f ) = T . It remains to show that G is linear, however this is
clear because G(f + g) = Tf +g = Tf + Tg because Tf +g (vi ) = (f + g)(vi ) = f (vi ) + g(vi ) =
Tf (vi ) + Tg (vi ) so Tf +g and Tf + Tg agree on B and therefore on all of V . Similarly we see
Tcf = cTf for c ∈ F , so G is linear and we are done.
Theorem 2.3.4. Two finite dimensional vector spaces are isomorphic if and only if they
have the same dimension.
Proof. Suppose that V, W are finite dimensional with V ≅ W. Then by definition, there is a bijective linear transformation T : V → W. By rank-nullity, rank(T) + dim(ker(T)) = dim(V), and since T is a bijection this says Im(T) = W, so rank(T) = dim(W) and dim(ker(T)) = 0, i.e. dim(V) = dim(W). Now suppose that V and W are vector spaces of the same dimension. Let B = {v1, . . . , vn} be a basis of V and B′ = {w1, . . . , wn} be a basis of W. Define f : B → W by f(vi) = wi. The previous theorem gives us a linear transformation Tf : V → W. Write x = c1v1 + . . . + cnvn. Then if Tf(x) = 0, this says c1Tf(v1) + . . . + cnTf(vn) = c1w1 + . . . + cnwn = 0, so all ci = 0 because the wi are linearly independent. This says x = 0, so ker(Tf) = {0}. Thus, Tf is injective and therefore bijective, so V ≅ W.
Example 2.4.2. Let x = (1, 3) ∈ R2 and let β = {e1 , e2 } be the standard basis of R2 .
Then x = e1 + 3e2 so [x]β = (1, 3). Set γ = {(1, 1), (1, −1)}. Then x = 2(1, 1) − (1, −1) so
[x]γ = (2, −1). With α = {(1, 3), (1, 0)} we have [x]α = (1, 0).
Example 2.4.3. Let β = {1, x, x2} be the standard basis of P2(R). Then with p(x) = 4 − 3x + 3x2, we have [p(x)]β = (4, −3, 3). If γ = {1, x, (3/2)x2 − (1/2)}, we have [p(x)]γ = (5, −3, 2), as 5 − 3x + 2((3/2)x2 − (1/2)) = p(x).
Example 2.4.5. View C as an R-vector space with basis β = {1, i}. Then x = 3 + 5i has
[x]β = (3, 5). As a C-vector space, C has basis γ = {1}, so [x]γ = 3 + 5i.
Proof. Let x, y ∈ V with x = c1 v1 + . . . + cn vn and y = d1 v1 + . . . + dn vn . Then x + y =
(c1 + d1 )v1 + . . . + (cn + dn )vn . We have Cβ (x + y) = (c1 + d1 , . . . , cn + dn ) = (c1 , . . . , cn ) +
(d1 , . . . , dn ) = Cβ (x) + Cβ (y). For any k ∈ F , we have kx = kc1 v1 + . . . + kcn vn , so Cβ (kx) =
(kc1 , . . . , kcn ) = k(c1 , . . . , cn ) = kCβ (x). This proves that Cβ is linear. If Cβ (x) = 0, this
says that x = 0v1 + . . . + 0vn = 0. This says ker(Cβ) = {0}, so Cβ is injective and therefore bijective, giving V ≅ Fn.
Coordinates are one of the best ideas in mathematics, and in linear algebra this is no
different. Coordinates give us a way of viewing a vector in an abstract vector space as a
more concrete n-tuple of elements of F . In fact, we can do more: using coordinates, we can
associate to every linear transformation T : V → W a matrix [T ]γβ ∈ Mm×n (F ). This reduces
the study of linear maps from V to W , and therefore linear algebra as a whole, to studying
Mm×n (F ).
The definition of the matrix of T says that [T (x)]γ = [T ]γβ [x]β , and so one can then recover
the actual vector T (x) by setting up the corresponding linear combination of basis vectors
in γ.
Example 2.4.9. Let α = a + bi ∈ C. View C as an R-vector space with the standard basis β = {1, i}, and consider the linear transformation T : C → C defined by T(x) = αx, the multiplication by α map. Then
[T]β =
[ a  −b ]
[ b   a ]
This says any complex number a + bi can be thought of as the matrix
[ a  −b ]
[ b   a ].
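This identification respects multiplication: the matrix of the product of two complex numbers is the product of their matrices. The sketch below (sample values are arbitrary, not from the notes) checks this with numpy.

    import numpy as np

    def as_matrix(z):
        """Matrix of the 'multiplication by z' map on C, viewed as R^2 with basis {1, i}."""
        return np.array([[z.real, -z.imag],
                         [z.imag,  z.real]])

    z, w = 2 + 3j, -1 + 4j  # arbitrary complex numbers
    print(np.allclose(as_matrix(z) @ as_matrix(w), as_matrix(z * w)))  # True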
Example 2.4.10. Let D : P3(R) → P2(R) be the derivative map, and let β = {1, x, x2, x3} and γ = {1, x, x2} be the standard bases of P3(R) and P2(R). Then
[D]γβ =
[ 0  1  0  0 ]
[ 0  0  2  0 ]
[ 0  0  0  3 ]
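To illustrate how such a matrix is used (a small sketch, not part of the notes; the polynomial is an arbitrary choice), multiplying [D]γβ against the β-coordinates of a polynomial produces the γ-coordinates of its derivative.

    import numpy as np

    D = np.array([[0, 1, 0, 0],
                  [0, 0, 2, 0],
                  [0, 0, 0, 3]])  # [D] with respect to beta = {1, x, x^2, x^3} and gamma = {1, x, x^2}

    p = np.array([5, -1, 2, 4])   # beta-coordinates of 5 - x + 2x^2 + 4x^3
    print(D @ p)                  # [-1  4 12]: gamma-coordinates of -1 + 4x + 12x^2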
Example 2.4.11. Set V = Span({sin(x), cos(x)}) ⊂ C∞(R) and define T : V → V by T(f) = 3f + 2f′ − f′′. With β = {sin(x), cos(x)}, we see that
[T]β =
[ 4  −2 ]
[ 2   4 ]
because T(sin(x)) = 4 sin(x) + 2 cos(x) and T(cos(x)) = −2 sin(x) + 4 cos(x). Using row reduction, one can check the only solution to [T]β x = 0 is x = 0. This says no non-trivial linear combination of sin(x) and cos(x) is a solution to the differential equation 3f + 2f′ − f′′ = 0.
Example 2.4.12. Let T : V → V be linear and suppose W is a T -invariant subspace. Let
U be a complement of W in V, so V = W ⊕ U. If {w1, . . . , wk} and {u1, . . . , uℓ} are bases of W and U, then β = {w1, . . . , wk, u1, . . . , uℓ} is a basis of V. Since T(W) ⊂ W, we may write T(wi) = c1iw1 + . . . + ckiwk, so [T(wi)]β = (c1i, . . . , cki, 0, . . . , 0). This gives that [T]β is a block matrix of the form
[ A  B ]
[ O  C ]
where A is the k × k matrix [cij], O is the (n − k) × k zero matrix, and B and C are some matrices of size k × (n − k) and (n − k) × (n − k) respectively. Therefore having a T-invariant subspace allows one to decompose the matrix of T into an easier to work with block form.
The results of this section, namely that there is a correspondence between matrices and linear transformations, can be summed up in the theorem below.
Lemma 2.4.13. Let T, U : V → W be linear transformations, and c ∈ F . Then [T + U ]γβ =
[T ]γβ + [U ]γβ and [cT ]γβ = c[T ]γβ .
Proof. By definition, the i-th column of [T +U ]γβ is equal to [(T +U )(vi )]γ = [T (vi )+U (vi )]γ =
[T (vi )]γ + [U (vi )]γ by linearity of the map Cγ . However, this is clearly also the i-th column
of the matrix [T ]γβ + [U ]γβ so [T + U ]γβ = [T ]γβ + [U ]γβ . Similarly, the i-th column of the matrix
[cT ]γβ is given by [(cT )(vi )]γ = [cT (vi )]γ = c[T (vi )]γ , which is again the i-th column of the
matrix c[T ]γβ .
Theorem 2.4.14. HomF(V, W) ≅ Mm×n(F). So in particular, dim(HomF(V, W)) = mn.
Proof. Define F : HomF (V, W ) → Mm×n (F ) by F (T ) = [T ]γβ . Suppose that F (T ) = F (U ).
Then [T ]γβ = [U ]γβ , so in particular the columns of these matrices are the same so [T (vi )]γ =
[U(vi)]γ for all i. Translating back to the actual vectors says T(vi) = U(vi), i.e. T = U, so F is injective. For a matrix A ∈ Mm×n(F) with columns x1, . . . , xn, write xi = (a1i, . . . , ami). Then
define f : B → W by f (vi ) = a1i w1 + . . . + ami wm . This defines a linear transformation
T : V → W with T (vi ) = f (vi ) so in coordinates, [T (vi )]γ = [f (vi )]γ = xi . This then says
[T ]γβ = A, so that F is surjective, so F is a bijection. By the above lemma, F is linear, so F
is an isomorphism as desired. The dimension result then follows immediately.
2.5 Invertibility
Recall that a function f : X → Y is said to be invertible if there is g : Y → X such that
f ◦ g = idY and g ◦ f = idX , and we denote g = f −1 . From set theory, f is invertible if
and only if f is a bijection. For linear transformations T : V → W and S : W → Z, we
denote the composition S ◦ T by ST , and clearly then T is an isomorphism if and only if T is
invertible. Since linear transformations correspond to matrices, we make a similar definition.
Proof. For x, y ∈ V we have ST (x + y) = S(T (x + y)) = S(T (x) + T (y)) = ST (x) + ST (y),
and for c ∈ F , we also have ST (cx) = S(T (cx)) = S(cT (x)) = cST (x) since S, T are
linear. Suppose that T is invertible with inverse T−1. For w, w′ ∈ W, T−1(w + w′) is the vector that maps to w + w′ under T. Since T is linear, T(T−1(w) + T−1(w′)) = w + w′, so T−1(w + w′) = T−1(w) + T−1(w′). Similarly we see for c ∈ F that T−1(cw) = cT−1(w) so
T −1 is linear.
Proof. Let α = {v1, . . . , vn} and β = {w1, . . . , wk}. By definition, the i-th column of [ST]γα is given by [ST(vi)]γ. The i-th column of [S]γβ[T]βα is [S]γβ[T(vi)]β, so it's sufficient to check these expressions are equal. Write T(vi) = c1iw1 + . . . + ckiwk. Then ST(vi) = S(c1iw1 + . . . + ckiwk) = c1iS(w1) + . . . + ckiS(wk). Applying Cγ then gives [ST(vi)]γ = c1i[S(w1)]γ + . . . + cki[S(wk)]γ, which we then recognize as saying [ST(vi)]γ = [S]γβ[T(vi)]β as desired.
that [T ]γβ is invertible. Then the columns of [T ]γβ are linearly independent: if not, there
are c1 , . . . , cn ∈ F not all 0 such that c1 [T (v1 )]γ + . . . + cn [T (vn )]γ = 0, i.e. there is a
non-trivial solution to [T ]γβ x = 0. However, this is impossible because multiplying by the
inverse of [T ]γβ on the left shows that if the above holds then necessarily x = 0. This says
that the vectors [T (vi )]γ in F n are linearly independent, and as the coordinate mapping
is an isomorphism this then implies that the vectors wi = T (vi ) are linearly independent
vectors in W , and therefore are a basis of W . Define a linear transformation S : W → V
by S(wi ) = vi and extend linearly. By definition, ST (vi ) = S(wi ) = vi , so ST = idV , and
similarly T S(wi ) = T (vi ) = wi , so T S = idW so that T is invertible as desired.
We showed above that checking the invertibility of a linear operator T is the same as
checking the invertibility of its corresponding matrix. We remind the reader of some of many
equivalent conditions for checking the latter:
(a) A is invertible.
(d) A is row-equivalent to In .
(e) det(A) ≠ 0.
(f ) The augmented matrix [A|I] is row equivalent to [I|B] for some non-zero matrix B.
Pick bases β, γ of V , and consider the identity operator idV , along with the corresponding
matrix [idV ]γβ . This matrix satisfies [x]γ = [idV (x)]γ = [idV ]γβ [x]β for all x ∈ V , or in other
words, multiplication by [idV ]γβ converts the coordinates of the vector x from the basis β to
the basis γ.
Definition 2.6.1. Let V be a vector space with basis β = {v1, . . . , vn}, and let γ = {v1′, . . . , vn′} be another basis. The change of basis matrix from β to γ, denoted Sβγ, is the matrix [idV]γβ. Explicitly, Sβγ is the matrix whose i-th column is [vi]γ.
Since idV is invertible, this says Sβγ is invertible, and has inverse matrix [idV]βγ = Sγβ.
Theorem 2.6.2. Let β, γ be two bases of a finite dimensional vector space V, and let T ∈ GL(V). Then [T]γ = Sβγ[T]βSγβ, and [T]γSβγ = Sβγ[T]β. In other words, the following diagram commutes:

               [T]β
      [x]β -----------> [T(x)]β
       |                   |
      Sβγ                 Sβγ
       |                   |
       v                   v
      [x]γ -----------> [T(x)]γ
               [T]γ
Proof. Since composition of linear transformations corresponds to multiplication of their corresponding matrices, we see [T]γ = [idV ◦ T ◦ idV]γγ = [idV]γβ[T]β[idV]βγ = Sβγ[T]βSγβ. Since Sβγ is invertible with inverse Sγβ, multiplying on the right by Sβγ gives [T]γSβγ = Sβγ[T]β as desired.
Example 2.6.3. Let β = {e1, e2} be the standard basis of R2 and γ = {(1, 1), (1, 2)} be another basis. The change of basis matrix Sγβ is
Sγβ =
[ 1  1 ]
[ 1  2 ]
To compute Sβγ, we take the inverse to find
Sβγ =
[  2  −1 ]
[ −1   1 ]
To compute [e1]γ, we see [e1]γ = Sβγ[e1]β = Sβγ e1 = (2, −1), so that (1, 0) = 2(1, 1) − (1, 2).
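The inverse and the coordinate computation in this example can be replayed with numpy (a small sketch, not part of the notes):

    import numpy as np

    S_gamma_beta = np.array([[1, 1],
                             [1, 2]], dtype=float)  # columns are [(1,1)]_beta and [(1,2)]_beta
    S_beta_gamma = np.linalg.inv(S_gamma_beta)

    print(S_beta_gamma)                         # the inverse [[2, -1], [-1, 1]]
    print(S_beta_gamma @ np.array([1.0, 0.0]))  # [ 2. -1.] = coordinates of e1 in gamma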
Example 2.6.4. Let β = {1, x, x2} and γ = {1, x, (3/2)x2 − (1/2)} be bases of P2(R). Then
Sγβ =
[ 1  0  −1/2 ]
[ 0  1    0  ]
[ 0  0   3/2 ]
and one can compute
Sβγ =
[ 1  0  1/3 ]
[ 0  1   0  ]
[ 0  0  2/3 ]
Let T : P2(R) → P2(R) be defined by T(f)(x) = xf′(x). Then
[T]β =
[ 0  0  0 ]
[ 0  1  0 ]
[ 0  0  2 ]
and the change of basis formula says
[T]γ =
[ 0  0  1 ]
[ 0  1  0 ]
[ 0  0  2 ]
Example 2.6.5. Let T : R3 → R3 be given by T(x, y, z) = (2z, −2x + 3y + 2z, −x + 3z). Let β = {e1, e2, e3} be the standard basis of R3, and let γ = {(2, 1, 1), (1, 0, 1), (0, 1, 0)} be another basis. Then
[T]β =
[  0  0  2 ]
[ −2  3  2 ]
[ −1  0  3 ]
We see
Sγβ =
[ 2  1  0 ]
[ 1  0  1 ]
[ 1  1  0 ]
and
Sβγ =
[  1  0  −1 ]
[ −1  0   2 ]
[ −1  1   1 ]
so the change of basis formula says
[T]γ =
[ 1  0  0 ]
[ 0  2  0 ]
[ 0  0  3 ]
With respect to the new basis γ, this says that T acts along each γ-direction by scaling. Having a basis where an operator is diagonal is extremely useful, as it allows one to easily compute values of compositions. For example, to compute Tn(1, 2, 3), we compute [Tn(1, 2, 3)]γ = [Tn]γ[(1, 2, 3)]γ = [T]nγ[(1, 2, 3)]γ. We have [(1, 2, 3)]γ = Sβγ[(1, 2, 3)]β = (−2, 5, 4), so [Tn(1, 2, 3)]γ = [T]nγ(−2, 5, 4) = (−2, 5 · 2n, 4 · 3n). This says Tn(1, 2, 3) = −2(2, 1, 1) + 5 · 2n(1, 0, 1) + 4 · 3n(0, 1, 0) = (−4 + 5 · 2n, −2 + 4 · 3n, −2 + 5 · 2n).
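The whole computation can be replayed numerically; the sketch below (not part of the notes, with the arbitrary choice n = 4) checks that conjugating by the change of basis matrices diagonalizes [T]β and that the diagonal form reproduces Tn(1, 2, 3).

    import numpy as np

    T_beta = np.array([[ 0, 0, 2],
                       [-2, 3, 2],
                       [-1, 0, 3]], dtype=float)
    S_gamma_beta = np.array([[2, 1, 0],
                             [1, 0, 1],
                             [1, 1, 0]], dtype=float)  # columns are the gamma vectors
    S_beta_gamma = np.linalg.inv(S_gamma_beta)

    print(np.round(S_beta_gamma @ T_beta @ S_gamma_beta))  # diag(1, 2, 3)

    n, x = 4, np.array([1, 2, 3], dtype=float)
    direct = np.linalg.matrix_power(T_beta, n) @ x
    via_gamma = S_gamma_beta @ (np.diag([1.0**n, 2.0**n, 3.0**n]) @ (S_beta_gamma @ x))
    # both give (76, 322, 78) = (-4 + 5*2^n, -2 + 4*3^n, -2 + 5*2^n) for n = 4
    print(direct, np.allclose(direct, via_gamma))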
Example 2.6.6. Let β = {1, x, x2, x3} be the standard basis of P3(R) and let γ = {1, C(x, 1), C(x, 2), C(x, 3)} be the basis of binomial coefficient polynomials, where C(x, k) = x(x − 1) · · · (x − k + 1)/k!. One can check that
Sβγ =
[ 1  0  0  0 ]
[ 0  1  1  1 ]
[ 0  0  2  6 ]
[ 0  0  0  6 ]
Then [x3]γ = Sβγ[x3]β = (0, 1, 6, 6), so x3 = C(x, 1) + 6C(x, 2) + 6C(x, 3). As an application, Σ_{k=1}^{n−1} k3 = Σ_{k=1}^{n−1} (C(k, 1) + 6C(k, 2) + 6C(k, 3)) = C(n, 2) + 6C(n, 3) + 6C(n, 4) = (n(n − 1)/2)2, which follows from the identity Σ_{k=1}^{n−1} C(k, r) = C(n, r + 1), easily proven by induction.
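A quick brute-force check of the resulting sum formula (a sketch, not part of the notes; n = 10 is an arbitrary choice):

    n = 10
    lhs = sum(k**3 for k in range(1, n))  # 1^3 + 2^3 + ... + (n-1)^3
    rhs = (n * (n - 1) // 2) ** 2
    print(lhs, rhs, lhs == rhs)           # 2025 2025 True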
Definition 2.6.7. For A, B ∈ Mn (F ), we say that A and B are similar and write A ∼ B
if there exists P ∈ GLn (F ) such that A = P BP −1 .
Since Sγβ = (Sβγ )−1 , this says that for any choice of bases β, γ of V the matrices [T ]β and
[T ]γ are similar. The following observation is easy to verify:
We showed that changing the basis of V from β to γ yields similar matrices [T ]β and [T ]γ .
The converse is true as well:
Proposition 28. Let A, B ∈ Mn (F ), with A ∼ B. Then tr(A) = tr(B) and det(A) =
det(B).
Definition 2.6.10. Let V be finite dimensional. For T ∈ GL(V ) we define the trace of T
and the determinant of T to be the quantities tr([T ]β ) and det([T ]β ) for any choice of basis
β of V .
Example 2.6.11. The matrices
A =
[ 1  2 ]
[ 3  4 ]
and
B =
[ 0   2 ]
[ 1  −3 ]
are not similar, because tr(A) = 5 while tr(B) = −3. However, det(A) = det(B) = −2.
Recall that the rank of a linear transformation T was defined as the dimension of its image. We can also similarly define rank in terms of a matrix representation of T:
Chapter 3
Diagonalization
After developing the basic theory of linear transformations, we turn our attention to the
study of linear operators. As we saw, a change of basis of V from β to β′ corresponds to conjugation of the matrix [T]β by some invertible change of basis matrix Sββ′. A natural
question is if there is a “best” basis β to pick so that [T ]β will be as easy to understand
as possible. The best we could hope for in general is that [T ]β is a diagonal matrix, so the
question becomes if it is possible to find a basis β of V such that [T ]β is diagonal. Answering
this question will be the primary purpose of this handout.
Throughout this document, V will denote a vector space over an arbitrary field F and
T : V → V will denote a linear operator.
Our first order of business is to determine what the possible eigenvalues of a linear
operator are. When V is finite dimensional, this is quite easily done using the theory of
determinants. We remind the reader of the following definition:
Elementary properties of the determinant show that similar matrices have the same
determinant, so the above definition actually is independent of a choice of basis and so the
notation makes sense.
Proof. Pick a basis β of V . Suppose that λ ∈ F is an eigenvalue of T with eigenvector v.
Then (λ · IV − T )(v) = 0, i.e. the operator λ · IV − T is not invertible, and from the theory
of matrices, this says [λ · IV − T ]β is not invertible. Therefore det([λ · IV − T ]β ) = 0, so by
definition this says that det(λ · IV − T ) = 0. Conversely, if det(λ · IV − T ) = 0, then λ · IV − T
is not invertible, so there is some non-zero vector v in the kernel of λ · IV − T, i.e. a non-zero vector v such that T(v) = λv, so that λ is an eigenvalue of T.
The above says that checking if λ is an eigenvalue of T is equivalent to finding a root of
the polynomial det(x · IV − T ). We give this polynomial a name:
Definition 3.1.4. The polynomial pT (x) = det(x · IV − T ) is called the characteristic
polynomial of T .
Restated in the new definition, we have the following:
Theorem 3.1.5. Let V be an n dimensional vector space. λ ∈ F is an eigenvalue of
T : V → V if and only if λ is a root of the characteristic polynomial pT (x).
Notice that since the determinant of a linear operator does not depend on a choice of basis,
the characteristic polynomial is independent of such a choice as well, and so it is well defined.
Before moving on, we make a few remarks: notice that the definition of an eigenvalue
depends on which field we view V as a vector space over. If F is algebraically closed,
then every linear operator T : V → V has an eigenvalue, because then the characteristic
polynomial of T necessarily has a root in F . If F is not algebraically closed, it may be possible
for an operator to not have an eigenvalue. Additionally, the definition of an eigenvalue makes
sense for infinite dimensional vector spaces, even if our criterion for easily finding eigenvalues
only works for finite dimensional vector spaces. Some of the examples below will illustrate
this.
Example 3.1.6. Let T : R3 → R3 be given by T(x, y, z) = (y, −5x + 4y + z, −x + y + z). Then with β the standard basis, we see
[T]β =
[  0  1  0 ]
[ −5  4  1 ]
[ −1  1  1 ]
Then pT(x) = (x − 1)(x − 2)2, so T has eigenvalues 1 and 2. One can check T has eigenvectors v1 = (1, 1, 2) and v2 = (1, 2, 1) corresponding to the eigenvalues 1 and 2 respectively.
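These computations can be confirmed with sympy (a small sketch, not part of the notes): the characteristic polynomial factors as claimed, and the returned eigenvectors are scalar multiples of (1, 1, 2) and (1, 2, 1).

    import sympy as sp

    x = sp.symbols('x')
    A = sp.Matrix([[ 0, 1, 0],
                   [-5, 4, 1],
                   [-1, 1, 1]])

    print(sp.factor(A.charpoly(x).as_expr()))  # factors as (x - 1)*(x - 2)**2
    print(A.eigenvects())                      # eigenvalue 1: eigenvector proportional to (1, 1, 2);
                                               # eigenvalue 2: eigenvector proportional to (1, 2, 1)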
Example 3.1.7. Let T : R2 → R2 be a counterclockwise rotation by some angle θ ∈ (0, 2π) with θ ≠ π. Then T has no eigenvectors, because no vector v ∈ R2 is scaled along the same direction by T. Explicitly, with β = {e1, e2} the standard basis of R2, one can check that
[T]β =
[ cos(θ)  −sin(θ) ]
[ sin(θ)   cos(θ) ]
The characteristic polynomial of T is given by pT(x) = x2 − 2 cos(θ)x + 1, and since cos(θ) ≠ ±1 the quadratic formula says this has no real roots.
Example 3.1.8. Write V = W ⊕ W′ for some subspaces W, W′ of V. Let P = πW be the projection onto W, so that P2 = P. If (λ, v) is an eigenpair for P, then λ2v = P2v = Pv = λv, so that λ2 = λ says that λ = 0, 1 are the only possible eigenvalues of P. For any non-zero w ∈ W, P(w) = w, and for non-zero w′ ∈ W′, P(w′) = 0, so w and w′ are eigenvectors corresponding to 1 and 0 respectively.
Example 3.1.9. Let T : C ∞ (R) → C ∞ (R) be the derivative map, T (f ) = f 0 . An eigenvector
of T is a function f such that f 0 = λf for some λ ∈ R. From calculus, we know that such
functions are of the form ceλt for some c ∈ R. This says the exponential functions ceλt
are eigenvectors of the derivative operator with eigenvalues λ ∈ R. This is of fundamental
importance in the theory of linear differential equations.
Example 3.1.10. Let L : F ∞ → F ∞ be the left shift operator, i.e. L((a1 , a2 , . . .)) =
(a2 , a3 , . . .). An eigenvector of L is a sequence (a1 , a2 , . . .) such that (a2 , a3 , . . .) = λ(a1 , a2 , . . .)
for some λ ∈ F. This says a2 = λa1, a3 = λa2 = λ2a1, and by induction, that an = λn−1a1, i.e. every eigenvector of L is a geometric sequence. Conversely, let σ = {an} be a geometric sequence, that is, a sequence defined by an = crn−1 for some c, r ∈ F with c ≠ 0. Then L(σ) = rσ, so σ is an eigenvector of L with eigenvalue r. The eigenvectors of L are therefore exactly the non-zero geometric sequences.
33
3.3 Diagonalization
We now use the theory of eigenvalues to answer our main question: when does V have a basis consisting of eigenvectors of T, i.e. when is T diagonalizable?
In order to determine when such a basis exists, we will utilize the following key result: eigenvectors corresponding to distinct eigenvalues are linearly independent. More precisely, if v1, . . . , vn are eigenvectors of T with distinct eigenvalues λ1, . . . , λn, then {v1, . . . , vn} is linearly independent.
Proof. We prove this by induction. If n = 1 the statement follows immediately since {v1 }
is linearly independent. Assume that the statement is true for any collection of n − 1
eigenvectors that correspond to distinct eigenvalues. Suppose that c1 v1 + . . . + cn vn = 0,
so that applying T says c1 λ1 v1 + . . . + cn λn vn = 0. Multiply the first equation by λn and
subtract to see c1 (λ1 − λn )v1 + . . . + cn−1 (λn−1 − λn )vn−1 = 0. By induction hypothesis,
the vectors {v1 , . . . , vn−1 } are linearly independent, and since all eigenvalues are distinct this
forces ci = 0 for 1 ≤ i ≤ n − 1. This then immediately gives cn = 0, so that {v1 , . . . , vn } is
linearly independent. By induction, we are done.
Corollary 3.3.3. Let V be an n dimensional vector space and T a linear operator. If T has
n distinct eigenvalues, then T is diagonalizable. If pT (x) factors into distinct linear factors
in F [x], then T is diagonalizable.
Proof. If T has n distinct eigenvalues, then the associated eigenvectors are a set of n linearly
independent vectors in V, hence a basis of eigenvectors, so T is diagonalizable. Saying that pT(x) splits into distinct linear factors is the same as saying that T has n distinct eigenvalues.
Example 3.3.4. The converse to the above statement is not necessarily true. For example,
the identity operator IV is diagonalizable, but has characteristic polynomial (x − 1)n .
Proposition 33. Let V be an n dimensional vector space and T a linear operator. Then if T is diagonalizable, pT(x) factors into a product of (not necessarily distinct) linear factors in F[x].
34
Proof. Suppose that T is diagonalizable. Let β be a basis of V consisting of eigenvectors of T, with corresponding eigenvalues λ1, . . . , λn. Then [T]β is the diagonal matrix with diagonal entries λ1, . . . , λn, and det(xIn − [T]β) = (x − λ1) · · · (x − λn) is a product of linear factors in F[x].
Example 3.3.5. The converse of the above statement is not necessarily true. Let T : R2 → R2 be given by T(x, y) = (y, 0). Then it's easy to see pT(x) = x2, so the only eigenvalue of T is 0. However, T is not diagonalizable: every eigenvector of T lies in ker(T), and rank-nullity says dim(ker(T)) = 1, so there is no possible basis of eigenvectors for T.
The above examples show that the characteristic polynomial alone is not strong enough to detect whether an operator is diagonalizable: nothing can be concluded when the characteristic polynomial has repeated roots. This leads us to the following definitions, which will end up giving a test for diagonalizability. For an eigenvalue λ of T, the eigenspace of λ is Eλ = ker(λ · IV − T), the geometric multiplicity of λ is dim(Eλ), and the algebraic multiplicity of λ is the multiplicity of λ as a root of pT(x). The key fact relating them is that the geometric multiplicity never exceeds the algebraic multiplicity:
Proof. Suppose that dim(Eλ) = k. Pick a basis {v1, . . . , vk} of Eλ and extend to a basis β = {v1, . . . , vk, w1, . . . , wm} of V. Then [T]β is a block upper triangular matrix with diagonal blocks λ · Ik and B and upper right block A, where the block below the diagonal is the m × k zero matrix, and A, B have dimensions k × m and m × m respectively. Then x · In − [T]β is block upper triangular with diagonal blocks (x − λ) · Ik and x · Im − B, so pT(x) = det(x · In − [T]β) = det((x − λ) · Ik) det(x · Im − B) = (x − λ)k g(x) where g(x) = det(x · Im − B). This says (x − λ)k divides pT(x), so the algebraic multiplicity of λ is at least k as desired.
Theorem 3.3.7. Let V be an n dimensional vector space. Let T have distinct eigenvalues
λ1 , . . . , λk with algebraic multiplicities e1 , . . . , ek , so pT (x) = (x − λ1 )e1 · · · (x − λk )ek , and
e1 + . . . + ek = n. Then T is diagonalizable if and only if V = Eλ1 ⊕ . . . ⊕ Eλk .
Corollary 3.3.8. Let T be as above. Then T is diagonalizable if and only if for each
eigenvalue λi of T the algebraic multiplicity and geometric multiplicity of λi are equal.
35
Proof. If the algebraic and geometric multiplicity of λi are equal for all i, this says dim(Eλ1 ⊕
. . . ⊕ Eλk ) = e1 + . . . + ek = n, so Eλ1 ⊕ . . . ⊕ Eλk = V says T is diagonalizable. Conversely
if dim(Eλi) < ei for some i, then dim(Eλ1 ⊕ . . . ⊕ Eλk) < n, so Eλ1 ⊕ . . . ⊕ Eλk ≠ V says T
is not diagonalizable.
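The corollary translates directly into a computational test. Below is a minimal sketch of that test using sympy (the code and the helper name is_diagonalizable are illustrations, not part of the notes); note that sympy finds roots over C, so this checks diagonalizability over the complex numbers.

import sympy as sp

def is_diagonalizable(A: sp.Matrix) -> bool:
    n = A.shape[0]
    x = sp.symbols('x')
    p = (x * sp.eye(n) - A).det()                  # characteristic polynomial p_T(x)
    for lam, alg_mult in sp.roots(p, x).items():   # roots with algebraic multiplicities
        E_lam = (lam * sp.eye(n) - A).nullspace()  # basis of the eigenspace E_lam
        if len(E_lam) != alg_mult:                 # compare geometric and algebraic multiplicity
            return False
    return True

print(is_diagonalizable(sp.Matrix([[0, 1], [0, 0]])))  # False (Example 3.3.5: p_T(x) = x^2)
print(is_diagonalizable(sp.eye(3)))                    # True  (the identity, Example 3.3.4)

sympy's built-in Matrix.is_diagonalizable performs essentially the same comparison.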
36
Chapter 4
Inner Product Spaces
When we started our study of vector spaces, we had a goal in mind: find objects that
generalized the algebraic structure on Euclidean space Rn . However, if the ultimate goal of
linear algebra is to fully generalize Euclidean space, there’s something major that still hasn’t
been abstracted: the geometry of Rn . The definition of an abstract vector space V does not
include notions of length, distance, or angles, and therefore no concept of geometry. In order
for a vector space to truly “act” Euclidean, we need to add more structure.
An inner product space is a pair (V, h−, −i), i.e. a vector space V with a choice of inner
product on V . From the conjugate symmetry of the inner product, we deduce the following
basic properties:
Proposition 35. Let V be an inner product space. Then the following hold:
37
(d) hx, xi = 0 if and only if x = 0.
Proposition 36. Let V be an inner product space. Then the norm k·k satisfies the following
properties:
(c) (Cauchy-Schwarz) For all x, y ∈ V, |hx, yi| ≤ kxkkyk, and equality holds if and only if one of x, y is a scalar multiple of the other.
(d) (Triangle inequality) For all x, y ∈ V, kx + yk ≤ kxk + kyk, and equality holds if and only if one of x, y is a non-negative real multiple of the other.
Proof.
Example 4.1.3. If V has an inner product h−, −i then for any subspace W of V , h−, −i is
still an inner product on W .
38
Example 4.1.4. Set V = Rn and let · be the usual dot product, (a1, . . . , an) · (b1, . . . , bn) = a1b1 + . . . + anbn. This makes V a real inner product space. If instead V = Cn, we define the dot product to be (a1, . . . , an) · (b1, . . . , bn) = a1b̄1 + . . . + anb̄n, conjugating the second vector. This makes V a complex inner product space. As an example in C2, we have (1 + 2i, 3 − i) · (2, i) = 2(1 + 2i) − i(3 − i) = 1 + i.
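A quick numerical check of this computation (a numpy sketch, not part of the notes; note that numpy's own np.vdot conjugates its first argument, so the argument order must be swapped to match the convention above):

import numpy as np

a = np.array([1 + 2j, 3 - 1j])
b = np.array([2 + 0j, 1j])

print(np.sum(a * np.conj(b)))   # a1*conj(b1) + a2*conj(b2) = (1+1j)
print(np.vdot(b, a))            # same value: np.vdot conjugates its first argument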
Example 4.1.5. If V is a finite dimensional F -vector space, we can always give V the
structure of an inner product space as follows. Say that dimF (V ) = n, and fix an isomorphism
ϕ : V → F n . Define an inner product h−, −i on V by hv, wi = ϕ(v) · ϕ(w), where the dot
product on the right hand side happens in F n .
Example 4.1.6. Set V = C([a, b], C) and define hf, gi = ∫_a^b f(t)ḡ(t) dt. Then calculus says this makes V an inner product space. In C([−π, π], C) with f = 1 + 2x and g = cos(x), one can check that kf k = √(2π + 8π3/3), kgk = √π, and that hf, gi = 0.
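These values can be confirmed symbolically (a sympy sketch, not part of the notes; since f and g are real-valued the conjugate plays no role here):

import sympy as sp

x = sp.symbols('x')
f = 1 + 2 * x
g = sp.cos(x)

norm_f = sp.sqrt(sp.integrate(f * f, (x, -sp.pi, sp.pi)))   # sqrt(2*pi + 8*pi**3/3)
norm_g = sp.sqrt(sp.integrate(g * g, (x, -sp.pi, sp.pi)))   # sqrt(pi)
print(sp.simplify(norm_f), norm_g, sp.integrate(f * g, (x, -sp.pi, sp.pi)))   # the last value is 0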
Example 4.1.7. Let V = Mn(C) and define hA, Bi = tr(B∗A), where (A∗)ij = Āji is the conjugate transpose of A. Then by linearity of tr and the definition of B∗, one sees that this defines an inner product. For any A ∈ Mn(C), we see that the ij-th entry of A∗A is simply the i-th row of A∗ dotted with the j-th column of A. In particular, if vi is the i-th column of A, then (A∗A)ii = kvik2, so that kAk = √(kv1k2 + . . . + kvnk2) where v1, . . . , vn are the columns of A. In M2(C), let A be the matrix with rows (1, 2) and (3, 4), and B the matrix with rows (i, 0) and (1 + i, −i). Then we see kAk = √30, kBk = 2, and hA, Bi = 3.
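A numerical check of these values (a numpy sketch, not part of the notes):

import numpy as np

A = np.array([[1, 2], [3, 4]], dtype=complex)
B = np.array([[1j, 0], [1 + 1j, -1j]])

print(np.trace(B.conj().T @ A))                  # <A, B> = tr(B* A) = (3+0j)
print(np.sqrt(np.trace(A.conj().T @ A)).real)    # ||A|| = sqrt(30) ~ 5.477
print(np.sqrt(np.trace(B.conj().T @ B)).real)    # ||B|| = 2.0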
4.2 Orthogonality
In Rn, one of the most important properties of the dot product was that it was able to measure the angle between two vectors: this was detected by the quantity (x · y)/(kxkkyk). For a general inner product space V, it doesn't make sense to define general angles, since the expression hx, yi/(kxkkyk) may be a complex number. However, we may still make sense of orthogonality.
39
Definition 4.2.1. Vectors x, y ∈ V are called orthogonal if hx, yi = 0. A subset S of V is
called orthogonal if any two vectors in S are orthogonal, and S is called orthonormal if S
is orthogonal and kxk = 1 for all x ∈ S.
Since inner products have a notion of orthogonality, the Pythagorean theorem is still true: if x, y ∈ V are orthogonal, then kx + yk2 = kxk2 + kyk2.
Proof. kx+yk2 = hx+y, x+yi = hx, xi+hx, yi+hy, xi+hy, yi = hx, xi+hy, yi = kxk2 +kyk2
because hx, yi = hy, xi = 0 by assumption.
One of the reasons that we impose the extra structure of an inner product is that inner product spaces become very nice to work with: orthogonality makes linear independence easy to check, as well as finding the coordinates of a vector with respect to a basis.
Proposition 37. Let {v1 , . . . , vk } be an orthogonal subset of non-zero vectors. Then {v1 , . . . , vk }
is linearly independent.
Proof. Suppose that x = c1v1 + . . . + ckvk = 0. Taking an inner product with vi says 0 = hx, vii = cihvi, vii = cikvik2 by orthogonality. Since vi ≠ 0, this forces ci = 0 for each i, so {v1, . . . , vk} is linearly independent. Note that the same computation shows that if x = c1v1 + . . . + ckvk is any vector, then ci = hx, vii/kvik2.
In particular, the above says that if we have a basis β for V consisting of orthogonal
vectors, then finding the coordinates [x]β is reduced to an inner product computation. If V
is finite dimensional, is it always possible to find an orthonormal basis? The answer is yes,
and follows from a more general result.
40
As an immediate corollary to the Gram-Schmidt process, we get the following:
Corollary 4.2.4. If V is a finite dimensional inner product space, then V has an orthonor-
mal basis.
Proof. Apply the Gram-Schmidt process to a basis of V to get a basis of orthogonal vectors.
Then normalize.
Example 4.2.5. Set V = R3 and β = {(1, 1, 1), (0, 1, 1), (0, 0, 1)} = {w1, w2, w3}, which is a basis of R3. To construct an orthogonal basis, set v1 = (1, 1, 1). Then v2 = w2 − (hw2, v1i/kv1k2)v1 = w2 − (2/3)v1 = (−2/3, 1/3, 1/3), and v3 = w3 − (hw3, v1i/kv1k2)v1 − (hw3, v2i/kv2k2)v2 = w3 − (1/3)v1 − (1/2)v2 = (0, −1/2, 1/2). This produces an orthogonal basis, so normalizing each vector will give an orthonormal basis. We see kv1k = √3, kv2k = √(2/3), and kv3k = 1/√2. Then {(1/√3, 1/√3, 1/√3), (−√(2/3), 1/√6, 1/√6), (0, −1/√2, 1/√2)} is an orthonormal basis of R3.
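Here is a minimal Gram-Schmidt sketch in numpy, run on the basis from this example (the helper gram_schmidt is an illustration, not part of the notes):

import numpy as np

def gram_schmidt(vectors):
    # Returns an orthogonal list of vectors spanning the same subspace as `vectors`.
    basis = []
    for w in vectors:
        v = w - sum(np.dot(w, u) / np.dot(u, u) * u for u in basis)
        basis.append(v)
    return basis

w1, w2, w3 = np.array([1., 1., 1.]), np.array([0., 1., 1.]), np.array([0., 0., 1.])
orthogonal = gram_schmidt([w1, w2, w3])
for v in orthogonal:
    print(np.round(v, 4))                                             # (1,1,1), (-2/3,1/3,1/3), (0,-1/2,1/2)
print([np.round(v / np.linalg.norm(v), 4) for v in orthogonal])       # the orthonormal basis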
Example 4.2.6. Set V = P2(R), which may be viewed as a subspace of C([−1, 1]) with the inner product hf, gi = ∫_{-1}^{1} f(x)g(x) dx. Let β = {1, x, x2} = {w1, w2, w3} be the standard basis of V. To produce an orthonormal basis, we use Gram-Schmidt. The vectors 1 and x are already orthogonal, so we may take v1 = 1 and v2 = x. Then v3 = w3 − (hw3, v1i/kv1k2)v1 − (hw3, v2i/kv2k2)v2 = x2 − 1/3, so {1, x, x2 − 1/3} is an orthogonal basis. We compute k1k = √2, kxk = √(2/3), and kx2 − 1/3k = √(8/45). This produces an orthonormal basis {1/√2, √(3/2) x, √(5/8)(3x2 − 1)}. These are, up to normalization, the first three Legendre polynomials, which have applications in physics. Repeating this process with the basis β = {1, x, . . . , xn} of Pn(R) allows one to compute the n-th Legendre polynomial.
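The same computation can be carried out symbolically; the sketch below (sympy, not part of the notes) runs Gram-Schmidt on {1, x, x2, x3} with this inner product and recovers multiples of the first four Legendre polynomials:

import sympy as sp

x = sp.symbols('x')

def inner(f, g):
    return sp.integrate(f * g, (x, -1, 1))

orthogonal = []
for w in [sp.Integer(1), x, x**2, x**3]:
    v = w - sum(inner(w, u) / inner(u, u) * u for u in orthogonal)
    orthogonal.append(sp.expand(v))

print(orthogonal)   # [1, x, x**2 - 1/3, x**3 - 3*x/5]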
Definition 4.2.7. Let S ⊂ V be a subset. The orthogonal complement of S, denoted
S ⊥ is defined by S ⊥ = {v ∈ V : hv, xi = 0 for all x ∈ S}.
It’s an easy verification that S ⊥ is always a subspace of V . When S itself is a subspace, we
have the following decomposition:
Theorem 4.2.8. Let W ⊂ V be a finite dimensional subspace. Then V = W ⊕ W ⊥ .
Proof. Let {w1, . . . , wk} be an orthonormal basis of W. Given x ∈ V, we will try to find a vector w ∈ W such that x = w + (x − w) with x − w ∈ W⊥. Write w = c1w1 + . . . + ckwk. If x − w ∈ W⊥,
then necessarily, 0 = hx − w, wi i = hx − c1 w1 − . . . − ck wk , wi i = hx, wi i − ci kwi k2 . Since
kwi k = 1, this says ci = hx, wi i, so this choice of coefficients gives us the vector w that works.
This says V = W + W ⊥ . If w ∈ W ∩ W ⊥ , then hw, wi = 0 so that w = 0 says the sum is
direct.
An immediate consequence is the following dimension formula:
Corollary 4.2.9. If V is finite dimensional and W ⊂ V is a subspace, then dim(V) = dim(W) + dim(W⊥).
Using this, we easily get the following:
Proposition 39. Let V be finite dimensional and W ⊂ V a subspace. Then (W⊥)⊥ = W.
41
Proof. Set n = dim(V). If w ∈ W, then hw, ui = 0 for every u ∈ W⊥, so w ∈ (W⊥)⊥; thus W ⊂ (W⊥)⊥ and dim(W) ≤ dim((W⊥)⊥). This then says n = dim(W) + dim(W⊥) ≤ dim((W⊥)⊥) + dim(W⊥) = n, so that dim(W) = dim((W⊥)⊥) gives W = (W⊥)⊥.
Example 4.2.11. Let V = R4 and W = Span{v1, v2} where v1 = (1, 2, 3, −4) and v2 = (−5, 4, 3, 2). If x = (x, y, z, t) is in W⊥, we see that Ax = 0, where A is the 2 × 4 matrix with rows (1, 2, 3, −4) and (−5, 4, 3, 2). Using row reduction, one can easily compute W⊥ = ker(A) = Span{(−3, −9, 7, 0), (10, 9, 0, 7)}.
Given x ∈ V, write x = w + w′ with w ∈ W and w′ ∈ W⊥ as in the theorem above; the vector w is called the orthogonal projection of x onto W, denoted PW(x). The orthogonal projection PW(x) has the property that it is the vector in W that is closest to x: for any u ∈ W, kx − PW(x)k ≤ kx − uk.
Example 4.2.14. Let V = R3, and set v = (1, 2, 3). What's the minimal distance from v to a point on the plane W : x + 2y + z = 0? A basis of W can be easily computed as {(−2, 1, 0), (−1, 0, 1)} = {w1, w2}. Running Gram-Schmidt gives an orthogonal basis {v1, v2} = {(−2, 1, 0), (−1/5, −2/5, 1)}. The minimal distance to the plane is given by the quantity kv − PW(v)k. One can check that PW(v) = (5/3)v2 = (−1/3, −2/3, 5/3), so v − PW(v) = (4/3, 8/3, 4/3), which has length 4√6/3.
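A numerical check of this example (a numpy sketch, not part of the notes):

import numpy as np

v = np.array([1., 2., 3.])
v1 = np.array([-2., 1., 0.])            # orthogonal basis of the plane x + 2y + z = 0
v2 = np.array([-1/5, -2/5, 1.])

proj = (v @ v1) / (v1 @ v1) * v1 + (v @ v2) / (v2 @ v2) * v2   # P_W(v)
print(np.round(proj, 4))                                       # [-1/3, -2/3, 5/3]
print(np.round(v - proj, 4))                                   # [4/3, 8/3, 4/3]
print(np.linalg.norm(v - proj), 4 * np.sqrt(6) / 3)            # both ~3.266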
Example 4.2.15. Set W = P2(R) viewed as a subspace of V = C([−1, 1]) with the inner product hf, gi = ∫_{-1}^{1} f(x)g(x) dx. With f(x) = ex, which polynomial p(x) of degree at most 2 minimizes the quantity ∫_{-1}^{1} (ex − p(x))2 dx, and what is this value? Equivalently, what is the minimizer of kex − p(x)k? We saw before that an orthonormal basis of P2(R) with respect to this inner product is given by {1/√2, √(3/2) x, √(5/8)(3x2 − 1)} (the normalized Legendre polynomials), so the minimizer is just the orthogonal projection of ex onto W. This is given by p(x) = hex, 1/√2i(1/√2) + hex, √(3/2) xi √(3/2) x + hex, √(5/8)(3x2 − 1)i √(5/8)(3x2 − 1) = (15e/4 − 105/(4e))x2 + (3/e)x + (33/(4e) − 3e/4).
Numerically, the actual minimal value of the integral is ≈ .00144.
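The projection and the minimal value can be checked symbolically (a sympy sketch, not part of the notes); projecting onto the orthogonal basis {1, x, x2 − 1/3} should reproduce the same polynomial:

import sympy as sp

x = sp.symbols('x')

def inner(f, g):
    return sp.integrate(f * g, (x, -1, 1))

f = sp.exp(x)
p = sum(inner(f, u) / inner(u, u) * u for u in [sp.Integer(1), x, x**2 - sp.Rational(1, 3)])

print(sp.expand(p))                 # mathematically equal to (15e/4 - 105/(4e))x^2 + (3/e)x + (33/(4e) - 3e/4)
print(sp.N(inner(f - p, f - p)))    # ~ 0.00144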
42
If V is finite dimensional, then we have seen that V ≅ V∗. However, this isomorphism is not “natural” in the sense that it requires picking a basis of V. When V is an inner product space, though, there is a natural identification:
Theorem 4.3.2 (Riesz Representation Theorem). Let V be a finite dimensional inner product space. Then the map Φ : V → V∗ given by Φ(v) = ϕv, where ϕv(x) = hx, vi, is a bijection. It is linear if F = R, and conjugate-linear if F = C.
Proof. First, we check how Φ interacts with the vector space operations. For x, y ∈ V, we have Φ(x + y) = ϕx+y. For any z ∈ V, we have ϕx+y(z) = hz, x + yi = hz, xi + hz, yi = ϕx(z) + ϕy(z), so that ϕx+y = ϕx + ϕy. This then says that Φ(x + y) = Φ(x) + Φ(y). Similarly, for any c ∈ F we have ϕcx(z) = hz, cxi = c̄hz, xi, so Φ(cx) = c̄Φ(x): over R the map Φ is linear, while over C it is conjugate-linear. Now suppose that Φ(x) = 0. This says that ϕx(z) = 0 for all z ∈ V, i.e. hz, xi = 0 for all z ∈ V. Picking z = x, we get kxk2 = 0, so that x = 0. This says that Φ is injective, and since dim V = dim V∗ (also as real vector spaces), we conclude that Φ is a bijection.
The Riesz Representation Theorem says the structure of the dual space of a finite dimen-
sional inner product space is very rigid: for any linear functional ϕ ∈ V ∗ , the surjectivity
of the map Φ in the above proof says there is a vector v ∈ V such that ϕ = h−, vi. This is
very important in functional analysis (where it holds in a more general setting), but for our
purposes, we will only need it for the following:
Definition 4.3.3. Let V be a finite dimensional inner product space. The adjoint of a
linear operator T : V → V , denoted T ∗ is defined via the relation hT (x), yi = hx, T ∗ (y)i for
all x, y ∈ V .
Proposition 40. The adjoint T ∗ of a linear operator T exists and is unique, and furthermore
T ∗ ∈ HomF (V, V ).
Proof. Define ϕy (x) = hT (x), yi. Then ϕy (x + z) = hT (x + z), yi = hT (x) + T (z), yi =
hT (x), yi + hT (z), yi = ϕy (x) + ϕy (z). Similarly, ϕy (cx) = cϕy (x) for x, z ∈ V and c ∈ F ,
so ϕy (x) is a linear functional. By the Riesz Representation Theorem, ϕy (x) = hx, y 0 i for
some y 0 ∈ V . Define a map T ∗ : V → V by T ∗ (y) = y 0 . By definition T ∗ satisfies the
desired property. If there is another function S : V → V such that hT (x), yi = hx, S(y)i for
all x, y, this says hx, T ∗ (y)i = hx, S(y)i for all x, y so that T ∗ = S. Finally, it remains to
show linearity. We see hx, T ∗ (y + z)i = hT (x), y + zi = hT (x), yi + hT (x), zi = hx, T ∗ (y)i +
hx, T ∗ (z)i = hx, T ∗ (y) + T ∗ (z)i for all x, y, z ∈ V . This says T ∗ (y + z) = T ∗ (y) + T ∗ (z).
Similarly one can check T ∗ (cy) = cT ∗ (y), so that T ∗ ∈ HomF (V, V ).
Although it may not be clear from the above definition, the point of the adjoint is that it is an operation on linear operators analogous to taking the conjugate transpose of a matrix. The following properties make this more clear:
Proposition 41. Let S, T ∈ HomF (V, V ). The following hold:
(a) (S + T )∗ = S ∗ + T ∗
(b) (cT)∗ = c̄T∗
43
(c) (T ∗ )∗ = T
(d) I ∗ = I
(e) (ST )∗ = T ∗ S ∗
Proof. All the above properties can be proved using a similar approach to the one in the
proposition above by pulling the adjoint through the inner product. We omit the proofs.
Proposition 42. Let V be a finite dimensional inner product space, and let β be an or-
thonormal basis of V . Then [T ∗ ]β = [T ]∗β .
Proof. Let β = {v1, . . . , vn} be an orthonormal basis for V. Set [T]β = [aij]. Then T(vi) = a1iv1 + . . . + anivn, so aji = hT(vi), vji. This says ([T]∗β)ij = āji = hvj, T(vi)i = hT∗(vj), vii = ([T∗]β)ij, so that [T∗]β = [T]∗β.
Geometrically, the relationship between T ∗ and T is as follows:
Theorem 4.3.4. Let V be a finite dimensional inner product space, and let T : V → V be
a linear operator. Then ker(T ∗ ) = Im(T )⊥ and Im(T ∗ ) = ker(T )⊥ .
Proof. Let x ∈ ker(T ∗ ), so that T ∗ (x) = 0. Then for any y ∈ V , hy, T ∗ (x)i = 0. Pulling the
adjoint through the inner product says hT (y), xi = 0 for all y, so that ker(T ∗ ) ⊂ Im(T )⊥ .
Similarly, if x ∈ Im(T )⊥ this says hx, T (y)i = 0 for all y ∈ V so that hT ∗ (x), yi = 0 for all
y ∈ V. This says T∗(x) = 0, so that Im(T)⊥ ⊂ ker(T∗) says ker(T∗) = Im(T)⊥. Applying the first statement to T∗ in place of T (and using (T∗)∗ = T) gives ker(T) = Im(T∗)⊥, and taking orthogonal complements of both sides gives the second statement.
Example 4.3.5. Let T : C2 → C2 be given by T(z1, z2) = (z1 − 2iz2, 3z1 + iz2), where C2 is equipped with the usual dot product. Then the standard basis β = {e1, e2} is orthonormal. We see that [T]β is the matrix with rows (1, −2i) and (3, i), so that [T∗]β = [T]∗β is the matrix with rows (1, 3) and (2i, −i).
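One can check numerically that the conjugate transpose really does satisfy the defining relation of the adjoint (a numpy sketch, not part of the notes):

import numpy as np

T = np.array([[1, -2j], [3, 1j]])   # [T]_beta
T_star = T.conj().T                 # [T*]_beta = conjugate transpose

def inner(x, y):                    # <x, y> = x1*conj(y1) + x2*conj(y2)
    return np.sum(x * np.conj(y))

rng = np.random.default_rng(0)
x = rng.standard_normal(2) + 1j * rng.standard_normal(2)
y = rng.standard_normal(2) + 1j * rng.standard_normal(2)
print(np.isclose(inner(T @ x, y), inner(x, T_star @ y)))   # True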
Example 4.3.6. Let T : M2 (R) → M2 (R) be the transpose map, T (A) = At . Equip
M2 (R) with the inner product hA, Bi = tr(B t A). With respect to this inner product, the
standard basis β = {E11, E12, E21, E22} is orthonormal. Then [T∗]β = [T]∗β = [T]tβ. We see that [T]β is the 4 × 4 matrix with rows (1, 0, 0, 0), (0, 0, 1, 0), (0, 1, 0, 0), (0, 0, 0, 1). This matrix is symmetric, so T∗ = T.
Example 4.3.7. Let V ⊂ C∞(R) be the vector space of infinitely differentiable functions that are 1-periodic, i.e. f(x + 1) = f(x) for all x ∈ R. Give V an inner product structure by hf, gi = ∫_0^1 f(t)g(t) dt. Let D : V → V be the derivative map. To compute the adjoint of D, we use the definition. For f, g ∈ V, hD(f), gi = ∫_0^1 f′(t)g(t) dt. Integrating by parts and using the 1-periodicity of f and g, the latter integral equals −∫_0^1 f(t)g′(t) dt = hf, −D(g)i. This says D∗ = −D.
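A quick symbolic check of the integration by parts step for two specific 1-periodic functions (a sympy sketch, not part of the notes):

import sympy as sp

t = sp.symbols('t')
f = sp.sin(2 * sp.pi * t)
g = sp.cos(2 * sp.pi * t)

lhs = sp.integrate(sp.diff(f, t) * g, (t, 0, 1))      # <D(f), g>
rhs = sp.integrate(f * (-sp.diff(g, t)), (t, 0, 1))   # <f, -D(g)>
print(lhs, rhs, sp.simplify(lhs - rhs) == 0)          # pi, pi, True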
44
4.4 The Spectral Theorem
We will now return to diagonalizability. We previously saw what conditions are necessary
for a linear operator on V to be diagonalizable, i.e. for V to have a basis of eigenvectors for
T . If V is an inner product space, a natural question is when can we find an orthonormal
basis of eigenvectors? The Spectral theorem gives a precise answer.
Definition 4.4.1. A linear operator T : V → V is called normal if T T ∗ = T ∗ T . T is called
self-adjoint if T = T ∗ .
Proposition 43. Suppose that T : V → V is normal. Then if v is an eigenvector of T with eigenvalue λ, then v is an eigenvector of T∗ with eigenvalue λ̄.
Proof. It's easy to check that since T is normal, then so is T − cIV for any c ∈ F. Since T(v) = λv, this says 0 = k(T − λIV)(v)k2 = h(T − λIV)(v), (T − λIV)(v)i = hv, (T∗ − λ̄IV)(T − λIV)(v)i = hv, (T − λIV)(T∗ − λ̄IV)(v)i = h(T∗ − λ̄IV)(v), (T∗ − λ̄IV)(v)i = k(T∗ − λ̄IV)(v)k2. This says T∗(v) = λ̄v as desired.
Proposition 44. Suppose that T : V → V is normal. Then if λ1 , λ2 are distinct eigenvalues
of T with eigenvectors v1 and v2 respectively, then v1 and v2 are orthogonal.
Proof. Suppose T(v1) = λ1v1 and T(v2) = λ2v2. Then hT(v1), v2i = λ1hv1, v2i. On the other hand, hT(v1), v2i = hv1, T∗(v2)i = hv1, λ̄2v2i = λ2hv1, v2i by the above proposition. Since λ1 ≠ λ2, this forces hv1, v2i = 0.
Theorem 4.4.2 (Complex Spectral Theorem). Let V be a finite dimensional complex inner
product space. Then a linear operator T : V → V is normal if and only if there is an
orthonormal basis for V consisting of eigenvectors of T .
Proof. First suppose that T is normal. We prove that T is orthogonally diagonalizable by
induction on the dimension of V . If dim(V ) = 1 then this is obvious, because any non-zero
vector is an eigenvector, so just normalize. Now suppose that any normal operator on an
n−1 dimensional complex inner product space is orthogonally diagonalizable. If dim(V ) = n
and T : V → V is a normal operator, because C is algebraically closed T has an eigenvector,
say v. Set U = Span({v}) and write V = U ⊕ U ⊥ . Note that because T is normal, both
T, T ∗ are U -invariant. If x ∈ U ⊥ , then for y ∈ U , we have hy, T (x)i = hT ∗ (y), xi = 0 because
T ∗ (y) ∈ U . This says T (x) ∈ U ⊥ so that T is U ⊥ -invariant. Similarly, T ∗ is U ⊥ -invariant.
Then we may write T (x) = T |U (u) + T |U ⊥ (u0 ) for x = u + u0 with u ∈ U and u0 ∈ U ⊥ . We
now show that T|U⊥ is a normal operator on U⊥: for x, y ∈ U⊥ we have hT|U⊥(x), yi = hT(x), yi = hx, T∗(y)i = hx, T∗|U⊥(y)i, so the adjoint of T|U⊥ is T∗|U⊥, and these commute because TT∗ = T∗T. By the induction hypothesis applied to U⊥, which has dimension n − 1, there is an orthonormal basis β′ of U⊥ consisting of eigenvectors of T|U⊥ (hence of T), and since U ⊥ U⊥, adding the unit vector v/kvk to β′ produces an orthonormal basis of V consisting of eigenvectors of T.
45
Conversely, suppose that T is orthogonally diagonalizable. Let β = {v1, . . . , vn} be an orthonormal basis of eigenvectors with eigenvalues λi. Since β is orthonormal, Proposition 42 says [T∗]β = [T]∗β, which is diagonal with entries λ̄1, . . . , λ̄n, so T∗(vi) = λ̄ivi. Then (T∗T)(vi) = T∗(λivi) = λiλ̄ivi = |λi|2vi. On the other hand, (TT∗)(vi) = T(λ̄ivi) = |λi|2vi. Then T∗T and TT∗ agree on a basis of V, so they are equal, which shows T is normal as desired.
We now move onto the Spectral Theorem for operators on real inner product spaces.
In the complex case, we were able to make the argument work because the fundamental
theorem of algebra says every linear operator over a complex vector space has an eigenvalue,
which led to a decomposition V = U ⊕ U⊥. The key part of the proof was that the normality of T guaranteed that it restricted to normal operators on U and U⊥, allowing the induction to kick in. If
V is a real inner product space, this no longer remains true, as we have seen that a rotation
by some angle in R2 has no real eigenvalue. If we can find a class of normal operators that
are guaranteed to have a real eigenvalue, then the same argument as above goes through.
As it turns out, the key to this is self-adjointness: every eigenvalue of a self-adjoint operator is real.
Proof. Write T(v) = λv with v non-zero. Then hT(v), vi = λhv, vi = λkvk2. On the other hand, because T is self-adjoint we can write hT(v), vi = hv, T(v)i = hv, λvi = λ̄kvk2. Since v is non-zero, this says λ = λ̄, so that λ is real.
Theorem 4.4.3. (Real Spectral Theorem) Let V be a finite dimensional real inner product
space. Then a linear operator T : V → V is self-adjoint if and only if there is an orthonormal
basis for V consisting of eigenvectors of T .
Proof. Viewing [T]β (for an orthonormal basis β) as a complex matrix, the characteristic polynomial pT of T has a complex root by the fundamental theorem of algebra. Since T is self-adjoint, the above says this root is real, so that T has an eigenvector. Since a self-adjoint operator is normal, we can run the same argument as in the complex case and the proof still goes through, so that T is orthogonally diagonalizable. Conversely, if β is an orthonormal basis of eigenvectors of T, then [T]β is a real diagonal matrix, so [T∗]β = [T]tβ = [T]β says T∗ = T.
46
The proof of the Spectral Theorem tells us how to orthogonally diagonalize an operator when it is possible. If V = U ⊕ U⊥, running Gram-Schmidt on bases of U and U⊥ gives orthogonal bases of these spaces, and then the union is an orthogonal basis of V, so after normalizing, an orthonormal basis. Suppose T is normal with distinct eigenvalues λ1, . . . , λk. In the proof of the Spectral Theorem, we may instead run the argument with U = Eλ1 (the invariance condition is still true). Then since Eλi ⊥ Eλ1 for i ≠ 1, this says Eλ2 ⊕ . . . ⊕ Eλk ⊂ U⊥, so that Eλ2 ⊕ . . . ⊕ Eλk = U⊥ for dimensional reasons. By inductively applying the above observation, this says that running Gram-Schmidt on each eigenspace Eλi and taking the union of these orthogonal bases gives an orthogonal basis for V consisting of eigenvectors of T, and then normalizing gives an orthonormal basis.
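For a real symmetric matrix, numpy's eigh routine returns exactly this kind of orthonormal eigenbasis; the sketch below (not part of the notes) verifies this on a small example:

import numpy as np

A = np.array([[2., 1., 1.],
              [1., 2., 1.],
              [1., 1., 2.]])            # symmetric, so self-adjoint with respect to the dot product

eigenvalues, Q = np.linalg.eigh(A)      # columns of Q are orthonormal eigenvectors
print(np.round(eigenvalues, 6))                         # [1. 1. 4.]
print(np.allclose(Q.T @ Q, np.eye(3)))                  # True: the eigenbasis is orthonormal
print(np.allclose(Q.T @ A @ Q, np.diag(eigenvalues)))   # True: the matrix of T in this basis is diagonal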
Example 4.4.7. Let T : R3 → R3 be given by T(x, y, z) = (−2z, −x + 2y − z, x + 3z), so that [T]β is the matrix with rows (0, 0, −2), (−1, 2, −1), (1, 0, 3), with β the standard basis. We see that T is diagonalizable with eigenvalues 1, 2, and bases of the eigenspaces E1 and E2 are given by {(2, 1, −1)} and {(0, 1, 0), (−1, −1, 1)} respectively. However, T is not self-adjoint because [T]β is not symmetric, so the Spectral Theorem says that T is not orthogonally diagonalizable. What goes wrong? An orthogonal basis of E2 is given by {(0, 1, 0), (1, 0, −1)}. However, (2, 1, −1) · (0, 1, 0) = 1 ≠ 0. Since any eigenvector v ∈ E2 is of the form (c2, c1, −c2) for c1, c2 ∈ R, we see that (2, 1, −1) · (c2, c1, −c2) = c1 + 3c2 is 0 only when c1 = −3c2, i.e. the eigenvector is a multiple of (1, −3, −1). Therefore it's impossible to find two linearly independent eigenvectors orthogonal to (2, 1, −1), so that T cannot be orthogonally diagonalizable. Explicitly, with U = E1, we see that [T∗]β = [T]tβ and T∗(2, 1, −1) = (−2, 2, −8) ∉ U, so T∗ is not U-invariant and the argument in the proof of the Spectral Theorem cannot continue. Since all the eigenvalues of T are real, we see that even viewed as an operator on C3, the only eigenvectors in E2 that are orthogonal to (2, 1, −1) lie in the C-span of (1, −3, −1), so again it is not possible to find two linearly independent eigenvectors orthogonal to (2, 1, −1). This then says that T is not normal when viewed as an operator on C3, and therefore not as an operator on R3, because the matrix of T∗ is the same in either case.
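A numerical check that this T is indeed not normal (a numpy sketch, not part of the notes):

import numpy as np

A = np.array([[0., 0., -2.],
              [-1., 2., -1.],
              [1., 0., 3.]])            # [T]_beta with beta the standard basis

print(np.allclose(A @ A.T, A.T @ A))    # False: T T* != T* T, so T is not normal

eigenvalues, V = np.linalg.eig(A)       # T is still diagonalizable, just not orthogonally
print(np.round(eigenvalues, 6))         # the eigenvalues are 1, 2, 2 (in some order)
print(np.linalg.matrix_rank(V))         # 3: the eigenvectors do form a basis of R^3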
47