Linear Algebra Notes 2018
H = R + R · i + R · j + R · k.
These rules then force the rules for multiplication on H, and a somewhat
lengthy check shows that H is a (noncommutative) division algebra.
We shall occasionally relax the condition that every nonzero element has
a multiplicative inverse. Thus a ring R is a set with two binary operations + and ·, such that (R, +) is an abelian group, · is associative and distributes over addition, and there exists a multiplicative identity 1. If · is commutative as
well, we call R a commutative ring. A standard example of a noncommuta-
tive ring is the ring Mn (R) of n × n matrices with coefficients in R with the
usual operations of matrix addition and multiplication, for n > 1, and with
multiplicative identity the n × n identity matrix I. (For n = 1, it is still
a ring but in fact it is isomorphic to R and hence is commutative.) More
generally, if k is any field, for example k = C and n > 1, then the set Mn (k)
of n × n matrices with coefficients in k with the usual operations of matrix
addition and multiplication is a noncommutative ring.
4. For all v ∈ V , 1 · v = v.
(3) With X a set as in (2), we consider the subset k[X] ⊆ k^X of functions f : X → k such that the set {x ∈ X : f(x) ≠ 0} is a finite subset of X. Then k[X] is easily seen to be a vector space under pointwise addition and scalar multiplication of functions, so that k[X] is a vector subspace of k^X. Of course, k[X] = k^X if and only if X is finite. In general, for a function f : X → k, we define the support of f or Supp f by Supp f = {x ∈ X : f(x) ≠ 0}. For x ∈ X, let δx ∈ k[X] be the function defined by δx(x) = 1 and δx(y) = 0 for all y ≠ x.
The function δx is often called the delta function at x. Then clearly, for
every f ∈ k[X],

f = Σ_{x∈X} f(x)δx,
where in fact the above sum is finite (and therefore meaningful). We will
often identify the function δx with the element x ∈ X. In this case, we can view X as a subset of k[X], and every element of k[X] can be uniquely written as Σ_{x∈X} tx · x, where the tx ∈ k and tx = 0 for all but finitely many x. The case X = {1, . . . , n} corresponds to k^n, and the element δi to the vector ei = (0, . . . , 1, . . . , 0) (the ith component is 1 and all other components are 0).
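For example, if k = R and X = {a, b, c}, then a typical element of k[X] is the formal sum 2 · a − 3 · b + 5 · c, i.e. the function f with f(a) = 2, f(b) = −3, f(c) = 5; in this case k[X] ≅ k^3.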
4. For all m ∈ M , 1 · m = m.
Left submodules of an R-module M are defined in the obvious way.
A right R-module is an abelian group M with a multiplication by R on
the right, i.e. a function R × M → M whose value at (r, m) is denoted
mr, satisfying (ms)r = m(sr) and the remaining analogues of (2), (3), (4)
above. Thus, if R is commutative, there is no difference between left and
right R-modules.
2 Linear maps
Linear maps are the analogue of homomorphisms and isomorphisms:
also linear. We say that V1 and V2 are isomorphic, written V1 ≅ V2, if there exists an isomorphism F from V1 to V2.
If F : V1 → V2 is a linear map, then we define
Lemma 2.4. Let X be a set and let V be a vector space. For each function f : X → V, there is a unique linear map F : k[X] → V defined by F(Σ_{x∈X} tx · x) = Σ_{x∈X} tx f(x) (a finite sum). In particular, F is specified by the requirement that F(x) = f(x) for all x ∈ X. Finally, every linear function k[X] → V is of this form.
For this reason, k[X] is sometimes called the free vector space on the set
X.
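For example, taking X = {1, . . . , n}, so that k[X] = k^n and δi = ei, the lemma says that a linear map F : k^n → V is uniquely determined by the n vectors F(e1), . . . , F(en), which can be chosen arbitrarily in V.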
(i) span{v1, . . . , vd} is a vector subspace of V containing vi for every i. In fact, it is the image of the linear map F : k^d → V defined by F(t1, . . . , td) = Σ_{i=1}^d ti vi above.
span{v1 , . . . , vd } ⊆ W.
V = span{v1 , . . . , vd }
t1 w1 + · · · + tr wr = 0,
t1 w1 + · · · + tr wr = s1 w1 + · · · + sr wr ,
Note that the definition of linear independence does not depend only
on the set {w1 , . . . , wr }—if there are any repeated vectors wi = wj , then we
can express 0 as the nontrivial linear combination wi − wj . Likewise if one
of the wi is zero then the set is linearly dependent.
4. If V is a finite-dimensional vector space and w1, . . . , wℓ are linearly independent vectors in V, then there exist vectors wℓ+1, . . . , wr ∈ V such that w1, . . . , wr is a basis of V.
Again, it is straightforward to check that the sums above are finite and that Supp(f1 ∗ f2) is finite as well.
With either of the above descriptions, one can then check that k[G] is a
ring. The messiest calculation is associativity of the product. For instance,
we can write
f1 ∗ (f2 ∗ f3)(g) = Σ_{x∈G} f1(x)(f2 ∗ f3)(x^{-1}g) = Σ_{x∈G} Σ_{y∈G} f1(x)f2(y)f3(y^{-1}x^{-1}g)
                 = Σ_{x,y,z∈G, xyz=g} f1(x)f2(y)f3(z).
Similarly,
(f1 ∗ f2) ∗ f3(g) = Σ_{z∈G} (f1 ∗ f2)(gz^{-1})f3(z) = Σ_{z∈G} Σ_{x∈G} f1(x)f2(x^{-1}gz^{-1})f3(z)
                 = Σ_{x,y,z∈G, xyz=g} f1(x)f2(y)f3(z).
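For example, if G = {1, g} is a group of order 2, then an element of k[G] has the form a · 1 + b · g with a, b ∈ k, and the convolution product is (a · 1 + b · g) ∗ (c · 1 + d · g) = (ac + bd) · 1 + (ad + bc) · g, since g · g = 1.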
We can write F(ei) in terms of the basis e1, . . . , em of k^m: suppose that F(ei) = Σ_{j=1}^m aji ej = (a1i, . . . , ami). We can then associate to F an m × n matrix with coefficients in k as follows: Define

A = ( a11   a12   . . .   a1n )
    (  .     .             .  )
    ( am1   am2   . . .   amn )
Then the columns of A are the vectors F(ei). Thus every linear map F : k^n → k^m corresponds to an m × n matrix with coefficients in k. We
denote by Mm,n (k) the set of all such. Clearly Mm,n (k) is a vector space of
dimension mn, with basis
{Eij , 1 ≤ i ≤ m, 1 ≤ j ≤ n},
where Eij is the m × n matrix whose (i, j)th entry is 1 and such that all other entries are 0. Equivalently, Mm,n(k) ≅ k[X], where X = {1, . . . , m} × {1, . . . , n} is the set of ordered pairs (i, j) with 1 ≤ i ≤ m and 1 ≤ j ≤ n. If
m = n, we abbreviate Mm,n (k) by Mn (k); it is the space of square matrices
of size n.
Composition of linear maps corresponds to matrix multiplication: If F : k^n → k^m and G : k^m → k^ℓ are linear maps corresponding to the matrices A and B respectively, then G ◦ F : k^n → k^ℓ corresponds to the ℓ × n matrix BA. Here, the (i, j) entry of BA is given by

(BA)ij = Σ_{k=1}^m bik akj.
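For example, let F, G : k^2 → k^2 be given by F(t1, t2) = (t1 + 2t2, 3t1 + 4t2) and G(t1, t2) = (t2, t1). The matrix A of F has rows (1, 2) and (3, 4), and the matrix B of G has rows (0, 1) and (1, 0). Then G ◦ F(t1, t2) = (3t1 + 4t2, t1 + 2t2), whose matrix has rows (3, 4) and (1, 2), and this is exactly the product BA.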
The subset of Mn (k) consisting of invertible matrices is then a group, de-
noted by GLn (k).
Suppose that V1 and V2 are two finite dimensional vector spaces, and
choose bases v1 , . . . , vn of V1 and w1 , . . . , wm of V2 , so that n = dim V1 and
m = dim V2 . If F : V1 → V2 is linear, then, given the bases v1 , . . . , vn and
w1 , . . . , wm we again get an m × n matrix A by the formula: A = (aij ),
where aij is defined by
F(vi) = Σ_{j=1}^m aji wj.
We say that A is the matrix associated to the linear map F and the bases v1, . . . , vn and w1, . . . , wm. Given any collection of vectors u1, . . . , un in V2, there is a unique linear map F : V1 → V2 defined by F(vi) = ui for all i.
In fact, if we let H : k^n → V1 be the “change of basis” map defined by H(t1, . . . , tn) = Σ_{i=1}^n ti vi, then H is an isomorphism and so has an inverse H^{-1}, which is again linear. Let G : k^n → V2 be the map G(t1, . . . , tn) = Σ_{i=1}^n ti ui. Then the map F is equal to G ◦ H^{-1}, proving the existence and uniqueness of F.
A related question is the following: Suppose that v1, . . . , vn is a basis of k^n and that w1, . . . , wm is a basis of k^m. Let F be a linear map k^n → k^m.
A = C2 · B · C1^{-1}.
A case that often arises is when n = m, so that A and B are square matrices, and the two bases v1, . . . , vn and w1, . . . , wn are equal. In this case C1 = C2 = C, say, and the above formula reads

A = C · B · C^{-1}.
det I = 1.
One can show that det A can be evaluated by expanding about the ith row for any i:

det A = Σ_{k=1}^n (−1)^{i+k} aik det Aik.

Here (ai1, . . . , ain) is the ith row of A and Aik is the (n − 1) × (n − 1) matrix obtained by deleting the ith row and kth column of A. det A can also be evaluated by expanding about the jth column of A:

det A = Σ_{k=1}^n (−1)^{k+j} akj det Akj,
and it follows that det(^tA) = det A, where, if A = (aij), then ^tA is the matrix with (i, j)th entry aji.
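For example, if A is the 3 × 3 matrix with rows (1, 2, 3), (4, 5, 6), (7, 8, 10), then expanding about the first row gives det A = 1 · (5 · 10 − 6 · 8) − 2 · (4 · 10 − 6 · 7) + 3 · (4 · 8 − 5 · 7) = 1 · 2 − 2 · (−2) + 3 · (−3) = −3.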
We have the important result:
An important consequence of the corollary is the following: let V be a
finite dimensional vector space and let F : V → V be a linear map. Choosing
a basis v1 , . . . , vn for V gives a matrix A associated to F and the basis
v1, . . . , vn. Changing the basis v1, . . . , vn replaces A by CAC^{-1} for some invertible matrix C. Since det A = det(CAC^{-1}), in fact the determinant det F is well-defined, i.e. independent of the choice of basis.
Thus for example if v1, . . . , vn is a basis of k^n and, for every i, there exists a di ∈ k such that Avi = di vi, then det A = d1 · · · dn.
One important application of determinants is to finding eigenvalues and
eigenvectors. Recall that, if F : k^n → k^n (or F : V → V, where V is a finite dimensional vector space) is a linear map, a nonzero vector v ∈ k^n (or V) is an eigenvector of F with eigenvalue λ ∈ k if F(v) = λv. Since F(v) = λv ⇐⇒ λv − F(v) = 0 ⇐⇒ v ∈ Ker(λ Id −F), we see that there exists an eigenvector of F with eigenvalue λ ⇐⇒ Ker(λ Id −F) ≠ {0} ⇐⇒ det(λI − A) = 0, where
A is the matrix associated to F (and some choice of basis in the case of a
finite dimensional vector space V ). Defining the characteristic polynomial
pA (t) = det(tI − A), it is easy to see that pA is a polynomial of degree n
whose roots are the eigenvalues of F .
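For example, if A ∈ M2(R) is the matrix with rows (2, 1) and (1, 2), then pA(t) = det(tI − A) = (t − 2)^2 − 1 = (t − 1)(t − 3), so the eigenvalues of A are 1 and 3, with eigenvectors (1, −1) and (1, 1) respectively.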
A linear map F : V → V is diagonalizable if there exists a basis v1 , . . . , vn
of V consisting of eigenvectors; similarly for a matrix A ∈ Mn (k). This
says that the matrix A for F in the basis v1 , . . . , vn is a diagonal matrix
(the only nonzero entries are along the diagonal), and in fact aii = λi ,
where λi is the eigenvalue corresponding to the eigenvector vi . Hence, for
all positive integers d (and all integers if F is invertible), v1 , . . . , vn are also
eigenvectors for F d or Ad , and the eigenvalue corresponding to vi is λdi . If the
characteristic polynomial pA has n distinct roots, then A is diagonalizable.
In general, not every matrix is diagonalizable, even for an algebraically closed
field, but we do have the following:
Then one has the Cayley-Hamilton theorem: If pA (t) is the characteristic
polynomial of A, then pA (A) = 0 and hence mA (t) divides pA (t). For
example, if A is diagonalizable with eigenvalues λ1 , . . . , λn , then
pA (t) = (t − λ1 ) · · · (t − λn ),
and it is easy to see that pA (A) = 0 in this case. On the other hand,
in this case the minimal polynomial is the product of the factors (t − λi )
for distinct eigenvalues λi . Hence we see that, in the case where A is
diagonalizable, mA(t) divides pA(t), but they are equal ⇐⇒ there are no
repeated eigenvalues.
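For example, if A = I ∈ M2(k) is the identity matrix, then pA(t) = (t − 1)^2 while mA(t) = t − 1, so mA divides pA but they are not equal; if instead A is the diagonal matrix with diagonal entries 1 and 2, then pA(t) = mA(t) = (t − 1)(t − 2).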
Finally, we define the trace of a matrix. If A = (aij ) ∈ Mn (k), we define
Tr A = Σ_{i=1}^n aii.
Thus the trace of a matrix is the sum of its diagonal entries. Clearly,
Tr : Mn(k) → k is a linear map. As such, it is much simpler than the very nonlinear function det. However, they are related in several ways. For example, expanding out the characteristic polynomial pA(t) = det(t Id −A) as a function of t, one can check that the coefficient of t^{n−1} is −Tr A (and that the constant term is (−1)^n det A). Another basic property of the trace is the following: for all A, B ∈ Mn(k),

Tr(AB) = Tr(BA).
Proof. By definition of matrix multiplication, AB is the matrix whose (i, j) entry is Σ_{r=1}^n air brj, and hence

Tr(AB) = Σ_{i=1}^n Σ_{r=1}^n air bri.

This expression is symmetric in A and B (interchanging the names of the indices i and r turns it into the corresponding expression for Tr(BA)), so Tr(AB) = Tr(BA).
Corollary 6.7. For A ∈ Mn (k) and C ∈ GLn (k),
Tr(CAC^{-1}) = Tr A.
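Indeed, applying the formula Tr(AB) = Tr(BA) to the matrices CA and C^{-1} gives Tr(CAC^{-1}) = Tr((CA)C^{-1}) = Tr(C^{-1}(CA)) = Tr A.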
7 Direct sums
Definition 7.1. Let V1 and V2 be two vector spaces. We define the direct
sum or external direct sum V1 ⊕ V2 to be the product V1 × V2 , with + and
scalar multiplication defined componentwise: (v1, v2) + (v1′, v2′) = (v1 + v1′, v2 + v2′) and t(v1, v2) = (tv1, tv2) for v1, v1′ ∈ V1, v2, v2′ ∈ V2, and t ∈ k.
The direct sum of V1 and V2 also has a related property that makes it
into the product of V1 and V2 . Let π1 : V1 ⊕ V2 → V1 be the projection onto
the first factor: π1(v1, v2) = v1, and similarly for π2 : V1 ⊕ V2 → V2. It is easy to check from the definitions that the πi are linear and surjective, with π1 ◦ i1 = Id on V1 and π2 ◦ i2 = Id on V2, where i1 : V1 → V1 ⊕ V2 and i2 : V2 → V1 ⊕ V2 are the inclusions i1(v1) = (v1, 0) and i2(v2) = (0, v2). Note that π1 ◦ i2 = 0 and π2 ◦ i1 = 0.
Lemma 7.3. If V1, V2, W are vector spaces and G1 : W → V1, G2 : W → V2 are linear maps, then the function (G1, G2) : W → V1 ⊕ V2 defined by (G1, G2)(w) = (G1(w), G2(w)) is linear, and it is the unique linear map F : W → V1 ⊕ V2 such that π1 ◦ F = G1 and π2 ◦ F = G2.
Remark 7.4. It may seem strange to introduce the notation ⊕ for the
Cartesian product. One reason for doing so is that we can define the direct
sum and direct product for an arbitrary collection Vi , i ∈ I of vector spaces
Vi . In this case, when I is infinite, the direct sum and direct product have
very different properties.
V1 ⊕ V2 ⊕ V3 ≅ (V1 ⊕ V2) ⊕ V3 ≅ V1 ⊕ (V2 ⊕ V3).
G : k^r → k^m corresponding to matrices A, B, the linear map (F, G) : k^r → k^{m+n} corresponds to the (n + m) × r matrix

( A )
( B )

obtained by stacking A on top of B. Finally, given linear maps F : k^n → k^r and G : k^m → k^s corresponding to matrices A, B, the linear map F ⊕ G : k^{n+m} → k^{r+s} corresponds to the (r + s) × (n + m) matrix

( A  O )
( O  B ).

In particular, in the case n = r and m = s, A, B, and the above block matrix are square matrices (of sizes n, m, and n + m respectively). We thus have:

Tr(F ⊕ G) = Tr F + Tr G.
W1 + W2 = {w1 + w2 : w1 ∈ W1 , w2 ∈ W2 }.
(i1 ⊕ i2 )(w1 , w2 ) = w1 + w2 ,
Definition 8.1. The vector space V is an internal direct sum of the two
subspaces W1 and W2 if the linear map (i1 ⊕ i2 ) : W1 ⊕ W2 → V defined
above is an isomorphism.
We shall usually omit the word “internal” and understand the statement
that V is a direct sum of the subspaces W1 and W2 to always mean that it
is the internal direct sum.
(i) W1 ∩ W2 = {0}.
(ii) V = W1 + W2 .
Proof. The map (i1 ⊕ i2 ) : W1 ⊕ W2 → V is an isomorphism ⇐⇒ it is
injective and surjective. Now Ker(i1 ⊕ i2 ) = {(w1 , w2 ) ∈ W1 ⊕ W2 : w1 +
w2 = 0}. But w1 + w2 = 0 ⇐⇒ w2 = −w1 , and this is only possible if
w1 ∈ W1 ∩ W2 . Hence
(ii) V = W1 + · · · + Wd .
In general, to write a finite dimensional vector space as a direct sum
W1 ⊕ · · · ⊕ Wd , where none of the Wi is 0, is to decompose the vector
space into hopefully simpler pieces. For example, every finite dimensional
vector space V is a direct sum V = L1 ⊕ · · · ⊕ Ln , where the Li are one
dimensional subspaces. Note however that, if V is one dimensional, then it
is not possible to write V ≅ W1 ⊕ W2, where both of W1, W2 are nonzero.
So, in this sense, the one dimensional vector spaces are the building blocks
for all finite dimensional vector spaces.
Definition 8.4. If W is a subspace of V, then a complement to W is a subspace W′ of V such that V is the direct sum of W and W′. Given W,
it is easy to check that a complement to W always exists, and in fact there
are many of them.
One important way to realize V as a direct sum is as follows: suppose
that V = W1 ⊕W2 . On the vector space W1 ⊕W2 , we have the projection map
p1 : W1 ⊕ W2 → W1 ⊕ W2 defined by p1 (w1 , w2 ) = (w1 , 0). Let p : V → W1
denote the corresponding linear map via the isomorphism V ≅ W1 ⊕ W2.
Concretely, given v ∈ V , p(v) is defined as follows: write v (uniquely) as
w1 + w2 , where w1 ∈ W1 and w2 ∈ W2 . Then p(v) = w1 . The linear map p
has the following properties:
1. Im p = W1.

2. For all w ∈ W1, p(w) = w.

3. Ker p = W2.
The next proposition says that we can reverse this process, and will be
a basic tool in decomposing V as a direct sum.
Proof. We must show that W and W′ satisfy (i) and (ii) of Proposition 8.3. First suppose that w ∈ W ∩ W′ = W ∩ Ker p. Then p(w) = w since w ∈ W, and p(w) = 0 since w ∈ W′ = Ker p. Hence w = 0, i.e. W ∩ W′ = {0}. Now let v ∈ V. Then w = p(v) ∈ W, and v = p(v) + (v − p(v)) = w + w′, say, where we have set w = p(v) and w′ = v − p(v) = v − w. Since w ∈ W,
Tr p = dim W.
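Indeed, if we view p as a linear map from V to itself and choose a basis of V consisting of a basis of W followed by a basis of W′ = Ker p, then the matrix of p in this basis is diagonal with dim W entries equal to 1 and the remaining entries equal to 0, so that Tr p = dim W.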
9 New vector spaces from old
There are many ways to construct new vector spaces out of one or more
given spaces; the direct sum V1 ⊕ V2 is one such example. As we saw there,
we also want to be able to construct new linear maps from old ones. We
give here some other examples:
Quotient spaces: Let V be a vector space, and let W be a subspace of
V . Then, in particular, W is a subgroup of V (under addition), and since
V is abelian W is automatically normal. Thus we can consider the group of
cosets
V /W = {v + W : v ∈ V }.
It is a group under coset addition. Also, given a coset v + W , we can define
scalar multiplication by the rule
t(v + W ) = tv + W.
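Note that scalar multiplication on cosets is well-defined: if v + W = v′ + W, then v − v′ ∈ W, so tv − tv′ = t(v − v′) ∈ W and hence tv + W = tv′ + W.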
maps F : V /W → U and the set of linear maps G : V → U such that G(w) =
0 for all w ∈ W .
Dual spaces and spaces of linear maps: Let V be a vector space and
define the dual vector space V ∗ (sometimes written V ∨ ) by:
V∗ = {F ∈ k^V : F is linear}.
(Recall that k^V denotes the set of functions from V to k.) Thus the elements
of V ∗ are linear maps V → k. Similarly, if V and W are two vector spaces,
define
Hom(V, W) = {F ∈ W^V : F is linear}.
(Here, again, W^V denotes the set of functions from V to W.) Then the
elements of Hom(V, W ) are linear maps F : V → W , and V ∗ = Hom(V, k).
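For example, if V = k^n with standard basis e1, . . . , en, then the coordinate functions e∗i defined by e∗i(t1, . . . , tn) = ti are elements of V∗, and every F ∈ V∗ can be written uniquely as F = Σ_{i=1}^n F(ei) e∗i.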
vi∗wj : V → W as follows:

vi∗wj(vℓ) = 0 if ℓ ≠ i, and vi∗wj(vi) = wj.

In other words, for all v ∈ V, (vi∗wj)(v) = vi∗(v)wj. Suppose that F : V → W corresponds to the matrix A = (aij) with respect to the bases v1, . . . , vn and w1, . . . , wm. In other words, F(vi) = Σ_{j=1}^m aji wj. Then by definition F = Σ_{i,j} aji vi∗wj. It then follows that the vi∗wj, 1 ≤ i ≤ n, 1 ≤ j ≤ m, are a basis for Hom(V, W) (we saw a special case of this earlier for V ≅ k^n, W ≅ k^m, where we denoted e∗j ei by Eij). Thus
Hom(V ⊕ W, U) ≅ Hom(V, U) ⊕ Hom(W, U);
Hom(U, V ⊕ W) ≅ Hom(U, V) ⊕ Hom(U, W).
In fact, the statements of Lemma 7.2 and Lemma 7.3 give an explicit
construction of the isomorphisms.
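Concretely, the first isomorphism can be taken to send F ∈ Hom(V ⊕ W, U) to the pair (F ◦ i1, F ◦ i2), and the second to send F ∈ Hom(U, V ⊕ W) to the pair (π1 ◦ F, π2 ◦ F).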
Next we see how linear maps behave for these constructions. Suppose
that V1 and V2 are two vector spaces and that G : V1 → V2 is a linear map.
Given an element F of V2∗ , in other words a linear map F : V2 → k, we can
consider the composition on the right with G: define
G∗ (F ) = F ◦ G : V1 → k.
Hom(k^n, k^m) = Mm,n(k) to Hom(k^m, k^n) = Mn,m(k). In other words, we have a natural way to associate an n × m matrix to the m × n matrix A. It is easy to check that this is the transpose matrix ^tA = (aji).
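Indeed, if G : k^n → k^m has matrix A = (aij) and e∗1, . . . , e∗m is the dual basis of (k^m)∗, then G∗(e∗i)(ej) = e∗i(G(ej)) = aij, so that G∗(e∗i) = Σ_{j=1}^n aij e∗j; hence the matrix of G∗ with respect to the dual bases is exactly ^tA.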
(G2 ◦ G1)∗(F) = F ◦ (G2 ◦ G1) = (F ◦ G2) ◦ G1 = G1∗(G2∗(F)), so that (G2 ◦ G1)∗ = G1∗ ◦ G2∗.
The change in the order above is just a fact of life, and it is the only
possible formula along these lines if we want to line up domains and ranges
in the right way. However, there is one case where we can keep the original
order of composition. Suppose that V1 = V2 = V , say, and that G is
invertible, i.e. is a linear isomorphism. Then we have the operation G ↦ G^{-1}, and it also reverses the order of operations: (G1 ◦ G2)^{-1} = G2^{-1} ◦ G1^{-1}. So if we combine this with G ↦ G∗, we get a function G ↦ (G^{-1})∗ from the invertible linear maps V → V to the invertible linear maps V∗ → V∗; in other words, for an invertible G and F ∈ V∗, we set

G · F = F ◦ G^{-1}.
(Note: a calculation shows that, if G is invertible, then G∗ is invertible, and in fact we have the formula (G∗)^{-1} = (G^{-1})∗, so that these two operations commute.) In this case, the order is preserved, since, if G1, G2 are two isomorphisms, then

((G1 ◦ G2)^{-1})∗ = (G2^{-1} ◦ G1^{-1})∗ = (G1^{-1})∗ ◦ (G2^{-1})∗,
Proposition 9.8. The functions G∗ ◦ H∗ : Hom(V2 , W1 ) → Hom(V1 , W2 )
and H∗ ◦ G∗ : Hom(V2 , W1 ) → Hom(V1 , W2 ) agree.
E(f, v) = f (v),
Another example of duality arises as follows: Given two vector spaces V
and W and F ∈ Hom(V, W), we have defined F∗ : Hom(W, k) = W∗ → Hom(V, k) = V∗, and the linear map Hom(V, W) → Hom(W∗, V∗) defined by F ↦ F∗ is an isomorphism. Repeating this construction, given F ∈ Hom(V, W) we have defined F∗ ∈ Hom(W∗, V∗), and thus F∗∗ ∈ Hom(V∗∗, W∗∗). If V and W are finite dimensional, then V∗∗ ≅ V, by an explicit isomorphism which is the inverse of ev, and similarly W∗∗ ≅ W.
Thus we can view F ∗∗ as an element of Hom(V, W ), and it is straightforward
to check that, via this identification,
F ∗∗ = F.
In fact, translating this statement into the case V = k^n, W = k^m, it reduces to the identity ^t(^tA) = A for all A ∈ Mm,n(k) of Remark 9.5.
10 Tensor products
Tensor products are another very useful way of producing new vector spaces
from old. We begin by recalling a definition from linear algebra:
Definition 10.1. Let V, W, U be vector spaces. A function F : V × W → U
is bilinear if it is linear in each variable when the other is held fixed: for all
w ∈ W, the function f(v) = F(v, w) is a linear function from V to U, and for all v ∈ V, the function g(w) = F(v, w) is a linear function from W to U.
Multilinear maps F : V1 × V2 × · · · × Vn → U are defined in a similar way.
Bilinear maps occur throughout linear algebra. For example, for any field k we have the standard inner product, often denoted by ⟨·, ·⟩ : k^n × k^n → k, and it is defined by: if v = (v1, . . . , vn) and w = (w1, . . . , wn), then

⟨v, w⟩ = Σ_{i=1}^n vi wi.
using the defining properties of bilinearity.
Another important example of a bilinear function is composition of linear
maps: given three vector spaces V, W, U, we have the composition function ◦ : Hom(V, W) × Hom(W, U) → Hom(V, U) sending (F, G) to G ◦ F, which is bilinear,
and the statement that matrix multiplication distributes over matrix addi-
tion (on both sides) and commutes with scalar multiplication of matrices (in
the sense that (tA)B = A(tB) = t(AB)) is just the statement that matrix
multiplication is bilinear.
The tensor product V ⊗ W of two vector spaces is a new vector space
which is often most usefully described by a “universal property” with respect
to bilinear maps: First, for all v ∈ V , w ∈ W , there is a symbol v ⊗ w ∈
V ⊗ W, which is bilinear, i.e. for all v, vi ∈ V, w, wi ∈ W and t ∈ k,

(v1 + v2) ⊗ w = (v1 ⊗ w) + (v2 ⊗ w),   v ⊗ (w1 + w2) = (v ⊗ w1) + (v ⊗ w2),
(tv) ⊗ w = v ⊗ (tw) = t(v ⊗ w).
2. Given vector spaces V1 , V2 , W1 , W2 and linear maps G : V1 → V2 ,
H : W1 → W2 , then there is an induced linear map, denoted G ⊗ H,
from V1 ⊗ W1 to V2 ⊗ W2. It satisfies: for all v ∈ V1 and w ∈ W1,

(G ⊗ H)(v ⊗ w) = G(v) ⊗ H(w).

There is also a natural isomorphism S : V ⊗ W → W ⊗ V, determined by

S(v ⊗ w) = w ⊗ v.
Thus, as with direct sum (but unlike Hom), tensor product is sym-
metric.
(V1 ⊕ V2) ⊗ W ≅ (V1 ⊗ W) ⊕ (V2 ⊗ W),
and similarly for the second factor, so that tensor product distributes
over direct sum.
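For example, since k ⊗ W ≅ W, distributing ⊗ over the direct sum k^n = k ⊕ · · · ⊕ k gives k^n ⊗ W ≅ W ⊕ · · · ⊕ W (n copies); in particular, k^n ⊗ k^m ≅ k^{nm}.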
V1 ⊗ (V2 ⊗ V3) ≅ (V1 ⊗ V2) ⊗ V3 ≅ V1 ⊗ V2 ⊗ V3,
Hom(V2, W1) and L′ : V1∗ ⊗ W2 → Hom(V1, W2) in the sense that the following diagram is commutative:

                    L
    V2∗ ⊗ W1  −−−−−−→  Hom(V2, W1)
        |                      |
 G∗ ⊗ H |                      | G∗ ◦ H∗
        ↓                      ↓
    V1∗ ⊗ W2  −−−−−−→  Hom(V1, W2);
                    L′

that is, L′ ◦ (G∗ ⊗ H) = (G∗ ◦ H∗) ◦ L.
E(f, v) = f (v)