
Linear algebra

M340L lecture notes


Hans Koch

The recommended textbook for our class is “Linear Algebra and its Applications”
by D.C. Lay, fourth edition. This book will be referred to as [Lay]. In order to simplify
cross-referencing, the sections and subsections in these notes correspond roughly to those
in [Lay]. Exceptions are marked.

1. Linear equations

1.2. Row reduction


Here we solve a system of linear equations using the Gauss-Jordan algorithm. Most com-
putations in linear algebra are based on this algorithm, and we can even use it to prove
theorems. So it is important to fully understand it!
Let us start with a system of 4 equations in 4 unknowns x1, x2, x3, and x4.
$$
\begin{aligned}
3x_1 + x_2 + 7x_3 + 2x_4 &= 13\,,\\
2x_1 - 4x_2 + 14x_3 - x_4 &= -10\,,\\
5x_1 + 11x_2 - 7x_3 + 8x_4 &= 59\,,\\
2x_1 + 5x_2 - 4x_3 + 3x_4 &= 27\,,
\end{aligned}
\qquad
\begin{bmatrix}
3 & 1 & 7 & 2 &:& 13\\
2 & -4 & 14 & -1 &:& -10\\
5 & 11 & -7 & 8 &:& 59\\
2 & 5 & -4 & 3 &:& 27
\end{bmatrix}.
\tag{1.1}
$$

The “augmented matrix” on the right is simply an abbreviation for the system on the left,
where we keep just the coefficients and constants.
The task is to find all solutions of this system. We will take advantage of the fact
that the system can be modified without changing the set of solutions, using elementary
row operations.
Definition 1.1. An elementary row operation is one of the following.
(i) Multiply a row (equation) by a nonzero number.
(ii) Add a multiple of one row (equation) to another.
(iii) Exchange two rows (equations).
We will now perform a sequence of such elementary row operations to simplify the
system in a specific way. Starting from the left and moving right, one column after another
will be transformed to a reduced form. (If the entire column is zero, then it is considered
reduced and we move on to the next column.) We will refer to this column as the column
to be reduced. Starting from the top and moving down, one row after another will be
designated to be the pivot row.
At this point the pivot row is the first. The goal is to have a 1 in this row, in the
column (presently the first) to be reduced. Right now the entry is a 3. So we could multiply


the first row by 1/3. To avoid fractions, let us instead add −1 times the second row to the
first. After this step, we have
 
$$
\begin{aligned}
x_1 + 5x_2 - 7x_3 + 3x_4 &= 23\,,\\
2x_1 - 4x_2 + 14x_3 - x_4 &= -10\,,\\
5x_1 + 11x_2 - 7x_3 + 8x_4 &= 59\,,\\
2x_1 + 5x_2 - 4x_3 + 3x_4 &= 27\,,
\end{aligned}
\qquad
\begin{bmatrix}
1 & 5 & -7 & 3 &:& 23\\
2 & -4 & 14 & -1 &:& -10\\
5 & 11 & -7 & 8 &:& 59\\
2 & 5 & -4 & 3 &:& 27
\end{bmatrix}.
\tag{1.2}
$$

The leftmost 1 in the pivot row is called the pivot. (If the original matrix had a zero at
this position, then we would first exchange rows in order to get a nonzero entry into that
position.)
The next goal is to make all entries above and below the pivot (in the same column)
equal to zero. This can be done by first adding −2 times the first row to the second, then
adding −5 times the first row to the third, and finally adding −2 times the first row to the fourth.
After these three steps, we have
 
$$
\begin{aligned}
x_1 + 5x_2 - 7x_3 + 3x_4 &= 23\,,\\
-\,14x_2 + 28x_3 - 7x_4 &= -56\,,\\
-\,14x_2 + 28x_3 - 7x_4 &= -56\,,\\
-\,5x_2 + 10x_3 - 3x_4 &= -19\,,
\end{aligned}
\qquad
\begin{bmatrix}
1 & 5 & -7 & 3 &:& 23\\
0 & -14 & 28 & -7 &:& -56\\
0 & -14 & 28 & -7 &:& -56\\
0 & -5 & 10 & -3 &:& -19
\end{bmatrix}.
\tag{1.3}
$$

The first column is now in reduced form. No subsequent row operation will modify it.
Now the second column becomes the one to be reduced. And the new pivot row is the
second, since the first row already has a pivot. So the new pivot position is (2, 2), where
there is presently an entry −14. The goal is to turn this into a 1. But first, let us add −1
times the second row to the third. Then the new third row becomes identically zero. (The
corresponding equation is 0 = 0.) Such rows are meant to be moved to the bottom. So let
us exchange the third row (now all zeros) with the fourth row. We still have a −14 where
the next pivot 1 should be. So let us subtract 3 times the third row from the second. Now
we have
 
$$
\begin{aligned}
x_1 + 5x_2 - 7x_3 + 3x_4 &= 23\,,\\
x_2 - 2x_3 + 2x_4 &= 1\,,\\
-\,5x_2 + 10x_3 - 3x_4 &= -19\,,\\
0 &= 0\,,
\end{aligned}
\qquad
\begin{bmatrix}
1 & 5 & -7 & 3 &:& 23\\
0 & 1 & -2 & 2 &:& 1\\
0 & -5 & 10 & -3 &:& -19\\
0 & 0 & 0 & 0 &:& 0
\end{bmatrix}.
\tag{1.4}
$$

Next, all non-pivot entries in the second column need to be converted to zero. This can be
done by adding −5 times the second row to the first, and then adding 5 times the second
row to the third. The resulting matrix (system) is
 
$$
\begin{aligned}
x_1 + 3x_3 - 7x_4 &= 18\,,\\
x_2 - 2x_3 + 2x_4 &= 1\,,\\
7x_4 &= -14\,,\\
0 &= 0\,,
\end{aligned}
\qquad
\begin{bmatrix}
1 & 0 & 3 & -7 &:& 18\\
0 & 1 & -2 & 2 &:& 1\\
0 & 0 & 0 & 7 &:& -14\\
0 & 0 & 0 & 0 &:& 0
\end{bmatrix}.
\tag{1.5}
$$

The second column is now in reduced form as well.



Having pivot 1s in the first two rows, the new pivot row becomes the third. The
column to be reduced is the third. But there is no way to get a pivot 1 in this column. So
the third column is considered in reduced form, and the fourth becomes the new column
to be reduced. The entry that needs to be converted to a 1 is 7. So we multiply the third
row by 1/7. The new third row is [0 0 0 1 : −2]. Now add −2 times this row to the second
row, and then add 7 times this row to the first row. The result is
 
$$
\begin{aligned}
x_1 + 3x_3 &= 4\,,\\
x_2 - 2x_3 &= 5\,,\\
x_4 &= -2\,,\\
0 &= 0\,,
\end{aligned}
\qquad
\begin{bmatrix}
1 & 0 & 3 & 0 &:& 4\\
0 & 1 & -2 & 0 &:& 5\\
0 & 0 & 0 & 1 &:& -2\\
0 & 0 & 0 & 0 &:& 0
\end{bmatrix}.
\tag{1.6}
$$

Now we are done: the matrix (system) is in reduced echelon form.
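For readers who want to experiment, here is a minimal Python sketch of the Gauss-Jordan algorithm just described (the function name rref and the use of exact rational arithmetic via fractions.Fraction are our own choices for this illustration, not something from [Lay]):

    # A sketch of Gauss-Jordan reduction to reduced echelon form.
    # Exact rational arithmetic avoids any rounding issues.
    from fractions import Fraction

    def rref(M):
        A = [[Fraction(x) for x in row] for row in M]
        rows, cols = len(A), len(A[0])
        pivot_row = 0
        for col in range(cols):
            # find a nonzero entry in this column, at or below the pivot row
            r = next((i for i in range(pivot_row, rows) if A[i][col] != 0), None)
            if r is None:
                continue                                     # column already reduced
            A[pivot_row], A[r] = A[r], A[pivot_row]          # (iii) exchange rows
            p = A[pivot_row][col]
            A[pivot_row] = [x / p for x in A[pivot_row]]     # (i) scale the pivot to 1
            for i in range(rows):
                if i != pivot_row and A[i][col] != 0:        # (ii) clear the column
                    c = A[i][col]
                    A[i] = [a - c * b for a, b in zip(A[i], A[pivot_row])]
            pivot_row += 1
            if pivot_row == rows:
                break
        return A

    # The augmented matrix (1.1); the output matches (1.6).
    M = [[3, 1, 7, 2, 13], [2, -4, 14, -1, -10],
         [5, 11, -7, 8, 59], [2, 5, -4, 3, 27]]
    print(rref(M))

Unlike the hand computation above, this sketch does not try to avoid fractions; with exact arithmetic there is no need to.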


Definition 1.2. A matrix is said to be in reduced echelon form if
(1) The first nonzero entry (pivot) in each nonzero row is 1.
(2) The pivot of each row lies strictly to the right of the pivots of earlier rows.
(3) All entries above and below a pivot are zero.
(4) Rows consisting entirely of zeros come last.

Definition 1.3. xj is said to be a free variable if the j-th column of the reduced echelon
matrix contains no pivot 1.
In the system (1.6) we have a single free variable, namely x3 . The reason why x3 is
called a free variable becomes clear if we write (1.6) as

x1 = 4 − 3x3 ,
x2 = 5 + 2x3 , (1.7)
x4 = −2 .

We see immediately that the value of x3 can be chosen arbitrarily. After that, the values
of the other variables (those that have a pivot 1 in their column) are determined.
We give the following result without proof.
Theorem 1.4. Every matrix can be transformed to reduced echelon form via elementary
row operations. The reduced echelon form is unique.
The uniqueness part of this theorem allows us to characterize precisely what the
solution sets can be.
Possible solution sets. (The statements refer to the reduced echelon form.)
(1) The system has exactly one solution.
This is the case when every column to the left of “:” has a pivot.
(0) The system has no solution.
This happens if one of the rows is [0 0 . . . 0 : b] with b ≠ 0. The corresponding system
includes the equation 0 = b. (In the reduced echelon form we have b = 1. But if we
obtain 0 = b during reduction, with b ≠ 0, then there is no point in continuing.)

(∞) The system has infinitely many solutions.
This occurs when neither (1) nor (0) applies.
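As a quick illustration (a sketch reusing the rref function and the matrix M from the Section 1.2 code above; the function name classify is our own), the three cases can be read off from the reduced echelon form of the augmented matrix:

    # Classify the solution set of a system, given the reduced echelon form
    # of its augmented matrix; the last column holds the constants.
    def classify(R):
        nvars = len(R[0]) - 1
        for row in R:
            if all(x == 0 for x in row[:-1]) and row[-1] != 0:
                return "no solution"                 # a row [0 ... 0 : b] with b != 0
        pivots = sum(1 for row in R if any(x != 0 for x in row))
        if pivots == nvars:
            return "exactly one solution"            # pivot in every variable column
        return "infinitely many solutions"           # some free variable remains

    print(classify(rref(M)))   # "infinitely many solutions", cf. (1.6)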
Definition 1.5. A matrix A is said to be row equivalent to a matrix B, in symbols A ∼ B,
if A can be transformed into B via a sequence of elementary row operations.
In other words, Theorem 1.4 states that every matrix is row equivalent to a unique
reduced echelon form matrix.
Example 1. Here we have a unique solution (x1 = 1, x2 = −1, x3 = 2).
$$
\begin{bmatrix}
4 & -2 & 1 &:& 8\\
1 & 1 & 1 &:& 2\\
9 & 3 & 1 &:& 8
\end{bmatrix}
\sim
\begin{bmatrix}
1 & 0 & 0 &:& 1\\
0 & 1 & 0 &:& -1\\
0 & 0 & 1 &:& 2
\end{bmatrix}.
\tag{1.8}
$$

Example 2. The following system has no solution.
$$
\begin{bmatrix} 1 & 1 &:& 2\\ 1 & 1 &:& 3 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 1 &:& 2\\ 0 & 0 &:& 1 \end{bmatrix}.
\tag{1.9}
$$

Example 3. A system with infinitely many solutions is the one considered at the beginning:
$$
\begin{bmatrix}
3 & 1 & 7 & 2 &:& 13\\
2 & -4 & 14 & -1 &:& -10\\
5 & 11 & -7 & 8 &:& 59\\
2 & 5 & -4 & 3 &:& 27
\end{bmatrix}
\sim
\begin{bmatrix}
1 & 0 & 3 & 0 &:& 4\\
0 & 1 & -2 & 0 &:& 5\\
0 & 0 & 0 & 1 &:& -2\\
0 & 0 & 0 & 0 &:& 0
\end{bmatrix}.
\tag{1.10}
$$

Remark 4. Notice that every elementary row operation can be reversed by another
elementary row operation (of the same type). So if A can be transformed into B
via a sequence of elementary row operations, then B can also be transformed into A via a
sequence of elementary row operations.

1.3. Vector equations


In what follows, a scalar is a real number. Later on, we will also consider complex numbers.
Let m be a fixed but arbitrary positive integer. A vector in Rm is an m-tuple of
scalars, like
$$
u = \begin{bmatrix} u_1\\ u_2\\ \vdots\\ u_m \end{bmatrix},
\qquad
v = \begin{bmatrix} v_1\\ v_2\\ \vdots\\ v_m \end{bmatrix},
\qquad
0 = \begin{bmatrix} 0\\ 0\\ \vdots\\ 0 \end{bmatrix}.
\tag{1.11}
$$
The set of all such vectors is denoted by Rm. The last vector in this equation is called the
zero vector in Rm. One defines vector addition, multiplication by scalars, and the negative
of a vector in Rm as follows:
$$
u + v = \begin{bmatrix} u_1 + v_1\\ u_2 + v_2\\ \vdots\\ u_m + v_m \end{bmatrix},
\qquad
cv = \begin{bmatrix} cv_1\\ cv_2\\ \vdots\\ cv_m \end{bmatrix},
\qquad
-v = \begin{bmatrix} -v_1\\ -v_2\\ \vdots\\ -v_m \end{bmatrix}.
\tag{1.12}
$$

Clearly, 0v = 0 and (−1)v = −v. One also defines u − v = u + (−v).

Example 5. Here are some operations with vectors in R2.
$$
2\begin{bmatrix} -1\\ 3 \end{bmatrix} - \begin{bmatrix} 3\\ -2 \end{bmatrix}
= \begin{bmatrix} -2\\ 6 \end{bmatrix} + \begin{bmatrix} -3\\ 2 \end{bmatrix}
= \begin{bmatrix} -5\\ 8 \end{bmatrix}.
\tag{1.13}
$$

Notice that these basic vector operations work independently on each component:
the k-th component of the result depends only on the k-th component(s) of the original
vector(s). So one performs basically m independent scalar operations. Using this fact, one
easily verifies the following vector space properties.

Theorem 1.6. For any vectors u, v, w in Rm , and for any real numbers c, d,
(1) u + v = v + u.
(2) (u + v) + w = u + (v + w).
(3) u + 0 = 0 + u = u.
(4) u + (−u) = (−u) + u = 0.
(5) c(u + v) = cu + cv.
(6) (c + d)u = cu + du.
(7) c(du) = (cd)u.
(8) 1u = u.

There are two basic but fundamental concepts in linear algebra. Both concern sets
of vectors. The first is the notion of spanning, and the second is the notion of linear
independence (covered later).

Definition 1.7. Let A = {a1, a2, . . . , an} be a set of n vectors in Rm. The subset of Rm
spanned by the vectors in A, also denoted by Span(A), is the set of all vectors in Rm that
can be written as
$$
x_1 a_1 + x_2 a_2 + \ldots + x_n a_n\,,
\tag{1.14}
$$
for some numbers x1, x2, . . . , xn. Such a vector (1.14) is also called a linear combination
of the vectors in A.

For vectors in R3, a single nonzero vector a1 spans a line; and when combined with a
second vector a2 that is not a multiple of a1, the two span a plane:

[Figure (1.15): Span{a1} is a line through the origin; Span{a1, a2} is a plane through the origin.]

For some explicit computations, consider the following two vectors in R3.
$$
a_1 = \begin{bmatrix} 1\\ -2\\ -5 \end{bmatrix},
\qquad
a_2 = \begin{bmatrix} 2\\ 5\\ 6 \end{bmatrix}.
\tag{1.16}
$$

Among the many vectors spanned by the set {a1, a2} is
$$
3a_1 + 2a_2 = 3\begin{bmatrix} 1\\ -2\\ -5 \end{bmatrix} + 2\begin{bmatrix} 2\\ 5\\ 6 \end{bmatrix}
= \begin{bmatrix} 7\\ 4\\ -3 \end{bmatrix}.
\tag{1.17}
$$

In order to determine whether some given vector b ∈ R3 lies in the span of {a1, a2}, we
have to find numbers x1 and x2, if possible, such that
$$
x_1 a_1 + x_2 a_2 = b\,.
\tag{1.18}
$$
Say $b = \begin{bmatrix} 4\\ 1\\ -4 \end{bmatrix}$. Then the equation for x1 and x2 is
$$
x_1 \begin{bmatrix} 1\\ -2\\ -5 \end{bmatrix} + x_2 \begin{bmatrix} 2\\ 5\\ 6 \end{bmatrix}
= \begin{bmatrix} 4\\ 1\\ -4 \end{bmatrix},
\tag{1.19}
$$
or
$$
\begin{bmatrix} x_1 + 2x_2\\ -2x_1 + 5x_2\\ -5x_1 + 6x_2 \end{bmatrix}
= \begin{bmatrix} 4\\ 1\\ -4 \end{bmatrix},
\tag{1.20}
$$
or
$$
\begin{aligned}
x_1 + 2x_2 &= 4\,,\\
-2x_1 + 5x_2 &= 1\,,\\
-5x_1 + 6x_2 &= -4\,.
\end{aligned}
\tag{1.21}
$$
This system can be solved by reducing the augmented matrix
 
$$
[a_1\ a_2 : b] = \begin{bmatrix} 1 & 2 &:& 4\\ -2 & 5 &:& 1\\ -5 & 6 &:& -4 \end{bmatrix}.
\tag{1.22}
$$

More generally, given a set of vectors A = {a1, a2, . . . , an} in Rm, and a vector b ∈ Rm,
consider the problem of determining whether b belongs to the span of A. This leads to . . .

1.4. The matrix equation Ax = b


An m × n matrix A is an array
$$
A = [a_1\ a_2\ \cdots\ a_n]\,,
\tag{1.23}
$$
composed of n column vectors a1, a2, . . . , an ∈ Rm.
Remark 6. The matrix (1.23) and the set A = {a1 , a2 , . . . , an } are not the same thing!

Definition 1.8. The product of an m × n matrix A = [a1 a2 · · · an] with a vector
$$
x = \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix} \in \mathbb{R}^n
\tag{1.24}
$$
is the vector
$$
Ax = x_1 a_1 + x_2 a_2 + \ldots + x_n a_n \in \mathbb{R}^m.
\tag{1.25}
$$

Example 7. A product of a 3 × 2 matrix with a vector in R2.
$$
\begin{bmatrix} 1 & 2\\ -2 & 5\\ -5 & 6 \end{bmatrix}
\begin{bmatrix} 2\\ 1 \end{bmatrix}
= 2\begin{bmatrix} 1\\ -2\\ -5 \end{bmatrix} + 1\begin{bmatrix} 2\\ 5\\ 6 \end{bmatrix}
= \begin{bmatrix} 4\\ 1\\ -4 \end{bmatrix}.
\tag{1.26}
$$

Notice that the equation x1 a1 + x2 a2 = b in (1.19) is the same as Ax = b, where
A = [a1 a2] is the 3 × 2 matrix whose columns are the vectors (1.16). Solving this equation
amounts to reducing the augmented matrix [a1 a2 : b] given in (1.22).
More generally, given an m × n matrix A = [a1 a2 · · · an] and a vector b ∈ Rm, the
matrix equation
$$
Ax = b
\tag{1.27}
$$

for a vector x ∈ Rn is the same as the vector equation
$$
x_1 a_1 + x_2 a_2 + \ldots + x_n a_n = b\,,
\tag{1.28}
$$
which is the same as the system of linear equations represented by the augmented matrix
$$
[a_1\ a_2\ \cdots\ a_n : b]\,.
\tag{1.29}
$$
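The following sketch implements Definition 1.8 directly (plain Python, with a matrix stored as a list of its columns; the function name matvec is our own):

    # Ax as the linear combination x1*a1 + ... + xn*an of the columns of A.
    def matvec(columns, x):
        m = len(columns[0])
        b = [0] * m
        for xj, aj in zip(x, columns):               # one column per entry of x
            for i in range(m):
                b[i] += xj * aj[i]
        return b

    a1, a2 = [1, -2, -5], [2, 5, 6]                  # the columns from (1.16)
    print(matvec([a1, a2], [2, 1]))                  # [4, 1, -4], as in Example 7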

1.5. Solution sets


Homogeneous equations
Definition 1.9. The equation Ax = b is said to be homogeneous if b = 0. Otherwise the
equation is said to be inhomogeneous.
Notice that the homogeneous equation

x1 a1 + x2 a2 + . . . + xn an = Ax = 0 (1.30)

always has the trivial solution x = 0. It has a nontrivial solution x ≠ 0 if and only if the
system [a1 a2 · · · an : 0] has a free variable.
    
Example 8. The equation $\begin{bmatrix} 1 & 2\\ 2 & 3 \end{bmatrix}\begin{bmatrix} x_1\\ x_2 \end{bmatrix} = \begin{bmatrix} 0\\ 0 \end{bmatrix}$ only has the solution x = 0, since
$$
\begin{bmatrix} 1 & 2 &:& 0\\ 2 & 3 &:& 0 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 2 &:& 0\\ 0 & -1 &:& 0 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 0 &:& 0\\ 0 & 1 &:& 0 \end{bmatrix}.
\tag{1.31}
$$

Example 9. The equation $\begin{bmatrix} 3 & 1 & 0 & 1\\ 0 & -2 & 12 & -8\\ 2 & -3 & 22 & -14 \end{bmatrix}\begin{bmatrix} x_1\\ x_2\\ x_3\\ x_4 \end{bmatrix} = \begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix}$ has nontrivial solutions, since
$$
\begin{bmatrix} 3 & 1 & 0 & 1 &:& 0\\ 0 & -2 & 12 & -8 &:& 0\\ 2 & -3 & 22 & -14 &:& 0 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 4 & -22 & 15 &:& 0\\ 0 & -2 & 12 & -8 &:& 0\\ 0 & -11 & 66 & -44 &:& 0 \end{bmatrix}
\sim \cdots \sim
\begin{bmatrix} 1 & 0 & 2 & -1 &:& 0\\ 0 & 1 & -6 & 4 &:& 0\\ 0 & 0 & 0 & 0 &:& 0 \end{bmatrix}.
\tag{1.32}
$$

Here we have two free variables, namely x3 and x4. After choosing arbitrary values x3 = s
and x4 = t for these free variables, we obtain
$$
\begin{aligned}
x_1 &= -2s + t\,,\\
x_2 &= 6s - 4t\,,\\
x_3 &= s\,,\\
x_4 &= t\,.
\end{aligned}
\qquad
x = s\begin{bmatrix} -2\\ 6\\ 1\\ 0 \end{bmatrix} + t\begin{bmatrix} 1\\ -4\\ 0\\ 1 \end{bmatrix}.
\tag{1.33}
$$

This is also called the parametric form of the solution.


   
Remark 10. Let $u = \begin{bmatrix} -2\\ 6\\ 1\\ 0 \end{bmatrix}$ and $v = \begin{bmatrix} 1\\ -4\\ 0\\ 1 \end{bmatrix}$. By setting s = 1 and t = 0 in (1.33), we
see that x = u is a solution of the homogeneous equation in Example 9. Similarly, setting
s = 0 and t = 1, we see that x = v is also a solution. The equation (1.33) shows that any
linear combination x = su + tv of the vectors u and v is a solution. In fact, the solution
set is spanned by {u, v} in this example.
A useful property of the matrix-vector product is the following.
Theorem 1.10. For any m × n matrix A, any two vectors u, v in Rn, and any real number
c,
(a) A(u + v) = Au + Av.
(b) A(cv) = c(Av).

Proof. Denote by uj and vj the components of u and v, respectively. Then the components
of u + v are uj + vj . By using the vector space properties in Theorem 1.6, we obtain

A(u + v) = (u1 + v1 )a1 + (u2 + v2 )a2 + . . . + (un + vn )an


= (u1 a1 + u2 a2 + . . . + un an ) + (v1 a1 + v2 a2 + . . . + vn an )
= Au + Av .

This proves (a). Concerning (b), notice that the components of cu are cuj . Using again
Theorem 1.6, we obtain

A(cv) = (cv1 )a1 + (cv2 )a2 + . . . + (cvn )an
= c(v1 a1 + v2 a2 + . . . + vn an ) = c(Av) .

This proves (b). QED

This implies the following important fact about the solution set of homogeneous equa-
tions.
Theorem 1.11. If x = u and x = v are solutions of the homogeneous equation Ax = 0,
then so is every linear combination x = su + tv.

Proof. Assume that Au = 0 and Av = 0. Using Theorem 1.10, we obtain

A(su + tv) = A(su) + A(tv) = s(Au) + t(Av) = s0 + t0 = 0 , (1.34)

as claimed. QED

Inhomogeneous equations

Theorem 1.12. Assume that the equation Ax = b has a solution x = p. (The letter “p”
stands for “particular” solution.) Then the solution set of the equation Ax = b is precisely
the set of all vectors x = p+h, where h is a solution of the homogeneous equation Ax = 0.

Proof. Assume that Ap = b.


If h satisfies the homogeneous equation Ah = 0, then the sum x = p + h satisfies the
equation Ax = A(p + h) = Ap + Ah = b + 0 = b.
Conversely, if x satisfies the equation Ax = b, then the difference h = x − p satisfies
the equation Ah = A(x − p) = Ax − Ap = b − b = 0. QED

Example 11. Consider
$$
A = \begin{bmatrix}
3 & 1 & 7 & 2\\
2 & -4 & 14 & -1\\
5 & 11 & -7 & 8\\
2 & 5 & -4 & 3
\end{bmatrix},
\qquad
b = \begin{bmatrix} 13\\ -10\\ 59\\ 27 \end{bmatrix}.
\tag{1.35}
$$
Notice that the equation Ax = b is the system (1.1) that we solved by reducing the
augmented matrix [A : b]. In (1.7) we found that
x1 = 4 − 3s ,
x2 = 5 + 2s ,
(1.36)
x3 = s ,
x4 = −2 ,
which we can also write as
$$
x = \begin{bmatrix} x_1\\ x_2\\ x_3\\ x_4 \end{bmatrix}
= \underbrace{\begin{bmatrix} 4\\ 5\\ 0\\ -2 \end{bmatrix}}_{p}
+ \underbrace{s\begin{bmatrix} -3\\ 2\\ 1\\ 0 \end{bmatrix}}_{h_s}.
\tag{1.37}
$$
Geometrically, the solution set of the homogeneous equation Ax = 0 is a line through the
origin 0. The vector hs lies on this line, for any value of s. As s is varied, x = hs moves
along this line.
The solution set of the inhomogeneous equation Ax = b is a line through p. The
vector p + hs lies on this line, for any value of s. As s is varied, x = p + hs moves along
this line.

[Figure (1.38): the solution set of Ax = 0 is a line through 0 containing the vectors hs; the solution set of Ax = b is the parallel line through p containing the vectors p + hs.]
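A quick numerical check of this structure (a sketch using numpy; the data are A and b from (1.35), and p and h from (1.37), where we write hs = s·h):

    # Every x = p + s*h solves Ax = b, and h solves the homogeneous equation.
    import numpy as np

    A = np.array([[3, 1, 7, 2], [2, -4, 14, -1],
                  [5, 11, -7, 8], [2, 5, -4, 3]])
    b = np.array([13, -10, 59, 27])
    p = np.array([4, 5, 0, -2])                      # particular solution
    h = np.array([-3, 2, 1, 0])                      # spans the homogeneous solutions

    assert np.allclose(A @ h, 0)
    for s in (-1.0, 0.0, 2.5):
        assert np.allclose(A @ (p + s * h), b)       # cf. Theorem 1.12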

Theorem 1.13. Let A be an m × n matrix. Then the following are equivalent.


(a) The column vectors of A span Rm , that is, their span is all of Rm .
(b) The equation Ax = b has a solution for every b ∈ Rm .
(c) The reduced echelon matrix for A has a pivot 1 in every row.

Proof. Clearly (a) and (b) are equivalent.


Let B be the reduced echelon matrix for A. In order to show that (c) implies (b),
assume that B has a pivot in every row. Then B cannot have a row of zeros, so the system
[B : v] has a solution for every v ∈ Rm . Thus, the system [A : u] has a solution for every
u ∈ Rm .
To show that (b) implies (c), assume that B has no pivot in the last row. Then that
last row is zero. Let v be a vector in Rm whose last component vm is nonzero. Then the
system [B : v] has no solution. Now recall from Remark 4 that [B : 0] can be transformed
to [A : 0] via a sequence of elementary row operations. Applying the same sequence of
elementary row operations to [B : v] yields a system [A : u] for some u ∈ Rm . Since [B : v]
has no solution, [A : u] has no solution either. QED

Corollary 1.14. A set {a1 , a2 , . . . , an } in Rm with n < m vectors cannot span Rm .

Proof. Consider A = [a1 a2 · · · an ]. The reduced echelon matrix for A can have at most
n pivots. If n < m then it has fewer than m pivots, meaning that there cannot be a pivot
in every row. QED

Example 12. The vectors $\begin{bmatrix} 1\\ 2\\ 3 \end{bmatrix}$ and $\begin{bmatrix} 4\\ 5\\ 6 \end{bmatrix}$ do not span R3.

1.7. Linear independence


Whether or not the homogeneous equation Ax = 0 has only the trivial solution x = 0 can
be viewed as a property of the column vectors of A.
Recall that, if A = [a1 a2 · · · an ], then
Ax = x1 a1 + x2 a2 + . . . + xn an . (1.39)

Definition 1.15. A collection of vectors a1 , a2 , . . . , an ∈ Rm is called linearly independent


if the equation
x1 a1 + x2 a2 + . . . + xn an = 0 (1.40)
only has the trivial solution x1 = x2 = . . . = xn = 0. Otherwise, these vectors are said to
be linearly dependent.
Example 13. The vectors $a_1 = \begin{bmatrix} 0\\ 1\\ 5 \end{bmatrix}$, $a_2 = \begin{bmatrix} 1\\ 2\\ 8 \end{bmatrix}$, and $a_3 = \begin{bmatrix} 4\\ -1\\ 5 \end{bmatrix}$ are linearly independent, as the following shows (there are no free variables).
$$
\begin{bmatrix} 0 & 1 & 4 &:& 0\\ 1 & 2 & -1 &:& 0\\ 5 & 8 & 5 &:& 0 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 2 & -1 &:& 0\\ 0 & 1 & 4 &:& 0\\ 0 & -2 & 10 &:& 0 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 2 & -1 &:& 0\\ 0 & 1 & 4 &:& 0\\ 0 & 0 & 18 &:& 0 \end{bmatrix}
\sim \cdots
\tag{1.41}
$$

Example 14. The vectors $a_1 = \begin{bmatrix} 1\\ 2\\ 3 \end{bmatrix}$, $a_2 = \begin{bmatrix} 4\\ 5\\ 6 \end{bmatrix}$, and $a_3 = \begin{bmatrix} 2\\ 1\\ 0 \end{bmatrix}$ are linearly dependent, as the following shows.
$$
\begin{bmatrix} 1 & 4 & 2 &:& 0\\ 2 & 5 & 1 &:& 0\\ 3 & 6 & 0 &:& 0 \end{bmatrix}
\sim \cdots \sim
\begin{bmatrix} 1 & 0 & -2 &:& 0\\ 0 & 1 & 1 &:& 0\\ 0 & 0 & 0 &:& 0 \end{bmatrix},
\qquad
\begin{aligned}
x_1 - 2x_3 &= 0\,,\\
x_2 + x_3 &= 0\,,\\
0 &= 0\,.
\end{aligned}
\tag{1.42}
$$
We can choose e.g. x3 = 1. Then x1 = 2 and x2 = −1. So we get the linear dependence
relation
$$
2\begin{bmatrix} 1\\ 2\\ 3 \end{bmatrix} - 1\begin{bmatrix} 4\\ 5\\ 6 \end{bmatrix} + 1\begin{bmatrix} 2\\ 1\\ 0 \end{bmatrix}
= \begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix}.
\tag{1.43}
$$
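Computationally, linear independence can be tested by counting pivots (a sketch reusing the rref function from the Section 1.2 code; cf. Theorem 1.16 below):

    # The vectors are independent iff every column of the reduced echelon
    # form has a pivot, i.e. iff the pivot count equals the number of vectors.
    def independent(columns):
        rows = [list(r) for r in zip(*columns)]      # store the vectors as columns
        R = rref(rows)
        pivots = sum(1 for row in R if any(x != 0 for x in row))
        return pivots == len(columns)

    print(independent([[0, 1, 5], [1, 2, 8], [4, -1, 5]]))   # True,  Example 13
    print(independent([[1, 2, 3], [4, 5, 6], [2, 1, 0]]))    # False, Example 14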

Theorem 1.16. Let A be an m × n matrix. Then the following are equivalent.


(a) The column vectors of A are linearly independent.
(b) The equation Ax = 0 only has the trivial solution x = 0.
(c) The reduced echelon matrix for A has a pivot 1 in every column (no free variable).

Proof. Write A = [a1 a2 · · · an ]. The equivalence of (a) and (b) follows from the fact that
the equation x1 a1 + x2 a2 + . . . + xn an = 0 is the same as Ax = 0. And the equivalence of
(b) and (c) follows from the fact that the system [A : 0] has a unique solution if and only
if it has no free variable. QED

Corollary 1.17. Any set {a1 , a2 , . . . , an } in Rm containing n > m vectors is linearly


dependent.

Proof. Consider A = [a1 a2 · · · an ]. If n > m then the system Ax = 0 has more variables
than equations. So there must be a free variable; in other words, the equation Ax = 0 has
a nontrivial solution. QED

Example 15. The vectors $\begin{bmatrix} 1\\ 2 \end{bmatrix}$, $\begin{bmatrix} 2\\ 3 \end{bmatrix}$, and $\begin{bmatrix} 3\\ 4 \end{bmatrix}$ are linearly dependent.

1.8. Linear transformations


A transformation T : Rn → Rm can be viewed as a “rule” that assigns to every vector
x ∈ Rn a vector T (x) ∈ Rm . (Other words for “transformation” are “mapping”, “map”,
“function”, and “operator”.) The range of T , also denoted by Range(T ), is the set of all
vectors b ∈ Rm for which the equation T (x) = b has a solution x ∈ Rn .
Definition 1.18. A transformation T : Rn → Rm is said to be linear if for any vectors
u, v ∈ Rn and any real number c,
(a) T (u + v) = T (u) + T (v).
(b) T (cv) = cT (v).
A trivial consequence of linearity is that T maps the zero vector in Rn to the zero
vector in Rm . This is seen by setting c = 0 in (b).

We already have plenty of examples of linear transformations. Theorem 1.10 implies


the following.
Corollary 1.19. If A is any m × n matrix, then the map T : Rn → Rm defined by the
equation T (x) = Ax is linear.
Example 16. The 3 × 2 matrix A = [a1 a2] with columns $a_1 = \begin{bmatrix} 1\\ 2\\ 3 \end{bmatrix}$ and $a_2 = \begin{bmatrix} 2\\ 3\\ 5 \end{bmatrix}$ defines a linear transformation T : R2 → R3 via
$$
T(x) = Ax = x_1\begin{bmatrix} 1\\ 2\\ 3 \end{bmatrix} + x_2\begin{bmatrix} 2\\ 3\\ 5 \end{bmatrix}.
\tag{1.44}
$$
The range of T is the plane in R3 spanned by the vectors a1 and a2.
If T is an arbitrary linear transformation from Rn to Rm , then a repeated application
of the property (a) in Definition 1.18 yields

T (u1 + u2 + . . . + uk ) = T (u1 ) + T (u2 + . . . + uk )


= ... (1.45)
= T (u1 ) + T (u2 ) + . . . + T (uk ) .

Setting uj = cj vj for each j and applying the property (b) in Definition 1.18, we obtain

T (c1 v1 + c2 v2 + . . . + ck vk ) = c1 T (v1 ) + c2 T (v2 ) + . . . + ck T (vk ) . (1.46)

In particular, if we know the image under T of each of the vectors v1 , v2 , . . . , vk , then we


know the image under T of every linear combination c1 v1 + c2 v2 + . . . + ck vk .
Example 17. Let T be some fixed but arbitrary linear transformation from R2 to Rm.
Suppose we know the images under T of the two vectors $\begin{bmatrix} 1\\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0\\ 1 \end{bmatrix}$, say
$$
T\left(\begin{bmatrix} 1\\ 0 \end{bmatrix}\right) = a_1\,,
\qquad
T\left(\begin{bmatrix} 0\\ 1 \end{bmatrix}\right) = a_2\,.
\tag{1.47}
$$
Then we know the image under T of every vector x ∈ R2. To see why, write x as a linear
combination $x = \begin{bmatrix} x_1\\ x_2 \end{bmatrix} = x_1\begin{bmatrix} 1\\ 0 \end{bmatrix} + x_2\begin{bmatrix} 0\\ 1 \end{bmatrix}$. Using that T is linear, we find that
$$
T(x) = T\left(x_1\begin{bmatrix} 1\\ 0 \end{bmatrix} + x_2\begin{bmatrix} 0\\ 1 \end{bmatrix}\right)
= x_1 T\left(\begin{bmatrix} 1\\ 0 \end{bmatrix}\right) + x_2 T\left(\begin{bmatrix} 0\\ 1 \end{bmatrix}\right)
= x_1 a_1 + x_2 a_2\,.
\tag{1.48}
$$
Notice that T(x) = Ax for the matrix A = [a1 a2].

1.9. The matrix of a linear transformation


Example 17 shows that for every linear transformation T : R2 → Rm we can find an m × 2
matrix A, such that T (x) = Ax for every vector x ∈ R2 .

We can generalize this to a linear transformation T : Rn → Rm by writing a vector
x ∈ Rn as a linear combination
$$
x = \begin{bmatrix} x_1\\ x_2\\ x_3\\ \vdots\\ x_n \end{bmatrix}
= x_1 \underbrace{\begin{bmatrix} 1\\ 0\\ \vdots\\ 0 \end{bmatrix}}_{e_1}
+ x_2 \underbrace{\begin{bmatrix} 0\\ 1\\ \vdots\\ 0 \end{bmatrix}}_{e_2}
+ \ldots
+ x_n \underbrace{\begin{bmatrix} 0\\ \vdots\\ 0\\ 1 \end{bmatrix}}_{e_n}.
\tag{1.49}
$$

Theorem 1.20. Let T : Rn → Rm be a linear transformation. Then there exists a unique
m × n matrix A such that T(x) = Ax for all x ∈ Rn. In fact,
$$
A = [\,T(e_1)\ T(e_2)\ \cdots\ T(e_n)\,]\,.
\tag{1.50}
$$

Proof. Set aj = T (ej ) for all j. Here, and in what follows, we assume that 1 ≤ j ≤ n.
By (1.46), we have

T (x) = T (x1 e1 + x2 e2 + . . . + xn en )
= x1 T (e1 ) + x2 T (e2 ) + . . . + xn T (en ) (1.51)
= x1 a1 + x2 a2 + . . . + xn an = Ax .

This shows that T (x) = Ax for all x ∈ Rn , with A as described in (1.50).


In order to prove uniqueness, let B = [b1 b2 · · · bn ] be any m × n matrix with the
property that T (x) = Bx for all x ∈ Rn . Then bj = Bej = T (ej ) = aj for all j. In other
words, the column vectors of B are the same as the column vectors of A. So B = A, and
uniqueness is proved. QED

Example 18. Let T : R3 → R3 be the linear rotation in R3 by an angle 2π/3 about
the axis in the direction of $\begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix}$. It permutes the standard basis vectors, $e_1 \mapsto e_3 \mapsto e_2 \mapsto e_1$,
where each arrow is an application of T. So the corresponding matrix is
$$
A = [e_3\ e_1\ e_2] = \begin{bmatrix} 0 & 1 & 0\\ 0 & 0 & 1\\ 1 & 0 & 0 \end{bmatrix}.
\tag{1.52}
$$
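The construction in Theorem 1.20 is easy to carry out numerically (a sketch using numpy; the function name standard_matrix is our own, and T below is the permutation from Example 18):

    # Build the standard matrix of a linear T from its values on e1, ..., en.
    import numpy as np

    def standard_matrix(T, n):
        I = np.eye(n)
        return np.column_stack([T(I[:, j]) for j in range(n)])

    T = lambda x: np.array([x[1], x[2], x[0]])   # sends e1 -> e3, e2 -> e1, e3 -> e2
    A = standard_matrix(T, 3)
    print(A)                                     # the matrix (1.52)
    print(A @ A @ A)                             # identity: three rotations by 2*pi/3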

Definition 1.21. A transformation T is said to be one-to-one (or injective) if T (u) = T (v)


implies that u = v. In other words, T : Rn → Rm is one-to-one if the equation T (x) = b
has at most one solution x ∈ Rn for any given vector b ∈ Rm .

Remark 19. The spaces Rm for different values of m have no vectors in common. Thus,
whenever confusion could arise, the zero vector in Rm will be denoted by 0m .

Theorem 1.22. Let T : Rn → Rm be linear. Then T is one-to-one if and only if the


equation T (x) = 0m has only the trivial solution x = 0n .

Proof. Assume first that T is one-to-one. Then the equation T (x) = b has at most one
solution x ∈ Rn , for any given b ∈ Rm . In the case b = 0m , one solution is x = 0n , so
this must be the only solution.
Conversely, assume that 0n is the only vector in Rn that is mapped to 0m by T .
Consider now the equation T (x) = b for a given b ∈ Rm . If x = u and x = v are both
solutions of this equation, then by linearity, T (u − v) = T (u) − T (v) = b − b = 0m . So
we must have u − v = 0n , or equivalently, u = v. This shows that T is one-to-one. QED

Example 20. The linear transformation T : R2 → R3, defined by
$$
T(x) = \begin{bmatrix} 1 & 2\\ 1 & -1\\ -2 & -1 \end{bmatrix}
\begin{bmatrix} x_1\\ x_2 \end{bmatrix},
\tag{1.53}
$$
is one-to-one, since the equation T(x) = 0 only has the trivial solution, as the following
reduction shows:
$$
\begin{bmatrix} 1 & 2 &:& 0\\ 1 & -1 &:& 0\\ -2 & -1 &:& 0 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 0 &:& 0\\ 0 & 1 &:& 0\\ 0 & 0 &:& 0 \end{bmatrix}.
\tag{1.54}
$$

Definition 1.23. A transformation T : Rn → Rm is said to be onto (or surjective) if its


range is all of Rm . In other words, T is onto if the equation T (x) = b has a solution
x ∈ Rn for any given b ∈ Rm .

Example 21. The transformation T : R2 → R3 defined by (1.53) is not onto. In
particular, the vector $b = \begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix}$ does not belong to the range of T, meaning that the
equation T(x) = b has no solution. Geometrically, the range of T is the plane in R3
spanned by the two vectors $\begin{bmatrix} 1\\ 1\\ -2 \end{bmatrix}$ and $\begin{bmatrix} 2\\ -1\\ -1 \end{bmatrix}$. And the above-mentioned vector b does not
lie in this plane.

Theorem 1.24. Let T : Rn → Rm be linear. Let A be the standard matrix for T .


(a) T is onto if and only if the column vectors of A span Rm .
(b) T is one-to-one if and only if the column vectors of A are linearly independent.

Proof. The claim (a) follows from the Definition 1.8, since the range of T is the span of
the column vectors of A.
Consider now the claim (b). By Theorem 1.22, T is one-to-one if and only if the
equation Ax = 0 has only the trivial solution x = 0, which is the same as saying that the
column vectors of A are linearly independent. QED

Example 22. Consider the linear transformation T : R3 → R2 defined by the matrix
$$
A = \begin{bmatrix} 1 & 2 & 3\\ 2 & 3 & 4 \end{bmatrix}.
\tag{1.55}
$$
Reducing $\begin{bmatrix} 1 & 2\\ 2 & 3 \end{bmatrix}$ shows that the first two columns of A span R2. So the column vectors of
A span R2, implying that T is onto. But T is not one-to-one, since three vectors in R2 are
always linearly dependent, by Corollary 1.17.

2. Matrix algebra

2.0. Some special matrices


First some notation. It is customary to omit square brackets when they enclose only a
single scalar. In particular, a vector a = [a] in R1 is identified with the number a ∈ R.
Using this notation, here are a 1 × n matrix A, an m × 1 matrix B, and a 1 × 1 matrix C:
$$
A = [a_1\ a_2\ \cdots\ a_n]\,,
\qquad
B = [b] = \begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_m \end{bmatrix},
\qquad
C = [c] = c\,.
\tag{2.1}
$$

The 1 × n matrix A represents a linear transformation from Rn to R: a vector x ∈ Rn is


mapped to the number x1 a1 + x2 a2 + . . . + xn an . The m × 1 matrix B represents a linear
transformation from R to Rm : a number x is mapped to the vector xb in Rm . The 1 × 1
matrix C represents a linear transformation from R to R: a number x is mapped to the
number cx.
A trivial linear transformation from Rn to Rm is the transformation T that maps
every vector in Rn to the zero vector in Rm . The corresponding m × n matrix is called
the zero matrix. Since T (ej ) = 0 for all j, we have
 
$$
0 = [0\ 0\ \cdots\ 0] = \begin{bmatrix}
0 & 0 & \cdots & 0\\
0 & 0 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & 0
\end{bmatrix}.
\tag{2.2}
$$

Another trivial linear transformation from Rm to Rm is the identity map T (x) = x.


The corresponding matrix is called the m × m identity matrix and is denoted by Im . Since
T (ej ) = ej for all j, we have
 
$$
I_m = [e_1\ e_2\ \cdots\ e_m] = \begin{bmatrix}
1 & 0 & \cdots & 0\\
0 & 1 & & 0\\
\vdots & & \ddots & \vdots\\
0 & 0 & \cdots & 1
\end{bmatrix}.
\tag{2.3}
$$

2.1. Matrix operations


We can define matrix addition, multiplication by scalars, and the negative of a matrix as
follows. If A = [a1 a2 · · · an ] and B = [b1 b2 · · · bn ] are both m × n matrices, then

A + B = [a1 + b1 a2 + b2 · · · an + bn ] ,
cB = [cb1 cb2 · · · cbn ] , (2.4)
−B = [−b1 − b2 · · · − bn ] ,

where c can be any scalar. Clearly 0B = 0 and (−1)B = −B. One also defines A − B =
A + (−B).
Example 23. Here are some operations with 2 × 2 matrices.
$$
2\begin{bmatrix} -1 & 3\\ 2 & 0 \end{bmatrix} - \begin{bmatrix} 3 & -2\\ -1 & 1 \end{bmatrix}
= \begin{bmatrix} -2 & 6\\ 4 & 0 \end{bmatrix} + \begin{bmatrix} -3 & 2\\ 1 & -1 \end{bmatrix}
= \begin{bmatrix} -5 & 8\\ 5 & -1 \end{bmatrix}.
\tag{2.5}
$$

The operations (2.4) are just vector operations, performed independently on each
column. The j-th column of the resulting matrix only depends on the j-th column(s) of
the original matrix (matrices). Using this fact, one easily verifies the following.
Theorem 2.1. For any m × n matrices A,B,C, and for any scalars c, d,
(1) A + B = B + A.
(2) (A + B) + C = A + (B + C).
(3) A + 0 = 0 + A = A.
(4) A + (−A) = (−A) + A = 0.
(5) c(A + B) = cA + cB.
(6) (c + d)A = cA + dA.
(7) c(dA) = (cd)A.
(8) 1A = A.
We will get back to these vector space properties later. More important at this point
are properties that arise from the fact that matrices represent linear transformations.
Consider an n × p matrix B = [b1 b2 · · · bp] and an m × n matrix A. The matrix B
defines a transformation from Rp to Rn, and the matrix A defines a transformation from
Rn to Rm. The two transformations can be composed:
$$
x \in \mathbb{R}^p \;\overset{B\cdot}{\longmapsto}\; Bx \in \mathbb{R}^n \;\overset{A\cdot}{\longmapsto}\; A(Bx) \in \mathbb{R}^m.
\tag{2.6}
$$

We will now show that the composed transformation x ↦ A(Bx) from Rp to Rm is
associated with an m × p matrix. This matrix will be called AB.
More specifically, we have
$$
\begin{aligned}
A(Bx) &= A(x_1 b_1 + x_2 b_2 + \ldots + x_p b_p)\\
&= x_1 Ab_1 + x_2 Ab_2 + \ldots + x_p Ab_p\\
&= [Ab_1\ Ab_2\ \cdots\ Ab_p]\,x\,.
\end{aligned}
\tag{2.7}
$$

This shows that A(Bx) = (AB)x, where AB is the following matrix.


Definition 2.2. The product of an m × n matrix A with an n × p matrix B = [b1 b2 · · · bp ]
is defined as
AB = A[b1 b2 · · · bp ] = [Ab1 Ab2 · · · Abp ] . (2.8)

Example 24. Let
$$
A = \begin{bmatrix} 2 & -4 & 1\\ 2 & 1 & -2 \end{bmatrix},
\qquad
B = [b_1\ b_2] = \begin{bmatrix} 3 & 5\\ 1 & 2\\ 2 & 3 \end{bmatrix}.
\tag{2.9}
$$
Then the product AB = [Ab1 Ab2] is given by
$$
Ab_1 = 3\begin{bmatrix} 2\\ 2 \end{bmatrix} + 1\begin{bmatrix} -4\\ 1 \end{bmatrix} + 2\begin{bmatrix} 1\\ -2 \end{bmatrix} = \begin{bmatrix} 4\\ 3 \end{bmatrix},
\quad
Ab_2 = 5\begin{bmatrix} 2\\ 2 \end{bmatrix} + 2\begin{bmatrix} -4\\ 1 \end{bmatrix} + 3\begin{bmatrix} 1\\ -2 \end{bmatrix} = \begin{bmatrix} 5\\ 6 \end{bmatrix},
\quad
AB = \begin{bmatrix} 4 & 5\\ 3 & 6 \end{bmatrix}.
\tag{2.10}
$$

Remark 25. The product of a 1 × n matrix A = [a1 a2 · · · an] with an n × 1 matrix
B = [b] is the 1 × 1 matrix
$$
AB = [Ab] = [\,b_1 a_1 + b_2 a_2 + \ldots + b_n a_n\,]\,.
\tag{2.11}
$$
Identifying a 1 × 1 matrix [c] with the number c, we will also write this as
$$
[a_1\ a_2\ \cdots\ a_n] \begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_n \end{bmatrix}
= a_1 b_1 + a_2 b_2 + \ldots + a_n b_n\,.
$$

Theorem 2.3. If A is any m × n matrix, B and B ′ any n × p matrices, C any p × q matrix,


and c any scalar, then
(a) A(BC) = (AB)C.
(b) A(B + B ′ ) = AB + AB ′ .
(c) (B + B ′ )C = BC + B ′ C.
(d) c(AB) = (cA)B = A(cB).
(e) Im A = A = A In .

Proof. Denote the column vectors of A, B, B ′ , and C, by aj , bj , b′j , and cj , respectively.


Consider first the claim (a). Using the Definition 2.2 of a matrix product, and the
identity A(Bx) = (AB)x that follows from (2.7), we obtain

A(BC) = A[· · · Bcj · · ·] = [· · · A(Bcj ) · · ·] = [· · · (AB)cj · · ·] = (AB)C . (2.12)



Next, consider the claim (b). Using the definitions (2.4) and (2.8), together with
Theorem 1.10, we obtain

A(B + B ′ ) = A[· · · bj + b′j · · ·] = [· · · A(bj + b′j ) · · ·] = [· · · Abj + Ab′j · · ·]


(2.13)
= [· · · Abj · · ·] + [· · · Ab′j · · ·] = AB + AB ′ .

The remaining identities are equally straightforward to prove, so we leave this task as
an exercise. QED

Warning 26. If AB is well defined, then BA need not be defined. For example, if A is
4 × 2 and B is 2 × 3, then AB is 4 × 3, but BA is not defined. Even if AB and BA are
both defined, they can have different dimensions. For example if A is 3 × 2 and B is 2 × 3,
then AB is 3 × 3 and BA is 2 × 2. Even if AB and BA are both defined and have the
same dimensions, then AB and BA need not agree!

Remark 27. The property (a) in Theorem 2.3 justifies writing ABC in place of A(BC)
or (AB)C.

Lemma 2.4. Let A be an m × n matrix, B an n × p matrix, and C a p × q matrix.


(1) If the transformation x ↦ ABx is onto, then so is y ↦ Ay.
(2) If the transformation x ↦ BCx is one-to-one, then so is x ↦ Cx.

Proof. Assume first that x ↦ ABx is onto. Let z be any vector in Rn. Then the equation
ABx = z has a solution x ∈ Rp. Setting y = Bx, we have Ay = ABx = z. This shows
that the transformation y ↦ Ay is onto.
Next, assume that x ↦ BCx is one-to-one. Let x ∈ Rq be a solution of Cx = 0p.
Then BCx = B0p = 0n. Since x ↦ BCx is one-to-one, we must have x = 0q. This shows
that the transformation x ↦ Cx is one-to-one. QED

Row and column indices


It is of course possible to reduce matrix operations directly to scalar operations, without
passing through vector operations for the columns. To this end, let us write an m × n
matrix A as
$$
A = \begin{bmatrix}
A_{1,1} & A_{1,2} & \cdots & A_{1,n}\\
A_{2,1} & A_{2,2} & \cdots & A_{2,n}\\
\vdots & \vdots & & \vdots\\
A_{m,1} & A_{m,2} & \cdots & A_{m,n}
\end{bmatrix},
\tag{2.14}
$$
where the matrix entry Ai,j in row i and column j is the i-th component (aj )i of the j-th
column vector aj . Let B be a second m × n matrix, with entries Bi,j . Then the definition
(2.4) is equivalent to

(A + B)i,j = Ai,j + Bi,j , (cA)i,j = cAi,j , (−A)i,j = −Ai,j , (2.15)



for all positive i ≤ m and j ≤ n. This is completely analogous to the vector operations
defined in (1.12).
First, let us consider the product Ax of an m × n matrix A with a vector x ∈ Rn .
The i-th component of Ax is given by

(Ax)i = (x1 a1 + x2 a2 + . . . + xn an )i
(2.16)
= x1 (a1 )i + x2 (a2 )i + . . . + xn (an )i .

Using that Ai,j = (aj )i , we obtain the formula

(Ax)i = Ai,1 x1 + Ai,2 x2 + . . . + Ai,n xn . (2.17)

Another way of writing this is
$$
(Ax)_i = [A_{i,1}\ A_{i,2}\ \cdots\ A_{i,n}] \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix}
= A_{i,1}x_1 + A_{i,2}x_2 + \ldots + A_{i,n}x_n\,.
\tag{2.18}
$$

Next, we consider the product AB of an m × n matrix A with an n × p matrix B. Denote


the columns of A and B by aj and bj , respectively. Then the j-th column of AB is

Abj = (bj )1 a1 + (bj )2 a2 + . . . + (bj )n an , (2.19)

and the i-th component of this column is
$$
(Ab_j)_i = \bigl( (b_j)_1 a_1 + (b_j)_2 a_2 + \ldots + (b_j)_n a_n \bigr)_i
= (b_j)_1 (a_1)_i + (b_j)_2 (a_2)_i + \ldots + (b_j)_n (a_n)_i\,.
\tag{2.20}
$$

Written in terms of the matrix elements, we have

(AB)i,j = Ai,1 B1,j + Ai,2 B2,j + . . . + Ai,n Bn,j . (2.21)

Another way of writing this is
$$
(AB)_{i,j} = [A_{i,1}\ A_{i,2}\ \cdots\ A_{i,n}] \begin{bmatrix} B_{1,j}\\ B_{2,j}\\ \vdots\\ B_{n,j} \end{bmatrix}
= A_{i,1}B_{1,j} + A_{i,2}B_{2,j} + \ldots + A_{i,n}B_{n,j}\,.
\tag{2.22}
$$

In other words, the (i, j) entry of AB is the product of the i-th row of A with the j-th
column of B.
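This row-column rule translates directly into code (a plain Python sketch with matrices stored as lists of rows; the function name matmul is our own):

    # (AB)[i][j] is the product of row i of A with column j of B, as in (2.21).
    def matmul(A, B):
        m, n, p = len(A), len(B), len(B[0])
        assert len(A[0]) == n                    # inner dimensions must agree
        return [[sum(A[i][k] * B[k][j] for k in range(n))
                 for j in range(p)] for i in range(m)]

    A = [[2, -4, 1], [2, 1, -2]]
    B = [[3, 5], [1, 2], [2, 3]]
    print(matmul(A, B))                          # [[4, 5], [3, 6]], cf. Example 24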

Example 28. The product in Example 24 can also be written as
$$
\begin{bmatrix} 2 & -4 & 1\\ 2 & 1 & -2 \end{bmatrix}
\begin{bmatrix} 3 & 5\\ 1 & 2\\ 2 & 3 \end{bmatrix}
= \begin{bmatrix}
2\cdot 3 + (-4)\cdot 1 + 1\cdot 2 & 2\cdot 5 + (-4)\cdot 2 + 1\cdot 3\\
2\cdot 3 + 1\cdot 1 + (-2)\cdot 2 & 2\cdot 5 + 1\cdot 2 + (-2)\cdot 3
\end{bmatrix}
= \begin{bmatrix} 4 & 5\\ 3 & 6 \end{bmatrix}.
\tag{2.23}
$$

Some loose ends


Powers of an n × n matrix A are defined for k ≥ 1 as
$$
A^k = AA\cdots A \quad (k \text{ factors})\,.
\tag{2.24}
$$
Given that A represents a linear transformation T : x ↦ Ax on Rn, the k-th power of A
represents the k-th iterate of the transformation T,
$$
x \;\overset{T}{\mapsto}\; Ax \;\overset{T}{\mapsto}\; A^2x \;\overset{T}{\mapsto}\; \ldots \;\overset{T}{\mapsto}\; A^k x\,.
\tag{2.25}
$$

Example 29. Consider a rotation T in R2 by an angle ϑ. The column vectors of the
corresponding matrix A are obtained by rotating the vectors $e_1 = \begin{bmatrix} 1\\ 0 \end{bmatrix}$ and $e_2 = \begin{bmatrix} 0\\ 1 \end{bmatrix}$:
$$
A = [\,T(e_1)\ T(e_2)\,] = \begin{bmatrix} \cos\vartheta & -\sin\vartheta\\ \sin\vartheta & \cos\vartheta \end{bmatrix}.
\tag{2.26}
$$
We can either compute A^k = AA···A by using trigonometric identities, or simply use
(2.25) to get
$$
A^k = \begin{bmatrix} \cos(k\vartheta) & -\sin(k\vartheta)\\ \sin(k\vartheta) & \cos(k\vartheta) \end{bmatrix}.
\tag{2.27}
$$
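A quick numerical confirmation of (2.27) (a sketch using numpy, for one arbitrary choice of angle and exponent):

    # The k-th power of the rotation matrix equals the rotation by k*theta.
    import numpy as np

    theta, k = 0.7, 5
    A = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    B = np.array([[np.cos(k * theta), -np.sin(k * theta)],
                  [np.sin(k * theta),  np.cos(k * theta)]])
    assert np.allclose(np.linalg.matrix_power(A, k), B)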

Definition 2.5. The transpose of an m × n matrix A is the n × m matrix A⊤ whose elements
are (A⊤)i,j = Aj,i for all positive i ≤ n and j ≤ m.

Example 30.
$$
A = \begin{bmatrix} 1 & 3 & 5\\ 2 & 4 & 6 \end{bmatrix},
\qquad
A^{\top} = \begin{bmatrix} 1 & 2\\ 3 & 4\\ 5 & 6 \end{bmatrix}.
\tag{2.28}
$$

Remark 31. It will become clear later what the significance of the transpose is. For
now, we use the transpose mainly in examples, and to simplify some notation.

Theorem 2.6. If A and C are any m × n matrices, B is any n × p matrix, and c any
scalar, then
(a) (A⊤ )⊤ = A.
(b) (A + C)⊤ = A⊤ + C ⊤ .
(c) (cA)⊤ = cA⊤ .

(d) (AB)⊤ = B ⊤ A⊤

Proof. The first three properties are straightforward to check and intuitively clear, so we
leave them as exercises.
Consider now (d). For all positive i ≤ p and j ≤ m, we have

((AB)⊤ )i,j = (AB)j,i = Aj,1 B1,i + Aj,2 B2,i + . . . + Aj,n Bn,i


= (B ⊤ )i,1 (A⊤ )1,j + (B ⊤ )i,2 (A⊤ )2,j + . . . + (B ⊤ )i,n (A⊤ )n,j (2.29)
= (B ⊤ A⊤ )i,j .

This shows that (AB)⊤ = B ⊤ A⊤ , as claimed. QED

The trace of an n × n matrix A is defined as the sum of the diagonal elements of A,
$$
\operatorname{tr}(A) = A_{1,1} + A_{2,2} + \ldots + A_{n,n}\,.
\tag{2.30}
$$
A noteworthy property of the trace is the following. Let A be an m × n matrix and B an
n × m matrix. Then AB and BA have the same trace:
$$
\operatorname{tr}(AB) = \sum_{i=1}^{m} (AB)_{i,i} = \sum_{i=1}^{m}\sum_{j=1}^{n} A_{i,j}B_{j,i}
= \sum_{j=1}^{n}\sum_{i=1}^{m} B_{j,i}A_{i,j} = \sum_{j=1}^{n} (BA)_{j,j} = \operatorname{tr}(BA)\,.
\tag{2.31}
$$
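This identity is easy to check numerically, even when AB and BA have different sizes (a sketch using numpy with random test matrices):

    # tr(AB) = tr(BA) for a 2x3 matrix A and a 3x2 matrix B, as in (2.31);
    # here AB is 2x2 while BA is 3x3, yet the traces agree.
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((2, 3))
    B = rng.standard_normal((3, 2))
    assert np.isclose(np.trace(A @ B), np.trace(B @ A))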

2.2. Inverse of a matrix


Here we consider only square matrices, meaning matrices with the same number of rows
as columns.
Definition 2.7. An n × n matrix A is said to be invertible if there exists an n × n matrix C
with the property that
AC = I and CA = I , (2.32)
where I denotes the n × n identity matrix.
   
Example 32. $A = \begin{bmatrix} 3 & 7\\ 2 & 5 \end{bmatrix}$ and $C = \begin{bmatrix} 5 & -7\\ -2 & 3 \end{bmatrix}$ satisfy (2.32).
The following theorem lists six properties that are all equivalent. The equivalence of
(f ) with the others will not be proved until the end of this section. (And it will be used
only later.) But it is useful to have it listed here with all the others.
Theorem 2.8. Let A be an n × n matrix. Then the following are equivalent.
(a) The linear transformation x 7→ Ax is onto.
(b) The transformation x 7→ Ax is one-to-one.

(c) The column vectors of A span Rn .


(d) The column vectors of A are linearly independent.
(e) A is row equivalent to the identity.
(f ) A is invertible.

Proof. We already know from Theorem 1.24 that (a) is equivalent to (c) and that (b) is
equivalent to (d). The equivalence of (e) and (f ) will be proved later in this section.
In order to show that (c), (d) and (e) are equivalent, let B be the reduced echelon form
of A. By Theorem 1.13, the column vectors of A span Rn if and only if B has a pivot in
every row. And by Theorem 1.16, the column vectors of A are linearly independent if and
only if B has a pivot in every column. But B has the same number of rows as columns,
so having a pivot in each column is equivalent to having a pivot in each row. And in both
cases, B = I. QED

Before we can define “the inverse” of an invertible matrix, we need to prove that there
cannot be more than one.
Proposition 2.9. There is at most one matrix C that satisfies (2.32).

Proof. Assume that AC = CA = I and AC ′ = C ′ A = I. Then

C ′ = C ′ I = C ′ (AC) = (C ′ A)C = IC = C . (2.33)

QED

Definition 2.10. If A is an invertible n × n matrix, then the (unique) n × n matrix C


that satisfies (2.32) is called the inverse of A and is denoted by A−1 . So we have

A−1 A = I and AA−1 = I . (2.34)

Example 33. The matrix A in Example 18 satisfies A3 = I, as can be seen from the
corresponding transformation T . Thus, it is invertible, and A−1 = A2 .
Notice that the products CA and AC need not agree for general n × n matrices. But
if one is the identity matrix, then so is the other:
Lemma 2.11. Let A and C be n × n matrices. Then CA = I if and only if AC = I.

Proof. Assume first that AC = I. Then x ↦ ACx is the identity map on Rn, and in
particular, it is onto. So x ↦ Ax is onto, by Lemma 2.4. And it is one-to-one as well, by
Theorem 2.8.
Using Theorems 2.1 and 2.3, we have

A(CA − I)x = (AC − I)Ax = 0x = 0 , (2.35)

for every x ∈ Rn . Since x 7→ Ax is one-to-one, this implies that (CA − I)x = 0 for every
x ∈ Rn . This in turn implies that CA−I is the zero matrix, since the linear transformation

that maps every vector to 0 is represented by the zero matrix. But CA − I = 0 implies
CA = I.
The above proves that AC = I implies CA = I. The converse is obtained by exchang-
ing the matrices A and C. QED

For 2 × 2 matrices, we have the following explicit formula.

Theorem 2.12. Let $A = \begin{bmatrix} a & b\\ c & d \end{bmatrix}$.
(1) If ad − bc = 0, then A is not invertible.
(2) If ad − bc ≠ 0, then A is invertible, and
$$
A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b\\ -c & a \end{bmatrix}.
\tag{2.36}
$$

Proof. Consider first the claim (1). Assume that ad − bc = 0. If A = 0 then A is clearly not
invertible, since there is no 2 × 2 matrix C such that CA = I. Consider now the case A ≠ 0.
Then one of the vectors $\begin{bmatrix} d\\ -c \end{bmatrix}$ or $\begin{bmatrix} -b\\ a \end{bmatrix}$ is nonzero. Let x be one of the two that is nonzero.
An explicit computation shows that Ax = 0. But then (CA)x = C(Ax) = C0 = 0 holds
for any 2 × 2 matrix C. So we cannot have CA = I; otherwise (CA)x = Ix = x would be
nonzero.
Verifying claim (2) is a trivial computation: define C to be the right hand side of
(2.36), and then check that AC = I. QED

Remark 34. (basic logic) The fact that the formula (2.36) makes no sense if ad − bc = 0
does not prove (1).
Here are a few basic facts about inverses.
Theorem 2.13. Let A be an invertible n × n matrix. Then for every b ∈ Rn , the equation
Ax = b has a unique solution, namely x = A−1 b.

Proof. Concerning existence: if x = A−1 b, then Ax = A(A−1 b) = (AA−1 )b = Ib = b.


Concerning uniqueness: if Ax = b, then x = Ix = (A−1 A)x = A−1 (Ax) = A−1 b.
This shows that Ax = b if and only if x = A−1 b. QED

This theorem is very useful in practice, if the equation Ax = b has to be solved for
many different vectors b. If A is invertible and A−1 is known, then it is much faster to
compute A−1 b than to solve Ax = b.
Theorem 2.14. Let A and B be invertible n × n matrices. Then
(a) A−1 is invertible, and (A−1)−1 = A.
(b) AB is invertible, and (AB)−1 = B−1A−1.
(c) A⊤ is invertible, and (A⊤)−1 = (A−1)⊤.

Proof. The claim (a) follows from (2.34). The following proves (b).
$$
(AB)\bigl(B^{-1}A^{-1}\bigr) = A\bigl(B\bigl(B^{-1}A^{-1}\bigr)\bigr)
= A\bigl(\bigl(BB^{-1}\bigr)A^{-1}\bigr) = A\bigl(IA^{-1}\bigr) = AA^{-1} = I\,.
\tag{2.37}
$$
The following uses Theorem 2.6 and proves (c).
$$
A^{\top}\bigl(A^{-1}\bigr)^{\top} = \bigl(A^{-1}A\bigr)^{\top} = I^{\top} = I\,.
\tag{2.38}
$$
QED

Remark 35. If the expression in (b) for the inverse of a product looks unfamiliar, then
the following may help to understand it:
$$
x \;\overset{B}{\longmapsto}\; Bx \;\overset{A}{\longmapsto}\; z = ABx\,,
\qquad
z \;\overset{A^{-1}}{\longmapsto}\; A^{-1}z \;\overset{B^{-1}}{\longmapsto}\; B^{-1}A^{-1}z = x\,.
\tag{2.39}
$$

An algorithm for finding A−1


In what follows, all matrices are n × n. Given a matrix A, the goal is to find a matrix
C = [c1 c2 · · · cn] that satisfies
$$
AC = I\,,
\qquad
[Ac_1\ Ac_2\ \cdots\ Ac_n] = [e_1\ e_2\ \cdots\ e_n]\,.
\tag{2.40}
$$
Equivalently,
$$
Ac_1 = e_1\,,\quad Ac_2 = e_2\,,\quad \ldots\,,\quad Ac_n = e_n\,.
\tag{2.41}
$$
If A is row equivalent to the identity, then we can solve via row reduction:

[A : e1 ] [A : e2 ] ... [A : en ]
↓ ↓ ↓ (2.42)
[ I : c1 ] [ I : c2 ] ... [ I : cn ] .

Notice that the choice of row operations in the Gauss-Jordan algorithm only depends on
the part to the left of “:” in the augmented matrix. So all n reductions in (2.42) use the
same sequence of elementary row operations. This allows us to combine them all into one:

[A : I ]
↓ (2.43)
[ I : C] .

Given that the resulting matrix C is a solution of the equation AC = I, we find that A is
invertible, and that A−1 = C.
Example 36. With a bit of work one finds
$$
\begin{bmatrix}
1 & -4 & 1 &:& 1 & 0 & 0\\
1 & 1 & -2 &:& 0 & 1 & 0\\
-1 & 1 & 1 &:& 0 & 0 & 1
\end{bmatrix}
\sim \cdots \sim
\begin{bmatrix}
1 & 0 & 0 &:& 3 & 5 & 7\\
0 & 1 & 0 &:& 1 & 2 & 3\\
0 & 0 & 1 &:& 2 & 3 & 5
\end{bmatrix},
\tag{2.44}
$$
where the left block of the first matrix is A and the right block of the last matrix is A−1.
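The combined reduction (2.43) is a few lines of code (a sketch reusing the rref function from the Section 1.2 code; the entries come out as exact fractions):

    # Reduce [A : I] to [I : C]; if the left block fails to become the
    # identity, A is not row equivalent to I, hence not invertible.
    def inverse(A):
        n = len(A)
        aug = [row + [1 if i == j else 0 for j in range(n)]
               for i, row in enumerate(A)]
        R = rref(aug)
        if any(R[i][i] != 1 for i in range(n)):
            raise ValueError("matrix is not invertible")
        return [row[n:] for row in R]

    A = [[1, -4, 1], [1, 1, -2], [-1, 1, 1]]
    print(inverse(A))    # [[3, 5, 7], [1, 2, 3], [2, 3, 5]], as in Example 36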

Finally, let us prove the equivalence of (e) and (f ) in Theorem 2.8.


Theorem 2.15. A square matrix is invertible if and only if it is row equivalent to the
identity.

Proof. Let A be an n × n matrix.
Assume first that A is row equivalent to the identity. Then the row reduction (2.42)
yields a matrix C that satisfies AC = I. And CA = I as well, by Lemma 2.11, so A is invertible.
Next, assume that A is invertible. Then there exists an n × n matrix C such that
CA = I. The transformation x ↦ CAx is the identity map on Rn, and in particular, it is
one-to-one. So x ↦ Ax is one-to-one, by Lemma 2.4. Now Theorem 2.8 implies that A is
row equivalent to the identity. QED

3. Determinants
This part is short. Determinants are interesting but of limited practical use.

3.1. Definition of the determinant


The determinant det(A) of a square matrix A will be defined near the end of this section.
In order to motivate the definition, we first discuss what a determinant is meant to be.
Here it is useful to regard an n × n matrix A as an array of rows. The i-th row of A can be
considered a 1 × n matrix. Its transpose, which is an n × 1 matrix or vector in Rn, will be
called the i-th row vector of A.
The row vectors α1, α2, . . . , αn of an n × n matrix A define a parallelotope in Rn,
namely the set of all vectors
$$
x = x_1\alpha_1 + x_2\alpha_2 + \ldots + x_n\alpha_n
\tag{3.1}
$$
with coordinates 0 ≤ xi ≤ 1 for all i. The 2^n vertices (corners) of this parallelotope are
the points (3.1) with xi ∈ {0, 1} for all i. For n = 1 we get a line segment with endpoints
0 and α1 . For n = 2, we have a parallelogram with vertices 0, α1 , α2 , and α1 + α2 . For
n = 3 we have a parallelepiped with vertices 0, α1 , α2 , α3 , α1 + α2 , α1 + α3 , α2 + α3 ,
and α1 + α2 + α3 .
The absolute value of det(A) is meant to be the volume of the parallelotope defined
by A. (The volume in R1 is also called a length, and the volume in R2 is also called an
area.) As such, it should have the following properties.
First, the volume should not depend on the labeling of the vectors αi . In particular,
it should stay the same if we exchange two vectors: αi ↔ αk .
Second, the volume should be preserved under a shear, e.g. when a multiple of one
vector is added to another: αi + c αk → αi.

[Figure (3.2): the parallelogram spanned by {α1, α2} and the sheared parallelogram spanned by {α1, α2 + cα1} have the same area.]

Third, if one of the vectors is scaled by a constant r, say rαi → αi , then the volume should
change by a factor |r|.
These are precisely the elementary row operations from Definition 1.1. This motivates
the following definition. In order to simplify notation, we write ⟨i⟩ for the i-th row of A.

Definition 3.1. Denote by Rn×n the set of all n × n matrices. A function det : Rn×n → R
is called a determinant if it satisfies det(I) = 1 and transforms as follows under scalings,
shears, and row exchanges:
(i) scaling r⟨i⟩ → ⟨i⟩: det(new matrix) = r det(old matrix).
(ii) shear ⟨i⟩ + c⟨k⟩ → ⟨i⟩ (k ≠ i): det(new matrix) = det(old matrix).
(iii) exchange ⟨i⟩ ↔ ⟨k⟩ (k ≠ i): det(new matrix) = − det(old matrix).

Remark 37. The requirement det(I) = 1 means that the “unit cube” in Rn defined by
the vectors e1 , e2 , . . . , en has volume 1.

Remark 38. In this definition, r can be any scalar, including zero. However, the scaling
in (i) is an elementary row operation only if r ≠ 0. So when a matrix is being reduced via
elementary row operations, then the determinant cannot change from nonzero to zero, or
from zero to nonzero.
We state the following result without proof. (The existence part is clear, but the
uniqueness part requires some work.)
Theorem 3.2. There exists exactly one determinant function on Rn×n .

Remark 39. The obvious way to compute det(A) is via row reduction. While reducing
the matrix A, all we need to do is to keep track of the factors arising from (i) and (iii).

Notation 40. The determinant det(A) is also written as |A|.

3.2. Properties of the determinant

Remark 41. If A has a zero row, then det(A) = 0. Namely, if ⟨i⟩ is zero, then a scaling
c⟨i⟩ → ⟨i⟩ with c = 0 does not change A, while the determinant gets multiplied by 0.

Theorem 3.3. Let A be an n × n matrix. Then det(A) ≠ 0 if and only if A is row


equivalent to the identity matrix In (and thus invertible).

Proof. If A ∼ B then |A| and |B| are either both zero or both nonzero. Let now B be the
reduced echelon form matrix for A. If B = In, then |B| = 1 and thus |A| ≠ 0. If B ≠ In,
then B has a zero row, implying |B| = 0 and thus |A| = 0. QED

Theorem 3.4. If A is an upper triangular matrix (meaning that Ai,j = 0 when i > j),
then
det(A) = A1,1 A2,2 · · · An,n . (3.3)

Proof. If Ai,i ≠ 0 for all i, then by scaling one row after another, we obtain
$$
\begin{vmatrix}
A_{1,1} & * & \cdots & *\\
0 & A_{2,2} & & \vdots\\
\vdots & & \ddots & *\\
0 & \cdots & 0 & A_{n,n}
\end{vmatrix}
= A_{1,1}\begin{vmatrix}
1 & * & \cdots & *\\
0 & A_{2,2} & & \vdots\\
\vdots & & \ddots & *\\
0 & \cdots & 0 & A_{n,n}
\end{vmatrix}
= A_{1,1}A_{2,2}\begin{vmatrix}
1 & * & \cdots & *\\
0 & 1 & & \vdots\\
\vdots & & \ddots & *\\
0 & \cdots & 0 & A_{n,n}
\end{vmatrix}
= \cdots
= A_{1,1}A_{2,2}\cdots A_{n,n}\begin{vmatrix}
1 & * & \cdots & *\\
0 & 1 & & \vdots\\
\vdots & & \ddots & *\\
0 & \cdots & 0 & 1
\end{vmatrix}.
\tag{3.4}
$$

The last determinant in this equation has the value 1, since the matrix can be reduced to
the identity via row operations of type (ii), which do not change the determinant.
If Ai,i = 0 for some i, then A is not reducible to the identity matrix, so |A| = 0 in
this case. QED

Remark 42. The above theorem shows that, in order to compute the determinant of a
general square matrix A, it suffices to reduce A to upper triangular form.
Example 43. Consider $A = \begin{bmatrix} a & b\\ c & d \end{bmatrix}$. The following shows that det(A) = ad − bc.
If a = c = 0, then A is not row equivalent to the identity matrix, so |A| = 0 = ad − bc.
Assume now that a ≠ 0 or c ≠ 0. If a ≠ 0, then a shear ⟨2⟩ − (c/a)⟨1⟩ → ⟨2⟩ transforms
$$
A = \begin{bmatrix} a & b\\ c & d \end{bmatrix}
\;\longrightarrow\;
A' = \begin{bmatrix} a & b\\ 0 & d - bc/a \end{bmatrix},
\tag{3.5}
$$
so we have |A| = |A′| = a(d − bc/a) = ad − bc. If c ≠ 0, then a row exchange ⟨1⟩ ↔ ⟨2⟩
followed by a shear ⟨2⟩ − (a/c)⟨1⟩ → ⟨2⟩ transforms
$$
A = \begin{bmatrix} a & b\\ c & d \end{bmatrix}
\;\longrightarrow\;
A' = \begin{bmatrix} c & d\\ a & b \end{bmatrix}
\;\longrightarrow\;
A'' = \begin{bmatrix} c & d\\ 0 & b - ad/c \end{bmatrix},
\tag{3.6}
$$
so we have |A| = −|A′| = −|A′′| = −c(b − ad/c) = ad − bc.



Example 44. With some more work than in the 2 × 2 case, one finds that
$$
\begin{vmatrix} a & b & c\\ d & e & f\\ g & h & i \end{vmatrix}
= aei + bfg + cdh - ceg - bdi - afh\,.
\tag{3.7}
$$

Remarks 45.
◦ There is no “analogous” formula for the determinant of matrices 4 × 4 or larger.
◦ There is something like a formula, called the “cofactor expansion”. It is useful for
proving certain theorems. And textbook authors seem to like it. But it has absolutely
no practical relevance for computing.
◦ For all but “toy matrices”, the most efficient way to compute determinants is via row
reduction.

Example 46.
$$
\begin{vmatrix}
1 & -1 & 5 & 1\\
-2 & 1 & -7 & 1\\
-3 & 2 & -12 & -2\\
2 & -1 & 9 & 1
\end{vmatrix}
= -\begin{vmatrix}
1 & -1 & 5 & 1\\
0 & 1 & -1 & -1\\
0 & -1 & 3 & 1\\
0 & -1 & 3 & 3
\end{vmatrix}
= -\begin{vmatrix}
1 & -1 & 5 & 1\\
0 & 1 & -1 & -1\\
0 & 0 & 2 & 0\\
0 & 0 & 0 & 2
\end{vmatrix}
= -4\,.
\tag{3.8}
$$
To get the first equality, we performed ⟨2⟩ + 2⟨1⟩ → ⟨2⟩, then ⟨3⟩ + 3⟨1⟩ → ⟨3⟩, then
⟨4⟩ − 2⟨1⟩ → ⟨4⟩, and finally ⟨2⟩ ↔ ⟨4⟩. To get the second equality, we performed ⟨4⟩ − ⟨3⟩ →
⟨4⟩ and ⟨3⟩ + ⟨2⟩ → ⟨3⟩. For the last equality we used Theorem 3.4.
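The same procedure is easy to automate (a sketch with exact arithmetic; it reduces to upper triangular form using only shears and row exchanges, then applies Theorem 3.4):

    # Determinant via row reduction: shears leave det unchanged, each row
    # exchange flips its sign, and the result is the product of the diagonal.
    from fractions import Fraction

    def det(M):
        A = [[Fraction(x) for x in row] for row in M]
        n, sign = len(A), 1
        for j in range(n):
            r = next((i for i in range(j, n) if A[i][j] != 0), None)
            if r is None:
                return Fraction(0)               # no pivot in this column: det = 0
            if r != j:
                A[j], A[r] = A[r], A[j]          # exchange (iii): sign flips
                sign = -sign
            for i in range(j + 1, n):            # shears (ii): det unchanged
                c = A[i][j] / A[j][j]
                A[i] = [a - c * b for a, b in zip(A[i], A[j])]
        p = Fraction(sign)
        for j in range(n):
            p *= A[j][j]                         # Theorem 3.4
        return p

    M = [[1, -1, 5, 1], [-2, 1, -7, 1],
         [-3, 2, -12, -2], [2, -1, 9, 1]]
    print(det(M))                                # -4, as in Example 46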
The following shows that a row operation is just left-multiplication by some matrix.

Lemma 3.5. Let ℛ be an elementary row operation acting on matrices with m rows.
Then there exists an m × m matrix R such that ℛ(A) = RA for every matrix A with m
rows. In fact, R is the following matrix.
(i) scaling r⟨i⟩ → ⟨i⟩ (r ≠ 0): R = [e1 · · · ei−1 rei ei+1 · · · em].
(ii) shear ⟨i⟩ + c⟨k⟩ → ⟨i⟩ (k ≠ i): R = [e1 · · · ek−1 ek + cei ek+1 · · · em].
(iii) exchange ⟨i⟩ ↔ ⟨k⟩ (k > i): R = [e1 · · · ei−1 ek ei+1 · · · ek−1 ei ek+1 · · · em].

Proof. Consider first row operations performed on vectors a ∈ Rm (or m × 1 matrices).
The row ⟨i⟩ is just the component ai of a. Looking at the matrices given above, it should
be clear that a ↦ Ra performs the indicated row operation.
Now consider A = [a1 a2 · · · an ] for arbitrary n ≥ 1. Using the same matrix R, we
have
RA = [Ra1 Ra2 · · · Ran ] . (3.9)
Given that R performs the indicated row operation on each column vector aj , it does the
same for the entire matrix A. QED

Remark 47. Notice that R = ℛ(I). So we know the determinant of R. If ℛ is a scaling
by r, then |R| = |ℛ(I)| = r|I| = r. If ℛ is a shear, then |R| = |ℛ(I)| = |I| = 1. And if ℛ
is a row exchange, then |R| = |ℛ(I)| = −|I| = −1.

Theorem 3.6. If A and B are n × n matrices, then |AB| = |A||B|.

Proof. Consider first the case where |A| = 0. Assume for contradiction that |AB| ≠ 0.
Then AB has an inverse (AB)−1. Now the identity AB(AB)−1 = I shows that A has an
inverse as well, namely B(AB)−1. But this contradicts |A| = 0, so we must have |AB| = 0.
Next, consider the case where A = I. Then |AB| = |IB| = |B| = 1|B| = |I||B| =
|A||B|, as claimed.
Finally, consider the case |A| ≠ 0 and A ≠ I. Then there exists a sequence of elementary
row operations ℛ1, ℛ2, . . . , ℛk that transforms the n × n identity matrix I into the
matrix A. So by Lemma 3.5, we have
$$
A = R_k \cdots R_2 R_1 I\,,
\tag{3.10}
$$
where Rj is the matrix for ℛj. Recall that applying ℛj to a matrix multiplies the determinant
by some number rj. Thus,
$$
|AB| = |(R_k \cdots R_2 R_1 I)B| = |R_k \cdots R_2 R_1 B|
= r_k \cdots r_2 r_1 |B| = |R_k \cdots R_2 R_1 I||B| = |A||B|\,.
\tag{3.11}
$$
QED

Corollary 3.7. If |A| ≠ 0, then A is invertible, and |A−1| = 1/|A|.

Proof. Assume that |A| ≠ 0. Then A is invertible by Theorem 3.3. Since A−1A = I, we
have |A−1||A| = |I| = 1 by Theorem 3.6. This proves the claim. QED

Theorem 3.8. If A is n × n, then |A⊤ | = |A|.

Proof. First consider the transpose of the matrix R for an elementary row operation
ℛ. The following is straightforward to check. If ℛ is a scaling or a row exchange, then
R⊤ = R. If R is the matrix for a shear ⟨i⟩ + c⟨k⟩ → ⟨i⟩, then R⊤ is the matrix for the
shear ⟨k⟩ + c⟨i⟩ → ⟨k⟩. And as observed in Remark 47, the determinant of a shear matrix
is 1. So in all cases, we have |R⊤| = |R|.
Given an n × n matrix A, consider its representation (3.10). Using Theorem 3.6 and
part (d) of Theorem 2.6, we have
$$
|A^{\top}| = |(R_k \cdots R_2 R_1)^{\top}| = |R_1^{\top} R_2^{\top} \cdots R_k^{\top}|
= |R_1^{\top}||R_2^{\top}| \cdots |R_k^{\top}| = |R_k| \cdots |R_2||R_1| = |R_k \cdots R_2 R_1| = |A|\,,
\tag{3.12}
$$
as claimed. QED
