Linear Algebra
The recommended textbook for our class is “Linear Algebra and its Applications”
by D.C. Lay, fourth edition. This book will be referred to as [Lay]. In order to simplify
cross-referencing, the sections and subsections in these notes correspond roughly to those
in [Lay]. Exceptions are marked.
1. Linear equations
Consider the following system of linear equations, shown together with its augmented matrix:
$$ \begin{aligned} 3x_1 + x_2 + 7x_3 + 2x_4 &= 13 \\ 2x_1 - 4x_2 + 14x_3 - x_4 &= -10 \\ 5x_1 + 11x_2 - 7x_3 + 8x_4 &= 59 \\ 2x_1 + 5x_2 - 4x_3 + 3x_4 &= 27 \end{aligned} \qquad \begin{bmatrix} 3 & 1 & 7 & 2 & : & 13 \\ 2 & -4 & 14 & -1 & : & -10 \\ 5 & 11 & -7 & 8 & : & 59 \\ 2 & 5 & -4 & 3 & : & 27 \end{bmatrix} \tag{1.1} $$
The “augmented matrix” on the right is simply an abbreviation for the system on the left,
where we keep just the coefficients and constants.
The task is to find all solutions of this system. We will take advantage of the fact
that the system can be modified without changing the set of solutions, using elementary
row operations.
Definition 1.1. An elementary row operation is one of the following.
(i) Multiply a row (equation) by a nonzero number.
(ii) Add a multiple of one row (equation) to another.
(iii) Exchange two rows (equations).
We will now perform a sequence of such elementary row operations to simplify the
system in a specific way. Starting from the left and moving right, one column after another
will be transformed to a reduced form. (If the entire column is zero, then it is considered
reduced and we move on to the next column.) We will refer to this column as the column
to be reduced. Starting from the top and moving down, one row after another will be
designated to be the pivot row.
At this point the pivot row is the first. The goal is to have a 1 in this row, in the
column (presently the first) to be reduced. Right now the entry is a 3. So we could multiply
the first row by 1/3. To avoid fractions, let us instead add −1 times the second row to the
first. After this step, we have
$$ \begin{aligned} x_1 + 5x_2 - 7x_3 + 3x_4 &= 23 \\ 2x_1 - 4x_2 + 14x_3 - x_4 &= -10 \\ 5x_1 + 11x_2 - 7x_3 + 8x_4 &= 59 \\ 2x_1 + 5x_2 - 4x_3 + 3x_4 &= 27 \end{aligned} \qquad \begin{bmatrix} 1 & 5 & -7 & 3 & : & 23 \\ 2 & -4 & 14 & -1 & : & -10 \\ 5 & 11 & -7 & 8 & : & 59 \\ 2 & 5 & -4 & 3 & : & 27 \end{bmatrix} \tag{1.2} $$
The leftmost 1 in the pivot row is called the pivot. (If the original matrix had a zero at
this position, then we would first exchange rows in order to get a nonzero entry into that
position.)
The next goal is to have all entries above and below the pivot (in the same column)
equal to zero. This can be done by first adding −2 times the first row to the second, then
adding −5 times the first row to the third, and finally adding −2 times the first row to the fourth.
After these three steps, we have
$$ \begin{aligned} x_1 + 5x_2 - 7x_3 + 3x_4 &= 23 \\ -14x_2 + 28x_3 - 7x_4 &= -56 \\ -14x_2 + 28x_3 - 7x_4 &= -56 \\ -5x_2 + 10x_3 - 3x_4 &= -19 \end{aligned} \qquad \begin{bmatrix} 1 & 5 & -7 & 3 & : & 23 \\ 0 & -14 & 28 & -7 & : & -56 \\ 0 & -14 & 28 & -7 & : & -56 \\ 0 & -5 & 10 & -3 & : & -19 \end{bmatrix} \tag{1.3} $$
The first column is now in reduced form. No subsequent row operation will modify it.
Now the second column becomes the one to be reduced. And the new pivot row is the
second, since the first row already has a pivot. So the new pivot position is (2, 2), where
there is presently an entry −14. The goal is to turn this into a 1. But first, let us add −1
times the second row to the third. Then the new third row becomes identically zero. (The
corresponding equation is 0 = 0.) Such rows are meant to be moved to the bottom. So let
us exchange the third row (now all zeros) with the fourth row. We still have a −14 where
the next pivot 1 should be. So let us subtract 3 times the third row from the second. Now
we have
$$ \begin{aligned} x_1 + 5x_2 - 7x_3 + 3x_4 &= 23 \\ x_2 - 2x_3 + 2x_4 &= 1 \\ -5x_2 + 10x_3 - 3x_4 &= -19 \\ 0 &= 0 \end{aligned} \qquad \begin{bmatrix} 1 & 5 & -7 & 3 & : & 23 \\ 0 & 1 & -2 & 2 & : & 1 \\ 0 & -5 & 10 & -3 & : & -19 \\ 0 & 0 & 0 & 0 & : & 0 \end{bmatrix} \tag{1.4} $$
Next, all non-pivot entries in the second column need to be converted to zero. This can be
done by adding −5 times the second row to the first, and then adding 5 times the second
row to the third. The resulting matrix (system) is
$$ \begin{aligned} x_1 + 3x_3 - 7x_4 &= 18 \\ x_2 - 2x_3 + 2x_4 &= 1 \\ 7x_4 &= -14 \\ 0 &= 0 \end{aligned} \qquad \begin{bmatrix} 1 & 0 & 3 & -7 & : & 18 \\ 0 & 1 & -2 & 2 & : & 1 \\ 0 & 0 & 0 & 7 & : & -14 \\ 0 & 0 & 0 & 0 & : & 0 \end{bmatrix} \tag{1.5} $$
Having pivot 1s in the first two rows, the new pivot row becomes the third. The
column to be reduced is the third. But there is no way to get a pivot 1 in this column. So
the third column is considered in reduced form, and the fourth becomes the new column
to be reduced. The entry that needs to be converted to a 1 is 7. So we multiply the third
row by 1/7. The new third row is [0 0 0 1 : −2]. Now add −2 times this row to the second
row, and then add 7 times this row to the first row. The result is
$$ \begin{aligned} x_1 + 3x_3 &= 4 \\ x_2 - 2x_3 &= 5 \\ x_4 &= -2 \\ 0 &= 0 \end{aligned} \qquad \begin{bmatrix} 1 & 0 & 3 & 0 & : & 4 \\ 0 & 1 & -2 & 0 & : & 5 \\ 0 & 0 & 0 & 1 & : & -2 \\ 0 & 0 & 0 & 0 & : & 0 \end{bmatrix} \tag{1.6} $$
Definition 1.3. xj is said to be a free variable if the j-th column of the reduced echelon
matrix contains no pivot 1.
In the system (1.6) we have a single free variable, namely x3 . The reason why x3 is
called a free variable becomes clear if we write (1.6) as
$$ x_1 = 4 - 3x_3 , \qquad x_2 = 5 + 2x_3 , \qquad x_4 = -2 . \tag{1.7} $$
We see immediately that the value of x3 can be chosen arbitrarily. After that, the values
of the other variables (those that have a pivot 1 in their column) are determined.
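Reductions like the one above are easy to double-check with a computer algebra system. The following is a minimal sketch in Python using the sympy library (an outside tool, not part of [Lay] or these notes); Matrix.rref returns the reduced echelon form together with the indices of the pivot columns.
\begin{verbatim}
# Sketch: verify the reduction (1.1) ~ (1.6) with sympy (assumed installed).
from sympy import Matrix

# Augmented matrix of the original system (1.1).
M = Matrix([[3,  1,  7,  2,  13],
            [2, -4, 14, -1, -10],
            [5, 11, -7,  8,  59],
            [2,  5, -4,  3,  27]])

R, pivots = M.rref()
# R matches (1.6), and pivots == (0, 1, 3): column 3 (the x3 column,
# counting from 0) has no pivot, so x3 is the free variable.
assert pivots == (0, 1, 3)
\end{verbatim}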
We give the following result without proof.
Theorem 1.4. Every matrix can be transformed to reduced echelon form via elementary
row operations. The reduced echelon form is unique.
The uniqueness part of this theorem allows us to characterize precisely what the
solution sets can be.
Possible solution sets. (The statements refer to the reduced echelon form.)
(1) The system has exactly one solution.
This is the case when every column to the left of “:” has a pivot.
(0) The system has no solution.
This happens if one of the rows is [0 0 . . . 0 : b] with b ≠ 0. The corresponding system
includes the equation 0 = b. (In the reduced echelon form we have b = 1. But if we
obtain 0 = b during reduction, with b ≠ 0, then there is no point in continuing.)
Example 3. A system with infinitely many solutions is the one considered at the beginning:
$$ \begin{bmatrix} 3 & 1 & 7 & 2 & : & 13 \\ 2 & -4 & 14 & -1 & : & -10 \\ 5 & 11 & -7 & 8 & : & 59 \\ 2 & 5 & -4 & 3 & : & 27 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 3 & 0 & : & 4 \\ 0 & 1 & -2 & 0 & : & 5 \\ 0 & 0 & 0 & 1 & : & -2 \\ 0 & 0 & 0 & 0 & : & 0 \end{bmatrix} . \tag{1.10} $$
Remark 4. Notice that every elementary row operation can be reversed by another
elementary row operation (of the same type). So if A can be transformed into B
via a sequence of elementary row operations, then B can also be transformed into A via a
sequence of elementary row operations.
A vector with m components is written as a column:
$$ u = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_m \end{bmatrix} , \qquad v = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_m \end{bmatrix} , \qquad 0 = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} . $$
The set of all such vectors is denoted by Rm . The last vector in this equation is called the
zero vector in Rm . One defines vector addition, multiplication by scalars, and the negative
of a vector in Rm as follows:
$$ u + v = \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_m + v_m \end{bmatrix} , \qquad cv = \begin{bmatrix} cv_1 \\ cv_2 \\ \vdots \\ cv_m \end{bmatrix} , \qquad -v = \begin{bmatrix} -v_1 \\ -v_2 \\ \vdots \\ -v_m \end{bmatrix} . \tag{1.12} $$
For example,
$$ 2 \begin{bmatrix} -1 \\ 3 \end{bmatrix} - \begin{bmatrix} 3 \\ -2 \end{bmatrix} = \begin{bmatrix} -2 \\ 6 \end{bmatrix} + \begin{bmatrix} -3 \\ 2 \end{bmatrix} = \begin{bmatrix} -5 \\ 8 \end{bmatrix} . \tag{1.13} $$
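Since the operations (1.12) act componentwise, they match the componentwise array operations of numerical software. A small sketch in Python with numpy (an assumption; any similar tool would do) reproduces (1.13):
\begin{verbatim}
# Sketch: the componentwise operations (1.12), checked against (1.13).
import numpy as np

u = np.array([-1, 3])
v = np.array([3, -2])
assert (2*u - v == np.array([-5, 8])).all()
\end{verbatim}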
Notice that these basic vector operations work independently on each component:
the k-th component of the result depends only on the k-th component(s) of the original
vector(s). So one performs basically m independent scalar operations. Using this fact, one
easily verifies the following vector space properties.
Theorem 1.6. For any vectors u, v, w in Rm , and for any real numbers c, d,
(1) u + v = v + u.
(2) (u + v) + w = u + (v + w).
(3) u + 0 = 0 + u = u.
(4) u + (−u) = (−u) + u = 0.
(5) c(u + v) = cu + cv.
(6) (c + d)u = cu + du.
(7) c(du) = (cd)u.
(8) 1u = u.
There are two basic but fundamental concepts in linear algebra. Both concern sets
of vectors. The first is the notion of spanning, and the second is the notion of linear
independence (covered later).
The span of a set A = {a1, a2, . . . , an} of vectors in Rm is the set of all vectors of the form
$$ x_1 a_1 + x_2 a_2 + \ldots + x_n a_n \tag{1.14} $$
for some numbers x1, x2, . . . , xn. Such a vector (1.14) is also called a linear combination
of the vectors in A.
For vectors in R3, a single nonzero vector a1 spans a line; and when combined with a
second vector a2 that does not lie on this line, the two vectors span a plane:
[Figure (1.15): Span {a1} (a line) and Span {a1, a2} (a plane).]
In order to determine whether some given vector b ∈ R3 lies in the span of {a1 , a2 }, we
have to find numbers x1 and x2 , if possible, such that
$$ x_1 a_1 + x_2 a_2 = b . \tag{1.18} $$
Say $b = \begin{bmatrix} 4 \\ 1 \\ -4 \end{bmatrix}$. Then the equation for x1 and x2 is
$$ x_1 \begin{bmatrix} 1 \\ -2 \\ -5 \end{bmatrix} + x_2 \begin{bmatrix} 2 \\ 5 \\ 6 \end{bmatrix} = \begin{bmatrix} 4 \\ 1 \\ -4 \end{bmatrix} , \tag{1.19} $$
or
$$ \begin{bmatrix} x_1 + 2x_2 \\ -2x_1 + 5x_2 \\ -5x_1 + 6x_2 \end{bmatrix} = \begin{bmatrix} 4 \\ 1 \\ -4 \end{bmatrix} , \tag{1.20} $$
or
$$ \begin{aligned} x_1 + 2x_2 &= 4 , \\ -2x_1 + 5x_2 &= 1 , \\ -5x_1 + 6x_2 &= -4 . \end{aligned} \tag{1.21} $$
This system can be solved by reducing the augmented matrix
$$ [a_1 \; a_2 : b] = \begin{bmatrix} 1 & 2 & : & 4 \\ -2 & 5 & : & 1 \\ -5 & 6 & : & -4 \end{bmatrix} . \tag{1.22} $$
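As a quick check (again with sympy, an outside assumption), reducing (1.22) gives pivots only in the first two columns, so the system is consistent with x1 = 2, x2 = 1, and b does lie in Span {a1, a2}:
\begin{verbatim}
# Sketch: reduce the augmented matrix [a1 a2 : b] from (1.22).
from sympy import Matrix

M = Matrix([[ 1, 2,  4],
            [-2, 5,  1],
            [-5, 6, -4]])
R, pivots = M.rref()
# The last column holds no pivot, so the system is consistent:
# x1 = 2, x2 = 1, i.e. b = 2*a1 + 1*a2.
assert R == Matrix([[1, 0, 2], [0, 1, 1], [0, 0, 0]])
assert pivots == (0, 1)
\end{verbatim}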
The product of an m × n matrix A = [a1 a2 · · · an] with a vector x ∈ Rn is the vector
$$ Ax = x_1 a_1 + x_2 a_2 + \ldots + x_n a_n \in \mathbb{R}^m . \tag{1.25} $$
The equation Ax = b can thus be written as
$$ x_1 a_1 + x_2 a_2 + \ldots + x_n a_n = b , \tag{1.28} $$
which is the same as the system of linear equations represented by the augmented matrix
$$ [a_1 \; a_2 \; \cdots \; a_n : b] . \tag{1.29} $$
The homogeneous equation
$$ x_1 a_1 + x_2 a_2 + \ldots + x_n a_n = Ax = 0 \tag{1.30} $$
always has the trivial solution x = 0. It has a nontrivial solution x ≠ 0 if and only if the
system [a1 a2 · · · an : 0] has a free variable.
Example 8. The equation $\begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$ only has the solution x = 0, since
$$ \begin{bmatrix} 1 & 2 & : & 0 \\ 2 & 3 & : & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & 2 & : & 0 \\ 0 & -1 & : & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & : & 0 \\ 0 & 1 & : & 0 \end{bmatrix} . \tag{1.31} $$
Example 9. The equation $\begin{bmatrix} 3 & 1 & 0 & 1 \\ 0 & -2 & 12 & -8 \\ 2 & -3 & 22 & -14 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$ has nontrivial solutions, since
$$ \begin{bmatrix} 3 & 1 & 0 & 1 & : & 0 \\ 0 & -2 & 12 & -8 & : & 0 \\ 2 & -3 & 22 & -14 & : & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & 4 & -22 & 15 & : & 0 \\ 0 & -2 & 12 & -8 & : & 0 \\ 0 & -11 & 66 & -44 & : & 0 \end{bmatrix} \sim \cdots \sim \begin{bmatrix} 1 & 0 & 2 & -1 & : & 0 \\ 0 & 1 & -6 & 4 & : & 0 \\ 0 & 0 & 0 & 0 & : & 0 \end{bmatrix} . \tag{1.32} $$
Here we have two free variables, namely x3 and x4 . After choosing arbitrary values x3 = s
and x4 = t for these free variables, we obtain
$$ \begin{aligned} x_1 &= -2s + t , \\ x_2 &= 6s - 4t , \\ x_3 &= s , \\ x_4 &= t , \end{aligned} \qquad x = s \begin{bmatrix} -2 \\ 6 \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} 1 \\ -4 \\ 0 \\ 1 \end{bmatrix} . \tag{1.33} $$
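The parametrization (1.33) is easy to test numerically; here is a sketch with numpy (an assumption, as before):
\begin{verbatim}
# Sketch: every x = s*v1 + t*v2 from (1.33) solves Ax = 0 (Example 9).
import numpy as np

A = np.array([[3,  1,  0,   1],
              [0, -2, 12,  -8],
              [2, -3, 22, -14]])
v1 = np.array([-2, 6, 1, 0])   # coefficient vector of s
v2 = np.array([ 1, -4, 0, 1])  # coefficient vector of t

for s, t in [(1.0, 0.0), (0.0, 1.0), (2.5, -3.0)]:
    assert np.allclose(A @ (s*v1 + t*v2), 0)
\end{verbatim}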
Theorem 1.10. Let A be an m × n matrix. Then for all vectors u, v in Rn and every scalar c,
(a) A(u + v) = Au + Av.
(b) A(cu) = c(Au).
Proof. Denote by uj and vj the components of u and v, respectively. Then the components
of u + v are uj + vj . By using the vector space properties in Theorem 1.6, we obtain
$$ A(u + v) = (u_1 + v_1)a_1 + \ldots + (u_n + v_n)a_n = (u_1 a_1 + \ldots + u_n a_n) + (v_1 a_1 + \ldots + v_n a_n) = Au + Av . $$
This proves (a). Concerning (b), notice that the components of cu are cuj . Using again
Theorem 1.6, we obtain
$$ A(cu) = (cu_1)a_1 + \ldots + (cu_n)a_n = c\,(u_1 a_1 + \ldots + u_n a_n) = c\,(Au) . $$
This proves (b). QED
This implies the following important fact about the solution set of homogeneous equations.
Theorem 1.11. If x = u and x = v are solutions of the homogeneous equation Ax = 0,
then so is every linear combination x = su + tv.
Proof. If x = su + tv, then by Theorem 1.10,
$$ Ax = A(su) + A(tv) = s(Au) + t(Av) = s0 + t0 = 0 , $$
as claimed. QED
Inhomogeneous equations
Theorem 1.12. Assume that the equation Ax = b has a solution x = p. (The letter “p”
stands for “particular” solution.) Then the solution set of the equation Ax = b is precisely
the set of all vectors x = p+h, where h is a solution of the homogeneous equation Ax = 0.
[Figure (1.38): the solution set of Ax = b is the solution set of Ax = 0, translated by a particular solution p.]
Proof. Consider A = [a1 a2 · · · an ]. The reduced echelon matrix for A can have at most
n pivots. If n < m then it has fewer than m pivots, meaning that there cannot be a pivot
in every row. QED
Example 12. The vectors $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ and $\begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}$ do not span R3.
Proof. Write A = [a1 a2 · · · an ]. The equivalence of (a) and (b) follows from the fact that
the equation x1 a1 + x2 a2 + . . . + xn an = 0 is the same as Ax = 0. And the equivalence of
(b) and (c) follows from the fact that the system [A : 0] has a unique solution if and only
if it has no free variable. QED
Proof. Consider A = [a1 a2 · · · an ]. If n > m then the system Ax = 0 has more variables
than equations. So there must be a free variable; in other words, the equation Ax = 0 has
a nontrivial solution. QED
Example 15. The vectors $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$, $\begin{bmatrix} 2 \\ 3 \end{bmatrix}$, and $\begin{bmatrix} 3 \\ 4 \end{bmatrix}$ are linearly dependent.
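Linear dependence can be tested the same way as in Examples 8 and 9: reduce the matrix of the homogeneous system and look for a free variable. A sketch with sympy (assumed available):
\begin{verbatim}
# Sketch: the three vectors of Example 15 give only two pivots,
# so [a1 a2 a3 : 0] has a free variable (linear dependence).
from sympy import Matrix

A = Matrix([[1, 2, 3],
            [2, 3, 4]])
R, pivots = A.rref()
assert pivots == (0, 1)   # the x3 column is pivot-free
\end{verbatim}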
Then we know the image under T of every vector x ∈ R2. To see why, write x as a linear
combination $x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = x_1 \begin{bmatrix} 1 \\ 0 \end{bmatrix} + x_2 \begin{bmatrix} 0 \\ 1 \end{bmatrix}$. Using that T is linear, we find that
$$ T(x) = T\left( x_1 \begin{bmatrix} 1 \\ 0 \end{bmatrix} + x_2 \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right) = x_1 T\begin{bmatrix} 1 \\ 0 \end{bmatrix} + x_2 T\begin{bmatrix} 0 \\ 1 \end{bmatrix} = x_1 a_1 + x_2 a_2 . \tag{1.48} $$
Proof. Set aj = T (ej ) for all j. Here, and in what follows, we assume that 1 ≤ j ≤ n.
By (1.46), we have
$$ \begin{aligned} T(x) &= T(x_1 e_1 + x_2 e_2 + \ldots + x_n e_n) \\ &= x_1 T(e_1) + x_2 T(e_2) + \ldots + x_n T(e_n) \\ &= x_1 a_1 + x_2 a_2 + \ldots + x_n a_n = Ax . \end{aligned} \tag{1.51} $$
Remark 19. The spaces Rm for different values of m have no vectors in common. Thus,
whenever confusion could arise, the zero vector in Rm will be denoted by 0m .
Proof. Assume first that T is one-to-one. Then the equation T (x) = b has at most one
solution x ∈ Rn , for any given b ∈ Rm . In the case b = 0m , one solution is x = 0n , so
this must be the only solution.
Conversely, assume that 0n is the only vector in Rn that is mapped to 0m by T .
Consider now the equation T (x) = b for a given b ∈ Rm . If x = u and x = v are both
solutions of this equation, then by linearity, T (u − v) = T (u) − T (v) = b − b = 0m . So
we must have u − v = 0n , or equivalently, u = v. This shows that T is one-to-one. QED
Proof. The claim (a) follows from Definition 1.8, since the range of T is the span of
the column vectors of A.
Consider now the claim (b). By Theorem 1.22, T is one-to-one if and only if the
equation Ax = 0 has only the trivial solution x = 0, which is the same as saying that the
column vectors of A are linearly independent. QED
Reducing $\begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}$ shows that the first two columns of A span R2. So the column vectors of
A span R2 , implying that T is onto. But T is not one-to-one, since three vectors in R2 are
always linearly dependent, by Corollary 1.17.
2. Matrix algebra
$$ A + B = [a_1 + b_1 \;\; a_2 + b_2 \;\; \cdots \;\; a_n + b_n] , \qquad cB = [cb_1 \;\; cb_2 \;\; \cdots \;\; cb_n] , \qquad -B = [-b_1 \;\; -b_2 \;\; \cdots \;\; -b_n] , \tag{2.4} $$
where c can be any scalar. Clearly 0B = 0 and (−1)B = −B. One also defines A − B =
A + (−B).
Example 23. Here are some operations with 2 × 2 matrices.
$$ 2 \begin{bmatrix} -1 & 3 \\ 2 & 0 \end{bmatrix} - \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} -2 & 6 \\ 4 & 0 \end{bmatrix} + \begin{bmatrix} -3 & 2 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} -5 & 8 \\ 5 & -1 \end{bmatrix} . \tag{2.5} $$
The operations (2.4) are just vector operations, performed independently on each
column. The j-th column of the resulting matrix only depends on the j-th column(s) of
the original matrix (matrices). Using this fact, one easily verifies the following.
Theorem 2.1. For any m × n matrices A,B,C, and for any scalars c, d,
(1) A + B = B + A.
(2) (A + B) + C = A + (B + C).
(3) A + 0 = 0 + A = A.
(4) A + (−A) = (−A) + A = 0.
(5) c(A + B) = cA + cB.
(6) (c + d)A = cA + dA.
(7) c(dA) = (cd)A.
(8) 1A = A.
We will get back to these vector space properties later. More important at this point
are properties that arise from the fact that matrices represent linear transformations.
Consider an n × p matrix B = [b1 b2 · · · bp ] and an m × n matrix A. The matrix B
defines a transformation from Rp to Rn , and the matrix A defines a transformation from
Rn to Rm . The two transformations can be composed:
$$ x \in \mathbb{R}^p \xmapsto{\;B\;} Bx \in \mathbb{R}^n \xmapsto{\;A\;} A(Bx) \in \mathbb{R}^m . \tag{2.6} $$
$$ A(Bx) = A(x_1 b_1 + x_2 b_2 + \ldots + x_p b_p) = x_1 Ab_1 + x_2 Ab_2 + \ldots + x_p Ab_p = [Ab_1 \;\; Ab_2 \;\; \cdots \;\; Ab_p]\, x . \tag{2.7} $$
Identifying a 1 × 1 matrix [c] with the number c, we will also write this as
$$ [a_1 \; a_2 \; \cdots \; a_n] \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} = a_1 b_1 + a_2 b_2 + \ldots + a_n b_n . $$
Next, consider the claim (b). Using the definitions (2.4) and (2.8), together with
Theorem 1.10, we obtain
The remaining identities are equally straightforward to prove, so we leave this task as
an exercise. QED
Warning 26. If AB is well defined, then BA need not be defined. For example, if A is
4 × 2 and B is 2 × 3, then AB is 4 × 3, but BA is not defined. Even if AB and BA are
both defined, they can have different dimensions. For example if A is 3 × 2 and B is 2 × 3,
then AB is 3 × 3 and BA is 2 × 2. Even if AB and BA are both defined and have the
same dimensions, then AB and BA need not agree!
Remark 27. The property (a) in Theorem 2.3 justifies writing ABC in place of A(BC)
or (AB)C.
Proof. Assume first that x ↦ ABx is onto. Let z be any vector in Rn . Then the equation
ABx = z has a solution x ∈ Rp . Setting y = Bx, we have Ay = ABx = z. This shows
that the transformation y ↦ Ay is onto.
Next, assume that x ↦ BCx is one-to-one. Let x ∈ Rq be a solution of Cx = 0p .
Then BCx = B0p = 0n . Since x ↦ BCx is one-to-one, we must have x = 0q . This shows
that the transformation x ↦ Cx is one-to-one. QED
for all positive i ≤ m and j ≤ n. This is completely analogous to the vector operations
defined in (1.12).
First, let us consider the product Ax of an m × n matrix A with a vector x ∈ Rn .
The i-th component of Ax is given by
$$ (Ax)_i = (x_1 a_1 + x_2 a_2 + \ldots + x_n a_n)_i = x_1 (a_1)_i + x_2 (a_2)_i + \ldots + x_n (a_n)_i . \tag{2.16} $$
In other words, the (i, j) entry of AB is the product of the i-th row of A with the j-th
column of B.
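This row-times-column description translates directly into code. The following sketch (numpy assumed, with arbitrary shapes chosen for illustration) computes AB entry by entry and compares it with the built-in product:
\begin{verbatim}
# Sketch: (AB)_{i,j} = (i-th row of A) times (j-th column of B).
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(4, 3))   # 4 x 3
B = rng.integers(-5, 5, size=(3, 2))   # 3 x 2

C = np.zeros((4, 2), dtype=int)
for i in range(4):
    for j in range(2):
        C[i, j] = A[i, :] @ B[:, j]

assert (C == A @ B).all()
\end{verbatim}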
$$ A^k = A A \cdots A \quad (k \text{ factors}) . \tag{2.24} $$
$$ x \xmapsto{\;T\;} Ax \xmapsto{\;T\;} A^2 x \xmapsto{\;T\;} \cdots \xmapsto{\;T\;} A^k x . \tag{2.25} $$
Definition 2.5. The transpose of an m × n matrix A is the n × m matrix A⊤ whose elements
are $(A^\top)_{i,j} = A_{j,i}$ for all positive i ≤ n and j ≤ m.
Example 30.
$$ A = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix} , \qquad A^\top = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} . \tag{2.28} $$
Remark 31. It will become clear later what the significance of the transpose is. For
now, we use the transpose mainly in examples, and to simplify some notation.
Theorem 2.6. If A and C are any m × n matrices, B is any n × p matrix, and c any
scalar, then
(a) (A⊤ )⊤ = A.
(b) (A + C)⊤ = A⊤ + C ⊤ .
(c) (cA)⊤ = cA⊤ .
(d) (AB)⊤ = B⊤A⊤.
Proof. The first three properties are straightforward to check and intuitively clear, so we
leave them as exercises.
Consider now (d). For all positive i ≤ p and j ≤ m, we have
$$ \bigl((AB)^\top\bigr)_{i,j} = (AB)_{j,i} = A_{j,1}B_{1,i} + \ldots + A_{j,n}B_{n,i} = (B^\top)_{i,1}(A^\top)_{1,j} + \ldots + (B^\top)_{i,n}(A^\top)_{n,j} = (B^\top A^\top)_{i,j} , $$
as claimed. QED
Proof. We already know from Theorem 1.24 that (a) is equivalent to (c) and that (b) is
equivalent to (d). The equivalence of (e) and (f) will be proved later in this section.
In order to show that (c), (d) and (e) are equivalent, let B be the reduced echelon form
of A. By Theorem 1.13, the column vectors of A span Rm if and only if B has a pivot in
every row. And by Theorem 1.16, the column vectors of A are linearly independent if and
only if B has a pivot in every column. But B has the same number of rows as columns,
so having a pivot in each column is equivalent to having a pivot in each row. And in both
cases, B = I. QED
Before we can define “the inverse” of an invertible matrix, we need to prove that there
cannot be more than one.
Proposition 2.9. There is at most one matrix C that satisfies (2.32).
Proof. Assume that both C and C′ satisfy (2.32). Then C = C(AC′) = (CA)C′ = C′. QED
Example 33. The matrix A in Example 18 satisfies A3 = I, as can be seen from the
corresponding transformation T . Thus, it is invertible, and A−1 = A2 .
Notice that the products CA and AC need not agree for general n × n matrices. But
if one is the identity matrix, then so is the other:
Lemma 2.11. Let A and C be n × n matrices. Then CA = I if and only if AC = I.
Proof. Assume first that AC = I. Then x ↦ ACx is the identity map on Rn , and in
particular, it is onto. So x ↦ Ax is onto, by Lemma 2.4. And it is one-to-one as well, by
Theorem 2.8.
Using Theorems 2.1 and 2.3, we have
$$ A\bigl((CA - I)x\bigr) = (AC)(Ax) - Ax = I(Ax) - Ax = 0 $$
for every x ∈ Rn . Since x ↦ Ax is one-to-one, this implies that (CA − I)x = 0 for every
x ∈ Rn . This in turn implies that CA − I is the zero matrix, since the linear transformation
that maps every vector to 0 is represented by the zero matrix. But CA − I = 0 implies
CA = I.
The above proves that AC = I implies CA = I. The converse is obtained by exchanging
the matrices A and C. QED
Proof. Consider first the claim (1). Assume that ad − bc = 0. If A = 0 then A is clearly not
invertible, since there is no 2 × 2 matrix C such that CA = I. Consider now the case A ≠ 0.
Then one of the vectors $\begin{bmatrix} d \\ -c \end{bmatrix}$ or $\begin{bmatrix} -b \\ a \end{bmatrix}$ is nonzero. Let x be one of the two that is nonzero.
An explicit computation shows that Ax = 0. But then (CA)x = C(Ax) = C0 = 0 holds
for any 2 × 2 matrix C. So we cannot have CA = I; otherwise (CA)x = Ix = x would be
nonzero.
Verifying claim (2) is a trivial computation: define C to be the right hand side of
(2.36), and then check that AC = I. QED
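For concreteness, here is a quick numerical check of the inverse formula (2.36), which for a 2 × 2 matrix with ad − bc ≠ 0 is $C = \frac{1}{ad-bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$ (numpy assumed; the entries below are chosen arbitrarily):
\begin{verbatim}
# Sketch: C = (1/(ad-bc)) [[d, -b], [-c, a]] satisfies AC = I.
import numpy as np

a, b, c, d = 2.0, 7.0, 1.0, 4.0        # ad - bc = 1
A = np.array([[a, b], [c, d]])
C = np.array([[d, -b], [-c, a]]) / (a*d - b*c)
assert np.allclose(A @ C, np.eye(2))
\end{verbatim}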
Remark 34. (basic logic) The fact that the formula (2.36) makes no sense if ad − bc = 0
does not prove (1).
Here are a few basic facts about inverses.
Theorem 2.13. Let A be an invertible n × n matrix. Then for every b ∈ Rn , the equation
Ax = b has a unique solution, namely x = A−1 b.
This theorem is very useful in practice, if the equation Ax = b has to be solved for
many different vectors b. If A is invertible and A−1 is known, then it is much faster to
compute A−1 b than to solve Ax = b.
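In code, this means one computes A−1 once and then reuses it for each new right-hand side; a minimal numpy sketch (an assumption, not part of the notes):
\begin{verbatim}
# Sketch: with A^{-1} computed once, each new right-hand side b costs
# only one matrix-vector product (Theorem 2.13).
import numpy as np

A = np.array([[ 1., -4.,  1.],
              [ 1.,  1., -2.],
              [-1.,  1.,  1.]])       # an invertible 3 x 3 matrix
Ainv = np.linalg.inv(A)

for b in np.eye(3):                   # several right-hand sides
    x = Ainv @ b
    assert np.allclose(A @ x, b)      # x solves Ax = b
\end{verbatim}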
Theorem 2.14. Let A and B be invertible n × n matrices. Then
(a) A−1 is invertible, and (A−1)−1 = A.
(b) AB is invertible, and (AB)−1 = B−1A−1.
(c) A⊤ is invertible, and (A⊤)−1 = (A−1)⊤.
Proof. The claim (a) follows from (2.34). The following proves (b).
$$ (AB)\bigl(B^{-1}A^{-1}\bigr) = A\bigl(B\bigl(B^{-1}A^{-1}\bigr)\bigr) = A\bigl(\bigl(BB^{-1}\bigr)A^{-1}\bigr) = A\bigl(IA^{-1}\bigr) = AA^{-1} = I . \tag{2.37} $$
QED
Remark 35. If the expression in (b) for the inverse of a product looks unfamiliar, then
the following may help to understand it:
$$ x \xmapsto{\;B\;} Bx \xmapsto{\;A\;} z = ABx , \qquad z \xmapsto{\;A^{-1}\;} A^{-1}z \xmapsto{\;B^{-1}\;} B^{-1}A^{-1}z = x . \tag{2.39} $$
To find the inverse C = [c1 c2 · · · cn ] of A, we need to solve
$$ AC = I , \qquad [Ac_1 \;\; Ac_2 \;\; \cdots \;\; Ac_n] = [e_1 \;\; e_2 \;\; \cdots \;\; e_n] . \tag{2.40} $$
Equivalently,
$$ Ac_1 = e_1 , \quad Ac_2 = e_2 , \quad \ldots , \quad Ac_n = e_n . \tag{2.41} $$
If A is row equivalent to the identity, then we can solve via row reduction:
[A : e1 ] [A : e2 ] ... [A : en ]
↓ ↓ ↓ (2.42)
[ I : c1 ] [ I : c2 ] ... [ I : cn ] .
Notice that the choice of row operations in the Gauss-Jordan algorithm only depends on
the part to the left of “:” in the augmented matrix. So all n reductions in (2.42) use the
same sequence of elementary row operations. This allows us to combine them all into one:
[A : I ]
↓ (2.43)
[ I : C] .
Given that the resulting matrix C is a solution of the equation AC = I, we find that A is
invertible, and that A−1 = C.
Example 36. With a bit of work one finds
" # " #
1 −4 1 : 1 0 0 1 0 0 : 3 5 7
1 1 −2 : 0 1 0 ∼ ··· ∼ 0 1 0 : 1 2 3 .
−1 1 1 :0 0 1 0 0 1 : 2 3 5 (2.44)
| {z } | {z }
A A−1
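The Gauss-Jordan scheme (2.43) is short to implement. Here is a sketch in Python (numpy assumed), which adds partial pivoting so that a zero pivot triggers a row exchange; it assumes A is invertible:
\begin{verbatim}
# Sketch of (2.43): reduce [A : I] to [I : C]; then C = A^{-1}.
import numpy as np

def gauss_jordan_inverse(A):
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])   # [A : I]
    for j in range(n):
        p = j + np.argmax(np.abs(M[j:, j]))       # row exchange (iii)
        M[[j, p]] = M[[p, j]]
        M[j] /= M[j, j]                           # scaling (i)
        for i in range(n):
            if i != j:
                M[i] -= M[i, j] * M[j]            # shears (ii)
    return M[:, n:]                               # the C in [I : C]

A = np.array([[1, -4, 1], [1, 1, -2], [-1, 1, 1]])
assert np.allclose(gauss_jordan_inverse(A),
                   [[3, 5, 7], [1, 2, 3], [2, 3, 5]])  # matches (2.44)
\end{verbatim}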
3. Determinants
This part is short. Determinants are interesting but of limited practical use.
Third, if one of the vectors is scaled by a constant r, say rαi → αi , then the volume should
change by a factor |r|.
These are precisely the elementary row operations from Definition 1.1. This motivates
the following definition. In order to simplify notation, we write ⟨i⟩ for the i-th row of A.
Definition 3.1. Denote by Rn×n the set of all n × n matrices. A function det : Rn×n → R
is called a determinant if it satisfies det(I) = 1 and transforms as follows under scalings,
shears, and row exchanges:
(i) scaling r⟨i⟩ → ⟨i⟩: det(new matrix) = r det(old matrix).
(ii) shear ⟨i⟩ + c⟨k⟩ → ⟨i⟩ (k ≠ i): det(new matrix) = det(old matrix).
(iii) exchange ⟨i⟩ ↔ ⟨k⟩ (k ≠ i): det(new matrix) = − det(old matrix).
Remark 37. The requirement det(I) = 1 means that the “unit cube” in Rn defined by
the vectors e1 , e2 , . . . , en has volume 1.
Remark 38. In this definition, r can be any scalar, including zero. However, the scaling
in (i) is an elementary row operation only if r 6= 0. So when a matrix is being reduced via
elementary row operations, then the determinant cannot change from nonzero to zero, or
from zero to nonzero.
We state the following result without proof. (The existence part is clear, but the
uniqueness part requires some work.)
Theorem 3.2. There exists exactly one determinant function on Rn×n .
Remark 39. The obvious way to compute det(A) is via row reduction. While reducing
the matrix A, all we need to do is to keep track of the factors arising from (i) and (iii).
Remark 41. If A has a zero row, then det(A) = 0. Namely, if ⟨i⟩ is zero, then a scaling
c⟨i⟩ → ⟨i⟩ with c = 0 does not change A, while the determinant gets multiplied by 0.
Theorem 3.3. An n × n matrix A is invertible if and only if |A| ≠ 0.
Proof. If A ∼ B then |A| and |B| are either both zero or both nonzero. Now let B be the
reduced echelon form matrix for A. If B = In , then |B| = 1 and thus |A| ≠ 0. If B ≠ In ,
then B has a zero row, implying |B| = 0 and thus |A| = 0. QED
Theorem 3.4. If A is an upper triangular matrix (meaning that Ai,j = 0 when i > j),
then
det(A) = A1,1 A2,2 · · · An,n . (3.3)
Proof. If Ai,i ≠ 0 for all i, then by scaling one row after another, we obtain
$$ \det(A) = A_{1,1} A_{2,2} \cdots A_{n,n} \det \begin{bmatrix} 1 & * & \cdots & * \\ 0 & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & * \\ 0 & \cdots & 0 & 1 \end{bmatrix} . $$
The last determinant in this equation has the value 1, since the matrix can be reduced to
the identity via row operations of type (ii), which do not change the determinant.
If Ai,i = 0 for some i, then A is not reducible to the identity matrix, so |A| = 0 in
this case. QED
Remark 42. The above theorem shows that, in order to compute the determinant of a
general square matrix A, it suffices to reduce A to upper triangular form.
Example 43. Consider $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. The following shows that det(A) = ad − bc.
If a = c = 0, then A is not row equivalent to the identity matrix, so |A| = 0 = ad − bc.
Assume now that a ≠ 0 or c ≠ 0. If a ≠ 0, then a shear ⟨2⟩ − (c/a)⟨1⟩ → ⟨2⟩ transforms
$$ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \longrightarrow A' = \begin{bmatrix} a & b \\ 0 & d - bc/a \end{bmatrix} , \tag{3.5} $$
so we have |A| = |A′| = a(d − bc/a) = ad − bc. If c ≠ 0, then a row exchange ⟨1⟩ ↔ ⟨2⟩
followed by a shear ⟨2⟩ − (a/c)⟨1⟩ → ⟨2⟩ transforms
$$ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \longrightarrow A' = \begin{bmatrix} c & d \\ a & b \end{bmatrix} \longrightarrow A'' = \begin{bmatrix} c & d \\ 0 & b - ad/c \end{bmatrix} , \tag{3.6} $$
so we have |A| = −|A′′| = −c(b − ad/c) = ad − bc.
Example 44. With some more work than in the 2 × 2 case, one finds that
$$ \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} = aei + bfg + cdh - ceg - bdi - afh . \tag{3.7} $$
Remarks 45.
◦ There is no “analogous” formula for the determinant of matrices 4 × 4 or larger.
◦ There is something like a formula, called the “cofactor expansion”. It is useful for
proving certain theorems. And textbook authors seem to like it. But it has absolutely
no practical relevance for computing.
◦ For all but “toy matrices”, the most efficient way to compute determinants is via row
reduction.
Example 46.
$$ \begin{vmatrix} 1 & -1 & 5 & 1 \\ -2 & 1 & -7 & 1 \\ -3 & 2 & -12 & -2 \\ 2 & -1 & 9 & 1 \end{vmatrix} = - \begin{vmatrix} 1 & -1 & 5 & 1 \\ 0 & 1 & -1 & -1 \\ 0 & -1 & 3 & 1 \\ 0 & -1 & 3 & 3 \end{vmatrix} = - \begin{vmatrix} 1 & -1 & 5 & 1 \\ 0 & 1 & -1 & -1 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{vmatrix} = -4 . \tag{3.8} $$
To get the first equality, we performed ⟨2⟩ + 2⟨1⟩ → ⟨2⟩, then ⟨3⟩ + 3⟨1⟩ → ⟨3⟩, then
⟨4⟩ − 2⟨1⟩ → ⟨4⟩, and finally ⟨2⟩ ↔ ⟨4⟩. To get the second equality, we performed
⟨4⟩ − ⟨3⟩ → ⟨4⟩ and ⟨3⟩ + ⟨2⟩ → ⟨3⟩. For the last equality we used Theorem 3.4.
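Remark 39 and Theorem 3.4 together give a complete algorithm, sketched below in Python (numpy assumed): reduce to upper triangular form by shears and exchanges, track the sign flips, and multiply the diagonal.
\begin{verbatim}
# Sketch: determinant by row reduction (Remark 39 and Theorem 3.4).
import numpy as np

def det_by_reduction(A):
    M = A.astype(float).copy()
    n = M.shape[0]
    sign = 1.0
    for j in range(n):
        p = j + np.argmax(np.abs(M[j:, j]))
        if np.isclose(M[p, j], 0.0):
            return 0.0                  # no pivot in this column: det = 0
        if p != j:
            M[[j, p]] = M[[p, j]]       # exchange (iii) flips the sign
            sign = -sign
        for i in range(j + 1, n):
            M[i] -= (M[i, j] / M[j, j]) * M[j]   # shear (ii), det unchanged
    return sign * np.prod(np.diag(M))   # Theorem 3.4

A = np.array([[ 1, -1,   5,  1],
              [-2,  1,  -7,  1],
              [-3,  2, -12, -2],
              [ 2, -1,   9,  1]])
assert np.isclose(det_by_reduction(A), -4.0)     # agrees with (3.8)
\end{verbatim}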
The following shows that a row operation is just left-multiplication by some matrix.
Lemma 3.5. Let $\mathcal{R}$ be an elementary row operation acting on matrices with m rows.
Then there exists an m × m matrix R such that $\mathcal{R}(A) = RA$ for every matrix A with m
rows. In fact, R is the following matrix.
(i) scaling r⟨i⟩ → ⟨i⟩ (r ≠ 0): R = [e1 · · · ei−1 rei ei+1 · · · em ]
(ii) shear ⟨i⟩ + c⟨k⟩ → ⟨i⟩ (k ≠ i): R = [e1 · · · ek−1 (ek + cei) ek+1 · · · em ]
(iii) exchange ⟨i⟩ ↔ ⟨k⟩ (k > i): R = [e1 · · · ei−1 ek ei+1 · · · ek−1 ei ek+1 · · · em ]
Theorem 3.6. If A and B are n × n matrices, then |AB| = |A||B|.
Proof. Consider first the case where |A| = 0. Assume for contradiction that |AB| ≠ 0.
Then AB has an inverse (AB)−1 . Now the identity AB(AB)−1 = I shows that A has an
inverse as well, namely B(AB)−1 . But this contradicts |A| = 0, so we must have |AB| = 0.
Next, consider the case where A = I. Then |AB| = |IB| = |B| = 1|B| = |I||B| =
|A||B|, as claimed.
Finally, consider the case |A| ≠ 0 and A ≠ I. Then there exists a sequence of elementary
row operations $\mathcal{R}_1, \mathcal{R}_2, \ldots, \mathcal{R}_k$ that transforms the n × n identity matrix I into the
matrix A. So by Lemma 3.5, we have
$$ A = R_k \cdots R_2 R_1 I , \tag{3.10} $$
where $R_j$ is the matrix for $\mathcal{R}_j$. Recall that applying $\mathcal{R}_j$ to a matrix multiplies the determinant by some number $r_j$. Thus,
$$ |A| = r_k \cdots r_2 r_1 |I| = r_k \cdots r_2 r_1 , \qquad |AB| = |R_k \cdots R_2 R_1 B| = r_k \cdots r_2 r_1 |B| = |A| \, |B| , $$
as claimed. QED
Corollary 3.7. If |A| ≠ 0, then A is invertible, and $|A^{-1}| = \dfrac{1}{|A|}$.
Proof. Assume that |A| ≠ 0. Then A is invertible by Theorem 3.3. Since A−1A = I, we
have |A−1| |A| = |I| = 1 by Theorem 3.6. This proves the claim. QED
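Both Theorem 3.6 and Corollary 3.7 are easy to spot-check numerically (numpy assumed; random matrices are almost surely invertible):
\begin{verbatim}
# Sketch: |AB| = |A||B| (Theorem 3.6), |A^{-1}| = 1/|A| (Corollary 3.7).
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

detA, detB = np.linalg.det(A), np.linalg.det(B)
assert np.isclose(np.linalg.det(A @ B), detA * detB)
assert np.isclose(np.linalg.det(np.linalg.inv(A)), 1.0 / detA)
\end{verbatim}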
Proof. First consider the transpose of the matrix R for an elementary row operation
$\mathcal{R}$. The following is straightforward to check. If $\mathcal{R}$ is a scaling or a row exchange, then
R⊤ = R. If R is the matrix for a shear ⟨i⟩ + c⟨k⟩ → ⟨i⟩, then R⊤ is the matrix for the
shear ⟨k⟩ + c⟨i⟩ → ⟨k⟩. And as observed in Remark 47, the determinant of a shear matrix
is 1. So in all cases, we have |R⊤| = |R|.
Given an n × n matrix A, consider its representation (3.10). Using Theorem 3.6 and
part (d) of Theorem 2.6, we have
$$ |A^\top| = |R_1^\top R_2^\top \cdots R_k^\top| = |R_1^\top| \, |R_2^\top| \cdots |R_k^\top| = |R_1| \, |R_2| \cdots |R_k| = |R_k \cdots R_2 R_1| = |A| , $$
as claimed. QED