3108 Slides
1 / 323
Contents.
I Chapter 1. Slides 3–70
I Chapter 2. Slides 71–118
I Chapter 3. Slides 119–136
I Chapter 4. Slides 137–190
I Chapter 5. Slides 191–234
I Chapter 6. Slides 235–287
I Chapter 7. Slides 288–323
2 / 323
Chapter 1. Linear Equations in Linear Algebra
3 / 323
A linear system of m equations with n unknowns can be
represented by a matrix with m rows and n + 1 columns:
The system
a11 x1 + · · · + a1n xn = b1
.. .. ..
. . .
am1 x1 + · · · + amn xn = bm
4 / 323
I A is the m × n coefficient matrix:
a11 · · · a1n
A = ... ..
.
am1 · · · amn
5 / 323
Example. The 2x2 system
3x + 4y = 5
6x + 7y = 8
6 / 323
To solve linear systems, we manipulate and combine the individual
equations (in such a way that the solution set of the system is
preserved) until we arrive at a simple enough form that we can
determine the solution set.
7 / 323
Example. Let us solve
3x + 4y = 5
6x + 7y = 8.
3x + 4y = 5
0x − y = −2.
3x + 0y = −3
0x − y = −2.
Multiply the first equation by 1/3 and the second by −1:
x + 0y = −1
0x + y = 2.
8 / 323
Example. (continued) We have transformed the linear system
3x + 4y = 5 x + 0y = −1
into
6x + 7y = 8 0x + y = 2
9 / 323
The manipulations used to solve the linear system above
correspond to elementary row operations on the augmented
matrix for the system.
Elementary row operations.
I Replacement: replace a row by the sum of itself and a
multiple of another row.
I Interchange: interchange two rows.
I Scaling: multiply all entries in a row by a nonzero constant.
Row operations do not change the solution set for the
associated linear system.
10 / 323
Example. (revisited)
[ 3 4 5 ; 6 7 8 ]
  → (R2 ↦ −2R1 + R2)   [ 3 4 5 ; 0 −1 −2 ]
  → (R1 ↦ 4R2 + R1)    [ 3 0 −3 ; 0 −1 −2 ]
  → (R1 ↦ (1/3)R1)     [ 1 0 −1 ; 0 −1 −2 ]
  → (R2 ↦ −R2)         [ 1 0 −1 ; 0 1 2 ].
(i) it is simple to determine the solution set for the last matrix
(ii) row operations preserve the solution set.
11 / 323
It is always possible to apply a series of row reductions to put an
augmented matrix into echelon form or reduced echelon form,
from which it is simple to discern the solution set.
Echelon form:
I Nonzero rows are above any row of zeros.
I The leading entry (first nonzero element) of each row is in a
column to the right of the leading entry of the row above it.
I All entries in a column below a leading entry are zeros.
Reduced echelon form: (two additional conditions)
I The leading entry of each nonzero row equals 1.
I Each leading 1 is the only nonzero entry in its column.
12 / 323
Examples.
3 −9 12 −9 6 15
0 2 −4 4 2 −6 not in echelon form
0 3 −6 6 4 −5
3 −9 12 −9 6 15
0 2 −4 4 2 −6 echelon form, not reduced
0 0 0 0 1 4
1 0 2 3 0 −24
0 1 −2 2 0 −7 reduced echelon form
0 0 0 0 1 4
13 / 323
Remark. Every matrix can be put into reduced echelon form in a
unique manner.
Definition.
A pivot position in a matrix is a location that corresponds to a
leading 1 in its reduced echelon form.
A pivot column is a column that contains a pivot position.
Remark. Pivot positions lie in columns corresponding to
dependent variables for the associated systems.
14 / 323
Row Reduction Algorithm.
1. Begin with the leftmost column; if necessary, interchange rows
to put a nonzero entry in the first row.
2. Use row replacement to create zeros below the pivot.
3. Repeat steps 1. and 2. with the sub-matrix obtained by
removing the first column and first row. Repeat the process
until there are no more nonzero rows.
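As an illustration, here is a minimal Python/NumPy sketch of this algorithm (the function name row_echelon and the tolerance are our own choices, not part of the slides); it reproduces the first echelon form computed in the example on slide 16:

    import numpy as np

    def row_echelon(M, tol=1e-12):
        """Reduce a copy of M to an (un-reduced) echelon form."""
        A = M.astype(float).copy()
        m, n = A.shape
        row = 0
        for col in range(n):
            if row >= m:
                break
            # Step 1: if necessary, interchange rows to put a nonzero entry in the pivot position.
            p = row
            while p < m and abs(A[p, col]) < tol:
                p += 1
            if p == m:
                continue                      # no pivot in this column
            A[[row, p]] = A[[p, row]]
            # Step 2: row replacement to create zeros below the pivot.
            for r in range(row + 1, m):
                A[r] -= (A[r, col] / A[row, col]) * A[row]
            row += 1                          # Step 3: repeat on the remaining sub-matrix
        return A

    print(row_echelon(np.array([[3, -9, 12, -9, 6, 15],
                                [3, -7,  8, -5, 8,  9],
                                [0,  3, -6,  6, 4, -5]])))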
15 / 323
Example.
3 −9 12 −9 6 15
3 −7 8 −5 8 9
0 3 −6 6 4 −5
  → (R2 ↦ −R1 + R2)        [ 3 −9 12 −9 6 15 ; 0 2 −4 4 2 −6 ; 0 3 −6 6 4 −5 ]
  → (R3 ↦ −(3/2)R2 + R3)   [ 3 −9 12 −9 6 15 ; 0 2 −4 4 2 −6 ; 0 0 0 0 1 4 ]
16 / 323
Example. (continued)
[ 3 −9 12 −9 6 15 ; 0 2 −4 4 2 −6 ; 0 0 0 0 1 4 ]
  →  [ 3 −9 12 −9 0 −9 ; 0 2 −4 4 0 −14 ; 0 0 0 0 1 4 ]
  →  [ 3 −9 12 −9 0 −9 ; 0 1 −2 2 0 −7 ; 0 0 0 0 1 4 ]
  →  [ 3 0 −6 9 0 −72 ; 0 1 −2 2 0 −7 ; 0 0 0 0 1 4 ]
  →  [ 1 0 −2 3 0 −24 ; 0 1 −2 2 0 −7 ; 0 0 0 0 1 4 ].
17 / 323
Solving systems.
I Find the augmented matrix [A|b] for the given linear system.
I Put the augmented matrix into reduced echelon form [A0 |b 0 ]
I Find solutions to the system associated to [A0 |b 0 ]. Express
dependent variables in terms of free variables if necessary.
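As a sketch (not from the slides), the same steps can be carried out with SymPy, whose rref() returns the reduced echelon form together with the pivot columns; here it is applied to the augmented matrix of Example 1 below:

    from sympy import Matrix

    # Augmented matrix [A|b] for Example 1 on the following slides.
    M = Matrix([[2, -4, 4, 6],
                [1, -2, 2, 3],
                [1, -1, 0, 2]])
    R, pivot_cols = M.rref()
    print(R)           # Matrix([[1, 0, -2, 1], [0, 1, -2, -1], [0, 0, 0, 0]])
    print(pivot_cols)  # (0, 1): x and y are dependent, z is free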
18 / 323
Example 1. The system
2x − 4y + 4z = 6
x − 2y + 2z = 3
x − y + 0z = 2
[ 2 −4 4 6 ; 1 −2 2 3 ; 1 −1 0 2 ]  →  [ 1 0 −2 1 ; 0 1 −2 −1 ; 0 0 0 0 ]
  →  x − 2z = 1,  y − 2z = −1.
The solution set is
19 / 323
Example 2. The system
2x − 4y + 4z = 6
x − 2y + 2z = 4
x − y + 0z = 2
2 −4 4 6 1 0 −2 1
→ 1 −2 2 4 →− 0 1 −2 −1
1 −1 0 2 0 0 0 1
x − 2z = 1
→ y − 2z = −1,
0 = 1.
21 / 323
Chapter 1. Linear Equations in Linear Algebra
22 / 323
A matrix with one column or one row is called a vector, for
example
1
2 or 1 2 3 .
3
By using vector arithmetic, for example
1 4 α + 4β
α 2 + β 5 = 2α + 5β ,
3 6 3α + 6β
23 / 323
The linear system
x + 2y + 3z = 4
5x + 6y + 7z = 8
9x + 10y + 11z = 12
24 / 323
Geometric interpretation.
The solution set S may be interpreted in different ways:
I S consists of the points of intersection of the three planes
x + 2y + 3z = 4
5x + 6y + 7z = 8
9x + 10y + 11z = 12.
25 / 323
Linear combinations and the span.
The set of linear combinations of the vectors v1 , . . . vn is called the
span of these vectors:
A vector equation
x v1 + y v2 = v3
is consistent (that is, has solutions) if and only if
v3 ∈ span{v1 , v2 }.
26 / 323
Example. Determine whether or not
1 1 1 1
1 ∈ span 2 , 3 , 4 .
1 3 4 5
x +y +z =1
2x + 3y + 4z = 1
3x + 4y + 5z = 1.
27 / 323
The augmented matrix is
1 1 1 1
2 3 4 1 .
3 4 5 1
28 / 323
Geometric description of span.
Let
0 0 1
S = span 1 , T = span 1 , 0 .
1 1 1
Then
I S is the line through the points (0, 0, 0) and (0, 1, 1).
I T is the plane through the points (0, 0, 0), (0, 1, 1), and
(1, 0, 1).
29 / 323
Geometric description of span. (continued)
Write
0 1
v1 = 1 , v2 = 0 .
1 1
The following are equivalent:
I Is v3 spanned by v1 and v2 ?
I Can v3 be written as a linear combination of v1 and v2 ?
I Is v3 in the plane containing the vectors v1 and v2 ?
30 / 323
Cartesian equation for span.
Recall the definition of the plane T above.
A point (x, y , z) belongs to T when the following vector equation
is consistent:
0 1 x
α 1 + β 0 = y .
1 1 z
The augmented matrix and its reduced echelon form are as follows:
0 1 x 1 0 y
1 0 y ∼ 0 1 x .
1 1 z 0 0 z −x −y
0 = z − x − y.
31 / 323
Matrix equations. Consider a matrix of the form
A = a1 a2 a3 ,
32 / 323
Example. (Revisited) The system
x +y +z =1
2x + 3y + 4z = 1
3x + 4y + 5z = 1
33 / 323
Question. When does the vector equation AX = b have a solution
for every b ∈ Rm ?
Answer. When the columns of A span Rm .
An equivalent condition is the following: the reduced echelon form
of A has a pivot position in every row.
To illustrate this, we study a non-example:
34 / 323
Non-example. Let
1 1 1 1 0 −1
reduced echelon form
A= 2
3 4 −−−−−−−−−−−−→ 0 1 2 .
3 4 5 0 0 0
f3 (b1 , b2 , b3 ) = 0
35 / 323
If instead the reduced echelon form of A had a pivot in any row,
then we could use the reduced echelon form for the augmented
system to find a solution to AX = b.
36 / 323
Chapter 1. Linear Equations in Linear Algebra
37 / 323
The system of equations AX = b is
I homogeneous if b = 0,
I inhomogeneous if b ≠ 0.
For homogeneous systems:
I The augmented matrix for a homogeneous system has a
column of zeros.
I Elementary row operations will not change this column.
Thus, for homogeneous systems it is sufficient to work with the
coefficient matrix alone.
38 / 323
Example.
2x − 4y + 4z =6 2 −4 4 6 1 0 −2 1
x − 2y + 2z = 3 → 1 −2 2 3 ∼ 0 1 −2 −1 .
x −y =2 1 −1 0 2 0 0 0 0
x = 1 + 2z, y = −1 + 2z, z ∈ R.
2x − 4y + 4z =0
x − 2y + 2z =0 → x = 2z, y = 2z, z ∈ R.
x −y =0
39 / 323
The solution set for the previous inhomogeneous system AX = b
can be represented in parametric vector form:
2x − 4y + 4z =6
x = 1 + 2z
x − 2y + 2z =3 →
y = −1 + 2z
x −y =2
x 1 2
→X= y = −1 + z 2 , z ∈ R.
z 0 1
40 / 323
Example. Express the solution set for AX = b in parametric
vector form, where
1 1 1 −1 −1 −1
[A|b] = 1 −1 0 2 0 2
0 0 2 −2 −2 2
x1 = −2 − x2 + 2x4 , x3 = 1 − x4 + x5 , x2 , x4 , x5 ∈ R.
41 / 323
Example. (continued) In parametric form, the solution set is given
by
" −2 − x2 + 2x4 # " −2 # " −1 # " 2 # " 0 #
x2 0 1 0 0
X= 1 − x4 + x5 = 1 + x2 0 + x4 −1 + x5 1 ,
x4 0 0 1 0
x5 0 0 0 1
where x2 , x4 , x5 ∈ R.
To solve the corresponding homogeneous system, simply erase the
first vector.
42 / 323
General form of solution sets. If Xp is any particular solution to
AX = b, then any other solutions to AX = b may be written in the
form
X = Xp + Xh ,
where Xh is some solution to AX = 0.
Indeed, given any solution X,
A(X − Xp ) = AX − AXp = b − b = 0,
43 / 323
Example. (Line example) Suppose the solution set of AX = b is a
line passing through the points
X = q + tv, t ∈ R.
X = (1 − t)q + tp, t ∈ R.
44 / 323
Example. (Plane example) Suppose the solution set of AX = b is
a plane passing through
v1 = p − q, v2 = p − r.
X = p + t1 v1 + t2 v2 , t1 , t2 ∈ R.
45 / 323
Chapter 1. Linear Equations in Linear Algebra
46 / 323
Definition. A set of vectors
S = {v1 , . . . , vn }
is (linearly) independent if
x1 v1 + · · · + xn vn = 0 =⇒ x1 = · · · = xn = 0
for any x1 , . . . , xn ∈ R.
Equivalently, S is independent if the only solution to AX = 0 is
X = 0, where A = [v1 · · · vn ].
Otherwise, we call S (linearly) dependent.
47 / 323
Example. Let
1 4 7 1 4 7
" # " # " # " #
v1 = 2 , v2 = 5 , v3 = 8 , A := 2 5 8 .
3 6 9 3 6 9
Then
1 0 −1
A∼ 0 1 2
0 0 0
In particular, the equation AX = 0 has a nontrivial solution set,
namely
1
X = z −2 , z ∈ R.
1
Thus the vectors are dependent.
48 / 323
Dependence has another useful characterization:
The vectors {v1 , . . . , vn } are dependent if and only if (at least) one
of the vectors can be written as a linear combination of the others.
Continuing from the previous example, we found that
1
AX = 0, where X = −2
1
49 / 323
Some special cases.
I If S = {v1 , v2 }, then S is dependent if and only if v1 is a
scalar multiple of v2 (if and only if v1 and v2 are co-linear).
I If 0 ∈ S, then S is always dependent. Indeed, if
S = {0, v1 , · · · , vn },
0 = 1 · 0 + 0v1 + · · · + 0vn .
50 / 323
Pivot columns. Consider
1 2 3 4 1 2 0 −2
" # " #
A = [v1 v2 v3 v4 ] = −2 −4 −5 −6 ∼ 0 0 1 2 .
3 6 7 8 0 0 0 0
v1 − 2v2 + v3 = 0.
52 / 323
Example. (continued)
h 1 1 1 1
i
A= 1 2 3 5
2 3 4 5
53 / 323
Chapter 1. Linear Equations in Linear Algebra
54 / 323
Definition. A linear transformation from Rn to Rm is a function
T : Rn → Rm such that
I T (u + v) = T (u) + T (v) for all u, v ∈ Rn ,
I T (αv) = αT (v) for all v ∈ Rn and α ∈ R.
Note that for any linear transformation, we necessarily have
55 / 323
Definition. Let T : Rn → Rm be a linear transformation. The
range of T is the set
R(T ) := {T (X) : X ∈ Rn }.
56 / 323
Example. Determine if b is in the range of T (X) = AX, where
0 1 2 3
" # " #
A= 3 0 4 , b= 7 .
5 6 0 11
1
" #
T (X) = b, where X= 1 .
1
57 / 323
Example. Determine if T (X) = AX is onto, where
0 1 2
" #
A= 2 3 4 .
3 2 1
1 0 −1
" #
A∼ 0 1 2 .
0 0 0
Thus T is not onto.
58 / 323
Example. (Continued) In fact, by performing row reduction on
[A|b] we can describe R(T ) explicitly:
3 1
0 1 2 b1 1 0 −1 − 2 b1 + 2 b2
" # " #
2 3 4 b2 ∼ 0 1 2 b1
5 3
3 2 1 b3 0 0 0 2 b1 − 2 b2 + b3
Thus
R(T ) = {b ∈ R3 : 25 b1 − 32 b2 + b3 = 0}.
59 / 323
Definition. A linear transformation T : Rm → Rn is one-to-one
(or injective) if
T (X) = 0 =⇒ X = 0.
More generally, a function f is one-to-one if
f (x) = f (y ) =⇒ x = y .
for each b, the solution set for T (X) = b has at most one element.
60 / 323
Example. Let T (X) = AX, where
1 2 3 4 1 0 0 −1
" # " #
A= 4 3 2 1 ∼ 0 1 0 1
1 3 2 4 0 0 1 1
61 / 323
Summary.
For a matrix transformation T (X) = AX.
I Let B denote the reduced echelon form of A.
I T is onto if and only if B has a pivot in every row.
I T is one-to-one if and only if B has a pivot in every column.
62 / 323
Matrix representations.
I Not all linear transformations are matrix transformations.
I However, each linear transformation T : Rn → Rm has a
matrix representation.
Let T : Rn → Rm . Let {e1 , . . . , en } denote the standard basis
vectors in Rn , e.g. e1 = (1, 0, . . . , 0)T ∈ Rn .
63 / 323
Matrix representations. Suppose T : Rn → Rm is linear,
" #
x1
.
[T ] = [T (e1 ) · · · T (en )], X= .
.
= x1 e1 + · · · + xn en .
xn
By linearity,
Then
1 2 3
2 2 2
[T ] = 3 3 3 .
4 4 4
64 / 323
Matrix representations.
If T : Rn → Rm is linear, then [T ] ∈ Rm×n , and so:
I [T ] ∈ Rm×n has m rows and n columns.
I T onto ⇐⇒ [T ] has pivot in every row.
I T one-to-one ⇐⇒ [T ] has pivot in every column.
I If m > n, then T cannot be onto.
I If m < n, then T cannot be one-to-one.
65 / 323
Linear transformations of the plane R2 .
Suppose T : R2 → R2 is linear. Then
66 / 323
Example. (Shear) Let λ ∈ R and consider
[T ] = [ 1 λ ; 0 1 ],   T (x, y )T = (x + λy , y )T .
Then
[T ](1, 0)T = (1, 0)T ,  [T ](0, 1)T = (λ, 1)T ,  [T ](0, −1)T = (−λ, −1)T .
67 / 323
Example. (Reflection across the line y = x)
Let
T (X) = [ 0 1 ; 1 0 ](x, y )T = (y , x)T .
Note
[T ](1, 0)T = (0, 1)T ,  [T ](0, 1)T = (1, 0)T ,
[T ](2, 1)T = (1, 2)T ,  [T ](−2, −1)T = (−1, −2)T .
68 / 323
Example. (Rotation by angle θ) Let
T (X) = [ cos θ − sin θ ; sin θ cos θ ](x, y )T .
Then
[T ](1, 0)T = (cos θ, sin θ)T ,  [T ](0, 1)T = (− sin θ, cos θ)T .
69 / 323
Example. (Composition) Let us now construct T that
(i) reflects about the y-axis (x = 0) and then
(ii) reflects about y = x.
(i) [T1 ] = [ −1 0 ; 0 1 ],   (ii) [T2 ] = [ 0 1 ; 1 0 ],
that is,
T (X) = [T2 ][T1 ]X = [ 0 1 ; 1 0 ][ −1 0 ; 0 1 ](x, y )T = [ 0 1 ; −1 0 ](x, y )T = (y , −x)T .
70 / 323
Chapter 2. Matrix Algebra
71 / 323
Addition and scalar multiplication of matrices.
Let A, B ∈ Rm×n with entries Aij , Bij and let α ∈ R.
We define A ± B and αA by specifying the ij th entry:
Example.
[ 1 2 ; 3 4 ] + 5 [ 6 7 ; 8 9 ] = [ 31 37 ; 43 49 ]
72 / 323
Matrix multiplication. Let A ∈ Rm×r and B ∈ Rr ×n have entries
aij , bij .
The matrix product AB ∈ Rm×n is defined via its ij th entry:
(AB)ij = ai1 b1j + ai2 b2j + · · · + air brj .
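A quick NumPy check of this definition, as a sketch (the matrices are the small ones that appear in the example two slides below):

    import numpy as np

    A = np.array([[1, 2, 3], [4, 5, 6]])       # 2 x 3
    B = np.array([[1, 2], [3, 4], [5, 6]])     # 3 x 2

    # Compute each entry (AB)_ij directly from the defining sum.
    C = np.array([[sum(A[i, k] * B[k, j] for k in range(3)) for j in range(2)]
                  for i in range(2)])
    assert (C == A @ B).all()
    print(C)                                   # [[22 28] [49 64]]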
73 / 323
Matrix multiplication. (Continued) If we view the rows of A as a1 , . . . , am
and the columns of B as b1 , . . . , bn , then
(AB)ij = ai · bj .
We may also write
74 / 323
Example. Let
1 2
1 2 3
A= ∈ R2×3 , B = 3 4 ∈ R3×2 .
4 5 6
5 6
Then
AB = [ 22 28 ; 49 64 ],   BA = [ 9 12 15 ; 19 26 33 ; 29 40 51 ].
Remark. You should not expect AB = BA in general.
Can you think of any examples for which AB = BA does hold?
75 / 323
Definition. The identity matrix In ∈ Rn×n is given by
(In )ij = 1 if i = j, and (In )ij = 0 if i ≠ j.
76 / 323
Definition. If A ∈ Rm×n has ij th entry aij , then the matrix
transpose (or transposition) of A is the matrix AT ∈ Rn×m with
ij th entry aji .
One also writes AT = A0 .
Example.
1 4
" #
1 2 3 T
A= =⇒ A = 2 5 .
4 5 6
3 6
77 / 323
Proof of the last property.
X
(AB)T
ij = (AB)ji = ajk bki .
k
X
(B T AT )ij = (B T )ik (AT )kj
k
X
= bki ajk .
k
Thus (AB)T = B T AT .
78 / 323
Example. The transpose of a row vector is a column vector.
Let
1 3
a= , b= .
−2 −4
Then a, b ∈ R2×1 (column vectors), aT , bT ∈ R1×2 (row vectors):
aT b = 11,   abT = [ 3 −4 ; −6 8 ].
79 / 323
Key fact. If T1 : Rm → Rn and T2 : Rn → Rk are linear
transformations, then the matrix representation of the composition
is given by
[T2 ◦ T1 ] = [T2 ][T1 ].
Remark. The dimensions are correct:
I T2 ◦ T1 : Rm → Rk .
I [T2 ◦ T1 ] ∈ Rk×m
I [T1 ] ∈ Rn×m
I [T2 ] ∈ Rk×n
I [T2 ][T1 ] ∈ Rk×m .
For matrix transformations, this is clear: if T1 (x) = Ax and
T2 (x) = Bx, then
80 / 323
Example. Recall that rotation by θ in R2 is given by
[Tθ ] = [ cos θ − sin θ ; sin θ cos θ ].
Thus rotation by 2θ is
[T2θ ] = [ cos 2θ − sin 2θ ; sin 2θ cos 2θ ].
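A small numerical sanity check of this composition rule (a sketch, not part of the slides):

    import numpy as np

    def rot(theta):
        """Standard matrix of rotation by theta in R^2."""
        return np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

    theta = 0.7
    # [T_theta][T_theta] = [T_theta o T_theta] = [T_{2 theta}]
    assert np.allclose(rot(theta) @ rot(theta), rot(2 * theta))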
81 / 323
Proof of the key fact. Recall that for T : Rn → Rm :
So
82 / 323
Chapter 2. Matrix Algebra
83 / 323
Definition. Let A ∈ Rn×n (square matrix). We call B ∈ Rn×n an
inverse of A if
AB = BA = In .
Remark. If A has an inverse, then it is unique. Proof. Suppose
AB = BA = In and AC = CA = In .
Then
B = BIn = BAC = In C = C .
If A has an inverse, then we denote it by A−1 . Note (A−1 )−1 = A.
Remark. If A, B ∈ Rn×n are invertible, then AB is invertible.
Indeed,
(AB)−1 = B −1 A−1 .
84 / 323
Example. If
A = [ 1 2 ; 3 4 ],   then   A−1 = [ −2 1 ; 3/2 −1/2 ].
85 / 323
Questions.
1. When does A ∈ Rn×n have an inverse?
2. If A has an inverse, how do we compute it?
Note that if A is invertible (has an inverse), then:
I Ax = b has a solution for every b (namely, x = A−1 b).
Equivalently, A has a pivot in every row.
I If Ax = 0, then x = A−1 0 = 0. Thus the columns of A are
independent.
Equivalently, A has a pivot in every column.
Conversely, we will show that if A has a pivot in every column or
row, then A is invertible.
Thus all of the above conditions are equivalent.
86 / 323
Goal. If A has a pivot in every column, then A is invertible.
Since A is square, this is equivalent to saying that if the reduced
echelon form of A is In , then A is invertible.
Key observation. Elementary row operations correspond to
multiplication by an invertible matrix. (See below.)
With this observation, our hypothesis means that
Ek · · · E1 A = In
In particular, A is invertible.
Furthermore, this computes the inverse of A. Indeed,
A−1 = Ek · · · E1 .
87 / 323
It remains to show that elementary row operations correspond to
multiplication by a invertible matrix (known as elementary
matrices).
In fact, to write down the corresponding elementary matrix, one
simply applies the row operation to In .
Remark. A does not need to be square; the following works for
any A ∈ Rn×m .
For concreteness, consider the 3x3 case.
88 / 323
I “Multiply row one by non-zero α ∈ R” corresponds to
multiplication by
α 0 0
" #
E= 0 1 0 .
0 0 1
Indeed,
α 0 0 1 2 3 α 2α 3α
" #" # " #
0 1 0 4 5 6 = 4 5 6
0 0 1 7 8 9 7 8 9
89 / 323
I “Interchange rows one and two” corresponds to multiplication
by
0 1 0
" #
E= 1 0 0 .
0 0 1
Indeed,
0 1 0 1 2 3 4 5 6
" #" # " #
1 0 0 4 5 6 = 1 2 3 .
0 0 1 7 8 9 7 8 9
90 / 323
I “Multiply row three by α and add it to row two” corresponds
to multiplication by
1 0 0
" #
E= 0 1 α .
0 0 1
Indeed,
1 0 0 1 2 3 1 2 3
" #" # " #
0 1 α 4 5 6 = 4 + 7α 5 + 8α 6 + 9α
0 0 1 7 8 9 7 8 9
91 / 323
Summary. A is invertible if and only if there exist a sequence of
elementary matrices Ej so that
Ek · · · E1 A = In .
Note that
92 / 323
Example 1.
[A|I2 ] = [ 1 2 1 0 ; 3 4 0 1 ] ∼ [ 1 0 −2 1 ; 0 1 3/2 −1/2 ] = [I2 |A−1 ]
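The same row reduction of [A|I2 ] can be mimicked in NumPy; this is only an illustrative sketch of the procedure, with the final answer checked against numpy.linalg.inv:

    import numpy as np

    A = np.array([[1., 2.], [3., 4.]])
    M = np.hstack([A, np.eye(2)])    # the augmented matrix [A | I]
    M[1] -= 3 * M[0]                 # R2 -> -3 R1 + R2
    M[0] += M[1]                     # R1 -> R2 + R1 (clears the entry above the pivot)
    M[1] *= -0.5                     # R2 -> -(1/2) R2
    print(M[:, 2:])                  # [[-2.   1. ] [ 1.5 -0.5]]
    assert np.allclose(M[:, 2:], np.linalg.inv(A))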
93 / 323
Some additional properties.
I If A is invertible, then (AT )−1 = (A−1 )T .
Indeed,
AT (A−1 )T = (A−1 A)T = InT = In ,
and similarly (A−1 )T AT = In .
I Suppose AB is invertible. Then
A[B(AB)−1 ] = (AB)(AB)−1 = In .
Thus
A[B(AB)−1 b] = b for any b ∈ Rn ,
so that Ax = b has a solution for every b. Thus A has a pivot
in every row, so that A is invertible. Similarly, B is invertible.
Conclusion. AB is invertible if and only if A, B are invertible.
94 / 323
Some review. Let A ∈ Rm×n and x, b ∈ Rm .
Row pivots. The following are equivalent:
I A has a pivot in every row
I Ax = b is consistent for every b ∈ Rm
I the columns of A span Rm
I the transformation T (x) = Ax maps Rn onto Rm
I the rows of A are independent (see below)
Column pivots. The following are equivalent:
I A has a pivot in every column
I Ax = 0 =⇒ x = 0
I the columns of A are independent
I the transformation T (x) = Ax is one-to-one
95 / 323
Claim. If A has m pivots, then the rows of A ∈ Rm×n are
independent. (The converse is also true — why?)
Proof. By hypothesis,
BA = U, or equivalently A = B −1 U
U T [(B −1 )T x] = 0 =⇒ (B −1 )T x = 0 =⇒ x = 0
96 / 323
When A is square, all of the above equivalences hold, in addition
to the following:
I There exists C ∈ Rn×n so that CA = In .
(This gives Ax = 0 =⇒ x = 0.)
I There exists D ∈ Rn×n so that AD = In .
(This gives Ax = b is consistent for every b.)
I A is invertible.
I AT is invertible.
97 / 323
Definition. Let T : Rn → Rn be a linear transformation. We say
T is invertible if there exists S : Rn → Rn such that
98 / 323
Chapter 2. Matrix Algebra
99 / 323
Definition. A matrix A = [aij ] ∈ Rm×n is lower triangular if
1 0 0 1 0 0
" # " #
R2 7→ αR1 + R2 =⇒ E = α 1 0 , E −1 = −α 1 0 .
0 0 1 0 0 1
100 / 323
We can similarly define upper triangular and unit upper
triangular matrices.
Note that the product of (unit) lower triangular matrices is (unit)
lower triangular:
X X
(AB)ij = aik bkj = aik bkj = 0 for i < j.
k j≤k≤i
101 / 323
LU Factorization. For any A ∈ Rm×n , there exists a permutation
matrix P ∈ Rm×m and an upper triangular matrix U ∈ Rm×n (in
echelon form) such that
PA ∼ U.
Moreover, the elementary matrices used to reduce PA to U may all
be taken to be lower triangular and of the type
Thus
Ek · · · E1 PA = U
for some unit lower triangular (elementary) matrices Ej , and so
PA = (E1−1 · · · Ek−1 )U = LU
102 / 323
The LU factorization is also used to solve systems of linear
equations.
Example. Solve Ax = b, where
A = LU = [ 1 0 ; 1 1 ][ −2 2 ; 0 2 ],   b = (3, 3)T .
1. Solve Ly = b:
[ 1 0 ; 1 1 ](y1 , y2 )T = (3, 3)T  =⇒  y1 = 3, y1 + y2 = 3  =⇒  y1 = 3, y2 = 0.
2. Solve Ux = y:
[ −2 2 ; 0 2 ](x1 , x2 )T = (3, 0)T  =⇒  −2x1 + 2x2 = 3, 2x2 = 0  =⇒  x1 = −3/2, x2 = 0.
103 / 323
This process is computationally efficient when A is very large and
solutions are required for systems in which A stays fixed but b
varies.
See the ‘Numerical notes’ section in the book for more details.
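For instance, SciPy exposes exactly this workflow through lu_factor/lu_solve: the (permuted) LU factorization is computed once and then reused for each new right-hand side. A minimal sketch (the square 3 x 3 matrix here is just an assumed example):

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    A = np.array([[2., 3., 4.],
                  [1., 3., 2.],
                  [1., 2., 3.]])
    lu, piv = lu_factor(A)               # factor once (the expensive step)
    for b in (np.array([1., 0., 0.]), np.array([0., 1., 0.])):
        x = lu_solve((lu, piv), b)       # cheap solve for each new b
        assert np.allclose(A @ x, b)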
We next compute some examples of LU factorization, beginning
with the case that P is the identity matrix.
104 / 323
Example 1. Let
A = [ 2 3 4 1 ; 1 3 2 4 ; 1 2 3 4 ].
We put A into upper triangular form via row replacements:
A  → (R2 ↦ −(1/2)R1 + R2 , R3 ↦ −(1/2)R1 + R3)   [ 2 3 4 1 ; 0 3/2 0 7/2 ; 0 1/2 1 7/2 ]
   → (R3 ↦ −(1/3)R2 + R3)                        [ 2 3 4 1 ; 0 3/2 0 7/2 ; 0 0 1 7/3 ] = U.
105 / 323
Example 1. (continued) Note that
E1 = [ 1 0 0 ; −1/2 1 0 ; 0 0 1 ],   (R2 ↦ −(1/2)R1 + R2 )
E2 = [ 1 0 0 ; 0 1 0 ; −1/2 0 1 ],   (R3 ↦ −(1/2)R1 + R3 )
E3 = [ 1 0 0 ; 0 1 0 ; 0 −1/3 1 ],   (R3 ↦ −(1/3)R2 + R3 ),
E3 E2 E1 = [ 1 0 0 ; −1/2 1 0 ; −1/3 −1/3 1 ],   L = (E3 E2 E1 )−1 = [ 1 0 0 ; 1/2 1 0 ; 1/2 1/3 1 ].
In particular, E3 E2 E1 L = I3 .
106 / 323
Example 1. (continued)
Altogether, we have the LU factorization:
[ 2 3 4 1 ; 1 3 2 4 ; 1 2 3 4 ] = [ 1 0 0 ; 1/2 1 0 ; 1/2 1/3 1 ][ 2 3 4 1 ; 0 3/2 0 7/2 ; 0 0 1 7/3 ].
107 / 323
Example 2. Let
2 1 1 −1
" #
A= −2 −1 −1 1 .
4 2 1 0
R2 7→ R1 + R2 , R3 7→ −2R1 + R3 .
This corresponds to
1 0 0 2 1 1 −1
" # " #
EA := 1 1 0 A= 0 0 0 0
−2 0 1 0 0 −1 2
108 / 323
Example 2. (continued)
So far, we have written PEA = U with E unit lower triangular and
U in echelon form. Thus (since P = P −1 ),
A = E −1 PU.
109 / 323
Chapter 2. Matrix Algebra
110 / 323
Definition. We call (y, h) ∈ Rn+1 (with h ≠ 0) homogeneous
coordinates for x ∈ Rn if x = (1/h) y.
111 / 323
Example. Let x0 ∈ Rn and define
T ((x, 1)) = [ In x0 ; 0 1 ](x, 1)T = (x + x0 , 1)T ,
which represents the translation x ↦ x + x0 .
112 / 323
To represent a linear transformation on Rn , say T (x) = Ax, in
homogeneous coordinates, we use
T ((x, 1)) = [ A 0 ; 0 1 ](x, 1)T = (Ax, 1)T .
113 / 323
Graphics in three dimensions. Applying successive linear
transformations and translation to the homogeneous coordinates of
the points that define an outline of an object in R3 will produce
the homogeneous coordinates of the translated/deformed outline
of the object.
See the Practice Problem in the textbook.
This also works in the plane.
114 / 323
Example 1. Find the transformation that translates by (0, 8) in
the plane and then reflects across the line y = −x.
Solution:
0 −1 0 1 0 0 x 0 −1 −8 x
" #" #" # " #" #
−1 0 0 0 1 8 y = −1 0 0 y
0 0 1 0 0 1 1 0 0 1 1
115 / 323
Example 1. (Continued) What is the effect of the transformation
in Example 1 on the following vertices:
Solution:
0 −1 −8 0 3 3 −8 −8 −12
−1 0 0 0 0 4 = 0 −3 −3
0 0 1 1 1 1 1 1 1
Thus
116 / 323
Perspective projection. Consider a light source at the point
(0, 0, d) ∈ R3 , where d > 0.
A ray of light passing through a point (x, y , z) ∈ R3 with
0 ≤ z < d will intersect the xy -plane at a point (x ∗ , y ∗ , 0).
Understanding the map (x, y , z) 7→ (x ∗ , y ∗ ) allows us to represent
‘shadows’. (One could also imagine projection onto other 2d
surfaces.)
By some basic geometry (similar triangles, for example), one can
deduce
x∗ = x/(1 − z/d),   y∗ = y /(1 − z/d).
In particular, we find that
117 / 323
Perspective projection. (Continued) Note that the mapping
[ 1 0 0 0 ; 0 1 0 0 ; 0 0 0 0 ; 0 0 −1/d 1 ](x, y , z, 1)T = (x, y , 0, 1 − z/d)T
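A tiny Python sketch of the projection map derived above (the function name project is our own, not from the slides):

    def project(x, y, z, d):
        """Perspective projection onto the xy-plane from the center (0, 0, d)."""
        w = 1 - z / d                      # homogeneous scale factor
        return x / w, y / w

    print(project(4.0, 3.0, 5.0, 10.0))    # (8.0, 6.0)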
118 / 323
Chapter 3. Determinants
119 / 323
Definition. The determinant of a matrix A ∈ Rn×n , denoted
det A, is defined inductively. Writing A = (aij ), we have the
following:
I If n = 1, then det A = a11 .
I If n ≥ 2, then
det A = (−1)1+1 a11 det A11 + · · · + (−1)1+n a1n det A1n ,
where A1j denotes the submatrix obtained from A by deleting row 1 and column j.
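A direct (and deliberately naive) Python transcription of this recursive definition, as a sketch; it agrees with the 3 x 3 example worked out on the following slides:

    def det(A):
        """Determinant by cofactor expansion along the first row."""
        n = len(A)
        if n == 1:
            return A[0][0]
        total = 0
        for j in range(n):
            # A1j: delete row 1 and column j+1 (0-indexed: row 0 and column j).
            minor = [row[:j] + row[j + 1:] for row in A[1:]]
            total += (-1) ** j * A[0][j] * det(minor)
        return total

    print(det([[1, 2, 3], [1, 3, 4], [1, 3, 6]]))   # 2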
120 / 323
Example. Consider the 2 × 2 case:
a b
A= .
c d
A11 = d, A12 = c.
Thus
det A = (−1)1+1 a11 det A11 + (−1)1+2 a12 det A12 = ad − bc.
121 / 323
Examples.
1 2
det = 1 · 4 − 2 · 3 = −2.
3 4
λ 0
det = λµ
0 µ
1 2
det = 1 · 6 − 2 · 3 = 0.
3 6
122 / 323
Example. Consider
1 2 3
A= 1 3 4
1 3 6
Then
3 4 1 4 1 3
A11 = , A12 = , A13 = ,
3 6 1 6 1 3
and
3
X
det A = (−1)1+j a1j det A1j
j=1
123 / 323
Definition. Given an n × n matrix A, the terms Cij = (−1)i+j det Aij are called
the cofactors of A, where Aij is the submatrix obtained by deleting row i and column j.
Claim. The determinant can be computed using the cofactor
expansion of det A with any row or any column. That is,
n
X
det A = aij Cij for any i
j=1
Xn
= aij Cij for any j.
i=1
125 / 323
Example. Consider again
1 2 3
" #
A= 1 3 4 , det A = 2.
1 3 6
3
X
det A = ai1 Ci1
i=1
= a11 det A11 − a21 det A21 + a31 det A31
= 1 · 6 − 1 · 3 + 1 · (−1) = 2.
126 / 323
• Use the flexibility afforded by cofactor expansion to simplify your
computations: use the row or columns with the most zeros.
Example. Consider
1 2 3
" #
A= 4 0 5 .
0 6 0
Using row 3,
127 / 323
Properties of the determinant.
I det In = 1
I det AB = det A det B
I det AT = det A
I Note that if A ∈ Rn×n is invertible, then
128 / 323
Determinants of elementary matrices.
I If E corresponds to a scaling Ri ↦ αRi (for α ≠ 0), then
det E = α.
I If E corresponds to an interchange of two rows, then
det E = −1.
I If E corresponds to a replacement Ri ↦ αRj + Ri , then
det E = 1.
129 / 323
Determinants of elementary matrices. Recall that the
elementary matrix corresponding to a row operation is obtained by
applying this operation to the identity matrix.
I Scaling:
α 0
E= =⇒ det E = α.
0 1
I Interchange:
0 1
E= =⇒ det E = −1.
1 0
130 / 323
Row reduction and determinants. Suppose A ∼ U, with
U = Ek · · · E1 A.
Then
det A = (det E1 · · · det Ek )−1 det U.
Suppose that U is in upper echelon form. Then det U = u11 · · · unn , the product of its diagonal entries.
Indeed, this is true for any upper triangular matrix (use the right
cofactor expansion).
Thus, row reduction provides another means of computing
determinants!
131 / 323
Example 1.
A = [ 1 2 3 ; 2 4 10 ; 3 8 9 ]
  → (R2 ↦ −2R1 + R2 , R3 ↦ −3R1 + R3)   [ 1 2 3 ; 0 0 4 ; 0 2 0 ]
  → (R2 ↔ R3)                           [ 1 2 3 ; 0 2 0 ; 0 0 4 ] = U.
Then
det A = 1 · 1 · (−1) · det U = −1 · 2 · 4 = −8.
132 / 323
Example 2.
1 2 3 1 2 3
" # " #
A= 0 4 5 →
− 0 4 5 .
6 12 18 0 0 0
Thus det A = 0.
133 / 323
Invertibility.
We saw above that if U = (uij ) is an upper echelon form for A,
then
det A = c · u11 · · · unn for some c ≠ 0.
We also saw that if A is invertible, then det A ≠ 0. Equivalently,
134 / 323
Invertibility Theorem. The following are equivalent:
I A is invertible.
I The reduced row echelon form of A is In .
I A has n pivot columns (and n pivot rows).
I det A ≠ 0.
135 / 323
Examples.
I Recall
1 2 3
" #
A= 4 0 5 =⇒ det A = 42.
0 6 0
136 / 323
Chapter 4. Vector Spaces
137 / 323
Definition. A vector space V over a field of scalars F is a
non-empty set together with two operations, namely addition and
scalar multiplication, which obey the following rules: for
u, v, w ∈ V and α, β ∈ F :
I u+v ∈V I αv ∈ V
I u+v =v+u
I α(u + v) = αu + αv
I (u + v) + w = u + (v + w)
I there exists 0 ∈ V such I (α + β)u = αu + βu
that 0 + u = u
I α(βu) = (αβ)u
I there exists −u ∈ V such
that −u + u = 0 I 1u = u
138 / 323
Remark 1. A field is another mathematical object with its own
long list of defining axioms, but in this class we will always just
take F = R or F = C.
Remark 2. One typically just refers to the vector space V without
explicit reference to the underlying field.
Remark 3. The following are consequences of the axioms:
0u = 0, α0 = 0, −u = (−1)u.
139 / 323
Examples.
I V = Rn and F = R
I V = Cn and F = R or C
I V = Pn (polynomials of degree n or less), and F = R
I V = S, the set of all doubly-infinite sequences
(. . . , x−2 , x−1 , x0 , x1 , . . . ) and F = R
I V = F(D), the set of all functions defined on a domain D and
F = R.
140 / 323
Definition. Let V be a vector space and W a subset of V . If W is
also a vector space under vector addition and scalar multiplication,
then W is a subspace of V . Equivalently, W ⊂ V is a subspace if
u+v ∈W and αv ∈ W
W = {x ∈ R2 : x1 + x2 = 2} is not a subspace of R2 .
W = {x ∈ R2 : x1 x2 ≥ 0} is not a subspace of R2 .
141 / 323
Further examples and non-examples.
I W = Rn is a subspace of V = Cn with F = R
I W = Rn is not a subspace of V = Cn with F = C
I W = Pn is a subspace of V = F(D)
I W = S+ , the set of doubly-infinite sequences such that
x−k = 0 for k > 0 is a subspace of S
I W = {(x, y ) ∈ R2 : x, y ∈ Z} is not a subspace of R2
142 / 323
Span as subspace. Let v1 , . . . , vk be a collection of vectors in Rn .
Then
W := span{v1 , . . . , vk } = {c1 v1 + · · · + ck vk : c1 , . . . , ck ∈ F }
is a subspace of Rn .
Indeed, if u, v ∈ W and α ∈ F then u + v ∈ W and αu ∈ W .
(Why?)
143 / 323
Subspaces associated with A ∈ Rm×n .
I The column space of A, denoted col(A) is the span of the
columns of A.
I The row space of A, denoted row(A) is the span of the rows
of A.
I The null space of A, denoted nul(A), is
nul(A) = {x ∈ Rn : Ax = 0} ⊂ Rn .
{x ∈ Rn : Ax = b}
144 / 323
Example. Let
1 2 0 3 1 2 0 3
A= ∼ .
−1 −2 1 1 0 0 1 4
That is,
2 −1
1 0
nul(A) = span ,
.
0 1
0 1
145 / 323
Example. Let
s + 3t
W = 8t : s, t ∈ R .
s −t
and hence
1 3
W = span 0 , 8 .
1 −1
146 / 323
Chapter 4. Vector Spaces
147 / 323
Null space.
Recall that the null space of A ∈ Rm×n is the solution set to
Ax = 0, denoted nul(A).
Note nul(A) is a subspace of Rn : for x, y ∈ nul(A) and α ∈ R,
148 / 323
Column space.
Recall that the column space of A ∈ Rm×n is the span of the
columns of A, denoted col(A).
Recall that col(A) is a subspace of Rm .
Note that b ∈ col(A) precisely when Ax = b is consistent.
Note that col(A) = Rm when A has a pivot in every row.
Using row reduction, we can describe col(A) as the span of a set of
vectors.
149 / 323
Example.
1 1 1 1 b1 1 1 1 1 b1
[A|b] = −1 −1 0 0 b2 ∼ 0 0 1 1 b1 + b2
1 1 3 3 b3 0 0 0 0 b3 − 2b2 − 3b1
In particular,
1 0
col(A) = span 0 , 1 .
3 2
150 / 323
Definition. Let V and W be vector spaces. A linear
transformation T : V → W is a function such that for all
u, v ∈ V and α ∈ F ,
151 / 323
Review. We can add some new items to our list of equivalent
conditions:
Row pivots. A matrix A ∈ Rm×n has a pivot in every row if and
only if
col(A) = Rm .
152 / 323
Chapter 4. Vector Spaces
153 / 323
Definition. A set of vectors {v1 , . . . , vn } in a vector space V is
linearly independent if
c1 v1 + · · · + cn vn = 0 =⇒ c1 = · · · = cn = 0.
154 / 323
Example 1. The set {cos t, sin t} is linearly independent in F(R);
indeed, if
c1 cos t + c2 sin t ≡ 0,
then c1 = 0 (set t = 0) and c2 = 0 (set t = π2 ).
Example 2. The set {1, cos2 t, sin2 t} is linearly dependent in
F(R); indeed,
cos2 t + sin2 t − 1 ≡ 0.
155 / 323
Example. Show that
2x1 + x2 + 4x3 = 0
x1 + 3x3 = 0.
156 / 323
Example. (Continued) To solve this linear system, we use the
augmented matrix:
2 1 4 1 0 3
∼ .
1 0 3 0 1 −2
157 / 323
Definition. Let W be a subspace of V . A set of vectors
B = {b1 , . . . , bn } is a basis for W if
(i) B is linearly independent, and
(ii) W = span(B).
The plural of basis is bases.
Examples.
I B = {e1 , . . . , en } is the standard basis for Rn .
I B = {1, t, . . . , t n } is the standard basis for Pn .
I B = {v1 , . . . , vn } ⊂ Rn is a basis for Rn if and only if
A = [v1 · · · vn ] ∼ In .
158 / 323
Bases for the null space. Recall that for A ∈ Rm×n we have the
subspace
nul(A) = {x ∈ Rn : Ax = 0} ⊂ Rn .
Suppose
A = [ 1 2 3 4 ; 2 4 6 8 ; 1 1 1 1 ] ∼ [ 1 0 −1 −2 ; 0 1 2 3 ; 0 0 0 0 ].
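As a sketch, SymPy's nullspace() produces exactly this kind of basis (one vector per free variable):

    from sympy import Matrix

    A = Matrix([[1, 2, 3, 4],
                [2, 4, 6, 8],
                [1, 1, 1, 1]])
    for v in A.nullspace():     # basis for nul(A), one vector per free variable
        print(v.T)              # Matrix([[1, -2, 1, 0]]), Matrix([[2, -3, 0, 1]])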
159 / 323
Bases for the column space. Consider
A = [a1 a2 a3 a4 ] = [ 1 1 0 0 ; 1 2 2 1 ; 0 1 2 2 ; 0 0 0 1 ] ∼ [ 1 0 −2 0 ; 0 1 2 0 ; 0 0 0 1 ; 0 0 0 0 ].
160 / 323
Bases for the row space. Recall that row(A) is the span of the
rows of A.
A basis for row(A) is obtained by taking the non-zero rows in the
reduced echelon form of A.
This is based on the fact that A ∼ B =⇒ row(A) = row(B).
Example. Consider
A = [ 1 1 0 0 ; 1 2 2 1 ; 0 1 2 2 ; 0 0 0 1 ] ∼ [ 1 0 −2 0 ; 0 1 2 0 ; 0 0 0 1 ; 0 0 0 0 ] = [ b1 ; b2 ; b3 ; b4 ].
161 / 323
Two methods. We now have two methods for finding a basis for
a subspace spanned by a set of vectors.
1. Let W = span{a1 , a2 , a3 , a4 } = row(A), where
A = [ a1 ; a2 ; a3 ; a4 ] = [ 1 1 0 0 ; 1 2 2 1 ; 0 1 2 2 ; 0 0 0 1 ] ∼ [ 1 0 −2 0 ; 0 1 2 0 ; 0 0 0 1 ; 0 0 0 0 ] = [ b1 ; b2 ; b3 ; b4 ].
Then B = {aT T T
1 , a2 , a3 } is a basis for W .
162 / 323
Chapter 4. Vector Spaces
163 / 323
Unique representations. Suppose B = {v1 , . . . , vn } is a basis for
some subspace W ⊂ V . Then every v ∈ W can be written as a
unique linear combination of the elements in B.
Indeed, B spans W by definition. For uniqueness suppose
v = c1 v1 + · · · + cn vn = d1 v1 + · · · dn vn .
Then
(c1 − d1 )v1 + · · · + (cn − dn )vn = 0,
and hence linear independence of B (also by definition) implies
c1 − d1 = · · · = cn − dn = 0.
164 / 323
Definition. Given a basis B = {v1 , . . . , vn } for a subspace W ,
there is a unique pairing of vectors v ∈ W and vectors in Rn , i.e.
v ∈ W 7→ (c1 , . . . , cn ) ∈ Rn
[v]B = (c1 , · · · , cn ).
165 / 323
Example. Let B = {v1 , v2 , v3 } and v be given as follows:
1 1 0 3
" # " # " # " #
v1 = 0 , v2 = 1 , v3 = 0 , v= 1 .
0 0 1 8
166 / 323
Example. Let B 0 = {p1 , p2 , p3 } ⊂ P2 and p ∈ P2 be given by
3 + t + 8t 2 = x1 + x2 (1 + t) + x3 t 2 = (x1 + x2 ) + x2 t + x3 t 2 .
x1 + x2 + 0x3 = 3
0x1 + x2 + 0x3 = 1
0x1 + 0x2 + x3 = 8
167 / 323
Isomorphism property. Suppose B = {v1 , . . . , vn } is a basis for
W . Define the function T : W → Rn by
T (v) = [v]B .
168 / 323
Example. (again) Let E = {1, t, t 2 }. This is a basis for P2 , and
p = x1 p1 + x2 p2 + x3 p3
⇐⇒ T (p) = x1 T (p1 ) + x2 T (p2 ) + x3 T (p3 )
⇐⇒ [p]E = x1 [p1 ]E + x2 [p2 ]E + x3 [p3 ]E
⇐⇒ v = x1 v1 + x2 v2 + x3 v3 .
169 / 323
Chapter 4. Vector Spaces
170 / 323
Question. Given a vector space V , does there exist a finite
spanning set?
Note that every vector space V has a spanning set, namely V itself.
Also note that every vector space (except for the space containing
only 0) has infinitely many vectors.
Suppose W = span{v1 , . . . , vk }.
I If {v1 , . . . , vk } are independent, then it is a basis for W .
I Otherwise, at least one of the vectors (say vk ) is a linear
combination of the others. Then W = span{v1 , · · · , vk−1 }.
Continuing in this way, one can obtain a finite, independent
spanning set for W (i.e. a basis).
171 / 323
Claim. If V has a basis B = {v1 , . . . , vn }, then every basis for V
has n elements.
To see this, consider the isomorphism T : V → Rn given by
T (v) = [v]B .
First, we find that a set S ⊂ V is independent if and only if
T (S) = {T (u) : u ∈ S} ⊂ Rn is independent. This implies that
any basis in V can have at most n elements. (Why?)
Similarly, S ⊂ V spans V if and only if T (S) spans Rn . This
implies that any basis V must have at least n elements. (Why?)
In fact, using this we can deduce that isomorphic vector spaces
must have the same number of vectors in a basis.
172 / 323
Definition.
If V has a finite spanning set, then we call V finite dimensional.
The dimension of V , denoted dim(V ), is the number of vectors in
a basis for V .
The dimension of {0} is zero by definition.
If V is not finite dimensional, it is infinite dimensional.
Examples.
I dim(Rn ) = n
I dim Pn = n + 1
I If P is the vector space of all polynomials, P is
infinite-dimensional.
I F(R) is infinite-dimensional
173 / 323
Bases and subspaces. Suppose dim(V ) = n. and
B = {v1 , . . . , vn } ⊂ V .
If B is independent, then B is a basis for V .
(If not, there is an independent set B = {v1 , . . . , vn , vn+1 } ⊂ V .
However, this yields an independent set in Rn with n + 1 elements,
a contradiction).
Similarly, if span(B) = V , then B is a basis for V .
(If not, then there is a smaller spanning set that is independent and
hence a basis. This contradicts that all bases have n elements.)
The following also hold:
I Any independent set with less than n elements may be
extended to a basis for V .
I If W ⊂ V is a subspace, then dim(W ) ≤ dim(V ).
174 / 323
Note that for V = Rn , we have the following:
I Subspaces can have any dimension 0, 1, . . . , n.
I For R3 , subspaces of dimension 1 and 2 are either lines or
planes through the origin.
175 / 323
Example 1. Find a basis for and the dimension of the subspace W
spanned by
3 1 5 4
v1 = 1 , v2 = 2 , v3 = 0 , v4 = 3 ,
8 3 13 11
Then
1 0 2 1
A = [v1 v2 v3 v4 ] ∼ 0 1 −1 1 .
0 0 0 0
It follows that dim W = 2 and {v1 , v2 } is a basis.
In particular, W is a plane through the origin. We also see
v3 = 2v1 − v2 , v4 = v1 + v2 .
176 / 323
Example 2. Find a basis for and the dimension of the subspace
a + 3c
2b − 4c
W = : a, b, c ∈ R .
−a − 3c
a+b+c
Writing
a + 3c 1 0 3
2b − 4c 0 2 −4
−a − 3c = au + bv + cw = a −1 + b +c
0 −3
a+b+c 1 1 1
177 / 323
Example. (Null space, column space, row space) Let
1 −2 −1 −2 −1
" #
A = [a1 a2 a3 a4 a5 ] = −1 2 2 5 2
0 0 2 6 2
b1 1 −2 0 1 0
" # " #
∼ b2 = 0 0 1 3 1 .
b3 0 0 0 0 0
178 / 323
Example. (continued) For the previous example:
I dim(nul(A)) = 3. This is the number of free variables in the
solution set of Ax = 0.
I dim(col(A)) = 2. This is the number of pivot columns.
I dim(row(A)) = 2. This is the number of pivot rows.
I The total number of columns equals the number of pivot
columns plus the number of free variables.
179 / 323
Chapter 4. Vector Spaces
4.6 Rank
180 / 323
Last time, we finished with the example
1 −2 −1 −2 −1
" #
A = [a1 a2 a3 a4 a5 ] = −1 2 2 5 2
0 0 2 6 2
and found
Note that
1 −1 0 1 0 2
" # " #
−2 2 0 0 1 2
T
A = [v1 v2 v3 ] = −1 2 2 ∼ 0 0 0
−2 5 6 0 0 0
−1 2 2 0 0 0
Thus {v1 , v2 } is a basis for col(AT ), and hence {v1T , v2T } is a basis
for row(A).
181 / 323
Thus we have seen that dim(col(A)) = dim(row(A)), and that this
number is equal to the number of (column or row) pivots of A.
Furthermore, these are all equal to the corresponding quantities for
AT .
This is true in general.
Definition. The rank of A ∈ Rm×n is the number of pivots of A.
We denote it rank(A).
182 / 323
Rank. Fix A ∈ Rm×n . Note that rank(A) = rank(AT ) and
In particular
n − m = dim(nul(A)) − dim(nul(AT ))
183 / 323
Row equivalence and rank. Let A, B ∈ Rm×n . Note that
A ∼ B =⇒ rank(A) = rank(B);
184 / 323
Examples.
I Suppose A ∈ R3×8 and rankA = 3. Then:
dim(nul(A)) = 5, rank(AT ) = 3.
dim(col(A)) = 2.
185 / 323
Note that if u ∈ Rm×1 and v ∈ R1×n are nonzero, then uv ∈ Rm×n
and rank(uv) = 1; indeed, if v = [β1 · · · βn ] then
uv = [β1 u · · · βn u]
vm
A = u1 v1 + · · · + uk vk
186 / 323
Chapter 4. Vector Spaces
187 / 323
Let A = {a1 , . . . , an } and B = {b1 , . . . , bn } be bases for a vector
space V . Let us describe the ‘coordinate change’ transformation
T : Rn → Rn , T ([v]A ) = [v]B .
In conclusion,
[v]B = PA→B [v]A ,   where PA→B = [ [a1 ]B · · · [an ]B ].
(PA→B )−1 = PB→A .
188 / 323
Let the columns of A, B be bases for Rn and denote
E = {e1 , . . . , en }. Then in fact
189 / 323
Example. Let A = {a1 , a2 } and B = {b1 , b2 }, where
3 4 1 2
a1 = a2 = b1 = b2 =
8 9 1 1
Then
1 0 5 5
[B|A] ∼ ∼ [I2 |PA7→B ].
0 1 −2 −1
Suppose [v]A = [2, −3]T . Then
3 4 2 −6
v = [v]E = PA7→E [v]A = = ,
8 9 −3 −11
and
5 5 2 −5
[v]B = PA7→B [v]A = =
−2 −1 −3 −1
190 / 323
Chapter 5. Eigenvalues and Eigenvectors
191 / 323
Definition. Let A ∈ Cn×n . Suppose v ∈ Cn and λ ∈ C satisfy
Av = λv and v ≠ 0.
Then λ is called an eigenvalue of A, and v an eigenvector corresponding to λ.
192 / 323
Examples.
I Is v an eigenvector of A, where
v = (1, −2, 1)T ,   A = [ 3 6 7 ; 3 3 7 ; 5 6 5 ] ?
Check:
Av = [ 3 6 7 ; 3 3 7 ; 5 6 5 ](1, −2, 1)T = (−2, 4, −2)T = −2v,
so v is an eigenvector with eigenvalue −2.
Eλ = nul(A − λIn ).
194 / 323
Example. Let
−1 −1
5 2
1 4 −1 1
A= , which has eigenvalue 2.
7 8 −2 1
7 4 −2 −1
Note
−1 −1 0 −1
3 2 1 0
1 2 −1 1 0 1 − 12 1
A − 2I4 = ∼
7 8 −4 1 0 0 0 0
7 4 −2 −3 0 0 0 0
Thus
E2 = nul(A − 2I4 ) = span{v1 , v2 },
where v1 = [1, −1, 0, 1]T and v2 = [0, 21 , 1, 0]T are two particular
eigenvectors that form a basis for E2 .
195 / 323
Theorem. (Independence) Let S be a set of eigenvectors of a
matrix A corresponding to distinct eigenvalues. Then S is
independent.
Proof. Suppose {v1 , . . . , vp−1 } are independent but {v1 , . . . , vp } are
dependent. Then there exists a non-trivial combination so that
c1 v1 + · · · + cp vp = 0. (∗)
c1 λ1 v1 + · · · + cp λp vp = 0.
196 / 323
Example. Let
−1 −2 1
A= 2 3 0 .
−2 −2 4
Eigenvalue and eigenvector pairs are given by
1
" #
λ1 = 1, v1 = −1 ,
0
−1
" #
λ2 = 2, v2 = 2 ,
1
0
" #
λ3 = 3, v3 = 1 .
2
197 / 323
Triangular matrices.
Theorem. The eigenvalues of a triangular matrix are the entries
along the diagonal.
To see this, recall that
198 / 323
Example. Consider
1 2 3
" #
A= 0 2 4 .
0 0 3
Then
−1 2 3 1 −2 0
A − 2I3 = 0 0 4 ∼ 0 0 1 .
0 0 1 0 0 0
199 / 323
Invertibility.
Theorem. A is invertible if and only if λ = 0 is not an eigenvalue
of A.
Indeed, A is invertible if and only if rankA = n, which means
200 / 323
Chapter 5. Eigenvalues and Eigenvectors
201 / 323
Definition. Given A ∈ Rn×n , det(λIn − A) is a polynomial of
degree n in λ. It is known as the characteristic polynomial of A.
Its roots are the eigenvalues of A.
Example. Consider
A = [ 1 2 ; 2 1 ]  =⇒  det(λI2 − A) = (1 − λ)2 − 4  =⇒  λ = −1, 3.
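Numerically, NumPy can produce both the characteristic polynomial and its roots; a sketch using the same 2 x 2 matrix (the order in which eig returns the eigenvalues is not guaranteed):

    import numpy as np

    A = np.array([[1., 2.], [2., 1.]])
    print(np.poly(A))                    # [ 1. -2. -3.]  i.e.  lambda^2 - 2*lambda - 3
    eigvals, eigvecs = np.linalg.eig(A)
    print(eigvals)                       # eigenvalues 3 and -1 (in some order)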
202 / 323
Repeated eigenvalues.
Example 1.
−4 −3 1
" #
A= 4 3 0 =⇒ det(λI3 − A) = λ3 − λ2 − λ + 1.
−1 −1 2
Example 2.
−5 −4 2
" #
B= 6 5 −2 =⇒ det(λI3 − B) = λ3 − λ2 − λ + 1.
0 0 1
203 / 323
Complex eigenvalues.
1 2
Example. A = =⇒ det(λI2 − A) = (λ − 1)2 + 4.
−2 1
The eigenvalues are 1 ± 2i. To find the eigenspaces, we proceed
exactly as before (row reduction):
A − (1 + 2i)I2 = [ −2i 2 ; −2 −2i ] ∼ [ 1 i ; 0 0 ].
204 / 323
Similar matrices.
Definition. A matrix B ∈ Rn×n is similar to A ∈ Rn×n if there
exists an invertible matrix P ∈ Rn×n such that B = P −1 AP. We
write A ≈ B.
Similarity is an equivalence relation.
Note that if B = P −1 AP,
205 / 323
Similarity and row equivalence.
Neither implies the other.
Indeed,
−1
1 0 −1 1 1 −1 −1 1
=
−1 0 1 1 0 0 1 1
206 / 323
Chapter 5. Eigenvalues and Eigenvectors
5.3. Diagonalization
207 / 323
Definition. A matrix A ∈ Rn×n is called diagonalizable if it is
similar to a diagonal matrix.
Remark. If we can diagonalize A, then we can compute its powers
easily. Indeed,
A = P −1 DP =⇒ Ak = P −1 D k P,
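A numerical sketch of this with NumPy (note that numpy.linalg.eig returns the factorization in the form A = PDP−1, i.e. with the roles of P and P−1 swapped relative to the line above); the 3 x 3 matrix is the one diagonalized a few slides below:

    import numpy as np

    A = np.array([[-1., -2., 1.],
                  [ 2.,  3., 0.],
                  [-2., -2., 4.]])
    lam, P = np.linalg.eig(A)                 # columns of P are eigenvectors
    # A = P D P^{-1}, so A^5 = P D^5 P^{-1}: only the diagonal entries get powered.
    A5 = P @ np.diag(lam ** 5) @ np.linalg.inv(P)
    assert np.allclose(A5, np.linalg.matrix_power(A, 5))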
208 / 323
Characterization of diagonalizability.
Theorem. A matrix A ∈ Cn×n is diagonalizable precisely when
there exists a basis for Cn consisting of eigenvectors of A. In this
case, writing (vk , λk ) for an eigenvector/eigenvalue pair,
209 / 323
Distinct eigenvalues. If A ∈ Cn×n has n distinct eigenvalues,
then A has n linearly independent eigenvectors and hence A is
diagonalizable.
Example. Consider
−1 −2 1
" #
A= 2 3 0 =⇒ det(λI3 − A) = (λ − 1)(λ − 2)(λ − 3).
−2 −2 4
Thus,
1 0 0
" #
0 2 0 = P −1 AP, P = [v1 v2 v3 ].
0 0 3
210 / 323
If an n × n matrix does not have n distinct eigenvalues, it may or
may not be diagonalizable.
Example. (from before)
−5 −4 2
" #
B= 6 5 −2 =⇒ det(λI3 − B) = λ3 − λ2 − λ + 1.
0 0 1
211 / 323
Example. (from before)
−4 −3 1
" #
A= 4 3 0 =⇒ det(λI3 − A) = λ3 − λ2 − λ + 1.
−1 −1 2
212 / 323
Example. We previously saw the matrix
1 2
A=
−2 1
213 / 323
Similarity, diagonalization, linear transformations. Suppose
214 / 323
Chapter 5. Eigenvalues and Eigenvectors
215 / 323
Transformation matrix. Let B = {v1 , · · · , vn } be a basis for the
vector space V , and C = {w1 , . . . , wm } a basis for the vector
space W . Given a linear transformation T : V → W , we define
T̂ : Cn → Cm by T̂ ([v]B ) = [T (v)]C .
216 / 323
Example. Let B = {v1 , v2 , v3 } be a basis for V and
C = {w1 , w2 } be a basis for W . Suppose T : V → W , with
217 / 323
Example. Let T : P2 → P3 be given by
T (p(t)) = (t + 5)p(t).
Since
T (1) = 5 + t, T (t) = 5t + t 2 , T (t 2 ) = 5t 2 + t 3 ,
we find
5 0 0
1 5 0
M = [T (1)]C [T (t)]C [T (t 2 )]C =
.
0 1 5
0 0 1
218 / 323
Matrix transformations. If B = {v1 , . . . , vn } is a basis for Cn
and C = {w1 , . . . , wm } is a basis for Cm and T (x) = Ax for some
A ∈ Cm×n , then the matrix for T relative to B and C is
For example:
I If B and C are the elementary bases, then M = A (the
standard matrix for T ).
I If B = C , then M = P −1 AP = [T ]B , where
P = [v1 , . . . , vn ] = PB7→E .
219 / 323
Example. Let B = {v1 , v2 , v3 } and C = {w1 , w2 } be bases for
R3 , R2 given by
h 1 i h 0 i
v1 = −2 , v2 = 1 , v3 = e3 , w1 = 12 , w2 = 23
1 −2
and
1 0 −2
T (x) = Ax, A=
3 4 0
Note
−1 −3 2
PB7→E = [v1 v2 v3 ], PC 7→E = [PE 7→C ] = .
2 −1
−1 1 −1
" # " # " #
v1 = 1 , v2 = 1 , v3 = 0 .
1 0 1
221 / 323
Chapter 5. Eigenvalues and Eigenvectors
222 / 323
Vectors in Cn . Recall that for z = α + iβ ∈ C (where α, β ∈ R),
we have
z̄ = α − iβ,   Re z = α = (1/2)(z + z̄),   Im z = β = (1/2i)(z − z̄).
If v = (c1 , . . . , cn ) ∈ Cn , then v̄ = (c̄1 , . . . , c̄n ).
Writing v = x + iy with x, y ∈ Rn ,
v̄ = x − iy,   Re v = x = (1/2)(v + v̄),   Im v = y = (1/2i)(v − v̄).
223 / 323
Conjugate pairs. Suppose λ = α + iβ ∈ C is an eigenvalue for
A ∈ Rn×n with eigenvector v = x + iy. Note that
λ 0
Av = λv =⇒ Av̄ = λ̄v̄, i.e. A[v v̄] = [v v̄] .
0 λ̄
Note that
In particular,
α β
A[x y] = [x y] = r [x y]Rθ ,
−β α
p
where r = α2 + β 2 and Rθ is the 2×2 rotation matrix by
θ ∈ [0, 2π), defined by cos θ = αr and sin θ = − βr .
224 / 323
Example. The matrix
1 5
A=
−2 3
225 / 323
Example. Let
2 2 1
" #
A= 2 4 3 .
−2 −4 −2
We have the following eigenvalues and eigenvector pairs:
h 2 i h 0 i h 1 i
(u, λ1 ) = ( 1 , 2), (v, v̄, α ± iβ) = ( −1 ± i 0 , 1 ± i).
−2 −1 −1
2 0 0
" #
Q= 0 1 1 .
0 −1 1
226 / 323
Example. (cont.) We can write A = PQP −1 , where P = [u x y]
and
2 0 0
" #
Q= 0 1 1 .
0 −1 1
Thus, writing T (x) = Ax and B = {u, x, y}, we can describe the
effect of T in the B-coordinate system as follows: T scales by a
factor of 2 along the x-axis; in the yz-plane, T rotates by 3π/4
and scales by √2.
227 / 323
Chapter 5. Eigenvalues and Eigenvectors
228 / 323
Scalar linear homogeneous ODE. Consider a second order ODE
of the form
x 00 + bx 0 + cx = 0.
Defining
x1 = x, x2 = x 0 , x = (x1 , x2 )T ,
we can rewrite the ODE as a 1st order 2x2 system:
0 1
x0 = Ax, A = .
−c −b
229 / 323
Matrix Exponential. How do we solve x0 = Ax with x(0) = x0 ? The solution is
x(t) = e At x0 ,
where e A denotes the matrix exponential. If A = diag(λ1 , . . . , λn ), then
e A = diag(e λ1 , . . . , e λn ).
More generally, if A = Pe D P −1 with D diagonal... that is, if A = PDP −1 , then
e A = Pe D P −1 .
231 / 323
Example 3. e 0 = I .
Example 4. If A is nilpotent (that is, Ak0 = 0 for some k0 ), then
e A = I + A + A2 /2! + · · · + Ak0 −1 /(k0 − 1)!   (a finite sum).
If AB = BA, then
e A+B = e A e B = e B e A .
In particular,
(e A )−1 = e −A .
232 / 323
Numerical example. Consider
00 0 0 0 1
x − 4x + 3x = 0 =⇒ x = Ax, A=
−3 4
Then
−1 1 1
A = PDP , P= , D = diag(1, 3)
1 3
e tA = P[diag(e t , e 3t )]P −1 .
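SciPy's expm computes the matrix exponential directly; a rough sketch for this example (the finite-difference check is just a sanity test we added):

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0., 1.], [-3., 4.]])
    x0 = np.array([1., 0.])                  # x(0) = (x(0), x'(0))

    def x(t):
        return expm(t * A) @ x0              # solution of x' = Ax, x(0) = x0

    # Check x'(t) = A x(t) at t = 0.3 with a centered finite difference.
    h, t = 1e-5, 0.3
    assert np.allclose((x(t + h) - x(t - h)) / (2 * h), A @ x(t), atol=1e-4)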
233 / 323
Numerical example. (cont.) This gives two linearly independent
solutions, namely
x(t) = (e t , e t )T   and   x(t) = (e 3t , 3e 3t )T .
234 / 323
Chapter 6. Orthogonality and Least Squares
235 / 323
Conjugate transpose. If A ∈ Cm×n , then we define
A∗ = (Ā)T ∈ Cn×m .
236 / 323
Definition. If u = (a1 , . . . , an ) ∈ Cn and v = (b1 , . . . , bn ) ∈ Cn ,
then we define the inner product of u and v by
u · v = ā1 b1 + · · · + ān bn ∈ C.
If A = [u1 · · · uk ] and B = [v1 · · · vℓ ], then A∗ B = [ui · vj ], i = 1, . . . , k, j = 1, . . . , ℓ.
237 / 323
Properties of the inner product. For u, v, w ∈ Cn and α ∈ C:
I u·v =v·u
I u · (v + w) = u · v + u · w
I α(u · v) = (αu) · v = u · (αv)
I If u = (a1 , · · · , an ) ∈ Cn , then
u · u = |a1 |2 + · · · + |an |2 ≥ 0,
and u · u = 0 only if u = 0.
238 / 323
Definition. If u = (a1 , . . . , an ) ∈ Cn , then the norm of u is given
by
√ q
kuk = u · u = |a1 |2 + · · · + |an |2 .
Properties. For u, v ∈ Cn and α ∈ C,
I kαuk = |α| kuk
I |u · v| ≤ kukkvk (Cauchy–Schwarz inequality)
I ku + vk ≤ kuk + kvk (triangle inequality)
239 / 323
1 i
Example. Let A = [v1 v2 ] = .
3 + 8i 2i
1 3 − 8i
Then A∗ = . So A is not hermitian.
−i −2i
240 / 323
Definition. Two vectors u, v ∈ Cn are orthogonal if u · v = 0. We
write u ⊥ v.
A set {v1 , . . . , vk } ⊂ Cn is an orthogonal set if vi · vj = 0 for each
i, j = 1, . . . , k (with i ≠ j).
A set {v1 , . . . , vk } ⊂ Cn is an orthonormal set if it is orthogonal
and each vi is a unit vector.
Remark. In general, we have
ku + vk ≤ kuk + kvk.
However, we have
241 / 323
Definition. Let W ⊂ Cn . The orthogonal complement of W ,
denoted W ⊥ , is defined by
W ⊥ = {v ∈ Cn : v · w = 0 for every w ∈ W }.
242 / 323
Suppose A = [v1 · · · vk ] ∈ Cn×k . Then
[col(A)]⊥ = nul(A∗ ).
Indeed
v1∗
∗
v1 x v1 · x
0 = A∗ x = ... x = ... = ...
vk∗ vk∗ x vk · x
if and only if
v1 · x = · · · = vk · x = 0.
243 / 323
Example 1. Let v1 = [1, −1, 2]T and v2 = [0, 2, 1]T . Note that
v1 ⊥ v2 .
Let W = span{v1 , v2 } and A = [v1 v2 ]. Note that
We have
5
T 1 −1 2 1 0 2
A = ∼ 1 ,
0 2 1 0 1 2
244 / 323
Example 2. Let v1 = [1, −1, 1, −1]T and v2 = [1, 1, 1, 1]T . Again,
v1 · v2 = 0.
Let W = span{v1 , v2 } and A = [v1 v2 ] as before. Then
W ⊥ = nul(AT ), with
T 1 −1 1 −1 1 0 1 0
A = ∼ .
1 1 1 1 0 1 0 1
245 / 323
Chapter 6. Orthogonality and Least Squares
246 / 323
Definition. If S is an orthogonal set that is linearly independent,
then we call S an orthogonal basis for span(S).
Similarly, a linearly independent orthonormal set S is a
orthonormal basis for span(S).
Example. Let
1 0 −1
" # " # " #
v1 = 0 , v2 = 1 , v3 = 0 .
1 0 1
247 / 323
Test for orthogonality. Let A = [v1 · · · vp ] ∈ Cn×p . Note that
v1 · v1 · · · v1 · vp
A∗ A = .. .. .. p×p
∈C .
. . .
vp · v1 · · · vp · vp
248 / 323
Definition. A matrix A ∈ Cn×n is unitary if A∗ A = In .
The following conditions are equivalent:
I A ∈ Cn×n is unitary
I A ∈ Cn×n satisfies A−1 = A∗
I the columns of A are an orthonormal basis for Cn
I A ∈ Cn×n satisfies AA∗ = In
I the rows of A are an orthonormal basis for Cn
249 / 323
Theorem. (Independence) If S = {v1 , . . . , vp } is an orthogonal set
of non-zero vectors, then S is independent and S is a basis for
span(S).
Indeed, suppose
c1 v1 + · · · + cp vp = 0.
Now take an inner product with vj :
0 = c1 v1 · vj + · · · + cj vj · vj + · · · + cp vp · vj
= 0 + · · · + cj kvj k2 + · · · + 0.
250 / 323
Theorem. Suppose W ⊂ Cn has dimension p. Then
dim(W ⊥ ) = n − p.
Let A = [w1 · · · wp ], where {w1 , . . . , wp } is a basis for W . Note
W ⊥ = [col(A)]⊥ = nul(A∗ ) ⊂ Cn .
Thus
In particular, we find
251 / 323
Theorem. (Orthogonal decomposition) Let W be a subspace of
Cn . For every x ∈ Cn there exist unique y ∈ W and z ∈ W ⊥ such
that x = y + z.
Indeed, let B = {w1 , . . . , wp } be a basis for W and
C = {v1 , . . . , vn−p } a basis for W ⊥ . Then B ∪ C is a basis for Cn ,
and so every x has a unique representation x = y + z, where
y ∈ span(B) and z ∈ span(C ).
Uniqueness can also be deduced from the fact that
W ∩ W ⊥ = {0}.
Remark. Suppose B is an orthogonal basis for W . Then
x = α1 w1 + · · · + αp wp + z, z ∈ W ⊥.
252 / 323
Projection. Let W be a subspace of Cn . As above, for each
x ∈ Cn there exists a unique y ∈ W and z ∈ W ⊥ so that
x = y + z. We define
projW : Cn → W ⊂ Cn by projW x = y.
In the case W = span{w1 } (a line), the standard matrix is
[projW ]E = (1/kw1 k2 ) w1 w1∗ ∈ Cn×n .
253 / 323
Example. Let w1 = [1, 0, 1]T and v = [−1, 2, 2]T , with
W = span{w1 }. Then
projW (v) = (w1 · v/kw1 k2 ) w1 = (1/2) w1 .
In fact,
[projW ]E = (1/kw1 k2 ) w1 w1∗ = (1/2) [ 1 0 1 ; 0 0 0 ; 1 0 1 ].
Thus
projW (x) = (1/2) (x1 + x3 , 0, x1 + x3 )T .
254 / 323
Chapter 6. Orthogonality and Least Squares
255 / 323
Orthogonal projections. Let W be a subspace of Cn . Recall that
projW x = y, where x = y + z, y ∈ W , z ∈ W ⊥.
x = α1 w1 + · · · + αp wp + z,   z ∈ W ⊥ .
Taking the inner product with each wi (and using wi · z = 0) gives the system
w1 · x = α1 w1 · w1 + · · · + αp w1 · wp
...
wp · x = α1 wp · w1 + · · · + αp wp · wp .
256 / 323
Normal system. Write A = [w1 · · · wp ] ∈ Cn×p . The system above can be written as
A∗ A (α1 , . . . , αp )T = A∗ x,
which is called the normal system.
257 / 323
Theorem. (Null space and rank of A∗ A) If A ∈ Cn×p , then
A∗ A ∈ Cp×p satisfies
258 / 323
Solving the normal system. Suppose B = {w1 , . . . , wp } is a
basis for a subspace W ⊂ Cn . Writing A = [w1 , . . . , wp ], we have
that A∗ A is invertible and the normal system
A∗ Ax̂ = A∗ x
259 / 323
Example 1. If p = 1 (so W is a line), then A∗ A = kw1 k2 , leading again to
projW (x) = (w1 · x/kw1 k2 ) w1 .
If B is an orthogonal basis, then A∗ A = diag(kw1 k2 , . . . , kwp k2 ).
Thus
projW (x) = (w1 · x/kw1 k2 ) w1 + · · · + (wp · x/kwp k2 ) wp .
260 / 323
Example. Let
−2
1 4
2 1 5
w1 = , w2 = , x= .
1 −1 −3
1 1 3
261 / 323
Example 2. If A∗ A is not diagonal, then the columns of A are not
an orthogonal basis for col(A).
One can still compute the projection via
262 / 323
Distance minimization. Orthogonal projection is related to
minimizing a distance. To see this, suppose w ∈ W and x ∈ Cn . By
the Pythagorean theorem,
and thus
263 / 323
Conclusion. Let B = {w1 , . . . , wp } be a basis for W ⊂ Cn ,
A = [w1 · · · wp ], and x ∈ Cn .
I nul(A∗ A) = nul(A), rank(A∗ A) = rank(A) = p, and so A∗ A is
invertible
I The solution to A∗ Ax̂ = A∗ x is x̂ = (A∗ A)−1 A∗ x
I x̂ = (A∗ A)−1 A∗ x = [projW (x)]B
I projW : Cn → W is given by projW (x) = A(A∗ A)−1 A∗ x
I projW ⊥ (x) = x − projW (x)
I x = projW (x) + projW ⊥ (x)
I if B is orthogonal, projW (x) = (w1 · x/kw1 k2 ) w1 + · · · + (wp · x/kwp k2 ) wp
I minw∈W kx − wk = kx − projW (x)k
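A NumPy sketch of these formulas, using the orthogonal pair w1 , w2 and the vector x from the example on slide 261 (everything here is real, so A∗ is just the transpose):

    import numpy as np

    A = np.array([[ 1., -2.],
                  [ 2.,  1.],
                  [ 1., -1.],
                  [ 1.,  1.]])                  # columns: w1, w2 (a basis for W)
    x = np.array([4., 5., -3., 3.])

    P = A @ np.linalg.inv(A.T @ A) @ A.T        # proj_W = A (A*A)^{-1} A*
    y = P @ x                                   # proj_W(x)
    z = x - y                                   # proj_{W-perp}(x)
    assert np.allclose(A.T @ z, 0)              # z is orthogonal to every column of A
    print(y)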
264 / 323
Chapter 6. Orthogonality and Least Squares
265 / 323
Orthogonal projections. Recall that if B = {w1 , · · · , wp } is an
independent set and A = [w1 · · · wp ], then
projW (x) = A(A∗ A)−1 A∗ x,   W = span(B).
If B is orthogonal, then
projW (x) = (w1 · x/kw1 k2 ) w1 + · · · + (wp · x/kwp k2 ) wp .
266 / 323
Gram-Schmidt algorithm. Let A = {w1 , · · · , wp }.
Let v1 := w1 and Ω1 := span{v1 }.
Let v2 := projΩ1⊥ (w2 ) = w2 − projΩ1 (w2 ),   Ω2 := span{v1 , v2 }
...
Let vj+1 := projΩj⊥ (wj+1 ),   Ωj+1 := span{v1 , · · · , vj+1 }
Note that
vj+1 = 0 ⇐⇒ wj+1 ∈ Ωj .
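A Python sketch of the algorithm (real vectors only; the tolerance used to detect vj = 0 is an arbitrary choice of ours):

    import numpy as np

    def gram_schmidt(ws, tol=1e-12):
        """Return v1, ..., vp; a zero vj signals that wj lies in the span of the earlier w's."""
        vs = []
        for w in ws:
            v = w.astype(float).copy()
            for u in vs:
                if np.dot(u, u) > tol:
                    v -= (np.dot(u, v) / np.dot(u, u)) * u   # subtract the projection onto span{u}
            vs.append(v)
        return vs

    ws = [np.array([1., 1., 1., 1.]),
          np.array([1., 3., 1., 3.]),
          np.array([1., 0., 1., 2.])]
    for v in gram_schmidt(ws):
        print(v)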
267 / 323
Matrix representation. Write Vi = span{vi }. Since {vi } are
orthogonal, we can write
j
X j
X
projΩj (wj+1 ) = projVk (wj+1 ) = rk,j+1 vk ,
k=1 k=1
kv ·w
j+1
where rk,j+1 = kv 2 if vk 6= 0 and
kk
rk,j+1 can be anything if vk = 0.
r1,j+1
..
wj+1 = [v1 · · · vj+1 ] . , j = 1, . . . , p − 1.
r
j,j+1
1
268 / 323
Matrix representation (continued). The Gram-Schmidt
algorithm therefore has the matrix representation
where
1 r1,2 ··· r1,p
.. .. ..
. . .
R=
..
. rp−1,p
1
269 / 323
Example 1. Let
We apply Gram–Schmidt:
I v1 = w1 , Ω1 = span{v1 }
v1 ·w2
I v2 = w2 − v
kv1 k2 1
= [0, 1, 0, 1]T , Ω2 = span{v1 , v2 }
v1 ·w3 v2 ·w3
I v3 = w3 − v
kv1 k2 1
− v
kv2 k2 2
= 0, Ω3 = span{v1 , v2 }
v1 ·w4 v2 ·w4 1 T
I v4 = w4 − v
kv1 k2 1
− v
kv2 k2 2
= 2 [−1, −1, 1, 1]
Ω4 = span{v1 , v2 , v4 }.
In particular, {v1 , v2 , v4 } is an orthogonal basis for
span{w1 , w2 , w3 , w4 }.
270 / 323
Example 1. (cont.) Let A = [w1 w2 w3 w4 ] and Q = [v1 v2 0 v4 ].
Then we can write A = QR, where
1 21
1 1
0 1 −1 1
R= 2
0 0 1 c
0 0 0 1
271 / 323
Example 2. Let
272 / 323
Example 2. (Cont.) We now apply Gram–Schmidt to {x1 , x2 }.
I v1 = x1 = [−2, 1, 1, 0]T
v1 ·x2
I v2 = x2 − v
kv1 k2 1
= [0, − 12 , 12 , 1]T
273 / 323
Chapter 6. Orthogonality and Least Squares
274 / 323
The normal system. For A ∈ Cn×p and b ∈ Cn , the equation
A∗ Ax = A∗ b
is called the normal system (or normal equations) associated to Ax = b.
275 / 323
Claim. A∗ Ax = A∗ b is consistent for every b ∈ Cn .
To see this we first show col(A∗ A) = col(A∗ ).
I Indeed, if y ∈ col(A∗ A), then we may write y = A∗ [Ax], so
that y ∈ col(A∗ ).
I On the other hand, we have previously shown that
rank(A∗ A) = rank(A∗ ). Thus col(A∗ A) = col(A∗ ).
Since A∗ b ∈ col(A∗ ), the claim follows.
If x̂ is a solution to the normal system, then
277 / 323
Least squares solutions of Ax = b There is a clear geometric
interpretation of the solution set to the normal system: let x̂ be a
solution to the normal system A∗ Ax = A∗ b. Then, with
W = col(A),
278 / 323
Example. Let
1 0 1 1
" # " #
A = [w1 w2 w3 ] = 1 −1 0 , b= 0 .
0 1 1 −1
279 / 323
Example. (cont.) We can also compute that
1
projW b = Ax̂ = 31 w1 − 31 w2 + 0 = 31 2 ,
−1
where W = col(A).
The least squares error for Ax = b is defined by kb − Ax̂k.
In this case, one can check that kb − Ax̂k = 2√3/3.
I This is a measurement of the smallest error possible when
approximating b by a vector in col(A).
280 / 323
Chapter 6. Orthogonality and Least Squares
281 / 323
Linear models. Suppose you have a collection of data from an
experiment, given by
{(xj , yj ) : j = 1, . . . , n}.
282 / 323
Linear models. (Cont.) In matrix form, we have y = X β + ε.
Terminology:
I X is the design matrix,
I β is the parameter vector,
I y is the observation vector,
I ε is the residual vector.
The goal is to find β to minimize kX β − yk2 .
To this end, we solve the normal system X ∗ X β = X ∗ y. This
solution gives the least squares best fit.
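A NumPy sketch with made-up data, fitting the quadratic model y ≈ β0 + β1 x + β2 x2 (numpy.linalg.lstsq solves the least squares problem directly, and the explicit normal-system solve is shown for comparison):

    import numpy as np

    xs = np.array([0., 1., 2., 3., 4.])                      # hypothetical data
    ys = np.array([0.1, 1.8, 6.5, 13.2, 22.1])

    X = np.column_stack([np.ones_like(xs), xs, xs ** 2])     # design matrix
    beta, *_ = np.linalg.lstsq(X, ys, rcond=None)            # least squares best fit
    beta_normal = np.linalg.solve(X.T @ X, X.T @ ys)         # solve X*X beta = X*y
    assert np.allclose(beta, beta_normal)
    print(beta)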
283 / 323
Example 1. (Fitting to a quadratic polynomial). Find a least
squares best fit to the data
284 / 323
Example 1. (cont.) The normal system X ∗ X β = X ∗ y has
solution β̂ = [.25, 1.05, .85]T , which implies the least squares best
fit to the data is
285 / 323
Example 2. Kepler’s first law asserts that the orbit of a comet
(parametrized by (r , θ)) is described by r = β + e(r cos θ), where
β, e are to be determined.
The orbit is elliptical when 0 < e < 1, parabolic when e = 1, and
hyperbolic when e > 1.
Given observational data
(θ, r ) = {(.88, 3), (1.1, 2.3), (1.42, 1.65), (1.77, 1.25), (2.14, 1.01)},
286 / 323
Example 2. (cont.) The associated linear model is
(r1 , . . . , r5 )T = [ 1 r1 cos θ1 ; . . . ; 1 r5 cos θ5 ] (β, e)T + ε.
287 / 323
Chapter 7. Symmetric Matrices and Quadratic Forms
288 / 323
Schur Triangular Form.
Definition. A matrix P ∈ Cn×n is unitary if P ∗ P = In .
Schur Factorization. Any A ∈ Cn×n can be written in the form
A = PUP ∗ where P ∈ Cn×n is unitary and U ∈ Cn×n is upper
triangular.
This can be proven by induction. The case n = 1 is clear.
Now suppose the result holds for (n − 1) × (n − 1) matrices and let
A ∈ Cn×n .
Let {λ1 , v1 } be an eigenvalue/eigenvector pair for A with kv1 k = 1.
Extend v1 to an orthonormal basis {v1 , . . . , vn } for Cn and set
P1 = [v1 · · · vn ].
289 / 323
Schur Factorization. (cont.) Note P1∗ = P1−1 . We may write
AP1 = P1 [ λ1  w ]
         [ 0   M ] ,   M ∈ C(n−1)×(n−1) , w ∈ C1×(n−1) .
290 / 323
Schur Triangular Form.
This result shows that every A ∈ Cn×n is similar to an upper
triangular matrix U ∈ Cn×n via a change of coordinate matrix
P ∈ Cn×n that is unitary.
That is: every matrix A is unitarily similar to an upper triangular
matrix.
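As a numerical illustration (a sketch, not part of the proof), SciPy's schur routine computes exactly such a factorization; output='complex' requests a genuinely upper triangular complex factor:

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

U, P = schur(A, output='complex')   # A = P U P*, U upper triangular, P unitary
print(np.allclose(A, P @ U @ P.conj().T))       # True
print(np.allclose(P.conj().T @ P, np.eye(4)))   # True: P is unitary
print(np.allclose(np.tril(U, -1), 0))           # True: U is upper triangular
```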
291 / 323
Definition. (Normal matrices) A matrix A ∈ Cn×n is normal if
A∗ A = AA∗ .
292 / 323
Theorem. If A ∈ Cn×n is normal and (λ, v) is an
eigenvalue/eigenvector pair, then (λ̄, v) is an
eigenvalue/eigenvector pair for A∗ .
Indeed, A − λI is also normal, and for any normal matrix B we have
kBxk2 = x∗ B ∗ Bx = x∗ BB ∗ x = kB ∗ xk2 ; hence
k(A∗ − λ̄I )vk = k(A − λI )vk = 0.
293 / 323
Theorem. (Spectral theorem for normal matrices)
• A matrix A ∈ Cn×n is normal if and only if it is unitarily similar
to a diagonal matrix. That is, A is normal if and only if
A = PDP ∗ for some unitary P ∈ Cn×n and some diagonal D ∈ Cn×n . •
294 / 323
Spectral Theorem. (cont.)
Now suppose A ∈ Cn×n is normal, i.e. AA∗ = A∗ A. We begin by
writing the Schur factorization of A, i.e.
A = PUP ∗ , P = [v1 · · · vn ], with P unitary and U upper triangular.
295 / 323
Spectral Theorem. (cont.)
We have shown
U = [ c11  0 ]
    [ 0    Ũ ] ,
where Ũ ∈ C(n−1)×(n−1) is upper triangular.
But now AP = PU gives Av2 = c22 v2 , and arguing as above we
deduce c2j = 0 for j = 3, . . . , n.
Continuing in this way, we deduce that U is diagonal.
296 / 323
Spectral Theorem. (cont.)
To summarize, A ∈ Cn×n is normal (AA∗ = A∗ A) if and only if it
can be written as A = PDP ∗ where P = [v1 · · · vn ] is unitary and
D = diag(λ1 , · · · , λn ). Note
I P unitary means P −1 = P ∗
I A is unitarily similar to a diagonal matrix
I {λj , vj } are eigenvalue-eigenvector pairs for A
297 / 323
Theorem. (Spectral Theorem for Self-Adjoint Matrices)
• A matrix A ∈ Cn×n is self-adjoint (A = A∗ ) if and only if it is
unitarily similar to a real diagonal matrix, i.e. A = PDP ∗ for some
unitary P ∈ Cn×n and some diagonal D ∈ Rn×n . •
Indeed, this follows from the spectral theorem for normal matrices.
In particular,
PDP ∗ = A = A∗ = PD ∗ P ∗ =⇒ D = D ∗ , so that D is real.
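A quick NumPy illustration (a sketch): for a self-adjoint matrix, eigh returns real eigenvalues and a unitary matrix of eigenvectors, giving A = PDP ∗ with D real:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (B + B.conj().T) / 2                    # self-adjoint: A = A*

lam, P = np.linalg.eigh(A)                  # lam real, columns of P orthonormal
D = np.diag(lam)

print(np.allclose(A, P @ D @ P.conj().T))         # True: A = P D P*
print(np.allclose(P.conj().T @ P, np.eye(4)))     # True: P is unitary
print(lam.dtype)                                  # float64: eigenvalues are real
```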
298 / 323
Eigenvectors and eigenvalues for normal matrices. Suppose A
is a normal matrix.
I Eigenvectors associated to different eigenvalues are
orthogonal:
v1 · Av2 = λ2 v1 · v2 ,
v1 · Av2 = A∗ v1 · v2 = λ1 v1 · v2 ,
so that (λ1 − λ2 ) v1 · v2 = 0, and hence v1 · v2 = 0 when λ1 ≠ λ2 .
299 / 323
Spectral decomposition. If A ∈ Cn×n is a normal matrix, then
we may write A = PDP ∗ as above. In particular,
A = λ1 v1 v1∗ + · · · + λn vn vn∗
Recall that
(1/kvk k2 ) vk vk∗ = vk vk∗ (since kvk k = 1)
is the projection matrix for the subspace Vk = span{vk }.
Thus, a normal matrix can be written as the sum of scalar
multiples of projections onto its eigenspaces.
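A short NumPy sketch of this decomposition: reassembling a self-adjoint (hence normal) matrix from the rank-one projections λk vk vk∗ :

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A = (B + B.conj().T) / 2                     # self-adjoint, hence normal

lam, P = np.linalg.eigh(A)                   # orthonormal eigenvectors in columns
A_rebuilt = sum(lam[k] * np.outer(P[:, k], P[:, k].conj()) for k in range(3))

print(np.allclose(A, A_rebuilt))             # True: A = sum_k lam_k v_k v_k*
```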
300 / 323
Chapter 7. Symmetric Matrices and Quadratic Forms
301 / 323
Definition. Let A ∈ Cn×n be a self-adjoint matrix. The function
Q(x) = x∗ Ax, x ∈ Cn ,
is called the quadratic form associated with A. Since A is self-adjoint,
Q takes only real values, i.e. Q : Cn → R.
302 / 323
Characteristic forms. Expanding the inner product, we find that
x∗ Ax = Σj ajj |xj |2 + 2 Σi<j Re(aij x̄i xj ),
where the first sum runs over j = 1, . . . , n.
Example.
     [ 1  −2   3 ]
xT   [ −2  4  −5 ] x = x12 + 4x22 − 6x32 − 4x1 x2 + 6x1 x3 − 10x2 x3 .
     [ 3  −5  −6 ]
303 / 323
Characterization of definiteness. Let A ∈ Cn×n be self-adjoint and Q(x) = x∗ Ax.
I There exists an orthonormal basis B = {v1 , . . . , vn } s.t.
A = PDP ∗ , where P = [v1 · · · vn ] and
D = diag(λ1 , · · · , λn ) ∈ Rn×n .
Then, with y = P −1 x = P ∗ x, we have
Q(x) = x∗ Ax = y∗ Dy = λ1 |y1 |2 + · · · + λn |yn |2 .
We conclude:
Theorem. If A ∈ Cn×n is self-adjoint, then Q(x) = x∗ Ax is
positive definite if and only if the eigenvalues of A are all positive.
(Similarly for negative definite, or semidefinite...)
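A minimal NumPy sketch of this criterion (checking positive definiteness of a self-adjoint matrix via its eigenvalues):

```python
import numpy as np

def is_positive_definite(A, tol=0.0):
    """A is assumed self-adjoint; checks whether all eigenvalues exceed tol."""
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

print(is_positive_definite(np.array([[2., 1.], [1., 2.]])))   # True  (eigenvalues 1, 3)
print(is_positive_definite(np.array([[1., 2.], [2., 1.]])))   # False (eigenvalues -1, 3)
```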
304 / 323
Quadratic forms and conic sections. The equation
ax12 + 2bx1 x2 + cx22 + dx1 + ex2 = f
can be written as
xT Ax + [d e]x = f ,   A = AT = [ a  b ]
                                [ b  c ] .
Diagonalizing A = PDP T (with P orthogonal and D diagonal) and setting
y = P T x, this becomes
yT Dy + [d ′ e ′ ]y = f ,   [d ′ e ′ ] = [d e]P.
305 / 323
Principal axis theorem. The change of variables y = P T x gives
xT Ax + [d e]x = f ⇐⇒ yT Dy + [d ′ e ′ ]y = f .
306 / 323
Example. Consider x12 − 6x1 x2 + 9x22 . This corresponds to
A = [ 1  −3 ]
    [ −3  9 ] .
307 / 323
Example. (cont.) Consider the conic section described by
308 / 323
Chapter 7. Symmetric Matrices and Quadratic Forms
309 / 323
Recall: A self-adjoint matrix A ∈ Cn×n is unitarily similar to a real
diagonal matrix. Consequently, we can write
A = λ1 u1 u∗1 + · · · + λn un u∗n ,
where {u1 , . . . , un } is an orthonormal basis of eigenvectors of A with
(real) eigenvalues λ1 ≥ · · · ≥ λn .
310 / 323
Quadratic forms and boundedness. Let A be self-adjoint.
Continuing from above,
x∗ Ax = λ1 x∗ u1 (u∗1 x) + · · · + λn x∗ un (u∗n x)
= λ1 |u∗1 x|2 + · · · + λn |u∗n x|2 .
We deduce
λn kxk2 ≤ x∗ Ax ≤ λ1 kxk2 .
311 / 323
Rayleigh principle. We continue with A as above and set
Ω0 = {0}, Ωk := span{u1 , . . . , uk }.
Then for x ∈ (Ωk−1 )⊥ we have
x∗ Ax = λk |u∗k x|2 + · · · + λn |u∗n x|2
=⇒ λn ≤ x∗ Ax ≤ λk for all x ∈ (Ωk−1 )⊥ with kxk = 1.
But since u∗n Aun = λn and u∗k Auk = λk , we deduce the Rayleigh
principle: for k = 1, . . . , n,
min{x∗ Ax : kxk = 1} = min{x∗ Ax : kxk = 1, x ∈ (Ωk−1 )⊥ } = λn ,
max{x∗ Ax : kxk = 1, x ∈ (Ωk−1 )⊥ } = λk .
312 / 323
Example. Let Q(x1 , x2 ) = 3x12 + 9x22 + 8x1 x2 , which corresponds
to
A = [ 3  4 ]
    [ 4  9 ] .
The eigenvalues are λ1 = 11 and λ2 = 1, with unit eigenvectors
u1 = (1/√5)[1, 2]T and u2 = (1/√5)[−2, 1]T .
Note
min{x∗ Ax : kxk = 1} = λ2 = 1,   max{x∗ Ax : kxk = 1} = λ1 = 11,
attained at x = u2 and x = u1 , respectively.
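A numerical check of this example (a NumPy sketch): the eigenvalues of A and the extreme values of x∗ Ax over unit vectors:

```python
import numpy as np

A = np.array([[3., 4.],
              [4., 9.]])
print(np.linalg.eigvalsh(A))                   # [ 1. 11.]

# Sample the unit circle and evaluate the quadratic form x^T A x
t = np.linspace(0, 2 * np.pi, 2001)
X = np.vstack([np.cos(t), np.sin(t)])          # unit vectors as columns
Q = np.sum(X * (A @ X), axis=0)                # Q[j] = x_j^T A x_j
print(Q.min(), Q.max())                        # approximately 1 and 11
```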
313 / 323
Example. (cont.)
The contour curves Q(x1 , x2 ) = const are ellipses in the x1 x2 plane.
Using the change of variables y = P ∗ x, where
P = (1/√5) [ 1  −2 ]
           [ 2   1 ] ,
the quadratic form becomes Q = 11y12 + y22 .
314 / 323
Chapter 7. Symmetric Matrices and Quadratic Forms
315 / 323
Singular values. For a matrix A ∈ Cn×p , the matrix A∗ A ∈ Cp×p
is self-adjoint. By the spectral theorem, there exists an
orthonormal basis B = {v1 , . . . , vp } for Cp consisting of
eigenvectors for A∗ A with real eigenvalues λ1 ≥ · · · ≥ λp .
Noting that x∗ (A∗ A)x = (Ax)∗ Ax = kAxk2 ≥ 0 for all x, we
deduce
λj = λj kvj k2 = vj∗ (A∗ A)vj ≥ 0 for all j.
Definition. With the notation above, we call σj := √λj the
singular values of A.
I If rankA = r , then σr +1 = · · · = σp = 0.
I In this case {v1 , . . . , vr } is an orthonormal basis for col(A∗ ),
while {vr +1 , . . . , vp } is an orthonormal basis for nul(A).
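A NumPy sketch relating the two descriptions: the singular values of A agree with the square roots of the eigenvalues of A∗ A (here for a random real matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))

# Eigenvalues of A*A in decreasing order, then their square roots
lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
sigma_from_eigs = np.sqrt(np.clip(lam, 0, None))   # clip guards tiny negatives

sigma = np.linalg.svd(A, compute_uv=False)         # singular values, decreasing
print(np.allclose(sigma, sigma_from_eigs))         # True
```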
316 / 323
Singular Value Decomposition. Let A ∈ Cn×p with rankA = r
as above. The vectors
uj = (1/σj )Avj , j = 1, . . . , r
are orthonormal. Extending to an orthonormal basis {u1 , . . . , un } for
Cn and setting U = [u1 · · · un ], V = [v1 · · · vp ], we obtain A = UΣV ∗ .
SVD and linear transformations. Let T (x) = Ax be a linear
transformation T : Cp → Cn .
Writing A = UΣV ∗ as above, the sets B = {v1 , . . . , vp } and
C = {u1 , . . . , un } are orthonormal bases for Cp and Cn . Then the
matrix of T relative to the bases B and C is Σ.
318 / 323
Transformations of R2 . If T : R2 → R2 is given by T (x) = Ax,
then there exist unitary matrices U, V so that A = UDV T for
D = diag(σ1 , σ2 ).
Unitary matrices in R2×2 represent rotations/reflections of the
plane.
Every linear transformation of the plane is the composition of three
transformations: a rotation/reflection, a scaling transformation,
and a rotation/reflection.
319 / 323
Moore–Penrose inverse of A ∈ Cn×p . Write
Vr = [v1 · · · vr ] ∈ Cp×r and Ur = [u1 · · · ur ] ∈ Cn×r . Then
A = UΣV ∗ = Ur DVr∗ , where D = diag(σ1 , . . . , σr ). The
Moore–Penrose inverse (pseudoinverse) of A is then defined by
A+ := Vr D −1 Ur∗ .
320 / 323
Least squares solutions for A ∈ Cn×p . Recall that the least
squares solutions of Ax = b are the solutions to the normal system
A∗ Ax = A∗ b. Equivalently, they are solutions to Ax = projcolA b.
When rankA∗ A = r < p, there are infinitely many least squares
solutions.
Note that since AA+ = projcolA , we have A(A+ b) = projcolA b, so
A+ b is itself a least squares solution. Moreover, A+ b ∈ col(Vr ) = col(A∗ ),
while x̂ − A+ b ∈ nul(A) = col(A∗ )⊥ for any least squares solution x̂,
so A+ b ⊥ x̂ − A+ b. Consequently,
kx̂k2 = kA+ bk2 + kx̂ − A+ bk2 ≥ kA+ bk2 ,
i.e. A+ b is the least squares solution of minimal norm.
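A NumPy sketch of the minimal-norm property (pinv computes A+ via the SVD; for a rank-deficient A, both pinv and lstsq return the same minimal-norm least squares solution):

```python
import numpy as np

# Rank-deficient example (the third column is the sum of the first two)
A = np.array([[1., 0., 1.],
              [1., -1., 0.],
              [0., 1., 1.]])
b = np.array([1., 0., -1.])

x_pinv = np.linalg.pinv(A) @ b                    # A+ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # minimal-norm least squares solution

print(np.allclose(x_pinv, x_lstsq))               # True: same minimal-norm solution
print(np.allclose(A @ x_pinv, A @ x_lstsq))       # True: both give proj_{col(A)} b
```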
322 / 323
Review: Matrix Factorizations Let A ∈ Cn×p .
I Permuted LU factorization: PA = LU, where P ∈ Cn×n is an
invertible permutation matrix, L ∈ Cn×n is invertible and
lower triangular, and U ∈ Cn×p is upper triangular.
I QR factorization: A = QR, where the columns of Q ∈ Cn×p
are generated from the columns of A by Gram-Schmidt and
R ∈ Cp×p is upper triangular.
I SVD: A = UΣV ∗ , where U ∈ Cn×n , V ∈ Cp×p are unitary and
  Σ = [ D  0 ] ∈ Cn×p ,   D = diag(σ1 , . . . , σr ).
      [ 0  0 ]
For A ∈ Cn×n :
I Schur factorization: A = PUP ∗ where P is unitary and U is
upper triangular.
I Spectral theorems: A = PDP ∗ , where P is unitary and D is
diagonal. This holds if and only if A is normal. The matrix D
is real if and only if A is self-adjoint.
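For reference, a short NumPy/SciPy sketch computing each of these factorizations for small examples (note that SciPy's lu uses the convention A = PLU, i.e. P T A = LU):

```python
import numpy as np
from scipy.linalg import lu, schur

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 3))

P, L, U = lu(A)                                   # permuted LU: A = P L U
Q, R = np.linalg.qr(A)                            # QR: A = Q R
Uu, s, Vh = np.linalg.svd(A, full_matrices=False) # SVD: A = U diag(s) V*

S = rng.standard_normal((3, 3))
T, Z = schur(S, output='complex')                 # Schur: S = Z T Z*, T triangular

H = (S + S.T) / 2                                 # symmetric, hence self-adjoint
lam, Ph = np.linalg.eigh(H)                       # spectral theorem: H = Ph D Ph*

print(np.allclose(A, P @ L @ U),
      np.allclose(A, Q @ R),
      np.allclose(A, (Uu * s) @ Vh),
      np.allclose(S, Z @ T @ Z.conj().T),
      np.allclose(H, Ph @ np.diag(lam) @ Ph.T))
```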
323 / 323