Math 225 Linear Algebra II Lecture Notes
John C. Bowman
University of Alberta
Edmonton, Canada
April 7, 2016
© 2010
John C. Bowman
ALL RIGHTS RESERVED
Reproduction of these lecture notes in any form, in whole or in part, is permitted only for
nonprofit, educational use.
Contents
1 Vectors 4
2 Linear Equations 6
3 Matrix Algebra 8
4 Determinants 11
6 Linear Transformations 16
7 Dimension 17
9 Complex Numbers 23
10 Projection Theorem 28
11 Gram-Schmidt Orthonormalization 29
12 QR Factorization 31
16 Quadratic Forms 43
17 Vector Spaces: 46
21 The Pseudoinverse 57
Index 58
1 Vectors
Vectors in $\mathbb{R}^n$:
$$\mathbf{u} = \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix}, \qquad \mathbf{v} = \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix}, \qquad \mathbf{0} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}.$$
Parallelogram law:
$$\mathbf{u} + \mathbf{v} = \begin{bmatrix} u_1 + v_1 \\ \vdots \\ u_n + v_n \end{bmatrix}.$$
Multiplication by a scalar $c \in \mathbb{R}$:
$$c\mathbf{v} = \begin{bmatrix} cv_1 \\ \vdots \\ cv_n \end{bmatrix}.$$
Length (norm):
$$|\mathbf{v}| = \sqrt{\mathbf{v}\cdot\mathbf{v}} = \sqrt{v_1^2 + \cdots + v_n^2}.$$
Unit vector:
$$\frac{1}{|\mathbf{v}|}\,\mathbf{v}.$$
Distance between two vectors:
$$d(\mathbf{u},\mathbf{v}) = |\mathbf{u} - \mathbf{v}|.$$
Law of cosines:
$$|\mathbf{u} - \mathbf{v}|^2 = |\mathbf{u}|^2 + |\mathbf{v}|^2 - 2|\mathbf{u}||\mathbf{v}|\cos\theta.$$
4
Orthogonal vectors u and v:
$$\mathbf{u}\cdot\mathbf{v} = 0.$$
Cauchy-Schwarz inequality:
$$|\mathbf{u}\cdot\mathbf{v}| \le |\mathbf{u}||\mathbf{v}|.$$
Triangle inequality:
$$|\mathbf{u} + \mathbf{v}| = \sqrt{(\mathbf{u}+\mathbf{v})\cdot(\mathbf{u}+\mathbf{v})} = \sqrt{|\mathbf{u}|^2 + 2\,\mathbf{u}\cdot\mathbf{v} + |\mathbf{v}|^2} \le \sqrt{|\mathbf{u}|^2 + 2|\mathbf{u}\cdot\mathbf{v}| + |\mathbf{v}|^2} \le \sqrt{|\mathbf{u}|^2 + 2|\mathbf{u}||\mathbf{v}| + |\mathbf{v}|^2} = \sqrt{(|\mathbf{u}| + |\mathbf{v}|)^2} = |\mathbf{u}| + |\mathbf{v}|.$$
Component of v in the direction of u:
$$\mathbf{v}\cdot\frac{\mathbf{u}}{|\mathbf{u}|} = |\mathbf{v}|\cos\theta.$$
Equation of a line:
$$Ax + By = C.$$
Parametric equation of a line:
$$\mathbf{v} = (1-t)\mathbf{p} + t\mathbf{q}.$$
Equation of a plane:
$$Ax + By + Cz = D.$$
Parametric equation of a plane:
$$\mathbf{v} = \mathbf{v}_0 + t\mathbf{u} + s\mathbf{w}.$$
5
2 Linear Equations
Linear equation:
$$a_1x_1 + a_2x_2 + \cdots + a_nx_n = b.$$
Homogeneous linear equation:
$$a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0.$$
Matrix formulation:
$$\underbrace{\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}}_{m\times n\ \text{coefficient matrix}} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}.$$
Augmented matrix:
$$\left[\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{array}\right].$$
6
Remark: Elementary row operations do not change the solution!
For example:
2x1 + x2 = 5,
7x1 + 4x2 = 17.
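As a quick numerical check of this example, the system can be solved directly (a minimal NumPy sketch, not part of the original notes):

import numpy as np

# Coefficient matrix and right-hand side of the example system above.
A = np.array([[2.0, 1.0],
              [7.0, 4.0]])
b = np.array([5.0, 17.0])

# np.linalg.solve uses an LU-based elimination, i.e. elementary row
# operations, which do not change the solution.
x = np.linalg.solve(A, b)
print(x)  # [ 3. -1.]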
A diagonal matrix is a square matrix whose nonzero values appear only as entries aii
along the diagonal. For example, the following matrix is diagonal:
$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
An upper triangular matrix has zero entries everywhere below the diagonal (aij = 0
for i > j).
A lower triangular matrix has zero entries everywhere above the diagonal (aij = 0
for i < j).
Problem 2.1: Show that the product of two upper triangular matrices of the same
size is an upper triangular matrix.
7
3 Matrix Algebra
Element $a_{ij}$ appears in row i and column j of the $m\times n$ matrix $A = [a_{ij}]_{m\times n}$.
Multiplication of matrices:
$$[a_{ij}]_{m\times n}\,[b_{jk}]_{n\times\ell} = \left[\sum_{j=1}^{n} a_{ij}b_{jk}\right]_{m\times\ell}.$$
$$A(B + C) = AB + AC.$$
Problem 3.1: Give examples of $2\times 2$ matrices that commute and ones that don't.
Transpose of a matrix:
$$[a_{ij}]_{m\times n}^T = [a_{ji}]_{n\times m}.$$
Problem 3.2: Does a matrix necessarily commute with its transpose? Prove or provide a counterexample.
$$(AB)^T = B^TA^T.$$
8
Problem 3.4: Show that the dot product can be expressed as a matrix multiplication: $\mathbf{u}\cdot\mathbf{v} = \mathbf{u}^T\mathbf{v}$. Also, since $\mathbf{u}\cdot\mathbf{v} = \mathbf{v}\cdot\mathbf{u}$, we see equivalently that $\mathbf{u}\cdot\mathbf{v} = \mathbf{v}^T\mathbf{u}$.
Problem 3.5: Let A and B be square matrices of the same size. Prove that
Tr(AB) = Tr(BA).
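A quick numerical illustration of this identity (a NumPy sketch, not part of the notes; the matrices are arbitrary examples):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# Tr(AB) and Tr(BA) agree even though AB and BA generally differ.
print(np.trace(A @ B), np.trace(B @ A))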
Identity matrix:
$$I_n = [\delta_{ij}]_{n\times n}.$$
$$AA^{-1} = A^{-1}A = I.$$
$$\mathbf{x} = A^{-1}\mathbf{b}.$$
Inverse of a $2\times 2$ matrix:
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix} = (ad - bc)\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},$$
so
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \quad\text{exists if and only if}\quad ad - bc \neq 0.$$
Determinant of a $2\times 2$ matrix:
$$\det\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc.$$
9
An elementary matrix is obtained on applying a single elementary row operation
to the identity matrix.
Two matrices are row equivalent if one can be obtained from the other by elementary
row operations.
10
The elementary row operations that reduce an invertible matrix A to the identity
matrix can be applied directly to the identity matrix to yield A1 .
4 Determinants
Determinant of a $2\times 2$ matrix:
$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc.$$
Determinant of a $3\times 3$ matrix:
$$\begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} = a\begin{vmatrix} e & f \\ h & i \end{vmatrix} - b\begin{vmatrix} d & f \\ g & i \end{vmatrix} + c\begin{vmatrix} d & e \\ g & h \end{vmatrix}.$$
Properties of Determinants:
Multiplying a single row or column by a scalar c scales the determinant by c.
Exchanging two rows or columns swaps the sign of the determinant.
Adding a multiple of one row (column) to another row (column) leaves the
determinant invariant.
Given an $n\times n$ matrix $A = [a_{ij}]$, the minor $M_{ij}$ of the element $a_{ij}$ is the determinant of the submatrix obtained by deleting the ith row and jth column from A. The signed minor $(-1)^{i+j}M_{ij}$ is called the cofactor of the element $a_{ij}$.
11
In general, evaluating a determinant by cofactor expansion is inefficient. However,
cofactor expansion allows us to see easily that the determinant of a triangular
matrix is simply the product of the diagonal elements!
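The following sketch (an illustration, not part of the notes) implements cofactor expansion along the first row; the recursion makes it clear why the method is inefficient compared with row reduction:

import numpy as np

def det_cofactor(A):
    # Determinant by cofactor expansion along the first row.
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # Minor M_{0j}: delete row 0 and column j of A.
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_cofactor(minor)
    return total

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
print(det_cofactor(A), np.linalg.det(A))  # both give -3 (up to roundoff)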
Problem 4.2: Let A and B be square matrices of the same size. Prove that
det(AB) = det(A) det(B).
Problem 4.3: Prove for an invertible matrix A that $\det(A^{-1}) = \dfrac{1}{\det A}$.
Determinants can be used to solve linear systems of equations like
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} u \\ v \end{bmatrix}.$$
Cramer's rule:
$$x = \frac{\begin{vmatrix} u & b \\ v & d \end{vmatrix}}{\begin{vmatrix} a & b \\ c & d \end{vmatrix}}, \qquad y = \frac{\begin{vmatrix} a & u \\ c & v \end{vmatrix}}{\begin{vmatrix} a & b \\ c & d \end{vmatrix}}.$$
For matrices larger than $3\times 3$, row reduction is more efficient than Cramer's rule.
An upper (or lower) triangular system of equations can be solved directly by back
substitution:
2x1 + x2 = 5,
4x2 = 17.
The transpose of the matrix of cofactors, $\operatorname{adj} A = [(-1)^{i+j}M_{ji}]$, is called the adjugate (formerly sometimes known as the adjoint) of A.
12
The cross product of two vectors $\mathbf{u} = (u_x, u_y, u_z)$ and $\mathbf{v} = (v_x, v_y, v_z)$ in $\mathbb{R}^3$ can be expressed as the determinant
$$\mathbf{u}\times\mathbf{v} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_x & u_y & u_z \\ v_x & v_y & v_z \end{vmatrix} = (u_yv_z - u_zv_y)\mathbf{i} + (u_zv_x - u_xv_z)\mathbf{j} + (u_xv_y - u_yv_x)\mathbf{k}.$$
The magnitude $|\mathbf{u}\times\mathbf{v}| = |\mathbf{u}||\mathbf{v}|\sin\theta$ (where $\theta$ is the angle between the vectors u and v) represents the area of the parallelogram formed by u and v.
If $A = [a_{ij}]_{n\times n}$ is a triangular matrix, each of the n diagonal entries $a_{ii}$ is an eigenvalue of A:
$$\det(\lambda I - A) = (\lambda - a_{11})(\lambda - a_{22})\cdots(\lambda - a_{nn}).$$
13
Problem 5.1: Show that the eigenvalues and corresponding eigenvectors of the matrix
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 2 \end{bmatrix}$$
are $-1$, with eigenvector $[1, -1]$, and $4$, with eigenvector $[2, 3]$. Note that the trace (3) equals the sum of the eigenvalues and the determinant ($-4$) equals the product of the eigenvalues. (Any nonzero multiple of the given eigenvectors is also an acceptable eigenvector.)
Problem 5.2: Compute $A\mathbf{v}$ where A is the matrix in Problem 5.1 and $\mathbf{v} = [7, 3]^T$, both directly and by expressing v as a linear combination of the eigenvectors of A:
$$[7, 3] = 3[1, -1] + 2[2, 3].$$
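A numerical check of Problems 5.1 and 5.2 (a NumPy sketch, not part of the original notes):

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 2.0]])

# The eigenvalues are -1 and 4; the columns of P are scalar multiples
# of the eigenvectors [1, -1] and [2, 3].
w, P = np.linalg.eig(A)
print(w)

v = np.array([7.0, 3.0])
print(A @ v)                                           # [13. 27.]
print(-1*3*np.array([1, -1]) + 4*2*np.array([2, 3]))   # same result, via the eigenvectors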
Problem 5.3: Show that the trace and determinant of a $5\times 5$ matrix whose characteristic polynomial is $\det(\lambda I - A) = \lambda^5 + 2\lambda^4 + 3\lambda^3 + 4\lambda^2 + 5\lambda + 6$ are given by $-2$ and $-6$, respectively.
14
Problem 5.5: Show that the matrix
$$\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$$
has a double eigenvalue of 1 but only a single eigenvector [1, 0]T . The eigenvalue 1
thus has algebraic multiplicity two, while its geometric multiplicity is only one.
(b) an eigenvalue 3 with algebraic multiplicity two and geometric multiplicity one.
15
6 Linear Transformations
A mapping $T: \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$ if for all vectors u and v in $\mathbb{R}^n$ and all scalars c:
(a) $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$;
(b) $T(c\mathbf{u}) = cT(\mathbf{u})$.
An orthogonal linear operator T preserves lengths: $|T(\mathbf{x})| = |\mathbf{x}|$ for all vectors x.
A square matrix A with real entries is orthogonal if $A^TA = I$; that is, $A^{-1} = A^T$. This implies that $\det A = \pm 1$ and that A preserves lengths:
$$|A\mathbf{x}|^2 = (A\mathbf{x})^T(A\mathbf{x}) = \mathbf{x}^TA^TA\mathbf{x} = \mathbf{x}^T\mathbf{x} = |\mathbf{x}|^2.$$
Since the columns of an orthogonal matrix are unit vectors in addition to being mutually orthogonal, such matrices could equally well be called orthonormal matrices.
The null space null A (also known as the kernel ker A) of a matrix A is the subspace
consisting of all vectors u such that Au = 0.
16
Problem 6.1: Show that a linear transformation is one-to-one if and only if ker T = {0}.
Note that the transition matrices $P_{B'B}$ and $P_{BB'}$ are inverses of each other.
7 Dimension
The number of linearly independent rows (or columns) of a matrix A is known
as its rank and written rank A. That is, the rank of a matrix is the dimension
dim(row(A)) of the span of its rows. Equivalently, the rank of a matrix is the
dimension dim(col(A)) of the span of its columns.
The dimension of the null space of a matrix A is known as its nullity and written
dim(null(A)) or nullity A.
17
The dimension theorem states that if A is an $m\times n$ matrix, then
$$\operatorname{rank} A + \operatorname{nullity} A = n.$$
If A is an m n matrix then
While the row space of a matrix is invariant to (unchanged by) elementary row
operations, the column space is not. For example, consider the row-equivalent
matrices
$$A = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} \qquad\text{and}\qquad B = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}.$$
Note that row(A) = span([1, 0]) = row(B). However col(A) = span([0, 1]T ) but
col(B) = span([1, 0]T ).
(c) Every row echelon form of A has k nonzero rows and $m - k$ zero rows;
(d) The homogeneous system $A\mathbf{x} = \mathbf{0}$ has k pivot variables and $n - k$ free variables.
18
Problem 8.2: Suppose $B = P^{-1}AP$. Multiplying A by the matrix $P^{-1}$ on the left certainly cannot increase its rank (the number of linearly independent rows or columns), so $\operatorname{rank}(P^{-1}A) \le \operatorname{rank} A$. Likewise $\operatorname{rank} B = \operatorname{rank}(P^{-1}AP) \le \operatorname{rank}(P^{-1}A) \le \operatorname{rank} A$. But since we also know that A is similar to B, we see that $\operatorname{rank} A \le \operatorname{rank} B$. Hence $\operatorname{rank} A = \operatorname{rank} B$.
Problem 8.3: Show that similar matrices also have the same nullity.
Similar matrices represent the same linear transformation under the change of bases
described by the matrix P.
Problem 8.4: Prove that similar matrices have the following similarity invariants:
(a) rank;
(b) nullity;
(c) characteristic polynomial and eigenvalues, including their algebraic and geometric
multiplicities;
(d) determinant;
(e) trace.
$$\lambda I - B = \lambda I - P^{-1}AP = P^{-1}(\lambda I)P - P^{-1}AP = P^{-1}(\lambda I - A)P.$$
The matrices $\lambda I - B$ and $\lambda I - A$ are thus similar and share the same nullity (and hence geometric multiplicity) for each eigenvalue $\lambda$. Moreover, this implies that the matrices A and B share the same characteristic polynomial (and hence eigenvalues and algebraic multiplicities):
$$\det(\lambda I - B) = \det(P^{-1})\det(\lambda I - A)\det(P) = \frac{1}{\det(P)}\det(\lambda I - A)\det(P) = \det(\lambda I - A).$$
19
An $n\times n$ matrix A is diagonalizable if and only if A has n linearly independent eigenvectors. Suppose that $P^{-1}AP = D$, where $D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$, so that $AP = PD$. Since P is invertible, its columns are nonzero and linearly independent. Moreover,
$$A\begin{bmatrix} p_{1j} \\ p_{2j} \\ \vdots \\ p_{nj} \end{bmatrix} = \lambda_j\begin{bmatrix} p_{1j} \\ p_{2j} \\ \vdots \\ p_{nj} \end{bmatrix}.$$
This says, for each j, that the nonzero column vector $[p_{1j}, p_{2j}, \ldots, p_{nj}]^T$ is an eigenvector of A with eigenvalue $\lambda_j$. Moreover, as mentioned above, these n column vectors are linearly independent. This establishes one direction of the claim. On reversing this argument, we see that if A has n linearly independent eigenvectors, we can use them to form an eigenvector matrix P such that $AP = PD$.
In contrast, a matrix of the form
$$\begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix}$$
has a double eigenvalue of $\lambda$ but only one eigenvector $[1, 0]^T$ (i.e. the eigenvalue $\lambda$ has algebraic multiplicity two but geometric multiplicity one) and consequently is not diagonalizable.
20
Eigenvectors of a matrix A associated with distinct eigenvalues are linearly inde-
pendent. If not, then one of them would be expressible as a linear combination of
the others. Let us order the eigenvectors so that vk+1 is the first eigenvector that
is expressible as a linear combination of the others:
$$\mathbf{v}_{k+1} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k, \tag{1}$$
where the coefficients $c_i$ are not all zero. The condition that $\mathbf{v}_{k+1}$ is the first such vector guarantees that the vectors on the right-hand side are linearly independent. If each eigenvector $\mathbf{v}_i$ corresponds to the eigenvalue $\lambda_i$, then on multiplying by A on the left, we find that
$$\lambda_{k+1}\mathbf{v}_{k+1} = A\mathbf{v}_{k+1} = c_1A\mathbf{v}_1 + c_2A\mathbf{v}_2 + \cdots + c_kA\mathbf{v}_k = c_1\lambda_1\mathbf{v}_1 + c_2\lambda_2\mathbf{v}_2 + \cdots + c_k\lambda_k\mathbf{v}_k.$$
If we multiply Eq. (1) by $\lambda_{k+1}$ we obtain
$$\lambda_{k+1}\mathbf{v}_{k+1} = c_1\lambda_{k+1}\mathbf{v}_1 + c_2\lambda_{k+1}\mathbf{v}_2 + \cdots + c_k\lambda_{k+1}\mathbf{v}_k.$$
The difference of the previous two equations yields
$$\mathbf{0} = c_1(\lambda_1 - \lambda_{k+1})\mathbf{v}_1 + c_2(\lambda_2 - \lambda_{k+1})\mathbf{v}_2 + \cdots + c_k(\lambda_k - \lambda_{k+1})\mathbf{v}_k.$$
Since the vectors on the right-hand side are linearly independent, we know that each of the coefficients $c_i(\lambda_i - \lambda_{k+1})$ must vanish. But this is not possible: the eigenvalues are distinct and the coefficients $c_i$ are not all zero. The only way to escape this glaring contradiction is that all of the eigenvectors of A corresponding to distinct eigenvalues must in fact be independent!
However, the converse is not true (consider the identity matrix). In the words
of Anton & Busby, the key to diagonalizability rests with the dimensions of the
eigenspaces, not with the distinctness of the eigenvalues.
21
Problem 8.7: Show that the characteristic polynomial of the matrix that describes rotation in the xy plane by $\theta = 90^\circ$,
$$A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix},$$
has no real roots.
Remark: An efficient way to compute high powers and fractional (or even negative) powers of a matrix A is to first diagonalize it; that is, to use the eigenvector matrix P to express $A = PDP^{-1}$, where D is the diagonal matrix of eigenvalues.
Problem 8.9: Check the result of the previous problem by manually computing the
product A5 . Which way is easier for computing high powers of a matrix?
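A numerical version of this comparison (a NumPy sketch, not part of the notes, assuming the recurring matrix of Problem 5.1):

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 2.0]])

w, P = np.linalg.eig(A)                    # A = P D P^{-1}
A5 = P @ np.diag(w**5) @ np.linalg.inv(P)  # only the eigenvalues are raised to the 5th power
print(np.round(A5))                        # [[409. 410.]
                                           #  [615. 614.]]
print(np.linalg.matrix_power(A, 5))        # direct multiplication agrees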
22
Remark: We can use the same technique to find the square root $A^{\frac{1}{2}}$ of A (i.e. a matrix B such that $B^2 = A$). Since one of the eigenvalues is negative, we will again need to work in the complex number system $\mathbb{C}$, where $i^2 = -1$:
$$A^{\frac{1}{2}} = PD^{\frac{1}{2}}P^{-1} = \frac{1}{5}\begin{bmatrix} 1 & 2 \\ -1 & 3 \end{bmatrix}\begin{bmatrix} i & 0 \\ 0 & 2 \end{bmatrix}\begin{bmatrix} 3 & -2 \\ 1 & 1 \end{bmatrix} = \frac{1}{5}\begin{bmatrix} i & 4 \\ -i & 6 \end{bmatrix}\begin{bmatrix} 3 & -2 \\ 1 & 1 \end{bmatrix} = \frac{1}{5}\begin{bmatrix} 4 + 3i & 4 - 2i \\ 6 - 3i & 6 + 2i \end{bmatrix}.$$
Then
$$A^{\frac{1}{2}}A^{\frac{1}{2}} = PD^{\frac{1}{2}}P^{-1}PD^{\frac{1}{2}}P^{-1} = PD^{\frac{1}{2}}D^{\frac{1}{2}}P^{-1} = PDP^{-1} = A,$$
as desired.
9 Complex Numbers
In order to diagonalize a matrix A with linearly independent eigenvectors (such as a
matrix with distinct eigenvalues), we first need to solve for the roots of the charac-
teristic polynomial $\det(\lambda I - A) = 0$. These roots form the n elements of the diagonal
matrix that A is similar to. However, as we have already seen, a polynomial of
degree n does not necessarily have n real roots.
Recall that z is a root of the polynomial P (x) if P (z) = 0.
23
but with the unusual multiplication rule
$$(x, y)\,(u, v) = (xu - yv,\; xv + yu).$$
Note that this multiplication rule is associative, commutative, and distributive. Since
$$(x, 0)\,(u, 0) = (xu, 0),$$
we see that (x, 0) and (u, 0) behave just like the real numbers x and u. In fact, we can map $(x, 0) \in \mathbb{C}$ to $x \in \mathbb{R}$:
$$(x, 0) \leftrightarrow x.$$
Hence $\mathbb{R} \subset \mathbb{C}$.
Remark: We see that the complex number $z = (0, 1)$ satisfies the equation $z^2 + 1 = 0$. That is, $(0, 1)\,(0, 1) = (-1, 0)$.
Denote (0, 1) by the letter i. Then any complex number (x, y) can be represented as $(x, 0) + (0, 1)(y, 0) = x + iy$.
Remark: The frequently appearing notation $\sqrt{-1}$ for i is misleading and should be avoided, because the rule $\sqrt{xy} = \sqrt{x}\,\sqrt{y}$ (which one might anticipate) does not hold for negative x and y, as the following contradiction illustrates:
$$1 = \sqrt{1} = \sqrt{(-1)(-1)} = \sqrt{-1}\,\sqrt{-1} = i^2 = -1.$$
Furthermore, by definition $\sqrt{x} \ge 0$, but one cannot write $i \ge 0$ since $\mathbb{C}$ is not ordered.
Complex conjugate of $z = x + iy$:
$$\bar{z} = \overline{x + iy} = x - iy.$$
24
The complex modulus $|z|$ of $z = x + iy$ is given by $\sqrt{x^2 + y^2}$.
Remark: If $z \in \mathbb{R}$ then $|z| = \sqrt{z^2}$ is just the absolute value of z.
(i)
$$z\bar{z} = (x, y)\,(x, -y) = (x^2 + y^2,\; yx - xy) = (x^2 + y^2, 0) = x^2 + y^2 = |z|^2,$$
(ii)
$$\overline{z + w} = \bar{z} + \bar{w},$$
(iii)
$$\overline{zw} = \bar{z}\,\bar{w}.$$
Remark: Property (i) provides an easy way to compute reciprocals of complex numbers:
$$\frac{1}{z} = \frac{\bar{z}}{z\bar{z}} = \frac{\bar{z}}{|z|^2}.$$
Remark: Note that this definition of the dot product of vectors in $\mathbb{C}^n$ implies that $\mathbf{u}\cdot\mathbf{v} = \overline{\mathbf{v}\cdot\mathbf{u}} = \mathbf{v}^T\bar{\mathbf{u}}$.
Problem 9.2: Prove for $k \in \mathbb{C}$ that $(k\mathbf{u})\cdot\mathbf{v} = \bar{k}(\mathbf{u}\cdot\mathbf{v})$ but $\mathbf{u}\cdot(k\mathbf{v}) = k(\mathbf{u}\cdot\mathbf{v})$.
25
In view of the definition of the complex modulus, it is sensible to define the length of a vector $\mathbf{v} = [v_1, v_2, \ldots, v_n]$ in $\mathbb{C}^n$ to be
$$|\mathbf{v}| = \sqrt{\mathbf{v}\cdot\mathbf{v}} = \sqrt{\bar{\mathbf{v}}^T\mathbf{v}} = \sqrt{\bar{v}_1v_1 + \cdots + \bar{v}_nv_n} = \sqrt{|v_1|^2 + \cdots + |v_n|^2} \ge 0.$$
Lemma 1 (Complex Conjugate Roots): If z is a root of a polynomial P with real coefficients, then so is $\bar{z}$.
Proof: Suppose $P(z) = \sum_{k=0}^{n} a_kz^k = 0$, where each of the coefficients $a_k$ is real. Then
$$P(\bar{z}) = \sum_{k=0}^{n} a_k(\bar{z})^k = \sum_{k=0}^{n} a_k\overline{z^k} = \sum_{k=0}^{n} \overline{a_kz^k} = \overline{\sum_{k=0}^{n} a_kz^k} = \overline{P(z)} = \bar{0} = 0.$$
Remark: Lemma 1 implies that the eigenvalues of a matrix A with real coefficients also occur in conjugate pairs: if $\lambda$ is an eigenvalue of A, then so is $\bar{\lambda}$. Moreover, if x is an eigenvector corresponding to $\lambda$, then $\bar{\mathbf{x}}$ is an eigenvector corresponding to $\bar{\lambda}$:
$$A\mathbf{x} = \lambda\mathbf{x} \;\Rightarrow\; \overline{A\mathbf{x}} = \overline{\lambda\mathbf{x}} \;\Rightarrow\; \bar{A}\bar{\mathbf{x}} = \bar{\lambda}\bar{\mathbf{x}} \;\Rightarrow\; A\bar{\mathbf{x}} = \bar{\lambda}\bar{\mathbf{x}}.$$
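A small numerical illustration of this conjugate pairing (a NumPy sketch, not part of the notes; the matrix is an arbitrary real example):

import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])     # rotation by 90 degrees

w, V = np.linalg.eig(A)
print(w)                  # [0.+1.j 0.-1.j]: the eigenvalues i and -i form a conjugate pair
print(V[:, 0], V[:, 1])   # the corresponding eigenvectors are complex conjugates of each other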
Problem 9.3: Find the eigenvalues and eigenvectors of the real matrix [Anton & Busby, p. 528]
$$\begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix}.$$
If A is a real symmetric matrix, it must have real eigenvalues. This follows from the fact that an eigenvalue $\lambda$ and its eigenvector $\mathbf{x} \neq \mathbf{0}$ must satisfy
$$A\mathbf{x} = \lambda\mathbf{x} \;\Rightarrow\; \bar{\mathbf{x}}^TA\mathbf{x} = \lambda\,\bar{\mathbf{x}}^T\mathbf{x} = \lambda\,\mathbf{x}\cdot\mathbf{x} = \lambda|\mathbf{x}|^2$$
$$\Rightarrow\; \lambda = \frac{\bar{\mathbf{x}}^TA\mathbf{x}}{|\mathbf{x}|^2} = \frac{(A^T\bar{\mathbf{x}})^T\mathbf{x}}{|\mathbf{x}|^2} = \frac{(\overline{A\mathbf{x}})^T\mathbf{x}}{|\mathbf{x}|^2} = \frac{(\bar{\lambda}\bar{\mathbf{x}})^T\mathbf{x}}{|\mathbf{x}|^2} = \bar{\lambda}\,\frac{\bar{\mathbf{x}}^T\mathbf{x}}{|\mathbf{x}|^2} = \bar{\lambda}.$$
26
Remark: There is a remarkable similarity between the complex multiplication rule and the trigonometric angle addition formulae:
$$(\cos\theta, \sin\theta)\,(\cos\phi, \sin\phi) = (\cos\theta\cos\phi - \sin\theta\sin\phi,\; \cos\theta\sin\phi + \sin\theta\cos\phi) = (\cos(\theta + \phi), \sin(\theta + \phi)).$$
27
10 Projection Theorem
The component of a vector v in the direction of a unit vector $\hat{\mathbf{u}}$ is given by
$$\mathbf{v}\cdot\hat{\mathbf{u}} = |\mathbf{v}|\cos\theta.$$
The perpendicular component of v relative to $\hat{\mathbf{u}}$ is
$$\mathbf{v}_\perp = \mathbf{v} - (\mathbf{v}\cdot\hat{\mathbf{u}})\,\hat{\mathbf{u}}.$$
If the columns of a matrix A form a basis for a subspace W, the orthogonal projection of v onto W is
$$\mathbf{v}_\parallel = A(A^TA)^{-1}A^T\mathbf{v}.$$
If the columns of A are orthonormal, this reduces to
$$\mathbf{v}_\parallel = AA^T\mathbf{v}.$$
28
Problem 10.1: Find the orthogonal projection of $\mathbf{v} = [1, 1, 1]$ onto the plane W spanned by the orthonormal vectors $[0, 1, 0]$ and $\left[\frac{4}{5}, 0, -\frac{3}{5}\right]$.
Compute
$$\mathbf{v}_\parallel = AA^T\mathbf{v} = \begin{bmatrix} 0 & \frac{4}{5} \\ 1 & 0 \\ 0 & -\frac{3}{5} \end{bmatrix}\begin{bmatrix} 0 & 1 & 0 \\ \frac{4}{5} & 0 & -\frac{3}{5} \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{16}{25} & 0 & -\frac{12}{25} \\ 0 & 1 & 0 \\ -\frac{12}{25} & 0 & \frac{9}{25} \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{4}{25} \\ 1 \\ -\frac{3}{25} \end{bmatrix}.$$
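A numerical check of this projection (a NumPy sketch, not part of the notes):

import numpy as np

# Columns of A are the two orthonormal vectors spanning the plane W.
A = np.array([[0.0,  0.8],
              [1.0,  0.0],
              [0.0, -0.6]])
v = np.array([1.0, 1.0, 1.0])

v_par = A @ A.T @ v
print(v_par)              # [ 0.16  1.   -0.12] = [4/25, 1, -3/25]
print(A.T @ (v - v_par))  # ~[0, 0]: the residual is orthogonal to W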
11 Gram-Schmidt Orthonormalization
The Gram-Schmidt process provides a systematic procedure for constructing an or-
thonormal basis for an n-dimensional subspace span{a1 , a2 , . . . , an } of Rm :
1. Set $\mathbf{q}_1 = \mathbf{a}_1$.
2. Remove the component of the second vector parallel to the first:
$$\mathbf{q}_2 = \mathbf{a}_2 - \frac{\mathbf{a}_2\cdot\mathbf{q}_1}{|\mathbf{q}_1|^2}\,\mathbf{q}_1.$$
3. Remove the components of the third vector parallel to the first and second:
$$\mathbf{q}_3 = \mathbf{a}_3 - \frac{\mathbf{a}_3\cdot\mathbf{q}_1}{|\mathbf{q}_1|^2}\,\mathbf{q}_1 - \frac{\mathbf{a}_3\cdot\mathbf{q}_2}{|\mathbf{q}_2|^2}\,\mathbf{q}_2.$$
4. Continue in this manner; at the kth stage,
$$\mathbf{q}_k = \mathbf{a}_k - \frac{\mathbf{a}_k\cdot\mathbf{q}_1}{|\mathbf{q}_1|^2}\,\mathbf{q}_1 - \frac{\mathbf{a}_k\cdot\mathbf{q}_2}{|\mathbf{q}_2|^2}\,\mathbf{q}_2 - \cdots - \frac{\mathbf{a}_k\cdot\mathbf{q}_{k-1}}{|\mathbf{q}_{k-1}|^2}\,\mathbf{q}_{k-1}. \tag{3}$$
5. Finally, normalize each of the vectors to obtain the orthonormal basis $\{\hat{\mathbf{q}}_1, \hat{\mathbf{q}}_2, \ldots, \hat{\mathbf{q}}_n\}$.
29
Remark: At the kth stage of the orthonormalization, if all of the previously created vectors were orthogonal to each other, then so is the newly created vector:
$$\mathbf{q}_k\cdot\mathbf{q}_j = \mathbf{a}_k\cdot\mathbf{q}_j - \frac{\mathbf{a}_k\cdot\mathbf{q}_j}{|\mathbf{q}_j|^2}\,|\mathbf{q}_j|^2 = 0, \qquad j = 1, 2, \ldots, k - 1.$$
Remark: The last two remarks imply that, at each stage of the orthonormalization, all of the vectors created so far are orthogonal to each other!
Remark: At the kth stage of the orthonormalization, if all of the previously created vectors were linear combinations of the original vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n$, then by Eq. (3), we see that $\mathbf{q}_k$ is as well.
Remark: The last two remarks imply that, at each stage of the orthonormalization, each vector $\mathbf{q}_k$ can be written as a linear combination of the original basis vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n$. Since these vectors are given to be linearly independent, we thus know that $\mathbf{q}_k \neq \mathbf{0}$. This is what allows us to normalize $\mathbf{q}_k$ in Step 5. It also implies that $\{\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_n\}$ spans the same space as $\{\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n\}$.
Problem 11.1: What would happen if we were to apply the Gram-Schmidt procedure
to a set of vectors that is not linearly independent?
Problem 11.2: Use the Gram-Schmidt process to find an orthonormal basis for the
plane x + y + z = 0.
In terms of the two parameters (say) y = s and z = t, we see that each point on the plane can be expressed as
$$x = -s - t, \qquad y = s, \qquad z = t.$$
The parameter values (s, t) = (1, 0) and (s, t) = (0, 1) then yield two linearly independent vectors on the plane:
$$\mathbf{a}_1 = [-1, 1, 0] \qquad\text{and}\qquad \mathbf{a}_2 = [-1, 0, 1].$$
The Gram-Schmidt process then yields
$$\mathbf{q}_1 = [-1, 1, 0],$$
$$\mathbf{q}_2 = [-1, 0, 1] - \frac{[-1, 0, 1]\cdot[-1, 1, 0]}{|[-1, 1, 0]|^2}\,[-1, 1, 0] = [-1, 0, 1] - \frac{1}{2}[-1, 1, 0] = \left[-\frac{1}{2}, -\frac{1}{2}, 1\right].$$
We note that $\mathbf{q}_2\cdot\mathbf{q}_1 = 0$, as desired. Finally, we normalize the vectors to obtain $\hat{\mathbf{q}}_1 = \left[-\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}, 0\right]$ and $\hat{\mathbf{q}}_2 = \left[-\frac{1}{\sqrt{6}}, -\frac{1}{\sqrt{6}}, \frac{2}{\sqrt{6}}\right]$.
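A short implementation of the procedure (a NumPy sketch, not part of the notes), applied to the two vectors found above:

import numpy as np

def gram_schmidt(vectors):
    # Return an orthonormal basis for span(vectors); assumes the input
    # vectors are linearly independent.
    basis = []
    for a in vectors:
        q = np.array(a, dtype=float)
        for e in basis:
            q -= (q @ e) * e      # remove the component along each previous unit vector
        basis.append(q / np.linalg.norm(q))
    return basis

q1, q2 = gram_schmidt([[-1, 1, 0], [-1, 0, 1]])
print(q1)   # approximately [-1/sqrt(2), 1/sqrt(2), 0]
print(q2)   # approximately [-1/sqrt(6), -1/sqrt(6), 2/sqrt(6)]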
30
12 QR Factorization
We may rewrite the kth step (Eq. 3) of the Gram-Schmidt orthonormalization process more simply in terms of the unit normal vectors $\hat{\mathbf{q}}_j$:
$$\mathbf{q}_k = \mathbf{a}_k - (\mathbf{a}_k\cdot\hat{\mathbf{q}}_1)\hat{\mathbf{q}}_1 - (\mathbf{a}_k\cdot\hat{\mathbf{q}}_2)\hat{\mathbf{q}}_2 - \cdots - (\mathbf{a}_k\cdot\hat{\mathbf{q}}_{k-1})\hat{\mathbf{q}}_{k-1}. \tag{4}$$
On taking the dot product with $\hat{\mathbf{q}}_k$ we find, using the orthogonality of the vectors $\hat{\mathbf{q}}_j$,
$$0 \neq \mathbf{q}_k\cdot\hat{\mathbf{q}}_k = \mathbf{a}_k\cdot\hat{\mathbf{q}}_k, \tag{5}$$
so that $\mathbf{q}_k = (\mathbf{q}_k\cdot\hat{\mathbf{q}}_k)\hat{\mathbf{q}}_k = (\mathbf{a}_k\cdot\hat{\mathbf{q}}_k)\hat{\mathbf{q}}_k$. Equation (4) may then be rewritten as
$$\mathbf{a}_k = (\mathbf{a}_k\cdot\hat{\mathbf{q}}_1)\hat{\mathbf{q}}_1 + (\mathbf{a}_k\cdot\hat{\mathbf{q}}_2)\hat{\mathbf{q}}_2 + \cdots + (\mathbf{a}_k\cdot\hat{\mathbf{q}}_{k-1})\hat{\mathbf{q}}_{k-1} + (\mathbf{a}_k\cdot\hat{\mathbf{q}}_k)\hat{\mathbf{q}}_k, \tag{6}$$
that is, $A = QR$, where the columns of $Q = [\hat{\mathbf{q}}_1\ \hat{\mathbf{q}}_2\ \cdots\ \hat{\mathbf{q}}_n]$ are orthonormal and R is the upper triangular matrix with entries $R_{jk} = \mathbf{a}_k\cdot\hat{\mathbf{q}}_j$ for $j \le k$.
Remark: Every $m\times n$ matrix A with full column rank (linearly independent columns) thus has a QR factorization.
Remark: Equation (6) implies that each of the diagonal elements of R is nonzero. This guarantees that R is invertible.
Remark: Since the columns of Q are orthonormal, $Q^TQ = I$, so that $R = Q^TA$.
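A numerical illustration (a NumPy sketch, not part of the notes; the matrix is an arbitrary full-column-rank example, and the signs of Q and R may differ from a hand computation):

import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])

Q, R = np.linalg.qr(A)           # reduced QR: Q has orthonormal columns, R is upper triangular
print(np.allclose(A, Q @ R))     # True
print(np.allclose(Q.T @ A, R))   # True: Q^T A = R, as noted above
print(R)                         # upper triangular with nonzero diagonal entries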
31
13 Least Squares Approximation
Suppose that a set of data (xi , yi ) for i = 1, 2, . . . n is measured in an experiment and
that we wish to test how well it fits an affine relationship of the form
$$y = \alpha + \beta x.$$
Here $\alpha$ and $\beta$ are parameters that we wish to vary to achieve the best fit.
If each data point $(x_i, y_i)$ happens to fall on the line $y = \alpha + \beta x$, then the unknown parameters $\alpha$ and $\beta$ can be determined from the matrix equation
$$\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}\begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}. \tag{7}$$
If the $x_i$'s are distinct, then the $n\times 2$ coefficient matrix A on the left-hand side, known as the design matrix, has full column rank (two). Also note in the case of exact agreement between the experimental data and the theoretical model $y = \alpha + \beta x$ that the vector b on the right-hand side is in the column space of A.
The least squares solution $\mathbf{x} = [\alpha, \beta]^T$ satisfies the normal equation
$$A^TA\mathbf{x} = A^T\mathbf{b}. \tag{8}$$
$$A^TA = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{bmatrix}\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} = \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}$$
and
$$A^T\mathbf{b} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_iy_i \end{bmatrix},$$
where the sums are computed from i = 1 to n. Thus the solution to the least-squares fit of Eq. (8) is given by
$$\begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}^{-1}\begin{bmatrix} \sum y_i \\ \sum x_iy_i \end{bmatrix}.$$
Problem 13.1: Show that the least squares line of best fit to the measured data
(0, 1), (1, 3), (2, 4), and (3, 4) is y = 1.5 + x.
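The fit can be reproduced numerically (a NumPy sketch, not part of the notes):

import numpy as np

# Measured data from Problem 13.1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 4.0, 4.0])

A = np.column_stack([np.ones_like(x), x])        # design matrix with rows [1, x_i]

# Solve the normal equation A^T A [alpha, beta]^T = A^T y.
alpha, beta = np.linalg.solve(A.T @ A, A.T @ y)
print(alpha, beta)                               # 1.5 1.0, i.e. y = 1.5 + x

# np.linalg.lstsq minimizes the same sum of squared residuals.
print(np.linalg.lstsq(A, y, rcond=None)[0])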
The least squares method is sometimes called linear regression and the line y =
+ x can either be called the least squares line of best fit or the regression line.
For each data pair $(x_i, y_i)$, the difference $y_i - (\alpha + \beta x_i)$ is called the residual.
Remark: Since the least-squares solution $\mathbf{x} = [\alpha, \beta]^T$ minimizes the norm of the least squares error vector
$$\mathbf{e} = \mathbf{b} - A\mathbf{x} = [y_1 - (\alpha + \beta x_1), \ldots, y_n - (\alpha + \beta x_n)],$$
the method effectively minimizes the sum $\sum_{i=1}^{n}[y_i - (\alpha + \beta x_i)]^2$ of the squares of the residuals.
Problem 13.2: Show that for the same measured data as before, (0, 1), (1, 3), (2, 4), and (3, 4), that
$$R = \begin{bmatrix} 2 & 3 \\ 0 & \sqrt{5} \end{bmatrix}$$
and from this that the least squares solution again is $y = 1.5 + x$.
33
Remark: The least squares method can be extended to fit a polynomial
$$y = a_0 + a_1x + \cdots + a_mx^m$$
to the data:
$$\begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^m \\ 1 & x_2 & x_2^2 & \cdots & x_2^m \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^m \end{bmatrix}\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}.$$
Here, the design matrix takes the form of a Vandermonde matrix, which has special properties. For example, in the case where $m = n - 1$, its determinant is given by $\prod_{\substack{i,j=1 \\ i>j}}^{n}(x_i - x_j)$. If the n values $x_i$ are distinct, this determinant is nonzero. The
system of equations is consistent and has a unique solution, known as the Lagrange
interpolating polynomial. This polynomial passes exactly through the given data
points. However, if m < n 1, it will usually not be possible to find a polynomial
that passes through the given data points and we must find the least squares
solution by solving the normal equation for this problem [cf. Example 7 on p. 403
of Anton & Busby].
For real matrices, the Hermitian transpose is equivalent to the usual transpose.
34
If A and B are square matrices of the same size and $B = P^TAP$ for some orthogonal matrix P, we say that B is orthogonally similar to A.
Problem 14.1: Show that two orthogonally similar matrices A and B are similar.
Orthogonally similar matrices represent the same linear transformation under the
orthogonal change of bases described by the matrix P.
If $D = P^*AP$, then $A = PDP^*$ and
$$A^* = (PDP^*)^* = PD^*P^*.$$
Then $AA^* = PDP^*PD^*P^* = PDD^*P^*$ and $A^*A = PD^*P^*PDP^* = PD^*DP^*$. But $D^*D = DD^*$ since D is diagonal. Therefore $A^*A = AA^*$.
35
If A is an orthogonally diagonalizable matrix with real eigenvalues, it must be Hermitian: if $A = PDP^*$, then
$$A^* = (PDP^*)^* = PD^*P^* = PDP^* = A.$$
$$A = PUP^*,$$
where U is upper triangular. This requires that A have the form
$$\begin{bmatrix} \lambda_1 & * & \cdots & * \\ 0 & * & \cdots & * \\ \vdots & \vdots & \ddots & \vdots \\ 0 & * & \cdots & * \end{bmatrix}.$$
36
This means that in the coordinate system $\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{b}_3, \ldots, \mathbf{b}_n\}$, A has the form
$$\begin{bmatrix} \lambda_1 & * & * & \cdots & * \\ 0 & \lambda_2 & * & \cdots & * \\ \vdots & 0 & * & \cdots & * \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & * & \cdots & * \end{bmatrix}.$$
Problem 14.4: Show that every normal $n\times n$ upper triangular matrix U is a diagonal matrix. Hint: letting $U_{ij}$ for $i \le j$ be the nonzero elements of U, we can write out, for each $i = 1, \ldots, n$, the diagonal elements $\sum_{k=1}^{i}\overline{U_{ki}}U_{ki} = \sum_{k=i}^{n}U_{ik}\overline{U_{ik}}$ of $U^*U = UU^*$:
$$\sum_{k=1}^{i}|U_{ki}|^2 = \sum_{k=i}^{n}|U_{ik}|^2.$$
If A is a Hermitian matrix, it must have real eigenvalues. This follows from the fact that an eigenvalue $\lambda$ and its eigenvector $\mathbf{x} \neq \mathbf{0}$ must satisfy
$$A\mathbf{x} = \lambda\mathbf{x} \;\Rightarrow\; \mathbf{x}^*A\mathbf{x} = \lambda\,\mathbf{x}^*\mathbf{x} = \lambda\,\mathbf{x}\cdot\mathbf{x} = \lambda|\mathbf{x}|^2$$
$$\Rightarrow\; \lambda = \frac{\mathbf{x}^*A\mathbf{x}}{|\mathbf{x}|^2} = \frac{(A^*\mathbf{x})^*\mathbf{x}}{|\mathbf{x}|^2} = \frac{(A\mathbf{x})^*\mathbf{x}}{|\mathbf{x}|^2} = \frac{(\lambda\mathbf{x})^*\mathbf{x}}{|\mathbf{x}|^2} = \bar{\lambda}\,\frac{\mathbf{x}^*\mathbf{x}}{|\mathbf{x}|^2} = \bar{\lambda}.$$
is similar to a diagonal matrix with real eigenvalues and find the eigenvalues.
37
Problem 14.6: Show that the anti-Hermitian matrix
$$\begin{bmatrix} 0 & i \\ i & 0 \end{bmatrix}$$
is similar to a diagonal matrix with complex eigenvalues and find the eigenvalues.
If A is an $n\times n$ normal matrix, then $|A\mathbf{x}|^2 = \mathbf{x}^*A^*A\mathbf{x} = \mathbf{x}^*AA^*\mathbf{x} = |A^*\mathbf{x}|^2$ for all vectors $\mathbf{x} \in \mathbb{C}^n$.
Problem 14.7: If A is a normal matrix, show that $A - \lambda I$ is also normal.
If A is a normal matrix and $A\mathbf{x} = \lambda\mathbf{x}$, then $0 = |(A - \lambda I)\mathbf{x}| = |(A^* - \bar{\lambda}I)\mathbf{x}|$, so that $A^*\mathbf{x} = \bar{\lambda}\mathbf{x}$.
If A is a normal matrix, the eigenvectors associated with distinct eigenvalues are orthogonal: if $A\mathbf{x} = \lambda\mathbf{x}$ and $A\mathbf{y} = \mu\mathbf{y}$, then, since $A^T\bar{\mathbf{x}} = \lambda\bar{\mathbf{x}}$ (the conjugate of $A^*\mathbf{x} = \bar{\lambda}\mathbf{x}$),
$$0 = (\bar{\mathbf{x}}^TA\mathbf{y})^T - \bar{\mathbf{x}}^TA\mathbf{y} = \mathbf{y}^TA^T\bar{\mathbf{x}} - \bar{\mathbf{x}}^TA\mathbf{y} = \lambda\,\mathbf{y}^T\bar{\mathbf{x}} - \mu\,\bar{\mathbf{x}}^T\mathbf{y} = (\lambda - \mu)\,\mathbf{x}\cdot\mathbf{y},$$
so that $\mathbf{x}\cdot\mathbf{y} = 0$ whenever $\lambda \neq \mu$.
An n n normal matrix A can therefore be orthogonally diagonalized by applying
the Gram-Schmidt process to each of its distinct eigenspaces to obtain a set of n
mutually orthogonal eigenvectors that form the columns of an orthonormal ma-
trix P such that AP = PD, where the diagonal matrix D contains the eigenvalues
of A.
Problem 14.8: Find a matrix P that orthogonally diagonalizes
$$\begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix}.$$
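A numerical solution of this problem (a NumPy sketch, not part of the notes, using the matrix exactly as printed above):

import numpy as np

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 4.0, 2.0],
              [2.0, 2.0, 4.0]])

# eigh is intended for symmetric (Hermitian) matrices and returns
# orthonormal eigenvectors as the columns of P.
w, P = np.linalg.eigh(A)
print(w)                                     # [2. 2. 8.]
print(np.allclose(P.T @ P, np.eye(3)))       # True: P is orthogonal
print(np.allclose(P.T @ A @ P, np.diag(w)))  # True: P^T A P = D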
The resulting factorization $A = PDP^*$ can be expanded as the spectral decomposition
$$A = \lambda_1\mathbf{x}_1\mathbf{x}_1^* + \lambda_2\mathbf{x}_2\mathbf{x}_2^* + \cdots + \lambda_n\mathbf{x}_n\mathbf{x}_n^*,$$
where the $\mathbf{x}_i$ are the orthonormal columns of P.
38
As we have seen earlier, powers of an orthogonally diagonalizable matrix $A = PDP^*$ are easy to compute once P and D are known.
Remark: The Cayley-Hamilton Theorem can also be used to compute negative powers (such as the inverse) of a matrix. For example, we can rewrite Eq. (9) as
$$A\left(-\frac{1}{c_0}A^{n-1} - \frac{c_{n-1}}{c_0}A^{n-2} - \cdots - \frac{c_1}{c_0}I\right) = I.$$
39
Problem 14.10: Compute $e^A$ for the diagonalizable matrix
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 2 \end{bmatrix}.$$
40
Remark: A linear combination of solutions to an ordinary differential equation is
also a solution.
$$\mathbf{y} = e^{\lambda_1t}\mathbf{x}_1, \quad \mathbf{y} = e^{\lambda_2t}\mathbf{x}_2, \quad \ldots, \quad \mathbf{y} = e^{\lambda_kt}\mathbf{x}_k$$
are solutions of $\mathbf{y}' = A\mathbf{y}$. Moreover, if the eigenvectors $\mathbf{x}_1, \ldots, \mathbf{x}_k$ are linearly independent, these solutions are linearly independent: if
$$\mathbf{0} = c_1e^{\lambda_1t}\mathbf{x}_1 + c_2e^{\lambda_2t}\mathbf{x}_2 + \cdots + c_ke^{\lambda_kt}\mathbf{x}_k,$$
then at t = 0 we find
$$\mathbf{0} = c_1\mathbf{x}_1 + c_2\mathbf{x}_2 + \cdots + c_k\mathbf{x}_k;$$
the linear independence of the k eigenvectors then implies that
$$c_1 = c_2 = \cdots = c_k = 0.$$
The general solution is then
$$\mathbf{y} = c_1e^{\lambda_1t}\mathbf{x}_1 + c_2e^{\lambda_2t}\mathbf{x}_2 + \cdots + c_ne^{\lambda_nt}\mathbf{x}_n.$$
41
Moreover, if $\mathbf{y} = \mathbf{y}_0$ at t = 0 we find that
$$\mathbf{y}_0 = P\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}.$$
If the n eigenvectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ are linearly independent, then
$$\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} = P^{-1}\mathbf{y}_0.$$
The solution to $\mathbf{y}' = A\mathbf{y}$ for the initial condition $\mathbf{y}_0$ may thus be expressed as
$$\mathbf{y} = P\begin{bmatrix} e^{\lambda_1t} & 0 & \cdots & 0 \\ 0 & e^{\lambda_2t} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & e^{\lambda_nt} \end{bmatrix}P^{-1}\mathbf{y}_0 = e^{At}\mathbf{y}_0,$$
on making use of Eq. (11) with $f(x) = e^x$ and A replaced by At.
Remark: The formula $\mathbf{y} = e^{At}\mathbf{y}_0$ for the solution of the initial-value problem
$$\mathbf{y}' = A\mathbf{y}, \qquad \mathbf{y}(0) = \mathbf{y}_0,$$
actually holds even when the matrix A isn't diagonalizable (that is, when A doesn't have a set of n linearly independent eigenvectors). In this case we cannot use Eq. (11) to find $e^{At}$. Instead we must find $e^{At}$ from the infinite series in Eq. (10):
$$e^{At} = I + At + \frac{A^2t^2}{2!} + \frac{A^3t^3}{3!} + \cdots.$$
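The series can be compared with the diagonalization formula numerically (a NumPy sketch, not part of the notes; the matrix is the recurring diagonalizable example):

import numpy as np

def expm_series(A, t=1.0, terms=30):
    # Approximate e^{At} by truncating the power series.
    At = A * t
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ At / k          # accumulates A^k t^k / k!
        result = result + term
    return result

A = np.array([[1.0, 2.0],
              [3.0, 2.0]])

w, P = np.linalg.eig(A)
expm_diag = P @ np.diag(np.exp(w)) @ np.linalg.inv(P)   # P e^{D} P^{-1}
print(np.allclose(expm_series(A), expm_diag))           # True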
42
16 Quadratic Forms
Linear combinations of $x_1, x_2, \ldots, x_n$ such as
$$a_1x_1 + a_2x_2 + \cdots + a_nx_n = \sum_{i=1}^{n}a_ix_i,$$
where $a_1, a_2, \ldots, a_n$ are fixed but arbitrary constants, are called linear forms on $\mathbb{R}^n$.
Expressions of the form
$$\mathbf{x}^TA\mathbf{x} = \sum_{i=1}^{n}\sum_{j=1}^{n}a_{ij}x_ix_j,$$
where the $a_{ij}$ are fixed but arbitrary constants, are called quadratic forms on $\mathbb{R}^n$.
(Principal Axes Theorem) Consider the change of variable $\mathbf{x} = P\mathbf{y}$, where P orthogonally diagonalizes A. Then in terms of the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ of A:
$$\mathbf{x}^TA\mathbf{x} = (P\mathbf{y})^TAP\mathbf{y} = \mathbf{y}^T(P^TAP)\mathbf{y} = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2.$$
43
Problem 16.1: Find the minimum and maximum values of the quadratic form
$$z = 5x^2 + 4xy + 5y^2$$
subject to the constraint $x^2 + y^2 = 1$.
if a > 0 and ad > b2 then d > 0, so both the trace a + d and determinant are
positive, ensuring in turn that both eigenvalues of A are positive.
44
Remark: If A is a symmetric $2\times 2$ matrix then the equation $\mathbf{x}^TA\mathbf{x} = 1$ can be expressed in the orthogonal coordinate system $\mathbf{y} = P^T\mathbf{x}$ as $\lambda_1y_1^2 + \lambda_2y_2^2 = 1$. Thus $\mathbf{x}^TA\mathbf{x} = 1$ represents
a hyperbola if A is indefinite.
Problem 16.2: Determine the relative minima, maxima, and saddle points of the function [Anton & Busby p. 498]
$$f(x, y) = \frac{1}{3}x^3 + xy^2 - 8xy + 3.$$
45
17 Vector Spaces:
A vector space V over R (C) is a set containing an element 0 that is closed under a
vector addition and scalar multiplication operation such that for all vectors u, v,
w in V and scalars c, d in R (C) the following axioms hold:
(A1) (u + v) + w = u + (v + w) (associative);
(A2) u + v = v + u (commutative);
Problem 17.1: Show that axiom (A2) in fact follows from the other seven axioms.
Since it satisfies axioms A1A8, the set of m n matrices is in fact a vector space.
46
The set C(R) of continuous functions on R is a vector space.
The set C 1 (R) of functions with continuous first derivatives on R is a vector space.
The set C m (R) of functions with continuous mth derivatives on R is a vector space.
The set C (R) of functions with continuous derivatives of all orders on R is a vector
space.
Definition: A nonempty subset of a vector space V that is itself a vector space under
the same vector addition and scalar multiplication operations is called a subspace
of V .
If the Wronskian
$$W(x) = \begin{vmatrix} f_1(x) & f_2(x) & \cdots & f_n(x) \\ f_1'(x) & f_2'(x) & \cdots & f_n'(x) \\ \vdots & \vdots & \ddots & \vdots \\ f_1^{(n-1)}(x) & f_2^{(n-1)}(x) & \cdots & f_n^{(n-1)}(x) \end{vmatrix}$$
is nonzero for some $x \in \mathbb{R}$, then the functions $f_1, f_2, \ldots, f_n$ are linearly independent.
47
Problem 17.2: Show that the functions f1 (x) = 1, f2 (x) = ex , and f3 (x) = e2x are
linearly independent on R.
In a vector space V with inner product $\langle \mathbf{u}, \mathbf{v}\rangle$, the norm $|\mathbf{v}|$ of a vector v is given by $\sqrt{\langle \mathbf{v}, \mathbf{v}\rangle}$, the distance d(u, v) is given by $|\mathbf{u} - \mathbf{v}| = \sqrt{\langle \mathbf{u} - \mathbf{v}, \mathbf{u} - \mathbf{v}\rangle}$, and the angle $\theta$ between two vectors u and v satisfies $\cos\theta = \dfrac{\langle \mathbf{u}, \mathbf{v}\rangle}{|\mathbf{u}||\mathbf{v}|}$. Two vectors u and v are orthogonal if $\langle \mathbf{u}, \mathbf{v}\rangle = 0$. Analogues of the Pythagoras theorem, Cauchy-Schwarz inequality, and triangle inequality follow directly from (I1)-(I4).
The formula
$$\langle f, g\rangle = \int_a^b f(x)g(x)\,dx$$
defines an inner product on the vector space C[a, b] of continuous functions on [a, b], with norm
$$|f| = \sqrt{\int_a^b f^2(x)\,dx}.$$
48
The Fourier theorem states that an orthonormal basis for the infinite-dimensional vector space of differentiable periodic functions on $[-\pi, \pi]$ with inner product $\langle f, g\rangle = \int_{-\pi}^{\pi} f(x)g(x)\,dx$ is given by $\{u_n\}_{n=0}^{\infty} = \left\{\frac{c_0}{\sqrt{2}}, c_1, s_1, c_2, s_2, \ldots\right\}$, where
$$c_n(x) = \frac{1}{\sqrt{\pi}}\cos nx$$
and
$$s_n(x) = \frac{1}{\sqrt{\pi}}\sin nx.$$
The orthonormality of this basis follows from the trigonometric addition formulae
$$\sin(nx \pm mx) = \sin nx\cos mx \pm \cos nx\sin mx$$
and
$$\cos(nx \pm mx) = \cos nx\cos mx \mp \sin nx\sin mx,$$
from which we see that
$$\sin(nx + mx) - \sin(nx - mx) = 2\cos nx\sin mx,$$
so that $\int_{-\pi}^{\pi} 2\cos nx\sin mx\,dx = 0$ for $n \neq m$, since the cosine function is periodic with period $2\pi$. When $n = m > 0$ we find
$$\int_{-\pi}^{\pi} 2\cos nx\sin nx\,dx = \left[-\frac{\cos(2nx)}{2n}\right]_{-\pi}^{\pi} = 0,$$
49
Similarly, Eq. (14) yields for distinct non-negative integers n and m,
$$\int_{-\pi}^{\pi} 2\sin nx\sin mx\,dx = 0,$$
but
$$\int_{-\pi}^{\pi} 2\sin^2 nx\,dx = \left[x - \frac{\sin(2nx)}{2n}\right]_{-\pi}^{\pi} = 2\pi.$$
and
$$b_n = \frac{1}{\sqrt{\pi}}\langle f, s_n\rangle = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin nx\,dx \qquad (n = 1, 2, \ldots).$$
Problem 18.3: By using integration by parts, show that the Fourier series for $f(x) = x$ on $[-\pi, \pi]$ is
$$2\sum_{n=1}^{\infty}\frac{(-1)^{n+1}\sin nx}{n}.$$
For $x \in (-\pi, \pi)$ this series is guaranteed to converge to x. For example, at $x = \pi/2$, we find
$$4\sum_{m=0}^{\infty}\frac{(-1)^{2m+2}\sin\left((2m+1)\frac{\pi}{2}\right)}{2m+1} = 4\sum_{m=0}^{\infty}\frac{(-1)^m}{2m+1} = \pi.$$
50
Problem 18.4: By using integration by parts, show that the Fourier series for $f(x) = |x|$ on $[-\pi, \pi]$ is
$$\frac{\pi}{2} - \frac{4}{\pi}\sum_{m=1}^{\infty}\frac{\cos(2m-1)x}{(2m-1)^2}.$$
This series can be shown to converge to $|x|$ for all $x \in [-\pi, \pi]$. For example, at $x = 0$, we find
$$\sum_{m=1}^{\infty}\frac{1}{(2m-1)^2} = \frac{\pi^2}{8}.$$
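Both statements are easy to check numerically (a NumPy sketch, not part of the notes):

import numpy as np

def fourier_abs(x, M):
    # Partial sum (M terms) of the Fourier series of |x| on [-pi, pi].
    m = np.arange(1, M + 1)
    return np.pi/2 - (4/np.pi) * np.sum(np.cos((2*m - 1)*x) / (2*m - 1)**2)

x = 1.0
print(fourier_abs(x, 200), abs(x))             # the partial sum is close to |x|

# The identity obtained at x = 0:
m = np.arange(1, 100001)
print(np.sum(1.0/(2*m - 1)**2), np.pi**2/8)    # both approximately 1.2337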
Remark: Many other concepts for Euclidean vector spaces can be generalized to
function spaces.
The functions $\cos nx$ and $\sin nx$ in a Fourier series can be thought of as the eigenfunctions of the differential operator $d^2/dx^2$:
$$\frac{d^2}{dx^2}y = -n^2y.$$
(a) T (0) = 0;
Problem 19.2: Show that the transformation that maps each n n matrix to its
trace is a linear transformation.
Problem 19.3: Show that the transformation that maps each n n matrix to its
determinant is not a linear transformation.
51
The kernel of a linear transformation T : V W is the set of all vectors in V that
are mapped by T to the zero vector.
The range of a linear transformation T is the set of all vectors in W that are the
image under T of at least one vector in V .
Problem 19.4: Given an inner product space V containing a fixed nonzero vector u,
let T (x) = hx, ui for x V . Show that the kernel of T is the set of vectors that
are orthogonal to u and that the range of T is R.
Problem 19.5: Show that the derivative operator, which maps continuously differ-
entiable functions f (x) to continuous functions f 0 (x), is a linear transformation
from C 1 (R) to C(R).
Problem 19.6: Show that the antiderivative operator, which maps continuous functions $f(x)$ to continuously differentiable functions $\int_0^x f(t)\,dt$, is a linear transformation from $C(\mathbb{R})$ to $C^1(\mathbb{R})$.
Problem 19.7: Show that the kernel of the derivative operator on C 1 (R) is the set
of constant functions on R and that its range is C(R).
52
Let T : Rn Rn be a linear operator and B = {v1 , v2 , . . . , vn } be a basis for Rn .
The matrix
$$A = [\,[T(\mathbf{v}_1)]_B\ [T(\mathbf{v}_2)]_B\ \cdots\ [T(\mathbf{v}_n)]_B\,]$$
is called the matrix for T with respect to B, with
[T (x)]B = A[x]B
for all x Rn . In the case where B is the standard Cartesian basis for Rn , the ma-
trix A is called the standard matrix for the linear transformation T . Furthermore,
if $B' = \{\mathbf{v}_1', \mathbf{v}_2', \ldots, \mathbf{v}_n'\}$ is any basis for $\mathbb{R}^n$, then
Problem 20.1: Show that $\operatorname{null}(A^*A) = \operatorname{null}(A)$ and conclude from the dimension theorem that $\operatorname{rank}(A^*A) = \operatorname{rank}(A)$.
[ u1 u2 . . . uk ]
53
is orthogonal. We also see that $A\mathbf{v}_j = \mathbf{0}$ for $k < j \le n$. We can then extend this set of vectors to an orthonormal basis for $\mathbb{R}^m$, which we write as the column vectors of an $m\times m$ matrix
$$U = [\,\mathbf{u}_1\ \mathbf{u}_2\ \ldots\ \mathbf{u}_k\ \ldots\ \mathbf{u}_m\,].$$
$$A = U\Sigma V^*.$$
Problem 20.2: Show that the singular values of a positive-definite Hermitian matrix
are the same as its eigenvalues.
Problem 20.3: Find a singular value decomposition of [Anton & Busby p. 505]
$$A = \begin{bmatrix} \sqrt{3} & 2 \\ 0 & \sqrt{3} \end{bmatrix}.$$
54
Problem 20.4: Find a singular value decomposition of [Anton & Busby p. 509]
$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}.$$
We find
$$A^TA = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix},$$
which has characteristic polynomial $(2 - \lambda)^2 - 1 = \lambda^2 - 4\lambda + 3 = (\lambda - 3)(\lambda - 1)$. The matrix $A^TA$ has eigenvalues 3 and 1; their respective eigenvectors are
$$\begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix} \qquad\text{and}\qquad \begin{bmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \end{bmatrix}.$$
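A numerical singular value decomposition of this matrix (a NumPy sketch, not part of the notes; signs of the singular vectors may differ from a hand computation):

import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])

U, s, Vt = np.linalg.svd(A)      # full SVD: U is 3x3, Vt is 2x2
print(s)                         # [1.732... 1.]: the singular values sqrt(3) and 1

Sigma = np.zeros((3, 2))
Sigma[:2, :2] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vt))   # True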
55
Remark: An alternative way of computing a singular value decomposition of the
nonsquare matrix in Problem 20.4 is to take the transpose of a singular value
decomposition for
$$A^T = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}.$$
The first k columns of the matrix U form an orthonormal basis for col(A), whereas the remaining $m - k$ columns form an orthonormal basis for $\operatorname{col}(A)^\perp = \operatorname{null}(A^T)$. The first k columns of the matrix V form an orthonormal basis for row(A), whereas the remaining $n - k$ columns form an orthonormal basis for $\operatorname{row}(A)^\perp = \operatorname{null}(A)$.
Problem 20.6: Find a reduced singular value decomposition for the matrix
$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}$$
considered previously in Problem 20.4.
56
If A is an $n\times n$ matrix of rank k, then A has the polar decomposition
$$A = U\Sigma V^* = U\Sigma U^*UV^* = PQ,$$
where $P = U\Sigma U^*$ is a positive semidefinite Hermitian matrix and $Q = UV^*$ is unitary.
21 The Pseudoinverse
If A is an $n\times n$ matrix of rank n, then we know it is invertible. In this case, the singular value decomposition and the reduced singular value decomposition coincide:
$$A = U_1\Sigma_1V_1^*,$$
where the diagonal matrix $\Sigma_1$ contains the n positive singular values of A. Moreover, the orthogonal matrices $U_1$ and $V_1$ are square, so they are invertible. This yields an interesting expression for the inverse of A:
$$A^{-1} = V_1\Sigma_1^{-1}U_1^*,$$
where $\Sigma_1^{-1}$ contains the reciprocals of the n positive singular values of A along its diagonal.
$$A^+ = V\Sigma^+U^*.$$
Problem 21.1: Show that the pseudoinverse of the matrix in Problem 20.4 is [cf. Anton & Busby p. 520]:
$$\begin{bmatrix} \frac{1}{3} & -\frac{1}{3} & \frac{2}{3} \\ \frac{1}{3} & \frac{2}{3} & -\frac{1}{3} \end{bmatrix}.$$
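This result can be verified numerically (a NumPy sketch, not part of the notes):

import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])

Aplus = np.linalg.pinv(A)
print(np.round(Aplus, 4))
# [[ 0.3333 -0.3333  0.6667]
#  [ 0.3333  0.6667 -0.3333]]

# For a matrix with full column rank this agrees with (A^T A)^{-1} A^T.
print(np.allclose(Aplus, np.linalg.inv(A.T @ A) @ A.T))   # True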
57
Problem 21.2: Prove that the pseudoinverse $A^+$ of an $m\times n$ matrix A satisfies the following properties:
(a) $AA^+A = A$;
(b) $A^+AA^+ = A^+$;
(c) $A^*AA^+ = A^*$;
(e) $(A^+)^* = (A^*)^+$.
Remark: If A has full column rank, then $A^*A$ is invertible and property (c) in Problem 21.2 provides an alternative way of computing the pseudoinverse:
$$A^+ = (A^*A)^{-1}A^*.$$
58
Index
C, 23 design matrix, 32
span, 10 Determinant, 11
QR factorization, 31 diagonal matrix, 7
diagonalizable, 19
Addition of matrices, 8 dimension theorem, 18
additive identity, 46 Distance, 4
additive inverse, 46 distance, 48
additivity, 48 distributive, 46
adjoint, 13 domain, 16
adjugate, 13 Dot, 4
affine, 32 dot product, 25
Angle, 4
angle, 48 eigenfunctions, 51
associative, 46 eigenspace, 15
Augmented matrix, 6 eigenvalue, 13
axioms, 46 eigenvector, 13
eigenvector matrix, 20
basis, 17 elementary matrix, 10
Caley-Hamilton Theorem, 39 Elementary row operations, 7
Cauchy-Schwarz inequality, 5 Equation of line, 5
change of bases, 19 Equation of plane, 5
characteristic equation, 14 first-order linear differential equation, 40
characteristic polynomial, 14 fixed point, 13
cofactor, 12 Fourier coefficients, 50
commutative, 46 Fourier series, 50
commute, 8 Fourier theorem, 49
Complex Conjugate Roots, 26 full column rank, 28
complex modulus, 25 fundamental set of solutions, 41
complex numbers, 22, 23 Fundamental Theorem of Algebra, 27
Component, 5
component, 28 GaussJordan elimination, 7
conjugate pairs, 26 Gaussian elimination, 7
conjugate symmetry, 48 general solution, 41
consistent, 32 Gram-Schmidt process, 29
coordinates, 17
Cramers rule, 12 Hermitian, 34
critical point, 45 Hermitian transpose, 34
cross product, 13 Hessian, 45
homogeneity, 48
deMoivres Theorem, 27 Homogeneous linear equation, 6
59
Identity, 9 null space, 16
imaginary number, 22 nullity, 17
indefinite, 44
initial condition, 40 one-to-one, 16, 17
inner, 4 onto, 17
inner product, 48 ordered, 24
Inverse, 9 orthogonal, 16, 34, 48
invertible, 9 orthogonal change of bases, 35
isomorphic, 52 orthogonal projection, 28
isomorphism, 52 Orthogonal vectors, 5
orthogonally diagonalizable, 35
kernel, 16, 52 orthogonally similar, 35
orthonormal, 16, 34
Lagrange interpolating polynomial, 34
orthonormal basis, 28
Law of cosines, 4
least squares, 32 parallel component, 28
least squares error vector, 33 Parallelogram law, 4
least squares line of best fit, 33 Parametric equation of line, 5
Length, 4 Parametric equation of plane, 6
length, 26 perpendicular component, 28
linear, 8 polar decomposition, 57
Linear equation, 6 Polynomial Factorization, 27
linear forms, 43 positive definite, 44
linear independence, 47 positivity, 48
linear operator, 16 pseudoinverse, 57
linear regression, 33 Pythagoras theorem, 5
linear transformation, 16, 51
linearly independent, 10 quadratic forms, 43
lower triangular matrix, 7
range, 16, 52
Matrix, 6 rank, 17
matrix, 8 real, 26
matrix for T with respect to B, 53 reciprocals, 25
matrix representation, 17 Reduced row echelon form, 7
minor, 12 reduced singular value decomposition, 56
Multiplication by scalar, 4 regression line, 33
Multiplication of a matrix by a scalar, 8 relative maximum, 45
Multiplication of matrices, 8 relative minimum, 45
multiplicative identity, 46 residual, 33
root, 23
negative definite, 44 rotation, 16
norm, 4, 48 Row echelon form, 7
normal, 35 row equivalent, 10
normal equation, 28
60
saddle point, 45
scalar multiplication, 46
Schur factorization, 36
similar, 18
similarity invariants, 19
singular value decomposition, 54
singular value expansion, 56
singular values, 54
spectral decomposition, 38
standard matrix, 53
subspace, 10, 47
symmetric matrix, 8
System of linear equations, 6
Trace, 9
transition matrix, 17
Transpose, 8
Triangle inequality, 5
triangular system, 13
Unit vector, 4
unitary, 34
upper triangular matrix, 7
Vandermonde, 34
vector space, 46
Vectors, 4
Wronskian, 47
61