
Math 5610 Fall 2018

Notes of 9/24/18

Review: The Significance of Orthogonal Matrices

A square matrix Q is orthogonal if

    Q^T = Q^{-1},  i.e.,  Q^T Q = Q Q^T = I.    (1)

Hence Q^{-1} = Q^T, and an orthogonal matrix is invertible. Moreover, orthogonal matrices provide an exception (pretty much the only one) to our rule that you never invert a matrix.

Since the (i, j) entry of Q^T Q is the dot product of the i-th and j-th columns of Q, it is clear that geometrically the columns (or rows) of an orthogonal matrix form an orthonormal set, i.e., a set of unit vectors that are pairwise orthogonal. Another way of looking at this is that the columns of an orthogonal matrix provide a basis of orthonormal vectors of ℝ^n, and that any such basis defines an orthogonal matrix.

Orthogonal matrices are used ubiquitously throughout Numerical Analysis because they do not amplify errors. To see this note that if Q is orthogonal then

    ||Qx||_2^2 = (Qx)^T Qx = x^T Q^T Q x = x^T x = ||x||_2^2.    (2)



So if you build a numerical algorithm on multiplications with orthogonal matrices then errors present in the original problem will not be amplified by the algorithm.

The fact (2) has immediate consequences (think about it): the 2-norm[1] of an orthogonal matrix is 1, and any eigenvalue has absolute value equal to 1. The property (1) implies that the determinant of an orthogonal matrix is plus or minus 1:

    ||Q||_2 = 1,   Qx = λx ≠ 0  ⇒  |λ| = 1,   det Q = ±1.    (3)

It is also true that the product of orthogonal matrices is orthogonal.

[1] The norm of a matrix A is max_{x≠0} ||Ax|| / ||x||.

Exercise 1. Verify the claims made in this section. Find some orthogonal matrices.
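As a small illustration (an editorial addition, assuming Python with NumPy is available), the sketch below builds a rotation matrix and a permutation matrix and checks the claims above numerically.

    import numpy as np

    # A 2x2 rotation matrix and a 3x3 permutation matrix, both orthogonal.
    theta = 0.7
    Q_rot = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
    Q_perm = np.eye(3)[[2, 0, 1]]                    # rows of I in a new order

    for Q in (Q_rot, Q_perm):
        n = Q.shape[0]
        assert np.allclose(Q.T @ Q, np.eye(n))        # Q^T Q = I, property (1)
        assert np.isclose(abs(np.linalg.det(Q)), 1.0) # det Q = +-1, cf. (3)
        x = np.random.rand(n)
        # multiplication by Q preserves the 2-norm, property (2)
        assert np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x))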
We will encounter many orthogonal matrices in the course of this semester, including the identity matrix, permutation matrices, reflection matrices, rotation matrices, and several factorizations involving orthogonal matrices. The most versatile such factorization is

The Singular Value Decomposition.

What Is It.

Suppose we are given an m × n matrix A, where, usually,

    m ≥ n.    (4)


The Singular Value Decomposition of A is

    A = U Σ V^T    (5)

where
• U is m × m orthogonal, i.e., U^{-1} = U^T,
• V is n × n orthogonal, i.e., V^{-1} = V^T, and
• Σ is m × n diagonal. Specifically,

    Σ = \begin{pmatrix}
        \sigma_1 & 0        & \cdots & 0        \\
        0        & \sigma_2 & \cdots & 0        \\
        \vdots   & \vdots   & \ddots & \vdots   \\
        0        & 0        & \cdots & \sigma_n \\
        0        & 0        & \cdots & 0        \\
        \vdots   & \vdots   &        & \vdots   \\
        0        & 0        & \cdots & 0
        \end{pmatrix}    (6)

where

    σ1 ≥ σ2 ≥ . . . ≥ σn ≥ 0.    (7)
Note that Σ is a matrix! The capital Greek letter Σ in this context has nothing to do with the summation symbol[2]. The σi are the singular values of A. The columns of U and V are the left and right singular vectors of A, respectively.

[2] I first learned about the singular value decomposition in an excellent talk by Cleve Moler that I understood only in retrospect. At the time of that first exposure the talk was utterly wasted on me because the whole time I kept thinking: what is he summing there? However, the notation A = UΣV^T is well established.
Some insight may be gained, and some of the mystery can perhaps be lifted, by observing that the right singular vectors of A are the eigenvectors of A^T A, and the singular values are the square roots of the corresponding eigenvalues of A^T A.

To see this let vj be the j-th column of V and note that

    A^T = V Σ^T U^T.    (8)

Let

    S = Σ^T Σ = \begin{pmatrix}
        \sigma_1^2 & 0          & \cdots & 0          \\
        0          & \sigma_2^2 & \cdots & 0          \\
        \vdots     & \vdots     & \ddots & \vdots     \\
        0          & 0          & \cdots & \sigma_n^2
        \end{pmatrix}    (9)

and let ej be the j-th unit vector. We obtain

    A^T A vj = V Σ^T U^T U Σ V^T vj
             = V Σ^T Σ ej          (since U^T U = I and V^T vj = ej)
             = V S ej
             = σj^2 V ej
             = σj^2 vj,    (10)

which is what we wanted to show.
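Readers who like to experiment can check relation (10) numerically; the sketch below is an editorial addition (not part of the original notes) using NumPy's svd routine on a random matrix.

    import numpy as np

    m, n = 7, 4
    A = np.random.rand(m, n)

    # full_matrices=True gives U (m x m) and V^T (n x n); s holds sigma_1 >= ... >= sigma_n
    U, s, Vt = np.linalg.svd(A, full_matrices=True)
    V = Vt.T

    for j in range(n):
        vj = V[:, j]
        # A^T A v_j should equal sigma_j^2 v_j, as in (10)
        assert np.allclose(A.T @ A @ vj, s[j]**2 * vj)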



Existence Theorem. Every matrix has a singular value decomposition.

The following proof is taken (with little modification but a little elaboration) from Golub/van Loan, p. 76. Let A ∈ ℝ^{m×n}, and let x ∈ ℝ^n and y ∈ ℝ^m be unit 2-norm vectors that satisfy

    Ax = σy   with   σ = ||A||_2.    (11)

There exist matrices V2 ∈ ℝ^{n×(n−1)} and U2 ∈ ℝ^{m×(m−1)} such that

    V = [x V2]   and   U = [y U2]    (12)

are orthogonal. It is not hard to show that U^T A V has the following structure:

    U^T A V = \begin{pmatrix} \sigma & w^T \\ 0 & B \end{pmatrix} =: A_1.    (13)

Since

    \|A_1\|_2^2 \, (\sigma^2 + w^T w) \;\ge\; \left\| A_1 \begin{pmatrix} \sigma \\ w \end{pmatrix} \right\|_2^2 \;\ge\; (\sigma^2 + w^T w)^2    (14)

(the first inequality because ||(σ, w)^T||_2^2 = σ^2 + w^T w, the second because the first component of A_1 (σ, w)^T equals σ^2 + w^T w), we have, after dividing by (σ^2 + w^T w), that

    ||A1||_2^2 ≥ σ^2 + w^T w.    (15)

But

    σ^2 = ||A||_2^2 = ||A1||_2^2 ≥ σ^2 + w^T w,    (16)

and so we must have w = 0. An obvious induction argument (applied to the submatrix B) completes the proof of the theorem.



Why It Is Important.

The Singular Value Decomposition (SVD) is the most versatile and powerful matrix factorization in numerical linear algebra. It is expensive to compute, but if all else fails the SVD has the best chance of succeeding.

Most applications of the SVD consist of reducing a problem involving A into one involving Σ. Note that since

    Σ = U^T A V,    (17)

Σ is obtained from A by multiplying with two orthogonal matrices, and multiplying with an orthogonal matrix does not amplify errors. So once we know U and V, Σ can be obtained from A in a process that is as well conditioned as it can be.

The Singular Value Decomposition (of square matrices) was first discovered independently by Beltrami in 1873 and Jordan in 1874.

References.

The SVD is discussed in many textbooks on numerical analysis or numerical linear algebra. The most comprehensive discussion is in the authoritative monograph

• Gene H. Golub and Charles F. Van Loan, Matrix Computations, 4th ed., The Johns Hopkins University Press, 2013, ISBN-10: 1-4214-0794-9.

However, the most easily understood first explanation is in

• David Kahaner, Cleve Moler and Stephen Nash, Numerical Methods and Software, Prentice Hall, 1989, ISBN 0-13-627258-4.

Computing the SVD.

Since the computation of the SVD amounts to the solution of an eigenvalue problem, it follows that in general the SVD cannot be computed exactly in a finite number of steps, and so intrinsically one has to use some sort of iteration. The actual computation is involved and sophisticated; for details consult the above mentioned reference by Golub and van Loan. The emphasis in these notes is on what you can actually do with the SVD, once you have it[3]. Thus the remainder of these notes lists a sequence of applications of the Singular Value Decomposition.

[3] Software to compute the SVD is available, for example, at http://www.netlib.org/

Some Applications.

Note that the following list in no way is meant to be complete.

Rank Determination.


The rank of a matrix is the maximum number of linearly independent rows or columns. In principle one can compute it by carrying out Gaussian Elimination until it becomes impossible to find a non-zero pivot element by row and column pivoting. The problem with that approach is that because of round-off errors it is extremely difficult to decide when a number is zero. Numbers that should be zero usually aren't, because of inexact arithmetic. Since U and V are non-singular, the rank of A equals the rank of Σ, and the rank of Σ equals the number of non-zero singular values. In this context, a singular value σi is considered zero if

    σi / σ1 < τ    (18)

where τ is a specified tolerance that usually is a small multiple of the round-off unit[4]. Another approach is based on looking at the whole set of singular values. Often they decrease gradually and then there is a pronounced jump to very small singular values. If the last singular value before the jump is σr, then r is the rank of A (and Σ).

[4] The round-off unit ε, also called the machine epsilon, is the smallest number that can be represented on a computer such that the system recognizes 1 + ε as being larger than 1. On many systems, including our Unix systems, ε equals approximately 2 × 10^{-16}.

Throughout the remainder of these notes we will assume that

    σ1 ≥ σ2 ≥ . . . ≥ σr > σr+1 = σr+2 = . . . = σn = 0    (19)

and hence

    rank A = rank Σ = r.    (20)

It is of course possible that r = n, in which case A has full rank.
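A minimal sketch of this rank test (an editorial addition in Python/NumPy; the default tolerance below is just one plausible choice, a small multiple of the machine epsilon):

    import numpy as np

    def numerical_rank(A, tau=None):
        """Number of singular values sigma_i with sigma_i / sigma_1 >= tau, cf. (18)."""
        s = np.linalg.svd(A, compute_uv=False)    # singular values, descending
        if s[0] == 0.0:
            return 0
        if tau is None:
            tau = max(A.shape) * np.finfo(A.dtype).eps   # assumed default choice
        return int(np.sum(s / s[0] >= tau))

    # A 4x3 matrix whose third column is the sum of the first two: rank 2.
    A = np.random.rand(4, 2)
    A = np.hstack([A, A[:, :1] + A[:, 1:2]])
    print(numerical_rank(A))   # prints 2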

Computing the Determinant of a Square Matrix.

The determinant of an orthogonal matrix is positive or negative 1. The determinant of a square diagonal matrix is the product of its diagonal entries. The determinant of the product of two matrices is the product of the individual determinants. Thus for a square matrix A its determinant is plus or minus the product of the singular values:

    det A = ± σ1 σ2 · · · σn.    (21)
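A quick numerical check of (21) (an editorial sketch in Python/NumPy): the singular values give |det A|, and the sign can be recovered from det U and det V.

    import numpy as np

    A = np.random.rand(5, 5)
    U, s, Vt = np.linalg.svd(A)

    # |det A| is the product of the singular values; the sign comes from U and V.
    sign = np.sign(np.linalg.det(U) * np.linalg.det(Vt))
    assert np.isclose(np.linalg.det(A), sign * np.prod(s))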

Computing the Condition Number of a Matrix.

Multiplying with an orthogonal matrix does not change the 2-norm of a matrix. The 2-norm of A is therefore the 2-norm of Σ, which equals σ1. If A is square and invertible, then its inverse is given by

    A^{-1} = V Σ^{-1} U^T.    (22)

The matrix Σ^{-1} is diagonal and has the reciprocals of the singular values along the diagonal. Its 2-norm (and that of A^{-1}) is 1/σn. Hence,

    ||A||_2 ||A^{-1}||_2 = σ1 / σn.    (23)

The right hand side of equation (23) makes sense even for rectangular matrices and is usually taken as the definition of the condition number of A even if A is not square. This turns out to be useful beyond being a mere formal generalization.

Note that in the process of computing the condition number we also obtained

    ||A||_2 = σ1    (24)

for general matrices A, and

    ||A^{-1}||_2 = 1/σn    (25)

for non-singular square matrices A.
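In code the condition number (23) is one line once the singular values are available; the following sketch (an editorial addition in Python/NumPy) compares it with NumPy's built-in cond, which uses the same 2-norm definition.

    import numpy as np

    A = np.random.rand(5, 5)
    s = np.linalg.svd(A, compute_uv=False)

    cond = s[0] / s[-1]                          # sigma_1 / sigma_n, equation (23)
    assert np.isclose(cond, np.linalg.cond(A))   # NumPy's default cond is the 2-norm version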

Solving a Linear System.

Let's consider the linear system

    Ax = b    (26)

and ask if it has any solutions, and if it does, how many, and what they are. All of these questions can be answered via the SVD. Recalling (5), the system (26) turns into

    Ax = U Σ V^T x = b.    (27)

Multiplying with U^T gives

    Σz = c    (28)

where

    z = V^T x   and   c = U^T b.    (29)

This is a diagonal linear system that can be analyzed easily:

    \begin{pmatrix} \operatorname{diag}(\sigma_1, \dots, \sigma_r) & 0 \\ 0 & 0 \end{pmatrix}
    \begin{pmatrix} z_1 \\ \vdots \\ z_r \\ z_{r+1} \\ \vdots \\ z_n \end{pmatrix}
    =
    \begin{pmatrix} c_1 \\ \vdots \\ c_r \\ c_{r+1} \\ \vdots \\ c_n \\ c_{n+1} \\ \vdots \\ c_m \end{pmatrix}.    (30)
Recalling (19) we distinguish three cases:

1. r = n and

    c_{n+1} = . . . = c_m = 0.    (31)

There is a unique solution

    zi = ci / σi,   i = 1, . . . , n.    (32)

Note that this includes the case m = n, where the condition (31) is vacuous.

2. r < n and c_{r+1} = . . . = c_m = 0. In that case

    zi = ci / σi,   i = 1, . . . , r,    (33)

and z_{r+1} through z_n are arbitrary. There are infinitely many solutions, and they form an (n − r)-dimensional affine space[5].

3. r ≤ n and ci ≠ 0 for some i > r. In that case the system is inconsistent and there is no solution. Note that this includes the case m = n (in which case of course the rank r must be less than n for the system to have no solution).

Note that once we have z it is easy to compute

    x = V z.    (34)

Also note that all the required transformations involve multiplications with orthogonal matrices, which do not amplify errors.

[5] An affine space S is a set of vectors of the form s + v, where s is a fixed vector and v ranges over an ordinary (linear) vector space. An example would be a line in the plane, or a plane in three dimensional space. If the line passes through the origin it's a linear subspace; whether or not it does, it's an affine subspace of the plane.
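The three-case analysis translates directly into code. The sketch below is an editorial addition (Python/NumPy); svd_solve is a hypothetical helper name, and the tolerance tau used to decide the numerical rank follows (18).

    import numpy as np

    def svd_solve(A, b, tau=1e-12):
        """Analyze and (if possible) solve Ax = b via the SVD, following (26)-(34)."""
        U, s, Vt = np.linalg.svd(A)
        m, n = A.shape
        r = int(np.sum(s > tau * s[0]))        # numerical rank, cf. (18)
        c = U.T @ b                            # c = U^T b, equation (29)

        if np.any(np.abs(c[r:]) > tau * np.linalg.norm(b)):
            return None                        # case 3: inconsistent, no solution

        z = np.zeros(n)
        z[:r] = c[:r] / s[:r]                  # cases 1 and 2, equations (32)/(33)
        return Vt.T @ z                        # x = V z, equation (34); unique iff r = n

    A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    x = np.array([1.0, -1.0])
    print(svd_solve(A, A @ x))                 # recovers x for this consistent system

For a consistent full-rank system this reproduces the unique solution; for r < n it returns the particular solution with z_{r+1} = . . . = z_n = 0.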



Solving Least Squares Problems.

Consider the standard Least Squares Problem

    ||Ax − b||_2 = min.    (35)

The standard approach to this problem is via the QR factorization, which we discussed in class. That approach fails if A has rank less than n. The SVD still works in this case, and once we have it, it can of course also be applied in the full rank case. Remembering once again that multiplication with an orthogonal matrix does not alter the 2-norm of a vector, and proceeding similarly as for linear systems, we obtain

    ||Ax − b||_2^2 = ||U^T (Ax − b)||_2^2
                   = ||U^T A V V^T x − U^T b||_2^2
                   = ||Σz − c||_2^2                                          (36)
                   = \sum_{i=1}^{r} (σi zi − ci)^2 + \sum_{i=r+1}^{m} ci^2,

where c and z are defined in (29) and r is defined in (19). There is nothing we can do about the second sum in (36). However, we can render the first sum zero by picking zi as before in (33). If r < n then we can pick z_{r+1} through z_n arbitrarily. In that case the solution x = V z of the Least Squares problem is not unique, but the value of Ax is. The usual choice of z in that case is

    z_{r+1} = z_{r+2} = . . . = z_n = 0,    (37)

which gives, among all solutions, the solution z (and hence x) that itself has the smallest 2-norm.
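The recipe (33) together with the choice (37) is easy to implement. The following sketch is an editorial addition (Python/NumPy) that computes the minimum-norm least squares solution and compares it with NumPy's lstsq, which returns the same solution.

    import numpy as np

    def svd_lstsq(A, b, tau=1e-12):
        """Minimum-norm least squares solution of ||Ax - b||_2 = min via the SVD."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        r = int(np.sum(s > tau * s[0]))        # numerical rank
        c = U.T @ b
        z = np.zeros(A.shape[1])
        z[:r] = c[:r] / s[:r]                  # equation (33); z_{r+1..n} = 0, equation (37)
        return Vt.T @ z

    # Rank-deficient example: the second column repeats the first.
    A = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
    b = np.array([1.0, 2.0, 4.0])
    x = svd_lstsq(A, b)
    x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
    assert np.allclose(x, x_ref)               # lstsq also returns the minimum-norm solution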

Data Compression.

It's an easy exercise to see that

    A = U Σ V^T = \sum_{i=1}^{n} σi ui vi^T    (38)

where the ui and vi are the columns of U and V, respectively. Suppose now that A represents an image, or some other kind of data. For example, its entries might be numbers between 0 and 1 that indicate shades of gray. One way to approximate A by fewer than mn numbers (and thus compress the image or data) would be to use only the first few terms in the sum on the right of (38), and of course store only the corresponding few left and right singular vectors rather than an m × n array. The book by Kahaner, Moler and Nash referenced above has an impressive illustration of that technique.
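As a concrete illustration (an editorial addition in Python/NumPy), a rank-k approximation keeps only the first k terms of (38); for an m × n array this stores k(m + n + 1) numbers (k singular values plus k left and k right singular vectors) instead of mn.

    import numpy as np

    def rank_k_approx(A, k):
        """Truncated SVD: the first k terms of the sum in (38)."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    # Fake "image": smooth data is well approximated by a few singular triples.
    x = np.linspace(0, 1, 200)
    A = np.outer(np.sin(3 * x), np.cos(2 * x)) + 0.1 * np.outer(x, x)

    A5 = rank_k_approx(A, 5)
    print(np.linalg.norm(A - A5, 2))   # 2-norm error equals sigma_6, tiny for this A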
The following exercises can help you understand
the SVD more thoroughly.

Exercise 2. Fill in the details of the above existence proof.

Exercise 3. Explore how the discussion in these notes has to be modified if m < n. One source of problems with underdetermined systems is constrained minimization problems. In that context one is often interested in the null space of a matrix, i.e., the space of all vectors z that satisfy Az = 0. See how to use the SVD to find the null space of A.

Exercise 4. Compute the singular value decomposition in the case that m = 1 or n = 1.

Exercise 5. Show that the condition number of A^T A is the square of the condition number of A. Comment on the suitability of solving Least Squares Problems via the Normal Equations

    A^T A x = A^T b.    (39)

Exercise 6. Show that the columns of U are the eigenvectors of A A^T. There are n singular values, but m ≥ n eigenvalues of A A^T, so what are the eigenvalues of A A^T?

Exercise 7. Ask yourself what happens when A is symmetric. What if it is positive definite?

Exercise 8. Investigate the use of the SVD for the solution of eigenvalue problems.

Exercise 9. Explore the applicability of the SVD for sparse matrices A.

Exercise 10. Using the SVD, express the solution of the Least Squares problem ||Ax − b||_2 = min in the form x = A^+ b where A^+ is given in terms of A. A^+ is known as the generalized inverse of A.


Exercise 11. Show that

    σ1 = max_{y ∈ ℝ^m, x ∈ ℝ^n} (y^T A x) / (||y||_2 ||x||_2).    (40)
