
6.3 Singular Value Decomposition

A great matrix factorization has been saved for the end of the basic course. UΣV T joins
with LU from elimination and QR from orthogonalization (Gauss and Gram-Schmidt).
Nobody’s name is attached; A = UΣV T is known as the “SVD” or the singular value
decomposition. We want to describe it, to prove it, and to discuss its applications—
which are many and growing.
The SVD is closely associated with the eigenvalue-eigenvector factorization QΛQT of
a positive definite matrix. The eigenvalues are in the diagonal matrix Λ. The eigenvector
matrix Q is orthogonal (QT Q = I) because eigenvectors of a symmetric matrix can be
chosen to be orthonormal. For most matrices that is not true, and for rectangular matrices
it is ridiculous (eigenvalues undefined). But now we allow the Q on the left and the QT
on the right to be any two orthogonal matrices U and V T —not necessarily transposes of
each other. Then every matrix will split into A = UΣV T .
The diagonal (but rectangular) matrix Σ has entries from A^T A, not from A! Those positive entries (also called sigma) will be σ1, ..., σr: the square roots of the nonzero eigenvalues of A^T A. They are the singular values of A.
They fill the first r places on the main diagonal of Σ—when A has rank r. The rest of Σ
is zero.
With rectangular matrices, the key is almost always to consider AT A and AAT .

Singular Value Decomposition: Any m by n matrix A can be factored into

   A = UΣV^T = (orthogonal)(diagonal)(orthogonal).

The columns of U (m by m) are eigenvectors of AA^T, and the columns of V (n by n) are eigenvectors of A^T A. The r singular values on the diagonal of Σ (m by n) are the square roots of the nonzero eigenvalues of both AA^T and A^T A.

Remark 1. For positive definite matrices, Σ is Λ and UΣV T is identical to QΛQT . For
other symmetric matrices, any negative eigenvalues in Λ become positive in Σ. For
complex matrices, Σ remains real but U and V become unitary (the complex version of
orthogonal). We take complex conjugates in U^H U = I and V^H V = I and A = UΣV^H.
Remark 2. U and V give orthonormal bases for all four fundamental subspaces:

   first r columns of U:      column space of A
   last m − r columns of U:   left nullspace of A
   first r columns of V:      row space of A
   last n − r columns of V:   nullspace of A

Remark 3. The SVD chooses those bases in an extremely special way. They are more
than just orthonormal. When A multiplies a column v j of V , it produces σ j times a
column of U. That comes directly from AV = UΣ, looked at a column at a time.

Remark 4. Eigenvectors of AA^T and A^T A must go into the columns of U and V:

   AA^T = (UΣV^T)(VΣ^T U^T) = UΣΣ^T U^T   and, similarly,   A^T A = VΣ^TΣV^T.   (1)

U must be the eigenvector matrix for AA^T. The eigenvalue matrix in the middle is ΣΣ^T, which is m by m with σ1², ..., σr² on the diagonal.
From A^T A = VΣ^TΣV^T, the matrix V must be the eigenvector matrix for A^T A. The diagonal matrix Σ^TΣ has the same σ1², ..., σr², but it is n by n.
Remark 5. Here is the reason that Avj = σj uj. Start with A^T A vj = σj² vj:

   Multiply by A:   AA^T(Avj) = σj²(Avj).   (2)

This says that Avj is an eigenvector of AA^T! We just moved the parentheses to (AA^T)(Avj). The length of this eigenvector Avj is σj, because

   vj^T A^T A vj = σj² vj^T vj   gives   ‖Avj‖² = σj².

So the unit eigenvector is Avj/σj = uj. In other words, AV = UΣ.
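
These relations are easy to check numerically. The following sketch (not from the text) uses NumPy's numpy.linalg.svd on an arbitrary 3 by 2 matrix, chosen only for illustration, and verifies A = UΣV^T, Avj = σj uj, and the eigenvalue connection:

    import numpy as np

    # Any m by n matrix will do; this one is 3 by 2 with rank 2 (illustration only).
    A = np.array([[2.0, 0.0],
                  [1.0, 1.0],
                  [0.0, 2.0]])

    U, s, Vt = np.linalg.svd(A)           # full SVD: U is 3x3, Vt is 2x2, s holds sigma_1, sigma_2
    Sigma = np.zeros(A.shape)
    Sigma[:len(s), :len(s)] = np.diag(s)  # rectangular diagonal Sigma, 3 by 2

    # A equals U Sigma V^T
    print(np.allclose(A, U @ Sigma @ Vt))

    # A v_j = sigma_j u_j for each singular value
    V = Vt.T
    for j, sigma in enumerate(s):
        print(np.allclose(A @ V[:, j], sigma * U[:, j]))

    # sigma_j squared are the eigenvalues of A^T A
    print(np.allclose(np.sort(s**2), np.sort(np.linalg.eigvalsh(A.T @ A))))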


Example 1. This A has only one column: rank r = 1. Then Σ has only σ1 = 3:

   SVD \qquad A = \begin{bmatrix} -1 \\ 2 \\ 2 \end{bmatrix} = \begin{bmatrix} -1/3 & 2/3 & 2/3 \\ 2/3 & -1/3 & 2/3 \\ 2/3 & 2/3 & -1/3 \end{bmatrix} \begin{bmatrix} 3 \\ 0 \\ 0 \end{bmatrix} \begin{bmatrix} 1 \end{bmatrix} = U_{3\times 3}\,\Sigma_{3\times 1}\,V^{T}_{1\times 1}.

A^T A is 1 by 1, whereas AA^T is 3 by 3. They both have eigenvalue 9 (whose square root is the 3 in Σ). The two zero eigenvalues of AA^T leave some freedom for the eigenvectors in columns 2 and 3 of U. We kept that matrix orthogonal.
" #
2 −1
Example 2. Now A has rank 2, and AAT = with λ = 3 and 1:
−1 2
 
" # " # "√ # 1 −2 1 /√6
−1 1 0 1 −1 1 3 0 0   √
= UΣV T = √ −1 0 1 /√2 .
0 −1 1 2 1 1 0 1 0
1 1 1 / 3
√ √
Notice 3 and 1. The columns of U are left singular vectors (unit eigenvectors of
AAT ). The columns of V are right singular vectors (unit eigenvectors of AT A).
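
As a quick check of Example 2 (a NumPy sketch, not part of the text), multiplying the three factors recovers A, and U and V are indeed orthogonal:

    import numpy as np

    U = np.array([[-1.0, 1.0],
                  [ 1.0, 1.0]]) / np.sqrt(2.0)
    Sigma = np.array([[np.sqrt(3.0), 0.0, 0.0],
                      [0.0,          1.0, 0.0]])
    Vt = np.array([[ 1.0, -2.0, 1.0],
                   [-1.0,  0.0, 1.0],
                   [ 1.0,  1.0, 1.0]])
    Vt /= np.sqrt([6.0, 2.0, 3.0])[:, None]   # normalize each row of V^T

    A = U @ Sigma @ Vt
    print(np.allclose(A, [[-1, 1, 0], [0, -1, 1]]))                       # recovers A
    print(np.allclose(U.T @ U, np.eye(2)), np.allclose(Vt @ Vt.T, np.eye(3)))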

Applications of the SVD

We will pick a few important applications, after emphasizing one key point. The SVD is terrific for numerically stable computations, because U and V are orthogonal matrices. They never change the length of a vector. Since ‖Ux‖² = x^T U^T U x = ‖x‖², multiplication by U cannot destroy the scaling.

Of course Σ could multiply by a large σ or (more commonly) divide by a small σ, and overflow the computer. But still Σ is as good as possible. It reveals exactly what is large and what is small. The ratio σmax/σmin is the condition number of an invertible n
large and what is small. The ratio σmax /σmin is the condition number of an invertible n
by n matrix. The availability of that information is another reason for the popularity of
the SVD. We come back to this in the second application.
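
For instance (a NumPy sketch; the nearly singular 2 by 2 matrix is only an illustration), the condition number comes straight from the extreme singular values:

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [1.0, 1.0001]])              # nearly singular, so badly conditioned

    s = np.linalg.svd(A, compute_uv=False)     # singular values, largest first
    print(s[0] / s[-1])                        # condition number sigma_max / sigma_min
    print(np.linalg.cond(A))                   # NumPy's built-in 2-norm condition number agrees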

1. Image processing Suppose a satellite takes a picture, and wants to send it to Earth.
The picture may contain 1000 by 1000 “pixels”—a million little squares, each with a
definite color. We can code the colors, and send back 1,000,000 numbers. It is better to
find the essential information inside the 1000 by 1000 matrix, and send only that.
Suppose we know the SVD. The key is in the singular values (in Σ). Typically, some
σ ’s are significant and others are extremely small. If we keep 20 and throw away 980,
then we send only the corresponding 20 columns of U and V . The other 980 columns
are multiplied in UΣV T by the small σ ’s that are being ignored. We can do the matrix
multiplication as columns times rows:
A = UΣV T = u1 σ1 vT1 + u2 σ2 vT2 + · · · + ur σr vTr . (3)
Any matrix is the sum of r matrices of rank 1. If only 20 terms are kept, we send 20
times 2000 numbers instead of a million (25 to 1 compression).
The pictures are really striking, as more and more singular values are included. At
first you see nothing, and suddenly you recognize everything. The cost is in computing
the SVD—this has become much more efficient, but it is expensive for a big matrix.
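
A minimal sketch of this rank-k truncation in NumPy (the random 1000 by 1000 "picture" and the choice k = 20 are stand-ins, not real image data):

    import numpy as np

    rng = np.random.default_rng(0)
    image = rng.random((1000, 1000))          # stand-in for a 1000 by 1000 grayscale picture
    k = 20                                    # number of singular values kept

    U, s, Vt = np.linalg.svd(image, full_matrices=False)
    approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # sum of the first k rank-1 pieces

    stored = k * (1000 + 1000)                # numbers sent: k columns of U and k rows of V^T
    print(stored, "numbers instead of", image.size)  # 40000 vs 1000000 (25 to 1)
    print(np.linalg.norm(image - approx) / np.linalg.norm(image))   # relative error of the truncation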

2. The effective rank The rank of a matrix is the number of independent rows, and
the number of independent columns. That can be hard to decide in computations! In
exact arithmetic, counting the pivots is correct. Real arithmetic can be misleading—but
discarding small pivots is not the answer. Consider the following:
" # " # " #
ε 2ε ε 1 ε 1
ε is small and and .
1 2 0 0 ε 1+ε
The first has rank 1, although roundoff error will probably produce a second pivot. Both
pivots will be small; how many do we ignore? The second has one small pivot, but we
cannot pretend that its row is insignificant. The third has two pivots and its rank is 2, but
its “effective rank” ought to be 1.
We go to a more stable measure of rank. The first step is to use AT A or AAT , which
are symmetric but share the same rank as A. Their eigenvalues—the singular values
squared—are not misleading. Based on the accuracy of the data, we decide on a toler-
ance like 10−6 and count the singular values above it—that is the effective rank. The
examples above have effective rank 1 (when ε is very small).
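
In floating-point practice this is a one-line count of singular values above the tolerance. A sketch with the three matrices above (NumPy; the values eps = 1e-9 and tol = 1e-6 are illustrative choices):

    import numpy as np

    eps = 1e-9
    examples = [np.array([[eps, 2 * eps], [1.0, 2.0]]),
                np.array([[eps, 1.0], [0.0, 0.0]]),
                np.array([[eps, 1.0], [eps, 1.0 + eps]])]

    tol = 1e-6
    for A in examples:
        s = np.linalg.svd(A, compute_uv=False)          # singular values
        print(s, "effective rank:", int(np.sum(s > tol)))   # each prints effective rank 1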

3. Polar decomposition   Every nonzero complex number z is a positive number r times a number e^{iθ} on the unit circle: z = re^{iθ}. That expresses z in “polar coordinates.” If we think of z as a 1 by 1 matrix, r corresponds to a positive definite matrix and e^{iθ} corresponds to an orthogonal matrix. More exactly, since e^{iθ} is complex and satisfies e^{−iθ}e^{iθ} = 1, it forms a 1 by 1 unitary matrix: U^H U = I. We take the complex conjugate as well as the transpose, for U^H.
The SVD extends this “polar factorization” to matrices of any size:
Every real square matrix can be factored into A = QS, where Q is orthogonal
and S is symmetric positive semidefinite. If A is invertible then S is positive
definite.
For proof we just insert V TV = I into the middle of the SVD:

A = UΣV T = (UV T )(V ΣV T ). (4)

The factor S = VΣV^T is symmetric and semidefinite (because Σ is). The factor Q = UV^T is an orthogonal matrix (because Q^T Q = VU^T UV^T = I). In the complex case, S becomes Hermitian instead of symmetric and Q becomes unitary instead of orthogonal. In the invertible case Σ is definite and so is S.
Example 3. Polar decomposition:

   A = \begin{bmatrix} 1 & -2 \\ 3 & -1 \end{bmatrix} = QS = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 3 & -1 \\ -1 & 2 \end{bmatrix}.

Example 4. Reverse polar decomposition:

   A = \begin{bmatrix} 1 & -2 \\ 3 & -1 \end{bmatrix} = S′Q = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.

The exercises show how, in the reverse order, S changes but Q remains the same. Both S and S′ are symmetric positive definite because this A is invertible.

Application of A = QS: A major use of the polar decomposition is in continuum mechanics (and recently in robotics). In any deformation, it is important to separate
stretching from rotation, and that is exactly what QS achieves. The orthogonal matrix
Q is a rotation, and possibly a reflection. The material feels no strain. The symmetric
matrix S has eigenvalues σ1 , . . . , σr , which are the stretching factors (or compression
factors). The diagonalization that displays those eigenvalues is the natural choice of
axes, called principal axes, as in the ellipses of Section 6.2. It is S that requires work
on the material, and stores up elastic energy.
We note that S² is A^T A, which is symmetric positive definite when A is invertible. S is the symmetric positive definite square root of A^T A, and Q is AS⁻¹. In fact, A could be rectangular, as long as A^T A is positive definite. (That is the condition we keep meeting, that A must have independent columns.) In the reverse order A = S′Q, the matrix S′ is the symmetric positive definite square root of AA^T.
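
Both factorizations come directly from the SVD. A NumPy sketch, reusing the matrix of Examples 3 and 4 (scipy.linalg.polar would do the same job, but plain NumPy is enough here):

    import numpy as np

    A = np.array([[1.0, -2.0],
                  [3.0, -1.0]])

    U, s, Vt = np.linalg.svd(A)
    Q = U @ Vt                        # orthogonal factor
    S = Vt.T @ np.diag(s) @ Vt        # symmetric positive definite, S^2 = A^T A
    S_prime = U @ np.diag(s) @ U.T    # reverse order: S'^2 = A A^T

    print(np.allclose(A, Q @ S), np.allclose(A, S_prime @ Q))   # A = QS = S'Q
    print(np.allclose(Q.T @ Q, np.eye(2)))                      # Q is orthogonal
    print(np.linalg.eigvalsh(S))                                # eigenvalues of S are the singular values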

4. Least Squares   For a rectangular system Ax = b, the least-squares solution comes from the normal equations A^T A x̂ = A^T b. If A has dependent columns then A^T A is not invertible and x̂ is not determined. Any vector in the nullspace could be added to x̂. We can now complete Chapter 3, by choosing a “best” (shortest) x̂ for every Ax = b.

Ax = b has two possible difficulties: dependent rows or dependent columns. With dependent rows, Ax = b may have no solution. That happens when b is outside the column space of A. Instead of Ax = b we solve A^T A x̂ = A^T b. But if A has dependent columns, this x̂ will not be unique. We have to choose a particular solution of A^T A x̂ = A^T b, and we choose the shortest.

The optimal solution of Ax = b is the minimum length solution of A^T A x̂ = A^T b.

That minimum length solution will be called x+. It is our preferred choice as the best solution to Ax = b (which had no solution), and also to A^T A x̂ = A^T b (which had too many). We start with a diagonal example.

Example 5. A is diagonal, with dependent rows and dependent columns:

   A\hat{x} = p \quad\text{is}\quad \begin{bmatrix} \sigma_1 & 0 & 0 & 0 \\ 0 & \sigma_2 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} \hat{x}_1 \\ \hat{x}_2 \\ \hat{x}_3 \\ \hat{x}_4 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ 0 \end{bmatrix}.

The columns all end with zero. In the column space, the closest vector to b = (b1, b2, b3) is p = (b1, b2, 0). The best we can do with Ax = b is to solve the first two equations, since the third equation is 0 = b3. That error cannot be reduced, but the errors in the first two equations will be zero. Then

   x̂1 = b1/σ1   and   x̂2 = b2/σ2.

Now we face the second difficulty. To make x̂ as short as possible, we choose the totally arbitrary x̂3 and x̂4 to be zero. The minimum length solution is x+ = A+b, where A+ is the pseudoinverse:

   x^{+} = \begin{bmatrix} b_1/\sigma_1 \\ b_2/\sigma_2 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 1/\sigma_1 & 0 & 0 \\ 0 & 1/\sigma_2 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}. \qquad (5)
This equation finds x+, and it also displays the matrix that produces x+ from b. That matrix is the pseudoinverse A+ of our diagonal A. Based on this example, we know Σ+ and x+ for any diagonal matrix Σ:

   \Sigma = \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_r \end{bmatrix} \qquad \Sigma^{+} = \begin{bmatrix} 1/\sigma_1 & & \\ & \ddots & \\ & & 1/\sigma_r \end{bmatrix} \qquad \Sigma^{+}b = \begin{bmatrix} b_1/\sigma_1 \\ \vdots \\ b_r/\sigma_r \end{bmatrix}.

The matrix Σ is m by n, with r nonzero entries σi. Its pseudoinverse Σ+ is n by m, with r nonzero entries 1/σi. All the blank spaces are zeros. Notice that (Σ+)+ is Σ again.
That is like (A−1 )−1 = A, but here A is not invertible.
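
As a sketch (NumPy, with made-up sizes m = 3, n = 4 and r = 2), Σ+ is just the transposed shape with the nonzero σ's inverted, and it agrees with NumPy's general pseudoinverse:

    import numpy as np

    sigma = np.array([5.0, 2.0])            # sigma_1, ..., sigma_r (here r = 2)
    m, n = 3, 4

    Sigma = np.zeros((m, n))
    Sigma[:len(sigma), :len(sigma)] = np.diag(sigma)

    Sigma_plus = np.zeros((n, m))
    Sigma_plus[:len(sigma), :len(sigma)] = np.diag(1.0 / sigma)

    print(np.allclose(Sigma_plus, np.linalg.pinv(Sigma)))    # matches NumPy's pseudoinverse
    print(np.allclose(np.linalg.pinv(Sigma_plus), Sigma))    # (Sigma+)+ is Sigma again
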
Now we find x+ in the general case. We claim that the shortest solution x+ is always in the row space of A. Remember that any vector x̂ can be split into a row space component xr and a nullspace component: x̂ = xr + xn. There are three important points about that splitting:

1. The row space component also solves A^T A xr = A^T b, because Axn = 0.

2. The components are orthogonal, and they obey Pythagoras's law:

   ‖x̂‖² = ‖xr‖² + ‖xn‖²,   so x̂ is shortest when xn = 0.

3. All solutions of A^T A x̂ = A^T b have the same xr. That vector is x+.
The fundamental theorem of linear algebra was in Figure 3.4. Every p in the column
space comes from one and only one vector xr in the row space. All we are doing is to
choose that vector, x+ = xr , as the best solution to Ax = b.
The pseudoinverse in Figure 6.3 starts with b and comes back to x+ . It inverts A where
A is invertible—between row space and column space. The pseudoinverse knocks out
the left nullspace by sending it to zero, and it knocks out the nullspace by choosing xr as
x+ .
We have not yet shown that there is a matrix A+ that always gives x+ —but there is.
It will be n by m, because it takes b and p in Rm back to x+ in Rn . We look at one more
example before finding A+ in general.
Example 6. Ax = b is −x1 + 2x2 + 2x3 = 18, with a whole plane of solutions.
According to our theory, the shortest solution should be in the row space of A =
[−1 2 2]. The multiple of that row that satisfies the equation is x+ = (−2, 4, 4). There
are longer solutions like (−2, 5, 3), (−2, 7, 1), or (−6, 3, 3), but they all have nonzero
components from the nullspace. The matrix that produces x+ from b = [18] is the pseu-
doinverse A+. Whereas A was 1 by 3, this A+ is 3 by 1:

   A^{+} = \begin{bmatrix} -1 & 2 & 2 \end{bmatrix}^{+} = \begin{bmatrix} -1/9 \\ 2/9 \\ 2/9 \end{bmatrix} \qquad\text{and}\qquad A^{+}\begin{bmatrix} 18 \end{bmatrix} = \begin{bmatrix} -2 \\ 4 \\ 4 \end{bmatrix}. \qquad (6)

Figure 6.3: The pseudoinverse A+ inverts A where it can on the column space.

The row space of A is the column space of A+ . Here is a formula for A+ :

If A = UΣV T (the SVD), then its pseudoinverse is A+ = V Σ+U T . (7)

Example 6 had σ = 3—the square root of the eigenvalue of AAT = [9]. Here it is again
with Σ and Σ+:

   A = \begin{bmatrix} -1 & 2 & 2 \end{bmatrix} = U\Sigma V^{T} = \begin{bmatrix} 1 \end{bmatrix} \begin{bmatrix} 3 & 0 & 0 \end{bmatrix} \begin{bmatrix} -1/3 & 2/3 & 2/3 \\ 2/3 & -1/3 & 2/3 \\ 2/3 & 2/3 & -1/3 \end{bmatrix}

   V\Sigma^{+}U^{T} = \begin{bmatrix} -1/3 & 2/3 & 2/3 \\ 2/3 & -1/3 & 2/3 \\ 2/3 & 2/3 & -1/3 \end{bmatrix} \begin{bmatrix} 1/3 \\ 0 \\ 0 \end{bmatrix} \begin{bmatrix} 1 \end{bmatrix} = \begin{bmatrix} -1/9 \\ 2/9 \\ 2/9 \end{bmatrix} = A^{+}.

The minimum length least-squares solution is x+ = A+ b = V Σ+U T b.

Proof. Multiplication by the orthogonal matrix U^T leaves lengths unchanged:

   ‖Ax − b‖ = ‖UΣV^T x − b‖ = ‖ΣV^T x − U^T b‖.

Introduce the new unknown y = V^T x = V^{−1} x, which has the same length as x. Then minimizing ‖Ax − b‖ is the same as minimizing ‖Σy − U^T b‖. Now Σ is diagonal and we know the best y+. It is y+ = Σ+U^T b, so the best x+ is V y+:

   Shortest solution   x+ = V y+ = V Σ+ U^T b = A+ b.

V y+ is in the row space, and A^T A x+ = A^T b from the SVD.
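
The whole recipe x+ = VΣ+U^T b is short in code. A NumPy sketch (the rank-1 matrix and right side are arbitrary illustrations); it agrees with numpy.linalg.pinv and numpy.linalg.lstsq, and the normal equations hold:

    import numpy as np

    A = np.array([[1.0, 2.0, 2.0],
                  [2.0, 4.0, 4.0]])         # rank 1: dependent rows and dependent columns
    b = np.array([1.0, 1.0])                # b is not in the column space

    U, s, Vt = np.linalg.svd(A)
    r = int(np.sum(s > 1e-12))              # effective rank
    Sigma_plus = np.zeros((A.shape[1], A.shape[0]))
    Sigma_plus[:r, :r] = np.diag(1.0 / s[:r])

    x_plus = Vt.T @ Sigma_plus @ U.T @ b    # minimum-length least-squares solution

    print(np.allclose(x_plus, np.linalg.pinv(A) @ b))
    print(np.allclose(x_plus, np.linalg.lstsq(A, b, rcond=None)[0]))
    print(np.allclose(A.T @ A @ x_plus, A.T @ b))      # normal equations are satisfied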



Problem Set 6.3

Problems 1–2 compute the SVD of a square singular matrix A.


1. Compute A^T A and its eigenvalues σ1², 0 and unit eigenvectors v1, v2:

      A = \begin{bmatrix} 1 & 4 \\ 2 & 8 \end{bmatrix}.

2. (a) Compute AA^T and its eigenvalues σ1², 0 and unit eigenvectors u1, u2.

   (b) Choose signs so that Av1 = σ1 u1 and verify the SVD:

      \begin{bmatrix} 1 & 4 \\ 2 & 8 \end{bmatrix} = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} v_1 & v_2 \end{bmatrix}^{T}.

   (c) Which four vectors give orthonormal bases for C(A), N(A), C(A^T), N(A^T)?
Problems 3–5 ask for the SVD of matrices of rank 2.
3. Find the SVD from the eigenvectors v1, v2 of A^T A and Avi = σi ui:

      Fibonacci matrix   A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}.

4. Use the SVD part of the MATLAB demo eigshow (or Java on the course page
web.mit.edu/18.06) to find the same vectors v1 and v2 graphically.
5. Compute A^T A and AA^T, and their eigenvalues and unit eigenvectors, for

      A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}.

Multiply the three matrices UΣV T to recover A.


Problems 6–13 bring out the underlying ideas of the SVD.
6. Suppose u1 , . . . , un and v1 , . . . , vn are orthonormal bases for Rn . Construct the matrix
A that transforms each v j into u j to give Av1 = u1 , . . . , Avn = un .
7. Construct the matrix with rank 1 that has Av = 12u for v = (1/2)(1, 1, 1, 1) and u = (1/3)(2, 2, 1). Its only singular value is σ1 = ____.
8. Find UΣV T if A has orthogonal columns w1 , . . . , wn of lengths σ1 , . . . , σn .
9. Explain how UΣV T expresses A as a sum of r rank-1 matrices in equation (3):

A = σ1 u1 vT1 + · · · + σr ur vTr .

10. Suppose A is a 2 by 2 symmetric matrix with unit eigenvectors u1 and u2. If its eigenvalues are λ1 = 3 and λ2 = −2, what are U, Σ, and V^T?
11. Suppose A is invertible (with σ1 > σ2 > 0). Change A by as small a matrix as possible to produce a singular matrix A0. Hint: U and V do not change:

      Find A0 from   A = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix} \begin{bmatrix} v_1 & v_2 \end{bmatrix}^{T}.

12. (a) If A changes to 4A, what is the change in the SVD?


(b) What is the SVD for AT and for A−1 ?
13. Why doesn’t the SVD for A + I just use Σ + I?
14. Find the SVD and the pseudoinverse 0+ of the m by n zero matrix.
15. Find the SVD and the pseudoinverse VΣ+U^T of

      A = \begin{bmatrix} 1 & 1 & 1 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}, \qquad\text{and}\qquad C = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}.

16. If an m by n matrix Q has orthonormal columns, what is Q+ ?


17. Diagonalize A^T A to find its positive definite square root S = VΣ^{1/2}V^T and its polar decomposition A = QS:

      A = \frac{1}{\sqrt{10}}\begin{bmatrix} 10 & 6 \\ 0 & 8 \end{bmatrix}.

18. What is the minimum-length least-squares solution x+ = A+b to the following?

      Ax = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} C \\ D \\ E \end{bmatrix} = \begin{bmatrix} 0 \\ 2 \\ 2 \end{bmatrix}.

   You can compute A+, or find the general solution to A^T A x̂ = A^T b and choose the solution that is in the row space of A. This problem fits the best plane C + Dt + Ez to b = 0 and also b = 2 at t = z = 0 (and b = 2 at t = z = 1).

(a) If A has independent columns, its left-inverse (AT A)−1 AT is A+ .


(b) If A has independent rows, its right-inverse AT (AAT )−1 is A+ .

In both cases, verify that x+ = A+b is in the row space, and that A^T A x+ = A^T b.


19. Split A = UΣV^T into its reverse polar decomposition QS′.
20. Is (AB)+ = B+ A+ always true for pseudoinverses? I believe not.

21. Removing zero rows of U leaves A = LU, where the r columns of L span the column space of A and the r rows of U span the row space. Then A+ has the explicit formula U^T(UU^T)^{−1}(L^T L)^{−1}L^T.
Why is A+ b in the row space with U T at the front? Why does AT AA+ b = AT b, so
that x+ = A+ b satisfies the normal equation as it should?

22. Explain why AA+ and A+ A are projection matrices (and therefore symmetric). What
fundamental subspaces do they project onto?

6.4 Minimum Principles

In this section we escape for the first time from linear equations. The unknown x will not
be given as the solution to Ax = b or Ax = λ x. Instead, the vector x will be determined
by a minimum principle.
It is astonishing how many natural laws can be expressed as minimum principles. Just
the fact that heavy liquids sink to the bottom is a consequence of minimizing their po-
tential energy. And when you sit on a chair or lie on a bed, the springs adjust themselves
so that the energy is minimized. A straw in a glass of water looks bent because light
reaches your eye as quickly as possible. Certainly there are more highbrow examples:
The fundamental principle of structural engineering is the minimization of total energy.1
We have to say immediately that these “energies” are nothing but positive definite
quadratic functions. And the derivative of a quadratic is linear. We get back to the
familiar linear equations, when we set the first derivatives to zero. Our first goal in
this section is to find the minimum principle that is equivalent to Ax = b, and the
minimization equivalent to Ax = λ x. We will be doing in finite dimensions exactly
what the theory of optimization does in a continuous problem, where “first derivatives
= 0” gives a differential equation. In every problem, we are free to solve the linear
equation or minimize the quadratic.
The first step is straightforward: We want to find the “parabola” P(x) whose minimum
occurs when Ax = b. If A is just a scalar, that is easy to do:
   The graph of P(x) = ½Ax² − bx has zero slope when dP/dx = Ax − b = 0.
This point x = A−1 b will be a minimum if A is positive. Then the parabola P(x) opens
upward (Figure 6.4). In more dimensions this parabola turns into a parabolic bowl (a
paraboloid). To assure a minimum of P(x), not a maximum or a saddle point, A must be
positive definite!
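
A small numerical illustration (a sketch assuming the natural n-dimensional analogue P(x) = ½x^T A x − x^T b of the scalar parabola above; the 2 by 2 positive definite A and the vector b are arbitrary choices): solving the linear equation Ax = b does minimize P.

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])              # symmetric positive definite
    b = np.array([1.0, 2.0])

    def P(x):
        # the quadratic "energy" whose minimum should occur where Ax = b
        return 0.5 * x @ A @ x - b @ x

    x_star = np.linalg.solve(A, b)          # solve the linear equation instead of minimizing

    # crude check: P is larger at random nearby points than at the solution
    rng = np.random.default_rng(1)
    print(all(P(x_star + 0.1 * rng.standard_normal(2)) > P(x_star) for _ in range(100)))
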
1 I am convinced that plants and people also develop in accordance with minimum principles. Perhaps civilization
is based on a law of least action. There must be new laws (and minimum principles) to be found in the social
sciences and life sciences.
