
Singular Value Decomposition

Underconstrained Least Squares

• What if you have fewer data points than parameters in your function?
– Intuitively, can't do standard least squares
– Recall that the solution takes the form A^T A x = A^T b
– When A has more columns than rows, A^T A is singular: can't take its inverse, etc.
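A minimal Matlab sketch (with made-up numbers) of why this breaks: two data points, three polynomial coefficients, so A^T A cannot be inverted.

  % Two data points, three unknowns (a, b, c) in y = a*x^2 + b*x + c
  x = [1; 2];  y = [3; 5];
  A = [x.^2  x  ones(size(x))];   % 2x3: more columns than rows
  rank(A' * A)                    % = 2, not 3, so A'*A is singular
  % inv(A' * A) would warn that the matrix is singular to working precision
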
Underconstrained Least Squares

• More subtle version: more data points than unknowns, but the data poorly constrains the function
• Example: fitting to y = ax^2 + bx + c
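For instance (hypothetical numbers), if the sample x values barely vary, the quadratic fit is technically overdetermined but poorly constrained:

  x = [0.99; 1.00; 1.01; 1.00; 0.99];       % samples bunched near x = 1
  y = 2*x.^2 + 3*x + 1 + 0.01*randn(5,1);   % quadratic plus a little noise
  A = [x.^2  x  ones(5,1)];
  cond(A' * A)                              % huge: A'*A is nearly singular
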
Underconstrained Least Squares

• Problem: if the problem is very close to singular, roundoff error can have a huge effect
– Even on "well-determined" values!
• Can detect this:
– Uncertainty proportional to covariance C = (A^T A)^-1
– In other words, unstable if A^T A has small eigenvalues
– More precisely, care if x^T (A^T A) x is small for any x
• Idea: if part of the solution is unstable, set that part of the answer to 0
– Avoid corrupting good parts of the answer
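Continuing the nearly singular quadratic-fit sketch from the previous slide: the minimum of x^T (A^T A) x over unit vectors x is the smallest eigenvalue of A^T A, so the eigenvalues reveal the unstable directions.

  lambda = eig(A' * A);        % eigenvalues of A'*A from the previous sketch
  min(lambda)                  % tiny value: some direction is barely constrained
  max(lambda) / min(lambda)    % the same large condition number as before
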
Singular Value Decomposition (SVD)

• Handy mathematical technique that has application to many problems
• Given any m×n matrix A, an algorithm to find matrices U, V, and W such that
A = U W V^T
U is m×n with orthonormal columns
W is n×n and diagonal
V is n×n and orthonormal
SVD

A  =  U · W · V^T,   where W = diag(w1, ..., wn)

• Treat as black box: code widely available
– In Matlab: [U,W,V] = svd(A,0)
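A quick check of the factorization on an arbitrary (made-up) matrix:

  A = [2 0 1; -1 3 2; 0 1 1; 4 -2 0];   % any m x n matrix (here 4 x 3)
  [U, W, V] = svd(A, 0);                % economy-size SVD: U is 4x3, W and V are 3x3
  norm(A - U*W*V')                      % ~1e-15: A is reconstructed exactly
  diag(W)'                              % the singular values, sorted w1 >= w2 >= w3
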
SVD

• The wi are called the singular values of A
• If A is singular, some of the wi will be 0
• In general, rank(A) = number of nonzero wi
• SVD is mostly unique (up to permutation of singular values, or if some wi are equal)
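A sketch of the rank claim on a deliberately rank-deficient (made-up) matrix:

  B = [1 2; 2 4; 3 6];                   % rank 1: second column = 2 * first
  w = svd(B)                             % second singular value is ~0 (roundoff level)
  sum(w > max(size(B)) * eps(max(w)))    % count of "nonzero" wi = 1 = rank(B)
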
SVD and Inverses

• Why is SVD so useful?
• Application #1: inverses
• A^-1 = (V^T)^-1 W^-1 U^-1 = V W^-1 U^T
– Using the fact that inverse = transpose for orthogonal matrices
– Since W is diagonal, W^-1 is also diagonal, with the reciprocals of the entries of W
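A sketch checking this identity on a small invertible matrix:

  A = [4 1; 2 3];                       % small nonsingular matrix
  [U, W, V] = svd(A);
  Ainv = V * diag(1 ./ diag(W)) * U';   % V * W^-1 * U^T
  norm(Ainv - inv(A))                   % ~1e-16: same matrix
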
SVD and Inverses

• A^-1 = (V^T)^-1 W^-1 U^-1 = V W^-1 U^T
• This fails when some wi are 0
– It's supposed to fail – singular matrix
• Pseudoinverse: if wi = 0, set 1/wi to 0 (!)
– "Closest" matrix to the inverse
– Defined for all matrices (even non-square, singular, etc.)
– Equal to (A^T A)^-1 A^T if A^T A is invertible
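A sketch of building the pseudoinverse this way (the 1e-10 cutoff is an arbitrary choice for illustration; Matlab's built-in pinv uses its own tolerance):

  A = [1 2; 2 4; 1 0];                    % non-square (3x2) matrix
  [U, W, V] = svd(A, 0);
  w = diag(W);
  winv = zeros(size(w));
  winv(w > 1e-10) = 1 ./ w(w > 1e-10);    % if wi = 0 (or tiny), set 1/wi to 0
  Apinv = V * diag(winv) * U';
  norm(Apinv - pinv(A))                   % ~1e-16: matches Matlab's pseudoinverse
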
SVD and Least Squares

• Solving Ax = b by least squares
• x = pseudoinverse(A) times b
• Compute the pseudoinverse using SVD
– Lets you see if the data is singular
– Even if not singular, the ratio of max to min singular values (the condition number) tells you how stable the solution will be
– Set 1/wi to 0 if wi is small (even if not exactly 0)
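A sketch of the full recipe on made-up data (the threshold for "small" is a judgment call, not a fixed rule):

  A = [1 1; 1 2; 1 3; 1 4];  b = [2.1; 2.9; 4.2; 4.8];   % fit y = c1 + c2*x
  [U, W, V] = svd(A, 0);
  w = diag(W);
  max(w) / min(w)                    % condition number: how stable is the solution?
  tol = max(w) * 1e-8;               % threshold for "small" singular values
  winv = zeros(size(w));
  winv(w > tol) = 1 ./ w(w > tol);   % set 1/wi to 0 if wi is small
  x = V * diag(winv) * (U' * b)      % least-squares solution to A*x = b
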
SVD and Eigenvectors

• Let A = U W V^T, and let xi be the ith column of V
• Consider A^T A xi:
A^T A xi = V W U^T U W V^T xi = V W^2 V^T xi = V W^2 ei = wi^2 V ei = wi^2 xi
(here ei is the ith standard basis vector, since V^T xi = ei)

• So the elements of W are the square roots of the eigenvalues, and the columns of V are the eigenvectors, of A^T A
– What we wanted for robust least squares fitting!
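This correspondence is easy to verify numerically (sketch):

  A = randn(5, 3);                          % any matrix
  w = svd(A);                               % singular values, largest first
  lambda = sort(eig(A' * A), 'descend');    % eigenvalues of A'*A
  [w.^2  lambda]                            % the two columns agree to roundoff
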
SVD and Matrix Similarity

• One common definition for the norm of a matrix is the Frobenius norm:
||A||_F^2 = Σ_i Σ_j a_ij^2
• The Frobenius norm can be computed from the SVD:
||A||_F^2 = Σ_i wi^2
• So changes to a matrix can be evaluated by looking at changes to the singular values
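Again easy to check numerically (sketch):

  A = randn(4, 3);                % any matrix
  norm(A, 'fro')^2                % sum of squared entries
  sum(svd(A).^2)                  % the same number, from the singular values
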
SVD and Matrix Similarity

• Suppose you want to find the best rank-k approximation to A
• Answer: set all but the largest k singular values to zero
• Can form a compact representation by eliminating the columns of U and V corresponding to the zeroed wi
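A sketch of the compact rank-k form, keeping only the k largest singular values and the matching columns of U and V:

  A = randn(6, 5);  k = 2;                      % made-up matrix and target rank
  [U, W, V] = svd(A);
  Ak = U(:, 1:k) * W(1:k, 1:k) * V(:, 1:k)';    % rank-k approximation of A
  w = diag(W);
  norm(A - Ak, 'fro')^2                         % approximation error ...
  sum(w(k+1:end).^2)                            % ... equals the sum of the zeroed wi^2
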
SVD and PCA

• Principal Components Analysis (PCA): approximating a high-dimensional data set with a lower-dimensional subspace

[Figure: 2-D point cloud with the first and second principal components and the original axes drawn through the data points]
SVD and PCA

• Data matrix with points as rows, take SVD
– Subtract out the mean first (centering; sometimes loosely called "whitening")
• Columns of Vk are the principal components
• Value of wi gives the importance of each component
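A minimal sketch of PCA exactly as described, on made-up 3-D data (variable names are mine):

  X = randn(100, 3) * [3 0 0; 0 1 0; 0 0 0.1];   % 100 points, spread mostly in one direction
  Xc = X - mean(X, 1);                           % subtract out the mean
  [U, W, V] = svd(Xc, 0);
  V(:, 1)                      % first principal component (direction of greatest spread)
  diag(W)'                     % wi: importance of each component
  scores = Xc * V(:, 1:2);     % data projected onto the top-2 principal subspace
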
PCA on Faces: “Eigenfaces”

[Figure: the average face, the first principal component, and several other eigenface components]

• For all except the average face: "gray" = 0, "white" > 0, "black" < 0
Using PCA for Recognition

• Store each person as the coefficients of the projection onto the first few principal components:
image = Σ_{i=0..imax} ai · Eigenface_i

• Compute the projections of the target image, compare to the database ("nearest neighbor classifier")
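A sketch of the whole pipeline with random vectors standing in for face images (all names and sizes are made up):

  faces = rand(10, 256);                    % ten 16x16 "face images", one per row
  avg = mean(faces, 1);                     % average face
  [~, ~, V] = svd(faces - avg, 0);
  E = V(:, 1:5);                            % first few eigenfaces, as columns
  coeffs = (faces - avg) * E;               % stored coefficients, one row per person
  target = faces(3, :) + 0.01*rand(1, 256); % a noisy new image of person 3
  tcoef = (target - avg) * E;               % its projection onto the eigenfaces
  dists = vecnorm(coeffs - tcoef, 2, 2);    % distance to each stored person
  [~, who] = min(dists)                     % nearest neighbor: should identify person 3
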
Total Least Squares

• One final least squares application
• Fitting a line: vertical vs. perpendicular error
Total Least Squares

• Distance from point to line:
di = [xi  yi] · n − a
where n is the normal vector to the line and a is a constant
• Minimize:
Σ_i di^2 = Σ_i ([xi  yi] · n − a)^2
Total Least Squares

• First, let's pretend we know n, and solve for a:
minimizing Σ_i ([xi  yi] · n − a)^2 over a gives
a = (1/m) Σ_i [xi  yi] · n
• Then
di = [xi  yi] · n − a = [xi − mx   yi − my] · n
where mx and my are the means of the xi and yi
Total Least Squares

• So, let's define
x̃i = xi − mx,   ỹi = yi − my
and minimize
Σ_i ([x̃i  ỹi] · n)^2
Total Least Squares

• Write as a linear system: stack the rows [x̃i  ỹi] into a matrix A, and let n = [nx  ny]^T
• Have An = 0
– Problem: lots of n are solutions, including n = 0
– Standard least squares will, in fact, return n = 0
Constrained Optimization

• Solution: constrain n to be unit length
• So, try to minimize |An|^2 subject to |n|^2 = 1
|An|^2 = (An)^T (An) = n^T A^T A n
• Expand n in the eigenvectors ei of A^T A:
n = μ1 e1 + μ2 e2
n^T A^T A n = λ1 μ1^2 + λ2 μ2^2
|n|^2 = μ1^2 + μ2^2
where the λi are the eigenvalues of A^T A
Constrained Optimization

• To minimize λ1 μ1^2 + λ2 μ2^2 subject to μ1^2 + μ2^2 = 1, set the μi corresponding to the smallest λi to 1 and all other μi to 0
• That is, n is the eigenvector of A^T A with the smallest corresponding eigenvalue
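Putting the whole derivation together, a sketch of total least squares line fitting on made-up points (using the earlier fact that the columns of V are the eigenvectors of A^T A):

  x = [0; 1; 2; 3; 4];  y = 2*x + 1 + 0.1*randn(5, 1);   % points near the line y = 2x + 1
  A = [x - mean(x),  y - mean(y)];   % centered data, one row [x~i  y~i] per point
  [~, ~, V] = svd(A, 0);
  n = V(:, end);                     % eigenvector of A'*A with the smallest eigenvalue
  a = [mean(x)  mean(y)] * n;        % the constant term, from the earlier slide
  d = [x  y] * n - a                 % signed perpendicular distances (all small)
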
