
The Singular Value Decomposition

Goal: We introduce/review the singular value decomposition (SVD) of a matrix and discuss some
applications relevant to vision.
Consider a matrix M ∈ R^{n×k}. For convenience we assume n ≥ k (otherwise consider M^T). The
SVD of M is a real-valued matrix factorization, M = U S V^T. The SVD can be computed using an
exceptionally stable numerical algorithm.
The compact SVD for tall-rectangular matrices, like M, is generated in Matlab by:
% When n >= k
[U, S, V] = svd(M, 0);
% Here U is n x k, S is k x k diagonal, V is k x k.
See also the Matlab calls:
[U, S, V] = svd(M, 'econ');  Gives a compact form of the SVD for both n < k and n >= k.
[U, S, V] = svd(M);          Gives a non-compact representation: U is n x n, V is k x k.
See Singular Value Decomposition in Wikipedia, or the classic textbook by Gilbert Strang (1993)
(see Section 6.7).
CSC420: Intro to SVD © Allan Jepson and Fernando Flores-Mangas, Sept. 2011 Page: 1
Properties of the SVD
Some properties of U, S, V are:
- U, S, V provide a real-valued matrix factorization of M, i.e., M = U S V^T.
- U is an n × k matrix with orthonormal columns, U^T U = I_k, where I_k is the k × k identity matrix.
- V is an orthonormal k × k matrix, V^T = V^{-1}.
- S is a k × k diagonal matrix, with the non-negative singular values s_1, s_2, ..., s_k on the diagonal.
  By convention the singular values are given in the sorted order s_1 ≥ s_2 ≥ ... ≥ s_k ≥ 0.
Summary: For any square or tall-rectangular matrix M, the SVD shows that the matrix-vector product
Mx can be represented as:
1. An orthogonal change of coordinates, V^T x;
2. An axis-aligned scaling of the result, S(V^T x); and
3. The application of the resulting coefficients in an orthonormal basis, U(S(V^T x)).
Each of these steps is easily inverted. A similar story holds for wide-rectangular matrices, i.e.,
M ∈ R^{n×k} for n < k.
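As a quick numerical check (a minimal sketch, using an arbitrary random matrix and vector just for illustration), the factorization and the three-step view of Mx can be verified directly in Matlab:

n = 5; k = 3;
M = randn(n, k);                  % any tall-rectangular matrix
[U, S, V] = svd(M, 0);            % compact SVD: U is n x k, S and V are k x k
disp(norm(U'*U - eye(k)));        % ~0: columns of U are orthonormal
disp(norm(V'*V - eye(k)));        % ~0: V is orthonormal
disp(norm(M - U*S*V'));           % ~0: M = U S V^T
x = randn(k, 1);
y = U * (S * (V' * x));           % change coordinates, scale, recombine in the U basis
disp(norm(M*x - y));              % ~0: same as M*x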
CSC420: Intro to SVD Page: 2
Additional Properties of the SVD
In addition we have:
- The rank of M is given by the number of singular values s_j that are non-zero.
- If n = k, then U is an orthonormal matrix, U^T = U^{-1}, so U^T U = U U^T = I_n.
- The pseudo-inverse of M is defined to be M^+ = V R U^T, where R is a diagonal matrix. The j-th
  entry on the diagonal of R is r_j = 1/s_j if s_j ≠ 0, and r_j = 0 if s_j = 0. Here R is the pseudo-inverse
  of the diagonal matrix S.
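A minimal sketch of these two facts, on a small rank-deficient matrix built for illustration (the tolerance used to decide which singular values count as zero is an arbitrary choice here):

M = randn(6, 2) * randn(2, 3);       % a 6 x 3 matrix of rank 2
[U, S, V] = svd(M, 0);
s = diag(S);
tol = max(size(M)) * eps(s(1));      % threshold for treating a singular value as zero
r = sum(s > tol);                    % rank of M = number of non-zero singular values
disp([r, rank(M)]);                  % should agree with Matlab's rank()
rdiag = zeros(size(s));
rdiag(s > tol) = 1 ./ s(s > tol);    % r_j = 1/s_j if s_j ~= 0, else 0
R = diag(rdiag);                     % pseudo-inverse of S
disp(norm(V*R*U' - pinv(M)));        % ~0: V R U^T matches Matlab's pinv(M)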
We consider the uniqueness of the SVD next; this can be skipped on the first reading.
CSC420: Intro to SVD Page: 3
Uniqueness of the SVD
Consider the SVD, M = U S V^T, for any square or tall-rectangular matrix, i.e., M ∈ R^{n×k} with n ≥ k.
1. The singular values are unique and, for distinct positive singular values, s_j > 0, the j-th columns of
   U and V are also unique up to a sign change of both columns.
2. For any repeated and positive singular values, say s_i = s_{i+1} = ... = s_j > 0 are all the singular
   values equal to s_i, the corresponding columns of U and V are unique up to any rotation/reflection
   applied to both sets of columns (i.e., U_{:,i:j} → U_{:,i:j} W and V_{:,i:j} → V_{:,i:j} W for some
   orthogonal matrix W).
3. More care must be taken with one or more singular values at zero. Suppose s_j > 0 and s_{j+1} =
   ... = s_k = 0. Here the (j+1)-st through the k-th columns of U are less constrained, and can be
   any set of (k − j) orthonormal vectors in the (n − j)-dimensional left null space of M. Moreover,
   these columns of U can be chosen independently of the last (k − j) columns of V (which form an
   orthonormal basis for the right null space of M).
Summary: These symmetries in the SVD are identical to those of the eigenvectors of a symmetric
matrix, except for the third point above, which states there is additional freedom in the singular vectors
for singular values equal to 0.
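A minimal numerical illustration of the first point, using an arbitrary random matrix: flipping the sign of the j-th column of both U and V leaves the product U S V^T unchanged, so both factorizations are valid SVDs.

M = randn(5, 3);
[U, S, V] = svd(M, 0);
j = 2;                           % flip the sign of the j-th column pair
U2 = U;  U2(:, j) = -U2(:, j);
V2 = V;  V2(:, j) = -V2(:, j);
disp(norm(M - U2*S*V2'));        % ~0: (U2, S, V2) is an equally valid SVD of M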
CSC420: Intro to SVD Page: 4
SVD, Least Squares, and Pseudo-Inverse
Applications of the SVD include solving least squares problems:

    x = arg min_x ||A x − b||^2,                                      (1)

where A is n × k and ||·|| is the standard vector 2-norm (Euclidean length).
Let A = U S V^T denote the SVD of A. Then the range of A is contained in (or equal to) the subspace
spanned by the orthogonal columns of U. We cannot reduce any error in A x − b that is perpendicular
to the range of A. Thus, it is equivalent to minimize

    x = arg min_x ||U^T A x − U^T b||^2
      = arg min_x ||U^T (U S V^T) x − U^T b||^2
      = arg min_x ||(S V^T) x − U^T b||^2.                            (2)

From (2) it follows that an optimal solution is x = (V R U^T) b, where R is the pseudo-inverse of S (as
given on p. 3). Note that for large matrices, x = V (R (U^T b)) is much more efficient to compute.
Note the solution matrix used above, namely V R U^T, equals the pseudo-inverse A^+.
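A minimal sketch of this recipe on a random over-determined system, comparing the SVD-based solution with Matlab's backslash operator:

n = 20; k = 4;
A = randn(n, k);  b = randn(n, 1);
[U, S, V] = svd(A, 0);
s = diag(S);
rdiag = zeros(size(s));
rdiag(s > 1e-12) = 1 ./ s(s > 1e-12);   % pseudo-invert S (threshold is an arbitrary choice)
x = V * (rdiag .* (U' * b));            % x = V R (U^T b), applied without forming V R U^T
disp(norm(x - A \ b));                  % ~0: agrees with Matlab's least-squares solve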
CSC420: Intro to SVD Page: 5
SVD and the Matrix Square Root
Suppose K is a symmetric n × n matrix. Moreover, assume that K is non-negative definite, which
means for every vector x ∈ R^n we have x^T K x ≥ 0.
We then compute the matrix square root of K, namely K^{1/2}, as follows:
1. Compute [U, S, V] = svd(K).
2. Since K is symmetric and non-negative definite, it follows without loss of generality that we can
   set U = V.
3. Thus K = V S V^T, and so the columns of V are eigenvectors of K, with the j-th eigenvalue being s_j.
4. Define S^{1/2} = diag([s_1^{1/2}, s_2^{1/2}, ..., s_n^{1/2}]), and note that the singular values are non-negative.
5. Therefore J = V S^{1/2} V^T is a symmetric n × n matrix such that K = J J. So J is a suitable matrix
   square root, K^{1/2}.
6. Moreover, it also follows that J is non-negative definite and, as such, J is similar to the positive
   square root of a positive real number.
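A minimal sketch of this recipe, where K is built to be symmetric and non-negative definite just for the test:

n = 4;
B = randn(n, n);
K = B * B';                        % symmetric, non-negative definite by construction
[U, S, V] = svd(K);                % for this K we may take U = V
J = V * diag(sqrt(diag(S))) * V';  % J = V S^(1/2) V^T
disp(norm(J*J - K));               % ~0: J is a matrix square root of K
disp(norm(J - J'));                % ~0: and J is symmetric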
CSC420: Intro to SVD Page: 6
Covariance Matrices
Consider a multivariate normal distribution for n-dimensional vectors, x ~ N(m, K), where:
- N(m, K) denotes the normal distribution;
- m ∈ R^n is the mean;
- K ∈ R^{n×n} is the covariance matrix. As such, K is symmetric (i.e., K^T = K) and positive definite
  (i.e., u^T K u > 0 for all u ∈ R^n \ {0}).
The probability density function for N(m, K) is

    p(x) = (2π)^{-n/2} |K|^{-1/2} exp( -(1/2) (x − m)^T K^{-1} (x − m) ).

Here |K| = det(K) denotes the determinant of K.
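A minimal sketch evaluating this density at a point, with a small hand-built K chosen only for illustration (K \ d is used instead of forming K^{-1} explicitly):

m = [1; 2];
K = [2, 0.5; 0.5, 1];             % symmetric, positive definite
x = [1.5; 1.0];
n = length(m);
d = x - m;
p = exp(-0.5 * d' * (K \ d)) / ((2*pi)^(n/2) * sqrt(det(K)));
disp(p);                          % value of the N(m, K) density at x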
CSC420: Intro to SVD Page: 7
Sampling from a Multivariate Normal Distribution
To sample from the Normal distribution N(m, K) we do the following:
1. Generate an n × 1 vector u where each element u_j is independently sampled from N(0, 1) (i.e., the
   1D Normal distribution with mean 0 and variance 1).
2. Compute the matrix square root of K, namely K^{1/2}, as defined on p. 6.
3. Then d = K^{1/2} u generates a fair sample from N(0, K).
4. Set x = m + d; we claim this is a fair sample from N(m, K).
To check that the covariance of d is actually K, first note that the mean of d is 0. Then, by definition,
the covariance of d is:

    C ≡ E(d d^T) = E(K^{1/2} u u^T K^{T/2}) = K^{1/2} E(u u^T) K^{1/2} = K^{1/2} I_n K^{1/2} = K,

confirming that d has the covariance K, as desired.
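A minimal sketch of this sampling procedure, reusing the hand-built K from above; the sample covariance of many draws should approach K:

m = [1; 2];
K = [2, 0.5; 0.5, 1];
n = length(m);
[U, S, V] = svd(K);
J = V * diag(sqrt(diag(S))) * V';     % K^(1/2), as on p. 6
N = 10000;
X = zeros(n, N);
for i = 1:N
    u = randn(n, 1);                  % u ~ N(0, I_n)
    X(:, i) = m + J * u;              % x = m + K^(1/2) u ~ N(m, K)
end
disp(cov(X'));                        % approaches K as N grows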
CSC420: Intro to SVD Page: 8
Sample Covariance and Principal Directions
Given a set of sample vectors {x_j}_{j=1}^k, with each x_j ∈ R^n, the sample mean and covariance are
defined to be:

    m_s = (1/k) Σ_{j=1}^k x_j,   and   C_s = (1/(k−1)) Σ_{j=1}^k (x_j − m_s)(x_j − m_s)^T.    (3)
A 2D example is shown in the figure below. The blue points denote the samples x_j. The ellipses denote
curves of constant standard deviation, when measured in terms of the sample covariance C_s. That
is, the curve m_s + d(τ) satisfies d(τ)^T C_s^{-1} d(τ) = τ^2, for τ = 1, 2, or 3 (corresponding to the
yellow, green and red ellipses, respectively).
[Figure: "Samples from 2-dim Normal" (scatter plot of the samples and the three ellipses; X and Y axes).]
The black lines above indicate the principal directions from the sample mean, i.e., the major and minor
axes of the ellipses. These are the directions of eigenvectors of C_s. The length of the j-th line segment,
from the sample mean to the red ellipse, is equal to τ s_j^{1/2}, where τ = 3 and s_j is the j-th singular
value of C_s.
CSC420: Intro to SVD Page: 9
Minimum Residual Variance Bases
Given a set of sample vectors {x_j}_{j=1}^k, with each x_j ∈ R^n, form the matrix X ∈ R^{n×k}. As before,
the sample mean is m_s = (1/k) Σ_{j=1}^k x_j.

Optimal Basis Selection Problem: Select a p-dimensional basis {b_j}_{j=1}^p that minimizes the following:

    SSD_p = min_{B ∈ R^{n×p}} Σ_{j=1}^k min_{a_j} ||x_j − (m_s + B a_j)||^2.    (4)

Here B = (b_1, ..., b_p) is the n × p matrix formed from the selected basis. The right-most minimum
above indicates that (for a given basis B) we choose the coefficients a_j which minimize the least squares
error, E_j^2 = ||x_j − (m_s + B a_j)||^2. The basis selection problem is then to choose B to minimize the
sum of these least-squares errors, Σ_{j=1}^k E_j^2 (aka the sum of squared differences (SSD)).

An optimal choice of the p-dimensional basis, B, makes SSD_p = Σ_{j=1}^k E_j^2 as small as possible, and
SSD_p is called the minimum residual variance for any basis of dimension p.
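A minimal sketch of the inner minimization in (4) for a fixed, arbitrary basis B: the best coefficients a_j for each sample are obtained by least squares, and the residual errors are summed.

n = 2;  k = 200;  p = 1;
X = chol([2, 0.8; 0.8, 1], 'lower') * randn(n, k);   % illustrative samples (columns)
m_s = mean(X, 2);
B = randn(n, p);                        % some candidate basis (columns b_1, ..., b_p)
SSD = 0;
for j = 1:k
    d = X(:, j) - m_s;
    a = B \ d;                          % coefficients minimizing ||x_j - (m_s + B*a)||^2
    SSD = SSD + norm(d - B*a)^2;        % accumulate E_j^2
end
disp(SSD);                              % residual SSD for this choice of B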
CSC420: Intro to SVD Page: 10
Example: Minimum Residual Variance Basis
Consider choosing an optimal 1D basis for the previous 2D example:
[Figure: two scatter plots of the 2D samples with candidate 1D bases; left panel titled "Residual SSD: 491.5", right panel titled "Residual SSD: 164.4" (X and Y axes).]
The cyan lines above indicate two choices for the basis direction b. The mauve lines connect selected
samples x_j with their best approximation m_s + b a_j. The squared lengths of these mauve lines are the
least squares errors E_j^2 = min_{a_j} ||x_j − (m_s + b a_j)||^2. The residual SSD equals Σ_{j=1}^k E_j^2,
and is given in the title of each plot.
In the right plot above we have set b to be the first principal direction. That is, b is the first column of
U, where C_s = U S V^T. The figure illustrates that this choice minimizes the residual variance.
CSC420: Intro to SVD Page: 11
Principal Component Analysis: PCA
The following Theorem provides the general result.
Theorem: (Minimum residual variance.) For 0 ≤ p ≤ n, the basis B formed from the first p principal
components of the sample covariance matrix C_s (i.e., the first p columns of the matrix U of an SVD of
C_s = U S V^T) minimizes the residual variance

    SSD(B) = Σ_{j=1}^k min_{a_j} ||x_j − (m_s + B a_j)||^2,    (5)

over all possible choices of (p-dimensional) bases B. Moreover, the optimal value SSD_p is given by

    SSD_p = Σ_{j=p+1}^n s_j,    (6)

where s_j is the j-th singular value of the sample covariance C_s.
Note SSD_0 = Σ_{j=1}^n s_j is the total variance in the original data set, and SSD_p monotonically
decreases to 0 as p increases to n. A useful statistic is the fraction of the total variance that can be
explained by a p-dimensional basis, Q_p = (SSD_0 − SSD_p)/SSD_0.
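A minimal sketch of the variance-fraction statistic Q_p, computed directly from the singular values of the sample covariance (the data are generated only for illustration):

k = 200;
X = chol([2, 0.8; 0.8, 1], 'lower') * randn(2, k);   % illustrative samples (columns)
m_s = mean(X, 2);
Xc  = X - repmat(m_s, 1, k);
C_s = (Xc * Xc') / (k - 1);
s  = svd(C_s);                 % singular values s_1 >= s_2 >= ... >= s_n of C_s
Qp = cumsum(s) / sum(s);       % Q_p = (SSD_0 - SSD_p)/SSD_0, for p = 1, ..., n
disp(Qp');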
CSC420: Intro to SVD Page: 12
PCA Applied to Eyes
Subset of 1196 eye images (25 × 20 pixels, rewritten as 500-dimensional sample vectors x_j):
[Figure: "Variance Fraction Explained by Subspace" — fraction of variance (y-axis) versus singular value index (x-axis), for indices 1 to 20.]
The fraction of the total variance, Q_p, captured by an optimal p-dimensional subspace is plotted above.
The basis is formed from the first p principal components of the 500-dimensional data set. Note Q_p
approaches 75% for a 20-dimensional basis.
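A minimal sketch of how such a curve could be computed, assuming (hypothetically) that the eye images are available as a 500 × 1196 matrix X, with one vectorized 25 × 20 image per column. Taking the SVD of the centered data matrix avoids forming the 500 × 500 covariance explicitly; its squared singular values, divided by (k − 1), are the singular values of C_s.

% X is assumed to be 500 x 1196 (one vectorized 25 x 20 eye image per column).
[n, k] = size(X);
m_s = mean(X, 2);
Xc  = X - repmat(m_s, 1, k);              % centered data
sv  = svd(Xc);                            % singular values of the centered data matrix
s   = sv.^2 / (k - 1);                    % singular values of the sample covariance C_s
Qp  = cumsum(s) / sum(s);
plot(1:20, Qp(1:20));                     % compare with the curve plotted above
xlabel('Singular value index');  ylabel('Fraction of Variance');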
CSC420: Intro to SVD Page: 13
Eigen-Eye Subspace Model
The mean image and some principal components for the eye dataset are shown below:
[Figure: Mean Eye and Basis Images 1 through 6 (top row); Basis Images 10, 15, 20, 25, 30, 35 (bottom row).]
The first few principal directions (representing the dominant directions of variation in the data set)
are shown on the top row. These appear to correspond to large-scale shading effects and the variation
around the eyebrow.
The higher-order principal directions appear to capture variations at a smaller spatial scale.
CSC420: Intro to SVD Page: 14
Eye Reconstructions
Given the sample covariance C_s of the data, and the SVD C_s = U S V^T, let U_p denote the first p
columns of U. Then, according to the theorem on p. 12, this choice of basis minimizes the residual variance.
[Figure: two original eye images ("Eye Image") and their reconstructions for p = 5, 20, 50.]
Given a new eye image x (centered and scaled), we can represent this image in the basis U_p by solving
the least squares problem a_0 = arg min_a ||x − (m_s + U_p a)||^2.
The reconstructed image r(a_0) = m_s + U_p a_0 is shown for two cases above. Note that the reconstruction
is reasonable for a 20-dimensional basis, and improves as the dimension (i.e., p) increases.
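A minimal sketch of this reconstruction step, assuming m_s (n × 1), U_p (the first p columns of U, with orthonormal columns), and a new vectorized eye image x (n × 1) are already available; because the columns of U_p are orthonormal, the least squares coefficients reduce to a projection:

a0 = U_p' * (x - m_s);          % solves a_0 = arg min_a ||x - (m_s + U_p a)||^2
r  = m_s + U_p * a0;            % reconstructed image vector r(a_0)
imagesc(reshape(r, 25, 20));    % view the reconstruction as a 25 x 20 image
colormap gray;  axis image;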
CSC420: Intro to SVD Page: 15
References
Gilbert Strang, Introduction to Linear Algebra, 2nd Edition, Wellesley-Cambridge Press, August 1993.
J. Cohen, Dependency of the spectral reflectance curves of the Munsell color chips, Psychonomic Science, 1964, 1, pp. 369-370.
A. Kimball Romney and Tarow Indow, Munsell Reflectance Spectra Represented in Three-Dimensional Euclidean Space, Color Research and Application, 28(3), 2003, pp. 182-196.
Matlab tutorial: utvisToolbox/tutorials/colourTutorial.m
Matthew Turk and Alex Pentland, Eigenfaces for Recognition, Journal of Cognitive Neuroscience, 3(1), 1991, pp. 71-86.
Brett Allen, Brian Curless, and Zoran Popovic, The space of human body shapes: reconstruction and parameterization from range scans, in ACM SIGGRAPH 2003, pp. 27-31.
CSC420: Intro to SVD Notes: 16
