To gain insight into the SVD, treat the rows of an n × d matrix A as n points in a
d-dimensional space and consider the problem of finding the best k-dimensional subspace
with respect to the set of points. Here best means minimize the sum of the squares of the
perpendicular distances of the points to the subspace. We begin with a special case of
the problem where the subspace is 1-dimensional, a line through the origin. We will see
later that the best-fitting k-dimensional subspace can be found by k applications of the
best fitting line algorithm. Finding the best fitting line through the origin with respect
to a set of points {xi |1 ≤ i ≤ n} in the plane means minimizing the sum of the squared
distances of the points to the line. Here distance is measured perpendicular to the line.
The problem is called the best least squares fit.
In the best least squares fit, one is minimizing the distance to a subspace. An alter-
native problem is to find the function that best fits some data. Here one variable y is a
function of the variables x1 , x2 , . . . , xd and one wishes to minimize the vertical distance,
i.e., distance in the y direction, to the subspace of the xi rather than minimize the per-
pendicular distance to the subspace being fit to the data.
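To make the distinction concrete, here is a small numerical sketch in Python using NumPy (the synthetic points and variable names below are purely illustrative). The vertical-distance fit of y as a function of x through the origin has slope \sum_i x_i y_i / \sum_i x_i^2, while the perpendicular-distance fit is the line through the origin in the direction of the top right singular vector of the matrix whose rows are the points; the two slopes are close on this data but generally not equal.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-D points roughly along the direction (1, 2), with noise.
x = rng.normal(size=200)
y = 2 * x + 0.3 * rng.normal(size=200)
A = np.column_stack([x, y])           # rows are points in the plane

# Vertical-distance fit of y = m*x (least squares regression through the origin).
m_vertical = (x @ y) / (x @ x)

# Perpendicular-distance fit: direction of the top right singular vector of A.
_, _, Vt = np.linalg.svd(A, full_matrices=False)
v1 = Vt[0]                            # unit vector spanning the best-fit line
m_perpendicular = v1[1] / v1[0]       # slope of that line

print(m_vertical, m_perpendicular)    # similar, but generally not equal
```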
Figure 4.1: The projection of the point xi onto the line through the origin in the direction of v.
Returning to the best least squares fit problem, consider projecting a point xi onto a line through the origin. Then, by the Pythagorean theorem,

x_{i1}^2 + x_{i2}^2 + \cdots + x_{id}^2 = (\text{length of projection})^2 + (\text{distance of point to line})^2.
To minimize the sum of the squares of the distances to the line, one could minimize \sum_{i=1}^{n} (x_{i1}^2 + x_{i2}^2 + \cdots + x_{id}^2) minus the sum of the squares of the lengths of the projections of the points onto the line. However, \sum_{i=1}^{n} (x_{i1}^2 + x_{i2}^2 + \cdots + x_{id}^2) is a constant independent of the line, so minimizing the sum of the squares of the distances is equivalent to maximizing the sum of the squares of the lengths of the projections onto the line. Similarly for best-fit subspaces, we could maximize the sum of the squared lengths of the projections onto the subspace instead of minimizing the sum of squared distances to the subspace.
With this in mind, define the first singular vector, v1 , of A, which is a column vector,
as the best fit line through the origin for the n points in d-space that are the rows of A.
Thus
v_1 = \arg\max_{|v|=1} |Av|.

The value σ1(A) = |Av1| is called the first singular value of A. Note that σ1^2(A) is the sum of the squares of the projections of the points onto the line determined by v1.
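As a quick numerical sanity check (a sketch using NumPy on an arbitrary random matrix, not an algorithm from the text), the first right singular vector returned by a standard SVD routine does achieve the maximum of |Av| over unit vectors v:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(50, 8))               # 50 points in 8-dimensional space

# numpy returns right singular vectors as the rows of Vt, ordered by
# decreasing singular value, so v1 = Vt[0] and sigma1 = s[0].
_, s, Vt = np.linalg.svd(A, full_matrices=False)
v1, sigma1 = Vt[0], s[0]

assert np.isclose(np.linalg.norm(A @ v1), sigma1)

# |Av| never exceeds |A v1| = sigma1 for random unit vectors v.
for _ in range(1000):
    v = rng.normal(size=8)
    v /= np.linalg.norm(v)
    assert np.linalg.norm(A @ v) <= sigma1 + 1e-9
```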
The greedy approach to finding the best-fit 2-dimensional subspace for a matrix A takes v1 as the first basis vector for the 2-dimensional subspace and finds the best 2-dimensional
subspace containing v1 . The fact that we are using the sum of squared distances helps.
For every 2-dimensional subspace containing v1 , the sum of squared lengths of the pro-
jections onto the subspace equals the sum of squared projections onto v1 plus the sum
of squared projections along a vector perpendicular to v1 in the subspace. Thus, instead
of looking for the best 2-dimensional subspace containing v1 , look for a unit vector, call
it v2, perpendicular to v1, that maximizes |Av|^2 among all such unit vectors. Using the same greedy strategy to find the best three and higher dimensional subspaces defines v3, v4, ... in a similar manner. This is captured in the following definitions. There is no
a priori guarantee that the greedy algorithm gives the best fit. But, in fact, the greedy
algorithm does work and yields the best-fit subspaces of every dimension.
The second singular vector, v2, is defined by the best fit line perpendicular to v1:

v_2 = \arg\max_{v \perp v_1,\ |v|=1} |Av|.

The value σ2(A) = |Av2| is called the second singular value of A. The third singular vector v3 is defined similarly by

v_3 = \arg\max_{v \perp v_1, v_2,\ |v|=1} |Av|,

and so on. The process stops when we have found singular vectors v1, v2, ..., vr and

\max_{v \perp v_1, v_2, \ldots, v_r,\ |v|=1} |Av| = 0.
If instead of finding v1 that maximized |Av| and then the best fit 2-dimensional
subspace containing v1 , we had found the best fit 2-dimensional subspace, we might have
done better. This is not the case. We now give a simple proof that the greedy algorithm
indeed finds the best subspaces of every dimension.
The proof is by induction on k. The statement holds for k = 1 by the definition of v1. For the inductive step, let Wk be a best-fit k-dimensional subspace and choose an orthonormal basis w1, w2, ..., wk of Wk with wk perpendicular to v1, v2, ..., vk−1; such a choice is possible because the subspace of Wk perpendicular to v1, v2, ..., vk−1 has dimension at least one. Then

|Aw_1|^2 + |Aw_2|^2 + \cdots + |Aw_{k-1}|^2 \le |Av_1|^2 + |Av_2|^2 + \cdots + |Av_{k-1}|^2

since Vk−1, the span of v1, v2, ..., vk−1, is an optimal (k − 1)-dimensional subspace by the induction hypothesis. Since wk is perpendicular to v1, v2, ..., vk−1, by the definition of vk, |Awk|^2 ≤ |Avk|^2. Thus

|Aw_1|^2 + |Aw_2|^2 + \cdots + |Aw_k|^2 \le |Av_1|^2 + |Av_2|^2 + \cdots + |Av_k|^2,

so Vk, the span of v1, v2, ..., vk, captures at least as large a sum of squared projections as Wk and is therefore also a best-fit k-dimensional subspace.
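The greedy step can also be sketched numerically (again with NumPy on arbitrary random data; the variable names are ours). Maximizing |Av| over unit vectors perpendicular to v1 is the same as taking the top right singular vector of A after the v1-component of every row has been projected out, and this recovers v2 and σ2:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(40, 6))

_, s, Vt = np.linalg.svd(A, full_matrices=False)
v1 = Vt[0]

# Greedy second step: maximize |Av| over unit v perpendicular to v1.
# Removing the v1-component of each row gives a matrix whose top right
# singular vector is exactly that maximizer.
A_perp = A - np.outer(A @ v1, v1)
_, s_perp, Vt_perp = np.linalg.svd(A_perp, full_matrices=False)
v2_greedy, sigma2_greedy = Vt_perp[0], s_perp[0]

# It matches the second singular vector/value of A (up to sign).
assert np.isclose(sigma2_greedy, s[1])
assert np.isclose(abs(v2_greedy @ Vt[1]), 1.0)
```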
Note that the n-vector Avi is really a list of lengths (with signs) of the projections of
the rows of A onto vi . Think of |Avi | = σi (A) as the “component” of the matrix A along
vi . For this interpretation to make sense, it should be true that adding up the squares of
the components of A along each of the vi gives the square of the “whole content of the
matrix A”. This is indeed the case and is the matrix analogy of decomposing a vector
into its components along orthogonal directions.
Consider one row, say aj , of A. Since v1 , v2 , . . . , vr span the space of all rows of A,
a_j \cdot v = 0 for all v perpendicular to v1, v2, ..., vr. Thus, for each row aj, \sum_{i=1}^{r} (a_j \cdot v_i)^2 = |a_j|^2. Summing over all rows j,

\sum_{j=1}^{n} |a_j|^2 = \sum_{j=1}^{n} \sum_{i=1}^{r} (a_j \cdot v_i)^2 = \sum_{i=1}^{r} \sum_{j=1}^{n} (a_j \cdot v_i)^2 = \sum_{i=1}^{r} |A v_i|^2 = \sum_{i=1}^{r} \sigma_i^2(A).
But \sum_{j=1}^{n} |a_j|^2 = \sum_{j=1}^{n} \sum_{k=1}^{d} a_{jk}^2, the sum of squares of all the entries of A. Thus, the sum of
squares of the singular values of A is indeed the square of the “whole content of A”, i.e.,
the sum of squares of all the entries. There is an important norm associated with this
quantity, the Frobenius norm of A, denoted ||A||F defined as
\|A\|_F = \sqrt{\sum_{j,k} a_{jk}^2}.
Lemma 4.2 For any matrix A, the sum of the squares of the singular values equals the square of the Frobenius norm. That is, \sum_i \sigma_i^2(A) = \|A\|_F^2.
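Lemma 4.2 is easy to check numerically (a NumPy sketch on an arbitrary random matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(30, 7))

s = np.linalg.svd(A, compute_uv=False)     # singular values only

# Sum of squared singular values = squared Frobenius norm
# = sum of squares of all entries of A (Lemma 4.2).
assert np.isclose(np.sum(s**2), np.sum(A**2))
assert np.isclose(np.sum(s**2), np.linalg.norm(A, 'fro')**2)
```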
A matrix A can be described fully by how it transforms the vectors vi . Every vector
v can be written as a linear combination of v1 , v2 , . . . , vr and a vector perpendicular
to all the vi . Now, Av is the same linear combination of Av1 , Av2 , . . . , Avr as v is of
v1 , v2 , . . . , vr . So the Av1 , Av2 , . . . , Avr form a fundamental set of vectors associated
with A. We normalize them to length one by
u_i = \frac{1}{\sigma_i(A)} A v_i.
The vectors u1 , u2 , . . . , ur are called the left singular vectors of A. The vi are called the
right singular vectors. The SVD theorem (Theorem 4.5) will fully explain the reason for
these terms.
Clearly, the right singular vectors are orthogonal by definition. We now show that the left singular vectors are also orthogonal and that A = \sum_{i=1}^{r} \sigma_i u_i v_i^T.

Theorem 4.3 The left singular vectors u1, u2, ..., ur are pairwise orthogonal.
Proof: The proof is by induction on r. For r = 1, there is only one ui so the theorem is
trivially true. For the inductive part consider the matrix
B = A − σ1 u1 v1T .
Note that Bv1 = Av1 − σ1u1(v1^T v1) = σ1u1 − σ1u1 = 0 and that Bv = Av for every v perpendicular to v1. Thus, there is a run of the greedy algorithm that finds that B has right singular vectors v2, v3, ..., vr and corresponding left singular vectors u2, u3, ..., ur. By the induction hypothesis, u2, u3, ..., ur are orthogonal.
It remains to prove that u1 is orthogonal to the other ui. Suppose not, and that for some i ≥ 2, u1^T ui ≠ 0. Without loss of generality assume that u1^T ui > 0; the proof is symmetric for the case where u1^T ui < 0. Now, for infinitesimally small ε > 0, the vector

A\left(\frac{v_1 + \varepsilon v_i}{|v_1 + \varepsilon v_i|}\right) = \frac{\sigma_1 u_1 + \varepsilon \sigma_i u_i}{\sqrt{1+\varepsilon^2}}

has length at least as large as its component along u1, which is

u_1^T\left(\frac{\sigma_1 u_1 + \varepsilon \sigma_i u_i}{\sqrt{1+\varepsilon^2}}\right) = \bigl(\sigma_1 + \varepsilon \sigma_i u_1^T u_i\bigr)\Bigl(1 - \frac{\varepsilon^2}{2} + O(\varepsilon^4)\Bigr) = \sigma_1 + \varepsilon \sigma_i u_1^T u_i - O(\varepsilon^2) > \sigma_1,

a contradiction. Thus, u1, u2, ..., ur are orthogonal.
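A NumPy sketch of the same fact (on arbitrary random data): the columns of U returned by a standard SVD routine are the left singular vectors u_i = A v_i / σ_i(A), and they are orthonormal.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(25, 5))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# u_i * sigma_i equals A v_i, column by column, and U^T U is the identity.
assert np.allclose(U * s, A @ Vt.T)
assert np.allclose(U.T @ U, np.eye(5), atol=1e-10)
```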
4.2 Singular Value Decomposition (SVD)
We first prove a simple lemma stating that two matrices A and B are identical if
Av = Bv for all v. The lemma states that in the abstract, a matrix A can be viewed as
a transformation that maps vector v onto Av.
Lemma 4.4 Matrices A and B are identical if and only if for all vectors v, Av = Bv.
Proof: Clearly, if A = B then Av = Bv for all v. For the converse, suppose that
Av = Bv for all v. Let ei be the vector that is all zeros except for the ith component
which has value 1. Now Aei is the ith column of A and thus A = B if for each i, Aei = Bei .
Theorem 4.5 Let A be an n × d matrix with right singular vectors v1, v2, ..., vr, left singular vectors u1, u2, ..., ur, and corresponding singular values σ1, σ2, ..., σr. Then

A = \sum_{i=1}^{r} \sigma_i u_i v_i^T.

Proof: For each singular vector vj,

A v_j = \sigma_j u_j = \sum_{i=1}^{r} \sigma_i u_i v_i^T v_j,

since vi^T vj is 1 for i = j and 0 otherwise. Any vector v can be expressed as a linear combination of the singular vectors plus a vector perpendicular to all the vi, and both A and \sum_{i=1}^{r} \sigma_i u_i v_i^T map every vector perpendicular to the vi to zero. Hence Av = \sum_{i=1}^{r} \sigma_i u_i v_i^T v for all v, and by Lemma 4.4, A = \sum_{i=1}^{r} \sigma_i u_i v_i^T.
For any matrix A, the sequence of singular values is unique, and if the singular values are all distinct, then the sequence of singular vectors is unique as well, up to sign. However, when several singular values are equal, the corresponding singular vectors are not unique: they span a subspace, and any set of orthonormal vectors spanning this subspace can be used as the singular vectors.
In matrix form, the decomposition reads A = U D V^T, where U is the n × r matrix with columns u1, ..., ur, D is the r × r diagonal matrix with entries σ1, ..., σr, and V^T is the r × d matrix with rows v1^T, ..., vr^T; A itself is n × d.
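In NumPy this factorization (in its "thin" form, a sketch on an arbitrary full-rank random matrix) looks as follows; the shapes match the description above and the product U D V^T reproduces A.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 12, 7
A = rng.normal(size=(n, d))
r = np.linalg.matrix_rank(A)               # equals 7 for generic random data

# Thin SVD: U is n x r, D is r x r diagonal, Vt is r x d.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
D = np.diag(s)
assert U.shape == (n, r) and D.shape == (r, r) and Vt.shape == (r, d)

# A = U D V^T, i.e. the sum over i of sigma_i u_i v_i^T.
assert np.allclose(A, U @ D @ Vt)
assert np.allclose(A, sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(r)))
```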
There are two important matrix norms, the Frobenius norm, denoted ||A||_F, and the 2-norm, denoted ||A||_2. The 2-norm of the matrix A is given by

\|A\|_2 = \max_{|v|=1} |Av|;

thus, by the definition of the first singular vector, the 2-norm of A equals σ1(A).
Let

A = \sum_{i=1}^{r} \sigma_i u_i v_i^T

be the SVD of A and, for k between 1 and r, let

A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T

be the sum truncated after k terms. It is clear that Ak has rank k. Furthermore, Ak is the best rank k approximation to A when the error is measured in either the 2-norm or the Frobenius norm.
Lemma 4.6 The rows of Ak are the projections of the rows of A onto the subspace Vk
spanned by the first k singular vectors of A.
Proof: Let a be an arbitrary row vector. Since the vi are orthonormal, the projection of the vector a onto Vk is given by \sum_{i=1}^{k} (a \cdot v_i) v_i^T. Thus, the matrix whose rows are the projections of the rows of A onto Vk is given by \sum_{i=1}^{k} A v_i v_i^T. This last expression simplifies to

\sum_{i=1}^{k} A v_i v_i^T = \sum_{i=1}^{k} \sigma_i u_i v_i^T = A_k.
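A short NumPy sketch (arbitrary random data) of both the construction of Ak and Lemma 4.6: truncating the SVD after k terms gives a rank-k matrix whose rows are the projections of the rows of A onto the span of v1, ..., vk.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(20, 10))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 3
# A_k: keep only the first k terms sigma_i u_i v_i^T of the SVD.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]
assert np.linalg.matrix_rank(A_k) == k

# Lemma 4.6: rows of A_k are the projections of the rows of A onto
# span{v_1, ..., v_k}, i.e. A_k = A V_k V_k^T with V_k = Vt[:k].T.
assert np.allclose(A_k, A @ Vt[:k].T @ Vt[:k])
```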
The matrix Ak is the best rank k approximation to A in both the Frobenius and the
2-norm. First we show that the matrix Ak is the best rank k approximation to A in the
Frobenius norm.
Theorem 4.7 For any matrix B of rank at most k,

\|A - A_k\|_F \le \|A - B\|_F.
Proof: Let B minimize ||A − B||_F^2 among all matrices of rank k or less. Let V be the space spanned by the rows of B; the dimension of V is at most k. Since B minimizes ||A − B||_F^2, each row of B must be the projection of the corresponding row of A onto V: otherwise, replacing that row of B with the projection of the corresponding row of A onto V would not change V, and hence not increase the rank of B, but would reduce ||A − B||_F^2. Since each row of B is the projection of the corresponding row of A onto V, it follows that ||A − B||_F^2 is the sum of squared distances of the rows of A to V. By Lemma 4.6, ||A − Ak||_F^2 is the sum of squared distances of the rows of A to Vk, and Vk minimizes this sum over all k-dimensional subspaces. It follows that ||A − Ak||_F ≤ ||A − B||_F.
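A numerical sanity check of this optimality (a NumPy sketch; comparing against a few randomly generated rank-k matrices is of course not a proof): the squared Frobenius error of Ak equals the tail sum of squared singular values, which follows from Lemma 4.2 applied to A − Ak, and random rank-k competitors never beat it.

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.normal(size=(20, 10))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 3
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]
err_k = np.linalg.norm(A - A_k, 'fro')

# ||A - A_k||_F^2 is the sum of the squared singular values beyond the k-th ...
assert np.isclose(err_k**2, np.sum(s[k:]**2))

# ... and no randomly generated rank-k matrix B does better.
for _ in range(100):
    B = rng.normal(size=(20, k)) @ rng.normal(size=(k, 10))   # rank k
    assert err_k <= np.linalg.norm(A - B, 'fro') + 1e-9
```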
Next we tackle the 2-norm. We first show that the square of the 2-norm of A − Ak is the square of the (k + 1)st singular value of A.

Lemma 4.8 \|A - A_k\|_2^2 = \sigma_{k+1}^2.

Proof: Let A = \sum_{i=1}^{r} \sigma_i u_i v_i^T be the SVD of A. Then A - A_k = \sum_{i=k+1}^{r} \sigma_i u_i v_i^T. Since A − Ak maps every vector perpendicular to v1, v2, ..., vr to zero, the maximum of |(A − Ak)v| over unit vectors v is attained at some v = \sum_{i=1}^{r} \alpha_i v_i, for which

|(A - A_k)v|^2 = \Bigl|\sum_{i=k+1}^{r} \sigma_i \alpha_i u_i\Bigr|^2 = \sum_{i=k+1}^{r} \sigma_i^2 \alpha_i^2.

The v maximizing this last quantity, subject to the constraint that |v|^2 = \sum_{i=1}^{r} \alpha_i^2 = 1, occurs when α_{k+1} = 1 and the rest of the αi are 0. Thus, \|A - A_k\|_2^2 = \sigma_{k+1}^2, proving the lemma.
Theorem 4.9 Let A be an n × d matrix. For any matrix B of rank at most k,

\|A - A_k\|_2 \le \|A - B\|_2.

Proof: If A has rank k or less, the theorem is trivially true since ||A − Ak||_2 = 0. So assume A has rank greater than k and suppose, for contradiction, that ||A − B||_2 < σ_{k+1}. The null space of B, the set of vectors v with Bv = 0, has dimension at least d − k, while Span{v1, v2, ..., v_{k+1}} has dimension k + 1; since these dimensions sum to more than d, the two subspaces intersect in some nonzero vector z. Scale z so that |z| = 1. We now show that for this vector z, which lies in the space of the first k + 1 singular vectors of A, |(A − B)z| ≥ σ_{k+1}. Hence the 2-norm of A − B is at least σ_{k+1}, contradicting the assumption that ||A − B||_2 < σ_{k+1}. First, since Bz = 0,

\|A - B\|_2^2 \ge |(A - B)z|^2 = |Az|^2.
Since z is in the Span{v1, v2, ..., v_{k+1}},

|Az|^2 = \Bigl|\sum_{i=1}^{n} \sigma_i u_i v_i^T z\Bigr|^2 = \sum_{i=1}^{n} \sigma_i^2 \bigl(v_i^T z\bigr)^2 = \sum_{i=1}^{k+1} \sigma_i^2 \bigl(v_i^T z\bigr)^2 \ge \sigma_{k+1}^2 \sum_{i=1}^{k+1} \bigl(v_i^T z\bigr)^2 = \sigma_{k+1}^2.
It follows that

\|A - B\|_2^2 \ge \sigma_{k+1}^2,

contradicting the assumption that ||A − B||_2 < σ_{k+1}. This proves the theorem.
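The 2-norm statement is just as easy to check numerically (a NumPy sketch on arbitrary random data): the spectral norm of A − Ak equals the (k + 1)st singular value.

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.normal(size=(20, 10))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 3
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

# ||A - A_k||_2 = sigma_{k+1}; note s[k] is the (k+1)st singular value.
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])
```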
We now turn to computing the singular value decomposition. There are sophisticated, numerically careful algorithms for this task; we refer the reader to numerical analysis texts for more details. The method we present, called the Power Method, is conceptually simple. The word power refers to taking high powers of the matrix B = AA^T. If the SVD of A is A = \sum_i \sigma_i u_i v_i^T, then by direct multiplication

B = AA^T = \Bigl(\sum_i \sigma_i u_i v_i^T\Bigr)\Bigl(\sum_j \sigma_j v_j u_j^T\Bigr) = \sum_{i,j} \sigma_i \sigma_j u_i (v_i^T v_j) u_j^T = \sum_i \sigma_i^2 u_i u_i^T,
since vi^T vj is the dot product of the two vectors and is zero unless i = j. [Caution: ui uj^T is a matrix and is not zero even for i ≠ j.] Using the same kind of calculation,
B^k = \sum_i \sigma_i^{2k} u_i u_i^T.
As k increases, for i > 1, \sigma_i^{2k}/\sigma_1^{2k} goes to zero and B^k is approximately equal to

\sigma_1^{2k} u_1 u_1^T,

provided that for each i > 1, σi(A) < σ1(A).
This suggests a way of finding σ1 and u1 , by successively powering B. But there are
two issues. First, if there is a significant gap between the first and second singular values
of a matrix, then the above argument applies and the power method will quickly converge
to the first left singular vector. Suppose there is no significant gap. In the extreme case,
there may be ties for the top singular value. Then the above argument does not work. We
overcome this problem in Theorem 4.11 below which states that even with ties, the power
method converges to some vector in the span of those singular vectors corresponding to
the “nearly highest” singular values.
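A minimal sketch of the power method in NumPy (the function name power_method and the use of a fixed number of iterations are our choices, not the text's): instead of forming B^k explicitly, apply B = AA^T to a random unit vector repeatedly, normalizing after each step, exactly as in the expression (AA^T)^k x / |(AA^T)^k x| used in Theorem 4.11 below.

```python
import numpy as np

def power_method(A, iterations=200, rng=None):
    """Approximate the top left singular vector u_1 and singular value sigma_1
    of A by repeatedly applying B = A A^T to a random starting vector."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.normal(size=A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(iterations):
        # One multiplication by B = A A^T, done as two matrix-vector
        # products so that B is never formed explicitly.
        x = A @ (A.T @ x)
        x /= np.linalg.norm(x)
    sigma1 = np.linalg.norm(A.T @ x)       # |A^T u_1| = sigma_1
    return x, sigma1

rng = np.random.default_rng(9)
A = rng.normal(size=(30, 12))
u_est, sigma_est = power_method(A, rng=rng)

U, s, _ = np.linalg.svd(A, full_matrices=False)
assert np.isclose(sigma_est, s[0])
assert abs(u_est @ U[:, 0]) > 1 - 1e-9     # converged to +/- u_1
```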
Figure 4.3: The volume of the cylinder of height 1/(20√d) is an upper bound on the volume of the hemisphere below x1 = 1/(20√d).
Lemma 4.10 Let (x1, x2, ..., xd) be a unit d-dimensional vector picked at random. The probability that |x1| ≥ 1/(20√d) is at least 9/10.
Proof: We first show that for a vector v picked at random with |v| ≤ 1, the probability that |v1| ≥ 1/(20√d) is at least 9/10. We then let x = v/|v|; since |v| ≤ 1, this can only increase the magnitude of the first coordinate, so the result follows.
Let α = 1/(20√d). The probability that |v1| ≥ α equals one minus the probability that |v1| ≤ α, and the probability that |v1| ≤ α is equal to the fraction of the volume of the unit sphere with |v1| ≤ α. To upper bound this volume, consider twice the volume of the unit radius cylinder of height α: the portion of the sphere with |v1| ≤ α has volume at most 2αV(d − 1), where V(d − 1) denotes the volume of the unit sphere in d − 1 dimensions, and so

\mathrm{Prob}(|v_1| \le \alpha) \le \frac{2\alpha V(d-1)}{V(d)}.
Now the volume of the unit radius sphere is at least twice the volume of the cylinder of height \frac{1}{\sqrt{d-1}} and radius \sqrt{1 - \frac{1}{d-1}}, or

V(d) \ge \frac{2}{\sqrt{d-1}}\, V(d-1) \left(1 - \frac{1}{d-1}\right)^{\frac{d-1}{2}}.
Using (1 − x)^a ≥ 1 − ax,

V(d) \ge \frac{2}{\sqrt{d-1}}\, V(d-1)\left(1 - \frac{d-1}{2} \cdot \frac{1}{d-1}\right) = \frac{V(d-1)}{\sqrt{d-1}},

and therefore

\mathrm{Prob}(|v_1| \le \alpha) \le \frac{2\alpha V(d-1)}{V(d-1)/\sqrt{d-1}} = 2\alpha\sqrt{d-1} \le \frac{\sqrt{d-1}}{10\sqrt{d}} \le \frac{1}{10}.
Thus the probability that |x1| ≥ 1/(20√d) is at least 9/10.
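A quick Monte Carlo check of Lemma 4.10 (a NumPy sketch; the dimension and number of trials below are arbitrary): sampling random unit vectors by normalizing Gaussian vectors, the fraction with |x1| ≥ 1/(20√d) comfortably exceeds 9/10.

```python
import numpy as np

rng = np.random.default_rng(10)
d, trials = 100, 20_000

# Random unit vectors: normalize standard Gaussian samples.
x = rng.normal(size=(trials, d))
x /= np.linalg.norm(x, axis=1, keepdims=True)

frac = np.mean(np.abs(x[:, 0]) >= 1 / (20 * np.sqrt(d)))
print(frac)     # well above 0.9, as Lemma 4.10 guarantees
```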
Theorem 4.11 Let A be an n × d matrix and x a random unit length vector. Let V be the space spanned by the left singular vectors of A corresponding to singular values greater than (1 − ε)σ1. Let k be Ω\bigl(\ln(n/\varepsilon)/\varepsilon\bigr). Let w be the unit vector after k iterations of the power method, namely,

w = \frac{(AA^T)^k x}{\bigl|(AA^T)^k x\bigr|}.

The probability that w has a component of at least ε perpendicular to V is at most 1/10.
Proof: Let

A = \sum_{i=1}^{r} \sigma_i u_i v_i^T

be the SVD of A. If the rank of A is less than n, then complete {u1, u2, ..., ur} into a basis {u1, u2, ..., un} of n-space. Write x in the basis of the ui's as

x = \sum_{i=1}^{n} c_i u_i.

Since (AA^T)^k = \sum_{i=1}^{n} \sigma_i^{2k} u_i u_i^T, it follows that (AA^T)^k x = \sum_{i=1}^{n} \sigma_i^{2k} c_i u_i. For a random unit length vector x picked independent of A, the ui are fixed vectors and picking x at random is equivalent to picking random ci. From Lemma 4.10, |c1| ≥ 1/(20√n) with probability at least 9/10.
Suppose that σ1 , σ2 , . . . , σm are the singular values of A that are greater than or equal
to (1 − ε) σ1 and that σm+1 , . . . , σn are the singular values that are less than (1 − ε) σ1 .
Now

\Bigl|(AA^T)^k x\Bigr|^2 = \Bigl|\sum_{i=1}^{n} \sigma_i^{2k} c_i u_i\Bigr|^2 = \sum_{i=1}^{n} \sigma_i^{4k} c_i^2 \ge \sigma_1^{4k} c_1^2 \ge \frac{1}{400n}\,\sigma_1^{4k},
with probability at least 9/10. Here we used the fact that a sum of positive quantities
is at least as large as its first element and the first element is greater than or equal to