Sol 11
(a) Consider two vectors $\vec{x} \in \mathbb{R}^m$ and $\vec{y} \in \mathbb{R}^n$. What is the dimension of the matrix $\vec{x}\vec{y}^\top$, and what is its rank?
Solution: The matrix $\vec{x}\vec{y}^\top$ has dimension $m \times n$. Let $\vec{y} = \begin{bmatrix} y_1 & y_2 & \dots & y_n \end{bmatrix}^\top$; then
$$\vec{x}\vec{y}^\top = \begin{bmatrix} y_1\vec{x} & y_2\vec{x} & \dots & y_n\vec{x} \end{bmatrix}.$$
Each column of $\vec{x}\vec{y}^\top$ is a multiple of $\vec{x}$, so the matrix has rank 1 (assuming $\vec{x}$ and $\vec{y}$ are nonzero).
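As a quick numerical sanity check (my own sketch, with arbitrarily chosen example vectors, not part of the original solution):
```python
import numpy as np

# Example vectors (arbitrary choices just for illustration).
x = np.array([1.0, 2.0, 3.0])        # x in R^3, so m = 3
y = np.array([4.0, 5.0])             # y in R^2, so n = 2

A = np.outer(x, y)                   # x y^T is m x n = 3 x 2
print(A.shape)                       # (3, 2)
print(np.linalg.matrix_rank(A))      # 1, since every column is a multiple of x
```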
(b) Consider a matrix A ∈ Rm×n and the rank of A is r. Suppose its SVD is A = U ΣV > where
U ∈ Rm×m , Σ ∈ Rm×n , and V ∈ Rn×n . Can you write A in terms of the singular values of A and
outer products of the columns of U and V ?
Solution: We have
$$A = \sum_{i=1}^{r} \sigma_i \vec{u}_i \vec{v}_i^\top$$
where $\vec{u}_i$ and $\vec{v}_i$ are the columns of $U$ and $V$. Note that we only sum to $r$ since $A$ has rank $r$ and hence it has $r$ non-zero singular values $\sigma_1, \ldots, \sigma_r$. This is the outer product form of the SVD.
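A short numpy sketch (my own illustration, with an arbitrarily chosen example matrix) of rebuilding $A$ from the rank-one terms $\sigma_i\vec{u}_i\vec{v}_i^\top$:
```python
import numpy as np

A = np.array([[1.0, -1.0,  1.0],
              [1.0,  1.0, -1.0]])          # any example matrix works here

U, s, Vt = np.linalg.svd(A)                # s holds the singular values
r = np.sum(s > 1e-12)                      # numerical rank

# Sum of rank-one outer products sigma_i * u_i * v_i^T over the nonzero singular values.
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(r))
print(np.allclose(A, A_rebuilt))           # True
```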
2. Proofs
In this problem we will review some of the important proofs we saw in the lecture as practice. Let's define $S = A^\top A$ where $A$ is an arbitrary $m \times n$ matrix. $V = \begin{bmatrix} \vec{v}_1 & \cdots & \vec{v}_n \end{bmatrix}$ is the matrix of normalized eigenvectors of $S$ with eigenvalues $\lambda_1, \cdots, \lambda_n$.
(a) Let $\vec{v}_1, \cdots, \vec{v}_r$ be those normalized eigenvectors that correspond to non-zero eigenvalues (i.e. $\|\vec{v}_i\| = 1$ and $\lambda_i \neq 0$ for $i = 1, \cdots, r$). Show that $\|A\vec{v}_i\|^2 = \lambda_i$.
Solution: We know that $\vec{v}_i$ is an eigenvector of $S$, so we can start with:
$$S\vec{v}_i = \lambda_i \vec{v}_i$$
$$\vec{v}_i^\top S\vec{v}_i = \lambda_i \vec{v}_i^\top \vec{v}_i$$
$$\vec{v}_i^\top S\vec{v}_i = \lambda_i \|\vec{v}_i\|^2.$$
On the second step, we multiplied both sides by $\vec{v}_i^\top$. Since $\|\vec{v}_i\|^2 = 1$, the right-hand side simplifies to $\lambda_i$. By substituting $S = A^\top A$:
$$\lambda_i = \vec{v}_i^\top A^\top A \vec{v}_i = (A\vec{v}_i)^\top (A\vec{v}_i) = \|A\vec{v}_i\|^2.$$
(b) Following the assumptions in part (a), show that $A\vec{v}_i$ is orthogonal to $A\vec{v}_j$ for $i \neq j$.
Solution: To show that two vectors are orthogonal we have to take the inner product and show that it is zero:
$$(A\vec{v}_i)^\top (A\vec{v}_j) = \vec{v}_i^\top A^\top A \vec{v}_j = \vec{v}_i^\top S \vec{v}_j = \lambda_j \vec{v}_i^\top \vec{v}_j = 0,$$
where the last step uses the fact that the eigenvectors $\vec{v}_1, \cdots, \vec{v}_n$ of the symmetric matrix $S$ are chosen to be orthonormal, so $\vec{v}_i^\top \vec{v}_j = 0$ for $i \neq j$.
(c) Show that if V ∈ Rn×n is an orthonormal square matrix, then ||~x||2 = ||V ~x||2 for all ~x ∈ Rn . Hint:
Write the norm as an inner product instead of trying to do this elementwise.
Solution:
$$\|V\vec{x}\|^2 = \langle V\vec{x}, V\vec{x} \rangle = \vec{x}^\top V^\top V \vec{x} = \vec{x}^\top I_{n \times n} \vec{x} = \|\vec{x}\|^2.$$
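A quick numerical check of this fact (my own sketch, using a random orthonormal matrix obtained from a QR factorization):
```python
import numpy as np

rng = np.random.default_rng(0)
V, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # V has orthonormal columns
x = rng.standard_normal(5)

print(np.allclose(V.T @ V, np.eye(5)))                         # V^T V = I
print(np.allclose(np.linalg.norm(V @ x), np.linalg.norm(x)))   # ||Vx|| = ||x||
```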
3. SVD
(c) Write out a singular value decomposition of A = U ΣV > using the previous part. Note the
ordering of the singular values in Σ should be from the largest to smallest.
(HINT: From the previous part A = BDI3×3 . Now, re-order to have eigenvalues in decreasing order.)
Solution:
$$A = BD = BDI = \begin{bmatrix} -\frac{1}{\sqrt{14}} & \frac{1}{\sqrt{3}} & \frac{5}{\sqrt{42}} \\ \frac{3}{\sqrt{14}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{42}} \\ \frac{2}{\sqrt{14}} & -\frac{1}{\sqrt{3}} & \frac{4}{\sqrt{42}} \end{bmatrix} \begin{bmatrix} \sqrt{14} & 0 & 0 \\ 0 & \sqrt{3} & 0 \\ 0 & 0 & \sqrt{42} \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
Reordering the singular values and corresponding left and right singular vectors, we have the SVD:
$$A = \begin{bmatrix} \frac{5}{\sqrt{42}} & -\frac{1}{\sqrt{14}} & \frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{42}} & \frac{3}{\sqrt{14}} & \frac{1}{\sqrt{3}} \\ \frac{4}{\sqrt{42}} & \frac{2}{\sqrt{14}} & -\frac{1}{\sqrt{3}} \end{bmatrix} \begin{bmatrix} \sqrt{42} & 0 & 0 \\ 0 & \sqrt{14} & 0 \\ 0 & 0 & \sqrt{3} \end{bmatrix} \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}.$$
where:
$$U = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} 2 & 0 & 0 \\ 0 & \sqrt{2} & 0 \end{bmatrix}, \qquad V^\top = \begin{bmatrix} 0 & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ 1 & 0 & 0 \\ 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}.$$
It is a good idea to be able to calculate the SVD yourself as you may be asked to solve similar questions
on your own in the exam. For this subpart, verify that this decomposition of A is correct. You may use
a computer to do the matrix multiplication if you want, but it is better to verify by hand.
Solution: You may give yourself full credit for this subpart if you showed that multiplying out the matrices gives $A$, or stated that you did this using a computer. Though you did not have to do any work for this sub-part, the following solution walks you through how to solve for the SVD:
$$A = U\Sigma V^\top$$
We first compute
$$AA^\top = \begin{bmatrix} 3 & -1 \\ -1 & 3 \end{bmatrix}.$$
Its eigenvalues are 4 and 2, so the singular values are $\sigma_1 = 2$ and $\sigma_2 = \sqrt{2}$, and the corresponding normalized eigenvectors form the columns of
$$U = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}.$$
We can then solve for the $\vec{v}$ vectors using $A^\top \vec{u}_i = \sigma_i \vec{v}_i$, producing $\vec{v}_1 = [0, -\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}]^\top$ and $\vec{v}_2 = [1, 0, 0]^\top$. The last $\vec{v}$ must be a unit vector orthogonal to the other two, so we can pick $[0, \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}]^\top$.
The SVD is:
$$A = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} 2 & 0 & 0 \\ 0 & \sqrt{2} & 0 \end{bmatrix} \begin{bmatrix} 0 & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ 1 & 0 & 0 \\ 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}.$$
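If you want to check this with a computer, here is a minimal numpy sketch (my own, not part of the original solution); the matrix $A$ used below is the one consistent with $AA^\top$ above and with the system in part (f):
```python
import numpy as np

A = np.array([[1.0, -1.0,  1.0],
              [1.0,  1.0, -1.0]])

s2 = np.sqrt(2)
U = np.array([[ 1/s2, 1/s2],
              [-1/s2, 1/s2]])
Sigma = np.array([[2.0, 0.0, 0.0],
                  [0.0,  s2, 0.0]])
Vt = np.array([[0.0, -1/s2, 1/s2],
               [1.0,   0.0,  0.0],
               [0.0,  1/s2, 1/s2]])

print(np.allclose(U @ Sigma @ Vt, A))   # True: the decomposition reproduces A
```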
(b) Let us now think about what the SVD does. Consider a rank m matrix A ∈ Rm×n , with n > m. Let
U ∈ Rm×m , Σ ∈ Rm×n , V ∈ Rn×n . Let us look at matrix A acting on some vector ~x to give the result
~y . We have
A~x = U ΣV > ~x = ~y .
We can think of V > ~x as a rotation of the vector ~x, then Σ as a scaling, and U as another rotation
(multiplication by an orthonormal matrix does not change the norm of a vector, try to verify this for
yourself). We will try to "reverse" these operations one at a time and then put them together to construct
the Moore-Penrose pseudoinverse.
If U “rotates” the vector ΣV > ~x, what operator can we derive that will undo the rotation?
Solution: By orthonormality, we know that U > U = U U > = I. Therefore, U > undoes the rotation.
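As a quick numerical illustration (my own sketch, with a random orthonormal matrix standing in for $U$):
```python
import numpy as np

rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # random orthonormal ("rotation-like") matrix
z = rng.standard_normal(4)

print(np.allclose(U.T @ (U @ z), z))               # applying U^T after U recovers z
```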
(c) Derive a matrix Σe that will "unscale", or undo the effect of Σ where it is possible to undo. Recall
that Σ has the same dimensions as A.
Hint: Consider
$$\widetilde{\Sigma} = \begin{bmatrix} \frac{1}{\sigma_0} & 0 & \cdots & 0 \\ 0 & \frac{1}{\sigma_1} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{\sigma_{m-1}} \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}.$$
You can see that $\sigma_i x_i = \tilde{y}_i$ for $i = 0, \ldots, m-1$, which means that to obtain $x_i$ from $\tilde{y}_i$, we need to multiply $\tilde{y}_i$ by $\frac{1}{\sigma_i}$. For any $i > m-1$, the information in $x_i$ is lost by multiplying with 0. If the corresponding $\tilde{y}_i \neq 0$, there is no way of solving this equation. No solution exists, and we have to accept an approximate solution. If the corresponding $\tilde{y}_i = 0$, then any $x_i$ would still work. Either way, it is reasonable to just say $x_i$ is 0 in the case that $\sigma_i = 0$. That's why we can legitimately pad 0s in the bottom of $\widetilde{\Sigma}$ given below:
If
$$\Sigma = \begin{bmatrix} \sigma_0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & \sigma_1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma_{m-1} & 0 & \cdots & 0 \end{bmatrix} \quad\text{then}\quad \widetilde{\Sigma} = \begin{bmatrix} \frac{1}{\sigma_0} & 0 & \cdots & 0 \\ 0 & \frac{1}{\sigma_1} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{\sigma_{m-1}} \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}.$$
$$\widetilde{\Sigma}\vec{y} = \widetilde{\Sigma}\Sigma\vec{x} = \begin{bmatrix} \frac{1}{\sigma_0} & 0 & \cdots & 0 \\ 0 & \frac{1}{\sigma_1} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{\sigma_{m-1}} \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix} \begin{bmatrix} \sigma_0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & \sigma_1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma_{m-1} & 0 & \cdots & 0 \end{bmatrix}\vec{x} = \begin{bmatrix} \Sigma_1^{-1} \\ 0_{(n-m)\times m} \end{bmatrix} \begin{bmatrix} \Sigma_1 & 0_{m\times(n-m)} \end{bmatrix}\vec{x} = \begin{bmatrix} I_{m\times m} & 0_{m\times(n-m)} \\ 0_{(n-m)\times m} & 0_{(n-m)\times(n-m)} \end{bmatrix}\vec{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_m \\ 0 \\ \vdots \\ 0 \end{bmatrix},$$
where $\Sigma_1$ denotes the $m \times m$ diagonal block of $\Sigma$ containing the singular values.
Therefore, we are able to recover all $x_i$ for $1 \leq i \leq m$ with $\widetilde{\Sigma}$. The rest of the entries of $\vec{x}$ are lost when multiplied by the zeros in $\Sigma$, and thus it is not possible to recover them.
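A short numpy sketch (my own illustration) of building $\widetilde{\Sigma}$ from $\Sigma$: invert the nonzero singular values on the diagonal, swap the shape, and keep zeros elsewhere. The helper name `sigma_tilde` is just a hypothetical name for this construction.
```python
import numpy as np

def sigma_tilde(Sigma, tol=1e-12):
    """Given an m x n Sigma with singular values on its diagonal, return the
    n x m 'unscaling' matrix with 1/sigma_i where sigma_i != 0 and 0 elsewhere."""
    m, n = Sigma.shape
    St = np.zeros((n, m))
    for i in range(min(m, n)):
        if Sigma[i, i] > tol:
            St[i, i] = 1.0 / Sigma[i, i]
    return St

Sigma = np.array([[2.0, 0.0,         0.0],
                  [0.0, np.sqrt(2),  0.0]])   # the Sigma from the earlier 2x3 example
print(sigma_tilde(Sigma))
# [[0.5   0.   ]
#  [0.    0.707]
#  [0.    0.   ]]
```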
Here $\hat{\vec{x}}$ denotes the recovered $\vec{x}$. The reason why the word inverse is in quotes (or why this is called a pseudo-inverse) is because we're ignoring the "divisions" by zero, and $\hat{\vec{x}}$ isn't exactly equal to $\vec{x}$.
Solution: You may give yourself full credit for this subpart if you write out $A^\dagger = V\widetilde{\Sigma}U^\top$.
To see how we get this, recall that $\vec{y} = A\vec{x} = U\Sigma V^\top\vec{x}$. We get $\vec{y}$ by first rotating $\vec{x}$ by $V^\top$, then scaling the resulting vector by $\Sigma$, and finally rotating by $U$. To undo the operations, we should first "unrotate" $\vec{y}$ by $U^\top$, then "unscale" it by $\widetilde{\Sigma}$, and finally "unrotate" it by $V$. Therefore, we have
$$\hat{\vec{x}} = V\widetilde{\Sigma}U^\top\vec{y}.$$
$$\begin{aligned}
\vec{y} &= A\vec{x} = U\Sigma V^\top\vec{x} \\
U^\top\vec{y} &= U^\top U\Sigma V^\top\vec{x} && \text{Unrotating by } U \\
U^\top\vec{y} &= I_{m\times m}\Sigma V^\top\vec{x} && \text{Unrotating by } U \\
U^\top\vec{y} &= \Sigma V^\top\vec{x} && \text{Unrotating by } U \\
\widetilde{\Sigma}U^\top\vec{y} &= \widetilde{\Sigma}\Sigma V^\top\vec{x} && \text{Unscaling by } \widetilde{\Sigma} \\
V\widetilde{\Sigma}U^\top\vec{y} &= V\widetilde{\Sigma}\Sigma V^\top\vec{x} && \text{Unrotating by } V
\end{aligned}$$
Expanding the unscaling step entrywise gives
$$\widetilde{\Sigma}\Sigma V^\top\vec{x} = \begin{bmatrix} (V^\top\vec{x})_1 \\ \vdots \\ (V^\top\vec{x})_m \\ 0 \\ \vdots \\ 0 \end{bmatrix},$$
where $(V^\top\vec{x})_i$ is the $i$th entry of $V^\top\vec{x}$. This gives the coordinates of $\vec{x}$ in the basis of $V$ with the components of $\vec{x}$ in the null space of $A$ zeroed out.
Finally, we transform this vector back to the standard basis by $V$, and this gives $\hat{\vec{x}} = V(\widetilde{\Sigma}\Sigma V^\top\vec{x})$.
Therefore, $\hat{\vec{x}}$ has no component in the null space of $A$.
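As a hedged numpy sketch (my own illustration, not part of the original solution), we can assemble $V\widetilde{\Sigma}U^\top$ for the example matrix from the earlier subparts and compare it with numpy's built-in pseudoinverse:
```python
import numpy as np

A = np.array([[1.0, -1.0,  1.0],
              [1.0,  1.0, -1.0]])

U, s, Vt = np.linalg.svd(A)                      # full SVD: U is 2x2, Vt is 3x3
Sigma_tilde = np.zeros((3, 2))
Sigma_tilde[:2, :2] = np.diag(1.0 / s)           # both singular values are nonzero here

A_dagger = Vt.T @ Sigma_tilde @ U.T              # V Sigma_tilde U^T
print(np.allclose(A_dagger, np.linalg.pinv(A)))  # True
```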
(f) Use A† to solve for a vector ~x in the following system of equations.
" # " #
1 −1 1 2
~x =
1 1 −1 4
Solution:
$$\vec{x} = A^\dagger\vec{y} = V\widetilde{\Sigma}U^\top\vec{y} = \begin{bmatrix} 0 & 1 & 0 \\ -\frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} \frac{1}{2} & 0 \\ 0 & \frac{1}{\sqrt{2}} \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} 2 \\ 4 \end{bmatrix} = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\ -\frac{1}{4} & \frac{1}{4} \\ \frac{1}{4} & -\frac{1}{4} \end{bmatrix} \begin{bmatrix} 2 \\ 4 \end{bmatrix} = \begin{bmatrix} 3 \\ \frac{1}{2} \\ -\frac{1}{2} \end{bmatrix}.$$
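A one-line numpy check of this answer (my own sketch):
```python
import numpy as np

A = np.array([[1.0, -1.0,  1.0],
              [1.0,  1.0, -1.0]])
y = np.array([2.0, 4.0])

x_hat = np.linalg.pinv(A) @ y
print(x_hat)                    # [ 3.   0.5 -0.5]
print(A @ x_hat)                # [2. 4.]  -- x_hat indeed satisfies the system
```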
(g) Now we will see why this matrix, A† = V ΣU e > , is a useful proxy for the matrix inverse in such
circumstances. Show that the solution given by the Moore-Penrose pseudoinverse satisfies the
minimality property that if ~x̂ = A† ~y is the pseudo-inverse solution to A~x = ~y , then ~x̂ has no
component in the nullspace of A.
Hint: To show this recall that for any vector ~x, the vector V −1 ~x represents the coordinates of ~x in the
basis of the columns of V . Compute V −1 ~x and show that the last (n − m) rows are all zero.
This minimality property is useful in many applications. You saw a control application in lecture. This
is also used all the time in machine learning, where it is connected to the concept behind what is called
ridge regression or weight shrinkage.
Solution:
Since ~x̂ is the pseudoinverse solution, we know that,
e > ~y
~x̂ = V ΣU
Let us write down what ~x̂ is with respect to the orthonormal basis formed by the columns of V .
Let there be k non-zero singular values. The following expression comes from expanding the matrix
multiplication.
Note that V > ~x is the coordinate of ~x in the V basis and recall that {~vm+1 , ..., v~n } is a basis of the null
space of A. Since the last n − m entries of V > ~x̂ are all zeros, this means that ~x̂ has no component in
the direction of ~vm+1 , ..., v~n , i.e. ~x̂ has no component in the null space of A.
(h) Consider a generic wide matrix A. We know that A can be written using A = U ΣV > where U and
V each are the appropriate size and have orthonormal columns, while Σ is the appropriate size and is
a diagonal matrix — all off-diagonal entries are zero. Further assume that the rows of A are linearly
independent. Prove that A† = A> (AA> )−1 .
(HINT: Just substitute in U ΣV > for A in the expression above and simplify using the properties you
know about U, Σ, V . Remember the transpose of a product of matrices is the product of their transposes
in reverse order: (CD)> = D> C > .)
Solution:
We just substitute in to see what happens:
$$\begin{aligned}
A^\top(AA^\top)^{-1} &= (U\Sigma V^\top)^\top\left(U\Sigma V^\top(U\Sigma V^\top)^\top\right)^{-1} && (2) \\
&= V\Sigma^\top U^\top\left(U\Sigma V^\top V\Sigma^\top U^\top\right)^{-1} && (3) \\
&= V\Sigma^\top U^\top\left(U(\Sigma\Sigma^\top)U^\top\right)^{-1} && (4) \\
&= V\Sigma^\top U^\top U(\Sigma\Sigma^\top)^{-1}U^\top && (5) \\
&= V\Sigma^\top(\Sigma\Sigma^\top)^{-1}U^\top. && (6)
\end{aligned}$$
At this point, we are almost done in reaching $A^\dagger = V\widetilde{\Sigma}U^\top$. We have the leading $V$ and the ending $U^\top$. All that we need to do is multiply out the diagonal matrices in the middle.
$$\begin{aligned}
\Sigma^\top(\Sigma\Sigma^\top)^{-1}
&= \begin{bmatrix} \sigma_0 & 0 & \cdots & 0 \\ 0 & \sigma_1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{m-1} \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}\left(\begin{bmatrix} \sigma_0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & \sigma_1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma_{m-1} & 0 & \cdots & 0 \end{bmatrix}\begin{bmatrix} \sigma_0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & \sigma_1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma_{m-1} & 0 & \cdots & 0 \end{bmatrix}^\top\right)^{-1} && (7) \\
&= \begin{bmatrix} \sigma_0 & 0 & \cdots & 0 \\ 0 & \sigma_1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{m-1} \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}\begin{bmatrix} \sigma_0^2 & 0 & \cdots & 0 \\ 0 & \sigma_1^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{m-1}^2 \end{bmatrix}^{-1} && (8) \\
&= \begin{bmatrix} \sigma_0 & 0 & \cdots & 0 \\ 0 & \sigma_1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{m-1} \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}\begin{bmatrix} \frac{1}{\sigma_0^2} & 0 & \cdots & 0 \\ 0 & \frac{1}{\sigma_1^2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{\sigma_{m-1}^2} \end{bmatrix} && (9) \\
&= \begin{bmatrix} \frac{1}{\sigma_0} & 0 & \cdots & 0 \\ 0 & \frac{1}{\sigma_1} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{\sigma_{m-1}} \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix} = \widetilde{\Sigma}. && (10)
\end{aligned}$$
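A quick numerical check of this identity (my own sketch, with a randomly generated wide full-row-rank matrix):
```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))      # a wide matrix; its rows are independent with probability 1

lhs = A.T @ np.linalg.inv(A @ A.T)   # A^T (A A^T)^{-1}
rhs = np.linalg.pinv(A)              # V Sigma_tilde U^T, as computed by numpy
print(np.allclose(lhs, rhs))         # True
```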
the path of weighing — the norm of the error is minimized. OMP iterates that to also follow the path
of counting, where the number of nonzero variables corresponds to the things that are counted. The
Moore-Penrose pseudoinverse is fully in the path of weighing.
Both of these paths grow into major themes in machine learning generally, and both play a very impor-
tant role in modern machine learning in particular. This is because in many contemporary approaches
to machine learning, we try to learn models that have more parameters than we have data points.
5. Frobenius Norm
In this problem we will investigate the basic properties of the Frobenius norm.
Similar to how the norm of a vector $\vec{x} \in \mathbb{R}^N$ is defined as $\|\vec{x}\| = \sqrt{\sum_{i=1}^{N} x_i^2}$, the Frobenius norm of a matrix $A \in \mathbb{R}^{N \times N}$ is defined as
$$\|A\|_F = \sqrt{\sum_{i=1}^{N}\sum_{j=1}^{N} |A_{ij}|^2}.$$
Aij is the entry in the ith row and the j th column. This is basically the norm that comes from treating a
matrix like a big vector filled with numbers.
Think about how this generalizes to n × n matrices. Note: The trace of a matrix is the sum of its
diagonal entries. For example, let A ∈ RN ×N , then,
$$\operatorname{Tr}\{A\} = \sum_{i=1}^{N} A_{ii}$$
Solution: This proof is for the general case of n × n matrices. You should give yourself full credit
if you did this calculation only on the 2 × 2 case.
$$\begin{aligned}
\operatorname{Tr}\{A^\top A\} &= \sum_{i=1}^{N} (A^\top A)_{ii} && (11) \\
&= \sum_{i=1}^{N}\sum_{j=1}^{N} (A^\top)_{ij} A_{ji} && (12) \\
&= \sum_{i=1}^{N}\sum_{j=1}^{N} A_{ji} A_{ji} && (13) \\
&= \sum_{i=1}^{N}\sum_{j=1}^{N} A_{ji}^2 && (14) \\
&= \|A\|_F^2 && (15)
\end{aligned}$$
In the above solution, step (11) writes out the definition of the trace, step (12) expands the matrix multiplication on the diagonal indices (i.e. entry $(i, i)$ is the inner product of row $i$ of $A^\top$ and column $i$ of $A$), step (13) applies the definition of the matrix transpose, and the last two steps collect the result into the definition of the Frobenius norm.
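A quick numerical check of this identity (my own sketch, with an arbitrary random matrix):
```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))

trace_form = np.trace(A.T @ A)
frob_sq = np.linalg.norm(A, 'fro') ** 2
print(np.isclose(trace_form, frob_sq))   # True: Tr{A^T A} = ||A||_F^2
```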
(b) Show that if $U$ and $V$ are square orthonormal matrices, then $\|UA\|_F = \|A\|_F = \|AV\|_F$.
Solution: One path is to use part (a): since $U^\top U = I$,
$$\|UA\|_F^2 = \operatorname{Tr}\{(UA)^\top(UA)\} = \operatorname{Tr}\{A^\top U^\top U A\} = \operatorname{Tr}\{A^\top A\} = \|A\|_F^2.$$
Another path is to note that the Frobenius norm squared of a matrix is the sum of squared Euclidean
norms of the columns of the matrix. Matrix multiplication U A proceeds to act on each column of A
independently. None of those norms change since U is orthonormal, and so the Frobenius norm also
doesn’t change.
To show the second equality, we note that $\|A^\top\|_F = \|A\|_F$, because we are just summing over the same numbers, only in a different order. Hence:
$$\|AV\|_F = \|(AV)^\top\|_F = \|V^\top A^\top\|_F.$$
But the transpose of an orthonormal matrix is also orthonormal, hence this case reduces to the previous
case.
(c) Use the SVD decomposition to show that $\|A\|_F = \sqrt{\sum_{i=1}^{N} \sigma_i^2}$, where $\sigma_1, \cdots, \sigma_N$ are the singular values of $A$.
(HINT: The previous part might be quite useful.)
Solution: Write $A = U\Sigma V^\top$. By part (b), multiplying by the orthonormal matrices $U$ and $V^\top$ does not change the Frobenius norm, so
$$\|A\|_F = \|U\Sigma V^\top\|_F = \|\Sigma\|_F = \sqrt{\sum_{i=1}^{N}\sigma_i^2},$$
since the only non-zero entries of $\Sigma$ are the singular values on its diagonal.
(a) Show that $\tilde{\lambda}_i \geq 0$.
(HINT: You want to involve $\vec{u}_i^\top\vec{u}_i$ somehow.)
Solution: We know that $\vec{u}_i$ is an eigenvector of $Q$, so we can start with:
$$\begin{aligned}
\tilde{\lambda}_i\vec{u}_i &= Q\vec{u}_i \\
\tilde{\lambda}_i\vec{u}_i^\top\vec{u}_i &= \vec{u}_i^\top Q\vec{u}_i \\
\tilde{\lambda}_i\|\vec{u}_i\|^2 &= \vec{u}_i^\top Q\vec{u}_i \\
\tilde{\lambda}_i\|\vec{u}_i\|^2 &= \vec{u}_i^\top M M^\top\vec{u}_i \\
\tilde{\lambda}_i\|\vec{u}_i\|^2 &= (M^\top\vec{u}_i)^\top(M^\top\vec{u}_i) \\
\tilde{\lambda}_i\|\vec{u}_i\|^2 &= \|M^\top\vec{u}_i\|^2
\end{aligned}$$
Since any squared quantity is non-negative, the right-hand side of the equation is non-negative. This means that $\tilde{\lambda}_i$ must also be non-negative, because $\|\vec{u}_i\|^2 > 0$ for an eigenvector $\vec{u}_i$.
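A small numpy sketch (my own illustration; $M$ here is an arbitrary random matrix) confirming that the eigenvalues of $Q = MM^\top$ are non-negative:
```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((4, 6))
Q = M @ M.T

eigvals = np.linalg.eigvalsh(Q)          # Q is symmetric, so use eigvalsh
print(eigvals)                           # all entries are >= 0 (up to round-off)
print(np.all(eigvals >= -1e-10))         # True
```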
(b) Suppose that we define $\vec{w}_i = \frac{M^\top\vec{u}_i}{\sqrt{\tilde{\lambda}_i}}$ for all $i$ for which $\tilde{\lambda}_i > 0$. Suppose that there are $\ell$ such eigenvalues. Show that $W = [\vec{w}_1, \vec{w}_2, \cdots, \vec{w}_\ell]$ has orthonormal columns.
Solution: To show orthonormality, we can compute the inner product $\vec{w}_i^\top\vec{w}_j$ for all $i, j \in \{1, 2, \ldots, \ell\}$ and demonstrate that $\vec{w}_i^\top\vec{w}_j = 1$ if $i = j$ and $\vec{w}_i^\top\vec{w}_j = 0$ if $i \neq j$:
$$\vec{w}_i^\top\vec{w}_j = \left(\frac{M^\top\vec{u}_i}{\sqrt{\tilde{\lambda}_i}}\right)^\top\left(\frac{M^\top\vec{u}_j}{\sqrt{\tilde{\lambda}_j}}\right) = \frac{\vec{u}_i^\top M M^\top\vec{u}_j}{\sqrt{\tilde{\lambda}_i}\sqrt{\tilde{\lambda}_j}} = \frac{\vec{u}_i^\top Q\vec{u}_j}{\sqrt{\tilde{\lambda}_i}\sqrt{\tilde{\lambda}_j}} = \frac{\tilde{\lambda}_j\,\vec{u}_i^\top\vec{u}_j}{\sqrt{\tilde{\lambda}_i}\sqrt{\tilde{\lambda}_j}}.$$
Since the $\vec{u}_i$ are orthonormal eigenvectors of $Q$, we have $\vec{u}_i^\top\vec{u}_j = 0$ for $i \neq j$, so $\vec{w}_i^\top\vec{w}_j = 0$ in that case, while for $i = j$ we get $\vec{w}_i^\top\vec{w}_i = \frac{\tilde{\lambda}_i}{\tilde{\lambda}_i} = 1$. Hence $W$ has orthonormal columns.
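A small numpy sketch (my own illustration, with a randomly chosen $M$) of this construction: build $W$ from the eigenvectors of $Q = MM^\top$ with positive eigenvalues and check that $W^\top W = I$:
```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((4, 6))
Q = M @ M.T

lam, U = np.linalg.eigh(Q)                       # eigenvalues (ascending) and orthonormal eigenvectors
keep = lam > 1e-10                               # indices with lambda_i > 0
W = (M.T @ U[:, keep]) / np.sqrt(lam[keep])      # columns w_i = M^T u_i / sqrt(lambda_i)

print(np.allclose(W.T @ W, np.eye(W.shape[1])))  # True: W has orthonormal columns
```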
(a) What sources (if any) did you use as you worked through the homework?
(b) If you worked with someone on this homework, who did you work with?
List names and student ID’s. (In case of homework party, you can also just describe the group.)
Contributors:
• Kourosh Hakhamaneshi.
• Elena Jia.
• Siddharth Iyer.
• Justin Yim.
• Anant Sahai.
• Gaoyue Zhou.
• Aditya Arun.