The eigenvectors are the columns of the Fourier matrix Q below; column k contains the powers of λ_k. The corresponding eigenvalues are λ1 = 1, λ2 = i, λ3 = i² = −1, λ4 = i³ = −i.

$$Q = \frac{1}{2}\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & i & i^2 & i^3 \\ 1 & i^2 & i^4 & i^6 \\ 1 & i^3 & i^6 & i^9 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & i & -1 & -i \\ 1 & -1 & 1 & -1 \\ 1 & -i & -1 & i \end{pmatrix}$$

To check that Q is unitary:

$$\bar{Q}^T Q = \frac{1}{4}\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -i & -1 & i \\ 1 & -1 & 1 & -1 \\ 1 & i & -1 & -i \end{pmatrix}\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & i & -1 & -i \\ 1 & -1 & 1 & -1 \\ 1 & -i & -1 & i \end{pmatrix} = \frac{1}{4}\begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix} = I$$
I.6 12 Find three eigenvalues and eigenvectors of A.
$$A = \begin{pmatrix}1\\2\\1\end{pmatrix}\begin{pmatrix}2 & 1 & 2\end{pmatrix} = \begin{pmatrix}2 & 1 & 2\\4 & 2 & 4\\2 & 1 & 2\end{pmatrix}$$
$$x_1 = \begin{pmatrix}1\\2\\1\end{pmatrix},\ \lambda_1 = 6; \qquad x_2 = \begin{pmatrix}0\\-2\\1\end{pmatrix},\ \lambda_2 = 0; \qquad x_3 = \begin{pmatrix}1\\-2\\0\end{pmatrix},\ \lambda_3 = 0$$
I.6 16 Suppose A = XΛX −1 . What is the eigenvalue matrix for A + 2I? What is the eigenvector
matrix?
A + 2I = XΛX −1 + X(2I)X −1 = X(2I + Λ)X −1
Therefore, the eigenvalue matrix is Λ + 2I and the eigenvector matrix is still X.
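A quick numerical check of this shift (a minimal MATLAB sketch; the matrix A below is an arbitrary example, not from the problem):

    % A + 2I has eigenvalues lambda_i + 2 with the same eigenvector matrix X.
    A = [4 1; 2 3];                        % arbitrary diagonalizable example
    [X, Lam] = eig(A);                     % A = X*Lam*inv(X)
    shifted = X*(Lam + 2*eye(2))/X;        % X*(Lam + 2I)*X^{-1}
    disp(norm(shifted - (A + 2*eye(2))))   % ~0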
I.6 19 If the eigenvalues of A are (2, 2, 5), is A certainly:
(a) invertible? Yes: none of the eigenvalues are zero, so det A = 2 · 2 · 5 ≠ 0.
(b) diagonalizable? Not necessarily: two of the eigenvalues are equal, so A might not have three independent eigenvectors.
(c) not diagonalizable? Not necessarily: distinct eigenvalues are sufficient but not necessary for diagonalizability.
I.7 23 Suppose C is positive definite and A has independent columns. Apply the energy test to
xT AT CAx to show that S = AT CA is positive definite.
If x^T A^T CAx > 0 for all x ≠ 0, then S is positive definite. We can rewrite this quantity in the following way:
$$x^T A^T C A x = (Ax)^T C (Ax)$$
For y = Ax,
$$x^T A^T C A x = y^T C y,$$
which is positive whenever y ≠ 0, since C is positive definite. Because A has independent columns, Ax = 0 only for x = 0, so every x ≠ 0 gives y = Ax ≠ 0. Therefore x^T Sx > 0 for all x ≠ 0, and S = A^T CA is positive definite.
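This is easy to spot-check numerically (a minimal MATLAB sketch; the sizes and random matrices are arbitrary assumptions):

    % S = A'*C*A should be positive definite when C is positive definite
    % and A has independent columns.
    rng(0);
    B = randn(5);  C = B'*B + eye(5);   % a positive definite C
    A = randn(5, 3);                    % 5x3 with independent columns
    S = A'*C*A;
    disp(eig(S)')                       % all three eigenvalues > 0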
I.7 24 For F_1(x, y) = \frac{1}{4}x^4 + x^2y + y^2 and F_2(x, y) = x^3 + xy − x, find the second derivative matrix H_2. Test for minimum, find the saddle point of F_2.
$$H_2 = \begin{pmatrix}6x & 1\\1 & 0\end{pmatrix}$$
$$(6x - \lambda)(-\lambda) - 1 = 0$$
$$\lambda^2 - 6x\lambda - 1 = 0$$
$$\lambda = 3x \pm \sqrt{9x^2 + 1}$$
For H_2 to be positive definite, we need λ = 3x − √(9x² + 1) > 0, but this is never true because
$$\sqrt{9x^2 + 1} > \sqrt{9x^2} = 3|x| \ge 3x,$$
so this function fails the test for a minimum. (Equivalently, det H_2 = −1 < 0 for every x, so H_2 always has a negative eigenvalue.)
The saddle point is where
$$\frac{\partial F_2}{\partial x} = \frac{\partial F_2}{\partial y} = 0.$$
For F_2, this is 3x² + y − 1 = 0 and x = 0, which gives us (x, y) = (0, 1) as the saddle point.
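To double-check, evaluate the gradient and Hessian of F_2 at (0, 1) (a minimal MATLAB sketch):

    % At the saddle (0,1): the gradient of F2 = x^3 + x*y - x vanishes and
    % the Hessian [6x 1; 1 0] has one positive and one negative eigenvalue.
    x = 0;  y = 1;
    grad = [3*x^2 + y - 1; x];   % [0; 0]
    H2 = [6*x 1; 1 0];
    disp(eig(H2)')               % -1 and 1: a saddle, not a minimum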
I.7 26 Without multiplying, find:
(a) the determinant of S: multiply determinants. This gives 1*10*1 = 10.
(b) the eigenvalues of S: the values along the diagonal of the central matrix. These are
(2, 5).
(c) the eigenvectors of S: the columns of the outer matrices. This gives
$$x_1 = \begin{pmatrix}\cos\theta\\\sin\theta\end{pmatrix} \quad\text{and}\quad x_2 = \begin{pmatrix}-\sin\theta\\\cos\theta\end{pmatrix}.$$
(d) a reason why S is symmetric positive definite: the eigenvalues are all positive, and the
eigenvectors are orthogonal.
I.7 28 Suppose S is positive definite with eigenvalues λ1 ≥ λ2 . . . ≥ λn .
(a) What are the eigenvalues of λ1 I − S? Is it positive semidefinite?
If we take the eigenvectors v1, v2, ..., vn and multiply them by λ1 I − S we get
$$(\lambda_1 I - S)v_i = \lambda_1 v_i - S v_i = (\lambda_1 - \lambda_i)v_i$$
The new eigenvalues are then 0, λ1 − λ2, λ1 − λ3, ..., λ1 − λn, and λ1 I − S is positive semidefinite, since by the ordering of the λ's these are all greater than or equal to zero.
(b) How does it follow that λ1 xT x ≥ xT Sx for every x?
Since λ1 I − S is positive semidefinite, we have for all x
$$x^T(\lambda_1 I - S)x \ge 0$$
$$\lambda_1 x^T x - x^T S x \ge 0$$
$$\lambda_1 x^T x \ge x^T S x$$
(c) Draw this conclusion: The maximum value of xT Sx/xT x is λ1 .
Taking the inequality from (b) and dividing both sides by x^T x (for x ≠ 0), we get
$$\lambda_1 \ge \frac{x^T S x}{x^T x} \quad \text{for all } x \ne 0.$$
We can set x = v1 to get
$$\frac{v_1^T(Sv_1)}{v_1^T v_1} = \frac{v_1^T \lambda_1 v_1}{v_1^T v_1} = \lambda_1.$$
Since we have shown that x^T Sx/x^T x is no larger than λ1 for all possible x, and that it attains λ1 for at least one x, λ1 must be the maximum value.
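Numerically, the bound is easy to see (a minimal MATLAB sketch; the positive definite S here is an arbitrary example):

    % The Rayleigh quotient x'*S*x/(x'*x) never exceeds lambda_1.
    rng(5);
    B = randn(4);  S = B'*B;            % arbitrary symmetric positive definite S
    lam = sort(eig(S), 'descend');
    X = randn(4, 1e4);                  % many random directions x
    R = sum(X.*(S*X)) ./ sum(X.*X);     % Rayleigh quotient for each column
    disp([max(R), lam(1)])              % max over samples stays <= lambda_1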
I.8 7 What is the norm ||A − σ1 u1 v1T || when that largest rank one piece of A is removed? What
are all the singular values of this reduced matrix, and its rank?
Since we can write A as the sum Σ σ_i u_i v_i^T, the norm of A with its largest rank-one piece removed is the next singular value: ||A − σ1 u1 v1^T|| = σ2. The singular values of this reduced matrix are the same as those of A except that σ1 is replaced by 0, and its rank is rank(A) − 1.
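A numerical illustration (a minimal MATLAB sketch with an arbitrary random matrix):

    % Remove the largest rank-one piece and compare singular values.
    rng(1);
    A = randn(4, 6);
    [U, S, V] = svd(A);
    A1 = A - S(1,1)*U(:,1)*V(:,1)';   % strip sigma_1 u_1 v_1'
    disp(svd(A)')                     % sigma_1 >= ... >= sigma_4
    disp(svd(A1)')                    % same list with sigma_1 replaced by 0
    disp(norm(A1))                    % equals sigma_2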
I.8 8 Find the σ's and v's and u's, and verify that
$$A = \begin{pmatrix} 0 & 2 & 0 \\ 0 & 0 & 3 \\ 0 & 0 & 0 \end{pmatrix} = U\Sigma V^T$$
such that the orthogonal matrices U and V are permutation matrices.
A^T A = diag(0, 4, 9) has the eigenvalues 9, 4, 0. A therefore has the singular values σ1 = 3, σ2 = 2, σ3 = 0, with corresponding eigenvectors
$$v_1 = \begin{pmatrix}0\\0\\1\end{pmatrix},\quad v_2 = \begin{pmatrix}0\\1\\0\end{pmatrix},\quad v_3 = \begin{pmatrix}1\\0\\0\end{pmatrix}.$$
AA^T = diag(4, 9, 0) has the same eigenvalues but the eigenvectors are instead
$$u_1 = \begin{pmatrix}0\\1\\0\end{pmatrix},\quad u_2 = \begin{pmatrix}1\\0\\0\end{pmatrix},\quad u_3 = \begin{pmatrix}0\\0\\1\end{pmatrix}.$$
Putting this all together,
$$U = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix},\quad \Sigma = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{pmatrix},\quad V = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix},$$
and we can see U and V are indeed permutation matrices.
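The factorization takes one line to verify (a minimal MATLAB sketch):

    % Check A = U*Sigma*V' for the permutation-matrix SVD found above.
    A = [0 2 0; 0 0 3; 0 0 0];
    U = [0 1 0; 1 0 0; 0 0 1];
    Sig = diag([3 2 0]);
    V = [0 0 1; 0 1 0; 1 0 0];
    disp(norm(A - U*Sig*V'))   % 0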
Extra Here's my code (MATLAB) and the histogram: [code and histogram figure omitted]
18.065 Pset # 3 Elizabeth Chang-Davidson
I.8 4. A symmetric matrix S = S T has orthonormal eigenvectors v1 to vn . Then any vector x can
be written as c1 v1 + . . . + cn vn . The Rayleigh quotient R(x) can be written in the following
way.
$$R(x) = \frac{x^T S x}{x^T x} = \frac{\lambda_1 c_1^2 + \ldots + \lambda_n c_n^2}{c_1^2 + \ldots + c_n^2}$$
I.8 23. Show that an m by n matrix of rank r has r(m + n − r) free parameters in its SVD: A =
U ΣV T = (m × r)(r × r)(r × n). Why do r orthonormal vectors u1 to ur have (m − 1) + (m −
2) + . . . + (m − r) parameters?
U has m − 1 free parameters in its first column, because the column must be normalized. There are m − 2 for the second column, because it has to be normalized and orthogonal to the first one. There are m − 3 for the third, because it has to be normalized and orthogonal to the first two. This pattern continues until you get m − r for the rth column, so u_1 to u_r carry (m − 1) + (m − 2) + ... + (m − r) parameters. This adds up to mr − ½r(r + 1) parameters overall for U.
By the identical argument with n in place of m, V has nr − ½r(r + 1) parameters.
Σ contributes r parameters (the singular values down its diagonal).
Putting this all together, we get
$$mr - \tfrac{1}{2}r(r+1) + r + nr - \tfrac{1}{2}r(r+1) = mr + nr + r - r^2 - r = r(m + n - r).$$
I.9 1. What are the singular values (in descending order) of A − A_k? Omit any zeros. We know we can write A as
$$A = \sigma_1 u_1 v_1^T + \ldots + \sigma_r u_r v_r^T$$
and that we can write A_k as
$$A_k = \sigma_1 u_1 v_1^T + \ldots + \sigma_k u_k v_k^T.$$
Subtracting leaves A − A_k = σ_{k+1} u_{k+1} v_{k+1}^T + ... + σ_r u_r v_r^T, so the singular values of A − A_k are σ_{k+1} ≥ σ_{k+2} ≥ ... ≥ σ_r.
I.9 2. Find a closest rank-1 approximation to these matrices (L2 or Frobenius norm).
The closest rank-1 approximation of a matrix is A1 = σ1 u1 v1T . Therefore, we have the
following approximations:
$$A = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
$$A_1 = 3\begin{pmatrix}1\\0\\0\end{pmatrix}\begin{pmatrix}1 & 0 & 0\end{pmatrix} = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$
$$A = \begin{pmatrix} 0 & 3 \\ 2 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$
$$A_1 = 3\begin{pmatrix}1\\0\end{pmatrix}\begin{pmatrix}0 & 1\end{pmatrix} = \begin{pmatrix} 0 & 3 \\ 0 & 0 \end{pmatrix}$$
$$A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}^T$$
$$A_1 = \frac{3}{2}\begin{pmatrix}1\\1\end{pmatrix}\begin{pmatrix}1 & 1\end{pmatrix} = \frac{3}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$$
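Each answer can be confirmed from the SVD directly (a minimal MATLAB sketch, shown for the third matrix):

    % Closest rank-1 approximation: A1 = sigma_1 * u1 * v1'.
    A = [2 1; 1 2];
    [U, S, V] = svd(A);
    A1 = S(1,1)*U(:,1)*V(:,1)';
    disp(A1)                     % [1.5 1.5; 1.5 1.5]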
I.9 10. If A is a 2 × 2 matrix with σ1 ≥ σ2 > 0, find ||A−1 ||2 and ||A−1 ||2F .
A^{-1} = VΣ^{-1}U^T has the same singular vectors as A (with the roles of U and V exchanged), and its singular values are the reciprocals 1/σ1 and 1/σ2. The second of these is the larger, so in descending order the singular values of A^{-1} are 1/σ2 and 1/σ1. Therefore
$$||A^{-1}||_2 = \frac{1}{\sigma_2}$$
$$||A^{-1}||_F^2 = \frac{1}{\sigma_2^2} + \frac{1}{\sigma_1^2}$$
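A quick check (a minimal MATLAB sketch with an arbitrary invertible 2 × 2 matrix):

    % norm(inv(A)) = 1/sigma_2 and norm(inv(A),'fro')^2 = 1/sigma_1^2 + 1/sigma_2^2.
    A = [3 1; 0 2];
    s = svd(A);                                  % s(1) = sigma_1 >= s(2) = sigma_2
    disp([norm(inv(A)), 1/s(2)])                 % equal
    disp([norm(inv(A), 'fro')^2, sum(1./s.^2)])  % equal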
I.11 1. Show directly this fact about vector norms: ||v||_2^2 ≤ ||v||_1 ||v||_∞
Let v_k be a component of largest magnitude, so |v_k| = ||v||_∞. We then have |v_i| ≤ |v_k| for all i, and therefore |v_i|^2 ≤ |v_i| |v_k| for all i. Summing over i, every term of ||v||_2^2 = Σ|v_i|^2 is less than or equal to the corresponding term of ||v||_1 ||v||_∞ = Σ|v_i||v_k|, which proves the inequality.
I.11 3. Show that always ||v||_2 ≤ √n ||v||_∞. Also prove ||v||_1 ≤ √n ||v||_2 by choosing a suitable vector w and applying the Cauchy–Schwarz inequality.
Let us again take |v_k| = ||v||_∞. We then have that
$$n||v||_\infty^2 = n|v_k|^2.$$
Each term of ||v||_2^2 satisfies |v_i|^2 ≤ |v_k|^2, and there are n such terms, so ||v||_2^2 ≤ n||v||_∞^2; taking the square root of each side gives the inequality.
For the second half, choose w = (1, 1, ..., 1), with ||w||_2 = √n. Applying Cauchy–Schwarz to w and the vector of absolute values (|v_1|, ..., |v_n|):
$$||v||_1 = \sum_{i=1}^{n} |v_i| \cdot 1 \le \Big(\sum_{i=1}^{n} |v_i|^2\Big)^{1/2}\Big(\sum_{i=1}^{n} 1^2\Big)^{1/2} = \sqrt{n}\,||v||_2$$
Equivalently, ||v||_1^2 ≤ n||v||_2^2, so ||v||_1 ≤ √n ||v||_2.
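All three norm inequalities are easy to spot-check (a minimal MATLAB sketch):

    % Check ||v||_2^2 <= ||v||_1*||v||_inf, ||v||_2 <= sqrt(n)*||v||_inf,
    % and ||v||_1 <= sqrt(n)*||v||_2 on a random vector.
    rng(2);  n = 7;  v = randn(n, 1);
    fprintf('%g <= %g\n', norm(v,2)^2, norm(v,1)*norm(v,Inf));
    fprintf('%g <= %g\n', norm(v,2), sqrt(n)*norm(v,Inf));
    fprintf('%g <= %g\n', norm(v,1), sqrt(n)*norm(v,2));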
A. Find the sample covariance matrix S = AA^T/3 and find the line through (0, 0, 0) that is closest to the four columns (from the SVD of A).
$$S = \frac{1}{3}\begin{pmatrix} 2 & -1 & 1 \\ -1 & 10 & -3 \\ 1 & -3 & 2 \end{pmatrix}$$
The line through (0, 0, 0) that is closest to the four columns of A is in the direction of u1, which we find from the SVD to be u1 = (−0.1371, 0.9370, −0.3213)^T. The equation for the line is then (in parametric and then nonparametric format): (x, y, z) = t u1, with normal direction
$$n = (-0.1371, 0.9370, -0.3213) \times (-0.8716, -0.2683, -0.4103) = (-0.4706, 0.2238, 0.8535).$$
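The computation itself is short (a minimal MATLAB sketch; the random A below is only a stand-in, since the problem's 3 × 4 data matrix is not reproduced here):

    % Sample covariance and the closest line through the origin.
    rng(6);
    A = randn(3, 4);     % stand-in for the problem's 3x4 data matrix
    S = A*A'/3;          % sample covariance matrix
    [U, ~, ~] = svd(A);
    u1 = U(:,1)          % direction of the closest line through (0,0,0)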
C. Comparing with Fig I.16, what shapes (with rough sketches) show these three sets in 3D?
The set ||v||_1 ≤ 1 is an octahedron, ||v||_2 ≤ 1 is a sphere (ball), and ||v||_∞ ≤ 1 is a cube. (Sketches omitted.)
D. If you blow up those 3 sets, where will they touch the plane v1 + 2v2 + 5v3 = 1? Your 3 points
will be the smallest solutions (in those 3 norms) to that linear equation.
The points are (0, 0, 0.2), (1/30, 1/15, 1/6), and (0.125, 0.125, 0.125) in the ||v||_1, ||v||_2, and ||v||_∞ norms respectively.
18.065 Pset # 4 Elizabeth Chang-Davidson
II.2 2. Why do A and A+ have the same rank? If A is square, do A and A+ have the same
eigenvectors? What are the eigenvalues of A+ ?
We know that
$$A^+ = V\Sigma^+U^T = \sum_{i=1}^{r} \frac{v_i u_i^T}{\sigma_i}.$$
This is a sum of r rank-one pieces, just as A = Σ σ_i u_i v_i^T is, so A and A^+ both have rank r. If A is square and Av_i = λ_i v_i with λ_i ≠ 0, then
$$A^+(Av_i) = A^+(\lambda_i v_i)$$
$$A^+Av_i = v_i = \lambda_i A^+ v_i$$
$$\frac{1}{\lambda_i} v_i = A^+ v_i$$
Therefore, they have the same eigenvectors, and the eigenvalues of A^+ are the reciprocals of the nonzero eigenvalues of A (an eigenvalue λ = 0 of A stays 0 for A^+).
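A numerical illustration with pinv (a minimal MATLAB sketch; the symmetric example keeps the eigenvector claim uncomplicated):

    % For this A, pinv(A) shares the eigenvectors and inverts the eigenvalues.
    A = [2 1; 1 2];
    disp(eig(A)')                     % 1 and 3
    disp(eig(pinv(A))')               % 1/3 and 1: the reciprocals
    disp([rank(A), rank(pinv(A))])    % equal ranks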
II.2 3. From A and A+ show that A+ A is correct and that (A+ A)2 = A+ A = projection.
$$A = \sum \sigma_i u_i v_i^T \qquad A^+ = \sum \frac{v_i u_i^T}{\sigma_i}$$
Because the u_i and the v_i are orthonormal, only the terms with the same index survive when multiplying; everything else goes to zero. Also, u_i^T u_i = ||u_i||^2 = 1. So
$$A^+A = \sum \frac{v_i u_i^T}{\sigma_i}\,\sum \sigma_i u_i v_i^T = \sum v_i u_i^T u_i v_i^T = \sum v_i v_i^T,$$
which matches what we are given. In the second half, we can use orthonormality again to get v_i^T v_i = ||v_i||^2 = 1:
$$(A^+A)^2 = \sum v_i v_i^T v_i v_i^T = \sum v_i v_i^T = A^+A.$$
$$(\Sigma_r^2)^{-1}\Sigma_r = \begin{pmatrix}1/\sigma_1^2 & & & \\ & 1/\sigma_2^2 & & \\ & & \ddots & \\ & & & 1/\sigma_r^2\end{pmatrix}\begin{pmatrix}\sigma_1 & & & \\ & \sigma_2 & & \\ & & \ddots & \\ & & & \sigma_r\end{pmatrix} = \begin{pmatrix}1/\sigma_1 & & & \\ & 1/\sigma_2 & & \\ & & \ddots & \\ & & & 1/\sigma_r\end{pmatrix}$$
(ΣT Σ)−1 ΣT is therefore equal to Σ+ , since this is the same result as if you had transposed
and then inverted the nonzero diagonals.
(d) Substitute A = UΣV^T into (A^TA)^{-1}A^T and identify that matrix as A^+.
$$(A^TA)^{-1}A^T = (V\Sigma^T\Sigma V^T)^{-1}V\Sigma^T U^T = V(\Sigma^T\Sigma)^{-1}\Sigma^T U^T = V\Sigma^+U^T,$$
which is indeed A^+.
" √1 #
1 1
q1 = a/||a|| = √ = 12
2 1 √
2
" #
√1
q2 = A2 /||A2 || = 2
− √12
√ " √1 # √ √
√1
1 4 ||a|| 2 2 2 2√2
= q1 q2 = √12 2
1 0 0 ||A2 || 2
− √12 0 2 2
" 1 1
#
√ √
Q= 2 2
√1 − √12
2
√ √
2 2√2
R=
0 2 2
II.2 11. If QT Q = I show that QT = Q+ . If A = QR for invertible R, show that QQT = AA+ .
Q^T satisfies the four Penrose conditions that define Q^+:
$$QQ^TQ = Q(Q^TQ) = Q \qquad Q^TQQ^T = (Q^TQ)Q^T = Q^T$$
$$(QQ^T)^T = QQ^T \qquad (Q^TQ)^T = I^T = I = Q^TQ$$
Therefore Q^T = Q^+. If A = QR with R invertible, then A has independent columns, so
$$A^+ = (A^TA)^{-1}A^T = (R^TQ^TQR)^{-1}R^TQ^T = R^{-1}(R^T)^{-1}R^TQ^T = R^{-1}Q^T,$$
and AA^+ = QRR^{-1}Q^T = QQ^T.
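A numeric spot-check (a minimal MATLAB sketch; Q comes from an economy-size qr of an arbitrary tall matrix):

    % pinv(Q) = Q' when Q'*Q = I, and A*pinv(A) = Q*Q' for A = Q*R.
    rng(3);
    [Q, ~] = qr(randn(4, 2), 0);   % 4x2 Q with orthonormal columns
    disp(norm(pinv(Q) - Q'))       % ~0
    R = [2 1; 0 3];                % an invertible upper triangular R
    A = Q*R;
    disp(norm(A*pinv(A) - Q*Q'))   % ~0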
II.2 12. With b = (0, 8, 8, 20) at t = (0, 1, 3, 4), set up and solve the normal equations A^TAx̂ = A^Tb. For the best straight line in Figure II.3a, find its four heights p_i and four errors e_i. What is the minimum squared error E = e_1^2 + e_2^2 + e_3^2 + e_4^2?
$$A^TA\hat{x} = \begin{pmatrix}4 & 8\\8 & 26\end{pmatrix}\begin{pmatrix}\hat{C}\\\hat{D}\end{pmatrix} = \begin{pmatrix}36\\112\end{pmatrix} = A^Tb$$
$$\hat{x} = \begin{pmatrix}\hat{C}\\\hat{D}\end{pmatrix} = \begin{pmatrix}4 & 8\\8 & 26\end{pmatrix}^{-1}\begin{pmatrix}36\\112\end{pmatrix} = \frac{1}{20}\begin{pmatrix}13 & -4\\-4 & 2\end{pmatrix}\begin{pmatrix}36\\112\end{pmatrix} = \begin{pmatrix}1\\4\end{pmatrix}$$
p_i = Ĉ + D̂t_i, so the heights are p = (1, 5, 13, 17). The errors are e = b − p = (−1, 3, −5, 3), for a total minimum squared error of E = 1 + 9 + 25 + 9 = 44.
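The same numbers come out of MATLAB's backslash (a minimal sketch):

    % Best line C + D*t: solve the normal equations A'*A*xhat = A'*b.
    t = [0 1 3 4]';  b = [0 8 8 20]';
    A = [ones(4,1) t];
    xhat = (A'*A) \ (A'*b)   % C = 1, D = 4
    p = A*xhat;              % heights 1, 5, 13, 17
    e = b - p;               % errors -1, 3, -5, 3
    E = e'*e                 % minimum squared error 44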
II.2 22. The averages of the t_i and b_i are t̄ = 2 and b̄ = 9. Verify that Ĉ + D̂t̄ = b̄. Explain!
$$\hat C + \hat D\,\bar t = 1 + 4 \cdot 2 = 9 = \bar b$$
Intuitively, we are weighting all the errors equally, so the best line should go through the average point (t̄, b̄) and rotate from there. More quantitatively, the first equation from A^TAx̂ = A^Tb gives us
$$m\hat C + \hat D\sum t_i = \sum b_i.$$
If we divide through by m, this gives us
$$\hat C + \hat D\,\frac{1}{m}\sum t_i = \frac{1}{m}\sum b_i,$$
which is exactly the relation Ĉ + D̂t̄ = b̄.
Comp. Q. Create a random 6 by 10 matrix. (You can choose the definition of random) Find its SVD
and its pseudoinverse.
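A minimal MATLAB sketch of this computation (taking "random" to mean Gaussian, which is one valid choice):

    % Random 6x10 matrix, its SVD, and its pseudoinverse.
    rng(4);
    A = randn(6, 10);
    [U, S, V] = svd(A);                 % A = U*S*V'
    Aplus = pinv(A);                    % pseudoinverse
    disp(norm(Aplus - V*pinv(S)*U'))    % ~0: pinv(A) = V*Sigma^+*U'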
18.065 Pset # 6 Elizabeth Chang-Davidson
II.4 2 (for functions) Given a(x) > 0 find p(x) > 0 by analogy with problem 1, so that
$$\int_0^1 p(x)\,dx = 1 \qquad \text{and} \qquad \int_0^1 \frac{(a(x))^2}{p(x)}\,dx \ \text{is a minimum}$$
$$L(p(x), \lambda) = \int_0^1 \frac{(a(x))^2}{p(x)}\,dx + \lambda\left(\int_0^1 p(x)\,dx - 1\right) = \int_0^1 \left[\frac{(a(x))^2}{p(x)} + \lambda p(x)\right]dx - \lambda$$
$$\frac{\partial L}{\partial p(x)} = 0 = -\frac{(a(x))^2}{(p(x))^2} + \lambda$$
$$\frac{(a(x))^2}{(p(x))^2} = \lambda$$
$$p(x) = \frac{a(x)}{\sqrt{\lambda}} = \frac{a(x)}{C}$$
The constraint ∫₀¹ p(x) dx = 1 then fixes the constant: C = ∫₀¹ a(x) dx.
II.4 3 Prove that n(a_1^2 + ... + a_n^2) ≥ (a_1 + ... + a_n)^2. This is problem 1 with p_i = 1/n. Back in Problem Set I.11 you proved that ||a||_1 ≤ √n ||a||_2.
We take p_i = 1/n and so
$$V = \sum_{i=1}^n \frac{a_i^2}{p_i} = \sum_{i=1}^n n\,a_i^2 = n\sum_{i=1}^n a_i^2,$$
which is the left side of our equation. In problem 1, we saw that choosing the p_i correctly gives the minimum value of V, which is (∑ a_i)^2. Since this is a minimum, any other choice of the p_i must yield a value of V greater than or equal to that value, which gives us
$$n\sum_{i=1}^n a_i^2 \ge \Big(\sum_{i=1}^n a_i\Big)^2.$$
II.4 4 If M = 11T is the n × n matrix of 1s, prove that nI − M is positive semidefinite. Problem 3
was the energy test. For Problem 4, find the eigenvalues of nI − M .
The eigenvalues of nI − M are the solutions to det(nI − M − λI) = 0. M = 11^T has eigenvalues n (eigenvector 1) and 0 repeated n − 1 times (eigenvectors orthogonal to 1), so nI − M has eigenvalues n − n = 0 and n − 0 = n repeated n − 1 times. All eigenvalues are ≥ 0, so nI − M is positive semidefinite.
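A quick check for one value of n (a minimal MATLAB sketch):

    % Eigenvalues of n*I - ones(n): one 0 and (n-1) copies of n.
    n = 5;
    disp(eig(n*eye(n) - ones(n))')   % 0, 5, 5, 5, 5  (all >= 0)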
II.4 6 The variance computed in equation 7 cannot be negative! Show this directly:
$$||AB||_F^2 \le \Big(\sum_k ||a_k||\,||b_k^T||\Big)^2$$
Write AB = Σ_k a_k b_k^T as a sum of rank-one pieces (column k of A times row k of B). The triangle inequality for the Frobenius norm gives
$$||AB||_F \le \sum_k ||a_k b_k^T||_F = \sum_k ||a_k||\,||b_k^T||,$$
since ||a_k b_k^T||_F = ||a_k|| ||b_k^T||. Squaring both sides gives the inequality.
Computational Problem
Take a matrix A where A is 0 below the diagonal and 1 above and on the diagonal and is of order
1000 by 1000. Compare the actual SVD of A to the randomized SVD of A reduced to Y = AG:
the first Gaussian random matrix G is 1000 by 10 and the second G is 1000 by 100.
After implementing this algorithm, I found that the Frobenius norm of the difference between the singular value matrix found by randomization and the actual singular value matrix was 3.0774 × 10^{-12} when using a 1000 × 10 G and 2.9577 × 10^{-12} when using a 1000 × 100 G, showing that this is a very accurate method.
1. Y = AG (sample the column space of A with the Gaussian random matrix G)
2. Factor Y = QR (Q gives an orthonormal basis for the range of Y)
3. B = Q^T A (a small matrix with as many rows as G has columns)
4. Compute the SVD of B: B = ŨΣV^T
5. Then A ≈ QB = (QŨ)ΣV^T, so U = QŨ and the σ's of B approximate the leading σ's of A
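A minimal MATLAB sketch of the experiment (the sketch width k plays the role of the 10 or 100 columns of G; the final comparison mirrors the one described above):

    % Randomized SVD of the upper triangular matrix of ones vs. the true SVD.
    n = 1000;  k = 100;
    A = triu(ones(n));            % 1 on and above the diagonal, 0 below
    G = randn(n, k);              % Gaussian random sketch
    Y = A*G;
    [Q, ~] = qr(Y, 0);            % orthonormal basis for the range of Y
    B = Q'*A;                     % small k x n matrix
    [Ut, S, V] = svd(B, 'econ');
    U = Q*Ut;                     % approximate left singular vectors of A
    s = svd(A);
    disp(norm(diag(S) - s(1:k)))  % leading singular values nearly match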