Solns Recitation5-6 Fall24
1. Go through the demo from lecture 7 on visualizing the SVD basis: https://canvas.uchicago.edu/courses/59515/files/11965539?wrap=1. Note that the 3D plots' aspect ratios may be a bit off, so you can try inserting ax.set_box_aspect(aspect=(1, 1, 1)) into the plotting code so that the columns of U from the SVD look more obviously orthogonal.
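If it helps to see where that call fits, here is a minimal, self-contained 3D plotting sketch; the random data and figure setup are placeholders, not the demo's actual code.

import numpy as np
import matplotlib.pyplot as plt

# Placeholder data: the columns of U from the SVD of a random 3 x 3 matrix.
rng = np.random.default_rng(0)
U, S, Vt = np.linalg.svd(rng.standard_normal((3, 3)))

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.set_box_aspect(aspect=(1, 1, 1))   # equal axis scaling so right angles look right
for i in range(3):
    ax.quiver(0, 0, 0, U[0, i], U[1, i], U[2, i])
plt.show()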
2. Pseudoinverse and the minimum-norm least squares solution. Prove that for the case of $X \in \mathbb{R}^{n \times p}$, $p \gg n$, if $X$ has linearly independent rows then the choice $\hat{w} = V \Sigma^\dagger U^\top y$ has the smallest $\ell_2$ norm of any $w$ solving $\min_w \|Xw - y\|_2^2$. Recall that $\Sigma^\dagger = \Sigma^\top (\Sigma \Sigma^\top)^{-1}$, so
$$\hat{w} = V \Sigma^\top (\Sigma \Sigma^\top)^{-1} U^\top y = X^\top (X X^\top)^{-1} y.$$
SOLUTION: Let $w$ be any minimizer of $\|Xw - y\|_2^2$ and write $w = \hat{w} + \Delta w$ with $\Delta w = w - \hat{w}$. The cross term between $\hat{w}$ and $\Delta w$ vanishes:
$$\hat{w}^\top \Delta w = y^\top (X X^\top)^{-1} X (w - \hat{w}) = y^\top (X X^\top)^{-1} (Xw - X\hat{w}) = y^\top (X X^\top)^{-1} (y - y) = 0,$$
where the second-to-last substitution occurs because both $w$ and $\hat{w}$ are solutions to $Xw = y$. Note that the perfect solution with zero residual exists because we assumed $X$ has full row rank, so $y$ lives in the column space of $X$. However, even when $y$ does not live in the column space of $X$, it is still the case that $Xw = \hat{y}$ for all least-squares solutions $w$, where $\hat{y}$ is the projection of $y$ onto the column space of $X$. This means the difference $Xw - X\hat{w}$ is zero in either case, and thus $\hat{w}^\top \Delta w = 0$.
Even though there are possible degeneracies in the value of $w$ responsible for a given value of $Xw$, the best value for $Xw$ is $\hat{y} = \mathrm{proj}_{\operatorname{col}(X)}\, y$, the projection of $y$ onto the column space of $X$ (in this specific case $y = \hat{y}$, though we do not necessarily need to rely on this). So, $X(w - \hat{w}) = X \Delta w = 0$. With that, we can expand the squared norm of $w$ to find
$$\|w\|_2^2 = \|\hat{w} + \Delta w\|_2^2 = \|\hat{w}\|_2^2 + 2\,\hat{w}^\top \Delta w + \|\Delta w\|_2^2 = \|\hat{w}\|_2^2 + \|\Delta w\|_2^2 \ge \|\hat{w}\|_2^2.$$
In particular, the last inequality becomes strict, $\|w\|_2^2 > \|\hat{w}\|_2^2$, when $\Delta w \neq 0$.
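As a quick numerical sanity check of the minimum-norm property, the sketch below compares the pseudoinverse solution against another exact solution obtained by adding a null-space component; the random sizes and data are arbitrary choices for illustration.

import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 20                          # wide X with p >> n; full row rank almost surely
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

w_hat = np.linalg.pinv(X) @ y         # the minimum-norm solution V Sigma^+ U^T y

# Construct a different exact solution by adding a vector from the null space of X.
d = rng.standard_normal(p)
d -= X.T @ np.linalg.solve(X @ X.T, X @ d)   # remove the row-space component of d
w_other = w_hat + d

print(np.allclose(X @ w_hat, y), np.allclose(X @ w_other, y))   # both solve Xw = y
print(np.linalg.norm(w_hat) <= np.linalg.norm(w_other))         # True: w_hat has the smaller norm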
3. Frobenius matrix norm. The Frobenius matrix norm is defined on any matrix $A \in \mathbb{R}^{n \times p}$ as
$$\|A\|_F = \Big( \sum_{i=1}^{n} \sum_{j=1}^{p} A_{ij}^2 \Big)^{1/2}.$$
Here, we will show that the squared Frobenius norm of a matrix is equal to the sum of its squared singular values, in other words
$$\|A\|_F^2 = \sum_{i=1}^{r} \sigma_i^2,$$
where $r$ is the rank of $A$.
a) The trace of a square matrix is defined as the sum of the elements on its diagonal,
$$\operatorname{tr}(X) = \sum_{i=1}^{n} X_{ii}.$$
Show that the trace is cyclic, that is, for matrices $X \in \mathbb{R}^{n \times p}$ and $Y \in \mathbb{R}^{p \times n}$, $\operatorname{tr}(XY) = \operatorname{tr}(YX)$. (Note that $X$ and $Y$ need not be square, but $XY$ and $YX$ are square.)
SOLUTION: Using the index-based definition of matrix multiplication, the $i$-th diagonal entry of the product is $(XY)_{ii} = \sum_{j=1}^{p} X_{ij} Y_{ji}$, so
$$\operatorname{tr}(XY) = \sum_{i=1}^{n} \sum_{j=1}^{p} X_{ij} Y_{ji} = \sum_{j=1}^{p} \sum_{i=1}^{n} Y_{ji} X_{ij} = \sum_{j=1}^{p} (YX)_{jj} = \operatorname{tr}(YX).$$
An alternate explanation that does not rely on the index-based definition of matrix multiplication is to take $A = X^\top$, $B = Y$ so that we are interested in $\operatorname{tr}(A^\top B)$. We can write the matrices in terms of their columns,
$$A = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} b_1 & b_2 & \cdots & b_n \end{bmatrix},$$
so that
$$\operatorname{tr}(A^\top B) = \operatorname{tr}\begin{bmatrix} a_1^\top b_1 & a_1^\top b_2 & \cdots & a_1^\top b_n \\ a_2^\top b_1 & a_2^\top b_2 & \cdots & a_2^\top b_n \\ \vdots & \vdots & \ddots & \vdots \\ a_n^\top b_1 & a_n^\top b_2 & \cdots & a_n^\top b_n \end{bmatrix} = a_1^\top b_1 + a_2^\top b_2 + \cdots + a_n^\top b_n,$$
which basically corresponds to the standard dot product of $A$ and $B$ if we had first stacked each of the columns on top of each other to make each into a tall $np \times 1$ vector.
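Both arguments are easy to verify numerically; a quick sketch with arbitrarily chosen shapes:

import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 7))   # n x p
Y = rng.standard_normal((7, 4))   # p x n
print(np.trace(X @ Y), np.trace(Y @ X))   # equal up to floating-point error
print(np.sum(X * Y.T))                    # the "stacked columns" dot-product view of tr(XY)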
b) Show that the squared Frobenius norm satisfies $\|A\|_F^2 = \operatorname{tr}(A^\top A)$.
SOLUTION: The $j$-th diagonal entry of $A^\top A$ is $(A^\top A)_{jj} = \sum_{i} A_{ij}^2$, so
$$\operatorname{tr}(A^\top A) = \sum_{j} \sum_{i} A_{ij}^2 = \|A\|_F^2,$$
and by part a) this is also equal to $\operatorname{tr}(A A^\top)$.
c) Let A = U ΣV ⊤ be the SVD of A. Use parts a and b to conclude that the squared
Frobenius norm of A is the sum of its squared singular values.
SOLUTION:
$$\begin{aligned}
\|A\|_F^2 &= \sum_{i,j} A_{ij}^2 \\
&= \operatorname{tr}(A^\top A) \\
&= \operatorname{tr}(V \Sigma^\top U^\top U \Sigma V^\top) \\
&= \operatorname{tr}(V \Sigma^\top \Sigma V^\top) \\
&= \operatorname{tr}(V^\top V \Sigma^\top \Sigma) \\
&= \operatorname{tr}(\Sigma^\top \Sigma) \\
&= \operatorname{tr}\begin{bmatrix} \sigma_1^2 & & & \\ & \ddots & & \\ & & \sigma_r^2 & \\ & & & 0 \end{bmatrix} \\
&= \sum_{i=1}^{r} \sigma_i^2,
\end{aligned}$$
where the second equality uses part b), the fifth uses the cyclic property from part a), and $\Sigma^\top \Sigma$ is a $p \times p$ diagonal matrix whose nonzero entries are the squared singular values.
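A quick numerical check of the identity (the random matrix is arbitrary):

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 8))
sigma = np.linalg.svd(A, compute_uv=False)
print(np.sum(A**2), np.trace(A.T @ A), np.sum(sigma**2))   # all equal up to floating point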
4. Work through the basics of PCA with the students from Eldén 6.4.
5. PCR. In class and above, we looked at PCA and the closely related idea of using the SVD to
perform dimensionality reduction. When solving a regression problem, one approach you can
use is called Principal Components Regression (PCR). The idea here is that you are given
training samples (xi , yi ) where xi ∈ Rp for i = 1, . . . , n. We are going to find a reduced-
dimension version of each xi , denoted zi ∈ Rk for some k < p, and then we are going to solve
least squares on the (zi , yi ) pairs instead of on the (xi , yi ) pairs. You will experiment with
PCR on the MNIST dataset, a well-known handwritten digit recognition dataset. A small data subset is provided in mnist.mat. It contains two types of digits (1 and 4), and each digit has 100 training images and 100 testing images. Each image xi has 28 × 28 pixels, which we can flatten to form a p = 784-dimensional feature vector; the corresponding target is yi ∈ {−1, 1}. Since the raw data lives in a high-dimensional space, we first use PCA to compress it into a lower-dimensional representation in which the decision boundary is easy to draw.
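The solution snippets below assume that the data arrays (train_data, train_target, test_data, test_target) and the truncated-SVD quantities (vh and the reduced coordinates z) have already been constructed, roughly as in the sketch below; the field names read from mnist.mat are assumptions and may need adjusting to match the file.

import numpy as np
from scipy.io import loadmat

mat = loadmat("mnist.mat")
# Assumed field names and shapes; adjust to match the actual contents of mnist.mat.
train_data = mat["train_data"].reshape(-1, 784).astype(np.float64)   # 200 x 784
test_data = mat["test_data"].reshape(-1, 784).astype(np.float64)     # 200 x 784
train_target = mat["train_target"]   # shape (1, 200), labels in {-1, +1}
test_target = mat["test_target"]     # shape (1, 200), labels in {-1, +1}

# Truncated SVD of the training data; keep the top two right singular vectors.
u, s, vh = np.linalg.svd(train_data, full_matrices=False)
z = (vh[:2, :] @ train_data.T).T     # n x 2 reduced representation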
a) Let zi be the 2-dimensional representation of xi (found using the truncated SVD), and
find a weight w so that Zw ≈ y. What is ∥y − Zw∥22 (training loss)? Given test
samples x′i and yi′ , compute 2-dimensional zi′ ’s, and measure ∥y ′ − Z ′ w∥22 (test loss).
SOLUTION:
# Least-squares weight on the 2-dimensional training representation z (n x 2).
w = np.linalg.inv(z.T @ z) @ z.T @ train_target[0]

# Classify by the sign of z @ w, mapping {0, 1} to {-1, +1}.
y_pred = (z @ w >= 0).astype(np.int32)
y_pred[y_pred <= 0] = -1
print("training accuracy: {}".format((y_pred == train_target[0]).mean()))

# Project the test images onto the same top-2 right singular vectors.
z_test = vh[:2, :] @ test_data.T
z_test = z_test.T
y_pred_test = (z_test @ w >= 0).astype(np.int32)
y_pred_test[y_pred_test <= 0] = -1
print("testing accuracy: {}".format((y_pred_test == test_target[0]).mean()))
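The code above reports classification accuracy; the quantities the problem actually asks for, the squared-error training and test losses, follow directly from the same variables:

train_loss = np.sum((train_target[0] - z @ w) ** 2)        # ||y - Zw||_2^2
test_loss = np.sum((test_target[0] - z_test @ w) ** 2)     # ||y' - Z'w||_2^2
print("training loss: {:.3f}, test loss: {:.3f}".format(train_loss, test_loss))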
b) Plot the classification decision boundary in 2d space. (Hint: the plot should look some-
thing like figure 2.)
SOLUTION:
Figure 2: Reference for what the plot should roughly look like. Here, the decision boundary is not required
to pass through the origin (i.e., we have added a bias term to the data matrix). Each point is one sample,
and the color is determined by its digit label. The axes are given by the first two principal component
directions, and the coordinates are given by the principal components.
import matplotlib.pyplot as plt

# Grid of x-coordinates spanning the range of the first principal component.
point_num = 1000
zmin = z.min(axis=0)
zmax = z.max(axis=0)
point_x = (np.arange(point_num).astype(np.float32) / point_num
           * (zmax[0] - zmin[0]) + zmin[0])

# Decision boundary w[0] * x + w[1] * y = 0, i.e. y = -x * w[0] / w[1].
point_y = -1 * point_x * w[0] / w[1]
idx = np.logical_and(point_y < zmax[1], point_y >= zmin[1])

# The first 100 samples are one digit class, the remaining 100 are the other.
plt.scatter(z[:100, 0], z[:100, 1], c='b')
plt.scatter(z[100:, 0], z[100:, 1], c='y')
plt.scatter(point_x[idx], point_y[idx], s=1, color='red')
plt.savefig('so4.pdf')
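The boundary drawn above passes through the origin; to reproduce the offset boundary described in the Figure 2 caption, one option (a sketch, not necessarily how the reference figure was produced) is to append a bias column of ones to the reduced data and refit:

# Append a column of ones so the fitted boundary need not pass through the origin.
z_b = np.hstack([z, np.ones((z.shape[0], 1))])
w_b = np.linalg.inv(z_b.T @ z_b) @ z_b.T @ train_target[0]
# Boundary: w_b[0] * x + w_b[1] * y + w_b[2] = 0.
point_y_b = -(point_x * w_b[0] + w_b[2]) / w_b[1]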