5106 hw3
(Thursday, September 8)
Due: Thursday, September 15
1. Write a multilinreg program in Matlab to solve the following regression problem: find the least-squares estimate
$$\hat{b} = \arg\min_{b} \| y - Xb \|^2$$
where
$$X = \begin{bmatrix} 5 & 0 & 9 & 3 \\ 3 & 6 & 8 & 9 \\ 4 & 4 & 9 & 6 \\ 0 & 3 & 1 & 8 \\ 2 & 8 & 2 & 3 \end{bmatrix} \quad \text{and} \quad y = \begin{bmatrix} 20 \\ 17 \\ 32 \\ 10 \\ 12 \end{bmatrix}.$$
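As a cross-check for the Matlab program, the same least-squares problem can be sketched in Python with NumPy. This is a minimal sketch, not the required solution: the function name multilinreg is reused here only for illustration, and the returned SSE is an extra convenience that the problem does not ask for. In Matlab, the backslash operator (X \ y) solves the same overdetermined system in the least-squares sense.

```python
import numpy as np

# Data from Problem 1.
X = np.array([
    [5, 0, 9, 3],
    [3, 6, 8, 9],
    [4, 4, 9, 6],
    [0, 3, 1, 8],
    [2, 8, 2, 3],
], dtype=float)
y = np.array([20, 17, 32, 10, 12], dtype=float)


def multilinreg(X, y):
    """Return the least-squares coefficients and the sum of squared errors.

    Illustrative helper (name borrowed from the problem statement).
    """
    # lstsq minimizes ||y - X b||^2 without forming X^T X explicitly.
    b_hat, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ b_hat
    return b_hat, float(residual @ residual)


b_hat, sse = multilinreg(X, y)
print("b_hat =", b_hat)
print("SSE   =", sse)
```

A quick sanity check on any candidate solution is that the residual y - X b_hat is orthogonal to every column of X, i.e. X' * (y - X * b_hat) is numerically zero.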
2. In this problem we will find and display the principal direction of a 2D dataset using Matlab.
$$x = \begin{bmatrix} 15 & 16 & 12 & 14 & 13 & 15 & 16 & 21 & 12 & 11 & 19 & 14 & 13 & 14 & 16 & 17 & 12 & 16 \\ 13 & 11 & 13 & 12 & 9 & 14 & 12 & 16 & 9 & 8 & 15 & 13 & 15 & 13 & 12 & 16 & 11 & 9 \end{bmatrix}$$
The two rows are the experiment time costs (in minutes) of 18 students (first trial vs.
second trial).
(a) Plot this data on a 2D scatter plot using plot(x(1, :), x(2, :), '*').
(b) Perform PCA on this data, treating it as 18 sample points (i.e. each sample point is
2-dimensional). Draw the first principal direction on this plot. Compute the variance of the
first principal component. What is the ratio of this variance to the total variance in the
original data?
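The PCA steps in Problem 2 can be sketched in Python as follows. This is a hedged illustration, not the required Matlab solution: it assumes the unbiased 1/(N-1) normalization for the sample covariance (the problem does not say which normalization to use), and the variable names are chosen here for readability. The variance ratio does not depend on that normalization choice.

```python
import numpy as np

# Data from Problem 2: row 0 is the first trial, row 1 the second trial.
x = np.array([
    [15, 16, 12, 14, 13, 15, 16, 21, 12, 11, 19, 14, 13, 14, 16, 17, 12, 16],
    [13, 11, 13, 12,  9, 14, 12, 16,  9,  8, 15, 13, 15, 13, 12, 16, 11,  9],
], dtype=float)

# Center each row (each variable) around its sample mean.
mean = x.mean(axis=1, keepdims=True)
centered = x - mean
n_samples = x.shape[1]

# 2x2 sample covariance; the 1/(N-1) factor is an assumption.
C = centered @ centered.T / (n_samples - 1)

# eigh returns eigenvalues in ascending order, so reorder to descending.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

direction = eigvecs[:, 0]          # first principal direction (unit vector)
var_pc1 = eigvals[0]               # variance along that direction
ratio = var_pc1 / eigvals.sum()    # share of the total variance

print("first principal direction:", direction)
print("variance of first PC:", var_pc1)
print("ratio to total variance:", ratio)
```

To draw the direction on the scatter plot, one option is a line segment through the sample mean along plus and minus a multiple of direction; the sign of direction is arbitrary, so either orientation describes the same axis.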
3. PCA and Linear Regression: Consider a linear regression problem where $y \in \mathbb{R}^{m}$ and
$X \in \mathbb{R}^{m \times n}$ are given, and we have to solve for the coefficients $b \in \mathbb{R}^{n}$ such that
$\| y - Xb \|^2$ is minimized. When n is too large to handle, we can use principal component
analysis to reduce n to d and then solve for the coefficients.
(a) For the data provided on the website, first compute a matrix $X_1 \in \mathbb{R}^{m \times d}$ as follows:
(i) Find the sample covariance matrix $C \in \mathbb{R}^{n \times n}$ of the elements of X,
(ii) Compute the singular value decomposition (SVD) of C to obtain the orthogonal matrix $U \in \mathbb{R}^{n \times n}$,
(iii) Set $U_1$ to be the first d columns of U, and
(iv) Define $X_1 = X U_1 \in \mathbb{R}^{m \times d}$.
(b) Now solve for the coefficients $\hat{b}_1$ by minimizing $\| y - X_1 b_1 \|^2$.
(c) Compute the sum of squares of error, $\mathrm{SSE} = \| y - X_1 \hat{b}_1 \|^2$.
For the dataset provided, m = 200 and n = 100; use d = 10. Use the “load hw3_3_data” command to
load the data in Matlab and obtain X and y.
Then compute and plot (using the ‘plot’ command in Matlab) the SSE for values of d ranging from
10 to 100 in steps of 10, i.e. d = 10, 20, 30, …, 100.
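Steps (a) through (c) and the sweep over d can be sketched in Python as below. This is a minimal sketch under stated assumptions: the homework data file is not used here, so X and y are synthetic stand-ins with the stated sizes m = 200 and n = 100, and the helper name pca_regression_sse is hypothetical. As in the problem statement, X is projected without centering (X1 = X U1).

```python
import numpy as np


def pca_regression_sse(X, y, d):
    """SSE of least squares on the first d principal directions of X.

    Hypothetical helper that follows steps (a)(i)-(iv), (b), and (c).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()   # accept (m,) or (m, 1)
    # (i) n x n sample covariance of the columns of X.
    C = np.cov(X, rowvar=False)
    # (ii) SVD of C; columns of U are ordered by decreasing singular value.
    U, _, _ = np.linalg.svd(C)
    # (iii) keep the first d columns, (iv) project the data.
    U1 = U[:, :d]
    X1 = X @ U1
    # (b) least-squares coefficients in the reduced space.
    b1_hat, _, _, _ = np.linalg.lstsq(X1, y, rcond=None)
    # (c) sum of squared errors.
    residual = y - X1 @ b1_hat
    return float(residual @ residual)


# Synthetic stand-in data (assumption: NOT the hw3_3_data file).
rng = np.random.default_rng(0)
m, n = 200, 100
X = rng.standard_normal((m, n))
y = X @ rng.standard_normal(n) + 0.1 * rng.standard_normal(m)

ds = list(range(10, 101, 10))
sses = [pca_regression_sse(X, y, d) for d in ds]
for d, sse in zip(ds, sses):
    print(f"d = {d:3d}  SSE = {sse:.4f}")
```

Because the columns of X U1 for a smaller d span a subspace of those for a larger d, the SSE cannot increase as d grows, and at d = n it matches the SSE of the full regression on X. That property is a useful check on the plotted curve.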
6 (Optional): Use a Python program to complete Problem 3. In particular, use the following Python
commands to load the data in MAT format:
import numpy as np
from scipy import linalg, io
mat_contents = io.loadmat('hw3_3_data.mat')
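# np.asmatrix stands in for the np.mat alias, which newer NumPy releases no longer provide
X = np.asmatrix(mat_contents['X'])
y = np.asmatrix(mat_contents['y'])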
To use subroutines in mlr.py, you need to import this file with the following command: