Week 1
Consider a point and a line passing through the origin which is represented by the vector
Options
(a)
(b)
(c)
(d)
Answer
(b), (c)
Solution
We have $\mathbf{x}^T \mathbf{w} = 0$. So, the projection is the zero vector. The residue is given by:
$$\mathbf{x} - \mathbf{0} = \mathbf{x}$$
Common data for questions (2) to (5)
Statement
Consider a point $\mathbf{x} = (2, 5)$ and a line that passes through the origin. The point $\mathbf{w} = (1, 1)$ lies on the line.
Options
(a)
Statement-1
(b)
Statement-2
(c)
Statement-3
(d)
Statement-4
(e)
Answer
(e)
Solution
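The projection of a point $\mathbf{x}$ on a line along the vector $\mathbf{w}$ is given by:
$$\frac{\mathbf{x}^T \mathbf{w}}{\mathbf{w}^T \mathbf{w}} \mathbf{w}$$
This is the expression when $\mathbf{w}$ does not have unit length. In this problem, $\mathbf{w}$ does not have unit
length. If $\|\mathbf{w}\| = 1$, then the expression becomes:
$$(\mathbf{x}^T \mathbf{w}) \mathbf{w}$$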
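The two forms of the projection formula can be compared numerically. The sketch below is a minimal check, assuming the point and direction used in the NumPy verification later in this document; the names `proj_general`, `proj_unit` and `w_unit` are illustrative only.

```python
import numpy as np

x = np.array([2.0, 5.0])  # point to project
w = np.array([1.0, 1.0])  # direction of the line; NOT unit length

# General formula: valid for any non-zero w
proj_general = (x @ w) / (w @ w) * w

# Unit-length formula: normalise w first, then the denominator is 1
w_unit = w / np.linalg.norm(w)
proj_unit = (x @ w_unit) * w_unit

print(proj_general)
print(proj_unit)
```

Both expressions describe the same point on the line; normalising `w` only removes the need for the denominator.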
Question-3 [1 point]
Statement
Find the length of the projection of $\mathbf{x}$ on the line. Enter your answer correct to two decimal places.
Answer
Range:
Solution
The length of the projection is given by:
$$\frac{|\mathbf{x}^T \mathbf{w}|}{\|\mathbf{w}\|} = \frac{7}{\sqrt{2}} \approx 4.95$$
Question-4 [1 point]
Statement
Find the residue after projecting $\mathbf{x}$ on the line.
Options
(a)
(b)
(c)
(d)
Answer
(b)
Solution
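The residue is given by:
$$\mathbf{x} - \frac{\mathbf{x}^T \mathbf{w}}{\mathbf{w}^T \mathbf{w}} \mathbf{w} = \begin{bmatrix} 2 \\ 5 \end{bmatrix} - \begin{bmatrix} 3.5 \\ 3.5 \end{bmatrix} = \begin{bmatrix} -1.5 \\ 1.5 \end{bmatrix}$$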
Question-5 [1 point]
Statement
Find the reconstruction error for this point. Enter your answer correct to two decimal places.
Answer
Range:
Solution
The reconstruction error is given by the square of the length of the residue. If the residue is $\mathbf{e}$, then:
$$\|\mathbf{e}\|^2 = \mathbf{e}^T \mathbf{e} = (-1.5)^2 + (1.5)^2 = 4.5$$
Programming-based solution. This is to be used only to verify the correctness of the calculations. The
added benefit is that you get used to NumPy.
import numpy as np

x = np.array([2, 5])
w = np.array([1, 1])
w = w / np.linalg.norm(w)  # normalise w to unit length

# Length of the projection
proj = (x @ w) * w
print(f'Length of projection = {np.linalg.norm(proj)}')
# Residue
res = x - proj
print(f'Residue = {res}')
# Reconstruction error
recon = res @ res
print(f'Reconstruction error = {recon}')
Question-6 [0.5 point]
Statement
Consider the following images of points in 2D space. The red line segments in one of the images
represent the lengths of the residues after projecting the points on the line . Which image is it?
Image-1
Image-2
Options
(a)
Image-1
(b)
Image-2
Answer
(b)
Solution
The residue after the projection should be perpendicular to the line. Note that by projection we mean
the orthogonal projection of a point on a line. The projection of a point on a line is one of the proxies
for that point on the line; in fact, it is the "best" possible proxy. However, not every proxy is a
projection: the projection of a point on a line is unique.
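The claim that the orthogonal projection is the "best" proxy can be checked numerically. The sketch below is an illustration under assumed values: the point `x`, the unit vector `w` and the grid of candidate coefficients `cs` are hypothetical and not taken from the question.

```python
import numpy as np

x = np.array([2.0, 5.0])                 # hypothetical point
w = np.array([1.0, 1.0]) / np.sqrt(2.0)  # hypothetical unit vector along the line

# Every proxy on the line has the form c * w for some scalar c
cs = np.linspace(-10.0, 10.0, 2001)
proxies = cs[:, None] * w                       # shape (2001, 2)
sq_errors = np.sum((x - proxies) ** 2, axis=1)  # squared distance from x to each proxy

c_best = cs[np.argmin(sq_errors)]  # best coefficient found on the grid
c_proj = x @ w                     # coefficient of the orthogonal projection

residue = x - c_proj * w
print(c_best, c_proj)
print(residue @ w)  # close to zero: the residue is perpendicular to the line
```

The grid search lands on the candidate closest to `x @ w`, and the residue of that projection has a (numerically) zero dot product with `w`.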
Question-7 [1 point]
Statement
Consider a dataset that has samples, where each sample belongs to . PCA is run on this
dataset and the top principal components are retained, the rest being discarded. If it takes one unit
of memory to store a real number, find the percentage decrease in storage space of the dataset by
moving to its compressed representation. Enter your answer correct to two decimal places; it should
lie in the range .
Answer
Range:
Solution
Original space =
Compressed space =
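The storage comparison can be sketched as a small function. This assumes the compressed representation stores the k retained principal components (d numbers each) plus k coefficients per sample, and that the mean vector is not stored; the function name and the sizes in the usage line are hypothetical, not the values from the question.

```python
def storage_reduction_percent(n: int, d: int, k: int) -> float:
    """Percentage decrease in storage when an n x d dataset is replaced by
    k principal components (d numbers each) and k coefficients per sample."""
    original = n * d            # one real number per feature per sample
    compressed = k * d + n * k  # components + per-sample coefficients
    return 100.0 * (original - compressed) / original


# Hypothetical sizes, NOT the values from the question
print(round(storage_reduction_percent(n=1000, d=50, k=5), 2))
```

If the mean vector also has to be stored for reconstruction, add `d` to `compressed`.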
Options
(a)
(b)
(c)
(d)
Answer
(a)
Solution
Let us first arrange the data in the form of a matrix. Here, $n = 8$ and $d = 2$:
Recall that the first principal component is the most important. Enter your answer correct to two
decimal places.
Answer
Range:
Solution
If $(\lambda, \mathbf{w})$ is the eigenpair for $\mathbf{C}$, we have:
$$\mathbf{C} \mathbf{w} = \lambda \mathbf{w}$$
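import numpy as np

# Each row is one data point; both columns already have zero mean
X = np.array([[-3, 0],
              [-2, 0],
              [-2, 1],
              [-1, 0],
              [1, 0],
              [2, 0],
              [2, -1],
              [3, 0]])

# Covariance matrix of the centered data
C = X.T @ X / X.shape[0]
print(f'Covariance matrix = {C}')
# eigh returns eigenvalues in ascending order, so the last one is the largest
eigval, eigvec = np.linalg.eigh(C)
print(f'Variance = {eigval[-1]}')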
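A more detailed version. The variance of the dataset along the $k^{th}$ principal component $\mathbf{w}_k$ is $\sigma_k^2$ and is
given by:
$$\sigma_k^2 = \frac{1}{n} \sum_{i=1}^{n} (\mathbf{x}_i^T \mathbf{w}_k)^2 = \mathbf{w}_k^T \mathbf{C} \mathbf{w}_k = \lambda_k$$
So, the variance along the $k^{th}$ principal component is the $k^{th}$ largest eigenvalue of the covariance
matrix.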
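This identity can be verified directly on the dataset used in the NumPy check above. The sketch computes the variance of the projected coefficients and compares it with the largest eigenvalue; the names `w1`, `coeffs` and `var_along_w1` are illustrative only.

```python
import numpy as np

X = np.array([[-3, 0], [-2, 0], [-2, 1], [-1, 0],
              [1, 0], [2, 0], [2, -1], [3, 0]], dtype=float)
n = X.shape[0]

C = X.T @ X / n                     # covariance matrix (data is already centered)
eigval, eigvec = np.linalg.eigh(C)  # eigenvalues in ascending order

w1 = eigvec[:, -1]                  # first principal component (unit length)
coeffs = X @ w1                     # scalar projection of every point on w1
var_along_w1 = np.mean(coeffs ** 2) # variance of the projections (their mean is zero)

print(var_along_w1, eigval[-1])
```

The two printed values agree up to floating-point error, which is the statement that the variance along the first principal component equals the largest eigenvalue.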
Question-10 [1 point]
Statement
Consider a dataset of points all of which lie in . The eigenvalues of the covariance matrix are
given below:
If we run the PCA algorithm on this dataset and retain the top-$k$ principal components, what is a good
choice of $k$? Use the heuristic that was discussed in the lectures.
Answer
2
Solution
The top-$k$ principal components should capture at least the fraction of the variance prescribed by the heuristic. Here is a code snippet to
answer this question:
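A minimal sketch of such a heuristic, assuming a threshold of 95% of the total variance and placeholder eigenvalues. Both the threshold and the eigenvalues are assumptions made for illustration; they are not the values from the question, so the printed result is not the answer to it. The function name `choose_k` is hypothetical.

```python
import numpy as np


def choose_k(eigvals, threshold=0.95):
    """Smallest k whose top-k eigenvalues capture at least `threshold`
    of the total variance. The default threshold is an assumption."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # descending order
    explained = np.cumsum(eigvals) / np.sum(eigvals)           # cumulative variance fraction
    return int(np.argmax(explained >= threshold)) + 1          # first index meeting the threshold


# Placeholder eigenvalues, NOT the ones given in the question
placeholder_eigvals = [6.0, 3.0, 0.7, 0.2, 0.1]
print(choose_k(placeholder_eigvals))
```

Replacing `placeholder_eigvals` with the eigenvalues from the question and `threshold` with the value from the lectures gives the choice of $k$.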
Options
(a)
(b)
(c)
(d)
Answer
(b), (d)
Solution
Each vector is associated with a line perpendicular to it that passes through the origin. This line divides the space into two half-planes.
The basic idea is to identify the sign of each half-plane, that is, the sign of the dot product between the vector and any point in that
half-plane.
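The half-plane test amounts to taking the sign of a dot product. The sketch below is an illustration with a hypothetical vector `w` and hypothetical points, none of which come from the question.

```python
import numpy as np

w = np.array([1.0, 2.0])  # hypothetical vector; its perpendicular line is {p : w @ p = 0}

points = np.array([[3.0, 1.0],    # w @ p =  5 -> positive half-plane
                   [-1.0, -1.0],  # w @ p = -3 -> negative half-plane
                   [2.0, -1.0],   # w @ p =  0 -> on the line itself
                   [-4.0, 1.0]])  # w @ p = -2 -> negative half-plane

signs = np.sign(points @ w)  # +1, -1 or 0 for each point
print(signs)
```

Points on the same side of the line as `w` get a positive sign, points on the opposite side a negative sign, and points on the line get zero.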