Programming Assignment 3
Question 1
Consider the following points in the plane:
{(−3, −3.5), (−2, −1), (−2, −0.5), (−1, 0.5), (0, 1), (0, 2.5), (1, 3), (1, 4.8), (2, 6), (3, 7), (3, 10)}
Write a function poly_fit2d(data, d) that returns the coefficients of the best-fitting polynomial of degree d through the points in the set named data, together with the residual norm.
A. Find the best linear fit to the points. What is the residual norm?
B. Find the best cubic fit to the points. What is the residual norm?
C. Use Matplotlib to plot the points and the two curves on the same axes.
In [ ]: # Your Code Starts Here
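import numpy as np
import matplotlib.pyplot as plt

data = np.array([
    (-3, -3.5), (-2, -1), (-2, -0.5), (-1, 0.5), (0, 1), (0, 2.5),
    (1, 3), (1, 4.8), (2, 6), (3, 7), (3, 10)
])

def poly_fit2d(data, d):
    """Return (coeffs, residual_norm) for the degree-d least-squares fit."""
    x = data[:, 0]
    y = data[:, 1]
    # Vandermonde matrix: column j holds x**(d - j), highest power first.
    A = np.zeros((len(x), d + 1))
    for i in range(len(x)):
        for j in range(d + 1):
            A[i, j] = x[i] ** (d - j)
    # Solve the least-squares problem A @ coeffs ≈ y (np.linalg.lstsq assumed).
    coeffs = np.linalg.lstsq(A, y, rcond=None)[0]
    # Residual norm ||y - A @ coeffs||_2.
    residuals = y - A @ coeffs
    residual_norm = np.sqrt(np.sum(residuals ** 2))
    return coeffs, residual_norm

# Parts A and B: linear (d = 1) and cubic (d = 3) fits.
linear_coeffs, linear_residual_norm = poly_fit2d(data, 1)
cubic_coeffs, cubic_residual_norm = poly_fit2d(data, 3)

# Part C: evaluate both fits on a dense grid and plot them with the data.
x_values = np.linspace(data[:, 0].min(), data[:, 0].max(), 200)
linear_fit = np.polyval(linear_coeffs, x_values)
cubic_fit = np.polyval(cubic_coeffs, x_values)

plt.figure(figsize=(10, 6))
plt.plot(data[:, 0], data[:, 1], 'o', label='Data Points', markersize=8)
plt.plot(x_values, linear_fit, label=f'Linear Fit (Residual Norm: {linear_residual_norm:.2f})')
plt.plot(x_values, cubic_fit, label=f'Cubic Fit (Residual Norm: {cubic_residual_norm:.2f})')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Best Linear and Cubic Fits to Data Points')
plt.legend()
plt.show()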
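As a sanity check on the fit above, the same least-squares problem can be solved with NumPy's built-in routines. The sketch below uses the Question 1 data and compares np.linalg.lstsq on a Vandermonde matrix (np.vander) against np.polyfit. The helper name poly_fit_check is hypothetical and not part of the assignment.

```python
import numpy as np

# Data points from Question 1.
data = np.array([
    (-3, -3.5), (-2, -1), (-2, -0.5), (-1, 0.5), (0, 1), (0, 2.5),
    (1, 3), (1, 4.8), (2, 6), (3, 7), (3, 10),
])

def poly_fit_check(data, d):
    """Hypothetical helper: least-squares polynomial fit of degree d.

    Returns (coeffs, residual_norm), with coefficients ordered from the
    highest power down to the constant term.
    """
    x, y = data[:, 0], data[:, 1]
    A = np.vander(x, d + 1)  # columns: x**d, ..., x, 1
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual_norm = np.linalg.norm(y - A @ coeffs)
    return coeffs, residual_norm

for d in (1, 3):
    coeffs, res = poly_fit_check(data, d)
    # np.polyfit solves the same problem and uses the same ordering.
    assert np.allclose(coeffs, np.polyfit(data[:, 0], data[:, 1], d))
    print(f"degree {d}: coeffs = {np.round(coeffs, 4)}, residual norm = {res:.4f}")
```

Because the cubic model contains the linear one as a special case, its residual norm can never exceed that of the linear fit on the same data.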
Question 2
Study the Python notebook on SVD for image compression. Pick your own high-resolution color image to experiment with. What number of components $k$ gives a decent reconstruction after decomposition? Display the relative error in the compression for the chosen value of $k$. Make a plot of the relative errors for different values of $k$.
    return compressed_img

img_rgb = img / 255.0  # scale pixel values to [0, 1]
k_values = [10, 20, 50, 100, 150, 200]
errors = []
for k in k_values:
    compressed_img = svd(img_rgb, k)
    error = relative_error(img_rgb, compressed_img)
    errors.append(error)
# k = 200
img1 = svd(img_rgb, 200)
img1 = np.clip(img1, 0, 1)
plt.imshow(img1)
plt.axis("off")
plt.show()
In [79]: img2 = svd(img_rgb, 25)  # k = 25
img2 = np.clip(img2, 0, 1)
plt.imshow(img2)
plt.axis("off")
plt.show()
# k = 100
img3 = svd(img_rgb, 100)
plt.imshow(img3)
plt.axis("off")
plt.show()
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers). Got range [-0.15547262666145525..1.1294772082186946].
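The compression loop above relies on two helpers, svd and relative_error, whose definitions are not shown. A minimal sketch of what such helpers might look like follows. It assumes a float RGB image in [0, 1], a truncated SVD applied to each color channel separately, and the Frobenius norm for the relative error. The names svd_compress and relative_error_fro are hypothetical, and a random array stands in for a real image.

```python
import numpy as np

def svd_compress(img, k):
    """Rank-k approximation of each color channel (assumed approach)."""
    compressed = np.zeros_like(img, dtype=float)
    for c in range(img.shape[2]):
        U, s, Vt = np.linalg.svd(img[:, :, c], full_matrices=False)
        # Keep only the k largest singular values and their vectors.
        compressed[:, :, c] = (U[:, :k] * s[:k]) @ Vt[:k, :]
    return compressed

def relative_error_fro(original, approx):
    """Relative error ||original - approx||_F / ||original||_F.

    With default arguments np.linalg.norm flattens the array, which
    gives the Frobenius norm over all pixels and channels.
    """
    return np.linalg.norm(original - approx) / np.linalg.norm(original)

# Random stand-in for an image: 64 x 48 pixels, 3 channels, values in [0, 1).
rng = np.random.default_rng(0)
img = rng.random((64, 48, 3))

k_values = [1, 5, 10, 20, 48]
errors = [relative_error_fro(img, svd_compress(img, k)) for k in k_values]
for k, err in zip(k_values, errors):
    print(f"k = {k:3d}: relative error = {err:.4f}")
```

The error is non-increasing in k and drops to roundoff level once k reaches the smaller image dimension. A truncated reconstruction can fall slightly outside [0, 1], which is why the clipping warning appears when an unclipped result is passed to imshow. For a real image, the errors list can be plotted against k_values with plt.plot.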
Question 3
In designing a movie recommendation system, one creates a rating matrix $R \in \mathbb{R}^{m \times n}$ for $m$ users and $n$ movies. The entry $R_{ij}$ gives the rating (1-5) of the movie in the $j$-th column by the user in the $i$-th row. The rating matrix is generally quite sparse. The missing entries are replaced by $0$ as a starter. Answer the following question for the
Part B: Replace missing entries in $R$ by the average movie ratings column-wise. Create $R_{\text{norm}}$ by subtracting the user average for every row.
In [172… # Your Code Starts Here
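# Fill each missing rating with that movie's (column) mean.
R_complete = R.apply(lambda col: col.fillna(col.mean()), axis=0)
# Subtract each user's (row) mean from that user's ratings.
user_mean = R_complete.mean(axis=1)
R_norm = R_complete.sub(user_mean, axis=0)
R_norm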
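To make the two steps of Part B concrete, here is a small sketch on a toy rating matrix. It assumes missing ratings are stored as NaN in a pandas DataFrame. The matrix R_toy and its values are invented purely for illustration.

```python
import numpy as np
import pandas as pd

# Toy 3-user x 3-movie rating matrix; NaN marks a missing rating.
R_toy = pd.DataFrame(
    [[5.0, np.nan, 1.0],
     [4.0, 2.0, np.nan],
     [np.nan, 3.0, 2.0]],
    index=["u0", "u1", "u2"],
    columns=["m0", "m1", "m2"],
)

# Step 1: fill each missing entry with that movie's (column) mean.
# Passing a Series to fillna matches its index against the column labels.
R_toy_complete = R_toy.fillna(R_toy.mean(axis=0))

# Step 2: subtract each user's (row) mean from that user's row.
user_mean_toy = R_toy_complete.mean(axis=1)
R_toy_norm = R_toy_complete.sub(user_mean_toy, axis=0)

print(R_toy_complete)
print(R_toy_norm)
```

After the second step every row of R_toy_norm sums to zero, which is a quick way to confirm that the subtraction ran along the intended axis.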
Part D: By using the above low-rank representation, the predicted rating of movie $j$ by user $i$ could be given by
$$R_{ij} \approx \mu_i + \left[U_{(k)} S_{(k)}^{1/2}\right]_{i:} \left[S_{(k)}^{1/2} V_{(k)}^T\right]_{:j}$$
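# S_k is assumed to be the k x k diagonal matrix of singular values,
# so the element-wise square root equals the matrix square root.
S_k_sqrt = np.sqrt(S_k)
U_S_sqrt = U_k @ S_k_sqrt      # user features, one row per user
S_sqrt_Vt = S_k_sqrt @ Vt_k    # movie features, one column per movie

def predict_rating(user_id, item_id):
    """Predicted rating of movie item_id by user user_id."""
    mu_i = user_mean[user_id]
    user_features = U_S_sqrt[user_id, :]
    movie_features = S_sqrt_Vt[:, item_id]
    rating_approximation = mu_i + np.dot(user_features, movie_features)
    return rating_approximation

user_id = 1
item_id = 4
predicted_rating = predict_rating(user_id, item_id)
print(f"Predicted rating of movie {item_id} by user {user_id}: {predicted_rating}")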
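The factorization in Part D can be checked on a small example. The sketch below assumes the rank-k factors come from np.linalg.svd applied to a row-centered matrix. It verifies that the product of the user features $U_{(k)} S_{(k)}^{1/2}$ and the movie features $S_{(k)}^{1/2} V_{(k)}^T$ reproduces the usual truncated SVD. The random matrix and the name predict_rating_sketch are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a completed rating matrix: 6 users x 5 movies, ratings in 1..5.
R_full = rng.integers(1, 6, size=(6, 5)).astype(float)
mu = R_full.mean(axis=1)              # per-user mean
R_centered = R_full - mu[:, None]     # subtract each user's mean from their row

k = 2
U, s, Vt = np.linalg.svd(R_centered, full_matrices=False)
S_sqrt = np.diag(np.sqrt(s[:k]))      # S_(k)^(1/2) as a k x k diagonal matrix
user_feats = U[:, :k] @ S_sqrt        # shape (users, k)
movie_feats = S_sqrt @ Vt[:k, :]      # shape (k, movies)

def predict_rating_sketch(i, j):
    """Predicted rating of movie j by user i (positional indices)."""
    return mu[i] + user_feats[i, :] @ movie_feats[:, j]

# The factored form equals the rank-k truncated SVD plus the user mean.
R_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
assert np.isclose(predict_rating_sketch(1, 4), mu[1] + R_k[1, 4])
print(predict_rating_sketch(1, 4))
```

Splitting the singular values evenly between the two factors gives each user and each movie a k-dimensional feature vector, and a prediction is just their dot product plus the user's mean.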
Part E: Make a tabular comparison of the actual ratings and predicted ratings for some randomly selected users and movies in the following format.
| User Id | Movie Id | Actual Rating | Predicted Rating |
|---------|----------|---------------|------------------|
| 1234    | 987      | 5             | 4.89             |
| 2345    | 876      | 0             | 2.7              |
In [ ]: # Your Code Starts Here
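user_ids = []
movie_ids = []
actual_ratings = []
predicted_ratings = []
# Assumption: sample 10 random (user, movie) pairs by row/column position.
rng = np.random.default_rng(0)
for _ in range(10):
    user_id = int(rng.integers(R.shape[0]))
    movie_id = int(rng.integers(R.shape[1]))
    actual_rating = R.iloc[user_id, movie_id]  # NaN if the movie is unrated
    predicted_rating = predict_rating(user_id, movie_id)
    user_ids.append(user_id)
    movie_ids.append(movie_id)
    actual_ratings.append(actual_rating)
    predicted_ratings.append(predicted_rating)
results = pd.DataFrame({
    "User Id": user_ids,
    "Movie Id": movie_ids,
    "Actual Rating": actual_ratings,
    "Predicted Rating": predicted_ratings
})
results.head(10)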