0% found this document useful (0 votes)
13 views

Programming Assignment3

Uploaded by

vidhishaanand017
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
13 views

Programming Assignment3

Uploaded by

vidhishaanand017
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 10

Programming Assignment 3

In [2]: import numpy as np


import matplotlib.pyplot as plt

Question 1
Consider the following points on the plane
{(−3, −3.5), (−2, −1), (−2, −0.5), (−1, 0.5), (0, 1), (0, 2.5), (1, 3), (1, 4.8), (2, 6), (3, 7), (3, 10), }

Write a function poly_fit2d(data, d) that returns the coefficients of the best fitting polynomial of degree d through the points in set of points named data together with the
residual norm.
A. Find the best linear fit to the points. What is the residual norm?
B. Find the best cubic fit to the points. What is the residual norm?
C. Use MatplotLib to plot the points and the two curves on the same axes.
In [ ]: # Your Code Starts Here
data = np.array([
(-3, -3.5), (-2, -1), (-2, -0.5), (-1, 0.5), (0, 1), (0, 2.5),
(1, 3), (1, 4.8), (2, 6), (3, 7), (3, 10)
])
def poly_fit2d(data,d):
x = data[:, 0]
y = data[:, 1]

A = np.zeros((len(x), d+1))
for i in range(len(x)):
for j in range(d + 1):
A[i, j] = x[i] ** (d - j)

AtA = np.zeros((d + 1, d + 1))


Atb = np.zeros(d + 1)

for i in range(d + 1):


for j in range(d + 1):
for k in range(len(x)):
AtA[i, j] += A[k, i] * A[k, j]
for k in range(len(x)):
Atb[i] += A[k, i] * y[k]

coeffs = np.linalg.solve(AtA, Atb)

residuals = np.zeros(len(x))
for i in range(len(x)):
fitted_value = sum(coeffs[j] * (x[i] ** (d - j)) for j in range(d + 1))
residuals[i] = y[i] - fitted_value
residual_norm = np.sqrt(sum(residual ** 2 for residual in residuals))

return coeffs, residual_norm

linear_coeffs, linear_residual_norm = poly_fit2d(data, 1)


cubic_coeffs, cubic_residual_norm = poly_fit2d(data, 3)

x_values = np.linspace(-4, 4, 100)

linear_fit = [sum(linear_coeffs[j] * (x ** (1 - j)) for j in range(2)) for x in x_values]


cubic_fit = [sum(cubic_coeffs[j] * (x ** (3 - j)) for j in range(4)) for x in x_values]

plt.figure(figsize=(10, 6))
plt.plot(data[:, 0], data[:, 1], 'o', label='Data Points', markersize=8)
plt.plot(x_values, linear_fit, label=f'Linear Fit (Residual Norm: {linear_residual_norm:.2f})')
plt.plot(x_values, cubic_fit, label=f'Cubic Fit (Residual Norm: {cubic_residual_norm:.2f})')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Best Linear and Cubic Fits to Data Points')
plt.legend()
plt.show()
Question 2
Study the Python notebook on SVD for image compression. Pick up your own high resolution color image to experiment with. What is the number of components to get a
k

decent reconstruction after decomposition. Display the relative error in the compression for the given value. Make a plot of relative errors for different values.
k k

In [63]: # Your Code Starts Here


img = plt.imread("3a37503f0ced98a450811d3b6cd6ad57.jpg")

print("Original shape: ",img.shape)


plt.imshow(img)
plt.show()

Original shape: (1196, 1920, 3)


In [64]: def svd(image,k):
compressed = []
for n in range(3):
U, S, V = np.linalg.svd(image[..., n])
U_k = U[:, :k]
S_k = np.diag(S[:k])
V_k = V[:k, :]
compress = np.dot(U_k, np.dot(S_k, V_k))
compressed.append(compress)

compressed_img = np.stack(compressed, axis=-1)

return compressed_img

def relative_error(original, compressed):

return np.linalg.norm(original - compressed) / np.linalg.norm(original)

img_rgb = img/255.0
k_values = [10, 20, 50, 100, 150, 200]
errors = []
for k in k_values:
compressed_img = svd(img_rgb, k)
error = relative_error(img_rgb, compressed_img)
errors.append(error)

In [66]: plt.figure(figsize=(8, 6))


plt.plot(k_values, errors, marker='o')
plt.xlabel("Number of Components (k)")
plt.ylabel("Relative Error")
plt.title("Relative Error vs. Number of Components")
plt.grid(True)
plt.show()
In [77]: img1 = svd(img_rgb, 200)
img1 = np.clip(img1, 0, 1)

plt.imshow(img1)
plt.axis("off")
plt.show()
#when k = 200
In [79]: img2 = svd(img_rgb, 25)
img2 = np.clip(img2, 0, 1)

plt.imshow(img2)
plt.axis("off")
plt.show()
#when k =25

In [82]: img3 = svd(img_rgb, 100)


img2 = np.clip(img3, 0, 1)

plt.imshow(img3)
plt.axis("off")
plt.show()
#when k =100
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers). Got range [-0.15547262666145525..1.12
94772082186946].

Question 3
In designing a movie recommendation system, one creates a rating matrixR ∈ R
m×n
m for users and movies. The entries gives the rating of a movie (1-5) in the -th
n Rij j

column by a user in the -th row. The rating matrix is generally quite sparse. The missing entries are replaced by as a starter. Answer the following question for the
i 0

MovieLens dataset available below. MovieLens Dataset 100k


Read Sections 2 and 3 for application of SVD in recommendation systems of this paper by Sarvar et. al..
Part A: Load the dataset (u.data) using pandas and create the rating matrix . This will require dataframe pivoting.
R

In [170… # Your Code Starts Here


import pandas as pd
data = pd.read_table("u.data", header=None)
data = pd.DataFrame(data)
data.columns = ["user_id", "item_id", "rating", "timestamp"]
data.head()

R = data.pivot(index="user_id", columns="item_id", values="rating")


R.head()
Out[170… item_id 1 2 3 4 5 6 7 8 9 10 ... 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682
user_id
1 5.0 3.0 4.0 3.0 3.0 5.0 4.0 1.0 5.0 3.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 4.0 NaN NaN NaN NaN NaN NaN NaN NaN 2.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5 4.0 3.0 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5 rows × 1682 columns
In [ ]:

Part B: Replace missing entries in by the average movie ratings column-wise. Create
R Rnorm by subtracting the user average for every row.
In [172… # Your Code Starts Here
R_complete = R.apply(lambda avg: avg.fillna(avg.mean()), axis = 0)

user_mean = R_complete.mean(axis = 1)

R_norm = R_complete.apply(lambda row: row - user_mean[row.name], axis=1)

R_norm

Out[172… item_id 1 2 3 4 5 6 7 8 9 10 ... 1673 1674 1675 1676


user_id
1 1.910010 -0.089990 0.910010 -0.089990 -0.089990 1.910010 0.910010 -2.089990 1.910010 -0.089990 ... -0.089990 0.910010 -0.089990 -1.089990
2 0.919898 0.126005 -0.046768 0.470138 0.222224 0.496822 0.718368 0.915332 0.816220 -1.080102 ... -0.080102 0.919898 -0.080102 -1.08010
3 0.817893 0.145681 -0.027092 0.489814 0.241900 0.516498 0.738044 0.935008 0.835896 0.771035 ... -0.060426 0.939574 -0.060426 -1.060426
4 0.789061 0.116850 -0.055924 0.460982 0.213068 0.487666 0.709212 0.906176 0.807064 0.742203 ... -0.089257 0.910743 -0.089257 -1.08925
5 0.961597 -0.038403 -0.005070 0.511836 0.263922 0.538520 0.760066 0.957030 0.857918 0.793057 ... -0.038403 0.961597 -0.038403 -1.038403
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... .
939 0.771265 0.099053 -0.073721 0.443185 0.195272 0.469869 0.691416 0.888380 1.892946 0.724407 ... -0.107054 0.892946 -0.107054 -1.107054
940 0.820431 0.148219 -0.024554 -1.057888 0.244438 0.519035 0.942112 1.942112 -0.057888 0.773573 ... -0.057888 0.942112 -0.057888 -1.057888
941 1.919249 0.125356 -0.047417 0.469488 0.221575 0.496172 0.919249 0.914683 0.815570 0.750710 ... -0.080751 0.919249 -0.080751 -1.08075
942 0.778626 0.106415 -0.066359 0.450547 0.202633 0.477231 0.698777 0.895741 0.796629 0.731768 ... -0.099692 0.900308 -0.099692 -1.09969
943 0.804237 1.925918 -0.040749 0.476157 0.228244 0.502841 0.724387 0.921352 -0.074082 0.757379 ... -0.074082 0.925918 -0.074082 -1.07408
943 rows × 1682 columns
Part C: Perform the SVD of Rnorm as Rnorm = U SV
T
. Perform low-rank approximation of Rnorm by using k = 100 .
In [173… # Your Code Starts Here
import numpy as np
U, S, Vt = np.linalg.svd(R_norm, full_matrices=False)
k = 100
U_k = U[:, :k]
S_k = np.diag(S[:k])
Vt_k = Vt[:k, :]
R_norm_approx = U_k @ S_k @ Vt_k
R_norm_approx.shape

Out[173… (943, 1682)

Part D: By using the above low-rank representation, the predicted rating of movie by user could be given by
j i

1/2 1/2 T
Rij ≈ μi + [U(k) S ]i: [S V ]:j
(k) (k) (k)

where is the average of ratings by user .


μi i

In [174… # Your Code Starts Here

S_k_sqrt = np.sqrt(S_k)
U_S_sqrt = U_k @ S_k_sqrt
S_sqrt_Vt = S_k_sqrt @ Vt_k

In [184… def predict_rating(user_id, item_id):

mu_i = user_mean[user_id]

user_features = U_S_sqrt[user_id, :]
movie_features = S_sqrt_Vt[:, item_id]
rating_approximation = mu_i + np.dot(user_features, movie_features)

return rating_approximation

user_id = 1
item_id = 4
predicted_rating = predict_rating(user_id, item_id)
print(f"Predicted rating of movie {item_id} by user {user_id}: {predicted_rating}")

Predicted rating of movie 4 by user 1: 3.333059473007486

Part E: Make a tabular comparison of the actual ratings and predicted ratings for some randomly selected users and movies in the following format.
User Id Movie Id Actual Rating Predicted Rating
1234 987 5 4.89
2345 876 0 2.7
In [ ]: # Your Code Starts Here

random_users = np.random.choice(R_complete.index, 10)


random_movies = np.random.choice(R_complete.columns, 10)

user_ids = []
movie_ids = []
actual_ratings = []
predicted_ratings = []

for user_id, movie_id in zip(random_users, random_movies):


actual_rating = R_complete.loc[user_id, movie_id]
predicted_rating = predict_rating(user_id, movie_id)

user_ids.append(user_id)
movie_ids.append(movie_id)
actual_ratings.append(actual_rating)
predicted_ratings.append(predicted_rating)

results = pd.DataFrame({
"User Id": user_ids,
"Movie Id": movie_ids,
"Actual Rating": actual_ratings,
"Predicted Rating": predicted_ratings
})

results.head(10)

Out[ ]: User Id Movie Id Actual Rating Predicted Rating


0 431 834 2.200000 3.862701
1 348 1220 3.333333 3.585011
2 674 1297 2.833333 3.660598
3 10 1078 2.772727 2.117172
4 69 190 4.137097 3.965525
5 410 1429 2.750000 2.322252
6 728 528 4.132231 3.956549
7 179 872 3.095238 2.858756
8 350 1546 1.000000 2.989504
9 810 1047 2.835821 2.984848
In [ ]:

You might also like