Chapter 4
sparsity
BUILDING RECOMMENDATION ENGINES IN PYTHON
Rob O'Callaghan
Director of Data
Sparse matrices
title     The Great Gatsby  The Catcher in the Rye  Fifty Shades of Grey
User
User_233               3.0                     NaN                   NaN
User_651               NaN                     5.0                   4.0
User_965               4.0                     3.0                   NaN
...                    ...                     ...                   ...
0.0114
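The figure above is a ratio of cell counts over the whole user-by-title table. The source does not show whether it counts the empty cells or the filled ones, so the minimal sketch below computes both. It assumes a pandas DataFrame shaped like the table above; the name `ratings_df` and the three sample rows are illustrative, not the course's data.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x3 slice of the user-by-title ratings table (assumed data).
ratings_df = pd.DataFrame(
    {
        "The Great Gatsby": [3.0, np.nan, 4.0],
        "The Catcher in the Rye": [np.nan, 5.0, 3.0],
        "Fifty Shades of Grey": [np.nan, 4.0, np.nan],
    },
    index=["User_233", "User_651", "User_965"],
)

total_cells = ratings_df.size                        # rows * columns
number_empty = ratings_df.isnull().values.sum()      # cells without a rating
number_filled = ratings_df.notnull().values.sum()    # cells with a rating

empty_fraction = number_empty / total_cells
filled_fraction = number_filled / total_cells
print(empty_fraction, filled_fraction)
```

The two fractions always sum to 1, so either one describes how sparse the table is.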
[[4, 1],
 [2, 2],
 [3, 3]]
print(matrix_b)
[[1, 0, 4],
 [0, 1, 6]]
[[ 4  1 22]
 [ 2  2 20]
 [ 3  3 30]]
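The third matrix above is the product of the first two. A minimal sketch that reproduces it with NumPy, assuming the first matrix is called `matrix_a` (only `matrix_b` is named in the source) and that `np.dot` is the multiplication used:

```python
import numpy as np

# matrix_a is an assumed name; the values are the ones printed above.
matrix_a = np.array([[4, 1],
                     [2, 2],
                     [3, 3]])
matrix_b = np.array([[1, 0, 4],
                     [0, 1, 6]])

# A (3, 2) matrix times a (2, 3) matrix gives a (3, 3) matrix.
# Each output cell is the dot product of a row of matrix_a
# with a column of matrix_b, e.g. 4*4 + 1*6 = 22 in the top-right cell.
product = np.dot(matrix_a, matrix_b)
print(product)
```

The point for recommendation data is the reverse direction: a large matrix can be represented by two much smaller factors whose product fills every cell.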
Why this helps with sparse matrices
What SVD does
(220, 500)
avg_ratings = book_ratings_df.mean(axis=1)
print(avg_ratings)
array([[4.5],
       [3.5],
       [2.5],
       [3.5],
       ...
       [2.2]])
          The Great Gatsby  The Catcher in the Rye  Fifty Shades of Grey
User_233               0.0                     0.0                   0.0
User_651               0.0                     0.5                  -0.5
User_965               0.5                    -0.5                   0.0
...                    ...                     ...                   ...
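The table above is consistent with subtracting each user's average rating from that user's row and then filling the remaining gaps with 0. A minimal sketch of that step, assuming the three sample users from the first table stand in for `book_ratings_df` (the variable `centered_df` is an illustrative name):

```python
import numpy as np
import pandas as pd

# Assumed stand-in for book_ratings_df, using the three users shown earlier.
book_ratings_df = pd.DataFrame(
    {
        "The Great Gatsby": [3.0, np.nan, 4.0],
        "The Catcher in the Rye": [np.nan, 5.0, 3.0],
        "Fifty Shades of Grey": [np.nan, 4.0, np.nan],
    },
    index=["User_233", "User_651", "User_965"],
)

# Row means skip NaN by default, so each user is averaged over rated books only.
avg_ratings = book_ratings_df.mean(axis=1)

# Subtract each user's mean from that user's row, then treat missing as 0,
# i.e. "no deviation from this user's average".
centered_df = book_ratings_df.sub(avg_ratings, axis=0)
centered_df = centered_df.fillna(0)
print(centered_df)
```

Centering before filling matters: filling with 0 first would treat every unrated book as a very low rating.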
print(U.shape)
(610, 6)
print(Vt.shape)
(6, 1000)
sigma = np.diag(sigma)
print(sigma)
array([[  3.0,   0. ,   0. ,   0. ,   0. ,   0. ],
       [  0. ,   4.8,   0. ,   0. ,   0. ,   0. ],
       [  0. ,   0. , -12.6,   0. ,   0. ,   0. ],
       [  0. ,   0. ,   0. ,  -3.8,   0. ,   0. ],
       [  0. ,   0. ,   0. ,   0. ,   8.2,   0. ],
       [  0. ,   0. ,   0. ,   0. ,   0. ,   7.3]])
print(book_ratings_df)
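The shapes above follow the pattern (users, k), (k, k) and (k, items) with k = 6. The source does not show which routine produced `U`, `sigma` and `Vt`, so the sketch below uses `numpy.linalg.svd` truncated to the first k factors as a stand-in. The random 20 x 30 input and the names `centered`, `k` and `predicted` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical centered ratings for 20 users and 30 books (assumed data).
centered = rng.normal(size=(20, 30))
# Hypothetical per-user averages, kept as a column so they broadcast by row.
avg_ratings = rng.uniform(2.0, 4.5, size=(20, 1))

k = 6  # number of latent factors to keep

# Full (thin) SVD, then keep only the k largest singular values.
U_full, s_full, Vt_full = np.linalg.svd(centered, full_matrices=False)
U, sigma, Vt = U_full[:, :k], s_full[:k], Vt_full[:k, :]

# Turn the 1-D vector of singular values into a diagonal matrix.
sigma = np.diag(sigma)
print(U.shape, sigma.shape, Vt.shape)

# Multiply the factors back together and add the user averages back on.
U_sigma = np.dot(U, sigma)
U_sigma_Vt = np.dot(U_sigma, Vt)
predicted = U_sigma_Vt + avg_ratings
print(predicted.shape)
```

NumPy returns singular values as non-negative numbers in descending order, so the diagonal printed by this sketch will not resemble the illustrative values on the slide. The product has a value in every cell, which is what makes it usable as a table of predicted ratings.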
Hold-out sets
print(actual_values[mask])
[4. 4. 5. 3. 3. ...]
print(predicted_values[mask])
print(mean_squared_error(actual_values[mask],
                         predicted_values[mask],
                         squared=False))
3.6223997
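The comparison above only scores cells where a true rating exists. A minimal sketch of that masking step, assuming `mask` marks the non-missing entries of `actual_values` (the source does not show how `mask` is built) and using small made-up arrays. RMSE is computed directly with NumPy here so the example does not depend on a particular scikit-learn version.

```python
import numpy as np

# Hypothetical held-out block: NaN means the user never rated that book.
actual_values = np.array([[4.0, np.nan, 5.0],
                          [np.nan, 3.0, 3.0]])
# Hypothetical predictions for the same block; every cell has a value.
predicted_values = np.array([[3.5, 2.0, 4.0],
                             [4.1, 3.0, 2.0]])

# Assumed definition: True wherever a real rating is available.
mask = ~np.isnan(actual_values)

# Boolean indexing flattens both arrays to the rated cells only.
errors = actual_values[mask] - predicted_values[mask]

# Root mean squared error over the rated cells.
rmse = np.sqrt(np.mean(errors ** 2))
print(rmse)
```

Without the mask, the NaN entries in `actual_values` would propagate into the error and the result would be NaN.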
Non-personalized models