Chapter 4


Dealing with sparsity
BUILDING RECOMMENDATION ENGINES IN PYTHON

Rob O'Callaghan
Director of Data
Sparse matrices

Measuring sparsity
print(book_ratings_df)

title     The Great Gatsby  The Catcher in the Rye  Fifty Shades of Grey
User
User_233               3.0                     NaN                   NaN
User_651               NaN                     5.0                   4.0
User_965               4.0                     3.0                   NaN
...                    ...                     ...                   ...

Measuring sparsity
# Count the empty (NaN) cells, then divide by the total number of cells
number_of_empty = book_ratings_df.isnull().values.sum()
total_number = book_ratings_df.size
sparsity = number_of_empty / total_number
print(sparsity)

0.0114
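
A minimal, self-contained sketch of the same calculation, run on the three users shown in the earlier table. The DataFrame toy_ratings_df is rebuilt by hand here purely for illustration; it is an assumption, not the course dataset.

```python
import numpy as np
import pandas as pd

# Toy ratings table (illustrative only): rows are users, columns are books
toy_ratings_df = pd.DataFrame(
    {
        "The Great Gatsby": [3.0, np.nan, 4.0],
        "The Catcher in the Rye": [np.nan, 5.0, 3.0],
        "Fifty Shades of Grey": [np.nan, 4.0, np.nan],
    },
    index=["User_233", "User_651", "User_965"],
)

number_of_empty = toy_ratings_df.isnull().values.sum()  # NaN cells
total_number = toy_ratings_df.size                      # rows * columns
toy_sparsity = number_of_empty / total_number

print(number_of_empty, total_number)
print(round(toy_sparsity, 3))
```

Here 4 of the 9 cells are empty, so the toy table is roughly 44% sparse.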

Why sparsity matters

Measuring sparsity per column
# Count the non-missing ratings in each column (one column per book)
book_ratings_df.notnull().sum()

The Pelican Brief 1
Snow Crash 1
The Great Gatsby 12
Fifty Shades of Grey 9
Leviathan 1
..
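
One way the per-column counts might be used is to flag books with too few ratings. The sketch below assumes a hand-built toy_ratings_df and a hypothetical cut-off named min_ratings; neither comes from the slides.

```python
import numpy as np
import pandas as pd

# Toy ratings table (illustrative only)
toy_ratings_df = pd.DataFrame(
    {
        "The Great Gatsby": [3.0, np.nan, 4.0],
        "The Catcher in the Rye": [np.nan, 5.0, 3.0],
        "Fifty Shades of Grey": [np.nan, 4.0, np.nan],
    },
    index=["User_233", "User_651", "User_965"],
)

# Number of non-missing ratings per book
ratings_per_book = toy_ratings_df.notnull().sum()

# Hypothetical threshold: books with fewer ratings than this are flagged
min_ratings = 2
sparse_books = ratings_per_book[ratings_per_book < min_ratings].index.tolist()

print(ratings_per_book.to_dict())
print(sparse_books)
```

With this toy data only "Fifty Shades of Grey" falls below the threshold; a sensible cut-off for real data depends on the dataset.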

Matrix factorization

Matrix multiplication
print(matrix_x)

[[4, 1],
[2, 2],
[3, 3]]

print(matrix_b)

[[1, 0, 4],
[0, 1, 6]]

Matrix multiplication
import numpy as np

dot_product = np.dot(matrix_x, matrix_b)


print(dot_product)

[[ 4 1 22]
[ 2 2 20]
[ 3 3 30]]
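
To see where each number in the product comes from, the sketch below recomputes it entry by entry: cell (i, j) is the sum over k of matrix_x[i, k] * matrix_b[k, j]. The name manual_product is introduced here for illustration.

```python
import numpy as np

matrix_x = np.array([[4, 1], [2, 2], [3, 3]])  # shape (3, 2)
matrix_b = np.array([[1, 0, 4], [0, 1, 6]])    # shape (2, 3)

rows, inner = matrix_x.shape
inner_b, cols = matrix_b.shape
assert inner == inner_b  # inner dimensions must match

# Fill each cell with the dot product of a row of matrix_x and a column of matrix_b
manual_product = np.zeros((rows, cols), dtype=int)
for i in range(rows):
    for j in range(cols):
        manual_product[i, j] = sum(
            matrix_x[i, k] * matrix_b[k, j] for k in range(inner)
        )

print(manual_product)
print(np.array_equal(manual_product, np.dot(matrix_x, matrix_b)))
```

For example, the top-right cell is 4 * 4 + 1 * 6 = 22, matching the np.dot result.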

Let's practice!
Matrix factorization
Why this helps with sparse matrices

What matrix factorization looks like

Latent features

Information loss

Let's practice!
Singular value decomposition (SVD)
What SVD does

Prepping our data
print(book_ratings_df.shape)

(220, 500)

# Per-user (row-wise) average; pandas skips NaN values by default
avg_ratings = book_ratings_df.mean(axis=1)
print(avg_ratings.values.reshape(-1, 1))

[[4.5]
 [3.5]
 [2.5]
 [3.5]
 ...
 [2.2]]

Prepping our data
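# Subtract each user's average from their ratings, then fill the gaps with 0
# (after centering, 0 means "no deviation from that user's average")
user_ratings_pivot_centered = book_ratings_df.sub(avg_ratings, axis=0)
user_ratings_pivot_centered.fillna(0, inplace=True)
print(user_ratings_pivot_centered)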

The Great Gatsby The Catcher in the Rye Fifty Shades of Grey
User_233 0.0 0.0 0.0
User_651 0.0 0.5 -0.5
User_965 0.5 -0.5 0.0
... ... ... ...

Applying SVD
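from scipy.sparse.linalg import svds

# svds is documented for arrays, sparse matrices and LinearOperators, so pass
# the underlying array; k (the number of latent features) defaults to 6
U, sigma, Vt = svds(user_ratings_pivot_centered.to_numpy())

print(U.shape)

(220, 6)

print(Vt.shape)

(6, 500)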
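
A small sketch of how the shapes returned by svds depend on the input and on k. The random toy_matrix and the choice k=3 are assumptions made for illustration only.

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
toy_matrix = rng.normal(size=(20, 12))  # 20 "users" x 12 "items" (made up)

# Ask for 3 latent features; k must be smaller than both matrix dimensions
U, sigma, Vt = svds(toy_matrix, k=3)
print(U.shape, sigma.shape, Vt.shape)  # (20, 3) (3,) (3, 12)

# Multiplying the three factors back gives a matrix of the original shape
approx = U @ np.diag(sigma) @ Vt
print(approx.shape)
```

U has one row per user, Vt has one column per item, and both share the k latent features in between.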

Applying SVD
print(sigma)

[3.0, 4.8, -12.6, -3.8, 8.2, 7.3]

sigma = np.diag(sigma)
print(sigma)

[[  3.    0.    0.    0.    0.    0. ]
 [  0.    4.8   0.    0.    0.    0. ]
 [  0.    0.  -12.6   0.    0.    0. ]
 [  0.    0.    0.   -3.8   0.    0. ]
 [  0.    0.    0.    0.    8.2   0. ]
 [  0.    0.    0.    0.    0.    7.3]]

Getting the final matrix

Calculating the product in Python
# Multiply U by the diagonal sigma matrix first, then by Vt
U_sigma = np.dot(U, sigma)
recalculated_ratings = np.dot(U_sigma, Vt)
print(recalculated_ratings)

[[ 0.1 -0.9 -3.6 ... ]
 [-2.3  0.5 -0.5 ... ]
 [ 0.5 -0.5  2.0 ... ]
 [ ...  ...  ... ... ]]

Add averages back
recalculated_ratings = recalculated_ratings + avg_ratings.values.reshape(-1, 1)
print(recalculated_ratings)

[[ 4.6  3.6  0.9 ... ]
 [ 1.2  4.0  3.0 ... ]
 [ 3.0  2.0  4.5 ... ]
 [ ...  ...  ... ... ]]

print(book_ratings_df)

[[ 5.0  4.0   NA ... ]
 [  NA  4.0  3.0 ... ]
 [ 3.0  2.0   NA ... ]
 [ ...  ...  ... ... ]]
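
The steps from centering through adding the averages back can be sketched end to end on a tiny table. The data, the names toy_ratings_df, centered and predicted_df, and the choice k=2 are all assumptions for illustration, not the course's dataset or settings.

```python
import numpy as np
import pandas as pd
from scipy.sparse.linalg import svds

# Toy ratings (made up): 5 users x 4 books, NaN = not rated
toy_ratings_df = pd.DataFrame(
    [[5.0, 4.0, np.nan, 1.0],
     [np.nan, 4.0, 3.0, 2.0],
     [3.0, 2.0, np.nan, 5.0],
     [4.0, np.nan, 2.0, 1.0],
     [1.0, 2.0, 5.0, np.nan]],
    index=[f"User_{i}" for i in range(5)],
    columns=[f"Book_{j}" for j in range(4)],
)

# 1. Center each row on that user's average, then fill the gaps with 0
avg_ratings = toy_ratings_df.mean(axis=1)
centered = toy_ratings_df.sub(avg_ratings, axis=0).fillna(0)

# 2. Factorize with k latent features (k must be < min(n_users, n_books))
U, sigma, Vt = svds(centered.to_numpy(), k=2)

# 3. Multiply the factors back together and restore the per-user averages
recalculated = U @ np.diag(sigma) @ Vt + avg_ratings.to_numpy().reshape(-1, 1)

predicted_df = pd.DataFrame(
    recalculated, index=toy_ratings_df.index, columns=toy_ratings_df.columns
)
print(predicted_df.round(2))
```

Every cell of predicted_df is filled, including those that were NaN in the input, which is what makes the result usable for recommendations.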

Let's practice!
Validating your predictions
Hold-out sets

Separating the hold-out set
# Copy the hold-out block first; without .copy() the next line could overwrite it
actual_values = act_ratings_df.iloc[:20, :100].values.copy()
act_ratings_df.iloc[:20, :100] = np.nan

Generate predictions as before.

predicted_values = calc_pred_ratings_df.iloc[:20, :100].values

Masking the hold-out set
mask = ~np.isnan(actual_values)

print(actual_values[mask])

[4. 4. 5. 3. 3. ...]

print(predicted_values[mask])

[3.76 4.35 4.95 3.5869079 3.686337 ...]

Introducing RMSE (root mean squared error)
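
For reference, the standard definition of RMSE over n hold-out ratings can be written as below. The symbols are chosen here for illustration: y_i is an actual rating and \hat{y}_i is the corresponding prediction.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
```

Lower values mean the predictions sit closer to the actual ratings, and the result is in the same units as the ratings themselves.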

RMSE in Python
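import numpy as np
from sklearn.metrics import mean_squared_error

# RMSE is the square root of the MSE; taking the root explicitly avoids the
# squared=False argument, which newer scikit-learn releases no longer accept
print(np.sqrt(mean_squared_error(actual_values[mask],
                                 predicted_values[mask])))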

3.6223997
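
A minimal sketch of the mask-then-score step using NumPy only, so the arithmetic is visible. The two small arrays are made-up values, not output from the course data.

```python
import numpy as np

# Held-out actual ratings (NaN = the user never rated that item)
actual_values = np.array([[4.0, np.nan, 5.0],
                          [np.nan, 3.0, 3.0]])
# Predictions for the same cells (made-up numbers)
predicted_values = np.array([[3.5, 2.0, 4.5],
                             [1.0, 3.5, 2.5]])

# Score only the cells that have a real rating to compare against
mask = ~np.isnan(actual_values)
errors = actual_values[mask] - predicted_values[mask]

# Square the errors, average them, then take the square root
rmse = np.sqrt(np.mean(errors ** 2))
print(rmse)
```

Each of the four compared cells is off by 0.5, so the RMSE is 0.5; the predictions for unrated cells never enter the score.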

Let's practice!
Wrap up
Non-personalized models

Content-based models

Collaborative filtering

Matrix factorization

Congratulations!