Recommendation Chapter2

The document provides an introduction to building content-based recommendation engines using Python, focusing on item attributes, vectorization, and Jaccard similarity for calculating item distances. It also discusses text-based similarities using TF-IDF and cosine similarity to recommend books based on user profiles. Practical examples and code snippets are included to illustrate the concepts.

Uploaded by

Lê Nguyễn Thùy Dương


Intro to content-based recommendations
BUILDING RECOMMENDATION ENGINES IN PYTHON

Rob O'Callaghan
Director of Data
What are content-based recommendations?

Items' attributes or characteristics

Vectorizing your attributes
ITEM      Attribute 1  Attribute 2  Attribute 3  Attribute 4
Item_001  0            1            1            0
Item_002  1            0            1            0
Item_003  0            1            0            1

One to many relationships
Long format (one row per book-genre pair):
Book              Genre
The Hobbit        Adventure
The Hobbit        Fantasy
The Great Gatsby  Tragedy
...               ...

Wide format (one row per book):
Book              Adventure  Fantasy  Tragedy  ...
The Hobbit        1          1        0        ...
The Great Gatsby  0          0        1        ...
...               ...        ...      ...      ...

Crosstabulation
pd.crosstab(book_genre_df['Book'], book_genre_df['Genre'])

Book Adventure Fantasy Tragedy Social commentary

The Hobbit 1 1 0 0
The Great Gatsby 0 0 1 1
A Game of Thrones 0 1 0 0
Macbeth 0 0 1 0
... ... ... ... ...
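
The crosstabulation step can be tried end to end with a small, self-contained sketch. The rows below are an assumption based on the table above (the real book_genre_df is not shown in full), so treat the data as illustrative only.

```python
import pandas as pd

# Hypothetical long-format data: one row per (book, genre) pair
book_genre_df = pd.DataFrame({
    "Book": ["The Hobbit", "The Hobbit", "The Great Gatsby",
             "The Great Gatsby", "A Game of Thrones", "Macbeth"],
    "Genre": ["Adventure", "Fantasy", "Tragedy",
              "Social commentary", "Fantasy", "Tragedy"],
})

# Count each (book, genre) pair; with no duplicate pairs every cell is 0 or 1
genres_array_df = pd.crosstab(book_genre_df["Book"], book_genre_df["Genre"])
print(genres_array_df)
```

Note that pd.crosstab sorts the row and column labels, so the printed order may differ from the order of the input rows.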

Let's practice!

Making content-based recommendations

Rob O'Callaghan
Director of Data
Introducing the Jaccard similarity
Jaccard similarity:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
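
To make the ratio concrete, here is a minimal sketch that computes it directly on Python sets. The function name jaccard_similarity and the genre sets are assumptions for illustration; the genres follow the book table used elsewhere in these slides.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)

hobbit_genres = {"Adventure", "Fantasy"}
got_genres = {"Fantasy"}
gatsby_genres = {"Tragedy", "Social commentary"}

# One shared genre out of two distinct genres overall
print(jaccard_similarity(hobbit_genres, got_genres))     # 0.5
# No shared genres
print(jaccard_similarity(hobbit_genres, gatsby_genres))  # 0.0
```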

Calculating Jaccard similarity between books
genres_array_df :

Book Adventure Fantasy Tragedy Social commentary ...

The Hobbit 1 1 0 0 ...
The Great Gatsby 0 0 1 1 ...
A Game of Thrones 0 1 0 0 ...
Macbeth 0 0 1 0 ...
... ... ... ... ... ...

Calculating Jaccard similarity between books
from sklearn.metrics import jaccard_score

# book_genre_df is assumed here to be the crosstabbed 0/1 genre table, indexed by book title
hobbit_row = book_genre_df.loc['The Hobbit']
GOT_row = book_genre_df.loc['A Game of Thrones']

print(jaccard_score(hobbit_row, GOT_row))

0.5

Finding the distance between all items
from scipy.spatial.distance import pdist, squareform

jaccard_distances = pdist(book_genre_df.values, metric='jaccard')

print(jaccard_distances)

[1. 0.5 1. 1. 0.5 1. ]

square_jaccard_distances = squareform(jaccard_distances)
print(square_jaccard_distances)

[[0. 1. 0.5 1. ]
[1. 0. 1. 0.5]
[0.5 1. 0. 1. ]
[1. 0.5 1. 0. ]]

jaccard_similarity_array = 1 - square_jaccard_distances
print(jaccard_similarity_array)

[[1. 0. 0.5 0. ]
[0. 1. 0. 0.5]
[0.5 0. 1. 0. ]
[0. 0.5 0. 1. ]]
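
The distance-to-similarity steps can be reproduced with a self-contained sketch. The genre matrix below is an assumption that mirrors the four-book table shown earlier; the variable names are illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Rows: The Hobbit, The Great Gatsby, A Game of Thrones, Macbeth
# Columns: Adventure, Fantasy, Tragedy, Social commentary
genre_matrix = np.array([
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
], dtype=bool)

# Condensed form: one distance per unique pair of rows (4 rows -> 6 pairs)
condensed = pdist(genre_matrix, metric="jaccard")

# Square form: symmetric 4 x 4 matrix with zeros on the diagonal
square = squareform(condensed)

# Similarity is the complement of distance
similarity = 1 - square

print(condensed)
print(similarity)
```

The condensed array is ordered pair by pair (row 0 with rows 1, 2, 3, then row 1 with rows 2, 3, and so on), which is why squareform is needed before the values can be looked up by row and column.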

Creating a usable distance table
# Note: despite its name, distance_df holds similarity scores (1 - Jaccard distance)
distance_df = pd.DataFrame(jaccard_similarity_array,
                           index=genres_array_df['Book'],
                           columns=genres_array_df['Book'])
distance_df.head()

The Hobbit The Great Gatsby A Game of Thrones Macbeth ...

The Hobbit 1.00 0.15 0.75 0.01 ...
The Great Gatsby 0.15 1.00 0.01 0.43 ...
...

Comparing books
print(distance_df['The Hobbit']['A Game of Thrones'])

0.75

print(distance_df['The Hobbit']['The Great Gatsby'])

0.15

Finding the most similar books
print(distance_df['The Hobbit'].sort_values(ascending=False))

title
The Hobbit 1.00
The Two Towers 0.91
A Game of Thrones 0.50
...
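
The lookup above always ranks the book itself first, since every item has similarity 1.00 with itself. A small helper can drop that row before sorting. This is a sketch only: most_similar is a hypothetical helper name, and the similarity values below are invented for illustration.

```python
import pandas as pd

def most_similar(similarity_df, title, n=2):
    """Return the n highest-scoring titles, excluding the title itself."""
    scores = similarity_df[title].drop(index=title)
    return scores.sort_values(ascending=False).head(n)

# Hypothetical symmetric similarity table, for illustration only
titles = ["The Hobbit", "The Two Towers", "A Game of Thrones", "Macbeth"]
similarity_df = pd.DataFrame(
    [[1.00, 0.91, 0.50, 0.01],
     [0.91, 1.00, 0.45, 0.02],
     [0.50, 0.45, 1.00, 0.10],
     [0.01, 0.02, 0.10, 1.00]],
    index=titles, columns=titles)

top = most_similar(similarity_df, "The Hobbit")
print(top)
```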

Let's practice!

Text-based similarities

Rob O'Callaghan
Director of Data
Working without clear attributes

Term frequency inverse document frequency

$$\text{TF-IDF} = \frac{\text{Count of word occurrences}}{\text{Total words in document}} \times \log\left(\frac{\text{Total number of docs}}{\text{Number of docs word is in}}\right)$$
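
A small worked example may help make the two parts of the score concrete. This sketch assumes the common textbook form, term frequency multiplied by log(total docs / docs containing the word). scikit-learn's TfidfVectorizer applies smoothing and normalization on top of this, so its numbers will differ. The three toy documents and the tf_idf helper are invented for illustration.

```python
import math

# Hypothetical tokenized documents
docs = [
    "the hobbit goes on an adventure".split(),
    "the general hears a prophecy".split(),
    "the novel tells a tragic story".split(),
]

def tf_idf(word, doc, docs):
    # Term frequency: share of this document's words that are `word`
    tf = doc.count(word) / len(doc)
    # Document frequency: how many documents contain `word`
    docs_with_word = sum(1 for d in docs if word in d)
    if docs_with_word == 0:
        return 0.0
    # Inverse document frequency: rarer words get a larger weight
    idf = math.log(len(docs) / docs_with_word)
    return tf * idf

# "the" appears in every document, so log(3 / 3) = 0 and the score is 0
print(tf_idf("the", docs[0], docs))
# "hobbit" appears in one document out of three: (1 / 6) * log(3)
print(tf_idf("hobbit", docs[0], docs))
```

Words that occur everywhere score zero, while words that are frequent in one document and rare elsewhere score highest, which is the behavior the formula is designed to produce.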

Our data
book_summary_df :

Book Description
The Hobbit "Bilbo Baggins lives a simple life with his fellow hobbits in the shire..."
The Great Gatsby "Set in Jazz Age New York, the novel tells the tragic story of Jay ..."
A Game of Thrones "15 years have passed since Robert's rebellion, with a nine-year-long ..."
Macbeth "A brave Scottish general receives a prophecy from a trio of witches ..."
... ...

Instantiate the vectorizer
from sklearn.feature_extraction.text import TfidfVectorizer

Filtering the data
# min_df=2: ignore terms that appear in fewer than 2 documents
# max_df=0.7: ignore terms that appear in more than 70% of documents
tfidfvec = TfidfVectorizer(min_df=2, max_df=0.7)

Vectorizing the data
vectorized_data = tfidfvec.fit_transform(book_summary_df['Description'])
# get_feature_names_out() requires scikit-learn 1.0+; older versions used get_feature_names()
print(tfidfvec.get_feature_names_out())

['age', 'ancient', 'angry', 'brave', 'battle', 'fellow', 'game', 'general', ...]

print(vectorized_data.toarray())

[[0.21, 0.53, 0.41, 0.64, 0.01, 0.02, ...
 [0.31, 0.00, 0.42, 0.03, 0.00, 0.73, ...
 [..., ..., ..., ..., ..., ..., ...

Formatting the data
tfidf_df = pd.DataFrame(vectorized_data.toarray(),
                        columns=tfidfvec.get_feature_names_out())
tfidf_df.index = book_summary_df['Book']
print(tfidf_df)

|                   | 'age'| 'ancient'| 'angry'| 'brave'| 'battle'| 'fellow'|...
|-------------------|------|----------|--------|--------|---------|---------|...
| The Hobbit        |  0.21|      0.53|    0.41|    0.64|     0.01|     0.02|...
| The Great Gatsby  |  0.31|      0.00|    0.42|    0.03|     0.00|     0.73|...
| A Game of Thrones |  0.61|      0.42|    0.77|    0.31|     0.83|     0.03|...
| ...               |   ...|       ...|     ...|     ...|      ...|      ...|...

Cosine similarity
Cosine similarity:

$$\cos(\theta) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$

Cosine similarity
from sklearn.metrics.pairwise import cosine_similarity

# Find similarity between all items
cosine_similarity_array = cosine_similarity(tfidf_df)

# Find similarity between two items
cosine_similarity(tfidf_df.loc['The Hobbit'].values.reshape(1, -1),
                  tfidf_df.loc['Macbeth'].values.reshape(1, -1))
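
To see what the library call computes, the cosine formula can be written out by hand with NumPy. This is an illustrative sketch, not the scikit-learn implementation: cosine_sim is a hypothetical helper, and the two vectors use only the first six feature values shown in the earlier table, so the printed score is not the similarity of the full descriptions.

```python
import numpy as np

def cosine_sim(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# First six TF-IDF values from the table (truncated rows, for illustration)
hobbit = [0.21, 0.53, 0.41, 0.64, 0.01, 0.02]
gatsby = [0.31, 0.00, 0.42, 0.03, 0.00, 0.73]

print(cosine_sim(hobbit, gatsby))

# Sanity checks on simple vectors
print(cosine_sim([1, 0], [0, 1]))  # perpendicular vectors: 0.0
print(cosine_sim([1, 2], [2, 4]))  # same direction: 1.0 up to rounding
```

Because the score depends only on direction, a long description and a short one that use the same words in the same proportions are treated as identical.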

Let's practice!

User profile recommendations

Rob O'Callaghan
Director of Data
Item to item recommendations

User profiles
tfidf_summary_df :

Book Adventure Fantasy Tragedy Social commentary

The Hobbit 1 1 0 0
Macbeth 0 0 1 0
... ... ... ... ...

User Profile:

User Profile Adventure Fantasy Tragedy Social commentary

User_001 ??? ??? ??? ???

Extract the user data
list_of_books_read = ['The Hobbit', 'Foundation', 'Nudge']
user_books = tfidf_summary_df.reindex(list_of_books_read)
print(user_books)

age ancient angry brave battle fellow ...

The Hobbit 0.21 0.53 0.41 0.64 0.01 0.02 ...
Foundation 0.31 0.90 0.42 0.33 0.64 0.04 ...
Nudge 0.61 0.01 0.45 0.31 0.12 0.74 ...

Build the user profile
user_prof = user_books.mean()
print(user_prof)

age 0.376667
ancient 0.480000
angry 0.426667
brave 0.256667
...

print(user_prof.values.reshape(1,-1))

[[0.376667, 0.480000, 0.426667, 0.256667, ...]]

Finding recommendations for a user
# Create a subset of only the non-read books
non_user_books = tfidf_summary_df.drop(list_of_books_read, axis=0)

# Calculate the cosine similarity between the user profile and every unread book
user_prof_similarities = cosine_similarity(user_prof.values.reshape(1, -1),
                                           non_user_books)

# Wrap in a DataFrame for ease of use
user_prof_similarities_df = pd.DataFrame(user_prof_similarities.T,
                                         index=non_user_books.index,
                                         columns=["similarity_score"])

Getting the top recommendations
sorted_similarity_df = user_prof_similarities_df.sort_values(by="similarity_score",
                                                             ascending=False)
print(sorted_similarity_df)

similarity_score
Title
The Two Towers 0.422488
Dune 0.363540
The Magicians Nephew 0.316075
... ...
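
The whole user-profile workflow (extract the read books, average them, score the unread books, sort) fits in one short sketch. The feature table and its values are invented for illustration, and the cosine similarity is computed directly with NumPy here rather than with scikit-learn's cosine_similarity.

```python
import numpy as np
import pandas as pd

# Hypothetical TF-IDF-style feature table (values invented for illustration)
tfidf_summary_df = pd.DataFrame(
    [[0.9, 0.1, 0.0],
     [0.8, 0.2, 0.1],
     [0.0, 0.1, 0.9],
     [0.7, 0.3, 0.0],
     [0.1, 0.0, 0.8]],
    index=["The Hobbit", "The Two Towers", "Macbeth", "Dune", "Hamlet"],
    columns=["quest", "battle", "tragic"])

list_of_books_read = ["The Hobbit", "Dune"]

# User profile: the mean feature vector of the books already read
user_prof = tfidf_summary_df.reindex(list_of_books_read).mean()

# Candidates: every book the user has not read
non_user_books = tfidf_summary_df.drop(list_of_books_read, axis=0)

# Cosine similarity between the profile and each candidate row
profile = user_prof.to_numpy()
candidates = non_user_books.to_numpy()
scores = candidates @ profile / (
    np.linalg.norm(candidates, axis=1) * np.linalg.norm(profile))

recommendations = pd.DataFrame(
    {"similarity_score": scores}, index=non_user_books.index
).sort_values(by="similarity_score", ascending=False)
print(recommendations)
```

The result is indexed by the unread books only, so the index of the score table must come from the filtered frame, not from the full feature table.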

Let's practice!