Recommendation System 1696663388
Today, the Internet is growing rapidly in both volume and structural complexity. Users often get lost in this messy
environment of websites because of their complex structure and the large amount of information they contain.
Personalizing and simplifying the web is therefore more important than ever for both users and owners of e-commerce websites.
Machine learning-based recommendation systems are powerful engines that use machine learning algorithms to segment
customers based on their user data and behavioral patterns (such as purchase and browsing history, likes, or reviews) and
target them with personalized product and content suggestions.
Today, they play a very important role on sites with many hits, users, or products, in fields such as entertainment,
content, e-commerce, advertising, and social networks. Examples include Netflix, YouTube, Amazon, Last.fm, IMDb, Yahoo,
and Spotify.
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")
movies = pd.read_csv('../input/movies-recomandation/tmdb_5000_movies.csv')
movies.head()
   budget     genres                                             homepage                                      id  \
0  237000000  [{"id": 28, "name": "Action"}, {"id": 12, "nam...  http://www.avatarmovie.com/                   19995
1  300000000  [{"id": 12, "name": "Adventure"}, {"id": 14, "...  http://disney.go.com/disneypictures/pirates/  285
2  245000000  [{"id": 28, "name": "Action"}, {"id": 12, "nam...  http://www.sonypictures.com/movies/spectre/   206647
3  250000000  [{"id": 28, "name": "Action"}, {"id": 80, "nam...  http://www.thedarkknightrises.com/            49026
4  260000000  [{"id": 28, "name": "Action"}, {"id": 12, "nam...  http://movies.disney.com/john-carter          49529

   keywords                                           original_language  original_title                            overview                                           popularity  ...
0  [{"id": 1463, "name": "culture clash"}, {"id":...  en                 Avatar                                    In the 22nd century, a paraplegic Marine is di...  150.437577  ...
1  [{"id": 270, "name": "ocean"}, {"id": 726, "na...  en                 Pirates of the Caribbean: At World's End  Captain Barbossa, long believed to be dead, ha...  139.082615  ...
2  [{"id": 470, "name": "spy"}, {"id": 818, "name...  en                 Spectre                                   A cryptic message from Bond’s past sends him o...  107.376788  ...
3  [{"id": 849, "name": "dc comics"}, {"id": 853,...  en                 The Dark Knight Rises                     Following the death of District Attorney Harve...  112.312950  ...
4  [{"id": 818, "name": "based on novel"}, {"id":...  en                 John Carter                               John Carter is a war-weary, former military ca...  43.926995   ...
print(f'\033[94m')
movies.shape
(4803, 20)
credits = pd.read_csv('../input/movies-recomandation/tmdb_5000_credits.csv')
credits.head()
   movie_id  title                                     cast                                               crew
0  19995     Avatar                                    [{"cast_id": 242, "character": "Jake Sully", "...  [{"credit_id": "52fe48009251416c750aca23", "de...
1  285       Pirates of the Caribbean: At World's End  [{"cast_id": 4, "character": "Captain Jack Spa...  [{"credit_id": "52fe4232c3a36847f800b579", "de...
2  206647    Spectre                                   [{"cast_id": 1, "character": "James Bond", "cr...  [{"credit_id": "54805967c3a36829b5002c41", "de...
3  49026     The Dark Knight Rises                     [{"cast_id": 2, "character": "Bruce Wayne / Ba...  [{"credit_id": "52fe4781c3a36847f81398c3", "de...
4  49529     John Carter                               [{"cast_id": 5, "character": "John Carter", "c...  [{"credit_id": "52fe479ac3a36847f813eaa3", "de...
print(f'\033[94m')
credits.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 movie_id 4803 non-null int64
1 title 4803 non-null object
2 cast 4803 non-null object
3 crew 4803 non-null object
dtypes: int64(1), object(3)
memory usage: 150.2+ KB
movies.shape
(4803, 20)
print(f'\033[94m')
movies.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 20 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 budget 4803 non-null int64
1 genres 4803 non-null object
2 homepage 1712 non-null object
3 id 4803 non-null int64
4 keywords 4803 non-null object
5 original_language 4803 non-null object
6 original_title 4803 non-null object
7 overview 4800 non-null object
8 popularity 4803 non-null float64
9 production_companies 4803 non-null object
10 production_countries 4803 non-null object
11 release_date 4802 non-null object
12 revenue 4803 non-null int64
13 runtime 4801 non-null float64
14 spoken_languages 4803 non-null object
15 status 4803 non-null object
16 tagline 3959 non-null object
17 title 4803 non-null object
18 vote_average 4803 non-null float64
19 vote_count 4803 non-null int64
dtypes: float64(3), int64(4), object(13)
memory usage: 750.6+ KB
import missingno
missingno.bar(movies, color="dodgerblue", sort="ascending", figsize=(10,5), fontsize=12);
0 237000000
1 300000000
2 245000000
3 250000000
4 260000000
Name: budget, dtype: int64
movies = movies.merge(credits,on='title')
movies.head(1)
   budget     genres                                             homepage                     id  \
0  237000000  [{"id": 28, "name": "Action"}, {"id": 12, "nam...  http://www.avatarmovie.com/  19995

   keywords                                           original_language  original_title  overview  \
0  [{"id": 1463, "name": "culture clash"}, {"id":...  en                 Avatar          In the 22nd century, a paraplegic Marine is di...

   popularity  production_companies                               ...
0  150.437577  [{"name": "Ingenious Film Partners", "id": 289...  ...

1 rows × 23 columns
movies.shape
(4809, 23)
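The merged frame has 4809 rows, six more than the 4803 rows in either input. A likely cause (an assumption here, not checked against the dataset) is that `title` is not unique, so a merge on `title` pairs every same-titled movie with every same-titled credits row. The sketch below uses made-up toy frames to show the effect, and how merging on the id columns would avoid it; `movies_toy` and `credits_toy` are hypothetical names.

```python
import pandas as pd

# Hypothetical toy data: two different movies share the title "Batman".
movies_toy = pd.DataFrame({
    "id": [1, 2, 3],
    "title": ["Avatar", "Batman", "Batman"],
})
credits_toy = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title": ["Avatar", "Batman", "Batman"],
    "cast": ["a", "b", "c"],
})

# Merging on the non-unique title gives 1 + 2*2 = 5 rows instead of 3.
by_title = movies_toy.merge(credits_toy, on="title")

# Merging on the unique ids keeps exactly one row per movie.
by_id = movies_toy.merge(
    credits_toy.drop(columns="title"), left_on="id", right_on="movie_id"
)

print(len(by_title), len(by_id))
```

The rest of the notebook keeps the title-based merge, so the row counts shown later (for example 4806 after dropping missing overviews) follow from it.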
movies = movies[['movie_id','title','overview','genres','keywords','cast','crew']]
movies.head(2)
   movie_id  title                                     overview                                           genres  \
0  19995     Avatar                                    In the 22nd century, a paraplegic Marine is di...  [{"id": 28, "name": "Action"}, {"id": 12, "nam...
1  285       Pirates of the Caribbean: At World's End  Captain Barbossa, long believed to be dead, ha...  [{"id": 12, "name": "Adventure"}, {"id": 14, "...

   keywords                                           cast                                               crew
0  [{"id": 1463, "name": "culture clash"}, {"id":...  [{"cast_id": 242, "character": "Jake Sully", "...  [{"credit_id": "52fe48009251416c750aca23", "de...
1  [{"id": 270, "name": "ocean"}, {"id": 726, "na...  [{"cast_id": 4, "character": "Captain Jack Spa...  [{"credit_id": "52fe4232c3a36847f800b579", "de...
movies.isnull().sum()
movie_id 0
title 0
overview 3
genres 0
keywords 0
cast 0
crew 0
dtype: int64
movies.dropna(inplace=True)
movies.iloc[0]['genres']
'[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'
movies.iloc[0:3]['genres']
0 [{"id": 28, "name": "Action"}, {"id": 12, "nam...
1 [{"id": 12, "name": "Adventure"}, {"id": 14, "...
2 [{"id": 28, "name": "Action"}, {"id": 12, "nam...
Name: genres, dtype: object
movies.loc[0:3]['genres']
2 206647 Spectre A cryptic message from Bond’s past sends him o...
3 49026 The Dark Knight Rises Following the death of District Attorney Harve...
Recommender systems, simply put, are AI algorithms that use large volumes of data to suggest additional products to
consumers based on a variety of features. These recommendations can be based on factors such as past purchases,
demographic information, search history, time spent reviewing a product, or a like, dislike, or comment left by the
consumer. The idea is that if you can narrow the pool of options down to a few meaningful and relevant choices,
customers are more likely to make a purchase now and to come back for more later.
Collaborative filtering is a method for recommender systems that relies solely on the past interactions recorded
between users and items to produce new recommendations.
Collaborative filtering tries to find what similar users would like: it groups users into clusters of similar types and
recommends items to each user according to the preferences of their cluster.
The main idea behind collaborative methods is that past user-item interactions are sufficient to detect similar users
or similar items, and to make predictions based on those similarities.
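As an illustration of the idea, here is a minimal user-based sketch, assuming a tiny made-up rating matrix in which 0 means "not rated". The names `ratings`, `cosine`, and `predict` are hypothetical and not part of this notebook; treating unrated items as 0 inside the similarity is a simplification.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are items, 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 1],   # user 0 (target): has not rated item 2
    [5, 5, 4, 1],   # user 1: taste similar to user 0
    [1, 1, 2, 5],   # user 2: roughly opposite taste
], dtype=float)

def cosine(u, v):
    """Cosine similarity between two rating vectors (zeros included as-is)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def predict(ratings, user, item):
    """Similarity-weighted average of the other users' ratings for `item`."""
    num, den = 0.0, 0.0
    for other in range(ratings.shape[0]):
        if other == user or ratings[other, item] == 0:
            continue  # skip the target user and users who did not rate the item
        sim = cosine(ratings[user], ratings[other])
        num += sim * ratings[other, item]
        den += abs(sim)
    return num / den if den else 0.0

pred = predict(ratings, user=0, item=2)
print(round(pred, 2))
```

Because user 1 is more similar to the target than user 2, the prediction lands closer to user 1's rating of 4 than to user 2's rating of 2.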
The content-based approach uses additional information about users and/or items. It uses item features to recommend
other items similar to what the user likes, based on their previous actions or explicit feedback.
The main idea of content-based methods is to build a model, based on the available “features”, that explains the
observed user-item interactions.
Such a model makes it easy to produce new predictions for a user: we only need to look at that user's profile and use
its information to determine relevant movies to suggest.
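A minimal sketch of this idea, assuming made-up binary genre features and a user profile built as the average of the liked items' feature vectors. The item names, the feature columns, and the profile rule are illustrative assumptions, not the method used later in this notebook.

```python
import numpy as np

# Hypothetical item features: columns are [action, romance, sci-fi].
titles = ["A", "B", "C", "D"]
features = np.array([
    [1, 0, 1],   # A: action, sci-fi
    [1, 0, 0],   # B: action
    [0, 1, 0],   # C: romance
    [1, 0, 1],   # D: action, sci-fi
], dtype=float)

liked = [0]  # the user liked item A

# User profile: average feature vector of the liked items.
profile = features[liked].mean(axis=0)

# Cosine similarity between the profile and every item.
norms = np.linalg.norm(features, axis=1) * np.linalg.norm(profile)
scores = features @ profile / norms

# Rank the items the user has not seen yet, best match first.
ranked = [titles[i] for i in np.argsort(-scores) if i not in liked]
print(ranked)
```

Item D shares both features with the liked item and ranks first; item C shares none and ranks last.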
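def convert(obj):
    # First attempt: iterate over the value and collect every 'name'.
    # This fails on the raw column, because each value is a string, not a list of dicts.
    L = []
    for i in obj:
        L.append(i['name'])
    return L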
import ast
ast.literal_eval('[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id
def convert(obj):
    # Parse the string into a list of dicts, then collect every 'name'.
    L = []
    for i in ast.literal_eval(obj):
        L.append(i['name'])
    return L
movies['genres'].apply(convert)
movies["genres"] = movies['genres'].apply(convert)
movies.head(1)
   movie_id  title   overview                                           genres  \
0  19995     Avatar  In the 22nd century, a paraplegic Marine is di...  [Action, Adventure, Fantasy, Science Fiction]

   keywords                                           cast                                               crew
0  [{"id": 1463, "name": "culture clash"}, {"id":...  [{"cast_id": 242, "character": "Jake Sully", "...  [{"credit_id": "52fe48009251416c750aca23", "de...
movies["keywords"] = movies['keywords'].apply(convert)
movies.head(1)
   movie_id  title   overview                                           genres  \
0  19995     Avatar  In the 22nd century, a paraplegic Marine is di...  [Action, Adventure, Fantasy, Science Fiction]

   keywords                                           cast                                               crew
0  [culture clash, future, space war, space colon...  [{"cast_id": 242, "character": "Jake Sully", "...  [{"credit_id": "52fe48009251416c750aca23", "de...
movies['cast'][0]
'[{"cast_id": 242, "character": "Jake Sully", "credit_id": "5602a8a7c3a3685532001c9a", "gender": 2, "id": 65731
, "name": "Sam Worthington", "order": 0}, {"cast_id": 3, "character": "Neytiri", "credit_id": "52fe48009251416c
750ac9cb", "gender": 1, "id": 8691, "name": "Zoe Saldana", "order": 1}, {"cast_id": 25, "character": "Dr. Grace
Augustine", "credit_id": "52fe48009251416c750aca39", "gender": 1, "id": 10205, "name": "Sigourney Weaver", "ord
er": 2}, {"cast_id": 4, "character": "Col. Quaritch", "credit_id": "52fe48009251416c750ac9cf", "gender": 2, "id
": 32747, "name": "Stephen Lang", "order": 3}, {"cast_id": 5, "character": "Trudy Chacon", "credit_id": "52fe48
009251416c750ac9d3", "gender": 1, "id": 17647, "name": "Michelle Rodriguez", "order": 4}, {"cast_id": 8, "chara
cter": "Selfridge", "credit_id": "52fe48009251416c750ac9e1", "gender": 2, "id": 1771, "name": "Giovanni Ribisi"
, "order": 5}, {"cast_id": 7, "character": "Norm Spellman", "credit_id": "52fe48009251416c750ac9dd", "gender":
2, "id": 59231, "name": "Joel David Moore", "order": 6}, {"cast_id": 9, "character": "Moat", "credit_id": "52fe
48009251416c750ac9e5", "gender": 1, "id": 30485, "name": "CCH Pounder", "order": 7}, {"cast_id": 11, "character
": "Eytukan", "credit_id": "52fe48009251416c750ac9ed", "gender": 2, "id": 15853, "name": "Wes Studi", "order":
8}, {"cast_id": 10, "character": "Tsu\'Tey", "credit_id": "52fe48009251416c750ac9e9", "gender": 2, "id": 10964,
"name": "Laz Alonso", "order": 9}, {"cast_id": 12, "character": "Dr. Max Patel", "credit_id": "52fe48009251416c
750ac9f1", "gender": 2, "id": 95697, "name": "Dileep Rao", "order": 10}, {"cast_id": 13, "character": "Lyle Wai
nfleet", "credit_id": "52fe48009251416c750ac9f5", "gender": 2, "id": 98215, "name": "Matt Gerald", "order": 11}
, {"cast_id": 32, "character": "Private Fike", "credit_id": "52fe48009251416c750aca5b", "gender": 2, "id": 1541
53, "name": "Sean Anthony Moran", "order": 12}, {"cast_id": 33, "character": "Cryo Vault Med Tech", "credit_id"
: "52fe48009251416c750aca5f", "gender": 2, "id": 397312, "name": "Jason Whyte", "order": 13}, {"cast_id": 34, "
character": "Venture Star Crew Chief", "credit_id": "52fe48009251416c750aca63", "gender": 2, "id": 42317, "name
": "Scott Lawrence", "order": 14}, {"cast_id": 35, "character": "Lock Up Trooper", "credit_id": "52fe4800925141
6c750aca67", "gender": 2, "id": 986734, "name": "Kelly Kilgour", "order": 15}, {"cast_id": 36, "character": "Sh
uttle Pilot", "credit_id": "52fe48009251416c750aca6b", "gender": 0, "id": 1207227, "name": "James Patrick Pitt"
, "order": 16}, {"cast_id": 37, "character": "Shuttle Co-Pilot", "credit_id": "52fe48009251416c750aca6f", "gend
er": 0, "id": 1180936, "name": "Sean Patrick Murphy", "order": 17}, {"cast_id": 38, "character": "Shuttle Crew
Chief", "credit_id": "52fe48009251416c750aca73", "gender": 2, "id": 1019578, "name": "Peter Dillon", "order": 1
8}, {"cast_id": 39, "character": "Tractor Operator / Troupe", "credit_id": "52fe48009251416c750aca77", "gender"
: 0, "id": 91443, "name": "Kevin Dorman", "order": 19}, {"cast_id": 40, "character": "Dragon Gunship Pilot", "c
redit_id": "52fe48009251416c750aca7b", "gender": 2, "id": 173391, "name": "Kelson Henderson", "order": 20}, {"c
ast_id": 41, "character": "Dragon Gunship Gunner", "credit_id": "52fe48009251416c750aca7f", "gender": 0, "id":
1207236, "name": "David Van Horn", "order": 21}, {"cast_id": 42, "character": "Dragon Gunship Navigator", "cred
it_id": "52fe48009251416c750aca83", "gender": 0, "id": 215913, "name": "Jacob Tomuri", "order": 22}, {"cast_id"
: 43, "character": "Suit #1", "credit_id": "52fe48009251416c750aca87", "gender": 0, "id": 143206, "name": "Mich
ael Blain-Rozgay", "order": 23}, {"cast_id": 44, "character": "Suit #2", "credit_id": "52fe48009251416c750aca8b
", "gender": 2, "id": 169676, "name": "Jon Curry", "order": 24}, {"cast_id": 46, "character": "Ambient Room Tec
h", "credit_id": "52fe48009251416c750aca8f", "gender": 0, "id": 1048610, "name": "Luke Hawker", "order": 25}, {
"cast_id": 47, "character": "Ambient Room Tech / Troupe", "credit_id": "52fe48009251416c750aca93", "gender": 0,
"id": 42288, "name": "Woody Schultz", "order": 26}, {"cast_id": 48, "character": "Horse Clan Leader", "credit_i
d": "52fe48009251416c750aca97", "gender": 2, "id": 68278, "name": "Peter Mensah", "order": 27}, {"cast_id": 49,
"character": "Link Room Tech", "credit_id": "52fe48009251416c750aca9b", "gender": 0, "id": 1207247, "name": "So
nia Yee", "order": 28}, {"cast_id": 50, "character": "Basketball Avatar / Troupe", "credit_id": "52fe4800925141
6c750aca9f", "gender": 1, "id": 1207248, "name": "Jahnel Curfman", "order": 29}, {"cast_id": 51, "character": "
Basketball Avatar", "credit_id": "52fe48009251416c750acaa3", "gender": 0, "id": 89714, "name": "Ilram Choi", "o
rder": 30}, {"cast_id": 52, "character": "Na\'vi Child", "credit_id": "52fe48009251416c750acaa7", "gender": 0,
"id": 1207249, "name": "Kyla Warren", "order": 31}, {"cast_id": 53, "character": "Troupe", "credit_id": "52fe48
009251416c750acaab", "gender": 0, "id": 1207250, "name": "Lisa Roumain", "order": 32}, {"cast_id": 54, "charact
er": "Troupe", "credit_id": "52fe48009251416c750acaaf", "gender": 1, "id": 83105, "name": "Debra Wilson", "orde
r": 33}, {"cast_id": 57, "character": "Troupe", "credit_id": "52fe48009251416c750acabb", "gender": 0, "id": 120
7253, "name": "Chris Mala", "order": 34}, {"cast_id": 55, "character": "Troupe", "credit_id": "52fe48009251416c
750acab3", "gender": 0, "id": 1207251, "name": "Taylor Kibby", "order": 35}, {"cast_id": 56, "character": "Trou
pe", "credit_id": "52fe48009251416c750acab7", "gender": 0, "id": 1207252, "name": "Jodie Landau", "order": 36},
{"cast_id": 58, "character": "Troupe", "credit_id": "52fe48009251416c750acabf", "gender": 0, "id": 1207254, "na
me": "Julie Lamm", "order": 37}, {"cast_id": 59, "character": "Troupe", "credit_id": "52fe48009251416c750acac3"
, "gender": 0, "id": 1207257, "name": "Cullen B. Madden", "order": 38}, {"cast_id": 60, "character": "Troupe",
"credit_id": "52fe48009251416c750acac7", "gender": 0, "id": 1207259, "name": "Joseph Brady Madden", "order": 39
}, {"cast_id": 61, "character": "Troupe", "credit_id": "52fe48009251416c750acacb", "gender": 0, "id": 1207262,
"name": "Frankie Torres", "order": 40}, {"cast_id": 62, "character": "Troupe", "credit_id": "52fe48009251416c75
0acacf", "gender": 1, "id": 1158600, "name": "Austin Wilson", "order": 41}, {"cast_id": 63, "character": "Troup
e", "credit_id": "52fe48019251416c750acad3", "gender": 1, "id": 983705, "name": "Sara Wilson", "order": 42}, {"
cast_id": 64, "character": "Troupe", "credit_id": "52fe48019251416c750acad7", "gender": 0, "id": 1207263, "name
": "Tamica Washington-Miller", "order": 43}, {"cast_id": 65, "character": "Op Center Staff", "credit_id": "52fe
48019251416c750acadb", "gender": 1, "id": 1145098, "name": "Lucy Briant", "order": 44}, {"cast_id": 66, "charac
ter": "Op Center Staff", "credit_id": "52fe48019251416c750acadf", "gender": 2, "id": 33305, "name": "Nathan Mei
ster", "order": 45}, {"cast_id": 67, "character": "Op Center Staff", "credit_id": "52fe48019251416c750acae3", "
gender": 0, "id": 1207264, "name": "Gerry Blair", "order": 46}, {"cast_id": 68, "character": "Op Center Staff",
"credit_id": "52fe48019251416c750acae7", "gender": 2, "id": 33311, "name": "Matthew Chamberlain", "order": 47},
{"cast_id": 69, "character": "Op Center Staff", "credit_id": "52fe48019251416c750acaeb", "gender": 0, "id": 120
7265, "name": "Paul Yates", "order": 48}, {"cast_id": 70, "character": "Op Center Duty Officer", "credit_id": "
52fe48019251416c750acaef", "gender": 0, "id": 1207266, "name": "Wray Wilson", "order": 49}, {"cast_id": 71, "ch
aracter": "Op Center Staff", "credit_id": "52fe48019251416c750acaf3", "gender": 2, "id": 54492, "name": "James
Gaylyn", "order": 50}, {"cast_id": 72, "character": "Dancer", "credit_id": "52fe48019251416c750acaf7", "gender"
: 0, "id": 1207267, "name": "Melvin Leno Clark III", "order": 51}, {"cast_id": 73, "character": "Dancer", "cred
it_id": "52fe48019251416c750acafb", "gender": 0, "id": 1207268, "name": "Carvon Futrell", "order": 52}, {"cast_
id": 74, "character": "Dancer", "credit_id": "52fe48019251416c750acaff", "gender": 0, "id": 1207269, "name": "B
randon Jelkes", "order": 53}, {"cast_id": 75, "character": "Dancer", "credit_id": "52fe48019251416c750acb03", "
gender": 0, "id": 1207270, "name": "Micah Moch", "order": 54}, {"cast_id": 76, "character": "Dancer", "credit_i
d": "52fe48019251416c750acb07", "gender": 0, "id": 1207271, "name": "Hanniyah Muhammad", "order": 55}, {"cast_i
d": 77, "character": "Dancer", "credit_id": "52fe48019251416c750acb0b", "gender": 0, "id": 1207272, "name": "Ch
ristopher Nolen", "order": 56}, {"cast_id": 78, "character": "Dancer", "credit_id": "52fe48019251416c750acb0f",
"gender": 0, "id": 1207273, "name": "Christa Oliver", "order": 57}, {"cast_id": 79, "character": "Dancer", "cre
dit_id": "52fe48019251416c750acb13", "gender": 0, "id": 1207274, "name": "April Marie Thomas", "order": 58}, {"
cast_id": 80, "character": "Dancer", "credit_id": "52fe48019251416c750acb17", "gender": 0, "id": 1207275, "name
": "Bravita A. Threatt", "order": 59}, {"cast_id": 81, "character": "Mining Chief (uncredited)", "credit_id": "
52fe48019251416c750acb1b", "gender": 0, "id": 1207276, "name": "Colin Bleasdale", "order": 60}, {"cast_id": 82,
"character": "Veteran Miner (uncredited)", "credit_id": "52fe48019251416c750acb1f", "gender": 0, "id": 107969,
"name": "Mike Bodnar", "order": 61}, {"cast_id": 83, "character": "Richard (uncredited)", "credit_id": "52fe480
19251416c750acb23", "gender": 0, "id": 1207278, "name": "Matt Clayton", "order": 62}, {"cast_id": 84, "characte
r": "Nav\'i (uncredited)", "credit_id": "52fe48019251416c750acb27", "gender": 1, "id": 147898, "name": "Nicole
Dionne", "order": 63}, {"cast_id": 85, "character": "Trooper (uncredited)", "credit_id": "52fe48019251416c750ac
b2b", "gender": 0, "id": 1207280, "name": "Jamie Harrison", "order": 64}, {"cast_id": 86, "character": "Trooper
(uncredited)", "credit_id": "52fe48019251416c750acb2f", "gender": 0, "id": 1207281, "name": "Allan Henry", "ord
er": 65}, {"cast_id": 87, "character": "Ground Technician (uncredited)", "credit_id": "52fe48019251416c750acb33
", "gender": 2, "id": 1207282, "name": "Anthony Ingruber", "order": 66}, {"cast_id": 88, "character": "Flight C
rew Mechanic (uncredited)", "credit_id": "52fe48019251416c750acb37", "gender": 0, "id": 1207283, "name": "Ashle
y Jeffery", "order": 67}, {"cast_id": 14, "character": "Samson Pilot", "credit_id": "52fe48009251416c750ac9f9",
"gender": 0, "id": 98216, "name": "Dean Knowsley", "order": 68}, {"cast_id": 89, "character": "Trooper (uncredi
ted)", "credit_id": "52fe48019251416c750acb3b", "gender": 0, "id": 1201399, "name": "Joseph Mika-Hunt", "order"
: 69}, {"cast_id": 90, "character": "Banshee (uncredited)", "credit_id": "52fe48019251416c750acb3f", "gender":
0, "id": 236696, "name": "Terry Notary", "order": 70}, {"cast_id": 91, "character": "Soldier (uncredited)", "cr
edit_id": "52fe48019251416c750acb43", "gender": 0, "id": 1207287, "name": "Kai Pantano", "order": 71}, {"cast_i
d": 92, "character": "Blast Technician (uncredited)", "credit_id": "52fe48019251416c750acb47", "gender": 0, "id
": 1207288, "name": "Logan Pithyou", "order": 72}, {"cast_id": 93, "character": "Vindum Raah (uncredited)", "cr
edit_id": "52fe48019251416c750acb4b", "gender": 0, "id": 1207289, "name": "Stuart Pollock", "order": 73}, {"cas
t_id": 94, "character": "Hero (uncredited)", "credit_id": "52fe48019251416c750acb4f", "gender": 0, "id": 584868
, "name": "Raja", "order": 74}, {"cast_id": 95, "character": "Ops Centreworker (uncredited)", "credit_id": "52f
e48019251416c750acb53", "gender": 0, "id": 1207290, "name": "Gareth Ruck", "order": 75}, {"cast_id": 96, "chara
cter": "Engineer (uncredited)", "credit_id": "52fe48019251416c750acb57", "gender": 0, "id": 1062463, "name": "R
hian Sheehan", "order": 76}, {"cast_id": 97, "character": "Col. Quaritch\'s Mech Suit (uncredited)", "credit_id
": "52fe48019251416c750acb5b", "gender": 0, "id": 60656, "name": "T. J. Storm", "order": 77}, {"cast_id": 98, "
character": "Female Marine (uncredited)", "credit_id": "52fe48019251416c750acb5f", "gender": 0, "id": 1207291,
"name": "Jodie Taylor", "order": 78}, {"cast_id": 99, "character": "Ikran Clan Leader (uncredited)", "credit_id
": "52fe48019251416c750acb63", "gender": 1, "id": 1186027, "name": "Alicia Vela-Bailey", "order": 79}, {"cast_i
d": 100, "character": "Geologist (uncredited)", "credit_id": "52fe48019251416c750acb67", "gender": 0, "id": 120
7292, "name": "Richard Whiteside", "order": 80}, {"cast_id": 101, "character": "Na\'vi (uncredited)", "credit_i
d": "52fe48019251416c750acb6b", "gender": 0, "id": 103259, "name": "Nikie Zambo", "order": 81}, {"cast_id": 102
, "character": "Ambient Room Tech / Troupe", "credit_id": "52fe48019251416c750acb6f", "gender": 1, "id": 42286,
"name": "Julene Renee", "order": 82}]'
def convert3(obj):
    # Keep only the names of the first three cast members.
    L = []
    counter = 0
    for i in ast.literal_eval(obj):
        if counter != 3:
            L.append(i['name'])
            counter += 1
        else:
            break
    return L
movies['cast'].apply(convert3)
movies['cast'] = movies['cast'].apply(convert3)
movies.head(1)
   movie_id  title   overview                                           genres  \
0  19995     Avatar  In the 22nd century, a paraplegic Marine is di...  [Action, Adventure, Fantasy, Science Fiction]

   keywords                                           cast                                              crew
0  [culture clash, future, space war, space colon...  [Sam Worthington, Zoe Saldana, Sigourney Weaver]  [{"credit_id": "52fe48009251416c750aca23", "de...
movies['crew'][0]
def fetch_director(obj):
    # Collect the name of every crew member whose job is "Director".
    L = []
    for i in ast.literal_eval(obj):
        if i['job'] == "Director":
            L.append(i['name'])
    return L
movies['crew'].apply(fetch_director)
0 [James Cameron]
1 [Gore Verbinski]
2 [Sam Mendes]
3 [Christopher Nolan]
4 [Andrew Stanton]
...
4804 [Robert Rodriguez]
4805 [Edward Burns]
4806 [Scott Smith]
4807 [Daniel Hsia]
4808 [Brian Herzlinger, Jon Gunn, Brett Winn]
Name: crew, Length: 4806, dtype: object
movies['crew'] = movies['crew'].apply(fetch_director)
movies.head(1)
   movie_id  title   overview                                           genres  \
0  19995     Avatar  In the 22nd century, a paraplegic Marine is di...  [Action, Adventure, Fantasy, Science Fiction]

   keywords                                           cast                                              crew
0  [culture clash, future, space war, space colon...  [Sam Worthington, Zoe Saldana, Sigourney Weaver]  [James Cameron]
movies['overview'][0]
'In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization.'
movies['overview'] = movies['overview'].apply(lambda x: x.split())  # assign back so the column holds word lists
movies.head(1)
   movie_id  title   overview                                           genres  \
0  19995     Avatar  [In, the, 22nd, century,, a, paraplegic, Marin...  [Action, Adventure, Fantasy, Science Fiction]

   keywords                                           cast                                              crew
0  [culture clash, future, space war, space colon...  [Sam Worthington, Zoe Saldana, Sigourney Weaver]  [James Cameron]
# Sam Worthington
# Convert
# SamWorthington
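The comments above describe collapsing multi-word names into single tokens, presumably so that "Sam Worthington" and "Sam Mendes" do not end up sharing a token "Sam". The cell that performs this step does not appear here, so the following is only a sketch of one way to do it; the helper name `collapse` is hypothetical.

```python
def collapse(names):
    """Remove the spaces inside each name so it becomes a single token."""
    return [name.replace(" ", "") for name in names]

cast = ["Sam Worthington", "Zoe Saldana", "Sigourney Weaver"]
print(collapse(cast))
```

Judging by the outputs that follow (for example `ScienceFiction` and `cultureclash`), a step like this was applied to the `genres`, `keywords`, `cast`, and `crew` columns, for instance with `movies[col].apply(collapse)`.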
movies.head()
   movie_id  title                  overview                                           genres  \
3  49026     The Dark Knight Rises  [Following, the, death, of, District, Attorney...  [Action, Crime, Drama, Thriller]

   keywords                                           cast                                       crew
3  [dccomics, crimefighter, terrorist, secretiden...  [ChristianBale, MichaelCaine, GaryOldman]  [ChristopherNolan]
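The next output shows a `tags` column that no displayed cell creates. Its contents (overview words followed by genres, keywords, cast, and director) suggest the list-valued columns were concatenated. A minimal sketch under that assumption, using a hypothetical one-row frame named `toy`:

```python
import pandas as pd

# Hypothetical one-row frame with the same list-valued columns as above.
toy = pd.DataFrame({
    "overview": [["In", "the", "22nd", "century"]],
    "genres": [["Action", "ScienceFiction"]],
    "keywords": [["cultureclash", "future"]],
    "cast": [["SamWorthington"]],
    "crew": [["JamesCameron"]],
})

# Adding list-valued columns concatenates the lists row by row.
toy["tags"] = (
    toy["overview"] + toy["genres"] + toy["keywords"] + toy["cast"] + toy["crew"]
)

print(toy["tags"][0])
```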
movies.head(1)
   movie_id  title   overview                                           genres  \
0  19995     Avatar  [In, the, 22nd, century,, a, paraplegic, Marin...  [Action, Adventure, Fantasy, ScienceFiction]

   keywords                                           cast                                           crew            tags
0  [cultureclash, future, spacewar, spacecolony, ...  [SamWorthington, ZoeSaldana, SigourneyWeaver]  [JamesCameron]  [In, the, 22nd, century,, a, paraplegic, Marin...
new_df = movies[["movie_id","title","tags"]]
new_df.head(2)
   movie_id  title                                     tags
1  285       Pirates of the Caribbean: At World's End  [Captain, Barbossa,, long, believed, to, be, d...
["loved","loving","love"]
["love","love","love"]
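import nltk
from nltk.stem.porter import PorterStemmer

# `ps` is used below but never defined in the original; a Porter stemmer is assumed here.
ps = PorterStemmer()

def stem(text):
    # Stem every whitespace-separated word, then join the stems back into one string.
    y = []
    for i in text.split():
        y.append(ps.stem(i))
    return " ".join(y)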
new_df['tags'][0]
['In',
'the',
'22nd',
'century,',
'a',
'paraplegic',
'Marine',
'is',
'dispatched',
'to',
'the',
'moon',
'Pandora',
'on',
'a',
'unique',
'mission,',
'but',
'becomes',
'torn',
'between',
'following',
'orders',
'and',
'protecting',
'an',
'alien',
'civilization.',
'Action',
'Adventure',
'Fantasy',
'ScienceFiction',
'cultureclash',
'future',
'spacewar',
'spacecolony',
'society',
'spacetravel',
'futuristic',
'romance',
'space',
'alien',
'tribe',
'alienplanet',
'cgi',
'marine',
'soldier',
'battle',
'loveaffair',
'antiwar',
'powerrelations',
'mindandsoul',
'3d',
'SamWorthington',
'ZoeSaldana',
'SigourneyWeaver',
'JamesCameron']
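The list above appears as a single string in the next output, but the cell that joins it is not shown. Presumably something like `new_df['tags'].apply(lambda x: " ".join(x))` was assigned back to the column; a minimal sketch of the joining step, using a shortened hypothetical token list:

```python
tags = ["In", "the", "22nd", "century,", "Action", "SamWorthington"]

# Join the tokens into one space-separated string.
joined = " ".join(tags)
print(joined)
```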
new_df.head(1)
new_df['tags'][0]
'In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization. Action Adventure Fantasy ScienceFiction cultureclash future spacewar spacecolony society spacetravel futuristic romance space alien tribe alienplanet cgi marine soldier battle loveaffair antiwar powerrelations mindandsoul 3d SamWorthington ZoeSaldana SigourneyWeaver JamesCameron'
new_df['tags'] = new_df['tags'].apply(lambda x: x.lower())  # assign back so the lower-casing is kept
new_df.head(1)
new_df['tags'][0]
'in the 22nd century, a paraplegic marine is dispatched to the moon pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization. action adventure fantasy sciencefiction cultureclash future spacewar spacecolony society spacetravel futuristic romance space alien tribe alienplanet cgi marine soldier battle loveaffair antiwar powerrelations mindandsoul 3d samworthington zoesaldana sigourneyweaver jamescameron'
new_df['tags'][1]
"captain barbossa, long believed to be dead, has come back to life and is headed to the edge of the earth with will turner and elizabeth swann. but nothing is quite as it seems. adventure fantasy action ocean drugabuse exoticisland eastindiatradingcompany loveofone'slife traitor shipwreck strongwoman ship alliance calypso afterlife fighter pirate swashbuckler aftercreditsstinger johnnydepp orlandobloom keiraknightley goreverbinski"
People sometimes confuse ranking (or search ranking) systems with recommender systems, and some may even think they
are interchangeable. While both kinds of algorithm try to present items in a sorted order, there are some key
differences between the two:
Ranking algorithms normally put more relevant items closer to the top of the displayed list, whereas recommender
systems sometimes try to avoid overspecialization. A good recommender system should not recommend items that are too
similar to what users have seen before, and should diversify its recommendations.
Ranking algorithms rely on a search query provided by users who know what they are looking for. Recommender systems,
on the other hand, work without any explicit input from users and aim to discover things they might not have found otherwise.
Matrix Factorization
Latent factor models compress the user-item matrix into a low-dimensional representation in terms of latent factors.
One advantage of this approach is that instead of a high-dimensional matrix containing a large number of missing
values, we deal with a much smaller matrix in a lower-dimensional space.
The reduced representation can be used with either the user-based or the item-based neighborhood algorithms that are
presented in the previous section. This paradigm has several advantages: it handles the sparsity of the original
matrix better than memory-based methods, and comparing similarity on the resulting matrix is much more scalable,
especially with large sparse datasets.
An important decision is the number of factors used to factorize the user-item matrix. The higher the number of
factors, the more precisely the factorization reconstructs the original matrix. However, if the model is allowed to
memorize too many details of the original matrix, it may not generalize well to data it was not trained on. Reducing
the number of factors improves the model's generalization.
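The trade-off around the number of factors can be seen in a small sketch. It assumes a made-up, fully observed 4x4 rating matrix and uses a plain truncated SVD from NumPy. Practical recommenders usually fit the factors on observed entries only, so this illustrates the idea rather than a production method; the names `R` and `reconstruct` are hypothetical.

```python
import numpy as np

# Hypothetical dense user-item matrix (4 users x 4 items), no missing values.
R = np.array([
    [5, 4, 1, 1],
    [4, 5, 1, 2],
    [1, 1, 5, 4],
    [2, 1, 4, 5],
], dtype=float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)

def reconstruct(k):
    """Rank-k approximation of R built from the top k latent factors."""
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Reconstruction error (Frobenius norm) for 1 to 4 factors.
errors = [float(np.linalg.norm(R - reconstruct(k))) for k in (1, 2, 3, 4)]
print([round(e, 3) for e in errors])
```

The error never increases as factors are added and reaches (numerically) zero at full rank, which is the "memorizing" end of the trade-off described above.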
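from sklearn.feature_extraction.text import CountVectorizer
# `cv` is not defined in the original. max_features=5000 matches the shape printed below; stop_words='english' is an assumption.
cv = CountVectorizer(max_features=5000, stop_words='english')
vector = cv.fit_transform(new_df['tags']).toarray()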
vector
vector[0]
vector.shape
(4806, 5000)
cv.get_feature_names_out().tolist()  # get_feature_names() was removed in scikit-learn 1.2
['000',
'007',
'10',
'100',
'11',
'12',
'13',
'14',
'15',
'16',
'17',
'18',
'18th',
'19',
'1930s',
'1940s',
'1944',
'1950',
'1950s',
'1960s',
'1970s',
'1971',
'1974',
'1976',
'1980',
'1980s',
'1985',
'1990s',
'1999',
'19th',
'19thcentury',
'20',
'200',
'2003',
'2009',
'20th',
'24',
'25',
'30',
'300',
'3d',
'40',
'50',
'500',
'60',
'60s',
'70',
'aaron',
'aaroneckhart',
'abandoned',
'abducted',
'abigailbreslin',
'abilities',
'ability',
'able',
'aboard',
'abuse',
'abusive',
'academic',
'academy',
'accept',
'accepted',
'accepts',
'access',
'accident',
'accidental',
'accidentally',
'accompanied',
'accomplish',
'account',
'accountant',
'accused',
'ace',
'achieve',
'act',
'acting',
'action',
'actionhero',
'actions',
'activist',
'activities',
'activity',
'actor',
'actors',
'actress',
'acts',
'actual',
'actually',
'adam',
'adams',
'adamsandler',
'adamshankman',
'adaptation',
'adapted',
'addict',
'addicted',
'addiction',
'adolescence',
'adopt',
'adopted',
'adoption',
'adopts',
'adrienbrody',
'adult',
'adultery',
'adulthood',
'adults',
'advantage',
'adventure',
'adventures',
'advertising',
'advice',
'affair',
'affairs',
'affection',
'affections',
'afghanistan',
'africa',
'african',
'africanamerican',
'aftercreditsstinger',
'afterlife',
'aftermath',
'age',
'aged',
'agedifference',
'agency',
'agenda',
'agent',
'agents',
'aggressive',
'aging',
'ago',
'agree',
'agrees',
'ahead',
'aid',
'aided',
'aids',
'ailing',
'air',
'airplane',
'airplanecrash',
'airport',
'aka',
'al',
'alabama',
'alan',
'alaska',
'albert',
'alcohol',
'alcoholic',
'alcoholism',
'alecbaldwin',
'alex',
'alfredhitchcock',
'ali',
'alice',
'alien',
'alieninvasion',
'alienlife',
'aliens',
'alike',
'alive',
'allen',
'alliance',
'allied',
'allies',
'allow',
'allows',
'ally',
'alongside',
'alpacino',
'alter',
'alternate',
'alternative',
'alzheimer',
'amanda',
'amandapeet',
'amandaseyfried',
'amateur',
'amazing',
'ambassador',
'ambition',
'ambitious',
'ambulance',
'ambush',
'america',
'american',
'americanabroad',
'americanfootball',
'americans',
'amid',
'amidst',
'amnesia',
'amp',
'amsterdam',
'amusementpark',
'amy',
'amyadams',
'amysmart',
'analyst',
'anarchiccomedy',
'ancient',
'ancientrome',
'ancientworld',
'anderson',
'andiemacdowell',
'andrew',
'android',
'andy',
'andygarcía',
'angel',
'angelabassett',
'angeles',
'angelinajolie',
'angels',
'anger',
'anglee',
'angry',
'animal',
'animalattack',
'animalhorror',
'animals',
'animated',
'animation',
'ann',
'anna',
'annabelle',
'annafaris',
'annakendrick',
'anne',
'annehathaway',
'annemoss',
'annettebening',
'annie',
'anniversary',
'announces',
'annual',
'anonymity',
'anonymous',
'answer',
'answers',
'ant',
'antarctic',
'anthology',
'anthony',
'anthonyanderson',
'anthonyhopkins',
'anthonymackie',
'anthropomorphism',
'anti',
'antics',
'antihero',
'antoinefuqua',
'antoniobanderas',
'antonyelchin',
'apart',
'apartheid',
'apartment',
'ape',
'apes',
'apocalypse',
'apocalyptic',
'apparent',
'apparently',
'appear',
'appears',
'apple',
'appointed',
'apprentice',
'approach',
'approaches',
'approaching',
'april',
'aquarium',
'arab',
'arch',
'archaeologist',
'archeology',
'archer',
'architect',
'arctic',
'area',
'aren',
'arena',
'argument',
'arise',
'aristocrat',
'armed',
'arms',
'army',
'arnold',
'arnoldschwarzenegger',
'arrangedmarriage',
'arrangement',
'arrest',
'arrested',
'arrival',
'arrive',
'arrives',
'arriving',
'arrogant',
'art',
'arthur',
'artificialintelligence',
'artist',
'artistic',
'artists',
'arts',
'ashley',
'ashleyjudd',
'ashtonkutcher',
'asia',
'aside',
'ask',
'asked',
'asking',
'asks',
'aspirations',
'aspiring',
'assassin',
'assassinate',
'assassination',
'assassins',
'assault',
'assigned',
'assignment',
'assistant',
'assumes',
'asteroid',
'astronaut',
'astronauts',
'asylum',
'atheist',
'athlete',
'atlantic',
'atomicbomb',
'attack',
'attacked',
'attacks',
'attempt',
'attempting',
'attempts',
'attending',
'attends',
'attention',
'attic',
'attitude',
'attorney',
'attracted',
'attraction',
'attractive',
'audience',
'audiences',
'audition',
'august',
'aunt',
'austin',
'australia',
'australian',
'author',
'authorities',
'authority',
'autism',
'auto',
'avenge',
'average',
'avoid',
'awaits',
'awakens',
'award',
'away',
'awry',
'ax',
'babe',
'baby',
'bachelor',
'backdrop',
'background',
'backgrounds',
'bad',
'bag',
'bahamas',
'bail',
'balance',
'ball',
'ballet',
'baltimore',
'band',
'bandits',
'bangkok',
'banished',
'bank',
'banker',
'bankrobber',
'bankrobbery',
'bar',
'barely',
'bargained',
'barn',
'barney',
'barry',
'barrylevinson',
'barrysonnenfeld',
'bars',
'base',
'baseball',
'based',
'basedoncomicbook',
'basedongraphicnovel',
'basedonnovel',
'basedonplay',
'basedonstagemusical',
'basedontrueevents',
'basedontruestory',
'basedontvseries',
'basedonvideogame',
'basedonyoungadultnovel',
'basement',
'basketball',
'batman',
'battle',
'battlefield',
'battles',
'battling',
'bay',
'beach',
'bear',
'bears',
'beast',
'beasts',
'beat',
'beating',
'beautiful',
'beautifulwoman',
'beauty',
'becky',
'becominganadult',
'bed',
'bedroom',
'beer',
'befriends',
'began',
'begin',
'beginning',
'begins',
'behavior',
'beings',
'beliefs',
'believe',
'believed',
'believes',
'believing',
'beloved',
'ben',
'benaffleck',
'beneath',
'benfoster',
'beniciodeltoro',
'benjamin',
'benjaminbratt',
'benkingsley',
'bennett',
'benstiller',
'bent',
'berlin',
'best',
'bestfriend',
'bet',
'beth',
'betrayal',
'betrayed',
'bettemidler',
'better',
'betty',
'beverly',
'bible',
'big',
'bigger',
'biggest',
'biker',
'bikini',
'billhader',
'billionaire',
'billmurray',
'billnighy',
'billpaxton',
'billpullman',
'billy',
'billybobthornton',
'billycrudup',
'billycrystal',
'biography',
'bird',
'birth',
'birthday',
'bisexual',
'bishop',
'bit',
'bite',
'bitter',
'bizarre',
'black',
'blackmagic',
'blackmail',
'blackpeople',
'blade',
'blame',
'blind',
'bliss',
'block',
'blonde',
'blood',
'bloodsplatter',
'bloodthirsty',
'bloody',
'blow',
'blue',
'board',
'boarding',
'boardingschool',
'boat',
'bob',
'bobby',
'bobbyfarrelly',
'bobhoskins',
'bodies',
'body',
'bodyguard',
'bold',
'bollywood',
'bomb',
'bombing',
'bond',
'bonds',
'bone',
'book',
'books',
'border',
'bored',
'boredom',
'boring',
'born',
'boss',
'boston',
'botched',
'bound',
'boundaries',
'bounty',
'bountyhunter',
'box',
'boxer',
'boxing',
'boy',
'boyfriend',
'boys',
'bradleycooper',
'bradpitt',
'brain',
'brand',
'brandon',
'brave',
'bravery',
'brazil',
'brazilian',
'break',
'breakdown',
'breaking',
'breaks',
'breast',
'brendanfraser',
'brendangleeson',
'brent',
'brettratner',
'brian',
'briandepalma',
'bride',
'bridge',
'brief',
'brien',
'bright',
'brilliant',
'bring',
'bringing',
'brings',
'brink',
'britain',
'british',
'britishsecretservice',
'brittanymurphy',
'broadway',
'broke',
'broken',
'broker',
'brooklyn',
'brooks',
'brothel',
'brother',
'brotherbrotherrelationship',
'brothers',
'brothersisterrelationship',
'brought',
'brown',
'bruce',
'brucewillis',
'brutal',
'brutality',
'brutally',
'bryansinger',
'buck',
'buddies',
'buddy',
'buddycomedy',
'budget',
'build',
'building',
'built',
'bully',
'bullying',
'bumbling',
'bunny',
'burglar',
'buried',
'bus',
'bush',
'business',
'businessman',
'bust',
'busy',
'butler',
'buy',
'cabin',
'caesar',
'cage',
'cairo',
'cal',
'california',
'called',
'calls',
'calvin',
'camcorder',
'came',
'camera',
'cameraman',
'cameras',
'camerondiaz',
'camp',
'campaign',
'campbell',
'camping',
'campus',
'canada',
'canadian',
'cancer',
'candidate',
'candy',
'cannibal',
'capable',
'capital',
'capitalism',
'capt',
'captain',
'captive',
'capture',
'captured',
'captures',
'car',
'caraccident',
'carchase',
'carcrash',
'card',
'care',
'career',
'carefree',
'caretaker',
'caribbean',
'carjourney',
'carl',
'carlagugino',
'carmen',
'carol',
'carolina',
'carrace',
'carrie',
'carry',
'carrying',
'cars',
'cartel',
'carter',
'cartoon',
'caryelwes',
'case',
'caseyaffleck',
'cash',
'casino',
'cast',
'castle',
'cat',
'cataclysm',
'catastrophe',
'catch',
'catches',
'cateblanchett',
'catherinedeneuve',
'catherinekeener',
'catherinezeta',
'catholic',
'catholicism',
'cattle',
'caught',
'cause',
'caused',
'causes',
'causing',
'cavalry',
'cave',
'cavemen',
'celebrate',
'celebrated',
'celebration',
'celebrity',
'cell',
'cellphone',
'cemetery',
'center',
'centered',
'centers',
'central',
'centuries',
'century',
'ceo',
'ceremony',
'certain',
'chad',
'chain',
'chainsaw',
'challenge',
'challenged',
'challenges',
'champion',
'championship',
'chance',
'change',
'changed',
'changes',
'changing',
'channingtatum',
'chaos',
'chaotic',
'chapter',
'character',
'characters',
'charge',
'charged',
'charismatic',
'charles',
'charlie',
'charliesheen',
'charlizetheron',
'charlotte',
'charm',
'charming',
'chase',
'chased',
'chauffeur',
'cheating',
'cheerleader',
'chef',
'chemical',
'cher',
'chicago',
'chicken',
'chief',
'child',
'childabuse',
'childhero',
'childhood',
'children',
'chilling',
'china',
'chinese',
'chip',
'chiwetelejiofor',
'chloëgracemoretz',
'chloësevigny',
'chocolate',
'choice',
'choices',
'choose',
'chosen',
'chris',
'chriscolumbus',
'chriscooper',
'chrisevans',
'chrishemsworth',
'chrisklein',
'chrispine',
'chrisrock',
'christ',
'christian',
'christianbale',
'christianity',
'christianslater',
'christinaapplegate',
'christinaricci',
'christine',
'christmas',
'christmasparty',
'christmastree',
'christopher',
'christopherlambert',
'christopherlloyd',
'christophernolan',
'christopherplummer',
'christopherwalken',
'christophwaltz',
'chronicle',
'chronicles',
'chuck',
'church',
'cia',
'cigarettesmoking',
'cillianmurphy',
'cinema',
'circle',
'circuit',
'circumstances',
'circus',
'cities',
'citizens',
'city',
'civil',
'civilization',
'civilwar',
'claim',
'claims',
'claire',
'clairedanes',
'clan',
'clark',
'clash',
'class',
'classes',
'classic',
'classmate',
'classmates',
'classroom',
'claudevandamme',
'clay',
'clean',
'clear',
'clerk',
'client',
'clients',
'climate',
'climbing',
'clinteastwood',
'cliveowen',
'clock',
'clone',
'cloning',
'close',
'closer',
'club',
'clubs',
'clues',
'clutches',
'coach',
'coast',
'cocaine',
'code',
'cody',
'coffin',
'cohen',
'col',
'cold',
'coldwar',
'cole',
'colin',
'colinfarrell',
'colinfirth',
'collapse',
'colleague',
'colleagues',
'collect',
'collection',
'collector',
'college',
'collide',
'collins',
'collision',
'colombia',
'colonel',
'colony',
'color',
'colorado',
'colorful',
'coma',
'combat',
'combination',
'combined',
'come',
'comeback',
'comedian',
'comedic',
'comedy',
'comes',
'comet',
'comfort',
'comic',
'comics',
'coming',
'comingofage',
'comingout',
'command',
'commander',
'commercial',
'commit',
'commitment',
'committed',
'common',
'communication',
'communist',
'community',
'companion',
'company',
'compete',
'competing',
'competition',
'competitive',
'complete',
'completely',
'complex',
'complicated',
'complications',
'composer',
'computer',
'computervirus',
'conan',
'concert',
'conclusion',
'condition',
'conditions',
'confession',
'confidence',
'confident',
'conflict',
'confront',
'confronted',
'confused',
'congress',
'conman',
'connected',
'connection',
'connell',
'connor',
'conquer',
'conscience',
'consequences',
'conservative',
'considered',
'conspiracy',
'constant',
'constantly',
'construction',
'contact',
'contain',
'contemporary',
'contend',
'contest',
'continue',
'continues',
'continuing',
'contract',
'control',
'controlled',
'controlling',
'controversial',
'convention',
'converge',
'convict',
'convicted',
'convince',
'convinced',
'convinces',
'cook',
'cooking',
'cool',
'cooper',
'cop',
'cope',
'cops',
'core',
'corner',
'corners',
'corporate',
'corporation',
'corpse',
'corrupt',
...]
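The listing above is an excerpt of the alphabetically sorted vocabulary extracted from the movie tags. Near-duplicates such as 'action'/'actions' or 'adopt'/'adopted'/'adopts' each get their own feature, which is what the stemming step that follows addresses. As a minimal sketch in pure Python (not the notebook's own vectorizer; `build_vocabulary` and `count_vector` are hypothetical helper names and the toy tags are made up), a bag-of-words vocabulary and a count vector can be built roughly like this:

```python
from collections import Counter

def build_vocabulary(documents):
    """Return the sorted list of distinct lowercase words across all documents."""
    words = set()
    for doc in documents:
        words.update(doc.lower().split())
    return sorted(words)

def count_vector(document, vocabulary):
    """Count how often each vocabulary word occurs in one document."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]

# Toy tags, made up for illustration only
docs = ["action adventure alien", "actions alien alien", "adopt adopted adopts"]
vocab = build_vocabulary(docs)
print(vocab)
print(count_vector(docs[1], vocab))
```

Each inflected form occupies its own column here, so two movies tagged 'action' and 'actions' would share no feature until the words are reduced to a common stem.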
# Stemming
["loved","loving","love"]
["love","love","love"]
ps.stem('loved')
'love'
ps.stem('loving')
'love'
ps.stem('love')
'love'
stem('in the 22nd century, a paraplegic marine is dispatched to the moon pandora on a unique mission, but becomes tor
'in the 22nd century, a parapleg marin is dispatch to the moon pandora on a uniqu mission, but becom torn
between follow order and protect an alien civilization. action adventur fantasi sciencefict cultureclash futur
spacewar spacecoloni societi spacetravel futurist romanc space alien tribe alienplanet cgi marin soldier battl
loveaffair antiwar powerrel mindandsoul 3d samworthington zoesaldana sigourneyweav jamescameron'
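The `stem` helper called above is not defined in this part of the notebook. A minimal sketch, assuming `ps` is NLTK's `PorterStemmer` and that the helper simply stems each whitespace-separated token and joins the results back together:

```python
from nltk.stem.porter import PorterStemmer

ps = PorterStemmer()

def stem(text):
    """Stem every whitespace-separated token and rejoin them with spaces."""
    return " ".join(ps.stem(word) for word in text.split())

print(stem("loved loving love"))
```

Because the text is split on whitespace only, punctuation stays attached to its word, which would explain why tokens such as 'century,' and 'mission,' appear unchanged in the output above.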
similarity = cosine_similarity(vector)
similarity
similarity[0]
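`cosine_similarity(vector)` returns a square matrix whose entry (i, j) is the cosine of the angle between the count vectors of movies i and j, so row 0 holds the similarity of the first movie to every movie, itself included. A minimal NumPy sketch of that computation, assuming `vector` is a dense array of word counts with one row per movie (`cosine_similarity_matrix` is a hypothetical name and the toy counts are made up; the notebook itself calls the library function):

```python
import numpy as np

def cosine_similarity_matrix(counts):
    """Pairwise cosine similarity between the rows of a 2-D count matrix."""
    counts = np.asarray(counts, dtype=float)
    norms = np.linalg.norm(counts, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = counts / norms    # scale every row to unit length
    return unit @ unit.T     # dot products of unit vectors are cosines

# Toy count vectors (rows = movies, columns = words), made up for illustration
toy = [[1, 1, 0],
       [1, 1, 0],
       [0, 1, 1],
       [0, 0, 2]]
sim = cosine_similarity_matrix(toy)
print(np.round(sim, 3))
```

The matrix is symmetric with ones on the diagonal, and movies that share no words score 0.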
744
[(539, 0.26089696604360174),
(1194, 0.2581988897471611),
(260, 0.25110592822973776),
(1216, 0.24944382578492943),
(507, 0.24846467329894412)]
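def recommend(movie):
    # Row index of the requested title in new_df
    index = new_df[new_df['title'] == movie].index[0]
    # Pair each movie index with its similarity score, highest first
    distances = sorted(enumerate(similarity[index]), reverse=True, key=lambda x: x[1])
    # Skip position 0 (the movie itself) and print the next five titles
    for i in distances[1:6]:
        print(new_df.iloc[i[0]].title)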
Quantum of Solace
Never Say Never Again
Skyfall
Thunderball
From Russia with Love
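The ranking step inside `recommend` can be tried without the full dataset. The following is a small sketch under stated assumptions: `top_matches`, the titles, and the similarity values are all made up for illustration, and the function returns a list instead of printing so the result can be inspected.

```python
import numpy as np

def top_matches(title, titles, similarity, k=5):
    """Return the k titles most similar to `title`, excluding the title itself."""
    index = titles.index(title)
    # Pair every movie index with its score and sort by score, highest first
    ranked = sorted(enumerate(similarity[index]), key=lambda pair: pair[1], reverse=True)
    return [titles[i] for i, _ in ranked if i != index][:k]

# Hypothetical titles and a hand-written symmetric similarity matrix
titles = ["A", "B", "C", "D"]
similarity = np.array([
    [1.0, 0.8, 0.1, 0.4],
    [0.8, 1.0, 0.3, 0.2],
    [0.1, 0.3, 1.0, 0.6],
    [0.4, 0.2, 0.6, 1.0],
])
print(top_matches("A", titles, similarity, k=2))
```

Filtering out the query by index, rather than slicing off position 0, is a deliberate choice in this sketch: slicing assumes the query movie always sorts first, which may not hold if another movie has an identical tag vector and therefore also scores 1.0.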
print('\033[94m')  # ANSI escape code: switch the following output to blue
recommend('John Carter')