Assignment 15 (Mini Project)
Movie Recommendation
This notebook develops a content-based movie recommendation system using scikit-learn.
1. Introduction to Recommendation Systems
A recommendation system suggests items (movies, books, music, etc.) to users based on
certain criteria. There are different types of recommendation systems:
Content-Based Filtering – Recommends items similar to what the user likes.
Collaborative Filtering – Recommends items based on user behavior and preferences of
similar users.
Hybrid Filtering – A combination of both approaches.
This project implements Content-Based Filtering using TF-IDF and Cosine Similarity.
2. Content-Based Filtering
This method suggests movies similar to a given movie by analyzing its features (e.g., genres,
keywords, tagline, cast, and director). The similarity between movies is calculated based on
their descriptions.
Steps in Content-Based Filtering:
1. Select Relevant Features – The important attributes (genres, keywords, tagline, cast,
director) are extracted.
2. Preprocess Data – Handle missing values, clean text, and combine features.
3. Convert Text to Numerical Form – Use TF-IDF vectorization to convert textual data into
numerical vectors.
4. Compute Similarity – Use cosine similarity to measure the closeness between movies.
5. Recommend Similar Movies – Retrieve and display the movies with the highest similarity
scores.
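The steps above can be sketched end to end on a tiny table. This is a minimal illustration, not the notebook's actual code: the `toy_movies` DataFrame, its titles, and its cast and director names are all made up for the example, and only the five feature column names come from the text.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data standing in for the real movies dataset.
toy_movies = pd.DataFrame({
    "title": ["Space War", "Galaxy Battle", "Love in Paris", "Paris Romance"],
    "genres": ["Action Science Fiction", "Action Science Fiction",
               "Romance Drama", "Romance Comedy"],
    "keywords": ["space war alien", "space battle alien", "love paris", None],
    "tagline": ["Enter the war", None, "Love finds a way", "Paris is for lovers"],
    "cast": ["Alice Bob", "Alice Carol", "Dave Erin", "Dave Frank"],
    "director": ["Xavier", "Xavier", "Yolanda", "Yolanda"],
})

# Select relevant features.
selected_features = ["genres", "keywords", "tagline", "cast", "director"]

# Preprocess: replace missing values, then join the features into one string per movie.
features = toy_movies[selected_features].fillna("")
combined = features.apply(lambda row: " ".join(row), axis=1)

# Convert text to numerical form with TF-IDF.
feature_vectors = TfidfVectorizer().fit_transform(combined)

# Compute pairwise cosine similarity between all movies.
similarity = cosine_similarity(feature_vectors)

# Recommend: rank the other movies by similarity to the first one.
query_index = 0
ranked = similarity[query_index].argsort()[::-1]
recommendations = [toy_movies["title"].iloc[i] for i in ranked if i != query_index][:2]
print(recommendations)
```

The query movie is filtered out by index rather than by position, so the result does not depend on it sorting first.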
Akhila Ohmkumar(Roll.No:03)
Assignment No :15(Mini project)
TF-IDF
\[ \text{TF-IDF} = \text{TF} \times \text{IDF} \]
where:
1) TF (Term Frequency) – How often a word appears in a document.
2) IDF (Inverse Document Frequency) – Gives importance to rare words by reducing the
weight of common words.
Example:
1)"Action movie with a great storyline."
2)"Comedy movie with a hilarious cast."
3)"Action thriller with a suspenseful plot."
The word "movie" appears in two of the three descriptions, so its importance is lower,
whereas "thriller" appears in only one, so its importance is higher.
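The effect can be checked directly on the three example sentences. The sketch below uses scikit-learn's `TfidfVectorizer` with its default settings (an assumption, since the notebook's exact vectorizer parameters are not shown) and prints the learned IDF weight for a few words.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

documents = [
    "Action movie with a great storyline.",
    "Comedy movie with a hilarious cast.",
    "Action thriller with a suspenseful plot.",
]

vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(documents)

# Look up the IDF weight of each term through the fitted vocabulary.
idf_by_term = {
    term: vectorizer.idf_[column]
    for term, column in vectorizer.vocabulary_.items()
}

# "with" occurs in all three documents, "movie" in two, "thriller" in one.
for term in ("with", "movie", "thriller"):
    print(f"{term}: idf = {idf_by_term[term]:.3f}")
```

The fewer documents a word appears in, the larger its IDF weight, so "thriller" is weighted above "movie", which in turn is weighted above "with". The default tokenizer also drops one-letter tokens such as "a".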
In this notebook, TF-IDF is applied to the combined features of each movie.
homepage id \
0 http://www.avatarmovie.com/ 19995
1 http://disney.go.com/disneypictures/pirates/ 285
2 http://www.sonypictures.com/movies/spectre/ 206647
3 http://www.thedarkknightrises.com/ 49026
4 http://movies.disney.com/john-carter 49529
keywords original_language \
0 culture clash future space war space colony so... en
original_title \
0 Avatar
1 Pirates of the Caribbean: At World's End
2 Spectre
3 The Dark Knight Rises
4 John Carter
spoken_languages status \
0 [{"iso_639_1": "en", "name": "English"}, {"iso... Released
1 [{"iso_639_1": "en", "name": "English"}] Released
2 [{"iso_639_1": "fr", "name": "Fran\u00e7ais"},... Released
3 [{"iso_639_1": "en", "name": "English"}] Released
4 [{"iso_639_1": "en", "name": "English"}] Released
tagline \
0 Enter the World of Pandora.
1 At the end of the world, the adventure begins.
2 A Plan No One Escapes
3 The Legend Ends
4 Lost in our world, found in another.
cast \
0 Sam Worthington Zoe Saldana Sigourney Weaver S...
1 Johnny Depp Orlando Bloom Keira Knightley Stel...
2 Daniel Craig Christoph Waltz L\u00e9a Seydoux ...
3 Christian Bale Michael Caine Gary Oldman Anne ...
4 Taylor Kitsch Lynn Collins Samantha Morton Wil...
crew director
[5 rows x 24 columns]
Feature Engineering
In this movie recommendation system, features are the attributes used to compare movies
and recommend similar ones. These features are extracted from the dataset and converted
into numerical vectors for similarity calculations.
movies.isnull().sum()
index 0
budget 0
genres 0
homepage 3091
id 0
keywords 0
original_language 0
original_title 0
overview 3
popularity 0
production_companies 0
production_countries 0
release_date 1
revenue 0
runtime 2
spoken_languages 0
status 0
tagline 0
title 0
vote_average 0
vote_count 0
cast 0
crew 0
director 0
dtype: int64
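Missing values matter for feature engineering because the text features are concatenated into one string per movie. The sketch below shows the problem and one common fix on a hypothetical two-row `sample` DataFrame; filling with empty strings is an assumption about how the text columns are cleaned, not something the output above confirms.

```python
import pandas as pd

# Hypothetical two-row sample; the real notebook works on the full movies DataFrame.
sample = pd.DataFrame({
    "genres": ["Action Adventure", "Drama"],
    "tagline": ["Enter the World of Pandora.", None],
})

# Without filling, a missing tagline makes the whole combined value missing.
naive = sample["genres"] + " " + sample["tagline"]

# Filling missing text with empty strings keeps the remaining features usable.
filled = sample.fillna("")
combined = filled["genres"] + " " + filled["tagline"]

print(naive.isnull().sum())
print(combined.isnull().sum())
```

Numeric columns such as `runtime` are not part of the combined text, so their missing values do not affect the similarity calculation.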
Cosine Similarity
1. What It Is: Cosine similarity measures how similar two vectors are in a
multi-dimensional space. It is widely used in text analysis, recommendation systems, and
machine learning.
2. How It Works: The similarity is computed as the cosine of the angle between the vectors.
In general it ranges from -1 to 1; for non-negative vectors such as TF-IDF vectors, it ranges
from 0 (no terms in common) to 1 (vectors pointing in the same direction).
3. Formula:
\[ \text{Cosine Similarity} = \frac{A \cdot B}{\|A\| \, \|B\|} \]
where \( A \) and \( B \) are vectors, \( \cdot \) is the dot product, and \( \|A\| \) is the
magnitude of vector \( A \).
4. Applications: In this movie recommendation system, cosine similarity is calculated
between the feature vectors of movies (e.g., based on descriptions or genres) to find the
closest matches.
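The formula can be worked through by hand with NumPy. This is an illustrative sketch: the helper `cosine_sim` and the three small vectors are invented for the example and are not part of the notebook.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 0.0])
b = np.array([2.0, 4.0, 0.0])  # same direction as a, twice the length
c = np.array([0.0, 0.0, 3.0])  # shares no non-zero component with a

print(cosine_sim(a, b))  # close to 1: the vectors point the same way
print(cosine_sim(a, c))  # 0: the vectors are orthogonal
```

Because the dot product is divided by both magnitudes, the length of a vector does not affect the score, only its direction does. For text, this means a long description and a short one can still be rated as very similar.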
# Compute the pairwise similarity scores using cosine similarity
from sklearn.metrics.pairwise import cosine_similarity

similarity = cosine_similarity(feature_vectors)
print(similarity)
similarity.shape
(4803, 4803)
# Get the top_n most similar movies, excluding the movie itself
sim_scores = sim_scores[1:top_n + 1]
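The lines leading up to this slice are not shown, so the following is only a sketch of how the top matches might be retrieved from a similarity matrix. The function name `top_similar` and the 3×3 matrix are hypothetical; the variable names `sim_scores` and `top_n` follow the fragment above.

```python
import numpy as np

def top_similar(similarity, movie_index, top_n=5):
    """Return (index, score) pairs for the top_n movies most similar to movie_index."""
    # Pair every movie index with its similarity to the chosen movie.
    sim_scores = list(enumerate(similarity[movie_index]))
    # Sort by score, highest first.
    sim_scores = sorted(sim_scores, key=lambda pair: pair[1], reverse=True)
    # Slicing [1:top_n + 1] assumes the movie itself sorts first. Filtering by
    # index matches that case and still works if another movie ties at 1.0.
    sim_scores = [pair for pair in sim_scores if pair[0] != movie_index]
    return sim_scores[:top_n]

# Hypothetical similarity matrix for three movies.
toy_similarity = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
])

result = top_similar(toy_similarity, movie_index=0, top_n=2)
print([index for index, score in result])
```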
Recommend Movies
To recommend movies based on user input:
difflib
difflib is a Python standard-library module for comparing sequences, finding similarities
between strings, and performing approximate matching. In this movie recommendation
system, difflib handles variations and typos in user input when searching for a movie.
# Example code
import difflib

movie_name = input('Enter your favourite movie name: ')
list_of_all_titles = movies['title'].tolist()
# Find the titles closest to what the user typed
find_close_match = difflib.get_close_matches(movie_name, list_of_all_titles)
if find_close_match:
    close_match = find_close_match[0]
    print(f"Closest match found: {close_match}")
    # Look up the dataset index of the matched movie
    index_of_the_movie = movies[movies['title'] == close_match]['index'].values[0]
    print(f"Index of the matched movie: {index_of_the_movie}")
else:
    print(f"No close match found for '{movie_name}'.")
Enter your favourite movie name: Pirates of the Caribbean: At World's End
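Approximate matching can be tried without the dataset. The sketch below uses the five titles from the sample output earlier in the notebook; the helper `find_title` and its `cutoff` default are assumptions made for illustration.

```python
import difflib

titles = [
    "Avatar",
    "Pirates of the Caribbean: At World's End",
    "Spectre",
    "The Dark Knight Rises",
    "John Carter",
]

def find_title(user_input, titles, cutoff=0.6):
    """Return the closest title to user_input, or None if nothing is close enough."""
    matches = difflib.get_close_matches(user_input, titles, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(find_title("Avatr", titles))  # a misspelling still resolves to a real title
print(find_title("zzzz", titles))   # nothing similar, so None is returned
```

`get_close_matches` compares characters exactly, so matching is case-sensitive. Lowercasing both the input and the titles before comparing is one way to make the lookup more forgiving.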