Assignment 5

The project aims to build a movie recommendation system using both Collaborative Filtering and Content-Based Filtering based on user viewing history. It involves data collection from the MovieLens dataset, preprocessing, implementing recommendation algorithms, and evaluating model performance. The system combines both methods to enhance the accuracy and diversity of movie recommendations, with suggestions for further improvements such as hybrid models and a user interface.

Uploaded by

zeerakzoya

Zeerak Mustafa khan 2200911540131

CSDS 2

PROJECT TITLE: Build a system that recommends movies to users based on their viewing history.

Steps Involved:
1. Data Collection: We will use a simplified version of the MovieLens dataset (which
contains user ratings and movie details).
2. Data Preprocessing: Clean and prepare the data.
3. Recommendation Algorithms: Implement Collaborative Filtering and Content-Based
Filtering.
4. Model Evaluation: Evaluate the performance using metrics like RMSE.
5. Integration: A simple script to demonstrate how the system works.

1. Install Required Libraries


First, you need to install some Python libraries that we will use for this project. You can install
them via pip.

pip install pandas numpy scikit-learn scikit-surprise

2. Data Collection

For this project, we will use a small sample dataset in the style of MovieLens. You can
download the full MovieLens dataset or use any smaller version of it. For simplicity, we
will use two files: one with movie details and one with movie ratings by users. Here is
an example dataset:

movies.csv

movie_id,title,genre
1,Toy Story (1995),Animation|Children|Comedy
2,Jumanji (1995),Adventure|Children|Fantasy
3,Grumpier Old Men (1995),Comedy|Romance
4,Waiting to Exhale (1995),Comedy|Drama|Romance

ratings.csv

user_id,movie_id,rating
1,1,5
1,2,4
2,1,4
2,3,3
3,2,5
3,4,2

3. Data Preprocessing

First, we load the data and inspect the first few rows.

import pandas as pd

# Load movie and ratings data


movies = pd.read_csv('movies.csv')
ratings = pd.read_csv('ratings.csv')

# Display data
print(movies.head())
print(ratings.head())
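
The loading code above does not yet clean anything. As a minimal sketch of what the cleaning step could look like, the example below removes missing values, duplicate (user, movie) pairs, out-of-range ratings, and ratings for unknown movies. The helper name clean_ratings, the 1 to 5 rating scale, and the deliberately flawed in-memory data are assumptions made for illustration; they are not part of the original project.

```python
import pandas as pd

# Tiny in-memory stand-ins for movies.csv and ratings.csv, with deliberate flaws
movies = pd.DataFrame({
    "movie_id": [1, 2, 3, 4],
    "title": ["Toy Story (1995)", "Jumanji (1995)",
              "Grumpier Old Men (1995)", "Waiting to Exhale (1995)"],
    "genre": ["Animation|Children|Comedy", "Adventure|Children|Fantasy",
              "Comedy|Romance", "Comedy|Drama|Romance"],
})
ratings = pd.DataFrame({
    "user_id":  [1, 1, 1, 2, 2, 3, 3, 3],
    "movie_id": [1, 1, 2, 1, 3, 2, 4, 9],     # (1, 1) is duplicated; movie 9 is unknown
    "rating":   [5, 5, 4, 4, 3, 5, None, 4],  # one rating is missing
})

def clean_ratings(ratings, movies, low=1, high=5):
    """Return a cleaned copy of `ratings` (hypothetical helper, assumed 1-5 scale)."""
    # Drop rows with a missing user, movie, or rating
    cleaned = ratings.dropna(subset=["user_id", "movie_id", "rating"])
    # Keep one rating per (user, movie) pair; the last one wins
    cleaned = cleaned.drop_duplicates(subset=["user_id", "movie_id"], keep="last")
    # Discard ratings outside the assumed scale
    cleaned = cleaned[cleaned["rating"].between(low, high)]
    # Discard ratings that refer to movies missing from the movies table
    cleaned = cleaned[cleaned["movie_id"].isin(movies["movie_id"])]
    return cleaned.astype({"user_id": int, "movie_id": int}).reset_index(drop=True)

ratings_clean = clean_ratings(ratings, movies)
print(ratings_clean)
```

Keeping the last duplicate is only one possible policy; averaging repeated ratings would be an equally reasonable choice.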

4. Collaborative Filtering with surprise

We will use the surprise library to implement collaborative filtering. Specifically, we’ll
use Singular Value Decomposition (SVD) to predict ratings and recommend movies.

from surprise import SVD, Dataset, Reader, accuracy
from surprise.model_selection import train_test_split

# Prepare the data for Surprise; load_from_df only needs the rating scale
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(ratings[['user_id', 'movie_id', 'rating']], reader)

# Split the data into training and testing sets
trainset, testset = train_test_split(data, test_size=0.2)

# Build and train the SVD model
model = SVD()
model.fit(trainset)

# Make predictions
predictions = model.test(testset)

# Evaluate the model (verbose=False avoids printing the RMSE twice)
rmse = accuracy.rmse(predictions, verbose=False)
print(f"RMSE: {rmse}")
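
To make the metric concrete, here is a small sketch of how RMSE (root mean squared error) is computed from actual and predicted ratings. It uses only the standard library; the function name and the three predicted values are made up for illustration and are not output from the model above.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Illustrative values only: three true ratings and three hypothetical predictions
actual = [5, 4, 3]
predicted = [4.5, 4.0, 4.0]
print(round(rmse(actual, predicted), 3))
```

A lower RMSE means the predicted ratings are closer to the true ratings; because errors are squared, one large miss is penalized more than several small ones.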

5. Content-Based Filtering

Content-based filtering recommends items (movies) based on the attributes of the items
and user preferences. In this case, we'll recommend movies based on genres that the
user has already liked.

Steps:

1. Vectorize Movie Genres: We'll use one-hot encoding for movie genres.
2. Compute Similarity: We'll calculate similarity between movies based on genres
using cosine similarity.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# One-hot encode the genres: split each genre string on '|' so that
# hyphenated genres such as "Sci-Fi" stay intact
count = CountVectorizer(tokenizer=lambda s: s.split('|'), token_pattern=None, binary=True)
genre_matrix = count.fit_transform(movies['genre'])

# Compute pairwise cosine similarity between all movies
cosine_sim = cosine_similarity(genre_matrix)

# Create a DataFrame for the cosine similarity, indexed by movie title
cosine_sim_df = pd.DataFrame(cosine_sim, index=movies['title'], columns=movies['title'])
print(cosine_sim_df)
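
As a worked example of what the similarity matrix contains, the sketch below computes the cosine similarity between "Toy Story (1995)" and "Jumanji (1995)" by hand, using the genres from the sample dataset. The helper names genre_vector and cosine are hypothetical and exist only for this illustration.

```python
import math

def genre_vector(genres, vocabulary):
    """One-hot encode a '|'-separated genre string against a fixed vocabulary."""
    present = set(genres.split("|"))
    return [1 if g in present else 0 for g in vocabulary]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

vocabulary = ["Adventure", "Animation", "Children", "Comedy", "Fantasy"]
toy_story = genre_vector("Animation|Children|Comedy", vocabulary)
jumanji = genre_vector("Adventure|Children|Fantasy", vocabulary)

# The two movies share one genre (Children) out of three each
print(toy_story, jumanji, round(cosine(toy_story, jumanji), 3))
```

The two vectors overlap in one position and each has length sqrt(3), so the similarity is 1 / 3.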

6. Movie Recommendation

We now apply both filtering approaches to a user's history and return one list of
recommendations from each.

Example: Recommend Movies for a User Who Rated "Toy Story (1995)" Highly

def recommend_movies(user_id, movies, ratings, cosine_sim_df, model):
    # Get the movies the user has already rated
    user_ratings = ratings[ratings['user_id'] == user_id]
    rated_ids = set(user_ratings['movie_id'])
    rated_titles = movies.loc[movies['movie_id'].isin(rated_ids), 'title']
    id_to_title = movies.set_index('movie_id')['title']

    # Content-based: for each highly rated movie (assumed here: rating >= 4),
    # take the 2 most similar movies the user has not rated yet
    liked_ids = user_ratings.loc[user_ratings['rating'] >= 4, 'movie_id']
    recommended_movies = []
    for movie_id in liked_ids:
        similar = cosine_sim_df[id_to_title[movie_id]].drop(labels=list(rated_titles), errors='ignore')
        recommended_movies.extend(similar.sort_values(ascending=False).index[:2])

    # Collaborative filtering: predict ratings for every movie the user has not rated
    unseen_ids = [m for m in movies['movie_id'] if m not in rated_ids]
    user_predictions = [model.predict(user_id, m) for m in unseen_ids]
    sorted_predictions = sorted(user_predictions, key=lambda x: x.est, reverse=True)

    # Get top 5 movie recommendations (as movie IDs)
    top_5_recommendations = [x.iid for x in sorted_predictions[:5]]

    return list(set(recommended_movies)), top_5_recommendations

# Test the function for user_id 1
recommended_movies_content, recommended_movies_collab = recommend_movies(
    1, movies, ratings, cosine_sim_df, model)

print("Content-Based Recommendations:", recommended_movies_content)
print("Collaborative Filtering Recommendations:", recommended_movies_collab)

Content-Based Recommendations: Based on movie genres similar to those of the movies the
user has already watched.

Collaborative Filtering Recommendations: Based on what other similar users rated highly.

7. Conclusion:
This system uses both Collaborative Filtering and Content-Based Filtering to
recommend movies to users based on their viewing history. The system evaluates
movie preferences using a combination of user behavior and movie attributes.

● Collaborative Filtering helps predict ratings based on past user-item interactions.
● Content-Based Filtering suggests movies based on similar genres to those the
user has already watched.

Using both techniques side by side gives the user two complementary lists: one driven by
the behavior of other users and one driven by movie attributes.

Further Improvements:

● Use hybrid models to combine both collaborative and content-based methods
effectively.
● Implement Matrix Factorization techniques like SVD++ for better
recommendations.
● Add a user interface to display the recommendations dynamically in a web
application.
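
One simple way the hybrid idea could be sketched is a weighted blend of the two scores for each candidate movie. The function name hybrid_scores, the weight alpha, and the predicted ratings below are all assumptions for illustration; the similarity values are the genre cosine similarities of each movie to "Toy Story (1995)", rounded to two decimals.

```python
def hybrid_scores(cf_ratings, content_sims, alpha=0.5, rating_min=1.0, rating_max=5.0):
    """Blend predicted ratings with content similarity (hypothetical helper).

    cf_ratings:   {title: predicted rating on the rating_min..rating_max scale}
    content_sims: {title: similarity in [0, 1] to movies the user liked}
    alpha:        weight of the collaborative score; (1 - alpha) goes to content
    """
    scores = {}
    for title, rating in cf_ratings.items():
        # Rescale the predicted rating to [0, 1] so both scores are comparable
        cf_norm = (rating - rating_min) / (rating_max - rating_min)
        content = content_sims.get(title, 0.0)
        scores[title] = alpha * cf_norm + (1 - alpha) * content
    # Highest blended score first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Made-up predicted ratings for two movies the user has not seen
cf_ratings = {"Grumpier Old Men (1995)": 3.0, "Waiting to Exhale (1995)": 4.0}
content_sims = {"Grumpier Old Men (1995)": 0.41, "Waiting to Exhale (1995)": 0.33}

ranking = hybrid_scores(cf_ratings, content_sims, alpha=0.5)
for title, score in ranking:
    print(f"{title}: {score:.3f}")
```

With alpha = 0.5 the higher predicted rating outweighs the slightly lower genre similarity, so "Waiting to Exhale (1995)" ranks first. In practice alpha would need to be tuned on held-out data.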
