Assignment 5
CSDS 2
Steps Involved:
1. Data Collection: We will use a simplified version of the MovieLens dataset (which
contains user ratings and movie details).
2. Data Preprocessing: Clean and prepare the data.
3. Recommendation Algorithms: Implement Collaborative Filtering and Content-Based
Filtering.
4. Model Evaluation: Evaluate the performance using metrics like RMSE.
5. Integration: A simple script to demonstrate how the system works.
2. Data Collection
For this project, we will use a small sample dataset of movie ratings by users. You can
download the MovieLens dataset or use any smaller subset of it. Here is an example
dataset:
3. Data Preprocessing
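import pandas as pd
# Assumption: the tables are stored as CSV files; the original loading step is not shown
movies = pd.read_csv("movies.csv")
ratings = pd.read_csv("ratings.csv")
# Display the first few rows of each table
print(movies.head())
print(ratings.head())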
4. Collaborative Filtering
We will use the Surprise library to implement collaborative filtering. Specifically, we'll
use Singular Value Decomposition (SVD) to predict ratings and recommend movies.
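To make the idea concrete: Surprise's SVD estimates a rating as the global mean plus a user bias, an item bias, and the dot product of a user and an item latent-factor vector, with the parameters fitted by stochastic gradient descent. The block below is a minimal pure-Python sketch of that idea, not the Surprise implementation. The function names `train_mf` and `predict`, the hyperparameter values, the 1 to 5 rating scale, and the toy ratings are all assumptions made for illustration.

```python
import random

def train_mf(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit a biased matrix factorization by SGD on (user, item, rating) triples."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu = {u: 0.0 for u in users}                       # user biases
    bi = {i: 0.0 for i in items}                       # item biases
    # Small random latent factors for every user and item
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    for _ in range(n_epochs):
        for u, i, r in ratings:
            pred = mu + bu[u] + bi[i] + sum(a * b for a, b in zip(p[u], q[i]))
            err = r - pred
            # Gradient steps with L2 regularization
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                puf, qif = p[u][f], q[i][f]
                p[u][f] += lr * (err * qif - reg * puf)
                q[i][f] += lr * (err * puf - reg * qif)
    return mu, bu, bi, p, q

def predict(model, user, item):
    """Estimate a rating; unknown users or items fall back to the known terms."""
    mu, bu, bi, p, q = model
    est = mu + bu.get(user, 0.0) + bi.get(item, 0.0)
    if user in p and item in q:
        est += sum(a * b for a, b in zip(p[user], q[item]))
    return min(5.0, max(1.0, est))  # clip to the assumed 1-5 scale

# Hypothetical (user, movie, rating) triples, for illustration only
toy_ratings = [
    (1, "A", 5.0), (1, "B", 3.0),
    (2, "A", 4.0), (2, "C", 2.0),
    (3, "B", 4.0), (3, "C", 1.0),
]
model = train_mf(toy_ratings)
print(f"Predicted rating of user 1 for movie C: {predict(model, 1, 'C'):.2f}")
```

In practice the library handles all of this, including shuffling and tuned defaults, so the sketch is only meant to show what `model.fit` and `model.test` are doing conceptually.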
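from surprise import SVD, accuracy
# Train an SVD model; trainset is assumed to be a Surprise training set built from the ratings
model = SVD()
model.fit(trainset)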
Zeerak Mustafa khan 2200911540131
# Make predictions on the test set and compute RMSE
predictions = model.test(testset)
rmse = accuracy.rmse(predictions, verbose=False)  # verbose=False avoids printing the score twice
print(f"RMSE: {rmse}")
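For reference, RMSE is the square root of the mean squared difference between predicted and actual ratings, so it is expressed in rating units and lower values are better. The following is a minimal pure-Python sketch of that computation. The function name `rmse_score` and the sample numbers are illustrative assumptions and are not part of the Surprise API.

```python
import math

def rmse_score(actual, predicted):
    """Root mean squared error between two equal-length rating lists."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical ratings, for illustration only
actual_ratings = [4.0, 3.0, 5.0]
predicted_ratings = [3.0, 3.0, 4.0]
print(f"RMSE: {rmse_score(actual_ratings, predicted_ratings):.4f}")
```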
5. Content-Based Filtering
Content-based filtering recommends items (movies) based on the attributes of the items
and user preferences. In this case, we'll recommend movies based on genres that the
user has already liked.
Steps:
1. Vectorize Movie Genres: We'll use one-hot encoding for movie genres.
2. Compute Similarity: We'll calculate similarity between movies based on genres
using cosine similarity.
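from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
count = CountVectorizer(stop_words='english')
genre_matrix = count.fit_transform(movies['genre'])
cosine_sim_df = pd.DataFrame(cosine_similarity(genre_matrix), index=movies['title'], columns=movies['title'])  # 'title' column name is assumed
print(cosine_sim_df)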
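The two steps above can also be worked through by hand, which helps when checking the similarity matrix. This is a minimal pure-Python sketch under stated assumptions: genres are stored as `|`-separated strings, and the helper names `genre_vector` and `cosine`, along with the three placeholder movies, are hypothetical.

```python
import math

def genre_vector(genres, vocabulary):
    """One-hot encode a '|'-separated genre string against a fixed vocabulary."""
    present = {g.strip().lower() for g in genres.split("|")}
    return [1 if g in present else 0 for g in vocabulary]

def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical catalogue, for illustration only
movie_genres = {
    "Movie A": "Animation|Comedy",
    "Movie B": "Animation|Comedy|Fantasy",
    "Movie C": "Thriller",
}
vocabulary = sorted({g.lower() for s in movie_genres.values() for g in s.split("|")})
vectors = {title: genre_vector(s, vocabulary) for title, s in movie_genres.items()}

for title in movie_genres:
    sims = {other: round(cosine(vectors[title], vectors[other]), 3)
            for other in movie_genres if other != title}
    print(title, sims)
```

Movies that share most of their genres score close to 1, and movies with no genre in common score 0, which is the ordering the recommendation step relies on.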
6. Movie Recommendation
Example: Recommend Movies for a User Who Rated "Toy Story (1995)" Highly
rated_movies = user_ratings['movie_id'].values
recommended_movies = []
# Position 0 is normally the movie itself, so keep the next two most similar titles
similar_movies = cosine_sim_df[movie_title].sort_values(ascending=False).index[1:3]
recommended_movies_content, recommended_movies_collab = recommend_movies(
    1, ratings, cosine_sim_df, model
)
Content-Based Recommendations: Movies whose genres are similar to those of the
movies the user has already watched and rated.
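Because the body of `recommend_movies` is not shown in full here, the following is only a sketch of how the two kinds of recommendations could be produced side by side, not the original function. Everything in it is an assumption: the name `recommend_for_user`, the plain-dictionary inputs, the "liked" threshold of 4.0, and the stand-in `predict` callable that would normally wrap the trained model.

```python
def recommend_for_user(user_id, user_ratings, similarity, predict,
                       like_threshold=4.0, top_n=2):
    """Return (content_based, collaborative) lists of unseen titles.

    user_ratings: {title: rating} for this user
    similarity:   {title: {other_title: score}}
    predict:      callable (user_id, title) -> estimated rating
    """
    seen = set(user_ratings)
    liked = [t for t, r in user_ratings.items() if r >= like_threshold]

    # Content-based: score each unseen title by its best similarity to a liked movie
    content_scores = {}
    for title in liked:
        for other, score in similarity.get(title, {}).items():
            if other not in seen:
                content_scores[other] = max(score, content_scores.get(other, 0.0))
    content = sorted(content_scores, key=content_scores.get, reverse=True)[:top_n]

    # Collaborative: rank unseen titles by the model's predicted rating
    candidates = [t for t in similarity if t not in seen]
    collab = sorted(candidates, key=lambda t: predict(user_id, t), reverse=True)[:top_n]
    return content, collab

# Hypothetical similarity scores and predicted ratings, for illustration only
similarity = {
    "Movie A": {"Movie B": 0.8, "Movie C": 0.1},
    "Movie B": {"Movie A": 0.8, "Movie C": 0.3},
    "Movie C": {"Movie A": 0.1, "Movie B": 0.3},
}
fake_predictions = {"Movie B": 3.5, "Movie C": 4.2}

content, collab = recommend_for_user(
    1, {"Movie A": 5.0}, similarity, lambda user, title: fake_predictions[title]
)
print("Content-based:", content)
print("Collaborative:", collab)
```

Keeping the two lists separate makes it easy to see where the techniques disagree; how to merge them into one ranking is a design choice left open here.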
7. Conclusion:
By combining collaborative and content-based filtering, the system can offer more
accurate and more diverse movie recommendations than either technique alone.
Further Improvements: