Movie Recommender System Using Content Based and Collaborative Filtering
ISSN No:-2456-2165
Madhuri Sahu⁵, Achamma Thomas⁶
⁵,⁶Assistant Professor, Artificial Intelligence, G.H. Raisoni College of Engineering, Nagpur, India
Abstract:- Technology has evolved considerably, from basic tools to advanced fields such as machine learning, deep learning, the Internet of Things, data mining and many more. Recommender systems provide users with personalized suggestions for products or services, but such systems often rely only on collaborative filtering. Movies are a source of entertainment, but finding the desired content is a problem. The aim of this paper is to improve the accuracy and performance of the regular filtering technique and also to recommend movies based on the content of the movies a user has watched earlier. Collaborative filtering recommends movies to user A based on the interests of a similar user B. Netflix internally uses the Cinematch algorithm for collaborative filtering; we improve on the accuracy and performance of this regular technique. Content based filtering will help Netflix boost its turnover by suggesting movies similar to those the user has watched earlier on any of the OTT (Over The Top) platforms. We have used the Surprise library along with the XGBoost regressor, which improves our model over the regular technique. We have also designed the frontend of the content based recommendation system for Netflix.

Keywords:- Content Based, Collaborative Filtering, Recommender System, Surprise-Library, User-Based Recommender, Item-Based Recommender.

I. INTRODUCTION

Using the rating information of thousands and thousands of users and tens of thousands of movies, we can predict the type of movie a user is likely to enjoy in the future. We use the predicted rating as the feature for recommending movies of a similar type.

The higher the predicted rating, the higher the chance of recommending that movie to the user. For this approach we use the Surprise library, whose name stands for Simple Python Recommendation System Engine. We aim to improve on the Cinematch algorithm by combining collaborative filtering with the XGBoost regressor.

Netflix has an algorithm called Cinematch whose predictions carry some error; this error is measured with the root mean square error (RMSE).

We want to improve on the Cinematch algorithm and add a content based approach to the solution.

II. DATA

We obtained the data from Netflix. Movie IDs range from 1 to 17770 sequentially. Customer IDs range from 1 to 2649429, and there are 48018 users. Ratings are on a five-star scale from 1 to 5, and dates have the format YYYY-MM-DD. All of this data is stored in 5 text files. Each movie is introduced by its ID followed by a colon, and each subsequent row holds a customer ID, a rating and a date, e.g. "1:" followed by "1488844,3,2005-09-06".
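The file layout described in the DATA section (a movie ID followed by a colon, then rows of customer ID, rating and date) can be read with a few lines of Python. This is a minimal sketch and not the authors' loader: the function name `parse_ratings` and the sample rows are hypothetical, comma-separated fields are assumed, and the day in the first row is written as a valid date for illustration.

```python
from datetime import date


def parse_ratings(lines):
    """Yield (movie_id, customer_id, rating, date) tuples.

    Assumes a "<movie_id>:" header line followed by
    "<customer_id>,<rating>,<YYYY-MM-DD>" rows for that movie.
    """
    movie_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.endswith(":"):
            # A header such as "1:" starts the block for a new movie.
            movie_id = int(line[:-1])
            continue
        if movie_id is None:
            raise ValueError("rating row found before any movie header")
        customer, rating, day = line.split(",")
        yield movie_id, int(customer), int(rating), date.fromisoformat(day)


# Hypothetical sample in the assumed layout (not real data).
sample = [
    "1:",
    "1488844,3,2005-09-06",
    "1000001,5,2005-05-13",
    "2:",
    "1000002,4,2005-09-05",
]
rows = list(parse_ratings(sample))
```

Each tuple carries the movie ID from the most recent header, so the rows can be loaded directly into a table with one rating per line.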
The median user rated 89 movies, so 50 percent of users rated fewer than 89 movies and 50 percent of users rated more than 89 movies.

We then plotted the number of ratings per weekday. Traffic on Saturday and Sunday stands apart from the weekdays, which we attribute to people going on outings on those days.

XI. COLD START PROBLEM

If we split the data by time in an 80:20 ratio, there may be some users who are present in the test data but not in the train data.

Some users may have joined late, and there may also be new movies. For such users we have 1500 entries for which there is no data. This is the cold start problem in a recommendation system. When we checked for the cold start problem among users, we found that 15% of the users are new users who are not present in the train data. There are 17770 movies in total and 17424 of them appear in the train data, that is, 346 movies did not appear in the train data.
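The cold start check described above can be sketched as follows. The helper names `time_split` and `cold_start_share` and the toy ratings are hypothetical, and the sketch treats a user or movie as "cold" when it appears in the later 20% of the data but not in the earlier 80%. The last line reproduces the movie count stated in the text.

```python
def time_split(ratings, train_fraction=0.8):
    """Sort ratings by date and cut them into an earlier and a later part."""
    ordered = sorted(ratings, key=lambda r: r[3])  # r[3] is an ISO date string
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def cold_start_share(train_ids, test_ids):
    """Return (count, fraction) of test IDs that never occur in train."""
    train, test = set(train_ids), set(test_ids)
    unseen = test - train
    share = len(unseen) / len(test) if test else 0.0
    return len(unseen), share


# Toy (movie_id, customer_id, rating, date) rows, made up for illustration.
ratings = [
    (1, 101, 4, "2005-01-01"), (1, 102, 3, "2005-01-05"),
    (2, 101, 5, "2005-02-01"), (2, 103, 2, "2005-02-10"),
    (3, 102, 4, "2005-03-01"), (1, 103, 5, "2005-03-15"),
    (3, 101, 3, "2005-04-01"), (2, 102, 4, "2005-04-20"),
    (4, 104, 5, "2005-05-01"), (1, 104, 2, "2005-05-10"),
]
train, test = time_split(ratings)

new_users, user_share = cold_start_share(
    (r[1] for r in train), (r[1] for r in test))
new_movies, movie_share = cold_start_share(
    (r[0] for r in train), (r[0] for r in test))

# Figures from the text: 17770 movies overall, 17424 seen in train.
missing_movies = 17770 - 17424
```

In the toy data, customer 104 and movie 4 appear only after the cut, so both count as cold start cases for which no earlier ratings exist.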
We featurize the data for regression and use XGBoost regression with RMSE as the error (step 1). We then featurize the data for the Surprise library and use the baseline model of the Surprise library, again with RMSE (step 2). Using steps 1 and 2, we again apply XGBoost regression on the combined features with RMSE as the error. Next we use Surprise KNN with user-user similarity and RMSE as the error (step 3), and then the Surprise KNN model with item-item similarity (step 4). We combine steps 1, 2, 3 and 4 as features and apply the XGBoost regressor with RMSE as the error. For feature 5 we use matrix factorization with SVD, and for feature 6 we use matrix factorization with SVD++. Finally, using all the feature sets 1 to 6, we apply XGBoost regression with RMSE as the error.

Step 1
The first step is to sample the data. Our train matrix is 405k * 17k and our test matrix is 349k * 17k, so we use 10k users and 1k movies as train data and 5k users and 500 movies as test data. We work on these samples to find which model is best and then use that model for all the data.

Now we mix the 13 features with the baseline model. We have 13 features, and the 14th feature is BSIPR, the output of the baseline model; we then apply XGBoost on top of these. The RMSE is 1.076 and the MAPE is 34.49, and BSIPR appears in the list of important features we obtained.

Surprise KNN is our next step, which internally uses similar users and similar movies. It draws on the k users most similar to user u who rated movie i. The similarity is commonly the cosine similarity, but here we use the baseline Pearson correlation coefficient, which gives an RMSE of 1.0726 and a MAPE of 35.02. We then use the matrix factorization technique, for which the RMSE is 1.072 and the MAPE is 35.01; its test error is the lowest among all the models we have trained so far. Finally we use SVD++, for which the RMSE is 1.072 and the MAPE is the same.
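To make the baseline feature and the two error measures concrete, here is a minimal pure-Python sketch. It is a simplification and not the Surprise implementation: the Surprise baseline model estimates regularized user and item biases with ALS or SGD, whereas this sketch uses plain averages. The names `fit_baseline`, `predict`, `rmse` and `mape` and the toy ratings are hypothetical.

```python
import math
from collections import defaultdict


def fit_baseline(train):
    """Fit rating ~ mu + b_u + b_i from (user, item, rating) triples."""
    mu = sum(r for _, _, r in train) / len(train)
    # User bias: average deviation of the user's ratings from the global mean.
    user_res = defaultdict(list)
    for u, _, r in train:
        user_res[u].append(r - mu)
    b_u = {u: sum(v) / len(v) for u, v in user_res.items()}
    # Item bias: average of what is left after removing mu and the user bias.
    item_res = defaultdict(list)
    for u, i, r in train:
        item_res[i].append(r - mu - b_u[u])
    b_i = {i: sum(v) / len(v) for i, v in item_res.items()}
    return mu, b_u, b_i


def predict(model, user, item):
    """Baseline prediction, clipped to the 1 to 5 rating scale.

    Unknown users or items (cold start) fall back to a bias of zero.
    """
    mu, b_u, b_i = model
    estimate = mu + b_u.get(user, 0.0) + b_i.get(item, 0.0)
    return min(5.0, max(1.0, estimate))


def rmse(actual, predicted):
    """Root mean square error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (ratings are never zero)."""
    ratios = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Toy ratings, made up for illustration.
train = [("u1", "m1", 5), ("u1", "m2", 3), ("u2", "m1", 4), ("u2", "m2", 2)]
model = fit_baseline(train)
baseline_feature = predict(model, "u9", "m1")  # "u9" is an unseen user
```

In the stacked setup the text describes, a value like `baseline_feature` would be appended to the other features of each (user, movie) pair before the regressor is trained; the regressor itself is not shown here.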