SRMDB - in (B28 - Research Paper)
Abstract – There is plentiful data and content available online, and it is increasing exponentially day by day. Users therefore need a product that can suggest movies with better accuracy and performance. The type of content liked varies from one user to another, and every online service company aims to attract as many clients as possible. This is where recommender systems come into play. The objective of this project is to build a faster and better movie recommendation system with review analysis. We propose a model that uses a content-based filtering algorithm (supervised learning) together with the cosine similarity measure and the Levenshtein distance for efficient results. The challenges users face are scalability, data sparsity and automation. By the end of this paper, we aim to address these problems and build a logical and practical model using AJAX requests, APIs and a few other resources.

Keywords – content-based filtering, supervised learning, cosine similarity measure, Levenshtein distance, Naïve Bayes classification, TF-IDF Vectorizer

I. INTRODUCTION

People in today's world depend upon speed and genuine products. A user wants to be engaged easily with the content in front of his/her eyes. The aim of a recommender system is to offer users items that they are not yet aware of. As the name suggests, a recommendation system helps to provide related products that the user might like because they would be useful to him/her. There are many ways to provide recommendations to users depending on their needs. We focus on a movie recommendation system, which suggests further movies for users to watch. A recommender system is a subclass of information filtering methods that seeks to predict the rating or the preference a user might give to an item. Recommendation systems help a lot in recommending content to users. The first recommendation system appeared in 1992, and the field is still growing to achieve more accuracy and provide better results to users, which in turn grows the organisation or the company. For example, people who buy Apple smartphones also tend to buy an Apple smartwatch, so an Apple Watch would be recommended whenever a user buys an Apple smartphone. With so many practical applications around us today, it is hard to imagine living without recommendation systems.

There are many ways of recommending movies to users depending on the genre, language, etc. Additionally, there are methods that look at the closeness between different users in order to suggest a movie. Many algorithms can be used to form a recommendation system, such as:

• Content-based algorithm
• Collaborative algorithms
• Hybrid approach

Content-Based Recommendation System: It is a supervised learning algorithm that has a set of outcomes for particular inputs; the system later makes predictions on different inputs provided by the user. It uses properties of movies such as genre, director, description and actors to make recommendations to users. In this project, we mainly focus on the content-based recommendation system.

Collaborative filtering: It is an unsupervised learning approach in which the system is given inputs but no particular output for them; instead, the system finds similar patterns and groups them together. Predictions are made from ratings provided by people. In the rating matrix, each row holds one person's movie ratings and each column holds the ratings given to one movie.

Combined approach: In the hybrid approach, we combine the two filtering techniques, collaborative filtering and content-based filtering, to get the benefits of both, achieve better results and reduce the challenges faced by each approach on its own.

To build the movie recommendation system, we work through the following steps:
Data Collection: First, the data on which the system will work is collected and selected.
Data Pre-processing: Next, the data is processed multiple times and the important part is extracted to work on (training-testing data).
Model Creation: The model is built on the chosen algorithms; it is first run on the testing data and eventually on the whole dataset.
Website/App creation: The website is created, where the working of the model is checked multiple
times to check its efficiency and the ease of the UI/UX for the user.
Deployment: The last step of the project is to deploy it on the cloud, that is, to run the model in a real environment. A model can be deployed in a variety of environments and will often be integrated into applications via an API. For this project, we have used the Heroku cloud environment for deployment.

For this project, we used three different datasets available from MovieLens, which was generated by the GroupLens research team, to help evaluate the recommendation system:

1. IMDB 5000 Movie Dataset (test)
2. The Movies Dataset (train)
3. List of movies 2018-2020

The Movies Dataset comprises metadata for the 45,000 movies in the Full Movies dataset. This list includes movies released on or prior to July 2017. The data includes crew, cast, plot keywords, budget, posters, gross, release date, language, production company, TMDB vote counts, country and vote averages. The dataset is made up of files containing 26,000,000 (2.6 crore) ratings from 270,000 users for 45,000 movies. Ratings are on a scale of 1 to 5 and were obtained from the official GroupLens website.

Figure 3 – Architecture Diagram of Recommendation System

We built the web application for our project with the Python Flask framework. The data was first collected and pre-processed according to our requirements using Python and its various libraries. The frontend and templates were created using HTML/CSS/JS. The recommendations are then passed to the user whenever a request is made, with the help of AJAX, which allows data to be sent to and received from a database or server.

APIs were used to fetch the metadata (i.e. posters, titles, ratings, etc.) from the TMDB database. The TMDB API is available for everyone to use. It provides a quick, consistent and reliable way to get the third-party data.
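The metadata lookup described above can be sketched roughly as follows. This is a minimal illustration and not the project's own code: the helper names are hypothetical, and the endpoint path, the `api_key` query parameter and the `title`, `vote_average` and `poster_path` fields are assumptions about the public TMDB v3 API that should be checked against its documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URLs of the public TMDB v3 API and its image host.
TMDB_API_ROOT = "https://api.themoviedb.org/3"
TMDB_IMAGE_ROOT = "https://image.tmdb.org/t/p/w500"


def build_movie_url(movie_id, api_key):
    """Build the request URL for one movie's metadata (hypothetical helper)."""
    query = urlencode({"api_key": api_key})
    return f"{TMDB_API_ROOT}/movie/{movie_id}?{query}"


def extract_metadata(payload):
    """Keep only the fields the frontend needs: title, rating and poster URL."""
    poster_path = payload.get("poster_path")
    return {
        "title": payload.get("title"),
        "rating": payload.get("vote_average"),
        # A missing poster is reported as None instead of a broken URL.
        "poster_url": TMDB_IMAGE_ROOT + poster_path if poster_path else None,
    }


def fetch_movie_metadata(movie_id, api_key):
    """Call the API over HTTPS and return the trimmed metadata dictionary."""
    with urlopen(build_movie_url(movie_id, api_key), timeout=10) as response:
        return extract_metadata(json.load(response))
```

In a Flask application, a route handling the AJAX request could call `fetch_movie_metadata` for each recommended title and return the resulting dictionaries as JSON to the page.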
The datasets used are The Movies Dataset and Wikipedia (2018-2020). Instead of the common 80-20 training-testing split, a 60-40 training-testing split has been used.

Then, sentiment evaluation was performed on the reviews to check whether they were positive or negative. To build the model for this, we used the following features:
1) Stop words

B. System Analysis

The similarity score is a numeric value between 0 and 1 that measures how similar two items are to each other. It is obtained by comparing the textual details of the two items, and it can be computed with cosine similarity.

Cosine Similarity: It is a way of measuring the similarity between items in a dataset. It reflects the orientation of each object in the dataset rather than its magnitude. In cosine similarity, the items are treated as non-zero vectors, and the cosine of the angle between two vectors gives the similarity measure: the dot product of the two vectors is divided by the product of their lengths.

Cosine similarity is beneficial because it is independent of the size of the items being compared, which is not the case for the Euclidean distance method. Even if two items are separated by a large Euclidean distance because of their size, they can still have a small angle between them, and a smaller angle represents a higher similarity.

C. Pseudo Code of Proposed System

Steps     Overview
Step 1    Import the dataset and perform the data pre-processing steps.
Step 2    Import the required libraries and generate the count matrix with the help of the count vectorizer method.
Step 3    Then use the cosine similarity measure to determine the angle between documents independent of their size.
Step 4    Initiate a directory setup for the website page where the main input field is set out.
Step 5    Connect the page to Flask and render it.

The code and data for this project can be acquired from the GitHub link mentioned below:
srmDB.in-Movie-Recommendation-System
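
Steps 2 and 3 of the pseudo code (count matrix, then cosine similarity) can be sketched in plain Python as shown here. This is a simplified illustration under stated assumptions, not the code from the repository: the whitespace tokenizer stands in for a count vectorizer, and the movie titles, descriptions and function names are hypothetical.

```python
import math
from collections import Counter


def count_vector(text):
    """Turn a description into a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Dot product of two count vectors divided by the product of their lengths."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for empty text; treat it as 0
    return dot / (norm_a * norm_b)


def recommend(title, movies, top_n=2):
    """Rank every other movie by similarity to the chosen title."""
    vectors = {name: count_vector(desc) for name, desc in movies.items()}
    target = vectors[title]
    scored = [
        (cosine_similarity(target, vec), name)
        for name, vec in vectors.items()
        if name != title
    ]
    # Highest score first; ties are broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_n]]


# Hypothetical toy catalogue used only for illustration.
MOVIES = {
    "Space Quest": "space adventure alien crew",
    "Galaxy Run": "space alien chase adventure",
    "Love Notes": "romance music paris",
}
```

With this toy catalogue, `recommend("Space Quest", MOVIES, top_n=1)` ranks "Galaxy Run" first because the two descriptions share three of their four terms. Repeating a description twice leaves its similarity to the original unchanged, which illustrates the independence from document size discussed above.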
ACKNOWLEDGMENT
We are grateful to Dr P Murali for his insightful and
constructive suggestions during the project's design,
planning and development. His willingness to devote
his time is much appreciated.
REFERENCES