SRMDB - in (B28 - Research Paper)


srmDB.in – PRODUCT BASED MOVIE RECOMMENDATION SYSTEM USING AJAX REQUESTS WITH SENTIMENT EVALUATION

R Lavanya
Assistant Professor, Computing Technologies
SRMIST, Kattankulathur, Chennai, India

Vasu Jhawar
BTech Computer Science
SRMIST, Kattankulathur, Chennai, India

Aayush Agrawal
BTech Computer Science
SRMIST, Kattankulathur, Chennai, India

Abstract – The amount of content available online is large and growing rapidly, so users need a product that can suggest movies with better accuracy and performance. The type of content a user likes varies from one user to another, and every online service company aims to attract as many clients as possible. This is where recommender systems come into play. The objective of this project is to build a fast and accurate movie recommendation system with review analysis. We propose a model that uses a content-based filtering algorithm (treated here as supervised learning) with the cosine similarity measure and the Levenshtein distance for efficient results. The main challenges are scalability, data sparsity and automation. In this paper, we aim to address these problems and build a logical, practical model using AJAX requests, APIs, and a few other resources.

Keywords – content-based filtering, supervised learning, cosine similarity measure, Levenshtein distance, Naïve Bayes classification, TF-IDF Vectorizer

I. INTRODUCTION

People today expect speed and genuine products, and a user wants to be engaged quickly by the content in front of them. The aim of a recommender system is to surface items that the user is not yet aware of. As the name suggests, a recommendation system provides related products that the user might like and find useful. There are many ways to provide recommendations depending on the user's needs. We focus on a movie recommendation system, which suggests further movies for the user to watch.

More formally, a recommender system is a subclass of information filtering that seeks to predict the rating or preference a user would give to an item. The first recommendation system appeared in 1992, and the field is still evolving to achieve higher accuracy and provide better results to users, which in turn helps the organisation or company grow. For example, people who buy Apple smartphones also tend to buy Apple smartwatches, so a system would recommend an Apple Watch whenever a user buys an Apple smartphone. With so many practical applications around us, recommendation systems have become hard to live without.

There are many ways of recommending movies to users, for example by genre or language. There are also methods that measure the closeness between different users in order to select a movie. Common algorithms for building a recommendation system are:

• Content-based algorithm
• Collaborative algorithms
• Hybrid approach

Content-Based Recommendation System: We treat this as a supervised learning approach, in which the system learns from inputs with known outcomes and later makes predictions for new inputs provided by the user. It uses movie properties such as genre, director, description and actors to make recommendations. In this project, we focus mainly on the content-based recommendation system.

Collaborative Filtering: This is an unsupervised approach in which the system is given inputs without a particular output for each; instead, it finds similar patterns and groups them together. Predictions are made from ratings provided by people, arranged in a matrix where each row holds one person's ratings and each column holds the ratings given to one movie.

Hybrid Approach: Here we combine collaborative filtering with content-based filtering to get the benefits of both and to reduce the challenges that each approach faces on its own.

To build the movie recommendation system, we work through the following steps:

• Data Collection: The data on which the system will work is collected and selected.
• Data Pre-processing: The data is processed in several passes and the important part is extracted to work on (training and testing data).
• Model Creation: The model is built with the chosen algorithms and tried first on the testing data and eventually on the whole dataset.
• Website/App Creation: The website is created, and the working of the model is checked multiple times to verify its efficiency and the ease of the UI/UX for the user.
• Deployment: The last step is to deploy the project to the cloud, that is, to run the model in a real environment. A model can be deployed in a variety of environments and is often integrated into applications via an API. For this project, we used the Heroku cloud environment for deployment.

Figure 1 - Project Flow

II. THE DATASET

To help evaluate the recommendation system, we used three different data sets, drawing on MovieLens data generated by the GroupLens research team:

1. IMDB 5000 Movie Dataset (test)
2. The Movies Dataset (train)
3. List of movies 2018-2020

The Movies Dataset consists of metadata for the 45,000 movies in the full Movies Dataset, all released on or before July 2017. The data includes crew, cast, plot keywords, budget, posters, gross, release date, language, production company, country, and TMDB vote counts and vote averages. The dataset also contains files with 26,000,000 (2.6 crore) ratings from 270,000 users for the 45,000 movies. Ratings are on a scale of 1-5 and were obtained from the official GroupLens website.

Figure 2 - Plotted chart of Movie Dataset

III. PRODUCT METHODOLOGY

A. Architecture Diagram and System Working

Figure 3 - Architecture Diagram of Recommendation System

We used the Python Flask web framework to build the web application for our project. The data was first collected and pre-processed to our requirements using Python and its libraries. The frontend and templates were created using HTML/CSS/JS. Whenever the user makes a request, the recommendations are returned with the help of AJAX, which allows data to be sent to and received from a database or server without reloading the page.

APIs were used to fetch the metadata (posters, titles, ratings, etc.) from the TMDB database. The TMDB API is available for everyone to use and provides a quick, consistent and reliable way to get this third-party data.
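The server side of the AJAX exchange described above can be sketched without any web framework. The snippet below is only an illustration under stated assumptions: the form field name `name`, the `RECOMMENDATIONS` lookup table and the `handle_recommend` helper are hypothetical and do not come from the project's code. It shows how a form-encoded request body could be turned into a JSON response that page-side JavaScript can render.

```python
import json
from urllib.parse import parse_qs

# Hypothetical precomputed results; the real system would derive these
# from its similarity model instead of a fixed table.
RECOMMENDATIONS = {"alien": ["Blade Runner", "Aliens"]}


def handle_recommend(body: str) -> tuple[int, str]:
    """Turn a form-encoded AJAX request body into (HTTP status, JSON text)."""
    fields = parse_qs(body)                      # "name=Alien" -> {"name": ["Alien"]}
    title = fields.get("name", [""])[0].strip()
    if not title:
        return 400, json.dumps({"error": "missing movie name"})
    matches = RECOMMENDATIONS.get(title.lower())
    if matches is None:
        return 404, json.dumps({"error": f"'{title}' is not in the dataset"})
    return 200, json.dumps({"movie": title, "recommendations": matches})


status, payload = handle_recommend("name=Alien")
print(status, payload)
```

In a Flask application, a function of this kind would typically be called from a route handler, with the page script sending the movie name and inserting the returned list into the page.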
The datasets used are The Movies Dataset and the Wikipedia list of movies (2018-2020). Instead of the common 80-20 training-testing split, a 60-40 split has been used.

Sentiment evaluation was then performed on the reviews to check whether they were positive or negative. To build the model for this, we used the following features:

1) Stop words

2) TF-IDF Vectorizer – TF-IDF stands for Term Frequency-Inverse Document Frequency. It converts text into a numerical representation that can be fed to the machine learning algorithm for predictions. TF-IDF weighting reduces the effect of high-frequency words when determining the importance of an item (document). We consider genre an important metric for recommending movies to users. There are different techniques for measuring the distance or similarity between two movies; in this paper we use cosine similarity.

3) Naive Bayes (Multinomial NB algorithm) – used to classify the reviews and to check the accuracy score of the model.

To use the model, we collected content and data from the internet by web scraping; the data is mostly saved to a local file to be worked on. We used it to get reviews and comments, with the help of the Python library BeautifulSoup4, which creates a parse tree for HTML and XML documents and automatically converts them to Unicode.

B. System Analysis

To build an efficient product, we propose a model based on content-based filtering with the cosine similarity measure, which makes the recommendation system more accurate and reliable.

Similarity: How does the system decide which item is most similar to the item the user likes? Here we use a similarity score. This is a numeric value between 0 and 1 that measures how similar two items are to each other. It is obtained by comparing the textual details of the two items, and it can be computed with cosine similarity.

Cosine Similarity: This is a measure of the similarity between data items. Each item is represented by a non-zero vector of its features, and the cosine of the angle between two vectors is taken as the similarity measure: the dot product of the vectors divided by the product of their lengths, cos θ = (A · B) / (‖A‖ ‖B‖).

Cosine similarity is beneficial because it is independent of the size of the documents, which is not the case for the Euclidean distance method. Even if two items are far apart in Euclidean distance because of their size, the angle between them can still be small, and a smaller angle means a higher similarity.

Figure 4 - Cosine Distance

Levenshtein Distance: This is a string metric that measures the distance between two sequences. We used it to get the closest match for the searched movie.

C. Pseudo Code of Proposed System

Steps    Overview
Step 1   Import the dataset and perform the data pre-processing steps.
Step 2   Import the required libraries and generate the count matrix with the count vectorizer method.
Step 3   Use the cosine similarity measure to determine the angle between documents, independent of their size.
Step 4   Set up a directory for the website page where the main input field is laid out.
Step 5   Connect the page to Flask and render it.
Step 6   Run the Python file in the terminal and open the resulting link in the browser.
Step 7   The user provides the name of a movie. If it is available in the dataset, the cosine measure is calculated and the top 10 similar movies are displayed to the user on the screen.
Step 8   If the searched movie is not in the dataset, a message is displayed.
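Steps 2, 3, 7 and 8, together with the Levenshtein closest-match lookup, can be sketched in plain Python. This is a minimal illustration, not the project's implementation: the `MOVIES` toy metadata, the function names and the `max_distance` threshold are assumptions, and standard-library word counts stand in for the count vectorizer used in the paper.

```python
import math
from collections import Counter

# Hypothetical toy metadata: each title maps to its combined text features.
MOVIES = {
    "Alien": "scifi horror space ridley scott",
    "Aliens": "scifi action space james cameron",
    "Blade Runner": "scifi noir ridley scott",
    "Titanic": "romance drama james cameron",
}


def count_vector(text):
    """Word counts for one document (one row of a count matrix)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Dot product of two count vectors divided by the product of their lengths."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def closest_title(query, titles, max_distance=2):
    """Title with the smallest edit distance to the query, or None if too far."""
    distances = {title: levenshtein(query.lower(), title.lower()) for title in titles}
    best = min(distances, key=distances.get)
    return best if distances[best] <= max_distance else None


def recommend(query, movies=MOVIES, top_n=10):
    """Top-N most similar titles, or None when the movie is not in the dataset."""
    title = closest_title(query, movies)
    if title is None:
        return None  # the web layer would display the "not found" message here
    target = count_vector(movies[title])
    scores = [
        (other, cosine_similarity(target, count_vector(text)))
        for other, text in movies.items()
        if other != title
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scores[:top_n]]


print(recommend("Alian", top_n=2))
```

The misspelled query "Alian" is one edit away from "Alien", so the lookup still resolves to a known title before the similarity ranking runs.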

IV. RESULTS AND OUTCOMES

A. Accuracy of Sentiment Analysis Model


The multinomial Naive Bayes classifier is suitable for classification with discrete features (e.g. word counts for text classification). The multinomial distribution normally requires integer feature counts. In practice, however, fractional counts such as TF-IDF weights can also work.
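To make the point about fractional counts concrete, the sketch below trains a small multinomial Naive Bayes classifier directly on TF-IDF weights using only the standard library. Everything in it is an assumption made for illustration: the four toy reviews, the `STOP_WORDS` set, the helper names and the unnormalised smoothed IDF formula. It is not the model or data used in the paper.

```python
import math
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "it", "is"}  # hypothetical, deliberately tiny list

REVIEWS = [
    "great movie loved it",
    "wonderful acting great plot",
    "terrible boring film",
    "awful plot hated it",
]
LABELS = ["positive", "positive", "negative", "negative"]


def tokenize(text):
    return [word for word in text.lower().split() if word not in STOP_WORDS]


def fit_idf(docs):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0 for term, df in doc_freq.items()}


def tfidf(doc, idf):
    """Fractional term weights for one document; unseen terms are ignored."""
    counts = Counter(tokenize(doc))
    return {term: count * idf[term] for term, count in counts.items() if term in idf}


def train(docs, labels, alpha=1.0):
    """Multinomial NB with Laplace smoothing, fitted on summed TF-IDF weights."""
    idf = fit_idf(docs)
    totals = defaultdict(Counter)          # label -> term -> summed weight
    for doc, label in zip(docs, labels):
        totals[label].update(tfidf(doc, idf))
    model = {}
    for label, weights in totals.items():
        denom = sum(weights.values()) + alpha * len(idf)
        model[label] = {
            "log_prior": math.log(labels.count(label) / len(labels)),
            "log_prob": {t: math.log((weights[t] + alpha) / denom) for t in idf},
        }
    return idf, model


def predict(text, idf, model):
    features = tfidf(text, idf)

    def score(label):
        params = model[label]
        return params["log_prior"] + sum(
            weight * params["log_prob"][term] for term, weight in features.items()
        )

    return max(model, key=score)


idf, model = train(REVIEWS, LABELS)
print(predict("great acting", idf, model))
```

The per-class sums are real-valued rather than integer counts, yet the smoothed log-probabilities and the decision rule are computed exactly as they would be for raw word counts.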

Figure 5 - Observed accuracy of sentiment analysis

B. Product Features

Live working prototype - https://srmdb-in.herokuapp.com/

The following technologies and software were used:

• Python packages and libraries (Flask/nltk/scikit/numpy/pickle/pandas/etc.)
• HTML/CSS/JS
• Jupyter Notebook
• PyCharm IDE 20.22
• Microsoft Visual Studio Code

Automated search bar feature - https://cdn.jsdelivr.net/npm/@tarekraafat/autocomplete.js@7.2.0/dist/css/autoComplete.min.css

V. CONCLUSION

The recommendation system uses the textual metadata of movies, such as plot, cast, genre, release year and other information, to analyse them and then recommends the movies most similar to the user's search. Our system requires only one movie that the user is interested in to make suitable recommendations.

In this paper we pointed out the problems explicitly and addressed them by building a better recommender system. We also chose a different training-testing split, which increased the efficiency of the results achieved. The accuracy of our system is 80% more than any other system, and the resulting website can be used in any social networking site to recommend movies to its users.

VI. TERMS OF REFERENCE

Future work will include recommending popular movies by tracking the movies that users in nearby locations have searched for. By combining a user's search history with that of geographically close users (those who live nearby), we can provide more location-relevant recommendations. In addition, user ratings from websites such as Rotten Tomatoes, Metacritic and IMDb would allow us to combine our method with collaborative filtering in a hybrid model that gets the most out of both approaches.

VII. CODE AND DOCUMENTATION

The code and data for this project are available in the GitHub repository srmDB.in-Movie-Recommendation-System.

ACKNOWLEDGMENT
We are grateful to Dr P Murali for his insightful and constructive suggestions during the design, planning and development of the project. His willingness to give his time is much appreciated.

We also want to thank Professor R Lavanya, our project guide, for her patient guidance, enthusiastic support and constructive critiques of our research, and for assisting us and keeping our progress on track.

