Synopsis
IMDb REVIEW
PROJECT SYNOPSIS
OF MAJOR PROJECT
BACHELOR OF TECHNOLOGY
IN
SUBMITTED BY
Literature Review
Objectives
Methodology
References
Introduction to the project
Movies are among the most convenient forms of entertainment, but only
a few of them succeed and earn high ratings. Many rating websites help
movie fans decide which movies to watch and which to skip; IMDb and
Rotten Tomatoes are among the leading ones. The rating on such a
website reflects the success of a movie; IMDb, for example, gives a
movie a score out of 10 based on the stars given by viewers. However,
these websites do not offer a prediction based on the text of the
reviews themselves. To determine the success of a movie from its
reviews, we turn to sentiment analysis. Sentiment analysis is the
interpretation and classification of emotions within text data using
text analysis techniques. It allows businesses to identify customer
sentiment toward products, brands, or services in online conversations
and feedback. Sentiment analysis models focus not only on polarity
(positive, negative, neutral) but also on feelings and emotions
(angry, happy, sad, etc.) and even on intentions (e.g. interested vs.
not interested). Sentiment analysis has become an active topic, and
many large companies invest resources in it to predict outcomes for
their businesses. The working principle of sentiment analysis includes
tokenization, word filtering, stemming, and classification. In
tokenization, the text is segmented into units such as words, numbers,
or punctuation marks. Word filtering then typically removes tokens
that carry little sentiment, such as common stop words. The next step
is stemming, which is the process of removing affixes (mainly
suffixes) to reduce a word to its stem. After preprocessing, we
analyze the dataset by performing classification using Naïve Bayes,
Support Vector Machine, and Logistic Regression, and we determine the
best model based on accuracy. In this way, we analyze and study the
features that affect the scores of our review text and finally
classify the movie as positive or negative.
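The preprocessing steps described above (tokenization, word filtering,
and stemming) can be illustrated with a minimal sketch. It is only an
illustration under stated assumptions: the stop-word list and the
suffix list are hypothetical and deliberately tiny, and the stemmer is
a toy suffix stripper, not the Porter algorithm or any stemmer the
project may finally use.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "to", "it", "this"}

# Suffixes removed by the toy stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into word and number tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def filter_words(tokens):
    """Drop stop words, which carry little sentiment on their own."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Strip at most one suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run tokenization, word filtering, and stemming in sequence."""
    return [stem(t) for t in filter_words(tokenize(text))]


print(preprocess("The acting was amazing and the plot moved quickly"))
# ['act', 'amaz', 'plot', 'mov', 'quick']
```

The stems need not be dictionary words ("amaz", "mov"); it is enough
that different forms of the same word map to the same stem.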
A huge amount of textual data about movies is available on sites like
Amazon, IMDb, and Rotten Tomatoes, and analyzing such massive data
manually is a tedious task. To speed up the process, programmers use
certain techniques to extract public opinion, one of which is
sentiment analysis. Sentiment analysis is a part of opinion mining in
which the analysis focuses on extracting, from text, the opinions of
people on a particular topic. We make use of IMDb reviews of movies to
predict how the users have rated the movies, that is, whether a movie
has received a positive or a negative review. We propose a model that
includes different sentiment analysis methods, which will help us to
extract useful information from the data and to identify the most
suitable classifier for this particular domain by comparing accuracy.
The classifiers considered are Naïve Bayes, Support Vector Machine,
and Logistic Regression. Because movie reviews are written in informal
language that lacks a strict grammatical structure, we also take into
account the N-grams and count vectorizer approaches. Tokenization is
used to convert the input string into a word vector, stemming is used
to extract the root of each word, feature selection picks out the
essential words, and lastly classification is used to classify the
movie as positive or negative.
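The N-grams and count vectorizer idea mentioned above can be sketched
as follows. This is a simplified illustration, not the implementation
used in the project: the function names are hypothetical, and a
library vectorizer would normally be used instead. Each review is
turned into a vector that counts how often every unigram and bigram of
the vocabulary occurs in it.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def all_ngrams(tokens, max_n):
    """Return the n-grams of the tokens for n = 1 .. max_n."""
    return [g for n in range(1, max_n + 1) for g in ngrams(tokens, n)]


def build_vocabulary(docs, max_n=2):
    """Map every n-gram seen in the training reviews to a column index."""
    terms = sorted({g for tokens in docs for g in all_ngrams(tokens, max_n)})
    return {term: index for index, term in enumerate(terms)}


def count_vector(tokens, vocab, max_n=2):
    """Count the occurrences of each vocabulary term in one review."""
    counts = Counter(g for g in all_ngrams(tokens, max_n) if g in vocab)
    vector = [0] * len(vocab)
    for term, count in counts.items():
        vector[vocab[term]] = count
    return vector


docs = [["not", "good"], ["good", "movie"]]
vocab = build_vocabulary(docs)
print(sorted(vocab))
# ['good', 'good movie', 'movie', 'not', 'not good']
print(count_vector(["not", "good"], vocab))
# [1, 0, 0, 1, 1]
```

The bigram "not good" gets its own column, so a classifier can tell
this review apart from one that merely contains the word "good".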
Methodology
In our experiment, we will make use of Naïve Bayes, Logistic
Regression, and Support Vector Machine. We will train our model with
each of the above classifiers to predict the movie polarity as
positive or negative.
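Since the classifiers are to be compared by accuracy, the comparison
step can be sketched as below. The predictions in the example are
made-up placeholders, not results of this project, and the function
names are hypothetical; the sketch only shows how accuracy is computed
on held-out labels and how the best-scoring model is picked.

```python
def accuracy(y_true, y_pred):
    """Fraction of reviews whose predicted label equals the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def pick_best(predictions_by_model, y_true):
    """Score each model's predictions and return the best name and all scores."""
    scores = {
        name: accuracy(y_true, preds)
        for name, preds in predictions_by_model.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Placeholder labels and predictions, for illustration only.
y_true = ["pos", "neg", "pos", "neg"]
predictions = {
    "naive_bayes": ["pos", "neg", "neg", "neg"],
    "logistic_regression": ["pos", "neg", "pos", "neg"],
    "svm": ["neg", "neg", "pos", "pos"],
}
best, scores = pick_best(predictions, y_true)
print(best, scores)
```

In practice the predictions would come from each trained classifier
applied to a test set that was not used for training.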
Naïve Bayes: It is a classification algorithm, primarily used for text
classification involving high-dimensional training data sets. Examples
include spam filtering and sentiment analysis. The algorithm applies
Bayes' theorem under the "naive" assumption that the features are
independent of each other given the class, and it learns the
probability of an object with certain features belonging to a
particular class.
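As an illustration of how such class probabilities can be learned from
word counts, the following is a minimal sketch of a multinomial Naïve
Bayes classifier. It assumes add-one (Laplace) smoothing and ignores
words not seen in training; both are choices made for this example,
and the class name and the tiny training set are hypothetical.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial Naive Bayes over token lists with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # reviews per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.total_docs = len(labels)
        return self

    def log_scores(self, tokens):
        """Return log P(class) + sum of log P(word | class) per class."""
        scores = {}
        vocab_size = len(self.vocab)
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / self.total_docs)
            for token in tokens:
                if token in self.vocab:  # unseen words are skipped
                    count = self.word_counts[label][token]
                    score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, tokens):
        """Return the class with the highest log score."""
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Hypothetical toy training data, for illustration only.
docs = [["great", "fun"], ["great", "acting"],
        ["boring", "plot"], ["bad", "boring"]]
labels = ["pos", "pos", "neg", "neg"]
model = TinyNaiveBayes().fit(docs, labels)
print(model.predict(["great", "acting"]))  # pos
print(model.predict(["boring", "story"]))  # neg
```

Working with log probabilities avoids numerical underflow when many
small probabilities are multiplied for a long review.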