IMDb Movie Review Sentiment Analysis
Presented by:
Shehzada Alam
Sameer Pophali
Shreyas Wankhede
Sagar Bhutada
Shehnaaz Shareen
Agenda
• Problem Statement
• Proposed Solution
• Solution Details
• Model Building
• Challenges
• Data Analysis & Visualization
• Conclusion
Problem Statement
• The Internet Movie Database (IMDb) is one of the world’s most popular sources for movie, TV, and celebrity
content, with more than 100 million unique visitors per month.
• IMDb holds a huge movie database that includes details of each movie along with ratings
and user reviews.
• These movie reviews affect everyone, from audiences and film critics to production companies.
• Our project has two goals. The first is to scrape data from IMDb and build an analysis that helps a data analyst
or production company decide how to proceed with making a new movie. The second is to build a model that
predicts the sentiment of a movie based on its user reviews.
Proposed Solution
• Establish a database connection; the movie data is stored in MySQL.
• When a user searches for a movie and the entry is not present in the database, fetch the details from IMDb through
web scraping and an API, insert the record into the database, and display the result to the user.
• For analysis, extract the data from the database into a dataframe and visualize it to gain insights.
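The search flow above (look up the movie, fetch and store it only on a miss) can be sketched roughly as follows. This is a minimal illustration, not the project's code: it uses Python's built-in sqlite3 as a stand-in for MySQL, and the table layout, column names, and the `fetch_movie_from_imdb` helper are all hypothetical.

```python
import sqlite3


def fetch_movie_from_imdb(title):
    """Hypothetical placeholder for the scraping/API step."""
    # Assumption: the real project would call an API and scrape IMDb here.
    return {"title": title, "rating": None, "budget_usd": None}


def make_demo_connection():
    """Create an in-memory database with an assumed 'movies' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE movies (title TEXT PRIMARY KEY, rating REAL, budget_usd REAL)"
    )
    return conn


def get_movie(conn, title, fetch=fetch_movie_from_imdb):
    """Return the stored record, fetching and inserting it on a miss."""
    row = conn.execute(
        "SELECT title, rating, budget_usd FROM movies WHERE title = ?", (title,)
    ).fetchone()
    if row is not None:
        return {"title": row[0], "rating": row[1], "budget_usd": row[2]}

    # Not in the database yet: fetch, store, then return the new record.
    movie = fetch(title)
    conn.execute(
        "INSERT INTO movies (title, rating, budget_usd) VALUES (?, ?, ?)",
        (movie["title"], movie["rating"], movie["budget_usd"]),
    )
    conn.commit()
    return movie
```

With a MySQL driver the same pattern applies, although the placeholder syntax in the SQL strings typically differs (`%s` instead of `?`).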
Solution Details
• “Sentiment analysis is an important research area that identifies the people’s sentiments, opinions and emotions
underlying a text.”
• Sentiment analysis is performed on the user reviews, and a new column (polarity) holds the resulting labels
(Positive and Negative).
• We use an unsupervised lexicon-based method. Lexicons are dictionaries or vocabularies of polar words constructed
specifically for the sentiment classification task.
• The system uses the VADER (Valence Aware Dictionary and sEntiment Reasoner) lexicon for sentiment
analysis. VADER reports positive, negative, and neutral scores along with a compound score between -1 and +1, and each
review is labeled positive or negative based on this score.
• To extract these sentiments, we compute the polarity of each review (whether the text is positive or
negative).
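The labeling step described above might look like the sketch below. It assumes the `vaderSentiment` package is installed, and the cutoff of 0.0 on the compound score is an assumption made for illustration; the slides do not state which threshold was used.

```python
def build_vader_analyzer():
    """Create a VADER analyzer (requires the vaderSentiment package)."""
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    return SentimentIntensityAnalyzer()


def polarity_label(review, analyzer, threshold=0.0):
    """Map a review to 'Positive' or 'Negative' via the compound score.

    Assumption: scores at or above `threshold` count as positive.
    """
    # polarity_scores returns a dict with 'neg', 'neu', 'pos', 'compound'.
    scores = analyzer.polarity_scores(review)
    return "Positive" if scores["compound"] >= threshold else "Negative"
```

A polarity column could then be filled with something like `df["polarity"] = df["review"].apply(lambda r: polarity_label(r, analyzer))`, where `df` and the `review` column name are hypothetical.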
Model Building
• Machine learning algorithms need numeric input, so each review is converted to a numeric
representation. This step is called 'vectorization'.
• The system uses the TF-IDF (Term Frequency-Inverse Document Frequency) vectorizer, which turns term
counts into normalized, floating-point TF-IDF weights.
• The movie data is split into training and test sets (80-20 ratio).
• SVM: a Linear SVC (Support Vector Classifier) fits the training data and returns the best-fit hyperplane
that divides, or classifies, the data.
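The steps above (TF-IDF vectorization, an 80-20 split, and a Linear SVC) can be sketched with scikit-learn as follows. The function name, the default hyperparameters, the stratified split, and the fixed random seed are assumptions for illustration; the slides do not show the actual settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def train_sentiment_model(reviews, labels, seed=42):
    """Fit TF-IDF + LinearSVC on 80% of the data; return model and test split."""
    # 80-20 split; stratifying keeps the class balance similar in both parts.
    X_train, X_test, y_train, y_test = train_test_split(
        reviews, labels, test_size=0.2, random_state=seed, stratify=labels
    )
    # The pipeline vectorizes the raw text, then fits the linear classifier.
    model = make_pipeline(TfidfVectorizer(), LinearSVC())
    model.fit(X_train, y_train)
    return model, X_test, y_test
```

A per-class report can then be produced with `sklearn.metrics.classification_report(y_test, model.predict(X_test))`.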
Accuracy = 79.61%
            Precision   Recall   f1-score   Support
Negative    0.77        0.82     0.79       371
Positive    0.83        0.77     0.80       404
Total       0.80        0.80     0.80       775
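As a quick consistency check on the reported numbers, the f1-score is the harmonic mean of precision and recall, and the accuracy times the total support gives the number of correctly classified reviews. The sketch below recomputes these from the rounded values in the table, so the results are approximate; the variable names are only illustrative.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded per-class values as reported in the table.
report = {
    "Negative": {"precision": 0.77, "recall": 0.82, "support": 371},
    "Positive": {"precision": 0.83, "recall": 0.77, "support": 404},
}

# The per-class supports should add up to the size of the test set.
total_support = sum(row["support"] for row in report.values())

# Recompute each f1-score from the rounded precision and recall.
f1_scores = {
    label: round(f1(row["precision"], row["recall"]), 2)
    for label, row in report.items()
}

# Number of correct predictions implied by the reported accuracy.
correct = round(0.7961 * total_support)
```

The recomputed f1-scores (0.79 and 0.80) and the total support (775) match the table, and 617 correct predictions out of 775 corresponds to the reported accuracy of 79.61%.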
Django Framework
Frontend
Challenges
• The OMDb API did not return Budget or User Reviews, so we scraped this data
from the IMDb website using the IMDb ID.
• The 'Budget' values on the IMDb website appeared in different currencies, so we
converted them to USD using the CurrencyConverter package.
• After computing the polarity of all user reviews, the reviews had to be converted to a numerical
representation to fit the classification model.
• While rendering seaborn graphs in the Django framework, we got several response errors and
were not able to display the plots.
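The budget conversion mentioned in the challenges above could be approached along these lines. This is a sketch under assumptions: the format of the scraped budget strings, the symbol-to-code map, and the helper names are all hypothetical, and only `CurrencyConverter().convert(amount, currency, new_currency)` comes from the CurrencyConverter package itself.

```python
import re

# Assumption: a small, hypothetical map from currency symbols to ISO codes.
SYMBOL_TO_CODE = {"$": "USD", "€": "EUR", "£": "GBP", "₹": "INR"}


def build_converter():
    """Create a converter (requires the CurrencyConverter package)."""
    from currency_converter import CurrencyConverter

    return CurrencyConverter()


def parse_budget(text):
    """Split a string such as '€10,000,000' into (amount, currency code)."""
    # Assumed format: a symbol or code, optional space, then the amount.
    match = re.match(r"\s*([^\d\s]+)\s*([\d,]+(?:\.\d+)?)", text)
    if match is None:
        raise ValueError(f"unrecognized budget format: {text!r}")
    prefix, digits = match.groups()
    code = SYMBOL_TO_CODE.get(prefix, prefix.upper())
    return float(digits.replace(",", "")), code


def budget_to_usd(text, converter):
    """Convert a scraped budget string to a USD amount."""
    amount, code = parse_budget(text)
    if code == "USD":
        return amount
    return converter.convert(amount, code, "USD")
```

Passing the converter in as an argument keeps the parsing logic testable without downloading exchange rates.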
Data Analysis and Visualization:
1) Top 10 Rated Movies:
2) Top 10 High Budget Movies:
3) Critic vs Audience Rating:
4) Distribution of Critic or Audience Rating:
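The two "Top 10" views listed above can be derived from the dataframe with a single pandas call. The sketch below assumes hypothetical column names (`title`, plus a numeric column such as `rating` or `budget_usd`); the project's actual schema is not shown in the slides.

```python
import pandas as pd


def top_n(df, column, n=10):
    """Return the n rows with the largest values in `column`."""
    # nlargest sorts descending by the column and keeps the first n rows.
    return df.nlargest(n, column)[["title", column]].reset_index(drop=True)
```

For example, `top_n(movies, "rating")` and `top_n(movies, "budget_usd")` would give the data behind the first two charts, which could then be plotted with `seaborn.barplot`.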