
Sentiment Analysis

of IMDB Movie
Reviews

BINGHAMTON UNIVERSITY
STATE UNIVERSITY OF NEW YORK
CS-580L-01, Spring 2020
Team Members -

- Chaitanya Kulkarni
B-Num: B00814455
SUNY Binghamton
[email protected]

- Abhimanyu Singh
B-Num: B00813542
SUNY Binghamton
[email protected]

- Necati Anil Ayan
B-Num: B00777933
SUNY Binghamton
[email protected]
1. Motivation
- Movies are one of the most convenient sources of entertainment
- Viewers are often unsure whether a particular movie is worth watching
- We check websites for ratings and reviews before deciding
- Most sites only show a rating based on the stars given
- There is no direct way to judge a movie's reception from its reviews and
comments
2. Introduction
- To judge a movie's success or failure from its reviews, we analyze the text
- Sentiment analysis interprets and classifies the emotions in
textual data
- It focuses on polarity: positive reviews vs. negative reviews
- We used 3 classifiers -
● Logistic Regression
● Naive Bayes
● Support Vector Machine (SVM)
3. Dataset Info
- The dataset can be found at: Large Movie Review Dataset v1.0
- Dataset provided by Stanford
- 50,000 movie reviews from the IMDB website
- 25,000 reviews for training & 25,000 for testing (a loading sketch follows)
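For context (not stated on the slide): the v1.0 archive unpacks into an aclImdb/ folder whose train/ and test/ splits each contain pos/ and neg/ subfolders. A minimal loading sketch, assuming scikit-learn and that directory layout:

    # Load the labeled train/test splits; the unlabeled 'unsup' reviews are
    # skipped by restricting categories to pos/neg.
    from sklearn.datasets import load_files

    train = load_files("aclImdb/train", categories=["pos", "neg"],
                       encoding="utf-8", decode_error="replace")
    test = load_files("aclImdb/test", categories=["pos", "neg"],
                      encoding="utf-8", decode_error="replace")

    print(len(train.data), len(test.data))  # 25000 25000
    print(train.target_names)               # ['neg', 'pos']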
4. Data Preprocessing
- Removing HTML tags
- Removing special characters and stopwords
- Lemmatization
- Tokenization (these steps are sketched below)
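A minimal sketch of these preprocessing steps, assuming NLTK is installed and its stopwords, wordnet, and punkt resources have been downloaded:

    import re
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer
    from nltk.tokenize import word_tokenize

    STOPWORDS = set(stopwords.words("english"))
    lemmatizer = WordNetLemmatizer()

    def preprocess(review: str) -> str:
        review = re.sub(r"<[^>]+>", " ", review)    # remove HTML tags
        review = re.sub(r"[^a-zA-Z]", " ", review)  # remove special characters
        tokens = word_tokenize(review.lower())      # tokenization
        tokens = [lemmatizer.lemmatize(t)           # lemmatization
                  for t in tokens if t not in STOPWORDS]
        return " ".join(tokens)

    print(preprocess("<br />The actors were great!"))  # -> "actor great"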
5. Feature Extraction
- Converts the features (words, in this case) into numeric vectors the
classifiers can work with (sketched below)
Bag of Words Approach
● Term Frequency
● CountVectorizer with scikit-learn
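A bag-of-words sketch with CountVectorizer; train.data and test.data are assumed to hold the preprocessed review strings from the previous steps:

    from sklearn.feature_extraction.text import CountVectorizer

    vectorizer = CountVectorizer()                    # term-frequency counts
    X_train = vectorizer.fit_transform(train.data)    # learn vocabulary on training data
    X_test = vectorizer.transform(test.data)          # reuse the same vocabulary

    print(X_train.shape)  # (25000, vocabulary_size) sparse count matrix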
6. Classification Models
- Classifiers -
● Logistic Regression
● Naive Bayes
● Support Vector Machine (SVM)
A. Logistic Regression
- Discriminative model
- Uses a logistic function (usually the sigmoid)
- Works well for binary outputs (positive or negative); see the sketch below
- Not well suited to non-linear decision boundaries
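A sketch of the logistic regression classifier, reusing X_train/X_test and the labels from the snippets above (the exact solver settings used in the project are not stated in the slides):

    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    lr = LogisticRegression(max_iter=1000)  # L2-regularized by default
    lr.fit(X_train, train.target)
    print(accuracy_score(test.target, lr.predict(X_test)))  # slides report 86.89%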
B. Naive Bayes
- Generative model
- Assumes all features are conditionally independent given the class
- Based on Bayes' theorem (sketched below)
- The independence assumption can be unrealistic for real-life cases, since
predictors are usually dependent
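A Naive Bayes sketch; the slides do not name the variant, so MultinomialNB (the usual choice for term-frequency counts) is assumed:

    from sklearn.naive_bayes import MultinomialNB
    from sklearn.metrics import accuracy_score

    nb = MultinomialNB()
    nb.fit(X_train, train.target)
    print(accuracy_score(test.target, nb.predict(X_test)))  # slides report 85.48%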
C. Support Vector Machine (SVM)
- Objective is to find a hyperplane in N-dimensional space that separates
the data points into classes
- Goal is to maximize the margin between the classes (sketched below)
- Also handles non-linear boundaries via the kernel trick
- Well suited to text classification
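An SVM sketch; the kernel is not specified in the slides, so a linear-kernel LinearSVC (typical for high-dimensional sparse text features) is assumed:

    from sklearn.svm import LinearSVC
    from sklearn.metrics import accuracy_score

    svm = LinearSVC()
    svm.fit(X_train, train.target)
    print(accuracy_score(test.target, svm.predict(X_test)))  # slides report 85.29%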
7. Result Evaluation
● ACCURACY
● A result-evaluation measure
● The ratio of correctly predicted observations to the total number of
observations (a worked example follows)
● The higher the accuracy, the better the model
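Accuracy spelled out on a tiny made-up example (not project data):

    y_true = [1, 0, 1, 1, 0]   # actual labels
    y_pred = [1, 0, 0, 1, 0]   # predicted labels
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(accuracy)  # 0.8 -- 4 of 5 predictions are correct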
7. Result Evaluation (cont.)
● LOGISTIC REGRESSION
● Accuracy: 86.89%
● SUPPORT VECTOR MACHINE
● Accuracy: 85.29%
● NAIVE BAYES
● Accuracy: 85.48%
8. Conclusion
- The highest accuracy was obtained with the Logistic Regression model
- LR performs better than Naive Bayes
- LR also outperforms the SVM, which is widely regarded as a strong choice
for textual data
9. Our Learning
● Before handling the data we need to clean it. This is called data
preprocessing. It is necessary to remove missing, noisy, and inconsistent data,
which might otherwise affect accuracy.
● Sentiment analysis is the extraction of emotions from textual data. It is a
technique based on Natural Language Processing.
● Logistic regression and support vector machines are closely linked. Both can be
viewed as taking a probabilistic model and minimizing some cost associated with
misclassification based on the likelihood ratio.
● The Naive Bayes classifier is a generative model. Naive Bayes also assumes that the
features are conditionally independent.
● Both Naive Bayes and Logistic Regression are linear classifiers. Logistic Regression
predicts the class probability using a direct functional form, whereas
Naive Bayes models how the data was generated given the class.
● Thus, as the training-set size grows toward infinity, the discriminative model
(Logistic Regression) performs better than the generative model (Naive Bayes).
● Hence, in our project we get higher accuracy for Logistic Regression than for the
other two classifiers.

- Chaitanya Kulkarni
9. Our Learning
● Understood that the main difficulty is getting the dataset ready for training (data
preprocessing).
● Understood the differences between generative and discriminative
models.
● When the dataset is very large (approaching infinite size), discriminative models are
preferable. However, generative models reach their asymptotic error faster,
since they need less training data to do so.
● SVM is known to be one of the best classifiers, yet it performs
poorly on our dataset. The main reason may be that we ignored
hyperparameter tuning (sketched below). With tuning, the accuracies of SVM and LR
may increase, while the Naive Bayes accuracy may remain stable.

- Necati A Ayan
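A hyperparameter-tuning sketch with GridSearchCV (not part of the original project); the C grid below is an illustrative assumption:

    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import LinearSVC

    grid = GridSearchCV(LinearSVC(), {"C": [0.01, 0.1, 1, 10]}, cv=5)
    grid.fit(X_train, train.target)
    print(grid.best_params_, grid.best_score_)  # best C and its cross-validated accuracy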
9. Our Learning
● Got to learn that constructing and training a model is only a small part of a
machine learning project; the major task lies in data pre-processing.
● Learned various ways to clean textual data, and why it is necessary, to
make it suitable to feed to our learning model.
● Learned to apply what was taught in class to our project, such as which models
to choose for a specific task, which in our case was text classification.
● Learned how to improve the accuracies of the models by employing
optimization techniques such as hyperparameter tuning.
● Learned the actual difference between probabilistic and binary classifiers.
● Got to know that, although some models are considered better for certain
tasks, other models can outperform them on some datasets, which
happened in our case.

- Abhimanyu Singh
10. References
[1]. Mais Yasen, Sara Tedmori. “Movies Reviews Sentiment Analysis and
Classification”. IEEE Jordan International Joint Conference on Electrical
Engineering and Information Technology (JEEIT). 978-1-5386-7942-5.
[2]. Tirath Prasad Sahu, Sanjeev Ahuja. “Sentiment Analysis of movie reviews: A
study on feature selection and classification algorithms”. International Conference on
Microelectronics, Computing, and Communication (MicroCom). 978-1-4673-6621-2.
[3]. Unggul Wijayanto, Riyanarto Sarno. “An Experimental Study of Supervised
Sentiment Analysis Using Gaussian Naïve Bayes”.
pp. 476-481. DOI: 10.1109/ISEMANTIC.2018.8549788.
[4]. Tejaswini M. Untawale, G. Choudhari. “Implementation of Sentiment
Classification of Movie Reviews by Supervised Machine Learning Approaches”.
978-1-5386-7808-4.
Thank You
