Sentiment Analysis Classification For Rotten Tomatoes Phrases On Kaggle

This document discusses applying machine learning techniques to classify the sentiment of phrases from Rotten Tomatoes reviews. It begins by introducing the dataset and exploring the distribution of sentiment labels. Most phrases are neutral or positive, and about half are 5 words or less. The goal is to predict sentiment labels using features like unigram and bigram counts. Several models are considered, including a baseline majority class predictor, bag-of-words naive Bayes, and potentially regressing phrase length. The accuracy of different models will be evaluated on a test set.

Uploaded by

John Jihn

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

47 views4 pages

Sentiment Analysis Classification For Rotten Tomatoes Phrases On Kaggle

Uploaded by

John Jihn

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 4

Sentiment Analysis Classification for Rotten Tomatoes

Phrases on Kaggle
Kevin Hung
[email protected]

ABSTRACT Table 1. Sentiment Label Coding

In the second assignment for CSE 190: Data Mining and
Sentiment Label
Predictive Analytics, we apply some techniques to improve
the accuracy of classifying Rotten Tomatoes phrase Negative 0
sentiments. Somewhat Negative 1

General Terms Neutral 2

Algorithms, Experimentation Somewhat Positive 3
Positive 4
Keywords
Classification, Sentiment Analysis, Opinion Mining,
Naïve Bayes, Binned, Regression
3. Exploratory Analysis
1. INTRODUCTION Analyzing the prior distribution of sentiment labels is
Applying sentiment analysis on reviews based on text important in developing an intuition and obtaining
features is distinct from rating-scale inference problems like reasonable sense of what kind of predictions our models
predicting the rating value on a review (e.g. movies, should make as described in the later section.
restaurants, etc) because we can gain more details and
insight on the human component (e.g. opinions, emotions, Figure 1. Sentiment Label Distribution
feelings) than with numerical features. As Richard
Hamming once famously stated, “The purpose of
computing is insight, not numbers.” One of the main and
many applications of classifying the sentiment of Rotten
Tomatoes phrases through automation and machine
learning is to save the human effort of evaluating each
phrase manually.

2. DATASET
The original Rotten Tomatoes sentences were gathered as
described in Pang and Lee's (2005) [1] approach to
sentiment classification using metric learning, using 10,662
review snippets which were usually a sentence long. Then
Socher et al. from Stanford NLP refined the snippet data
into a more fine-grained form of pharsed phrases and used
Amazon Mechanical Turk to outsource the manual task of
interfacing and annotating the sentiments of the phrases
[2].
The sentiment labels appear to be very symmetric and
For the version of the data we obtained from the Kaggle
slightly peakier than the normal distribution. The most
website [3], a tab delimited file containing around 156,060
frequent label is neutral which is the clear baseline that our
training records with only the phrase's original sentence id
basic model should predict.
and the actual phrase as the features and the sentiment value
as the label. The distribution of the unigrams, bigrams and
trigrams comprised 42% of phrases in the training set, and
The second file for testing contains 66,292 records with
the following boxplot describes the lengths of the phrases
only the sentence id and the phrase values provided.
and shows 75% of them being 10 or less and the rest mainly
being 10-20 words in length:
Figure 2. Phrase Length Distribution labels divided by the total number of samples or the
distance between 1 and the Hamming Loss:

The main types of model we consider are those that deal

with classification since there are 5 possible discrete
categories, but we can also try a regression model since the
sentiment labels are ordinal and scaled. Then we round the
output of our regression to the closest integer.
A model that uses clustering could also be a possibility,
and it would have to aggregate the labels of the closest
neighboring training points. Unsupervised models however
The fact that nearly half of the phrases are less than 5 would not be appropriate in helping us predict the sentiment
tokens in length makes it reasonable for us to develop our class.
features based on unigram and bigram counts that we can
design models that take in a term frequency matrix as input.
4.1 Features
Figure 3. Phrase Length Grouped by Sentiment
The combination of features we will use is a subset of
the number of counts for each unigram/bigram and the
number of tokens in the phrase.
The only pre-processing of the feature will be the
lowercasing the text of the review. Also we leave in
punctuation since some of them have sentiment labels
assigned in the training set.
As described in our exploratory phrase, based on
figures 2 and 3, we assume unigram and bigram counts to
be reasonable features to represent as a term frequency or
term frequency – inverse document frequency matrix.
Finally given the difference between the variation in phrase
length (number of tokens) for each sentiment category, we
can use binning on a threshold like using the output of one
Finally the third figure is the most informative in that it
model to handle phrases with less than or equal to 5 tokens
shows 2-sentiment reviews having less variance and that
and the output of a second one for phrases containing more
there is a threshold at 5 tokens where we can design our
than 5 tokens.
model to use binning: using one model to tackle the case
where the phrase has less than or equal to 5 tokens and
5. Models
another model for more than 5 tokens. Also if we are using
regression models, phrase length can be a possible feature. 5.1 Baseline

4 Prediction Objective The most simplistic model we can use as a baseline to

compare our results with more complex models to predict
After performing a preliminary exploration of the the majority sentiment category, and the most frequently
data, the task/objective we are tackling is predicting the appearing sentiment value is neutral: 2
sentiment label given Phrases as features, using a
supervised model, either classification or regression, from
machine learning. And the score for the baseline model is 0.51789.
To evaluate our model, we submit a list of labels that
5.2 Bag-of-words Multinomial Naïve Bayes
our models output given the phrases from the training set as
input online to Kaggle. Then Kaggle will calculate the
categorization accuracy between 0 and 1, assuming that the The next model is also very simplistic in the naïve sense
competition either uses the number of correctly predicted of just counting the unigrams and representing it as a term-
document matrix. The multinomial variation of NB can be
described as:

where

where the prior probability is

6. Related Works and Literature

and the conditional probability is The Rotten Tomatoes phrases data obtained for this study
originated from Pang and Lee's work in which they describe
using item and label similarity for metric labeling/positive
sentence percentage and compare it with multi-class SVM
The accuracy obtained improves to 0.58681. (one-versus-all) and regression, and they found that
incorporating PSP helps improve average accuracies. Also
5.3 Linear Regression with Polar Words mentioned in the dataset section was Sochel et al.'s work in
creating a sentiment tree bank and labeling the nodes using
Neural Networks obtaining high accuracy of nearly 80%
The third model is an attempt to use linear regression much past the baseline, which qualifies it as state-of-the-art.
with unigrams appearing in the non-neutral training phrases: In the following section we will see that their results and
conclusions are far accurate than our findings.
Other related datasets the 50,000 IMDb positive and
So that our model is of the form: negative reviews like in described in Maas et al.'s work in
developing probabilistic model related to LDA that can
learn word vector representations and is able to capture
sentiment and semantics similarities[4].
The techniques and theories described in the cited
works were too advanced to incorporate into the models
The number of features is around 700, but the results of used in this study but features and models that were
the regression model sets the accuracy below the baseline mentioned in common included are Naïve Bayes, SVM tf-
with a score of 0.50952. idf matrix representations and similarity measures. The
following section on results and conclusions cover the use
5.4 Binned Multinomial Naïve Bayes of Naïve Bayes and tf-idf matrix representations and an
Given that we explored the length/number of tokens in attempt at calculating similar phrases.
phrases grouped by sentiment categories, we found that the
mass of neutral phrases had less than or equal to 5 tokens, 7 Results and Conclusions
so we decide to use that as a threshold for our binned The models we developed in this study do not perform
multinomial NB model. We trained two multinomial NB as well as the state-of-the-art or even close to the top scores
models based having more or no more than 5 tokens and of the Kaggle competition. Some of the Kagglers
predicted values correspondingly too. implemented their own RNN.

As a result our model score increased to 0.60457. The significant results and insight we gained in this
study is that Naïve Bayes again outperforms linear
regression in simplicity (i.e. no need to calculate the weight
5.5 Nearest Neighbor based on Cosine Similarity of vectors, just count the number of times each unigram
TF-IDF appears) and accuracy. And another significant result is that
the binning threshold discovered in the exploratory section
The model that used clustering of similar phrases can help increase accuracy by 2%.
based on TF-IDF features did not have adequate or
reasonable computation time, but the decision function
developed is listed below:
Table 3. Model Performance 8. Acknowledgements
A deep token of appreciation for all members of the Data
Model Score
Science community at UCSD and the Computer Science
0.60457 and Engineering Department for giving the opportunity to
Binned Multinomial Naïve Bayes offer a Data Mining course at an undergraduate level.

0.58681 9. REFERENCES
Multinomial Naïve Bayes [1] Pang, Bo, and Lillian Lee. "Seeing stars:
Exploiting class relationships for sentiment
0.51789 categorization with respect to rating scales."
Baseline Proceedings of the 43rd Annual Meeting on
Association for Computational Linguistics.
0.50952 Association for Computational Linguistics, 2005.
Linear Regression [2] Socher, Richard, et al. "Recursive deep models
for semantic compositionality over a sentiment
treebank." Proceedings of the conference on
The feature representation that worked well is the term- empirical methods in natural language processing
document matrix, unlike the best fitting line found by linear (EMNLP). Vol. 1631. 2013.
regression. An explanation as to why linear regression [3] https://www.kaggle.com/c/sentiment-analysis-on-
performed worse than the baseline is that there is a high movie-reviews/data
bias/misassumption that adding weights linearly based
[4] Maas, Andrew L., et al. "Learning word vectors for
feature words represents the sentiment accurately. Because
sentiment analysis."Proceedings of the 49th
of the misassumption and high inaccuracy, the
Annual Meeting of the Association for
interpretation of the parameters for linear regression can not
Computational Linguistics: Human Language
reliably represent the sentiment of the phrase.
Technologies-Volume 1. Association for
The models used in this study were not complex, and Computational Linguistics, 2011.
scaling was not an issue given the size of the training and
testing sets. If there were more time and resources to
conduct the study, then overfitting could be estimated using
cross-validation.

Team 10 Primer
No ratings yet
Team 10 Primer
12 pages
NeuralHack Stage 2 Python
100% (1)
NeuralHack Stage 2 Python
2 pages
Revision Questions 12
No ratings yet
Revision Questions 12
1 page
Sustainable Industrial Chemistry 1st Edition Fabrizio Cavani Download
No ratings yet
Sustainable Industrial Chemistry 1st Edition Fabrizio Cavani Download
55 pages
Physical Science - q4 - Slm13-Pages-Deleted
No ratings yet
Physical Science - q4 - Slm13-Pages-Deleted
5 pages
PES3701 Assignment 3
No ratings yet
PES3701 Assignment 3
3 pages
ICT 7 Learning Module
No ratings yet
ICT 7 Learning Module
77 pages
Planning For Estidama
No ratings yet
Planning For Estidama
34 pages
Mutations
No ratings yet
Mutations
48 pages
Text Representation: Lecture # 6
No ratings yet
Text Representation: Lecture # 6
21 pages
Comparison of Word Embedding Features Using Deep Learning in Sentiment Analysis
No ratings yet
Comparison of Word Embedding Features Using Deep Learning in Sentiment Analysis
10 pages
Literature Review On Iron and Steel Industry
100% (2)
Literature Review On Iron and Steel Industry
6 pages
BAI601 Module 3 PDF
No ratings yet
BAI601 Module 3 PDF
19 pages
Capstone Project Data Science
No ratings yet
Capstone Project Data Science
5 pages
Sentiment Analysis Using Bert Model
No ratings yet
Sentiment Analysis Using Bert Model
8 pages
Winter Semester 2023-24 CSE3015 ETH AP2023246000714 Quiz-I-Question-Paper
No ratings yet
Winter Semester 2023-24 CSE3015 ETH AP2023246000714 Quiz-I-Question-Paper
74 pages
Chapter 8 Text Analytics
No ratings yet
Chapter 8 Text Analytics
42 pages
Movie Genre Prediction From Plot Summaries by Comparing Various Classification Algorithms
No ratings yet
Movie Genre Prediction From Plot Summaries by Comparing Various Classification Algorithms
3 pages
Ayitenew Determinantsof Internal Audit Effectiveness Evidencefrom Gurage Zone
No ratings yet
Ayitenew Determinantsof Internal Audit Effectiveness Evidencefrom Gurage Zone
12 pages
Sepam80 64REF Wiring 4wire Low-Voltage Transformer T81 v0
No ratings yet
Sepam80 64REF Wiring 4wire Low-Voltage Transformer T81 v0
2 pages
MADHU IEEE Updated 27 05 24
No ratings yet
MADHU IEEE Updated 27 05 24
5 pages
Install
No ratings yet
Install
3 pages
Toaz - Info Detailed Lesson Plan DLP For Demo Teaching Parallelism PR
No ratings yet
Toaz - Info Detailed Lesson Plan DLP For Demo Teaching Parallelism PR
3 pages
34.1.18 AOAC Official Method 948.14 Succinic Acid in Eggs
No ratings yet
34.1.18 AOAC Official Method 948.14 Succinic Acid in Eggs
2 pages
TRC P4P Proposal
No ratings yet
TRC P4P Proposal
48 pages
Prof K V Subbaraju
No ratings yet
Prof K V Subbaraju
26 pages
Researchpaper
No ratings yet
Researchpaper
9 pages
Detailed Report
No ratings yet
Detailed Report
6 pages
05 - Feature Engineering (Text)
No ratings yet
05 - Feature Engineering (Text)
28 pages
Testing MCQ
No ratings yet
Testing MCQ
59 pages
Naive Bayes
No ratings yet
Naive Bayes
56 pages
THE RESEARCH PROCESS Ed 201 3
No ratings yet
THE RESEARCH PROCESS Ed 201 3
25 pages
Rectus Tema
No ratings yet
Rectus Tema
486 pages
O'Neal
No ratings yet
O'Neal
7 pages
Top Machine Learning Informations About Different Algorithms
No ratings yet
Top Machine Learning Informations About Different Algorithms
63 pages
A2mot En5
100% (1)
A2mot En5
5 pages
RES Presentation
No ratings yet
RES Presentation
21 pages
Classification
No ratings yet
Classification
81 pages
Machine Learning With Python - Unit-5
No ratings yet
Machine Learning With Python - Unit-5
26 pages
Synopsis
No ratings yet
Synopsis
10 pages
Addressing Sentiment Analysis Challenges
No ratings yet
Addressing Sentiment Analysis Challenges
8 pages
Nistgcr10 917 8 PDF
No ratings yet
Nistgcr10 917 8 PDF
268 pages
Mitutoyo - Przenośny Twardościomierz Leeb HH-411 - 2006 EN
No ratings yet
Mitutoyo - Przenośny Twardościomierz Leeb HH-411 - 2006 EN
2 pages
Document Classification Using Distributed Machine Learning
No ratings yet
Document Classification Using Distributed Machine Learning
4 pages
1 s2.0 S187705091630463X Main
No ratings yet
1 s2.0 S187705091630463X Main
6 pages
Reference Photo:: 9-7/8 In. (250.8mm) QD503X
No ratings yet
Reference Photo:: 9-7/8 In. (250.8mm) QD503X
1 page
Lecture5 421
No ratings yet
Lecture5 421
115 pages
Guidance Mandatory Competence Attainment Report (v7) Final 04072012
No ratings yet
Guidance Mandatory Competence Attainment Report (v7) Final 04072012
8 pages
Apply Word Vectors For Sentiment Analysis of APP Reviews
No ratings yet
Apply Word Vectors For Sentiment Analysis of APP Reviews
5 pages
Unit 2a
No ratings yet
Unit 2a
51 pages
Practical 9 - Text Mining
No ratings yet
Practical 9 - Text Mining
22 pages
ZamoshchinSegall PredictingRedditPostPopularity
No ratings yet
ZamoshchinSegall PredictingRedditPostPopularity
5 pages
Sentiment Analysis Using Machine Learning Classifiers
No ratings yet
Sentiment Analysis Using Machine Learning Classifiers
41 pages
Classifier Series - Naive Bayes Sentiment Analysis
No ratings yet
Classifier Series - Naive Bayes Sentiment Analysis
10 pages
Black Spot Study and Accident Prediction Model Using Multiple Liner Regression PDF
No ratings yet
Black Spot Study and Accident Prediction Model Using Multiple Liner Regression PDF
16 pages
Sentiment Analysis Presentationnotes
No ratings yet
Sentiment Analysis Presentationnotes
4 pages
2012 Liviu P. Dinu, Iulia Iuga, 2012. The Naive Bayes Classifier in Opinion Mining - in Search of The Best Feature
No ratings yet
2012 Liviu P. Dinu, Iulia Iuga, 2012. The Naive Bayes Classifier in Opinion Mining - in Search of The Best Feature
12 pages
Admnadvt
No ratings yet
Admnadvt
2 pages
Displacement and Acceleration C Programming
No ratings yet
Displacement and Acceleration C Programming
11 pages
NLP Asgn3
No ratings yet
NLP Asgn3
6 pages
Notes On Anova: Dr. Mcintyre Mcdaniel College Revised: August 2005
No ratings yet
Notes On Anova: Dr. Mcintyre Mcdaniel College Revised: August 2005
10 pages
MBA Marketing Research Project Guidelines
No ratings yet
MBA Marketing Research Project Guidelines
7 pages
Maneesha Nidigonda Verzeo Major Project
No ratings yet
Maneesha Nidigonda Verzeo Major Project
11 pages
Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Reviews
No ratings yet
Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Reviews
4 pages
Sentiment Analysis of Rotten Tomatoes For Box Office Revenue Prediction
No ratings yet
Sentiment Analysis of Rotten Tomatoes For Box Office Revenue Prediction
6 pages
Analyzing Variations of Opinions On Twitter: R. Nisha Pauline
No ratings yet
Analyzing Variations of Opinions On Twitter: R. Nisha Pauline
5 pages
Book Recommendation System
No ratings yet
Book Recommendation System
12 pages
Parikh IdentifyTagsFromMillionsOfTextQuestion PDF
No ratings yet
Parikh IdentifyTagsFromMillionsOfTextQuestion PDF
5 pages
Python Text Mining: Perform Text Processing, Word Embedding, Text Classification and Machine Translation
From Everand
Python Text Mining: Perform Text Processing, Word Embedding, Text Classification and Machine Translation
Alexandra George
No ratings yet
CS464 Chapter 4: Naïve Bayes: (Slides Based On The Slides Provided by Öznur Taştan and Mehmet Koyutürk)
No ratings yet
CS464 Chapter 4: Naïve Bayes: (Slides Based On The Slides Provided by Öznur Taştan and Mehmet Koyutürk)
55 pages
228 International Conference On Engineering Technologies (ICENTE'17)
No ratings yet
228 International Conference On Engineering Technologies (ICENTE'17)
3 pages
Sentiment Analysis On Movie Reviews: Natural Language Processing UML602 Project Report
No ratings yet
Sentiment Analysis On Movie Reviews: Natural Language Processing UML602 Project Report
13 pages
Twiiter Sentiment Analysis
No ratings yet
Twiiter Sentiment Analysis
15 pages
Sentiment Analysis in Twitter: Rohit Kumar Jha (11615) Sakaar Khurana (10627)
No ratings yet
Sentiment Analysis in Twitter: Rohit Kumar Jha (11615) Sakaar Khurana (10627)
9 pages
Sentiment Analysis: A Baseline Algorithm
No ratings yet
Sentiment Analysis: A Baseline Algorithm
13 pages
GSoC 2017 Proposal - Rajat Arora
No ratings yet
GSoC 2017 Proposal - Rajat Arora
9 pages
Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Reviews
No ratings yet
Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Reviews
4 pages
Assignment 4
No ratings yet
Assignment 4
5 pages
Compusoft, 3 (6), 999-1001 PDF
No ratings yet
Compusoft, 3 (6), 999-1001 PDF
3 pages
Sentiment Analysis
No ratings yet
Sentiment Analysis
5 pages
Book Reviews Sent
No ratings yet
Book Reviews Sent
8 pages
Text Classification by Augmenting Bag of Words (BOW) Representation With Co-Occurrence Feature
No ratings yet
Text Classification by Augmenting Bag of Words (BOW) Representation With Co-Occurrence Feature
5 pages
Robert Chan, Michael Wang, Multiclass Sentiment Analysis of Movie Reviews
No ratings yet
Robert Chan, Michael Wang, Multiclass Sentiment Analysis of Movie Reviews
5 pages
Sentiment Classification of Reviews Using Sentiwordnet: 9Th. It & T Conference
No ratings yet
Sentiment Classification of Reviews Using Sentiwordnet: 9Th. It & T Conference
10 pages
Python Regular Expressions Explained: A Practical Guide with Examples
From Everand
Python Regular Expressions Explained: A Practical Guide with Examples
William E. Clark
No ratings yet
Less Is More: Selecting Informative Unigrams For Sentiment Classification
No ratings yet
Less Is More: Selecting Informative Unigrams For Sentiment Classification
10 pages
BagOfopinionColing10 Camera-Ready Final
No ratings yet
BagOfopinionColing10 Camera-Ready Final
9 pages
Cs221 Report
No ratings yet
Cs221 Report
16 pages
Twitter Sentiment Analysis
No ratings yet
Twitter Sentiment Analysis
17 pages
An Ontology-Based Sentiment Classification Methodology For Online Consumer Reviews
100% (2)
An Ontology-Based Sentiment Classification Methodology For Online Consumer Reviews
7 pages