
Sentiment Analysis

Feature Addition and Accuracy Improvement


Prof. Brian Reese, Akshina Banerjee

University of Minnesota – Twin Cities, College of Liberal Arts, Institute of Linguistics

Introduction

Definition:
Sentiment analysis refers to the use of natural language processing, text analysis, and computational linguistics to identify and extract subjective information from source materials. In simpler terms, it is a tool built into computer software to determine the opinion of the writer of a piece of text. Such texts generally bear a positive, negative, or neutral mood, and that is what sentiment analysis seeks to find.

For example:
1. “The value of X company’s shares skyrocketed” – Positive Sentiment
2. “This movie is the worst that I have seen in years” – Negative Sentiment
3. “The product was not the best but it suited well with some of my requirements.” – Neutral Sentiment

Text Classification:
[Illustration: a diagram of the text classification pipeline appeared here; the figure itself is not recoverable from this copy.]

Research Question

The illustration above revealed the following shortcoming of the BoW model: information on dependency is excluded. Dependency refers to how a word modifies another word and contributes to a meaning shift of the modified word.
Example: ‘never’ modifies ‘failed’ to mean situations of success, or of neither success nor failure.
Thus, this research seeks to ADD the feature of dependencies to the traditional method of text classification, to check whether the accuracy rate of sentiment analysis improves. So as not to exclude any lexical item, all the words in a given text are extracted, along with each dependency.
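The poster does not name the dependency parser it used, so as a minimal sketch, the snippet below uses spaCy (an assumption) to recover dependent-governor pairs of the kind just described:

# A minimal sketch of dependency extraction, assuming spaCy; the poster
# only says "Dependency Parser" and does not name a library.
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline

doc = nlp("The company never failed")
for token in doc:
    if token.head is not token:  # skip the root, which governs itself
        print(f"dependent={token.text!r}  governor={token.head.text!r}  relation={token.dep_}")

# The output includes the pair that motivates this research:
# dependent='never'  governor='failed'  relation='neg'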

Results and Conclusion

Significance testing:
[Figure: “BoW vs Dependency model” – accuracy rates. A chart comparing the Dependency model (% accuracy) against the Bag of Words model (% accuracy) across the 25-75%, 20-80%, and 10-90% test-train splits, with the accuracy axis running from 0.82 to 0.90.]
• The p-value for the difference in the accuracy rates under all three splits is less than 0.00001.
• Since α = 0.05 was chosen, any p-value below 0.05 indicates that the results are significant.
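The poster does not say which significance test was applied. As one plausible reconstruction, a two-proportion z-test compares the two models' accuracy rates on a common test set; the test choice, the test-set size, and the BoW accuracy below are all assumptions:

# Hedged sketch: a two-proportion z-test on accuracy rates. The actual test,
# test-set size, and BoW accuracy are not stated on the poster.
from statsmodels.stats.proportion import proportions_ztest

n_test = 10_732                  # hypothetical test-set size (~25% of 42,929 reviews)
acc_dep, acc_bow = 0.887, 0.849  # dependency accuracy from the table below; BoW value assumed

correct = [round(acc_dep * n_test), round(acc_bow * n_test)]  # counts of correct predictions
z_stat, p_value = proportions_ztest(correct, [n_test, n_test])
print(f"z = {z_stat:.2f}, p = {p_value:.6f}")  # significant at α = 0.05 if p < 0.05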

10-fold cross validation:

Fold    Dependency Model (accuracy)    BoW Model (accuracy)
 1      0.877                          0.843
 2      0.842                          0.849
 3      0.836                          0.852
 4      0.850                          0.853
 5      0.857                          0.849
 6      0.854                          0.844
 7      0.839                          0.858
 8      0.838                          0.844
 9      0.860                          0.848
10      0.828                          0.845
• The observations under the cross validation are puzzling for the dependency model, because the per-fold deviations from the mean accuracy score are high.
• The same cross validation for the BoW model shows very small deviations in accuracy scores, if any.
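Assuming the scikit-learn stack that the poster names for the classifier, per-fold scores like those above can be produced as follows; the corpus here is a toy stand-in for the 42,929 IMDB reviews:

# Sketch of 10-fold cross validation with scikit-learn; the toy corpus and
# labels below are placeholders, not the study's data.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

docs = ["great movie", "never failed", "worst film ever", "tedious and horrible",
        "great acting", "horrible plot"] * 10
labels = [1, 1, 0, 0, 1, 0] * 10  # 1 = positive, 0 = negative

X = CountVectorizer().fit_transform(docs)
scores = cross_val_score(MultinomialNB(), X, labels, cv=10, scoring="accuracy")
print("per-fold accuracy:", np.round(scores, 3))
print(f"mean = {scores.mean():.3f}, std = {scores.std():.3f}")  # a high std is the anomaly noted above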
Conclusion:
• Adding the feature of dependencies significantly improved the accuracy rates.
• Further research should look into:
  • Why the cross validation shows anomalous behavior in the case of the dependency model.
  • Tokenizing the corpus for HTML tags and re-running the experiment.
  • The most informative features, to see how the dependency pairs are classified.
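On the last point, scikit-learn's MultinomialNB has no built-in "most informative features" report; one assumed way to approximate it for a binary sentiment task is to rank features by the gap between their per-class log probabilities:

# Hedged helper: rank features of a fitted MultinomialNB by how strongly they
# separate the two classes. `vectorizer` and `clf` are assumed to be a fitted
# CountVectorizer and MultinomialNB from the pipeline described below.
import numpy as np

def most_informative(vectorizer, clf, n=10):
    names = np.array(vectorizer.get_feature_names_out())
    # feature_log_prob_ has shape (n_classes, n_features); class 1 minus class 0
    gap = clf.feature_log_prob_[1] - clf.feature_log_prob_[0]
    order = np.argsort(gap)
    print("most indicative of class 0:", names[order[:n]])
    print("most indicative of class 1:", names[order[-n:]])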

Methodology

• Data used: IMDB movie data set
• Total number of movie reviews: 42,929
• Steps taken to add the feature and calculate the accuracy score: Steps 1-4, listed after the description of the traditional BoW baseline below.

Traditional Method of Text Representation: Bag of Words (BoW) approach

• What it is: a set of words that is chosen before the text classification. The selection can be made in multiple ways; e.g., it could be the n most frequent words in the entire training corpus.
• How it is used: the words from the text are matched to the existing words (and the sentiments that they denote) in the BoW, and then the classifier gives a prediction of the sentiment.
• Why it is used: since this approach does not involve any linguistic structure, it is simple, and this simplicity makes BoW popular.
• Example: classification of ‘The movie was great’ (extraction of nouns and adjectives):
  BoW = {movie, film, great, horrible, tedious}; Text representation = {movie: 1, film: 0, great: 1, horrible: 0, tedious: 0}.
  This information is passed on to the training algorithm, which is trained to associate individual features (e.g. great: 1) with sentiment labels – in this case, “positive”.

Criticism of BoW:
No linguistic structure is considered during classification, because two main assumptions of BoW are that (a) word order/word position does NOT matter, and (b) the lexical category of words (nouns vs. verbs vs. adjectives, etc.) does NOT matter.
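As a minimal sketch of this representation, scikit-learn's CountVectorizer with a fixed vocabulary (one of several ways to build a BoW) reproduces the example above and also demonstrates assumption (a), order insensitivity:

# BoW representation with a fixed, pre-chosen vocabulary; binary=True records
# presence/absence, matching the {great: 1, film: 0, ...} example above.
from sklearn.feature_extraction.text import CountVectorizer

bow = CountVectorizer(vocabulary=["movie", "film", "great", "horrible", "tedious"],
                      binary=True)
vec = bow.transform(["The movie was great"]).toarray()[0]
print({w: int(v) for w, v in zip(bow.get_feature_names_out(), vec)})
# -> {'movie': 1, 'film': 0, 'great': 1, 'horrible': 0, 'tedious': 0}

# Assumption (a) in action: word order is discarded, so these vectors are identical.
assert (bow.transform(["great movie"]).toarray() ==
        bow.transform(["movie great"]).toarray()).all()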
Illustration of Problem:
Classification of ‘The company never failed’ (extraction of nouns and verbs):
Let BoW = {company, failed, succeeded}; Text representation = {company: 1, failed: 1, succeeded: 0}.
Sentiment Label = “Negative” – WRONG SENTIMENT LABEL!

Steps taken to add the feature and calculate the accuracy score:

Step 1: Movie review with sentiment → Dependency Parser → movie review representation: dependency parsed.
        Example: Dependent: ‘never’; Governor: ‘failed’
Step 2: Dependency-parsed representation → Feature Extraction → formation of dependency pairs.
        Example: dependent + governor: ‘never + failed’
Step 3: Movie review representation (dependency pairs plus individual words) → Scikit-Learn Naïve Bayes Classifier.
Step 4: Classifier → Test-Train Split → Accuracy rates.
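A compact sketch of Steps 1-4 under stated assumptions: spaCy stands in for the unnamed dependency parser, and the reviews and labels are toy placeholders for the IMDB data:

# Steps 1-4 end to end (assumed reconstruction): parse, form dependency pairs,
# vectorize pairs plus individual words, train Naive Bayes, and score accuracy.
import spacy
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

nlp = spacy.load("en_core_web_sm")

def features(text):
    # Steps 1-2: individual words plus dependent+governor pairs.
    doc = nlp(text)
    words = [t.lower_ for t in doc]
    pairs = [f"{t.lower_}+{t.head.lower_}" for t in doc if t.head is not t]
    return words + pairs  # e.g. 'never' and 'never+failed'

reviews = ["The company never failed", "The movie was great",
           "This movie is the worst", "It never disappointed me"]
labels = [1, 1, 0, 1]  # toy sentiment labels

# Step 3: vectorize the extracted features and train the classifier.
X = CountVectorizer(analyzer=features).fit_transform(reviews)

# Step 4: test-train split, then accuracy.
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25, random_state=0)
clf = MultinomialNB().fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))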
Data / Observations

Train-Test split    Dependency model (accuracy)
25-75%              0.887
20-80%              0.886
10-90%              0.884
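Continuing from the sketch above (reusing its X and labels), the table could be generated by re-running the split at the three sizes; the poster does not say whether the first percentage is the train or the test share, so treating it as the test share is an assumption:

# Accuracy under three test-train splits; X and labels come from the Step 1-4
# sketch above, and mapping "25-75%" to test_size=0.25 is an assumption.
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

for test_size in (0.25, 0.20, 0.10):
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=test_size, random_state=0)
    acc = accuracy_score(y_te, MultinomialNB().fit(X_tr, y_tr).predict(X_te))
    print(f"{int(test_size * 100)}-{int((1 - test_size) * 100)}% split: accuracy = {acc:.3f}")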

Acknowledgments and Selected References

Acknowledgments:
• I would like to thank my mentor, Professor Brian Reese, for his immense support and active involvement throughout the research project.
• I would like to thank a fellow student and friend, Aaron Free, for his massive contribution to the computational part of the project.
• Last but not least, I would like to thank the Undergraduate Research Opportunity Program (UROP) for funding this project.

Selected References:
• Pang, Bo, and Lillian Lee. “Opinion mining and sentiment analysis.” Foundations and Trends in Information Retrieval 2.1-2 (2008): 1-135.
• Maas, Andrew L., et al. “Learning word vectors for sentiment analysis.” Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, 2011.