Sentiment Analysis Using Feature Selection and Machine Learning Algorithms

The document summarizes the key steps in a proposed methodology for sentiment analysis using feature selection and machine learning algorithms. The methodology involves preprocessing text data to remove stop words and stem words, selecting important features using chi-square scoring, and classifying the sentiment using a Naive Bayes classifier. The goal is to automatically predict the sentiment class of reviews to maximize the performance of the model.

Uploaded by

Shruti Pant

Sentiment Analysis using Feature Selection and Machine Learning Algorithms

Presented By

Shruti Pant
Under Guidance Of:

Ms. Kalpana Jain : Major Advisor


Dr. Naveen Choudhary : Co-Advisor
Dr. Naveen Jain : Advisor
Dr. Chitranjan Agarwal : DRI Nominee
Contents:
1) Introduction of Sentiment Analysis
2) Literature Survey
3) Research Gap and Motivation
4) Objective of thesis
5) Design Issue
6) Design Flow
7) Experimental Results
8) List of Publication
9) Scope of Future Enhancement
10) References
11) Changes
Introduction to Sentiment Analysis
What is Machine Learning?
Machine Learning
Study of algorithms that improve their performance at some task with experience
Optimize a performance criterion using example data or past experience
Role of statistics: inference from a sample
Role of computer science: efficient algorithms to
solve the optimization problem
represent and evaluate the model for inference
Types

Supervised Learning
Classification
Regression
Unsupervised Learning
Reinforcement Learning
What people think?
What others think has always been an important piece of information

Which car should I buy?

Which schools should I apply to?

Which Professor to work for?

Whom should I vote for?


So whom shall I ask?
Pre Web
Friends and relatives
Acquaintances
Consumer Reports

Post Web
I don't know who.. but apparently it's a good phone. It has good battery life and
Blogs (google blogs, livejournal)
E-commerce sites (amazon, ebay)
Review sites (CNET, PC Magazine)
Discussion forums (forums.craigslist.org,
forums.macrumors.com)
Friends and Relatives (occasionally)
Basics Of Sentiments
Holder (source) of attitude
Target (aspect) of attitude
Type of attitude
From a set of types
Like, love, hate, value, desire, etc.
Or (more commonly) simple weighted polarity: positive or negative
Text containing the attitude
Sentence or entire document
Sentiment
A thought, view, or attitude, especially one based
mainly on emotion instead of reason

Sentiment Analysis
aka opinion mining
use of natural language processing (NLP)
and computational techniques to automate
the extraction or classification of
sentiment from typically unstructured text
Identify the orientation of opinion in a piece of text

The movie was fabulous!
The movie was horrible!

Approaches:
Classifier Based
Lexicon Based
A. Sentence Level Classification
Assumption: a sentence contains only one
opinion
Task 1: identify if sentence is opinionated
classes: objective and subjective
Task 2: determine polarity of sentence
classes: positive and negative

Quiz:
This is a beautiful bracelet..
Is this sentence subjective/objective?
Is it positive or negative ?
B. Document (post/review) Level Classification
Assumption:
each document focuses on a single object
each document contains opinion from a single opinion holder
Task: determine overall sentiment orientation in document
classes: positive and negative
C. Feature Level Classification

Goal: produce a feature-based opinion summary of multiple reviews
Task 1: Identify and extract object features that have been commented on by an opinion holder (e.g. picture, battery life).
Task 2: Determine polarity of opinions on features
classes: positive and negative
Task 3: Group feature synonyms
Need of Sentiment Analysis
Consumer information
Product reviews
Marketing
Consumer attitudes
Trends
Politics
Politicians want to know voters' views
Voters want to know politicians' stances and who else supports them
Social
Find like-minded individuals or communities
Literature Review
[1] presented an unsupervised document-level method that uses pointwise mutual information (PMI) to classify reviews as recommended or not recommended. This algorithm achieved 74% accuracy.

[2] extended the work using PMI and Latent Semantic Analysis (LSA) and achieved an accuracy of 82.2%.

[3] evaluated several supervised machine learning algorithms (SVM, NB and MaxEnt) with different feature selection techniques on a movie reviews dataset. The authors applied various pre-processing techniques such as stemming or lemmatization, and used NB and MaxEnt with POS tagging to increase performance beyond that of SVM.
[4] proposed an approach using the IMDB movie database in which the sentences of a document are labelled as objective or subjective by finding a minimum s-t cut in a graph, achieving an accuracy of 85%.

[5] used machine learning techniques for multilingual (English, Dutch and French) studies. Feature selection, along with negation unigrams and stemming, is performed to obtain relevant features, and then multinomial naive Bayes, SVM and maximum entropy are compared on overall performance.

[6] performed sentiment analysis using various feature selection schemes such as tf-idf and term occurrence, and classified the dataset using SVM and Naive Bayes to compare their performance.
Research Gap and Motivation
Sentiment analysis is a hot topic of research.
Use of electronic media is increasing day by day.
Time is money, or even more valuable than money; therefore, instead of spending time reading and figuring out the positivity or negativity of text, we can use automated techniques for sentiment analysis.
Sentiment analysis is used in opinion mining.
Example: analyzing a product based on its reviews and comments.
Key is to find the best classifier according to the dataset used and the space and time available.
Earlier works have used lexicon analysis to find the sentiment of a word.
Earlier works have used different feature selection techniques and even classifiers to build an automatic model.
Objective
A heuristic based on the Naive Bayes classifier is designed to automatically predict the class of the incoming review in order to maximize the performance of the model.
Design Issue
Supervised / unsupervised / Hybrid
Classification Algorithm
Feature Selection Techniques
Lexicon or Machine Based
Design Flow
Dataset (Phase 1)
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
Bo Pang and Lillian Lee. 2004. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. ACL, 271-278.

Polarity detection:
Is an IMDB movie review positive or negative?
Data: Polarity Data 2.0:
http://www.cs.cornell.edu/people/pabo/movie-review-data
IMDB data in the Pang and Lee database
when _star wars_ came out some twenty years ago , the image of traveling throughout the stars has become a commonplace image . [...] when han solo goes light speed , the stars change to bright lines , going towards the viewer in lines that converge at an invisible point cool . _october sky_ offers a much simpler image - that of a single white dot , traveling horizontally across the night sky . [. . . ]

snake eyes is the most aggravating kind of movie : the kind that shows so much potential then becomes unbelievably disappointing . it's not just because this is a brian depalma film , and since he's a great director and one who's films are always greeted with at least some fanfare . and it's not even because this was a film starring nicolas cage and since he gives a brauvara performance , this film is hardly worth his talents
Pre-Processing (Phase 2)
Pre-processing is done in our proposed methodology to remove the words which impede sentiment analysis by increasing the number of false positives or false negatives.
In our model, stop words are removed using tf-idf. Term Frequency-Inverse Document Frequency is used to separate the important words from the not-so-important words in a document. NLTK also comes with a built-in list of 128 stop words, which our model uses as well to filter out irrelevant words. We do this by importing stopwords from the NLTK corpus.
Stemming algorithms attempt to automatically remove suffixes (and in some cases prefixes) in order to find the root word, or stem, of a given word. NLTK provides several stemmer interfaces. In our proposed method we use the Porter stemmer to find the root words.
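The pre-processing steps can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the presented implementation: the small `STOP_WORDS` set stands in for NLTK's stop-word list, `crude_stem` is a hypothetical suffix stripper standing in for the Porter stemmer, and the `min_idf` threshold is an assumed value.

```python
import math
import re

# Assumption: a tiny illustrative stop list; the slides use NLTK's built-in list.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "this", "was"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(word):
    """Hypothetical stand-in for the Porter stemmer: strip one common suffix."""
    for suffix in ("ing", "ed", "ly"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def inverse_document_frequency(docs):
    """Return idf(t) = log(N / df(t)) for every term in the tokenized docs."""
    n_docs = len(docs)
    doc_freq = {}
    for tokens in docs:
        for term in set(tokens):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

def preprocess(texts, min_idf=0.1):
    """Drop stop words and near-ubiquitous terms (low IDF), then stem the rest."""
    docs = [tokenize(t) for t in texts]
    idf = inverse_document_frequency(docs)
    cleaned = []
    for tokens in docs:
        kept = [t for t in tokens if t not in STOP_WORDS and idf[t] >= min_idf]
        cleaned.append([crude_stem(t) for t in kept])
    return cleaned

# Made-up example reviews.
reviews = [
    "The movie was fabulous and the acting amazed me",
    "The movie was horrible and the plot bored me",
    "This movie is a surprisingly good thriller",
]
processed = preprocess(reviews)
```

In this toy corpus the word "movie" appears in every review, so its IDF is zero and it is dropped even though it is not in the stop list, which is the role tf-idf plays in the description above.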
Feature Selection (Phase 3)
Feature selection is used to increase the effectiveness of the model. Features which are important are selected and fed to the classifier.
In our proposed methodology we used chi-square as a scoring function, with which we can find whether two terms are associated with each other (collocation: correlation of two words, or words that are more likely to occur together).
It helps us understand whether a word is informative or not. If a word occurs mainly in positive reviews and rarely in negative reviews, it can mean that the word is important. So we find how common a word is in a particular class compared to the other classes.
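One common way to score a word against a class with chi-square uses a 2x2 table of document counts. The sketch below assumes that formulation (the slides do not give the exact formula used), and the function names and toy corpus are illustrative only.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that lack the term
    n00: documents outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    denominator = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denominator == 0:
        return 0.0  # a row or column of the table is empty: no evidence
    return n * (n11 * n00 - n10 * n01) ** 2 / denominator

def score_terms(docs, labels, target="pos"):
    """Score every term by its chi-square association with the target class."""
    vocabulary = {term for tokens in docs for term in tokens}
    n_target = sum(1 for label in labels if label == target)
    scores = {}
    for term in vocabulary:
        n11 = sum(1 for tokens, label in zip(docs, labels)
                  if term in tokens and label == target)
        n10 = sum(1 for tokens, label in zip(docs, labels)
                  if term in tokens and label != target)
        n01 = n_target - n11
        n00 = len(docs) - n_target - n10
        scores[term] = chi_square(n11, n10, n01, n00)
    return scores

def select_top_k(scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:k]

# Toy, made-up corpus: already tokenized reviews with polarity labels.
docs = [["great", "plot"], ["great", "cast"], ["dull", "plot"], ["dull", "cast"]]
labels = ["pos", "pos", "neg", "neg"]
scores = score_terms(docs, labels)
top_terms = select_top_k(scores, 2)
```

Here "great" and "dull" each occur in only one class and receive the highest score, while "plot" and "cast" are spread evenly and score zero. The statistic is symmetric, so words tied strongly to either polarity are kept.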
Classification (Phase 4)
In machine learning, naive Bayes classifiers are a family of simple, baseline probabilistic classifiers based on Bayes' theorem with strong (naive) independence assumptions.
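A minimal sketch of such a classifier is shown below. It assumes the multinomial variant with add-one (Laplace) smoothing, which the slides do not specify; the class name and the training data are made up for illustration.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.term_counts[label].update(tokens)
        self.vocabulary = {t for tokens in docs for t in tokens}
        return self

    def log_posterior(self, tokens, label):
        """Unnormalized log P(label | tokens) under the independence assumption."""
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_docs)
        total_terms = sum(self.term_counts[label].values())
        for term in tokens:
            if term not in self.vocabulary:
                continue  # ignore words never seen in training
            count = self.term_counts[label][term]
            score += math.log((count + 1) / (total_terms + len(self.vocabulary)))
        return score

    def predict(self, tokens):
        """Return the class with the highest log posterior."""
        return max(self.class_counts,
                   key=lambda label: self.log_posterior(tokens, label))

# Made-up, already pre-processed training reviews.
train_docs = [["fabulous", "movie"], ["great", "acting"],
              ["horrible", "movie"], ["dull", "plot"]]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(train_docs, train_labels)
prediction = model.predict(["fabulous", "acting"])
```

The independence assumption shows up in `log_posterior`: each word contributes its own log-probability term, and the terms are simply added.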
Experimental Results
Accuracy
[Bar chart: accuracy (%), axis 10 to 100. Bar values as extracted: 93, 84.75, 84, 81.6, 75.25, 76.5]
Figure 4.2: Accuracy comparison of chi square and information gain applied on our proposed methodology with G. Tripathi et al.
Precision
[Bar chart: precision (%), axis 10 to 100. Bar values as extracted: 93, 90.4, 90.15, 86.17, 83.6, 82.63]
Figure 4.3: Precision comparison of chi square and information gain applied on our proposed methodology with G. Tripathi et al.
Recall
[Bar chart: recall (%), axis 10 to 100. Bar values as extracted: 94, 88, 81.6, 81, 59.5, 56.5]
Figure 4.3: Recall comparison of chi square and information gain applied on our proposed methodology with G. Tripathi et al.
F-MEASURE
[Bar chart: F-measure (%), axis 10 to 100. Bar values as extracted: 93.49, 83.96, 83.5, 83.23, 69.53, 71.68]
Figure 4.4: F-measure comparison of chi square and information gain applied on our proposed methodology with G. Tripathi et al.
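For reference, the four reported measures follow from the confusion-matrix counts using the standard definitions, sketched below. The counts in the first example are made up. The second example assumes that the highest precision (93) and highest recall (94) bars belong to the same configuration, which the extracted charts do not confirm.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all reviews that were classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0

def precision(tp, fp):
    """Fraction of reviews predicted positive that really are positive."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Fraction of truly positive reviews that were predicted positive."""
    return tp / (tp + fn) if tp + fn else 0.0

def f_measure(p, r):
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    return 2 * p * r / (p + r) if p + r else 0.0

# Made-up confusion-matrix counts for a balanced test set of 100 reviews.
tp, tn, fp, fn = 40, 40, 10, 10
p = precision(tp, fp)
r = recall(tp, fn)
acc = accuracy(tp, tn, fp, fn)
f = f_measure(p, r)

# Assumed pairing of the reported precision (93) and recall (94) values.
f_reported = f_measure(93, 94)
```

Under that assumed pairing, `f_reported` comes out at roughly 93.5, in line with the highest F-measure value extracted from the chart (93.49).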
List of Publications
Pant, S., & Jain, K. (2017). Sentiment Analysis using Feature Selection and Classification Algorithms: A Survey. IJIERT, 4(3), 109-113.

Pant, S., & Jain, K. (2017). Sentiment Analysis using Feature Selection and Classification Algorithms. IJIERT, 4(5), 5-11.
Scope of Future Enhancement
We would like to extend this technique to other domains of opinion mining, such as newspaper articles, product reviews and political discussion forums. We would like to apply in-depth concepts of NLP for improved prediction of the polarity of a document.
We are planning to build an automatic sentiment classifier for more than one language, starting with Hindi. As multilingual messages are now commonly posted on social websites, the aim is to be able to predict sentiment for any language.
It is worth extending the research using hybrid techniques for sentiment analysis.
References
1. Turney, P. D. 2002, July. Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th annual meeting on association for computational linguistics, pp. 417-424.
2. Turney, P. D., & Littman, M. L. 2003. Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems (TOIS), 21 : 315-346.
3. Pang, B., Lee, L., & Vaithyanathan, S. 2002, July. Thumbs up?: sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10, pp. 79-86.
4. Pang, B., & Lee, L. 2004, July. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd annual meeting on Association for Computational Linguistics, pp. 271-278.
5. Boiy, E., & Moens, M. F. 2009. A machine learning approach to sentiment analysis in multilingual Web texts. Information retrieval, 12 : 526-558.
6. Tripathi, G., & Naganna, S. 2015. Feature selection and classification approach for sentiment analysis. Machine Learning and Applications: An International Journal, 2 : 1-16.
Changes
1

4
5

8
9

10

11

12

13
Thank You
