Sentiment Analysis Using Feature Selection and Machine Learning Algorithms

The document summarizes the key steps in a proposed methodology for sentiment analysis using feature selection and machine learning algorithms. The methodology involves preprocessing text data to remove stop words and stem words, selecting important features using chi-square scoring, and classifying the sentiment using a Naive Bayes classifier. The goal is to automatically predict the sentiment class of reviews to maximize the performance of the model.

Uploaded by

Shruti Pant

Sentiment Analysis using Feature Selection

and Machine Learning Algorithms

Presented By

Shruti Pant
Under Guidance Of:

Ms. Kalpana Jain : Major Advisor

Dr. Naveen Choudhary : Co-Advisor
Dr. Naveen Jain : Advisor
Dr. Chitranjan Agarwal : DRI Nominee
Contents:
1) Introduction of Sentiment Analysis
2) Literature Survey
3) Research Gap and Motivation
4) Objective of thesis
5) Design Issue
6) Design Flow
7) Experimental Results
8) List of Publications
9) Scope of Future Enhancement
10) References
11) Changes
Introduction to
Sentiment Analysis
What is Machine Learning?
Machine Learning
Study of algorithms that improve their performance at
some task with experience
Optimize a performance criterion using example
data or past experience.
Role of Statistics: Inference from a sample
Role of Computer science: Efficient algorithms to
Solve the optimization problem
Representing and evaluating the model for
inference
Types

Supervised Learning
Classification
Regression
Unsupervised Learning
Reinforcement Learning
What people think?
What others think has always been an important piece of information

Which car should I buy?

Which schools should I apply to?

Which Professor to work for?

Whom should I vote for?

So whom shall I ask?
Pre Web
Friends and relatives
Acquaintances
Consumer Reports

Post Web
I don't know who.. but apparently it's a good phone. It has good battery life and
Blogs (google blogs, livejournal)
E-commerce sites (amazon, ebay)
Review sites (CNET, PC Magazine)
Discussion forums (forums.craigslist.org,
forums.macrumors.com)
Friends and Relatives (occasionally)
Basics Of Sentiments
Holder (source) of attitude
Target (aspect) of attitude
Type of attitude
From a set of types
Like, love, hate, value, desire, etc.
Or (more commonly) simple weighted
Polarity: positive or negative
Text containing the attitude
Sentence or entire document
Sentiment
A thought, view, or attitude, especially one based
mainly on emotion instead of reason

Sentiment Analysis
aka opinion mining
use of natural language processing (NLP)
and computational techniques to automate
the extraction or classification of
sentiment from typically unstructured text
Identify the orientation of opinion in a piece of text

The movie was fabulous!
The movie was horrible!

Approaches: Classifier Based, Lexicon Based
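
To make the lexicon-based approach concrete, here is a minimal sketch in Python. The word list `LEXICON` and the function `lexicon_polarity` are hypothetical names introduced only for illustration; a real lexicon would be far larger, and the slides do not specify one.

```python
import re

# Hypothetical toy lexicon (assumption): +1 for positive words, -1 for negative.
LEXICON = {
    "fabulous": 1, "good": 1, "great": 1,
    "horrible": -1, "bad": -1, "poor": -1,
}

def lexicon_polarity(text):
    """Sum the lexicon scores of the words in text and map the sign to a label."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_polarity("The movie was fabulous!"))
print(lexicon_polarity("The movie was horrible!"))
```

A classifier-based approach would instead learn such word weights from labelled reviews.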
A. Sentence Level Classification
Assumption: a sentence contains only one
opinion
Task 1: identify if sentence is opinionated
classes: objective and subjective
Task 2: determine polarity of sentence
classes: positive and negative

Quiz:
This is a beautiful bracelet..
Is this sentence subjective/objective?
Is it positive or negative ?
B. Document (post/review) Level Classification

Assumption:
each document focuses on a single object
contains opinion from a single opinion holder

Task: determine overall sentiment orientation in document
classes: positive and negative
C. Feature Level Classification

Goal: produce a feature-based opinion summary of multiple reviews

Task 1: Identify and extract object features that have been
commented on by an opinion holder (e.g. picture, battery life).
Task 2: Determine polarity of opinions on features
classes: positive and negative
Task 3: Group feature synonyms
Need of Sentiment Analysis
Consumer information
Product reviews
Marketing
Consumer attitudes
Trends
Politics
Politicians want to know voters' views
Voters want to know politicians' stances and who
else supports them
Social
Find like-minded individuals or communities
Literature Review
[1] presented an unsupervised document-level method that uses
pointwise mutual information (PMI) to classify reviews as
recommended or not recommended. This algorithm achieved
74% accuracy.

[2] extended the work using PMI and Latent Semantic Analysis
(LSA) and achieved an accuracy of 82.2%.

[3] analysed a movie review dataset with several supervised machine
learning algorithms (SVM, NB and MaxEnt) and different feature
selection techniques. They applied various pre-processing
techniques such as stemming and lemmatization, and used NB and
MaxEnt with POS tagging to increase performance beyond that of SVM.

[4] proposed an approach on the IMDB movie database that labels
document text as objective or subjective by finding a minimum
s-t cut in a graph, achieving an accuracy of 85%.

[5] used machine learning techniques for multilingual (English,
Dutch and French) studies. Feature selection, along with negation
unigrams and stemming, is performed to obtain relevant features, and
then multinomial naive Bayes, SVM and maximum entropy are
compared on overall performance.

[6] performed sentiment analysis using various feature selection
schemes such as tf-idf and term occurrence, and classified the dataset
using SVM and naive Bayes to compare their
performance.
Research Gap and
motivation
Sentiment analysis is a hot topic of research.
Use of electronic media is increasing day by
day.
Time is as valuable as money, or more so.
Instead of spending time reading text and
figuring out whether it is positive or
negative, we can use automated
techniques for sentiment analysis.
Sentiment analysis is used in opinion mining.
Example: analyzing a product based on its
reviews and comments.
The key is to find the best
classifier for the
dataset used and the space and
time available.

Earlier works have used
lexicon analysis to find the
sentiment of a word.

Earlier works have used
different feature selection
techniques and classifiers
to build an automatic
model.
Objective
A heuristic based on the naive Bayes classifier
is designed to automatically predict the class
of the incoming review in order to maximize
the performance of the model.
Design Issue
Supervised / Unsupervised / Hybrid
Classification Algorithm
Feature Selection Techniques
Lexicon or Machine Based
Design Flow
Dataset (Phase 1)
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002.
Thumbs up? Sentiment Classification using Machine Learning
Techniques. EMNLP-2002, 79-86.
Bo Pang and Lillian Lee. 2004. A Sentimental Education:
Sentiment Analysis Using Subjectivity Summarization Based
on Minimum Cuts. ACL, 271-278.

Polarity detection:
Is an IMDB movie review positive or
negative?
Data: Polarity Data 2.0:
http://www.cs.cornell.edu/people/pabo/movie-review-data
IMDB data in the Pang and Lee
database
when _star wars_ came out some twenty years ago , the image of
traveling throughout the stars has become a commonplace image .
[...]
when han solo goes light speed , the stars change to bright lines ,
going towards the viewer in lines that converge at an invisible point
cool .
_october sky_ offers a much simpler image - that of a single
white dot , traveling horizontally across the night sky . [. . . ]

" snake eyes " is the most aggravating kind of movie :
the kind that shows so much potential then becomes
unbelievably disappointing .
it's not just because this is a brian depalma film , and since
he's a great director and one who's films are always greeted
with at least some fanfare .
and it's not even because this was a film starring nicolas
cage and since he gives a brauvara performance , this
film is hardly worth his talents
Pre-Processing (Phase 2)
Pre-processing is done in our proposed methodology to remove
the words that impede sentiment analysis by
increasing the number of false positives or false negatives.
In our model, stop words are removed using tf-idf. Term
Frequency-Inverse Document Frequency (tf-idf) separates the
important words in a document from the less important ones. NLTK
also comes with a built-in list of 128 stop words, which is
included in our model to identify irrelevant words.
We do this by importing stopwords from the NLTK
corpus.
Stemming algorithms attempt to automatically remove
suffixes (and in some cases prefixes) in order to find the root
word or stem of a given word. NLTK provides several
stemmer interfaces. In our proposed method we use the
Porter stemmer to find the root words.
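
The tf-idf step can be sketched in plain Python as below. This is an illustration under stated assumptions, not the exact procedure used in the thesis: the function `tf_idf`, the toy documents, and the zero threshold are all hypothetical, and the NLTK stop word list and Porter stemmer are left out so the sketch has no external dependencies.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: tf-idf score} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Toy corpus (assumption): three already-tokenized, non-empty reviews.
docs = [
    "the movie was fabulous".split(),
    "the movie was horrible".split(),
    "the plot was thin".split(),
]
scores = tf_idf(docs)

# Terms whose best score never exceeds the threshold are treated as stop words.
threshold = 0.0  # hypothetical cut-off
vocab = {term for doc in docs for term in doc}
low_value = {t for t in vocab if max(s.get(t, 0.0) for s in scores) <= threshold}
print(sorted(low_value))
```

Words that occur in every document get an inverse document frequency of log(1) = 0, so they score zero everywhere and fall below any positive threshold.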
Feature Selection (Phase 3)
Feature selection is used to increase the
effectiveness of the model. Important features
are selected and fed to the classifier.
In our proposed methodology we use chi-square
as a scoring function to test whether
two terms are associated with each other
(collocation: correlation of two words, or words
that are more likely to occur together).
It also helps us understand whether a word is
informative. If a word occurs mainly in
positive reviews and rarely in negative reviews,
the word is likely to be important. So we measure
how common a word is in a particular class
compared to the other classes.
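
As an illustration of chi-square scoring of a word against a class, the sketch below uses the standard statistic for a 2x2 contingency table of document counts. The function names, the toy reviews, and the choice of document-level counts are assumptions; the slides do not give the exact formulation used.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that lack the term
    n00: documents outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom

def score_terms(docs, labels, target):
    """Score every vocabulary term against the target class."""
    vocab = {term for doc in docs for term in doc}
    scores = {}
    for term in vocab:
        n11 = sum(1 for d, y in zip(docs, labels) if term in d and y == target)
        n10 = sum(1 for d, y in zip(docs, labels) if term in d and y != target)
        n01 = sum(1 for d, y in zip(docs, labels) if term not in d and y == target)
        n00 = sum(1 for d, y in zip(docs, labels) if term not in d and y != target)
        scores[term] = chi_square(n11, n10, n01, n00)
    return scores

# Toy data (assumption): each review is a set of tokens.
docs = [{"great", "movie"}, {"great", "plot"}, {"dull", "movie"}, {"dull", "plot"}]
labels = ["pos", "pos", "neg", "neg"]
scores = score_terms(docs, labels, target="pos")
print(scores["great"], scores["movie"])
```

Here "great" appears only in positive reviews and receives a high score, while "movie" is spread evenly across both classes and scores zero, so it would be dropped as uninformative.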
Classification (Phase 4)
In machine learning, naive Bayes
classifiers are a family of simple,
baseline probabilistic classifiers based
on Bayes' theorem with strong
(naive) independence assumptions.
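
A minimal multinomial naive Bayes sketch in Python follows. It assumes add-one (Laplace) smoothing and a bag-of-words representation, neither of which is stated in the slides, and the class name `NaiveBayes` and the toy reviews are hypothetical.

```python
import math
from collections import Counter

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative sketch)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {term for doc in docs for term in doc}
        # Log prior: share of training documents in each class.
        self.log_prior = {
            c: math.log(labels.count(c) / len(labels)) for c in self.classes
        }
        # Per-class term counts.
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(doc)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        v = len(self.vocab)
        best_class, best_score = None, float("-inf")
        for c in self.classes:
            score = self.log_prior[c]
            for term in doc:
                if term in self.vocab:  # ignore words never seen in training
                    score += math.log((self.counts[c][term] + 1) / (self.totals[c] + v))
            if score > best_score:
                best_class, best_score = c, score
        return best_class

# Toy training data (assumption).
train_docs = [
    "a fabulous film".split(),
    "great acting and a great plot".split(),
    "a horrible film".split(),
    "dull plot and poor acting".split(),
]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("fabulous plot".split()))
print(model.predict("horrible acting".split()))
```

The independence assumption shows up in `predict`: the log-probabilities of the individual words are simply added, as if each word occurred independently given the class.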
Experimental Results
[Bar chart: Accuracy (%). Values shown: 93, 84.75, 84, 81.6, 75.25, 76.5]

Figure 4.2: Accuracy comparison of chi square and
information gain applied on our proposed methodology
with G. Tripathi et al.
[Bar chart: Precision (%). Values shown: 93, 90.4, 90.15, 86.17, 83.6, 82.63]

Figure 4.3: Precision comparison of chi square and
information gain applied on our proposed methodology
with G. Tripathi et al.
[Bar chart: Recall (%). Values shown: 94, 88, 81.6, 81, 59.5, 56.5]

Figure 4.3: Recall comparison of chi square and
information gain applied on our proposed methodology
with G. Tripathi et al.
[Bar chart: F-measure (%). Values shown: 93.49, 83.96, 83.5, 83.23, 69.53, 71.68]

Figure 4.4: F-measure comparison of chi square and
information gain applied on our proposed methodology
with G. Tripathi et al.
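
For reference, the four reported metrics can be computed from a binary confusion matrix as sketched below. The function name and the counts are hypothetical and are not taken from the experiments; the F-measure is assumed to be the balanced (F1) harmonic mean of precision and recall.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F-measure from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }

# Hypothetical counts for the positive class on 200 reviews (assumption).
metrics = classification_metrics(tp=90, fp=10, fn=30, tn=70)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because the F-measure is a harmonic mean, it is pulled toward the lower of precision and recall, which is why a low recall drags the F-measure down even when precision is high.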
List of Publications
Pant, S., & Jain, K. (2017). Sentiment Analysis
using Feature Selection and Classification
Algorithms: A Survey. IJIERT, 4(3), 109-113.

Pant, S., & Jain, K. (2017). Sentiment Analysis
using Feature Selection and Classification
Algorithms. IJIERT, 4(5), 5-11.
Scope of Future
Enhancement
We would like to extend this technique to other
domains of opinion mining, such as newspaper
articles, product reviews and political discussion
forums. We would also like to apply in-depth
concepts of NLP for improved prediction of the
polarity of a document.
We plan to build an automatic sentiment
classifier for more than one language, starting
with Hindi. As multilingual messages are
now posted on social
websites, the aim is to predict sentiment
in any language.
It is worth extending the research using hybrid
techniques for sentiment analysis.
References
1. Turney, P. D. 2002. Thumbs up or thumbs down?
Semantic orientation applied to unsupervised
classification of reviews. In Proceedings of the 40th
Annual Meeting of the Association for Computational
Linguistics, pp. 417-424.
2. Turney, P. D., & Littman, M. L. 2003. Measuring praise
and criticism: Inference of semantic orientation from
association. ACM Transactions on Information Systems
(TOIS), 21: 315-346.
3. Pang, B., Lee, L., & Vaithyanathan, S. 2002.
Thumbs up? Sentiment classification using machine
learning techniques. In Proceedings of the ACL-02
Conference on Empirical Methods in Natural Language
Processing, Volume 10, pp. 79-86.
4. Pang, B., & Lee, L. 2004. A sentimental
education: Sentiment analysis using subjectivity
summarization based on minimum cuts. In Proceedings
of the 42nd Annual Meeting of the Association for
Computational Linguistics, pp. 271-278.
5. Boiy, E., & Moens, M. F. 2009. A machine learning
approach to sentiment analysis in multilingual Web
texts. Information Retrieval, 12: 526-558.
6. Tripathi, G., & Naganna, S. 2015. Feature selection and
classification approach for sentiment analysis. Machine
Learning and Applications: An International
Journal, 2: 1-16.
Changes
1

4
5

8
9

13
Thank You
