
IEEE Paper Format

The document discusses sentiment analysis using machine learning and deep learning techniques, focusing on analyzing social media data, particularly Twitter, to gauge public sentiment. It outlines various methods, including polarity-based sentiment analysis and deep learning models like LSTM and CNN, to classify tweets as positive or negative. The paper also presents experimental results showing that deep learning models outperform traditional machine learning classifiers in terms of accuracy.

Sentiment Analysis using Machine Learning and Deep Learning
Yogesh Chandra
Antoreep Jana
ISSA, DRDO, Delhi, INDIA
Delhi Technological University, Delhi, INDIA
[email protected]
[email protected]
Abstract – With the increasing rate at which data is created by internet users on various platforms, it becomes necessary for Defense and other Government Organizations to analyze and make use of the data and to know the sentiment of the people. This helps the organizations take control of their actions and decide the steps to be taken shortly. Moreover, when something crucial is happening in the nation, it is of paramount importance to decide every step without hurting or violating the sentiments of the people. In the era of microblogging, which has become quite a popular tool of communication, millions of users share their views and opinions on various day-to-day issues concerning them directly or indirectly through social media platforms like Twitter, Reddit, Tumblr, and Facebook. Data from these sites can be used efficiently for marketing or social studies. In this paper, we take into account various methods to perform sentiment analysis. Sentiment analysis has been performed using Machine Learning classifiers, polarity-based sentiment analysis, and Deep Learning models, which classify a user’s tweets as having ‘positive’ or ‘negative’ sentiment. The idea behind using various model architectures was to account for the variance in the opinions and thoughts existing on such social media platforms. These classification models can further be used to classify live tweets on Twitter on any topic.

Keywords – twitter, sentiment analysis, polarity, machine learning, deep learning, LSTM, CNN

I. INTRODUCTION

Sentiment analysis is the field that deals with analyzing people's opinions, sentiments, attitudes, and behavioral responses to certain events or incidents based on the available written language. It has become one of the most active areas of research due to the surge in fields like machine learning and deep learning, and the blending of such areas with the earlier statistical methods of Natural Language Processing.

The sudden bloom in sentiment analysis comes with the growing interest and active participation in social media, such as reviews of various topics and arts, forum discussions about prevalent issues, blogs and micro-blogs for sharing opinions and information, and Twitter and social network conversations about whatever is trending locally or globally. With the growth in users using digital media to communicate and express themselves, organizations and individuals can extract the information they need from the opinions and sentiments of the users [1].

According to the dsayce website, around 6,000 tweets were posted every second on average, as reported in November 2018 [2]. Twitter users in Japan set a new world record of 143,199 tweets per second (TPS) by tweeting “balus” during a television broadcast of Hayao Miyazaki's anime classic Castle in the Sky on Aug 2.

Also, in the year 2016, around 6,000 tweets were tweeted on Twitter every second on average, which corresponds to over 350,000 tweets sent per minute, 500 million tweets per day and around 200 billion tweets per year.

This paper covers techniques and methods that can be implemented to directly enable opinion-oriented information-seeking systems. For this paper, the Python libraries “tweepy” and “scikit-learn” and the framework “keras” were used to test live sentiments on demand and to analyze the present state of any issue or topic.

In this paper, we propose effective ways to perform sentiment analysis or opinion mining using user tweets. The rest of the paper is organized as follows: Section II presents the related work. Section III briefly describes the research methodology. Section IV describes the dataset used for this paper. Section V describes the experiments conducted and the results obtained from them. The paper is concluded in Section VI with a discussion of how the work can be further extended.

II. RELATED WORK

In this section, we discuss the related work in the field of sentiment analysis and opinion mining and the various ways and techniques used to exploit and make use of microblogging data. With the large availability of blogs and social network websites, many researchers have delved into opinion mining and sentiment analysis. Similar brief insights were presented by [18]. The authors described existing techniques and approaches for opinion-oriented information retrieval.

In a paper on Large-Scale Sentiment Analysis for News and Blogs, the authors of [3] discuss the opinions expressed in newspapers and blogs while reporting on recent events. They assigned a score to each distinct entity in the text corpus indicating positive or negative opinion.

The idea behind it is that news can be either good or bad. For this, a sentiment identification phase was used, which associated the expressed opinion with each relevant entity, and for

978-93-80544-38-0/20/$31.00 © 2020 IEEE

Authorized licensed use limited to: University of Melbourne. Downloaded on May 31,2020 at 17:45:49 UTC from IEEE
relative evaluation with others in the same class, a sentiment aggregation and scoring phase was used.

The process was in three stages – 1) Algorithmic Construction of Sentiment Dictionaries 2) Sentiment Index Formation 3) Evaluation of Significance. They figured out how sentiment can vary by demographic group, news source or geographic location, and also the degree to which the sentiment indices predict future changes in popularity or market behavior.

In another paper by [4], a probabilistic modeling framework based on LDA (Latent Dirichlet Allocation), called the joint sentiment/topic model (JST), is used to detect sentiment and topic simultaneously from text. This model is fully unsupervised, unlike other machine learning models that require supervision. The model has been evaluated on the movie review dataset, and the preliminary results achieved by JST are promising.

A paper by [5] uses the concept of recognizing contextual polarity in phrase-level sentiment analysis. It first determines whether an expression is neutral or polar and then disambiguates the polarity of the polar expressions. In this way, the system identifies the contextual polarity for a large subset of sentiment expressions, giving much better results. Sentence-level or even phrase-level analysis is required for multi-perspective question answering and summarization, opinion-oriented information extraction, and mining product reviews. The main motive is to be able to successfully pinpoint positive and negative sentiments.

Since the aim of this paper was to improve the percentage accuracy of the classifier models, we note similar work by [6], in which a public stream of tweets from the Twitter microblogging site is preprocessed and classified based on emotional content, and precision and recall are then used to analyze the performance.

Besides the typical machine learning techniques, there exists a plethora of other techniques in sentiment analysis. In [7], lexicon models are used for the description of verbs, nouns, and adjectives. The model is able to find the different subjectivity relations expressing separate attitudes among the actors in a sentence.

In another work, by [8], statistical approaches are applied to differentiate between features having sentiment and features not having sentiment. A set of new feature selection schemes was proposed that use the Content and Syntax Model to automatically learn a set of features in a review document by separating the entities being reviewed from the subjective expressions that describe those entities in terms of polarities.

Similar to the work in [7], an enhanced lexical resource was developed in [9]. It is the result of the automatic annotation of all synsets of WORDNET according to the notions of “positivity”, “negativity”, and “neutral”. Each synset is described by the ‘positivity', ‘negativity' and objectivity associated with it.

An interesting paper by [10] presented a new method for sentiment classification based on extracting and analyzing appraisal groups such as “very good”. An appraisal group is a set of attribute values in several task-independent semantic taxonomies, based on Appraisal theory. A lexicon of appraising adjectives and their modifiers was built using semi-automated methods, and movie reviews were classified using features based upon these taxonomies combined with standard “bag of words” features.

Fig. 1. Workflow for Twitter Reviews Dataset

Deep learning methods can also be utilized to perform sentiment analysis of a text. In [11], the authors proposed a deep convolutional neural network that exploits character-level to sentence-level information to perform sentiment analysis of short texts. Two different domains were used for application: the Stanford Sentiment Treebank (SSTb), which contains sentences from movie reviews; and the Stanford Twitter Sentiment corpus (STS), which contains Twitter messages.

The work in [12] once again exploited Twitter for the purposes of sentiment analysis, approaching it in the form of two tasks. The first type takes into consideration a five-point scale, which confers an ordinal character to the classification task. The second type is about the correct estimation of the prevalence of each class of interest, which is called quantification in supervised learning.

Combining various methods to perform a task can yield better results. A paper by [13] shows a new combined method that brings together rule-based classification, supervised learning and machine learning. The results showed that a hybrid classification can improve the classification effectiveness in terms of macro- and micro-averaged F1.
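To make the two averaging schemes concrete, the sketch below computes macro- and micro-averaged F1 for a small labeled sample. It is an illustration only and is not taken from [13]: the function names and the four example labels are assumptions made here. Macro-averaging takes the mean of the per-class F1 scores, while micro-averaging pools the true-positive, false-positive and false-negative counts over all classes before computing a single F1.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true, y_pred):
    """Return (macro_f1, micro_f1) for two equal-length, non-empty label sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    counts = []  # one (tp, fp, fn) tuple per class
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        counts.append((tp, fp, fn))
    # Macro: average the per-class F1 scores, each class weighted equally.
    macro = sum(f1_from_counts(*c) for c in counts) / len(counts)
    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1_from_counts(*(sum(col) for col in zip(*counts)))
    return macro, micro


# Made-up labels for four tweets (illustrative data only).
y_true = ["pos", "pos", "pos", "neg"]
y_pred = ["pos", "pos", "neg", "neg"]
macro_f1, micro_f1 = macro_micro_f1(y_true, y_pred)
print(round(macro_f1, 3), round(micro_f1, 3))  # prints: 0.733 0.75
```

In this single-label setting the micro-averaged F1 equals plain accuracy (3 of 4 tweets correct), whereas the macro average is pulled down by the weaker ‘neg' class.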
A paper based on convolutional neural networks, by [17], utilizes deep convolutional networks to perform sentiment

7th International Conference on Computing For Sustainable Global Development (INDIACom)

analysis in multiple languages without using machine translation and achieves good accuracy scores. This is done by convolving over character-level embeddings to find the polarity of tweets.

III. THE APPROACH

Earlier experiments to perform sentiment analysis on the Twitter dataset were based on polarity-based sentiment analysis and some machine learning classifiers. In this research paper, we have adopted a voting-based classification system, and the polarity-based sentiment analysis is done on live tweets fetched using the Twitter API.

Machine Learning classifier algorithms were then also used to check the performance. Usually, one or several different machine learning algorithms are used and their performances are compared. We have used a confidence measure obtained by combining several machine learning algorithms. The models (Machine Learning models, Deep Learning models, or polarity-based methods) are trained using features. The core idea of machine learning is feature engineering.

Machine Learning needs feature extraction and feature engineering. What would happen if that could be automated? This question is what led us to try deep learning models. We used several combinations of deep learning models and compared their relative performance.

IV. PROBLEM ASSESSMENT AND TOOLS DESCRIPTION

To perform the research, we have chosen dataset(s) available on Kaggle/Twitter API. For experimental purposes, Kaggle kernels were used.

The datasets used for the experiments were ‘First GOP Debate' [14], ‘Bitcoin Tweets' [15] and ‘IMDb movie reviews', along with live tweets fetched using the Twitter API.

In the training of machine learning models, the features were made by tokenizing the sentences/tweets after performing data cleaning of the tweets. A separate script was made to clean the tweets received from the Twitter API and store them in a dataframe. This can be used by simply importing the Python script file and getting the dataframe of tweets for the keyword required by the user, with the number of tweets/dataset size specified by the user/analyst. After obtaining the dataset, tokenization and lemmatization of the tweets were performed to obtain tokens, and stopword removal was done to eliminate the common words. The machine learning classifier models were then trained on these word features of the tweets and the corresponding sentiment associated with each tweet as obtained from a labeled dataset. The predictions were made on the test dataset. The predictions can also be performed on the dataframe obtained from the Python script using tweepy, to analyze the sentiment on the ‘keyword' of concern to the user.

When working with deep learning models for sentiment analysis, the feature selection and mapping (feature engineering) are performed automatically. Padding is done to maintain a uniform input length for all the tweets being passed; the tweets are then passed through the deep learning models and training is performed. Later on, the models are tested on the test dataset, and the performances of the various deep learning models are calculated and evaluated. The trained model can be saved and loaded back to save training time. The model can then be used for predicting the sentiments of new tweets obtained from the dataframe using the tweepy script file. The input length of the tweets must be adjusted appropriately by padding the tweets.

Classifiers taken for comparison were Naive Bayes, MNB Classifier, Bernoulli Classifier, Logistic Regression, Stochastic Gradient Descent, Linear SVC, and NuSVC Classifier. Also, polarity-based methods were used to identify the positive and negative sentiment of a tweet. The classification results showed that Deep Learning models using LSTM-CNN gave the best classification results.

V. EXPERIMENTS AND RESULTS ANALYSIS

Using the Twitter API available on apps.twitter.com and various datasets of Twitter on various topics, results were generated.

For comparative study, these results were plotted in a bar graph and analyzed.

A custom Python script was made which utilizes the tweepy library and fetches tweets from Twitter by specifying keywords, like ‘rafale' or any other term associated with the military. The number of tweets/size of the dataset required by the analyst can be passed as a function argument. This provides the live test dataset on which we/the analyst want predictions to be made.

Fig. 2. Cumulative Analysis of various classifiers for Twitter Data Analysis

Various machine learning classifiers can be used to perform training. In our paper, we have used seven different machine learning classifiers and then used a voting system to perform the classification. The associated confidence of the classification is calculated as the ratio of the number of votes (classifications in favour) of a class to the total number of votes (classifications or classifiers). The models are
saved and can be utilized to perform prediction on a dataset that is being loaded from tweepy using the custom Python script.

The accuracy percentage for machine learning classifiers lingers around 81 percent to 90 percent for training and testing data.

The deep learning models are fed with padded tweets and trained on the training dataset. The trained model is tested on a test set, and the results for training and testing were much better, in the range of 85 percent to 97 percent for training and testing data. The reason is that deep learning models like RNN or LSTM remember and utilize the context of the token or the feature, which was not happening in the case of machine learning models, in which features were simply adjacent words. Because they take into account the context of the words in the whole sentence, the percentage accuracy of the deep learning models was better than that of the machine learning models. The deep learning models take much longer to be trained (even on GPU), and it is recommended to save the model after training is performed and load the model when required to perform the predictions.

Therefore, we were able to conclude that using deep learning models gives better classification results compared to machine learning classifiers or polarity-based methods on Twitter dataset(s), and that they can be reliably used to serve the purpose of sentiment assessment of the crowd, which is not possible to do manually when the number of tweets being dealt with is in the millions.

VI. CONCLUSION AND FUTURE SCOPE

Analyzing the amount of data that is being generated through internet media and microblogging by users of facilities like Twitter, Reddit, Facebook, etc., and understanding the behavior of people using the internet is useful; it can generate both informative insights and revenue for the business.

In this paper, the tweets data is collected and then passed through machine learning classifiers. After being classified by the individual classifiers, a voted classification mechanism has been used to finally obtain the class of the ‘tweet' and the percentage confidence in it. The polarity method for classification has also been used to find the percentage of positive and negative tweets. Lastly, Deep Learning models have been applied to classify the tweets. Models like RNN, LSTM, and CNN-RNN have been utilized to classify the tweets.

Deep Learning models like CNN-RNN, LSTM, etc., and their various combinations have shown better performance compared to the machine learning algorithms. In addition, the Deep Learning models were trained on various datasets of different domains, and the models achieved very high percentage accuracy on the test data in their respective domains. This is intended to help the final model account for the variance found in social media. The final prediction to be made by the model will be a voted system of all the models.

As a future outline, work can be done to find out which architecture can be designed and implemented so that the Deep Learning model achieves an accuracy percentage as good as human accuracy. This will lead to further enhancement of this domain of work.

ACKNOWLEDGEMENT

The authors convey their gratitude to the Director, ISSA for guiding, supporting and allowing them to publish and present the paper at the conference.

REFERENCES

[1] A. Pak and P. Paroubek, "Twitter as a Corpus for Sentiment Analysis and Opinion Mining," LREC, Vol. 10, No. 2010, 2010.
[2] D. Sayce, "David Sayce," [Online]. Available: https://www.dsayce.com/social-media/tweets-day/.
[3] N. Godbole, M. Srinivasaiah and S. Skiena, "Large-Scale Sentiment Analysis for News and Blogs," UVM, 2007.
[4] C. Lin and Y. He, "Joint sentiment/topic model for sentiment analysis," ACM, 2009.
[5] T. Wilson, J. Wiebe and P. Hoffmann, "Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis," ACLWEB, 2005.
[6] B. Gokulakrishnan, P. Priyanthan, T. Ragavan, N. Prasath and A. Perera, "Opinion mining and sentiment analysis on a Twitter data stream," IEEE, 2012.
[7] I. Maks and P. Vossen, "A lexicon model for deep sentiment analysis and opinion mining applications," ScienceDirect, 2012.
[8] A. Duric and F. Song, "Feature selection for sentiment analysis based on content and syntax models," ScienceDirect, 2012.
[9] S. Baccianella, A. Esuli and F. Sebastiani, "SENTIWORDNET 3.0: An Enhanced Lexical Resource," Researchgate.
[10] C. Whitelaw, N. Garg and S. Argamon, "Using appraisal groups for sentiment analysis," ACM, 2005.
[11] C. N. d. Santos and M. Gatti, "Deep Convolutional Neural Networks for," ACLWEB, 2009.
[12] P. Nakov, A. Ritter, S. Rosenthal, F. Sebastiani and V. Stoyanov, "SemEval-2016 Task 4: Sentiment Analysis in Twitter," ACLWEB, 2016.
[13] M. Thelwall and R. Prabowo, "Sentiment analysis: A combined approach," Science Direct, 2009.
[14] C. D. f. E. library, "Kaggle," 2016. [Online]. https://www.kaggle.com/crowdflower/first-gop-debate-twitter-sentiment. [Accessed 2019].
[15] Suran, "Kaggle," 2018. [Online]. https://www.kaggle.com/skularat/bitcoin-tweets. [Accessed 2019].
[16] B. Liu and L. Zhang, "A Survey of Opinion Mining and Sentiment Analysis," Springer Link, 2012.
[17] Wehrmann, Joonatas, Willian Becker, Henry EL Cagnini, and Rodrigo C. Barros. "A character-based convolutional neural network for language-agnostic Twitter Sentiment Analysis", IEEE, 2017.
[18] Pang, Bo, and Lillian Lee. "Opinion mining and sentiment analysis." Foundations and Trends® in Information Retrieval 2, no. 1–2 (2008): 1-135.
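The voted classification described in this paper, where the predicted class is the one that receives the most classifier votes and the confidence is the ratio of votes for that class to the total number of votes, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: the function name `vote` and the seven example predictions are hypothetical, and the individual classifiers are assumed to have already produced one label each for the tweet.

```python
from collections import Counter


def vote(predictions):
    """Return (winning_label, confidence) for one tweet.

    predictions: list with one predicted label per classifier.
    confidence = votes for the winning label / total number of votes.
    """
    if not predictions:
        raise ValueError("at least one classifier prediction is required")
    counts = Counter(predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(predictions)


# Hypothetical outputs of seven classifiers for a single tweet.
predictions = ["pos", "pos", "neg", "pos", "pos", "neg", "pos"]
label, confidence = vote(predictions)
print(label, round(confidence, 2))  # prints: pos 0.71
```

With an odd number of voters and two classes, as in the seven-classifier setup, a tie cannot occur. With an even number of voters or more than two classes, a tie-breaking rule would need to be chosen explicitly.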
