Twitter Sentiment Analysis using Deep Learning
Abstract: The whole world is changing rapidly with current innovations, and using the Internet has become a fundamental requirement in people's lives. Nowadays, a massive amount of data is generated by social networks from daily user activities. Gathering and analyzing people's opinions is crucial for business applications when the opinions are extracted and analyzed accurately. This data helps corporations to improve product quality and provide better customer service. But manually analyzing opinions is an impossible task because the content is unorganized. For this reason, we applied sentiment analysis, which is the process of extracting and analyzing unorganized data automatically. The primary steps to perform sentiment analysis include data collection, pre-processing, word embedding, sentiment detection, and classification using deep learning techniques. This work focused on the binary classification of sentiments for product reviews of three fast-food restaurants. Twitter was chosen as the source of data for the analysis, and all tweets were collected automatically using Tweepy. The experimental dataset was divided into half positive and half negative tweets. In this paper, three deep learning techniques were implemented: Convolutional Neural Network (CNN), Bi-Directional Long Short-Term Memory (Bi-LSTM), and CNN-Bi-LSTM. The performance of each was measured and compared in terms of accuracy, precision, recall, and F1 score. Finally, Bi-LSTM scored the highest performance in all metrics compared to the two other techniques.

Keywords: Sentiment Analysis, CNN, Bi-LSTM, NLP (Natural Language Processing)
I. INTRODUCTION

Internet usage has become a fundamental requirement in people's lives, as they can buy and sell goods or services online. Nowadays, social networking services provide a simple form of communication that permits users to exchange information and opinions directly with each other or on a public platform, and they have become the most significant resources for collecting information about people's feelings and sentiments on various topics. Thus, if someone needs to buy an item, there is no need to ask friends and family for opinions on the product, since multiple user reviews are now available in public web forums. Gathering and analyzing people's opinions is crucial, especially when the opinions are extracted and analyzed appropriately.

Manual extraction and analysis of opinion is an impossible task because the content is disorganized and written in natural language. Sentiment analysis can be used to analyze opinions automatically, and it is usually modeled as a text classification problem. Text classification is a crucial task for natural language processing (NLP) that is performed in many applications which rely on understanding natural language to determine the purpose and meaning behind a text and apply it to resolve multiple issues [1]. The ambiguity of texts makes NLP extremely difficult, since the meaning of a phrase or statement is often not explicitly determined and, as in most other languages, ambiguity is widespread in English [2].

Sentiment analysis is the classification of sentiments within text data using text analysis techniques. It can be used to automatically turn unstructured public opinion about brands, items, services, and politics into structured data. This data can be considerably helpful for business applications: it allows companies to revise their marketing strategy by understanding customer feelings about products, and sentiment analysis of people's comments on social media sites can easily indicate whether consumers are satisfied with the products or not. It can help companies and corporations to get feedback from target consumers, identify their strengths and weaknesses, and know exactly how to raise the quality of their items or services [3].

The remainder of this paper is organized in the following manner: Section II presents a summary of deep learning and how it works, Section III shows the methodology of our work, Section IV describes the classification techniques used in this work, and Section V presents the results and evaluation.

II. DEEP LEARNING

Deep learning is a specific part of machine learning in artificial intelligence (AI) and consists of algorithms that allow software to train itself to perform tasks by exposing multilayer neural networks to massive amounts of data [4]. Recently, deep learning algorithms have given effective performance in natural language processing applications, including sentiment analysis over multiple datasets [5]. The greatest value of deep learning is that we do not need to extract features manually; instead, the models take word embeddings, which contain context information, as input, and the middle layers of the neural network learn the features by themselves during the training phase. Words are expressed as high-dimensional vectors and feature extraction is performed by the neural network [6]. The main reason deep learning has grown so rapidly is that it provides superior performance on various problems and also makes problem-solving much easier because feature learning is fully automatic [7].
A deep neural network maps inputs to targets through a deep chain of simple data transformations (layers), and such layers learn by observing many samples of inputs and targets. Each transformation is performed by a layer that is parameterized by its own weights, also termed parameters. Learning means discovering a set of values for the weights of all layers in the network such that the network correctly maps the input samples to their associated targets. A deep neural network can contain many millions of parameters, and finding the right value for each parameter looks like a dispiriting duty, because changing the value of one parameter influences the behavior of all the others. For this purpose a loss function, also called an objective function, is used: it computes a distance score by comparing the prediction of the network with the real target, estimating how far the predicted output is from the real target. This score is then utilized as a feedback signal to slightly adjust the values of the weights in a direction that will reduce the loss score; this adjustment is the job of the optimizer, which implements what is called the backpropagation algorithm. At first the weights of the network are assigned random values, so the network only performs a sequence of arbitrary transformations, and the results are generally far from what they should be, so the loss score is very high. But with every example the network processes, the weights are modified slightly in the right direction, and the loss score decreases. This is the training loop, which, iterated enough times, yields weight values that minimize the loss function. The network with the lowest loss is the one whose outputs are closest to the targets.
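As an illustration of this training loop, the following minimal Python/NumPy sketch trains a toy single-layer logistic model with gradient descent; it is illustrative only and is not the network used in this paper:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # 100 samples, 5 features
y = (X.sum(axis=1) > 0).astype(float)    # toy binary targets

w = rng.normal(size=5)                   # weights start with random values
b = 0.0
lr = 0.1                                 # step size used by the optimizer

for epoch in range(100):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))             # forward pass (sigmoid)
    loss = -np.mean(y * np.log(p + 1e-9)
                    + (1 - y) * np.log(1 - p + 1e-9))  # distance score (loss)
    grad_w = X.T @ (p - y) / len(y)                    # backpropagated gradients
    grad_b = np.mean(p - y)
    w -= lr * grad_w                                   # nudge weights to lower the loss
    b -= lr * grad_b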
III. METHODOLOGY

In the proposed methodology, each training review is related to a class label; this data is transferred to classifiers to train and learn, then test reviews are provided to the model, and classification is performed through these trained classifiers. Finally, reviews are classified as positive or negative. In this paper, three deep learning techniques are implemented: Bi-LSTM, CNN, and CNN-Bi-LSTM.

A. Data Collection

Data collection is the process of accumulating data from many diverse sources. As deep learning grows in popularity, training data is much needed for good performance [8]. Data collection mainly comprises data acquisition, data labeling, and enhancement of present data. The purpose of data acquisition is to obtain datasets that deep learning models can be trained on, and it includes three methods: data discovery, data augmentation, and data generation. Data discovery is essential when one needs to share or search for new datasets, and many datasets can be accessed on the Web [9]. Data augmentation is an extension of data discovery in which present datasets are improved by appending more external data. Data generation is utilized when no external dataset is available. In this work, all tweets are written in English and were collected through the Twitter Application Programming Interface (API); Tweepy, a Python library, was used to connect to the Twitter API and extract real-time data from Twitter. It can be installed with the pip command: pip install Tweepy. The data is stored in CSV file format with two columns: review and sentiment.
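A hedged sketch of this collection step is shown below. The credentials, query, and file name are placeholders, not values from this work, and the search call is named api.search_tweets in Tweepy 4.x (older versions use api.search):

import csv
import tweepy

# Placeholder credentials; real keys come from a Twitter developer account.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

with open("tweets.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["review", "sentiment"])          # the two columns used in this work
    # Illustrative query; this paper collects fast-food product reviews.
    for tweet in tweepy.Cursor(api.search_tweets, q="fast food",
                               lang="en", tweet_mode="extended").items(1000):
        writer.writerow([tweet.full_text, ""])        # sentiment label assigned later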
B. Pre-Processing

Pre-processing is a very significant step to convert text in a human language into a machine-readable form for further processing, and it affects the efficiency of the other steps. The pre-processing step aims to make the data more machine-readable and to reduce ambiguity in feature extraction. In this work, several steps were used to normalize the text: converting upper case to lower case and removing duplicate text, stop words, numbers, multiple spaces, special characters, single characters, punctuation marks, URLs, HTML, mentions, and hashtags. We also performed lemmatization, the process of substituting words with a stem or base word to reduce inflectional forms to a common root form, and expansion of slang and abbreviations.
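A sketch of these normalization steps appears below, assuming NLTK for the stop-word list and WordNet lemmatizer (both require a one-time nltk.download); the regular expressions approximate the rules listed above:

import re
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

STOP = set(stopwords.words("english"))
LEMMA = WordNetLemmatizer()

def clean_tweet(text: str) -> str:
    text = text.lower()                                 # upper case to lower case
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"<[^>]+>", " ", text)                # HTML tags
    text = re.sub(r"@\w+|#\w+", " ", text)              # mentions and hashtags
    text = re.sub(r"[^a-z\s]", " ", text)               # numbers, punctuation, special characters
    tokens = [t for t in text.split() if t not in STOP and len(t) > 1]  # stop words, single characters
    return " ".join(LEMMA.lemmatize(t) for t in tokens) # lemmatize; join collapses multiple spaces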
C. Word Embedding

After pre-processing, words must be transformed into feature vectors, or word embeddings [10]. The word vectors can be learned by feeding a large body of raw text into a network and training it for a sufficient amount of time. After training, the word embedding is used to extract similarities between words or other relationships. This method has received a great deal of attention in text applications, including sentiment analysis, due to its capability to capture the syntactic and semantic similarities between words. For example, the vectors for the words food and rice will have higher similarity than the vectors for rice and car. Recently, different strategies have been developed to create meaningful models that can learn word embeddings from huge texts. The most popular methods are word2vec [6, 10] and Global Vectors (GloVe) [11]. At present, both are among the most reliable and useful word embedding methods that can turn words into meaningful vectors.
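As a hedged sketch, word vectors can be learned from the cleaned tweets with gensim's word2vec implementation; cleaned_tweets is an assumed list of pre-processed strings, and the parameter is named size rather than vector_size in gensim 3.x:

from gensim.models import Word2Vec

sentences = [tweet.split() for tweet in cleaned_tweets]  # assumed pre-processed corpus
model = Word2Vec(sentences, vector_size=100, window=5, min_count=2, workers=4)

# Trained vectors capture similarity between words, e.g. food/rice vs. rice/car:
print(model.wv.similarity("food", "rice"))
print(model.wv.similarity("rice", "car"))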
D. Sentiment Classification

In this work, after performing word embedding and creating feature vectors, classification is done using CNN, CNN-Bi-LSTM, and Bi-LSTM, and the performance of their results is compared.

IV. CLASSIFICATION TECHNIQUES

In this part, we explore the various techniques used for sentiment analysis. They are as follows:

A. Long short-term memory (LSTM)

LSTM is a specific kind of RNN whose functions are more sophisticated; it learns to manage the flow of information [12]. The standard RNN has an issue of vanishing or exploding gradients. To overcome these issues, the LSTM was introduced by Hochreiter and Schmidhuber [13]. An LSTM involves a memory cell, an input gate, an output gate, and a forget gate. Data can be saved, read, or written from the cell, like information in a computer's memory [14]. The cell makes decisions about what to read, write, or erase through gates that open and close. These gates act on the signals they receive and pass or block data according to its strength or weakness. This division of responsibilities enables the network model to retain information over long periods [15].

[Figure II: LSTM with its gates]

B. Bidirectional long short-term memory (Bi-LSTM)

Bidirectional LSTMs are an extension of standard LSTMs that can enhance model performance on sequence classification problems. Bi-LSTMs train two LSTMs rather than one on the input sequence, where all time steps of the input sequence are available. The first is trained on the input sequence without modification, and the second on a reversed copy of the input sequence, and both are connected to the same output; thus, at each time step, the network has backward and forward information about the sequence. This extra context enhances the accuracy of the network. Bi-LSTMs are mainly useful when input context is needed. For example, in sentiment analysis, performance can be improved by knowing the words before and after the current word.

C. Convolutional neural network (CNN)

CNN is a special kind of neural network originally intended for computer vision; it exploits layers with convolving filters that are applied to local features. It is broadly applied in different fields such as NLP, speech processing, and computer vision. The network contains neurons with weights and biases that are modified based on training data by some learning algorithm, and it has local receptive fields, which are small regions of neurons in the input layer connected to the neurons of the hidden layer. The CNN structure consists of convolution layers, a pooling layer, and one or more fully connected layers [16]. The 1d-CNN, first proposed by Kim, works with patterns in one dimension and tends to be useful in natural language processing: it receives sentences of different lengths as input and produces fixed-length vectors as output [17]. A maximum sentence length is processed by the network; longer sentences are cut and shorter sentences are padded with zero vectors. Dropout regularization is then utilized to manage over-fitting.
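A minimal Keras sketch of a Bi-LSTM binary sentiment classifier in the spirit of the techniques above is shown below; the vocabulary size, sequence length, and layer widths are illustrative assumptions, not the exact settings of this work:

from tensorflow.keras import layers, models

VOCAB_SIZE, EMBED_DIM, MAX_LEN = 20000, 100, 50   # assumed values

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, EMBED_DIM, input_length=MAX_LEN),
    layers.Bidirectional(layers.LSTM(64)),        # forward and backward passes over the sequence
    layers.Dropout(0.5),                          # regularization against over-fitting
    layers.Dense(1, activation="sigmoid"),        # binary prediction
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])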
V. RESULT AND EVALUATION

In this part, we discuss the results obtained with CNN, CNN-Bi-LSTM, and Bi-LSTM, all trained on a 200k-tweet Twitter dataset that we created, and the results are compared based on metrics such as accuracy, precision, recall, and F1 score.

In building any deep learning model, one of the primary tasks is to evaluate its performance. The performance of each technique used in this work is measured by computing different metrics, and the ultimate purpose behind working with different metrics is to understand how well a deep learning model is going to perform on unseen data. In this work, the following metrics are used. Accuracy is the proportion of correctly analyzed samples to the total number of samples:

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (1)

Precision is the ratio of correctly analyzed positive samples to the number of samples predicted positive by the classifier:

Precision = TP / (TP + FP)    (2)

Recall is the ratio of correctly analyzed positive samples to the number of actual positive samples:

Recall = TP / (TP + FN)    (3)

The F1 score is the harmonic mean of precision and recall:

F1 = 2 * (Precision * Recall) / (Precision + Recall)    (4)

In the above equations, TP is the number of true positives (positive samples predicted correctly), FP is the number of false positives (samples incorrectly predicted positive), TN is the number of true negatives (negative samples predicted correctly), and FN is the number of false negatives (samples incorrectly predicted negative).

[Figure III: Best model statistics of correct and incorrect predictions]
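The four metrics follow directly from the confusion counts, as in the sketch below (scikit-learn's classification_report yields the same values):

def metrics(tp: int, tn: int, fp: int, fn: int):
    accuracy = (tp + tn) / (tp + tn + fp + fn)            # eq. (1)
    precision = tp / (tp + fp)                            # eq. (2)
    recall = tp / (tp + fn)                               # eq. (3)
    f1 = 2 * precision * recall / (precision + recall)    # eq. (4)
    return accuracy, precision, recall, f1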
In this work, the experimental dataset was split into training, validation, and testing sets. The training set was given 80% of the dataset, while the validation and testing sets were each given roughly 10%: the models were trained on 162,000 records, validated on 18,000 records, and tested on 20,000 records.
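A hedged sketch of this split is shown below; the stated counts (162,000/18,000/20,000) suggest the test set was held out first and the validation set was then taken as 10% of the remainder (the file and loading code are assumptions):

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("tweets.csv")  # assumed 200k labeled tweets
train_val, test = train_test_split(df, test_size=0.10, random_state=42)    # 20,000 test
train, val = train_test_split(train_val, test_size=0.10, random_state=42)  # 162,000 / 18,000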
Table II illustrates the results and performance comparisons of the models.

Table II: Results and performance comparisons of the models

Techniques     Accuracy  Precision  Recall  F1 score
CNN            89.0      89.0       89.0    89.0
CNN-Bi-LSTM    89.36     89.5       89.5    89.0
Bi-LSTM        90.3      90.5       90.5    90.0

In this work, the result of each technique was achieved with the following configurations. Each of the models was configured with a dropout layer to restrict the neural network from memorizing the training set, which is useful to prevent overfitting. The models were compiled with the Adam optimizer with a batch size of 128 for 15 epochs, and the output layer in all models is a fully-connected dense layer with sigmoid activation that makes a binary prediction. In the CNN model, the network has three 1d-CNN layers, all implemented with 64 filters and kernel sizes of 1, 2, and 3 respectively; after each layer, a max-pooling layer with a pooling filter size of 2 is applied that selects only the value with the highest weight and ignores the rest, which significantly enhances the results of the convolutional layer and reduces the input to the next layer. The model also has one flatten layer that transforms the two-dimensional matrix of features into a vector that can be fed into the fully-connected output layer.
VI. CONCLUSION

Sentiment analysis is an application that many companies use to boost their advancement. Most business organizations consider that the success of their business depends largely on customer satisfaction. Sentiment analysis plays a significant role in the research field of text mining, helping to extract and analyze the vast collection of unorganized data gathered on the web. It applies machine learning and deep learning algorithms, integrated with text mining, to get valuable information from unorganized data, and this information has many advantages for business applications: improving product quality, revising marketing strategy, providing better customer service, and identifying customer reviews that help the company recognize its strengths and weaknesses, all of which add up to boosted sales and revenue. This work focused on analyzing the sentiment of product reviews as positive or negative using sentiment analysis techniques. We used CNN, CNN-Bi-LSTM, and Bi-LSTM in the analysis process, and the word2vec technique was used for word embedding. Tweets were collected automatically from Twitter using Tweepy, and all the techniques were experimented on the 200k dataset, divided into half positive and half negative tweets. In conclusion, the results showed that CNN takes less time to process than Bi-LSTM, but its performance is not up to the mark. Bi-LSTM achieved better performance compared to CNN and CNN-Bi-LSTM, especially when the volume of data is huge, because it builds relationships between hidden vectors at each time step, which is also why it performed the task more slowly. Finally, Bi-LSTM achieved the highest accuracy of 90.3%.

REFERENCES

1. S. Xu, H. Liang and T. Baldwin, "Unimelb at SemEval-2016 tasks 4a and 4b: An ensemble of neural networks and a word2vec based model for sentiment classification," in Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), 2016.
AUTHORS PROFILE

Ghazi A obtained his B.Sc. in Computer Science from the University of Sulaimani, Iraq, in 2012, and is currently pursuing his Master's degree in Software Engineering. He has 3 years of teaching experience as an Assistant Lecturer; his teaching includes Data Structures, Object-Oriented Programming, and Graphics with C#. He has published 1 research paper in the 2019 7th International Symposium on Digital Forensics and Security (ISDFS). His interests are in programming languages and deep learning.