Survey of Deep Learning Approaches For Twitter Text Classification

International Journal of Advanced Engineering Research and Science (IJAERS)
Peer-Reviewed Journal
ISSN: 2349-6495(P) | 2456-1908(O)
Vol-9, Issue-12; Dec, 2022
Journal Home Page Available: https://ijaers.com/
Article DOI: https://dx.doi.org/10.22161/ijaers.912.12
Received: 15 Nov 2022; Received in revised form: 03 Dec 2022; Accepted: 10 Dec 2022; Available online: 17 Dec 2022
©2022 The Author(s). Published by AI Publication. This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/).

Abstract— Sentiment analysis (also known as opinion mining or emotion AI) is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice-of-the-customer materials such as reviews and survey responses, online and social media, and healthcare materials, for applications that range from marketing to customer service to clinical medicine. With the rise of deep language models such as RoBERTa, more difficult data domains can also be analyzed, e.g., news texts, where authors typically express their opinion/sentiment less explicitly. Sentiment analysis aims to extract opinions automatically from data and classify them as positive or negative. Twitter, one of the most widely used social media tools, is seen as an important source of information for acquiring people's attitudes, emotions, views, and feedback. Within this context, Twitter sentiment analysis techniques were developed to decide whether textual tweets express a positive or a negative opinion. In contrast to the lower classification performance of traditional algorithms, deep learning models, including the Convolution Neural Network (CNN) and Bidirectional Long Short-Term Memory (Bi-LSTM), have achieved significant results in sentiment analysis. Keras is a Deep Learning (DL) framework that provides an embedding layer to produce the vector representation of the words present in a document. The objective of this work is to analyze the performance of deep learning models, namely the Convolutional Neural Network (CNN), Simple Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (Bi-LSTM), BERT, and RoBERTa, for classifying Twitter reviews. From the experiments conducted, it is found that the RoBERTa model performs better than CNN and simple RNN for sentiment classification.

Keywords— Convolution Neural Network (CNN), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), Deep Learning, Bidirectional Long Short-Term Memory (BiLSTM), Bidirectional Encoder Representations from Transformers (BERT), Robustly Optimized BERT Pre-training Approach (RoBERTa).
I. INTRODUCTION

Users of social media platforms such as Twitter share their opinions and reviews widely. Generally, sentiment analysis is categorized into three levels, namely document-level, sentence-level, and feature-level. Document-level sentiment analysis classifies the whole review document as either positive or negative. Semantic orientation approaches and machine learning approaches are the two methods used for sentiment classification. Semantic orientation approaches determine a word's polarity using a corpus or dictionary. They do not perform well in terms of classification accuracy because there is no single knowledge base which provides the polarity for every domain. Machine Learning (ML) approaches initially build a model from labelled data and then use the built model to classify the test data. They require a large amount of labelled training data to build an efficient model [1].

Machine Learning involves algorithms which extract knowledge from data to create predictions, rather than involving humans to manually develop rules and build models for resolving enormous amounts of knowledge. There are three types of machine learning algorithms: supervised, unsupervised, and reinforcement learning. In supervised learning, data with class labels, also called training data, is used by the machine learning algorithm to construct a model. The trained model is then used to identify the class label of new, unseen test data. In unsupervised learning, the model automatically finds patterns and relationships in the dataset by creating clusters in it [2]. Reinforcement learning aims to develop a system or an agent that learns from the rewards and punishments received from the environment.

In document-level sentiment classification, lexical, syntactic, and semantic features in a document are first extracted. Then, weights are assigned to these features using binary, Term Frequency (TF), and Term Frequency-Inverse Document Frequency (TF-IDF) weighting schemes, and the weighted features are given as input to the machine learning algorithms. The performance of ML-based sentiment classification depends on the feature extraction techniques, feature selection methods, and feature weighting schemes used. It is not always possible to get labelled data for all domains to train the model. Also, machine learning approaches require manual effort to extract the features. To address the above issues, this work introduces deep learning models for sentiment classification.

Twitter tweets contain hidden, valuable information that can be used to determine an author's attitude towards a contextual polarity in the text [2]. Even though statistical machine learning algorithms perform well for simpler sentiment analysis applications, these algorithms cannot be generalized to more complex text classification problems.

Deep learning is a technique which is nowadays used in a wide range of applications. The advantages of deep learning models include automatic feature extraction, easy computation due to the use of accelerated hardware, and strong performance even with huge amounts of data. Deep learning models achieve significant results in sentiment analysis, speech recognition, and computer vision. The deep learning algorithms widely used in sentiment analysis are the Convolution Neural Network (CNN) and the Recurrent Neural Network (RNN); in this work, a simple RNN and an RNN with LSTM are analyzed for sentiment classification. Stochastic Gradient Descent and RMSprop are used as optimizers, and their performance is evaluated. Word2Vec and GloVe models were used as word embedding techniques to present the tweets in the form of numeric values or vectors. These models are pre-trained unsupervised word vectors that are trained on a large collection of words and can capture word semantics. The study applied these different word vector models to verify the effectiveness of the model.

Sentiment analysis and emotion analysis are performed. TextBlob is used for annotating the sentiment data, while emotions are annotated using the Text2Emotion model. Positive, negative, and neutral sentiment labels are used, while emotions are classified into happy, sad, surprise, angry, and fear. The suitability and performance of three feature engineering approaches are studied, including term frequency-inverse document frequency (TF-IDF), bag of words (BoW), and Word2Vec, as sketched below. Experiments are performed using several well-known machine learning models such as support vector machine (SVM), logistic regression (LR), Gaussian Naive Bayes (GNB), extra tree classifier (ETC), decision tree (DT), and k nearest neighbour (KNN).
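To make this baseline pipeline concrete, the following is a minimal scikit-learn sketch of the BoW and TF-IDF feature weighting described above, feeding two of the classical classifiers mentioned (LR and SVM). The tiny inline dataset and variable names are illustrative assumptions, not the paper's actual data.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Illustrative toy tweets and labels (1 = positive, 0 = negative).
tweets = ["great flight, friendly crew", "delayed again, terrible service",
          "loved the quick boarding", "worst airline experience ever"]
labels = [1, 0, 1, 0]

# BoW weighting: raw term counts per document.
bow = CountVectorizer()
X_bow = bow.fit_transform(tweets)

# TF-IDF weighting: term frequency scaled by inverse document frequency.
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(tweets)

# The weighted features are given as input to classical ML classifiers.
print(LogisticRegression().fit(X_bow, labels).score(X_bow, labels))
print(LinearSVC().fit(X_tfidf, labels).score(X_tfidf, labels))
```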
II. LITERATURE REVIEW

K. S. Kalaivani and S. Uma suggested approaches for deep learning. Keras is a Deep Learning (DL) framework that provides an embedding layer to produce the vector representation of the words present in a document. The authors analyzed the performance of three deep learning models, namely the Convolutional Neural Network (CNN), Simple Recurrent Neural Network (RNN), and Long Short-Term Memory (LSTM), for classifying book reviews. From the experiments conducted, it was found that the LSTM model performs better than CNN and simple RNN for sentiment classification.

Sakirin Tan and Rachid Ben Said implemented ConvBiLSTM: a word embedding model converts tweets into numerical values, a CNN layer receives the feature embedding as input and produces a smaller dimension of features, and the Bi-LSTM model takes the input from the CNN layer and produces the classification result [4]. Word2Vec and GloVe were applied separately to observe the impact of the word embedding on the proposed model. ConvBiLSTM was applied to the retrieved Tweets and SST-2 datasets. The ConvBiLSTM model with Word2Vec on the retrieved Tweets dataset outperformed the other models with 91.13% accuracy.

Sungheetha and Sharma [5] introduced a new capsule model known as TransCap to address the issue of labelling aspect-level data. Aspect and dynamic routing algorithms are used to transfer the knowledge from the document-level task to the aspect-level task. The authors proved that the proposed model performs better than the state-of-the-art models for aspect-level sentiment analysis.

Kalaivani and Kuppuswami improved the performance of syntactic features for document-level sentiment classification by backing off the head word or modifier word to the corresponding POS cluster [6]. The authors proved that using a WFO-based feature selection technique to select prominent generalized syntactic features outperforms other existing features for classifying product reviews.
Soujanya Poria and Devamanyu Hazarika discussed this perception by pointing out the shortcomings and under-explored, yet key, aspects of this field that are necessary to attain true sentiment understanding. They analysed the significant leaps responsible for its current relevance and attempted to chart a possible course for the field that covers many overlooked and unanswered questions [7].

Ambreen Nazir, Yuan Rao, and Ling Sun explored the issues and challenges related to the extraction of different aspects and their relevant sentiments, relational mapping between aspects, interactions, dependencies, and contextual-semantic relationships between different data objects for improved sentiment accuracy, as well as the prediction of sentiment evolution dynamicity [8].
Kian Long Tan, Chin Poo Lee, and Kian Ming Lim proposed a hybrid model in which the Robustly Optimized BERT approach maps the words into a compact, meaningful word embedding space while the Long Short-Term Memory model captures the long-distance contextual semantics effectively. The hybrid model outshines the state-of-the-art methods by achieving F1-scores of 93%, 91%, and 90% on the IMDb dataset, the Twitter US Airline Sentiment dataset, and the Sentiment140 dataset, respectively [8].

A densely connected convolutional neural network with multi-scale feature attention was developed by Wang et al. for text classification [9]. Dense connections are used to easily generate large N-gram features from various smaller N-gram features. A feature attention mechanism is used to select effective features with varying N-grams, such as unigrams, bigrams, and trigrams, from the multi-scale features.

To overcome the problem of capturing sentiments present in the text over long time steps, Huang et al. developed a novel model called Sentence Representation-Long Short-Term Memory (SR-LSTM) [10]. Variants of LSTM such as peephole-connection LSTM, coupled input-output-forget LSTM, the gated recurrent unit (GRU), and bidirectional LSTM were implemented. Finally, the authors concluded that the newly introduced models SR-LSTM and SSR-LSTM build more accurate models compared to the other models on IMDB, Yelp 2014, and Yelp 2015.

Peng et al. introduced a novel deep graph-CNN model to capture non-consecutive relations and long-range semantic relations for large-scale text classification [11]. In a few applications like sentiment analysis, capturing long-range semantics is more important than sequential information. Initially, the text was converted into a graph-of-words, and a graph convolution operation was performed to capture the text semantics. From the results, it is clear that the proposed model performs better than the existing classification models.

III. PROPOSED WORK

A. Deep Learning

Recently, deep learning algorithms have achieved remarkable results in the natural language processing area. They represent data in multiple and successive layers. They can capture the syntactic features of sentences automatically, without extra feature extraction techniques, which consume more resources and time. This is the reason why deep learning models have attracted attention from NLP researchers exploring sentiment classification. By making use of a multi-layer perceptron structure in deep learning, a CNN can learn high-dimensional, non-linear, and complex classifications. As a result, CNN is used in many applications such as computer vision, image processing, and speech recognition.

B. Convolutional Neural Network

Figure 1 shows the architecture of CNN, which consists of a convolutional layer, a pooling layer, a flatten layer, and a dense layer. Generally, CNN is used for image, audio, and video applications like image classification, semantic segmentation, object detection, etc. In recent times, it has been applied to text classification and has shown good performance. So, in this work it is used for sentiment classification, as the convolutional filters present in this model are able to automatically learn the prominent features for this task.
1) Embedding Layer: Neither machine learning algorithms nor deep learning algorithms can directly process raw text. It should be converted into a numerical form for further analysis. The two most used embeddings are frequency-based embeddings and prediction-based embeddings. Frequency-based embeddings use a count vector, TF-IDF vector, or co-occurrence vector to represent the documents. Since these methods are limited in representing word semantics, prediction-based embeddings are used in this work.
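As an illustration of this step, the following is a minimal Keras sketch of an embedding layer mapping word ids to dense vectors; the vocabulary size, sequence length, and embedding dimension are arbitrary assumptions for the example.

```python
import numpy as np
from tensorflow.keras.layers import Embedding

# Toy batch: two tweets already tokenized and padded to length 5,
# where each integer is a word id from a 10,000-word vocabulary.
word_ids = np.array([[4, 27, 0, 0, 0],
                     [9, 113, 56, 8, 2]])

# The embedding layer maps each word id to a dense 100-dimensional vector.
embedding = Embedding(input_dim=10000, output_dim=100)
vectors = embedding(word_ids)
print(vectors.shape)  # (2, 5, 100): batch x sequence length x embedding dim
```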
2) Convolutional Layer: The purpose of this layer is to select the high-level features for sentiment classification. As the name implies, the convolution operation is performed in this layer. A filter is moved over the input matrix to construct the feature map. The feature map size is governed by three criteria: depth, stride, and padding. Depth depends on the number of filters used for the convolution operation.
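A small sketch of how depth, stride, and padding govern the feature-map size, using Keras Conv1D over an embedded text sequence (the concrete numbers are illustrative assumptions):

```python
from tensorflow.keras.layers import Conv1D, Input

# Input: a sequence of 30 word vectors of dimension 100.
x = Input(shape=(30, 100))

# Depth = 64 filters, filter width 3, stride 1, no padding ("valid").
valid = Conv1D(filters=64, kernel_size=3, strides=1, padding="valid")(x)
print(valid.shape)  # (None, 28, 64): 30 - 3 + 1 = 28 positions, 64 maps

# The same filter with zero padding keeps the sequence length at 30.
same = Conv1D(filters=64, kernel_size=3, strides=1, padding="same")(x)
print(same.shape)   # (None, 30, 64)
```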
3) Pooling Layer: This layer is introduced to reduce the dimensions of the features produced as output by the convolutional layer, which in turn reduces the computation needed in the later layers. There are three types of pooling, namely max pooling, average pooling, and sum pooling. In this work, max pooling is used. Max pooling picks the maximum value from the portion of the data covered by the kernel or filter. From the literature, it is found that max pooling outperforms average and sum pooling in various applications.

4) Flatten and Dense Layers: The flatten layer converts the pooled feature maps into a single feature vector, which the dense layer maps to the output classes. Softmax is used for multiclass classification and sigmoid is used for binary classification. Since the reviews are either positive or negative, the sigmoid activation function is used.
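Putting the four layers together, here is a minimal Keras sketch of a CNN sentiment classifier of the kind described above; the layer sizes are illustrative assumptions rather than the paper's exact configuration:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Embedding, Conv1D, MaxPooling1D,
                                     Flatten, Dense)

model = Sequential([
    Embedding(input_dim=10000, output_dim=100),   # word id -> dense vector
    Conv1D(64, 3, activation="relu"),             # high-level n-gram features
    MaxPooling1D(pool_size=2),                    # keep strongest responses
    Flatten(),                                    # feature maps -> one vector
    Dense(1, activation="sigmoid"),               # positive vs. negative
])
# RMSprop is one of the optimizers evaluated in this work.
model.compile(optimizer="rmsprop", loss="binary_crossentropy",
              metrics=["accuracy"])
```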
C. Recurrent Neural Network

The Recurrent Neural Network (RNN) is a subdivision of neural networks that is applicable for learning representations of sequential data, such as natural language. It yields an output that depends not only on the current input but also on the earlier state output, or hidden state. Here, the earlier state output is a function of the earlier state input, so the current state output is a function of the previous input and output:

h_t = tanh(U x_t + W h_(t-1) + b)

where b is the bias value, W represents the weights for the previous output, U is the weight for the current input, and t denotes the position in the sequence.

Figure 2 shows the architecture of the simple RNN. The raw data is pre-processed, and a vocabulary is constructed which contains the unique words present in the document. This is passed to an embedding layer, which provides the embedding value for each and every word present in the vocabulary. The embedding values are passed to a simple recurrent neural network, which predicts the output for the current text depending on the previous output and input. The output of the SRNN layer is passed to a dropout layer, which avoids overfitting by dropping some of the features that are not prominent. Finally, a dense layer along with the activation function provides the polarity of the review.
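The following is a minimal Keras sketch of the simple RNN pipeline that Figure 2 describes (embedding, SimpleRNN, dropout, then a sigmoid dense layer); the sizes and the SGD optimizer choice are illustrative assumptions consistent with the optimizers named earlier:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dropout, Dense

model = Sequential([
    Embedding(input_dim=10000, output_dim=100),  # vocabulary -> vectors
    SimpleRNN(64),          # h_t depends on current input and previous state
    Dropout(0.5),           # drop non-prominent features to avoid overfitting
    Dense(1, activation="sigmoid"),  # polarity of the review
])
model.compile(optimizer="sgd", loss="binary_crossentropy",
              metrics=["accuracy"])
```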
D. Long Short-Term Memory

In LSTM, information from the initial time steps can retain its impact up to the later time steps: relevant information gets added and irrelevant information gets removed via the gates in the cell state during the training process.

E. Bidirectional LSTM

Bi-LSTM is one of the RNN algorithms that improves on LSTM, which has shortcomings in modelling text sequence features. It solves the task of sequential modelling better than LSTM [32], [33]. In LSTM, information flows only from backward to forward, whereas the information in Bi-LSTM flows in both directions, backward to forward and forward to backward, by using two hidden states. The structure of Bi-LSTM makes it a pioneer in sentiment classification because it can learn the context more effectively. Figure 4 shows the architecture of Bi-LSTM [34]. By utilising the two directions, input data from both the preceding and the succeeding sequence are retained in Bi-LSTM, unlike the standard RNN model, which needs delays to include future data.
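A minimal Keras sketch of the Bi-LSTM idea: wrapping an LSTM in a Bidirectional layer runs one pass forward and one backward and concatenates the two hidden states (the sizes are illustrative assumptions):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense

model = Sequential([
    Embedding(input_dim=10000, output_dim=100),
    # Two hidden states: one reads the tweet left-to-right, one right-to-left;
    # their outputs are concatenated, giving 2 * 64 = 128 features.
    Bidirectional(LSTM(64)),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy",
              metrics=["accuracy"])
```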
F. BERT (Bidirectional Encoder Representations from Transformers)

BERT is a natural language processing model which achieves state-of-the-art accuracy on many NLP and NLU tasks. BERT is basically the encoder stack of the Transformer architecture. A Transformer is an encoder-decoder network that uses self-attention on the encoder side and attention on the decoder side. BERT makes use of the Transformer's attention mechanism to learn contextual relations between words (or sub-words) in a text. In its vanilla form, the Transformer includes two separate mechanisms: an encoder that reads the text input and a decoder that produces a prediction for the task. Since BERT's goal is to generate a language model, only the encoder mechanism is necessary.

Fig 5. Architecture of BERT

G. RoBERTa (Robustly Optimized BERT Pre-training Approach)

The RoBERTa model is an extension of Bidirectional Encoder Representations from Transformers (BERT). BERT and RoBERTa fall under the Transformer [2] family, which was developed for sequence-to-sequence modeling to address the long-range dependencies problem.
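As an illustration, here is a sketch of scoring tweet sentiment with a pre-trained RoBERTa checkpoint via the Hugging Face transformers pipeline API; the checkpoint name "cardiffnlp/twitter-roberta-base-sentiment" is an illustrative assumption, not necessarily the checkpoint used in this paper's experiments.

```python
from transformers import pipeline

# Any RoBERTa sentiment checkpoint works here; this Twitter-tuned one
# is an illustrative choice, not necessarily the paper's model.
classifier = pipeline("sentiment-analysis",
                      model="cardiffnlp/twitter-roberta-base-sentiment")

print(classifier("the flight was delayed but the crew was wonderful"))
# e.g. [{'label': 'LABEL_2', 'score': 0.89}]  (LABEL_2 = positive)
```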