
International Journal of Advanced Engineering Research and Science (IJAERS)
Peer-Reviewed Journal
ISSN: 2349-6495(P) | 2456-1908(O)
Vol-9, Issue-12; Dec, 2022
Journal Home Page Available: https://ijaers.com/
Article DOI: https://dx.doi.org/10.22161/ijaers.912.12

Survey of Deep Learning Approaches for Twitter Text Classification
Mr. Lukesh Kadu, Dr. Manoj Deshpande, Dr. Vijaykumar Pawar

1 Research Scholar, A.C. Patil College of Engineering, Kharghar, Navi Mumbai, India
2 Computer Department, A.C. Patil College of Engineering, Kharghar, Navi Mumbai, India
3 Principal, A.C. Patil College of Engineering, Kharghar, Navi Mumbai, India

Received: 15 Nov 2022,
Received in revised form: 03 Dec 2022,
Accepted: 10 Dec 2022,
Available online: 17 Dec 2022
©2022 The Author(s). Published by AI Publication. This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/).
Keywords— Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), Deep Learning, Bidirectional Long Short-Term Memory (BiLSTM), Bidirectional Encoder Representations from Transformers (BERT), Robustly Optimized BERT Pre-training Approach (RoBERTa).

Abstract— Sentiment analysis (also known as opinion mining or emotion AI) uses natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. It is widely applied to voice-of-the-customer materials such as reviews and survey responses, online and social media, and healthcare materials. Applications range from marketing to customer service to clinical medicine. With the rise of deep language models such as RoBERTa, more difficult data domains can also be analyzed, for example news texts, where authors typically express their opinion or sentiment less explicitly. Sentiment analysis aims to extract opinions from data automatically and classify them as positive or negative. Twitter, a widely used social media tool, is seen as an important source of information about people's attitudes, emotions, views, and feedback. In this context, Twitter sentiment analysis techniques were developed to decide whether tweets express a positive or a negative opinion. Traditional algorithms show lower classification performance, whereas deep learning models, including the Convolutional Neural Network (CNN) and Bidirectional Long Short-Term Memory (Bi-LSTM), have achieved significant results in sentiment analysis. Keras is a Deep Learning (DL) framework that provides an embedding layer to produce the vector representation of the words in a document. The objective of this work is to analyze the performance of several deep learning models for classifying tweets: the Convolutional Neural Network (CNN), Simple Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), bidirectional Long Short-Term Memory (Bi-LSTM), BERT and RoBERTa. The experiments show that the RoBERTa model performs better than CNN and simple RNN for sentiment classification.

I. INTRODUCTION
Social media is a platform for people to express their feelings, feedback, and opinions. To understand the sentiment context of a text, sentiment analysis determines whether the sentiment of the text is positive, negative, neutral or some other personal feeling. Sentiment analysis is important for business and politics, where it strongly influences strategic decision making. It is therefore recognized as a significant technique for generating useful information from unstructured data sources such as tweets or reviews. Social media platforms, including Twitter, Facebook, Instagram, blogs, reviews and news websites, allow people to

www.ijaers.com Page | 106


Kadu et al. International Journal of Advanced Engineering Research and Science, 9(12)-2022

widely share their opinions and reviews. Sentiment analysis is generally categorized into three levels: document-level, sentence-level, and feature-level. Document-level sentiment analysis classifies a whole review document as either positive or negative. Two methods are used for sentiment classification: semantic orientation approaches and machine learning approaches. Semantic orientation approaches determine a word's polarity using a corpus or dictionary. They do not achieve high classification accuracy because no single knowledge base provides polarity for every domain. Machine Learning (ML) approaches first build a model from labelled data and then use that model to classify the test data. They require a large amount of labelled training data to build an efficient model [1].
Machine learning involves algorithms that extract knowledge from data to make predictions, instead of relying on humans to manually develop rules and build models from enormous amounts of knowledge. There are three types of machine learning algorithms: supervised, unsupervised and reinforcement learning. In supervised learning, the machine learning algorithm constructs a model from data with class labels, also called training data. The trained model is then used to identify the class label of new, unseen test data. In unsupervised learning, the model automatically finds patterns and relationships in the dataset by creating clusters in it [2]. Reinforcement learning aims to develop a system or an agent that learns from the rewards and punishments it receives from the environment.
In document-level sentiment classification, the lexical, syntactic, and semantic features of a document are extracted first. Weights are then assigned to these features using binary, Term Frequency (TF) or Term Frequency-Inverse Document Frequency (TF-IDF) weighting schemes, and the weighted features are given as input to the machine learning algorithms. The performance of ML-based sentiment classification therefore depends on the feature extraction techniques, feature selection methods and feature weighting schemes used. Labelled data for training is not always available for every domain, and machine learning approaches require manual effort to extract the features. To address these issues, this work introduces deep learning models for sentiment classification.
Tweets contain hidden, valuable information that can be used to determine an author's attitude, that is, the contextual polarity of the text [2]. Statistical machine learning algorithms perform well for simpler sentiment analysis applications, but they do not generalize to more complex text classification problems.
Deep learning is now used in a wide range of applications. Its advantages include automatic feature extraction, easy computation on accelerated hardware, and strong performance even on huge amounts of data. Deep learning models achieve significant results in sentiment analysis, speech recognition and computer vision. The deep learning algorithms most widely used in sentiment analysis are the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN); both simple RNN and RNN with LSTM have been analysed for sentiment classification. Stochastic Gradient Descent and RMSprop are used as optimizers, and their performance is evaluated. Word2Vec and GloVe models were used as word embedding techniques to represent the tweets as numeric values or vectors. These models are pre-trained, unsupervised word vectors learned from a large collection of words, and they can capture word semantics. The study applied these different word vector models to verify the effectiveness of the model.
Sentiment analysis and emotion analysis are performed. TextBlob is used to annotate the sentiment data, and emotions are annotated using the Text2Emotion model. Three sentiments are used (positive, negative, and neutral), and emotions are classified into happy, sad, surprise, angry, and fear. The suitability and performance of three feature engineering approaches are studied: term frequency-inverse document frequency (TF-IDF), bag of words (BoW), and Word2Vec. Experiments are performed using several well-known machine learning models: support vector machine (SVM), logistic regression (LR), Gaussian Naive Bayes (GNB), extra tree classifier (ETC), decision tree (DT), and k nearest neighbour (KNN).

II. LITERATURE REVIEW
K. S. Kalaivani and S. Uma compared deep learning approaches for sentiment classification [1]. Their work uses Keras, a Deep Learning (DL) framework that provides an embedding layer to produce the vector representation of the words in a document. With it, they analyzed the performance of three deep learning models for classifying book reviews: the Convolutional Neural Network (CNN), Simple Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM). Their experiments found that the LSTM model performs better than CNN and simple RNN for sentiment classification.
Sakirin Tam and Rachid Ben Said implemented ConvBiLSTM, which works in three stages [5]. A word embedding model converts tweets into numerical values. A CNN layer receives the feature embeddings as input and produces lower-dimensional features. A Bi-LSTM model then takes the output of the CNN layer and produces the classification result. Word2Vec and GloVe were applied separately to observe the impact of the word embedding on the proposed model. ConvBiLSTM was applied to the retrieved Tweets and SST-2 datasets. On the retrieved Tweets dataset, the ConvBiLSTM model with Word2Vec outperformed the other models with 91.13% accuracy.
Sungheetha and Sharma [6] introduced a new capsule model known as Trans Cap to address the issue of labelling aspect-level data. Aspect routing and dynamic routing algorithms are used to transfer knowledge from the document-level task to the aspect-level task. The authors showed that the proposed model performs better than state-of-the-art models for aspect-level sentiment analysis.
Kalaivani and Kuppuswami improved the performance of syntactic features for document-level sentiment classification by backing off the head word or modifier word to the corresponding POS cluster [7]. The authors showed that a WFO-based feature selection technique, used to select prominent generalized syntactic features, outperforms other existing features for classifying product reviews.
Soujanya Poria, Devamanyu Hazarika and their co-authors examined the perception that sentiment analysis has reached maturity [8]. They pointed out its shortcomings and its under-explored, yet key, aspects that are necessary to attain true sentiment understanding. They analysed the significant leaps responsible for the field's current relevance and attempted to chart a possible course for it that covers many overlooked and unanswered questions.
Ambreen Nazir, Yuan Rao and Ling Sun explored the issues and challenges of aspect-based sentiment analysis [4]. These include extracting different aspects and their relevant sentiments and mapping the relations between aspects. They also cover interactions, dependencies, and contextual-semantic relationships between different data objects for improved sentiment accuracy, as well as predicting how sentiment evolves over time.
Kian Long Tan, Chin Poo Lee and Kian Ming Lim proposed RoBERTa-LSTM, a hybrid model [9]. In it, the Robustly optimized BERT approach (RoBERTa) maps the words into a compact, meaningful word embedding space, while the Long Short-Term Memory model captures long-distance contextual semantics effectively. The hybrid model outperforms state-of-the-art methods, achieving F1-scores of 93%, 91%, and 90% on the IMDb, Twitter US Airline Sentiment, and Sentiment140 datasets, respectively.
Wang et al. developed a densely connected convolutional neural network with multi-scale feature attention for text classification [10]. Dense connections are used to easily generate large N-gram features from various smaller N-gram features. A feature attention mechanism then selects effective features with varying N-grams, such as unigrams, bigrams and trigrams, from the multi-scale features.
To capture sentiment spread across long time steps in the text, Rao et al. developed a novel model called Sentence Representation-Long Short-Term Memory (SR-LSTM) [14]. Variants of LSTM were also implemented, including peephole connection LSTM, coupled input output forget LSTM, gated recurrent unit (GRU) and bidirectional LSTM. The authors concluded that the newly introduced SR-LSTM and SSR-LSTM models are more accurate than the other models on IMDB, Yelp 2014 and Yelp 2015.
Peng et al. introduced a novel deep graph CNN model to capture non-consecutive relations and long-range semantic relations for large-scale text classification [15]. In some applications, such as sentiment analysis, capturing long-range semantics is more important than sequential information. The text is first converted into a graph-of-words, and a graph convolution operation is performed to capture its semantics. The results show that the proposed model performs better than existing classification models.

III. PROPOSED WORK
A. Deep Learning
Deep learning algorithms have recently achieved remarkable results in natural language processing. They represent data in multiple, successive layers. They capture syntactic features from sentences automatically, so no extra feature extraction techniques are needed; such techniques consume additional resources and time. This is why deep learning models have attracted attention from NLP researchers working on sentiment classification. Using the multi-layer perceptron structure of deep learning, a CNN can learn high-dimensional, non-linear and complex classification functions. As a result, CNNs are used in many applications such as computer vision, image processing, and speech recognition.
B. Convolutional Neural Network
Figure 1 shows the architecture of the CNN, which consists of a convolutional layer, a pooling layer, a flatten layer and a dense layer. CNNs are generally used for image, audio and video applications such as image classification, semantic segmentation and object detection. They have recently been applied to text classification with good performance. In this work, a CNN is therefore used for sentiment classification, because its convolutional filters can automatically learn the prominent features for this task.
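The introduction describes classical ML pipelines that weight extracted features with binary, TF or TF-IDF schemes before classification. That manual weighting step is what the convolutional filters are meant to replace. The following is a minimal plain-Python sketch of the three schemes, not the authors' code. The toy tweets are hypothetical, and tokenization is assumed to be done already. The unsmoothed idf = log(N / df) is one common variant assumed here; libraries such as scikit-learn smooth the idf by default.

```python
import math
from collections import Counter

def weight_features(docs, scheme="tfidf"):
    """Weight the terms of each tokenized document.

    scheme: "binary" (1 if the term occurs), "tf" (raw term count) or
    "tfidf" (count * log(N / df), an unsmoothed IDF variant assumed here).
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        if scheme == "binary":
            weighted.append({t: 1 for t in counts})
        elif scheme == "tf":
            weighted.append(dict(counts))
        elif scheme == "tfidf":
            weighted.append({t: c * math.log(n_docs / df[t]) for t, c in counts.items()})
        else:
            raise ValueError(f"unknown weighting scheme: {scheme}")
    return weighted

# Three hypothetical, already-tokenized tweets.
tweets = [["stocks", "crash", "again"], ["stocks", "rally", "today"], ["crash", "crash", "panic"]]
binary = weight_features(tweets, "binary")
tf = weight_features(tweets, "tf")
tfidf = weight_features(tweets, "tfidf")
```

In this toy corpus, "crash" is down-weighted relative to "panic" because it appears in two of the three tweets. A term present in every tweet would receive idf = log(1) = 0.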
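The layers of the CNN (embedding, convolutional, pooling, flatten and dense, detailed in the numbered subsections that follow) can be illustrated with a minimal pure-Python forward pass. This is a sketch under stated assumptions, not the authors' Keras model, and no training is performed. The following are all hypothetical values, chosen so that the arithmetic is easy to follow: the embedding table, the filter and dense weights, the ReLU after convolution, the filter width of 2 and the pooling window of 2. The sigmoid output matches the binary case described for the dense layer.

```python
import math

# Hypothetical toy embedding table: 4 token ids -> 2-dimensional vectors.
EMBEDDINGS = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

def embed(token_ids, table):
    """Embedding layer: look up the vector for each token id."""
    return [table[i] for i in token_ids]

def conv1d_relu(seq, filt, bias=0.0):
    """Slide one filter (k rows of d weights) over the sequence with
    stride 1 and no padding, then apply ReLU (an assumed activation)."""
    k = len(filt)
    feature_map = []
    for start in range(len(seq) - k + 1):
        s = bias
        for j in range(k):
            s += sum(w * x for w, x in zip(filt[j], seq[start + j]))
        feature_map.append(max(0.0, s))
    return feature_map

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value per window."""
    return [max(feature_map[i:i + size]) for i in range(0, len(feature_map), size)]

def flatten(pooled_maps):
    """Concatenate the pooled feature maps into one feature vector."""
    return [v for m in pooled_maps for v in m]

def dense_sigmoid(features, weights, bias=0.0):
    """Dense output unit with sigmoid: probability of the positive class."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Forward pass for a hypothetical 5-token tweet with two filters of width 2.
filters = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
x = embed([1, 2, 3, 1, 0], EMBEDDINGS)
maps = [conv1d_relu(x, f) for f in filters]   # two feature maps of length 4
pooled = [max_pool(m) for m in maps]          # each reduced to length 2
features = flatten(pooled)                    # 4 features in one vector
p_positive = dense_sigmoid(features, [0.5, -0.25, 0.25, -0.5])
label = "positive" if p_positive >= 0.5 else "negative"
```

The two filters respond to different adjacent-token patterns. Max pooling keeps the strongest response in each window, and with a 0.5 threshold the sigmoid output of about 0.56 is read as "positive". In a trained model, all of these weights would be learned rather than set by hand.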

1) Embedding Layer: Neither machine learning algorithms nor deep learning algorithms can process raw text directly, so the text must first be converted into a numerical form. The two most commonly used embeddings are frequency-based embeddings and prediction-based embeddings. Frequency-based embeddings represent documents with a count vector, TF-IDF vector or co-occurrence vector. However, these frequency-based methods are limited in representing word semantics.
2) Convolutional Layer: The purpose of this layer is to select high-level features for sentiment classification. As the name implies, a convolution operation is performed in this layer: a filter is moved over the input matrix to construct the feature map. The size of the feature map is controlled by three parameters: depth, stride, and padding. Depth depends on the number of filters used for the convolution operation.
3) Pooling Layer: This layer reduces the dimensions of the feature maps produced by the convolutional layer, which in turn reduces the computation needed in the following layers. There are three types of pooling: max pooling, average pooling, and sum pooling. This work uses max pooling, which takes the maximum value from the portion of the feature map covered by the kernel or filter. In the literature, max pooling has been found to outperform average and sum pooling in various applications.
Fig 1. Architecture of CNN
4) Flatten Layer: The flatten layer converts the pooled feature matrix into a single column vector of feature values.
5) Dense Layer: The dense layer identifies the class label, depending on the activation function used. The activation function may be softmax or sigmoid, depending on the type of classification task: softmax is used for multiclass classification and sigmoid for binary classification. Since the reviews are either positive or negative, the sigmoid activation function is used.
C. Recurrent Neural Network
A recurrent neural network (RNN) is a class of neural networks suited to learning representations of sequential data, such as natural language. Its output depends not only on the current input but also on the output, or hidden state, of the previous step. The previous output is itself a function of the earlier inputs. The current output is therefore a function of the current input and the previous output:
$$h_t = f(W h_{t-1} + U x_t + b)$$
Here h_t is the output (hidden state) and x_t the input at position t, and f is a non-linear activation function. The bias value is b, W represents the weights for the previous output, and U is the weight for the current input.
Figure 2 shows the architecture of the simple RNN. The raw data is pre-processed, and a vocabulary containing the unique words in the documents is constructed. This is passed to an embedding layer, which provides the embedding value for every word in the vocabulary. The embedding values are passed to a simple recurrent neural network, which predicts the output for the current text from the previous output and the current input. The output of the SRNN layer is passed to a dropout layer. This layer helps avoid overfitting by randomly dropping a fraction of the units during training. Finally, a dense layer with an activation function provides the polarity of the review.
Fig 2. Architecture of SRNN
D. Long Short-Term Memory
The main components of an LSTM are the cell state and various gates. The cell state transfers information along the sequence chain. It acts like a memory, carrying information throughout the processing of the entire sequence. This overcomes the short-term memory issue of the RNN, so that relevant information from the initial time steps can still affect later time steps. During training, the gates learn to add relevant information to the cell state and to remove irrelevant information from it.
Fig 3. Architecture of LSTM
E. Bidirectional LSTM
Bi-LSTM is an RNN algorithm that improves on LSTM, which has shortcomings in capturing text sequence features. It handles sequential modelling better than LSTM [32], [33]. In an LSTM, information flows in one direction only, from earlier to later time steps. In a Bi-LSTM, it flows in both directions, forward and backward, through two hidden states. Because it can learn the context more effectively, this structure makes Bi-LSTM well suited to sentiment classification. Figure 4 shows the architecture of Bi-LSTM [34]. Processing the sequence in both directions lets Bi-LSTM retain input data from both the preceding and the succeeding parts of the sequence. A standard RNN, by contrast, needs a delay to include future data.
Fig 4. Architecture of BiLSTM
F. BERT (Bidirectional Encoder Representations from Transformers)
BERT is a natural language processing model that achieves state-of-the-art accuracy on many NLP and NLU tasks. It is essentially an encoder stack of the Transformer architecture. A Transformer is an encoder-decoder network that uses self-attention on the encoder side and attention on the decoder side. BERT makes use of the Transformer's attention mechanism, which learns contextual relations between words (or sub-words) in a text. In its vanilla form, the Transformer includes two separate mechanisms: an encoder that reads the text input and a decoder that produces a prediction for the task. Since BERT's goal is to generate a language model, only the encoder mechanism is necessary.
Fig 5. Architecture of BERT
G. RoBERTa (Robustly Optimized BERT Pre-training Approach)
The RoBERTa model is an extension of Bidirectional Encoder Representations from Transformers (BERT). BERT and RoBERTa belong to the Transformer [2] family, which was developed for sequence-to-sequence modelling to address the problem of long-range dependencies.
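The recurrent models above can be sketched in a few lines of pure Python:

- the simple RNN recurrence h_t = tanh(W h_{t-1} + U x_t + b), with W, U and b as defined in the Recurrent Neural Network subsection;
- the gated cell state of the LSTM;
- the two-direction reading of the Bi-LSTM.

This is a minimal illustration under assumptions, not the models used in the experiments. For readability it uses a one-dimensional input and hidden state, so W, U and b are scalars. It takes f = tanh, which is the default activation of Keras' SimpleRNN layer, and it uses the standard forget/input/output-gate LSTM formulation. All parameter values and inputs are hypothetical and untrained.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def simple_rnn(xs, W, U, b, h0=0.0):
    """Simple RNN: h_t = tanh(W * h_{t-1} + U * x_t + b) at each position t."""
    h, states = h0, []
    for x in xs:
        h = math.tanh(W * h + U * x + b)
        states.append(h)
    return states

def lstm(xs, params, h0=0.0, c0=0.0):
    """Scalar LSTM cell; params maps a gate name to (w_h, w_x, bias)."""
    h, c, states = h0, c0, []
    for x in xs:
        pre = {name: w_h * h + w_x * x + bias
               for name, (w_h, w_x, bias) in params.items()}
        forget = sigmoid(pre["forget"])          # share of the old cell state kept
        write = sigmoid(pre["input"])            # share of new information added
        candidate = math.tanh(pre["candidate"])  # new information proposed
        expose = sigmoid(pre["output"])          # share of the cell state output
        c = forget * c + write * candidate       # the cell state carries the memory
        h = expose * math.tanh(c)
        states.append(h)
    return states

def bidirectional(layer, xs):
    """Run `layer` left-to-right and right-to-left; pair the two states per position."""
    forward = layer(xs)
    backward = layer(xs[::-1])[::-1]  # re-align backward states with positions
    return list(zip(forward, backward))

xs = [1.0, 0.0, -1.0]  # hypothetical 1-D inputs, one per token
rnn_states = simple_rnn(xs, W=0.5, U=1.0, b=0.0)
params = {name: (0.5, 1.0, 0.0) for name in ("forget", "input", "candidate", "output")}
lstm_states = lstm(xs, params)
bi_states = bidirectional(lambda seq: lstm(seq, params), xs)
```

The bidirectional wrapper shows why a Bi-LSTM sees both preceding and succeeding context at every position. Each position gets one state from the left-to-right pass and one from the right-to-left pass, and Keras' Bidirectional wrapper concatenates the two by default.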
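BERT and RoBERTa are built from Transformer encoder layers whose core operation is self-attention. Each token's new representation is a weighted mix of all the tokens in the sentence, with weights given by softmax(q·k / √d_k). The following sketch shows a single head of scaled dot-product self-attention in pure Python. It is a simplification, not BERT itself: there is no multi-head projection, residual connection, layer normalization or feed-forward sublayer. The identity projection matrices and token vectors are hypothetical placeholders for learned weights and embeddings.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def project(matrix, vec):
    """Multiply a (rows x len(vec)) matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    Returns (outputs, weights): outputs[i] is the attention-weighted mix of
    all value vectors for token i; weights[i] is its softmax distribution.
    """
    Q = [project(Wq, t) for t in tokens]
    K = [project(Wk, t) for t in tokens]
    V = [project(Wv, t) for t in tokens]
    scale = math.sqrt(len(K[0]))
    outputs, weights = [], []
    for q in Q:
        w = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in K])
        weights.append(w)
        outputs.append([sum(w_j * v[d] for w_j, v in zip(w, V)) for d in range(len(V[0]))])
    return outputs, weights

# Three hypothetical 2-D token vectors; identity matrices stand in for learned projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, identity, identity, identity)
```

With identity projections, the first token attends equally to itself and to the third token, since both have a dot product of 1 with it. It attends less to the second token (dot product 0). Every token attends to all positions at once, which is how this architecture addresses long-range dependencies without a recurrence.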

Transformer models comprise three components: the tokenizer, the transformers, and the heads. The tokenizer converts the raw text into sparse index encodings. The transformers then turn this sparse content into contextual embeddings for deeper training. The heads wrap the transformers model so that the contextual embeddings can be used for downstream tasks. The components of the Transformers are depicted in Figure 6.
Fig 6. Architecture of RoBERTa

IV. RESULT AND DISCUSSION
A. Dataset Used
1) Huge crash in stock market 2022
Tweets related to the 2022 stock market crash were gathered from Twitter; the collection can be used for various NLP tasks. The sentiment column of the tweets has three categories: Positive (12,542 tweets), Neutral (11,498 tweets) and Negative (9,906 tweets).
2) Stock Market TWEETS Data-NL2021
Twitter is one of the most popular social networks for sentiment analysis. This data set contains tweets related to the stock market: 943,672 tweets collected between April 9 and July 16, 2020, using the S&P 500 tag (#SPX500), the references to the top 25 …
3) Stock Market Tweet | Sentiment Analysis lexicon
Tweets were collected between April 9 and July 16, 2020, using not only the SPX500 tag but also the top 25 companies in the index and "#stocks". 1,300 tweets were manually classified and reviewed. All the source code used to download the tweets, check the top words, and evaluate the sentiment is included.
Three deep learning architectures, CNN, simple RNN and LSTM, are compared for document-level sentiment classification. Figures 7, 8 and 9 show the training and testing accuracy obtained for the three networks. In these figures, LSTM achieves higher accuracy than the other two networks. This is because the CNN and simple RNN models cannot remember the sequence of words as the LSTM network does.
Fig 7: Performance comparison of CNN
Fig 8: Performance comparison of SRNN
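The comparison above is stated in terms of accuracy, while the literature review quotes F1-scores. The following is a minimal sketch of both metrics. The four-tweet gold and predicted labels are hypothetical. Macro-averaging is an assumption, because the paper does not say how multi-class scores are averaged. The same snippet computes the majority-class baseline implied by the reported class counts of the "Huge crash in stock market 2022" dataset. Always predicting "Positive" would already be correct for 12,542 of 33,946 tweets (about 36.9%), so any accuracy reported on that dataset should be compared against that floor.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(scores) / len(scores)

# Hypothetical gold labels and predictions for four tweets.
gold = ["positive", "negative", "neutral", "positive"]
pred = ["positive", "negative", "positive", "positive"]
acc = accuracy(gold, pred)
f1 = macro_f1(gold, pred)

# Class counts reported for the "Huge crash in stock market 2022" dataset.
counts = {"Positive": 12542, "Neutral": 11498, "Negative": 9906}
total = sum(counts.values())
majority_baseline = max(counts.values()) / total  # accuracy of always predicting "Positive"
```

In the toy example, accuracy is 0.75 but macro F1 is only 0.6, because the missed "neutral" tweet pulls that class's F1 to zero. On imbalanced or three-class data, reporting F1 alongside accuracy avoids this masking.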

Fig 9: Performance comparison of LSTM

V. CONCLUSIONS
The performance of three deep learning models is analysed for document-level sentiment classification. To improve classification performance, sentiment classification should consider both the local and the non-local relationships between the words in a sentence. The proposed approach helps the model classify text sentiment effectively by capturing both local and global dependencies in the context of sentences. The model is trained and evaluated on three tweet datasets: Stock Market Tweet | Sentiment Analysis lexicon, Stock Market TWEETS Data-NL2021 and Huge crash in stock market 2022. The model classifies text sentiment effectively on these datasets, and the experimental results verify the feasibility and effectiveness of the model. In the future, the performance of other deep learning models may be analyzed for sentiment classification.

REFERENCES
[1] K. S. Kalaivani, S. Uma & C. S. Kanimozhiselvi, “Comparison of Deep Learning Approaches for Sentiment Classification”, IEEE Xplore, ICICT 2021.
[2] Borja Arroyo Galendei & Silvia Uribes, “Conspiracy or not? A Deep Learning Approach to Spot It on Twitter”, IEEE Access, February 15, 2022.
[3] Naila Aslam, Furqan Rustam, Ernesto Lee & Patrick Bernard Washington, “Sentiment Analysis and Emotion Detection on Cryptocurrency Related Tweets Using Ensemble LSTM-GRU Model”, IEEE Access, March 19, 2022.
[4] Ambreen Nazir, Yuan Rao & Lianwei Wu, “Issues and Challenges of Aspect-based Sentiment Analysis: A Comprehensive Survey”, IEEE Transactions on Affective Computing, vol. 13, no. 2, April-June 2022.
[5] Sakirin Tam, Rachid Ben Said & O. Ozgur Tanriover, “A ConvBiLSTM Deep Learning Model-Based Approach for Twitter Sentiment Classification”, IEEE Access, March 19, 2021.
[6] A. Sungheetha & R. Sharma, “Trans capsule model for sentiment classification”, Journal of Artificial Intelligence, vol. 2, no. 03, pp. 163-169, 2020.
[7] K. S. Kalaivani & S. Kuppuswami, “Exploring the use of syntactic dependency features for document-level sentiment classification”, Bulletin of the Polish Academy of Sciences: Technical Sciences, vol. 67, no. 2, 2019.
[8] Soujanya Poria, Devamanyu Hazarika, Navonil Majumder & Rada Mihalcea, “Current Challenges and New Directions in Sentiment Analysis Research”, IEEE Transactions on Affective Computing, 2020.
[9] Kian Long Tan, Chin Poo Lee & Kian Ming Lim, “RoBERTa-LSTM: A Hybrid Model for Sentiment Analysis with Transformer and Recurrent Neural Network”, IEEE Access, March 2, 2022.
[10] S. Wang, M. Huang & Z. Deng, “Densely Connected CNN with Multi-scale Feature Attention for Text Classification”, In IJCAI, pp. 4468-4474, 2018.
[11] A. Yenter & A. Verma, “Deep CNN-LSTM with combined kernels from multiple branches for IMDb review sentiment analysis”, In 2017 IEEE 8th Annual Ubiquitous Computing, Electronics and Mobile Communication Conference, pp. 540-546, 2017.
[12] A. Yenter & A. Verma, “Deep CNN-LSTM with combined kernels from multiple branches for IMDb review sentiment analysis”, In 2017 IEEE 8th Annual Ubiquitous Computing, Electronics and Mobile Communication Conference, pp. 540-546, 2017.
[13] S. Wang, M. Huang & Z. Deng, “Densely Connected CNN with Multi-scale Feature Attention for Text Classification”, In IJCAI, pp. 4468-4474, 2018.
[14] G. Rao, W. Huang, Z. Feng & Q. Cong, “LSTM with sentence representations for document-level sentiment classification”, Neurocomputing, vol. 308, pp. 49-57, 2018.
[15] H. Peng, J. Li, Y. He, Y. Liu, M. Bao, L. Wang & Q. Yang, “Large-scale hierarchical text classification with recursively regularized deep graph-CNN”, In Proceedings of the 2018 World Wide Web Conference, pp. 1063-1072, 2018.
[16] Adyan Marendra Ramadhani & Hong Soon Goo, “Twitter Sentiment Analysis using Deep Learning Methods”, 2017 7th International Annual Engineering Seminar (InAES), Yogyakarta, Indonesia.
