

RoBERTa-BiLSTM: A Context-Aware Hybrid Model for Sentiment Analysis
Md. Mostafizer Rahman, Ariful Islam Shiplu, Yutaka Watanobe, Member, IEEE, and Md. Ashad Alam

arXiv:2406.00367v1 [cs.CL] 1 Jun 2024

Md. Mostafizer Rahman is with the Department of Computer and Information Systems, University of Aizu, Japan, and with the Information and Communication Technology Cell, Dhaka University of Engineering & Technology, Gazipur, Bangladesh (e-mail: [email protected], [email protected]). Ariful Islam Shiplu is with the Department of Computer Science and Engineering, Dhaka University of Engineering & Technology, Gazipur, Bangladesh (e-mail: [email protected]). Yutaka Watanobe is with the Department of Computer and Information Systems, University of Aizu, Japan (e-mail: [email protected]). Md. Ashad Alam is with the Ochsner Center for Outcomes Research, New Orleans, LA 70121, USA (e-mail: [email protected]).

1 Sentiment Analysis, Comment Analysis, and Text Analysis are used interchangeably to convey the same meaning.

Abstract—With the rapid advancement of technology and its easy accessibility, online activity has become an integral part of everyday human life. Expressing opinions, providing feedback, and sharing feelings by commenting on various platforms, including social media, education, business, entertainment, and sports, has become a common phenomenon. Effectively analyzing these comments to uncover latent intentions holds immense value in making strategic decisions across various domains. However, several challenges hinder the process of sentiment analysis, including the lexical diversity exhibited in comments, the presence of long dependencies within the text, encountering unknown symbols and words, and dealing with imbalanced datasets. Moreover, existing sentiment analysis tasks have mostly leveraged sequential models to encode long-dependent texts, which requires longer execution time because the text is processed sequentially. In contrast, the Transformer requires less execution time due to its parallel processing nature. In this work, we introduce a novel hybrid deep learning model, RoBERTa-BiLSTM, which combines the Robustly Optimized BERT Pretraining Approach (RoBERTa) with Bidirectional Long Short-Term Memory (BiLSTM) networks. RoBERTa is utilized to generate meaningful word embedding vectors, while BiLSTM effectively captures the contextual semantics of long-dependent texts. The RoBERTa-BiLSTM hybrid model leverages the strengths of both sequential and Transformer models to enhance performance in sentiment analysis. We conducted experiments using datasets from IMDb, Twitter US Airline, and Sentiment140 to evaluate the proposed model against existing state-of-the-art methods. Our experimental findings demonstrate that the RoBERTa-BiLSTM model surpasses baseline models (e.g., BERT, RoBERTa-base, RoBERTa-GRU, and RoBERTa-LSTM), achieving accuracies of 80.74%, 92.36%, and 82.25% on the Twitter US Airline, IMDb, and Sentiment140 datasets, respectively. Additionally, the model achieves F1-scores of 80.73%, 92.35%, and 82.25% on the same datasets, respectively.

Index Terms—Sentiment Analysis, Comment Classification, Deep Learning, Transformer, RNN, RoBERTa-BiLSTM, RoBERTa, BiLSTM, Natural Language Understanding.

I. INTRODUCTION

IN today's world, technological advancements have empowered individuals to provide feedback and reviews, express opinions, and share feelings across various platforms such as social media, entertainment, education, programming, sports, and business. Particularly, social media platforms have emerged as primary mediums for communication, facilitating discussions on a broad spectrum of topics [1]–[4]. The comments generated on these platforms hold significant value for decision-making and strategy formulation. Sentiment analysis1, the process of extracting actual sentiment or underlying meaning from comments [5], plays a crucial role in understanding public thinking. It has emerged as a prominent topic in the field of Natural Language Processing (NLP) due to its significance [6]. Understanding the latent intentions within user comments is crucial for various applications, including brand monitoring [7], market analysis [8], [9], and sentiment analysis [10]. However, accurately discerning the intended meaning behind comments remains a considerable challenge. Hence, it is crucial to devise an effective sentiment analysis model capable of comprehending long-distance dependencies, unfamiliar words and symbols, as well as code-mixed languages (comments containing two or more languages), while also adeptly managing the lexical diversity present in texts.

A significant number of approaches have been proposed employing machine learning (ML), deep learning (DL), Recurrent Neural Networks (RNNs), and Transformer [11]-based large language models (LLMs) for comment analysis. In a study [12], three ML models—Naïve Bayes (NB), Support Vector Machine (SVM), and K-Nearest Neighbor (KNN)—were applied to Twitter data to comprehend people's sentiments. The NB model achieved an accuracy of 75.58% and outperformed the other models. In [13], Logistic Regression (LR), XgBoost, NB, Decision Tree (DT), SVM, and Random Forest (RF) ML algorithms were utilized for sentiment analysis of the Twitter US Airline dataset. Text preprocessing steps, such as stop word and punctuation removal, case folding, and stemming, were performed before model training. The SVM model achieved an accuracy of 83.31%, surpassing the performance of the compared models. Similarly, various ML algorithms have been employed to conduct sentiment analysis on diverse datasets such as Sentiment140 and Airline reviews [14]–[18].

In addition to ML algorithms, RNN models like Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), Gated Recurrent Unit (GRU), and Convolutional Neural Network (CNN) have been utilized, achieving superior accuracy for comment analysis compared to ML models [19]–[23]. Surveys of DL models for sentiment analysis are presented in [24], [25]. In some studies, researchers have explored

techniques utilizing pre-trained word embeddings such as Doc2vec, Word2vec, fastText, and GloVe [26], while word embedding is crucial for sentiment analysis. Many ML and DL models have been proposed for sentiment analysis, with sequential models particularly adept at encoding long-distance dependencies in text. However, sequential models are computationally less efficient due to their serialized processing capability. In contrast, Transformer-based LLMs take comparatively less computational time due to their parallelized processing capability.

In recent years, advancements in LLMs, particularly Transformer-based architectures such as the Generative Pre-trained Transformer (GPT) [27], Bidirectional Encoder Representations from Transformers (BERT) [28], and the Robustly Optimized BERT Pretraining Approach (RoBERTa) [29], have offered even greater potential for improving sentiment analysis tasks [30]–[35]. The Transformer-based model leverages the attention mechanism [36], which makes it more effective in NLP tasks. In particular, the attention mechanism calculates a weighted sum of the input embeddings, where the weights are determined by a learned compatibility function between the query and key embeddings [26]. This capability empowers the model to adeptly capture long-range dependencies within the input sequence, resulting in the creation of more informative representations. Younas et al. [37] proposed two LLMs, Multilingual BERT (mBERT) and XLM-RoBERTa (XLM-R), for the analysis of code-mixed language comments on a Twitter dataset. Experimental results demonstrated that mBERT and XLM-R achieved accuracy (A) scores of 69% and 71%, respectively. In a study [38], the BERT model exhibited superior performance (with an A of 85.4%) compared to ML models for comment analysis. Moreover, comprehensive surveys on text analysis with Transformer-based LLMs are presented in studies [39], [40]. Poria et al. [41] discussed existing challenges and explored new research directions in sentiment analysis.

In this paper, we propose a hybrid model, RoBERTa-BiLSTM, designed for sentiment analysis. This model harnesses the strengths of both a Transformer-based LLM, RoBERTa, and an RNN-based model, BiLSTM. The RoBERTa model serves as the encoder, tokenizing the input sequence and encoding it into representative word embeddings. These embeddings are then fed into the BiLSTM via a dropout layer. The BiLSTM model effectively captures the long-range dependencies within the word embeddings while mitigating the gradient vanishing issue often encountered in RNNs. By processing the input sequence in both forward and backward directions, the BiLSTM enhances the model's contextual understanding of the text [42], [43]. To discern the relationship between the output of the BiLSTM model and the class labels, a dense layer is incorporated. Additionally, a Softmax function is applied to the classification layer to estimate the probability distribution of the class labels. The main contributions of the paper are outlined as follows:

(i) We propose a hybrid context-aware model, RoBERTa-BiLSTM, for sentiment analysis. This model combines the strengths of an LLM and an RNN to enhance the understanding of textual content. The RoBERTa model is employed to generate representative word embeddings of the text. RoBERTa's features, including extensive training with large datasets, training on longer sequences, removal of the next sentence prediction objective, and dynamically changing masking patterns during training, render it highly effective in sentiment analysis tasks. Conversely, the bidirectional data processing nature of BiLSTM aids in capturing long-range temporal dependencies within word embeddings, which is beneficial in text analytics.

(ii) Experimental results reveal that RoBERTa-BiLSTM achieved an A of 80.74% for the Twitter US Airline, 92.36% for the IMDb, and 82.25% for the Sentiment140 datasets. These findings demonstrate that RoBERTa-BiLSTM outperforms RoBERTa-base, RoBERTa-GRU, RoBERTa-LSTM, and other state-of-the-art models.

(iii) Hyperparameters are fine-tuned to ascertain the optimal parameters for the RoBERTa-BiLSTM model for sentiment analysis. Data augmentation is applied to an imbalanced dataset to showcase the model's performance before and after augmentation.

The rest of the paper is organized as follows: Section II presents a review of related research employing ML, DL, and LLMs for sentiment analysis. Section III presents the task description of this research. Section IV elaborates the proposed RoBERTa-BiLSTM approach for sentiment analysis. Section V presents details on the datasets and the preprocessing steps undertaken for experiments. Section VI provides insight into the hyperparameters employed and their fine-tuning policy in this research. Following that, in Section VII, we present comprehensive experimental results spanning various datasets to showcase the performance of the model. Section VIII engages in a discussion of the obtained results. Lastly, in Section IX, we conclude this study, reflecting on its findings and offering insights into future research directions.

II. RELATED WORKS

In this section, we present a comprehensive review of sentiment analysis research, focusing on various methods employed, including ML, DL, and LLMs. Additionally, our literature review encompasses diverse datasets, such as movie reviews, social media text, program code, mixed text, airline review text, and Covid-19 datasets. Through an in-depth analysis, we explore the effectiveness and applicability of different techniques and models for sentiment analysis across diverse domains.

A. Machine Learning

The study [12] contributes to understanding public sentiment towards 2019 Republic of Indonesia presidential candidates by conducting sentiment analysis on Twitter data. Three ML algorithms (e.g., NB, SVM, and KNN) are used to classify sentiments, providing insights into public opinion dynamics during the election period. Several steps, including data collection and text preprocessing, are taken into account before the training process. The results indicate that the sentiment polarity of the combined data achieves an A of

80.1%. Rahat et al. [17] discuss the importance of sentiment analysis across various domains and propose the development of a platform to discern opinions as positive, negative, or neutral using supervised ML techniques. They highlight data preprocessing techniques applied to social media content for extracting structured reviews. Additionally, they explore the application of algorithms to classify sentiments. The results of the experiment indicate that SVM outperforms the NB algorithm in terms of overall A, precision (P), and recall (R) values, particularly in the context of analyzing airline reviews. The evaluation demonstrates the effectiveness of the proposed approach in accurately classifying sentiments across different domains.

The study [18] utilizes Sentiment140 for computer simulations, a widely used dataset for sentiment analysis tasks due to its large volume of Twitter messages annotated with sentiment labels. The contribution of this paper lies in its proposition of a novel scheme for sentiment analysis, tailored for real-time stream data, by integrating Laplace Smoothing with a Binarized Naive Bayes Classifier (NBC) and leveraging the distributed and parallel processing capabilities of SparkR. In terms of accuracy, the computer simulations conducted with Sentiment140 consistently demonstrate superior performance of the proposed approach over existing schemes. The integration of NBC effectively improves sentiment analysis accuracy, while the utilization of the SparkR environment further enhances efficiency, making real-time sentiment analysis feasible and accurate. Madhuri and collaborators [16] focus on collecting tweets related to the railway domain. This targeted tweet dataset enables the framework to provide contextually relevant sentiment analysis for the railway domain. They proposed a framework that employs a comprehensive evaluation procedure. This empirical study also demonstrates the effectiveness of the proposed framework in sentiment analysis. Through rigorous evaluation in terms of P, R, and F1-score (F1), the framework showcases its utility in extracting valuable insights from social media data, thereby facilitating informed decision-making and strategic planning for enterprises operating in the railway domain. Prabhakar et al. [14] collect customer opinions about US airlines from several online sources, including Skytrax, where people leave reviews, and Twitter, where they share short messages. These sources provide the information used to train and test their approach. Their research introduces an improved AdaBoost approach for sentiment analysis, which obtained the highest P, R, and F1 scores of 78%, 65%, and 68%, respectively.

Saad [13] leveraged a dataset comprising tweets from different US airlines. The process begins with preprocessing steps, including cleaning the tweets and extracting features to create a Bag of Words (BoW) model. The paper employs six ML algorithms—SVM, LR, RF, XgBoost, NB, and DT—in the classification phase to categorize tweets. The author utilizes the K-Fold Cross-Validation technique to split the data into 70% for training and 30% for testing for validation purposes. The accuracy of each classifier is evaluated using metrics such as A, P, R, and F1. After comparing the results, SVM emerges with the highest A of 83.31%, signifying its effectiveness in categorizing tweets into sentiment categories. Shiplu et al. [44] effectively address the critical need for precise comment classification in the era of rapid data expansion. Their comprehensive methodology integrates diverse machine learning algorithms and voting techniques, yielding promising results. The proposed ensemble model, RF+AdaBoost+SVM+Soft-Voting, attains an impressive A of approximately 98% on a YouTube dataset.

B. Deep Learning

In recent years, researchers have made significant advancements in sentiment analysis by employing DL techniques. Rhanoui et al. [45] proposed a CNN-BiLSTM model with Doc2vec embeddings, achieving a remarkable A of 90.66%, surpassing existing methods. Similarly, Anbukkarasi et al. tailored a combined character-based DBLSTM model for sentiment analysis of Tamil tweets, outperforming LSTM with an A of 86.2% and an F1 of 81% [46]. Dholpuria et al. [22] conducted a comparative analysis between DL and traditional supervised ML classifiers for sentiment analysis of movie reviews. They found that combining DL with traditional methods significantly improved classification accuracy, with CNN achieving an A, P, R, and F1 of 99.33%, 99.67%, 99.02%, and 99.35%, respectively. Alahmary et al. [21] filled a research gap by applying DL techniques to sentiment analysis of Saudi dialect texts, with BiLSTM achieving the highest A of 94%, surpassing LSTM (A of 92%) and SVM (A of 86.4%). Thinh et al. [23] utilized DL models with a feature extractor comprising convolutional, max pooling, and batch normalization layers, achieving promising results with an A of 90.02% on the IMDb review sentiment dataset. Haoyue et al. [47] conducted a survey on DL methods for sentiment analysis. They comprehensively discuss benchmark datasets, evaluation metrics, and the performance of existing DL methods. Furthermore, they address current challenges and future research directions in sentiment analysis using DL methods. These studies collectively demonstrate the effectiveness of DL approaches in accurately discerning sentiments from textual data across diverse domains and languages.

C. Large Language Model

Singh et al. [48] employed the BERT model for sentiment analysis of COVID-19 datasets, achieving a notable A of 94% on tweets collected from various regions. Tan et al. [49] present a comprehensive approach to sentiment analysis in social media text, addressing the challenges of lexical diversity and imbalanced datasets. They introduce innovative data augmentation techniques using GloVe word embeddings and propose a hybrid DL model that integrates RoBERTa and LSTM. Their experimental results demonstrate the effectiveness of the hybrid model, which surpasses state-of-the-art methods across multiple datasets. Their study highlights the potential practical applications of their approach in sentiment analysis, particularly in the context of social media data. The study [37] contributes to the field by addressing the challenge of analyzing imperfect and informal languages, such as code-mixed Roman Urdu and English, prevalent on social media platforms. They propose a state-of-the-art DL model for

sentiment analysis, leveraging mBERT and XLM-R models. Experimental results demonstrate that the XLM-R model, with tuned hyperparameters, outperforms mBERT, achieving an F1-score of 71%. This research underscores the efficacy of the XLM-R model in handling code-mixed text sentiment analysis without relying on lexical normalization or language dictionaries, thereby contributing significantly to the advancement of sentiment analysis techniques for diverse linguistic contexts. Furthermore, pretrained LLMs are being utilized for sentiment analysis across various domains, including education, film, energy, feature unification, feature extraction, and more [50]–[55].

III. TASK DESCRIPTION

Let x = \{x_1, x_2, x_3, \cdots, x_n\} be the set of input texts. An input text can be represented as an embedding matrix, with its mathematical formulation described by Equation (1):

E_{l,d} = \begin{bmatrix} e_{1,1} & e_{1,2} & \cdots & e_{1,d} \\ e_{2,1} & e_{2,2} & \cdots & e_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ e_{l-1,1} & e_{l-1,2} & \cdots & e_{l-1,d} \\ e_{l,1} & e_{l,2} & \cdots & e_{l,d} \end{bmatrix}   (1)

where E ∈ R^{l×d} is the embedding matrix, d is the dimension of the word embedding, and l is the text length. Each word (α) in the text (x) can be represented as a d-dimensional embedding, where α ∈ R^d. The word embedding of Positive, Neutral, or Negative is used to represent the text polarity embedding m ∈ R^d, m ∈ {Positive, Negative, Neutral/None}. Please note that the number of polarities/classes, denoted as m, may vary for different datasets. For a given text x, our goal is to analyze the text x with a model to detect its associated sentiment polarity/class m ∈ {Positive, Negative, Neutral/None} with high accuracy. For example, for a given text "ABCAir although hour delay every single staff member ticket desk admiral club sweet pie", what would be the polarity/class of the text? The model needs to predict the class as either Positive, Negative, or Neutral/None with high accuracy. Moreover, we are motivated to tackle the question of how a hybrid of Transformer and RNN models can enhance sentiment analysis performance. In response, we introduce a hybrid model, RoBERTa-BiLSTM, aimed at enhancing sentiment analysis performance.

IV. PROPOSED ROBERTA-BILSTM APPROACH

In this section, we describe the proposed hybrid model, RoBERTa-BiLSTM, for sentiment analysis. The RoBERTa-BiLSTM model blends the strengths of Transformer and RNN architectures to enhance efficacy and accuracy in sentiment analysis tasks. The architecture of the proposed model is depicted in Figure 1. The pretrained RoBERTa model acts as the encoder in the proposed hybrid model, tokenizing all input text and efficiently mapping tokens into meaningful word embedding representations. These word embeddings, generated by the pretrained RoBERTa model, are then fed into a BiLSTM layer to capture long-range dependencies within the sequence of word embeddings. A dropout layer is inserted between RoBERTa and BiLSTM to promote model generalization and prevent overfitting. Subsequently, a dense layer is added to understand the correlation between the output of the BiLSTM and the sentiment class labels. Finally, the classification layer employs the Softmax function to estimate probability distributions of the classes. The overall steps of the proposed RoBERTa-BiLSTM hybrid model are outlined as pseudocode in Algorithm 1. The components of the proposed hybrid model are detailed below.

Fig. 1: The proposed RoBERTa-BiLSTM hybrid model architecture
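The paper itself gives no implementation of this pipeline; the following is a minimal PyTorch-style sketch of the RoBERTa → dropout → BiLSTM → flatten → dense → Softmax flow described above. Everything not stated in the paper (the 128-unit intermediate dense layer, the fixed 64-token padding length, and the default class count) is an illustrative assumption.

import torch
import torch.nn as nn
from transformers import RobertaModel

class RoBERTaBiLSTM(nn.Module):
    # Sketch of the hybrid model: pretrained RoBERTa encoder, dropout,
    # bidirectional LSTM, flatten, dense layer, and Softmax classifier.
    def __init__(self, num_classes=3, hidden_units=256, dropout=0.1, max_len=64):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained("roberta-base")   # 12 layers, 768 hidden states
        self.dropout = nn.Dropout(dropout)
        self.bilstm = nn.LSTM(input_size=768, hidden_size=hidden_units,
                              batch_first=True, bidirectional=True)
        self.flatten = nn.Flatten()
        self.dense = nn.Linear(max_len * 2 * hidden_units, 128)       # first dense layer (assumed size)
        self.classifier = nn.Linear(128, num_classes)                  # second dense (classification) layer

    def forward(self, input_ids, attention_mask):
        embeddings = self.encoder(input_ids=input_ids,
                                  attention_mask=attention_mask).last_hidden_state
        lstm_out, _ = self.bilstm(self.dropout(embeddings))            # (batch, max_len, 2*hidden_units)
        features = torch.relu(self.dense(self.flatten(lstm_out)))
        return torch.softmax(self.classifier(features), dim=-1)       # class probability distribution

Note that when training with PyTorch's nn.CrossEntropyLoss, one would normally return the raw logits of the classification layer and let the loss function apply the (log-)Softmax internally.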

A. RoBERTa

RoBERTa, an extension of the BERT model, emerges as a powerful asset in the realm of NLP, engineered to enhance the effectiveness of natural language understanding (NLU) tasks. With its 12-layer architecture and 768 hidden states per layer, RoBERTa aims to surpass the limitations of its predecessor by amalgamating extensive pretraining with fine-tuning strategies. First introduced by Facebook AI researchers in 2019 [29], RoBERTa embodies the Robustly Optimized BERT approach, striving to enhance performance across a spectrum of NLU tasks including text classification, question answering, and natural language inference. Both BERT and RoBERTa employ the Transformer architecture to facilitate sequence-to-sequence tasks, relying on self-attention mechanisms to discern dependencies between inputs and outputs. RoBERTa adopts a self-supervised approach, where it undergoes pretraining on raw text data without requiring human annotations. It autonomously generates inputs and corresponding labels from the provided texts. RoBERTa's training data comprises a vast corpus, totaling ten times (10×) the size of BERT's training data. This extensive dataset comprises four primary sources: (i) BookCorpus combined with English Wikipedia (16 GB) [56], (ii) OpenWebText (38 GB) [57], (iii) CC-News (76 GB), and (iv) Stories (31 GB). Together, these sources contribute to an impressive 161 GB of text data.

The proposed hybrid model builds upon RoBERTa as its foundational layer, harnessing its advanced capabilities in contextual learning and tokenization. Unlike BERT's static masking, RoBERTa adopts dynamic masking, enabling the model to extract insights from diverse input sequences. Additionally, RoBERTa's training on a large text corpus and utilization of byte-level Byte Pair Encoding for tokenization enhance its computational efficiency and vocabulary robustness. The proposed model utilizes the pretrained RoBERTa tokenizer to break down raw text into subword tokens, preserving semantic meaning while mitigating the impact of out-of-vocabulary words. Each token is then assigned a unique input ID, token ID, and attention mask, facilitating focused processing within the RoBERTa framework. The harmonious integration of RoBERTa's capabilities with additional refinements highlights its crucial role in advancing NLP performance and comprehension across diverse domains, pushing the boundaries of language understanding to unprecedented heights.

Algorithm 1 The overall steps of the proposed sentiment analysis approach based on the RoBERTa-BiLSTM model
1: Input: Text/Comments X = {x_1, x_2, x_3, ..., x_n}
2: Output: Predict the underlying sentiment (e.g., positive, neutral, or negative) based on the given text/comments.
3: Lowercase the text to ensure uniformity.
4: Set of elements for removal E = {special symbols, URLs, hashtags, punctuation, special characters, numbers, the, an, a}
5: for each text x ∈ X do
6:   Initialize a list W[] for words
7:   for each word α ∈ x do
8:     if α ∈ E then
9:       Remove the word α from the text x
10:    else
11:      Assign words W[] ← α
12:    end if
13:  end for
14: end for
15: Lemmatization is applied to W.
16: Perform RoBERTa tokenization to assign a unique Input ID, Token ID, and Attention Mask to each token.
17: The RoBERTa framework comprises 12 layers, each composed of 768 hidden states. It is utilized to generate word embeddings, E ∈ R^{l×d}, as per Eq. (1).
18: A Dropout Layer is implemented to promote model generalization and mitigate overfitting.
19: A BiLSTM Layer is added to extract features and enhance the model's predictive accuracy by leveraging both contextual information from RoBERTa and long-range dependencies between tokens.
20: A Flatten Layer is added to reshape the input from a multi-dimensional tensor to a one-dimensional tensor for the subsequent dense layer.
21: A Dense Layer is added to connect all inputs from the preceding layer to all activation units in the subsequent layer.
22: A Classification Layer is utilized to predict the sentiment of the text/comment.
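As an illustration of the tokenization step in Algorithm 1 (step 16), the sketch below shows how the pretrained roberta-base tokenizer from the Hugging Face transformers library maps a comment to input IDs and an attention mask. The example comment and the 64-token padding length are arbitrary choices, not values taken from the paper.

from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")  # byte-level BPE vocabulary

comment = "although hour delay every single staff member ticket desk admiral club sweet pie"
encoded = tokenizer(comment,
                    padding="max_length",   # pad to a fixed length
                    truncation=True,
                    max_length=64,
                    return_tensors="pt")

print(encoded["input_ids"].shape)       # torch.Size([1, 64]) -- subword token IDs
print(encoded["attention_mask"].shape)  # torch.Size([1, 64]) -- 1 = real token, 0 = padding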

B. Dropout Layer

The dropout layer emerges as a pivotal technique in DL and Transformer models, playing a crucial role in preventing overfitting and improving model generalization. Originating from the seminal work of Srivastava et al. [58], dropout involves randomly deactivating a fraction of neurons within a layer during each training iteration, thereby reducing the interdependence between neurons and preventing the network from memorizing noise in the training data. This regularization technique has been widely adopted in various neural network architectures, including CNNs, RNNs, and Transformers, with significant success in enhancing performance on tasks such as image classification, NLP, and speech recognition [59]. The versatility and effectiveness of dropout make it a fundamental component in the toolkit of DL practitioners, facilitating the training of more robust and generalizable models. The dropout layer is placed between the RoBERTa and BiLSTM layers in the proposed hybrid model.

C. BiLSTM

BiLSTM, short for Bidirectional Long Short-Term Memory, represents a type of RNN architecture comprising two LSTM networks. One LSTM processes the input sequence from left to right (known as the forward LSTM), while the other processes it from right to left (known as the backward LSTM) [60], [61]. This bidirectional processing nature allows BiLSTM to comprehend sequences more effectively, rendering it particularly valuable in NLP tasks such as sentiment analysis [62]–[64], entity recognition [65], [66], and machine translation. Hochreiter and Schmidhuber [67] introduced the LSTM architecture to tackle the vanishing gradient problem in RNNs. Graves and Schmidhuber [60], [68] further investigated the effectiveness of BiLSTMs in tasks such as phoneme classification and handwriting recognition. Their work demonstrated the superiority of BiLSTMs over unidirectional models in capturing rich contextual information. Consequently, BiLSTMs have emerged as a widely adopted architecture for various sequential data analysis tasks, offering enhanced performance and robustness. In the proposed hybrid model, the output of the RoBERTa layer is passed through a dropout layer before being fed into the BiLSTM layer. The BiLSTM possesses the ability to retain past information and effectively manage long-range dependencies within the input. Therefore, integrating a BiLSTM as a feature extractor enhances the model's predictive accuracy by leveraging both the contextual information from RoBERTa and the long-range dependencies between tokens. The BiLSTM model architecture is described by the following set of equations [42], [68].

1) Input Gate (i_t):

i_t = \sigma(W_{ix}^{f} x_t + W_{ih}^{f} h_{t-1}^{f} + W_{ic}^{f} c_{t-1}^{f} + b_i^{f}) \odot \sigma(W_{ix}^{b} x_t + W_{ih}^{b} h_{t+1}^{b} + W_{ic}^{b} c_{t+1}^{b} + b_i^{b})   (2)

Equation (2) controls the flow of information into the cell state C_t at time step t in both the forward and backward directions. It combines the input from the forward LSTM (W_{ix}^{f} x_t + W_{ih}^{f} h_{t-1}^{f} + W_{ic}^{f} c_{t-1}^{f} + b_i^{f}) and the backward LSTM (W_{ix}^{b} x_t + W_{ih}^{b} h_{t+1}^{b} + W_{ic}^{b} c_{t+1}^{b} + b_i^{b}) using element-wise multiplication. The sigmoid function \sigma squashes the input to a range between 0 and 1, determining the extent to which each component affects the input gate.

2) Forget Gate (f_t):

f_t = \sigma(W_{fx}^{f} x_t + W_{fh}^{f} h_{t-1}^{f} + W_{fc}^{f} c_{t-1}^{f} + b_f^{f}) \odot \sigma(W_{fx}^{b} x_t + W_{fh}^{b} h_{t+1}^{b} + W_{fc}^{b} c_{t+1}^{b} + b_f^{b})   (3)

Equation (3) determines which information from the previous cell state c_{t-1} should be discarded or forgotten in both the forward and backward directions. It combines the forget gate computations from the forward and backward LSTMs using element-wise multiplication.

3) Cell State Update (C_t):

C_t = f_t \odot C_{t-1} + i_t \odot \tanh(W_{cx}^{f} x_t + W_{ch}^{f} h_{t-1}^{f} + b_c^{f}) + i_t \odot \tanh(W_{cx}^{b} x_t + W_{ch}^{b} h_{t+1}^{b} + b_c^{b})   (4)

Equation (4) updates the cell state C_t by combining information from both the forward and backward directions. It combines the contributions from the input gate in both directions and the tanh activations of the candidate values.

4) Output Gate (o_t):

o_t = \sigma(W_{ox}^{f} x_t + W_{oh}^{f} h_{t-1}^{f} + W_{oc}^{f} c_t + b_o^{f}) \odot \sigma(W_{ox}^{b} x_t + W_{oh}^{b} h_{t+1}^{b} + W_{oc}^{b} c_t + b_o^{b})   (5)

Equation (5) regulates which parts of the cell state C_t should be used to compute the hidden state h_t at the current time step t in both the forward and backward directions. It combines the output gate computations from both directions using element-wise multiplication.

5) Hidden State (h_t):

h_t = o_t \odot \tanh(C_t)   (6)

Equation (6) computes the hidden state h_t by applying the output gate to the hyperbolic tangent of the cell state C_t.

6) Output (y_t):

y_t = \mathrm{Softmax}(W_{hy} h_t + b_y)   (7)

Equation (7) generates the output prediction at time step t by applying a Softmax function to the linear transformation of the hidden state h_t.

D. Flatten Layer

In the proposed architecture of the RoBERTa-BiLSTM model, the flatten layer plays a pivotal role in facilitating the transition from the output of the BiLSTM layer to the subsequent dense layer. Specifically, the flatten layer serves to reshape the output of the BiLSTM layer into a format compatible with the input requirements of the dense layer. The output of the BiLSTM layer typically comprises a tensor with multiple dimensions, representing the sequential nature of the input data processed by the bidirectional LSTM units. However, the subsequent dense layer expects its input to be in the form of a flat tensor with a one-dimensional shape. To bridge this disparity in tensor shapes, the flatten layer performs the essential operation of transforming the multi-dimensional tensor output of the BiLSTM layer into a one-dimensional tensor [26]. This transformation is achieved by unrolling all the elements of the tensor, effectively converting it into a linear sequence.

E. Dense Layer

The dense layer, also known as the fully connected layer, plays a crucial role in capturing the relationships between the hidden states generated by the BiLSTM layer and the class labels. This layer establishes dense connectivity by connecting all neurons from the preceding layer to those in the subsequent layer. In the proposed model, two dense layers are incorporated. The first dense layer encodes the relationship between the flattened output of the BiLSTM and the class labels. It captures the underlying patterns in the data to facilitate classification. The second dense layer performs the final classification by generating a probability distribution using the Softmax activation function for the classes. The Softmax function squashes the output values to fall within the range of [0, 1], ensuring that the sum of the probabilities equals 1.

F. Softmax Layer

The Softmax layer serves as the top layer in the proposed hybrid model for sentiment classification. Also referred to as the classification or output layer, it applies the Softmax function to generate probability distributions for the classes. This function can be described as follows:

\mathrm{Softmax}(O)_j = \frac{e^{O_j}}{\sum_{k=1}^{M} e^{O_k}}   (8)

where M denotes the number of sentiment classes. The numerator, e^{O_j}, represents the exponential function applied to each element of O. The denominator, \sum_{k=1}^{M} e^{O_k}, denotes the sum of the exponential functions of all elements.
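The tensor-shape bookkeeping behind the Flatten, Dense, and Softmax layers can be checked with a few lines of PyTorch; the batch size, sequence length, and hidden size below are illustrative only and are not taken from the paper.

import torch
import torch.nn as nn

batch, seq_len, hidden = 2, 64, 256
bilstm_out = torch.randn(batch, seq_len, 2 * hidden)   # (2, 64, 512): BiLSTM output, both directions

flat = nn.Flatten()(bilstm_out)                        # (2, 32768): one-dimensional per sample
dense = nn.Linear(seq_len * 2 * hidden, 3)             # 3 sentiment classes
probs = torch.softmax(dense(flat), dim=-1)             # rows sum to 1, as in Eq. (8)

print(flat.shape, probs.shape, probs.sum(dim=-1))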

(a) Twitter US Airline (b) IMDb Review (c) Sentiment140
Fig. 2: The sample distribution of the Twitter US Airline, IMDb Review, and Sentiment140 datasets

V. DATASET

In this study, we utilized three publicly available datasets: IMDb, Twitter US Airline, and Sentiment140, to evaluate the performance of the proposed RoBERTa-BiLSTM model. Figure 2 illustrates the sample data distributions within these three datasets. The Twitter US Airline dataset [49] consists of 14,640 tweets categorized into three sentiment classes: positive, neutral, and negative. These tweets were gathered from customers of six major US airlines, namely American, Delta, US Airways, Virgin America, Southwest, and United, for sentiment analysis. The distribution of tweets across sentiment classes reveals an imbalance, with 62.69% of tweets classified as negative, 16.14% as positive, and 21.17% as neutral. The IMDb dataset [69] comprises a total of 50,000 reviews, evenly split between positive and negative reviews, resulting in a balanced dataset with 50% of samples allocated to each class. The Sentiment140 dataset [70] is a sizable collection of approximately 1.6 million tweets designed for sentiment analysis. The Sentiment140 dataset was compiled from Twitter by Stanford University in 2009. This dataset features an equal distribution of tweets across positive and negative classes, with each class representing 50% of the dataset.

A. Data Preprocessing

In sentiment analysis, data preprocessing plays a crucial role in filtering out irrelevant elements that could potentially hinder the performance of the ML models. Figure 3 illustrates the comprehensive data preprocessing steps involved in the proposed RoBERTa-BiLSTM approach. Since the texts or comments within the datasets are collected from users of platforms like IMDb or Twitter, they may contain a mix of upper- and lower-case text. Therefore, case folding is an essential preprocessing step to ensure a consistent text case. All texts are converted to lowercase to maintain uniformity. Moreover, irrelevant elements such as special symbols, punctuation, hashtags, URLs, special characters, and numbers are removed from the text. Another significant aspect of data preprocessing involves the elimination of stop-words, which carry little meaning in sentiment analysis. Stop-words, such as the, an, and a, are frequently occurring words in the text that are syntactically important but semantically less relevant. Furthermore, lemmatization [71] is applied to bring words to their original form, enhancing semantic consistency. For instance, the word caring is transformed into care after a lemmatization operation. Although both lemmatization and stemming serve the same purpose of reducing words to their base forms, lemmatization offers several advantages over stemming. The data preprocessing steps standardize the raw data by removing irrelevant information. This allows the LLMs to effectively process the filtered data, thereby enhancing the performance of sentiment analysis.

Fig. 3: The data preprocessing steps of the proposed approach
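A rough sketch of these preprocessing steps is given below. The paper does not name a particular toolkit, so NLTK and the regular expressions used here are assumptions.

import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

STOP_WORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()

def preprocess(text: str) -> str:
    text = text.lower()                                  # case folding
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # URLs
    text = re.sub(r"[#@]\w+", " ", text)                 # hashtags and mentions
    text = re.sub(r"[^a-z\s]", " ", text)                # punctuation, symbols, numbers
    tokens = [w for w in text.split() if w not in STOP_WORDS]
    # Lemmatization; note that NLTK maps "caring" to "care" only when lemmatized as a verb (pos="v").
    tokens = [LEMMATIZER.lemmatize(w) for w in tokens]
    return " ".join(tokens)

print(preprocess("Caring staff, but 2-hour delay!! http://example.com #badday"))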
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 8

VI. HYPERPARAMETERS

In the proposed RoBERTa-BiLSTM model, various sets of hyperparameters are employed to achieve superior results. The selection of an optimal set of hyperparameters is pivotal for accurately analyzing sentiments. RoBERTa and BERT are separately combined with different RNN architectures (GRU, LSTM, and BiLSTM; see Table I). During the training process, different learning rates (l) (i.e., l = {0.0001, 0.00001, 0.000001}) and hidden units (h) (i.e., h = {128, 256, 512}) of the RNN are utilized. Please note that the number of hidden units (h) is doubled (2×h) for the BiLSTM due to its forward (→) and backward (←) data processing nature. The loss function plays a crucial role in network optimization by computing the overall model loss in each training epoch. Given that sentiment analysis involves a multi-class classification problem, categorical cross-entropy is chosen as the loss function (L). It is defined as follows:

L(p) = -\sum_{i=1}^{M} y_i \log(\hat{y}_i)   (9)

where p denotes the model parameters; M denotes the number of classes; and y_i and \hat{y}_i are the true and predicted labels, respectively, for the i-th sample. Table I provides a summary of the hyperparameters and their corresponding values used in the experiments.

TABLE I: The sets of hyperparameters for model fine-tuning
Large Language Models (LLMs): BERT (bert-base-uncased [28]), RoBERTa (roberta-base [29])
Recurrent Neural Networks (RNNs): GRU, LSTM, BiLSTM
Optimization method: AdamW, SGD, RMSprop, Rprop
Loss function (L): Categorical Cross Entropy (cross_entropy)
Epochs (epoch): 5
Dropout (d): 0.1
Learning rates (l): 0.0001, 0.00001, 0.000001
Hidden units (h) of RNNs: 128, 256, 512
Datasets: IMDb, Twitter US Airline, Sentiment140
Training data: 90%
Validation data: 5%
Testing data: 5%

VII. EXPERIMENTAL RESULTS

In this section, we present the experimental results of the RoBERTa-base, RoBERTa-GRU, RoBERTa-LSTM, RoBERTa-BiLSTM, BERT-GRU, BERT-LSTM, and BERT-BiLSTM models on the IMDb, Twitter US Airline, and Sentiment140 datasets. Furthermore, the hyperparameters are meticulously fine-tuned to determine the most optimal settings and parameters for each model. Finally, we compare the overall performance of the proposed RoBERTa-BiLSTM model against state-of-the-art models.

A. Implementation Details

The experiments are conducted within the operating system environment of Ubuntu 22.04.4 LTS 64-bit. The hardware specifications are detailed as follows: Processor: AMD Ryzen 9 3950X 16-core processor with 32 threads; RAM: 64 GB; Graphics: NVIDIA GeForce RTX 3090/PCIe/SSE2; Graphics Memory: 24 GB; and Disk Capacity: 500 GB.

B. Evaluation Metrics

In the context of sentiment analysis, it is important to comprehend the underlying meaning of comments/texts and then classify them as positive, negative, or neutral. The proposed hybrid RoBERTa-BiLSTM model serves as a classifier designed specifically for sentiment analysis. To assess the performance of the classifier, a confusion matrix is employed. Metrics such as accuracy, precision, recall, and F1-score are derived using elements such as TP, FP, TN, and FN from the confusion matrix [42]. Accuracy (A), for instance, is defined as the ratio of correct predictions to the total number of predictions, as follows:

A = \frac{1}{N} \sum_{i=1}^{|M|} \sum_{x: f(x)=i} \Upsilon(f(x) = \hat{f}(x))   (10)

where \Upsilon is a function that returns 1 if the class is true and 0 otherwise. Here, f(x) ∈ M = {1, 2, 3, ···}. We also computed the precision, recall, and F1-score under weighted-average settings. The F1-score is derived from the mean of all F1-scores of individual classes, taking into account the support of each class. The term support refers to the number of instances per class, while weight denotes the ratio of instances of each class to the total instances. The weighted precision (P_w), recall (R_w), and F1-score (F1_w) are calculated in Eqs. (11)-(13):

P_w = \frac{1}{|Q|} \sum_{i=1}^{|M|} \frac{TP_i}{TP_i + FP_i} \times |q_i| = \frac{\sum_{i=1}^{|M|} P_i \times |q_i|}{|Q|}   (11)

R_w = \frac{1}{|Q|} \sum_{i=1}^{|M|} \frac{TP_i}{TP_i + FN_i} \times |q_i| = \frac{\sum_{i=1}^{|M|} R_i \times |q_i|}{|Q|}   (12)

F1_w = \frac{1}{|Q|} \sum_{i=1}^{|M|} F1_i \times |q_i|   (13)

where |Q| is the sum of all supports and |q_i| is the support of the i-th class.

Fig. 4: Assessing the test accuracy of the RoBERTa-base model using various hyperparameters: l = {0.0001, 0.00001, 0.000001}. The model is trained for 5 epochs using the AdamW optimizer.

TABLE II: Quantitative results (F1w , Pw , and Rw ) for sentiment analysis using the RoBERTa-base model are based on the
IMDb, Twitter US Airline, and Sentiment140 datasets
Learning Model IMDb Dataset Twitter US Airline Dataset Sentiment140 Dataset
Rate (l) Evaluation F1w Pw Rw F1w Pw Rw F1w Pw Rw
Training 0.333259 0.249933 0.499933 0.679082 0.677392 0.728218 0.333219 0.249897 0.499897
0.0001 Validation 0.342260 0.258064 0.508000 0.680925 0.677001 0.733607 0.337725 0.253960 0.503944
Test 0.325805 0.243246 0.493200 0.675668 0.690883 0.718579 0.331457 0.248312 0.498310
Training 0.956633 0.957143 0.956644 0.911728 0.912569 0.911278 0.859462 0.859888 0.859499
0.00001 Validation 0.908376 0.909582 0.908400 0.780347 0.784632 0.777322 0.818549 0.818888 0.818617
Test 0.913101 0.914400 0.913200 0.801151 0.807024 0.797814 0.821697 0.822092 0.821734
Training 0.917354 0.917387 0.917356 0.748574 0.746235 0.752353 0.333219 0.249897 0.499897
0.000001 Validation 0.903604 0.903632 0.903600 0.727966 0.727360 0.730874 0.337725 0.253960 0.503944
Test 0.900390 0.900429 0.900400 0.739893 0.738560 0.745902 0.331457 0.248312 0.498310

TABLE III: Quantitative results (F1w , Pw , and Rw ) for sentiment analysis using the RoBERTa-GRU model are based on the
IMDb, Twitter US Airline, and Sentiment140 datasets
Learning Model Hidden IMDb Dataset Twitter US Airline Dataset Sentiment140 Dataset
Rate (l) Evaluation Units (h) F1w Pw Rw F1w Pw Rw F1w Pw Rw
128 0.333407 0.250067 0.500067 0.633524 0.604036 0.675774 0.333219 0.249897 0.499897
Training 256 0.333407 0.250067 0.500067 0.497897 0.533193 0.633728 0.333219 0.249897 0.499897
512 0.333259 0.249933 0.499933 0.605418 0.541333 0.692927 0.333219 0.249897 0.499897
128 0.324483 0.242064 0.492000 0.598217 0.574599 0.633880 0.337725 0.253960 0.503944
0.0001 Validation 256 0.324483 0.242064 0.492000 0.516451 0.552385 0.648907 0.337725 0.253960 0.503944
512 0.342260 0.258064 0.508000 0.643967 0.585739 0.722678 0.337725 0.253960 0.503944
128 0.340916 0.256846 0.506800 0.585397 0.552140 0.631148 0.331457 0.248312 0.498310
Test 256 0.340916 0.256846 0.506800 0.477446 0.524948 0.612022 0.331457 0.248312 0.498310
512 0.325805 0.243246 0.493200 0.579547 0.516748 0.669399 0.331457 0.248312 0.498310
128 0.955797 0.955939 0.955800 0.870027 0.869620 0.873103 0.863399 0.863412 0.863400
Training 256 0.938483 0.938665 0.938489 0.820532 0.828654 0.832271 0.863389 0.863402 0.863390
512 0.939976 0.940030 0.939978 0.829548 0.832554 0.838039 0.863479 0.863479 0.863479
128 0.917603 0.917945 0.9176 0.789664 0.787591 0.795082 0.819446 0.819461 0.819443
0.00001 Validation 256 0.911955 0.912378 0.912000 0.764612 0.766811 0.782787 0.819859 0.819892 0.819856
512 0.918005 0.918166 0.918000 0.782206 0.782099 0.795082 0.819120 0.819128 0.819117
128 0.925966 0.926384 0.926000 0.793064 0.792070 0.795082 0.822510 0.822512 0.822511
Test 256 0.916803 0.916975 0.916800 0.771077 0.776701 0.784153 0.823194 0.823216 0.823199
512 0.919576 0.919801 0.919600 0.790581 0.794184 0.799180 0.822123 0.822124 0.822123
128 0.920795 0.920900 0.920800 0.808283 0.806835 0.811020 0.822880 0.822998 0.822894
Training 256 0.926133 0.926140 0.926133 0.813906 0.812793 0.817168 0.823222 0.823254 0.823226
512 0.926422 0.926423 0.926422 0.809176 0.807872 0.812234 0.823492 0.823514 0.823495
128 0.904380 0.904489 0.904400 0.751607 0.751851 0.751366 0.812731 0.812763 0.812744
0.000001 Validation 256 0.914787 0.914858 0.914800 0.758944 0.755749 0.763661 0.813230 0.813231 0.813232
512 0.912003 0.912016 0.912000 0.756455 0.754155 0.759563 0.813305 0.813306 0.813308
128 0.904404 0.904513 0.904400 0.773350 0.772348 0.774590 0.816133 0.816201 0.816137
Test 256 0.909602 0.909615 0.909600 0.772407 0.770946 0.774590 0.816200 0.816222 0.816200
512 0.911598 0.911599 0.911600 0.775682 0.774644 0.777322 0.816262 0.816284 0.816263

C. Results

Comprehensive experiments are conducted using the three datasets (IMDb, Twitter US Airline, and Sentiment140) and various hyperparameter settings to showcase the effectiveness of the proposed model. Table II presents the quantitative classification results of F1w, Pw, and Rw for the RoBERTa-base model with learning rates of l = 0.0001, 0.00001, and 0.000001. The RoBERTa-base model is a variant that is not combined with RNNs. The model is trained for 5 epochs using the AdamW optimizer. It is observed that the RoBERTa-base model achieved F1w, Pw, and Rw scores of 91.31%, 91.44%, and 91.32%, respectively, for the IMDb dataset; 80.12%, 80.70%, and 79.78%, respectively, for the Twitter US Airline dataset; and 82.17%, 82.21%, and 82.17%, respectively, for the Sentiment140 dataset with l = 0.00001. Similarly, the model achieved higher F1w, Pw, and Rw scores during training and validation. We experimented with faster learning rates (l = 0.0001) and slower learning rates (l = 0.000001) across the three datasets. However, the RoBERTa-base model failed to achieve better results compared to the outcomes obtained with l = 0.00001. Figure 4 illustrates that the RoBERTa-base model achieved comparatively higher A of 91.32%, 79.78%, and 82.17% for the IMDb, Twitter US Airline, and Sentiment140 datasets, respectively, when using l = 0.00001.

Tables III-V display the results of the F1w, Pw, and Rw metrics for the RoBERTa-GRU, RoBERTa-LSTM, and RoBERTa-BiLSTM models across the three datasets. These experiments adhere to specific hyperparameter settings (i.e., l = {0.0001, 0.00001, 0.000001}, epoch = 5, h = {128, 256, 512}, d = 0.1, and optimizer = AdamW). Evaluation encompasses all models across the training, validation, and test data splits. The RoBERTa-GRU model achieved F1w, Pw, and Rw scores of 92.60%, 92.64%, and 92.60%, respectively, for the IMDb; and 79.06%, 79.42%, and 79.92%, respectively, for the Twitter US Airline. It also garnered F1w, Pw, and Rw scores of 82.32% for the Sentiment140 dataset. The RoBERTa-LSTM model obtained F1w, Pw, and Rw scores of 92.04% for the IMDb; 80.32%, 80.47%, and 80.33%, respectively, for the Twitter US Airline; and F1w, Pw, and Rw scores of 82.29% for the Sentiment140 dataset.

TABLE IV: Quantitative results (F1w , Pw , and Rw ) for sentiment analysis using the RoBERTa-LSTM model are based on
the IMDb, Twitter US Airline, and Sentiment140 datasets
Learning Model Hidden IMDb Dataset Twitter US Airline Dataset Sentiment140 Dataset
Rate (l) Evaluation Units (h) F1w Pw Rw F1w Pw Rw F1w Pw Rw
128 0.333259 0.249933 0.499933 0.483979 0.393857 0.627580 0.333219 0.249897 0.499897
Training 256 0.730056 0.767716 0.737644 0.606745 0.552990 0.698467 0.333219 0.249897 0.499897
512 0.333259 0.249933 0.499933 0.624402 0.657810 0.689891 0.333219 0.249897 0.499897
128 0.342260 0.258064 0.508000 0.503843 0.414018 0.643443 0.337725 0.253960 0.503944
0.0001 Validation 256 0.729988 0.763215 0.736000 0.633612 0.590176 0.717213 0.337725 0.253960 0.503944
512 0.342260 0.258064 0.508000 0.626151 0.587101 0.684426 0.337725 0.253960 0.503944
128 0.325805 0.243246 0.493200 0.448003 0.358035 0.598361 0.331457 0.248312 0.498310
Test 256 0.721702 0.755312 0.729600 0.597086 0.555527 0.691257 0.331457 0.248312 0.498310
512 0.325805 0.243246 0.493200 0.604107 0.555312 0.677596 0.331457 0.248312 0.498310
128 0.953644 0.953644 0.953644 0.828337 0.830639 0.826579 0.860572 0.860574 0.860572
Training 256 0.938178 0.938186 0.938178 0.871825 0.871062 0.873862 0.844569 0.844902 0.844603
512 0.938999 0.939022 0.939000 0.877842 0.877212 0.879781 0.862044 0.862074 0.862047
128 0.919198 0.919200 0.919200 0.771294 0.775925 0.767760 0.820595 0.820595 0.820595
0.00001 Validation 256 0.915990 0.916037 0.916000 0.783467 0.781502 0.786885 0.818002 0.818426 0.818028
512 0.917602 0.917608 0.917600 0.778456 0.776641 0.782787 0.819945 0.820022 0.819944
128 0.920399 0.920400 0.920400 0.795464 0.803598 0.790984 0.822586 0.822589 0.822586
Test 256 0.918003 0.918030 0.918000 0.803185 0.804711 0.803279 0.821310 0.821626 0.821359
512 0.917596 0.917610 0.917600 0.799630 0.801161 0.800546 0.822894 0.822914 0.822899
128 0.923443 0.923466 0.923444 0.805770 0.804582 0.809047 0.822898 0.822926 0.822901
Training 256 0.919124 0.919325 0.919133 0.807996 0.806568 0.810716 0.822861 0.822894 0.822865
512 0.924955 0.924972 0.924956 0.804269 0.802848 0.807301 0.823017 0.823042 0.823019
128 0.908000 0.908000 0.908000 0.749337 0.747838 0.751366 0.812013 0.812018 0.812018
0.000001 Validation 256 0.905160 0.905476 0.905200 0.754998 0.753753 0.756831 0.812903 0.812906 0.812907
512 0.913603 0.913616 0.913600 0.748105 0.747633 0.748634 0.812691 0.812693 0.812694
128 0.909191 0.908000 0.909200 0.785478 0.784112 0.788251 0.815862 0.815879 0.815862
Test 256 0.900796 0.901254 0.900800 0.775626 0.774391 0.777322 0.815974 0.815999 0.815975
512 0.911197 0.911204 0.911200 0.777189 0.775390 0.780055 0.815548 0.815577 0.815549

TABLE V: Quantitative results (F1w , Pw , and Rw ) for sentiment analysis using the RoBERTa-BiLSTM model are based on
the IMDb, Twitter US Airline, and Sentiment140 datasets
Learning Model Hidden IMDb Dataset Twitter US Airline Dataset Sentiment140 Dataset
Rate (l) Evaluation Units (h) F1w Pw Rw F1w Pw Rw F1w Pw Rw
128 0.333259 0.249933 0.499933 0.576871 0.522566 0.675622 0.333219 0.249897 0.499897
Training 256 0.835726 0.835790 0.835733 0.652124 0.650691 0.70636 0.333219 0.249897 0.499897
512 0.333259 0.249933 0.499933 0.483979 0.393857 0.627580 0.333219 0.249897 0.499897
128 0.342260 0.258064 0.508000 0.603579 0.562282 0.693989 0.337725 0.253960 0.503944
0.0001 Validation 256 0.834410 0.834507 0.834400 0.659741 0.646649 0.70765 0.337725 0.253960 0.503944
512 0.342260 0.258064 0.508000 0.503843 0.414018 0.643443 0.337725 0.253960 0.503944
128 0.325805 0.243246 0.493200 0.552052 0.507510 0.654372 0.331457 0.248312 0.498310
Test 256 0.837603 0.837611 0.837600 0.622737 0.609742 0.68306 0.331457 0.248312 0.498310
512 0.325805 0.243246 0.493200 0.448003 0.358035 0.598361 0.331457 0.248312 0.498310
128 0.954020 0.954107 0.954022 0.828071 0.832336 0.825668 0.861205 0.861205 0.861205
Training 256 0.958036 0.958392 0.958044 0.881230 0.880769 0.882210 0.861912 0.861914 0.861912
512 93.7617 93.7767 93.7622 0.834221 0.834907 0.840012 0.861940 0.861940 0.861940
128 0.918779 0.918942 0.918800 0.779927 0.785797 0.775956 0.820196 0.820205 0.820194
0.00001 Validation 256 0.922786 0.923801 0.922800 0.779584 0.780695 0.780055 0.820573 0.820604 0.820570
512 0.913549 0.914065 0.913600 0.771039 0.767785 0.780055 0.820208 0.820212 0.820207
128 0.918803 0.918842 0.918800 0.784169 0.792642 0.780055 0.822361 0.822362 0.822361
Test 256 0.923529 0.924562 0.923600 0.807334 0.809353 0.807377 0.822485 0.822486 0.822486
512 0.916802 0.917067 0.916800 0.798464 0.797720 0.803279 0.821984 0.821985 0.821985
128 0.922776 0.922821 0.922778 0.804605 0.803048 0.808288 0.822227 0.822256 0.822230
Training 256 0.922800 0.922800 0.922800 0.806545 0.805071 0.810033 0.822150 0.822182 0.822154
512 0.919013 0.919213 0.919022 0.803353 0.801858 0.806846 0.822890 0.822913 0.822893
128 0.911603 0.911621 0.911600 0.759718 0.757919 0.762295 0.812627 0.812631 0.812631
0.000001 Validation 256 0.910399 0.910399 0.910400 0.750240 0.748403 0.752732 0.812564 0.812569 0.812569
512 0.905163 0.905443 0.905200 0.747260 0.744380 0.751366 0.812579 0.812580 0.812581
128 0.908792 0.908826 0.908800 0.789076 0.787694 0.790984 0.815298 0.815319 0.815299
Test 256 0.909201 0.909205 0.909200 0.784110 0.782377 0.786885 0.815573 0.815602 0.815574
512 0.906404 0.906574 0.906400 0.781751 0.780117 0.784153 0.815875 0.815891 0.815874

On the other hand, the RoBERTa-BiLSTM model achieved F1w, Pw, and Rw scores of 92.35%, 92.46%, and 92.36%, respectively, for the IMDb; 80.73%, 80.94%, and 80.74%, respectively, for the Twitter US Airline; and 82.25% for the Sentiment140 dataset with l = 0.00001 and h = 256. Furthermore, RoBERTa-BiLSTM obtained superior F1w, Pw, and Rw scores of 95.80%, 95.84%, and 95.80%, respectively, during training, and 92.28%, 92.38%, and 92.28%, respectively, during validation on the Sentiment140 dataset when compared to the other models. It can be seen that the RoBERTa-GRU and RoBERTa-LSTM models failed to outperform the RoBERTa-BiLSTM model on the IMDb and Twitter US Airline datasets.

(a) RoBERTa-GRU (b) RoBERTa-LSTM (c) RoBERTa-BiLSTM
Fig. 5: Assessing the test accuracy of the RoBERTa-GRU, RoBERTa-LSTM, and RoBERTa-BiLSTM models using a range of hyperparameters: l = {0.0001, 0.00001, 0.000001}, h = {128, 256, 512}. The models are trained for 5 epochs using the AdamW optimizer on the IMDb dataset.
(a) RoBERTa-GRU (b) RoBERTa-LSTM (c) RoBERTa-BiLSTM
Fig. 6: Assessing the test accuracy of the RoBERTa-GRU, RoBERTa-LSTM, and RoBERTa-BiLSTM models using a range of hyperparameters: l = {0.0001, 0.00001, 0.000001}, h = {128, 256, 512}. The models are trained for 5 epochs using the AdamW optimizer on the Twitter US Airline dataset.

Fig. 7: Assessing the test accuracy of the RoBERTa-GRU, RoBERTa-LSTM, and RoBERTa-BiLSTM models using a range
of hyperparameters: l = {0.0001, 0.00001, 0.000001}, h = {128, 256, 512}. The models are trained for 5 epochs using the
AdamW optimizer on the Sentiment140 dataset.

performance (with F1w, Pw, and Rw scores of 82.32% for the Test data split) on the Sentiment140 dataset compared to the other models.
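The F1w, Pw, and Rw values reported throughout this section are class-weighted averages, and A denotes accuracy. The exact evaluation script is not shown in the paper; the snippet below is only a sketch of how such weighted scores are typically computed with scikit-learn.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def weighted_scores(y_true, y_pred):
    # Weighted precision (Pw), recall (Rw), and F1 (F1w), plus accuracy (A)
    p_w, r_w, f1_w, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0
    )
    return {"Pw": p_w, "Rw": r_w, "F1w": f1_w, "A": accuracy_score(y_true, y_pred)}

# Example with three sentiment classes (0 = negative, 1 = neutral, 2 = positive)
print(weighted_scores([0, 1, 2, 2, 0], [0, 1, 2, 0, 0]))
```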
Figures 5-7 illustrate the accuracy of the RoBERTa-GRU, RoBERTa-LSTM, and RoBERTa-BiLSTM models with various hyperparameters across three datasets. The RoBERTa-GRU model achieved a higher A score of 92.60% with l = 0.00001 and h = 128 for the IMDb dataset, 79.92% with l = 0.00001 and h = 512 for the Twitter US Airline dataset, and 82.32% with l = 0.00001 and h = 256 for the Sentiment140 dataset. Similarly, the RoBERTa-LSTM model obtained higher A scores of 92.04%, 80.33%, and 82.29% for the IMDb, Twitter US Airline, and Sentiment140 datasets, respectively, with l = 0.00001 and h values of 128, 256, and 512. In contrast, the RoBERTa-BiLSTM model achieved superior A scores of 92.36%, 80.74%, and 82.25% for the IMDb, Twitter US Airline, and Sentiment140 datasets, respectively, with l = 0.00001 and h = 256. It is evident that the RoBERTa-GRU and RoBERTa-LSTM models attained higher A scores with different h values (128, 256, and 512) across datasets.
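For orientation, the grid behind Figures 5-7 amounts to a simple sweep over l and h. The helper below is an illustrative sketch, not the authors' code; train_and_evaluate stands for a hypothetical routine that fine-tunes a model for 5 epochs with AdamW and returns test accuracy.

```python
from itertools import product
from typing import Callable, Dict, Tuple

def sweep(train_and_evaluate: Callable[[float, int], float]) -> Dict[Tuple[float, int], float]:
    # Evaluate every (l, h) pair used in Figures 5-7
    learning_rates = [1e-4, 1e-5, 1e-6]   # l
    hidden_units = [128, 256, 512]        # h
    return {(l, h): train_and_evaluate(l, h)
            for l, h in product(learning_rates, hidden_units)}
```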

TABLE VI: Quantitative results for sentiment analysis employing the RoBERTa-GRU, RoBERTa-LSTM, and RoBERTa-
BiLSTM models with three distinct optimizers (SGD, RMSprop, and Rprop), alongside fixed hyperparameters of l = 0.00001
and h = 256, are presented. The models are trained for 5 epochs on the IMDb, Twitter US Airline, and Sentiment140 datasets.
Optimizer | Model | Evaluation | IMDb Dataset: F1w Pw Rw | Twitter US Airline Dataset: F1w Pw Rw | Sentiment140 Dataset: F1w Pw Rw
Training 0.963533 0.963570 0.963533 0.484001 0.393887 0.627580 0.746607 0.746608 0.746607
RoBERTa-LSTM Validation 0.916404 0.916431 0.916400 0.503843 0.414018 0.643443 0.747410 0.747415 0.747408
Test 0.920796 0.920810 0.920800 0.448003 0.358035 0.598361 0.748623 0.748630 0.748623
Training 0.965733 0.965738 0.965733 0.483979 0.393857 0.627580 0.741888 0.741962 0.741904
SGD RoBERTa-BiLSTM Validation 0.925197 0.925202 0.925200 0.503843 0.414018 0.643443 0.741821 0.741940 0.741824
Test 0.929595 0.929620 0.929600 0.448003 0.358035 0.598361 0.742817 0.742878 0.742838
Training 0.626660 0.637177 0.630911 0.483979 0.393857 0.627580 0.745768 0.746113 0.745840
RoBERTa-GRU Validation 0.628920 0.637516 0.633200 0.503843 0.414018 0.643443 0.745893 0.746358 0.745943
Test 0.612301 0.626618 0.617600 0.448003 0.358035 0.598361 0.745891 0.746187 0.745968
Training 0.953308 0.954296 0.953333 0.882130 0.883383 0.881603 0.835982 0.836355 0.836021
RoBERTa-LSTM Validation 0.915163 0.916821 0.915200 0.792620 0.800576 0.789617 0.812310 0.812576 0.812369
Test 0.919888 0.921577 0.920000 0.797753 0.806394 0.795082 0.814887 0.815251 0.814923
Training 0.964042 0.964182 0.964044 0.881943 0.881552 0.882438 0.837697 0.837741 0.837702
RMSprop RoBERTa-BiLSTM Validation 0.924805 0.924905 0.924800 0.784245 0.785119 0.784153 0.813109 0.813181 0.813107
Test 0.927978 0.928223 0.928000 0.798651 0.801249 0.797814 0.817244 0.817276 0.817252
Training 0.958806 0.959552 0.958822 0.885970 0.886035 0.885929 0.823524 0.823648 0.823538
RoBERTa-GRU Validation 0.919167 0.920749 0.919200 0.783879 0.783694 0.784153 0.805222 0.805287 0.805244
Test 0.920689 0.922380 0.920800 0.809253 0.814673 0.806011 0.808515 0.808637 0.808524
Training 0.962152 0.962310 0.962156 0.815864 0.814367 0.820431 0.795564 0.795577 0.795566
RoBERTa-LSTM Validation 0.917604 0.917839 0.917600 0.775373 0.772986 0.781421 0.790234 0.790249 0.790231
Test 0.919178 0.919386 0.919200 0.781915 0.780306 0.785519 0.792521 0.792522 0.792522
Training 0.928581 0.929063 0.9286 0.907253 0.907184 0.908925 0.822915 0.823134 0.822941
Rprop RoBERTa-BiLSTM Validation 0.917997 0.918589 0.9180 0.781882 0.779601 0.786885 0.804620 0.804749 0.804655
Test 0.909949 0.910488 0.9100 0.798375 0.798201 0.800546 0.806506 0.806782 0.806534
Training 0.968666 0.968710 0.968667 0.926327 0.926314 0.926457 0.795433 0.795617 0.795461
RoBERTa-GRU Validation 0.917605 0.917779 0.917600 0.801109 0.801719 0.800546 0.789173 0.789467 0.789192
Test 0.919581 0.919746 0.919600 0.801475 0.804779 0.799180 0.791183 0.791363 0.791220

On the other hand, the RoBERTa-BiLSTM model consistently yielded better A scores across datasets when utilizing l = 0.00001 and h = 256. The RoBERTa-BiLSTM model outperformed the other models when employing the hyperparameter settings l = 0.00001 and h = 256 across datasets.

Moreover, the impact of different optimizers on model performance is explored, as detailed in Table VI. Three additional optimizers (SGD, RMSprop, and Rprop [72]) are examined, each with a learning rate (l = 0.00001) and hidden units (h = 256), showcasing their effects across datasets. The RoBERTa-BiLSTM model achieved the highest F1w scores of 92.96% and 92.80% using the SGD and RMSprop optimizers, respectively, for the IMDb dataset. Meanwhile, the RoBERTa-GRU model obtained a slightly higher F1w score of 91.96% compared to the RoBERTa-BiLSTM model (which achieved an F1w score of 91.00%) when employing the Rprop optimizer. For the Twitter US Airline dataset, all models yielded suboptimal results with SGD; among them, the RoBERTa-BiLSTM model obtained the higher F1w, Pw, and Rw scores of 44.80%, 35.80%, and 59.84%, respectively. However, notable enhancements are observed with the RMSprop and Rprop optimizers, where the RoBERTa-GRU model achieves F1w scores of 80.93% and 80.15%, respectively. On the other hand, across the Sentiment140 dataset, all models exhibit nearly identical results with SGD, hovering around F1w, Pw, and Rw scores of approximately 74.50%. Yet, the RoBERTa-BiLSTM model demonstrates better performance, achieving higher F1w scores of 81.72% and 80.65% with RMSprop and Rprop, respectively, compared to the other models. It is clear that in most cases, the models yield inferior results when utilizing these alternative optimizers (SGD, RMSprop, and Rprop) in comparison to those attained with the AdamW optimizer.
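Switching among these optimizers amounts to changing a single constructor call while keeping every other training setting fixed. The helper below is a sketch using the standard torch.optim classes, not the authors' training code.

```python
import torch

def build_optimizer(name: str, params, lr: float = 1e-5) -> torch.optim.Optimizer:
    # One of the optimizers compared in Table VI, plus AdamW as the default baseline
    choices = {
        "AdamW": torch.optim.AdamW,
        "SGD": torch.optim.SGD,
        "RMSprop": torch.optim.RMSprop,
        "Rprop": torch.optim.Rprop,
    }
    return choices[name](params, lr=lr)
```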
Fig. 8: Assessing the test accuracy of the BERT-GRU, BERT-LSTM, and BERT-BiLSTM models using the hyperparameter set l = 0.00001 and h = 256. The models are trained for 5 epochs using the AdamW optimizer.

Furthermore, experiments are conducted using the BERT-GRU, BERT-LSTM, and BERT-BiLSTM models with identical hyperparameter settings (i.e., l = 0.00001, h = 256, optimizer=AdamW) across datasets. Table VII illustrates that the BERT-GRU model achieved F1w scores of 91.11%, 77.57%, and 81.83% for the IMDb, Twitter US Airline, and Sentiment140 datasets, respectively, when evaluated with the test data splits. Similarly, the BERT-LSTM model garnered F1w scores of 91.32%, 77.72%, and 81.75% for the same

TABLE VII: Quantitative results for sentiment analysis using the BERT-GRU, BERT-LSTM, and BERT-BiLSTM models with
the hyperparameter set l = 0.00001, h = 256, and optimizer= AdamW. The models are trained for 5 epochs on the IMDb,
Twitter US Airline, and Sentiment140 datasets.
Model | Evaluation | IMDb Dataset: F1w Pw Rw | Twitter US Airline Dataset: F1w Pw Rw | Sentiment140 Dataset: F1w Pw Rw
Training 0.959173 0.959407 0.959178 0.809176 0.807872 0.812234 0.857501 0.857591 0.857509
BERT-GRU Validation 0.906006 0.906140 0.906000 0.756455 0.754155 0.759563 0.814906 0.815074 0.814910
Test 0.911175 0.911380 0.911200 0.775682 0.774644 0.777322 0.818300 0.818387 0.818316
Training 0.957062 0.957256 0.957067 0.804269 0.802848 0.807301 0.857269 0.857271 0.857269
BERT-LSTM Validation 0.908406 0.908540 0.908400 0.748105 0.747633 0.748634 0.815563 0.815568 0.815561
Test 0.913163 0.913528 0.913200 0.777189 0.775390 0.780055 0.817501 0.817503 0.817502
Training 0.958131 0.958214 0.958133 0.803353 0.801858 0.806846 0.856565 0.856565 0.856565
BERT-BiLSTM Validation 0.910402 0.910409 0.910400 0.747260 0.744380 0.751366 0.814562 0.814568 0.814560
Test 0.912382 0.912518 0.912400 0.781751 0.780117 0.784153 0.818091 0.818090 0.818091


Fig. 9: Comparisons of F1w and A scores among BERT-BiLSTM, RoBERTa-base, RoBERTa-GRU, RoBERTa-LSTM, and
RoBERTa-BiLSTM models, considering hyperparameters l = 0.00001, h = 256, and optimizer=AdamW across the IMDb,
Twitter US Airline, and Sentiment140 datasets.

datasets. In contrast, the BERT-BiLSTM model obtained F1w scores of 91.24%, 78.18%, and 81.81% for the respective datasets. Figure 8 presents a comparative analysis of test accuracy among the BERT-based models. The BERT-BiLSTM model achieved higher A scores of 91.24%, 79.37%, and 81.81% for the IMDb, Twitter US Airline, and Sentiment140 datasets, respectively. The BERT-BiLSTM model performed relatively better compared to the others and consistently garnered higher results across all datasets. However, none of the BERT models achieved superior results compared to any of the RoBERTa-based models.

A comparison among the best-performing models, including BERT-BiLSTM, RoBERTa-base, RoBERTa-GRU, RoBERTa-LSTM, and RoBERTa-BiLSTM, is conducted, considering hyperparameters l = 0.00001, h = 256, and optimizer=AdamW across the IMDb, Twitter US Airline, and Sentiment140 datasets, as illustrated in Figure 9. Figure 9a demonstrates that the proposed RoBERTa-BiLSTM model achieves the highest F1w and A scores of 92.35% and 92.36%, respectively, surpassing other top-performing models on the IMDb dataset. Similarly, for the Twitter US Airline dataset, the RoBERTa-BiLSTM model attains F1w and A scores of 80.73% and 80.74%, respectively (Figure 9b), enhancing classification performance by approximately 0.40% compared to the nearest best model, RoBERTa-LSTM (F1w and A scores of 80.32% and 80.33%, respectively). On the other hand, for the Sentiment140 dataset, the RoBERTa-GRU model achieves F1w and A scores of 82.32% (Figure 9c), while the RoBERTa-BiLSTM model achieves F1w and A scores of 82.25%, slightly lower (by 0.07%) than the best model performance.

VIII. DISCUSSION

In this section, we discuss the performance of the proposed RoBERTa-BiLSTM model and conduct a comparative analysis with various ML and DL methods within the realm of sentiment analysis. The discussion addresses the impact of data augmentation on model performance for imbalanced datasets. Additionally, we examine the scalability and limitations of the proposed model.

A. Performance Analysis

This paper introduces a hybrid approach combining LLM and RNN for sentiment analysis. We conduct comprehensive experiments using various models, including BERT, RoBERTa, RoBERTa-GRU, RoBERTa-LSTM, and RoBERTa-BiLSTM, with different hyperparameter sets (such as learning rate (l), optimizers, and hidden RNN units (h)), across three datasets: IMDb, Twitter US Airline, and Sentiment140.
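To make the shared training setup concrete, the sketch below fine-tunes an off-the-shelf RoBERTa classifier for 5 epochs with AdamW at l = 0.00001. It is only a simplified stand-in for the hybrid models and omits the preprocessing and data splits described earlier.

```python
import torch
from transformers import RobertaTokenizerFast, RobertaForSequenceClassification

def finetune(texts, labels, num_classes, epochs=5, lr=1e-5, batch_size=32):
    # Simplified stand-in for the reported setup (AdamW, 5 epochs, l = 1e-5)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tok = RobertaTokenizerFast.from_pretrained("roberta-base")
    model = RobertaForSequenceClassification.from_pretrained(
        "roberta-base", num_labels=num_classes).to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for i in range(0, len(texts), batch_size):
            batch = tok(texts[i:i + batch_size], padding=True, truncation=True,
                        max_length=128, return_tensors="pt").to(device)
            y = torch.tensor(labels[i:i + batch_size]).to(device)
            loss = model(**batch, labels=y).loss
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```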

TABLE VIII: Experimental results (P, R, F1, and A) comparing ML models with the proposed RoBERTa-BiLSTM model for sentiment analysis on the IMDb, Twitter US Airline, and Sentiment140 datasets.
ML Models | IMDb Dataset: P R F1 A | Twitter US Airline Dataset: P R F1 A | Sentiment140 Dataset: P R F1 A
NB [18] 0.87 0.87 0.87 0.8701 0.79 0.44 0.45 0.6950 0.77 0.77 0.77 0.7657
LR [22] 0.90 0.90 0.90 0.8712 0.78 0.69 0.72 0.8050 0.78 0.78 0.78 0.7801
DT [73] 0.74 0.73 0.73 0.7346 0.62 0.56 0.58 0.7114 0.69 0.62 0.59 0.6234
KNN [22] 0.78 0.77 0.77 0.7737 0.60 0.60 0.60 0.6841 0.66 0.60 0.57 0.6039
AdaBoost [74] 0.83 0.83 0.83 0.8337 0.67 0.63 0.65 0.7459 0.71 0.70 0.69 0.6994
RoBERTa-BiLSTM 0.9246 0.9236 0.9235 0.9236 0.8094 0.8074 0.8073 0.8074 0.8225 0.8225 0.8225 0.8225

TABLE IX: Experimental results (P, R, F1, and A) comparing DL models with the proposed RoBERTa-BiLSTM model for sentiment analysis on the IMDb, Twitter US Airline, and Sentiment140 datasets.
DL Models | IMDb Dataset: P R F1 A | Twitter US Airline Dataset: P R F1 A | Sentiment140 Dataset: P R F1 A
GRU [75] 0.88 0.88 0.88 0.8788 0.73 0.71 0.72 0.7855 0.78 0.78 0.78 0.7896
LSTM [75] 0.85 0.85 0.85 0.8511 0.71 0.69 0.69 0.7756 0.79 0.79 0.79 0.7910
BiLSTM [76] 0.87 0.86 0.86 0.8628 0.71 0.69 0.70 0.7746 0.78 0.78 0.78 0.7853
CNN-LSTM [77] 0.86 0.86 0.86 0.88607 0.68 0.69 0.69 0.7602 0.77 0.77 0.77 0.7753
CNN-BiLSTM [45] 0.86 0.86 0.86 0.8616 0.70 0.65 0.67 0.7732 0.77 0.77 0.77 0.7758
RoBERTa-BiLSTM 0.9246 0.9236 0.9235 0.9236 0.8094 0.8074 0.8073 0.8074 0.8225 0.8225 0.8225 0.8225

Tables II-VI present the F1w, Pw, and Rw scores obtained by these models across datasets. Among these models, RoBERTa-BiLSTM achieved the highest F1w scores of 92.35%, 80.73%, and 82.25% for the IMDb, Twitter, and Sentiment140 datasets, respectively, surpassing the performance of RoBERTa-base, RoBERTa-GRU, and RoBERTa-LSTM. Furthermore, the experiments encompass various BERT-based models, including BERT-GRU, BERT-LSTM, and BERT-BiLSTM, as presented in Table VII. It is evident that none of the BERT models outperforms the RoBERTa-based models on any dataset. Figure 9 presents a comparative analysis of the experimented models, showcasing the effectiveness of the RoBERTa-BiLSTM model across all three datasets by consistently achieving top results.

To ensure a fair comparison with the proposed RoBERTa-BiLSTM model, we consider the performance of both ML and DL models for sentiment analysis on the IMDb, Twitter, and Sentiment140 datasets. Table VIII displays the performance of ML models such as NB, LR, DT, AdaBoost, and KNN alongside RoBERTa-BiLSTM on these datasets. Among the ML models, LR [22] achieved the highest A and F1 scores of 87.12% and 90.00%, respectively, on the IMDb dataset. However, the proposed RoBERTa-BiLSTM model surpassed these results with A and F1 scores of 92.36% and 92.35%, respectively, marking an improvement of approximately 5.00% in A and 2.35% in F1 score over the LR model. Similarly, on the Twitter dataset, LR [22] achieved the top A and F1 scores of 80.50% and 72.00%, respectively, among the ML models. Nevertheless, these scores are lower (by about 0.25% in A and 8.00% in F1 score) than those of the RoBERTa-BiLSTM model. For the Sentiment140 dataset, LR [22] again attained the highest A and F1 scores of 78.01% and 78.00%, respectively, among the ML models. In contrast, the RoBERTa-BiLSTM model attained A and F1 scores of 82.25%, marking a notable improvement of approximately 4.00% in both A and F1 scores compared to the ML models.

Table IX illustrates the performance of DL models for the sentiment analysis task across the same datasets. The GRU [75] model achieved higher A and F1 scores of 87.88% and 88.00%, respectively, for the IMDb dataset, and 78.55% and 72.00%, respectively, for the Twitter dataset. In comparison, the proposed RoBERTa-BiLSTM model attained A and F1 scores of 92.36% and 92.35%, respectively, for the IMDb dataset, and 80.74% and 80.73%, respectively, for the Twitter dataset. The proposed model demonstrated improvements of approximately 5.00% in A and F1 scores for the IMDb dataset and about 2.20% in A and 8.00% in F1 scores for the Twitter US Airline dataset compared to the GRU [75] model. Conversely, the LSTM [75] model achieved A and F1 scores of 79.10% and 79.00% for the Sentiment140 dataset, while the RoBERTa-BiLSTM model obtained A and F1 scores of 82.25%, marking an enhancement of 3.25% in A and F1 scores compared to LSTM [75].

Fig. 10: Twitter US Airline dataset before and after data augmentation: (a) Before Augmentation, (b) After Augmentation.

The pretrained RoBERTa model, having been trained on vast amounts of text data, possesses the ability to discern intricate patterns and relationships between words and phrases. Additionally, its dynamic masking patterns enable it to generalize and adapt to new text sequences. RoBERTa encodes lengthy text sequences into word embedding representations, while the BiLSTM model excels at capturing long-distance

dependencies within the input by processing it in both forward and backward directions. The proposed RoBERTa-BiLSTM model capitalizes on the power of both the RoBERTa and BiLSTM models. Experimental results and comparisons highlight the efficacy of the RoBERTa-BiLSTM model in the sentiment analysis task.
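The hybrid design described above can be sketched in PyTorch as follows. The pooling choice, layer sizes, and class name are illustrative assumptions rather than the authors' exact implementation.

```python
import torch.nn as nn
from transformers import RobertaModel

class RobertaBiLSTM(nn.Module):
    # Sketch of the hybrid idea: RoBERTa token embeddings fed to a BiLSTM classifier
    def __init__(self, num_classes: int, hidden_units: int = 256):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained("roberta-base")
        self.bilstm = nn.LSTM(
            input_size=self.encoder.config.hidden_size,  # 768 for roberta-base
            hidden_size=hidden_units,
            batch_first=True,
            bidirectional=True,
        )
        self.classifier = nn.Linear(2 * hidden_units, num_classes)

    def forward(self, input_ids, attention_mask):
        # Contextual token embeddings from RoBERTa
        embeddings = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        # The BiLSTM reads the sequence in both directions
        lstm_out, _ = self.bilstm(embeddings)
        # Mean-pool over tokens before classification (an assumed pooling choice)
        return self.classifier(lstm_out.mean(dim=1))
```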
B. Impact of Data Augmentation

Data augmentation aims to balance the class samples by increasing the sample size of minority classes [78]. In this study, the Twitter US Airline dataset exhibits class imbalance, as depicted in Figure 2a. The objective of data augmentation is to evaluate the performance of the RoBERTa-BiLSTM model before and after augmentation. Figure 10 depicts the Twitter US Airline dataset before and after augmentation. We synthetically generate samples for the neutral and positive classes to match the sample count of the negative class. Figure 11 compares the performance of the RoBERTa-BiLSTM model on the Twitter US Airline dataset before and after augmentation. Post-augmentation, the model achieves F1w and A scores of 95.74% and 95.77%, respectively, representing a notable improvement of approximately 15% in both A and F1w scores. Thus, augmenting the imbalanced dataset significantly enhances the model's performance.

Fig. 11: Comparing the performance of the RoBERTa-BiLSTM model before and after data augmentation on the Twitter US Airline dataset.
C. Scalability respectively, along with F1w scores of 84.53%, 83.70%,
Sentiment analysis is a text analytical application used to and 84.75%, respectively. Notably, the proposed RoBERTa-
discern the polarity of text/comments by accurately interpret- BiLSTM model improves sentiment analysis an average A by
ing the underlying meaning of the provided text. The proposed 0.70% compared to the RoBERTa-base model and by 0.36%
RoBERTa-BiLSTM model showcases its efficacy in sentiment compared to the RoBERTa-LSTM model. The proposed model
analysis tasks by achieving higher A scores (92.36%, 80.74%, demonstrates superior performance on imbalanced datasets,
and 82.25% for the IMDb, Twitter, and Sentiment140 datasets, such as Twitter US Airline. Moreover, we fine-tuned various
respectively) in comment classification. Additionally, the pro- hyperparameters to assess their impact on model performance.
posed model notably enhances the average A score of 0.70% Additionally, we explored the suitability and scalability of the
compared to RoBERTa-base model. These findings suggest proposed model across other application domains. The integra-
that the proposed RoBERTa-BiLSTM model holds potential tion of RoBERTa and BiLSTM proves to be a powerful, steady,
for various application domains such as business, economics, and efficient approach for sentiment analysis, positioning the
politics, education, and programming. proposed model as a potential candidate for various NLP tasks.

REFERENCES

[1] P. S. Jothi, M. Neelamalar, and R. S. Prasad, "Analysis of social networking sites: A study on effective communication strategy in developing brand communication," Journal of Media and Communication Studies, vol. 3, no. 7, pp. 234-242, 2011.
[2] A. Bruns and T. Highfield, "Is Habermas on Twitter?: Social media and the public sphere," in The Routledge Companion to Social Media and Politics. Routledge, 2015, pp. 56-73.
[3] F. Xiong and Y. Liu, "Opinion formation on social media: an empirical approach," Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 24, no. 1, 2014.
[4] T. Oliveira, B. Araujo, and C. Tam, "Why do people share their travel experiences on social media?" Tourism Management, vol. 78, p. 104041, 2020.
[5] R. Pandey and J. P. Singh, "BERT-LSTM model for sarcasm detection in code-mixed social media post," Journal of Intelligent Information Systems, vol. 60, no. 1, pp. 235-254, 2023.
[6] S. Tam, R. B. Said, and Ö. Ö. Tanriöver, "A ConvBiLSTM deep learning model-based approach for Twitter sentiment classification," IEEE Access, vol. 9, pp. 41283-41293, 2021.
[7] A. Tedeschi and F. Benedetto, "A cloud-based big data sentiment analysis application for enterprises' brand monitoring in social media streams," in 2015 IEEE 1st International Forum on Research and Technologies for Society and Industry Leveraging a Better Tomorrow (RTSI). IEEE, 2015, pp. 186-191.
[8] T. Rao, S. Srivastava et al., "Analyzing stock market movements using Twitter sentiment analysis," 2012.
[9] S. Bharathi and A. Geetha, "Sentiment analysis for effective stock market prediction," International Journal of Intelligent Engineering and Systems, vol. 10, no. 3, pp. 146-154, 2017.
[10] A. Alrumaih, A. Al-Sabbagh, R. Alsabah, H. Kharrufa, and J. Baldwin, "Sentiment analysis of comments in social media," International Journal of Electrical & Computer Engineering (2088-8708), vol. 10, no. 6, 2020.
[11] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.
[12] M. Wongkar and A. Angdresey, "Sentiment analysis using Naive Bayes algorithm of the data crawler: Twitter," in 2019 Fourth International Conference on Informatics and Computing (ICIC). IEEE, 2019, pp. 1-5.
[13] A. I. Saad, "Opinion mining on US airline Twitter data using machine learning techniques," in 2020 16th International Computer Engineering Conference (ICENCO). IEEE, 2020, pp. 59-63.
[14] E. Prabhakar, M. Santhosh, A. H. Krishnan, T. Kumar, and R. Sudhakar, "Sentiment analysis of US airline Twitter data using new AdaBoost approach," International Journal of Engineering Research & Technology (IJERT), vol. 7, no. 1, pp. 1-6, 2019.
[15] M. M. Rahman and Y. Watanobe, "An efficient approach for selecting initial centroid and outlier detection of data clustering," in Advancing Technology Industrialization Through Intelligent Software Methodologies, Tools and Techniques. IOS Press, 2019, pp. 616-628.
[16] D. K. Madhuri, "A machine learning based framework for sentiment classification: Indian railways case study," Int. J. Innov. Technol. Explor. Eng. (IJITEE), vol. 8, no. 4, 2019.
[17] A. M. Rahat, A. Kahir, and A. K. M. Masum, "Comparison of Naive Bayes and SVM algorithm based on sentiment analysis using review dataset," in 2019 8th International Conference System Modeling and Advancement in Research Trends (SMART). IEEE, 2019, pp. 266-270.
[18] Y. G. Jung, K. T. Kim, B. Lee, and H. Y. Youn, "Enhanced Naive Bayes classifier for real-time sentiment analysis with SparkR," in 2016 International Conference on Information and Communication Technology Convergence (ICTC). IEEE, 2016, pp. 141-146.
[19] A. H. Uddin, D. Bapery, and A. S. M. Arif, "Depression analysis from social media data in Bangla language using long short term memory (LSTM) recurrent neural network technique," in 2019 International Conference on Computer, Communication, Chemical, Materials and Electronic Engineering (IC4ME2). IEEE, 2019, pp. 1-4.
[20] M. M. Rahman, Y. Watanobe, and K. Nakamura, "Source code assessment and classification based on estimated error probability using attentive LSTM language model and its application in programming education," Applied Sciences, vol. 10, no. 8, p. 2973, 2020.
[21] R. M. Alahmary, H. Z. Al-Dossari, and A. Z. Emam, "Sentiment analysis of Saudi dialect using deep learning techniques," in 2019 International Conference on Electronics, Information, and Communication (ICEIC). IEEE, 2019, pp. 1-6.
[22] T. Dholpuria, Y. Rana, and C. Agrawal, "A sentiment analysis approach through deep learning for a movie review," in 2018 8th International Conference on Communication Systems and Network Technologies (CSNT). IEEE, 2018, pp. 173-181.
[23] N. K. Thinh, C. H. Nga, Y.-S. Lee, M.-L. Wu, P.-C. Chang, and J.-C. Wang, "Sentiment analysis using residual learning with simplified CNN extractor," in 2019 IEEE International Symposium on Multimedia (ISM). IEEE, 2019, pp. 335-3353.
[24] H. Liu, I. Chatterjee, M. Zhou, X. S. Lu, and A. Abusorrah, "Aspect-based sentiment analysis: A survey of deep learning methods," IEEE Transactions on Computational Social Systems, vol. 7, no. 6, pp. 1358-1375, 2020.
[25] A. Mabrouk, R. P. D. Redondo, and M. Kayed, "Deep learning-based sentiment classification: A comparative survey," IEEE Access, vol. 8, pp. 85616-85638, 2020.
[26] K. L. Tan, C. P. Lee, and K. M. Lim, "RoBERTa-GRU: a hybrid deep learning model for enhanced sentiment analysis," Applied Sciences, vol. 13, no. 6, p. 3915, 2023.
[27] A. Radford, K. Narasimhan, T. Salimans, I. Sutskever et al., "Improving language understanding by generative pre-training," 2018.
[28] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
[29] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, "RoBERTa: A robustly optimized BERT pretraining approach," arXiv preprint arXiv:1907.11692, 2019.
[30] J. Wu, C. Ye, and H. Zhou, "BERT for sentiment classification in software engineering," in 2021 International Conference on Service Science (ICSS). IEEE, 2021, pp. 115-121.
[31] Z. Gao, A. Feng, X. Song, and X. Wu, "Target-dependent sentiment classification with BERT," IEEE Access, vol. 7, pp. 154290-154299, 2019.
[32] S. Alaparthi and M. Mishra, "BERT: A sentiment analysis odyssey," Journal of Marketing Analytics, vol. 9, no. 2, pp. 118-126, 2021.
[33] Y. Wu, Z. Jin, C. Shi, P. Liang, and T. Zhan, "Research on the application of deep learning-based BERT model in sentiment analysis," arXiv preprint arXiv:2403.08217, 2024.
[34] R. Cheruku, K. Hussain, I. Kavati, A. M. Reddy, and K. S. Reddy, "Sentiment classification with modified RoBERTa and recurrent neural networks," Multimedia Tools and Applications, pp. 1-19, 2023.
[35] W. Liao, B. Zeng, X. Yin, and P. Wei, "An improved aspect-category sentiment analysis model for text sentiment analysis based on RoBERTa," Applied Intelligence, vol. 51, pp. 3522-3533, 2021.
[36] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," arXiv preprint arXiv:1409.0473, 2014.
[37] A. Younas, R. Nasim, S. Ali, G. Wang, and F. Qi, "Sentiment analysis of code-mixed Roman Urdu-English social media text using deep learning approaches," in 2020 IEEE 23rd International Conference on Computational Science and Engineering (CSE). IEEE, 2020, pp. 66-71.
[38] K. Dhola and M. Saradva, "A comparative evaluation of traditional machine learning and deep learning classification techniques for sentiment analysis," in 2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence). IEEE, 2021, pp. 932-936.
[39] W. Zhang, X. Li, Y. Deng, L. Bing, and W. Lam, "A survey on aspect-based sentiment analysis: Tasks, methods, and challenges," IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 11, pp. 11019-11038, 2023.
[40] J. Fields, K. Chovanec, and P. Madiraju, "A survey of text classification with transformers: How wide? how large? how long? how accurate? how expensive? how safe?" IEEE Access, vol. 12, pp. 6518-6531, 2024.
[41] S. Poria, D. Hazarika, N. Majumder, and R. Mihalcea, "Beneath the tip of the iceberg: Current challenges and new directions in sentiment analysis research," IEEE Transactions on Affective Computing, vol. 14, no. 1, pp. 108-132, 2020.
[42] M. M. Rahman and Y. Watanobe, "Multilingual program code classification using n-layered Bi-LSTM model with optimized hyperparameters," IEEE Transactions on Emerging Topics in Computational Intelligence, 2023.
[43] M. M. Rahman, Y. Watanobe, and K. Nakamura, "A bidirectional LSTM language model for code evaluation and repair," Symmetry, vol. 13, no. 2, p. 247, 2021.
[44] A. I. Shiplu, M. M. Rahman, and Y. Watanobe, "A robust ensemble machine learning model with advanced voting techniques for comment classification," in Big Data Analytics in Astronomy, Science, and Engineering: 11th International Conference on Big Data Analytics, BDA 2023, Aizu, Japan, December 5-7, 2023, Proceedings. Springer Nature, p. 141.
[45] M. Rhanoui, M. Mikram, S. Yousfi, and S. Barzali, "A CNN-BiLSTM model for document-level sentiment analysis," Machine Learning and Knowledge Extraction, vol. 1, no. 3, pp. 832-847, 2019.
[46] S. Anbukkarasi and S. Varadhaganapathy, "Analyzing sentiment in Tamil tweets using deep neural network," in 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC). IEEE, 2020, pp. 449-453.
[47] H. Liu, I. Chatterjee, M. Zhou, X. S. Lu, and A. Abusorrah, "Aspect-based sentiment analysis: A survey of deep learning methods," IEEE Transactions on Computational Social Systems, vol. 7, no. 6, pp. 1358-1375, 2020.
[48] M. Singh, A. K. Jakhar, and S. Pandey, "Sentiment analysis on the impact of coronavirus in social life using the BERT model," Social Network Analysis and Mining, vol. 11, no. 1, p. 33, 2021.
[49] K. L. Tan, C. P. Lee, K. S. M. Anbananthen, and K. M. Lim, "RoBERTa-LSTM: a hybrid model for sentiment analysis with transformer and recurrent neural network," IEEE Access, vol. 10, pp. 21517-21525, 2022.
[50] A. He and M. Abisado, "Text sentiment analysis of Douban film short comments based on BERT-CNN-BiLSTM-Att model," IEEE Access, vol. 12, pp. 45229-45237, 2024.
[51] R. Cai, B. Qin, Y. Chen, L. Zhang, R. Yang, S. Chen, and W. Wang, "Sentiment analysis about investors and consumers in energy market based on BERT-BiLSTM," IEEE Access, vol. 8, pp. 171408-171415, 2020.
[52] S. Lee, D. K. Han, and H. Ko, "Multimodal emotion recognition fusion analysis adapting BERT with heterogeneous feature unification," IEEE Access, vol. 9, pp. 94557-94572, 2021.
[53] P. Thiengburanathum and P. Charoenkwan, "SETAR: Stacking ensemble learning for Thai sentiment analysis using RoBERTa and hybrid feature representation," IEEE Access, vol. 11, pp. 92822-92837, 2023.
[54] C.-S. Lin, C.-N. Tsai, J.-S. Jwo, C.-H. Lee, and X. Wang, "Heterogeneous student knowledge distillation from BERT using a lightweight ensemble framework," IEEE Access, vol. 12, pp. 33079-33088, 2024.
[55] S. Lu, C. Zhou, K. Xie, J. Lin, and Z. Wang, "Fast and accurate FSA system using ELBERT: An efficient and lightweight BERT," IEEE Transactions on Signal Processing, vol. 71, pp. 3821-3834, 2023.
[56] Y. Zhu, R. Kiros, R. Zemel, R. Salakhutdinov, R. Urtasun, A. Torralba, and S. Fidler, "Aligning books and movies: Towards story-like visual explanations by watching movies and reading books," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 19-27.
[57] A. Gokaslan, V. Cohen, E. Pavlick, and S. Tellex, "OpenWebText corpus," 2019.
[58] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929-1958, 2014.
[59] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.
[60] A. Graves and J. Schmidhuber, "Offline handwriting recognition with multidimensional recurrent neural networks," Advances in Neural Information Processing Systems, vol. 21, 2008.
[61] P. L. Seabe, C. R. B. Moutsinga, and E. Pindza, "Forecasting cryptocurrency prices using LSTM, GRU, and bi-directional LSTM: A deep learning approach," Fractal and Fractional, vol. 7, no. 2, p. 203, 2023.
[62] J. Vimali and S. Murugan, "A text based sentiment analysis model using bi-directional LSTM networks," in 2021 6th International Conference on Communication and Electronics Systems (ICCES). IEEE, 2021, pp. 1652-1658.
[63] P. Bhuvaneshwari, A. N. Rao, Y. H. Robinson, and M. Thippeswamy, "Sentiment analysis for user reviews using Bi-LSTM self-attention based CNN model," Multimedia Tools and Applications, vol. 81, no. 9, pp. 12405-12419, 2022.
[64] V. R. Kota and S. D. Munisamy, "High accuracy offering attention mechanisms based deep learning approach using CNN/Bi-LSTM for sentiment analysis," International Journal of Intelligent Computing and Cybernetics, vol. 15, no. 1, pp. 61-74, 2022.
[65] G. Yang and H. Xu, "A residual BiLSTM model for named entity recognition," IEEE Access, vol. 8, pp. 227710-227718, 2020.
[66] B. Y. Lin, F. F. Xu, Z. Luo, and K. Zhu, "Multi-channel BiLSTM-CRF model for emerging named entity recognition in social media," in Proceedings of the 3rd Workshop on Noisy User-generated Text, 2017, pp. 160-165.
[67] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[68] A. Graves and J. Schmidhuber, "Framewise phoneme classification with bidirectional LSTM and other neural network architectures," Neural Networks, vol. 18, no. 5-6, pp. 602-610, 2005.
[69] A. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts, "Learning word vectors for sentiment analysis," in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011, pp. 142-150.
[70] A. Go, R. Bhayani, and L. Huang, "Twitter sentiment classification using distant supervision," CS224N Project Report, Stanford, vol. 1, no. 12, p. 2009, 2009.
[71] V. Balakrishnan and E. Lloyd-Yemoh, "Stemming and lemmatization: A comparison of retrieval performances," 2014.
[72] D. Choi, C. J. Shallue, Z. Nado, J. Lee, C. J. Maddison, and G. E. Dahl, "On empirical comparisons of optimizers for deep learning," arXiv preprint arXiv:1910.05446, 2019.
[73] A. S. Zharmagambetov and A. A. Pak, "Sentiment analysis of a document using deep learning approach and decision trees," in 2015 Twelve International Conference on Electronics Computer and Computation (ICECCO). IEEE, 2015, pp. 1-4.
[74] M. Vadivukarassi, N. Puviarasan, and P. Aruna, "An exploration of airline sentimental tweets with different classification model," International Journal for Research in Engineering Application & Management, vol. 4, no. 02, 2018.
[75] M. S. Hossen, A. H. Jony, T. Tabassum, M. T. Islam, M. M. Rahman, and T. Khatun, "Hotel review analysis for the prediction of business using deep learning approach," in 2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS). IEEE, 2021, pp. 1489-1494.
[76] A. Garg and R. K. Kaliyar, "PSent20: An effective political sentiment analysis with deep learning using real-time social media tweets," in 2020 5th IEEE International Conference on Recent Advances and Innovations in Engineering (ICRAIE). IEEE, 2020, pp. 1-5.
[77] P. K. Jain, V. Saravanan, and R. Pamula, "A hybrid CNN-LSTM: A deep learning approach for consumer sentiment analysis using qualitative user-generated contents," Transactions on Asian and Low-Resource Language Information Processing, vol. 20, no. 5, pp. 1-15, 2021.
[78] D. Kim and J. Byun, "Selection of augmented data for overcoming the imbalance problem in facies classification," IEEE Geoscience and Remote Sensing Letters, vol. 19, pp. 1-5, 2022.

Md. Mostafizer Rahman received his Ph.D. degree in the Department of Computer and Information Systems, University of Aizu, Japan in 2022. He is also working at Dhaka University of Engineering & Technology, Gazipur, Bangladesh. He received his B.Sc. and M.Sc. engineering degrees in the Department of Computer Science and Engineering from Hajee Mohammad Danesh Science & Technology University, Dinajpur, Bangladesh, and Dhaka University of Engineering & Technology, Gazipur, Bangladesh, in 2009 and 2014, respectively. His research interests include machine learning, LLM, software engineering, AI for Code, NLP, information visualization, and big data analytics.

Ariful Islam Shiplu is an undergraduate student at Dhaka University of Engineering & Technology, Gazipur, Bangladesh. He completed his Diploma-in-Engineering degree in the Department of Computer Science and Technology from Narsingdi Polytechnic Institute, Narsingdi, Bangladesh. His academic journey is complemented by a robust foundation in technical concepts and adept problem-solving skills. His research interests include Machine Learning, Deep Learning, Large Language Models, and Programming.

Yutaka Watanobe is currently a Senior Associate


Professor at the School of Computer Science and
Engineering, The University of Aizu, Japan. He
received his M.S. and Ph.D. degrees from The
University of Aizu in 2004 and 2007 respectively. He
was a Research Fellow of the Japan Society for the
Promotion of Science (JSPS) at The University of
Aizu in 2007. He is now a director of i-SOMET. He
was a coach of four ICPC World Final teams. He is
a developer of the Aizu Online Judge (AOJ) system.
His research interests include intelligent software,
programming environment, smart learning, machine learning, data mining,
cloud robotics, and visual languages. He is a member of IEEE, IPSJ.

Md Ashad Alam, PhD, is a Statistical Scientist (Data


Scientist/Biostatistician/Bioinformatician) with over
a decade of experience conducting multi- and
inter-disciplinary research. His expertise lies in
human imaging, genetics/genomics, functional ge-
nomics (transcriptomics, proteomics, epigenomics),
biostatistics & bioinformatics, big data & statis-
tical machine learning, genetic epidemiology, and
biomedical data science. The focus of his work is on
the integrated analysis of multi-view data in biomed-
ical applications. Dr. Alam has extensive experience
in the theoretical development of novel statistical and machine/deep learning
methods for gene/protein expression analyses and integrative analyses of
various omics data at DNA, mRNA, miRNA, methylation, and protein levels.
His work is represented by 45 peer-reviewed publications (including > 25 in
the past 5 years), several of them in top journals. He is highly knowledgeable,
experienced, and competent in bioinformatic and statistical issues related to
next-generation DNA sequencing-based studies.
