Arabic Aspect Based Sentiment Analysis Using Bidirectional GRU
Arabic Aspect Based Sentiment Analysis Using Bidirectional GRU
Arabic Aspect Based Sentiment Analysis Using Bidirectional GRU
a r t i c l e i n f o a b s t r a c t
Article history: Aspect-based Sentiment analysis (ABSA) accomplishes a fine-grained analysis that defines the aspects of
Received 20 May 2021 a given document or sentence and the sentiments conveyed regarding each aspect. This level of analysis
Revised 14 August 2021 is the most detailed version that is capable of exploring the nuanced viewpoints of the reviews. The bulk
Accepted 31 August 2021
of study in ABSA focuses on English with very little work available in Arabic. Most previous work in
Available online 8 September 2021
Arabic has been based on regular methods of machine learning that mainly depends on a group of rare
resources and tools for analyzing and processing Arabic content such as lexicons, but the lack of those
Keywords:
resources presents another challenge. In order to address these challenges, Deep Learning (DL)-based
Aspect-based sentiment analysis (ABSA)
Deep learning
methods are proposed using two models based on Gated Recurrent Units (GRU) neural networks for
Opinion target extraction (OTE) ABSA. The first is a DL model that takes advantage of word and character representations by combining
Aspect sentiment polarity classification bidirectional GRU, Convolutional Neural Network (CNN), and Conditional Random Field (CRF) making up
BGRU-CNN-CRF and IAN-BGRU the (BGRU-CNN-CRF) model to extract the main opinionated aspects (OTE). The second is an interactive
attention network based on bidirectional GRU (IAN-BGRU) to identify sentiment polarity toward
extracted aspects. We evaluated our models using the benchmarked Arabic hotel reviews dataset. The
results indicate that the proposed methods are better than baseline research on both tasks having
39.7% enhancement in F1-score for opinion target extraction (T2) and 7.58% in accuracy for aspect-
based sentiment polarity classification (T3). Achieving F1 score of 70.67% for T2, and accuracy of
83.98% for T3.
Ó 2021 The Authors. Published by Elsevier B.V. on behalf of King Saud University. This is an open access
article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
https://doi.org/10.1016/j.jksuci.2021.08.030
1319-1578/Ó 2021 The Authors. Published by Elsevier B.V. on behalf of King Saud University.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
M. M.Abdelgwad, Taysir Hassan A Soliman, A. I.Taloba et al. Journal of King Saud University – Computer and Information Sciences 34 (2022) 6652–6662
Three main ABSA tasks can be identified as proposed by (Pontiki In several NLP tasks, Interactive attention networks(IAN) has
et al., 2016); T1: aspect category identification, T2: aspect opinion shown impressive results in machine translation (Meng et al.,
target extraction, and T3: aspect polarity detection. The topics of 2016), question answering (Wang et al., 2016; Li et al., 2017),
this study are T2 and T3 tasks. and document classification (Yang et al., 2016). Authors of (Ma
According to (Al-Smadi et al., 2018) there are several differences et al., 2017) suggested using IAN for English ABSA and achieved
between SA and ABSA; such as (a) linking text parts to specific competitive results. The main idea is interactive learning to repre-
aspects (i.e. extracting target opinion expressions), and (b) Paraphras- sent targets and contexts. To further improve the representation of
ing the text by extracting text parts that discuss the same aspects (i.e. targets and context, we proposed using the features provided by
Battery efficiency and power usage both related to the same aspect). the GRU model in general and bidirectional GRU in particular
ABSA has been a major focus of high-profile Natural Language instead of using single direction Long Short Term Memory (LSTM)
Processing (NLP) conferences and workshops like SemEval due to in the base model, the bi-directional GRU overcomes feed-forward
its importance. SemEval is an annual NLP workshop that offers a models limited ability by extracting unlimited contextual informa-
number of activities to the scientific community to test SA systems. tion from both sentence directions.
The first ABSA joint task was coordinated by SemEval in 2014 To carry out the ABSA tasks, we follow some steps that ulti-
(Pontiki et al., 2014). This task provided the scientific community mately make up the ABSA workflow, and they are in order:
with both standard datasets and joint evaluation procedures. ABSA
activities were effectively replicated over the next two years at 1. Breaking down reviews into individual sentences.
SemEval (Pontiki et al., 2015; Pontiki et al., 2016) as the task 2. preprocessing each sentence (tokenization, stopwords removal,
expanded to include different domains, languages, and problems. and text vectorization).
In fact, SemEval-2016 presented a total of 39 datasets for the ABSA 3. T1: Extracting main opinionated aspects using (BGRU-CNN-
task in 7 domains and 8 languages. In addition, a classifier proven CRF) model.
to perform well for NLP tasks, which is Support Vector Machine 4. T2: Determine sentiments related to each aspect using (IAN-
(SVM), was used in the baseline evaluation procedure. BGRU) model.
More recently, experimental work with innovative machine
learning methods, called ‘‘Deep Learning” multi-layer processing The overall workflow for the proposed ABSA approach is shown
technology that utilizes consecutive unit layers to build on previ- in Fig. 1.
ous outputs, was demonstrated using the backpropagation algo- In this paper, ABSA research tasks are performed by a type of
rithm (LeCun et al., 2015). On each layer, the inputs are RNN (GRU), where two GRU models were built, as follows: (a) DL
converted to numerical representations, which are later classified. architecture based on a state-of-the-art model that utilizes the rep-
Therefore, an increasingly greater degree of abstraction is obtained resentations of both words and characters through the combina-
(Goodfellow et al., 2016). tion of bidirectional GRU, CNN, and CRF (BGRU-CNN-CRF) to
DL is considered as one of the highly suggested techniques in extract the main opinionated aspects (i.e. T2: (OTE)) (b) IAN based
machine learning for various NLP challenges such as SA (Kwaik on bidirectional GRU (IAN-BGRU) is implemented to identify senti-
et al., 2019; Luo, 2019), machine translation (Ameur et al., 2017; ment polarity toward extracted aspects from T2 (i.e. T3).
Li et al., 2019), named entity recognition (Khalifa and Shaalan, The main contributions of this study are:
2019), and speech recognition (Zerari et al., 2019; Algihab et al.,
2019). The strength of DL is that, aside from its great performance, 1. The proposed models did not rely on any handcrafted features
it does not rely on handcrafted features or external resources. or external resources such as lexicons, which considered one
Word embedding or distributed representations improve neural of the tools that are not widely available in the public domain
network performance and enhances DL models. Two common for analyzing and processing Arabic content and requires great
ways to embed words available in Arabic: Word2Vec (Mikolov effort in collection.
et al., 2013) and FastText (Bojanowski et al., 2017). 2. The proposed models are better than baseline research on both
First, Word2Vec makes use of small neural networks to calcu- tasks having 39.7% enhancement in F1-score for opinion target
late word embedding based on word context. There are two ways extraction (T2) and 7.58% in accuracy for aspect-based senti-
to put that approach into practice. The first is a continuous bag of ment polarity classification (T3). Obtaining F1 score of 70.67%
words or CBOW. In this method, the network attempts to predict for T2, and accuracy of 83.98% for T3.
which word is most likely given the context. Skip-gram is the sec-
ond approach, the idea is very similar but the network is working The rest of this paper is arranged as follows: Section 2 addresses
in the opposite direction, the network attempts to predict the con- literature reviews of ABSA; Section 3 illustrates the proposed mod-
text given target word. In several areas of NLP, Word2Vec has els; Section 4 explains the dataset and the baseline approach; Sec-
proved useful. But one unresolved issue has been unknown term tion 5 Presents results and discussion; finally, Section 6 concludes
generalization. Second, FastText which was founded in 2016 by the paper and outlines future work plans.
Facebook vowed to resolve this obstacle.
A new development in DL has emerged, it is the attention mech-
2. Related work
anism. The attention mechanism has achieved good success in
computer vision and in many NLP applications such as document
This section is divided into two main sub-sections covering
sentiment classification, document summarization, named entity
related works in English and Arabic ABSA separately, each subsec-
recognition, and machine translation. The attention mechanism
tion highlights the effective studies that have been applied on OTE
of a neural network allows learning properly by focusing selec-
and aspect sentiment polarity tasks.
tively on the major parts of the sentence while performing a task.
Recently, the attention mechanism has been relied upon in sev-
eral DL-based models for SA (Ma et al., 2017; Huang et al., 2018; 2.1. English ABSA
Liu et al., 2018; Yang et al., 2018). The attention mechanism allows
the neural network to concentrate on the various parts of a sen- 2.1.1. Opinion Target Extraction (OTE)
tence that relate to each aspect when the sentence contains differ- Deep learning methods were used for the first time on the Opin-
ent aspects. ion target extraction task by (Irsoy and Cardie, 2014) instead of
6653
M. M.Abdelgwad, Taysir Hassan A Soliman, A. I.Taloba et al. Journal of King Saud University – Computer and Information Sciences 34 (2022) 6652–6662
using regular CRF model. They could extract opinion targets using Connection LSTM (TC-LSTM). The TD-LSTM splits the sentence into
deep recurrent neural networks. the left and right parts around the aspect and flows into two LSTM
The authors of (Liu et al., 2015) also applied different RNN vari- models in different forward and backward sequential paths. In
ations on this task like LSTM supported with word embeddings and order to determine the sentiment polarity label, the final hidden
some handcrafted features and achieved better results than CRF vectors of left LSTM and right LSTM are linked to be fed into Soft-
models. max layer. Nonetheless, the interactions between aspect target and
The authors of (Da’u and Salim, 2019) applied CNN model sup- context are not captured by TD-LSTM. To solve this issue, TC-LSTM
ported with word embeddings and POS embeddings to make uses the semantical interaction between the aspect and the con-
sequence labeling easier. Two types of word embeddings were pro- text, by integrating the aspect target and context word embed-
posed: general-purpose and domain-specific word embeddings. dings as the inputs, and transfers it back and forth through two
The best results were achieved when applying the model with different LSTMs, similar to those used in TD-LSTM.
POS and domain-specific word embeddings. The attention mechanism allows the neural network to learn
(Chen et al., 2017) applied Bi-LSTM with CRF on top, to extract properly by focusing selectively on the major parts of the sentence
opinion aspects. They experimented with many datasets and while performing a task. Attention-based LSTM (ATAE-LSTM) was
achieved competitive results. developed by the authors (Wang et al., 2016) to explore the corre-
lation between aspects and contexts through applying the atten-
2.1.2. aspect sentiment polarity classification tion mechanism to assist in identifying the important parts of a
ABSA is a SA branch where research approaches can be catego- sentence towards the stated aspect. ATAE-LSTM combines embed-
rized into two approaches: regular methods of machine learning dings of both the context and aspect and use it as input to LSTM in
and methods based on DL. Sentiment classification at aspect level order to achieve the best possible use of target information. The
is usually considered to be a problem of text classification. hidden LSTM vectors would provide knowledge of the aspect tar-
Text classification methods, like SVM can be applied to ABSA get, that may allow the model to obtain attention weights more
without taking into consideration the specified targets, as indi- accurately.
cated in (Pang et al., 2002). Only one attention can fail to capture various key context words
Several rule-based methods have been developed to deal with associated with different targets at very long distance of depen-
ABSA in some early works, as indicated in (Nasukawa and Yi, dency, so the authors of (Chen et al., 2017) suggested using multi-
2003; Ding and Liu, 2007) where sentence dependency parsing ple attentions to deal with this problem by generating recurrent
was executed, then predefined rules were used to determine senti- attention on memory (RAM). RAM produces memory from input,
ments related to each aspect. Although these methods have and by applying various attentions on memory, it could extract
achieved satisfactory results, their results rely heavily on the effec- essential information, and for prediction, it uses a non-linear com-
tiveness of labor-intensive handcraft features. bination of extracted features from different attentions using GRU.
NN variations encourage research in different areas of NLP espe- The authors of (Tang et al., 2016) proposed deep memory neural
cially those that need fine-grained analysis like ABSA, as they are network (MemNet), which consists of applying multi-hop atten-
capable of generating new representations using original features tion layers on context word embeddings of the sentence and con-
through several hidden layers. sidered the last hop output as the final target representation.
Recursive NN (Rec-NN) can be used to execute semantic compo- Review typically consists of many sentences and each sentence
sitions on tree structures, so it was adopted to classify sentiments consists of several words, so the review structure is hierarchical.
at aspect level by the authors (Dong et al., 2014; Nguyen and Based on the hierarchical structure of the review, the authors of
Shirai, 2015), by transferring the opinion targets into the root of (Ruder et al., 2016a) developed Bidirectional Hierarichal LSTM
a tree and propagates the sentiment of targets according to context (H-LSTM) for ABSA. They noticed that modeling knowledge in the
and syntax relations. internal review structure can enhance the performance of the
RNN, are widely used in ABSA to identify sentiments polarity at model.
aspect level. LSTM is an effective RNN network that is capable of To deal with complex sentence structures that have many
reducing vanishing gradient problems. However, LSTMs are not aspects, authors of (Liu et al., 2018) have proposed the CABAC
suitable for addressing the interactive correlation between context model. Two types of attention were used, one at the sentence level,
and aspect, leading to an enormous loss of aspect relevant data. To to focus on words that are important with respect to the aspect,
include aspects into the model, the authors of (Tang et al., 2015) and the other attention type was used to take into account the
proposed Target-Dependent LSTM (TD-LSTM) and Target- order and correlation of words using a group of memories. While
6654
M. M.Abdelgwad, Taysir Hassan A Soliman, A. I.Taloba et al. Journal of King Saud University – Computer and Information Sciences 34 (2022) 6652–6662
effective, LSTM cannot train in parallel and tends to be time con- was applied on the Arabic hotel reviews dataset and achieved
suming as is the case with other RNNs, since they are time-series state-of-the-art results.
NN. Therefore a simpler, more accurate, and faster model based In this paper, we investigated the modeling power of BGRU in
on CNN and gating mechanism was proposed by the authors of different Arabic ABSA tasks. For OTE, our model (BGRU-CNN-CRF)
(Xue and Li, 2018) than conventional LSTM models with attention depends mainly on BGRU to extract word-level features. Also,
mechanisms, as their model calculations can easily be paralyzed two different types of word embeddings were used to train our
during training and have no dependence. In their research, they model (fastText and Word2vec). High results were observed after
concentrated on only two tasks: aspect category analysis and applying fastText embeddings. Despite the simplicity of our model,
aspect sentiment analysis, and in both of them, they got great it outperformed the baseline and achieved high results close to (Al-
results. Dabet et al., 2020).
CNN also serves as a supplementary method for finding key For the third task, IAN based on Bidirectional GRU (IAN-BGRU)
local features like linguistic patterns (CNN-LP) (Poria et al., 2016) is implemented to identify sentiment polarity toward extracted
or as an efficient way to substitute attention for ABSA like aspects. The evaluation results showed that our model is compet-
Target-Specific Transformation Networks (Li et al., 2018). itive and achieved the state of the art results.
To deal with this task (Li et al., 2019) explored BERT embed-
dings (Devlin et al., 2018) with various simple neural networks
such as Linear, GRU, conditional random field, and self-attention 3. Proposed methods
layers. The experimental results showed BERT-based neural net-
works achieved higher results compared to non-BERT complex In this section, the proposed models previously mentioned in
models. ABSA workflow in Fig. 1 will be explained in details. The proposed
models are mainly based on GRU, which is a form of RNN. RNN is
an Artificial Neural Network (ANN), that is used in different NLP
2.2. Arabic ABSA applications. RNN was configured to identify the sequential data
properties and then predict the following scenarios using patterns.
Despite the large numbers of Arabic speakers and being a rich The main advantage of RNN over feed-forward neural networks is
morphological language, the number of works currently available its ability to process inputs of any length and remember all infor-
in the Arabic ABSA still limited, most of which are listed in the next mation all the time, which is very useful in any time-series predic-
subsection. tion. RNN uses recurrent hidden units, and their activation is based
on the previous step each time. The key disadvantages of RNNs are
2.2.1. OTE and aspect sentiment polarity classification the problems of gradient vanishing/exploding, which make it
For Arabic ABSA, a few attempts were made, where the HAAD harder to train and deal with major problems of machine learning
dataset was presented in 2015 along with a baseline study (Al- (Bengio et al., 1994; Pascanu et al., 2013). GRU has been suggested
Smadi et al., 2015). HAAD was annotated via the SemEval-2014 as a solution to this problem and has proven to be effective in
framework. many NLP problems.
In support of ABSA task, another benchmarked dataset of Arabic Accordingly, two models based on GRU have been proposed to
hotel reviews was noted in 2016, this dataset was used to validate handle the research tasks (BGRU-CNN-CRF for T2, and IAN-BGRU
some of the methods proposed in the Multilingual ABSA SemEval- for T3).
2016 Task5.
The authors of (Al-Smadi et al., 2019) suggested applying a set 3.1. Background
of supervised machine learning-based classifiers enhanced with a
set of hand-crafted features such as morphological, syntactic, and This part provides a detailed description of all the components
semantic features on the Arabic hotel reviews dataset. Their used to build our models. They include GRU, BGRU, CNN, and CRF.
approach covered three tasks: identify aspect categories, extract
opinion targets, and identify the sentiment polarity. The evaluation
results showed that their approach was very competitive and 3.1.1. Gated Recurrent Unit (GRU)
effective. Recently, GRU (Cho et al., 2014), a family of RNNs, has been pro-
In addition, the authors of (Al-Smadi et al., 2018) proposed posed to deal with gradient vanishing/exploding problems. GRU is
applying two supervised machine learning-based approaches a powerful and simple alternative to LSTM networks (Hochreiter
namely SVM and RNN on the Arabic hotel reviews dataset in line and Schmidhuber, 1997). Similar to LSTM models, GRU is designed
with task 5 of SemEval-2016. The researchers investigated the to adaptively update or reset the memory content using r j reset
three tasks: aspect category identification, (OTE), and sentiment gate and a zj update gate that are similar to the forget and input
polarity identification. The findings indicate that SVM outperforms gates of LSTM. Compared to LSTM, GRU does not have a memory
RNN in all tasks. Though, the deep RNN was found to be faster and cell and only has two gates. The GRU activation ht in time t is
j
To calculate the update gate zt for time step t, we use the pre-
vious hidden states ht1 and the current input xt in the following
equation:
~t ¼ tanhðWxt þ r t Uht1 Þ
h ð3Þ
where is the Hadamard product (also known as the element-wise
product) and r t represent the reset gate which used to determine
the amount of information to forget from the past. We use this for- Fig. 2. Gated Recurrent Unit.
mula to calculate it:
The main tasks of our research were tested using the Arabic
hotel reviews dataset1. The dataset was prepared as a part of
SemEval-2016 Task-5 which was a multilingual ABSA task that
includes customer reviews in eight languages and seven domains
(Pontiki et al., 2016). The Arabic hotel reviews dataset contains
24,028 annotated ABSA tuples divided as follow: 19,226 tuples for
training and 4,802 tuple for testing. Furthermore, both text level
(2291 reviews texts) and sentence level (6029 sentences) annota-
tions were provided for the dataset. This research concentrated only
on the tasks at sentence level. Table 1, indicates the size and distri-
bution of the dataset to the tasks of the research.
The dataset was supplemented with baseline research based on
SVM and N-grams as features. The results obtained from that
research are considered as a baseline for each task and are men-
tioned in this paper in the results section related to each task. Fig. 4. Architecture of the proposed BGRU-CNN-CRF model.
5. Experimentation and results For model training 70% of the dataset was used, for validation 10%
was used and for testing 20%. The Pytorch library was used to
Both models (BGRU-CNN-CRF for T2, and IAN-BGRU for T3) implement all the neural networks. The computations for each
were trained and tested using the Arabic hotel reviews dataset. model were performed separately on the GeForce GTX 1080 Ti
GPU. This section explains training of each model based on the tar-
1
https://github.com/msmadi/ABSA-Hotels geted task.
6657
M. M.Abdelgwad, Taysir Hassan A Soliman, A. I.Taloba et al. Journal of King Saud University – Computer and Information Sciences 34 (2022) 6652–6662
Table 1
The size and distribution of the dataset to the tasks of the research (Pontiki et al., 2016).
5.1. opinion target expression extraction task (T2) in every word (BLSTM-CRF + LSTM-char). To initialize the word
embedding lookup table, two techniques have been introduced.:
To handle the OTE task, BGRU-CNN-CRF model is implemented (a) word2vec, (b) fastText. BLSTM-CRF + LSTM-char with word2vec
and trained using the features of word and character embedding. word embeddings attained (F-1 = 66.32%) and (F-1 = 69.98%) with
FastText embeddings.
5.1.1. Evaluation method The authors of (Al-Dabet et al., 2020) proposed attention based
F1 metric is adopted for performance evaluation of OTE which is neural network to extract opinion targets. Their model composed
referred as the weighted harmonic mean for accuracy and recall. of Bidirectional lstm with CNN as encoder and Bidirectional lstm
The score is computed by as decoder with attention and CRF. Their model achieved F1-
score = 72.83.
1 1 1
recall þ precision precision recall The differences between our model (CNN-BGRU-CRF) and the
F1 ¼ ð Þ ¼2 ð5Þ
2 precision þ recall Bi-LSTM-CRF (Al-Smadi et al., 2019) are:
The BLSTM layer they used for character embedding has been
5.1.2. Hyperparameters Setting
substituted by the CNN layer. As CNN has fewer training param-
Two types of word embeddings were proposed: (a) fastText
eters than BLSTM, training performance is higher and was rec-
embeddings based on CBOW model. (b) Word2Vec embeedings.
ommended as the preferred approach as mentioned in (Zhai
For Word2eVec: we intilaized both context word embeddings
et al., 2018) when compared the performance of BLSTM-CRF
and targets word embeddings with AraVec (Soliman et al., 2017)
models with CNN-based and LSTM-based character-level
which is a pre-trained distributed word representation (word
embeddings for biomedical named entity recognition.
embedding) that intends to provide free-to-use and effective word
We replaced the LSTM in the BLSTM-CRF base model with GRU
embedding models for Arabic NLP research community. It is basi-
due to the simplicity of the GRU (since GRUs have fewer param-
cally a word2vec model that has been trained on Arabic data.
eters), ease of training, and thus ease of learning.
Two different models, unigrams and n-grams are provided by Ara-
Vec, which are built on top of various domains with Arabic content.
The differences between our model and the Attention-Based
In this research, we used CBOW-unigram model built on top of
Neural Model (Al-Dabet et al., 2020) are:
Twitter tweets,
We set the dimensions of word embeddings (for both fastText
Our model (BGRU-CNN-CRF) relies mainly on Bi-GRU to extract
and Word2vec) and GRU hidden states to 100.
word level features instead of Bi-LSTM.
All character embeddings are initialized with uniform samples
Also, two different types of word embeddings were used to train
of.
p p our model (fastText and Word2vec).
½ 3dim; þ 3dim, where dim = 25 in addition, 30 filters with a
window size of 3 are used.
Table 2 provides a comparison of F1 scores between the pro-
Mini-batch stochastic gradient descent (SGD) with batch size 16
posed model, the baseline, and the above-mentioned models using
and momentum 0.9 is used to optimize parameters. We selected an
the Arabic hotel reviews dataset for evaluation on T2.
initial rate of (g0=0.01), and at each training epoch, the learning
Fig. 5 shows that, despite the simplicity of our model, it outper-
rate is updated as gt=g0/(1 +qt), with rate of decay q= 0.04, and
formed the Bi-LSTM-CRF and achieved high results close to (Al-
t is the number of epochs completed. Gradient clipping of 5.0
Dabet et al., 2020) when using the fastText embeddings.
was used to reduce the impact of ‘‘gradient exploding”.
The initial words embeddings are modified by updating the gra-
dients of the neural network model via back-propagation. Early 5.2. aspect sentiment polarity identification (T3)
stopping (Caruana et al., 2001) was used based on validation set
performance. The best parameters appear in about 60 epochs, 5.2.1. Evaluation method
according to our experiments. To determine models efficiency in the aspect sentiment polarity
In order to minimize overfitting, our model is regularized using identification task, accuracy metric is adopted that can be
the drop-out method. Dropout is applied in several places in our expressed as:
model, namely: to character embeddings on all input and output T
vectors of GRU and before input to CNN. We fixed the dropout rate Accuracy ¼ ð6Þ
N
at 0.5 for all dropout layers.
where T and N, respectively, refer to the correctly predicted number
5.1.3. Results of samples and the overall samples number. Accuracy is the per-
We did not find any published research on the same task that centage of samples predicted correctly out of all samples. A
applied neural network models to the Arabic hotel reviews dataset better-performed system has better accuracy.
and reported better results than (Al-Smadi et al., 2019 and Al-
Dabet et al., 2020), so we compared our results with their. 5.2.2. Hyperparameters Setting
The Authors of (Al-Smadi et al., 2019) have expanded the basic Both context word embeddings and targets word embeddings
model BLSTM-CRF utilized for sequence labeling with character- are initialized by AraVec (Soliman et al., 2017). In particular, the
level word embeddings by applying BLSTM to characters sequence CBOW-unigram model built on top of Twitter tweets.
6658
M. M.Abdelgwad, Taysir Hassan A Soliman, A. I.Taloba et al. Journal of King Saud University – Computer and Information Sciences 34 (2022) 6652–6662
Table 2 (Wang et al., 2016). ATAE-LSTM model was applied by the authors
Comparison of F1 scores between the proposed of (Al-Smadi et al., 2019) (AB-LSTM-PC) on Arabic hotel reviews
model, the baseline, and other related works on
T2.
dataset.
MemNet in order to correctly catch the importance of the con-
Model F1(%) textual words, MemNet applied multi-hop attention layers on con-
Baseline 30.97 text word embeddings of the sentence and considered the last hop
Bi-LSTM-CRF (word2vec) 66.32 output as the final target representation (Tang et al., 2016).
Bi-LSTM-CRF (fastText) 69.98
CNN-BGRU-CRF (word2vec) 69.44
IAN-LSTM employs two LSTMs for interactive modeling of con-
CNN-BGRU-CRF (fastText) 70.67 text and target. Context hidden states are used to generate the tar-
Attention-Based Neural Model 72.83 get attention vector, and target hidden states are used to generate
the context attention vector. On the basis of these two attention
vectors, context representations and target representation are cre-
We initialize all weight matrices using samples from the uni- ated, then concatenated and ultimately fed into softmax for classi-
form distribution Uð0:1; 0:1Þ. we set the dimensions of word fication (Ma et al., 2017).
embeddings, attention vectors, and hidden states to 300. IAN-BLSTM extends IAN-LSTM by using bidirectional LSTM
After tunning the parameters of the model by making several instead of uni-direction LSTM to model aspect term and the
runs using different parameters with specified transformations, context.
we could found the combination of parameters that yield the best IAN-GRU like IAN-LSTM but use GRU instead of LSTM to model
results, which are as follow: Back-propagation algorithm is used to aspect term and its context.
train the model in sentence level using Adam (Kingma and Ba, IAN-BGRU extends IAN-GRU by using bidirectional GRU instead
2014) for optimization with a learning-rate of 3e5 , L2- of uni-direction GRU to model the aspect term and context.
regularization weight of 2e5 , dropout rate of 0.3, batch size of Models with * sign mean that, results were adopted from their
64 and number of epochs equal to 12. research without being practically re-implemented.
To our knowledge, no other research has applied models with-
out * sign on Arabic ABSA in general and on Arabic hotel reviews in
5.2.3. Comparison Models particular. Table 3 provides a comparison of the accuracy results
Baseline * SVM is trained only with N-grams features. between the proposed model and the above-mentioned models
INSIGHT-1 * Won 1st place in the SemEval-2016 Task-5 compe- using the Arabic hotel reviews dataset for evaluation on T3.
tition evaluated on the Arabic hotel reviews dataset. They concate-
nated aspect embedding with every word embedding and fed the
mixture to CNN for aspect sentiment and category identification 5.2.4. Discussions
tasks (Ruder et al., 2016b). LSTM achieves the poorest performance out of all neural net-
LSTM uses only one LSTM to model the sentence and the last work baseline methods because it deals with targets on a par with
hidden states as a representation for final classification. other context words. As the target information is not used
TD-LSTM splits the sentence into the left and right parts around sufficiently.
the aspect and flows into two LSTM models in different forward TD-LSTM outperforms LSTM, as it evolves from the standard
and backward sequential paths. In order to determine the senti- LSTM and handles both left and right contexts of the targeted
ment polarity label, the final hidden vectors of the left LSTM and aspect separately. The targets are represented twice and are
the right LSTM are linked to be fed into Softmax (Tang et al., 2015). emphasized in certain ways in the final representation.
AB-LSTM-PC * attention-based LSTM with Aspect integration Moreover, AB-LSTM-PC stably outperforms TD-LSTM for its
uses the attention mechanism that allows to focus more on the introduction of the attention mechanism. As it collects a range of
context relevant to the targeted aspects. For each word embedding, significant contextual information, under the guidance of the tar-
ATAE-LSTM adds aspects embedding that reinforce the model by get and creates more accurate representations for ABSA. AB-
learning the hidden association between context and aspect LSTM-PC also affirms the importance of modeling targets by
Fig. 5. Achieved results in T2 (OTE) for the two proposed models (CNN-BGRU-CRF (word2vec) and CNN-BGRU-CRF (fastText)) compared to the baseline, and other related
works.
6659
M. M.Abdelgwad, Taysir Hassan A Soliman, A. I.Taloba et al. Journal of King Saud University – Computer and Information Sciences 34 (2022) 6652–6662
Fig. 6. Achieved results by the IAN-BGRU model in comparison to the baseline and other Arabic models on T3.
6660
M. M.Abdelgwad, Taysir Hassan A Soliman, A. I.Taloba et al. Journal of King Saud University – Computer and Information Sciences 34 (2022) 6652–6662
aspect opinion target extraction, and T3: aspect polarity detection. Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural Comput. 9
(8), 1735–1780.
The topics of this study are T2 and T3. Two GRU based models were
Hu, M., Liu, B., 2004. Mining and summarizing customer reviews. In: Proceedings of
adopted to handle research work. (a) DL model that takes advan- the tenth ACM SIGKDD international conference on Knowledge discovery and
tage of word and character representations by combining bidirec- data mining, pp. 168–177.
tional GRU, Convolutional Neural Network (CNN), and Conditional Huang, B., Ou, Y., Carley, K.M., 2018. Aspect level sentiment classification with
attention-over-attention neural networks. In: International Conference on
Random Field (CRF) making up the (BGRU-CNN-CRF) model to Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior
extract the main opinionated aspects (OTE). (b) an interactive Representation in Modeling and Simulation. Springer, pp. 197–206.
attention network based on bidirectional GRU (IAN-BGRU) to iden- Irsoy, O., Cardie, C., 2014. Opinion mining with deep recurrent neural networks. In:
Proceedings of the 2014 conference on empirical methods in natural language
tify sentiment polarity toward extracted aspects. We evaluated our processing (EMNLP), pp. 720–728.
models using the benchmarked Arabic hotel reviews dataset. The Khalifa, M., Shaalan, K., 2019. Character convolutions for arabic named entity
results indicate that the proposed methods are better than baseline recognition with long short-term memory networks. Computer Speech
Language 58, 335–346.
research on both tasks having 39.7% enhancement in F1-score for Kingma, D.P. Ba, J., 2014. Adam: A method for stochastic optimization. arXiv
opinion target extraction (T2) and 7.58% in accuracy for aspect- preprint arXiv:1412.6980.
based sentiment polarity classification (T3). Achieving F1 score of Kwaik, K.A., Saad, M., Chatzikyriakidis, S., Dobnik, S., 2019. Lstm-cnn deep learning
model for sentiment analysis of dialectal arabic. In: International Conference on
70.67% for T2, and accuracy of 83.98% for T3. Arabic Language Processing. Springer, pp. 108–121.
For future work, we intend to apply transformer based models Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., Dyer, C., 2016. Neural
in ABSA tasks. architectures for named entity recognition. arXiv preprint arXiv:1603.01360.
LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep Learning Nature 521 (7553), 436–444.
Li, H., Min, M.R., Ge, Y., Kadav, A., 2017. A context-aware attention network for
interactive question answering. In: Proceedings of the 23rd ACM SIGKDD
Declaration of Competing Interest International Conference on Knowledge Discovery and Data Mining, pp. 927–
935.
The authors declare that they have no known competing finan- Li, Q., Zhang, X., Xiong, J., Hwu, W.-M., Chen, D., 2019. Implementing neural machine
translation with bi-directional gru and attention mechanism on fpgas using hls.
cial interests or personal relationships that could have appeared
In: Proceedings of the 24th Asia and South Pacific Design Automation
to influence the work reported in this paper. Conference, pp. 693–698.
Li, X., Bing, L., Lam, W., Shi, B., 2018. Transformation networks for target-oriented
sentiment classification. arXiv preprint arXiv:1805.01086.
References Li, X., Bing, L., Zhang, W., Lam, W., 2019b. Exploiting bert for end-to-end aspect-
based sentiment analysis. arXiv preprint arXiv:1910.00883.
Liu, P., Joty, S., Meng, H., 2015. Fine-grained opinion mining with recurrent neural
Al-Dabet, S., Tedmori, S., Al-Smadi, M., 2020. Extracting opinion targets using
networks and word embeddings. In: Proceedings of the 2015 conference on
attention-based neural model. SN Computer Sci. 1 (5), 1–10.
empirical methods in natural language processing, pp. 1433–1443.
Al-Smadi, M., Al-Ayyoub, M., Jararweh, Y., Qawasmeh, O., 2019. Enhancing aspect-
Liu, Q., Zhang, H., Zeng, Y., Huang, Z., Wu, Z., 2018. Content attention model for
based sentiment analysis of arabic hotels’ reviews using morphological,
aspect based sentiment analysis. In: Proceedings of the 2018 World Wide Web
syntactic and semantic features. Inform. Processing Manage. 56 (2), 308–319.
Conference, pp. 1023–1032.
Al-Smadi, M., Qawasmeh, O., Al-Ayyoub, M., Jararweh, Y., Gupta, B., 2018. Deep
Luo, L.-X., 2019. Network text sentiment analysis method combining lda text
recurrent neural network vs. support vector machine for aspect-based
representation and gru-cnn. Personal Ubiquitous Computing 23 (3–4), 405–412.
sentiment analysis of arabic hotels’ reviews. J. Comput. Sci. 27, 386–393.
Ma, D., Li, S., Zhang, X., Wang, H., 2017. Interactive attention networks for aspect-
Al-Smadi, M., Qawasmeh, O., Talafha, B., Quwaider, M., 2015. Human annotated
level sentiment classification. arXiv preprint arXiv:1709.00893.
arabic dataset of book reviews for aspect based sentiment analysis. In: 2015 3rd
Ma, X. Hovy, E., 2016. End-to-end sequence labeling via bi-directional lstm-cnns-
International Conference on Future Internet of Things and Cloud. IEEE, pp. 726–
crf. arXiv preprint arXiv:1603.01354.
730.
Meng, F., Lu, Z., Li, H., Liu, Q., 2016. Interactive attention for neural machine
Al-Smadi, M., Talafha, B., Al-Ayyoub, M., Jararweh, Y., 2019. Using long short-term
translation. arXiv preprint arXiv:1610.05011.
memory deep neural networks for aspect-based sentiment analysis of arabic
Mikolov, T., Chen, K., Corrado, G., Dean, J., 2013. Efficient estimation of word
reviews. Int. J. Mach. Learning Cybern. 10 (8), 2163–2175.
representations in vector space. arXiv preprint arXiv:1301.3781.
Algihab, W., Alawwad, N., Aldawish, A., AlHumoud, S., 2019. Arabic speech
Nasukawa, T., Yi, J., 2003. Sentiment analysis: Capturing favorability using natural
recognition with deep learning: A review. In: International Conference on
language processing. In: Proceedings of the 2nd international conference on
Human-Computer Interaction. Springer, pp. 15–31.
Knowledge capture, pp. 70–77.
Ameur, M.S.H., Meziane, F., Guessoum, A., 2017. Arabic machine transliteration
Nguyen, T.H., Shirai, K., 2015. Phrasernn: Phrase recursive neural network for
using an attention-based encoder-decoder model. Procedia Computer Sci. 117,
aspect-based sentiment analysis. In: Proceedings of the 2015 Conference on
287–297.
Empirical Methods in Natural Language Processing, pp. 2509–2514.
Bengio, Y., Simard, P., Frasconi, P., 1994. Learning long-term dependencies with
Pang, B., Lee, L., Vaithyanathan, S., 2002. Thumbs up? sentiment classification using
gradient descent is difficult. IEEE Trans. Neural Networks 5 (2), 157–166.
machine learning techniques. arXiv preprint cs/0205070.
Bojanowski, P., Grave, E., Joulin, A., Mikolov, T., 2017. Enriching word vectors with
Pascanu, R., Mikolov, T., Bengio, Y., 2013. On the difficulty of training recurrent
subword information. Trans. Assoc. Comput. Linguistics 5, 135–146.
neural networks. In: International conference on machine learning, pp. 1310–
Caruana, R., Lawrence, S., Giles, C.L., 2001. Overfitting in neural nets:
1318.
Backpropagation, conjugate gradient, and early stopping. In: Advances in
Pontiki, M., Galanis, D., Papageorgiou, H., Androutsopoulos, I., Manandhar, S., Al-
neural information processing systems. pp. 402–408.
Smadi, M., Al-Ayyoub, M., Zhao, Y., Qin, B., De Clercq, O., et al., 2016. Semeval-
Chen, P., Sun, Z., Bing, L., Yang, W., 2017. Recurrent attention network on memory
2016 task 5: Aspect based sentiment analysis. In: International workshop on
for aspect sentiment analysis. In: Proceedings of the 2017 conference on
semantic evaluation, pp. 19–30.
empirical methods in natural language processing, pp. 452–461.
Pontiki, M., Galanis, D., Papageorgiou, H., Manandhar, S., Androutsopoulos, I., 2015.
Chen, T., Xu, R., He, Y., Wang, X., 2017. Improving sentiment analysis via sentence
Semeval-2015 task 12: Aspect based sentiment analysis. In: Proceedings of the
type classification using bilstm-crf and cnn. Expert Systems Appl. 72, 221–230.
9th international workshop on semantic evaluation (SemEval 2015). pp. 486–
Chiu, J.P., Nichols, E., 2016. Named entity recognition with bidirectional lstm-cnns.
495.
Trans. Assoc. Comput. Linguistics 4, 357–370.
Pontiki, M., Galanis, D., Pavlopoulos, J., Papageorgiou, H., Androutsopoulos, I.,
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H.,
Manandhar, S., 2014. SemEval-2014 task 4: Aspect based sentiment analysis. In:
Bengio, Y., 2014. Learning phrase representations using rnn encoder-decoder
Proceedings of the 8th International Workshop on Semantic Evaluation
for statistical machine translation. arXiv preprint arXiv:1406.1078.
(SemEval 2014). Association for Computational Linguistics, Dublin, Ireland,
Da’u, A., Salim, N., 2019. Aspect extraction on user textual reviews using multi-
pp. 27–35.
channel convolutional neural network. PeerJ Computer Sci. 5, e191.
Poria, S., Cambria, E., Gelbukh, A., 2016. Aspect extraction for opinion mining with a
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K., 2018. Bert: Pre-training of deep
deep convolutional neural network. Knowledge-Based Systems. 108, 42–49.
bidirectional transformers for language understanding. arXiv preprint
Rana, T.A., Cheah, Y.-N., 2016. Aspect extraction in sentiment analysis: comparative
arXiv:1810.04805.
analysis and survey. Artificial Intell. Rev. 46 (4), 459–483.
Ding, X., Liu, B., 2007. The utility of linguistic rules in opinion mining. In:
Ruder, S., Ghaffari, P., Breslin, J.G., 2016a. A hierarchical model of reviews for aspect-
Proceedings of the 30th annual international ACM SIGIR conference on Research
based sentiment analysis. arXiv preprint arXiv:1609.02745.
and development in information retrieval, pp. 811–812.
Ruder, S., Ghaffari, P., Breslin, J.G., 2016b. Insight-1 at semeval-2016 task 5: Deep
Dong, L., Wei, F., Tan, C., Tang, D., Zhou, M., Xu, K., 2014. Adaptive recursive neural
learning for multilingual aspect-based sentiment analysis. arXiv preprint
network for target-dependent twitter sentiment classification. In: Proceedings
arXiv:1609.02748.
of the 52nd annual meeting of the association for computational linguistics
Soliman, A.B., Eissa, K., El-Beltagy, S.R., 2017. Aravec: A set of arabic word
(volume 2: Short papers), pp. 49–54.
embedding models for use in arabic nlp. Procedia Computer Sci. 117, 256–265.
Goodfellow, I., Bengio, Y., Courville, A., 2016. Deep learning. MIT press.
6661
M. M.Abdelgwad, Taysir Hassan A Soliman, A. I.Taloba et al. Journal of King Saud University – Computer and Information Sciences 34 (2022) 6652–6662
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R., 2014. Yang, M., Qu, Q., Chen, X., Guo, C., Shen, Y., Lei, K., 2018. Feature-enhanced attention
Dropout: a simple way to prevent neural networks from overfitting. J. Mach. network for target-dependent sentiment classification. Neurocomputing. 307,
Learn. Res. 15 (1), 1929–1958. 91–97.
Tang, D., Qin, B., Feng, X., Liu, T., 2015. Effective lstms for target-dependent Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., Hovy, E., 2016. Hierarchical attention
sentiment classification. arXiv preprint arXiv:1512.01100. networks for document classification. In: Proceedings of the 2016 conference of
Tang, D., Qin, B., Liu, T., 2016. Aspect level sentiment classification with deep the North American chapter of the association for computational linguistics:
memory network. arXiv preprint arXiv:1605.08900. human language technologies, pp. 1480–1489.
Wang, B., Liu, K., Zhao, J., 2016. Inner attention based recurrent neural networks for Zerari, N., Abdelhamid, S., Bouzgou, H., Raymond, C., 2019. Bidirectional deep
answer selection. In: Proceedings of the 54th Annual Meeting of the Association architecture for arabic speech recognition. Open Computer Sci. 9 (1), 92–102.
for Computational Linguistics (Volume 1: Long Papers), pp. 1288–1297. Zhai, Z., Nguyen, D.Q., Verspoor, K., 2018. Comparing cnn and lstm character-level
Wang, Y., Huang, M., Zhu, X., Zhao, L., 2016. Attention-based lstm for aspect-level embeddings in bilstm-crf models for chemical and disease named entity
sentiment classification. In: Proceedings of the 2016 conference on empirical recognition. arXiv preprint arXiv:1808.08450.
methods in natural language processing, pp. 606–615. Zhao, J., Liu, K., Xu, L., 2016. Sentiment analysis: mining opinions, sentiments, and
Xue, W. Li, T., 2018. Aspect based sentiment analysis with gated convolutional emotions.
networks. arXiv preprint arXiv:1805.07043.
6662