
2023 9th International Conference on Web Research (ICWR)

Aspect-base Sentiment Analysis with Dual Contrastive Learning

Akram Karimi Zarandi1, Sayeh Mirzaei*1, Hamed Talebi2
1 School of Engineering Science, College of Engineering, University of Tehran, Iran, [email protected], [email protected]
2 Amirkabir University of Technology, Tehran, Iran, [email protected]
979-8-3503-9969-1/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICWR57742.2023.10139077

Abstract—Aspect-based sentiment analysis (ABSA) is a type of sentiment analysis that aims to identify the polarity of sentiment for the aspects in a sentence. According to the literature, it is an important research area that plays a significant role in business intelligence, marketing and psychology. To solve this problem, different methods based on dictionaries, machine learning and deep learning have been used. Research shows that, among the deep learning methods, Transformers have achieved good results and help models understand language better. In this paper we use trees induced from fine-tuned pre-trained models (FT-PTMs). We also use dual contrastive learning and different pre-trained models such as BERT, RoBERTa and XLNet in our proposed model. The results obtained from the implementation of the model on the SemEval2014 benchmarks confirm the performance of our model.

Keywords—Natural Language Processing, Aspect Base Sentiment Analysis, Deep Learning, Transformer, Bert, RoBERTa, XLNet

I. INTRODUCTION

Beliefs are very important in many human activities and are among the most influential factors in our behaviors and choices, so that our perception of reality and the choices we make depend to a significant extent on how others view and evaluate the world. The significant growth in the use of the Internet and social networks has led people to share their opinions, experiences and even their thoughts on various issues of daily life on a wide range of platforms, such as platforms for selling goods, websites, blogs, citizen surveillance systems, educational systems, transportation systems, health and treatment services, and so on. In [1] it was shown how social media express collective wisdom and, if used correctly, can provide a very powerful and accurate indicator of future outcomes. It is therefore very important for organizations and governments to use the huge amount of data generated by social networks to understand the main interests of people regarding a specific product or service. In this way, individuals and organizations that need to make a decision can check the available opinions and use them as experience for their decision [2]. Although technology today allows users of social networks to express their feelings in different forms (such as audio, video or text), "text" is still the most common and widely used. Textual sentiment analysis is one of the most current, attractive and challenging research topics in the field of natural language processing; due to its importance in social and business issues, it extends far beyond computer science, and researchers in various fields (e.g., government information, psychology, business, computer science, artificial intelligence, recommender systems, as well as healthcare and medicine) try to extract sentiment from text [3]. To solve the problem of text classification, which is complex due to the unstructured form of text, various methods based on dictionaries, machine learning, and deep learning have been widely used [4]. Among these, machine learning models are popular, and SVM has been shown to perform well and is thus a preferable choice for various classification problems [2]. Assigning different polarities to the different aspects that may be present in a sentence or user review is called ABSA, which was first introduced in SemEval 2014. ABSA has since advanced considerably, and the emergence of pre-trained language models such as Bidirectional Encoder Representations from Transformers (BERT), which are pre-trained on a large amount of data, has improved performance in this field by allowing the model to learn the two-way relationships of each word with the help of word context [5]. More recently, the progress of BERT has led to the emergence of different variants of it, such as the Robustly Optimized BERT Approach (RoBERTa); this robust model is basically a BERT model, except that it has been trained on much larger data and has been able to achieve better results [6]. In this paper we propose a model that uses the induced trees of FT-PTMs and, through dual contrastive learning, is able to simultaneously learn the features of the input samples and the classifier parameters.

Our contributions in this paper are as follows:

• Using the Perturbed Masking technique, we obtain induced trees of FT-PTMs in which aspect terms are directly related to sentiment words.

• We propose a new model based on induced trees of FT-PTMs using dual contrastive learning.

• We train and evaluate the model on the Twitter dataset as well as the laptop and restaurant data from SemEval 2014; the obtained results show the good performance of our model.

II. DEFINITION OF BASIC CONCEPTS AND PROBLEM DESIGN

A. Different levels of sentiment analysis

In general, sentiment analysis can be performed at one of the following three levels: document level, sentence level, and aspect level.

Sentiment analysis at the document level shows the overall sentiment in a document, and at the sentence level it expresses the sentiment of a sentence; in sentiment analysis at the aspect level, however, the different aspects are identified and the sentiments related to each of them are expressed. For example, the sentence "The phone is great, it just gets a little hot" shows a positive feeling towards the phone and a negative feeling towards the battery. ABSA, by extracting the aspects of reviews and their polarity, shows us how each product performs in terms of its different aspects [2]. This is especially useful when manufacturers or service providers want to know which component or feature of their product is not good enough and needs to be improved based on negative customer feedback. Although the document and sentence levels have their own uses, the most challenging level is aspect-based sentiment analysis.

For example, consider a situation where the overall sentiment of a sentence is considered positive, while the sentence mentions several positive aspects and also mentions one negative aspect that is more important in the eyes of users, so that the sentiment should in fact be reported as negative [3]. Fig. 1 shows the different levels of sentiment analysis.

Fig. 1. Different levels of sentiment analysis.

In ABSA there are two main sub-tasks that must be addressed: the first is to identify the aspects, and the second is to classify the sentiment of the text with respect to these extracted aspects [2]. In this paper the aspects are predefined and we do not need to identify them.

B. Text preprocessing

Although sentiment analysis may seem like an easy process, it in fact involves many natural language processing subtasks, such as sarcasm detection and subjectivity detection; therefore, when working in this field, we have to deal with unsolved issues such as negation handling and sarcasm detection, which is one of the challenges of this activity. In addition, the text we encounter in sentiment analysis is not always as well organized as newspapers or books, but can also contain spelling mistakes, acronyms, or idioms [2]. Therefore, to extract sentiments from a text, several steps are needed; the general process is shown in Fig. 2.

As we can see in Fig. 2, the first step is to collect and extract data from different sources. The data processing step includes text preprocessing, feature extraction, and feature selection [2], and the classification step can be carried out using different sentiment analysis approaches (for example, machine learning); finally, we obtain the output [3].

Fig. 2. The generic process of sentiment analysis.

C. Text classification

In natural language processing, ABSA can be considered a text classification problem, which is defined as assigning a text or sequence to a specific category, because it consists of several operations that end with classifying whether a given text expresses positive, negative, or neutral sentiment [2]. This classification can be seen in Fig. 3.

Fig. 3. Classification in sentiment analysis.

III. NECESSITY AND PURPOSE OF RESEARCH

Although sentiment analysis poses many challenges due to the handling of unstructured textual data, it has received much attention and development because of its practical applications. Advances in technology have made sentiment analysis an increasingly popular research field in recent years, allowing companies and organizations to understand whether or not customers are satisfied with their products and to use this knowledge to improve their services; it also allows customers to make the best decision when choosing a product or service. There is therefore a clear need to analyze user-generated data, and the field of sentiment analysis has attracted the attention of many researchers, as reflected in the growth of published papers over the last decade and a half.

IV. RESEARCH BACKGROUND

A. Traditional text classification

To extract information from text, it is first broken down into unique and separate words called features; then, to classify it in the sentiment analysis task, these features are compared with the features of the given aspect. Finally, according to the result of this comparison, the classification is performed. This feature extraction process, also called feature engineering, is an important task in textual sentiment analysis due to its great impact on the final performance of the classification task. In the past, dictionary-based handwritten rules were often used for feature selection, which required considerable expertise and often a lot of resources, so they were replaced by machine learning methods and, later, deep learning methods. In simple terms, in these methods a dataset that includes the text of the user's review as well as the polarity of the sentiment associated with it is given as input to a supervised machine learning algorithm to train the model. Then, according to the knowledge acquired through this training, the model can predict the sentiment polarity of data it encounters for the first time [2].

B. Deep learning methods in text classification

Deep learning, inspired by the neural network of the human brain, provides methods for learning feature representations that can be used in both supervised and unsupervised settings [2].
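To make this classification framing concrete, the following is a small illustrative sketch (our own, not taken from the paper) of how ABSA instances can be represented as sentence-aspect-polarity triples before being handed to a classifier. The example sentence is the one used later in Fig. 4; the data structure and label mapping are assumptions.

```python
# Illustrative only: ABSA instances as (sentence, aspect, polarity) triples.
# The label mapping below is an assumption, not the authors' code.
LABELS = {"negative": 0, "neutral": 1, "positive": 2}

samples = [
    {"sentence": "Great food but service was dreadful!", "aspect": "food",    "polarity": "positive"},
    {"sentence": "Great food but service was dreadful!", "aspect": "service", "polarity": "negative"},
]

def to_classification_pair(sample):
    """Frame one ABSA instance as a (text, text_pair, label) classification input."""
    return sample["sentence"], sample["aspect"], LABELS[sample["polarity"]]

for s in samples:
    print(to_classification_pair(s))
```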

Since these methods can automatically extract information from text, very complex models can be trained on larger data in natural language processing, especially sentiment analysis, and this has recently produced very good results [4]. In the following, we examine these methods:

1) CNN-based methods: Convolutional neural networks were first proposed in [7]; the results obtained improved performance by more than 40% and proved the feasibility of using deep networks and deep learning in machine vision. CNNs have also recently become very popular in the field of text analysis and have been successfully used in text classification tasks, including sentiment analysis. In [8], the authors used two models for sentiment classification in a supervised manner at the sentence level, one based on parallel CNNs and the other on deep CNNs, and they found that both models perform better than the SVM method. In [9], the authors proposed a model based on CNNs and gating mechanisms, which was simpler and more accurate than previous models and produced good results.

2) RNN-based methods: When we want to process an input sequence as a chain of items, we use recurrent neural networks (RNNs); the main goal of training an RNN is to be able to predict the next item or token in a sequence of words, like a language model. In order to process the input sequence, the RNN uses a memory cell by which it can record the information in a sequence [2]. In RNNs, the output depends on all the previous computations and is based on the information of the previous hidden layer. Dong et al. [10] first introduced RNNs for this task; they proposed a model that can propagate the sentiment of the text context to the aspect, and achieved good results. In [11], Bahdanau et al., using a recurrent neural network, built a language model that can better measure the correlation between words, and achieved good results. Stacked RNNs and bidirectional RNNs are two more advanced variants of the RNN, with a more complex and complete cell structure from which more information can be extracted, and this can help improve the accuracy of the model. Although RNNs help to capture the meanings of words and have been proven effective, they may fail in practice due to common errors of grammatical analysis, they fail to preserve the context of the text over long distances, and their main problem is the vanishing gradient [2, 12].

3) LSTM and GRU-based methods: Long short-term memory (LSTM) as well as the gated recurrent unit (GRU) each provide architectures that help solve the problems of RNNs. The LSTM model can store information by keeping a cell "c" and use it to make a decision in the current situation. The GRU, with a similar and even simpler structure than the LSTM, also helps to maintain text context information over long distances, and either can be selected according to the task [2]. Recently, LSTM and its variants have been applied to various tasks with satisfactory results, and they have helped improve the accuracy of sentiment analysis by maintaining long-term dependencies. In [13] the authors proposed two implementations of the LSTM network for the classification of aspect sentiment polarity, the first a bidirectional LSTM at the character level and the second an aspect-based LSTM, and they achieved good results and performance. In [14] the authors proposed a model called AC-BiLSTM, which includes a BiLSTM, an attention mechanism and a convolution layer, and it was able to achieve good results and performance. In [15] the authors proposed the Tree-LSTM to make use of the semantic representation of sentences, which performed better than the basic LSTM. In [16] the authors proposed a target-dependent LSTM model called TD-LSTM; this model is able to automatically take target information into account. Also in [17], the authors proposed a model that first uses an LSTM to encode and transform the input text into a vector and another LSTM to decode the target from that vector, which had good results and performance.

4) Attention-based methods: The attention mechanism was first introduced in 2014 [11], and after that researchers proposed many models for sentiment analysis based on the attention mechanism or on its combination with RNN models, achieving good results. In [18], the authors proposed a model for extracting an interpretable sentence embedding by adding a self-attention mechanism to the CNN layers, which achieved better results than previous models. In [19], the authors introduced the ATAE-LSTM model by combining an LSTM with the attention mechanism, and the obtained results show the good performance of the attention mechanism. In [20] researchers proposed a deep memory network to show the importance of each word in sentiment analysis; in this model, each computational layer is a neural attention model over an external memory. In [21] researchers proposed the AF-LSTM model to integrate aspect information into the neural network; this model is able to adaptively focus on words associated with an aspect word. In [22] the IAN network was proposed, which uses two LSTM networks to generate aspect and text representations separately; in this model, the representations of both the aspects and the context are learned through interactive learning. Also in [12], with the aim of extracting interactive information from the aspect term and the context, the FEA-NN model was proposed as a combination of BiLSTM and CNN, which is able to reduce the effect of words that are not related to the aspect.

5) Transformer-based methods: Multi-head attention, one of the extended forms of the attention mechanism, is one of the important components in the design of Transformers, which were first proposed in [23]. This model makes use of encoders and decoders consisting of multi-head self-attention layers instead of RNNs. In [24], the authors designed a Local Context Focus mechanism to pay more attention to the local context words; this model is able to process the global context and the local context of the aspect separately by using dynamic mask layers and context features. Although transformer models have made good progress in sentiment analysis, they require a lot of training data; in such cases, a pre-trained model can be a good solution, which we discuss in the next section.

6) Pre-trained model-based methods: A pre-trained model is a model that is trained on large data to produce word embeddings, and these word embeddings can then be used to train a model with less data and at a lower cost. Universal Language Model Fine-Tuning (ULMFiT), which is able to match the performance of a model trained on tens of times more training data, was the first of these pre-trained methods. After that, Embeddings from Language Models (ELMo) was proposed, a model that produces word embeddings in such a way that, in addition to the meaning of the words, they also capture their syntactic role [25].

Also, the semantic information in the text can be more completely encoded by the bidirectional encoder representations of Transformers (BERT), and the accuracy of the classification operation can thereby be increased [5]. In [26] a knowledge-enabled BERT language representation model was proposed, which is able to obtain more complete information by injecting knowledge of the sentiment domain into the BERT model. In [27] a model was proposed in which, by constructing an auxiliary sentence from the aspect, the task is handled like a sentence-pair classification task. Also, GPT, which stands for Generative Pre-Training, has grown considerably compared to BERT and is a powerful model that has been able to achieve advanced results.

V. METHODOLOGY

A. BERT, RoBERTa and XLNet

BERT has achieved excellent results in many NLP tasks, which is why our proposed model architecture is based on BERT, RoBERTa and XLNet. BERT and RoBERTa both take Transformers as their backbone architecture; BERT can use 12 or 24 transformer layers.

RoBERTa is a model in which the researchers proposed modifications to the original BERT model and used a larger dataset [6]. BERT was trained on 16 GB of text, while RoBERTa was trained on 160 GB of text [28]. Also in [6] the researchers suggested improvements in the model design, reused BERT's pre-training for the training method, and were able to train the model on longer sequences. An advantage of pre-trained models is that they do not require a large corpus for training.

XLNet is an improved training method that uses more data and more computing power [29]. XLNet is an extension of the Transformer-XL model and uses more than 130 GB of textual data for its training.

B. Tree structure

Dependency trees are used to extract long-range dependencies and syntactic relationships between the words in a sentence and can effectively help improve ABSA performance. Therefore, most advanced ABSA models use dependency trees to help model the connections between aspects and their opinion words [28, 30-32]. PTMs also implicitly capture a kind of dependency tree structure, which we use in our work. In [33] the authors proposed the Perturbed Masking technique with the aim of evaluating the effect of one word on the prediction of another word and identifying syntactic information in a text sequence. This technique is capable of inducing trees from pre-trained models and can be applied to any layer of the pre-trained model. In this paper, we exploit this technique and use it to induce trees from BERT, RoBERTa, and XLNet after fine-tuning. For example, Fig. 4 shows the tree induced by FT-PTMs for the sentence "Great food but service was dreadful!", in which the aspect term is marked in red and the related sentiment word in green.

Fig. 4. FT-PTMs Induced Tree.

C. Dual Contrastive Learning

Contrastive learning was first used in computer vision, and later researchers applied it to NLP tasks as well [34]. This technique, which is used for both supervised and unsupervised data, shapes the textual representation in such a way that similar samples are closer in the vector space and dissimilar samples are further apart. In [35], the authors proposed dual contrastive learning with the aim of simultaneously learning the features of the input samples and the classifier parameters. This model first considers the classifier parameters as augmented samples and then applies contrastive learning.

D. Contrastive Loss

Using this technique when training the representation layer of the model causes words with similar embeddings to be mapped to one result and to stay away from other, unrelated embeddings, as in clustering; also, by adding vector similarity and a temperature normalization factor, this technique works like a softmax function [35]. We want the similarity scores to all get closer to 1; since -log(1) = 0, that is the optimal loss [36]. Given N training samples, the standard contrastive loss is defined as (1):

\ell_{i,j} = -\log \frac{\exp(\mathrm{sim}(z_i, z_j)/\tau)}{\sum_{k=1}^{N} \mathbb{1}_{[k \neq i]} \exp(\mathrm{sim}(z_i, z_k)/\tau)}    (1)

where \tau is a temperature hyperparameter, z_i is the normalized representation of sample x_i, and sim denotes the cosine similarity of the encoded outputs. This technique is like a modified version of cross-entropy loss, where the loss is small if positive samples are encoded into similar representations and negative samples into different representations.

E. Dual Contrastive Loss

In [36], in order to use the existing relationships between different training samples, researchers proposed the dual contrastive loss technique, which maximizes \theta_i^{*} \cdot z_j if x_j has the same label as x_i, while minimizing \theta_i^{*} \cdot z_j if x_j carries a different label from x_i. Here, for each input example x_i, the classifier is \theta_i and the feature representation is z_i; \theta_i^{*} denotes the column of \theta_i associated with its label.

Given an anchor z_i and an input sample x_i, the researchers in [36] adopt \{\theta_j^{*}\}_{j \in P_i} as the positive samples and \{\theta_j^{*}\}_{j \in A_i \setminus P_i} as the negative samples. The first contrastive loss is then defined as (2):

\mathcal{L}_z = \frac{1}{N} \sum_{i=1}^{N} \frac{-1}{|P_i|} \sum_{p \in P_i} \log \frac{\exp(\theta_p^{*} \cdot z_i / \tau)}{\sum_{j \in A_i} \exp(\theta_j^{*} \cdot z_i / \tau)}    (2)

As well, given an anchor \theta_i^{*}, we can take \{z_j\}_{j \in P_i} as positive samples and \{z_j\}_{j \in A_i \setminus P_i} as negative samples, and define another contrastive loss as (3):

\mathcal{L}_\theta = \frac{1}{N} \sum_{i=1}^{N} \frac{-1}{|P_i|} \sum_{p \in P_i} \log \frac{\exp(\theta_i^{*} \cdot z_p / \tau)}{\sum_{j \in A_i} \exp(\theta_i^{*} \cdot z_j / \tau)}    (3)

where A_i := \{1, \dots, N\} \setminus \{i\} refers to the set of indices of contrastive samples, \tau refers to the temperature factor, and P_i := \{p \in A_i : y_p = y_i\} refers to the set of indices of positive examples.
examples.

And finally, the dual contrastive loss is formulated as (4):

\mathcal{L}_{Dual} = \mathcal{L}_z + \mathcal{L}_\theta    (4)
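To make Eqs. (2)-(4) concrete, the following is a minimal PyTorch sketch of the dual contrastive loss; it is our own reading of the formulas rather than the authors' released code, and it assumes that the fine-tuned encoder already supplies the sample representations z_i and the label-aware classifier vectors \theta_i^{*} for a mini-batch (the temperature value is an assumption).

```python
# A minimal sketch of the dual contrastive loss of Eqs. (2)-(4).
# Assumptions (not from the paper's code):
#   z          : (N, d) sample representations z_i
#   theta_star : (N, d) label-aware classifier vectors theta_i^*
#   labels     : (N,)   gold polarity indices
import torch

def dual_contrastive_loss(z, theta_star, labels, tau=0.1):
    n = z.size(0)
    # sim[i, j] = theta_j^* . z_i / tau  (Eq. 2); its transpose gives theta_i^* . z_j / tau (Eq. 3)
    sim = (z @ theta_star.t()) / tau
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask   # P_i: same label, j != i

    def one_side(scores):
        # log-softmax over A_i = {1..N} \ {i}, averaged over the positive set P_i
        scores = scores.masked_fill(self_mask, float("-inf"))
        log_prob = scores - torch.logsumexp(scores, dim=1, keepdim=True)
        log_prob = log_prob.masked_fill(~pos_mask, 0.0)                   # keep only p in P_i
        pos_count = pos_mask.sum(dim=1).clamp(min=1)
        return (-log_prob.sum(dim=1) / pos_count).mean()

    loss_z = one_side(sim)          # Eq. (2): anchor z_i, positives theta_p^*
    loss_theta = one_side(sim.t())  # Eq. (3): anchor theta_i^*, positives z_p
    return loss_z + loss_theta      # Eq. (4)

# Toy usage with random tensors (3 polarity classes, batch of 8, dim 16):
z = torch.nn.functional.normalize(torch.randn(8, 16), dim=-1)            # normalized for illustration
theta_star = torch.nn.functional.normalize(torch.randn(8, 16), dim=-1)
labels = torch.randint(0, 3, (8,))
print(dual_contrastive_loss(z, theta_star, labels))
```

In a full training setup, this term would typically be added to the usual classification objective during fine-tuning.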

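The other ingredient of the proposed method, the tree induction of Section V.B, relies on Perturbed Masking [33]. The sketch below shows the basic impact-matrix computation under our own assumptions (HuggingFace transformers, a generic BERT checkpoint standing in for the fine-tuned PTM); the tree-decoding step that the paper applies on top of this matrix is omitted.

```python
# A rough, unoptimized sketch of the Perturbed Masking idea [33].
# "bert-base-uncased" is a placeholder for the fine-tuned checkpoint.
import torch
from transformers import AutoModel, AutoTokenizer

name = "bert-base-uncased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name).eval()

def token_repr(ids, masked_positions):
    """Hidden states of all tokens after masking the given positions."""
    ids = ids.clone()
    ids[0, masked_positions] = tok.mask_token_id
    with torch.no_grad():
        return model(ids).last_hidden_state[0]

def impact_matrix(sentence):
    """impact[i, j]: how much additionally masking token j changes the representation of token i."""
    ids = tok(sentence, return_tensors="pt")["input_ids"]
    n = ids.size(1)
    impact = torch.zeros(n, n)
    for i in range(1, n - 1):                      # skip [CLS] and [SEP]
        base = token_repr(ids, [i])[i]             # x_i masked
        for j in range(1, n - 1):
            if i == j:
                continue
            both = token_repr(ids, [i, j])[i]      # x_i and x_j masked
            impact[i, j] = torch.dist(base, both)  # Euclidean distance as impact score
    return impact

print(impact_matrix("Great food but service was dreadful!").shape)
```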
F. Our method
According to recent research, transformer models have been used extensively in proposed models, and many implementations of language models have taken ideas from the Transformer architecture, because they have a higher ability to understand language and, as a result, can obtain better results and higher accuracy. Among the transformer-based methods, BERT is one of the most popular and has recently been used in many research works to generate a representation of the input text. Different members of the BERT family have been able to achieve even better results and be more useful. Therefore, our goal is to use different models of the BERT family and, as a result, to compare and evaluate them.
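Since the comparison across BERT-family backbones is central here, the following is a minimal sketch of how the three encoders could be loaded through a common interface; the public checkpoint names are assumptions, as the paper fine-tunes its own copies.

```python
# Illustrative only: loading the three FT-PTM backbones through a common interface.
from transformers import AutoModel, AutoTokenizer

BACKBONES = {
    "BERT":    "bert-base-uncased",
    "RoBERTa": "roberta-base",
    "XLNet":   "xlnet-base-cased",
}

def load_backbone(name):
    """Return (tokenizer, encoder) for one of the BERT-family backbones."""
    checkpoint = BACKBONES[name]
    return AutoTokenizer.from_pretrained(checkpoint), AutoModel.from_pretrained(checkpoint)

tok, enc = load_backbone("RoBERTa")
out = enc(**tok("Great food but service was dreadful!", return_tensors="pt"))
print(out.last_hidden_state.shape)   # (1, sequence_length, hidden_size)
```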
In this paper, because of the strong capability of dual contrastive learning to simultaneously learn the features of the input samples and the classifier parameters, and because it can also consider the classifier parameters as augmented samples, we propose a new model based on FT-PTMs, including BERT, RoBERTa and XLNet, using dual contrastive learning. As explained in the tree structure section, in this model, to find the dependencies between words, we first use the Perturbed Masking technique to generate the induced trees of the FT-PTMs. We then feed them as input to the fine-tuned BERT, RoBERTa and XLNet along with the sentence and aspect representations. Finally, we use dual contrastive learning to adapt the contrastive loss for sentiment analysis classification. The model architecture we propose is shown in Fig. 5, which indicates the input, the corresponding predicted sentiment polarity, and the hidden layers h(·).

Fig. 5. Overall architecture of our fine-tuning model.

G. Datasets

We evaluate the performance of the proposed method on the SemEval 2014 Task 4 and Twitter benchmark datasets [37, 38]. The details of each dataset are shown in Table I.

TABLE I. STATISTICS OF DATASETS.

Dataset   Split   #Pos.   #Neu.   #Neg.   Total
Rest14    Train   2164    637     807     3608
Rest14    Test    728     196     196     1120
Lap14     Train   994     464     870     2328
Lap14     Test    341     169     128     638
Twitter   Train   1561    3127    1560    6248
Twitter   Test    173     346     173     692

VI. EXPERIMENT

A. Experiment setup

We fine-tune BERT, RoBERTa and XLNet on the SemEval 2014 Task 4 and Twitter datasets. The batch size is b = 32, the dropout rate is d = 0.1, the number of epochs is set to 40, and the learning rate is µ = 2e-4. We also use the AdamW optimizer.
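A schematic fine-tuning loop with the hyperparameters listed above might look as follows; the checkpoint name, the dataset object, and the use of the built-in sequence-classification head are placeholders rather than the authors' exact implementation.

```python
# Schematic training loop mirroring Sec. VI-A (batch size 32, dropout 0.1,
# 40 epochs, learning rate 2e-4, AdamW). Not the authors' released code.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",        # placeholder BERT-style checkpoint
    num_labels=3,               # negative / neutral / positive
    hidden_dropout_prob=0.1,    # dropout rate d = 0.1
)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)

def run_training(train_dataset, epochs=40, batch_size=32):
    loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    model.train()
    for _ in range(epochs):
        for batch in loader:        # batch: dict with input_ids, attention_mask, labels
            optimizer.zero_grad()
            out = model(**batch)    # built-in cross-entropy loss; the dual contrastive
            loss = out.loss         # loss of Eq. (4) would be added here in the full model
            loss.backward()
            optimizer.step()
```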
B. Experiment Results

In this section, the results of the comparison of our models are presented (Table II), where LD indicates the use of the dual contrastive loss and ST indicates the use of the tree structure described above.

TABLE II. EXPERIMENTAL RESULTS.

Model               Rest14 Acc   Rest14 F1   Laptop14 Acc   Laptop14 F1   Twitter Acc   Twitter F1
FT_Bert             85.17        78.41       77.58          73.10         74.56         74.98
FT_Roberta          86.87        80.32       84.32          81.20         76.73         76.14
FT_Xlnet            88.30        82.63       81.19          77.12         82.60         78.85
FT_Bert+ST+LD       86.11        80.18       79.73          77.21         76.14         75.91
FT_Roberta+ST+LD    87.09        82.46       85.55          83.15         78.19         77.69
FT_Xlnet+ST+LD      88.96        83.33       82.23          79.69         83.77         79.05

C. Model comparison

The models we use as baselines for comparison with our model are the following (results in Table III):

IAN [22]: This model can interactively calculate the attention weights of context words and aspect words to obtain richer context features.

ATAE-LSTM [19]: An attention-based LSTM model that is able to focus on specific aspects during training by appending the aspect vector to the hidden sentence representations or adding an additional aspect embedding.

RAM [39]: Uses multi-layer memory networks with an attention mechanism to obtain context features.

TNet [40]: This model transforms Bi-LSTM embeddings into target-specific embeddings for target-oriented sentiment classification.

kumaGCN [41]: This model combines a dependency graph and a latent graph to supplement syntax features.

MGAN [42]: This model, which is based on the attention mechanism, can capture the word-level interaction between aspect and context.

R-GAT [43]: This model uses syntactic information encoding; the dependency tree is also reconstructed in an aspect-oriented way.

AF-LSTM [21]: In this model, aspect information obtained by modeling word-aspect relationships is input to the neural network, enabling the network to adaptively focus on aspect words.

BERT-SPC [5]: This model is designed to pre-train deep bidirectional representations of a text and is able to capture the meaning of the text.

BERT4GCN [44]: This model, which uses BERT and GCN layers, is able to shorten the distance between aspects and their corresponding sentiment words by learning an aspect-oriented tree structure.

ACLT [45]: By learning an aspect-oriented tree structure, this model is able to shorten the distance between aspects and their corresponding opinion words and bring them closer together.

TABLE III. COMPARISON WITH OTHER MODELS.

Models              Rest14 Acc   Rest14 F1   Laptop14 Acc   Laptop14 F1   Twitter Acc   Twitter F1
IAN                 78.60        73.12       72.10          73.64         -             -
ATAE-LSTM           76.82        72.65       67.93          60.81         -             -
RAM                 80.23        70.08       74.49          71.35         69.36         67.30
TNet                80.79        74.32       75.21          69.73         68.63         64.31
kumaGCN             81.43        73.64       76.12          72.42         72.45         70.77
MGAN                81.25        71.49       75.39          72.47         72.54         70.81
R-GAT               83.30        76.02       77.42          73.76         75.57         73.82
AF-LSTM             77.13        72.85       72.32          68.21         66.60         60.82
BERT-SPC            84.11        76.68       77.59          73.28         75.18         74.01
BERT4GCN            84.75        77.11       77.49          73.01         74.73         73.67
ACLT                85.71        78.44       79.68          75.83         75.48         74.51
Our best models
FT_Roberta+ST+LD    87.09        82.46       85.55          83.15         78.19         77.69
FT_Xlnet+ST+LD      88.96        83.33       82.23          79.69         83.77         79.05

VII. CONCLUSION

In this paper, we use the Perturbed Masking technique to generate induced trees of FT-PTMs and then input them, along with the sentence and aspect representations, into the FT-PTM models. Finally, we use dual contrastive learning to adapt the contrastive loss for sentiment analysis classification. The results obtained in our tests and evaluations show the effectiveness of the proposed model. In the future, we plan to extend the model using graph neural networks, this time considering the representations of sentences, aspects and induced trees of FT-PTMs as input to the graph neural network.

REFERENCES

[1] Asur, S. and B.A. Huberman. Predicting the future with social media. in 2010 IEEE/WIC/ACM international conference on web intelligence and intelligent agent technology. 2010. IEEE.
[2] Birjali, M., M. Kasri, and A. Beni-Hssane, A comprehensive survey on sentiment analysis: Approaches, challenges and trends. Knowledge-Based Systems, 2021. 226: p. 107134.
[3] Rajalakshmi, S., S. Asha, and N. Pazhaniraja. A comprehensive survey on sentiment analysis. in 2017 fourth international conference on signal processing, communication and networking (ICSCN). 2017. IEEE.
[4] Narayanaswamy, G.R., Exploiting BERT and RoBERTa to improve performance for aspect based sentiment analysis. 2021, dissertation. Technological University Dublin, Dublin.
[5] Devlin, J., et al., Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
[6] Liu, Y., et al., Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
[7] Krizhevsky, A., I. Sutskever, and G.E. Hinton, Imagenet classification with deep convolutional neural networks. Communications of the ACM, 2017. 60(6): p. 84-90.
[8] Chen, Y., Convolutional neural network for sentence classification. 2015, University of Waterloo.
[9] Xue, W. and T. Li, Aspect based sentiment analysis with gated convolutional networks. arXiv preprint arXiv:1805.07043, 2018.
[10] Dong, L., et al. Adaptive recursive neural network for target-dependent twitter sentiment classification. in Proceedings of the 52nd annual meeting of the association for computational linguistics (volume 2: Short papers). 2014.
[11] Bahdanau, D., K. Cho, and Y. Bengio, Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
[12] Meng, W., et al., Aspect based sentiment analysis with feature enhanced attention CNN-BiLSTM. IEEE Access, 2019. 7: p. 167240-167249.
[13] Al-Smadi, M., et al., Using long short-term memory deep neural networks for aspect-based sentiment analysis of Arabic reviews. International Journal of Machine Learning and Cybernetics, 2019. 10: p. 2163-2175.
[14] Liu, G. and J. Guo, Bidirectional LSTM with attention mechanism and convolutional layer for text classification. Neurocomputing, 2019. 337: p. 325-338.
[15] Tai, K.S., R. Socher, and C.D. Manning, Improved semantic representations from tree-structured long short-term memory networks. arXiv preprint arXiv:1503.00075, 2015.
[16] Tang, D., et al., Effective LSTMs for target-dependent sentiment classification. arXiv preprint arXiv:1512.01100, 2015.
[17] Sutskever, I., O. Vinyals, and Q.V. Le, Sequence to sequence learning with neural networks. Advances in neural information processing systems, 2014. 27.
[18] Lin, Z., et al., A structured self-attentive sentence embedding. arXiv preprint arXiv:1703.03130, 2017.
[19] Wang, Y., et al. Attention-based LSTM for aspect-level sentiment classification. in Proceedings of the 2016 conference on empirical methods in natural language processing. 2016.
[20] Tang, D., B. Qin, and T. Liu, Aspect level sentiment classification with deep memory network. arXiv preprint arXiv:1605.08900, 2016.
[21] Tay, Y., L.A. Tuan, and S.C. Hui. Learning to attend via word-aspect associative fusion for aspect-based sentiment analysis. in Proceedings of the AAAI conference on artificial intelligence. 2018.
[22] Ma, D., et al., Interactive attention networks for aspect-level sentiment classification. arXiv preprint arXiv:1709.00893, 2017.
[23] Vaswani, A., et al., Attention is all you need. Advances in neural information processing systems, 2017. 30.
[24] Zeng, B., et al., Lcf: A local context focus mechanism for aspect-based sentiment classification. Applied Sciences, 2019. 9(16): p. 3389.
[25] Nurifan, F., R. Sarno, and K.R. Sungkono, Aspect based sentiment analysis for restaurant reviews using hybrid elmo-wikipedia and hybrid expanded opinion lexicon-senticircle. International Journal of Intelligent Engineering and Systems, 2019. 12(6): p. 47-58.
[26] Zhao, A. and Y. Yu, Knowledge-enabled BERT for aspect-based sentiment analysis. Knowledge-Based Systems, 2021. 227: p. 107220.
[27] Sun, C., L. Huang, and X. Qiu, Utilizing BERT for aspect-based sentiment analysis via constructing auxiliary sentence. arXiv preprint arXiv:1903.09588, 2019.

[28] Dai, J., et al., Does syntax matter? a strong baseline for aspect-based
sentiment analysis with roberta. arXiv preprint arXiv:2104.04986, 2021.
[29] Yang, Z., et al., Xlnet: Generalized autoregressive pretraining for
language understanding. Advances in neural information processing
systems, 2019. 32.
[30] Wang, K., et al., Relational graph attention network for aspect-based
sentiment analysis. arXiv preprint arXiv:2004.12362, 2020.
[31] Sun, K., et al. Aspect-level sentiment analysis via convolution over
dependency tree. in Proceedings of the 2019 conference on empirical
methods in natural language processing and the 9th international joint
conference on natural language processing (EMNLP-IJCNLP). 2019.
[32] Zhang, C., Q. Li, and D. Song. Syntax-aware aspect-level sentiment
classification with proximity-weighted convolution network. in
Proceedings of the 42nd international ACM SIGIR conference on
research and development in information retrieval. 2019.
[33] Wu, Z., et al., Perturbed masking: Parameter-free probing for analyzing
and interpreting BERT. arXiv preprint arXiv:2004.14786, 2020.
[34] Mai, S., et al., Hybrid contrastive learning of tri-modal representation for
multimodal sentiment analysis. IEEE Transactions on Affective
Computing, 2022.
[35] Chen, Q., et al., Dual contrastive learning: Text classification via label-
aware data augmentation. arXiv preprint arXiv:2201.08702, 2022.
[36] Gao, T., X. Yao, and D. Chen, Simcse: Simple contrastive learning of
sentence embeddings. arXiv preprint arXiv:2104.08821, 2021.
[37] Pontiki, M., et al. Semeval-2016 task 5: Aspect based sentiment analysis.
in ProWorkshop on Semantic Evaluation (SemEval-2016). 2016.
Association for Computational Linguistics.
[38] Kirange, D., R.R. Deshmukh, and M. Kirange, Aspect based sentiment
analysis semeval-2014 task 4. Asian Journal of Computer Science and
Information Technology (AJCSIT) Vol, 2014. 4.
[39] Chen, P., et al. Recurrent attention network on memory for aspect
sentiment analysis. in Proceedings of the 2017 conference on empirical
methods in natural language processing. 2017.
[40] Li, X., et al., Transformation networks for target-oriented sentiment
classification. arXiv preprint arXiv:1805.01086, 2018.
[41] Chen, C., Z. Teng, and Y. Zhang. Inducing target-specific latent structures
for aspect sentiment classification. in Proceedings of the 2020 conference
on empirical methods in natural language processing (EMNLP). 2020.
[42] Fan, F., Y. Feng, and D. Zhao. Multi-grained attention network for aspect-
level sentiment classification. in Proceedings of the 2018 conference on
empirical methods in natural language processing. 2018.
[43] Wang, K., et al., Relational graph attention network for aspect-based sentiment analysis. arXiv preprint arXiv:2004.12362, 2020.
[44] Xiao, Z., et al., BERT4GCN: Using BERT intermediate layers to augment
GCN for aspect-based sentiment classification. arXiv preprint
arXiv:2110.00171, 2021.
[45] Zhou, Y., et al., To be closer: Learning to link up aspects with opinions.
arXiv preprint arXiv:2109.08382, 2021.
