
Attention-based LSTM for Aspect-level Sentiment Classification

Yequan Wang and Minlie Huang and Li Zhao* and Xiaoyan Zhu
State Key Laboratory on Intelligent Technology and Systems
Tsinghua National Laboratory for Information Science and Technology
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
*Microsoft Research Asia
[email protected], [email protected]
[email protected], [email protected]

Abstract

Aspect-level sentiment classification is a fine-grained task in sentiment analysis. Because it provides more complete and in-depth results, aspect-level sentiment analysis has received much attention in recent years. In this paper, we reveal that the sentiment polarity of a sentence is not only determined by the content but is also highly related to the concerned aspect. For instance, given "The appetizers are ok, but the service is slow.", the polarity is positive for the aspect taste but negative for the aspect service. Therefore, it is worthwhile to explore the connection between an aspect and the content of a sentence. To this end, we propose an Attention-based Long Short-Term Memory Network for aspect-level sentiment classification. The attention mechanism can concentrate on different parts of a sentence when different aspects are taken as input. We experiment on the SemEval 2014 dataset, and the results show that our model achieves state-of-the-art performance on aspect-level sentiment classification.

1 Introduction

Sentiment analysis (Nasukawa and Yi, 2003), also known as opinion mining (Liu, 2012), is a key NLP task that has received much attention in recent years. Aspect-level sentiment analysis is a fine-grained task that can provide complete and in-depth results. In this paper, we deal with aspect-level sentiment classification, and we find that the sentiment polarity of a sentence is highly dependent on both content and aspect. For example, the sentiment polarity of "Staffs are not that friendly, but the taste covers all." will be positive if the aspect is food but negative when considering the aspect service. The polarity can thus be opposite when different aspects are considered.

Neural networks have achieved state-of-the-art performance in a variety of NLP tasks such as machine translation (Lample et al., 2016), paraphrase identification (Yin et al., 2015), question answering (Golub and He, 2016) and text summarization (Rush et al., 2015). However, neural network models are still in their infancy when it comes to aspect-level sentiment classification. In some works, target-dependent sentiment classification benefits from taking target information into account, as in Target-Dependent LSTM (TD-LSTM) and Target-Connection LSTM (TC-LSTM) (Tang et al., 2015a). However, those models can only take the target into consideration, not the aspect information, which is proved to be crucial for aspect-level classification.

Attention has become an effective mechanism for obtaining superior results, as demonstrated in image recognition (Mnih et al., 2014), machine translation (Bahdanau et al., 2014), reasoning about entailment (Rocktäschel et al., 2015) and sentence summarization (Rush et al., 2015). Moreover, neural attention can improve reading comprehension (Hermann et al., 2015). In this paper, we propose an attention mechanism that enforces the model to attend to the important parts of a sentence in response to a specific aspect.
We design an aspect-to-sentence attention mechanism that can concentrate on the key parts of a sentence given the aspect.

We explore the potential correlation between aspect and sentiment polarity in aspect-level sentiment classification. In order to capture the important information in response to a given aspect, we design an attention-based LSTM. We evaluate our approach on a benchmark dataset (Pontiki et al., 2014), which contains restaurants and laptops data.

The main contributions of our work can be summarized as follows:

• We propose attention-based Long Short-Term Memory networks for aspect-level sentiment classification. The models are able to attend to different parts of a sentence when different aspects are concerned. Results show that the attention mechanism is effective.

• Since the aspect plays a key role in this task, we propose two ways to take aspect information into account during attention: one is to concatenate the aspect vector into the sentence hidden representations for computing the attention weights, and the other is to additionally append the aspect vector to the input word vectors.

• Experimental results indicate that our approach improves the performance compared with several baselines, and further examples demonstrate that the attention mechanism works well for aspect-level sentiment classification.

The rest of the paper is structured as follows: Section 2 discusses related work, Section 3 gives a detailed description of our attention-based proposals, Section 4 presents extensive experiments to justify the effectiveness of our proposals, and Section 5 summarizes this work and future directions.

2 Related Work

In this section, we briefly review related work on aspect-level sentiment classification and on neural networks for sentiment classification.

2.1 Sentiment Classification at Aspect-level

Aspect-level sentiment classification is typically considered a classification problem in the literature. As mentioned before, aspect-level sentiment classification is a fine-grained classification task. The majority of current approaches attempt to detect the polarity of the entire sentence, regardless of the entities or aspects mentioned. Traditional approaches to these problems manually design a set of features. With the abundance of sentiment lexicons (Rao and Ravichandran, 2009; Perez-Rosas et al., 2012; Kaji and Kitsuregawa, 2007), lexicon-based features were built for sentiment analysis (Mohammad et al., 2013). Most of these studies focus on building sentiment classifiers with features, including bag-of-words and sentiment lexicons, using SVM (Mullen and Collier, 2004). However, the results highly depend on the quality of the features, and feature engineering is labor intensive.

2.2 Sentiment Classification with Neural Networks

Since a simple and effective approach to learning distributed representations was proposed (Mikolov et al., 2013), neural networks have advanced sentiment analysis substantially. Classical models, including the Recursive Neural Network (Socher et al., 2011; Dong et al., 2014; Qian et al., 2015), the Recursive Neural Tensor Network (Socher et al., 2013), the Recurrent Neural Network (Mikolov et al., 2010; Tang et al., 2015b), LSTM (Hochreiter and Schmidhuber, 1997) and Tree-LSTMs (Tai et al., 2015), have been applied to sentiment analysis. By utilizing the syntactic structure of sentences, tree-based LSTMs have proved quite effective for many NLP tasks. However, such methods may suffer from syntax parsing errors, which are common in resource-lacking languages.

LSTM has achieved great success in various NLP tasks. TD-LSTM and TC-LSTM (Tang et al., 2015a), which take target information into consideration, achieved state-of-the-art performance in target-dependent sentiment classification. TC-LSTM obtains a target vector by averaging the vectors of the words that the target phrase contains. However, simply averaging the word embeddings of a target phrase is not sufficient to represent its semantics, resulting in suboptimal performance.
Despite the effectiveness of those methods, it is still challenging to discriminate different sentiment polarities at a fine-grained aspect level. Therefore, we are motivated to design a powerful neural network that can fully employ aspect information for sentiment classification.

3 Attention-based LSTM with Aspect Embedding

3.1 Long Short-term Memory (LSTM)

The Recurrent Neural Network (RNN) is an extension of the conventional feed-forward neural network. However, the standard RNN suffers from vanishing or exploding gradients. To overcome these issues, the Long Short-term Memory network (LSTM) was developed and has achieved superior performance (Hochreiter and Schmidhuber, 1997). In the LSTM architecture, there are three gates and a cell memory state. Figure 1 illustrates the architecture of a standard LSTM.

[Figure 1: The architecture of a standard LSTM. {w_1, w_2, ..., w_N} are the word vectors of a sentence of length N, {h_1, h_2, ..., h_N} are the hidden vectors, and the last hidden vector is fed to a softmax layer.]

More formally, each cell in the LSTM can be computed as follows:

X = [h_{t-1}; x_t]                                          (1)
f_t = σ(W_f · X + b_f)                                      (2)
i_t = σ(W_i · X + b_i)                                      (3)
o_t = σ(W_o · X + b_o)                                      (4)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_c · X + b_c)             (5)
h_t = o_t ⊙ tanh(c_t)                                       (6)

where W_i, W_f, W_o ∈ R^{d×2d} are the weight matrices and b_i, b_f, b_o ∈ R^d are the biases of the LSTM, learned during training and parameterizing the transformations of the input, forget and output gates respectively. σ is the sigmoid function and ⊙ stands for element-wise multiplication. x_t is the input to the LSTM cell, i.e., the word embedding vector w_t in Figure 1, and h_t is the hidden vector.

We regard the last hidden vector h_N as the representation of the sentence, linearly project it to a vector whose length equals the number of class labels, and feed it into a softmax layer. In our work, the set of class labels is {positive, negative, neutral}.
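To make the gate equations concrete, here is a minimal NumPy sketch of one LSTM step following Eqs. (1)–(6). It is an illustration only, not the authors' Theano implementation; the parameter names, toy dimensions and random initialization are our own assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step following Eqs. (1)-(6).

    x_t, h_prev, c_prev: vectors of size d.
    p: dict with W_i, W_f, W_o, W_c of shape (d, 2d) and b_i, b_f, b_o, b_c of shape (d,).
    """
    X = np.concatenate([h_prev, x_t])                             # Eq. (1): stack h_{t-1} and x_t
    f_t = sigmoid(p["W_f"] @ X + p["b_f"])                        # Eq. (2): forget gate
    i_t = sigmoid(p["W_i"] @ X + p["b_i"])                        # Eq. (3): input gate
    o_t = sigmoid(p["W_o"] @ X + p["b_o"])                        # Eq. (4): output gate
    c_t = f_t * c_prev + i_t * np.tanh(p["W_c"] @ X + p["b_c"])   # Eq. (5): new cell state
    h_t = o_t * np.tanh(c_t)                                      # Eq. (6): new hidden state
    return h_t, c_t

# Toy usage: run a 6-word "sentence" of random 4-dim word vectors through the cell.
d, N = 4, 6
rng = np.random.default_rng(0)
p = {k: rng.normal(scale=0.1, size=(d, 2 * d)) for k in ("W_i", "W_f", "W_o", "W_c")}
p.update({k: np.zeros(d) for k in ("b_i", "b_f", "b_o", "b_c")})
h, c = np.zeros(d), np.zeros(d)
hidden = []
for x_t in rng.normal(size=(N, d)):
    h, c = lstm_step(x_t, h, c, p)
    hidden.append(h)
H = np.stack(hidden, axis=1)  # (d, N) matrix of hidden vectors, as used by the attention layer below
```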
3.2 LSTM with Aspect Embedding (AE-LSTM)

Aspect information is vital when classifying the polarity of a sentence with respect to a given aspect, and we may get opposite polarities when different aspects are considered. To make the best use of aspect information, we propose to learn an embedding vector for each aspect. A vector v_{a_i} ∈ R^{d_a} represents the embedding of aspect i, where d_a is the dimension of the aspect embedding, and A ∈ R^{d_a×|A|} is the matrix of all aspect embeddings. To the best of our knowledge, this is the first time aspect embeddings have been proposed.

3.3 Attention-based LSTM (AT-LSTM)

The standard LSTM cannot detect which part of a sentence is important for aspect-level sentiment classification. To address this issue, we design an attention mechanism that can capture the key part of a sentence in response to a given aspect. Figure 2 shows the architecture of the Attention-based LSTM (AT-LSTM).

[Figure 2: The architecture of the Attention-based LSTM (AT-LSTM). The aspect embedding is used, together with the sentence's hidden representations, to compute the attention weights. {w_1, w_2, ..., w_N} are the word vectors of a sentence of length N, v_a is the aspect embedding, α is the attention weight vector, and {h_1, h_2, ..., h_N} are the hidden vectors.]

Let H ∈ R^{d×N} be the matrix of hidden vectors [h_1, ..., h_N] that the LSTM produced, where d is the size of the hidden layers and N is the length of the given sentence. Furthermore, v_a represents the embedding of the aspect and e_N ∈ R^N is a vector of 1s. The attention mechanism produces an attention weight vector α and a weighted hidden representation r:

M = tanh([W_h H ; W_v v_a ⊗ e_N])   (7)
α = softmax(w^T M)                   (8)
r = H α^T                            (9)

where M ∈ R^{(d+d_a)×N}, α ∈ R^N and r ∈ R^d. W_h ∈ R^{d×d}, W_v ∈ R^{d_a×d_a} and w ∈ R^{d+d_a} are projection parameters, α is a vector of attention weights, and r is a weighted representation of the sentence for the given aspect. The operator ⊗ in Eq. (7) is defined as v_a ⊗ e_N = [v_a; v_a; ...; v_a], i.e., it repeatedly concatenates v_a N times, where e_N is a column vector of N ones. W_v v_a ⊗ e_N thus repeats the linearly transformed v_a as many times as there are words in the sentence.

The final sentence representation is given by

h* = tanh(W_p r + W_x h_N)   (10)

where h* ∈ R^d, and W_p and W_x are projection parameters learned during training. We find that the model works better in practice if we add W_x h_N to the final representation of the sentence, which is inspired by (Rocktäschel et al., 2015).

The attention mechanism allows the model to capture the most important part of a sentence when different aspects are considered. h* is considered the feature representation of the sentence given an input aspect. We add a linear layer to convert the sentence vector into e, a real-valued vector whose length equals the number of classes |C|. A softmax layer then transforms e into a conditional probability distribution:

y = softmax(W_s h* + b_s)   (11)

where W_s and b_s are the parameters of the softmax layer.
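For illustration, the attention computation of Eqs. (7)–(11) can be written down directly. The sketch below is our own NumPy rendering with assumed parameter names and toy, randomly initialized dimensions; it is not the authors' released code.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def at_lstm_attention(H, v_a, p):
    """Aspect-aware attention over LSTM outputs, following Eqs. (7)-(11).

    H  : (d, N) matrix of hidden vectors [h_1, ..., h_N].
    v_a: (d_a,) aspect embedding.
    p  : dict with W_h (d, d), W_v (d_a, d_a), w (d + d_a,),
         W_p (d, d), W_x (d, d), W_s (C, d), b_s (C,).
    """
    d, N = H.shape
    Va = np.tile((p["W_v"] @ v_a)[:, None], (1, N))          # W_v v_a ⊗ e_N: repeat the projected aspect N times
    M = np.tanh(np.vstack([p["W_h"] @ H, Va]))               # Eq. (7): shape (d + d_a, N)
    alpha = softmax(p["w"] @ M)                              # Eq. (8): one weight per word
    r = H @ alpha                                            # Eq. (9): weighted sentence representation
    h_star = np.tanh(p["W_p"] @ r + p["W_x"] @ H[:, -1])     # Eq. (10): combine r with the last hidden vector
    y = softmax(p["W_s"] @ h_star + p["b_s"])                # Eq. (11): class probabilities
    return alpha, y

# Toy usage with random values: d=4 hidden size, d_a=3 aspect size, N=6 words, C=3 classes.
d, d_a, N, C = 4, 3, 6, 3
rng = np.random.default_rng(1)
H = rng.normal(size=(d, N))
p = {"W_h": rng.normal(size=(d, d)), "W_v": rng.normal(size=(d_a, d_a)),
     "w": rng.normal(size=d + d_a), "W_p": rng.normal(size=(d, d)),
     "W_x": rng.normal(size=(d, d)), "W_s": rng.normal(size=(C, d)), "b_s": np.zeros(C)}
alpha, y = at_lstm_attention(H, rng.normal(size=d_a), p)
```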
3.4 Attention-based LSTM with Aspect Embedding (ATAE-LSTM)

In AE-LSTM, aspect information is used by letting the aspect embedding play a role in computing the attention weights. To make better use of aspect information, we additionally append the input aspect embedding to each word input vector. The structure of this model is illustrated in Figure 3. In this way, the output hidden representations (h_1, h_2, ..., h_N) carry information from the input aspect v_a. Therefore, in the subsequent step that computes the attention weights, the interdependence between the words and the input aspect can be modeled.

[Figure 3: The architecture of the Attention-based LSTM with Aspect Embedding (ATAE-LSTM). The aspect embedding is appended to each word embedding at the input and is also used, together with the hidden vectors, to compute the attention weights. {w_1, w_2, ..., w_N} are the word vectors of a sentence of length N, v_a is the aspect embedding, α is the attention weight vector, and {h_1, h_2, ..., h_N} are the hidden vectors.]
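As a small illustration of this input scheme, the sketch below (ours, in NumPy, with hypothetical names and toy sizes) appends the aspect embedding to every word vector before the sequence is fed to the LSTM.

```python
import numpy as np

def atae_inputs(word_vectors, v_a):
    """Build ATAE-LSTM inputs: append the aspect embedding to every word vector.

    word_vectors: (N, d_w) word embeddings of a sentence with N words.
    v_a         : (d_a,) aspect embedding.
    Returns an (N, d_w + d_a) matrix. The LSTM matrices W_i, W_f, W_o, W_c must be
    widened accordingly, as discussed in Section 3.5.
    """
    N = word_vectors.shape[0]
    tiled_aspect = np.tile(v_a, (N, 1))                      # repeat v_a once per word
    return np.concatenate([word_vectors, tiled_aspect], axis=1)

# Toy usage: a 5-word sentence with 300-dim word vectors and a 300-dim aspect embedding.
rng = np.random.default_rng(2)
x = atae_inputs(rng.normal(size=(5, 300)), rng.normal(size=300))
print(x.shape)  # (5, 600)
```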

3.5 Model Training

The model can be trained end-to-end by backpropagation, where the objective (loss) function is the cross-entropy loss. Let y be the target distribution for a sentence and ŷ be the predicted sentiment distribution. The goal of training is to minimize the cross-entropy error between y and ŷ for all sentences:

loss = − Σ_i Σ_j y_i^j log ŷ_i^j + λ ||θ||²   (12)

where i indexes sentences, j indexes classes (our classification is three-way), λ is the weight of the L2-regularization term, and θ is the parameter set.

As in a standard LSTM, the parameter set is {W_i, b_i, W_f, b_f, W_o, b_o, W_c, b_c, W_s, b_s}, and the word embeddings are parameters too. Note that the dimension of W_i, W_f, W_o, W_c changes across models: if the aspect embeddings are added to the input of the LSTM cell unit, the dimension of W_i, W_f, W_o, W_c is enlarged correspondingly. The additional parameters are as follows:

AT-LSTM: The aspect embedding matrix A is naturally added to the set of parameters. In addition, W_h, W_v, W_p, W_x, w are the parameters of the attention layer. The additional parameter set of AT-LSTM is therefore {A, W_h, W_v, W_p, W_x, w}.

AE-LSTM: The parameters include the aspect embedding matrix A, and the dimension of W_i, W_f, W_o, W_c is expanded because the aspect vector is concatenated to the input. The additional parameter set is {A}.

ATAE-LSTM: The additional parameter set is {A, W_h, W_v, W_p, W_x, w}, and the dimension of W_i, W_f, W_o, W_c is expanded by the concatenation of the aspect embedding.

The word embeddings and aspect embeddings are optimized during training. The percentage of out-of-vocabulary words is about 5%, and they are randomly initialized from U(−ϵ, ϵ), where ϵ = 0.01.

In our experiments, we use AdaGrad (Duchi et al., 2011) as our optimization method, which has remarkably improved the robustness of SGD for large-scale learning in distributed environments (Dean et al., 2012). AdaGrad adapts the learning rate to the parameters, performing larger updates for infrequent parameters and smaller updates for frequent ones.
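As a rough illustration of the training objective in Eq. (12) and of an AdaGrad update, the NumPy sketch below uses the regularization weight and learning rate reported in Section 4; it is a simplified stand-in, not the authors' Theano training code.

```python
import numpy as np

def objective(y_true, y_pred, theta, lam=0.001):
    """Cross-entropy loss with L2 regularization, following Eq. (12).

    y_true: (B, C) one-hot target distributions for a batch of B sentences.
    y_pred: (B, C) predicted distributions from the softmax layer.
    theta : flattened vector of all model parameters.
    lam   : L2 weight (0.001 in the experiments of Section 4).
    """
    cross_entropy = -np.sum(y_true * np.log(y_pred + 1e-12))
    return cross_entropy + lam * np.sum(theta ** 2)

def adagrad_step(theta, grad, cache, lr=0.01, eps=1e-8):
    """One AdaGrad update: per-parameter learning rates from accumulated squared gradients."""
    cache += grad ** 2
    theta -= lr * grad / (np.sqrt(cache) + eps)
    return theta, cache

# Toy usage: two sentences, three classes, a dummy 10-parameter model.
y_true = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])
theta, cache = np.zeros(10), np.zeros(10)
print(objective(y_true, y_pred, theta))
```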

4 Experiment

We apply the proposed models to aspect-level sentiment classification. In our experiments, all word vectors are initialized with GloVe (Pennington et al., 2014); the pre-trained vectors are available at http://nlp.stanford.edu/projects/glove/. The word embedding vectors are pre-trained on an unlabeled corpus of about 840 billion tokens. The other parameters are initialized by sampling from a uniform distribution U(−ϵ, ϵ). The dimension of the word vectors and aspect embeddings and the size of the hidden layer are 300. The length of the attention weight vector equals the length of the sentence. Theano (Bastien et al., 2012) is used to implement our neural network models. We trained all models with a batch size of 25 examples, a momentum of 0.9, an L2-regularization weight of 0.001 and an initial learning rate of 0.01 for AdaGrad.

4.1 Dataset

We experiment on the dataset of SemEval 2014 Task 4 (Pontiki et al., 2014); an introduction to SemEval 2014 is available at http://alt.qcri.org/semeval2014/. The dataset consists of customer reviews. Each review contains a list of aspects and their corresponding polarities. Our aim is to identify the polarity of a sentence with respect to a given aspect. The statistics are presented in Table 1.

Table 1: Aspect distribution per sentiment class. {Fo., Pr., Se., Am., An.} refer to {food, price, service, ambience, anecdotes/miscellaneous}. "Asp." refers to aspect.

Asp.    Positive (Train/Test)   Negative (Train/Test)   Neutral (Train/Test)
Fo.      867 / 302               209 / 69                 90 / 31
Pr.      179 / 51                115 / 28                 10 / 1
Se.      324 / 101               218 / 63                 20 / 3
Am.      263 / 76                 98 / 21                 23 / 8
An.      546 / 127               199 / 41                357 / 51
Total   2179 / 657               839 / 222               500 / 94

4.2 Task Definition

Aspect-level Classification. Given a set of pre-identified aspects, this task is to determine the polarity of each aspect. For example, in the sentence "The restaurant was too expensive.", there is an aspect price whose polarity is negative. The set of aspects is {food, price, service, ambience, anecdotes/miscellaneous}. In the dataset of SemEval 2014 Task 4, only the restaurant data has aspect-specific polarities. Table 2 shows the comparative results.

Table 2: Accuracy on aspect-level polarity classification for restaurants. Three-way stands for 3-class prediction; Pos./Neg. indicates binary prediction, ignoring all neutral instances. Best scores are in bold.

Models       Three-way   Pos./Neg.
LSTM         82.0        88.3
TD-LSTM      82.6        89.1
TC-LSTM      81.9        89.2
AE-LSTM      82.5        88.9
AT-LSTM      83.1        89.6
ATAE-LSTM    84.0        89.9

Aspect-Term-level Classification. For a given set of aspect terms within a sentence, this task is to determine whether the polarity of each aspect term is positive, negative or neutral. We again conduct experiments on the dataset of SemEval 2014 Task 4. In the sentences of both the restaurant and laptop datasets, the location and sentiment polarity of each occurrence of an aspect term are annotated. For example, there is an aspect term fajitas whose polarity is negative in the sentence "I loved their fajitas.".

Experimental results are shown in Table 3 and Table 4. As in the aspect-level experiments, our models achieve state-of-the-art performance.

4.3 Comparison with baseline methods

We compare our models with several baselines, including LSTM, TD-LSTM, and TC-LSTM.
[Figure 4: Attention visualizations. The aspects of (a) and (b) are service and food respectively. The color depth expresses the importance of each weight in the attention vector α. In (a), attention detects the important words of the whole sentence dynamically, even for a multi-word phrase such as "fastest delivery times", which could also be used in other domains. In (b), attention identifies multiple keypoints when more than one keypoint exists.]

Table 3: Accuracy on aspect-term polarity classification for restaurants. Three-way stands for 3-class prediction; Pos./Neg. indicates binary prediction, ignoring all neutral instances. Best scores are in bold.

Models       Three-way   Pos./Neg.
LSTM         74.3        -
TD-LSTM      75.6        -
AE-LSTM      76.6        89.6
ATAE-LSTM    77.2        90.9

Table 4: Accuracy on aspect-term polarity classification for laptops. Three-way stands for 3-class prediction; Pos./Neg. indicates binary prediction, ignoring all neutral instances. Best scores are in bold.

Models       Three-way   Pos./Neg.
LSTM         66.5        -
TD-LSTM      68.1        -
AE-LSTM      68.9        87.4
ATAE-LSTM    68.7        87.6

LSTM: The standard LSTM cannot capture any aspect information in a sentence, so it must predict the same sentiment polarity regardless of the given aspect. Since it cannot take advantage of aspect information, it unsurprisingly has the worst performance.

TD-LSTM: TD-LSTM can improve the performance of a sentiment classifier by treating an aspect as a target. However, since it has no attention mechanism, it cannot "know" which words are important for a given aspect.

TC-LSTM: TC-LSTM extended TD-LSTM by incorporating a target into the representation of a sentence. It is worth noting that TC-LSTM performs worse than LSTM and TD-LSTM in Table 2. TC-LSTM adds target representations, obtained from word vectors, into the input of the LSTM cell unit.

In our models, we embed aspects in a separate vector space, and the aspect embedding vectors can be learned well during training. ATAE-LSTM not only addresses the mismatch between word vectors and aspect embeddings, but also captures the most important information in response to a given aspect. In addition, ATAE-LSTM can attend to the important, and different, parts of a sentence when given different aspects.

4.4 Qualitative Analysis

It is enlightening to analyze which words decide the sentiment polarity of a sentence given an aspect. We can obtain the attention weights α from Equation 8 and visualize them accordingly.
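Such weights can be rendered as a per-word heat map. As a purely illustrative sketch (ours, with made-up words and weights, using matplotlib rather than the HemI tool mentioned below), the following code draws one attention vector α over a sentence.

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_attention(words, alpha):
    """Render the attention weights (Eq. 8) as a one-row heat map over the words."""
    alpha = np.asarray(alpha).reshape(1, -1)
    fig, ax = plt.subplots(figsize=(max(4, len(words)), 1.5))
    ax.imshow(alpha, cmap="Reds", aspect="auto")   # darker cell = larger attention weight
    ax.set_xticks(range(len(words)))
    ax.set_xticklabels(words, rotation=45, ha="right")
    ax.set_yticks([])
    fig.tight_layout()
    plt.show()

# Toy usage with made-up weights (not the values shown in Figure 4):
plot_attention(["The", "fajita", "was", "tasteless", "and", "too", "sweet"],
               [0.02, 0.10, 0.03, 0.45, 0.02, 0.13, 0.25])
```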

[Figure 5: Examples of classification. (a) "The appetizers are ok, but the service is slow." — aspect: food, polarity: neutral; aspect: service, polarity: negative. (b) "I highly recommend it for not just its superb cuisine, but also for its friendly owners and staff." — aspect: food, polarity: positive. (c) "The service, however, is a peg or two below the quality of food (horrible bartenders), and the clientele, for the most part, are rowdy, loud-mouthed commuters (this could explain the bad attitudes from the staff) getting loaded for an AC/DC concert or a Knicks game." — aspect: food, polarity: positive; aspect: service, polarity: positive; aspect: ambience, polarity: negative. (a) is an instance with different aspects; (b) shows that our model can focus on the key points and is not disturbed by the negation word "not"; (c) is a long and complicated sentence. Our model obtains the correct sentiment polarity in each case.]

Figure 4 shows how attention focuses on words under the influence of a given aspect. We use the visualization tool HemI (Deng et al., 2014) to visualize the sentences. The color depth indicates the importance of a weight in the attention vector α: the darker, the more important. The sentences in Figure 4 are "I have to say they have one of the fastest delivery times in the city." and "The fajita we tried was tasteless and burned and the mole sauce was way too sweet.", with corresponding aspects service and food respectively. Attention clearly picks out the important parts of the whole sentence dynamically. In Figure 4 (a), "fastest delivery times" is a multi-word phrase, but our attention-based model can detect such phrases when service is the input aspect. Moreover, the attention can detect multiple keywords if more than one keyword exists. In Figure 4 (b), tasteless and too sweet are both detected.

4.5 Case Study

As demonstrated above, our models obtain state-of-the-art performance. In this section, we further show the advantages of our proposals through some typical examples.

In Figure 5, we list some examples from the test set that have typical characteristics and cannot be inferred by a plain LSTM. In sentence (a), "The appetizers are ok, but the service is slow.", there are two aspects, food and service. Our model can discriminate different sentiment polarities for the different aspects. In sentence (b), "I highly recommend it for not just its superb cuisine, but also for its friendly owners and staff.", there is a negation word not. Our model obtains the correct polarity, unaffected by the negation word, which does not express negation here. In the last instance (c), "The service, however, is a peg or two below the quality of food (horrible bartenders), and the clientele, for the most part, are rowdy, loud-mouthed commuters (this could explain the bad attitudes from the staff) getting loaded for an AC/DC concert or a Knicks game.", the sentence has such a long and complicated structure that existing parsers can hardly produce correct parse trees, so tree-based neural network models struggle to predict the polarity correctly. In contrast, our attention-based LSTM works well on such sentences with the help of the attention mechanism and aspect embeddings.

5 Conclusion and Future Work

In this paper, we have proposed attention-based LSTMs for aspect-level sentiment classification.

The key idea of these proposals is to learn aspect embeddings and let the aspects participate in computing the attention weights. Our proposed models can concentrate on different parts of a sentence when different aspects are given, which makes them more competitive for aspect-level classification. Experiments show that our proposed models, AE-LSTM and ATAE-LSTM, obtain superior performance over the baseline models.

Though the proposals have shown potential for aspect-level sentiment analysis, different aspects are currently input separately. As future work, an interesting and possible direction would be to model more than one aspect simultaneously with the attention mechanism.

Acknowledgments

This work was partly supported by the National Basic Research Program (973 Program) under grant No. 2012CB316301/2013CB329403, the National Science Foundation of China under grant No. 61272227/61332007, and the Beijing Higher Education Young Elite Teacher Project. The work was also supported by the Tsinghua University Beijing Samsung Telecom R&D Center Joint Laboratory for Intelligent Media Computing.

References

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, James Bergstra, Ian Goodfellow, Arnaud Bergeron, Nicolas Bouchard, David Warde-Farley, and Yoshua Bengio. 2012. Theano: new features and speed improvements. arXiv preprint arXiv:1211.5590.

Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Andrew Senior, Paul Tucker, Ke Yang, Quoc V. Le, et al. 2012. Large scale distributed deep networks. In Advances in Neural Information Processing Systems, pages 1223–1231.

Wankun Deng, Yongbo Wang, Zexian Liu, Han Cheng, and Yu Xue. 2014. HemI: a toolkit for illustrating heatmaps. PLoS ONE, 9(11):e111988.

Li Dong, Furu Wei, Chuanqi Tan, Duyu Tang, Ming Zhou, and Ke Xu. 2014. Adaptive recursive neural network for target-dependent Twitter sentiment classification. In ACL (2), pages 49–54.

John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. The Journal of Machine Learning Research, 12:2121–2159.

David Golub and Xiaodong He. 2016. Character-level question answering with attention. arXiv preprint arXiv:1604.00727.

Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. In Advances in Neural Information Processing Systems, pages 1684–1692.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.

Nobuhiro Kaji and Masaru Kitsuregawa. 2007. Building lexicon for sentiment analysis from massive collection of HTML documents. In EMNLP-CoNLL, pages 1075–1083.

Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. arXiv preprint arXiv:1603.01360.

Bing Liu. 2012. Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1):1–167.

Tomas Mikolov, Martin Karafiát, Lukas Burget, Jan Cernocký, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In INTERSPEECH, volume 2, page 3.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119.

Volodymyr Mnih, Nicolas Heess, Alex Graves, et al. 2014. Recurrent models of visual attention. In Advances in Neural Information Processing Systems, pages 2204–2212.

Saif M. Mohammad, Svetlana Kiritchenko, and Xiaodan Zhu. 2013. NRC-Canada: Building the state-of-the-art in sentiment analysis of tweets. arXiv preprint arXiv:1308.6242.

Tony Mullen and Nigel Collier. 2004. Sentiment analysis using support vector machines with diverse information sources. In EMNLP, volume 4, pages 412–418.

Tetsuya Nasukawa and Jeonghee Yi. 2003. Sentiment analysis: Capturing favorability using natural language processing. In Proceedings of the 2nd International Conference on Knowledge Capture, pages 70–77. ACM.
Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), pages 1532–1543.

Veronica Perez-Rosas, Carmen Banea, and Rada Mihalcea. 2012. Learning sentiment lexicons in Spanish. In LREC, volume 12, page 73.

Maria Pontiki, Dimitris Galanis, John Pavlopoulos, Harris Papageorgiou, Ion Androutsopoulos, and Suresh Manandhar. 2014. SemEval-2014 Task 4: Aspect based sentiment analysis. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 27–35.

Qiao Qian, Bo Tian, Minlie Huang, Yang Liu, Xuan Zhu, and Xiaoyan Zhu. 2015. Learning tag embeddings and tag-specific composition functions in recursive neural network. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, volume 1, pages 1365–1374.

Delip Rao and Deepak Ravichandran. 2009. Semi-supervised polarity lexicon induction. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, pages 675–682. Association for Computational Linguistics.

Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, and Phil Blunsom. 2015. Reasoning about entailment with neural attention. arXiv preprint arXiv:1509.06664.

Alexander M. Rush, Sumit Chopra, and Jason Weston. 2015. A neural attention model for abstractive sentence summarization. arXiv preprint arXiv:1509.00685.

Richard Socher, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng, and Christopher D. Manning. 2011. Semi-supervised recursive autoencoders for predicting sentiment distributions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 151–161. Association for Computational Linguistics.

Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In EMNLP, volume 1631, page 1642. Citeseer.

Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. Improved semantic representations from tree-structured long short-term memory networks. arXiv preprint arXiv:1503.00075.

Duyu Tang, Bing Qin, Xiaocheng Feng, and Ting Liu. 2015a. Target-dependent sentiment classification with long short term memory. arXiv preprint arXiv:1512.01100.

Duyu Tang, Bing Qin, and Ting Liu. 2015b. Document modeling with gated recurrent neural network for sentiment classification. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1422–1432.

Wenpeng Yin, Hinrich Schütze, Bing Xiang, and Bowen Zhou. 2015. ABCNN: Attention-based convolutional neural network for modeling sentence pairs. arXiv preprint arXiv:1512.05193.
