
Attention-based LSTM for Aspect-level Sentiment Classification

Yequan Wang and Minlie Huang and Li Zhao* and Xiaoyan Zhu
State Key Laboratory on Intelligent Technology and Systems
Tsinghua National Laboratory for Information Science and Technology
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
*Microsoft Research Asia
[email protected], [email protected]
[email protected], [email protected]

Abstract

Aspect-level sentiment classification is a fine-grained task in sentiment analysis. Because it provides more complete and in-depth results, aspect-level sentiment analysis has received much attention in recent years. In this paper, we reveal that the sentiment polarity of a sentence is not only determined by the content but is also highly related to the concerned aspect. For instance, given "The appetizers are ok, but the service is slow.", the polarity is positive for the aspect taste but negative for the aspect service. Therefore, it is worthwhile to explore the connection between an aspect and the content of a sentence. To this end, we propose an Attention-based Long Short-Term Memory Network for aspect-level sentiment classification. The attention mechanism can concentrate on different parts of a sentence when different aspects are taken as input. We experiment on the SemEval 2014 dataset, and the results show that our model achieves state-of-the-art performance on aspect-level sentiment classification.

1 Introduction

Sentiment analysis (Nasukawa and Yi, 2003), also known as opinion mining (Liu, 2012), is a key NLP task that has received much attention in recent years. Aspect-level sentiment analysis is a fine-grained task that can provide complete and in-depth results. In this paper, we deal with aspect-level sentiment classification, and we find that the sentiment polarity of a sentence is highly dependent on both content and aspect. For example, the sentiment polarity of "Staffs are not that friendly, but the taste covers all." will be positive if the aspect is food but negative when considering the aspect service. The polarity can thus be opposite when different aspects are considered.

Neural networks have achieved state-of-the-art performance in a variety of NLP tasks such as machine translation (Lample et al., 2016), paraphrase identification (Yin et al., 2015), question answering (Golub and He, 2016) and text summarization (Rush et al., 2015). However, neural network models are still in their infancy when it comes to aspect-level sentiment classification. In some works, target-dependent sentiment classification benefits from taking target information into account, as in Target-Dependent LSTM (TD-LSTM) and Target-Connection LSTM (TC-LSTM) (Tang et al., 2015a). However, those models can only take the target into consideration, not the aspect information, which is proved to be crucial for aspect-level classification.

Attention has become an effective mechanism for obtaining superior results, as demonstrated in image recognition (Mnih et al., 2014), machine translation (Bahdanau et al., 2014), reasoning about entailment (Rocktäschel et al., 2015) and sentence summarization (Rush et al., 2015). Moreover, neural attention can improve reading comprehension (Hermann et al., 2015). In this paper, we propose an attention mechanism that enforces the model to attend to the important parts of a sentence in response to a specific aspect.
We design an aspect-to-sentence attention mechanism that can concentrate on the key parts of a sentence given the aspect.

We explore the potential correlation between aspect and sentiment polarity in aspect-level sentiment classification. In order to capture the important information in response to a given aspect, we design an attention-based LSTM. We evaluate our approach on a benchmark dataset (Pontiki et al., 2014), which contains restaurants and laptops data.

The main contributions of our work can be summarized as follows:

• We propose attention-based Long Short-Term Memory networks for aspect-level sentiment classification. The models are able to attend to different parts of a sentence when different aspects are concerned. Results show that the attention mechanism is effective.

• Since the aspect plays a key role in this task, we propose two ways to take aspect information into account during attention: one is to concatenate the aspect vector into the sentence hidden representations for computing the attention weights, and the other is to additionally append the aspect vector to the input word vectors.

• Experimental results indicate that our approach improves the performance compared with several baselines, and further examples demonstrate that the attention mechanism works well for aspect-level sentiment classification.

The rest of the paper is structured as follows: Section 2 discusses related work, Section 3 gives a detailed description of our attention-based proposals, Section 4 presents extensive experiments to justify the effectiveness of our proposals, and Section 5 summarizes this work and future directions.

2 Related Work

In this section, we briefly review related work on aspect-level sentiment classification and on neural networks for sentiment classification.

2.1 Sentiment Classification at Aspect-level

Aspect-level sentiment classification is typically considered a classification problem in the literature. As mentioned before, aspect-level sentiment classification is a fine-grained classification task. The majority of current approaches attempt to detect the polarity of the entire sentence, regardless of the entities or aspects mentioned. Traditional approaches to these problems manually design a set of features. With the abundance of sentiment lexicons (Rao and Ravichandran, 2009; Perez-Rosas et al., 2012; Kaji and Kitsuregawa, 2007), lexicon-based features were built for sentiment analysis (Mohammad et al., 2013). Most of these studies focus on building sentiment classifiers with features, including bag-of-words and sentiment lexicons, using SVM (Mullen and Collier, 2004). However, the results highly depend on the quality of the features, and feature engineering is labor intensive.

2.2 Sentiment Classification with Neural Networks

Since a simple and effective approach to learning distributed representations was proposed (Mikolov et al., 2013), neural networks have advanced sentiment analysis substantially. Classical models, including the Recursive Neural Network (Socher et al., 2011; Dong et al., 2014; Qian et al., 2015), the Recursive Neural Tensor Network (Socher et al., 2013), the Recurrent Neural Network (Mikolov et al., 2010; Tang et al., 2015b), LSTM (Hochreiter and Schmidhuber, 1997) and Tree-LSTMs (Tai et al., 2015), have been applied to sentiment analysis. By utilizing the syntactic structure of sentences, tree-based LSTMs have proved quite effective for many NLP tasks. However, such methods may suffer from syntax parsing errors, which are common in resource-lacking languages.

LSTM has achieved great success in various NLP tasks. TD-LSTM and TC-LSTM (Tang et al., 2015a), which take target information into consideration, achieved state-of-the-art performance in target-dependent sentiment classification. TC-LSTM obtains a target vector by averaging the vectors of the words that the target phrase contains. However, simply averaging the word embeddings of a target phrase is not sufficient to represent its semantics, resulting in suboptimal performance.
Despite the effectiveness of those methods, it is still challenging to discriminate different sentiment polarities at a fine-grained aspect level. Therefore, we are motivated to design a powerful neural network that can fully employ aspect information for sentiment classification.

3 Attention-based LSTM with Aspect Embedding

3.1 Long Short-term Memory (LSTM)

The Recurrent Neural Network (RNN) is an extension of the conventional feed-forward neural network. However, the standard RNN suffers from vanishing or exploding gradients. To overcome these issues, the Long Short-term Memory network (LSTM) was developed and has achieved superior performance (Hochreiter and Schmidhuber, 1997). In the LSTM architecture, there are three gates and a cell memory state. Figure 1 illustrates the architecture of a standard LSTM.

[Figure 1: The architecture of a standard LSTM. {w_1, w_2, ..., w_N} are the word vectors of a sentence of length N, {h_1, h_2, ..., h_N} are the hidden vectors, and the last hidden vector is fed to a softmax layer.]

More formally, each cell in the LSTM can be computed as follows:

X = [h_{t-1}; x_t]                                          (1)
f_t = σ(W_f · X + b_f)                                      (2)
i_t = σ(W_i · X + b_i)                                      (3)
o_t = σ(W_o · X + b_o)                                      (4)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_c · X + b_c)             (5)
h_t = o_t ⊙ tanh(c_t)                                       (6)

where W_i, W_f, W_o ∈ R^{d×2d} are the weight matrices and b_i, b_f, b_o ∈ R^d are the biases of the LSTM, learned during training and parameterizing the transformations of the input, forget and output gates respectively. σ is the sigmoid function and ⊙ stands for element-wise multiplication. x_t is the input to the LSTM cell, i.e., the word embedding vector w_t in Figure 1, and h_t is the hidden vector.

We regard the last hidden vector h_N as the representation of the sentence, linearly project it to a vector whose length equals the number of class labels, and feed it into a softmax layer. In our work, the set of class labels is {positive, negative, neutral}.
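To make the gate equations concrete, here is a minimal NumPy sketch of one LSTM step following Eqs. (1)–(6). It is an illustration only, not the authors' Theano implementation; the parameter names, toy dimensions and random initialization are our own assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step following Eqs. (1)-(6).

    x_t, h_prev, c_prev: vectors of size d.
    p: dict with W_i, W_f, W_o, W_c of shape (d, 2d) and b_i, b_f, b_o, b_c of shape (d,).
    """
    X = np.concatenate([h_prev, x_t])                             # Eq. (1): stack h_{t-1} and x_t
    f_t = sigmoid(p["W_f"] @ X + p["b_f"])                        # Eq. (2): forget gate
    i_t = sigmoid(p["W_i"] @ X + p["b_i"])                        # Eq. (3): input gate
    o_t = sigmoid(p["W_o"] @ X + p["b_o"])                        # Eq. (4): output gate
    c_t = f_t * c_prev + i_t * np.tanh(p["W_c"] @ X + p["b_c"])   # Eq. (5): new cell state
    h_t = o_t * np.tanh(c_t)                                      # Eq. (6): new hidden state
    return h_t, c_t

# Toy usage: run a 6-word "sentence" of random 4-dim word vectors through the cell.
d, N = 4, 6
rng = np.random.default_rng(0)
p = {k: rng.normal(scale=0.1, size=(d, 2 * d)) for k in ("W_i", "W_f", "W_o", "W_c")}
p.update({k: np.zeros(d) for k in ("b_i", "b_f", "b_o", "b_c")})
h, c = np.zeros(d), np.zeros(d)
hidden = []
for x_t in rng.normal(size=(N, d)):
    h, c = lstm_step(x_t, h, c, p)
    hidden.append(h)
H = np.stack(hidden, axis=1)  # (d, N) matrix of hidden vectors, as used by the attention layer below
```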
3.2 LSTM with Aspect Embedding (AE-LSTM)

Aspect information is vital when classifying the polarity of a sentence with respect to a given aspect, and we may get opposite polarities when different aspects are considered. To make the best use of aspect information, we propose to learn an embedding vector for each aspect. A vector v_{a_i} ∈ R^{d_a} represents the embedding of aspect i, where d_a is the dimension of the aspect embedding, and A ∈ R^{d_a×|A|} is the matrix of all aspect embeddings. To the best of our knowledge, this is the first time aspect embeddings have been proposed.

3.3 Attention-based LSTM (AT-LSTM)

The standard LSTM cannot detect which part of a sentence is important for aspect-level sentiment classification. To address this issue, we design an attention mechanism that can capture the key part of a sentence in response to a given aspect. Figure 2 shows the architecture of the Attention-based LSTM (AT-LSTM).

[Figure 2: The architecture of the Attention-based LSTM (AT-LSTM). The aspect embedding is used, together with the sentence's hidden representations, to compute the attention weights. {w_1, w_2, ..., w_N} are the word vectors of a sentence of length N, v_a is the aspect embedding, α is the attention weight vector, and {h_1, h_2, ..., h_N} are the hidden vectors.]

Let H ∈ R^{d×N} be the matrix of hidden vectors [h_1, ..., h_N] that the LSTM produced, where d is the size of the hidden layers and N is the length of the given sentence. Furthermore, v_a represents the embedding of the aspect and e_N ∈ R^N is a vector of 1s. The attention mechanism produces an attention weight vector α and a weighted hidden representation r:

M = tanh([W_h H ; W_v v_a ⊗ e_N])   (7)
α = softmax(w^T M)                   (8)
r = H α^T                            (9)

where M ∈ R^{(d+d_a)×N}, α ∈ R^N and r ∈ R^d. W_h ∈ R^{d×d}, W_v ∈ R^{d_a×d_a} and w ∈ R^{d+d_a} are projection parameters, α is a vector of attention weights, and r is a weighted representation of the sentence for the given aspect. The operator ⊗ in Eq. (7) is defined as v_a ⊗ e_N = [v_a; v_a; ...; v_a], i.e., it repeatedly concatenates v_a N times, where e_N is a column vector of N ones. W_v v_a ⊗ e_N thus repeats the linearly transformed v_a as many times as there are words in the sentence.

The final sentence representation is given by

h* = tanh(W_p r + W_x h_N)   (10)

where h* ∈ R^d, and W_p and W_x are projection parameters learned during training. We find that the model works better in practice if we add W_x h_N to the final representation of the sentence, which is inspired by (Rocktäschel et al., 2015).

The attention mechanism allows the model to capture the most important part of a sentence when different aspects are considered. h* is considered the feature representation of the sentence given an input aspect. We add a linear layer to convert the sentence vector into e, a real-valued vector whose length equals the number of classes |C|. A softmax layer then transforms e into a conditional probability distribution:

y = softmax(W_s h* + b_s)   (11)

where W_s and b_s are the parameters of the softmax layer.
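For illustration, the attention computation of Eqs. (7)–(11) can be written down directly. The sketch below is our own NumPy rendering with assumed parameter names and toy, randomly initialized dimensions; it is not the authors' released code.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def at_lstm_attention(H, v_a, p):
    """Aspect-aware attention over LSTM outputs, following Eqs. (7)-(11).

    H  : (d, N) matrix of hidden vectors [h_1, ..., h_N].
    v_a: (d_a,) aspect embedding.
    p  : dict with W_h (d, d), W_v (d_a, d_a), w (d + d_a,),
         W_p (d, d), W_x (d, d), W_s (C, d), b_s (C,).
    """
    d, N = H.shape
    Va = np.tile((p["W_v"] @ v_a)[:, None], (1, N))          # W_v v_a ⊗ e_N: repeat the projected aspect N times
    M = np.tanh(np.vstack([p["W_h"] @ H, Va]))               # Eq. (7): shape (d + d_a, N)
    alpha = softmax(p["w"] @ M)                              # Eq. (8): one weight per word
    r = H @ alpha                                            # Eq. (9): weighted sentence representation
    h_star = np.tanh(p["W_p"] @ r + p["W_x"] @ H[:, -1])     # Eq. (10): combine r with the last hidden vector
    y = softmax(p["W_s"] @ h_star + p["b_s"])                # Eq. (11): class probabilities
    return alpha, y

# Toy usage with random values: d=4 hidden size, d_a=3 aspect size, N=6 words, C=3 classes.
d, d_a, N, C = 4, 3, 6, 3
rng = np.random.default_rng(1)
H = rng.normal(size=(d, N))
p = {"W_h": rng.normal(size=(d, d)), "W_v": rng.normal(size=(d_a, d_a)),
     "w": rng.normal(size=d + d_a), "W_p": rng.normal(size=(d, d)),
     "W_x": rng.normal(size=(d, d)), "W_s": rng.normal(size=(C, d)), "b_s": np.zeros(C)}
alpha, y = at_lstm_attention(H, rng.normal(size=d_a), p)
```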
3.4 Attention-based LSTM with Aspect Embedding (ATAE-LSTM)

In AE-LSTM, aspect information is used by letting the aspect embedding play a role in computing the attention weights. To make better use of aspect information, we additionally append the input aspect embedding to each word input vector. The structure of this model is illustrated in Figure 3. In this way, the output hidden representations (h_1, h_2, ..., h_N) carry information from the input aspect v_a. Therefore, in the subsequent step that computes the attention weights, the interdependence between the words and the input aspect can be modeled.

[Figure 3: The architecture of the Attention-based LSTM with Aspect Embedding (ATAE-LSTM). The aspect embedding is appended to each word embedding at the input and is also used, together with the hidden vectors, to compute the attention weights. {w_1, w_2, ..., w_N} are the word vectors of a sentence of length N, v_a is the aspect embedding, α is the attention weight vector, and {h_1, h_2, ..., h_N} are the hidden vectors.]
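As a small illustration of this input scheme, the sketch below (ours, in NumPy, with hypothetical names and toy sizes) appends the aspect embedding to every word vector before the sequence is fed to the LSTM.

```python
import numpy as np

def atae_inputs(word_vectors, v_a):
    """Build ATAE-LSTM inputs: append the aspect embedding to every word vector.

    word_vectors: (N, d_w) word embeddings of a sentence with N words.
    v_a         : (d_a,) aspect embedding.
    Returns an (N, d_w + d_a) matrix. The LSTM matrices W_i, W_f, W_o, W_c must be
    widened accordingly, as discussed in Section 3.5.
    """
    N = word_vectors.shape[0]
    tiled_aspect = np.tile(v_a, (N, 1))                      # repeat v_a once per word
    return np.concatenate([word_vectors, tiled_aspect], axis=1)

# Toy usage: a 5-word sentence with 300-dim word vectors and a 300-dim aspect embedding.
rng = np.random.default_rng(2)
x = atae_inputs(rng.normal(size=(5, 300)), rng.normal(size=300))
print(x.shape)  # (5, 600)
```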

3.5 Model Training

The model can be trained end-to-end by backpropagation, where the objective (loss) function is the cross-entropy loss. Let y be the target distribution for a sentence and ŷ be the predicted sentiment distribution. The goal of training is to minimize the cross-entropy error between y and ŷ for all sentences:

loss = − Σ_i Σ_j y_i^j log ŷ_i^j + λ ||θ||²   (12)

where i indexes sentences, j indexes classes (our classification is three-way), λ is the weight of the L2-regularization term, and θ is the parameter set.

As in a standard LSTM, the parameter set is {W_i, b_i, W_f, b_f, W_o, b_o, W_c, b_c, W_s, b_s}, and the word embeddings are parameters too. Note that the dimension of W_i, W_f, W_o, W_c changes across models: if the aspect embeddings are added to the input of the LSTM cell unit, the dimension of W_i, W_f, W_o, W_c is enlarged correspondingly. The additional parameters are as follows:

AT-LSTM: The aspect embedding matrix A is naturally added to the set of parameters. In addition, W_h, W_v, W_p, W_x, w are the parameters of the attention layer. The additional parameter set of AT-LSTM is therefore {A, W_h, W_v, W_p, W_x, w}.

AE-LSTM: The parameters include the aspect embedding matrix A, and the dimension of W_i, W_f, W_o, W_c is expanded because the aspect vector is concatenated to the input. The additional parameter set is {A}.

ATAE-LSTM: The additional parameter set is {A, W_h, W_v, W_p, W_x, w}, and the dimension of W_i, W_f, W_o, W_c is expanded by the concatenation of the aspect embedding.

The word embeddings and aspect embeddings are optimized during training. The percentage of out-of-vocabulary words is about 5%, and they are randomly initialized from U(−ϵ, ϵ), where ϵ = 0.01.

In our experiments, we use AdaGrad (Duchi et al., 2011) as our optimization method, which has remarkably improved the robustness of SGD for large-scale learning in distributed environments (Dean et al., 2012). AdaGrad adapts the learning rate to the parameters, performing larger updates for infrequent parameters and smaller updates for frequent ones.
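As a rough illustration of the training objective in Eq. (12) and of an AdaGrad update, the NumPy sketch below uses the regularization weight and learning rate reported in Section 4; it is a simplified stand-in, not the authors' Theano training code.

```python
import numpy as np

def objective(y_true, y_pred, theta, lam=0.001):
    """Cross-entropy loss with L2 regularization, following Eq. (12).

    y_true: (B, C) one-hot target distributions for a batch of B sentences.
    y_pred: (B, C) predicted distributions from the softmax layer.
    theta : flattened vector of all model parameters.
    lam   : L2 weight (0.001 in the experiments of Section 4).
    """
    cross_entropy = -np.sum(y_true * np.log(y_pred + 1e-12))
    return cross_entropy + lam * np.sum(theta ** 2)

def adagrad_step(theta, grad, cache, lr=0.01, eps=1e-8):
    """One AdaGrad update: per-parameter learning rates from accumulated squared gradients."""
    cache += grad ** 2
    theta -= lr * grad / (np.sqrt(cache) + eps)
    return theta, cache

# Toy usage: two sentences, three classes, a dummy 10-parameter model.
y_true = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])
theta, cache = np.zeros(10), np.zeros(10)
print(objective(y_true, y_pred, theta))
```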

4 Experiment

We apply the proposed models to aspect-level sentiment classification. In our experiments, all word vectors are initialized with GloVe (Pennington et al., 2014); the pre-trained vectors are available at http://nlp.stanford.edu/projects/glove/. The word embedding vectors are pre-trained on an unlabeled corpus of about 840 billion tokens. The other parameters are initialized by sampling from a uniform distribution U(−ϵ, ϵ). The dimension of the word vectors and aspect embeddings and the size of the hidden layer are 300. The length of the attention weight vector equals the length of the sentence. Theano (Bastien et al., 2012) is used to implement our neural network models. We trained all models with a batch size of 25 examples, a momentum of 0.9, an L2-regularization weight of 0.001 and an initial learning rate of 0.01 for AdaGrad.

4.1 Dataset

We experiment on the dataset of SemEval 2014 Task 4 (Pontiki et al., 2014); an introduction to SemEval 2014 is available at http://alt.qcri.org/semeval2014/. The dataset consists of customer reviews. Each review contains a list of aspects and their corresponding polarities. Our aim is to identify the polarity of a sentence with respect to a given aspect. The statistics are presented in Table 1.

Table 1: Aspect distribution per sentiment class. {Fo., Pr., Se., Am., An.} refer to {food, price, service, ambience, anecdotes/miscellaneous}. "Asp." refers to aspect.

Asp.    Positive (Train/Test)   Negative (Train/Test)   Neutral (Train/Test)
Fo.      867 / 302               209 / 69                 90 / 31
Pr.      179 / 51                115 / 28                 10 / 1
Se.      324 / 101               218 / 63                 20 / 3
Am.      263 / 76                 98 / 21                 23 / 8
An.      546 / 127               199 / 41                357 / 51
Total   2179 / 657               839 / 222               500 / 94

4.2 Task Definition

Aspect-level Classification. Given a set of pre-identified aspects, this task is to determine the polarity of each aspect. For example, in the sentence "The restaurant was too expensive.", there is an aspect price whose polarity is negative. The set of aspects is {food, price, service, ambience, anecdotes/miscellaneous}. In the dataset of SemEval 2014 Task 4, only the restaurant data has aspect-specific polarities. Table 2 shows the comparative results.

Table 2: Accuracy on aspect-level polarity classification for restaurants. Three-way stands for 3-class prediction; Pos./Neg. indicates binary prediction, ignoring all neutral instances. Best scores are in bold.

Models       Three-way   Pos./Neg.
LSTM         82.0        88.3
TD-LSTM      82.6        89.1
TC-LSTM      81.9        89.2
AE-LSTM      82.5        88.9
AT-LSTM      83.1        89.6
ATAE-LSTM    84.0        89.9

Aspect-Term-level Classification. For a given set of aspect terms within a sentence, this task is to determine whether the polarity of each aspect term is positive, negative or neutral. We again conduct experiments on the dataset of SemEval 2014 Task 4. In the sentences of both the restaurant and laptop datasets, the location and sentiment polarity of each occurrence of an aspect term are annotated. For example, there is an aspect term fajitas whose polarity is negative in the sentence "I loved their fajitas.".

Experimental results are shown in Table 3 and Table 4. As in the aspect-level experiments, our models achieve state-of-the-art performance.

4.3 Comparison with baseline methods

We compare our models with several baselines, including LSTM, TD-LSTM, and TC-LSTM.
[Figure 4: Attention visualizations. The aspects of (a) and (b) are service and food respectively. The color depth expresses the importance of each weight in the attention vector α. In (a), attention detects the important words of the whole sentence dynamically, even for a multi-word phrase such as "fastest delivery times", which could also be used in other domains. In (b), attention identifies multiple keypoints when more than one keypoint exists.]

Table 3: Accuracy on aspect-term polarity classification for restaurants. Three-way stands for 3-class prediction; Pos./Neg. indicates binary prediction, ignoring all neutral instances. Best scores are in bold.

Models       Three-way   Pos./Neg.
LSTM         74.3        -
TD-LSTM      75.6        -
AE-LSTM      76.6        89.6
ATAE-LSTM    77.2        90.9

Table 4: Accuracy on aspect-term polarity classification for laptops. Three-way stands for 3-class prediction; Pos./Neg. indicates binary prediction, ignoring all neutral instances. Best scores are in bold.

Models       Three-way   Pos./Neg.
LSTM         66.5        -
TD-LSTM      68.1        -
AE-LSTM      68.9        87.4
ATAE-LSTM    68.7        87.6

LSTM: The standard LSTM cannot capture any aspect information in a sentence, so it must predict the same sentiment polarity regardless of the given aspect. Since it cannot take advantage of aspect information, it unsurprisingly has the worst performance.

TD-LSTM: TD-LSTM can improve the performance of a sentiment classifier by treating an aspect as a target. However, since it has no attention mechanism, it cannot "know" which words are important for a given aspect.

TC-LSTM: TC-LSTM extended TD-LSTM by incorporating a target into the representation of a sentence. It is worth noting that TC-LSTM performs worse than LSTM and TD-LSTM in Table 2. TC-LSTM adds target representations, obtained from word vectors, into the input of the LSTM cell unit.

In our models, we embed aspects in a separate vector space, and the aspect embedding vectors can be learned well during training. ATAE-LSTM not only addresses the mismatch between word vectors and aspect embeddings, but also captures the most important information in response to a given aspect. In addition, ATAE-LSTM can attend to the important, and different, parts of a sentence when given different aspects.

4.4 Qualitative Analysis

It is enlightening to analyze which words decide the sentiment polarity of a sentence given an aspect. We can obtain the attention weights α from Equation 8 and visualize them accordingly.
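Such weights can be rendered as a per-word heat map. As a purely illustrative sketch (ours, with made-up words and weights, using matplotlib rather than the HemI tool mentioned below), the following code draws one attention vector α over a sentence.

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_attention(words, alpha):
    """Render the attention weights (Eq. 8) as a one-row heat map over the words."""
    alpha = np.asarray(alpha).reshape(1, -1)
    fig, ax = plt.subplots(figsize=(max(4, len(words)), 1.5))
    ax.imshow(alpha, cmap="Reds", aspect="auto")   # darker cell = larger attention weight
    ax.set_xticks(range(len(words)))
    ax.set_xticklabels(words, rotation=45, ha="right")
    ax.set_yticks([])
    fig.tight_layout()
    plt.show()

# Toy usage with made-up weights (not the values shown in Figure 4):
plot_attention(["The", "fajita", "was", "tasteless", "and", "too", "sweet"],
               [0.02, 0.10, 0.03, 0.45, 0.02, 0.13, 0.25])
```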

[Figure 5: Examples of classification. (a) "The appetizers are ok, but the service is slow." — aspect: food, polarity: neutral; aspect: service, polarity: negative. (b) "I highly recommend it for not just its superb cuisine, but also for its friendly owners and staff." — aspect: food, polarity: positive. (c) "The service, however, is a peg or two below the quality of food (horrible bartenders), and the clientele, for the most part, are rowdy, loud-mouthed commuters (this could explain the bad attitudes from the staff) getting loaded for an AC/DC concert or a Knicks game." — aspect: food, polarity: positive; aspect: service, polarity: positive; aspect: ambience, polarity: negative. (a) is an instance with different aspects; (b) shows that our model can focus on the key points and is not disturbed by the negation word "not"; (c) is a long and complicated sentence. Our model obtains the correct sentiment polarity in each case.]

Figure 4 shows how attention focuses on words under the influence of a given aspect. We use the visualization tool HemI (Deng et al., 2014) to visualize the sentences. The color depth indicates the importance of a weight in the attention vector α: the darker, the more important. The sentences in Figure 4 are "I have to say they have one of the fastest delivery times in the city." and "The fajita we tried was tasteless and burned and the mole sauce was way too sweet.", with corresponding aspects service and food respectively. Attention clearly picks out the important parts of the whole sentence dynamically. In Figure 4 (a), "fastest delivery times" is a multi-word phrase, but our attention-based model can detect such phrases when service is the input aspect. Moreover, the attention can detect multiple keywords if more than one keyword exists. In Figure 4 (b), tasteless and too sweet are both detected.

4.5 Case Study

As demonstrated above, our models obtain state-of-the-art performance. In this section, we further show the advantages of our proposals through some typical examples.

In Figure 5, we list some examples from the test set that have typical characteristics and cannot be inferred by a plain LSTM. In sentence (a), "The appetizers are ok, but the service is slow.", there are two aspects, food and service. Our model can discriminate different sentiment polarities for the different aspects. In sentence (b), "I highly recommend it for not just its superb cuisine, but also for its friendly owners and staff.", there is a negation word not. Our model obtains the correct polarity, unaffected by the negation word, which does not express negation here. In the last instance (c), "The service, however, is a peg or two below the quality of food (horrible bartenders), and the clientele, for the most part, are rowdy, loud-mouthed commuters (this could explain the bad attitudes from the staff) getting loaded for an AC/DC concert or a Knicks game.", the sentence has such a long and complicated structure that existing parsers can hardly produce correct parse trees, so tree-based neural network models struggle to predict the polarity correctly. In contrast, our attention-based LSTM works well on such sentences with the help of the attention mechanism and aspect embeddings.

5 Conclusion and Future Work

In this paper, we have proposed attention-based LSTMs for aspect-level sentiment classification.

The key idea of these proposals is to learn aspect embeddings and let the aspects participate in computing the attention weights. Our proposed models can concentrate on different parts of a sentence when different aspects are given, which makes them more competitive for aspect-level classification. Experiments show that our proposed models, AE-LSTM and ATAE-LSTM, obtain superior performance over the baseline models.

Though the proposals have shown potential for aspect-level sentiment analysis, different aspects are currently input separately. As future work, an interesting and possible direction would be to model more than one aspect simultaneously with the attention mechanism.

Acknowledgments

This work was partly supported by the National Basic Research Program (973 Program) under grant No. 2012CB316301/2013CB329403, the National Science Foundation of China under grant No. 61272227/61332007, and the Beijing Higher Education Young Elite Teacher Project. The work was also supported by the Tsinghua University Beijing Samsung Telecom R&D Center Joint Laboratory for Intelligent Media Computing.

References

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, James Bergstra, Ian Goodfellow, Arnaud Bergeron, Nicolas Bouchard, David Warde-Farley, and Yoshua Bengio. 2012. Theano: new features and speed improvements. arXiv preprint arXiv:1211.5590.

Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Andrew Senior, Paul Tucker, Ke Yang, Quoc V. Le, et al. 2012. Large scale distributed deep networks. In Advances in Neural Information Processing Systems, pages 1223–1231.

Wankun Deng, Yongbo Wang, Zexian Liu, Han Cheng, and Yu Xue. 2014. HemI: a toolkit for illustrating heatmaps. PLoS ONE, 9(11):e111988.

Li Dong, Furu Wei, Chuanqi Tan, Duyu Tang, Ming Zhou, and Ke Xu. 2014. Adaptive recursive neural network for target-dependent Twitter sentiment classification. In ACL (2), pages 49–54.

John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. The Journal of Machine Learning Research, 12:2121–2159.

David Golub and Xiaodong He. 2016. Character-level question answering with attention. arXiv preprint arXiv:1604.00727.

Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. In Advances in Neural Information Processing Systems, pages 1684–1692.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.

Nobuhiro Kaji and Masaru Kitsuregawa. 2007. Building lexicon for sentiment analysis from massive collection of HTML documents. In EMNLP-CoNLL, pages 1075–1083.

Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. arXiv preprint arXiv:1603.01360.

Bing Liu. 2012. Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1):1–167.

Tomas Mikolov, Martin Karafiát, Lukas Burget, Jan Cernocký, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In INTERSPEECH, volume 2, page 3.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119.

Volodymyr Mnih, Nicolas Heess, Alex Graves, et al. 2014. Recurrent models of visual attention. In Advances in Neural Information Processing Systems, pages 2204–2212.

Saif M. Mohammad, Svetlana Kiritchenko, and Xiaodan Zhu. 2013. NRC-Canada: Building the state-of-the-art in sentiment analysis of tweets. arXiv preprint arXiv:1308.6242.

Tony Mullen and Nigel Collier. 2004. Sentiment analysis using support vector machines with diverse information sources. In EMNLP, volume 4, pages 412–418.

Tetsuya Nasukawa and Jeonghee Yi. 2003. Sentiment analysis: Capturing favorability using natural language processing. In Proceedings of the 2nd International Conference on Knowledge Capture, pages 70–77. ACM.
Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), pages 1532–1543.

Veronica Perez-Rosas, Carmen Banea, and Rada Mihalcea. 2012. Learning sentiment lexicons in Spanish. In LREC, volume 12, page 73.

Maria Pontiki, Dimitris Galanis, John Pavlopoulos, Harris Papageorgiou, Ion Androutsopoulos, and Suresh Manandhar. 2014. SemEval-2014 Task 4: Aspect based sentiment analysis. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 27–35.

Qiao Qian, Bo Tian, Minlie Huang, Yang Liu, Xuan Zhu, and Xiaoyan Zhu. 2015. Learning tag embeddings and tag-specific composition functions in recursive neural network. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, volume 1, pages 1365–1374.

Delip Rao and Deepak Ravichandran. 2009. Semi-supervised polarity lexicon induction. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, pages 675–682. Association for Computational Linguistics.

Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, and Phil Blunsom. 2015. Reasoning about entailment with neural attention. arXiv preprint arXiv:1509.06664.

Alexander M. Rush, Sumit Chopra, and Jason Weston. 2015. A neural attention model for abstractive sentence summarization. arXiv preprint arXiv:1509.00685.

Richard Socher, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng, and Christopher D. Manning. 2011. Semi-supervised recursive autoencoders for predicting sentiment distributions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 151–161. Association for Computational Linguistics.

Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In EMNLP, volume 1631, page 1642. Citeseer.

Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. Improved semantic representations from tree-structured long short-term memory networks. arXiv preprint arXiv:1503.00075.

Duyu Tang, Bing Qin, Xiaocheng Feng, and Ting Liu. 2015a. Target-dependent sentiment classification with long short term memory. arXiv preprint arXiv:1512.01100.

Duyu Tang, Bing Qin, and Ting Liu. 2015b. Document modeling with gated recurrent neural network for sentiment classification. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1422–1432.

Wenpeng Yin, Hinrich Schütze, Bing Xiang, and Bowen Zhou. 2015. ABCNN: Attention-based convolutional neural network for modeling sentence pairs. arXiv preprint arXiv:1512.05193.
