
Cognitive Computation manuscript No.

(will be inserted by the editor)

Sentic LSTM: A Hybrid Network for Targeted Aspect-Based Sentiment Analysis
Yukun Ma · Haiyun Peng · Erik Cambria · Tahir Khan · Amir Hussain

Received: date / Accepted: date

Abstract

Background: Sentiment analysis has emerged as one of the most popular NLP tasks in recent years. A classic setting of the task mainly involves classifying the overall sentiment polarity of the input. However, this is based on the assumption that the sentiment expressed in a sentence is unified and consistent, which does not hold in reality. As a fine-grained alternative, analyzing the sentiment towards a specific target and aspect has drawn much attention from the community for its more practical assumption that sentiment depends on a particular set of aspects and entities. Recently, deep neural models have achieved great success in sentiment analysis. As a functional simulation of the behavior of human brains, and one of the most successful deep neural models for sequential data, long short-term memory (LSTM) networks are excellent at learning implicit knowledge from data. However, it is impossible for LSTM to acquire explicit knowledge, such as commonsense facts, from the training data alone. On the other hand, emerging knowledge bases have brought a variety of knowledge resources to our attention, and it has been acknowledged that incorporating background knowledge is an important add-on for many NLP tasks.

Methods: In this paper, we propose a knowledge-rich solution to targeted aspect-based sentiment analysis with a specific focus on leveraging commonsense knowledge in a deep neural sequential model. To explicitly model the inference of sentiment evidence, we augment the LSTM with a stacked attention mechanism consisting of attention models at the target level and the sentence level, respectively. In order to integrate the explicit knowledge with implicit knowledge, we propose an extension of LSTM, termed Sentic LSTM. The extended LSTM cell includes a separate output gate that interpolates token-level memory and concept-level input. In addition, we propose an extension of Sentic-LSTM that is a hybrid of the LSTM and a recurrent additive network, simulating the sentic patterns used in our previous work.

Results: We are mainly concerned with a joint task combining target-dependent aspect detection and targeted aspect-based polarity classification. The performance of the proposed methods on this joint task is evaluated on two benchmark data sets. The experiments show that the combination of the proposed attention architecture and knowledge-embedded LSTM outperforms state-of-the-art methods on two targeted aspect sentiment tasks.

Conclusion: We present a knowledge-rich solution for the task of targeted aspect-based sentiment analysis. Our model can effectively incorporate commonsense knowledge into a deep neural network and be trained in an end-to-end manner. We show that the two-step attentive neural architecture, as well as the proposed Sentic-LSTM and H-Sentic-LSTM, achieves improved performance over state-of-the-art systems in resolving the aspect categories and sentiment polarity of a targeted entity in its context.

Y. Ma, H. Peng, and E. Cambria
School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Ave, Singapore 639798
E-mail: [email protected], [email protected], [email protected]

Tahir Khan
Alef Education Consultancy, Abu Dhabi, UAE
E-mail: [email protected]

Amir Hussain
University of Stirling, Stirling FK9 4LA, Scotland, UK
E-mail: [email protected]

1 Introduction

Sentiment analysis [1] has turned out to be a critical step in natural language processing, especially for social media data from online communities, blogs, wikis, microblogging platforms, and other online collaborative media. As a branch of affective computing research [2], sentiment analysis aims to categorize text (but sometimes also audio and video [3]) into either positive or negative (or neutral [4] in some cases). Early works on sentiment analysis focus on resolving the overall sentiment of a text unit [5, 6], assuming that a sentence expresses a unified sentiment or polarity. In contrast, aspect-based sentiment analysis (ABSA) [7–11] extends the typical setting of sentiment analysis with a more realistic assumption that polarity is associated with specific aspects (or product features). Taking the sentence "The design of the space is good but the service is horrible" as an example, the sentiment expressed towards the two aspects, "space" and "service", is completely opposite. On the other hand, targeted (or target-dependent) sentiment classification (TSA) [12–14] resolves the sentiment polarity of a given target expression in its context, assuming that sentiment is dependent on targeted entities. For instance, in the sentence "I just log on my [facebook]. [Transformer] is boring", the sentiment expressed towards [Transformer] is negative, while there is no clear sentiment towards [facebook]. Most recently, targeted aspect-based sentiment analysis (TABSA) [15] combines the challenges of ABSA and TSA. Namely, the task requires detecting the aspect categories as well as resolving the polarity of specific aspects of a given targeted entity. Figure 1 illustrates another example of TABSA in the SentiHood dataset [15] (the SentiHood dataset masks all targeted entities with "Location" + INDEX).

Fig. 1: Example of targeted aspect-based sentiment analysis in SentiHood

As a popular and effective solution to many NLP tasks, deep learning methods [12, 14, 16–18] have achieved great success when applied to ABSA or TSA. In particular, neural sequential models, such as LSTM [19], attract more and more attention for their capacity to represent sequential information. Although it is not scientifically validated that the LSTM is cognitively plausible, its functionality, especially the forget mechanism, is considered to simulate the functionality of human brains. Due to the use of gated functions, LSTM can effectively learn implicit knowledge from sequences while avoiding the problem of vanishing gradients. In contrast, explicit knowledge, such as commonsense knowledge, is hard to learn from data in which commonsense facts are not explicitly annotated.

We, therefore, target three problems left unsolved by current state-of-the-art methods. Firstly, given that a target might be composed of multiple instances (mentions of the same target) or multiple words, existing research assumes all instances are equally important and simply computes an average vector over instances. This oversimplification conflicts with the fact that part of the target expression is more tightly tied to sentiment than the rest. Secondly, the hierarchical attention exploited by existing methods only implicitly models the process of inferring the sentiment evidence, as a black box. Last but not least, existing research falls short of effectively incorporating external sentiment knowledge. The emergence of a variety of knowledge bases [20, 21] has facilitated many applications in text processing [22–27] and image processing [28, 29] by providing access to external information that is not available in the limited training data. In particular, the task of TABSA might benefit from using affective commonsense knowledge [30], because the affective properties strongly correlate with aspects and sentiment polarity. To address these problems, we propose a two-step attention architecture with an extended LSTM cell that can better leverage external knowledge. We identify our contributions as three-fold:

1. We propose a two-step attention model which explicitly attends first to the words of the target expression and then to the whole sentence. The two-step attention model simulates the process of first locating and memorizing the targets and then searching for related sentiment cues over the whole sentence.
2. We extend the classic LSTM cell with components accounting for integration with external knowledge. The extended LSTM is capable of utilizing the explicit knowledge to control the information flows from time step to time step as well as to generate the output fed into the classifier.
3. To the best of our knowledge, we are the first to incorporate affective commonsense knowledge into a deep neural network for modeling sequences.

2 Related Work

In this section, we survey several related research lines: ABSA, TSA, TABSA, and finally works on incorporating external knowledge into deep neural models.

2.1 Aspect-based Sentiment Analysis

ABSA is the task of classifying sentiment polarity with respect to a set of aspects. The biggest challenge faced by ABSA is how to effectively represent the aspect-specific sentiment information of the whole sentence. Early works on ABSA mainly relied on feature engineering to characterize the sentences [31, 32]. Motivated by the success of deep learning methods in representation learning, many recent works [13, 16, 17, 33] utilize deep neural networks, such as LSTM, to generate sentence embeddings (dense vector representations of sentences) which are then fed to a classifier as a low-dimensional feature vector. Moreover, the representation can be enhanced by using an attention mechanism [17] that takes the word sequence and aspects as input. For each word of the sentence, the attention vector quantifies its sentiment salience as well as its relevance to the given aspect. The resulting sentiment representation benefits from the attention mechanism because it overcomes the shortcoming of recurrent neural networks (RNNs), which suffer from information loss when only one single output (e.g., the output at the end of the sequence) is used by the classifier.

2.2 Targeted Sentiment Analysis

In comparison with ABSA, targeted sentiment analysis aims to analyze the sentiment with regard to targeted entities in the sentence. It is thus critical for targeted sentiment analysis methods, e.g., the target-dependent LSTM model (TD-LSTM) and the target connection LSTM model (TC-LSTM) [12], to model the interaction between sentiment targets and the whole sentence. In order to obtain the target-dependent sentence representation, TD-LSTM directly uses the hidden outputs of a bidirectional-LSTM sentence encoder spanning the target mentions, while TC-LSTM extends TD-LSTM by concatenating each input word vector with a target vector. Similar to ABSA, attention models are also applicable to targeted sentiment analysis. Rather than using a single level of attention, the deep memory network [18] and the recurrent attention model [34] have achieved superior performance by learning a deep attention over the single-level attention, because multiple passes (or hops) over the input sequence can refine the attended words again and again to find the most important ones.

These existing approaches have either ignored the problem of multiple target instances (or words) or simply used an averaging vector over the target expression [14, 18]. Our method differs from existing methods by weighting each target word with an attention weight, so that the given target tends to be represented by its most informative part.

2.3 Targeted Aspect-based Sentiment Analysis

Two baseline systems [15] were proposed together with the SentiHood dataset: a feature-based logistic regression model and an LSTM-based model. The feature-based logistic regression model uses feature templates including n-gram tokens and POS tags extracted from the context of instances. The LSTM baseline can be seen as an adaptation of TD-LSTM [12] that simply uses the hidden outputs at the positions of the target instances, assuming that all target instances are equally important.

2.4 Incorporating External Knowledge

Existing studies on incorporating external knowledge into deep neural networks are also closely related to this work. External knowledge bases have typically been used as a source of features [22, 35, 36]. Most recently, neural sequential models [27, 37] leverage the lower-dimensional continuous representation of knowledge concepts as additional inputs. However, these approaches have treated the computation of neural sequential models as a black box, without tight integration of the knowledge and the computational structure. Our Sentic LSTM is inspired by [24], which adds a knowledge recall gate to the cell state of the LSTM. However, our method differs from [24] in the way it uses external knowledge to generate the hidden outputs and to control the information flow.

Fig. 2: Overview of the two-step attentive neural architecture

3 Methodology

In this section, we describe the proposed method in detail. We start by briefly introducing the task definition of TABSA, followed by an overview of the whole neural architecture. Afterwards, we describe the two-step attention model. Lastly, we describe the proposed knowledge-embedded extension of the LSTM.

Table 1: List of Aspects Defined in SentiHood Dataset

  General            Shopping
  Price              Multicultural
  Transit-location   Green-nature
  Safety             Dining
  Live               Quiet
  Nightlife          Touristy

3.1 Task Definition

A sentence s is composed of a sequence of words. Similar to [14], we consider all mentions of the same target as a single target. Given a target t composed of m words (which can be either consecutive or not) in sentence s, denoted as T = {t_1, t_2, ..., t_i, ..., t_m} with t_i referring to the position of the i-th word of the target expression, the task of TABSA can be divided into two subtasks. Firstly, it resolves the aspect categories of t, which belong to a predefined set (Table 1). Secondly, it classifies the sentiment polarity with regard to each aspect category associated with t. For example, the sentence "I live in [West London] for years. I like it and it is safe to live in much of [West London]. Except [Brent] maybe." contains two targets, [West London] and [Brent]. Our objective is to detect the aspects and classify the sentiment polarity. The desired output for [West London] is ['general': positive; 'safety': positive], while the output for [Brent] should be ['general': negative; 'safety': negative].

3.2 Overview

In this section, we provide an overview of our proposed method. Our proposed neural architecture is composed of two components: a sequence encoder and a hierarchical attention component. Fig. 2 illustrates the architecture. Given a sentence s = {w_1, w_2, ..., w_L}, a look-up operation is first performed to convert the input words into word embeddings {v_1, v_2, ..., v_L}, where L is the length of the sentence. The sequence encoder transforms the word embeddings into a sequence of hidden outputs, on top of which the attention model is built. The target-level attention takes as input the hidden outputs of the target expression (highlighted in red in Fig. 2) and encodes the salience of each word via a self-attention. The target-level attention model then outputs a weighted sum of these hidden outputs as a vector representation of the given target. In the second step, we feed the target embedding together with aspect embeddings as queries to a sentence-level attention model, which transforms the whole sentence into a vector. A softmax layer is used to map the sentence vector to an output label (e.g., None, Neutral, Negative, and Positive in a 4-class setting; or None, Negative, and Positive in a 3-class setting) that jointly represents the sentiment polarity and the membership of an aspect.

3.3 Long Short-Term Memory Network

The sentence is encoded using an extension of the RNN [38], termed LSTM [19], which was first introduced to solve the vanishing and exploding gradient problems faced by the vanilla RNN. A typical LSTM cell contains three gates: a forget gate, an input gate, and an output gate. These gates determine the information that flows in and out at the current time step. The cell is defined as below:

  f_i = σ(W_f [x_i, h_{i−1}] + b_f)
  I_i = σ(W_I [x_i, h_{i−1}] + b_I)
  C̃_i = tanh(W_C [x_i, h_{i−1}] + b_C)
  C_i = f_i ∗ C_{i−1} + I_i ∗ C̃_i                                   (1)
  o_i = σ(W_o [x_i, h_{i−1}] + b_o)
  h_i = o_i ∗ tanh(C_i)

where f_i, I_i, and o_i are the forget gate, input gate, and output gate, respectively; W_f, W_I, W_o and b_f, b_I, b_o are the weight matrices and bias vectors of the gates; C_i is the cell state and h_i is the hidden output. A single LSTM typically encodes the sequence from only one direction. However, two LSTMs can also be stacked to form a bidirectional encoder, referred to as a bidirectional LSTM. For a sentence s = {w_1, w_2, ..., w_L}, the bidirectional LSTM produces a sequence of hidden outputs

  H = [h_1, h_2, ..., h_L],   h_i = [→h_i ; ←h_i]

where each element of H is a concatenation of the corresponding hidden outputs of the forward and backward LSTM cells.
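To make the recurrence concrete, the following minimal NumPy sketch implements one step of Eq. (1) and the bidirectional encoder described above. It is only an illustration under our own assumptions: the function names, the packed 4×d_h weight layout, and the toy dimensions are ours, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_i, h_prev, C_prev, W, b):
    """One LSTM step following Eq. (1); W maps [x_i, h_prev] to 4*d_h units."""
    z = W @ np.concatenate([x_i, h_prev]) + b
    d_h = h_prev.shape[0]
    f_i = sigmoid(z[0*d_h:1*d_h])        # forget gate
    I_i = sigmoid(z[1*d_h:2*d_h])        # input gate
    C_tilde = np.tanh(z[2*d_h:3*d_h])    # candidate cell state
    o_i = sigmoid(z[3*d_h:4*d_h])        # output gate
    C_i = f_i * C_prev + I_i * C_tilde
    h_i = o_i * np.tanh(C_i)
    return h_i, C_i

def bilstm(X, W_fwd, b_fwd, W_bwd, b_bwd, d_h):
    """Bidirectional encoding: H[i] concatenates forward and backward h_i."""
    L = len(X)
    h, C = np.zeros(d_h), np.zeros(d_h)
    fwd = []
    for i in range(L):
        h, C = lstm_step(X[i], h, C, W_fwd, b_fwd)
        fwd.append(h)
    h, C = np.zeros(d_h), np.zeros(d_h)
    bwd = [None] * L
    for i in reversed(range(L)):
        h, C = lstm_step(X[i], h, C, W_bwd, b_bwd)
        bwd[i] = h
    return np.stack([np.concatenate([f, b]) for f, b in zip(fwd, bwd)])

# Toy usage with random weights (illustrative dimensions only).
rng = np.random.default_rng(0)
d_x, d_h, L = 8, 6, 5
X = rng.normal(size=(L, d_x))
W1, W2 = rng.normal(scale=0.1, size=(2, 4 * d_h, d_x + d_h))
b1, b2 = np.zeros(4 * d_h), np.zeros(4 * d_h)
H = bilstm(X, W1, b1, W2, b2, d_h)
print(H.shape)  # (5, 12): L hidden outputs, each a 2*d_h concatenation
```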

3.4 Target-level Attention

Based on the attention mechanism, we calculate an attention vector for a target expression. A target might consist of a consecutive or non-consecutive sequence of words, denoted as T = {t_1, t_2, ..., t_m}, where t_i is the location of an individual word of the target expression. The hidden outputs corresponding to T are denoted as H' = {h_{t_1}, h_{t_2}, ..., h_{t_m}}. We compute the vector representation of a target t as

  v_t = H'α = Σ_j α_j h_{t_j}                                        (2)

where the target attention vector α = {α_1, α_2, ..., α_m} is distributed over the target word sequence T. The attention vector α is a self-attention vector that takes nothing but the hidden outputs themselves as input. It is computed by feeding the hidden outputs into a bi-layer perceptron, as shown in Equation 3:

  α = softmax(W_a^(2) tanh(W_a^(1) H'))                              (3)

where W_a^(1) ∈ R^{d_m×d_h} and W_a^(2) ∈ R^{1×d_m} are the parameters of the attention component.

3.5 Sentence-level Attention Model

Following the target-level attention, our model learns a target-and-aspect-specific sentence attention over all the words of a sentence. Given a sentence s of length L, the hidden outputs are denoted as H = [h_1, h_2, ..., h_L]. An attention model computes a linear combination of the hidden vectors into a single vector, i.e.,

  v_{s,t}^a = Hβ = Σ_i β_i h_i                                       (4)

where the vector β = [β_1, β_2, ..., β_L] is called the sentence-level attention vector. Each element β_i encodes the salience of the word w_i in the sentence s with regard to the aspect a and target T. Existing research on targeted or aspect-based sentiment analysis mostly uses targets or aspect terms as queries. At first, each h_i is transformed into a d_m-dimensional vector by a multi-layer neural network with a tanh activation function, followed by a dense softmax layer that generates a probability distribution over the words in sentence s, i.e.,

  β_a = softmax(v_a^T tanh(W_m (H ⊕ v_t)))                           (5)

where v_a is the aspect embedding of aspect a, and H ⊕ v_t denotes the operation of concatenating v_t to each h_i; W_m^(1) ∈ R^{d_m×d_h} is the matrix mapping the row vectors of H to a d_m-dimensional space, and W_m^(2) ∈ R^{1×d_m} maps each new row vector to an unnormalized attention weight.
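The two-step attention of Sections 3.4 and 3.5 can be sketched as follows. This is a simplified NumPy illustration of Eqs. (2)–(5) under our own naming and dimension assumptions (a single projection matrix Wm and the aspect embedding used directly as the query); it is not the authors' code.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def target_attention(H_t, Wa1, Wa2):
    """Self-attention over the target-word hidden outputs (Eqs. 2-3).
    H_t: (m, d_h) hidden outputs of the m target words."""
    scores = Wa2 @ np.tanh(Wa1 @ H_t.T)   # (1, m) unnormalized weights
    alpha = softmax(scores.ravel())        # attention over target words
    v_t = alpha @ H_t                      # weighted sum -> target vector
    return v_t, alpha

def sentence_attention(H, v_t, v_a, Wm):
    """Target- and aspect-conditioned attention over the sentence (Eqs. 4-5).
    H: (L, d_h); v_t: target vector; v_a: aspect embedding (query)."""
    H_cat = np.concatenate([H, np.tile(v_t, (H.shape[0], 1))], axis=1)
    scores = v_a @ np.tanh(Wm @ H_cat.T)   # one score per word
    beta = softmax(scores)
    v_s = beta @ H                         # sentence vector for (target, aspect)
    return v_s, beta

# Toy usage with illustrative dimensions and random parameters.
rng = np.random.default_rng(1)
L, m, d_h, d_m, d_a = 7, 2, 12, 10, 10
H = rng.normal(size=(L, d_h))
H_t = H[2:2 + m]                           # hidden outputs of the target words
Wa1 = rng.normal(scale=0.1, size=(d_m, d_h))
Wa2 = rng.normal(scale=0.1, size=(1, d_m))
Wm = rng.normal(scale=0.1, size=(d_a, 2 * d_h))
v_a = rng.normal(size=d_a)
v_t, alpha = target_attention(H_t, Wa1, Wa2)
v_s, beta = sentence_attention(H, v_t, v_a, Wm)
print(alpha.shape, beta.shape, v_s.shape)  # (2,) (7,) (12,)
```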

Table 2: Example of AffectNet

  AffectNet      IsA-pet   KindOf-food   Arises-joy   ...
  dog            0.981     0             0.789        ...
  cupcake        0         0.922         0.910        ...
  rotten fish    0         0.459         0            ...
  police man     0         0             0            ...
  win lottery    0         0             0.991        ...

3.6 Sentic LSTM

In this paper, we consider using affective commonsense knowledge as the knowledge source to be embedded into the sequence model. Affective commonsense knowledge such as AffectNet [39] contains concepts associated with a rich set of affective properties (as shown in Table 2). These affective properties provide not only concept-level features but also semantic links to the aspects and their sentiment polarity. For example, the concept 'rotten fish' has the property "KindOf-food", which can be directly related to aspects such as 'restaurant' or 'food quality', and properties such as 'Arises-joy' can contribute positively to the classification of sentiment polarity.

Fig. 3: Visualization of AffectiveSpace

However, AffectNet is of high dimensionality, which hinders it from being used in deep neural models. AffectiveSpace [30] has been built to map the concepts of AffectNet to continuous low-dimensional embeddings (as shown in Figure 3) without losing the semantic and affective relatedness of the original space. Based on this new space of concepts, we embed the concept-level information into deep neural sequential models to better classify the aspects and sentiment of sentences.

In order to leverage the commonsense knowledge with efficacy, we propose an extension of LSTM, termed Sentic LSTM. It is reasonable to assume that the knowledge concepts contain information complementary to the textual word sequence, especially when the knowledge base in use is designed to include abstract concepts. Our Sentic LSTM gives the concepts two important roles: 1) assisting with the filtering of information flowing from one time step to the next and 2) providing complementary information to the memory cell. At each time step i, we assume that a set of knowledge concept candidates can be triggered and mapped to a d_c-dimensional space. We denote the set of K concepts as {µ_{i,1}, µ_{i,2}, ..., µ_{i,K}}. First, we combine the candidate embeddings into a single vector as in Equation 6:

  µ_i = (1/K) Σ_j µ_{i,j}                                            (6)

In this paper, as we find that there are only up to 4 extracted concepts at each time step, we simply use the average vector, although a more sophisticated attention model could easily be employed to replace the averaging function.

  f_i = σ(W_f [x_i, h_{i−1}, µ_i] + b_f)
  I_i = σ(W_I [x_i, h_{i−1}, µ_i] + b_I)
  C̃_i = tanh(W_C [x_i, h_{i−1}] + b_C)
  C_i = f_i ∗ C_{i−1} + I_i ∗ C̃_i                                    (7)
  o_i = σ(W_o [x_i, h_{i−1}, µ_i] + b_o)
  o_i^c = σ(W_co [x_i, h_{i−1}, µ_i] + b_co)
  h_i = o_i ∗ tanh(C_i) + o_i^c ∗ tanh(W_c µ_i)

Our extension of the LSTM is given in Equation 7. At first, we assume that the affective concepts are meaningful cues for controlling the flow of token-level information. For example, the multi-word concept 'rotten fish' might indicate that the word 'rotten' is a sentiment-related modifier of its next word 'fish', and hence less information should be filtered out at the next time step. We thus add knowledge concepts to the forget, input, and output gates of the standard LSTM to help filter the information. The presence of affective concepts in the input gate is expected to prevent the memory cell from being affected by input tokens that conflict with the pre-existing knowledge. Similarly, the output gate uses the knowledge to filter out irrelevant information stored in the memory. Another important feature of our extension is based on the assumption that the information from the concept-level output is complementary to the token level. Therefore, we extend the regular LSTM with an additional knowledge output gate o_i^c that outputs concept-level knowledge complementary to the token-level memory. Since AffectiveSpace is learned independently, we leverage a transformation matrix W_c ∈ R^{d_h×d_µ} to map AffectiveSpace to the same space as the memory outputs. In other words, o_i^c models the relative contribution of the token level and the concept level. Moreover, we notice that o_i^c ∗ tanh(W_c µ_i) resembles the functionality of the sentinel vector used by [27], which allows the model to choose whether or not to use the affective knowledge.

3.7 Hybrid Sentic-LSTM

Inspired by the work of [40], we propose a simplified version of Sentic-LSTM that is a hybrid of the LSTM and a recurrent additive network. This variant of Sentic-LSTM involves the concept-level input in the recurrent connection while maintaining a reduced number of parameters compared with Sentic LSTM. Another intuition is that the additive operation on each concept embedding over time steps simulates the inference process performed by the rule-based sentic patterns [41] that were used in our previous system. A mathematical description of the hybrid Sentic-LSTM is given in Equation 8:

  M_i = W_c µ_i
  f_i = σ(W_f [x_i, h_{i−1}, µ_i] + b_f)
  I_i = σ(W_I [x_i, h_{i−1}, µ_i] + b_I)
  C̃_i = tanh(W_C [x_i, h_{i−1}] + b_C)                               (8)
  o_i^c = σ(W_co [x_i, h_{i−1}, µ_i] + b_co)
  C_i = f_i ∗ C_{i−1} + I_i ∗ C̃_i + o_i^c ∗ M_i
  h_i = C_i

Similar to the recurrent additive network, we can rewrite the hidden output at time step i as in Equation 9:

  C_i = f_i ∗ (C_{i−1} − C_{i−1}^µ) + I_i ∗ C̃_i + f_i ∗ C_{i−1}^µ + o_i^c ∗ M_i
  C_i^µ = f_i ∗ C_{i−1}^µ + o_i^c ∗ M_i = Σ_{j=0}^{i} w_{ji} ∗ M_j                     (9)

where w_{ji} is a product of the input and forget gates. We can see that the hidden output at time step i is a hybrid of a simplified LSTM, whose gates are coupled with both word-level and concept-level input, and a recurrent additive component that accumulates the information from the concept-level input over the previous time steps.
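A minimal sketch of the Sentic LSTM cell of Eq. (7) is given below, assuming the concept vector µ_i has already been extracted and averaged (a zero vector when no concept fires). All names, the toy dimensions, and the dictionary weight layout are illustrative assumptions, not the reference implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sentic_lstm_step(x_i, mu_i, h_prev, C_prev, W, b, Wc):
    """One Sentic LSTM step following Eq. (7). mu_i is the averaged concept
    embedding at step i (zero vector if no concept was extracted)."""
    xhm = np.concatenate([x_i, h_prev, mu_i])   # gates see [x_i, h_{i-1}, mu_i]
    xh = np.concatenate([x_i, h_prev])          # candidate uses [x_i, h_{i-1}] only
    f_i  = sigmoid(W["f"]  @ xhm + b["f"])      # forget gate (knowledge-aware)
    I_i  = sigmoid(W["I"]  @ xhm + b["I"])      # input gate (knowledge-aware)
    C_tilde = np.tanh(W["C"] @ xh + b["C"])     # candidate cell state
    C_i  = f_i * C_prev + I_i * C_tilde
    o_i  = sigmoid(W["o"]  @ xhm + b["o"])      # token-level output gate
    oc_i = sigmoid(W["co"] @ xhm + b["co"])     # knowledge (concept) output gate
    h_i  = o_i * np.tanh(C_i) + oc_i * np.tanh(Wc @ mu_i)
    return h_i, C_i

# Toy usage (the paper uses 100-d AffectiveSpace concepts; toy sizes here).
rng = np.random.default_rng(2)
d_x, d_h, d_mu = 8, 6, 4
W = {k: rng.normal(scale=0.1, size=(d_h, d_x + d_h + d_mu)) for k in ("f", "I", "o", "co")}
W["C"] = rng.normal(scale=0.1, size=(d_h, d_x + d_h))
b = {k: np.zeros(d_h) for k in ("f", "I", "C", "o", "co")}
Wc = rng.normal(scale=0.1, size=(d_h, d_mu))    # maps AffectiveSpace to memory space
h, C = np.zeros(d_h), np.zeros(d_h)
h, C = sentic_lstm_step(rng.normal(size=d_x), rng.normal(size=d_mu), h, C, W, b, Wc)
print(h.shape, C.shape)  # (6,) (6,)
```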
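Analogously, the following sketch applies the hybrid Sentic-LSTM update of Eq. (8), in which the cell state doubles as the hidden output and the projected concept input M_i enters additively through the knowledge gate. Again, the code is an illustrative sketch under assumed names and sizes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def h_sentic_lstm_step(x_i, mu_i, C_prev, W, b, Wc):
    """One hybrid Sentic-LSTM step following Eq. (8): h_i = C_i, and the
    concept input M_i is accumulated through the recurrent additive term."""
    h_prev = C_prev                              # no separate hidden state
    xhm = np.concatenate([x_i, h_prev, mu_i])
    xh = np.concatenate([x_i, h_prev])
    M_i = Wc @ mu_i                              # projected concept input
    f_i  = sigmoid(W["f"]  @ xhm + b["f"])
    I_i  = sigmoid(W["I"]  @ xhm + b["I"])
    C_tilde = np.tanh(W["C"] @ xh + b["C"])
    oc_i = sigmoid(W["co"] @ xhm + b["co"])
    C_i = f_i * C_prev + I_i * C_tilde + oc_i * M_i   # recurrent additive update
    return C_i                                   # h_i = C_i

# Toy run over a short sequence (illustrative dimensions and random weights).
rng = np.random.default_rng(3)
d_x, d_h, d_mu, L = 8, 6, 4, 5
W = {k: rng.normal(scale=0.1, size=(d_h, d_x + d_h + d_mu)) for k in ("f", "I", "co")}
W["C"] = rng.normal(scale=0.1, size=(d_h, d_x + d_h))
b = {k: np.zeros(d_h) for k in W}
Wc = rng.normal(scale=0.1, size=(d_h, d_mu))
C = np.zeros(d_h)
for i in range(L):
    C = h_sentic_lstm_step(rng.normal(size=d_x), rng.normal(size=d_mu), C, W, b, Wc)
print(C.shape)  # (6,)
```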

3.8 Prediction and Parameter Learning

The objective used to train our classifier is defined as minimizing the sum of the cross-entropy losses of the predictions on each target-aspect pair, i.e.,

  L_s = −(1/|D|) Σ_{s∈D} Σ_{t∈s} Σ_{a∈A} log p_{c,t}^a

where A is the set of predefined aspects and p_{c,t}^a is the probability of the ground-truth polarity class c of target t with respect to aspect a, which is defined by a softmax function:

  p_{c,t}^a = softmax(W^p v_{s,t}^a + b_s^a)

where W^p and b_s^a are the parameters that map the vector representation of target t to the polarity label of aspect a. To avoid overfitting, we add a dropout layer with a dropout probability of 0.5 after the embedding layer. We stop the training process after 10 epochs and select the model that achieves the best performance on the development set.

4 Experiment

4.1 Dataset and Resources

We evaluate our method on two datasets: the SentiHood dataset [15] and a subset of SemEval-2015 [42]. The SentiHood dataset was built by querying Yahoo! Answers with location names of London. Table 3 shows the statistics of SentiHood. The whole dataset is split into train, test, and development sets by the authors. Overall, the entire dataset contains 5,215 sentences, with 3,862 sentences containing a single target and 1,353 sentences containing multiple targets. Approximately two thirds of the targets are annotated with aspect-based sentiment polarity (train set: 2,476 out of 2,977; test set: 1,241 out of 1,898; development set: 619 out of 955). On average, each sentiment-bearing target is annotated with 1.37 aspects. To show the generalizability of our method, we build a subset of the dataset used by SemEval-2015. We remove sentences containing no targets as well as NULL targets. To be comparable with SentiHood, we combine targets with the same surface form within the same sentence as mentions of the same target. In total, we have 1,197 targets left in the training set and 542 targets left in the testing set. On average, each target has 1.06 aspects.

Table 3: SentiHood Dataset

                                       Train   Dev    Test
  Targets                              3,806   955    1,898
  Targets w/ sentiment                 2,476   619    1,241
  Aspects per target (w/ sentiment)    1.37    1.37   1.35

To inject the commonsense knowledge, we use a syntax-based concept parser (http://github.com/senticnet/concept-parser) to extract a set of concept candidates at each time step, and use the pre-trained 100-dimensional AffectiveSpace embeddings (http://sentic.net/downloads) as the concept embeddings. In the case that no concepts are extracted, a zero vector is used as the concept input.

4.2 Experiment Setting

We evaluate our method on two sub-tasks of targeted aspect-based sentiment analysis: 1) aspect categorization and 2) aspect-based sentiment classification. Following Saeidi et al. [15], we treat the outputs of aspect-based classification as hierarchical classes. For aspect categorization, we output the label with the highest probability for each aspect. The labels are, for example in the 3-class setting, 'Positive', 'Negative', and 'None', where 'None' means the aspect should not be bound to the given target. For aspect-based sentiment classification, we only look at the probabilities of 'Positive' and 'Negative', while ignoring the score of 'None' (on SemEval-2015, we use 'Negative', 'Positive', 'Neutral', and 'None'). For evaluating the aspect-based sentiment classification, we calculate the accuracy averaged over aspects. We evaluate aspect categorization as a multi-label classification problem, and the results are therefore averaged over targets instead of aspects. We evaluate our methods and the baseline systems using both loose and strict metrics, reporting three widely used evaluation metrics for multi-label classifiers: Macro-F1, Micro-F1, and strict accuracy (Strict Acc.).

Given the dataset D, the ground-truth aspect categories of a target t ∈ D are denoted as Y_t, and the predicted aspect categories as Ŷ_t. The three metrics are computed as follows (a small computational sketch follows the list):

– Strict accuracy (Strict Acc.): (1/|D|) Σ_{t∈D} σ(Y_t = Ŷ_t), where σ(·) is an indicator function.
– Macro-F1 = 2 · (Ma-P × Ma-R)/(Ma-P + Ma-R), based on Macro-Precision (Ma-P) and Macro-Recall (Ma-R), with Ma-P = (1/|D|) Σ_{t∈D} |Y_t ∩ Ŷ_t| / |Ŷ_t| and Ma-R = (1/|D|) Σ_{t∈D} |Y_t ∩ Ŷ_t| / |Y_t|.
– Micro-F1 = 2 · (Mi-P × Mi-R)/(Mi-P + Mi-R), based on Micro-Precision (Mi-P) and Micro-Recall (Mi-R), where Mi-P = Σ_{t∈D} |Y_t ∩ Ŷ_t| / Σ_{t∈D} |Ŷ_t| and Mi-R = Σ_{t∈D} |Y_t ∩ Ŷ_t| / Σ_{t∈D} |Y_t|.
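The following minimal sketch computes these three metrics from gold and predicted aspect sets per target. The helper name and data layout are our own assumptions for illustration.

```python
def multilabel_metrics(gold, pred):
    """Strict accuracy, Macro-F1, and Micro-F1 for aspect categorization,
    following the example-based definitions in Section 4.2.
    gold, pred: dicts mapping each target id to a set of aspect labels."""
    n = len(gold)
    strict = sum(gold[t] == pred[t] for t in gold) / n
    # Example-based macro precision/recall, averaged over targets.
    ma_p = sum(len(gold[t] & pred[t]) / len(pred[t]) if pred[t] else 0.0 for t in gold) / n
    ma_r = sum(len(gold[t] & pred[t]) / len(gold[t]) if gold[t] else 0.0 for t in gold) / n
    macro_f1 = 2 * ma_p * ma_r / (ma_p + ma_r) if ma_p + ma_r else 0.0
    # Micro precision/recall pool the counts over all targets.
    inter = sum(len(gold[t] & pred[t]) for t in gold)
    mi_p = inter / max(sum(len(pred[t]) for t in gold), 1)
    mi_r = inter / max(sum(len(gold[t]) for t in gold), 1)
    micro_f1 = 2 * mi_p * mi_r / (mi_p + mi_r) if mi_p + mi_r else 0.0
    return strict, macro_f1, micro_f1

# Toy usage on two targets.
gold = {"t1": {"general", "safety"}, "t2": {"price"}}
pred = {"t1": {"general"}, "t2": {"price", "nightlife"}}
print(multilabel_metrics(gold, pred))  # (0.0, 0.75, 0.666...)
```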

Table 4: Performance of systems on the SentiHood dataset

                             Aspect Categorization                                  Sentiment
                             Strict Acc. (%)    Macro F1 (%)    Micro F1 (%)        Sentiment Acc. (%)
                             dev     test       dev     test    dev     test        dev     test
  TDLSTM                     50.27   50.83      59.03   58.17   55.72   55.78       82.60   81.82
  LSTM + TA                  54.17   52.02      62.90   61.07   60.56   59.02       83.80   84.29
  LSTM + TA + SA             68.83   66.42      79.36   76.69   79.14   76.64       86.00   86.75
  LSTM + TA + DMN SA         60.66   60.14      68.89   70.19   67.28   68.37       84.80   83.36
  LSTM + TA + SA + KB Feat   69.38   64.76      80.00   76.33   79.79   76.08       87.00   88.70
  LSTM + TA + SA + KBA       68.08   65.12      78.68   76.40   78.73   76.46       87.40   87.98
  Recall-LSTM + TA + SA      68.64   64.66      78.44   75.61   78.53   75.91       86.80   86.85
  Sentic-LSTM + TA + SA      69.20   67.43      78.84   78.18   79.09   77.66       88.80   89.32
  H-Sentic-LSTM + TA + SA    69.20   67.52      78.66   78.10   78.77   77.87       87.00   87.78

4.3 Performance Comparison

We compare our proposed method with methods that have been proposed for TABSA, as well as methods proposed for ABSA or TSA but applicable to TABSA. Furthermore, we also compare the performance of several variants of our proposed method in order to highlight our technical contributions. We run each model multiple times and report the results that perform best on the development set. For the SemEval-2015 dataset, we report the results of the final epoch.

– TD-LSTM: This method is an adaptation of TD-LSTM [12, 15]. It adopts a Bi-LSTM to encode the sequential structure of a sentence and represents a given target using a vector averaged over the hidden outputs of the target instances.
– Bi-LSTM + TA: Our method learns an instance attention on top of the outputs of the LSTM to model the contribution of each instance.
– Bi-LSTM + TA + SA: In addition to the target instance attention, we add a sentence-level attention to the model.
– Bi-LSTM + TA + DMN SA: The sentence-level attention is replaced by a dynamic memory network with multiple hops [18]. We run the memory network with different numbers of hops and report the results with 4 hops, which produces the best performance on the development set of SentiHood. We exclude the case of zero hops, which corresponds to Bi-LSTM + TA + SA.
– LSTM + TA + SA + KB Feat: Concepts are fed into the input layer as additional features.
– LSTM + TA + SA + KBA: This is an integration of the method proposed by [27], which learns an attention over the concept embeddings. The concept embedding is combined with the hidden output before being fed into the classifier.
– Recall-LSTM + TA + SA: The LSTM is extended with a knowledge recall gate as in [24].
– Sentic-LSTM + TA + SA: The encoder is replaced with the proposed knowledge-embedded LSTM.
– H-Sentic-LSTM + TA + SA: The hybrid Sentic-LSTM, which has a simplified recurrent additive component accounting for the concept-level input.

The word embeddings of the input layer are initialized by a pre-trained skip-gram model [43] with 150 hidden units trained on a combination of Yelp (http://yelp.com.sg/dataset/challenge) and Amazon review data [44], and we use 50 hidden units for the bidirectional LSTM.

4.4 Results of Attention Model

Table 4 and Table 5 show the performance on the SentiHood dataset and the SemEval-2015 dataset, respectively. In comparison with the non-attention baseline (TD-LSTM), we find that our best attention-based model significantly improves aspect detection by more than 35% and sentiment classification by approximately 10% on the SentiHood dataset. It also achieves a similar improvement in sentiment classification on the SemEval dataset. However, it is notable that, on the SemEval-2015 dataset, the improvement in aspect detection is relatively smaller. We conjecture the reason is that the SentiHood dataset has masked the targets as a special word "LOCATION", which proves less informative than the full names of the aspect targets used by SemEval-2015. Hence, using only the hidden outputs at the target does not suffice to represent the sentiment of the whole sentence in the SentiHood dataset. Although the difference is not significant, the target-level attention performs better than the target-averaging model (i.e., TD-LSTM), because the target attention is capable of identifying the part of the target expression with higher sentiment salience. On the other hand, it is notable that the two-step attention achieves significant improvements on both aspect categorization

Table 5: Performance of systems on the SemEval-2015 dataset

                             Aspect Categorization                       Sentiment
                             Strict Acc.   Macro F1   Micro F1           Sentiment Acc.
  TDLSTM                     65.49         70.56      69.00              68.57
  LSTM + TA                  66.42         71.71      70.06              69.24
  LSTM + TA + SA             63.46         70.73      66.18              74.28
  LSTM + TA + DMN SA         48.33         52.73      51.39              69.07
  LSTM + TA + SA + KB Feat   65.68         74.46      70.71              76.13
  LSTM + TA + SA + KBA       67.34         74.36      71.78              73.10
  Recall-LSTM + TA + SA      66.05         72.90      69.66              74.11
  Sentic-LSTM + TA + SA      67.34         76.44      73.82              76.47
  H-Sentic-LSTM + TA + SA    69.19         77.55      75.00              74.11

and sentiment classification, indicating that the target- and aspect-dependent sentence attention can retrieve information relevant to both tasks.

To our surprise, using multiple hops in the sentence-level attention fails to produce any improvement. The performance even drops significantly on the SemEval-2015 dataset, which has a much smaller number of training instances but a larger aspect set than SentiHood. We conjecture the reason is that using multiple hops increases the number of parameters to learn, making the model less applicable to small and sparse datasets such as SemEval-2015.

Fig. 4: Example of sentence-level attention

4.5 Visualization of Attention

We visualize the attention vectors of the sentence-level attention in Figure 4 with regard to the "Transit-location" and "Price" aspects. The two attention vectors encode quite different concerns over the word sequence. In the first example, the 'Transit-location' attention attends to the word "long", which expresses a negative sentiment towards the target. In comparison, the 'Price' attention attends more to the word 'cheap', which is related to that aspect. That is to say, the two attention vectors are capable of distinguishing information related to different aspects. As visualized in Figure 5, when there are multiple target instances (or words), the target-level attention is capable of selecting the first target instance, which is tied to the clearer sentiment.

Fig. 5: Example of target-level attention

4.6 The Result of Knowledge-embedded LSTM

It can be seen from Tables 4 and 5 that injecting the knowledge into the model improves the performance in general. Since the affective space used in our experiments contains affective properties that are semantically related to the aspects and sentiment polarity, it is reasonable to find that it improves performance on both tasks. The results also show that our proposed Sentic LSTM outperforms the baseline knowledge-rich methods, even though not very significantly. Comparing Sentic LSTM with KB Feat, which uses extracted concepts as features, we find that Sentic LSTM improves more on aspect categorization, indicating the advantage of using a knowledge output gate to choose between commonsense knowledge and inner memory. The superior performance of Sentic LSTM over Recall-LSTM and KBA indicates that the triggered knowledge concepts can also help to filter out information that conflicts with the commonsense knowledge.

As compared to the standard Sentic-LSTM, the hybrid of Sentic-LSTM and the recurrent additive network (i.e., H-Sentic-LSTM) achieves comparable performance with a smaller number of parameters. We notice

Table 6: Comparison of systems using AffectiveSpace and SentiWordNet (SH stands for SentiHood, SE for SemEval-15)

                   Aspect Categorization                                             Sentiment
                   Strict Acc. (%)    Macro F1 (%)    Micro F1 (%)                   Sentiment Acc. (%)
                   SH      SE         SH      SE      SH      SE                     SH      SE
  AffectiveSpace   67.71   70.47      78.05   77.90   78.19   75.12                  89.63   78.31
  SentiWordNet     60.23   66.42      70.03   73.60   69.48   71.18                  85.01   75.79

that using the recurrent additive operation on the concept-level inputs improves the performance of aspect categorization on the SemEval dataset, indicating that the additive operation helps accumulate evidence from a broader context; moreover, the generality of the model may benefit from the reduced number of parameters, given that the training data is much smaller than for SentiHood.

On the other hand, the performance of H-Sentic-LSTM in resolving the polarity of aspect-target pairs is slightly below that of Sentic-LSTM. Since the additive operation can only introduce a non-negative contribution from the previous step, we conjecture the decrease in performance might result from the additive operation not being sufficient to simulate complicated sentiment patterns such as negation.

4.7 AffectiveSpace versus SentiWordNet

Finally, we compare systems using different sentiment knowledge bases. SentiWordNet is a lexicon-based sentiment knowledge base consisting of word-sense synsets annotated with sentiment polarity. However, it is notable that SentiWordNet contains neither commonsense concepts nor affective properties, which are the key features of AffectiveSpace. Consequently, we have to use randomly initialized embeddings to represent the SentiWordNet synsets. Word synsets are mapped to the same 100-dimensional embedding space as AffectiveSpace. Each word in the sentence is mapped to its word sense with the help of a word sense disambiguation tool. We deliberately remove the neutral synsets (i.e., those having zero values for both Positive and Negative) to emphasize the sentiment-bearing words. Table 6 shows the comparison of the two knowledge bases. We report the results using our overall best-performing system (Sentic LSTM + TA + SA). It shows that using AffectiveSpace achieves superior performance to using SentiWordNet. We conjecture the reason is that the word-sense synsets are not as informative as the affective properties. Moreover, probably because the link between word senses and aspects is not straightforward, we find the gap in sentiment classification is smaller than in aspect categorization.

5 Conclusion

In this paper, we propose a neural architecture together with two extensions of the standard LSTM for the task of targeted aspect-based sentiment analysis. We explicitly model the process of inferring the sentiment aspects and polarity as a two-step attention model that encodes the target and the full sentence in sequence. The target-level attention attends to the sentiment-salient part of a target expression and generates a more accurate representation of the target, while the aspect- and target-dependent sentence-level attention searches for the target- and aspect-dependent evidence over the full sentence. Moreover, we validate the efficacy of the proposed extensions of the LSTM cell as well as the benefit of using affective commonsense knowledge for the task of targeted aspect-based sentiment analysis. In the future, we would like to take into account the relations between concepts when performing the task.

Compliance with Ethical Standards

Conflict of Interest: The authors declare that they have no conflict of interest.

Informed Consent: Informed consent was not required as no human or animals were involved.

Human and Animal Rights: This article does not contain any studies with human or animal subjects performed by any of the authors.

References

1. Erik Cambria, Dipankar Das, Sivaji Bandyopadhyay, and Antonio Feraco. A Practical Guide to Sentiment Analysis. Springer, Cham, Switzerland, 2017.
2. Soujanya Poria, Erik Cambria, Rajiv Bajpai, and Amir Hussain. A review of affective computing: From unimodal analysis to multimodal fusion. Information Fusion, 37:98–125, 2017.
3. Soujanya Poria, Erik Cambria, Devamanyu Hazarika, Navonil Mazumder, Amir Zadeh, and Louis-Philippe Morency. Context-dependent sentiment analysis in user-generated videos. In ACL, pages 873–883, 2017.

4. Iti Chaturvedi, Edoardo Ragusa, Paolo Gastaldo, Rodolfo Zunino, and Erik Cambria. Bayesian network based extreme learning machine for subjectivity detection. Journal of The Franklin Institute, 2017.
5. Sanjiv R Das and Mike Y Chen. Yahoo! for Amazon: Sentiment extraction from small talk on the web. Management Science, 53(9):1375–1388, 2007.
6. Satoshi Morinaga, Kenji Yamanishi, Kenji Tateishi, and Toshikazu Fukushima. Mining product reputations on the web. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 341–349, New York, NY, USA, 2002. ACM.
7. Maria Pontiki, Dimitris Galanis, John Pavlopoulos, Harris Papageorgiou, Ion Androutsopoulos, and Suresh Manandhar. SemEval-2014 task 4: Aspect based sentiment analysis. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 27–35, Dublin, Ireland, August 2014. Association for Computational Linguistics and Dublin City University.
8. Maria Pontiki, Dimitris Galanis, Haris Papageorgiou, Ion Androutsopoulos, Suresh Manandhar, Mohammad AL-Smadi, Mahmoud Al-Ayyoub, Yanyan Zhao, Bing Qin, Orphee De Clercq, Veronique Hoste, Marianna Apidianaki, Xavier Tannier, Natalia Loukachevitch, Evgeniy Kotelnikov, Núria Bel, Salud María Jiménez-Zafra, and Gülşen Eryiğit. SemEval-2016 task 5: Aspect based sentiment analysis. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pages 19–30, San Diego, California, June 2016. Association for Computational Linguistics.
9. Soujanya Poria, Erik Cambria, and Alexander Gelbukh. Aspect extraction for opinion mining with a deep convolutional neural network. Knowledge-Based Systems, 108:42–49, 2016.
10. Yunqing Xia, Erik Cambria, and Amir Hussain. AspNet: Aspect extraction by bootstrapping generalization and propagation using an aspect network. Cognitive Computation, 7(2):241–253, 2015.
11. Soujanya Poria, Iti Chaturvedi, Erik Cambria, and Federica Bisio. Sentic LDA: Improving on LDA with semantic similarity for aspect-based sentiment analysis. In IJCNN, pages 4465–4473, 2016.
12. Duyu Tang, Bing Qin, Xiaocheng Feng, and Ting Liu. Effective LSTMs for target-dependent sentiment classification. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 3298–3307, Osaka, Japan, December 2016. The COLING 2016 Organizing Committee.
13. Li Dong, Furu Wei, Chuanqi Tan, Duyu Tang, Ming Zhou, and Ke Xu. Adaptive recursive neural network for target-dependent Twitter sentiment classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 49–54, Baltimore, Maryland, June 2014. Association for Computational Linguistics.
14. Bo Wang, Maria Liakata, Arkaitz Zubiaga, and Rob Procter. TDParse: Multi-target-specific sentiment recognition on Twitter. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 483–493, Valencia, Spain, April 2017. Association for Computational Linguistics.
15. Marzieh Saeidi, Guillaume Bouchard, Maria Liakata, and Sebastian Riedel. SentiHood: Targeted aspect based sentiment analysis dataset for urban neighbourhoods. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 1546–1556, Osaka, Japan, December 2016. The COLING 2016 Organizing Committee.
16. Thien Hai Nguyen and Kiyoaki Shirai. PhraseRNN: Phrase recursive neural network for aspect-based sentiment analysis. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2509–2514, Lisbon, Portugal, September 2015. Association for Computational Linguistics.
17. Yequan Wang, Minlie Huang, Xiaoyan Zhu, and Li Zhao. Attention-based LSTM for aspect-level sentiment classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 606–615, Austin, Texas, November 2016. Association for Computational Linguistics.
18. Duyu Tang, Bing Qin, and Ting Liu. Aspect level sentiment classification with deep memory network. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 214–224, Austin, Texas, November 2016. Association for Computational Linguistics.
19. Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
20. Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In LREC, volume 10, pages 2200–2204, Valletta, Malta, 2010. European Language Resources Association (ELRA).
21. Erik Cambria, Soujanya Poria, Rajiv Bajpai, and Bjoern Schuller. SenticNet 4: A semantic resource for sentiment analysis based on conceptual primitives. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 2666–2677, Osaka, Japan, December 2016. The COLING 2016 Organizing Committee.
22. Lev Ratinov and Dan Roth. Design challenges and misconceptions in named entity recognition. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning, pages 147–155. Association for Computational Linguistics, 2009.
23. Yukun Ma, Jung-jae Kim, Benjamin Bigot, and Tahir Muhammad Khan. Feature-enriched word embeddings for named entity recognition in open-domain conversations. In Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on, pages 6055–6059. IEEE, 2016.
24. Zhen Xu, Bingquan Liu, Baoxun Wang, Chengjie Sun, and Xiaolong Wang. Incorporating loose-structured knowledge into LSTM with recall gate for conversation modeling. arXiv preprint arXiv:1605.05110, 2016.
25. Yang Li, Quan Pan, Tao Yang, Suhang Wang, Jiliang Tang, and Erik Cambria. Learning word representations for sentiment analysis. Cognitive Computation, pages 1–9, 2017.
26. Nir Ofek, Soujanya Poria, Lior Rokach, Erik Cambria, Amir Hussain, and Asaf Shabtai. Unsupervised commonsense knowledge enrichment for domain-specific sentiment analysis. Cognitive Computation, 8(3):467–477, 2016.
27. Bishan Yang and Tom Mitchell. Leveraging knowledge bases in LSTMs for improving machine reading. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1436–1446, Vancouver, Canada, July 2017. Association for Computational Linguistics.
28. Erik Cambria and Amir Hussain. Sentic album: Content-, concept-, and context-based online personal photo management system. Cognitive Computation, 4(4):477–496, 2012.
29. Qiu-Feng Wang, Erik Cambria, Cheng-Lin Liu, and Amir Hussain. Common sense knowledge for handwritten Chinese text recognition. Cognitive Computation, 5(2):234–242, 2013.
30. Erik Cambria, Jie Fu, Federica Bisio, and Soujanya Poria. AffectiveSpace 2: Enabling affective intuition for concept-level sentiment analysis. In AAAI, pages 508–514, 2015.
31. Joachim Wagner, Piyush Arora, Santiago Cortes, Utsab Barman, Dasha Bogdanova, Jennifer Foster, and Lamia Tounsi. DCU: Aspect-based polarity classification for SemEval task 4. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 223–229, Dublin, Ireland, August 2014. Association for Computational Linguistics and Dublin City University.
32. Svetlana Kiritchenko, Xiaodan Zhu, Colin Cherry, and Saif Mohammad. NRC-Canada-2014: Detecting aspects and sentiment in customer reviews. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 437–442, Dublin, Ireland, August 2014. Association for Computational Linguistics and Dublin City University.
33. Himabindu Lakkaraju, Richard Socher, and Chris Manning. Aspect specific sentiment analysis using hierarchical deep learning. In NIPS Workshop on Deep Learning and Representation Learning. Curran Associates, Inc., 2014.
34. Peng Chen, Zhongqian Sun, Lidong Bing, and Wei Yang. Recurrent attention network on memory for aspect sentiment analysis. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 463–472, Copenhagen, Denmark, September 2017. Association for Computational Linguistics.
35. Altaf Rahman and Vincent Ng. Coreference resolution with world knowledge. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, pages 814–824. Association for Computational Linguistics, 2011.
36. Ndapandula Nakashole and Tom M Mitchell. A knowledge-intensive model for prepositional phrase attachment. In ACL (1), pages 365–375, 2015.
37. Sungjin Ahn, Heeyoul Choi, Tanel Pärnamaa, and Yoshua Bengio. A neural knowledge language model. arXiv preprint arXiv:1608.00318, 2016.
38. Mike Schuster and Kuldip K Paliwal. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11):2673–2681, 1997.
39. Luca Oneto, Federica Bisio, Erik Cambria, and Davide Anguita. Semi-supervised learning for affective commonsense reasoning. Cognitive Computation, 9(1):18–42, 2017.
40. Kenton Lee, Omer Levy, and Luke Zettlemoyer. Recurrent additive networks. arXiv preprint arXiv:1705.07393, 2017.
41. Soujanya Poria, Erik Cambria, Gregoire Winterstein, and Guang-Bin Huang. Sentic patterns: Dependency-based rules for concept-level sentiment analysis. Knowledge-Based Systems, 69:45–63, 2014.
42. Maria Pontiki, Dimitris Galanis, Haris Papageorgiou, Suresh Manandhar, and Ion Androutsopoulos. SemEval-2015 task 12: Aspect based sentiment analysis. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 486–495, Denver, Colorado, June 2015. Association for Computational Linguistics.
43. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119, 2013.
44. Ruining He and Julian McAuley. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proceedings of the 25th International Conference on World Wide Web, pages 507–517. International World Wide Web Conferences Steering Committee, 2016.