2 Related Work

In this section, we survey several related research lines: ABSA, TSA, TABSA, and finally works on incorporating external knowledge into deep neural models.

2.1 Aspect-based Sentiment Analysis

ABSA is the task of classifying sentiment polarity with respect to a set of aspects. The biggest challenge faced by ABSA is how to effectively represent the aspect-specific sentiment information of the whole sentence. Early works on ABSA mainly relied on feature engineering to characterize the sentences [31, 32]. Motivated by the success of deep learning methods in representation learning, many recent works [13, 16, 17, 33] utilize deep neural networks, such as LSTMs, to generate sentence embeddings (dense vector representations of sentences), which are then fed to a classifier as a low-dimensional feature vector. Moreover, the representation can be enhanced by an attention mechanism [17] that takes as input the word sequence and the aspects. For each word of the sentence, the attention vector quantifies its sentiment salience as well as its relevance to the given aspect. The resulting sentiment representation benefits from the attention mechanism because it overcomes a shortcoming of recurrent neural networks (RNNs), which suffer from information loss when only a single output (e.g., the output at the end of the sequence) is used by the classifier.

2.2 Targeted Sentiment Analysis

In comparison with ABSA, targeted sentiment analysis aims to analyze the sentiment with regard to targeted entities in the sentence. It is thus critical for targeted sentiment analysis methods, e.g., the target-dependent LSTM (TD-LSTM) and the target connection LSTM (TC-LSTM) [12], to model the interaction between the sentiment targets and the whole sentence. In order to obtain the target-dependent sentence representation, TD-LSTM directly uses the hidden outputs of a bidirectional LSTM sentence encoder spanning the target mentions, while TC-LSTM extends TD-LSTM by concatenating each input word vector with a target vector. As in ABSA, attention models are also applicable to targeted sentiment analysis. Rather than using a single level of attention, the deep memory network [18] and the recurrent attention model [34] achieve superior performance by learning a deep attention over the single-level attention, since multiple passes (or hops) over the input sequence can refine the attended words again and again to find the most important ones. These existing approaches have either ignored the problem of multiple target instances (or words) or simply used an averaging vector over the target expression [14, 18]. Our method differs from existing methods by weighting each target word with an attention weight, so that the given target tends to be represented by its most informative part.

2.3 Targeted Aspect-based Sentiment Analysis

Two baseline systems [15] were proposed together with the SentiHood dataset: a feature-based logistic regression model and an LSTM-based model. The feature-based logistic regression model uses feature templates including n-gram tokens and POS tags extracted from the context of the target instances. The LSTM baseline can be seen as an adaptation of TD-LSTM [12] that simply uses the hidden outputs at the positions of the target instances, assuming that all target instances are equally important.

2.4 Incorporating External Knowledge

Existing studies on incorporating external knowledge into deep neural networks are also closely related to this work. External knowledge bases have typically been used as a source of features [22, 35, 36]. More recently, neural sequential models [27, 37] leverage low-dimensional continuous representations of knowledge concepts as additional inputs. However, these approaches treat the computation of neural sequential models as a black box, without tight integration of the knowledge and the computational structure. Our Sentic LSTM is inspired by [24], which adds a knowledge recall gate to the cell state of the LSTM. However, our method differs from [24] in the way it uses external knowledge to generate the hidden outputs and to control the information flow.

Fig. 2: Overview of the two-step attentive neural architecture
Based on the attention mechanism, we calculate an attention vector for a target expression. A target might consist of a consecutive or non-consecutive sequence of words, denoted as T = {t_1, t_2, ..., t_m}, where t_i is the location of an individual word in the target expression. The hidden outputs corresponding to T are denoted as H′ = {h_{t_1}, h_{t_2}, ..., h_{t_m}}. We compute the vector representation of a target t as

v_t = H′ α = Σ_j α_j h_{t_j}          (2)

where the target attention vector α = {α_1, α_2, ..., α_m} is distributed over the target word sequence T. The attention vector α is a self-attention vector that takes nothing but the hidden outputs themselves as input. The attention vector α of a target expression is computed by feeding the hidden outputs into a bi-layer perceptron, as shown in Equation 3:

α = softmax(W_a^(2) tanh(W_a^(1) H′))          (3)

where W_a^(1) ∈ R^{d_m × d_h} and W_a^(2) ∈ R^{1 × d_m} are the parameters of the attention component.
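To make the target-level self-attention concrete, the sketch below implements Equations 2 and 3 in plain Python/NumPy. The paper does not provide an implementation; the function and variable names (target_attention, H_t, W1, W2) are illustrative, and the hidden outputs are assumed to be stacked column-wise so that H′ is a d_h × m matrix.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def target_attention(H_t, W1, W2):
    """Self-attention over the hidden outputs of the target words (Eqs. 2-3).
    H_t: (d_h, m) matrix whose columns are h_{t_1}, ..., h_{t_m}
    W1:  (d_m, d_h) first layer of the bi-layer perceptron
    W2:  (1, d_m)   second layer producing one score per target word
    """
    scores = W2 @ np.tanh(W1 @ H_t)   # (1, m) unnormalized scores
    alpha = softmax(scores.ravel())   # Eq. 3: attention over the m target words
    v_t = H_t @ alpha                 # Eq. 2: attended target representation
    return v_t, alpha

# Toy usage with random values (d_h = 4, d_m = 3, m = 2 target words).
rng = np.random.default_rng(0)
v_t, alpha = target_attention(rng.standard_normal((4, 2)),
                              rng.standard_normal((3, 4)),
                              rng.standard_normal((1, 3)))
```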
3.6 Sentic LSTM

In this paper, we consider using affective commonsense knowledge as the knowledge source to be embedded into the sequence model. Affective commonsense knowledge such as AffectNet [39] contains concepts associated with a rich set of affective properties (as shown in Table 2). These affective properties provide not only concept-level features but also semantic links to the aspects and their sentiment polarity. For example, the concept ‘rotten fish’ has the property “KindOf-food”, which can be directly related to aspects such as ‘restaurant’ or ‘food quality’, while properties such as ‘Arises-joy’ could contribute positively to the classification of sentiment polarity.

Table 2: Affective properties of example concepts in AffectNet
AffectNet       IsA-pet   KindOf-food   Arises-joy   ...
dog             0.981     0             0.789        ...
cupcake         0         0.922         0.910        ...
rotten fish     0         0.459         0            ...
police man      0         0             0            ...
win lottery     0         0             0.991        ...

Sentic LSTM. It is reasonable to assume that the knowledge concepts contain information complementary to the textual word sequence, especially when the knowledge base in use is designed to include abstract concepts. Our Sentic LSTM aims to entitle the concepts with two important roles: 1) assisting with the filtering of information flowing from one time step to the next, and 2) providing complementary information to the memory cell. At each time step i, we assume that a set of knowledge concept candidates can be triggered and mapped to a d_c-dimensional space. We denote the set of K concepts as {µ_{i,1}, µ_{i,2}, ..., µ_{i,K}}. First, we combine the candidate embeddings into a single vector as in Equation 6:

µ_i = (1/K) Σ_j µ_{i,j}          (6)

In this paper, as we find that there are only up to four extracted concepts at each time step, we simply use the average vector, although a more sophisticated attention model could easily be employed to replace the averaging function.
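As a small illustration of Equation 6, together with the zero-vector fallback described later in Section 4, the snippet below averages the AffectiveSpace embeddings of the concepts triggered at one time step. The 100-dimensional embedding size follows Section 4; the function name and the dictionary-based lookup are illustrative choices, not the authors' code.

```python
import numpy as np

def concept_input(concepts, concept_space, dim=100):
    """mu_i of Eq. 6: average the embeddings of the triggered concepts;
    a zero vector is used when no concept is extracted at this time step."""
    vectors = [concept_space[c] for c in concepts if c in concept_space]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)

# Example: 'rotten fish' is the only concept triggered at the current token.
concept_space = {"rotten fish": np.random.randn(100)}  # stand-in for AffectiveSpace
mu_i = concept_input(["rotten fish"], concept_space)
```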
f_i = σ(W_f [x_i, h_{i−1}, µ_i] + b_f)
I_i = σ(W_I [x_i, h_{i−1}, µ_i] + b_I)
C̃_i = tanh(W_C [x_i, h_{i−1}] + b_C)
C_i = f_i ∗ C_{i−1} + I_i ∗ C̃_i          (7)
o_i = σ(W_o [x_i, h_{i−1}, µ_i] + b_o)
o^c_i = σ(W_co [x_i, h_{i−1}, µ_i] + b_co)
h_i = o_i ∗ tanh(C_i) + o^c_i ∗ tanh(W_c µ_i)

Our extension of the LSTM is illustrated in Equation 7. First, we assume that the affective concepts are meaningful cues for controlling the flow of token-level information. For example, a multi-word concept ‘rotten fish’ might indicate that the word ‘rotten’ is a sentiment-related modifier of its next word ‘fish’, and hence less information should be filtered out at the next time step. We thus add the knowledge concepts to the forget, input, and output gates of the standard LSTM to help filter the information. The presence of affective concepts in the input gate is expected to prevent the memory cell from being affected by input tokens that conflict with the pre-existing knowledge. Similarly, the output gate uses the knowledge to filter out irrelevant information stored in the memory. Another important feature of our extension of the LSTM is the assumption that the information from the concept-level output is complementary to the token level. Therefore, we extend the regular LSTM with an additional knowledge output gate o^c_i to output concept-level knowledge complementary to the token-level memory. Since AffectiveSpace is learned independently, we leverage a transformation matrix W_c ∈ R^{d_h × d_µ} to map AffectiveSpace into the same space as the memory outputs. In other words, o^c_i models the relative contribution of the token level and the concept level. Moreover, we notice that o^c_i ∗ tanh(W_c µ_i) actually resembles the functionality of the sentinel vector used by [27], which allows the model to choose whether or not to use the affective knowledge.
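The following NumPy sketch spells out one step of the cell in Equation 7. It is a reading of the equations rather than the authors' implementation: the weights are stored in a plain dictionary p, and the variable names simply mirror the symbols above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sentic_lstm_step(x_i, h_prev, C_prev, mu_i, p):
    """One step of the Sentic LSTM cell (Eq. 7)."""
    z  = np.concatenate([x_i, h_prev, mu_i])    # [x_i, h_{i-1}, mu_i]
    zc = np.concatenate([x_i, h_prev])          # candidate uses token-level input only
    f  = sigmoid(p["W_f"] @ z + p["b_f"])       # knowledge-aware forget gate
    I  = sigmoid(p["W_I"] @ z + p["b_I"])       # knowledge-aware input gate
    C_tilde = np.tanh(p["W_C"] @ zc + p["b_C"]) # candidate cell state
    C  = f * C_prev + I * C_tilde               # new memory cell
    o  = sigmoid(p["W_o"] @ z + p["b_o"])       # token-level output gate
    oc = sigmoid(p["W_co"] @ z + p["b_co"])     # knowledge output gate o^c_i
    h  = o * np.tanh(C) + oc * np.tanh(p["W_c"] @ mu_i)  # concept-augmented output
    return h, C
```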
We thus add knowledge concepts to the forget, input, Ciµ = fi ∗ (Ci−1 − Ciµ ) + Ii ∗ C
ei + fi ∗ C µ + oci ∗ Mi
i
and output gate of standard LSTM to help filtering i
X
the information. The presence of affective concepts in ei + fi ∗ C µ + oc ∗ M =
Ii ∗ C wji ∗ Mj
i i
the input gate is expected to prevent the memory cell j=0
from affected by input tokens conflicted with the pre- (9)
existed knowledge. Similarly, the output gate uses the
knowledge to filter out irrelevant information stored wji is a product of the input and forget gates. We could
in the memory. Another important feature of our ex- see that the hidden output at time step i is a hybrid of
tension of the LSTM is based on the assumption that a simplified LSTM, whose gates are coupled with both
the information from the concept-level output is com- word-level and concept-level input, and a recurrent ad-
plementary to the token level. Therefore, we extended ditive component accumulating the information from
the regular LSTM with an additional knowledge output the concept-level input over previous time steps.
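Under the same conventions as the previous sketch, the hybrid cell of Equation 8 can be written as below: the projected concept vector M_i is added directly to the cell state and the hidden output is the cell state itself, so the unrolled concept contribution of Equation 9 falls out of this recurrence. Again, names and the parameter dictionary are illustrative, not the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hybrid_sentic_step(x_i, h_prev, C_prev, mu_i, p):
    """One step of the hybrid Sentic-LSTM (Eq. 8): additive concept path, h_i = C_i."""
    z = np.concatenate([x_i, h_prev, mu_i])
    M = p["W_c"] @ mu_i                                 # projected concept input M_i
    f = sigmoid(p["W_f"] @ z + p["b_f"])                # forget gate
    I = sigmoid(p["W_I"] @ z + p["b_I"])                # input gate
    C_tilde = np.tanh(p["W_C"] @ np.concatenate([x_i, h_prev]) + p["b_C"])
    oc = sigmoid(p["W_co"] @ z + p["b_co"])             # concept gate o^c_i
    C = f * C_prev + I * C_tilde + oc * M               # concepts accumulate additively
    return C, C                                         # h_i = C_i
```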
The objective for training our classifier is defined as minimizing the sum of the cross-entropy losses of the predictions on each target-aspect pair, i.e.,

L_s = −(1/|D|) Σ_{s∈D} Σ_{t∈s} Σ_{a∈A} log p^a_{c,t}

where A is the set of predefined aspects, and p^a_{c,t} is the probability of the ground-truth polarity class c given target t with respect to a sentiment category a. This probability is defined by a softmax function, where W^p and b^a_s are the parameters that map the vector representation of target t to the polarity label of aspect a. To avoid overfitting, we add a dropout layer with a dropout probability of 0.5 after the embedding layer. We stop the training process after 10 epochs and select the model that achieves the best performance on the development set.
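As a rough illustration of this objective (the softmax layer itself is not reproduced here), the snippet below computes the summed cross-entropy over the aspects of one target, assuming the 3-class setting described in Section 4. The dictionary-based interface, the function name, and the example aspect names are illustrative only.

```python
import numpy as np

CLASSES = ["Positive", "Negative", "None"]  # 3-class setting of Section 4

def target_loss(logits_per_aspect, gold_per_aspect):
    """Sum of cross-entropy losses over the target-aspect pairs of one target.
    logits_per_aspect: aspect -> array of 3 class scores
    gold_per_aspect:   aspect -> index of the ground-truth class
    """
    loss = 0.0
    for aspect, scores in logits_per_aspect.items():
        e = np.exp(scores - np.max(scores))
        probs = e / e.sum()                      # softmax over the 3 classes
        loss -= np.log(probs[gold_per_aspect[aspect]])
    return loss

# Example: one target annotated with two aspects (illustrative aspect names).
loss = target_loss(
    {"price": np.array([2.0, 0.1, -1.0]), "safety": np.array([-0.5, 1.5, 0.2])},
    {"price": CLASSES.index("Positive"), "safety": CLASSES.index("Negative")},
)
```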
4 Experiment

4.1 Dataset and Resources

We evaluate our method on two datasets: the SentiHood dataset [15] and a subset of SemEval-2015 [42]. The SentiHood dataset was built by querying Yahoo! Answers with location names of London city. Table 3 shows the statistics of the SentiHood dataset. The whole dataset is split into training, test, and development sets by the authors. Overall, the entire dataset contains 5,215 sentences, with 3,862 sentences containing a single target and 1,353 sentences containing multiple targets. It also shows that approximately two thirds of the targets are annotated with aspect-based sentiment polarity (train set: 2,476 out of 2,977; test set: 1,241 out of 1,898; development set: 619 out of 955). On average, each sentiment-bearing target has been annotated with 1.37 aspects. To show the generalizability of our methods, we build a subset of the dataset used by SemEval-2015. We remove sentences containing no targets as well as NULL targets. To be comparable with SentiHood, we combine targets with the same surface form within the same sentence as mentions of the same target. In total, we have 1,197 targets left in the training set and 542 targets left in the testing set. On average, each target has 1.06 aspects.

Table 3: Statistics of the SentiHood dataset
                                      Train   Dev    Test
Targets                               3,806   955    1,898
Targets w/ Sentiment                  2,476   619    1,241
Aspects per Target (w/ Sentiment)     1.37    1.37   1.35

We extract concept candidates at each time step, and use pre-trained 100-dimensional AffectiveSpace embeddings as the concept embeddings. In case no concepts are extracted, a zero vector is used as the concept input.

We evaluate our method on two sub-tasks of targeted aspect-based sentiment analysis: 1) aspect categorization and 2) aspect-based sentiment classification. Following Saeidi et al. [15], we treat the outputs of aspect-based classification as hierarchical classes. For aspect categorization, we output the label with the highest probability for each aspect. The labels are, for example in the 3-class setting, ‘Positive’, ‘Negative’, and ‘None’, where ‘None’ means the aspect should not be bound to the given target. For aspect-based sentiment classification, we only look at the probabilities of ‘Positive’ and ‘Negative’, while ignoring the score of ‘None’. For evaluating aspect-based sentiment classification, we calculate the accuracy averaged over aspects. We evaluate aspect categorization as a multi-label classification problem, and the results, therefore, are averaged over targets instead of aspects. We evaluate our methods and the baseline systems using both loose and strict metrics. We report scores of three widely used evaluation metrics for multi-label classifiers: Macro-F1, Micro-F1, and strict accuracy (Strict Acc.).

Given the dataset D, the ground-truth aspect categories of a target t ∈ D are denoted as Y_t, while the predicted aspect categories are denoted as Ŷ_t. The three metrics can be computed as:

– Strict accuracy (Strict Acc.): (1/|D|) Σ_{t∈D} σ(Y_t = Ŷ_t), where σ(·) is an indicator function.
– Macro-F1 = 2 (Ma-P × Ma-R)/(Ma-P + Ma-R), which is based on Macro-Precision (Ma-P) and Macro-Recall (Ma-R), with Ma-P = (1/|D|) Σ_{t∈D} |Y_t ∩ Ŷ_t| / |Ŷ_t| and Ma-R = (1/|D|) Σ_{t∈D} |Y_t ∩ Ŷ_t| / |Y_t|.
– Micro-F1 = 2 (Mi-P × Mi-R)/(Mi-P + Mi-R), which is based on Micro-Precision (Mi-P) and Micro-Recall (Mi-R), where Mi-P = Σ_{t∈D} |Y_t ∩ Ŷ_t| / Σ_{t∈D} |Ŷ_t| and Mi-R = Σ_{t∈D} |Y_t ∩ Ŷ_t| / Σ_{t∈D} |Y_t|.
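A small self-contained sketch of the three metrics, written directly from the definitions above: because the extracted formulas are partly garbled, the precision denominators use the predicted sets and the recall denominators use the gold sets, which is the standard example-based reading. Function and variable names are illustrative, not from the paper.

```python
def evaluate_aspect_categorization(gold, pred):
    """Strict Acc., Macro-F1 and Micro-F1 for multi-label aspect categorization.
    gold, pred: dicts mapping each target id to a set of aspect categories."""
    n = len(gold)
    strict = sum(gold[t] == pred[t] for t in gold) / n

    # Macro: average per-target precision and recall, then take the F1.
    ma_p = sum(len(gold[t] & pred[t]) / len(pred[t]) if pred[t] else 0.0 for t in gold) / n
    ma_r = sum(len(gold[t] & pred[t]) / len(gold[t]) if gold[t] else 0.0 for t in gold) / n
    macro_f1 = 2 * ma_p * ma_r / (ma_p + ma_r) if ma_p + ma_r else 0.0

    # Micro: pool the counts over all targets before taking the ratios.
    tp = sum(len(gold[t] & pred[t]) for t in gold)
    n_pred = sum(len(pred[t]) for t in gold)
    n_gold = sum(len(gold[t]) for t in gold)
    mi_p = tp / n_pred if n_pred else 0.0
    mi_r = tp / n_gold if n_gold else 0.0
    micro_f1 = 2 * mi_p * mi_r / (mi_p + mi_r) if mi_p + mi_r else 0.0

    return strict, macro_f1, micro_f1

# Example with two targets (illustrative aspect names).
print(evaluate_aspect_categorization(
    {"t1": {"price", "safety"}, "t2": {"general"}},
    {"t1": {"price"}, "t2": {"general"}},
))
```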
and sentiment classification, indicating that the target- and aspect-dependent sentence attention can retrieve information relevant to both tasks.

To our surprise, using multiple hops in the sentence-level attention fails to produce any improvement. The performance even drops significantly on the SemEval-2015 dataset, which has a much smaller number of training instances but a larger aspect set than SentiHood. We conjecture that the reason is that using multiple hops increases the number of parameters to learn, making the model less applicable to small and sparse datasets such as SemEval-2015.

As shown in Figure ??, when there are multiple target instances (or words), the target-level attention is capable of selecting the first target instance, which is tied with the clearer sentiment.

4.6 The Result of Knowledge-embedded LSTM
Table 6: Comparison of systems using AffectiveSpace and SentiWordNet (SH stands for SentiHood, and SE stands for SemEval-15)
4. Iti Chaturvedi, Edoardo Ragusa, Paolo Gastaldo, Rodolfo Zunino, and Erik Cambria. Bayesian network based extreme learning machine for subjectivity detection. Journal of The Franklin Institute, 2017.
5. Sanjiv R Das and Mike Y Chen. Yahoo! for amazon: Sentiment extraction from small talk on the web. Management Science, 53(9):1375-1388, 2007.
6. Satoshi Morinaga, Kenji Yamanishi, Kenji Tateishi, and Toshikazu Fukushima. Mining product reputations on the web. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 341-349, New York, NY, USA, 2002. ACM.
7. Maria Pontiki, Dimitris Galanis, John Pavlopoulos, Harris Papageorgiou, Ion Androutsopoulos, and Suresh Manandhar. Semeval-2014 task 4: Aspect based sentiment analysis. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 27-35, Dublin, Ireland, August 2014. Association for Computational Linguistics and Dublin City University.
8. Maria Pontiki, Dimitris Galanis, Haris Papageorgiou, Ion Androutsopoulos, Suresh Manandhar, Mohammad AL-Smadi, Mahmoud Al-Ayyoub, Yanyan Zhao, Bing Qin, Orphee De Clercq, Veronique Hoste, Marianna Apidianaki, Xavier Tannier, Natalia Loukachevitch, Evgeniy Kotelnikov, Núria Bel, Salud María Jiménez-Zafra, and Gülşen Eryiğit. Semeval-2016 task 5: Aspect based sentiment analysis. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pages 19-30, San Diego, California, June 2016. Association for Computational Linguistics.
9. Soujanya Poria, Erik Cambria, and Alexander Gelbukh. Aspect extraction for opinion mining with a deep convolutional neural network. Knowledge-Based Systems, 108:42-49, 2016.
10. Yunqing Xia, Erik Cambria, and Amir Hussain. Aspnet: Aspect extraction by bootstrapping generalization and propagation using an aspect network. Cognitive Computation, 7(2):241-253, 2015.
11. Soujanya Poria, Iti Chaturvedi, Erik Cambria, and Federica Bisio. Sentic LDA: Improving on LDA with semantic similarity for aspect-based sentiment analysis. In IJCNN, pages 4465-4473, 2016.
12. Duyu Tang, Bing Qin, Xiaocheng Feng, and Ting Liu. Effective lstms for target-dependent sentiment classification. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 3298-3307, Osaka, Japan, December 2016. The COLING 2016 Organizing Committee.
13. Li Dong, Furu Wei, Chuanqi Tan, Duyu Tang, Ming Zhou, and Ke Xu. Adaptive recursive neural network for target-dependent twitter sentiment classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 49-54, Baltimore, Maryland, June 2014. Association for Computational Linguistics.
14. Bo Wang, Maria Liakata, Arkaitz Zubiaga, and Rob Procter. Tdparse: Multi-target-specific sentiment recognition on twitter. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 483-493, Valencia, Spain, April 2017. Association for Computational Linguistics.
15. Marzieh Saeidi, Guillaume Bouchard, Maria Liakata, and Sebastian Riedel. Sentihood: Targeted aspect based sentiment analysis dataset for urban neighbourhoods. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 1546-1556, Osaka, Japan, December 2016. The COLING 2016 Organizing Committee.
16. Thien Hai Nguyen and Kiyoaki Shirai. Phrasernn: Phrase recursive neural network for aspect-based sentiment analysis. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2509-2514, Lisbon, Portugal, September 2015. Association for Computational Linguistics.
17. Yequan Wang, Minlie Huang, Xiaoyan Zhu, and Li Zhao. Attention-based lstm for aspect-level sentiment classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 606-615, Austin, Texas, November 2016. Association for Computational Linguistics.
18. Duyu Tang, Bing Qin, and Ting Liu. Aspect level sentiment classification with deep memory network. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 214-224, Austin, Texas, November 2016. Association for Computational Linguistics.
19. Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735-1780, 1997.
20. Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. Sentiwordnet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In LREC, volume 10, pages 2200-2204, Valletta, Malta, 2010. European Language Resources Association (ELRA).
21. Erik Cambria, Soujanya Poria, Rajiv Bajpai, and Bjoern Schuller. Senticnet 4: A semantic resource for sentiment analysis based on conceptual primitives. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 2666-2677, Osaka, Japan, December 2016. The COLING 2016 Organizing Committee.
22. Lev Ratinov and Dan Roth. Design challenges and misconceptions in named entity recognition. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning, pages 147-155. Association for Computational Linguistics, 2009.
23. Yukun Ma, Jung-jae Kim, Benjamin Bigot, and Tahir Muhammad Khan. Feature-enriched word embeddings for named entity recognition in open-domain conversations. In Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on, pages 6055-6059. IEEE, 2016.
24. Zhen Xu, Bingquan Liu, Baoxun Wang, Chengjie Sun, and Xiaolong Wang. Incorporating loose-structured knowledge into lstm with recall gate for conversation modeling. arXiv preprint arXiv:1605.05110, 2016.
25. Yang Li, Quan Pan, Tao Yang, Suhang Wang, Jiliang Tang, and Erik Cambria. Learning word representations for sentiment analysis. Cognitive Computation, pages 1-9, 2017.
26. Nir Ofek, Soujanya Poria, Lior Rokach, Erik Cambria, Amir Hussain, and Asaf Shabtai. Unsupervised commonsense knowledge enrichment for domain-specific sentiment analysis. Cognitive Computation, 8(3):467-477, 2016.
27. Bishan Yang and Tom Mitchell. Leveraging knowledge bases in lstms for improving machine reading. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1436-1446, Vancouver, Canada, July 2017. Association for Computational Linguistics.
28. Erik Cambria and Amir Hussain. Sentic album: content-, concept-, and context-based online personal photo management system. Cognitive Computation, 4(4):477-496, 2012.
29. Qiu-Feng Wang, Erik Cambria, Cheng-Lin Liu, and Amir Hussain. Common sense knowledge for handwritten chinese text recognition. Cognitive Computation, 5(2):234-242, 2013.
30. Erik Cambria, Jie Fu, Federica Bisio, and Soujanya Poria. Affectivespace 2: Enabling affective intuition for concept-level sentiment analysis. In AAAI, pages 508-514, 2015.
31. Joachim Wagner, Piyush Arora, Santiago Cortes, Utsab Barman, Dasha Bogdanova, Jennifer Foster, and Lamia Tounsi. Dcu: Aspect-based polarity classification for semeval task 4. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 223-229, Dublin, Ireland, August 2014. Association for Computational Linguistics and Dublin City University.
32. Svetlana Kiritchenko, Xiaodan Zhu, Colin Cherry, and Saif Mohammad. Nrc-canada-2014: Detecting aspects and sentiment in customer reviews. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 437-442, Dublin, Ireland, August 2014. Association for Computational Linguistics and Dublin City University.
33. Himabindu Lakkaraju, Richard Socher, and Chris Manning. Aspect specific sentiment analysis using hierarchical deep learning. In NIPS Workshop on Deep Learning and Representation Learning. Curran Associates, Inc., 2014.
34. Peng Chen, Zhongqian Sun, Lidong Bing, and Wei Yang. Recurrent attention network on memory for aspect sentiment analysis. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 463-472, Copenhagen, Denmark, September 2017. Association for Computational Linguistics.
35. Altaf Rahman and Vincent Ng. Coreference resolution with world knowledge. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, pages 814-824. Association for Computational Linguistics, 2011.
36. Ndapandula Nakashole and Tom M Mitchell. A knowledge-intensive model for prepositional phrase attachment. In ACL (1), pages 365-375, 2015.
37. Sungjin Ahn, Heeyoul Choi, Tanel Pärnamaa, and Yoshua Bengio. A neural knowledge language model. arXiv preprint arXiv:1608.00318, 2016.
38. Mike Schuster and Kuldip K Paliwal. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11):2673-2681, 1997.
39. Luca Oneto, Federica Bisio, Erik Cambria, and Davide Anguita. Semi-supervised learning for affective common-sense reasoning. Cognitive Computation, 9(1):18-42, 2017.
40. Kenton Lee, Omer Levy, and Luke Zettlemoyer. Recurrent additive networks. arXiv preprint arXiv:1705.07393, 2017.
41. Soujanya Poria, Erik Cambria, Gregoire Winterstein, and Guang-Bin Huang. Sentic patterns: Dependency-based rules for concept-level sentiment analysis. Knowledge-Based Systems, 69:45-63, 2014.
42. Maria Pontiki, Dimitris Galanis, Haris Papageorgiou, Suresh Manandhar, and Ion Androutsopoulos. Semeval-2015 task 12: Aspect based sentiment analysis. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 486-495, Denver, Colorado, June 2015. Association for Computational Linguistics.
43. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111-3119, 2013.
44. Ruining He and Julian McAuley. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proceedings of the 25th International Conference on World Wide Web, pages 507-517. International World Wide Web Conferences Steering Committee, 2016.