
Attention-Emotion-Enhanced Convolutional LSTM for Sentiment Analysis

Faliang Huang, Xuelong Li, Fellow, IEEE, Changan Yuan, Shichao Zhang, Senior Member, IEEE, Jilian Zhang, and Shaojie Qiao

Abstract—Long short-term memory (LSTM) neural networks and attention mechanism have been widely used in sentiment representation learning and detection of texts. However, most of the existing deep learning models for text sentiment analysis ignore emotion's modulation effect on sentiment feature extraction, and the attention mechanisms of these deep neural network architectures are based on word- or sentence-level abstractions. Ignoring higher level abstractions may pose a negative effect on learning text sentiment features and further degrade sentiment classification performance. To address this issue, in this article, a novel model named AEC-LSTM is proposed for text sentiment detection, which aims to improve the LSTM network by integrating emotional intelligence (EI) and attention mechanism. Specifically, an emotion-enhanced LSTM, named ELSTM, is first devised by utilizing EI to improve the feature learning ability of LSTM networks, which accomplishes its emotion modulation of the learning system via the proposed emotion modulator and emotion estimator. In order to better capture various structure patterns in text sequence, ELSTM is further integrated with other operations, including convolution, pooling, and concatenation. Then, a topic-level attention mechanism is proposed to adaptively adjust the weight of text hidden representation. With the introduction of EI and attention mechanism, sentiment representation and classification can be more effectively achieved by utilizing sentiment semantic information hidden in text topic and context. Experiments on real-world data sets show that our approach can improve sentiment classification performance effectively and outperform state-of-the-art deep learning-based methods significantly.

Index Terms—Attention mechanism, long short-term memory (LSTM), representation learning, text sentiment analysis.

I. INTRODUCTION

WITH the rapid development of social networks, more and more users tend to share their opinions about social events, post discussions of political groups, and express their preferences over products and services on social media platforms. Sentiments conveyed in user-generated texts provide the most up-to-date and comprehensive information. Sentiment analysis aims to analyze people's opinions on entities, such as events, individuals, issues, services, products, and organizations. The proliferation of user-generated content (UGC) data makes sentiment analysis increasingly crucial and popular in many applications. For example, product review sentiment analysis can be used to help e-commerce platforms provide online advice and recommendations for their customers. Sociopolitical events, such as the Hong Kong riots and the London riots, vividly demonstrate the importance of sentiment analysis to public security. Sentiment analysis results of the conversation on Twitter before the American election can be used as an indicator of the election outcome. Sentiment analysis has been drawing tremendous attention from both industrial and academic fields for its wide applications [1]–[4].

According to text sentiment representation schemes, current mainstream machine learning methods for text sentiment analysis can be roughly organized into two categories: shallow-sentiment-based methods and deep-sentiment-based methods. Following the milestone work of Pang et al. [5], shallow-sentiment-based methods treat text sentiment analysis as a text classification problem and exploit supervised machine learning techniques to estimate the sentiment distribution of texts. Performance of the sentiment learners is heavily dependent on the choice of text representation models. However, shallow-sentiment-based representation models are, to some extent, incapable of effectively capturing the true sentiment distribution of texts and cannot obtain satisfactory sentiment analysis results. For instance, the bag-of-words (BOW) model may lead to data sparsity and false correlation problems, and n-grams may exacerbate the problem of the curse of dimensionality.

Manuscript received April 9, 2020; revised October 4, 2020 and December 29, 2020; accepted January 27, 2021. This work was supported in part by the Natural Science Foundation of China under Grant 61962038, Grant 61962006, Grant 61972177, Grant 61871470, Grant 61772091, and Grant 61802035; in part by the Natural Science Foundation of Guangxi under Grant 2018GXNSFDA138005; in part by the Guangxi Bagui Teams for Innovation and Research under Grant 201979; and in part by the CCF-Huawei Database System Innovation Research Plan under Grant CCF-HuaweiDBIR2020004A. (Corresponding authors: Shaojie Qiao; Changan Yuan.)

Faliang Huang is with the School of Computer and Information Engineering, Nanning Normal University, Nanning 530100, China, and also with the Guangxi College of Education, Nanning 530023, China.

Xuelong Li is with the School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi'an 710072, China.

Changan Yuan is with the Guangxi Academy of Sciences, Nanning 530003, China.

Shichao Zhang is with the Guangxi Key Lab of Multi-source Information Mining Security, Guangxi Normal University, Guilin 541004, China.

Jilian Zhang is with the College of Cyber Security, Jinan University, Guangzhou 510632, China.

Shaojie Qiao is with the School of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, China.

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TNNLS.2021.3056664.

Digital Object Identifier 10.1109/TNNLS.2021.3056664

Due to the powerful ability to extract discriminative features from various modal data, deep learning techniques have been widely utilized to learn distributed representations of texts and acquire different levels of abstraction of sentiment information. Extensive studies [6] have indicated that deep-sentiment-based methods have an overwhelming advantage over the traditional machine learning methods in terms of sentiment recognition accuracy.

Due to the recurrent computation mechanism and subtle avoidance of exploding and vanishing gradients, LSTM networks excel at learning embedding representations of text sequences and therefore have been widely used in text sentiment analysis [7]–[12]. Like other deep neural networks, LSTM networks attempt to imitate the information processing mechanism in biological nerve systems via different simple analogs for the structure and function of biological brains [13]–[18]. Studies in cognitive science [19], [20] show that the cognitive processing system of human brains is emotion-related. Specially, positive emotions, such as enthusiasm and happiness, have the potential to allow humans to develop more creative problem-solving strategies, whereas negative emotions, such as sadness and tension, may severely degrade decision performance on cognitive tasks. However, all of the existing LSTM networks do not take into consideration emotion's effect on sentiment feature extraction of text sequences. This may cause performance degradation of the LSTM networks for sentiment analysis.

It is worth emphasizing that sentiment and emotion are closely related but different concepts in psychology, although the two terminologies are used indiscriminately in most of the existing works in text sentiment analysis. Specially, a sentiment can be defined as a mental attitude, which is displayed through various expressions such as short text sequences and short videos in social media. Emotions can be defined as temporary psychological states that exert a great impact on an individual's behavior and can excite/restrain neurons when receiving advantageous/disadvantageous feedback, and can therefore improve the learning ability of the deep neural model LSTM. In this article, we identify and analyze sentiment hidden in social texts. In addition to text sentiment analysis, the proposed deep model ELSTM can also be extended to other learning tasks, and this is our future work. Thus, we maintain firm distinctions between sentiment and emotion in this article.

Sentiment patterns hidden in texts are closely related to the text topics, since humans tend to revolve around a certain topic while writing certain texts or posting a message [21], [22]. For instance, the adjective "complicated" may be negative in sentiment space when it occurs in a message discussing a movie character, whereas it conveys positive sentiment in a message commenting on a movie plot. Attention mechanism has been widely integrated into various deep neural models for boosting text sentiment analysis performance [24]–[32]. However, most of the existing models usually set aside text topic while analyzing text sentiment polarity. In particular, these models apply an attention mechanism to adaptively adjust weights of different low-level components such as words and phrases, rather than high-level abstractions such as topics of texts. This dogmatic separation between sentiment and text topic will undoubtedly degrade the performance of sentiment identification.

To this end, enlightened by emotion intelligence and cognitive theory from neuroscience, we propose a novel deep learning model AEC-LSTM, i.e., attention-emotion-enhanced convolutional LSTM, for text sentiment detection. Specifically, an emotion-enhanced LSTM named ELSTM is designed to make use of emotion intelligence for improving sentiment representation performance. ELSTM deals with emotion via an emotion-based modulator and an emotion estimator. The emotion estimator is to estimate the emotion state distribution of ELSTM based on the quality of ELSTM's output representation of the input subsequence. With the emotion signal from the emotion estimator, the emotion-based modulator generates a modulation signal to balance the tradeoff between information memorizing and information forgetting according to emotion-action mapping rules. Finally, the forget gate of ELSTM performs the modulation action. Also, a novel attention mechanism based on high-level abstraction, i.e., a topic-level attention mechanism, is proposed. With ELSTM and the topic-level attention mechanism, we further construct a convolutional neural network AEC-LSTM to capture sentiment patterns hidden in texts. For performance evaluation, we apply the AEC-LSTM model to four real-world social corpora and compare its performance against state-of-the-art deep learning-based sentiment analysis models CNN and LSTM. Experimental results show that the proposed AEC-LSTM can improve sentiment classification performance effectively and outperform its competitors significantly.

To summarize, the proposed AEC-LSTM has its unique advantages over the existing methods: 1) in contrast with the neglect of emotion modulation in existing artificial neural models, emotion's modulation effect is utilized in AEC-LSTM to improve the learning ability of neural units via simulating the human emotion mechanism; 2) compared to the extant attention mechanisms, nearly all of which are founded on low abstractions such as word level, phrase level, and so on, the topic-level attention mechanism in AEC-LSTM can simulate higher level attention in human brain recognition, leading to the acquisition of higher quality sentiment representations of text sequences.

Our contributions in this article mainly include the following.

1) Based on emotional intelligence (EI) theory and practice, a novel LSTM variant dubbed ELSTM is devised for the purpose of utilizing positive interaction between emotion and cognition to improve model inference. To the best of our knowledge, for the first time, the emotional mechanism in the human brain is introduced to improve LSTM representation learning ability.

2) A topic attention mechanism is proposed to effectively accomplish weight adjustment of text subsequence hidden representations.

3) We devise a sentiment analysis deep network model AEC-LSTM, which integrates the topic attention mechanism and convolutional ELSTM, to facilitate text sentiment detection.

4) Through an extensive set of experiments on the benchmark sentiment analysis data sets, we illustrate significant performance improvements achieved


by our method with respect to the conventional without extra pretraining step. Chen et al. [12] designed
methods. an attention-based LSTM network to learn bisense emoji
The rest of this article is organized as follows. We review embeddings for tweeter sentiment analysis.
related work in Section II. Section III describes the proposed Sentiment features computed by CNN networks are local
model ELSTM in detail. The AEC-LSTM network structure and position-independent, and sentiment features resulted from
is expatiated in Section IV. We present experiment results recursiveness of LSTM networks are global. In order to
on massive data sets and conduct detailed discussion in better fuse sentiment features generated by different type
Section V. Finally, we conclude this article in Section VI. networks, mixture models of LSTM and CNN have been
proposed. Wang et al. [27] proposed to combine CNN and
II. R ELATED W ORK RNN for sentiment analysis of short texts by utilizing CNN
to produce coarse-grained local features and applying RNN to
Our approach is closely related to text sentiment analysis,
capture long-distance dependencies. Wang et al. [28] presented
EI, and attention mechanism, and we review some of the most
a regional CNN-LSTM model consisting of regional CNN,
relevant work here.
which treats individual sentences but not the whole text as
input, and LSTM to predict valence-arousal ratings of texts.
A. Text Sentiment Analysis Chen et al. [29] applied 1-D CNNs to learn distributed
Numerous techniques for text sentiment analysis have representation of reviews and used a CNN with gated recur-
emerged recently, including supervised and unsupervised rent units to learn distributed representations of users and
methods. Supervised methods train sentiment classifiers, such products. Although those deep learning techniques effectively
as support vector machines (SVM), maximum entropy, and improve the performance of sentiment analysis, emotion’s
naive Bayes, with handcrafted features such as BOW and modulation effect on sentiment cognition is put aside while
n-gram. Unsupervised methods exploit sentiment lexicons, learning sentiment representation of texts, which may impair
grammatical analysis, and syntactic patterns to discover the the performance of sentiment analysis.
cluster structure of text sentiment space. Performance of
traditional sentiment analysis methods mainly depends on the
quality of the manual features extracted from the opinionated B. Emotional Intelligence
texts. Powerful ability of feature learning has enabled deep Formal theories of EI are put forth by Peter Salovey
neural networks to be flourishing in text sentiment analysis. and John Maye early in 1990 [30]. There is no universal
Represented by convolutional neural networks (CNN) and definition of EI; however, main models consisting of ability
LSTM networks, various types of deep learning models have model, mixed model, and trait model are commonly accepted.
sprung up. Ability model views emotions as useful sources of information
By restricting the receptive fields of the hidden layers to be that help one to make sense of and navigate the social
local, CNNs mainly consisting of convolutional and pooling environment. Mixed model holds that EI is a wide array of
layers seek to extract local features such as sentiment-rich competencies and skills that drive leadership performance.
word sequences. Chen et al. [23] utilized multimodal samples In the trait model, EI is conceptualized as a constellation
with weak sentiment labels to train CNNs for microblog senti- of emotional self-perceptions located at the lower levels of
ment prediction. Yang et al. [24] trained an emotion-semantic- personality [31].
enhanced CNN model to learn embedding representations of Studies in EI show that emotions play an important role in
emotions and words. Zhao et al. [25] designed a model called the human decision-making process. Khashman [32] proposed
GloVe-DCNN for tweet sentiment classification. Gui et al. [26] a novel emotional network DuoNeural to integrate emotion
learned representations of users, products, and review words and cognition at the structural level of intelligent systems.
by using heterogeneous network embedding technique and For better pattern recognition, Lotfi and Akbarzadeh [33]
employed CNN to detect product review sentiment polarity devised a limbic-based artificial emotional neural network
with the learned representations. In addition to CNN, various (LiAENN), which can achieve modeling emotional situations
LSTM networks have been proposed to extract sentiment such as anxiety and confidence in the learning process.
features from text sequences. Shi et al. [7] constructed a Yu et al. [34] proposed a double-layered emotional mul-
hierarchical LSTM model LSTM-MF to extract user-based tiagent reinforcement learning framework to endow agents
and content-based features for sentiment analysis in microblog with internal cognitive and emotional capabilities in order to
texts. Zhou et al. [8] incorporated a word2vec model and a force these agents to learn cooperative behaviors. Oyedotun
stacked bidirectional LSTM model for sentiment analysis in and Khashman [35] modified the emotional neural network
Chinese microblogs. Baziotis et al. [9] proposed a Siamese model to unify the prototype- and adaptive-learning theories.
Bidirectional LSTM with a context-aware attention mecha- Markadeh et al. [36] designed a brain-emotional learning-
nism for topic-based sentiment analysis tasks and employed a based intelligent controller for simultaneous speed and flux
two-layer bidirectional LSTM with an attention mechanism for control of a laboratory induction motor drive. It is worth point-
message-level sentiment analysis. Yang et al. [10] presented an ing out that the proposed Emo-LSTM, inspired by those work
attention-based bidirectional LSTM approach to improve the mentioned above, further strengthens the understanding that
target-dependent sentiment classification. Sachan et al. [11] emotions should be embedded within the reasoning process
devised a simple BiLSTM model and a training strategy of intelligent systems.


C. Attention Mechanism

Attention mechanisms in neural networks are essentially imitations of human visual mechanisms, which act as a high-level control system and can make cognitive processing more effective [37], [38]. Calculating attention in deep learning primarily consists of three steps. First, a similarity weight between a query and each key is computed using some given similarity metric. The second step is typically to use a softmax function to normalize these weights, and finally these weights are combined with the corresponding values to obtain the final attention. Recently, attention mechanisms have found broad application in text sentiment analysis tasks based on deep learning.

To capture the hierarchical structure feature of documents, Yang et al. [39] designed a hierarchical attention (HA) mechanism applied at the word and sentence levels, in which content with different importance was treated differently when constructing the document representation. Considering that most existing methods only focus on local text information and ignore the global user preference and product characteristics, Chen et al. [40] built an attention mechanism based on global user and product information for sentiment classification. Kokkinos and Potamianos [41] proposed a tree-structured bidirectional neural network with gated memory units as well as a structural attention (SA) model. Long et al. [42] created a novel cognition grounded attention model for sentiment analysis that is learned from cognition grounded eye-tracking data. Zhang et al. [43] proposed to integrate CNN with three different attentions, including attention vector, LSTM attention, and attentive pooling, for text sentiment analysis. Deng et al. [44] applied a sparse self-attention mechanism to capture the importance of each word to distinguish text sentiment polarities. Gan et al. [45] proposed a sparse attention-based separable dilated convolutional neural network, in which sentiment-oriented components are noticed according to the features of the specific target entity in the sparse attention layer. Yuan et al. [46] came up with a domain attention model for multidomain sentiment analysis. Specially, the domain representation is used as attention to select the most important domain-related features in each domain. Lei et al. [47] devised a sentiment-aware attention network (SAAN), which combines three attention mechanisms, i.e., word-level mutual attention, phrase-level convolutional attention, and sentence-level multi-head attention. Tang et al. [48] learned sentiment embedding instead of semantic embedding for word- and sentence-level sentiment analysis. Ren et al. [49] produced a context-sensitive neural network for sentiment classification. Observing that mismatching between the sentiment words and the aspects occurs when an unrelated sentiment word is semantically meaningful for the given aspect, Cheng et al. [50] proposed a hierarchical attention (HEAT) network consisting of aspect attention and sentiment attention. Liu et al. [51] presented two content attention mechanisms, i.e., a sentence-level content attention mechanism and a context attention mechanism.

Fig. 1. Classical LSTM.

Different from the above-mentioned attention mechanisms built on low abstractions, for instance, word level, phrase level, entity-attribute level, and sentence level, our proposed attention mechanism in this article is built on a higher level abstraction, i.e., the topic level. The topic-level attention mechanism may be more plausible due to the fact that text sentiment and text topic are usually interrelated or interdependent. It is worth noting that our model AEC-LSTM is similar to TopicRNN [52] in attempting to learn better semantic or sentiment representations of text sequences via latent topic modeling techniques, but there is a distinct difference between the two models. Specially, a topic vector is used as bias in TopicRNN to capture the global semantic dependence among words in a document and separate global semantic and local dynamic contributions of the words, whereas in AEC-LSTM, topic vectors are used as attention to represent the topic contribution to sentiment. In addition, stop words exert a great effect on the distributions of generated topics in TopicRNN, which may render TopicRNN to some extent unfit for learning representations of short social texts such as tweets and microblogs, since stop words are often absent in short social corpora. Also, AEC-LSTM can remedy this defect of TopicRNN.

III. EMOTION-ENHANCED LSTM

A. Model Description

Recurrent neural networks (RNNs) provide a very elegant way of dealing with text sequences that embody correlations between words that are close in the sequence. However, the traditional RNN has a problem of gradient vanishing or exploding, and it cannot effectively capture semantic or sentiment dependence among words with long distance. In order to overcome these issues, the long short-term memory network (LSTM) was proposed by Hochreiter and Schmidhuber [53] and achieved superior performance in sequence representation learning. From the structure of the LSTM block (Fig. 1), we can see that LSTM is an artificial memory architecture to approximate the working mechanism of the brain memory system and language cognition. Cognitive psychologists and neuroscientists agree that emotion plays a role at various specific stages of the memory process, including encoding information, consolidating memories, and recalling experiences [54]. For instance, the emotive state at the time humans perceive and process an observation can positively affect the encoding of information into the short or even long-term memory.


Fig. 2. Proposed ELSTM.

As discussed in Section II-B, reasonable utilization of emotion has been proved to facilitate intelligent systems in making higher quality decisions. However, existing LSTM networks ignore the effect of emotion while updating information in the memory cell of the LSTM block. In order to better simulate the functionality of the brain memory system, a novel emotion-enhanced LSTM, dubbed ELSTM, is proposed based on emotion intelligence.

As shown in Fig. 2, ELSTM features three gates (input, forget, and output), three states (hidden state, cell state, and emotion state), a memory cell, an emotion-based modulator (EM), and an emotion estimator used for emotion state estimation. The output of ELSTM is recurrently connected back to the block input and all of the gates. More formally, the workflow of ELSTM can be briefly described as follows:

a_t = tanh(W_a x_t + U_a h_{t-1} + b_a)    (1)
i_t = σ(W_i x_t + U_i h_{t-1} + b_i)    (2)
e_t = EM(es)    (3)
f_t = e_t ∘ σ(W_f x_t + U_f h_{t-1} + b_f)    (4)
o_t = σ(W_o x_t + U_o h_{t-1} + b_o)    (5)
c_t = f_t ∘ c_{t-1} + i_t ∘ a_t    (6)
h_t = o_t ∘ tanh(c_t)    (7)

where the activation function σ(·) is a sigmoid function, i.e., σ(x) = 1/(1 + e^{-x}), ∘ is the Hadamard product operator, U and W, respectively, denote the weight matrices of the output h_{t-1} from the previous hidden layer and of the current input x_t, and b is the input bias of the three S-shape functions. i_t, f_t, and o_t denote the output of the input gate, forget gate, and output gate, respectively. e_t is an emotion modulation factor produced by the emotion-based modulator EM based on the input emotion state es (how to obtain the emotion state signal es is expatiated in Section III-B), c_t is the state of the memory cell at time step t, and h_t denotes the hidden state of ELSTM at time step t.

From the above computation, we can see that the key to ELSTM is the cell state. ELSTM can remove or add information to the cell state by regulating the gates. Specially, the regulation consists of four stages, i.e., a forgetting stage, an emotion estimation stage, an update stage, and an output stage.

In the forgetting stage, ELSTM aims to decide what information to remove from the cell state when it receives input x_t. In order to bring the advantages of emotion intelligence into full play in this decision process, we reform the forget gate of the classical LSTM to enable the emotion signal to balance the tradeoff between the exploitation of historic information h_{t-1} and the exploration of the current input x_t. The decision process is formulated as (4) and can be described as follows: ELSTM first takes the output h_{t-1} from the previous hidden layer and the current input x_t, and a number in [0, 1] is generated with a sigmoid function. The output number gives the probability of the event "information in the cell state is to be kept." For instance, 1 means that information in the cell state is completely kept and 0 means it is completely dumped. Also, the emotion modulation signal e_t from the emotion modulator EM is utilized to further modulate the output from the sigmoid activation function. Here, in order to accomplish the modulation, the Hadamard product operator is chosen to mix the two signals, e_t and the sigmoid output, according to the existence of short paths in the emotional brain [32].

In the emotion estimation stage, ELSTM attempts to learn its emotion state distribution through the proposed emotion estimator, which is elaborated in Section III-B.

In the update stage, ELSTM seeks to decide what new input information to store in the cell state via combining the input gate and a tanh function. The input gate determines which values ELSTM will update (2), and the tanh function (1) creates a vector of new candidate values a_t to be added to the cell state. The information update in the cell state is accomplished by the combination scheme formalized in (6).

In the output stage, ELSTM computes the hidden state h_t corresponding to the input sequence (x_1, x_2, ..., x_t). ELSTM first uses the output gate o_t to determine which parts of the cell state to output in (5) and then puts the cell state through the tanh function and multiplies it by the result of the output gate so that it only outputs the parts it decides, as shown in (7).

According to the ELSTM architecture as well as its workflow, we can see that ELSTM has the following characteristics: 1) it completely inherits the classical LSTM's simulation of the human brain memory mechanism and forgetting mechanism and 2) it ingeniously utilizes emotion to effectively balance the tradeoff between information memorizing and information forgetting.
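To make the ELSTM step concrete, the following minimal NumPy sketch implements one forward pass of (1)–(7). It is an illustration under assumed shapes and parameter names, not the authors' implementation; the modulation factor e_t is assumed to be supplied by the emotion-based modulator of Section III-C, and setting e_t = 1 recovers a classical LSTM cell, which is a convenient sanity check.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def elstm_step(x_t, h_prev, c_prev, e_t, p):
        """One ELSTM forward step following (1)-(7).
        x_t: (d_in,) input; h_prev, c_prev: (d_h,) previous hidden/cell state;
        e_t: emotion modulation factor from the EM module (scalar or (d_h,));
        p: dict of weights W_*, U_* and biases b_* for the a/i/f/o transforms."""
        a_t = np.tanh(p["W_a"] @ x_t + p["U_a"] @ h_prev + p["b_a"])        # (1) candidate values
        i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])        # (2) input gate
        # (3) e_t = EM(es) is produced outside this step by the emotion-based modulator
        f_t = e_t * sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])  # (4) modulated forget gate
        o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])        # (5) output gate
        c_t = f_t * c_prev + i_t * a_t                                      # (6) cell state update
        h_t = o_t * np.tanh(c_t)                                            # (7) hidden state
        return h_t, c_t

    # Toy usage with d_in = 8, d_h = 4 and a neutral emotion state (e_t = 1).
    rng = np.random.default_rng(0)
    p = {}
    for g in "aifo":
        p[f"W_{g}"] = 0.1 * rng.standard_normal((4, 8))
        p[f"U_{g}"] = 0.1 * rng.standard_normal((4, 4))
        p[f"b_{g}"] = np.zeros(4)
    h, c = elstm_step(rng.standard_normal(8), np.zeros(4), np.zeros(4), e_t=1.0, p=p)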


B. Emotion Estimator

Emotions result when the thalamus sends a message to the brain in response to a stimulus, and the emotional brain has the superior feature of being fast reacting. In order to simulate the high efficiency of the emotion generation mechanism, we design a novel emotion estimator based on similarity cognition and negative sampling. Linguistically, for a given text sequence (x_1, x_2, ..., x_{t-1}, x_t, ...), the subsequence (x_{t-τ}, x_{t-τ+1}, ..., x_{t-1}) is probably semantically closely related with the item x_t. Based on this, we compute the relevance of the two through applying an inner product operation to the representation vector of the sequence (x_{t-τ}, x_{t-τ+1}, ..., x_{t-1}) and the representation vector of the item x_t. Here, we perform a sum pooling operation on the hidden representations (h_{t-τ}, h_{t-τ+1}, ..., h_{t-1}) to obtain the representation of the subsequence (x_{t-τ}, x_{t-τ+1}, ..., x_{t-1}), i.e., h_t^{sp} = SumPooling(h_{t-τ}, h_{t-τ+1}, ..., h_{t-1}). The probability that the item x_t is next to the sequence (x_{t-τ}, x_{t-τ+1}, ..., x_{t-1}) can be formulated with the softmax function as follows:

p_t = exp(h_t^T h_t^{sp}) / Σ_{i=1}^{V} exp(h_i^T h_t^{sp})    (8)

where V denotes the size of the vocabulary.

Unfortunately, the cost of computing p_t in (8) is prohibitively expensive since it is proportional to V. Obviously, this is completely at odds with the high efficiency of emotion intelligence. To address the issue, we apply the negative sampling technique [55] to draw negative examples from the background distribution. With negative sampling, (8) can be rewritten as (9). Finally, the emotion state is decided based on Rule 1

p_t = exp(h_t^T h_t^{sp}) / [exp(h_t^T h_t^{sp}) + Σ_{i=1}^{NS} exp(h_i^T h_t^{sp})]    (9)

where NS is the number of negative examples from sampling, which is quite small compared with V.

Rule 1 (Emotion State Determination): If p_t > ET, then the emotion estimator outputs the emotion state "Happy"; else it outputs "Sad." ET is a predetermined threshold.

C. Emotion-Based Modulator

The core mission of the emotion-based modulator is to transform the neuron emotion state es into an emotion-based modulation action. In order to fulfill the mission, we propose a set of state-action mapping rules [formalized as (10)] as follows.

Rule 2 (Increase Emotion Modulation Factor): This action commonly happens when the emotion state of ELSTM is happiness, which is caused by its success in some time steps. The success at a given time step t is defined by the capability of ELSTM to learn the semantic correlation between the input subsequence (x_{t-τ}, x_{t-τ+1}, ..., x_{t-1}) and its next item x_t. If the underlying semantic of the subsequence is successfully captured, the joyful experience of success will motivate ELSTM to strengthen the exploitation of historical information. In other words, ELSTM prefers to memorize the learned knowledge rather than forget it and hence increases its emotion modulation factor.

Rule 3 (Decrease Emotion Modulation Factor): ELSTM's failure to learn a satisfactory embedding representation of the above-mentioned subsequences may cause a sad experience and further may motivate such an action. This means that ELSTM decreases its emotion modulation factor so as to forget more of the learned low-quality knowledge

e_{t+1} = e_t + ρ ∗ |h_t − h_t^{sp}|   if es = H
e_{t+1} = e_t − ρ ∗ |h_t − h_t^{sp}|   if es = S    (10)

where ρ is the emotion inertia coefficient, |·| is the absolute value operator, h_t is the output representation of the input item x_t, h_t^{sp} denotes the sum pooling of the output representations (h_{t-τ}, h_{t-τ+1}, ..., h_{t-1}), and H and S denote the emotion states Happy and Sad, respectively.

Equation (10) also suggests that the emotion inertia coefficient ρ is very important for emotion-based modulation. Here, we adopt a simple self-adaptive strategy to adjust ρ by using the following formula:

ρ_{ep} = (ρ_max − ρ_min) ∗ (EPOCH − ep)/EPOCH + ρ_min    (11)

where ρ_max and ρ_min are the initial and final emotion inertia coefficients, respectively, EPOCH is the maximum epoch, and ep is the current epoch.

From (11), we can see that at the initial epoch ep = 0, ρ_{ep} corresponds to ρ_max, and when ep approaches EPOCH, ρ_{ep} gradually decreases to ρ_min. The rationale behind this is that, like other deep learning models, ELSTM will gradually converge to an optimal point in the solution space while the network is evolving. Hence, in early stages, larger coefficient values are required so that ELSTM can have a higher evolving speed, whereas in later stages, smaller coefficient values are given to the network neurons so as to make them stable gradually.
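As an illustration only, the following NumPy sketch wires together the pieces just described: the negative-sampling probability (9) with the Happy/Sad decision of Rule 1, the modulation-factor update (10) of Rules 2 and 3, and the annealed inertia coefficient (11). The threshold ET, the dimensions, and the random negative sampler are assumed values for the example, not the settings used in the paper.

    import numpy as np

    def estimate_emotion(h_t, h_sp, negatives, ET=0.5):
        """Emotion estimator: Rule 1 with the negative-sampling probability (9).
        h_t: (d_h,) representation of the current item; h_sp: (d_h,) sum pooling of
        the preceding hidden states; negatives: (NS, d_h) sampled negative items."""
        pos = np.exp(h_t @ h_sp)
        p_t = pos / (pos + np.exp(negatives @ h_sp).sum())
        return "H" if p_t > ET else "S"                       # Happy / Sad

    def update_modulation_factor(e_t, es, h_t, h_sp, rho):
        """Emotion-based modulator: Rules 2 and 3, formalized in (10)."""
        delta = rho * np.abs(h_t - h_sp)
        return e_t + delta if es == "H" else e_t - delta

    def inertia_coefficient(ep, EPOCH, rho_min=0.01, rho_max=0.1):
        """Self-adaptive inertia coefficient (11): rho_max at ep = 0, rho_min at ep = EPOCH."""
        return (rho_max - rho_min) * (EPOCH - ep) / EPOCH + rho_min

    # Toy usage: pool the last tau = 3 hidden states, estimate the emotion state,
    # then adjust the factor e_t that modulates the forget gate in (4).
    rng = np.random.default_rng(1)
    window = rng.standard_normal((3, 4))                      # (h_{t-3}, h_{t-2}, h_{t-1})
    h_sp = window.sum(axis=0)                                 # SumPooling(...)
    h_t = rng.standard_normal(4)
    es = estimate_emotion(h_t, h_sp, negatives=rng.standard_normal((5, 4)))
    rho = inertia_coefficient(ep=0, EPOCH=50)
    e_t = update_modulation_factor(np.ones(4), es, h_t, h_sp, rho)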


Fig. 3. Architecture of AEC-LSTM model.

D. Model Inference

Inference of ELSTM mainly consists of two information propagation processes, i.e., forward propagation and backward propagation. The former is the above-mentioned workflow in Section III-A, and here, we describe the latter in detail.

Let Δout_t be the output difference as computed by any subsequent layers of the deep neural network and Δh_t be the output difference as computed by the next time-step ELSTM. For simplicity, we define

gates_t = [a_t, i_t, f_t, o_t]^T,  W = [W_a, W_i, W_f, W_o]^T,  U = [U_a, U_i, U_f, U_o]^T,  b = [b_a, b_i, b_f, b_o]^T.

Following the common gradient descent approach and the chain rule for error flow and truncation, we obtain the following gradients of the output difference:

δh_t = Δh_t + Δout_t    (12)
δc_t = δh_t ∘ o_t ∘ (1 − tanh²(c_t)) + δc_{t+1} ∘ f_{t+1}    (13)
δa_t = δc_t ∘ i_t ∘ (1 − a_t²)    (14)
δi_t = δc_t ∘ a_t ∘ i_t ∘ (1 − i_t)    (15)
δf_t = e_t ∘ δc_t ∘ c_{t−1} ∘ f*_t ∘ (1 − f*_t)    (16)
δo_t = δh_t ∘ tanh(c_t) ∘ o_t ∘ (1 − o_t)    (17)
δgates_t = [δa_t, δi_t, δf_t, δo_t]^T    (18)
δx_t = W^T δgates_t    (19)
δh_{t−1} = U^T δgates_t    (20)

where f*_t = σ(W_f x_t + U_f h_{t−1} + b_f) denotes the unmodulated forget activation. With the gradients, the final updates to the internal parameters W, U, and b can be computed as follows, where ⊗ is an outer product operator. Note that the internal parameters W, U, and b are shared across the whole sequence, and thus, we need to take the summation over t

δW = Σ_t δgates_t ⊗ x_t    (21)
δU = Σ_t δgates_t ⊗ h_{t−1}    (22)
δb = Σ_t δgates_t.    (23)

IV. AEC-LSTM FOR SENTIMENT CLASSIFICATION

This section is motivated by the observation that the topic-level attention mechanism may be more beneficial to text sentiment polarity detection than the existing attention mechanisms at concrete granularity levels, e.g., word- or phrase-level. Here, we design an attention-emotion-enhanced convolutional LSTM network (AEC-LSTM) for sentiment detection, as shown in Fig. 3. In this network, we introduce a convolution operation to deal with concatenated word vectors and leverage the topic attention mechanism to adjust the sentiment polarity membership of a given text sequence.

A. Convolutional ELSTM

The convolution operation is a natural simulation of the template-matching phenomenon existing in the human reading process. The 2-D convolution has achieved great success in image representation learning, and yet, 1-D convolution is frequently applied to the concatenated word vectors in most existing work on text sequence representation learning. Here, we focus on the simulation of word-level template-matching.


For a sequence of text with length L (note that we do padding when necessary), with the help of word embedding techniques such as word2vec and GloVe, the text can be represented as a vector of size L ∗ d, i.e., x_{0:L−1} = x_0 ⊕ x_1 ⊕ · · · ⊕ x_{L−1}, where x_i ∈ R^d (i = 0, 1, ..., L − 1) is the d-dimensional vector representation of the i-th word in the text sequence and ⊕ is the concatenation operator for word vectors. A filter w = w_0 ⊕ w_1 ⊕ · · · ⊕ w_{l−1}, where l is the size of the filter w, is applied to a span of l words (word window) in the text sequence in the word-level convolution operation, resulting in a similarity score s_i. This operation can be formalized as follows:

s_i = f(w ⊙ x_{i:i+l−1} + b) = f(w^T x_{i:i+l−1} + b)    (24)

where f is a nonlinear activation function such as sigmoid, hyperbolic tangent, and ReLU, ⊙ is the dot product of two vectors (i.e., word vector and filter vector), x_{i:i+l−1} is a word sequence with length l starting from position i, i.e., x_{i:i+l−1} = x_i ⊕ x_{i+1} ⊕ · · · ⊕ x_{i+l−1}, and b is the convolution bias.

The same filter is used to extract local features for each window of the given text, that is, we can extract the n-gram feature vector of size L − l + 1 using the filter over all word windows of the sentence. Obviously, various and sufficient feature maps can be obtained by applying various combinations of multiple filter weights and/or filter lengths.

After the convolution operation, a max-pooling operation is applied to the output of the convolutional layer, which transforms the feature map of size L − l + 1 to (L − l + 1)/2. Then, we have the feature map v = [v_1, v_2, ..., v_{(L−l+1)/2}]. If we apply m filters to the similarity score vector, we have a feature map set P = {v^1, v^2, ..., v^m}. It is worth noting that here, we choose the max-pooling function instead of other functions due to its low computational complexity.

To make full use of the different features corresponding to the same n-gram, which are produced by different convolution filters, we execute a concatenation operation on the resultant features from the pooling layer. Specially, for a given n-gram x_{i:i+l−1}, the i-th component in each feature map in set P is concatenated together as a vector c_i. The concatenation operation is formalized as follows:

c_i = v_i^1 ⊕ v_i^2 ⊕ · · · ⊕ v_i^m    (25)

where v_i^k (k = 1, 2, ..., m) is the i-th component of the k-th feature map vector v^k generated from the convolution operation between filter k and the i-th n-gram.

B. Topic Attention Mechanism

Recently, attention mechanisms have been used to improve the performance of deep learning models by selectively focusing on specific parts of visual inputs or sequence inputs. Previous attention mechanisms for text sentiment analysis tasks usually compute the attention distribution according to the relation between the vectors that correspond to some particular fine-grained language objects, such as aspects and words, and the hidden sentiment vectors generated by a neural model such as LSTM, CNN, and autoencoder. However, the relation between topic and sentiment, which exists objectively, has not been taken into consideration in existing methods. In the context of natural language processing, topic modeling is described as a method of uncovering hidden structure in a collection of texts. The "topics" produced by topic modeling techniques are the recurring patterns of co-occurring words, i.e., clusters of similar words. In order to adjust the sentiment polarity membership of a given text, we devise a topic attention mechanism as below.

Attention weights are computed via incorporating the n-gram topic distribution into the attentive representations. Specially, given the topic vector T_i and the hidden vector h_i of the i-th n-gram of the input text, the corresponding topic attention weight can be computed using formula (26). Although there are many methods to extract topics from a corpus, here we adopt the classical topic model latent Dirichlet allocation (LDA) [56] due to its simplicity. However, the topic-word distributions generated by LDA cannot be used directly for topic attention weight computation, since the n-gram may not necessarily be a word. To address the issue, we employ a summation method, i.e., if the size of the n-gram is more than one word, we sum up the topic-word vectors, each of which corresponds to a word in the n-gram

η_i = exp(h_i ⊙ T_i) / Σ_{k=1}^{(L−l+1)/2} exp(h_k ⊙ T_k).    (26)

With the obtained attention distribution, the opinionated input text is represented as the weighted sum of all the hidden vectors, as shown in the following:

r = Σ_{i=1}^{(L−l+1)/2} η_i h_i.    (27)

C. Sentiment Classification

With the topic-aware attentive representation r, defined in (27), of a given input text X, a softmax classifier is used to estimate the sentiment distribution of X, as shown in the following:

p(y|X) = softmax(W^{(s)} r + b^{(s)})    (28)
y* = arg max_{y∈Y} p(y|X)    (29)

where Y = {Y_1, Y_2, ..., Y_{|Y|}} is the sentiment label set.

For model training, we use the cross-entropy between the predicted class probabilities and the ground-truth labels as the loss function of our model. An L_2 regularization term is added in the following loss function to alleviate the over-fitting problem:

Loss = − Σ_{d∈D} Σ_{y∈Y} l_d^y · log l̃_d^y + λ ‖Θ‖_F^2    (30)

where l_d^y and l̃_d^y denote the ground-truth and predicted sentiment distribution of text sequence d, respectively, D the training corpus, Y the sentiment label set, Θ the model parameters of AEC-LSTM, and λ an L_2 regularization parameter.

V. EXPERIMENTAL STUDY

In this section, we conduct extensive experiments to verify the effectiveness of our AEC-LSTM model.

A. Experimental Settings

1) Data Sets: We use four real-world data sets to evaluate our AEC-LSTM model. Table I shows statistics of the data sets used, where Avg Len denotes the average text sequence (microblog, review) length. The first two data sets, i.e., IMDB¹ and Yelp2014,² are well-known benchmark data sets in text sentiment analysis research. Specifically, IMDB is a binary sentiment analysis data set consisting of 50 000 reviews from the Internet Movie Database, where each review is labeled either positive or negative. Yelp2014 provides fine-grained sentiment labels, including five kinds of labels for reviews, i.e., very negative, negative, neutral, positive, and very positive.

¹http://ai.stanford.edu/~amaas/data/sentiment/
²https://github.com/revantkumar/Yelp-Dataset-Challenge-2014
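As a minimal end-to-end illustration of Section IV, the sketch below chains the word-level convolution (24), adjacent-pair max pooling, per-n-gram concatenation (25), topic attention (26) and (27), and the softmax classifier (28) and (29) for a single text. It assumes precomputed word embeddings and LDA-derived topic vectors, uses ReLU as the activation f, and, to keep it short, feeds the concatenated vectors c_i directly to the attention layer instead of running them through ELSTM; all dimensions and random parameters are illustrative, so this is a simplified sketch rather than the reported configuration.

    import numpy as np

    def softmax(z):
        z = z - z.max()
        return np.exp(z) / np.exp(z).sum()

    def conv_scores(X, w, b=0.0):
        """Word-level 1-D convolution (24): one ReLU similarity score per window of l words.
        X: (L, d) word embeddings; w: (l, d) filter; returns (L - l + 1,) scores."""
        L, l = X.shape[0], w.shape[0]
        return np.array([max(0.0, float((w * X[i:i + l]).sum()) + b)
                         for i in range(L - l + 1)])

    def max_pool_pairs(s):
        """Max pooling over adjacent scores, shrinking L - l + 1 values to about half."""
        return np.array([s[i:i + 2].max() for i in range(0, len(s) - 1, 2)])

    def classify(X, filters, topics, W_s, b_s):
        """Conv + pooling + concatenation (25), topic attention (26)-(27), softmax (28)-(29)."""
        P = [max_pool_pairs(conv_scores(X, w)) for w in filters]     # m pooled feature maps
        C = np.stack(P, axis=1)                   # row i is c_i = v_i^1 (+) ... (+) v_i^m   (25)
        eta = softmax((C * topics).sum(axis=1))   # attention over h_i (.) T_i               (26)
        r = eta @ C                               # topic-aware text representation          (27)
        p = softmax(W_s @ r + b_s)                # sentiment distribution                   (28)
        return p, int(np.argmax(p))               # predicted label                          (29)

    # Toy usage: 10 words, 8-dim embeddings, 3 filters of width 2, binary sentiment.
    rng = np.random.default_rng(2)
    X = rng.standard_normal((10, 8))
    filters = [rng.standard_normal((2, 8)) for _ in range(3)]
    n = (10 - 2 + 1) // 2                         # pooled positions per feature map
    topics = rng.standard_normal((n, 3))          # stand-in for summed LDA topic vectors
    p, y_hat = classify(X, filters, topics, W_s=rng.standard_normal((2, 3)), b_s=np.zeros(2))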


TABLE I 4) Training: With the optimized hyperparameters, each


D ESCRIPTION OF THE F OUR D ATA S ETS U SED model is trained through standard SGD over shuffled
mini-batches from training set of each data set. Network
weights and biases are randomly initialized using a truncated
normal distribution Norm(0, 1.0), and Adam optimizer [59] is
further utilized to adaptively tune learning rate during training.
It is worth noting that, in our experiments, we use the
The other two data sets JDReview and SinaWeibo are col- Skip-gram model of Word2Vec tool to prelearn word vec-
lected by web crawlers from www.jd.com and ww.weibo.com, tors for providing word embeddings for the deep learning
respectively. We recruit three volunteer students to manually baselines. Moreover, we use the JIEBA tool to conduct word
annotate SinaWeibo with sentiment labels. If a microblog has segmentation for building Chinese vocabulary for JDReview
different sentiment labels given by volunteers while anno- and SinaWeibo.
tating, we determine its sentiment polarity label according B. Performance Comparison
to the high-voting principle. For JDReview, we utilize star
The main question we want to answer is whether the pro-
rating given by users to classify the reviews into five grades.
posed AEC-LSTM can improve the performance of text senti-
Specifically, if a review receives 1 star, 2 star, 3 star, 4 star
ment detection. In this section, we first compare AEC-LSTM
or 5 star, then the review is labeled as very negative, negative,
with different baselines, including classical shallow learn-
neutral, positive, or very positive, respectively. To eliminate
ing approaches and recently proposed representative deep
noisy data in JDReview and SinaWeibo, we discard those
learning approaches. Also, we further compare the proposed
reviews with length less than 5.
topic mechanism with the existing attention models built on
2) Metrics: To evaluate performance of methods for text
lexicon-based resources. Finally, we devise some variants of
sentiment detection, we adopt the widely-used metric accuracy
AEC-LSTM to check the contributions of various components
(ACC)
in AEC-LSTM.
M To verify the classification performance, we use t-test to
ACC = (31)
N compare the means of the results produced by AEC-LSTM
where M is the number of correctly classified samples, N is and the best one among the competitors. t-test assumes that
the size of corpus to be analyzed. the data have been sampled from a normally distributed popu-
3) Hyperparameters: It is well-known that hyperparameter lation. A sample size around 40 allows the normality assump-
optimization for deep learning models is an intractable prob- tions conducive to performing the t-test [60]. Based on the
lem. Currently the overwhelming majority of hyperparameter idea of k-fold cross validation, we propose a training-testing
optimization approaches is manual and requires expertise and schema as follows. First, we construct 40 new training–testing
extensive trial and error. Studies [57], [58] suggest that only pair sets for each corpus in the following way: the original
a few of the hyperparameters really matter for most data sets, testing set of each corpus is evenly split into 20 subsets, and
though there are many hyperparameters with each of multi- then, three subsets are randomly drawn from the 20 subsets
farious deep models. Here we select three structure-related to create a new testing set, and the left subsets is formed
hyperparameters (the number of hidden states, the number as a new training set. Then, we conduct the training–testing
of filters and the size of filter) and three training-related experiments on each model with each pair of the newly
hyperparameters (learning rate, dropout rate and batch size) constructed training–testing pair sets from each corpus, and
as optimization objects. Specially, we devise a hyperparameter finally, we have reported the average performance. It is worth
optimization strategy consisting of two stages coarse-tuning noting that in each of the training-testing experiments, network
and fine-tuning. In stage coarse-tuning, a new mixed data weights of each model are randomly initialized. Finally, the
set is firstly constructed through randomly selecting 40% average performance of each model running on each corpus is
instances from each experimental data set, then random-search reported. Based on the resultant statistics means and standard
and cross-validation are used to optimize structure-related deviations, we have conducted t-test to check if the reported
hyperparameters of each model based on the mixed data set. accuracy differences are significant.
Moreover, embedding cluster validity (32) on the validation 1) Comparison of Different Sentiment Classification Meth-
set is used to evaluate model’s generalization ability. In stage ods: We compare AEC-LSTM with five groups of baseline
fine-tuning, Bayesian optimization technique [58] is utilized methods that only use review text for learning. Group 1 meth-
to tune the training-related hyperparameters for the model ods include well-known linguistic and context features for
 SVM classifiers. Deep learning methods in groups 2–4 are,
(RVi ,RV j )∈InterC dist(RVi , RV j )
1
|InterC|
em_val = 1  (32) respectively, based on CNNs, LSTMs, and hybrid structures of
|IntraC| (RVi ,RV j )∈IntraC dist(RVi , RV j ) CNN and LSTM. Deep models in group 5 are with top-ranked
where em_val denotes the ration between intra-class affin- SOTA results in sentiment classification, i.e., ULMFiT3 [61],
ity and inter-class separability among embedding vectors BERT4 [62], and XLNet5 [63].
from validation set. InterC and IntraC are sets of inter-class 3
http://nlp.fast.ai/ulmfit
review pairs and intra-class review pairs in the validation set 4 https://github.com/google-research/bert

respectively. 5 https://github.com/zihangdai/xlnet


TABLE II
E XPERIMENTAL R ESULTS C OMPARISON B ETWEEN AEC-LSTM AND O THER BASELINES ON R EAL -L IFE D ATA S ETS

Group 1 methods include frequently used linguistic and 2) H-CRAN [69]: H-CRAN constructs document sentiment
context features for SVM classifiers (LIBLINEAR6 ). representation vectors via combining convolution-based
1) SVM-Unigram: A LIBLINEAR is built with unigram attentions with RNNs.
features of input text reviews. 3) C-LSTM [70]: C-LSTM extracts a sequence of
2) SVM-Bigram: Only the bigram features of input text higher-level phrase representations from CNN and feeds
reviews are applied to train a LIBLINEAR. them into LSTM to obtain sentence representation.
3) NBSVM-Bigram7 [64]: An SVM is built over Naive Experimental results are summarized in Table II. A com-
Bayes (NB) log-count ratios as feature values. parative analysis based on Table II is given.
Group 2 methods are based on deep embedding representa- First, for all the data sets used, AEC-LSTM outperforms
tion produced by CNN architecture. all the competitors, including traditional machine learning
1) CNN-rand [65]: All words are randomly initialized and classification methods and deep learning methods, in terms
then modified during training. of averaged accuracy, and the advantage of AEC-LSTM over
2) CNN-multichannel8 [65]: A model with two sets of word the competitors is statistically significant, with the excep-
vectors. Each set of vectors is treated as a channel and tion of data set IMDB, in all data sets. Second, similar to
each filter is applied. its competitors, AEC-LSTM is better at binary sentiment
3) WDE-CNN 9 [25]: WDE-CNN treats review ratings as polarity detection tasks such as IMDB and SinaWeibo than
weak labels to train deep CNN networks for review multiple sentiment classification tasks such as Yelp2014 and
embedding learning. JDReview. Third, comparing SVM group with other groups,
In group 3, algorithms are all based on LSTM neural units: we can see that methods based on deep learning can certainly
1) LSTM-GCA [42]: LSTM-GCA is a mixture of cognition not guarantee its better sentiment classification results than
grounded attention (CGA) and classical LSTM. SVM-based methods. If handcrafted sentiment features are of
2) Tree-LSTM10 [66]: Tree-LSTM generalizes the standard sufficiently high quality, SVM-based methods can also achieve
LSTM architecture to tree-structured network topolo- higher classification accuracy than some deep models. For
gies. instance, NBSVM-Bigram defeats many of the deep learning
3) LSTM-GRNN11 [67]: LSTM-GRNN aims to compose an competitors in the experimental data sets. Last but not least,
LSTM-based word encoder and a GRU-based sentence a further examination of results on data sets with binary
encoder for document sentiment representation. sentiment polarity, i.e., SinaWeibo and IMDB, indicates that
Experimental results are summarized in Table II, and a comparative analysis based on Table II is given below.

First, AEC-LSTM outperforms all the competitors, including the traditional machine learning methods and the deep learning methods, in terms of averaged accuracy on all the data sets, and the advantage of AEC-LSTM over the competitors is statistically significant on all data sets except IMDB. Second, like its competitors, AEC-LSTM performs better on binary sentiment polarity detection tasks such as IMDB and SinaWeibo than on multiclass sentiment classification tasks such as Yelp2014 and JDReview. Third, comparing the SVM group with the other groups, we can see that deep learning-based methods do not necessarily achieve better sentiment classification results than SVM-based methods: if handcrafted sentiment features are of sufficiently high quality, SVM-based methods can also achieve higher classification accuracy than some deep models. For instance, NBSVM-Bigram defeats many of the deep learning competitors on the experimental data sets. Last but not least, a further examination of the results on the data sets with binary sentiment polarity, i.e., SinaWeibo and IMDB, shows that accuracy on SinaWeibo is much lower than on IMDB. The explanation may be that SinaWeibo covers broader topics than IMDB: reviews in IMDB are mainly related to movies, whereas reviews in SinaWeibo have diverse topics and are collected according to posting time.

From the above analysis, we conclude that AEC-LSTM effectively improves text sentiment analysis in terms of sentiment classification accuracy and is promising.
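The significance claim above can be checked with a standard paired test over repeated runs; the exact test and run protocol are not restated in this section, so the snippet below is only an assumed sketch using SciPy's paired t-test.

from scipy import stats

def paired_significance(acc_aec_lstm, acc_baseline, alpha=0.05):
    # acc_* are accuracy lists from repeated runs (or folds) of two models on one data set.
    t_stat, p_value = stats.ttest_rel(acc_aec_lstm, acc_baseline)
    return {"t": t_stat, "p": p_value, "significant": p_value < alpha}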

TABLE III
Experimental Results Comparison Between Different Attention Models on Real-Life Data Sets

2) Comparison of Different Attentions: The topic attention mechanism is an important characteristic of AEC-LSTM. In order to evaluate the proposed topic attention (TA) mechanism, we select three state-of-the-art attention models as competitors, i.e., HN-ATT [39], NSC-ATT [40], and TreeBiGRU-ATT [41]. HN-ATT is a hierarchical attention (HA) network consisting of word- and sentence-level attention mechanisms. NSC-ATT is a neural sentiment classifier that considers user and product information via attentions over different semantic levels. TreeBiGRU-ATT is a bidirectional RNN with a structural attention (SA) mechanism. For a fair comparison between the proposed TA and the other attention mechanisms, we extract the three attention mechanisms from the three models, i.e., User Product Attention (UPA) of NSC-ATT, HA of HN-ATT, and SA of TreeBiGRU-ATT, integrate the four attention mechanisms, i.e., HA, UPA, SA, and TA, into Classical-LSTM, and experimentally compare TA with the other three counterparts. Experimental results are listed in Table III. As can be seen from the results, the TA mechanism outperforms the three competitors. This means that the proposed topic attention mechanism is better than existing attention models, including word-level, sentence-level, and structure-level ones, when learning sentiment representations of reviews.
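For readers who want to reproduce this comparison protocol, the following PyTorch sketch shows the generic pattern of plugging an attention module on top of a classical LSTM encoder; it is a simplified stand-in rather than the exact TA, HA, UPA, or SA implementations, and the layer sizes and class names are assumptions.

import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    # Generic additive attention over LSTM hidden states (illustrative only).
    def __init__(self, hidden_dim, attn_dim=64):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, hidden_states, mask=None):
        # hidden_states: (batch, seq_len, hidden_dim); mask: bool tensor (batch, seq_len)
        scores = self.score(torch.tanh(self.proj(hidden_states))).squeeze(-1)
        if mask is not None:
            scores = scores.masked_fill(~mask, float("-inf"))
        weights = torch.softmax(scores, dim=-1)
        return torch.bmm(weights.unsqueeze(1), hidden_states).squeeze(1)

class LSTMWithAttention(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.attn = AttentionPooling(hidden_dim)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids, mask=None):
        h, _ = self.lstm(self.embed(token_ids))   # all time-step hidden states
        return self.fc(self.attn(h, mask))        # attention-pooled sentence representation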
3) Comparison of Different Components: To evaluate how the components of AEC-LSTM influence sentiment classification performance, we devise three variants, AEC-LSTM/ELSTM, AEC-LSTM/CNN, and AEC-LSTM/TA, by removing the respective components ELSTM, CNN, and TA. The experimental results of these variants are shown in Table IV, where the accuracy marked with * in each column means that the component removed from AEC-LSTM has incurred the greatest accuracy loss among the three components.

TABLE IV
Experimental Results of AEC-LSTM Variants on Real-Life Data Sets

From Table IV, we can see that: 1) the classification accuracies of the three variants on each data set are lower than those of AEC-LSTM in Table II, although the difference values fluctuate across variants, which indicates that the components make different positive contributions to sentiment classification performance and 2) all the accuracy values marked with * fall in the TA or ELSTM columns, which shows that, compared with CNN, either TA or ELSTM exerts a bigger positive effect on sentiment classification performance, and also manifests that TA and ELSTM are nondominated in terms of their contributions to classification accuracy improvement.

The above observations demonstrate that the emotion-based enhancement and the topic attention mechanism adjustment can exert different positive influences on different sentiment analysis tasks.
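A minimal driver for such an ablation study could look as follows; build_model() and evaluate() are hypothetical placeholders for the AEC-LSTM constructor and the evaluation routine, and the flag names are assumptions rather than the authors' configuration interface.

# Hypothetical ablation driver; component flags mirror the variant names in Table IV.
VARIANTS = {
    "AEC-LSTM":       {"use_elstm": True,  "use_cnn": True,  "use_ta": True},
    "AEC-LSTM/ELSTM": {"use_elstm": False, "use_cnn": True,  "use_ta": True},
    "AEC-LSTM/CNN":   {"use_elstm": True,  "use_cnn": False, "use_ta": True},
    "AEC-LSTM/TA":    {"use_elstm": True,  "use_cnn": True,  "use_ta": False},
}

def run_ablation(datasets, build_model, evaluate):
    # datasets: dict mapping data set name -> (train, test) splits
    results = {}
    for name, flags in VARIANTS.items():
        results[name] = {ds_name: evaluate(build_model(**flags), ds)
                         for ds_name, ds in datasets.items()}
    return results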
C. Model Analysis

There are numerous factors that affect the performance of a deep learning model for sentiment classification. In this section, we discuss the impact of input data and model parameters on AEC-LSTM.

1) Impact of Text Sequence Length: To investigate the impact of text sequences with various lengths, we evaluate AEC-LSTM on the four data sets, and the results are given in Fig. 4. We can see from Fig. 4 that the sentiment classification accuracy of AEC-LSTM fluctuates with review length on all data sets, and the fluctuation ranges differ among the four data sets. In general, AEC-LSTM achieves higher accuracy on reviews of moderate length but performs worse on reviews that are either too short or too long. The explanation is that reviews with less content usually contain less meaningful information; for instance, short reviews may be spam, while reviews that are too long tend to contain noisy and redundant information, which destroys the sentiment patterns hidden in the corpus and makes sentiment classification more difficult.

Fig. 4. Impact of text sequence length on sentiment classification accuracy. (a) IMDB, (b) Yelp2014, (c) SinaWeibo, and (d) JDReview.
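One simple way to produce the curves in Fig. 4 is to bucket test reviews by token length and compute accuracy per bucket; the sketch below is illustrative, and the bucket size is an assumption.

from collections import defaultdict

def accuracy_by_length(texts, gold, predicted, bucket_size=50):
    # Group test reviews by token length and report accuracy per length bucket.
    hits, totals = defaultdict(int), defaultdict(int)
    for text, y, y_hat in zip(texts, gold, predicted):
        bucket = (len(text.split()) // bucket_size) * bucket_size
        totals[bucket] += 1
        hits[bucket] += int(y == y_hat)
    return {b: hits[b] / totals[b] for b in sorted(totals)}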
2) Impact of Word Embedding Methods: Word embeddings usually play a very important role in text sequence representation learning, and CBOW and Skip-gram in Word2Vec are widely used methods for learning them. In this section, we conduct experiments to investigate the impact of both CBOW and Skip-gram on AEC-LSTM, and the experimental results are shown in Fig. 5. From Fig. 5, we can observe that Skip-gram outperforms CBOW in terms of sentiment classification accuracy on the four data sets. On the other hand, the advantage of Skip-gram over CBOW varies across the data sets; for example, Skip-gram performs significantly better on SinaWeibo and JDReview than on IMDB and Yelp2014, whereas the performance gain on IMDB is almost negligible.


Fig. 5. Impact of word embedding methods on sentiment classification accuracy.

Recalling that CBOW learns to predict a word from its context while Skip-gram is designed to predict the context from a given word, we may deduce the following: reviews in SinaWeibo and JDReview are relatively short and usually contain abundant infrequent words, for which Skip-gram is more favorable, whereas for relatively longer reviews that contain fewer infrequent words, such as those in IMDB and Yelp2014, the performance gap of Skip-gram narrows gradually.
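Both embedding variants can be trained with gensim's Word2Vec implementation (version 4.x API), where sg=0 selects CBOW and sg=1 selects Skip-gram; the corpus variable tokenized_reviews and all hyperparameter values below are illustrative assumptions, not the settings used in this article.

from gensim.models import Word2Vec

# tokenized_reviews: list of token lists built from the training corpus (assumed to exist)
cbow_model = Word2Vec(sentences=tokenized_reviews, vector_size=300, window=5,
                      min_count=2, sg=0, workers=4, epochs=10)      # sg=0 -> CBOW
skipgram_model = Word2Vec(sentences=tokenized_reviews, vector_size=300, window=5,
                          min_count=2, sg=1, workers=4, epochs=10)  # sg=1 -> Skip-gram

vector = skipgram_model.wv["movie"]  # 300-d vector for a word that appears in the training corpus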

3) Impact of the Number of Topics: In this section, we explore how the proposed topic attention mechanism behaves as the number of topics varies. A group of experiments on AEC-LSTM with different topic numbers {1, 10, 20, 30, 40, 50, 60, 70, 80} is conducted, and the experimental results are shown in Fig. 6.

Fig. 6. Sentiment classification accuracy versus number of topics.

From Fig. 6, we can see that the number of topics exerts different influences on the sentiment classification accuracy of AEC-LSTM on different data sets. Specifically, AEC-LSTM performs best when the topic number is set to 40 for IMDB, 50 for Yelp2014, 70 for SinaWeibo, and 40 for JDReview. This indicates that an inadequate topic granularity (number of topics) can decrease the sentiment classification accuracy of AEC-LSTM. These observations are not difficult to explain: on the one hand, too small a number of topics may produce broad topics that contain several distinct subtopics, which weakens the correlation between sentiment and subtopics; on the other hand, too large a number of topics can fragment otherwise intact topics and generate noisy topics for sentiment inference. In both cases, the topic attention mechanism may assign improper weights to the sentiment embeddings of a text sequence and hence impair the sentiment classification performance of AEC-LSTM.
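The sweep over topic numbers can be organized around gensim's LDA implementation, with one topic model trained per candidate value; the sketch below omits the AEC-LSTM training and evaluation loop, and all parameter values are assumptions.

from gensim import corpora
from gensim.models import LdaModel

def lda_sweep(tokenized_docs, topic_numbers=(1, 10, 20, 30, 40, 50, 60, 70, 80)):
    # Train one LDA model per candidate topic number; downstream evaluation is omitted here.
    dictionary = corpora.Dictionary(tokenized_docs)
    corpus = [dictionary.doc2bow(doc) for doc in tokenized_docs]
    models = {}
    for k in topic_numbers:
        models[k] = LdaModel(corpus=corpus, id2word=dictionary, num_topics=k,
                             passes=5, random_state=42)
    return dictionary, models

# Example: per-document topic mixture that a topic attention mechanism could consume.
# doc_topics = models[40].get_document_topics(corpus[0])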
VI. CONCLUSION

In this article, we propose a deep recurrent neural network model called AEC-LSTM for text sequence sentiment classification. Based on emotion psychology and emotional intelligence, an improved LSTM cell, ELSTM, is devised to incorporate emotion so as to make a trade-off between information memorizing and forgetting. Then, a topic attention mechanism is proposed to effectively adjust the weights of text subsequence hidden representations. Finally, the topic attention mechanism, the convolutional ELSTM, and the LDA topic model are integrated together, resulting in our AEC-LSTM model. Extensive experimental results indicate that AEC-LSTM outperforms the competitors in standard sentiment classification tasks.

Our future work will focus on two main directions. First, we will extend AEC-LSTM to aspect-level sentiment classification by utilizing various kinds of prior fine-grained sentiment knowledge from sentiment dictionaries. Second, we will apply multi-objective optimization techniques to optimize the parameters of the AEC-LSTM model, seeking the optimal balance between training-data-fitting accuracy and network connection sparseness, so as to improve the robustness and performance of the model.

REFERENCES

[1] B. Liu, “Sentiment analysis and opinion mining,” Synthesis Lect. Hum. Lang. Technol., vol. 5, no. 1, pp. 1–167, 2012.
[2] K. Ravi and V. Ravi, “A survey on opinion mining and sentiment analysis: Tasks, approaches and applications,” Knowl.-Based Syst., vol. 89, pp. 14–46, Nov. 2015.
[3] A. Giachanou and F. Crestani, “Like it or not: A survey of Twitter sentiment analysis methods,” ACM Comput. Surv., vol. 49, no. 2, pp. 1–42, 2016.
[4] L. Yue, W. Chen, X. Li, W. Zuo, and M. Yin, “A survey of sentiment analysis in social media,” Knowl. Inf. Syst., vol. 60, pp. 1–47, Jul. 2018.
[5] B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up: Sentiment classification using machine learning techniques,” in Proc. ACL Conf. Empirical Methods Natural Lang. Process., vol. 10. Stroudsburg, PA, USA: Association for Computational Linguistics, 2002, pp. 79–86.
[6] L. Zhang, S. Wang, and B. Liu, “Deep learning for sentiment analysis: A survey,” Wiley Interdiscipl. Rev., Data Mining Knowl. Discovery, vol. 8, no. 4, p. e1253, 2018.


[7] S. Shi, M. Zhao, J. Guan, Y. Li, and H. Huang, “A hierarchical LSTM model with multiple features for sentiment analysis of sina weibo texts,” in Proc. Int. Conf. Asian Lang. Process. (IALP), Dec. 2017, pp. 379–382.
[8] J. Zhou, Y. Lu, H.-N. Dai, H. Wang, and H. Xiao, “Sentiment analysis of Chinese microblog based on stacked bidirectional LSTM,” IEEE Access, vol. 7, pp. 38856–38866, 2019.
[9] C. Baziotis, N. Pelekis, and C. Doulkeridis, “DataStories at SemEval-2017 task 4: Deep LSTM with attention for message-level and topic-based sentiment analysis,” in Proc. 11th Int. Workshop Semantic Eval. (SemEval), 2017, pp. 747–754.
[10] M. Yang, W. Tu, J. Wang, F. Xu, and X. Chen, “Attention based LSTM for target dependent sentiment classification,” in Proc. 31st AAAI Conf. Artif. Intell., 2017, pp. 5013–5014.
[11] D. S. Sachan, M. Zaheer, and R. Salakhutdinov, “Revisiting LSTM networks for semi-supervised text classification via mixed objective function,” in Proc. AAAI Conf. Artif. Intell., vol. 33, 2019, pp. 6940–6948.
[12] Y. Chen, J. Yuan, Q. You, and J. Luo, “Twitter sentiment analysis via bi-sense emoji embedding and attention-based LSTM,” in Proc. 26th ACM Int. Conf. Multimedia, Oct. 2018, pp. 117–125.
[13] G. Zhu et al., “Redundancy and attention in convolutional LSTM for gesture recognition,” IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 4, pp. 1323–1335, Apr. 2020.
[14] M.-H. Su, C.-H. Wu, K.-Y. Huang, and T.-H. Yang, “Cell-coupled long short-term memory with L-skip fusion mechanism for mood disorder detection through elicited audiovisual features,” IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 1, pp. 124–135, Jan. 2020.
[15] X. Li, M. Chen, F. Nie, and Q. Wang, “A multiview-based parameter free framework for group detection,” in Proc. 31st AAAI Conf. Artif. Intell., 2017, pp. 4147–4153.
[16] M. Lippi, M. A. Montemurro, M. D. Esposti, and G. Cristadoro, “Natural language statistical features of LSTM-generated texts,” IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 11, pp. 3326–3337, Nov. 2019.
[17] K. Shuang, R. Li, M. Gu, J. Loo, and S. Su, “Major-minor long short-term memory for word-level language model,” IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 10, pp. 3932–3946, Oct. 2020.
[18] T. Ergen and S. S. Kozat, “Unsupervised anomaly detection with LSTM neural networks,” IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 8, pp. 3127–3141, Aug. 2020.
[19] A. M. Isen and B. Means, “The influence of positive affect on decision-making strategy,” Social Cognition, vol. 2, no. 1, pp. 18–31, Mar. 1983.
[20] J. Storbeck and G. L. Clore, “On the interdependence of cognition and emotion,” Cognition Emotion, vol. 21, no. 6, pp. 1212–1237, Sep. 2007.
[21] F. Huang, S. Zhang, J. Zhang, and G. Yu, “Multimodal learning for topic sentiment analysis in microblogging,” Neurocomputing, vol. 253, pp. 144–153, Aug. 2017.
[22] S. Liu, X. Cheng, F. Li, and F. Li, “TASC: Topic-adaptive sentiment classification on dynamic tweets,” IEEE Trans. Knowl. Data Eng., vol. 27, no. 6, pp. 1696–1709, Jun. 2015.
[23] F. Chen, R. Ji, J. Su, D. Cao, and Y. Gao, “Predicting microblog sentiments via weakly supervised multimodal deep learning,” IEEE Trans. Multimedia, vol. 20, no. 4, pp. 997–1007, Apr. 2018.
[24] G. Yang, H. He, and Q. Chen, “Emotion-semantic-enhanced neural network,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 27, no. 3, pp. 531–543, Mar. 2019.
[25] W. Zhao et al., “Weakly-supervised deep embedding for product review sentiment analysis,” IEEE Trans. Knowl. Data Eng., vol. 30, no. 1, pp. 185–197, Jan. 2018.
[26] L. Gui, Y. Zhou, R. Xu, Y. He, and Q. Lu, “Learning representations from heterogeneous network for sentiment classification of product reviews,” Knowl.-Based Syst., vol. 124, pp. 34–45, May 2017.
[27] X. Wang, W. Jiang, and Z. Luo, “Combination of convolutional and recurrent neural network for sentiment analysis of short texts,” in Proc. 26th Int. Conf. Comput. Linguistics, Tech. Papers (COLING), 2016, pp. 2428–2437.
[28] J. Wang, L.-C. Yu, K. R. Lai, and X. Zhang, “Dimensional sentiment analysis using a regional CNN-LSTM model,” in Proc. 54th Annu. Meeting Assoc. Comput. Linguistics, 2016, pp. 225–230.
[29] T. Chen, R. Xu, Y. He, Y. Xia, and X. Wang, “Learning user and product distributed representations using a sequence model for sentiment analysis,” IEEE Comput. Intell. Mag., vol. 11, no. 3, pp. 34–44, Aug. 2016.
[30] P. Salovey and J. Mayer, “Emotional intelligence,” Imag., Cogn. Pers., vol. 9, no. 3, pp. 185–211, 1990.
[31] J. Martínez-Miranda and A. Aldea, “Emotions in human and artificial intelligence,” Comput. Hum. Behav., vol. 21, no. 2, pp. 323–341, Mar. 2005.
[32] A. Khashman, “Modeling cognitive and emotional processes: A novel neural network architecture,” Neural Netw., vol. 23, no. 10, pp. 1155–1163, Dec. 2010.
[33] E. Lotfi and M.-R. Akbarzadeh-T, “Practical emotional neural networks,” Neural Netw., vol. 59, pp. 61–72, Nov. 2014.
[34] C. Yu, M. Zhang, F. Ren, and G. Tan, “Emotional multiagent reinforcement learning in spatial social dilemmas,” IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 12, pp. 3083–3096, Dec. 2015.
[35] O. K. Oyedotun and A. Khashman, “Prototype-incorporated emotional neural network,” IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 8, pp. 3560–3572, Aug. 2018.
[36] G. R. Markadeh, E. Daryabeigi, C. Lucas, and M. A. Rahman, “Speed and flux control of induction motors using emotional intelligent controller,” IEEE Trans. Ind. Appl., vol. 47, no. 3, pp. 1126–1135, May 2011.
[37] J. G. Taylor and N. F. Fragopanagos, “The interaction of attention and emotion,” Neural Netw., vol. 18, no. 4, pp. 353–369, May 2005.
[38] S. Franklin, T. Madl, S. D’Mello, and J. Snaider, “LIDA: A systems-level architecture for cognition, emotion, and learning,” IEEE Trans. Auton. Mental Develop., vol. 6, no. 1, pp. 19–41, Mar. 2014.
[39] Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, and E. Hovy, “Hierarchical attention networks for document classification,” in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Hum. Lang. Technol., 2016, pp. 1480–1489.
[40] H. Chen, M. Sun, C. Tu, Y. Lin, and Z. Liu, “Neural sentiment classification with user and product attention,” in Proc. Conf. Empirical Methods Natural Lang. Process., 2016, pp. 1650–1659.
[41] F. Kokkinos and A. Potamianos, “Structural attention neural networks for improved sentiment analysis,” 2017, arXiv:1701.01811. [Online]. Available: http://arxiv.org/abs/1701.01811
[42] Y. Long, R. Xiang, Q. Lu, C.-R. Huang, and M. Li, “Improving attention model based on cognition grounded data for sentiment analysis,” IEEE Trans. Affect. Comput., early access, Mar. 4, 2019, doi: 10.1109/TAFFC.2019.2903056.
[43] Z. Zhang, Y. Zou, and C. Gan, “Textual sentiment analysis via three different attention convolutional neural networks and cross-modality consistent regression,” Neurocomputing, vol. 275, pp. 1407–1415, Jan. 2018.
[44] D. Deng, L. Jing, J. Yu, and S. Sun, “Sparse self-attention LSTM for sentiment lexicon construction,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 27, no. 11, pp. 1777–1790, Nov. 2019.
[45] C. Gan, L. Wang, Z. Zhang, and Z. Wang, “Sparse attention based separable dilated convolutional neural network for targeted sentiment analysis,” Knowl.-Based Syst., vol. 188, Jan. 2020, Art. no. 104827.
[46] Z. Yuan, S. Wu, F. Wu, J. Liu, and Y. Huang, “Domain attention model for multi-domain sentiment classification,” Knowl.-Based Syst., vol. 155, pp. 1–10, Sep. 2018.
[47] Z. Lei, Y. Yang, and M. Yang, “SAAN: A sentiment-aware attention network for sentiment analysis,” in Proc. 41st Int. ACM SIGIR Conf. Res. Develop. Inf. Retr., Jun. 2018, pp. 1197–1200.
[48] D. Tang, F. Wei, B. Qin, N. Yang, T. Liu, and M. Zhou, “Sentiment embeddings with applications to sentiment analysis,” IEEE Trans. Knowl. Data Eng., vol. 28, no. 2, pp. 496–509, Feb. 2016.
[49] Y. Ren, Y. Zhang, M. Zhang, and D. Ji, “Context-sensitive Twitter sentiment classification using neural network,” in Proc. 30th AAAI Conf. Artif. Intell., 2016, pp. 1–7.
[50] J. Cheng, S. Zhao, J. Zhang, I. King, X. Zhang, and H. Wang, “Aspect-level sentiment classification with HEAT (HiErarchical ATtention) network,” in Proc. ACM Conf. Inf. Knowl. Manage., Nov. 2017, pp. 97–106.
[51] Q. Liu, H. Zhang, Y. Zeng, Z. Huang, and Z. Wu, “Content attention model for aspect based sentiment analysis,” in Proc. World Wide Web Conf. (WWW), 2018, pp. 1023–1032.
[52] A. B. Dieng, C. Wang, J. Gao, and J. Paisley, “TopicRNN: A recurrent neural network with long-range semantic dependency,” 2016, arXiv:1611.01702. [Online]. Available: http://arxiv.org/abs/1611.01702
[53] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[54] T. W. Buchanan, “Retrieval of emotional memories,” Psychol. Bull., vol. 133, no. 5, p. 761, 2007.
[55] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in Proc. Adv. Neural Inf. Process. Syst., 2013, pp. 3111–3119.
[56] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” J. Mach. Learn. Res., vol. 3, pp. 993–1022, Mar. 2003.
[57] J. Bergstra and Y. Bengio, “Random search for hyper-parameter optimization,” J. Mach. Learn. Res., vol. 13, pp. 281–305, Feb. 2012.


[58] W. Jia, C. Xiu-Yun, Z. Hao, X. Li-Dong, L. Hang, and D. Si-Hao, “Hyperparameter optimization for machine learning models based on Bayesian optimization,” J. Electron. Sci. Technol., vol. 17, no. 1, pp. 26–40, 2019.
[59] D. P. Kingma and J. L. Ba, “Adam: A method for stochastic optimization,” in Proc. Int. Conf. Learn. Represent., 2015, pp. 1–13.
[60] B. Flury, A First Course in Multivariate Statistics. New York, NY, USA: Springer, 2013.
[61] J. Howard and S. Ruder, “Universal language model fine-tuning for text classification,” 2018, arXiv:1801.06146. [Online]. Available: http://arxiv.org/abs/1801.06146
[62] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Hum. Lang. Technol., 2019, pp. 4171–4186.
[63] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le, “XLNet: Generalized autoregressive pretraining for language understanding,” in Proc. Adv. Neural Inf. Process. Syst., 2019, pp. 5753–5763.
[64] S. Wang and C. D. Manning, “Baselines and bigrams: Simple, good sentiment and topic classification,” in Proc. 50th Annu. Meeting Assoc. Comput. Linguistics, vol. 2, 2012, pp. 90–94.
[65] Y. Kim, “Convolutional neural networks for sentence classification,” 2014, arXiv:1408.5882. [Online]. Available: http://arxiv.org/abs/1408.5882
[66] K. S. Tai, R. Socher, and C. D. Manning, “Improved semantic representations from tree-structured long short-term memory networks,” 2015, arXiv:1503.00075. [Online]. Available: http://arxiv.org/abs/1503.00075
[67] D. Tang, B. Qin, and T. Liu, “Document modeling with gated recurrent neural network for sentiment classification,” in Proc. Conf. Empirical Methods Natural Lang. Process., 2015, pp. 1422–1432.
[68] G. Liu and J. Guo, “Bidirectional LSTM with attention mechanism and convolutional layer for text classification,” Neurocomputing, vol. 337, pp. 325–338, Apr. 2019.
[69] J. Du, L. Gui, Y. He, R. Xu, and X. Wang, “Convolution-based neural attention with applications to sentiment classification,” IEEE Access, vol. 7, pp. 27983–27992, 2019.
[70] C. Zhou, C. Sun, Z. Liu, and F. C. M. Lau, “A C-LSTM neural network for text classification,” 2015, arXiv:1511.08630. [Online]. Available: http://arxiv.org/abs/1511.08630

Faliang Huang received the Ph.D. degree in data mining from the South China University of Technology, Guangzhou, China, in 2011. His research interests include data mining and natural language processing.

Xuelong Li (Fellow, IEEE) is currently a Full Professor with the School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi’an, China.

Changan Yuan received the Ph.D. degree in computer application technology from Sichuan University, Chengdu, China, in 2006. His research interests include computational intelligence and data mining.

Shichao Zhang (Senior Member, IEEE) received the Ph.D. degree in computer science from Deakin University, Australia, in 2001. His research interests include information quality and pattern discovery.

Jilian Zhang received the Ph.D. degree in information systems from Singapore Management University, Singapore, in 2014. His current research interests include data management, query authentication for outsourced databases, data privacy, and data mining.

Shaojie Qiao received the Ph.D. degree from Sichuan University, Chengdu, China, in 2009. His research interests include artificial intelligence and data mining.