
Accepted Manuscript

Understanding emotions in text using deep learning and big data

Ankush Chatterjee, Umang Gupta, Manoj Kumar Chinnakotla, Radhakrishnan Srikanth, Michel Galley, Puneet Agrawal

PII: S0747-5632(18)30615-0
DOI: https://doi.org/10.1016/j.chb.2018.12.029
Reference: CHB 5851

To appear in: Computers in Human Behavior

Received Date: 3 April 2018
Revised Date: 6 December 2018
Accepted Date: 17 December 2018

Please cite this article as: Chatterjee A., Gupta U., Chinnakotla M.K., Srikanth R., Galley M. & Agrawal P., Understanding emotions in text using deep learning and big data, Computers in Human Behavior (2019), doi: https://doi.org/10.1016/j.chb.2018.12.029.

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Understanding Emotions in Text using Deep Learning and Big Data

Ankush Chatterjee (1), Umang Gupta (1), Manoj Kumar Chinnakotla (2), Radhakrishnan Srikanth (2), Michel Galley (3) and Puneet Agrawal (1)

1. Microsoft, Hyderabad, India
2. Microsoft, Bellevue, USA
3. Microsoft Research, Redmond, USA

Corresponding Author:
Puneet Agrawal, Microsoft, Hyderabad, India
Email: [email protected]

Abstract

Big data and deep learning algorithms combined with enormous computing power have paved the way for significant technological advancements. Technology is evolving to anticipate, understand and address our unmet needs. However, to fully meet human needs, machines or computers must deeply understand human behavior, including emotions. Emotions are physiological states generated in humans as a reaction to internal or external events. They are complex and studied across numerous fields, including computer science. As humans, on reading "Why don't you ever text me!", we can interpret it as either a sad or an angry emotion, and the same ambiguity exists for machines as well. The lack of facial expressions and voice modulations makes detecting emotions in text a challenging problem. However, in today's online world, humans are increasingly communicating using text messaging applications and digital agents. Hence, it is imperative for machines to understand emotions in textual dialogue to provide emotionally aware responses to users. In this paper, we propose a novel Deep Learning based approach to detect emotions - Happy, Sad and Angry - in textual dialogues. The essence of our approach lies in combining both semantic and sentiment based representations for more accurate emotion detection. We use semi-automated techniques to gather large scale training data with diverse ways of expressing emotions to train our model. Evaluation of our approach on real world dialogue datasets reveals that it significantly outperforms traditional Machine Learning baselines as well as other off-the-shelf Deep Learning models.

Keywords: Cognitive Computing, Chat bot, Deep learning, Structured Semantics, Conversational agent, Long-Short Term Memory, Convolutional Networks

1. Introduction

Technology is continuously evolving to amplify human ingenuity, to make our day to day life simpler and to anticipate and address our unmet needs. In order to anticipate our needs, it is essential for machines or computers to be able to deeply understand human behavior. Human behavior is very complex. Culture, social norms, faith and language, among many other things, play a role in defining human behavior. In particular, understanding and expressing emotion is a key element of human behavior. Emotions must be deeply understood by machines and computers to be able to anticipate human needs.

Emotions such as happiness, anger and sadness are physiological states that humans routinely experience. In the field of cognitive computing, where we develop technologies to mimic the functioning of the human brain, understanding emotions is an important area of research [1]. With the growing prominence of messaging platforms like WhatsApp and Twitter, there is increased interaction using textual dialogues. Several digital agents and chat bots on these messaging platforms are currently used by a large number of online users. The success of these agents depends on their ability to modulate responses based on user emotions, for which it is imperative to be able to detect emotions in textual dialogues and avoid responding inappropriately [2].

Furthermore, the ability of machines or computers to understand emotions is critical for the success of several other applications as well. For instance, in the domain of customer service, social media platforms like Twitter are gaining prominence, where customers expect quick responses. In case of a heavy flow of tweets, the turn-around time for responses increases. Prioritizing tweets according to their emotional content and responding to them in that order (for example, responding to an angry tweet before a basic inquiry) will result in increased customer satisfaction. Furthermore, in this era of text messaging, users are constantly texting and may send inappropriately angry messages to others. If emotion detection is implemented in such cases, the application can take appropriate action, such as popping up a warning to the user before sending a message. Emotion detection also finds social applications such as flagging content representing bullying, depression etc. from Twitter streams or online fora. Thus, emotion detection in textual dialogue finds several applications in today's online world.

Figure 1: A sample 3-turn conversation from our dataset.
  Turn 1: I texted you last night :/
  Turn 2: Sorry I saw your texts now
  Turn 3: Why don't you ever text me!
Emotions have been studied by researchers [3], [4], [5] in the fields of psychology, sociology, medicine, computer science etc. for the past several years. Some of the prominent works in understanding and categorizing emotions include Ekman's six class categorization [6] and Plutchik's "Wheel of Emotion", which suggested eight primary bipolar emotions [7]. Given the vast nature of study in this field, there is naturally no broad consensus on the granularity of emotion classes. Hence, as a first step, we restricted our current study to the top three frequently observed emotions in our user logs - Happy, Sad and Angry.

Problem Definition: In a textual dialogue, given the user utterance along with its context, classify the emotion of the user utterance as one of Happy, Sad, Angry or Others.

Understanding emotions in textual conversations can be a challenging problem in the absence of facial expressions and voice modulations. Figure 1 provides an example where it is difficult, even as a human, to detect the emotion of a user utterance solely on the basis of the text of the conversation. The emotion of the user whose messages are on the left could be interpreted as angry or sad. The challenge of understanding emotions is further compounded by difficulty in understanding context, sarcasm, class size imbalance, natural language ambiguity and rapidly growing Internet slang. However, big data and powerful deep learning algorithms have paved the way for us to attack this problem.

In this paper, we propose an end-to-end trainable deep learning model, called "Sentiment and Semantic Based Emotion Detector (SS-BED)", for detecting emotions in textual dialogues. The essence of our approach lies in leveraging both the sentiment and semantic representations of the user utterance for accurate emotion detection. The motivation behind combining sentiment and semantic representations can be understood from the following example. Consider the utterance "On road again... miss my amazing partner though!". This utterance contains a negative sentiment word 'miss' as well as a positive sentiment word 'amazing', but the overall emotion of the utterance is Sad. By combining the sentiment of different words in the utterance with a semantic understanding of the sentence, we can detect the emotion in this case. Hence, we intuitively expect that combining both sentiment and semantic features helps improve the classification of emotions in such scenarios.

Given a user utterance, SS-BED takes the individual sentiment and semantic representations of its input words and combines them into a unified representation for the entire utterance, which is used for predicting the emotion. We evaluate SS-BED on real world textual dialogues and it outperforms traditional Machine Learning approaches and other Deep Learning approaches. The main contributions of our paper are as follows:

• We propose a novel approach towards understanding emotions in textual conversations, using a deep-learning system called SS-BED.

• We evaluate various Deep Learning techniques and embeddings, along with Machine Learning algorithms (such as Support Vector Machines (SVM), Decision Trees, Naive Bayes), on real world textual conversations and compare their effectiveness for the task of understanding emotions.
ACCEPTED MANUSCRIPT

Practical Application: Our current research is in the context of an on-


line chat-bot, designed for informal conversations with users. In this scenario,

PT
we notice that users often express a variety of emotions such as being nervous
about exams, excited about a new job, feeling sad about a break-up, etc. In such
cases, the boundaries between computers and humans blur, and users expect

RI
computers to deeply understand human behavior including emotions. Under-
standing these emotions and providing an emotionally aware response not only

SC
creates a deeper and sustained engagement with users but takes us a step closer
to deeply understanding humans and anticipating their psychological needs.

U
The rest of the paper is organized as follows: Section 2 provides a summary
of related work. Section 3 describes our approach (SS-BED) in detail. Our
AN
experimental setup is discussed in Section 4 and our results are in Section 5.
Finally, Section 6 concludes the paper, followed by future direction for our work.
M
2. Related Work
A lot of work has happened in the space of image based emotion recognition [8], [9]. However, classifying textual dialogues based on emotions is a relatively new research area. Emotion-detection algorithms can be largely bucketed into the following two categories:

(a) Hand-crafted Feature Engineering Based Approaches: Many methods exploit the usage of keywords in a sentence with explicit emotional/affective value [10], [11], [12], [13], [14]. To that effect, several lexical resources have been created, such as WordNet-Affect [15] and SentiWordNet [16]. Part-of-Speech taggers like the Stanford Parser are also used to exploit the structure of keywords in a sentence. These pattern/dictionary based approaches, although attaining high precision scores, suffer from low recall. A recent work by Yenala et al. [17] on detecting offensive queries points out this issue in such pattern based approaches. For example, an "angry" emotion might be detected in "The service is plain bullshit!", due to the keyword 'bullshit', but a slight change of the same sentence will manage to fool a pattern based approach - "The service is plain horseshit!". One workaround would be to include the word 'horseshit' in the dictionary of swear words, but having a human incrementally update the dictionary defeats the purpose of an automated approach towards detecting emotions. Some works have tried dimensionality reduction based clustering approaches for text documents [18], [19]. However, our problem requires clustering at a sentence/utterance level, and we also have supervised data for identifying these emotional intents, so such techniques may not be directly applicable. Hasan et al., Purver et al., Suttles et al. and Wang et al. have also harnessed cues from emoticons and hashtags [20], [21], [22], [23], [24]. For example, the hashtag used in the sentence "Summer officially ends today. #sadness" makes it easier to predict the underlying emotion. Other methods [25], [26], [27], [28], [29] rely on extracting statistical features such as the presence of frequent n-grams, negation, punctuation, emoticons and hashtags to form representations of sentences, which are then used as input by classifiers such as Decision Trees and SVMs, among others, to predict the output. A more detailed analysis is provided in the work of Canales et al. [30]. Vosoughi et al. extract tweets based on location, time and author, and use context to model priors in Bayesian models [31]. However, all of these methods require extensive feature engineering and do not achieve high recall due to the diverse ways of expressing emotions. For example, "Trust me! I am never gonna order again" contains no affective words despite conveying emotion.
(b) Deep Learning Based Approaches: Deep Neural Networks have enjoyed considerable success in varied tasks in the text, speech and image domains. Variations of Recurrent Neural Networks, such as Long Short Term Memory networks (LSTM) [38] and BiLSTM [39], have been effective in modeling sequential information [40]. Also, Convolutional Neural Networks [41] have been a popular choice in the image domain. The lower layers of the network capture local features whereas the higher layers unravel more abstract task based features of the image. Their introduction to the text domain has proven their ability to decipher abstract concepts from raw signals [42], [43].

Table 1: Comparison of existing emotion detection systems.

  DL/Non-DL          Modality      Annotation     Works
  Deep Learning      Speech        Human Judged   Chernykh et al. [32]
  Non-Deep Learning  Speech        Human Judged   Danisman et al. [33]
  Deep Learning      Facial Image  Human Judged   Wang et al. [8], Dachapally et al. [34]
  Non-Deep Learning  Facial Image  Human Judged   Zhang et al. [9]
  Deep Learning      Text          Human Judged   Mundra et al. [35], Our Approach
  Deep Learning      Text          Automatic      Abdul et al. [36]
  Non-Deep Learning  Text          Human Judged   Kozareva et al. [12], Strapparava [37], Yan et al. [29], Balahur et al. [10]
  Non-Deep Learning  Text          Automatic      Sykora et al. [14], Hasan et al. [20], [21]

Recently, approaches which employ Deep Learning for emotion detection in text have been proposed. Zahiri et al. predict emotions in TV show transcripts [44]. Unlike TV shows, textual dialogues are full of spelling errors, Internet slang etc. Abdul et al. and Koper et al. try to understand emotions in tweets [36], [45]. Tweets often use cues like hashtags, whereas our dataset of textual dialogues is missing such cues. For instance, in the tweet "The moment of the day when you have to start to plaster a smile in your face. #depression", there is a significant cue in the form of the hashtag "#depression". Li et al. learn to detect emotions in user comments in the Chinese language [46]. Chinese has very different characteristics compared to English, which is the focus of our study. Felbo et al. learn representations based on emojis and use them for emotion detection [47]. Their approach is evaluated on tweets, news headlines and self-reported emotional experiences created by a large group of psychologists. News headlines are designed to be self-explanatory and to invoke reactions, for example, "Cisco sues Apple over iPhone name". Self-reported emotional experiences, on the other hand, often contain key emotion words like anger, sad etc., for instance, "I felt very sad as I saw my father being brought home in a casket". Textual dialogues, in contrast, are informal and laden with misspellings, which pose serious challenges for automatic emotion detection approaches. To the best of our knowledge, the work done by Mundra et al. is the only one which has tackled the problem of emotion detection in English textual dialogues [35]. Hence, we evaluate our technique against their approach. Table 1 provides a good representation of how our work is placed with respect to other emotion detection systems.

3. Our Approach

We model the task of understanding emotions as a multi-class classification problem where, given a user utterance, the model outputs the probabilities of it belonging to four output classes - Happy, Sad, Angry and Others. The architecture of our proposed SS-BED model is shown in Figure 2. Our model uses LSTMs [38], which are effective in processing sequential information. The input user utterance is fed into two LSTM layers using two different word embedding matrices. One layer uses a semantic word embedding, whereas the other layer uses a sentiment word embedding. These two layers learn semantic and sentiment feature representations and encode sequential patterns in the user utterance. These two feature representations are then concatenated and passed to a fully connected network with one hidden layer, which models interactions between these features and outputs probabilities per emotion class. Further details of the training data used to train the model, the sentiment and semantic embeddings, and model training are provided below.

Figure 2: The architecture of the Sentiment and Semantic Based Emotion Detector (SS-BED) model. [Diagram: the input utterance (e.g. "I am tensed") is fed through 50-dimensional SSWE embeddings into an LSTM (sentiment encoding) and through 100-dimensional GloVe embeddings into a second LSTM (semantic encoding); the two 64-dimensional encodings are concatenated and passed through a fully connected network with a 64-unit Leaky ReLU hidden layer to a softmax over Others, Happy, Sad and Angry.]
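To make the architecture concrete, the following is a minimal sketch of this dual-encoder model in Keras. It is an illustrative re-implementation, not the authors' Microsoft Cognitive Toolkit code; the maximum sequence length and layer names are assumptions, while the vocabulary size, embedding dimensions and encoding sizes follow the text and Figure 2.

```python
# Sketch of the SS-BED architecture (hypothetical Keras re-implementation).
from tensorflow.keras import layers, Model

VOCAB_SIZE = 50_000   # vocabulary size from Section 3.2
MAX_LEN = 40          # maximum utterance length (an assumption)

tokens = layers.Input(shape=(MAX_LEN,), dtype="int32")

# Sentiment branch: 50-d SSWE embeddings -> LSTM -> 64-d encoding.
sswe = layers.Embedding(VOCAB_SIZE, 50, name="sswe")(tokens)     # weights would come from pre-trained SSWE
sent_enc = layers.LSTM(64, name="sentiment_lstm")(sswe)

# Semantic branch: 100-d GloVe embeddings -> LSTM -> 64-d encoding.
glove = layers.Embedding(VOCAB_SIZE, 100, name="glove")(tokens)  # weights would come from pre-trained GloVe
sem_enc = layers.LSTM(64, name="semantic_lstm")(glove)

# Concatenate the two encodings and model their interactions.
merged = layers.concatenate([sent_enc, sem_enc])                 # 128-d "output encoding" (Section 5.2)
hidden = layers.Dense(64)(merged)
hidden = layers.LeakyReLU()(hidden)
hidden = layers.Dropout(0.25)(hidden)                            # dropout rate reported in Section 3.5
probs = layers.Dense(4, activation="softmax")(hidden)            # Others, Happy, Sad, Angry

ss_bed = Model(tokens, probs)
```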

3.1. Training Data Collection

Given the potentially diverse representation of emotions, it is essential to harness the advantage of big data and crowd intelligence to solve our problem. A large amount of training data is collected using a semi-automated approach. A dataset of 17.62 million tweet conversational pairs, i.e. tweets (Twitter-Qs) and their responses (Twitter-As; collectively referred to as Twitter Q-A pairs below), is constructed from the Twitter Firehose, covering the four year period from 2012 through 2015. This data is further cleaned to remove Twitter handles and serves as the base data for our two training data collection techniques.
Technique 1: In this technique, we start with a small set (approximately 300) of annotated utterances per emotion class, obtained by showing a randomly selected sample from Twitter-Qs and Twitter-As to human judges. Using a variation of the model described in [48], we create sentence embeddings for these annotated utterances as well as for Twitter-Qs and Twitter-As. We identify potential candidate utterances for each emotion class using threshold-based cosine similarity between the annotated utterances and Twitter-Qs and Twitter-As. Various heuristics, such as the presence of opposite emoticons (for example, ":'(" in a potential candidate set for the Happy emotion class) and the length of utterances, are used to further prune the candidate set. The candidate set is then shown to human judges to determine whether or not the utterances belong to the emotion class. Since emotion class utterances form only a very small fraction of a random sample of conversations, using our method we cut the amount of human judgment required by a factor of five when compared to showing a random sample of utterances and choosing emotion class utterances from them.
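A minimal sketch of the threshold-based candidate selection step follows. It assumes sentence embeddings have already been computed (e.g. with a model along the lines of [48]); the threshold value and function names are illustrative.

```python
# Sketch of Technique 1's cosine-similarity candidate selection (illustrative).
import numpy as np

def select_candidates(seed_vecs: np.ndarray, corpus_vecs: np.ndarray,
                      corpus_texts: list, threshold: float = 0.8) -> list:
    """Return corpus utterances whose cosine similarity to any seed utterance
    exceeds the threshold; these become candidates for human judgment."""
    # L2-normalize so that a dot product equals cosine similarity.
    seeds = seed_vecs / np.linalg.norm(seed_vecs, axis=1, keepdims=True)
    corpus = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    sims = corpus @ seeds.T                       # shape: (num_corpus, num_seeds)
    keep = sims.max(axis=1) >= threshold          # the most similar seed decides
    return [text for text, k in zip(corpus_texts, keep) if k]
```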

Technique 2: Once we obtain utterances belonging to an emotion class by the method described above, we take all the utterances that belong to Twitter-Qs and find their corresponding Twitter-As. These Twitter-As are then aggregated by frequency, and the top Twitter-As are chosen. For example, in the Angry emotion class, "There, there"¹ is a popular response among Twitter-As. Twitter-Qs corresponding to these top Twitter-As per emotion class are picked as potential utterances in that class and are further shown to human judges for pruning.

Negative data (belonging to the class Others) is collected by randomly selecting utterances from both Twitter-Qs and Twitter-As. Those which have a high cosine similarity score (using Technique 1) with any of the utterances in the emotion classes (Happy, Sad, Angry) are discarded.

We finally obtained 456k utterances in the Others category, 28k for Happy, 34k for Sad, and 36k for Angry.

¹ A phrase frequently used in the popular American sitcom "The Big Bang Theory".

3.2. Data Pre-processing

We use a minimal amount of preprocessing: removing Twitter handles and hashtags, pruning sentences with URLs and converting tweets to lower case. Repeated punctuation marks are replaced by a single occurrence of the same mark. Common acronyms like "omg" are expanded into their known full forms ("oh my god"). We restrict our vocabulary to the 50,000 most frequent words as obtained from the 17.62 million tweet dataset. Every out-of-vocabulary word is replaced with a special "UNK" token.
3.2.1. Emoticon Handling and Normalization

Emoticons are frequently used in textual conversations. In our Twitter Q-A pairs, 21% of textual conversations are found to contain emoticons. We use several heuristics and normalization techniques to deal specifically with emoticons. For example, we convert the utterance "Yeah! :((( My plan is cancelled 😕😞" into "Yeah! :( My plan is cancelled :/ :(". This helps us deal with Out of Vocabulary (OOV) issues arising from the infinitely many possible combinations of emoticons, and converts various forms of emoticons which represent similar feelings to a singular form.
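A minimal sketch of such normalization is shown below; the mapping table is illustrative (the paper does not publish its full emoticon dictionary).

```python
# Sketch of emoticon normalization (the mapping below is illustrative).
import re

CANONICAL = {":(((": ":(", ":((": ":(", ":-(": ":(", "😞": ":(", "😕": ":/"}

def normalize_emoticons(text: str) -> str:
    # Replace longer variants first so ":(((" is not partially rewritten.
    for variant, canon in sorted(CANONICAL.items(), key=lambda kv: -len(kv[0])):
        text = text.replace(variant, canon)
    # Collapse runs of the same emoticon, e.g. ":( :( :(" -> ":("
    text = re.sub(r"(:\(|:\)|:/)(\s*\1)+", r"\1", text)
    return text
```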
EP

3.3. Choosing Input Embeddings

For each word in the input utterance, we have multiple options for get-
C

ting the semantic word representations of input words. We try Word2Vec [49],
GloVe [50] and FastText [51]. To get the sentiment representations, we consider
AC

Sentiment Specific Word Embedding (SSWE) [52] - SSWE aims at encoding


sentiment information in the continuous representation of words. To test effec-
tiveness of semantic embeddings for emotion detection, we train a simple LSTM
model using each of these embeddings. LSTMs have ability to capture long-term
dependencies present in the input sequence, and thus are helpful for our task.

11
ACCEPTED MANUSCRIPT

We use cross validation to determine the effectiveness of different semantic em-


beddings. Our results, as depicted in Table 2, indicate that GloVe gives the best

PT
macro F1 score. We also observe that GloVe and SSWE behave very differently;
a few examples are in Table 3. SSWE embeddings give a high cosine similarity
for “depression” and “:’(” whereas GloVe gives a low score even though the two

RI
words have similar sentiment. For the “happy” and “sad” pair, SSWE rightly
gives a low score but GloVe outputs a reasonably high score. However, seman-

SC
tically similar words like “best” and “great” have a low cosine similarity with
SSWE but high score from GloVe. Based on these observations along with our
motivation of combining sentiment and semantic features from Section 1, we

U
choose GloVe and SSWE as our embedding for Semantic and Sentiment LSTM
layer respectively.
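A minimal sketch of the comparison behind Table 3, assuming `glove` and `sswe` are dictionaries mapping words to vectors loaded elsewhere (both hypothetical names):

```python
# Sketch: comparing word-pair cosine similarity under two embeddings (illustrative).
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def compare(glove: dict, sswe: dict, pairs):
    for w1, w2 in pairs:
        print(f"{w1}, {w2}: GloVe={cosine(glove[w1], glove[w2]):.2f} "
              f"SSWE={cosine(sswe[w1], sswe[w2]):.2f}")

# compare(glove, sswe, [("depression", ":'("), ("happy", "sad"), ("best", "great")])
```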

3.4. Tuning the network

To achieve optimal performance, we train with different combinations of the hyper-parameters: the number of hidden layers, the number of units per hidden layer, the dimensions of the sentiment and semantic encodings, the learning rate of the optimizer, the dropout rate, the number of training epochs and the batch size. Each of these hyper-parameters is randomly selected from a predefined set of allowed values and a model is trained. 10-fold cross-validation is performed to obtain the optimal set of parameters, wherein the training set is uniformly divided into 10 sets and, over the course of 10 iterations, 9 of the 10 sets are combined to train a model while the held-out set is used for validation.
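A minimal sketch of this random search with 10-fold cross-validation is shown below; the allowed values in the search space are assumptions, except where Section 3.5 reports the chosen ones.

```python
# Sketch of random hyper-parameter search with 10-fold cross-validation (illustrative).
import random
import numpy as np
from sklearn.model_selection import KFold

SEARCH_SPACE = {                       # allowed values are assumptions
    "hidden_units": [32, 64, 128],
    "learning_rate": [0.001, 0.005, 0.01],
    "dropout": [0.1, 0.25, 0.5],
    "batch_size": [1000, 2000, 4000],
}

def random_search(X, y, build_and_score, trials: int = 20) -> dict:
    """build_and_score(params, X_tr, y_tr, X_va, y_va) -> validation score."""
    best_params, best_score = None, -np.inf
    for _ in range(trials):
        params = {k: random.choice(v) for k, v in SEARCH_SPACE.items()}
        folds = KFold(n_splits=10, shuffle=True, random_state=0)
        scores = [build_and_score(params, X[tr], y[tr], X[va], y[va])
                  for tr, va in folds.split(X)]
        if np.mean(scores) > best_score:
            best_params, best_score = params, float(np.mean(scores))
    return best_params
```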

3.5. Model Training

We use the Microsoft Cognitive Toolkit² for training SS-BED. The parameters of SS-BED are trained to maximize prediction accuracy given the target labels in the training set. To avoid over-fitting and to generalize learning, dropout [53] is used. We use cross entropy with softmax as our loss function [54] and Stochastic Gradient Descent (SGD) as our learner. The optimal batch size is found to be 4000, with a learning rate of 0.005 and a dropout probability of 0.25. It is worth noting that, to train sequence classification models, the Microsoft Cognitive Toolkit uses the sum of the sequence lengths across utterances (not the number of utterances) when assembling a batch of a particular size.

² https://www.microsoft.com/en-us/cognitive-toolkit/
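Expressed in the Keras sketch from Section 3 (again an illustration, not the authors' CNTK setup), the reported training configuration would look roughly as follows; note that Keras counts utterances per batch, whereas CNTK counts summed sequence lengths.

```python
# Sketch: training configuration matching the reported hyper-parameters (illustrative).
from tensorflow.keras.optimizers import SGD

ss_bed.compile(
    optimizer=SGD(learning_rate=0.005),   # SGD learner and learning rate from Section 3.5
    loss="categorical_crossentropy",      # cross entropy with softmax
    metrics=["accuracy"],
)
# Batch size 4000 as reported; the number of epochs here is an assumption.
# ss_bed.fit(x_train, y_train, batch_size=4000, epochs=10)
```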

4. Experimental Setup

In this section, we describe the evaluation dataset used to compare the various techniques and the baseline methods used for comparison.

4.1. Evaluation Dataset

For this task, three datasets are reported in the research literature: (a) the ISEAR dataset³, (b) the SemEval-2007 Affective Text dataset [55] and (c) the WASSA'17 Shared Task on Emotion Intensity [56]. However, we find all of these datasets unsuitable for evaluating our task. The ISEAR dataset consists of user reactions when asked to remember a circumstance which aroused certain emotions in them. For example, "When my mother slapped me in the face, I felt anger at that moment." is one of the statements in the ISEAR dataset and has a different form than what one would typically expect in a dialogue. The SemEval-2007 dataset, on the other hand, consists of news headlines, which are expressive and self-contained; for example, "Cisco sues Apple over iPhone name" is one of the headlines in the dataset. Lastly, the WASSA'17 dataset comprises tweets. Tweets often contain helpful cues like hashtags, which are missing from textual dialogues. The following sentence from the WASSA'17 dataset highlights the impact of hashtags: "The moment of the day when you have to start to plaster a smile in our face. #depression"

³ http://www.affective-sciences.org/en/home/research/materials-and-online-research/research-material/

Table 4: Emotion class label distribution in the evaluation dataset.

  Label   Happy   Sad    Angry   Others   Total
  #         109    107      90     1920    2226
  %        4.90   4.81    4.04    86.25     100

To overcome these challenges, we sample 3-turn conversations from Twitter, i.e. User 1's tweet, User 2's response to the tweet, and User 1's response to User 2. We use the Twitter Firehose to extract these 3-turn conversations covering the year 2016. We sample from conversations where the last turn is the third turn, as well as from those where the third turn is in the middle of the conversation. We follow the necessary pre-processing steps from Section 3.2. Our dataset finally comprises 2226 3-turn conversations along with their emotion class labels (Happy, Sad, Angry, Others) provided by human judges. The details of the dataset along with emotion class label statistics are shown in Table 4. To gather the emotion class labels, we show the third turn of the conversation, along with the context of the previous 2 turns, to human judges and ask them to mark the emotion of the third turn after considering the context. To gather high quality judgments, each conversation is shown to 5 judges, and a majority vote is taken to decide the emotion class. After several rounds of training and auditing on mock sets, the final inter-annotator agreement based on Fleiss' kappa [57] is found to be 0.59. This kappa value, while slightly less than desirable, indicates the difficulty of judging textual conversations due to the ambiguities discussed earlier in Section 1. We wanted to find out how well our models generalize on previously unseen data. Therefore, we created this evaluation dataset, on which the models were run and the numbers reported. This set was not used for any debugging purposes, hence the performance on this set provides a reliable measure of how well the models might generalize on unseen data.

Table 5: Features used for Machine Learning baselines.

  #  Feature                   Description
  1  N-grams                   Word and character n-grams
  2  WordNet-Affect presence   Number of direct affective words classified by WordNet-Affect under relevant categories
  3  SSWE                      SSWE word embeddings
  4  POS                       Part-of-Speech tags
  5  Emoticons                 Number of happy, sad and angry emoticons
  6  Misc.                     Number of words, exclamation marks, question marks, and sequences of punctuation marks
  7  Negations                 Presence of negations

4.2. Baseline Approaches

We compare our approach against two classes of baselines: (a) Machine Learning baselines and (b) Deep Learning baselines.

For the Machine Learning baselines, we use an SVM classifier [58], a Gradient Boosted Decision Tree (GBDT) classifier [59] and a Naive Bayes (NB) classifier [59]. The SVM, GBDT and NB classifiers are trained using Scikit-learn [60]. We did extensive feature engineering for the above-mentioned baselines; the feature set is explained in Table 5.

We experiment with different combinations of the aforementioned features. For the sake of simplicity, we report the two combinations which give the best results:

• The feature set of N-grams and Emoticons as described in Table 5, which we refer to as Feat-1.

• The combination of all features described in Table 5, which we refer to as Feat-2.

After tuning parameters using the validation set as described in Section 3.4, we find that SVM gives the best performance with a linear kernel and a regularization constant of 0.0005. In case of GBDT, the best performance is achieved with 50 trees and a minimum of 10 samples per leaf.

Table 6: Comparison of various models on the evaluation dataset. SS-BED results are statistically significant with p < 0.005. (P = Precision, R = Recall)

                   Happy                Sad                  Angry               Macro   Micro
                 P      R      F1     P      R      F1     P      R      F1     F1      F1
  NB (Feat-1)    41.35  50.46  45.45  70.87  68.22  69.52  38.16  32.22  34.94  49.97   50.81
  SVM (Feat-1)   66.67  25.69  37.09  86.49  59.81  70.71  85.42  45.56  59.42  55.74   56.59
  GBDT (Feat-1)  75.76  22.94  35.21  89.47  63.55  74.31  86.00  47.78  61.43  56.98   58.49
  NB (Feat-2)    43.27  57.36  49.33  70.83  69.10  69.95  68.26  42.96  52.73  57.34   57.42
  SVM (Feat-2)   73.33  33.79  46.26  87.02  61.23  71.88  86.73  46.33  60.39  59.51   60.42
  GBDT (Feat-2)  78.46  25.00  37.92  94.25  58.92  72.51  88.98  50.26  64.23  58.22   58.95
  CNN-NAVA       63.32  42.29  50.71  79.37  68.69  73.64  67.42  45.79  54.54  59.63   60.15
  CNN-SSWE       67.69  40.37  50.57  77.45  73.83  75.60  80.95  37.77  51.51  59.23   60.97
  CNN-GloVe      52.29  52.29  52.29  93.72  67.29  74.61  67.82  65.55  66.66  64.52   64.93
  LSTM-SSWE      70.69  37.61  49.10  83.87  72.89  78.00  73.24  57.77  64.60  63.90   64.77
  LSTM-GloVe     64.18  39.45  48.86  72.88  80.37  76.44  72.15  63.33  67.45  64.25   65.26
  SS-BED         69.51  52.29  59.68  85.42  76.63  80.79  87.69  63.33  73.55  71.34   71.40
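A minimal sketch of one such baseline (a Feat-1-style SVM on word and character n-grams) is given below. The pipeline, feature extraction and the mapping of the reported regularization constant to scikit-learn's C parameter are illustrative assumptions, not the authors' exact code.

```python
# Sketch: a Feat-1-style SVM baseline with scikit-learn (illustrative).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC

baseline = Pipeline([
    ("features", FeatureUnion([
        ("word_ngrams", TfidfVectorizer(ngram_range=(1, 2))),
        ("char_ngrams", TfidfVectorizer(analyzer="char", ngram_range=(2, 4))),
        # an emoticon-count extractor would be added here for the full Feat-1 set
    ])),
    ("svm", LinearSVC(C=0.0005)),  # linear kernel; regularization constant from the text
])
# baseline.fit(train_texts, train_labels)
# predictions = baseline.predict(test_texts)
```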
For the deep learning baseline, we implement the approach defined in [35], as this approach also attempts to understand emotion classes in chat conversations. We refer to this approach as CNN-NAVA. We train emotion vectors as defined in [61] and use them as input to a CNN model. We also train individual CNN and LSTM models with different embeddings, namely GloVe and SSWE.

We use the precision, recall and F1 score of each class, as well as macro and micro F1 scores (where macro and micro F1 are calculated over the 3 emotion classes, i.e. Happy, Sad and Angry), to evaluate the different approaches. Macro F1 is the average of the F1 scores of the 3 classes. Micro F1 is calculated using the micro average of precision and recall: the individual true positives, false positives and false negatives are summed across the classes to obtain the micro-averaged precision and recall.
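A minimal sketch of computing these scores with scikit-learn, restricting the averages to the three emotion classes as described (the label strings are assumed):

```python
# Sketch: per-class, macro and micro F1 over the three emotion classes (illustrative).
from sklearn.metrics import f1_score

EMOTIONS = ["happy", "sad", "angry"]   # "others" is excluded from the averages

def evaluate(y_true, y_pred):
    per_class = f1_score(y_true, y_pred, labels=EMOTIONS, average=None)
    macro = f1_score(y_true, y_pred, labels=EMOTIONS, average="macro")
    micro = f1_score(y_true, y_pred, labels=EMOTIONS, average="micro")
    return dict(zip(EMOTIONS, per_class)), macro, micro
```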
5. Results

A summary of the results from the various techniques on the dataset described in Section 4.1 is presented in Table 6. SS-BED gives the best performance on F1 score for each emotion class as well as on macro and micro F1, as can be seen more clearly in Figure 3. The improvement of SS-BED over all other models is statistically significant (p < 0.005) as measured by McNemar's test [62]. Our results thus indicate that combining sentiment and semantic features in SS-BED outperforms the individual LSTM-SSWE and LSTM-GloVe models. SS-BED is also significantly better than the CNN based approaches, including CNN-NAVA. Also, when comparing across models using macro and micro F1 scores, the Deep Learning models outperform NB, SVM and GBDT. Adding a rich set of features helps improve the performance of NB, SVM and GBDT, but they do not come on par with the Deep Learning models.

Figure 3: Comparison of Micro and Macro F1 values of different models. [Bar chart omitted; the values are those reported in Table 6.]

However, Deep Learning models take more time to train; a comparison of the runtime performance is provided in Figure 4. The SS-BED model, containing two LSTM layers, inevitably takes more time for training as well as inference.

Figure 4: Runtime performance of various models during the Train and Test phases. [Chart omitted.]

5.1. Qualitative Analysis

Table 7 highlights some examples from the evaluation set and compares the performance of our models on them. In example #1, we observe that in the absence of keywords with an obvious sentiment polarity, LSTM-SSWE fails, while LSTM-GloVe and SS-BED are able to infer the sadness of the user in this somewhat subtle statement. In example #2, the presence of the opposite polarity words "miss" and "amazing" confuses both LSTM-GloVe and LSTM-SSWE, but SS-BED predicts correctly by using both semantic and sentiment feature sets. Similarly, in the complicated and long utterance of example #3, all baseline approaches fail, but SS-BED is able to harness the advantage of combining both semantic and sentiment features to predict correctly. However, SS-BED still needs further improvement. For example, in #4, SS-BED and all other models fail to understand the sarcasm in the user's utterance and do not predict the correct emotion. In some utterances, like #5, the context of the conversation plays an important role in determining the underlying emotion. SS-BED does not consider context and hence fails, as do all other models.

Table 7: Qualitative analysis of SS-BED results and other baseline approaches.

  #1 (True label: Sad)
    User 1's tweet: Man even food delivery apps in bangalore won't deliver till 6:(
    User 2's response: Yea well it is a bandh
    User 1's response: Yeah well i do not have anything at home :/
    Comment: LSTM-SSWE fails as there is no keyword with an obvious negative polarity, but LSTM-GloVe and SS-BED are correct.

  #2 (True label: Sad)
    User 1's tweet: RIP. We need more people like you!
    User 2's response: :( whr r u?
    User 1's response: On road again... miss my amazing partner though!
    Comment: SS-BED predicts correctly. The presence of the opposite polarity words 'miss' and 'amazing' confuses both LSTM-GloVe and LSTM-SSWE, and they both fail.

  #3 (True label: Angry)
    User 1's tweet: :) Good for both of us!
    User 2's response: It's better not to interact with a girl with so much ego. Attitude is still fine
    User 1's response: It is not an ego or attitude. U started first! U asked me stupid ques! :/
    Comment: SS-BED is the only model which could correctly predict this rather complicated user utterance.

  #4 (True label: Angry)
    User 1's tweet: Pathetic delivery services. Very disappointed
    User 2's response: Sir, can you please state exact problem so that we can work on it.
    User 1's response: Yes. I guess your amazing delivery service has not yet arrived.
    Comment: All models including SS-BED are unable to understand sarcasm and fail in this example.

  #5 (True label: Happy)
    User 1's tweet: I just qualified for the Nabard internship
    User 2's response: WOOT! That's great news. Congratulations!
    User 1's response: I started crying
    Comment: All models predicted it as Sad; however, when one considers context, the true emotion is Happy.

5.2. Analysis of SS-BED Encoding

To further understand the interaction of semantic and sentiment features in SS-BED, in this section we present an analysis of SS-BED as compared to LSTM-SSWE and LSTM-GloVe with respect to their output encodings. In SS-BED, a user utterance is passed through two LSTM layers which generate two 64-dimensional encodings. These feature representations are then concatenated, and the result is called the output encoding of SS-BED. In the case of LSTM-SSWE and LSTM-GloVe, the 64-dimensional last state vector of the LSTM is the output encoding. We call the output encodings of LSTM-GloVe and LSTM-SSWE E_GloVe and E_SSWE respectively. Since SS-BED's output encoding is 128-dimensional, compared to the 64-dimensional E_GloVe and E_SSWE, we project it into 64 dimensions using Principal Component Analysis [63] and call the projected encoding E_SS-BED.
Further, we take utterances corresponding to the 3 classes - Happy, Sad and Angry - from the evaluation dataset. We use E_GloVe, E_SSWE and E_SS-BED to find the top 5 utterances based on cosine similarity for each of these utterances. These top 5 utterances are fetched from the corpus of 17.62 million tweet pairs described in Section 3.1.
Hence, for a sample utterance $S_{li}$, which is the $i$-th utterance in our evaluation set belonging to an emotion class $l$, where $l \in \{\text{Happy}, \text{Sad}, \text{Angry}\}$, we find the top 5 most similar utterances. Let these texts be denoted by $\{T^{e}_{li1}, T^{e}_{li2}, \ldots, T^{e}_{li5}\}$, where the encoding vector used is $e \in \{E_{GloVe}, E_{SSWE}, E_{SS\text{-}BED}\}$.

We then annotate these 5 utterances via human judges, following which we have the corresponding labels $\{L^{e}_{li1}, L^{e}_{li2}, \ldots, L^{e}_{li5}\}$. Subsequently, we calculate the fraction of utterances belonging to the emotion class $l$, denoted by

$$p^{e}_{li} = \frac{1}{5}\sum_{k=1}^{5} \mathbb{1}\left[L^{e}_{lik} = l\right]$$

That is, $p^{e}_{li}$ is the proportion of the 5 retrieved utterances $\{T^{e}_{li1}, T^{e}_{li2}, \ldots, T^{e}_{li5}\}$ that have the same label as the original sample. These fractions are averaged over the $N_l$ sample utterances of an emotion class $l$, producing a metric corresponding to each emotion class and output encoding, given by

$$P^{e}_{l} = \frac{1}{N_l}\sum_{i=1}^{N_l} p^{e}_{li}$$
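A minimal sketch of computing these two quantities from the retrieved neighbours' labels (function and variable names are illustrative):

```python
# Sketch: computing p_li and P_l from retrieved neighbours' labels (illustrative).
import numpy as np

def retrieval_purity(sample_label: str, neighbour_labels: list) -> float:
    """p_li: fraction of the retrieved utterances sharing the sample's label."""
    return sum(lab == sample_label for lab in neighbour_labels) / len(neighbour_labels)

def class_metric(samples: list) -> float:
    """P_l: mean p_li over all samples of one emotion class.

    `samples` is a list of (sample_label, neighbour_labels) pairs for class l."""
    return float(np.mean([retrieval_purity(lab, nbrs) for lab, nbrs in samples]))
```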
Table 8: Fraction of top 5 utterances belonging to the same emotion class as the annotated utterance, averaged over each class in the evaluation set.

  Encoding e   Happy   Sad     Angry
  E_GloVe      0.358   0.507   0.678
  E_SSWE       0.325   0.494   0.639
  E_SS-BED     0.504   0.546   0.689

Table 8 shows a comparison of the different output encodings across the three emotion classes. We observe that SS-BED's output encoding gives the best metric across all emotion classes. This indicates that by combining semantic and sentiment features in SS-BED, we are able to generate a better representation of input utterances in the output encoding space as compared to the representations from LSTM-GloVe and LSTM-SSWE.

Table 9: Sample conversations indicating challenges in the Happy emotion class.

  #1  User 1: Good morning! weekend :)  |  User 2: Good morning. :) :) :)  |  User 1: Happy Morning
  #2  User 1: What r the birthday plans? ;)  |  User 2: going to hills with friends.  |  User 1: Oh great!
  #3  User 1: I had a match today.  |  User 2: And did you win?  |  User 1: Yes!! And I am super happy :)
5.3. Discussion on Ambiguity in the Happy Class

On comparing the F1 scores of the models in Table 6, we observe that the Happy emotion class performs significantly worse than the other emotion classes. We found inter-judge agreement to be particularly low for the Happy emotion class, which indicates variation in how a user utterance is interpreted by different human judges. In example #1 of Table 9, User 1's second utterance is interpreted as Happy by some judges and as just a greeting by other judges, who mark it as Others. Similarly, in example #2, User 1's second utterance is considered a comment by some judges and a happy statement by others, due to the keyword "great". In example #3, however, User 1 is visibly happy, and the utterance is marked Happy by most judges. We thus believe that predicting utterances for the Happy class on the basis of textual conversation alone is a challenging problem, and hence understanding context becomes even more important for this class.

6. Conclusion

In this paper, we discuss the problem of understanding emotions in text by machines. To be able to anticipate human needs, emotions must be deeply understood by machines and computers, as understanding and expressing emotion is a key element of human behavior. Detecting emotions helps in the modulation and regulation of responses for real-world chat-bots and other textual-dialogue based applications. For this problem, we harness the power of deep learning and big data and propose a Deep Learning based approach called the "Sentiment and Semantic Based Emotion Detector (SS-BED)". This approach harnesses both sentiment and semantic based features for more accurately predicting user emotions from their utterances. Evaluation on real world textual dialogues shows that our approach significantly outperforms baseline approaches proposed in the literature, as well as off-the-shelf deep learning and feature engineering based machine learning models.

7. Future Work

As part of our future work, we plan to extend this approach to detect more emotion classes such as Surprise, Fear and Disgust. Currently, our model is limited by the fact that it does not train on the context of the dialogue. We plan to train models that also take the dialogue context into account besides the current user utterance.

References

[1] J. Thilmany, The emotional robot: cognitive computing and the quest for artificial intelligence, www.ncbi.nlm.nih.gov/pmc/articles/PMC2247377/ (2007).

[2] A. S. Miner, A. Milstein, S. Schueller, R. Hegde, C. Mangurian, E. Linos, Smartphone-based conversational agents and responses to questions about mental health, interpersonal violence, and physical health, JAMA Internal Medicine Vol. 176, pages 619–625.

[3] R. Plutchik, The psychology and biology of emotion, New York, NY, US: HarperCollins College Publishers, 1994.

[4] A. R. Hochschild, The sociology of emotion as a way of seeing, in: Emotions in Social Life, Routledge, 2002, pp. 31–44.

[5] R. D. Lane, S. Lee, R. Reidel, V. Weldon, A. Kaszniak, G. E. Schwartz, Impaired verbal and nonverbal emotion recognition in alexithymia, Psychosomatic Medicine 58 (3) (1996) 203–210.

[6] P. Ekman, An argument for basic emotions, Cognition & Emotion Vol. 6, pages 169–200.

[7] R. Plutchik, H. Kellerman, Emotion: Theory, Research and Experience, Academic Press, New York, 1986.

[8] S.-H. Wang, P. Phillips, Z.-C. Dong, Y.-D. Zhang, Intelligent facial emotion recognition based on stationary wavelet entropy and jaya algorithm, Neurocomputing 272 (2018) 668–676.

[9] Y.-D. Zhang, Z.-J. Yang, H.-M. Lu, X.-X. Zhou, P. Phillips, Q.-M. Liu, S.-H. Wang, Facial emotion recognition based on biorthogonal wavelet entropy, fuzzy support vector machine, and stratified cross validation, IEEE Access 4 (2016) 8375–8385.

[10] A. Balahur, J. M. Hermida, A. Montoyo, Detecting implicit expressions of sentiment in text based on commonsense knowledge, in: Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, ACL, 2011, pp. 53–60.

[11] F.-R. Chaumartin, Upar7: A knowledge-based system for headline sentiment tagging, in: Proceedings of the 4th International Workshop on Semantic Evaluations, ACL, 2007, pp. 422–425.

[12] Z. Kozareva, B. Navarro, S. Vázquez, A. Montoyo, Ua-zbsa: a headline emotion classification through web information, in: Proceedings of the 4th International Workshop on Semantic Evaluations, ACL, 2007, pp. 334–337.

[13] C. Strapparava, R. Mihalcea, Learning to identify emotions in text, in: 2008 ACM Symposium on Applied Computing, 2008, pp. 1556–1560.

[14] M. D. Sykora, T. Jackson, A. O'Brien, S. Elayan, Emotive ontology: Extracting fine-grained emotions from terse, informal messages.

[15] C. Strapparava, A. Valitutti, et al., WordNet Affect: an affective extension of WordNet, in: The 4th International Conference on Language Resources and Evaluation, Vol. 4, 2004, pp. 1083–1086.

[16] A. Esuli, F. Sebastiani, SentiWordNet: A high-coverage lexical resource for opinion mining, Evaluation (2007) 1–26.

[17] H. Yenala, M. Chinnakotla, J. Goyal, Convolutional bi-directional LSTM for detecting inappropriate query suggestions in web search, in: PAKDD, Springer, 2017, pp. 3–16.

[18] A. K. Sangaiah, A. E. Fakhry, M. Abdel-Basset, I. El-henawy, Arabic text clustering using improved clustering algorithms with dimensionality reduction, Cluster Computing (2018) 1–15.

[19] H.-T. Zheng, Z. Wang, W. Wang, A. K. Sangaiah, X. Xiao, C. Zhao, Learning-based topic detection using multiple features, Concurrency and Computation: Practice and Experience 30 (15).

[20] M. Hasan, E. Agu, E. Rundensteiner, Using hashtags as labels for supervised learning of emotions in twitter messages, in: ACM SIGKDD Workshop on Health Informatics, New York, USA, 2014.

[21] M. Hasan, E. Rundensteiner, E. Agu, Emotex: Detecting emotions in twitter messages.

[22] M. Purver, S. Battersby, Experimenting with distant supervision for emotion classification, in: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, ACL, 2012, pp. 482–491.

[23] J. Suttles, N. Ide, Distant supervision for emotion classification with discrete binary values, in: International Conference on Intelligent Text Processing and Computational Linguistics, Springer, 2013, pp. 121–136.

[24] W. Wang, L. Chen, K. Thirunarayan, A. P. Sheth, Harnessing twitter "big data" for automatic emotion identification, in: Privacy, Security, Risk and Trust, 2012 International Conference on Social Computing, IEEE, 2012, pp. 587–592.

[25] C. O. Alm, D. Roth, R. Sproat, Emotions from text: machine learning for text-based emotion prediction, in: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, ACL, 2005, pp. 579–586.

[26] R. C. Balabantaray, M. Mohammad, N. Sharma, Multi-class twitter emotion classification: A new approach, International Journal of Applied Information Systems Vol. 4, pages 48–53.

[27] D. Davidov, O. Tsur, A. Rappoport, Enhanced sentiment learning using twitter hashtags and smileys, in: Proceedings of the 23rd International Conference on Computational Linguistics: Posters, ACL, 2010, pp. 241–249.

[28] F. Kunneman, C. Liebrecht, A. van den Bosch, The (un)predictability of emotional hashtags in twitter, in: European Chapter of the Association for Computational Linguistics, 2014, pp. 26–34.

[29] J. L. S. Yan, H. R. Turtle, Exploring fine-grained emotion detection in tweets, in: The North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016, pp. 73–80.

[30] L. Canales, P. Martínez-Barco, Emotion detection from text: A survey, Processing in the 5th Information Systems Research Working Days (JISIC).

[31] S. Vosoughi, H. Zhou, D. Roy, Enhanced twitter sentiment classification using contextual information, in: 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, 2015.

[32] V. Chernykh, G. Sterling, P. Prihodko, Emotion recognition from speech with recurrent neural networks, arXiv preprint arXiv:1701.08071.

[33] T. Danisman, A. Alpkocak, Emotion classification of audio signals using ensemble of support vector machines, in: International Tutorial and Research Workshop on Perception and Interactive Technologies for Speech-Based Systems, 2008, pp. 205–216.

[34] P. R. Dachapally, Facial emotion detection using convolutional neural networks and representational autoencoder units, arXiv preprint arXiv:1706.01509.

[35] S. Mundra, A. Sen, M. Sinha, S. Mannarswamy, S. Dandapat, S. Roy, Fine-grained emotion detection in contact center chat utterances, in: Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, 2017, pp. 337–349.

[36] M. Abdul-Mageed, L. Ungar, EmoNet: Fine-grained emotion detection with gated recurrent neural networks, in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vol. 1, 2017, pp. 718–728.

[37] C. Strapparava, R. Mihalcea, Annotating and identifying emotions in text, in: Intelligent Information Access, 2010, pp. 21–38.

[38] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation Vol. 9, pages 1735–1780.

[39] M. Schuster, K. K. Paliwal, Bidirectional recurrent neural networks, IEEE Transactions on Signal Processing Vol. 45, pages 2673–2681.

[40] N. Liang, H.-T. Zheng, J.-Y. Chen, A. K. Sangaiah, C.-Z. Zhao, TRSDL: Tag-aware recommender system based on deep learning - intelligent computing systems, Applied Sciences 8 (5) (2018) 799.

[41] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[42] Y. Kim, Convolutional neural networks for sentence classification, arXiv preprint arXiv:1408.5882.

[43] A. Prakash, C. Brockett, P. Agrawal, Emulating human conversations using convolutional neural network-based IR, arXiv preprint arXiv:1606.07056.

[44] S. M. Zahiri, J. D. Choi, Emotion detection on TV show transcripts with sequence-based convolutional neural networks, arXiv preprint arXiv:1708.04299.

[45] M. Köper, E. Kim, R. Klinger, IMS at EmoInt-2017: emotion intensity prediction with affective norms, automatically extended resources and deep learning, in: WASSA, 2017, pp. 50–57.

[46] P. Li, J. Li, F. Sun, P. Wang, Short text emotion analysis based on recurrent neural network, in: Proceedings of the 6th International Conference on Information Engineering, ACM, 2017.

[47] B. Felbo, A. Mislove, A. Søgaard, I. Rahwan, S. Lehmann, Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm, in: EMNLP, 2017, pp. 1616–1626.

[48] H. Palangi, L. Deng, Y. Shen, J. Gao, X. He, J. Chen, X. Song, R. Ward, Deep sentence embedding using long short-term memory networks: Analysis and application to information retrieval, IEEE/ACM Transactions on Audio, Speech and Language Processing Vol. 24, pages 694–707.

[49] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, J. Dean, Distributed representations of words and phrases and their compositionality, in: Advances in Neural Information Processing Systems, 2013, pp. 3111–3119.

[50] J. Pennington, R. Socher, C. D. Manning, GloVe: Global vectors for word representation, in: EMNLP, Vol. 14, 2014, pp. 1532–1543.

[51] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of tricks for efficient text classification, arXiv preprint arXiv:1607.01759.

[52] D. Tang, F. Wei, N. Yang, M. Zhou, T. Liu, B. Qin, Learning sentiment-specific word embedding for twitter sentiment classification, in: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014, pp. 1555–1565.

[53] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: A simple way to prevent neural networks from overfitting, The Journal of Machine Learning Research 15 (1) (2014) 1929–1958.

[54] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016.

[55] C. Strapparava, R. Mihalcea, SemEval-2007 task 14: Affective text, in: Proceedings of the 4th International Workshop on Semantic Evaluations, Association for Computational Linguistics, 2007, pp. 70–74.

[56] S. M. Mohammad, F. Bravo-Marquez, WASSA-2017 shared task on emotion intensity, arXiv preprint arXiv:1708.03700.

[57] P. E. Shrout, J. L. Fleiss, Intraclass correlations: uses in assessing rater reliability, Psychological Bulletin Vol. 86, page 420.

[58] C. Cortes, V. Vapnik, Support-vector networks, Machine Learning Vol. 20, pages 273–297.

[59] J. Friedman, T. Hastie, R. Tibshirani, The Elements of Statistical Learning, Springer Series in Statistics, Springer, Berlin, 2001.

[60] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al., Scikit-learn: Machine learning in python, Journal of Machine Learning Research Vol. 12, pages 2825–2830.

[61] A. Agrawal, A. An, Unsupervised emotion detection from text using semantic and syntactic relations, in: Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology, 2012, pp. 346–353.

[62] Q. McNemar, Psychological Statistics, Wiley, New York, 1969.

[63] I. T. Jolliffe, Principal component analysis and factor analysis, in: Principal Component Analysis, Springer, 1986, pp. 115–128.

Acknowledgements

We thank Balakrishnan Santhanam, Jaron Lochner and Rajesh Patel for their support in crowdsourcing judgments. We also thank Oussama Elachqar, Chris Brockett, Niranjan Nayak and Kedhar Nath Narahari for helpful brainstorming sessions and comments. Finally, we are grateful to Abhay Prakash and Meghana Joshi for their constant support and guidance.

Highlights

1. Emotion detection in text finds several practical applications, such as the modulation of responses for real-world chat-bots.
2. Combining sentiment and semantic information in text improves an emotion detection system.
3. Our approach learns diverse ways of expressing emotions and significantly outperforms methods described in the literature.