
Sarcasm Identification and Detection in Conversion Context using BERT

Kalaivani A, Thenmozhi D
Department of Computer Science and Engineering,
SSN College of Engineering (Autonomous),
Affiliated to Anna University, Tamilnadu, India.

Proceedings of the Second Workshop on Figurative Language Processing, pages 72–76, July 9, 2020. © 2020 Association for Computational Linguistics. https://doi.org/10.18653/v1/P17

Abstract

Sarcasm analysis in user conversation text is the automatic detection of irony, insults, and hurtful, painful, caustic, humorous or vulgar remarks that degrade an individual. It is helpful in the fields of sentiment analysis and cyberbullying detection. With the immense growth of social media, sarcasm analysis helps keep insults, hurt and humour from affecting someone. In this paper, we present traditional machine learning approaches, a deep learning approach (RNN-LSTM) and BERT (Bidirectional Encoder Representations from Transformers) for identifying sarcasm. We used these approaches to build models, to identify and categorize how much conversation context or response is needed for sarcasm detection, and evaluated them on two social media forums: the Twitter conversation dataset and the Reddit conversation dataset. We compare the performance of the approaches and obtained best F1 scores of 0.722 and 0.679 for the Twitter and Reddit forums, respectively.

1 Introduction

Social media have shown rapid growth in user counts and have been the object of scientific study and sentiment analysis, as in (Kalaivani A and Thenmozhi D, 2018). Sarcasm occurs frequently in user-generated content such as blogs, forums and microposts, especially in English, and is inherently difficult to analyze, not only for a machine but even for a human. Sarcasm analysis is useful for several applications such as sentiment analysis, opinion mining, hate speech identification, offensive and abusive language detection, advertising and cyberbullying detection. (Debanjan Ghosh et al., 2018) investigated how much context is needed to determine whether a conversation is sarcastic, analysing verbal irony in tweets using LSTM networks with several different attention mechanisms; their models still struggled with slang, rhetorical questions, numbers and out-of-vocabulary words. In recent years, many research works on sarcasm detection have been carried out in the Natural Language Processing community (Aditya Joshi et al., 2017).

Figurative Language 2020 Task 2 is a shared task on sarcasm detection in social media forums. It focuses on identifying whether a given conversation text is sarcastic and on finding how much context helps sarcasm identification, with each instance modelled either in isolation or combined with its context. It covers two social media forums: a Twitter conversation dataset and a Reddit conversation dataset (Khodak et al., 2017). For both datasets, the organizers provide the context and the response, where the response is a reply to the context and the context is the full dialogue conversation thread. The computational task is to detect sarcasm and to understand how much conversation context is needed or helpful for sarcasm detection.

The challenges of this shared task include: a) the small dataset, which makes complex models hard to train; b) the characteristics of the language on social media forums, such as out-of-vocabulary words and ungrammatical context; and c) how much conversation text is needed to detect sarcasm, together with the usage of slang, rhetorical questions, capitalized words, numbers, abbreviations, prolonged words, hashtags, URLs, repeated punctuation, contractions and words written continuously without spaces.

We address the problems of hashtags, run-together words without spaces and URLs, and classify which context is helpful for finding sarcasm. To do so, we pre-processed the text using libraries such as NLTK and Gensim, classified it using different traditional machine learning techniques and a deep learning technique, and finally obtained the
best results using BERT models. Each task is evaluated independently with the macro-F1 metric.

2 Related Work

(Aniruddha Ghosh and Tony Veale, 2016) used a neural network semantic model to capture temporal text patterns in short texts. For example, their model correctly classified "I Just Love Mondays!" as sarcasm, but it failed to classify "Thank God It's Monday!" as sarcasm, even though both are similar at the conceptual level. (Keith Cortis et al., 2017) worked on the SemEval-2017 shared task to detect sentiment and humour and to predict the sentiment scores of companies' stocks in short texts. (Raj Kumar Gupta and Yinping Yang, 2017) participated in SemEval-2017 Task 4, detecting sarcasm with an SVM-based classifier; they developed CrystalNest to analyse features combining a derived sarcasm score, sentiment scores, the NRC lexicon, n-grams, word embedding vectors and part-of-speech features.

(David Bamman and Noah A. Smith, 2015) used predictive features and analysed utterances on Twitter based on properties of the author, the audience and the environment. (Mondher Bouazizi and Tomoaki Otsuki, 2016) used a pattern-based approach to detect sarcasm, analysing four feature families (sentiment-related, punctuation-related, syntactic and semantic, and pattern-related features), with classification done by Random Forest, Support Vector Machine, k-Nearest Neighbours and Maximum Entropy classifiers. (Meishan Zhang et al., 2016) used a bidirectional gated recurrent neural network and a discrete model to detect sarcasm, analysing local and contextual information with GloVe word embeddings. (Malave N et al., 2020) used a context-based evaluation of the data to determine user behaviour and context information for detecting sarcasm. (Yitao Cai et al., 2019) used a multi-modal hierarchical fusion model to detect multi-modal sarcasm in Twitter posts consisting of text and images.

3 Data and Methodology

In our approach, we used the Twitter and Reddit datasets provided by the Figurative Language Processing 2020 shared task on sarcasm detection. The datasets have the columns label, context and response, where the response is the reply to the context and the context is the full conversation dialogue, split into turns C1, C2, C3, etc.: C2 is the reply to C1, and C3 is the reply to C2. Both datasets use the labels SARCASM and NOT_SARCASM. In the Twitter dataset, the training data has 5000 conversations (2500 sarcastic and 2500 not sarcastic) and the test data has 1800 tweets. In the Reddit dataset, the training data has 4400 conversations (2200 sarcastic and 2200 not sarcastic) and the test data has 1800 instances.

We pre-processed the text by removing @USER mentions, URLs and prolonged words such as "ohhhhhh"; replacing masked words such as "F * * king" with "Fucking"; expanding contractions such as "Didn't" to "Did not"; removing hashtags; and splitting run-together words in sentences written without spaces. A tweet tokenizer is used to tokenize the words and build the vocabulary.

Figure 1: Sarcastic words

Figure 2: Not sarcastic words

We employed traditional machine learning techniques, a Recurrent Neural Network with LSTM (RNN-LSTM) and BERT. In the
machine learning approach, we first used the utterance of the combined context and response (CR) for detecting sarcasm, and pre-processed the data with the Gensim library to remove hashtags, punctuation, extra white space, numeric content and stop words and to convert the text to lower case. We used word clouds to identify and categorize the most frequent words appearing in sarcastic and non-sarcastic messages, as shown in Figure 1 and Figure 2.

We used the Doc2Vec transformer and the Tfidf Vectorizer for feature extraction, and classified with Logistic Regression (LR), Random Forest Classifier (RF), XGBoost Classifier (XGB), Linear Support Vector Machine (SVC) and Gaussian Naïve Bayes (NB). With the Tfidf Vectorizer, we obtained 28761 features for the 5000 tweets. Table 1 presents the cross-validation accuracies of the different machine learning classifiers on the Twitter data, and Table 2 presents the cross-validation accuracies of the models on the Reddit data.

Model  CR Doc2Vec  CR Tfidf  R Doc2Vec  R Tfidf
LR     0.513       0.7296    0.509      0.7132
RF     0.513       0.6764    0.527      0.7038
XGB    0.534       0.6876    0.533      0.6928
SVC    0.507       0.7212    0.506      0.7016
NB     0.505       0.7394    0.512      0.7106
Table 1: Cross-validation accuracies of the models by feature extraction (Doc2Vec or Tfidf) on the combined context and response (CR) and the isolated response (R) – Twitter data

Model  CR Doc2Vec  CR Tfidf  R Doc2Vec  R Tfidf
LR     0.5061      0.552     0.497      0.597
RF     0.4947      0.539     0.505      0.564
XGB    0.4965      0.565     0.500      0.582
SVC    0.5029      0.538     0.493      0.587
NB     0.4977      0.549     0.493      0.595
Table 2: Cross-validation accuracies of the models by feature extraction (Doc2Vec or Tfidf) on the combined context and response (CR) and the isolated response (R) – Reddit data

On the Twitter data, we chose the models whose cross-validation accuracies were above 0.70. Based on these scores, the best accuracies were obtained by the SVM, logistic regression and NB classifiers on the combined context text (CR) with the Tfidf Vectorizer, and by the logistic regression and Gaussian NB models on the isolated response (R) text with the Tfidf Vectorizer. On the Reddit data, we chose the models whose cross-validation accuracies were above 0.55. Based on these scores, the best accuracies were obtained by logistic regression and the XGBoost classifier on the combined text (CR) with the Tfidf Vectorizer, and by logistic regression and Gaussian NB on the isolated response text (R) with the Tfidf Vectorizer. On both datasets, the Doc2Vec transformer did not perform well because of the non-grammatical sentences, and the Tfidf Vectorizer performed better than Doc2Vec on the dialogue conversation threads.

In the RNN-LSTM method, we used the combined context text with the response, pre-processed it with NLTK, tokenized it with the word tokenizer, lemmatized the words and then removed stop words. The resulting training data has 325382 words in total, with a vocabulary size of 32756 and a maximum sentence length of 568; the test data has 30782 words in total, with a vocabulary size of 8824 and a maximum sentence length of 467. We used the Word2Vec embedding model to embed the words and obtained 32668 unique tokens. We trained the RNN-LSTM model with a batch size of 128 and a dropout of 0.2 for 5 epochs. It reached an accuracy of 0.4890, which is low compared with the machine learning approaches.

For BERT (Devlin et al., 2018), released by the Google research team and known to perform well on many NLP tasks, we used the combined context text, the isolated context and the isolated response as inputs. We trained the BERT uncased model with a batch size of 32, a learning rate of 2e-5 and 3.0 training epochs. Warmup is an initial period in which the learning rate starts small and gradually increases, which usually helps training; we used a warmup proportion of 0.1. The model configuration uses 300 checkpoint steps and 100 summary steps. The model reached an accuracy of 0.77. Comparing all cross-validation accuracy scores, BERT performs better than both the machine learning approaches and the deep learning technique.
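
The tweet preprocessing described earlier in this section (removing @USER mentions, URLs, hashtags and prolonged words, and expanding contractions) can be sketched with Python regular expressions. This is a minimal illustration under stated assumptions, not the authors' code: the small contraction table, the "<URL>" placeholder pattern and the rule that a prolonged word is any token with one character repeated three or more times are all assumptions, and the masked-profanity replacement and the splitting of run-together words (which usually needs a word-segmentation dictionary) are left out.

```python
import re

# Hypothetical contraction table: the paper only gives "Didn't" -> "Did not" as an example.
CONTRACTIONS = {
    "didn't": "did not",
    "don't": "do not",
    "can't": "can not",
    "it's": "it is",
}

USER_RE = re.compile(r"@USER\b")
# Matches raw links and an assumed "<URL>" placeholder token.
URL_RE = re.compile(r"https?://\S+|<URL>")
HASHTAG_RE = re.compile(r"#\w+")
# Assumed rule: a "prolonged" token repeats one word character three or more times.
PROLONGED_RE = re.compile(r"(\w)\1{2,}")
CONTRACTION_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(c) for c in CONTRACTIONS) + r")(?!\w)",
    flags=re.IGNORECASE,
)


def expand_contractions(text):
    """Expand known contractions, keeping a leading capital ("Didn't" -> "Did not")."""
    def repl(match):
        word = match.group(0)
        expanded = CONTRACTIONS[word.lower()]
        return expanded.capitalize() if word[0].isupper() else expanded

    return CONTRACTION_RE.sub(repl, text)


def preprocess(tweet):
    """Apply the cleaning steps in order and collapse the remaining whitespace."""
    text = USER_RE.sub(" ", tweet)
    text = URL_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = expand_contractions(text)
    # Drop prolonged tokens such as "ohhhhhh" entirely, as the paper describes.
    tokens = [tok for tok in text.split() if not PROLONGED_RE.search(tok)]
    return " ".join(tokens)
```

For example, preprocess("@USER Didn't see that coming ohhhhhh http://t.co/x #sarcasm") returns "Did not see that coming".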

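The Tfidf Vectorizer used for the traditional classifiers in this section maps each utterance to a sparse vector with one dimension per vocabulary term kept by the vectorizer, which is why the feature count (28761 for the Twitter training tweets) equals the vocabulary size. The stdlib-only sketch below computes TF-IDF weights with the smoothed IDF that scikit-learn's TfidfVectorizer uses by default, idf(t) = ln((1 + n) / (1 + df(t))) + 1, with raw term counts and L2-normalised rows. The paper does not state its vectorizer settings, so the lower-case whitespace tokenisation here is a simplifying assumption.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, rows); each row maps term -> L2-normalised TF-IDF weight.

    Uses smoothed IDF, idf(t) = ln((1 + n) / (1 + df(t))) + 1, with raw term counts
    as TF. Tokenisation (lower-case, split on whitespace) is a simplifying assumption.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocabulary}

    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        weights = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # Empty documents get an empty row instead of dividing by zero.
        rows.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vocabulary, rows


docs = ["I just love Mondays", "I love long meetings", "Mondays again"]
vocabulary, rows = tfidf(docs)
print(len(vocabulary))  # 7: one feature per distinct term
```

Rarer terms receive higher weights: in the first document, "just" (one document) outweighs "love" (two documents) although each occurs once.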
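
The BERT warmup described at the end of this section can be made concrete with a little arithmetic. The sketch below assumes the Twitter training set (5000 conversations), counts a final partial batch in every epoch, and decays the learning rate linearly to zero after warmup. The paper states only the batch size (32), learning rate (2e-5), epochs (3) and warmup proportion (0.1), so the step-counting convention and the decay shape are assumptions; linear decay is a common choice when fine-tuning BERT.

```python
import math


def bert_schedule(num_examples, batch_size=32, epochs=3, warmup_proportion=0.1):
    """Return (total_steps, warmup_steps); each epoch counts its final partial batch."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * epochs
    warmup_steps = int(total_steps * warmup_proportion)
    return total_steps, warmup_steps


def learning_rate(step, total_steps, warmup_steps, peak_lr=2e-5):
    """Linear warmup from 0 to peak_lr, then an assumed linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Twitter training set size from Section 3; batch size, epochs and warmup from the paper.
total, warmup = bert_schedule(5000)
print(total, warmup)  # 471 47
```

Under these assumptions, training runs 471 optimisation steps, of which the first 47 ramp the learning rate up to 2e-5.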

4 Results

We evaluated our models on the Twitter and Reddit test data shared by the Figurative Language Processing 2020 shared task organizers. Performance is measured with precision, recall and F1 score. We chose the classifiers used to predict the test data based on their cross-validation performance on the training data, and predicted the test data using various combinations of conversation context and response: CR is the combined context sentences with the response; C is the combined full context sentences without the response; PCRW is the processed combined context of meaningful words with the response; PCW is the combined full context of meaningful words without the response; PC1RW is the processed isolated first context of meaningful words with the response; PC1W is the isolated first context of meaningful words without the response; R is the response; PC2R is the processed second context with the response; and PR is the processed response.

Type          Precision  Recall  F1 score
BERT(CR)      0.672      0.673   0.671
BERT(C)       0.695      0.701   0.693
BERT(PCRW)    0.704      0.705   0.703
BERT(PCW)     0.703      0.703   0.703
BERT(PC1RW)   0.677      0.678   0.677
BERT(PC1W)    0.689      0.690   0.689
RNN-LSTM(CR)  0.361      0.361   0.361
BERT(R)       0.722      0.722   0.722
BERT(PC2R)    0.658      0.685   0.645
BERT(PR)      0.706      0.706   0.706
SVM(CR)       0.646      0.647   0.646
NB(CR)        0.672      0.672   0.672
NB(R)         0.632      0.632   0.632
LR(R)         0.642      0.643   0.642
Table 3: Results for Twitter Dataset

Type          Precision  Recall  F1 score
BERT(C)       0.587      0.589   0.585
BERT(CR)      0.493      0.492   0.477
BERT(R)       0.679      0.679   0.679
BERT(PR)      0.638      0.638   0.637
LR(CR)        0.526      0.526   0.526
LR(R)         0.563      0.564   0.563
NB(R)         0.557      0.557   0.557
SVC(R)        0.551      0.551   0.550
XGB(R)        0.539      0.543   0.528
SVC(CR)       0.516      0.516   0.516
XGB(CR)       0.544      0.544   0.544
Table 4: Results for Reddit Dataset

Figure 3: Results analysis for Twitter Dataset (chart of precision, recall and F1 score for RNN-LSTM(CR), BERT(R), SVM(CR), NB(CR) and LR(R))

Figure 4: Results analysis for Reddit Dataset (chart of precision, recall and F1 score for BERT(R), LR(R), NB(R), SVC(R) and XGB(CR))

The results in Table 3 show that, for the Twitter dataset, BERT on the response text of the conversation dialogue outperforms the other configurations, and Table 4 shows that BERT on the response text of the dialogue thread also performs well for the Reddit dataset. The best results for both the Twitter and Reddit datasets were obtained with the BERT model on the isolated response (R) text. We noticed that BERT performs better on continuous conversation dialogue, i.e. full sentences from previous dialogue turns, than on the meaningful words extracted from the conversation context. On both datasets, the RNN-LSTM performs worse than SVM, NB and LR because of the small dataset; the machine learning approaches cope better with small datasets. However, the BERT model performs
well on the response text of both the Twitter and Reddit datasets, despite the non-grammatical sentences and the small data size. Figure 3 charts the performance of the different methods on the Twitter data, and Figure 4 charts their performance on the Reddit data.

5 Conclusion

We implemented traditional machine learning, a deep learning approach and a BERT model for identifying sarcasm in conversation dialogue threads and detecting sarcasm on social media. The approaches are evaluated on the Figurative Language 2020 dataset. The given utterances of combined and isolated text are pre-processed and vectorized using word embeddings for the deep learning model, and we employed RNN-LSTM to build the model for both datasets. For the traditional machine learning models, the instances are vectorized using Doc2Vec and TF-IDF scores, and the classifiers Logistic Regression (LR), Random Forest Classifier (RF), XGBoost Classifier (XGB), Linear Support Vector Machine (SVC) and Gaussian Naïve Bayes (NB) were employed to build models for both the Twitter and Reddit datasets. The BERT uncased model with the isolated response context gives the best results on both datasets. Performance may be improved further by using larger datasets.

References

Joshi, A., Bhattacharyya, P., and Carman, M. J. 2017. Automatic sarcasm detection: A survey. ACM Computing Surveys (CSUR), 50(5), 73.

Ghosh, D., Fabbri, A. R., and Muresan, S. 2018. Sarcasm analysis using conversation context. Computational Linguistics, 44(4), 755–792.

Khodak, M., Saunshi, N., and Vodrahalli, K. 2017. A large self-annotated corpus for sarcasm. arXiv preprint arXiv:1704.05579.

Ghosh, A., and Veale, T. 2016. Fracking Sarcasm using Neural Network. ResearchGate publication, conference paper. DOI: 10.13140/RG.2.2.16560.15363.

Cortis, K., Freitas, A., Daudert, T., Hurlimann, M., Zarrouk, M., Handschuh, S., and Davis, B. 2017. SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News. In Proceedings of the 11th International Workshop on Semantic Evaluations, pages 519–535. Association for Computational Linguistics.

Gupta, R. K., and Yang, Y. 2017. CrystalNest at SemEval-2017 Task 4: Using Sarcasm Detection for Enhancing Sentiment Classification and Quantification. ACM.

Bamman, D., and Smith, N. A. 2015. Contextualized Sarcasm Detection on Twitter. Association for the Advancement of Artificial Intelligence (www.aaai.org).

Bouazizi, M., and Otsuki (Ohtsuki), T. 2016. A Pattern-Based Approach for Sarcasm Detection on Twitter. IEEE Access. DOI: 10.1109/ACCESS.2016.2594194.

Kalaivani, A., and Thenmozhi, D. 2019. Sentimental Analysis using Deep Learning Techniques. International Journal of Recent Technology and Engineering, ISSN: 2277-3878.

Zhang, M., Zhang, Y., and Fu, G. 2016. Tweet Sarcasm Detection Using Deep Neural Network. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 2449–2460.

Malave, N., and Dhage, S. N. 2020. Sarcasm Detection on Twitter: User Behavior Approach. In Thampi, S. et al. (eds), Intelligent Systems, Technologies and Applications. Advances in Intelligent Systems and Computing, vol. 910. Springer, Singapore. DOI: 10.1007/978-981-13-6095-4_5.

Cai, Y., Cai, H., and Wan, X. 2019. Multi-Modal Sarcasm Detection in Twitter with Hierarchical Fusion Model. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2506–2515. Association for Computational Linguistics.

Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805.
