
Sarcasm detection in online comments using machine learning

Daniel Šandor and Marina Bagić Babac
Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia

Abstract
Purpose – Sarcasm is a linguistic expression that usually carries the opposite meaning of what is being said by words, thus making it difficult for
machines to discover the actual meaning. It is mainly distinguished by the inflection with which it is spoken, with an undercurrent of irony, and is
largely dependent on context, which makes it a difficult task for computational analysis. Moreover, sarcasm expresses negative sentiments using
positive words, allowing it to easily confuse sentiment analysis models. This paper aims to demonstrate the task of sarcasm detection using the
approach of machine and deep learning.
Design/methodology/approach – For the purpose of sarcasm detection, machine and deep learning models were used on a data set consisting of
1.3 million social media comments, including both sarcastic and non-sarcastic comments. The data set was pre-processed using natural language
processing methods, and additional features were extracted and analysed. Several machine learning models, including logistic regression, ridge
regression, linear support vector and support vector machines, along with two deep learning models based on bidirectional long short-term memory
and one bidirectional encoder representations from transformers (BERT)-based model, were implemented, evaluated and compared.
Findings – The performance of machine and deep learning models was compared in the task of sarcasm detection, and possible ways of
improvement were discussed. Deep learning models showed more promise, performance-wise, for this type of task. Specifically, a state-of-the-art
model in natural language processing, namely, a BERT-based model, outperformed the other machine and deep learning models.
Originality/value – This study compared the performance of the various machine and deep learning models in the task of sarcasm detection using
the data set of 1.3 million comments from social media.
Keywords Sarcasm, Natural language processing, Machine learning, Deep learning, BERT, Reddit
Paper type Research paper

© Daniel Šandor and Marina Bagić Babac. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode

The current issue and full text archive of this journal is available on Emerald Insight at: https://www.emerald.com/insight/2398-6247.htm

Information Discovery and Delivery, Emerald Publishing Limited [ISSN 2398-6247], [DOI 10.1108/IDD-01-2023-0002]. Received 3 January 2023; Revised 11 March 2023 and 9 June 2023; Accepted 4 July 2023.

1. Introduction

Sarcasm detection, the task of identifying whether a given piece of text is sarcastic or not, has valuable applications in the real world (Davidov et al., 2010). For example, in the world of business and marketing, sarcasm detection is a type of sentiment analysis task that, among other things, can be used for brand monitoring, customer feedback analysis, opinion mining and market research (Puh and Bagic Babac, 2023). Although sarcasm can often cheat standard sentiment analysis models, sarcasm detection can be used to collect truthful information about the general public's view of a product or brand (Riloff et al., 2013).

The problem of sarcasm detection is challenging because it involves a complex interplay of linguistic, pragmatic and contextual factors (Reyes et al., 2012). Sarcasm can be expressed in many ways, ranging from subtle to overt, and can depend on a range of linguistic and contextual cues (Băroiu and Trăușan-Matu, 2022). For example, sarcasm can be conveyed using exaggeration, understatement, irony or parody and can involve a range of linguistic features such as lexical ambiguity, negation and presupposition (Ashwitha et al., 2021).

A variety of machine learning algorithms have been applied to the task of sarcasm detection, including naive Bayes, support vector machines, random forests, recurrent neural networks and convolutional neural networks (CNNs) (Poria et al., 2016). These algorithms work by learning to recognize patterns in text data that are associated with sarcasm (Zhang and Wallace, 2018). On the other hand, the choice of features is an important factor in sarcasm detection (Arora, 2020). Various features have been used for sarcasm detection, including lexical, syntactic and semantic features. Lexical features involve the frequency of certain words or phrases that are often associated with sarcasm, while syntactic features involve the use of parts of speech and other grammatical structures to detect sarcasm (Ghosh et al., 2018). Semantic features involve the use of word embeddings or other techniques to capture the meaning of the text.

In recent years, bidirectional encoder representations from transformers (BERT) has emerged as a state-of-the-art method for various natural language processing tasks (Devlin et al., 2019),

including sarcasm detection. However, one major limitation of BERT is its computational cost. BERT is a large-scale deep learning model with millions of parameters, making it computationally expensive to train and use. It is also a complex model that can be difficult to interpret, which can make it challenging to understand how the model makes predictions or to diagnose errors or biases in the model. Despite these limitations, BERT remains a powerful tool for sarcasm detection. However, it is important to consider these limitations when using BERT and to explore simpler machine learning algorithms that may be more appropriate for certain applications.

Each implementation of machine learning algorithms and neural network architectures has its own choice of hyperparameters. Consequently, there exists a research gap regarding the fine-tuning of parameters in existing algorithms, especially considering the inherently complex task of sarcasm detection. Our contribution in addressing this gap lies in our own adjustment of hyperparameters, the selection of neural network layers and the choice of features. This study shows how relatively simple machine learning algorithms can yield results comparable to more complex ones, which may be sufficient for sarcasm detection in some cases. We designed, implemented and tested our own simple neural architectures that showed promising results. Although the BERT-based model outperformed the other models, this study shows which models can be used in less computationally complex settings. For example, we demonstrated that ridge regression, a less well-known choice for this task, yields promising results.

2. Literature overview

Sarcasm detection is a focused research field in natural language processing with the subject approached in different ways. Semi-supervised pattern extraction, hashtag-based supervision and speech-based analysis are some of the methods used in the past (Reyes et al., 2012). There are also two opposed approaches concerning model selection for the task of classification. Standard machine-learning models require manual feature extraction from data (Ghosh and Veale, 2017). This approach leaves more control to the researcher, but the task of choosing the correct, useful features is usually as difficult as choosing the correct machine learning model. In a study by Bouazizi and Otsuki (2016), features were divided into sentiment-related, punctuation-related, syntax-related and pattern-related features. This extensive feature selection proved effective in machine learning classification models. Deep learning models provide automatic feature extraction. They can potentially discover subtle semantic patterns and capture context. They also require large amounts of data and are highly configurable. Finding the correct model proved to be a difficult task on its own.

Sarcasm detection is mostly used as a subcomponent of sentiment analysis. Sentiment analysis refers to the contextual analysis of text which identifies and extracts subjective information in the source material, commonly related to a business problem (Bagic Babac and Podobnik, 2016). Sarcasm detection is a task that requires large data sets. In the literature, data is mostly collected from social media websites, like Twitter. In the context of sarcasm detection, the annotation of data is a difficult and time-consuming process. Joshi et al. (2016) used two data sets manually labelled by American annotators to analyse how cultural differences impact the quality of annotation. Indian annotators faced difficulties mostly because of a lack of context for the provided data, either concerning the text itself or the abstraction (e.g. the socio-economic situation at the time) the text refers to. Data bias, an imbalance between the amount of sarcastic and non-sarcastic data, also influenced their annotations. The language barrier proved troublesome as well. The alternative to manual annotation is the automatic labelling of data (Musso and Bagic Babac, 2022). Automatic labelling can be done, for example, by searching for "#sarcasm" or similar hashtags. This method can provide data sets with more false positives and false negatives. This study used an automatically labelled data set.

Data collected from social media websites contains text that is written in an informal, colloquial language, so some text pre-processing is required. In a study by Kumar and Anand (2020), data is taken from labelled Twitter and Reddit data sets. To remove the noise from the data, they removed unwanted punctuation, multiple spaces and URL tags, changed different abbreviations to their contracted format, etc. Saha et al. (2017) proposed a model for sentiment analysis of sarcastic tweets that requires data that has been pre-processed in a certain manner, e.g. stop-words were removed, and text was tokenized before each word was assigned its category, i.e. parts of speech tagging.

When using standard machine learning models, feature extraction plays an important role. To extract more context from the data, a set of features should be defined. In a study by Buschmeier et al. (2014), a set of features to determine if a review is ironic or not was created. Some of these features are imbalance, hyperbole and punctuation. They marked a review imbalanced if the given star rating was high, but most words in the text had negative sentiment, and vice versa. The review was marked hyperbolic if there were three or more positive or negative words in a row. The punctuation feature marks the presence of multiple question or exclamation marks (Nayel et al., 2021).

Deep learning models have also been used for sarcasm detection due to their ability to automatically learn complex patterns and representations from raw text data. In the case of deep learning models, model selection is the difficult part (Gupta et al., 2020). Transformer models are a recent development in deep learning and have been shown to be highly effective for a wide range of NLP tasks.

Hazarika et al. (2018) introduced CASCADE, a contextual sarcasm detector that effectively leveraged content and contextual information. By integrating user profiling and discourse modelling with a CNN-based textual model, CASCADE achieved state-of-the-art results on a large-scale Reddit corpus. This research underscored the potential of deep learning in capturing intricate contextual cues for sarcasm detection. Likewise, Pant and Dadu (2020) asserted the significance of context in refining the performance of models based on contextual word embeddings.

Scola and Segura-Bedmar (2021) extended the application of these models beyond social media data. They investigated the use of bidirectional long short-term memory (BiLSTM) models for sarcasm detection in news headlines. This study bridged the gap between social media and traditional news

texts, demonstrating the capability of deep learning models to detect sarcasm across diverse domains.

As deep learning models gained prominence, the introduction of BERT marked a significant breakthrough. Khatri and Pranav (2020) incorporated BERT and GloVe embeddings to detect sarcasm in tweets. By considering both the tweet content and the contextual information, their approach showcased the effectiveness of BERT in capturing the nuanced contextual cues necessary for accurate sarcasm detection. BERT's ability to capture bidirectional dependencies in text revolutionized the field, enabling a more sophisticated understanding of sarcastic language.

Capitalizing on BERT's success, Parameswaran et al. (2021) further fine-tuned BERT models using additional domain-specific data. This approach demonstrated the transferability of BERT's contextual representations across domains, resulting in enhanced performance in sarcasm detection. In addition, Savini and Caragea (2022) investigated the performance improvement brought by BERT-based models, both with and without intermediate task transfer learning, compared to previous works. The study specifically focused on the significance of message content in sarcasm detection, showing that BERT models using only the message content outperformed models leveraging additional information from a writer's history encoded as personality features. This research shed light on the effectiveness of BERT-based models and emphasized the importance of considering message content for accurate sarcasm detection.

Recently, Sharma et al. (2023) incorporated word and phrase embeddings, including BERT, and used fuzzy logic evolutionary techniques to refine classification accuracy. This novel approach aimed to overcome the limitations of traditional deterministic models by introducing fuzzy reasoning, enabling improved handling of uncertainty and ambiguity in sarcasm detection.

These advancements underscore the importance of leveraging contextual information, incorporating domain-specific data (Misra and Arora, 2023) and exploring innovative techniques to better understand and interpret the intricate nature of sarcastic language.

Sarcasm detection depends on data set quality, pre-processing methods and feature engineering, so choosing an appropriate model is not a straightforward process, as it requires careful consideration and evaluation of various factors such as performance metrics, interpretability and computational resources. In addition, the subjective nature of sarcasm makes it challenging to accurately capture and classify in text, further emphasizing the importance of thorough analysis and experimentation in developing effective models for sarcasm detection. This study presents an approach to detecting sarcasm in natural language using machine and deep learning models on a data set of 1.3 million Reddit comments, extracting additional features for analysis. The study confirms the potential of using machine and deep learning techniques for sarcasm detection and provides insights for further improvement.

3. Research methodology

3.1 Exploratory data analysis

The data set [1] used in this study was downloaded from Kaggle, an online community of data scientists and machine learning practitioners. It contains 1.3 million sarcastic statements from the internet commentary forum-like website Reddit (Khodak et al., 2018). The data was collected by extracting comments from Reddit forums that included a tag denoting sarcasm ("\s"). This tagging convention is commonly used by Reddit users to identify sarcastic remarks. Non-sarcastic comments were also scraped to balance the data set. The data set is divided into train and test sets. The training set contains 1,010,826 items of data. Additional features were extracted while scraping the data. Each item contains the following information: the comment, the parent comment to which the comment is related, the author of the comment, the subreddit (a specific forum), the number of upvotes (likes) for the comment, the number of downvotes (dislikes) for the comment, the score (upvotes minus downvotes), the date the comment was posted in YYYY-MM format and in UTC format and, finally, the label that determines whether the comment is sarcastic or not.

Table 1 shows the last five items of data in pandas DataFrame format. pandas is a fast, powerful, flexible open-source data analysis tool built on top of the Python programming language. It is used here for data manipulation and processing.

3.2 Data pre-processing

Text as a representation of language is a formal system that follows certain syntactic and semantic rules. It is complex and difficult to interpret for computational systems. Text pre-processing is an integral part of any natural language processing task (Kampic and Bagic Babac, 2021). It is done to simplify the complex forms of natural text for easier processing by the machine learning model which uses it. The text is cleaned of noise in the form of emoticons, punctuation, letters in a different case, stop words and so on.

In this data set, some items with blank comments were observed. Because the comment text is the primary focus of this paper, 53 items with blank comments were removed. Also, some duplicate items were observed. After the removal of these duplicates, 1,010,745 rows of data remained. In addition, certain faulty score calculations were observed. To ensure the correctness of the data, the score was recalculated as the number of downvotes (downs column) subtracted from the number of upvotes (ups column).

The primary point of analysis in this study was the comments and the parent comments they responded to. This text needed to be further processed before it could be supplied to the machine learning models. Firstly, the short forms of words were decontracted (e.g. "won't" was transformed to "will not", "'m" was transformed to "am", "'ve" was transformed to "have", etc.). This was done by applying several regular expressions using the sub function from the re library available in Python. Next, the return symbol ("\r"), the newline symbol ("\n") and the quote symbol ("\"") were replaced by a single whitespace. Then, the punctuation was removed from the sentences. For this, a constant in the Python string library containing all punctuation signs (!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~) was used. Then, again with the use of the sub function from Python's re library, all non-alphanumerical values were removed. The sentences were then tokenized using the Natural Language Toolkit (NLTK) function word_tokenize.
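As a concrete illustration, a minimal sketch of such a cleaning routine is given below. The paper does not publish its exact regular expressions, so the patterns here are assumptions that follow the description above (decontraction, whitespace replacement, punctuation and non-alphanumeric removal):

```python
import re
import string

# Assumed decontraction patterns, following the examples in the text.
CONTRACTIONS = [
    (r"won't", "will not"),
    (r"can't", "can not"),
    (r"n't", " not"),
    (r"'m", " am"),
    (r"'ve", " have"),
    (r"'re", " are"),
    (r"'ll", " will"),
]

def clean_comment(text: str) -> str:
    text = text.lower()  # lowercasing is done later in the paper's pipeline; applied early here for brevity
    for pattern, replacement in CONTRACTIONS:
        text = re.sub(pattern, replacement, text)
    # Replace return, newline and quote symbols with a single whitespace.
    text = re.sub(r"[\r\n\"]", " ", text)
    # Remove all punctuation signs from string.punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Drop any remaining non-alphanumeric characters and collapse spaces.
    text = re.sub(r"[^a-z0-9 ]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

print(clean_comment("I won't say I've LOVED it..."))  # -> "i will not say i have loved it"
```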

Table 1 Sample data from the used data set

Id | Label | Comment | Author | Subreddit | Score | Ups | Downs | Date | Created UTC | Parent comment
1010821 | 1 | "I'm sure that Iran and N. Korea have the techn. . ." | TwarkMain | reddit.com | 2 | 2 | 0 | 2009-04 | 2009-04-25 00:47:52 | "No one is calling this an engineered pathogen,. . ."
1010822 | 1 | "Whatever you do, don't vote green" | BCHarvey | Climate | 1 | 1 | 0 | 2009-05 | 2009-05-14 22:27:40 | "In a move typical of their recent do-nothing a. . ."
1010823 | 1 | "Perhaps this is an atheist conspiracy to make . . ." | rebelcommander | Atheism | 1 | 1 | 0 | 2009-01 | 2009-01-11 00:22:57 | "Screw the Disabled– I've got to get to Church. . ."
1010824 | 1 | "The Slavs got their own country – it is called. . ." | catsi | Worldnews | 1 | 1 | 0 | 2009-01 | 2009-01-23 21:12:49 | "I've always been unsettled by that. I hear a l. . ."
1010825 | 1 | "Values, as in capitalism there is good mone. . ." | frogking | Politics | 2 | 2 | 0 | 2009-01 | 2009-01-24 06:20:14 | "Why do the people who make our laws seem unabl. . ."

Note: A sample from the public data set on Kaggle.com
Source: The data set is made by Khodak et al. (2018); www.kaggle.com/datasets/danofer/sarcasm
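The row-level cleaning described in Section 3.2 maps directly onto pandas operations. Below is a minimal sketch under the assumption that the Kaggle download is named train-balanced-sarcasm.csv and uses the column names shown in Table 1:

```python
import pandas as pd

# Load the Kaggle data set (file name is an assumption).
df = pd.read_csv("train-balanced-sarcasm.csv")

# Drop items with blank comments, as the comment text is the primary focus.
df = df.dropna(subset=["comment"])
df = df[df["comment"].str.strip() != ""]

# Remove duplicate items.
df = df.drop_duplicates()

# Recalculate the score as downvotes subtracted from upvotes.
df["score"] = df["ups"] - df["downs"]

print(df.tail())  # reproduces a view like Table 1
```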

NLTK is a platform used for building Python programs that work with human language data for application in statistical natural language processing. It contains text-processing libraries for tokenization, parsing, classification, stemming, tagging and semantic reasoning. The function word_tokenize transforms text into a list of words. These words were transformed into lowercase. Stop words were removed from the list of words. Stop words are the most common words in any language (like articles, prepositions, pronouns, etc.). They do not add much information to the text and should be removed so the model has less noise in the data to deal with (Kostelej and Bagic Babac, 2022). Examples of a few stop words in English are "the", "a", "an", "so" and "what". The NLTK library offers English stop words in the form of a list. Then, the words were tagged with their language form, i.e. parts of speech tagging.

The process of classifying words into their parts of speech and labelling them accordingly is known as part-of-speech (POS) tagging, grammatical tagging or simply tagging (Bandhakavi et al., 2017). Parts of speech are also known as word classes or lexical categories (Kumawat and Jain, 2015). The collection of tags used for a particular task is known as a tag set (Bird et al., 2009). A part of speech is a grammatical category, commonly including verbs, nouns, adjectives, adverbs, determiners and so on. In this article, a smaller subset of available tags was chosen. Words were tagged with one of the following tags: noun, verb, adjective or adverb. This was done to increase the performance of word lemmatization. Tokenized words were lemmatized using the NLTK class WordNetLemmatizer. The POS tag was supplied to the lemmatize function. Lemmatization refers to the use of a vocabulary and morphological analysis of words, normally aiming to remove inflectional endings only and to return the base or dictionary form of a word (e.g. "am", "are" and "is" are transformed into "be"), which is known as the lemma (Manning et al., 2008). Finally, lemmatized words were combined into text. For example, the comment "Trick or treating, in general, is just weird [. . .]" was transformed into "trick treat general weird".

The author and subreddit columns are also textual values. They are mostly one-word titles that required simpler processing. The values were transformed into lowercase. Whitespaces were removed. Special signs (dash and underscore) were removed. Finally, dots were replaced with the word "dot". For example, the author "Kvetch_22." was transformed into "kvetch22dot".

3.3 Data processing

The provided data set is balanced, with 505,340 sarcastic comments and 505,450 non-sarcastic comments. The word "I" is by far the most common word in these comments. Sarcasm is often expressed by contradicting emotions in one sentence, and the word "but" is a conjunction used to connect ideas that contrast. The word "but" does not appear among the frequent words in non-sarcastic comments, which could mean the word "but" is a good indicator of sarcasm.

Figure 1 shows the number of comments per score. Both sarcastic and non-sarcastic comments score mostly around 0. Non-sarcastic comments generally have more positive scores. The summed-up scores for the sarcastic comments give a sum of 2,702,923, and the sum of scores for the non-sarcastic rows is 3,002,887.

Figure 1 Number of comments per score

The top three rated sarcastic comments were "I think he was really referring to the vuvuzela", with a score of 4,010; "Jesus, you wonder why you're still single!", with a score of 3,444; and "Yet another thing that men are better at doing", with a score of 3,220. The top three ranked non-sarcastic comments were "Getting pushed back by the impact of a bullet.", with a score of 5,163; "Ah, a nice guy.", with a score of 4,909; and "Does anyone know if Flint has clean water yet?", with a score of 4,609.

Subreddits, in the context of Reddit, are forums on a specific subject (e.g. the data science subreddit). Figure 2 shows the top five subreddits with the most sarcastic comments. Figure 3 shows the top five subreddits with the most non-sarcastic comments. Both sarcastic and non-sarcastic comments are most numerous in the "AskReddit" and "politics" subreddits. None of the sarcastic comments are from the "funny" subreddit, which is surprising because sarcasm is often related to humour.

Figure 2 Subreddits with the most sarcastic comments

Figure 3 Subreddits with the most non-sarcastic comments

3.4 Feature extraction and analysis

Feature extraction refers to the process of transforming raw data into numerical features that can be processed while preserving the information in the original data set (Wang et al., 2018). It yields better results than applying machine learning directly to the raw data. Standard machine learning methods require manual feature extraction. Manual feature extraction requires a good understanding of and insight into the given data. Features should be specific to the studied problem (Brzic et al., 2023). Because standard machine learning models are used in this paper, manual feature extraction had to be done. Feature extraction was done on the unprocessed comments.
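A minimal sketch of the NLTK pipeline described above (tokenization, stop-word removal, POS tagging restricted to the four WordNet classes and lemmatization) might look as follows; the tag-mapping helper is an assumption about how the POS tags were supplied to the lemmatizer:

```python
import nltk
from nltk import pos_tag, word_tokenize
from nltk.corpus import stopwords, wordnet
from nltk.stem import WordNetLemmatizer

for resource in ("punkt", "stopwords", "averaged_perceptron_tagger", "wordnet"):
    nltk.download(resource, quiet=True)

STOP_WORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()

def to_wordnet_tag(treebank_tag: str) -> str:
    """Map a Penn Treebank tag to one of the four WordNet classes."""
    if treebank_tag.startswith("J"):
        return wordnet.ADJ
    if treebank_tag.startswith("V"):
        return wordnet.VERB
    if treebank_tag.startswith("R"):
        return wordnet.ADV
    return wordnet.NOUN

def lemmatize_comment(text: str) -> str:
    words = [w.lower() for w in word_tokenize(text)]
    words = [w for w in words if w.isalnum() and w not in STOP_WORDS]
    tagged = pos_tag(words)
    lemmas = [LEMMATIZER.lemmatize(w, to_wordnet_tag(t)) for w, t in tagged]
    return " ".join(lemmas)

print(lemmatize_comment("Trick or treating, in general, is just weird"))
# -> "trick treat general weird"
```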

The first feature is the uppercase word count (Jurafsky and Martin, 2000). This feature was chosen because capital letters indicate strong arguments and opinions. An assumption was that uppercase letters would be used to indicate strong, negative, sarcastic expressions (e.g. "Yeah, I LOVE to WORK and not have FUNN!"). That is, the assumption was that sarcastic texts would have more uppercase letters (Ren et al., 2020).

Sentiment analysis is a problem tightly related to sarcasm detection (Nayel et al., 2021). All words in the comment text were analysed for sentiment using the VADER (Valence Aware Dictionary and sEntiment Reasoner) sentiment analysis tool. Elbagir and Yang (2019) used VADER for the task of multi-classification of tweets related to the 2016 US election, and it showed good accuracy on that data. VADER is a lexicon- and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media. It was used in this article because the data used is scraped from a social media website.

A compound sentiment score was calculated for each word in a comment, except stop words. This is the most useful metric if a single unidimensional measure of sentiment for a given sentence is required. It is a normalized, weighted composite score. Values span from -1.0 to 1.0, with -1.0 indicating a strong negative sentiment and 1.0 indicating a strong positive sentiment. Words that scored higher than 0.4 were classified as positive. Words that scored lower than -0.4 were classified as negative. Words that scored between -0.4 and 0.4 were classified as neutral. Positive, neutral and negative words were counted for each comment. The assumption was that sarcastic comments would contain more positive and negative words and fewer neutral words than non-sarcastic comments. The reason for that is that sarcasm is often expressed with strong positive or negative statements (Avdic and Bagic Babac, 2021).

Sarcasm is often portrayed by using contrasting statements. Because of this, the absolute difference in sentiment between the comment and its parent comment was calculated. Compound scores for the whole texts were subtracted. The assumption was that sarcastic comments would have a larger difference in polarity with their parent comment than non-sarcastic comments. The compound score for the whole comment text was also recorded.

Sarcasm can also be expressed using certain syntactic features (Ashwitha et al., 2021). The first syntactic feature extracted was the repeated letter count. Appearances of three or more of the same letter in a row are counted. The assumption is that sarcastic words will have more occurrences of repeated letters (e.g. "Yeaaaah, riiiight"). Again, by using the constant containing punctuation from the string library in Python, punctuation marks are counted for each comment. The assumption is that sarcastic comments would have more punctuation (e.g. "I adore being bored!!!!"). Dots (e.g. "I love hard work [. . .]") and quotation marks (e.g. "I "love" hard work") were the best indicators of sarcasm, so their counts are calculated separately.

By incorporating these feature extraction methods, we aimed to capture distinct linguistic and syntactic patterns associated with sarcasm. These features provide valuable insights into the characteristics of sarcastic comments, enabling effective detection and analysis.

After feature extraction, further data analysis is possible (Cvitanovic and Bagic Babac, 2022). Figure 4 shows that sarcastic comments had more of both positive and negative words. Figure 5 shows that there were slightly more neutral

words in the non-sarcastic comments. Neutral words were more numerous for both comment types. This fits with the assumptions made in the previous paragraph. Sarcastic comments carry slightly stronger opinions.

Figure 4 Positive and negative word counts per label

Figure 5 Neutral word counts per label

Figure 6 shows the distribution of sarcastic (left) and non-sarcastic (right) comment polarity scores. There is a much higher number of neutral comments, with a compound sentiment score of 0.0, than negative or positive comments. This is to be expected because neutral words are most common in spoken language. Comments with a score of 0.0 were excluded for analysis' sake. It can be observed that both sarcastic and non-sarcastic comment polarity is concentrated around the score of 0.5. The second-highest concentration of polarity is around -0.5.

Figure 6 Comment polarity distribution

Figure 7 shows the distribution of the difference between the compound sentiment scores of comments and their respective parent comments. It can be observed that the polarity difference for both types of comments is concentrated at the value of 0.0. That is because most comments and parent comments have a compound sentiment value of 0.0. It can also be observed that sarcastic comments have an overall slightly larger polarity difference than non-sarcastic comments.

Figure 7 Polarity difference between comments and parent comments

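A sketch of how these sentiment features can be computed with VADER's SentimentIntensityAnalyzer from NLTK is shown below. The 0.4 thresholds follow the text above; for brevity, the sketch omits the stop-word exclusion applied in the study:

```python
from nltk.sentiment.vader import SentimentIntensityAnalyzer
# nltk.download("vader_lexicon") is required once.

analyzer = SentimentIntensityAnalyzer()

def sentiment_features(comment: str, parent: str) -> dict:
    word_scores = [analyzer.polarity_scores(w)["compound"] for w in comment.split()]
    features = {
        "positive_words": sum(s > 0.4 for s in word_scores),
        "negative_words": sum(s < -0.4 for s in word_scores),
        "neutral_words": sum(-0.4 <= s <= 0.4 for s in word_scores),
        # Compound score of the whole comment text.
        "compound": analyzer.polarity_scores(comment)["compound"],
    }
    # Absolute polarity difference between comment and parent comment.
    parent_compound = analyzer.polarity_scores(parent)["compound"]
    features["polarity_diff"] = abs(features["compound"] - parent_compound)
    return features

print(sentiment_features("I adore being bored!!!!", "This meeting is great"))
```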
4. Results from machine learning models

The process of sarcasm detection is essentially a binary classification problem, with the two labels being "sarcastic" and "non-sarcastic". The pre-processed comments at this point were not ready to be used as input to standard machine learning models; additional data transformation was required. Comments, parent comments, authors and subreddits are all textual fields. Their pre-processed values were vectorized using the TF-IDF (Term Frequency-Inverse Document Frequency) vectorizer available in Python. For the comments and parent comments, features were made of word 1-grams, 2-grams and 3-grams, in the hope of preserving context. For the authors and subreddits, features were made of word 1-grams, as they are mostly one-word titles. The vocabulary for comments and parent comments was built considering only the top 5,000 features ordered by term frequency across the corpus. The vocabulary for authors and subreddits considered the top 1,000 features. When building the vocabulary, terms that have a document frequency lower than 10 were ignored. During the pre-processing step, accents were removed, and other character normalization was done. All numerical features (score, ups, downs, uppercase count, polarity difference, positive word count, negative word count, neutral word count, repeated letters count, punctuation count, dot count and quote count) were added to the corresponding rows.
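The paper does not name the library, but the described vectorizer matches scikit-learn's TfidfVectorizer. A minimal sketch, assuming df is the cleaned DataFrame from the earlier sketches and that the numerical feature column names below are hypothetical:

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer

def make_vectorizer(ngrams, top_k):
    # min_df=10 ignores terms with a document frequency below 10;
    # max_features keeps the top terms by frequency across the corpus.
    return TfidfVectorizer(ngram_range=ngrams, max_features=top_k,
                           min_df=10, strip_accents="unicode")

comment_vec = make_vectorizer((1, 3), 5000)
parent_vec = make_vectorizer((1, 3), 5000)
author_vec = make_vectorizer((1, 1), 1000)
subreddit_vec = make_vectorizer((1, 1), 1000)

X_text = hstack([
    comment_vec.fit_transform(df["comment"]),
    parent_vec.fit_transform(df["parent_comment"]),
    author_vec.fit_transform(df["author"]),
    subreddit_vec.fit_transform(df["subreddit"]),
])

# Hypothetical names for the manually extracted numerical features.
numeric_columns = ["score", "ups", "downs", "uppercase_count", "polarity_diff",
                   "positive_words", "negative_words", "neutral_words",
                   "repeated_letters", "punctuation_count", "dot_count", "quote_count"]
X = hstack([X_text, df[numeric_columns].values])  # final feature matrix
```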

The data was split into train and test sets (Poch Alonso and Bagic Babac, 2022); 20% of the data went to the testing data set, which made a training set of 808,596 items and a testing set of 202,149 items, with a total of 12,013 features for each data set, that is, 10,000 for comments and parent comments, 2,000 for authors and subreddits and 13 for numerical features.

The performance of all models was measured by the average accuracy, F1 score, precision and recall. Cross-validation with 10 stratified folds was used on the training data. Final performance metrics were calculated as the mean of the metrics over all folds. The best-performing model according to accuracy was used to get predictions on the test set, which was not used in cross-validation. The confusion matrix containing the number of true positives, false positives, true negatives and false negatives was plotted for this prediction. For all models, no or only simple changes, like regularization strength, were made. The performance of these models is compared to the neural network models.

The first classification model used was a logistic regression with the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) solver and L2 regularization (the inverse of regularization strength, C, was set to 0.8). Mean accuracy was 63.2%, mean F1 score was 60.4%, mean recall was 56.3% and mean precision was 65.4%. Figure 8(a) shows the confusion matrix. There were 74,465 true negatives, 48,799 false negatives, 26,806 false positives and 52,079 true positives.

The second classification model was the classifier using ridge regression, with the solver chosen automatically by the data type. The mean accuracy was 70%, the mean F1 score was 68.4%, the mean recall was 65.6% and the mean precision was 71.4%. Figure 8(b) shows the confusion matrix. There are 74,818 true negatives, 34,760 false negatives, 26,453 false positives and 66,118 true positives. The ridge regression classifier outperforms logistic regression on all used performance metrics.

The third classification model was the linear support vector machine (SVM) with stochastic gradient descent (SGD) learning and L2 regularization. The gradient of the loss is estimated for one sample at a time, and the model is updated along the way with a decreasing strength schedule (i.e. learning rate). Alpha, the factor that multiplies the regularization term and the learning rate, was set to 0.0001. The training was limited to 50,000 iterations. The mean accuracy was 67.6%, the mean F1 score was 63.3%, the mean recall was 56.3% and the mean precision was 73.1%. Figure 8(c) shows the confusion matrix. There are 76,180 true negatives, 38,469 false negatives, 25,091 false positives and 62,409 true positives. This linear SVM classifier outperforms logistic regression on all used performance metrics. In comparison to the ridge regression classifier, it underperforms on all performance metrics except precision.

The fourth classification model was the linear support vector classifier with L2 regularization. It is like the regular support vector classifier with the linear kernel, but it should work better on many samples according to the model documentation. The inverse regularization parameter, C, was set to 0.9. The mean accuracy was 70%, the mean F1 score was 68.3%, the mean recall was 65.8% and the mean precision was 71.3%. Figure 8(d) shows the confusion matrix. There are 73,871 true negatives, 33,694 false negatives, 27,400 false positives and 67,184 true positives. In comparison to the ridge classifier, this linear SVM classifier performs equally well; the ridge regression classifier has slightly better precision.

According to a study done by Li et al. (2022), in which an overview of standard machine learning models and deep learning models used for text classification tasks is given, SVMs usually perform well on text classification tasks. This trend was noticed here as well. Specifically, the SVM implementation in Python proved better performing than most other models. Ridge regression classification gave competitive results in comparison with the SVM classifier. Our results are summarized in Table 2.

Figure 8 Confusion matrix
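The four classifiers map naturally onto scikit-learn, as sketched below with the stated hyperparameters (everything else left at its defaults); X_train and y_train are the vectorized features and labels from the previous sketch:

```python
from sklearn.linear_model import LogisticRegression, RidgeClassifier, SGDClassifier
from sklearn.model_selection import cross_validate
from sklearn.svm import LinearSVC

models = {
    "logistic_regression": LogisticRegression(solver="lbfgs", penalty="l2", C=0.8),
    "ridge_regression": RidgeClassifier(solver="auto"),
    "sgd_svm": SGDClassifier(loss="hinge", penalty="l2", alpha=0.0001, max_iter=50000),
    "linear_svc": LinearSVC(penalty="l2", C=0.9),
}

scoring = ["accuracy", "f1", "recall", "precision"]
for name, model in models.items():
    # For classifiers, an integer cv uses stratified folds by default.
    cv = cross_validate(model, X_train, y_train, cv=10, scoring=scoring)
    print(name, {metric: cv[f"test_{metric}"].mean() for metric in scoring})
```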
5. Results from deep learning models

Deep learning models have been showing good results in text classification problems since the early 2010s. They usually do not require feature extraction because they integrate feature extraction and creation into the model fitting process, and they map features directly to outputs using non-linear transformations (Goodfellow et al., 2016). Deep learning models can be a good fit for the task of sarcasm detection because sarcasm is deeply dependent on context, and deep learning models can account for the serial structure and contextual information found in textual data.

Deep learning models required some additional text pre-processing before the data could be used as input. The only textual data used for these models were the comments. The already pre-processed comments were further tokenized using the Keras Tokenizer class in Python. This class enables the vectorization of a whole text corpus. The tokenizer is trained on the comments from the train set. This creates a vocabulary of words with their indices. Each text in the train and test set is then turned into a sequence of integers of the same size, where the integers represent the index of the word in the vocabulary. The sequences of integers (tokens) are set to a fixed size of 250: if a sequence is shorter, it is padded with zeros to 250 tokens; if it is longer, it is truncated to 250 tokens.

The first model was created using the Keras Sequential API. The second model was created using the Keras Functional API because it allows multiple inputs. The models were trained for 10 epochs. The performance of both models was measured by the accuracy, F1 score, precision and recall performance metrics on the test set. The neural networks for both models were simple. The assumption was that, even without added complexity, the automatic feature detection ability of neural networks would enable these models to surpass the standard machine learning models.

The first deep learning model used was a five-layer neural network. Firstly, there is the 250-dimensional input layer for the tokenized comments. The second layer is the embedding layer. The vocabulary generated by the process of tokenization is used here to generate the index that is used to calculate the 16-dimensional embedding of each comment. The following layer is a layer of 16 bidirectional LSTM (long short-term memory) cells. It is used here because of the bidirectional LSTM's ability to remember context from the "past" and the "future". The assumption was that the contextual information would capture the nuances of sarcasm in the text. The following layer is a dense layer containing 24 neurons using the rectified linear unit (ReLU) activation function. The final dense layer consists of one neuron using the sigmoid activation function. The output is the probability of each label for the input. The loss function used while training the model was binary cross-entropy. The optimization algorithm was Adam. According to the Keras documentation, Adam optimization is an SGD method that is based on adaptive estimation of first-order and second-order moments. Figure 9 shows the architecture of the neural network with the dimensions of inputs and outputs for each layer.

This model had an accuracy score of 68%, an F1 score of 67.7%, a recall score of 68.3% and a precision score of 67.1%. This means it performed better than logistic regression according to all performance metrics, better than the SGD SVM according to accuracy and F1 score, and it outperformed all models according to recall. The model performed well, even without the manual feature extraction, but it could not be said to be the best model.
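Under the architecture just described, the first model can be sketched with the Keras Sequential API (the vocabulary size is taken from the fitted tokenizer; training settings beyond the stated 10 epochs are assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers

vocab_size = len(tokenizer.word_index) + 1  # from the fitted tokenizer

model = tf.keras.Sequential([
    layers.Embedding(vocab_size, 16, input_length=250),  # 250-token comments -> 16-dim embeddings
    layers.Bidirectional(layers.LSTM(16)),               # 16 BiLSTM cells, past and future context
    layers.Dense(24, activation="relu"),
    layers.Dense(1, activation="sigmoid"),               # probability of the "sarcastic" label
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(X_train_seq, y_train, epochs=10, validation_split=0.1)
```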

Table 2 Summary of results for machine learning models


Model Accuracy (%) F1 (%) Recall (%) Precision (%)
Logistic regression 63.2 60.4 56.3 65.4
Ridge regression 70.0 68.4 65.6 71.4
Support vector machine 67.6 63.3 56.3 73.1
Linear support vector 70.0 68.3 65.8 71.3
Note: Italics mark the highest value per column
Source: Made by the authors

Figure 9 First deep learning model architecture

The second deep learning model used was a six-layer neural network. The first layer is the 250-dimensional input layer. The input data is the same as for the previous model. Following the input layer is the embedding layer. Words were again embedded in a 16-dimensional feature space. A bidirectional LSTM layer of the same size was used for the comment data. The purpose of this layer is again to capture the contextual nature of the textual data. The following layer differs from the previous model: here, the manually extracted numerical features are combined with the output of the bidirectional LSTM layer. The next difference is that in this model an extra layer of 12 neurons using the ReLU activation function was added. The assumption was that an extra dense layer would better map the added numerical features. The output layer is again a sigmoid function, which gives the probability of both labels. Figure 10 shows the architecture of the neural network with the dimensions of inputs and outputs for each layer.

This model had an accuracy score of 67%, an F1 score of 68.1%, a recall score of 67.2% and a precision score of 65.7%. This means that adding the manually extracted features and the extra dense layer reduced the performance of the model.

The third deep learning model tested on the classification task of sarcasm detection was BERT, a pre-trained deep learning model that has shown state-of-the-art performance in various natural language processing tasks (Vaswani et al., 2017). In this study, we used a BERT-based neural network model, which consists of 12 layers and 768 hidden units. The model was trained on a large corpus of English texts and was fine-tuned on the specific text classification task at hand (Devlin et al., 2019).

The methodology of implementing BERT for sarcasm detection involves several technical steps (Parameswaran et al., 2021). After the pre-processing of text cleaning and normalization, the BERT tokenizer is used to tokenize the text data into sub-word tokens, which are then converted into numerical representations called input encodings. These input encodings include token IDs, attention masks and segment IDs, which are used to train the BERT-based classification model. The token IDs represent the numerical values of the sub-word tokens, the attention masks indicate which tokens are part of the input sequence and the segment IDs differentiate between two different input sequences. Then, the BERT-based classification model is fine-tuned on the labelled data set. The fine-tuning process involves training the model on the input encodings with a specific loss function and optimizer. In this study, the loss function used for sarcasm detection is the sparse categorical cross-entropy loss function, and the optimizer used is the Adam optimizer with a learning rate of 2 × 10⁻⁵. Finally, the fine-tuned BERT-based classification model is evaluated on a separate test data set to assess its performance. The evaluation metrics used to measure the model's performance are shown in Table 3, along with the results of the first and second deep learning models.

The results of our experiments show that the BERT-based model is the most successful in detecting sarcasm, with an accuracy of 73.1%. This model also achieves the highest F1 score, recall and precision (72.4%, 71.3% and 72.2%, respectively), indicating its ability to recognize sarcastic statements. On the other hand, both BiLSTM-based models score slightly lower compared to BERT, although they are still able to recognize sarcasm with decent accuracy. These results indicate the great potential of the BERT-based model for detecting sarcasm in the real world, although the other models gave only slightly worse results.
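The paper does not name the implementation library. A sketch using the Hugging Face transformers package with TensorFlow, matching the loss, optimizer and learning rate stated above (epoch and batch-size values are assumptions), could look like this:

```python
import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification

# bert-base-uncased has 12 layers and 768 hidden units, as described above.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Token IDs, attention masks and segment (token type) IDs for the comments.
encodings = tokenizer(list(train_comments), truncation=True, padding=True,
                      max_length=250, return_tensors="tf")

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(dict(encodings), y_train, epochs=3, batch_size=32)
```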

Figure 10 Second deep learning model architecture

Table 3 Summary of results for deep learning models

Model | Accuracy (%) | F1 (%) | Recall (%) | Precision (%)
BiLSTM 1 | 68.0 | 67.7 | 68.3 | 67.1
BiLSTM 2 | 67.0 | 68.1 | 67.2 | 65.7
BERT-based | 73.1 | 72.4 | 71.3 | 72.2
Note: Italics mark the highest value per column
Source: Made by the authors

6. Conclusion

Sarcasm, characterized by the deliberate use of words to express the opposite of their literal meaning, represents a complex form of communication. Its predominantly spoken nature poses significant challenges for computational detection. However, this study presents a robust framework for detecting sarcasm in social media comments. The findings have practical implications in various domains, including sentiment analysis, online reputation management and customer service. By accurately identifying instances of sarcasm, this framework contributes to enhancing the understanding of nuanced communication patterns in online interactions and facilitates more effective decision-making in relevant applications.

This article described two ways to approach this task and the necessary data preparation steps. An automatically annotated data set containing data from an online forum was analysed, pre-processed and used as input for four standard machine learning models and three deep learning models. The performance of several machine learning models with near-default parameters was measured and compared to the performance of deep neural networks. It can be concluded that the performance of both kinds of models depends largely on text cleaning and pre-processing. The performance of machine learning models is also deeply dependent on manual feature extraction, while the performance of deep learning models depends mostly on the architecture itself. The performance of the ridge regression classifier was surprising, as it is not as prominent as other models in the text classification literature. Logistic regression was the worst-performing model by all performance measures. The fine-tuned BERT-based model outperformed the machine learning models as well as both BiLSTM models.

Standard machine learning models could be improved by better feature extraction. For example, the combination of sentiment-related, punctuation-related, syntax-related and pattern-related features yielded good results (Bouazizi and Otsuki, 2016). Deep learning models could be improved with more complex architectures. For example, multiple bidirectional LSTM layers could be added to better capture the contextual information in the text.

Although the BERT-based model showed promising results in this study, further research can be conducted to evaluate other deep learning models and compare their performance with the BERT-based model. This can help to identify the most effective deep learning model for sarcasm detection. The data set is also massive, so more learning epochs could be beneficial for the automatic feature extraction process.

Sarcasm can be domain-specific, meaning that it can differ depending on the context or topic being discussed (Potamias et al., 2020). Future research can explore the development of domain-specific sarcasm detection models that can accurately identify sarcasm in specific domains, such as politics or entertainment (Marijic and Bagic Babac, 2023). Moreover, contextual factors such as cultural disparities, social norms and the speaker's intention play a crucial role in the interpretation of sarcasm. Investigating how these contextual elements influence the performance of machine and deep learning models in sarcasm detection would be an essential avenue for future research. Understanding the impact of context on the effectiveness of these models can help enhance their robustness and applicability in real-world scenarios.

Note

1 www.kaggle.com/datasets/danofer/sarcasm

References

Arora, A. (2020), "Sarcasm detection in social media: a review", Proceedings of the International Conference on Innovative Computing & Communication (ICICC) 2021, https://ssrn.com/abstract=3749018 or doi: 10.2139/ssrn.3749018.
Ashwitha, A.S.G.S.H.R., Upadhyaya, A.P. and Ray, P.M.T.C. (2021), "Sarcasm detection in natural language processing", Materials Today: Proceedings, Vol. 37, pp. 3324-3331, doi: 10.1016/j.matpr.2020.09.124.
Avdic, D. and Bagic Babac, M. (2021), "Application of affective lexicons in sports text mining: a case study of FIFA World Cup 2018", South Eastern European Journal of Communication, Vol. 3 No. 2, pp. 23-33.
Bagic Babac, M. (2022), "Emotion analysis of user reactions to online news", Information Discovery and Delivery, doi: 10.1108/IDD-04-2022-0027.
Bagic Babac, M. and Podobnik, V. (2016), "A sentiment analysis of who participates, how and why, at social media sports websites: how differently men and women write about football", Online Information Review, Vol. 40 No. 6, pp. 814-833, doi: 10.1108/OIR-02-2016-0050.
Bandhakavi, A., Wiratunga, N., Massie, S. and Padmanabhan, D. (2017), "Lexicon generation for emotion detection from text", IEEE Intelligent Systems, Vol. 32 No. 1, pp. 102-108.
Băroiu, A.-C. and Trăușan-Matu, Ș. (2022), "Automatic sarcasm detection: systematic literature review", Information, Vol. 13 No. 8, p. 399, doi: 10.3390/info13080399.
Bird, S., Klein, E. and Loper, E. (2009), Natural Language Processing with Python, O'Reilly Media.
Bouazizi, M. and Otsuki, T. (2016), "A pattern-based approach for sarcasm detection on Twitter", IEEE Access, Vol. 4, pp. 5477-5488.
Brzic, B., Botički, I. and Bagic Babac, M. (2023), "Detecting deception using natural language processing and machine learning in datasets on COVID-19 and climate change", Algorithms, Vol. 16 No. 5, p. 221, doi: 10.3390/a16050221.
Buschmeier, K., Cimiano, P. and Klinger, R. (2014), "An impact analysis of features in a classification approach to irony detection in product reviews", Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, Baltimore, MD, pp. 42-49.
Cvitanovic, I. and Bagic Babac, M. (2022), "Deep learning with self-attention mechanism for fake news detection", in Lahby, M., Pathan, A.-S., Maleh, Y.K. and Yafooz, W.M.S. (Eds), Combating Fake News with Computational Intelligence Techniques, Springer, Switzerland, pp. 205-229, doi: 10.1007/978-3-030-90087-8_10.
Davidov, D., Tsur, O. and Rappoport, A. (2010), "Semi-supervised recognition of sarcastic sentences in twitter and amazon", Proceedings of the Fourteenth Conference on Computational Natural Language Learning, pp. 107-116.
Devlin, J., Chang, M.-W., Lee, K. and Toutanova, K. (2019), "BERT: pre-training of deep bidirectional transformers for language understanding", Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, pp. 2-7, June 2019.
Elbagir, S. and Yang, J. (2019), "Analysis using natural language toolkit and VADER sentiment", Proceedings of the International MultiConference of Engineers and Computer Scientists 2019, Hong Kong, China.
Ghosh, D. and Veale, T. (2017), "Fracking sarcasm using neural network", Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pp. 439-444.
Ghosh, D., Fabbri, A.R. and Muresan, S. (2018), "Sarcasm analysis using conversation context", Computational Linguistics, Vol. 44 No. 4, pp. 755-792, doi: 10.1162/coli_a_00336.
Goodfellow, I., Bengio, Y. and Courville, A. (2016), Deep Learning, Adaptive Computation and Machine Learning Series, MIT Press, London, England.
Gupta, R., Kumar, J. and Agrawal, H. (2020), "A statistical approach for sarcasm detection using twitter data", 2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS), IEEE, pp. 633-638.
Hazarika, D., Poria, S., Gorantla, S., Cambria, E., Zimmermann, R. and Mihalcea, R. (2018), "CASCADE: contextual sarcasm detection in online discussion forums", Proceedings of the 27th International Conference on Computational Linguistics, Association for Computational Linguistics, Santa Fe, NM, pp. 1837-1848.
Joshi, A., Bhattacharyya, P., Carman, M.J., Saraswati, J. and Shukla, R. (2016), "How do cultural differences impact the quality of sarcasm annotation? A case study of Indian annotators and American text", LaTeCH@ACL.
Jurafsky, D. and Martin, J.H. (2000), Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice-Hall, Upper Saddle River, NJ.
Kampic, M. and Bagic Babac, M. (2021), "Sentiment analysis of president Trump's tweets: from winning the election to the fight against COVID-19", Communication Management Review, Vol. 6 No. 2, pp. 90-111, doi: 10.22522/cmr20210272.

Khatri, A. and Pranav, P. (2020), "Sarcasm detection in tweets with BERT and GloVe embeddings", Proceedings of the Second Workshop on Figurative Language Processing, Online, Association for Computational Linguistics, pp. 56-60.
Khodak, M., Saunshi, N. and Vodrahalli, K. (2018), "A large self-annotated corpus for sarcasm", Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan.
Kostelej, M. and Bagic Babac, M. (2022), "Text analysis of the Harry Potter book series", South Eastern European Journal of Communication, Vol. 4 No. 1, pp. 17-30.
Kumar, A. and Anand, V. (2020), "Transformers on sarcasm detection with context", Proceedings of the Second Workshop on Figurative Language Processing, Virtual event, pp. 88-92.
Kumawat, D. and Jain, V. (2015), "POS tagging approaches: a comparison", International Journal of Computer Applications, Vol. 118 No. 6, pp. 32-38.
Li, Q., Peng, H., Li, J., Xia, C., Yang, R., Sun, L., Yu, P.S. and He, L. (2022), "A survey on text classification: from traditional to deep learning", ACM Transactions on Intelligent Systems and Technology, Vol. 13 No. 2, Article 31, p. 41, doi: 10.1145/3495162.
Manning, C.D., Raghavan, P. and Schütze, H. (2008), Introduction to Information Retrieval, Cambridge University Press, Cambridge, England.
Marijic, A. and Bagic Babac, M. (2023), "Predicting song genre with deep learning", Global Knowledge, Memory and Communication, doi: 10.1108/GKMC-08-2022-0187.
Misra, R. and Arora, P. (2023), "Sarcasm detection using news headlines dataset", AI Open, Vol. 4, pp. 13-18.
Musso, I.M.-B. and Bagic Babac, M. (2022), "Opinion mining of online product reviews using a lexicon-based algorithm", International Journal of Data Analysis Techniques and Strategies, Vol. 14 No. 4, pp. 283-301.
Nayel, H., Amer, E., Allam, A. and Abdallah, H. (2021), "Machine learning-based model for sentiment and sarcasm detection", Proceedings of the Sixth Arabic Natural Language Processing Workshop, Association for Computational Linguistics, Kyiv, Ukraine (Virtual), pp. 386-389.
Pant, K. and Dadu, T. (2020), "Sarcasm detection using context separators in online discourse", arXiv preprint arXiv:2006.00850.
Parameswaran, P., Trotman, A., Liesaputra, V. and Eyers, D. (2021), "BERT's the word: sarcasm target detection using BERT", Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association, Online, Australasian Language Technology Association, pp. 185-191.
Poch Alonso, R. and Bagic Babac, M. (2022), "Machine learning approach to predicting a basketball game outcome", International Journal of Data Science, Vol. 7 No. 1, pp. 60-77.
Poria, S., Cambria, E., Hazarika, D., Vij, P. and Hussain, A. (2016), "A deeper look into sarcastic tweets using deep convolutional neural networks", Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 1969-1980.
Potamias, R.A., Siolas, G. and Stafylopatis, A. (2020), "A transformer-based approach to irony and sarcasm detection", Neural Computing and Applications, Vol. 32 No. 23, pp. 17309-17320.
Puh, K. and Bagic Babac, M. (2023), "Predicting stock market using natural language processing", American Journal of Business, Vol. 38 No. 2, pp. 41-61, doi: 10.1108/AJB-08-2022-0124.
Ren, L., Xu, B., Lin, H., Liu, X. and Yang, L. (2020), "Sarcasm detection with sentiment semantics enhanced multi-level memory network", Neurocomputing, Vol. 401, pp. 320-326.
Reyes, A., Rosso, P. and Buscaldi, D. (2012), "From humor recognition to irony detection: the figurative language of social media", Data & Knowledge Engineering, Vol. 74, pp. 1-12.
Riloff, E., Qadir, A., Surve, P., De Silva, L., Gilbert, N. and Huang, R. (2013), "Sarcasm as contrast between a positive sentiment and negative situation", Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Seattle, Washington, pp. 704-714.
Saha, S., Yadav, J. and Ranjan, P. (2017), "Proposed approach for sarcasm detection in twitter", Indian Journal of Science and Technology, Vol. 10 No. 25, pp. 1-8.
Savini, E. and Caragea, C. (2022), "Intermediate-task transfer learning with BERT for sarcasm detection", Mathematics, Vol. 10 No. 5, p. 844, doi: 10.3390/math10050844.
Scola, E. and Segura-Bedmar, I. (2021), "Sarcasm detection with BERT", Procesamiento del Lenguaje Natural, Vol. 67, pp. 13-25.
Sharma, D.K., Singh, B., Agarwal, S., Pachauri, N., Alhussan, A.A. and Abdallah, H.A. (2023), "Sarcasm detection over social media platforms using hybrid ensemble model with fuzzy logic", Electronics, Vol. 12 No. 4, p. 937, doi: 10.3390/electronics12040937.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł. and Polosukhin, I. (2017), "Attention is all you need", Advances in Neural Information Processing Systems, Vol. 30, pp. 5998-6008.
Wang, A., Singh, A., Michael, J., Hill, F., Levy, O. and Bowman, S.R. (2018), "GLUE: a multi-task benchmark and analysis platform for natural language understanding", Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Vol. 1, pp. 353-355.
Zhang, Y. and Wallace, B. (2018), "A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification", arXiv preprint arXiv:1510.03820.

Further reading

Puh, K. and Bagic Babac, M. (2022), "Predicting sentiment and rating of tourist reviews using machine learning", Journal of Hospitality and Tourism Insights, doi: 10.1108/JHTI-02-2022-0078.

Corresponding author

Marina Bagić Babac can be contacted at: [email protected]

