StereoHate
doi:10.1017/xxxxx
ARTICLE
Abstract
Though social media helps spread knowledge more effectively, it also stimulates the propagation of online abuse and harassment, including hate speech. It is crucial to prevent hate speech since it may have serious adverse effects on both society and individuals. Therefore, it is important for models not only to detect such speech but also to output explanations of why a given text is toxic. While plenty of research is ongoing to detect online hate speech in English, there is very little research on low-resource languages like Hindi and on the explainability aspect of hate speech. Recent laws like the "right to explanation" of the General Data Protection Regulation have spurred research into developing interpretable models rather than focusing only on performance. Motivated by this, we create the first interpretable benchmark hate speech corpus, HHES, in the Hindi language, where each hate post is annotated with its stereotypical bias and target group category. Providing a description of the internal stereotypical bias as an explanation of a hate post makes a hate speech detection model more trustworthy. The current work proposes a commonsense-aware unified generative framework, CGenEx, reframing the multitask problem as a text-to-text generation task. The novelty of this framework is that it can solve two different categories of tasks (generation and classification) simultaneously. We establish the efficacy of our proposed model (CGenEx-fuse) on various evaluation metrics over other baselines when applied to the Hindi HHES dataset.
Disclaimer: The article contains profanity, an inevitable consequence of the nature of the work involved. It in no way reflects the opinions of the authors.
1. Introduction
The exponential increase in textual content due to the widespread use of social media
platforms renders human moderation of such information untenable (Cao et al. 2020).
Governments, media organizations, and researchers now view the prevalence of hate speech
on online social media platforms as a major problem, particularly given how quickly it
spreads and encourages harm to both individuals and society. Hate speech (Nockleby 1994) is any communication that intends to attack the dignity of a group based on characteristics such as race, gender, ethnicity, sexual orientation, nationality, religion, or other features. With the advancement of Natural Language Processing (NLP), numerous studies have suggested methods to detect hate speech automatically using traditional Machine Learning
(ML) (Dadvar et al. 2014; Dinakar et al. 2011; Reynolds et al. 2011) and deep learning
approaches (Agrawal and Awekar 2018; Waseem and Hovy 2016; Badjatiya et al. 2017).
However, it is crucial for artificial intelligence (AI) tools not only to identify hate speech automatically but also to surface the implicit bias present in a post in order to explain why it is hateful. The advent of explainable AI (Gunning et al. 2019) has necessitated the provision of explanations and interpretations for decisions made by machine learning algorithms. This requirement is crucial for establishing trust and confidence in the deployment of AI models. Additionally, recent legislation in Europe, such as the General Data Protection Regulation (GDPR) (Regulation 2016), has implemented a "right to explanation" law, further emphasizing the need for interpretable
models. Consequently, there is a growing emphasis on the development of models that
prioritize interpretability rather than solely focusing on improving performance through
increased model complexity.
Stereotypical bias (SB) (Cuddy et al. 2009), a common unintentional bias, can be based
on specific aspects such as skin tone, gender, ethnicity, demography, disability, Arab-
Muslim origin, etc. Stereotyping is a cognitive bias that permeates all aspects of daily life
and is firmly ingrained in human nature. Social stereotypes have a detrimental influence
on people’s opinions of other groups and may play a crucial role in how people interpret
words aimed towards minority social groups (Sap et al. 2019a). For example, earlier studies have demonstrated that toxicity detection models rate texts exhibiting African-American English traits as more offensive than texts lacking such traits (Davidson et al. 2019).
In the past decade, extensive research has been conducted to develop datasets and
models for automatic detection of online hate speech in the English language (Agrawal
and Awekar 2018; Waseem and Hovy 2016; Badjatiya et al. 2017). However, there is a
noticeable scarcity of hate speech detection work in the Hindi language, despite its status
as the fourth-most-spoken language globally, widely used in South Asia. Existing studies in
this domain have primarily focused on enhancing the performance of hate speech detection
using various models, often neglecting the crucial aspect of explainability. The emergence
of explainable artificial intelligence (AI) has now necessitated the provision of explanations
and interpretations for decisions made by machine learning algorithms, becoming a critical
requirement in this field. For instance, debiasing techniques that incorporate knowledge
of the toxic language may benefit from extra information provided by in-depth toxicity
analyses in text (Ma et al. 2020). Furthermore, thorough descriptions of toxicity can make
it easier for people to interact with toxicity detection systems (Rosenfeld and Richardson
2019).
To fill this research gap, in this work we create a benchmark Hindi hate speech explanation dataset (HHES) that contains the stereotypical bias and target group category of each toxic post. To create the HHES dataset, we manually translate the existing English Social Bias Frames (SBIC) dataset (Sap et al. 2020a). We then need an efficient multitask framework that can solve two different categories of tasks simultaneously: (i) a sequence generation task (generate the stereotypical bias as an explanation) and (ii) a classification task (identify the target group category).
Humans have the ability to learn multiple tasks simultaneously and to apply the knowledge learned from one task to another. To mimic this quality of human intelligence, researchers have been working on multitask learning (MTL) (Caruana 1997), a training paradigm in which a model is trained with data from different closely related tasks in an attempt to efficiently learn the mapping and connection between these tasks. Many works have shown that solving a closely related auxiliary task along with the main task increases the performance of the primary task, for example in cyberbullying detection (CD) (Maity and Saha 2021b), complaint identification (Singh et al. 2021), and tweet act classification (TAC) (Saha et al. 2021). A typical multitask model
consists of a shared encoder that contains representations from the data of the different tasks, with several task-specific layers or heads attached to that encoder. However, this approach has many drawbacks, such as negative transfer (Crawshaw 2020), where multiple tasks, instead of optimizing the learning process, start to hurt the training process; model capacity (Wu 2019), where, if the shared encoder becomes too large, there is no transfer of information across the different tasks; and the optimization scheme (Wu 2019), i.e., how to assign weights to the different tasks during training. There are also several scalability issues with this approach to multitasking, such as adding task-specific heads every time a new task is introduced, or changing the complete model architecture whenever a new combination of tasks is introduced.
To overcome the challenges of MTL, we propose the use of a generative model to solve two different categories of tasks: classification (target group category) and generation (stereotypical bias). Rather than employing two separate models to address these tasks, we present a commonsense-aware unified generative multitask framework that solves both tasks simultaneously in a text-to-text generation manner. We convert the classification task into a generation task, where the target output sentence is the concatenation of the classification task's output tokens. In our proposed model, the input is text, such as a social media post, and the output is also text, representing the concatenation of the stereotype and the target group separated by a special character. For instance, given the input post "Bitches love Miley Cyrus and Rihanna because they speak to every girl's inner ho," the corresponding output or target sequence is "<Women are sexually promiscuous> <Gender>." In this example, "Women are sexually promiscuous" represents the stereotypical bias, and "Gender" is the target group category. As sentient beings, we use our common sense to establish connections between what is explicitly said and what is inferred. We employ ConceptNet to generate commonsense knowledge, capturing and applying common patterns of real-world knowledge in order to draw conclusions or make decisions about a given post. For example, if the input sentence is "I was just pretending to be retarded!", some of the commonsense reasonings generated by ConceptNet are (i) "pretend requires imagination" and (ii) "retard is similar in meaning to idiot".
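To make this reframing concrete, the following minimal sketch assembles a commonsense-augmented input and its target sequence. The keyword selection and the use of ConceptNet's public REST API with its surfaceText field are illustrative assumptions, not our exact extraction procedure.

```python
# Minimal sketch of the data preparation described above. The ConceptNet REST
# endpoint is real; the keyword list and the surfaceText verbalization are
# simplifying assumptions.
import requests

def conceptnet_facts(term: str, limit: int = 3) -> list:
    """Fetch up to `limit` edges for an English term and verbalize them."""
    resp = requests.get(f"https://api.conceptnet.io/c/en/{term}",
                        params={"limit": limit})
    facts = []
    for edge in resp.json().get("edges", []):
        surface = edge.get("surfaceText")  # natural-language rendering, if any
        if surface:
            facts.append(surface.replace("[", "").replace("]", ""))
    return facts

def build_training_pair(post: str, keywords: list, stereotype: str, group: str):
    """Builds T_i = X_i (+) CS and the target 'Y_i' string '<stereotype> <group>'."""
    cs = " ".join(f for w in keywords for f in conceptnet_facts(w))
    source = f"{post} {cs}".strip()
    target = f"<{stereotype}> <{group}>"
    return source, target

# e.g. build_training_pair("Bitches love Miley Cyrus and Rihanna because they "
#                          "speak to every girl's inner ho", ["bitch"],
#                          "Women are sexually promiscuous", "Gender")
```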
To sum up, our contributions are twofold:
(1) HHES, a new benchmark dataset for explainable hate speech detection with target
group category identification in the Hindi language has been developed.
(2) To simultaneously solve the two tasks, i.e., generating the stereotypical bias/explanation (generation task) and identifying the target group (classification task), a commonsense-aware unified generative framework (CGenEx) with reinforcement learning-based training has been proposeda.
The organization of this article is as follows. Section 2 surveys previous work in this domain. Section 3 describes the dataset creation process in detail. Section 4 explains the proposed methodology, and Section 5 describes the experimental settings and results, including a detailed error analysis.
2. Related Works
Hate speech relies heavily on linguistic subtlety. Researchers have recently devoted considerable attention to automatically identifying hate speech on social media. In this section, we review recent works on detecting and explaining hate speech.
a The code and dataset will be made publicly available in the camera-ready version
2.2 Explainability/Bias
Zaidan et al. (2007) proposed the concept of rationales, in which human annotators underlined sections of text that supported their tagging decisions. The authors showed that using these rationales improved sentiment classification performance.
3. Dataset Creation
This section discusses the developed benchmark Hindi hate speech explanation (stereo-
types) dataset (HHES). To begin, we reviewed the literature for the existing hate speech
datasets, which contain stereotypical bias and target groups. As per our knowledge, there
is only one standard social bias dataset (SBIC) in English developed by (Sap et al. 2020a).
The lack of any other publicly available dataset related to our work and the good structure
of this dataset makes it the perfect choice for our purpose.
Technological advancements have revolutionized the way people express their opinions,
particularly in low-resource languages. India, a country with a massive internet user base
of 1010 millionb, exhibits significant linguistic diversity. Among the numerous languages spoken in India, Hindi holds a prominent position as one of the official languagesc, with over 691 million speakersd. Consequently, a substantial portion of text conversations on
social media platforms in India occurs in the Hindi language. This phenomenon highlights
the significance of Hindi as the primary medium of communication for the majority of
users in the country.
We have manually annotated (translated) the existing English SBIC dataset to create the Hindi hate speech explanation dataset (HHES). The annotation process was overseen by two professors with extensive expertise in hate speech and offensive content detection. The annotation task was carried out by a group of ten undergraduate students proficient in both Hindi and English, recruited voluntarily through the department email list and compensated with gift vouchers and an honorarium. To ensure consistency and accuracy in the translation process, we initiated the annotation training phase with a set of gold-standard translated samples: our expert annotators randomly selected 300 samples and manually translated them from English to Hindi. Through collaborative discussions, any differences or discrepancies in the translations were resolved, resulting in 300 gold-standard manually annotated samples encompassing toxic posts and their corresponding stereotypes. To facilitate the training of the novice annotators, these annotated examples were divided into three sets of 100 samples each, allowing a three-phase training procedure in which the novice annotators received guidance and feedback from the expert annotators. After each training phase, the expert annotators collaborated with the novice annotators to rectify incorrect annotations and provide further guidance. Upon conclusion of the three-phase training process, the top ten annotators were selected based on their performance; they annotated the entire dataset, with the workload divided evenly among them, so that each post was translated by one of the selected annotators. However, we acknowledge that despite our diligent efforts, there may be cases where a translation does not precisely replicate the original post, owing to the inherent difficulties of cross-lingual translation and the complexities of social media language.
The numbers of training, validation, and test samples in the HHES dataset are 12,110, 1,806, and 1,924, respectively. The detailed distribution of the target group category classes is shown in Table 1.
Further, we engaged three senior annotators (Master's students in Linguistics) to verify the translation quality in terms of fluency (F) and adequacy (A), as described in Ghosh et al. (2022). Fluency evaluates whether a translation is syntactically correct, whereas adequacy checks its semantic quality. Each annotator marked every translated sentence with an ordinal value on a scale of 1-5e for both F and A.
b https://en.wikipedia.org/wiki/List_of_countries_by_number_of_Internet_users
c https://en.wikipedia.org/wiki/Hindi
d https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers_in_India
Table 1. : Train, validation, and test split distribution of Target group category in HHES dataset.

Table 2. : Examples from the HHES dataset (post, stereotype, and target group category).

Example 1
Post (Hindi): मैं आज तुमसे लड़ने के िलए नहीं गया हूँ. मुझे वापस अपने िनकगा हाथ दे
Post (English): What up fake ass hoe? I'm not gone fought you today. Gimme my nigga hand back
Stereotype (Hindi): िनगा काल्पिनक शब्द काल्पिनक लोगों के िलए है।
Stereotype (English): nigga is a slang term for black people
Target group category: Race

Example 2
Post (Hindi): और मैं एक कूड़ा‐कूड़ा बदचलन के बारे में जानी नहीं जा सकती।
Post (English): And I can't be known for fucking wit a trash bitch
Stereotype (Hindi): औरतें बदचलन हैं
Stereotype (English): women are bitches
Target group category: Gender
We attain high average F and A scores of 4.23 and 4.58, respectively, illustrating that the translations are of good quality. Some examples from the HHES dataset are shown in Table 2.
4. Methodology
In this work, we have proposed CGenEx (shown in Figure 1), a commonsense-aware uni-
fied generative framework for generating stereotypical bias to explain why an input post
is hateful and identify the target group category. Detailed descriptions of the proposed
models are described below.
Figure 1: A commonsense-aware unified generative framework (CGenEx) architecture.
Figure 2: Commonsense-aware encoder module internal architecture.
$$Y_i = G(X_i) \tag{1}$$

where $G$ is a generation model mapping an input post $X_i$ to the target sequence $Y_i$, i.e., the concatenation of the stereotypical bias and the target group category. We divide our approach into three steps: (1) commonsense extraction module, (2) commonsense-aware transformer model, and (3) reinforcement learning-based training.
We first concatenate the tokens of the input text $X_i$ and the commonsense reasoning $CS$ to obtain the final input sequence $T_i = X_i \oplus CS$. Now, given a pair of input and target sequences $(T_i, Y_i)$, the first step is to feed $T_i$ to the encoder module to obtain the hidden representation of the input, $H = \mathrm{Encoder}(T_i)$. The decoder then produces the output token distribution at each step via $F_{\mathrm{softmax}}(W_{\mathrm{Gen}} h_t)$, where $F_{\mathrm{softmax}}$ represents the softmax computation and $W_{\mathrm{Gen}}$ denotes the weights of our model.
where $U_K$ and $U_V$ are learnable parameters, and the matrices $\lambda_K$ and $\lambda_V$ are computed as follows:

$$\begin{bmatrix} \lambda_K \\ \lambda_V \end{bmatrix} = \sigma\left( \begin{bmatrix} K_x W_{KX} \\ V_x W_{VX} \end{bmatrix} + \begin{bmatrix} U_K W_{KCS} \\ U_V W_{VCS} \end{bmatrix} G_{CS} \right) \tag{7}$$

where $W_{KX}$, $W_{VX}$, $W_{KCS}$, and $W_{VCS}$ are all learnable parameters, and $\sigma$ represents the sigmoid function.
After obtaining $K_{cs}$ and $V_{cs}$, we apply a dot-product-attention-based fusion method over $Q_x$, $K_{cs}$, and $V_{cs}$ to obtain the final commonsense-aware input representation $Z$, computed as:

$$Z = \mathrm{softmax}\left(\frac{Q_x K_{cs}^{T}}{\sqrt{d_k}}\right) V_{cs} \tag{8}$$

Finally, we feed this commonsense-aware input representation vector $Z$ to an autoregressive decoder, following the same decoder computations defined in Equation 4.
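A compact PyTorch sketch of this fusion step is given below. The gated interpolation used to form $K_{cs}$ and $V_{cs}$ follows the standard context-aware self-attention formulation and is stated here as an assumption (its defining equations precede Equation 7), as is the requirement that the commonsense representation $G_{CS}$ be pre-aligned to the input sequence length.

```python
# A minimal PyTorch sketch of the gated commonsense fusion of Equations 7-8.
# Interpolation form for K_cs/V_cs and tensor shapes are assumptions.
import math
import torch
import torch.nn as nn

class CommonsenseFusion(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.W_KX = nn.Linear(d_model, d_model)
        self.W_VX = nn.Linear(d_model, d_model)
        self.W_KCS = nn.Linear(d_model, d_model)
        self.W_VCS = nn.Linear(d_model, d_model)
        self.U_K = nn.Linear(d_model, d_model)
        self.U_V = nn.Linear(d_model, d_model)

    def forward(self, Q_x, K_x, V_x, G_cs):
        # Equation 7: sigmoid gates controlling how much commonsense
        # flows into the keys and values
        lam_K = torch.sigmoid(self.W_KX(K_x) + self.W_KCS(self.U_K(G_cs)))
        lam_V = torch.sigmoid(self.W_VX(V_x) + self.W_VCS(self.U_V(G_cs)))
        # Gated interpolation between textual and commonsense keys/values
        K_cs = (1 - lam_K) * K_x + lam_K * self.U_K(G_cs)
        V_cs = (1 - lam_V) * V_x + lam_V * self.U_V(G_cs)
        # Equation 8: scaled dot-product attention over the fused keys/values
        d_k = Q_x.size(-1)
        attn = torch.softmax(Q_x @ K_cs.transpose(-2, -1) / math.sqrt(d_k), dim=-1)
        return attn @ V_cs  # Z, the commonsense-aware representation
```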
4.1.7 Inference
During the training process, we have access to both the input sentence ($X_i$) and the target sequence ($Y_i$). Thus, we train the model using the teacher-forcing approach, i.e., using the target sequence as the decoder input instead of the tokens predicted at prior time steps. However, inference must be done in an autoregressive manner, as we do not have access to target sequences to guide the decoding process. After obtaining the predicted sequence $Y_i'$, we split that sequence around the special character (<>) to obtain the corresponding predictions for the different tasks, stereotypical bias and target group, as described in Equation 1.
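Below is a minimal sketch of this inference procedure. The Hugging Face checkpoint named here is a stand-in for our fine-tuned model, and the parsing assumes the <...> delimiter format described above.

```python
# Sketch of autoregressive inference followed by splitting the generated
# sequence into the two task predictions. Checkpoint name is a placeholder.
import re
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-50")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/mbart-large-50")

def predict(post: str):
    ids = tokenizer(post, return_tensors="pt").input_ids
    out = model.generate(ids, max_length=64)   # no teacher forcing at test time
    decoded = tokenizer.decode(out[0], skip_special_tokens=True)
    parts = re.findall(r"<(.*?)>", decoded)    # split around the <> delimiters
    stereotype = parts[0].strip() if parts else decoded
    group = parts[1].strip() if len(parts) > 1 else ""
    return stereotype, group
```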
(1) CNN-GRU: The sequence output from BERT, with dimensions 128×768, is passed through 1D CNN layers consisting of three kernel sizes (1, 2, 3) with 100 filters each. The resulting convolved features are fed into a GRU layer, whose hidden output is passed to a Fully Connected (FC) layer with 100 neurons, followed by an output softmax layer (a minimal sketch of this baseline follows the list).
f https://scikit-learn.org/stable/
g https://pytorch.org/
(2) BiRNN: The input is fed into a Bidirectional GRU (Bi-GRU) with 128 hidden
units, generating a 256-dimensional hidden vector. This hidden vector is then passed
to an FC layer, followed by output layers for the final class prediction.
(3) BiRNN-Attention: Similar to the previous baseline model, but with the addition
of an attention layer between the Bi-GRU and FC layers.
(4) BERT-finetune: In this approach, the mBERT model is fine-tuned by adding an output softmax layer on top of the "CLS" output.
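A minimal PyTorch sketch of the CNN-GRU baseline is given below. The layer sizes follow the description in (1), while the activations, padding, and feature alignment are assumptions.

```python
# Sketch of the CNN-GRU baseline: 1D convolutions (kernels 1, 2, 3; 100 filters
# each) over BERT sequence output, a GRU, an FC layer, and a softmax output.
import torch
import torch.nn as nn

class CNNGRU(nn.Module):
    def __init__(self, n_classes: int, d_bert: int = 768):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(d_bert, 100, kernel_size=k, padding=k // 2) for k in (1, 2, 3)]
        )
        self.gru = nn.GRU(300, 128, batch_first=True)
        self.fc = nn.Linear(128, 100)
        self.out = nn.Linear(100, n_classes)

    def forward(self, bert_seq):            # bert_seq: (batch, 128, 768)
        x = bert_seq.transpose(1, 2)        # Conv1d expects (batch, channels, len)
        feats = torch.cat(
            [torch.relu(c(x))[:, :, :bert_seq.size(1)] for c in self.convs], dim=1
        )                                   # (batch, 300, 128)
        _, h = self.gru(feats.transpose(1, 2))   # final GRU hidden state
        return torch.softmax(self.out(torch.relu(self.fc(h[-1]))), dim=-1)
```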
Generation Baselines: We use mBART (Liu et al. 2020) and T5 (Raffel et al. 2020) as the baseline text-to-text generation models. We fine-tune these models on the proposed dataset with the training objective defined in Equation 9. In the single-task setting, the output sequence is either the stereotype or the target group category, depending on which task is being solved. In the multitask setting, the output sequence is the concatenation of the stereotype and the target group category, as sketched below.
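The only difference between the two settings is the target string; a minimal sketch, assuming the delimiter format described in Section 4:

```python
# Target construction for fine-tuning (delimiter format assumed from Sec. 4).
def make_target(stereotype: str, group: str, setting: str) -> str:
    if setting == "ST-stereotype":      # single task: explanation only
        return f"<{stereotype}>"
    if setting == "ST-target":          # single task: target group only
        return f"<{group}>"
    return f"<{stereotype}> <{group}>"  # multitask: concatenated outputs
```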
Table 3. : Results of different baselines and the two proposed frameworks, CGenEx-con and CGenEx-fuse, in a multitask setting. For the target task, results are in terms of macro-F1 score (F1), Accuracy (Acc), and Matthews correlation coefficient (MCC). F1, Acc, and MCC metrics are given in %. The maximum scores attained are represented by bold-faced values; gray highlighting represents statistically significant results.

                                    Stereotype                       Target
Model                      BLEU    ROUGE-L   BERTScore     Accuracy   F1-Score   MCC
Standard Baselines
CNN-GRU                     -        -          -            60.23     43.33     47.25
BiRNN                       -        -          -            60.81     43.99     48.12
BiRNN+Attention             -        -          -            62.33     44.31     50.37
BERT-Finetune               -        -          -            65.41     47.37     52.69
mT5-ST                     36.38    39.87      78.52         62.43     45.78     50.61
mT5-MT                     37.58    41.14      79.72         64.12     46.88     52.19
mBART-ST                   41.47    45.72      81.12         81.23     67.35     70.23
mBART-MT                   42.25    46.33      82.28         83.12     72.44     71.84
Proposed model (CGenEx), single task
mT5:CGenEx-con             37.14    40.89      79.12         65.53     47.84     55.31
mT5:CGenEx-fuse            38.62    42.37      80.46         66.92     48.07     55.93
mBART:CGenEx-con           41.87    45.93      82.77         83.54     72.66     74.52
mBART:CGenEx-fuse          42.95    46.74      83.09         83.83     72.78     74.72
Proposed model (CGenEx), multitask
mT5:CGenEx-con             38.25    42.14      80.75         67.96     48.67     56.76
mT5:CGenEx-fuse            38.88    42.97      81.63         68.12     48.86     56.83
mBART:CGenEx-con           43.12    47.08      83.89         84.36     72.98     75.86
mBART:CGenEx-fuse          44.23    48.83      85.27         84.77     73.24     76.26
are statistically significant. We employ the scipy library function stats.ttest_indh for the t-test, and the statistically significant results in Table 3 are highlighted in gray.
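For reference, the test reduces to a call like the following sketch (the per-run scores shown are placeholders, not our reported numbers):

```python
# Unpaired two-sample t-test over per-run scores (placeholder values).
from scipy import stats

proposed_runs = [73.1, 73.4, 73.2, 73.3, 73.0]   # hypothetical F1 per run
baseline_runs = [72.3, 72.5, 72.4, 72.6, 72.2]
t_stat, p_value = stats.ttest_ind(proposed_runs, baseline_runs)
print(p_value < 0.05)  # significance at the 5% level
```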
Table 4. : Class-wise precision, recall, and F1 score of the target identification task, generated by the single-task and multitask variants of our proposed model (CGenEx-fuse).
Figure 3: Confusion matrices of the single-task vs. multitask variants of the mBART-CGenEx-fuse model for the target identification task.
Table 5. : Ablation study showing the effect of reinforcement learning-based training (multitask setting).

                            Target               Stereotype
Model                    Acc      F1        BLEU    ROUGE-L   BERTScore
mBART:CGenEx-con         84.36    72.98     42.32   46.38     83.89
  -RL                    81.11    69.06     39.43   45.44     81.95
mBART:CGenEx-fuse        84.77    73.24     44.23   48.83     85.27
  -RL                    82.82    70.91     39.35   45.27     82.14
Removing the RL-based training leads to drops of 1.95% in the accuracy of the target classification task and 3.13% in the BERTScore of the stereotype generation task. This shows that RL training plays a vital role in improving the performance of both tasks, as the BLEU-based reward function (Equation 10) encourages the model to generate an output sequence close to the gold sequence.
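As an illustration, here is a minimal sketch of such a BLEU-based reward, assuming NLTK's smoothed sentence-level BLEU as the scorer (Equation 10 itself is not reproduced here):

```python
# BLEU-based reward for RL training (the scorer choice is an assumption).
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

def bleu_reward(generated: str, gold: str) -> float:
    smooth = SmoothingFunction().method1  # avoids zero scores on short outputs
    return sentence_bleu([gold.split()], generated.split(),
                         smoothing_function=smooth)

# One common scheme (self-critical training) scales the policy gradient by
# bleu_reward(sampled, gold) - bleu_reward(greedy, gold).
```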
                               Implied statement (Stereotypes)    Target group category
Model                          BLEU       ROUGE-L                 Acc        F1
SBF-GPT1 (Sap et al. 2019b)    49.9       40.2                    -          -
SBF-GPT2 (Sap et al. 2019b)    49.6       43.5                    -          -
ST-mBART:CGenEx-con            51.43      55.12                   82.76      72.08
ST-mBART:CGenEx-fuse           53.67      56.04                   86.23      75.14
MT-mBART:CGenEx-con            53.28      57.06                   83.98      72.85
MT-mBART:CGenEx-fuse           55.16      58.74                   87.74      76.39
Post: यहूदी बदबू क्यों करते हैं? क्योंिक वे िफर से नहाने के िलए नहीं िगर रहे हैं।
(Translation: Why do Jews stink? Because they are not falling in the bath again.)
True stereotype: नरसंहार के बारे में प्रकाश डालता है (Trans: Sheds light on the genocide); True target: संस्कृित (Culture)

Model                        Predicted stereotype                                                      Predicted target
SBF-GPT Sap et al. (2019b)   यहूदी लोग जल गये थे (Trans: Jewish folks were burned)                          संस्कृित (Culture)
ST:CGenEx-con                यहूिदयों का गंध (Trans: Jews are gross)                                        जाित (Race)
ST:CGenEx-fuse               प्रलय के बारे में प्रकाश डालता है (Trans: sheds light on mass destruction)             संस्कृित (Culture)
MT:CGenEx-con                नरसंहार की बात करता है (Trans: Discusses about genocide)                        जाित (Race)
MT:CGenEx-fuse               नरसंहार के बारे में प्रकाश डालता है (Trans: sheds light on the genocide)             संस्कृित (Culture)
Table 8. : Translation examples used in the error analysis (post and commonsense, Hindi and English).

Example 1
Hindi post: जब बीए कुछ ऐसा कहती है जो आपको परेशान करता है लेिकन आप कोिशश कर रहे हैं िक अब आप पागल न हों
Translated English post: When bae says somethin that pisses you off but you're trying to not to be crazy anymore
English commonsense: crazy is similar in meaning to brainsick
Translated Hindi commonsense: पागल िदमागी बीमारी के अर्थ में समान है

Example 2
Hindi post: लोग आजकल अमेिरका में काले होने के िलए िकतना मुश्िकल है के बारे में बात करते हैं... लेिकन सफेद होने के बारे में क्या है? जैसे गन्दा, हम वास्तव में लगभग एक समान खेल के मैदान पर हैं. हमारे दादा‐बाप के रूप में यह आसान था.
Translated English post: People talk about how hard it is to be black in America nowadays... but what about being white? Messy as it is, we're really on an almost equal playing field. As our grandparents it was easy.
English commonsense: black is related to minority
Translated Hindi commonsense: काला अल्पसंख्यक से संबंिधत है

Example 3
Hindi post: बाइक और काले पुरुषों में क्या समानता है? शृंखला
Translated English post: What do bikes and black men have in common?
English commonsense: black is distinct from white
Translated Hindi commonsense: काला सफेद से अलग है

Example 4
Hindi post: hоⅼоⅽаuѕt के दौरान यहूिदयों की एक गुप्त भाषा थी जो वे बोलते थे... यह कुछ इस तरह था, "ऐ मेरे भगवान! मैं जीिवत बर्िनंग हूँ"
Translated English post: During the holocaust the Jews had a secret language that they spoke... It was something like, "Oh my God! I'm burning alive"
English commonsense: jew is a ethnic group
Translated Hindi commonsense: यहूदी एक पंथ है
We performed an error analysis to study the effect of these translations and to better understand the semantic loss that occurs while translating, as shown in Table 8. The table shows that both translations are correct in the first two examples. In the third example, however, the first translation fails to render the input Hindi post correctly: it misses the corresponding English word for the Hindi word शृंखला, which completely changes the context of the input sentence. In the fourth example, the first translation is correct, but the second translation (English commonsense to Hindi commonsense) fails because it mistranslates the word ethnic, which can misguide the model rather than help it.
5.6 Limitations
In this work, we primarily focused on detecting and analyzing explicit hate speech in social media posts. Accurately detecting sarcasm in text is a complex task, as it often relies on contextual cues, tone, and an understanding of cultural references; it lies beyond the scope of our current study, which focuses on explicit and overt forms of hate speech. However, we acknowledge the significance of sarcasm as a potential element of hate speech and its impact on targeted groups; it is an important aspect to consider in future research and system development.
References
Agrawal, S. and Awekar, A. 2018. Deep learning for detecting cyberbullying across multiple social media
platforms. In European conference on information retrieval, pp. 141–153. Springer.
Badjatiya, P., Gupta, M., and Varma, V. 2019. Stereotypical bias removal for hate speech detection
task using knowledge-based generalizations. In The World Wide Web Conference, pp. 49–59.
Badjatiya, P., Gupta, S., Gupta, M., and Varma, V. 2017. Deep learning for hate speech detection in
tweets. In Proceedings of the 26th International Conference on World Wide Web Companion, pp. 759–760.
Cao, R., Lee, R. K.-W., and Hoang, T.-A. 2020. Deephate: Hate speech detection via multi-faceted
text representations. In 12th ACM conference on web science, pp. 11–20.
Caruana, R. 1997. Multitask learning. Machine Learning, 28.
Crawshaw, M. 2020. Multi-task learning with deep neural networks: A survey.
Cuddy, A. J., Fiske, S. T., Kwan, V. S., Glick, P., Demoulin, S., Leyens, J.-P., Bond, M. H.,
Croizet, J.-C., Ellemers, N., Sleebos, E., and others 2009. Stereotype content model across cultures:
Towards universal similarities and some differences. British journal of social psychology, 48(1):1–33.
Dadvar, M., Trieschnigg, D., and Jong, F. d. 2014. Experts and machines against bullies: A hybrid
approach to detect cyberbullies. In Canadian conference on artificial intelligence, pp. 275–281. Springer.
Davidson, T., Bhattacharya, D., and Weber, I. 2019. Racial bias in hate speech and abusive language
detection datasets. arXiv preprint arXiv:1905.12516.
Davidson, T., Warmsley, D., Macy, M., and Weber, I. 2017. Automated hate speech detection and
the problem of offensive language. In Proceedings of the international AAAI conference on web and social
media, volume 11, pp. 512–515.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. 2018. Bert: Pre-training of deep bidirectional
transformers for language understanding. arXiv preprint arXiv:1810.04805.
Dinakar, K., Reichart, R., and Lieberman, H. 2011. Modeling the detection of textual cyberbullying.
In Proceedings of the International Conference on Weblog and Social Media 2011. Citeseer.
Ghosh, S., Ekbal, A., and Bhattacharyya, P. 2022. Am i no good? towards detecting perceived
burdensomeness and thwarted belongingness from suicide notes. arXiv preprint arXiv:2206.06141.
Gunning, D., Stefik, M., Choi, J., Miller, T., Stumpf, S., and Yang, G.-Z. 2019. Xai—explainable
artificial intelligence. Science Robotics, 4(37):eaay7120.
Kamble, S. and Joshi, A. 2018. Hate speech detection from code-mixed hindi-english tweets using deep
learning models. arXiv preprint arXiv:1811.05145.
Karim, M. R., Dey, S. K., Islam, T., Sarker, S., Menon, M. H., Hossain, K., Hossain, M. A.,
and Decker, S. 2021. Deephateexplainer: Explainable hate speech detection in under-resourced bengali
language. In 2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA),
pp. 1–10. IEEE.
Kumar, R., Reganti, A. N., Bhatia, A., and Maheshwari, T. 2018. Aggression-annotated corpus of
hindi-english code-mixed data. arXiv preprint arXiv:1803.09402.
Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., and Zettlemoyer, L. 2020. Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 7871–7880.
Lin, C.-Y. and Och, F. J. 2004. Automatic evaluation of machine translation quality using longest common
subsequence and skip-bigram statistics. In Proceedings of the 42nd Annual Meeting of the Association for
Computational Linguistics (ACL-04), pp. 605–612.
Liu, Y., Gu, J., Goyal, N., Li, X., Edunov, S., Ghazvininejad, M., Lewis, M., and Zettlemoyer, L. 2020. Multilingual denoising pre-training for neural machine translation. Transactions of the Association for Computational Linguistics, 8:726–742.
Ma, X., Sap, M., Rashkin, H., and Choi, Y. 2020. Powertransformer: Unsupervised controllable revision
for biased language correction. arXiv preprint arXiv:2010.13816.
Maity, K. and Saha, S. 2021a. Bert-capsule model for cyberbullying detection in code-mixed indian
languages. In International Conference on Applications of Natural Language to Information Systems, pp.
147–155. Springer.
Maity, K. and Saha, S. 2021b. A multi-task model for sentiment aided cyberbullying detection in code-
mixed indian languages. In International Conference on Neural Information Processing, pp. 440–451.
Springer.
Mathew, B., Saha, P., Yimam, S. M., Biemann, C., Goyal, P., and Mukherjee, A. 2020.
Hatexplain: A benchmark dataset for explainable hate speech detection. arXiv preprint arXiv:2012.10289.
Nockleby, J. T. 1994. Hate speech in context: The case of verbal threats. Buff. L. Rev., 42:653.
Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. 2002a. Bleu: a method for automatic evaluation
of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational
Linguistics, pp. 311–318, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics.
Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. 2002b. Bleu: a method for automatic evaluation
of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational
Linguistics, pp. 311–318.
Paul, S. and Saha, S. 2020. Cyberbert: Bert for cyberbullying identification. Multimedia Systems, pp.
1–8.
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and Sutskever, I. 2019. Language models
are unsupervised multitask learners.
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., and
Liu, P. J. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal
of Machine Learning Research, 21(140):1–67.
Regulation, P. 2016. Regulation (eu) 2016/679 of the european parliament and of the council. Regulation
(eu), 679:2016.
Reynolds, K., Kontostathis, A., and Edwards, L. 2011. Using machine learning to detect cyberbullying.
In 2011 10th International Conference on Machine learning and applications and workshops, volume 2,
pp. 241–244. IEEE.
Rosenfeld, A. and Richardson, A. 2019. Explainability in human–agent systems. Autonomous Agents
and Multi-Agent Systems, 33(6):673–705.
Lin, C.-Y. 2004. Rouge: A package for automatic evaluation of summaries. In Proceedings of the ACL Workshop Text Summarization Branches Out, Barcelona, Spain.
Saha, T., Upadhyaya, A., Saha, S., and Bhattacharyya, P. 2021. A multitask multimodal ensemble
model for sentiment-and emotion-aided tweet act classification. IEEE Transactions on Computational
Social Systems.
Sancheti, A., Krishna, K., Srinivasan, B., and Natarajan, A. 2020. Reinforced rewards framework
for text style transfer.
Sap, M., Card, D., Gabriel, S., Choi, Y., and Smith, N. A. 2019a. The risk of racial bias in hate speech
detection. In Proceedings of the 57th annual meeting of the association for computational linguistics, pp.
1668–1678.
Sap, M., Gabriel, S., Qin, L., Jurafsky, D., Smith, N. A., and Choi, Y. 2019b. Social bias frames:
Reasoning about social and power implications of language. arXiv preprint arXiv:1911.03891.
Sap, M., Gabriel, S., Qin, L., Jurafsky, D., Smith, N. A., and Choi, Y. 2020a. Social bias frames:
Reasoning about social and power implications of language. In Proceedings of the 58th Annual Meeting
of the Association for Computational Linguistics, pp. 5477–5490, Online. Association for Computational
Linguistics.
Sap, M., Shwartz, V., Bosselut, A., Choi, Y., and Roth, D. 2020b. Commonsense reasoning for natural
language processing. In Proceedings of the 58th Annual Meeting of the Association for Computational
Linguistics: Tutorial Abstracts, pp. 27–33, Online. Association for Computational Linguistics.
Singh, A., Saha, S., Hasanuzzaman, M., and Dey, K. 2021. Multitask learning for complaint