2022 Aacl-Srw 5
Abstract
The exponential surge of social media has enabled information propagation at an unprecedented rate. However, it has also led to the generation of a vast amount of malign content, such as hateful memes. To curb the detrimental impact of this content, the problem of hateful memes detection has attracted the attention of researchers over the last few years. However, most past studies were conducted primarily on English memes, while memes in resource-constrained languages (e.g., Bengali) remain under-studied. Moreover, current research considers memes with a caption written in monolingual (either English or Bengali) form. However, memes might have code-mixed captions (English + Bangla), and the existing models cannot provide accurate inference in such cases. Therefore, to facilitate research in this area, this paper introduces a multimodal hate speech dataset (named MUTE) consisting of 4158 memes having Bengali and code-mixed captions. A detailed annotation guideline is provided to aid dataset creation in other resource-constrained languages. Additionally, extensive experiments have been carried out on MUTE, considering only the visual modality, only the textual modality, and both modalities. The results demonstrate that joint evaluation of visual and textual features significantly improves (≈ 3%) hateful memes classification compared to unimodal evaluation.
WARNING: This paper contains meme examples and words that are offensive in nature.
Figure 1: Examples of hateful memes having (a) only Bengali caption (b) code-mixed (Bengali + English) caption. Panel labels: (a) Attack religious beliefs, (b) Insult a person.
1 Introduction
With the advent of the Internet, social media platforms (e.g., Facebook, Twitter, Instagram) significantly impact people’s day-to-day life. As a result, many users communicate by posting various content on these media. This content includes hate speech, misinformation, and aggressive and offensive views. While some content is beneficial and enriches our knowledge, it can also trigger human emotions that can be considered harmful. Among them, the propagation of hateful content can directly or indirectly attack social harmony based on race, gender, religion, nationality, political support, immigration status, and personal beliefs. In recent years, memes have become a popular form of circulating hate speech (Kiela et al., 2020). These memes on social media have a pernicious impact on societal polarization as they can instigate hate crimes. Therefore, to restrain the interaction through hateful memes, an automated system is required to quickly flag this content and lessen the harm inflicted on readers. Several works (Davidson et al., 2017; Waseem and Hovy, 2016) have addressed hateful memes detection, most of which were for the English language. Unfortunately, no significant studies have been conducted on memes in low-resource languages, especially Bengali. In recent years, an increasing trend of using Bengali memes has been observed. As a result, it becomes crucial to identify Bengali hateful memes to mitigate the spread of negativity. However, meme analysis is complicated as it requires a holistic understanding of visual and textual content to infer (Zhou et al., 2021). The visual content of the meme alone may not be harmful (Figure 1 (a)). However,
Proceedings of the AACL-IJCNLP 2022 Student Research Workshop, pages 32–39
November 20, 2022. ©2022 Association for Computational Linguistics
it becomes hateful with the incorporation of textual content as it directly attacks religious beliefs. A meme’s caption can be written in a mixed language (written in both English and Bengali as in Figure 1 (b)), which can evade the surveillance engine in those cases. Developing a hateful meme detection system for such a scenario is complicated as no standard dataset is available. Moreover, developing an intelligent multimodal meme analysis system for Bengali is challenging due to the unavailability of a benchmark corpus, the lack of reliable NLP tools (such as OCR), and the complex morphological structure of the Bengali language. Therefore, this work aims to develop a multimodal dataset for Bangla hate speech detection and investigate various models for the task. The critical contributions of the work are summarized as follows:
• Created a multimodal hate speech dataset (MUTE) in Bengali consisting of 4158 memes annotated with Hate and Not-Hate labels.
• Performed extensive experiments with state-of-the-art visual and textual models and then integrated the features of both modalities using the early fusion approach.
2 Related Work
This section discusses the past studies on hate speech detection based on unimodal (i.e., image or text) and multimodal data.
Unimodal based hate speech detection: Hate speech detection is a prominent research issue among researchers working on different languages (Ross et al., 2016; Lekea and Karampelas, 2018). Most hate speech detection works were based on text data. For example, both Davidson et al. (2017) and Waseem and Hovy (2016) developed hate speech datasets from Twitter posts. Similarly, De Gibert et al. (2018) constructed a dataset that considers the hate speech posted in a white supremacy forum. Some works have also addressed low-resource languages. For instance, Fortuna et al. (2019) and Ousidhoum et al. (2019) introduced hate speech datasets for Portuguese and Arabic. A few works have also been done on Bengali hate speech detection (Romim et al., 2021; Mathew et al., 2021; Ishmam and Sharmin, 2019). Several architectures have been employed over the last few years to classify hateful texts. Earlier researchers widely used Recurrent Neural Network (RNN) (Gröndahl et al., 2018), Long Short Term Memory (LSTM) Network (Badjatiya et al., 2017), and the combination of RNN and convolutional neural network (CNN) (Zhang et al., 2018b) based methods. Recently, Bidirectional Encoder Representations from Transformers (BERT) based models (Pamungkas and Patti, 2019; Fortuna et al., 2021) have been applied and achieved superior performance compared to the earlier deep learning-based methods.
Multimodal hate speech detection: In contrast to the text-based analysis, only a few works in recent years have considered multimodal information (i.e., image + text) for hate speech detection. For example, Kiela et al. (2020) introduced a multimodal memes dataset for detecting hate speech. Gomez et al. (2020) developed a large-scale multimodal dataset (MMHS150k) for detecting hateful memes. In another work, Rana and Jha (2022) introduced a multimodal hate speech dataset concerning three modalities (i.e., image, text, and audio). However, few works have addressed multimodal hate speech detection for resource-constrained languages. Perifanos and Goutsos (2021) introduced a multimodal dataset for detecting hate speech in Greek social media. Likewise, Karim et al. (2022) developed a dataset for multimodal hate speech detection from Bengali memes. Several approaches have been employed for detecting hate speech using multimodal learning. Some researchers exploited different fusion techniques (i.e., early and late fusion) (Sai et al., 2022; Perifanos and Goutsos, 2021) to evaluate the image and textual features jointly. Others have employed bi-linear pooling (Chandra et al., 2021; Choi and Lee, 2019) and transformer-based methods (Kiela et al., 2020) such as MMBT, ViLBERT, and Visual-BERT. Although these multimodal transformer architectures are state of the art, they have only been applied to high-resource languages (e.g., English).
Differences from existing research: Though a considerable amount of work has been accomplished on multimodal hate speech detection, only a few works studied low-resource languages (e.g., Bengali). In our exploration, we found one work (Karim et al., 2022) that detects hate speech from multimodal memes for the Bengali language. However, they did not curate social media memes for analysis; instead, they artificially created a memes dataset for Bengali by superimposing hateful texts on various images. Moreover, the current works overlooked the memes containing captions written
cross-lingually. Considering these drawbacks, the proposed research differs from the existing studies in three ways: (i) it develops a multimodal hate speech dataset (i.e., MUTE) for Bengali from Internet memes, (ii) it provides a detailed annotation guideline that can be followed for resource creation in other low-resource languages, and (iii) it considers memes that contain code-mixed (English + Bangla) and code-switched (Bengali dialects written in the English alphabet) captions.
3 MUTE: A New Benchmark Dataset
This work developed MUTE: a novel multimodal dataset for Bengali hateful memes detection. MUTE includes memes with code-mixed and code-switched captions. For developing the dataset, we follow the guidelines provided by Kiela et al. (2020). This section briefly describes the dataset development process with detailed statistics.
3.1 Data Accumulation
For dataset construction, we manually collected memes from various social media platforms such as Facebook, Twitter, and Instagram. We searched for memes using a set of keywords such as Bengali Memes, Bangla Troll Memes, Bangla Celebrity Troll Memes, Bangla Funny Memes etc. Besides, some popular public meme pages were also considered for the data collection, such as Keu Amare Mairala, Ovodro Memes etc. We accumulated 4210 memes from January 10, 2022, to April 15, 2022. During the data collection, some inappropriate memes were discarded by following the guidelines provided by Pramanick et al. (2021). The criteria for discarding data are: (i) memes that contain only unimodal data, (ii) memes whose textual or visual information is unclear, and (iii) memes that contain cartoons. In this filtering process, 52 memes were removed, leaving a dataset of 4158 memes. Afterwards, the captions of the memes were manually extracted as Bengali has no standard OCR. Finally, the memes and their corresponding captions were given to the annotators for annotation.
2020; Gomez et al., 2020; Perifanos and Goutsos, 2021), we define the classes:
Hate: A meme is considered as Hateful if it intends to vilify, denigrate, bully, insult, or mock an entity based on characteristics including gender, race, religion, caste, and organizational status etc.
Not-Hate: A meme is reckoned as not-Hateful if it does not express any inappropriate cogitation and conveys positive emotions (e.g., affection, gratitude, support, and motivation) explicitly or implicitly.
3.2.1 Process of Annotation
We instructed the annotators to follow the class definitions for performing the annotation. We also asked them to mention the reasons for assigning a meme to a particular class. This explanation aids the expert in selecting the correct label when annotators contradict each other. Initially, we trained the annotators with some sample memes. Four annotators (computer science graduate students) performed the manual annotation process, and an expert (a Professor conducting NLP research for more than 20 years) verified the labels. Annotators were equally divided into two groups where each annotated a subset of memes. In case of disagreement, the expert decided on the final label. The expert ruled a total of 113 non-hateful and 217 hateful memes as hostile and non-hateful. Inter-annotator agreement was measured using Cohen’s Kappa coefficient (Cohen, 1960) to ensure the data annotation quality. We achieved a mean Kappa score of 0.714, which indicates a moderate agreement between the annotators. As mentioned earlier, this work is the very first attempt at multimodal hate speech detection that considers social media memes in the Bengali language. Therefore, it requires more extensive scrutiny with more diverse data and a high level of annotator agreement before a model trained on this dataset is deployed. The agreement score illustrates the difficulty humans have in identifying potentially hateful memes and raises a question of biases, thus limiting the broader impact of this work.
set, which contains a total of 483 memes with code-mixed captions. Moreover, it also illustrates that the ‘Not-Hate’ class has a higher number of words and unique words than the ‘Hate’ class. However, the average caption length is almost identical in both classes. Apart from this, we carried out a quantitative analysis using the Jaccard similarity index to figure out the fraction of overlapping words among the classes. We obtained a score of 0.391, indicating that some common words exist between the classes.
4 Methodology
Several computational models have been explored to identify hateful memes by considering a single modality (i.e., image or text) and the combination of both modalities (image and text). This section briefly discusses the methods and parameters utilized to construct the models.
4.1 Baselines for Visual Modality
This work employed convolutional neural networks (CNN) to classify hateful memes based on visual information. Initially, the images are resized to 150 × 150 × 3 and then fed into the pre-trained CNN models. Specifically, we selected the VGG19, VGG16 (Simonyan and Zisserman, 2015), and ResNet50 (He et al., 2016) architectures, which are fine-tuned on the MUTE dataset using the transfer learning (Tan et al., 2018) approach. Before that, the top two layers of the models are replaced with a sigmoid layer for classification.
BiLSTM + Attention: We applied the additive attention (Bahdanau et al., 2015) mechanism to the individual word representations of the BiLSTM cell. The CNN is replaced with an attention layer. The attention layer tries to give higher weight to the words that are significant for inferring a particular class.
Transformers: Pretrained transformer models have recently obtained remarkable performance in almost every NLP task (Naseem et al., 2020; Yang et al., 2020; Cao et al., 2020). As MUTE contains cross-lingual text, this work employed three transformer models, namely Multilingual Bidirectional Encoder Representations from Transformers (M-BERT (Devlin et al., 2019)), Bangla-BERT (Sarker, 2020), and Cross-Lingual Representation Learner (XLM-R (Conneau et al., 2020)). All the models are downloaded from the HuggingFace¹ transformer library. We follow their preprocessing² and encoding technique for preparing the texts. The transformer models provide a sentence representation vector of size 768. This vector is passed to a dense layer of 32 neurons, and then, starting from the pre-trained weights, the models are retrained on the developed dataset with a sigmoid layer.
¹ https://huggingface.co/
² https://huggingface.co/docs/tokenizers/index
4.3 Baselines for Multimodal Data
In recent years, joint evaluation of visual and textual data has proven superior in solving many complex NLP problems (Hori et al., 2017; Yang et al., 2019; Alam et al., 2021). This work investigates the joint learning of multimodal data for hateful memes
classification. For multimodal feature representation, we employed the feature fusion (Nojavanasghari et al., 2016) approach. In earlier experiments, all the visual and two textual (i.e., Bangla-BERT and XLM-R) models were used to construct the multimodal models. For the model construction, we added a dense layer of 100 neurons on each modality side and then concatenated their outputs to form a combined visual and textual representation. Finally, this combined feature is passed to a dense layer of 32 neurons, followed by a sigmoid layer for the classification task.
5 MUTE: Benchmark Evaluation
The training set is used to train the models, whereas the validation set is used for tuning the hyperparameters. We empirically tried several hyperparameter settings to obtain better model performance and report the best one. The final evaluation of the models is done on the test set. This work selects the weighted f1-score (WF) as the primary evaluation metric due to the class-imbalanced nature of the dataset. Apart from this, we used the class weighting technique (Sun et al., 2009) to give equal priority to the minority class (hate) during model training.
5.1 Results
Table 3 illustrates the outcome of the visual, textual, and multimodal models for hateful memes classification. In the case of the visual models, ResNet50 obtained the maximum WF of 0.641. For the text modality, the B-BERT model obtained the highest WF (0.649). The outcomes of the other textual models (i.e., BiLSTM + Attention, BiLSTM + CNN, and XLM-R) do not exhibit significant differences compared to the best model (B-BERT). On the other hand, with the multimodal information, the outcomes of most models are not improved. Almost all the models’ WF lies around 0.60 except the VGG19 + B-BERT model (0.641). However, the VGG16 + B-BERT model outperformed all the models by achieving the highest WF of 0.672, which is approximately 2% higher than the best unimodal model, B-BERT (0.649).
5.2 Error Analysis
We conducted a quantitative error analysis to investigate the model’s mistakes across the two classes. To illustrate the errors, the number of misclassified instances is reported in Figure 2 for the best unimodal (ResNet50 and B-BERT) and multimodal (VGG19 + B-BERT) models. From the visual to the textual model, the misclassification rate (MR) increased by ≈10% for the ‘Hate’ class and decreased by ≈9% for the ‘Not-Hate’ class. However, the joint evaluation of multimodal features significantly reduced the MR to 38% (from 44% and 54%) in the Hate class and thus improved the model’s overall performance. Though the multimodal model showed superior performance compared to the unimodal models, there is still room for improvement. We point out several reasons behind the model’s mistakes. Among them, identical words in different written formats (code-mixed, code-switched) made it difficult for the model to identify accurate labels. Moreover, the discrepancy between some memes’ visual and textual information creates confusion for the multimodal model. Indeed, these are some significant factors that should be tackled to develop a more sophisticated model for Bengali hateful memes classification.
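To make the per-class error figures concrete, the sketch below computes a misclassification rate for one class as the fraction of that class's true instances that receive a different predicted label. This definition is an assumption, since the text does not spell out how MR is calculated, and the function name and toy labels are hypothetical.

```python
def misclassification_rate(y_true, y_pred, target_class):
    """Fraction of instances whose true label is target_class but that
    were predicted as something else (assumed definition of MR)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Number of test instances that truly belong to the class.
    total = sum(1 for t in y_true if t == target_class)
    if total == 0:
        raise ValueError("no instances of target_class in y_true")

    # Instances of the class that the model assigned to another label.
    wrong = sum(
        1 for t, p in zip(y_true, y_pred)
        if t == target_class and p != target_class
    )
    return wrong / total


# Toy example (hypothetical predictions, not results from the paper).
y_true = ["Hate", "Hate", "Hate", "Hate", "Not-Hate", "Not-Hate"]
y_pred = ["Hate", "Not-Hate", "Hate", "Hate", "Not-Hate", "Hate"]
mr_hate = misclassification_rate(y_true, y_pred, "Hate")
mr_not_hate = misclassification_rate(y_true, y_pred, "Not-Hate")
```

Under this definition the MR of a class equals one minus its recall, so comparing MR across models is equivalent to comparing per-class recall.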
Approach   Models     P      R      WF
Visual     VGG19      0.594  0.579  0.584
Visual     VGG16      0.636  0.644  0.638
Visual     ResNet50   0.643  0.639  0.641
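For readers unfamiliar with the WF column, the following minimal sketch shows one common way a weighted F1-score is computed: the F1-score of each class is averaged with weights proportional to the number of true instances of that class. The function name `weighted_f1` and the toy labels are hypothetical, and the evaluation code actually used for MUTE may differ.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of the per-class F1-scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")

    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n_cls - tp
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        # Weight each class by its share of the true labels.
        score += (n_cls / total) * f1
    return score


# Toy example with an imbalanced label distribution (1 = Hate, 0 = Not-Hate).
y_true_toy = [1, 1, 0, 0, 0, 0]
y_pred_toy = [1, 0, 0, 0, 0, 1]
wf = weighted_f1(y_true_toy, y_pred_toy)
```

Because each class contributes in proportion to its support, the majority class dominates the score, which is why the class-wise error analysis remains a useful complement.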