
A Hierarchically-Labeled Portuguese Hate Speech Dataset

Paula Fortuna¹,³, João Rocha da Silva¹,², Juan Soler-Company³, Leo Wanner³,⁴, Sérgio Nunes¹,²

¹ INESC TEC, ² FEUP, University of Porto, Rua Dr. Roberto Frias, s/n, 4200-465 Porto, Portugal
³ NLP Group, ETIC, Pompeu Fabra University, Barcelona, Spain
⁴ Catalan Institute for Research and Advanced Studies (ICREA), Barcelona, Spain
[email protected], [email protected], [email protected], [email protected], [email protected]

Abstract

Over the past years, the amount of online offensive speech has been growing steadily. To successfully cope with it, machine learning is applied. However, ML-based techniques require sufficiently large annotated datasets. In the last years, different datasets were published, mainly for English. In this paper, we present a new dataset for Portuguese, which has not been in focus so far. The dataset is composed of 5,668 tweets. For its annotation, we defined two different schemes used by annotators with different levels of expertise. First, non-experts annotated the tweets with binary labels (‘hate’ vs. ‘no-hate’). Then, expert annotators classified the tweets following a fine-grained hierarchical multiple-label scheme with 81 hate speech categories in total. The inter-annotator agreement varied from category to category, which reflects the insight that some types of hate speech are more subtle than others and that their detection depends on personal perception. The hierarchical annotation scheme is the main contribution of the presented work, as it facilitates the identification of different types of hate speech and their intersections. To demonstrate the usefulness of our dataset, we carried out a baseline classification experiment with pre-trained word embeddings and an LSTM on the binary classified data, with a state-of-the-art outcome.

1 Introduction

The Internet is the source of an immense variety of knowledge repositories (Wikipedia, Wordnet, etc.) and applications (YouTube, Reddit, Twitter, etc.) that everybody can access and take advantage of; it is also the communication forum of our time and the most important instrument to ensure freedom of speech. It allows us to freely state and disseminate our view on any private or public matter to vast audiences. But unfortunately, it also opens the door to the manipulation of masses and the defamation of specific individuals or groups of people. One of these observed negative phenomena is the propagation of hate speech. Hate speech leads to a negative self-image and social exclusion of the targeted individuals, groups or populations, and incites violence against them. A clear example of the extreme harm that can be caused by hate speech is the 1994 Rwandan genocide; see Schabas (2000) for a detailed analysis. The detection of online hate speech is thus a pressing problem that calls for solutions. Over the last decade, a considerable number of supervised machine learning-based works have tackled the problem. Most of them focused on English (Waseem and Hovy, 2016; Davidson et al., 2017; Nobata et al., 2016; Jigsaw, 2018); see also the overview by Schmidt and Wiegand (2017). As a result, many more annotated datasets, which are the precondition for the use of supervised machine learning, are available for English (e.g., Waseem and Hovy (2016); Davidson et al. (2017); Nobata et al. (2016); Jigsaw (2018)) than for other languages. However, hate speech is not a phenomenon that is observed only in English discourse; it is notorious in online media in other languages as well; cf., e.g., Spanish (Fersini et al., 2018), Italian (Poletto et al., 2017; Sanguinetti et al., 2018), or German (Ross et al., 2016).
In this work, we aim to contribute to the field of hate speech detection. Our contribution is twofold: (i) diversification of the research on hate speech by the provision of a new dataset of hate speech in a language other than English, namely Portuguese; (ii) introduction of a novel fine-grained hate speech typology that improves on the commonly used state-of-the-art typologies, which tend to disregard the existence of subtypes of hate speech and either consider hate speech recognition as a binary classification task or take into account only a few classes, such as ‘racism’ and ‘sexism’ (Waseem and Hovy, 2016) – despite the fact that such broad distinctions unduly overgeneralize. For instance, by classifying discrimination against both black people and refugees simply as ‘racism’, we ignore that in this case, different characteristics with a different motivation are targeted (which is also reflected in a different language style). In particular, we compile and annotate a new dataset composed of 5,668 tweets in Portuguese, which is one of the most commonly used languages online (Fox, 2013). Two types of annotations are carried out. For the first, non-expert annotators classify the messages in a binary fashion (‘hate’ vs. ‘no-hate’). For the second, we build a multilabel hierarchical hate speech annotation schema with 81 hate categories in total (the dataset is available at https://github.com/paulafortuna/Portuguese-Hate-Speech-Dataset). To demonstrate the usefulness of our dataset, we carried out a baseline classification experiment with pre-trained word embeddings and an LSTM on the binary classified data, with a state-of-the-art outcome.

The remainder of the paper is structured as follows: Section 2 reviews the literature. Section 3 describes our crawling procedure. In Section 4, we present the two annotation schemas we work with: the binary and the hierarchical schema. Section 5 discusses a baseline hate speech experiment that we carried out to validate our new dataset. Section 6 presents some ethical considerations of this work. In Section 7, finally, the conclusions of our work are presented.

2 Related Work

2.1 Hate Speech Concepts

Fortuna and Nunes (2018) analyze and compare several aggression-related concepts. As a result of their analysis, they present the following definition of hate speech:

“Hate speech is language that attacks or diminishes, that incites violence or hate against groups, based on specific characteristics such as physical appearance, religion, descent, national or ethnic origin, sexual orientation, gender identity or other, and it can occur with different linguistic styles, even in subtle forms or when humour is used.”

We adopted this definition in our work. Our work has also been inspired by the taxonomy provided by Salminen et al. (2018), which includes 29 hate categories characterized in terms of hateful language, target, and sub-target types. To create their taxonomy, Salminen et al. followed an iterative and qualitative procedure called “open coding” (Glaser and Strauss, 2017).

There are obvious similarities between Salminen et al.’s approach and ours. However, there are also some significant differences. The first difference concerns the underlying definition of hate. While they use the very generic definition “hateful comments toward a specific group or target”, the definition we adopt is more specific (cf. above). This leads to differences in the taxonomy. For instance, they introduce ‘hate against media’ and ‘hate against religion’, which is hate against abstract entities and is not considered by us. Additionally, they merge the targets of hate and the type of discourse in the same hate speech taxonomy. In our case, we focus on the targets of hate speech only.
2.2 Dataset Annotation

Several hate speech datasets are publicly available, e.g., for English (Waseem and Hovy, 2016; Davidson et al., 2017; Nobata et al., 2016; Jigsaw, 2018), Spanish (Fersini et al., 2018), Italian (Poletto et al., 2017; Sanguinetti et al., 2018), German (Ross et al., 2016), Hindi (Kumar et al., 2018), and Portuguese (de Pelle and Moreira, 2017). In this section, we analyze the data collection strategy, the annotation method and the dataset properties of three representative hate speech datasets: the Hate Speech, Racism and Sexism dataset by Waseem and Hovy (2016), the Offensive Language dataset by Davidson et al. (2017), and the Portuguese News Comments dataset by de Pelle and Moreira (2017). We have chosen the first two because they are the most widely used datasets for automatic English hate speech classification. They show how Twitter can be used to retrieve information and how to conduct the manual classification relying on both expert and non-expert annotators. The third is another annotated and published dataset for Portuguese, which is rather different from ours.

Hate Speech, Racism and Sexism Dataset. This dataset (https://github.com/ZeerakW/hatespeech) (Waseem and Hovy, 2016) contains 16,914 tweets in English, which were classified by two annotators using the classes “Racism”, “Sexism” and “Neither”. Regarding the tweet collection, an initial manual search was conducted on Twitter to collect common slurs and terms related to religious, sexual, gender, and ethnic minorities. The authors identified frequently occurring terms in tweets that contain hate speech and used those terms to retrieve more messages. The messages were then annotated by the main researcher together with a gender studies student; in total, 3,383 tweets were labeled as sexist, 1,972 as racist, and 11,559 as neither sexist nor racist. The inter-annotator agreement had a Cohen’s Kappa of 0.84. The authors of the study concluded that the use of n-grams provides good results in the task of automatic hate speech detection, and that adding demographic information leads to little improvement in the performance of the classification model.

Offensive Language Dataset. Davidson et al. (2017) annotated a dataset (https://github.com/t-davidson/hate-speech-and-offensive-language) with 14,510 tweets in English, using the classes “Hate”, “Offensive” and “Neither”. Regarding the collection of the messages, they started with an English hate speech lexicon compiled by Hatebase.org and searched for tweets that contained terms from this lexicon. The outcome was a collection of tweets written by 33,458 Twitter users. The collected tweets were complemented by further follow-up tweets of these users, which resulted in a corpus of 85.4 million tweets. Finally, from this corpus, a random sample of 25,000 tweets containing terms from the lexicon was extracted and manually annotated by CrowdFlower workers. Three or more workers from CrowdFlower annotated each message. Majority voting was used to assign a label to each tweet. Tweets that did not have a majority class were discarded. This resulted in a sample of 24,802 labeled tweets. The inter-annotator agreement score provided by CrowdFlower was 92%. However, only 5% of the tweets were labeled as hate speech by the majority of the workers.

Portuguese News Comments Dataset. de Pelle and Moreira (2017) collected a dataset (https://github.com/rogersdepelle/OffComBR) with 1,250 random comments from the Globo news site on politics and sports news. Each comment was annotated by three annotators, who were asked to indicate whether it contained ‘racism’, ‘sexism’, ‘homophobia’, ‘xenophobia’, ‘religious intolerance’, or ‘cursing’. ‘Cursing’ was the most frequent label, while the other labels had few instances in the corpus. Regarding the annotator agreement, the value was 0.71.

In comparison to this work, the dataset that we have compiled provides more data and is not restricted to specific topics. Additionally, our annotation focuses only on hate speech, instead of general offensive content. We also use and provide a complete labeling schema. Compared to the previous two datasets, our second annotation schema is considerably more fine-grained. As we will see below, our annotation procedure with the fine-grained schema is similar to that of Waseem and Hovy (2016).

2.3 Classification methods

Different studies conclude that deep learning approaches outperform classical machine learning algorithms in the task of hate speech detection; see, e.g., Mehdad and Tetreault (2016); Park and Fung (2017); Del Vigna et al. (2017); Pitsilis et al. (2018); Founta et al. (2018); Gambäck and Sikdar (2017). For instance, Badjatiya et al. (2017) compare the use of different types of neural networks (CNN, LSTM) and deep learning libraries such as FastText with the use of classical machine learning techniques, and experiment with different types of word embeddings. The setup that achieved the best performance consists of the combination of deep techniques with standard ML classifiers, more precisely, of embeddings learned by an LSTM model combined with gradient boosted decision trees. We will follow a similar methodology for classification.

3 Message Collection

Our overall approach to message collection is outlined in Figure 1. In what follows, we introduce the individual steps in detail.

[Figure 1: Method for message collection — a pipeline from the enumeration of hate-related keywords and profiles, through crawling on Twitter, to tweet filtering and sampling.]

Use of Keywords and Profiles. We used Twitter’s search API for both keywords and profiles because the two are complementary as message sources. With the first, we access a wider range of tweets from different profiles, but we restrict the search to specific words or expressions that indicate hate. With the second, we obtain more spontaneous discourse, but from a more restricted number of users:

• Hate-related keywords: We used Twitter’s API search feature to look for keywords and hashtags related to hate speech, such as fufas, sapatão ‘dyke’ or #LugarDeMulherENaCozinha ‘#womensPlaceIsInTheKitchen’.

• Hate-related profiles: Using the profile search API, we queried with words like ódio ‘hate’, discurso de ódio ‘hate speech’ and ofensivo ‘offensive’ in order to find accounts that post hateful messages. In Portuguese, there are social media users whose profiles are built specifically for sharing hateful content against certain minorities. We collected the messages from those accounts with the expectation of finding hate speech messages. This search also allowed us to find counter-hate profiles, which use the same words in their descriptions. It seemed adequate to keep these profiles because they reproduce hate speech messages from other users.

We looked at 29 specific profiles and used 19 keywords and ten hashtags, for a total of 58 search instances (we use the term “search instance” to refer to the profiles, keywords or hashtags used for the Twitter search). The goal has been to be exhaustive and cover different types of discrimination, based on religion, gender, sexual orientation, ethnicity, and migration. We compiled this collection of search instances because there was no specific hate speech lexicon available for Portuguese; Hatebase, e.g., contains only generic hate terms (Hatebase, 2019).

Crawling. We used R to crawl content with respect to both keywords and profiles on the 8th and 9th of March 2017. A total of 42,930 messages were collected.

Tweet Filtering. We kept tweets categorized by Twitter as written in Portuguese. We eliminated repetitions and retweets of already collected messages to avoid duplication, and removed HTML tags and messages with less than three words.

Tweet Sampling. The procedure previously described resulted in 33,890 tweets. We noticed that the search instances returned numbers of tweets of very different magnitudes (e.g., some profiles had only around 30 messages, while others had more than 3,000). We therefore decided to use a maximum of 200 tweets per search instance in order to keep a more diverse pool of sources.

Final Dataset. Our final dataset contains 5,668 tweets with content from 1,156 different users. The majority of the tweets (more than 95%) are from January, February, and March of 2017.
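The filtering and sampling steps are straightforward to reproduce. Below is a minimal Python sketch of both, assuming the raw crawl has been loaded into a pandas DataFrame with hypothetical columns text, lang, is_retweet, and search_instance; the original pipeline was implemented in R, so this only illustrates the logic, not the code used for the paper.

```python
import pandas as pd

MAX_PER_INSTANCE = 200  # cap per search instance, to keep the sources diverse

def filter_tweets(df: pd.DataFrame) -> pd.DataFrame:
    """Language filter, de-duplication, HTML stripping, minimum length."""
    df = df[df["lang"] == "pt"].copy()                # keep tweets tagged as Portuguese
    df = df[~df["is_retweet"]]                        # drop retweets
    df = df.drop_duplicates(subset="text")            # drop repeated messages
    df["text"] = df["text"].str.replace(r"<[^>]+>", "", regex=True)  # strip HTML tags
    return df[df["text"].str.split().str.len() >= 3]  # drop messages with < 3 words

def sample_tweets(df: pd.DataFrame) -> pd.DataFrame:
    """Keep at most MAX_PER_INSTANCE tweets per keyword, hashtag or profile."""
    return (df.groupby("search_instance", group_keys=False)
              .apply(lambda g: g.sample(min(len(g), MAX_PER_INSTANCE), random_state=0)))

# dataset = sample_tweets(filter_tweets(raw_crawl))
```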
4 Annotation of Hate Speech

In what follows, we present the annotation procedures for binary and hierarchical hate speech annotation.

4.1 Binary annotation

Three annotators classified every message. 18 Portuguese native speakers (Information Science student volunteers) were given annotation guidelines to perform the task (cf. Appendix A.1). All of them received an equivalent number of messages. The annotation was binary: the annotators had to label each message as ‘hate speech’ or ‘not hate speech’.

To check the agreement between the three classifications of every message, we used Fleiss’s Kappa (Fleiss, 1971). We observed a low agreement, with a value of K = 0.17. We think that this low value is the result of relying exclusively on non-expert annotators for classifying hate speech. For instance, in Waseem and Hovy (2016), the two annotators were the author of the study plus a gender studies student. On the other hand, the two other studies mentioned in Section 2 (de Pelle and Moreira, 2017; Davidson et al., 2017) are more generic in that they do not focus exclusively on hate speech (as we do), but rather consider offensive speech in general, which includes insults that are more explicit and easier to recognize, while hate speech is subtler and more difficult to identify.

For our final annotation, we applied the majority vote, which resulted in a dataset in which 31.5% of the messages are annotated as ‘hate speech’.
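To make the agreement computation concrete, here is a minimal sketch of Fleiss’s Kappa and the majority vote, assuming the three binary annotations per tweet are stored row-wise in a NumPy array; the toy ratings below are purely illustrative, and statsmodels is used here simply because it ships an implementation of the coefficient.

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# One row per tweet, one column per annotator; 1 = 'hate speech', 0 = 'not hate speech'.
# Toy data for illustration only.
ratings = np.array([
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
])

# aggregate_raters converts raw labels into per-category counts for each tweet.
table, _ = aggregate_raters(ratings)
print("Fleiss's Kappa:", fleiss_kappa(table))

# The final binary label is the majority vote over the three annotations.
majority = (ratings.sum(axis=1) >= 2).astype(int)
```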
4.2 Hierarchical annotation

When studying hate speech, it is possible to distinguish between different categories of it, like ‘racism’, ‘sexism’, or ‘homophobia’. A more fine-grained view can be useful in hate speech classification because each category has a specific vocabulary and specific ways of being expressed, such that creating a language model for each category may help to improve the automatic detection of hate speech (Warner and Hirschberg, 2012).

Another phenomenon we can observe when analyzing different categories of hate speech is their intersectionality. This concept appeared as an answer to the historical exclusion of black women from early women’s rights movements, which were often concerned with the struggles of white women alone. Intersectionality brings attention to the experiences of people who are subjected to multiple forms of discrimination within a society (e.g., being a woman and black) (Collins, 2015). Waseem (2016) introduces a hate speech labeling scheme that follows an intersectional approach. In addition to ‘racism’, ‘sexism’, and ‘neither’, they use the label “both”, arguing that the intersection of multiple oppression categories can differ from the forms of oppression it consists of (Crenshaw, 2018).

To better take into account different hate speech categories from an intersectional perspective, we define the hate speech annotation schema in terms of a hierarchical structure of classes.

4.2.1 Hate speech and hierarchical classification

In hierarchical classification, there is a structure defining the hierarchy between the categories of the problem (Dumais and Chen, 2000). This is opposed to flat classification, where categories are treated in isolation. Several structures can be used to represent a hierarchy of classes. One of them is a Rooted Directed Acyclic Graph (rooted DAG), where each class corresponds to a node and can have more than one parent. Another property of this graph is that documents can be assigned to terminal categories and to non-terminal node categories alike (Hao et al., 2007). In the specific case of hate speech classification, we propose to use a rooted DAG in order to be able to cover hate speech subtypes and their intersections, as exemplified in Figure 2. The graph of classes has the following properties (a small code sketch follows the list):

• The ‘hate speech’ class corresponds to the root of the graph.

• If hate speech can be divided into several types of hate, several nodes descend from the root node. This gives rise to the second level of classes (Table 1), according to the targets of the hate (e.g., ‘racism’, ‘homophobia’, and ‘sexism’).

• This second level of nodes can also be divided into subgroups of targets. For instance, racist messages can be targeted against black people, Chinese people, Latinos, etc.

• The division of classes continues until we do not find more distinct groups, resulting in a terminal node.

• The lower nodes of the graph inherit the classes from the upper nodes, up to the root.

• The lower nodes of the graph can have one or more parents. In the second case, this gives rise to a class that intersects the parent classes.

• Instances are classified according to a multilabel approach and can belong to classes assigned to both terminal and non-terminal nodes.

[Figure 2: Part of the rooted directed acyclic graph used for hate speech classification.]
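To make these properties concrete, here is a minimal Python sketch of the rooted DAG as a child-to-parents map with label inheritance; the class names follow Table 3, but the dictionary shown is a hypothetical excerpt covering only a handful of the 81 categories.

```python
# Each class maps to its parent classes. A node with several parents encodes an
# intersection: 'Fat women' descends from both 'Women' and 'Fat people'.
# Excerpt of the full hierarchy, for illustration only.
PARENTS = {
    "Sexism": ["Hate speech"],
    "Women": ["Sexism"],
    "Body": ["Hate speech"],
    "Fat people": ["Body"],
    "Fat women": ["Women", "Fat people"],
}

def with_ancestors(labels):
    """Expand the assigned labels with all inherited ancestor classes."""
    expanded, stack = set(labels), list(labels)
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in expanded:
                expanded.add(parent)
                stack.append(parent)
    return expanded

# A tweet annotated only as 'Fat women' also counts for 'Women', 'Sexism',
# 'Fat people', 'Body', and the root 'Hate speech':
print(with_ancestors({"Fat women"}))
```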
Class           | Definition
Sexism          | Hate speech based on gender. Includes hate speech against women.
Body            | Hate speech based on the body, such as against fat, thin, tall or short people.
Origin          | Hate speech based on the place of origin.
Homophobia      | Hate speech based on sexual orientation.
Racism          | Hate speech based on ethnicity.
Ideology        | Hate speech based on a person’s ideas, such as feminist or left-wing ideology.
Religion        | Hate speech based on religion.
Health          | Hate speech based on health conditions, such as against disabled people.
Other-Lifestyle | Hate speech based on life habits, such as vegetarianism.

Table 1: Direct subtypes of the ‘hate speech’ type.

This annotation schema has several advantages compared to standard binary or disjoint flat classification. Firstly, it models the relationships between different subtypes of hate speech in a better way. Additionally, it preserves rare classes, while signaling them as part of more generic classes. For instance, with this classification, we can use a message to build a model for predicting sexism even if the message was cataloged as ‘hate against fat women’. Finally, with this approach, it is possible to study each subtype of hate speech individually or in relation to others, depending on the goal of the study.

In the next subsection, we outline the hierarchical annotation procedure conducted with the dataset described in Section 3, which complements the non-expert annotation.

4.2.2 Building the hierarchy of hate speech

Similarly to Salminen et al. (2018), we use a data-driven approach based on an open coding methodology for the annotation. This means that we iteratively record the different classes as they appear in the dataset while we read and classify the data. The classification hierarchy is then built by creating and reorganizing categories until all available data has been analyzed. For this annotation, we applied an intersectional approach by enumerating all the possible groups cited in our dataset, no matter their frequency (e.g., ‘feminist men’ appears only once).

Based on all instances of the dataset, the hierarchy of classes was built by one researcher working in the area of automatic detection of hate speech, with training in social psychology. Then, the same researcher classified all the dataset messages using the hierarchical class structure.

4.2.3 Agreement between annotators

To verify the validity of this annotation procedure, a second annotator classified 500 messages. Then, we used Cohen’s Kappa (Gamer et al., 2012) to check the agreement between both, and observed K = 0.72. We also considered the agreement of the annotators by type of hate speech. We ranked the classes by best agreement and removed the classes with only one instance for either of the annotators. We found diverse values across the different categories (Table 2), which points out that some specific types of hate speech can be more difficult to classify than others.

Classes         | K     | Annotator 1 | Annotator 2
Lesbians        | 0.879 | 59          | 53
Health          | 0.856 | 3           | 4
Homophobia      | 0.823 | 69          | 61
Disabled people | 0.799 | 2           | 3
Refugees        | 0.763 | 13          | 13
Migrants        | 0.751 | 15          | 14
Sexism          | 0.669 | 134         | 104
Trans women     | 0.662 | 6           | 9
Men             | 0.657 | 12          | 15
Women           | 0.642 | 109         | 75
Fat women       | 0.637 | 30          | 16
Body            | 0.637 | 32          | 17
Fat people      | 0.637 | 32          | 17
Ideology        | 0.609 | 14          | 15
Feminists       | 0.581 | 13          | 14
Hate speech     | 0.569 | 245         | 213
Racism          | 0.501 | 18          | 13
Religion        | 0.493 | 5           | 11
Black people    | 0.435 | 11          | 7
Origin          | 0.329 | 3           | 3
Islamists       | 0.329 | 2           | 10
Gays            | 0.300 | 4           | 9
Ugly women      | 0.276 | 24          | 4

Table 2: Annotator agreement by class, with the number of messages annotated by each annotator.
‘men’) may look neutral at the first glance, but,
4.2.3 Agreement between annotators in reality, they group messages whose vocabulary
For verifying the validity of this annotation proce- and language style reflect negative expectations to-
dure, a second annotator classified 500 messages. wards the corresponding collective (in the case of
Then, we used Cohens Kappa (Gamer et al., 2012) men those expectations reflect toxic masculinity
for checking the agreement between both. We norms).

99
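The node depth (ND) reported in Table 3 appears to be the distance of a class to the root; under the hypothetical PARENTS map from the sketch in Section 4.2.1, it could be derived as follows (illustrative only — the assumption that ND is the shortest path to the root is ours, although it is consistent with the multi-parent rows of the table):

```python
def node_depth(cls: str) -> int:
    """Depth of a class in the rooted DAG: shortest distance to 'Hate speech'."""
    if cls == "Hate speech":
        return 0
    return 1 + min(node_depth(parent) for parent in PARENTS[cls])

print(node_depth("Fat women"))  # 3, matching the ND column of Table 3
```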
Class | ND | Parent nodes | Freq
Hate speech | 0 | – | 1228
Sexism | 1 | Hate speech | 672
Women | 2 | Sexism | 544
Homophobia | 1 | Hate speech | 322
Homossexuals | 2 | Homophobia | 288
Lesbians | 3 | Homossexuals, Women | 248
Body | 1 | Hate speech | 164
Fat people | 2 | Body | 160
Fat women | 3 | Women, Fat people | 153
Ugly people | 2 | Body | 131
Ugly women | 3 | Women, Ugly people | 130
Racism | 1 | Hate speech | 94
Ideology | 1 | Hate speech | 92
Migrants | 1 | Hate speech | 82
Men | 2 | Sexism | 70
Refugees | 2 | Migrants | 70
Feminists | 2 | Ideology, Sexism | 65
Gays | 3 | Homossexuals | 56
Black people | 2 | Racism | 52
Religion | 1 | Hate speech | 30
Left wing ideology | 2 | Ideology | 26
Origin | 1 | Hate speech | 26
Trans women | 3 | Women, Transexuals | 26
Other-Lifestyle | 1 | Hate speech | 20
Islamists | 2 | Religion | 17
Immigrants | 2 | Migrants | 15
Transexuals | 2 | Sexism | 14
Muslims | 2 | Religion | 11
Black women | 3 | Women, Black people | 8
Criminals | 2 | Other/Lifestyle | 8
Latins | 2 | Racism, Origin | 7
Health | 1 | Hate speech | 6
Rural people | 2 | Origin | 6
Travestis | 3 | Women | 6
Aborting women | 3 | Women | 5
Asians | 2 | Racism, Origin | 5
Brazilians | 3 | South Americans | 5
Disabled people | 2 | Health | 5
South Americans | 2 | Origin | 5
Africans | 2 | Origin | 4
Ageing | 1 | Hate speech | 4
Angolans | 3 | Africans | 4
Nordestines | 3 | Rural people, Brazilians | 4
Chinese | 3 | Asians | 3
Homeless | 2 | Other/Lifestyle | 3
Arabic | 2 | Origin | 2
Bissexuals | 2 | Homophobia | 2
Blond women | 2 | Women, Body | 2
East europeans | 2 | Origin | 2
Jews | 2 | Religion | 2
Jornalists | 2 | Other/Lifestyle | 2
Old people | 2 | Ageing | 2
Thin people | 2 | Body | 2
Thin women | 3 | Women, Thin people | 2
Vegetarians | 2 | Other/Lifestyle | 2
White people | 2 | Racism | 2
Young people | 2 | Ageing | 2
Agnostic | 2 | Ideology | 1
Argentines | 3 | Latins | 1
Autists | 2 | Health | 1
Brazilian women | 3 | Women, South Americans | 1
Egyptians | 3 | Arabic | 1
Football players women | 2 | Women, Other/Lifestyle | 1
Gamers | 2 | Other/Lifestyle | 1
Homeless women | 3 | Women, Homeless | 1
Indigenous | 2 | Racism | 1
Iranians | 3 | Arabic | 1
Japaneses | 3 | Asians | 1
Men Feminists | 3 | Feminists, Men | 1
Mexicans | 3 | Latins | 1
Muslim women | 3 | Muslims, Women | 1
Old women | 3 | Women, Old people | 1
Polyamorous | 2 | Other/Lifestyle | 1
Poor people | 2 | Other/Lifestyle | 1
Russians | 3 | East europeans | 1
Sertanejos | 3 | Rural people, Brazilians | 1
Street artists | 2 | Other/Lifestyle | 1
Ucranians | 3 | East europeans | 1
Venezuelans | 3 | Latins | 1

Table 3: Hate subclasses (Class) and their respective parent categories (Parent nodes), sorted by frequency (Freq). The node depth (ND) is also provided.
5 Binary classification experiment

In order to obtain a first indicator of the usefulness of our dataset, we carry out a preliminary binary classification experiment.

5.1 Methodology

To perform the experiment, we use 10-fold cross-validation (Chollet, 2017), combined with holdout validation: one part of the data is used for cross-validation and parameter tuning with grid search, and the other part of unseen data is then used for testing.

Like Badjatiya et al. (2017), we provide our source code (https://github.com/paulafortuna/SemEval_2019_public). We use Python 3.6, with Keras (Chollet et al., 2015), Gensim (Řehůřek and Sojka, 2010) and Scikit-learn (Pedregosa et al., 2011) as the main libraries. The following subsections describe how we implement each step performed by our system.

Text pre-processing. As far as text pre-processing is concerned, we remove stop words using Gensim and punctuation using the default string library, and transform all tokens in the tweets to lower case.
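A minimal sketch of this step; note that the stop-word list shipped with Gensim is English, so we assume a Portuguese list was substituted in practice, and the function below only illustrates the mechanics.

```python
import string
from gensim.parsing.preprocessing import STOPWORDS  # Gensim's built-in list (English);
                                                    # a Portuguese list is assumed in practice

def preprocess(tweet: str) -> list:
    """Lower-case the tweet, strip punctuation, and drop stop words."""
    tweet = tweet.lower()
    tweet = tweet.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in tweet.split() if tok not in STOPWORDS]

print(preprocess("Exemplo de um tweet, com pontuação!"))
```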
Feature extraction. Regarding the features in our experiment, we use pre-trained GloVe word embeddings with 300 dimensions for Portuguese (Hartmann et al., 2017). Methods provided by Keras are then used to map each token in the input to an embedding.

Classification. For classification, we use a deep learning model, namely an LSTM, in an architecture as proposed by Badjatiya et al. (2017). The architecture contains an embedding layer with the weights from the word embedding extraction procedure, an LSTM layer with 50 dimensions, and dropouts at the end of both layers. As loss function, we use binary cross-entropy, and for optimization Adam, with 10 epochs and a batch size of 128. With this model, we classify the data into binary classes, and we save the last layer before the classification to extract 50 dimensions as input to the xgBoost algorithm, a gradient boosting implementation available as a Python library (Chen and Guestrin, 2016). We also experimented with a higher dimensionality, but this did not improve the performance of the classifier.

For xgBoost, the default parameter setting has been used, except for ‘eta’ and ‘gamma’. Here, we conducted a grid search combining several values of both (eta: 0, 0.3, 1; gamma: 0.1, 1, 10) in order to obtain the optimal settings. Figure 3 shows a graphical representation of our model.

[Figure 3: Classification method used as baseline for binary hate speech classification with the Portuguese dataset — raw data, pre-trained word embeddings, LSTM last layer, xgBoost.]
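A condensed sketch of this architecture follows; the vocabulary size, sequence length and dropout rate are illustrative assumptions not fixed by the text, and the embedding matrix would be filled with the pre-trained GloVe vectors.

```python
import numpy as np
import xgboost as xgb
from keras.models import Model
from keras.layers import Input, Embedding, LSTM, Dropout, Dense

VOCAB, DIM, MAXLEN = 20000, 300, 30          # illustrative values, not fixed by the paper
embedding_matrix = np.zeros((VOCAB, DIM))    # to be filled with the GloVe vectors

inputs = Input(shape=(MAXLEN,))
x = Embedding(VOCAB, DIM, weights=[embedding_matrix], trainable=False)(inputs)
x = Dropout(0.25)(x)                         # dropout rate is an assumption
features = LSTM(50)(x)                       # 50-dimensional LSTM layer
features = Dropout(0.25)(features)
outputs = Dense(1, activation="sigmoid")(features)

model = Model(inputs, outputs)
model.compile(loss="binary_crossentropy", optimizer="adam")
# model.fit(X_train, y_train, epochs=10, batch_size=128)

# The 50-dimensional penultimate layer serves as input features for gradient
# boosting; learning_rate corresponds to eta, tuned (with gamma) by grid search.
extractor = Model(inputs, features)
# clf = xgb.XGBClassifier(learning_rate=0.3, gamma=1)
# clf.fit(extractor.predict(X_train), y_train)
```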
5.2 Results

In this section, we present the results of our experiment for the classification of hate speech in Portuguese. Table 4 shows the baseline results of the LSTM-based model on our new dataset. We provide the cross-validation and test set F1 scores, together with the number of instances used in each case (N). The results show a state-of-the-art outcome. We can thus assume that, even if annotated merely in terms of basic binary (‘hate’ vs. ‘not hate speech’) labels, our dataset already constitutes a valid hate speech resource.

                  | Hate speech dataset (PT)
CV F1-score       | 0.78
Training data (N) | 5,099
Test set F1-score | 0.72
Testing data (N)  | 567

Table 4: Results of Portuguese hate speech binary classification with the new dataset presented in this paper. We provide the micro-averaged F1 scores and the number of instances used in each of the datasets (N).

6 Ethical considerations

Regarding the ethical aspects of this study, we took into consideration the privacy of the authors of the collected messages; we also acknowledge the limitations of our sampling procedure when studying online hate speech. The data was anonymized by omitting the tweet id. As a consequence, it is possible to reach the original tweet and user only through a search for the exact text of the tweet. To also prevent this, we make our dataset available on GitHub only for research purposes, under the condition that no such search is performed. A disclaimer is attached, stating that any attempt to violate the privacy of Twitter users is against the established usage conditions, and that the authors of this paper cannot be held liable for such a violation.

As far as the quality of the data collection is concerned, sampling bias may have been introduced: firstly, because the Twitter API was used, which provides only a subset of all the data posted on the platform; secondly, because we used a set of keywords and crawled profiles based on our own decision criteria, as explained in Section 3. However, we do not aim to have a representative sample of online hate speech on Twitter. We consider that our method is adequate for building a dataset with examples of hate speech, and we could find diverse hate speech instances belonging to 80 different classes.
7 Conclusions and Future Work

In this work, we built a Portuguese dataset for research in hate speech detection. To gather our data, we crawled Twitter for messages and manually annotated them using guidelines. Firstly, we developed a method for binary classification, using the classifications of three annotators per message as ground truth. With this dataset, we conducted a baseline classification experiment using pre-trained word embeddings and an LSTM, achieving very competitive performance.

Furthermore, we provided a hierarchical hate speech labeling schema that integrates the complexity of hate speech subtypes and their intersections. This allowed us to find out that distinct types of hate speech present different agreement levels between annotators. Therefore, future guidelines for annotation may benefit from specifying the particularities of the different subtypes of hate speech.

As far as future work is concerned, in the context of the annotation procedure, the agreement between annotators can still be improved. We think that the subjectivity of the task makes the learning process challenging, and that more specific training is necessary for the annotators. Additionally, based on our experiment, we suggest that future data collection procedures should ensure sampling of different subtypes of hate, to improve the identification of less common subtypes.

Finally, in future explorations of this dataset, we will experiment with multilabel classification of hate speech to identify not only whether a message contains hate, but also the targeted groups.

Acknowledgments

This work was partially funded by the Google DNI project Stop PropagHate. Soler-Company and Wanner have been supported by the European Commission under contract numbers H2020-7000024-RIA and H2020-786731-RIA. We would like to thank the anonymous reviewers for their insightful comments, and the annotators for their contribution to this work.

References

Pinkesh Badjatiya, Shashank Gupta, Manish Gupta, and Vasudeva Varma. 2017. Deep learning for hate speech detection in tweets. In Proceedings of the 26th International Conference on World Wide Web Companion, pages 759–760. International World Wide Web Conferences Steering Committee.

Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, pages 785–794, New York, NY, USA. ACM.

François Chollet et al. 2015. Keras. https://keras.io, accessed last in February 2019.

François Chollet. 2017. Deep Learning with Python. Manning Publications Co.

Patricia Hill Collins. 2015. Intersectionality’s definitional dilemmas. Annual Review of Sociology, 41:1–20.

Kimberle Crenshaw. 2018. Demarginalizing the intersection of race and sex: A black feminist critique of antidiscrimination doctrine, feminist theory, and antiracist politics [1989]. In Feminist Legal Theory, pages 57–80. Routledge.

Thomas Davidson, Dana Warmsley, Michael Macy, and Ingmar Weber. 2017. Automated hate speech detection and the problem of offensive language. In Proceedings of ICWSM.

Fabio Del Vigna, Andrea Cimino, Felice Dell’Orletta, Marinella Petrocchi, and Maurizio Tesconi. 2017. Hate me, hate me not: Hate speech detection on Facebook. In Proceedings of the First Italian Conference on Cybersecurity, pages 86–95.

Susan Dumais and Hao Chen. 2000. Hierarchical classification of web content. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 256–263. ACM.

Elisabetta Fersini, Paolo Rosso, and Maria Anzovino. 2018. Overview of the task on automatic misogyny identification at IberEval 2018.

Joseph L. Fleiss. 1971. Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5):378.

Paula Fortuna and Sérgio Nunes. 2018. A survey on automatic detection of hate speech in text. ACM Computing Surveys (CSUR), 51(4):85.

Antigoni-Maria Founta, Despoina Chatzakou, Nicolas Kourtellis, Jeremy Blackburn, Athena Vakali, and Ilias Leontiadis. 2018. A unified deep learning architecture for abuse detection. arXiv preprint arXiv:1802.00385.

Zoe Fox. 2013. Top 10 most popular languages on Twitter. Available at http://mashable.com/2013/12/17/twitter-popular-languages/, accessed last in May 2017.

Björn Gambäck and Utpal Kumar Sikdar. 2017. Using convolutional neural networks to classify hate-speech. In Proceedings of the First Workshop on Abusive Language Online, pages 85–90.

Matthias Gamer, Jim Lemon, and A. Robinson. 2012. Package ‘irr’: Various coefficients of interrater reliability and agreement.

Barney G. Glaser and Anselm L. Strauss. 2017. Discovery of Grounded Theory: Strategies for Qualitative Research. Routledge.

Pei-Yi Hao, Jung-Hsien Chiang, and Yi-Kun Tu. 2007. Hierarchically SVM classification based on support vector clustering method and its application to document categorization. Expert Systems with Applications, 33(3):627–635.
Nathan Hartmann, Erick Fonseca, Christopher Shulby, Marcos Treviso, Jéssica Silva, and Sandra Aluísio. 2017. Portuguese word embeddings: Evaluating on word analogies and natural language tasks. In Proceedings of the 11th Brazilian Symposium in Information and Human Language Technology, pages 122–131, Uberlândia, Brazil. Sociedade Brasileira de Computação.

Hatebase. 2019. Hatebase. Available at https://www.hatebase.org/, accessed last in February 2019.

Jigsaw. 2018. Toxic comment classification challenge: Identify and classify toxic online comments. Available at https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge, accessed last on 23 May 2018.

Ritesh Kumar, Atul Kr. Ojha, Shervin Malmasi, and Marcos Zampieri. 2018. Benchmarking aggression identification in social media. In Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC), Santa Fe, USA.

Yashar Mehdad and Joel Tetreault. 2016. Do characters abuse more than words? In Proceedings of the SIGDIAL 2016 Conference: The 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 299–303.

Chikashi Nobata, Joel Tetreault, Achint Thomas, Yashar Mehdad, and Yi Chang. 2016. Abusive language detection in online user content. In Proceedings of the 25th International Conference on World Wide Web, pages 145–153. International World Wide Web Conferences Steering Committee.

Ji Ho Park and Pascale Fung. 2017. One-step and two-step classification for abusive language detection on Twitter. In Proceedings of the First Workshop on Abusive Language Online.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.

Rogers Prates de Pelle and Viviane P. Moreira. 2017. Offensive comments in the Brazilian web: A dataset and baseline results. In 6º Brazilian Workshop on Social Network Analysis and Mining (BraSNAM 2017), volume 6. SBC.

Georgios K. Pitsilis, Heri Ramampiaro, and Helge Langseth. 2018. Detecting offensive language in tweets using deep learning. arXiv preprint arXiv:1801.04433.

Fabio Poletto, Marco Stranisci, Manuela Sanguinetti, Viviana Patti, and Cristina Bosco. 2017. Hate speech annotation: Analysis of an Italian Twitter corpus. In CEUR Workshop Proceedings, volume 2006, pages 1–6. CEUR-WS.

Radim Řehůřek and Petr Sojka. 2010. Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45–50, Valletta, Malta. ELRA. http://is.muni.cz/publication/884893/en.

Björn Ross, Michael Rist, Guillermo Carbonell, Ben Cabrera, Nils Kurowsky, and Michael Wojatzki. 2016. Measuring the reliability of hate speech annotations: The case of the European refugee crisis. In Proceedings of NLP4CMC III: 3rd Workshop on Natural Language Processing for Computer-Mediated Communication, pages 6–9.

Joni Salminen, Hind Almerekhi, Milica Milenković, Soon-gyo Jung, Jisun An, Haewoon Kwak, and Bernard J. Jansen. 2018. Anatomy of online hate: Developing a taxonomy and machine learning models for identifying and classifying hate in online news media. In Twelfth International AAAI Conference on Web and Social Media.

Manuela Sanguinetti, Fabio Poletto, Cristina Bosco, Viviana Patti, and Marco Stranisci. 2018. An Italian Twitter corpus of hate speech against immigrants. In Proceedings of LREC.

William A. Schabas. 2000. Hate speech in Rwanda: The road to genocide. McGill Law Journal, 46:141.

Anna Schmidt and Michael Wiegand. 2017. A survey on hate speech detection using natural language processing. SocialNLP 2017, page 1.

William Warner and Julia Hirschberg. 2012. Detecting hate speech on the world wide web. In Proceedings of the Second Workshop on Language in Social Media, pages 19–26. Association for Computational Linguistics.

Zeerak Waseem. 2016. Are you a racist or am I seeing things? Annotator influence on hate speech detection on Twitter. In Proceedings of the 1st Workshop on Natural Language Processing and Computational Social Science, pages 138–142.

Zeerak Waseem and Dirk Hovy. 2016. Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In Proceedings of NAACL-HLT, pages 88–93.
A Appendices

A.1 Non-expert annotator guidelines translated to English

Analyse the tweets from the first set and evaluate whether, in your opinion, these tweets contain hate speech.

For every tweet, manually mark with 1 or 0 whether or not you think the tweet contains hate, respectively, following the examples in Table 5.

Tweet                                       | HS | A
Black people should go back to their land!! | 1  | A
Meat and black beans are delicious!         | 0  | A
Muslim people are terrorists!               | 1  | A

Table 5: Hate speech (HS) annotation examples with the respective annotator (A), in English.