
Say ‘YES’ to Positivity: Detecting Toxic Language in Workplace Communications

Meghana Moorthy Bhat∗† Saghar Hosseini‡ Ahmed Hassan Awadallah‡ Paul N. Bennett‡ Weisheng Li§
†The Ohio State University ‡Microsoft Research §Microsoft
[email protected], {sahoss, hassanam, pauben, weishli}@microsoft.com

∗Most of the work was done while the first author was an intern at Microsoft Research.

Abstract

Warning: this paper contains content that may be offensive or upsetting.

Workplace communication (e.g. email, chat, etc.) is a central part of enterprise productivity. Healthy conversations are crucial for creating an inclusive environment and maintaining harmony in an organization. Toxic communications at the workplace can negatively impact overall job satisfaction and are often subtle, hidden, or demonstrate human biases. The linguistic subtlety of mild yet hurtful conversations has made it difficult for researchers to quantify and extract toxic conversations automatically. While offensive language and hate speech have been extensively studied in social communities, there has been little work studying toxic workplace communications. Specifically, the lack of a corpus, the sparsity of toxicity in enterprise emails, and the absence of well-defined criteria for annotating toxic conversations have prevented researchers from addressing the problem at scale. We take the first step towards studying toxicity in workplace communications by providing (1) a general and computationally viable taxonomy for studying toxic language at the workplace, (2) a dataset for studying toxic language at the workplace based on the taxonomy, and (3) an analysis of why offensive language and hate-speech datasets are not suitable for detecting workplace toxicity. Our implementation, analysis and data will be available at https://aka.ms/ToxiScope.

[Figure 1: An example of workplace communication. The highlighted sentence was annotated as toxic and gossip by annotators. This instance has a confidence score of 0.15 on Perspective API (https://www.perspectiveapi.com).]

1 Introduction

Studies have shown that more than 80% of the issues affecting employees' productivity and satisfaction are related to negative work environment behaviors such as harassment, bullying, ostracism, gossiping, and incivility (Anjum et al., 2018). Moreover, workplace gossiping results in distracted employees and low morale. Duffy et al. (2002) and Kong (2018) find that workplace incivility leads to social undermining of employees, which can cause trust issues, difficulty in establishing cooperative relationships, lower job satisfaction, and attitudinal outcomes such as gaining personal power and reputation (Aquino and Thau, 2009; Baumeister, 1995; Ellwardt et al., 2012; McAndrew et al., 2007).

Many organizations enact policies that prohibit extremely toxic behaviors like bullying, verbal threats, profanity, harassment, and discrimination; yet detecting more subtle forms of toxicity such as negative gossiping, stereotyping, sarcasm, and microaggressions in conversations remains a challenge.

Toxicity can be manifested in different ways. It spans a wide spectrum that includes subtle and indirect signals, which can often be no less toxic than overtly offensive language (Jurgens et al., 2019). While the research community has made enormous progress in detecting overtly offensive language and hate speech (Schmidt and Wiegand, 2017; Waseem et al., 2018; Fortuna and Nunes, 2018; Qian et al., 2019), there has been less focus on computationally evaluating other subtle expressions of toxicity.
Qualitative studies have found these subtle signals to have long-lasting negative effects (Sue, 2010; Nadal et al., 2014). As Figure 1 shows, currently popular toxicity detection tools cannot detect subtle yet hurtful conversations as harmful. We argue that it is equally important to detect these subtle aggressive conversations and educate employees for a healthy workplace. Detecting wider aspects of toxic text can be challenging. Subtle signals like stereotyping and mild aggression can be context-sensitive, sparse, and highly subjective, and do not have well-defined annotation guidelines; whereas overtly toxic language and hate speech are rarely context-sensitive (Pavlopoulos et al., 2020) and have well-defined guidelines (Waseem et al., 2017). In this paper, we take first steps towards (1) defining a taxonomy for studying toxic language in the workplace setting by analyzing definitions from impoliteness theory and psychology, (2) building a dataset of human annotations on a publicly available email corpus, (3) providing computational methods to establish baselines for detecting toxic language in enterprise emails, and (4) analyzing why current datasets and tools for detecting hate speech do not work in our setting.

2 Related Work

Offensive Language Detection: Perspective API is a popular toxicity detector for detecting offensive conversations. Waseem et al. (2018) devised a taxonomy and created a dataset to detect hate speech and discrimination. Xu et al. (2012) studied bullying, Chatzakou et al. (2017) released a dataset to study bullying in online posts, and Zampieri et al. (2019a) released a corpus of offensive posts named OffensEval, which has encouraged researchers to study offensive content. Recently, Safi Samghabadi et al. (2020) released a dataset with emojis for identifying sexually profane language, and Rajamanickam et al. (2020) showed that jointly modeling emotion and abusive language detection helps model performance. However, toxic language in the workplace often consists of subtle aggressive conversations and less overtly offensive text. Subtle aggressive conversations can be covert faux pas or unintentional, whereas offensive text is overt and includes an intentional choice of words. Also, a conversation in a workplace is more formal than social media text. Due to this fundamentally different structure, current datasets and models trained on them are not able to properly detect workplace toxicity.

Microaggression datasets: Breitfeller et al. (2019) released a dataset from Reddit, Gab, and www.microaggressions.com, showing that it is possible to annotate these highly subjective and linguistically subtle uncivil communications and detect them using computational methods. It is focused on gender-based discrimination due to its availability in social media, and the annotation guidelines also use gender as the discrimination axis to determine toxicity. In contrast, we are interested in formal conversations that are context dependent and are mostly targeted towards individuals addressed in emails, irrespective of gender. Wang and Potts (2019) introduced a new Reddit dataset with labels corresponding to condescending linguistic acts in conversations and showed that, by leveraging the context, it is possible to detect this type of challenging toxic language. Similarly, Caselli et al. (2020) leveraged the context of occurrences to create a Twitter dataset for implicit and explicit abusive language. Implicit abusive language does not immediately insinuate abuse; its true meaning is often concealed by the lack of profanity or hateful terms, which makes it difficult to detect. Oprea and Magdy (2020) released a corpus of sarcasm self-annotated by its authors on Reddit. However, these datasets mainly contain abusive language and sarcastic tweets on popular social events and are informal.

To the best of our knowledge, there is no available dataset in our community for studying toxic language in emails. The most similar work to ours is Raman et al. (2020); however, its focus has been mostly offensive language in the GitHub community, whereas our work focuses on detecting toxicity in workplace emails.

Email Communications: There is also some prior work on email corpora for sociolinguistic downstream tasks. Prabhakaran et al. (2014) explored the relation between power and gender on the Enron corpus. They showed that the manifestations of power differ significantly between genders and that gender information can be used to predict the power of people in conversations. Similarly, Bramsen et al. (2011) studied social power relationships between members of a social network, based purely on the content of their interpersonal communication, using statistical methods. Madaan et al. (2020) released an automatically labeled Enron corpus for politeness; however, their definition of politeness does not capture toxic language.
Chhaya et al. (2018) devised a computational method to identify conversation tone in the Enron corpus. They categorize tones as frustration, formal, and polite, and find that affect-based features are important for detecting tone in conversation. However, affect-based features do not capture subtle offensive text. We are interested in studying subtle and offensive text in workplace emails, which differs from the prior work in this area.

3 Toxicity in Enterprise Email

Our goal is to study and understand workplace toxic communications through one of the most frequently used ways of communication in organizational settings, emails (The Radicati Group, 2020). The distribution of our dataset (Section 3.2) demonstrates the significant presence of implicit and subtle toxic language in workplace email communications, contrary to social media and open source communities. Table 1 also provides the statistics of different datasets that study implicit and explicit toxic language.

Table 1: Distribution of different datasets that study implicit and explicit toxic language.
Dataset | Size | Toxic comments | Type | Agreement score
(Raman et al., 2020) | 1594 | 189 (11%) | Explicit | N/A
(Breitfeller et al., 2019) | 1065 | 337 (30%) | Implicit | 0.41
(Wulczyn et al., 2017) | 69.5k | 26.5k (37.4%) | Explicit | 0.45
ToxiScope (Ours) | 10k | 1210 (11.9%) | Implicit | 0.77

We created a taxonomy (Section 3.1) and a crowd-sourced annotation task (Section 3.2) to manually annotate toxic language in the Avocado research email collection (Oard et al., 2015). This collection contains corporate emails from an information technology company referred to as "Avocado". The collection contains an anonymized version of the full content of emails and different meta information from the Outlook mailboxes of employees. The full collection contains 279 employees and 938,035 emails.

In addition, we perform an analysis of different emotional affects for each category of toxic language. From previous work, we understand that toxic language has a strong correlation with negative emotions. We also studied whether using context was beneficial in determining toxicity. To this end, we conducted an analysis of whether humans benefit from context in detecting toxic language in emails. We assume that to determine toxicity in a text, humans read the entire email body and previous emails, not only the given text. We quantify these observations through annotations before using context-aware representations in our modeling.

3.1 Taxonomy for toxic language

We leveraged different negative culture practices with definitions from impoliteness theory (Culpeper, 1996) and offensive language detection in social media (Zampieri et al., 2019b,a) to define a taxonomy for toxic language in workplace communications. We have the following goals in mind: (1) generalizable across different organizations, (2) sufficiently represented in our corpus, (3) covering the main dimensions of negative workplace culture from cross-domain literature. We have summarized the definitions in Table 2 and describe each of them below.

Non-Toxic: The non-toxic class has instances of friendly, knowledge-sharing, and formally respectful conversations. These conversations often have positive or neutral connotations.

Impolite: The impolite class has instances of sarcasm, stereotyping, and rude statements. These conversations often have opposite polarity to their previous context, with negative or neutral connotations that might complement the work on benevolent sexism (Jha and Mamidi, 2017). Following impoliteness theory (Culpeper, 1996), we define 'Rude' as direct, intentionally disrespectful words to the addressee, whereas sarcasm (implicature expressing the opposite of what is said) and stereotyping (unintentional) need not be direct yet are disrespectful comments to the addressee in the conversation.

Negative Gossip: The gossip class includes rude, mocking conversations about a person not involved in the conversation. We find these instances have negative connotations with a tone of complaint and a lack of respect toward the target. Kong (2018) found that repeated gossip conversations in organizations caused hostility and stress among employees. As shown by Wulczyn et al. (2017), conversations targeted towards a third person need not necessarily be extreme yet can be disrespectful. Evidently, our annotators feel gossip conversations are more annoying, whereas impolite conversations carry more sadness and have a higher overlap with the offensive category (Figure 3). We refer to this type as "Gossip" in the rest of the paper.
Offensive: Detecting overtly toxic language has been extensively studied in the research community. We follow a similar definition of offensive language to Zampieri et al. (2019b), which refers to any form of unacceptable language used to insult a targeted individual or group. In our setting, we define offensive language such that it includes five broad categories: profanity, bullying, harassment, discrimination and violence.

Table 2: Sub-type categories of toxic language that we developed based on the literature and email conversations. The examples demonstrate that the phenomenon is complex and is different from offensive text or negative sentiment.
Type | Sub-type | Example
Non-toxic | NA | Hey, how are you holding up? / Can you please reschedule the meeting for tomorrow?
Impolite | sarcasm | You need big glasses huh, LOL!!!? Its 11:00AM
Impolite | stereotype | Ladies, since you all are good at cooking and are used to it, I invite you to participate for potluck in office.
Impolite | forced teaming | We all are victims of the new policy. Let the retaliation begin!
Impolite | authoritarian | I want you to give me the numbers by 9PM today. I do not have time to wait until tomorrow.
Impolite | rude | I did not want to yell at you in front of everyone, but you are performing poorly!
Negative Gossip | mocking | When I take a long time I am slow and when my boss takes a long time, he is thorough
Negative Gossip | complain | How does this guy function in society?
Offensive | profanity | Let's kiss their a** and get it done.
Offensive | discrimination | Would you rather be called African-American or black?
Offensive | bullying | Whoever is doing these tags is brain dead enough to send the wrong tag.
Offensive | violence | All [nationality/race] are lazy and don't deserve to work here
Offensive | harassment | Your backside is banging in that dress.

3.2 Annotation task

We design a hierarchical annotation framework to collect instances of sentences in an email and the corresponding labels on a crowd-sourcing platform. Before working on the task, annotators go through a brief set of guidelines explaining the task. We collect the dataset in batches of around 1000 examples each. For the first three batches, we upload 75-100 instances manually labeled as toxic by the group of researchers working on the project to check whether the annotators followed the guidelines. We repeat this pilot testing until desirable performance is achieved. We also manually review a sample of the examples submitted by each annotator after each batch, exclude from the annotator pool those who do not provide accurate inputs, and redo all their annotations. A key characteristic of subtle toxic emails is that they often result from prior experiences, cultural differences, or background differences between individuals (Sue et al., 2007). Hence, designing an annotation task for detecting toxicity is difficult, and there will be discrepancies in perceived toxicity between the annotators. In order to minimize ambiguity and provide clearer context to the annotators, we provide the email body, subject, and the prior email in the thread as context information.

For each highlighted sentence, annotators indicate whether the post is toxic, the type of toxicity, whether the target of the toxic comment is the recipient or someone else, whether the prior email as context was helpful, the kind of negative affect associated with the toxicity, and whether the whole email was toxic. We provide a subset of negative affects to the annotators from WordNet-Affect (Strapparava and Valitutti, 2004). The annotators answer the questions on the type of toxicity and the target only if they indicate potential toxicity during annotation. They can also choose multiple toxic categories for a highlighted sentence. Finally, the annotators are provided an optional text box for additional details if the highlighted sentence did not belong to any of the categories we defined. Please note that the sub-types of toxicity do not have a clear boundary and are not mutually exclusive.

A total of 76 annotators participated in this task. All annotators were fluent in English and came from 4 countries: USA, Canada, Great Britain and India, with the majority of them residing in the USA. Each highlighted statement in the email was annotated by three annotators, and they were compensated at an hourly rate (as opposed to per annotation) to encourage them to optimize for quality. They took an average of 5 minutes per annotation. We assume a sentence is toxic even if only one out of three annotators perceived it as toxic. We adopt this principle to be inclusive of every individual's background, culture, and sexual orientation, and because implicit toxic language can be subtle. Similarly, we included the union of the toxicity types selected by the three annotators for the instance. A snapshot of our crowd-sourcing framework can be found in the Appendix (Figure 5).
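To make this label aggregation concrete, below is a minimal sketch of the rule described above: a sentence is treated as toxic if any of its three annotators marks it toxic, and its toxicity types are the union of the types they selected. The record layout and toy values are illustrative assumptions, not the actual annotation export.

```python
from collections import defaultdict

# Illustrative per-annotator records: (sentence_id, annotator_id, is_toxic, selected_types)
annotations = [
    ("s1", "a1", True, {"impolite"}),
    ("s1", "a2", False, set()),
    ("s1", "a3", True, {"impolite", "gossip"}),
    ("s2", "a1", False, set()),
    ("s2", "a2", False, set()),
    ("s2", "a3", False, set()),
]

def aggregate(records):
    """Any-positive rule for toxicity, union of the selected sub-types."""
    toxic = defaultdict(bool)
    types = defaultdict(set)
    for sent_id, _, is_toxic, ann_types in records:
        toxic[sent_id] |= is_toxic
        types[sent_id] |= ann_types
    return {s: {"toxic": toxic[s], "types": sorted(types[s])} for s in toxic}

print(aggregate(annotations))
# {'s1': {'toxic': True, 'types': ['gossip', 'impolite']}, 's2': {'toxic': False, 'types': []}}
```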
Due to the scarce nature of toxic conversations in emails, we adopt a two-round approach for data collection. For the first round of annotations, we use several heuristics to increase the chances of identifying positive instances in the sample. We tried running the Perspective API and the microaggression model (Breitfeller et al., 2019) against the Avocado corpus. The coverage of the Perspective API is extremely low (0.1%), since not much overtly toxic text is present in the Avocado corpus. On the other hand, the microaggression model output has low precision (0.12%). To further prune the false positives, we employ filtering methods based on the LIWC lexicon (Pennebaker et al., 2015), WordNet-Affect (Strapparava and Valitutti, 2004), and the better_profanity word list (https://github.com/snguyenthanh/better_profanity) over the outputs of the microaggression model before sending the positive labels for annotation. The first round of annotations provided a positive label ratio of 2.74%, compared to 0.29% from a manually annotated batch of around 800 random email sentences. This implies the need to be selective regarding the emails we submit for annotation. In addition, for the second round of annotations, we used an SVM classifier to pick positive instances from the unlabeled email corpus. To avoid model biases, we randomly sample unlabeled email sentences based on their probability scores, with more instances being sampled from the higher score ranges. The second round of annotations provided a positive label ratio of 11.2%, which is significantly higher than our previous rounds. The classifier is updated with more examples after each round of annotations.
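To illustrate the score-weighted sampling used for the second round, the sketch below trains a linear SVM on previously labeled sentences, scores the unlabeled pool, and draws candidates with probability proportional to their scores, so that high-scoring sentences are favored without excluding the rest of the pool. The variable names and toy sentences are assumptions for illustration, not the authors' actual pipeline.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Assumed inputs: sentences labeled in earlier rounds and an unlabeled candidate pool.
labeled_texts = [
    "you are performing poorly",
    "whoever did this is brain dead",
    "please reschedule the meeting",
    "thanks for sharing the report",
]
labels = [1, 1, 0, 0]
unlabeled_texts = [
    "how does this guy function in society?",
    "see you at the standup tomorrow",
    "let's sync on the budget next week",
]

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
clf = LinearSVC()
clf.fit(vectorizer.fit_transform(labeled_texts), labels)

# Turn SVM margins into positive sampling weights (higher margin -> more likely to be drawn).
margins = clf.decision_function(vectorizer.transform(unlabeled_texts))
weights = np.exp(margins)
probs = weights / weights.sum()

rng = np.random.default_rng(0)
chosen = rng.choice(len(unlabeled_texts), size=2, replace=False, p=probs)
print([unlabeled_texts[i] for i in chosen])  # batch sent for the next round of annotation
```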
Overall, the final dataset contains 10,110 email sentences, of which 1,120 are labeled as toxic by the annotators. We call this dataset for studying toxic language in workplace communications ToxiScope. Please note that we asked the annotators to identify spam emails and their types, including Advertisement, Adult content, and Derogatory content. We observed that 99% of the emails in the Spam category are advertisements, and we decided to exclude those emails since advertisement content is not in the scope of toxic language detection. Figure 2 shows the distribution of toxic emails over the sub-categories of toxic language, which indicates a higher frequency of Impolite emails.

[Figure 2: Frequency of each sub-category of toxic sentences.]

Annotators Agreement: Overall, the annotations showed an inter-annotator agreement of Krippendorff's α = 0.718 for detecting whether a given sentence was toxic or not. Broken down by category, annotators agreed on a sentence being offensive at Krippendorff's α = 0.77, impolite at Krippendorff's α = 0.29, and gossip at Krippendorff's α = 0.32. The high agreement score on overall toxicity shows that annotator judgements are reliable, and the lower agreement scores on sub-types are indicative of the subjectivity and lack of objectivity of implicit toxicity (Lilienfeld, 2017), not of the annotation quality. We also note several prior works in toxicity settings and other tasks that lack objectivity and have inter-annotator agreement scores in our range: the Microaggression dataset has a score of 0.41 for 200 instances, and Rashkin et al. (2016) report a score of 0.25 for inter-annotator agreement.
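As an illustration of how such agreement scores can be computed, the sketch below estimates Krippendorff's alpha from (annotator, sentence, label) triples using NLTK's AnnotationTask; the toy judgments are placeholders rather than the paper's actual annotation records.

```python
# pip install nltk
from nltk.metrics.agreement import AnnotationTask

# Each record is (annotator_id, item_id, label); here the label is the
# binary toxic / non-toxic judgment for a highlighted sentence.
triples = [
    ("a1", "s1", "toxic"),     ("a2", "s1", "toxic"),     ("a3", "s1", "non-toxic"),
    ("a1", "s2", "non-toxic"), ("a2", "s2", "non-toxic"), ("a3", "s2", "non-toxic"),
    ("a1", "s3", "toxic"),     ("a2", "s3", "toxic"),     ("a3", "s3", "toxic"),
    ("a1", "s4", "non-toxic"), ("a2", "s4", "toxic"),     ("a3", "s4", "non-toxic"),
]

task = AnnotationTask(data=triples)  # default binary distance suits nominal labels
print(f"Krippendorff's alpha: {task.alpha():.3f}")
```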
Insights from annotation task: Sometimes defining a clear boundary between categories of toxic language is challenging because they are not mutually exclusive; therefore a statement can belong to multiple toxic categories. For example, the content of an email can be about gossiping and at the same time be discriminatory against a certain group of people. Our analysis shows that 92% of emails belong to a single toxic category, while the rest of the emails contain two or more types of toxic language. Figure 3 shows the co-occurrence of different toxic contents in the same email. We can observe that the Offensive and Impolite categories are slightly more likely to occur in the same email than with Gossip. Since our task is highly subjective, in order to understand the reasons behind perceived toxicity we ask annotators several questions about the target and affect of the toxic statement, and whether the context (previous email) is useful in determining the toxicity of the statement. We find that in 41% of the instances, context information was helpful to determine toxicity. In 76.86% of the toxic instances, the language was targeted at another individual or a group.
Understandably, all the toxic instances have negative affect, with anger and hostility present in most of the cases. However, annotators find gossip examples more disgusting, and a toxic sentence is found to be 6.1% more annoying when it is targeted at another individual not in the conversation.

[Figure 3: Correlation between email toxic categories.]

We use 70% of the data for training and 10% as a validation set. We hold out 20% of the data as a test set. Table 3 provides a summary of the final dataset.

Table 3: Number of instances in each toxic category and set of ToxiScope.
Sentence Type | Train | Dev | Test
Toxic | 886 | 117 | 207
Impolite | 636 | 84 | 139
Gossip | 176 | 23 | 47
Offensive | 74 | 10 | 21
Non-toxic | 6308 | 864 | 1728
Total | 7194 | 981 | 1935

4 Detecting toxic conversations in Emails

We design our experiments with the following goals: (1) Investigate whether contextual information (email body, the parent email) helps in determining toxicity; we also study which categories of toxic language benefit from adding context to the sentence. (2) Test our hypothesis that current toxic language datasets cannot identify indirectly aggressive or impolite sentences; we consider current state-of-the-art toxic language detectors for this task. (3) Evaluate our baseline models on other datasets, including Wiki Comments (Wulczyn et al., 2017) and GitHub (Raman et al., 2020), to study whether understanding subtle signals helps in determining overtly toxic language.

We experimented with publicly available state-of-the-art models from the literature and the Perspective API:

Linear Models: We generate n-grams (where n is up to 2) and feed them as feature vectors to the classifier. We experiment with Logistic Regression and Support Vector Machines (SVM), as utilized by Breitfeller et al. (2019) and Raman et al. (2020), for our task.
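A minimal sketch of such an n-gram baseline is shown below, using scikit-learn to build unigram and bigram count features and feed them to a logistic regression (a linear SVM can be swapped in the same way). The toy sentences and labels are placeholders, not items from ToxiScope.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC  # drop-in alternative to LogisticRegression

# Placeholder sentences and binary labels (1 = toxic, 0 = non-toxic).
train_texts = [
    "you are performing poorly",
    "how does this guy function in society?",
    "can you please reschedule the meeting?",
    "thanks for sharing the notes",
]
train_labels = [1, 1, 0, 0]

# Unigram + bigram counts as feature vectors, followed by a linear classifier.
model = Pipeline([
    ("ngrams", CountVectorizer(ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(train_texts, train_labels)

print(model.predict(["you need big glasses huh", "see you at the meeting"]))
```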
Context-Aware Sentence Classification: Wang et al. (2019) developed a GRU model with a context encoder that uses an attention mechanism over the context sentences and a fusion layer that concatenates target and context sentence representations to study the influence of context in intent classification. We leverage this model for our experiments.

Bert Classification: We experimented with the Bert-based model proposed by Liu et al. (2019). We fine-tuned the model, initially trained on the data of Zampieri et al. (2019b), on ToxiScope. This model concatenates the text of the parent and target comments, separated by Bert's [SEP] token, as in Bert's next sentence prediction pre-training task.

Bert + MLP: For this model, we experimented with a context-aware version of the Bert-based classifier explained above. We freeze the first 8 layers of Bert and add a non-linear activation function before the classification layer.
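The sketch below approximates this setup with the Hugging Face transformers library: the prior email (context) and the target sentence are passed as a sentence pair so the tokenizer inserts the [SEP] separator, the embeddings and the first 8 encoder layers are frozen, and a small non-linear head is placed on top. The model name, head size, and toy inputs are assumptions for illustration; this is not the authors' released implementation.

```python
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class ContextBertClassifier(nn.Module):
    """BERT over '[CLS] context [SEP] target [SEP]' with a small MLP head."""

    def __init__(self, num_labels=2, freeze_layers=8):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # Freeze the embeddings and the first `freeze_layers` encoder layers.
        for param in self.bert.embeddings.parameters():
            param.requires_grad = False
        for layer in self.bert.encoder.layer[:freeze_layers]:
            for param in layer.parameters():
                param.requires_grad = False
        hidden = self.bert.config.hidden_size
        self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.Tanh(),
                                  nn.Dropout(0.1), nn.Linear(hidden, num_labels))

    def forward(self, input_ids, attention_mask, token_type_ids):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask,
                            token_type_ids=token_type_ids)
        return self.head(outputs.pooler_output)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
context = "I did not want to yell at you in front of everyone."  # prior email (placeholder)
target = "But you are performing poorly!"                        # highlighted sentence (placeholder)

# Passing a sentence pair makes the tokenizer insert [SEP] between context and target.
enc = tokenizer(context, target, return_tensors="pt", truncation=True, padding=True)
model = ContextBertClassifier()
logits = model(**enc)
print(logits.shape)  # torch.Size([1, 2])
```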
5 Results and Analysis

Table 4 summarizes the performance of models trained and tested on ToxiScope. The baseline performances are reported for binary classification (toxic vs. non-toxic). We report evaluation metrics in F1 (macro and micro) and accuracy (TPR and TNR) of the different classes due to class imbalance. For the models in Table 4 that required context as an input, we took the prior email in the thread during pre-processing. The results imply that pretrained Bert models fine-tuned on ToxiScope perform better than non-pretrained models; hence, we focus on these models to evaluate the effect of context on the outcome. In addition, the low recall, or True Positive Rate (TPR), demonstrates the challenge of detecting subtle toxic instances in communications, and from now on we pay more attention to the TPR and F1 score metrics.

Table 4: Performance of different models trained and tested on ToxiScope. We report True Positive Rate (TPR), True Negative Rate (TNR), and overall accuracy along with F1 (macro and micro) scores.
Model | TPR (toxic) | TNR (non-toxic) | Overall accuracy | F1 (macro/micro)
Logistic Regression | 0.0097 | 1.00 | 0.5050 | 0.4816/0.8941
Linear SVM | 0.3092 | 0.9421 | 0.6257 | 0.6378/0.8744
DCRNN (Wang et al., 2019) | 0.1223 | 1.00 | 0.5610 | 0.4980/0.8537
Bert Classification | 0.4348 | 0.9825 | 0.7102 | 0.75/0.91
Bert + MLP | 0.4300 | 0.9925 | 0.7112 | 0.7696/0.9213
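For reference, the sketch below shows one way to compute the metrics reported in Table 4 from binary predictions with scikit-learn: TPR, TNR, macro/micro F1, and overall accuracy, which, judging from the values in Table 4, corresponds to the mean of TPR and TNR (balanced accuracy). The label arrays are placeholders.

```python
from sklearn.metrics import balanced_accuracy_score, confusion_matrix, f1_score

# Placeholder gold labels and model predictions (1 = toxic, 0 = non-toxic).
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
tpr = tp / (tp + fn)  # recall on the toxic class (True Positive Rate)
tnr = tn / (tn + fp)  # recall on the non-toxic class (True Negative Rate)

print(f"TPR={tpr:.4f}  TNR={tnr:.4f}  "
      f"balanced accuracy={balanced_accuracy_score(y_true, y_pred):.4f}")
print(f"macro F1={f1_score(y_true, y_pred, average='macro'):.4f}  "
      f"micro F1={f1_score(y_true, y_pred, average='micro'):.4f}")
```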
Effect of adding context: As outlined in Section 3.2, annotators find the prior email and email body helpful for determining toxicity. Pavlopoulos et al. (2020) showed that adding context did not help pre-trained models like Bert in boosting performance.
However, the dataset in their setting was small in size and the target comments were mostly offensive. These observations may not generalize to our case, since we are interested in detecting implicit and subtle cases of aggressive language. In order to evaluate the effect of the contextual information, we experimented with different variations of the context. Table 5 presents the TPR for different categories of toxic language.

Table 5: Performance of our baseline models across different categories of toxic language. We report the True Positive Rate (TPR) for each category and the average over the TPRs.
Model | Context | Offensive | Gossip | Impolite | Average
Bert Classification | no context | 0.75 | 0.3333 | 0.3089 | 0.4640
Bert Classification | email body | 0.675 | 0.2410 | 0.3581 | 0.4247
Bert Classification | (+/-1) adjacent sentences | 0.80 | 0.5027 | 0.2133 | 0.5053
Bert Classification | previous email | 0.80 | 0.3675 | 0.3966 | 0.5213
Bert + MLP | no context | 0.75 | 0.39 | 0.379 | 0.5063
Bert + MLP | email body | 0.75 | 0.4718 | 0.375 | 0.5322
Bert + MLP | (+/-1) adjacent sentences | 0.80 | 0.5156 | 0.1869 | 0.5008
Bert + MLP | previous email | 0.80 | 0.4523 | 0.365 | 0.5391

Based on our experiments, models find context helpful for detecting toxicity. Interestingly, models do not find contextual information necessary to detect offensive language, unlike the other categories. We also observed that the gossip category benefits the most from the neighboring sentences as context. The majority of the gossip emails in our dataset belong to the complain sub-category, which is spread across multiple sentences; hence, many of the neighboring sentences could have had negative connotations that aided the models. However, on average, using the previous email in the thread is most helpful in detecting toxic language. In general, finding implicit toxic language is a difficult task. This is evident in the low TPR of the gossip and impolite classes as well as their sparse labels and the low inter-annotator agreement scores in those categories.

Generalization to other domains: To investigate how other domains can leverage our dataset, we trained the baseline models for toxic language detection (Breitfeller et al., 2019; Raman et al., 2020) and context-aware sentence classification (Wang et al., 2019) on ToxiScope. Then, we tested these models against different toxic language datasets. Since we did not find any dataset studying toxic language in the workplace (with implicit and explicit toxic text), we picked datasets that overlap with one or a few categories of interest. The results are presented in Table 6, which shows that Bert-based models outperform the other methods in all of the domains. Note that on the Microaggression dataset we achieve a TPR of 0.54, which is better than the model provided by Breitfeller et al. (2019) with a best TPR of 0.36 (since the test set for the Microaggression dataset is not publicly available, we randomly split the available set 80:20 for training and test). On the Wiki Comments dataset, our baseline models using Bert have good accuracy (TPR 0.86) in detecting toxic text, which is comparable to the TPR of the Perspective API (0.85). The reason for the high false positive rate could be that the Wiki Comments dataset does not consider subtle aggressive text as toxic. The best performing classifier by Raman et al. (2020) on the GitHub dataset has a TPR of 0.35. One reason for the poor scores on the GitHub dataset can be attributed to noisy labels: we sampled a few instances from the GitHub dataset and found 15% of them to be noisy. Overall, these experimental results imply the potential benefits of using our dataset for detecting toxic language in the social media and open source community domains.

Table 6: Performance of baseline models trained on ToxiScope and tested on several toxic language datasets.
Model | Microaggression F1 (macro/micro) | Microaggression TPR (toxic) | Wiki Comments F1 (macro/micro) | Wiki Comments TPR (toxic) | GitHub F1 (macro/micro) | GitHub TPR (toxic)
Logistic Regression | 0.4169/0.6769 | 0.014 | 0.6451/0.7964 | 0.4903 | 0.3413/0.5181 | 0.0
Linear SVM | 0.5427/0.6056 | 0.3571 | 0.4867/0.5668 | 0.5870 | 0.4751/0.5544 | 0.1720
DCRNN (Wang et al., 2019) | 0.4517/0.6914 | 0.13 | 0.5215/0.8856 | 0.2382 | 0.3997/0.5231 | 0.051
Bert Classification | 0.6578/0.7136 | 0.4714 | 0.7430/0.8388 | 0.7805 | 0.4368/0.5506 | 0.1011
Bert + MLP | 0.6233/0.6573 | 0.5429 | 0.7210/0.8070 | 0.8608 | 0.5525/0.5843 | 0.2287

Leveraging social media and open source community data to detect workplace toxicity: Offensive language is widely studied on social media, and there are several datasets and methods available for this task. Table 8 presents the performance of the publicly available models and the Perspective API (which is trained over 160k human-labeled annotations of Wikipedia comments) on ToxiScope. The model from Breitfeller et al. (2019) has a reasonable performance on ToxiScope. Their method uses lexicons for microaggressions from external sources; leveraging these external sources as weak supervision signals might help in boosting the performance of models for ToxiScope as well.

Table 8: Performance of different models with inference on ToxiScope.
Model | TPR (toxic) | TNR (non-toxic) | Overall accuracy | F1 (macro/micro)
Perspective API | 0.2174 | 0.9907 | 0.6040 | 0.6432/0.8848
Raman et al. (2020) | 0.1014 | 0.9797 | 0.8858 | 0.5492/0.8734
Breitfeller et al. (2019) | 0.3987 | 0.5556 | 0.5217 | 0.4375/0.5483
Liu et al. (2019) | 0.4348 | 0.9825 | 0.7102 | 0.75/0.91

Next, we investigated whether these datasets can be helpful in training models for detecting workplace toxicity. We fine-tuned and trained Bert-based models on the Microaggression, GitHub, and Wiki Comments datasets and ran inference on ToxiScope. As we expected, Table 7 shows that the models trained on the Microaggression dataset are more applicable to workplace toxic language detection. However, they still perform worse than the in-domain models (Table 4). The impolite and gossip categories (constituting sarcasm, stereotyping, and rudeness) are predominantly present in ToxiScope, while there are not many datasets available for these tasks and the existing datasets are small in size. This could explain the inadequate performance of these models.

Table 7: Performance of Bert models trained on the Microaggression, Wiki Comments, and GitHub datasets and tested on ToxiScope. Each column group denotes the dataset the model was trained on.
Model | Microaggression F1 (macro/micro) | Microaggression TPR (toxic) | Wiki Comments F1 (macro/micro) | Wiki Comments TPR (toxic) | GitHub F1 (macro/micro) | GitHub TPR (toxic)
Bert Classification | 0.6780/0.8889 | 0.3720 | 0.6078/0.8992 | 0.1739 | 0.4906/0.8941 | 0.0483
Bert + MLP | 0.6921/0.9106 | 0.3188 | 0.5951/0.8956 | 0.1594 | 0.5971/0.9070 | 0.1401

6 Conclusion

Previously, we saw a gap in the available resources to detect negative workplace communications; based on our observations, the Microaggression dataset was the only resource applicable to this domain, and it did not show promising performance. Hence, we created ToxiScope to close this gap. We presented a taxonomy and annotation guidelines to study toxic language in workplace emails. We also provided baseline methods to detect toxic language in ToxiScope. Further, we demonstrated the necessity of a new dataset for detecting workplace toxicity, since models trained on existing overtly toxic datasets and on the Microaggression dataset do not detect subtle toxic text. In addition, we observed that context helps Bert-based models detect subtle toxic sentences. However, our results indicate that we need more sophisticated models and better representations of context to detect implicit toxic sentences. In future work, we will explore other methods like weak supervision from other sources and self-training for better performance.
Going forward, we will also investigate other research questions pertaining to the likelihood of an individual using toxic language repeatedly, the correlation of power and gender dynamics with toxicity, the presence of bias (racial/gender) in ToxiScope, and understanding the degree of severity of toxic text. We hope our work will encourage researchers in the community to study and develop methods to detect workplace toxicity.

Acknowledgments

We thank the anonymous reviewers for their constructive feedback and Saleema Amershi, Michael Gammon, Alexandra Olteanu, Allison Hegel, Liye Fu, and Subho Mukherjee for valuable discussions during the project.

7 Ethical Considerations

7.1 Annotation

In this work, we leverage the publicly available Avocado corpus, which belongs to the Linguistic Data Consortium (LDC). This email dataset has been processed and anonymized by LDC. We received approval from our organization's Internal Review Board (IRB) before starting the annotation task to make sure we are in compliance with the Avocado Research Email Collection license agreements as well as the ethical guidelines. We understand that annotating potentially toxic content can have a negative impact on the workers. In order to reduce these effects, we provided warnings and information about the research project in a consent form. We asked the annotators to read the consent form and only proceed if they agreed to its terms (Figure 4). The risks and benefits of working on this annotation task were presented to annotators in the consent form:

Benefits: There are no direct benefits to you that might reasonably be expected as a result of being in this study. The research team expects to learn to detect micro-aggressive and toxic language in email communications from the results of this research, as well as any public benefit that may come from these Research Results being shared with the greater scientific community.

Risks: During your participation, you may experience some discomfort being exposed to profanity, toxic and discriminatory language in emails. To mitigate this risk, the research team makes it possible for you to take a break or skip tasks without adversely affecting your ratings within the crowd-sourcing platform. This research may involve risks to you that are currently unforeseeable.

In addition, we did not collect any personal or demographic information other than the annotators' crowd-sourcing platform identification numbers. The consent form explains how we manage their information and provides details about their compensation. Resources were also provided to answer the annotators' questions and concerns. Moreover, we limited the number of emails an annotator can work on in a task and paid them above minimum wage ($12-15 per hour).

[Figure 4: A snapshot of the annotation task, showing that the annotator must read the consent form and agree to its terms before proceeding to annotate an email.]

7.2 Deployment

Detecting harmful language in email communication is a difficult task even for humans. Recent work has shown that toxic language detection models are also very prone to racial biases (Sap et al., 2019; Davidson et al., 2019) because they use biased datasets. In this work, we hired annotators from different English-speaking countries to reduce the bias in our dataset. However, this is a research paper with the goal of better understanding the problem of toxic language in workplace communications and encouraging other researchers to work on this problem. We believe further study needs to be done on this dataset to make sure it is not biased before deploying any computational model.

In addition, deploying this technology requires access to employees' communications. To the best of our knowledge, most workplaces do not provide any guarantee of privacy for employees' communications on enterprise systems. In addition, there are several existing technologies implemented on workplace communications for improving users' productivity, such as response generation and intent detection in emails. These technologies are being used without violating users' privacy thanks to advances in the fields of unsupervised learning and privacy-preserving machine learning.
Moreover, this technology has multiple applications, and some of them can potentially be used to harm employees and their friends and family. For example, using this model to detect toxic language and report employees to HR or their manager is a high-stakes application. If this system makes a false positive error, it may damage an employee's reputation, force the employee to defend themselves, and diminish their trust in the company. This technology can also be used to provide feedback to employees about their written communication style. This tool can be used for training purposes and for increasing workers' awareness of such micro-aggressive language. If this system makes frequent false positive errors, employees will become annoyed and be less productive, which causes an eventual drop in the company's profits. Companies can pursue mitigation steps and allow employees to provide feedback and dispute the system's predictions.

References

A. Anjum, X. Ming, S. F. Siddiqi, and A. F. Rasool. 2018. An empirical study analyzing job productivity in toxic workplace environments. International Journal of Environmental Research and Public Health.

Karl Aquino and Stefan Thau. 2009. Workplace victimization: Aggression from the target's perspective. Annual Review of Psychology, 60(1):717–741.

R. F. Baumeister and M. R. Leary. 1995. The need to belong: Desire for interpersonal attachments as a fundamental human motivation. Psychological Bulletin, 117(3):497–529.

Philip Bramsen, Martha Escobar-Molano, Ami Patel, and Rafael Alonso. 2011. Extracting social power relationships from natural language. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (HLT '11), pages 773–782.

Luke Breitfeller, Emily Ahn, David Jurgens, and Yulia Tsvetkov. 2019. Finding microaggressions in the wild: A case for locating elusive phenomena in social media posts. In Proceedings of EMNLP-IJCNLP 2019.

Tommaso Caselli, Valerio Basile, Jelena Mitrović, Inga Kartoziya, and Michael Granitzer. 2020. I feel offended, don't be abusive! Implicit/explicit messages in offensive and abusive language. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 6193–6202.

Despoina Chatzakou, Nicolas Kourtellis, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini, and Athena Vakali. 2017. Mean birds: Detecting aggression and bullying on Twitter. In Proceedings of the 2017 ACM Web Science Conference (WebSci '17), pages 13–22.

Niyati Chhaya, Kushal Chawla, Tanya Goyal, Projjal Chanda, and Jaya Singh. 2018. Frustrated, polite, or formal: Quantifying feelings and tone in email. In Proceedings of the Second Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media, pages 76–86.

Jonathan Culpeper. 1996. Towards an anatomy of impoliteness. Journal of Pragmatics, 25(3):349–367.
Thomas Davidson, Debasmita Bhattacharya, and Ingmar Weber. 2019. Racial bias in hate speech and abusive language detection datasets. In Proceedings of the Third Workshop on Abusive Language Online, pages 25–35.

Michelle K. Duffy, Daniel C. Ganster, and Milan Pagon. 2002. Social undermining in the workplace. Academy of Management Journal, 45(2):331–351.

Lea Ellwardt, Giuseppe (Joe) Labianca, and Rafael Wittek. 2012. Who are the objects of positive and negative gossip at work? A social network perspective on workplace gossip. Social Networks, 34(2):193–205.

Paula Fortuna and Sérgio Nunes. 2018. A survey on automatic detection of hate speech in text. ACM Computing Surveys, 51(4).

Akshita Jha and Radhika Mamidi. 2017. When does a compliment become sexist? Analysis and classification of ambivalent sexism using Twitter data. In Proceedings of the Second Workshop on NLP and Computational Social Science, pages 7–16.

David Jurgens, Libby Hemphill, and Eshwar Chandrasekharan. 2019. A just and comprehensive strategy for using NLP to address online abuse. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3658–3666.

Ming Kong. 2018. Effect of perceived negative workplace gossip on employees' behaviors. Frontiers in Psychology, 9:1112.

Scott O. Lilienfeld. 2017. Microaggressions: Strong claims, inadequate evidence. Perspectives on Psychological Science, 12(1):138–169.

Ping Liu, Wen Li, and Liang Zou. 2019. NULI at SemEval-2019 task 6: Transfer learning for offensive language detection using bidirectional transformers. In Proceedings of the 13th International Workshop on Semantic Evaluation, pages 87–91.

Aman Madaan, Amrith Setlur, Tanmay Parekh, Barnabas Poczos, Graham Neubig, Yiming Yang, Ruslan Salakhutdinov, Alan W. Black, and Shrimai Prabhumoye. 2020. Politeness transfer: A tag and generate approach.

Francis T. McAndrew, Emily K. Bell, and Contitta Maria Garcia. 2007. Who do we tell and whom do we tell on? Gossip as a strategy for status enhancement. Journal of Applied Social Psychology, 37(7):1562–1577.

Kevin L. Nadal, Katie E. Griffin, Yinglee Wong, Sahran Hamit, and Morgan Rasmus. 2014. The impact of racial microaggressions on mental health: Counseling implications for clients of color. Journal of Counseling & Development, 92(1):57–66.

Douglas Oard, William Webber, David Kirsch, and Sergey Golitsynskiy. 2015. Avocado research email collection. DVD.

Silviu Oprea and Walid Magdy. 2020. iSarcasm: A dataset of intended sarcasm. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1279–1289.

John Pavlopoulos, Jeffrey Sorensen, Lucas Dixon, Nithum Thain, and Ion Androutsopoulos. 2020. Toxicity detection: Does context really matter? In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.

James Pennebaker, Martha Francis, and Roger Booth. 2015. Linguistic Inquiry and Word Count (LIWC).

Vinodkumar Prabhakaran, Emily E. Reid, and Owen Rambow. 2014. Gender and power: How gender and gender environment affect manifestations of power. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1965–1976.

Jing Qian, Mai ElSherief, Elizabeth Belding, and William Yang Wang. 2019. Learning to decipher hate symbols. In Proceedings of NAACL-HLT 2019, pages 3006–3015.

Santhosh Rajamanickam, Pushkar Mishra, Helen Yannakoudakis, and Ekaterina Shutova. 2020. Joint modelling of emotion and abusive language detection. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4270–4279.

Naveen Raman, Minxuan Cao, Yulia Tsvetkov, Christian Kästner, and Bogdan Vasilescu. 2020. Stress and burnout in open source: Toward finding, understanding, and mitigating unhealthy interactions. Association for Computing Machinery.

Hannah Rashkin, Sameer Singh, and Yejin Choi. 2016. Connotation frames: A data-driven investigation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 311–321.

Niloofar Safi Samghabadi, Afsheen Hatami, Mahsa Shafaei, Sudipta Kar, and Thamar Solorio. 2020. Attending the emotions to detect online abusive language. In Proceedings of the Fourth Workshop on Online Abuse and Harms, pages 79–88.

Maarten Sap, Dallas Card, Saadia Gabriel, Yejin Choi, and Noah A. Smith. 2019. The risk of racial bias in hate speech detection. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1668–1678.

Anna Schmidt and Michael Wiegand. 2017. A survey on hate speech detection using natural language processing. In Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, pages 1–10.

Carlo Strapparava and Alessandro Valitutti. 2004. WordNet-Affect: An affective extension of WordNet. In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC'04).

Derald Sue, Christina Capodilupo, Gina Torino, Jennifer Bucceri, Aisha M. B. Holder, Kevin Nadal, and Marta Esquilin. 2007. Racial microaggressions in everyday life: Implications for clinical practice. The American Psychologist, 62:271–286.

Derald Wing Sue. 2010. Microaggressions in Everyday Life: Race, Gender, and Sexual Orientation. Wiley.

The Radicati Group, Inc. 2020. Email statistics report, 2020–2024.

W. Wang, Saghar Hosseini, Ahmed Hassan Awadallah, P. Bennett, and Chris Quirk. 2019. Context-aware intent identification in email conversations. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval.

Zijian Wang and Christopher Potts. 2019. TalkDown: A corpus for condescension detection in context. In Proceedings of EMNLP-IJCNLP 2019, pages 3711–3719.

Zeerak Waseem, Thomas Davidson, Dana Warmsley, and Ingmar Weber. 2017. Understanding abuse: A typology of abusive language detection subtasks. In Proceedings of the First Workshop on Abusive Language Online, pages 78–84.

Zeerak Waseem, James Thorne, and Joachim Bingel. 2018. Bridging the gaps: Multi task learning for domain transfer of hate speech detection. Springer International Publishing, pages 29–55.

Ellery Wulczyn, Nithum Thain, and Lucas Dixon. 2017. Wikipedia talk labels: Personal attacks.

Jun-Ming Xu, Kwang-Sung Jun, Xiaojin Zhu, and Amy Bellmore. 2012. Learning from bullying traces in social media. In Proceedings of NAACL-HLT 2012, pages 656–666.

Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, and Ritesh Kumar. 2019a. Predicting the type and target of offensive posts in social media. In Proceedings of NAACL-HLT 2019, pages 1415–1420.

Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, and Ritesh Kumar. 2019b. SemEval-2019 task 6: Identifying and categorizing offensive language in social media (OffensEval). In Proceedings of the 13th International Workshop on Semantic Evaluation, pages 75–86.
A Appendix

Figure 5: Snapshot of crowd sourcing task.
