0% found this document useful (0 votes)
49 views12 pages

Aggressive, Repetitive, Intentional, Visible, and Imbalanced: Refining Representations For Cyberbullying Classification

Uploaded by

Belén Bln
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
49 views12 pages

Aggressive, Repetitive, Intentional, Visible, and Imbalanced: Refining Representations For Cyberbullying Classification

Uploaded by

Belén Bln
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 12

Aggressive, Repetitive, Intentional, Visible, and Imbalanced:

Refining Representations for Cyberbullying Classification

Caleb Ziems Ymir Vigfusson Fred Morstatter


Emory University Emory University USC Information Sciences Institute
[email protected] [email protected] [email protected]
arXiv:2004.01820v1 [cs.SI] 4 Apr 2020

Abstract moderation is often unfeasible. For this reason, social media


platforms are beginning to rely instead on machine learning
Cyberbullying is a pervasive problem in online communi- classifiers for automatic cyberbullying detection (Van Hee et
ties. To identify cyberbullying cases in large-scale social net-
works, content moderators depend on machine learning clas-
al. 2018).
sifiers for automatic cyberbullying detection. However, exist- The research community has developed increasingly com-
ing models remain unfit for real-world applications, largely petitive classifiers to detect harmful or aggressive content in
due to a shortage of publicly available training data and a lack text. Despite significant progress in recent years, however,
of standard criteria for assigning ground truth labels. In this existing models remain unfit for real-world applications.
study, we address the need for reliable data using an original This is due, in part, to shortcomings in the training and test-
annotation framework. Inspired by social sciences research ing data (Hosseinmardi et al. 2016; Salawu, He, and Lums-
into bullying behavior, we characterize the nuanced problem den 2017; Rosa et al. 2019). Most annotation schemes have
of cyberbullying using five explicit factors to represent its ignored the importance of social context, and researchers
social and linguistic aspects. We model this behavior using have neglected to provide annotators with objective criteria
social network and language-based features, which improve
classifier performance. These results demonstrate the impor-
for distinguishing cyberbullying from other crude messages.
tance of representing and modeling cyberbullying as a social To address the urgent need for reliable data, we provide
phenomenon. an original annotation framework and an annotated Twitter
dataset.1 The key advantages to our labeling approach are:
• Contextually-informed ground truth. We provide an-
Introduction notators with the social context surrounding each mes-
Cyberbullying poses a serious threat to the safety of on- sage, including the contents of the reply thread and the
line communities. The Centers for Disease Control and Pre- account information of each user involved.
vention (CDC) identify cyberbullying as a “growing public
• Clear labeling criteria. We ask annotators to provide
health problem in need of additional research and preven-
labels for five clear cyberbullying criteria. These criteria
tion efforts” (David-Ferdon and Hertz 2009). Cyberbullying
can be combined and adapted for revised definitions of
has been linked to negative mental health outcomes, includ-
cyberbullying.
ing depression, anxiety, and other forms of self-harm, sui-
cidal ideation, suicide attempts, and difficulties with social Using our new dataset, we experiment with existing NLP
and emotional processing (Miller 2016; Price, Dalgleish, features and compare results with a newly-proposed set
and others 2010; Sampasa-Kanyinga, Roumeliotis, and Xu of features. We designed these features to encode the dy-
2014). Where traditional bullying was once limited to a spe- namic relationship between a potential bully and victim, us-
cific time and place, cyberbullying can occur at any hour and ing comparative measures from their relative linguistic and
from any location on earth (Chatzakou et al. 2017). Once the social network profiles. Additionally, our features have low
first message has been sent, the attack can escalate rapidly as computational complexity, so they can scale to internet-scale
harmful content is spread across shared media, compound- datasets, unlike expensive network centrality and clustering
ing these negative effects (Waasdorp and Bradshaw 2015; measurements.
Huang and Chou 2010). Results from our experiments suggest that, although ex-
Internet users depend on content moderators to flag abu- isting NLP models can reliably detect aggressive language
sive text and ban cyberbullies from participating in online in text, these lexically-trained classifiers will fall short of
communities. However, due to the overwhelming volume the more subtle goal of cyberbullying detection. With n-
of social media data produced every day, manual human grams and dictionary-based features, classifiers prove un-
able to detect harmful intent, visibility among peers, power
Copyright c 2020, Association for the Advancement of Artificial
1
Intelligence (www.aaai.org). All rights reserved. https://github.com/cjziems/cyberbullying-representations
Table 1: Datasets built from different related definitions of cyberbullying. For each dataset, we report the size, positive class
balance, inter-annotator agreement, and whether the study incorporated social context in the annotation process.
Work AGGR REP HARM PEER POWER Data Source Size Balance Agreement Context
Al-garadi, Varathan, and Ravana (2016) 3 3 Twitter 10,007 6.0% – 7
Chatzakou et al. (2017) 3 3 3 3 Twitter 9,484 – 0.54 3
Hosseinmardi et al. (2015) 3 3 3 Instagram 1,954 29.0% 0.50 3
Huang, Singh, and Atrey (2014) 3 3 Twitter 4,865 1.9% – 7
Reynolds, Kontostathis, and Edwards (2011) 3 3 Formspring 3,915 14.2% – 7
Rosa et al. (2019) 3 3 3 3 Formspring 13,160 19.4% – 7
Sugandhi et al. (2016) 3 3 Mixed 3,279 12.0% – 7
Van Hee et al. (2018) 3 3 AskFM 113,698 4.7% 0.59 3

imbalance, or the repetitive nature of aggression with suf- sufficient to render the victim defenseless (Slonje and Smith
ficiently high precision and recall. However, our proposed 2008).
feature set improves F1 scores on all four of these social The machine learning community has not reached a unan-
measures. Real-world detection systems can benefit from imous definition of cyberbullying either. They have instead
our proposed approach, incorporating the social aspects of echoed the uncertainty of the social scientists. Moreover,
cyberbullying into existing models and training these mod- some authors have neglected to publish any objective cyber-
els on socially-informed ground truth labels. bullying criteria or even a working definition for their anno-
tators, and among those who do, the formulation varies. This
Background disagreement has slowed progress in the field, since classi-
fiers and datasets cannot be as easily compared. Upon re-
Existing approaches to cyberbullying detection generally
view, however, we found that all available definitions con-
follow a common workflow. Data is collected from social
tained a strict subset of the following criteria: aggression
networks or other online sources, and ground truth is estab-
(AGGR), repetition (REP), harmful intent (HARM), visibility
lished through manual human annotation. Machine learning
among peers (PEER), and power imbalance (POWER). The
algorithms are trained on the labeled data using the message
datasets built from these definitions are outlined in Table 1.
text or hand-selected features. Then results are typically re-
ported using precision, recall, and F1 scores. Comparison
across studies is difficult, however, because the definition
Existing Sources of Cyberbullying Data
of cyberbullying has not been standardized. Therefore, an According to Van Hee et al. (2018), data collection is the
important first step for the field is to establish an objective most restrictive “bottleneck” in cyberbullying research. Be-
definition of cyberbullying. cause there are very few publicly available datasets, some re-
searchers have turned to crowdsourcing using Amazon Me-
Defining Cyberbullying chanical Turk or similar platforms.
Some researchers view cyberbullying as an extension of In most studies to date, annotators labeled individual mes-
more “traditional” bullying behaviors (Hinduja and Patchin sages instead of message threads, ignoring social context
2008; Olweus 2012; Raskauskas and Stoltz 2007). In one altogether (Al-garadi, Varathan, and Ravana 2016; Huang,
widely-cited book, the psychologist Dan Olweus defines Singh, and Atrey 2014; Nahar et al. 2014; Reynolds, Kon-
schoolyard bullying in terms of three criteria: repetition, tostathis, and Edwards 2011; Singh, Huang, and Atrey 2016;
harmful intent, and an imbalance of power (Olweus Sugandhi et al. 2016). Only three of the papers that we re-
1994). He then identifies bullies by their intention to “inflict viewed incorporated social context in the annotation pro-
injury or discomfort” upon a weaker victim through repeated cess. Chatzakou et al. (2017) considered batches of time-
acts of aggression. sorted tweets called sessions, which were grouped by user
Social scientists have extensively studied this form of accounts, but they did not include message threads or any
bullying as it occurs among adolescents in school (Kowal- other form of context. Van Hee et al. (2018) presented “orig-
ski and Limber 2013; Li 2006). However, experts disagree inal conversation[s] when possible,” but they did not explain
whether cyberbullying should be studied as a form of tra- when this information was available. Hosseinmardi et al.
ditional bullying or a fundamentally different phenomenon (2016) was the only study to label full message reply threads
(Kowalski and Limber 2013; Olweus 2012). Some argue as they appeared in the original online source.
that, although cyberbullying might involve repeated acts of
aggression, this condition might not necessarily hold in all Modeling Cyberbullying Behavior
cases, since a single message can be otherwise forwarded A large body of work has been published on cyberbul-
and publicly viewed without repeated actions from the au- lying detection and prediction, primarily through the use
thor (Slonje, Smith, and Frisén 2013; Waasdorp and Brad- of natural language processing techniques. Most common
shaw 2015). Similarly, the role of power imbalance is un- approaches have relied on lexical features such as n-
certain in online scenarios. Power imbalances of physical grams (Hosseinmardi et al. 2016; Van Hee et al. 2018;
strength or numbers may be less relevant, whereas bully Xu et al. 2012), TF-IDF vectors (Dinakar, Reichart, and
anonymity and the permanence of online messages may be Lieberman 2011; Nahar et al. 2013; Sugandhi et al. 2016),
word embeddings (Zhao, Zhou, and Mao 2016), or pho- Table 2: State of the Art in Cyberbullying Detection. Here,
netic representations of messages (Zhang et al. 2016), as results are reported on either the Cyberbullying (CB) class
well as dictionary-based counts on curse words, hateful exclusively or on the entire (total) dataset.
or derogatory terms, pronouns, emoticons, and punctuation
Work Model Precision Recall F1 Class
(Al-garadi, Varathan, and Ravana 2016; Dadvar et al. 2013; Zhang et al. (2016) CNN 99.1% 97.0% 98.0% total
Reynolds, Kontostathis, and Edwards 2011; Singh, Huang, Al-garadi, Varathan,
Random Forest 94.1% 93.9% 93.6% total
and Ravana (2016)
and Atrey 2016). Some studies have also used message sen- Nahar et al. (2014) SVM 87.0% 97.0% 92.0% CB
timent (Singh, Huang, and Atrey 2016; Sugandhi et al. 2016; Sugandhi et al.
SVM 91.0% 91.0% 91.0% total
Van Hee et al. 2018) or the age, gender, personality, and (2016)
Soni and Singh
psychological state of the message author according to text Naı̈ve Bayes 80.2% 80.2% 80.2% total
(2018)
from their timelines (Al-garadi, Varathan, and Ravana 2016; Zhao, Zhou, and
SVM 76.8% 79.4% 78.0% total
Mao (2016)
Dadvar et al. 2013). These methods have been reported with Xu et al. (2012) SVM 76.0% 79.0% 77.0% total
appreciable success as shown in Table 2. Hosseinmardi et al.
Logistic Regression 78.0% 72.0% 75.0% CB
(2016)
Some researchers argue, however, that lexical features Yao et al. (2019) CONcISE 69.5% 79.4% 74.1% CB
alone may not adequately represent the nuances of cyberbul- Van Hee et al. (2018) SVM 73.3% 57.2% 64.3% total
lying. Hosseinmardi et al. (2015) found that among Insta- Singh, Huang, and
Proposed 82.0% 53.0% 64.0% CB
Atrey (2016)
gram media sessions containing profane or vulgar content, Rosa et al. (2019) SVM 46.0% - 45.0% CB
only 30% were acts of cyberbullying. They also found that Dadvar et al. (2013) SVM 31.0% 15.0% 20.0% CB
Huang, Singh, and
while cyberbullying posts contained a moderate proportion Atrey (2014)
Dagging 76.3% - - CB
of negative terms, the most negative posts were not consid-
ered cases of cyberbullying by the annotators. Instead, these
negative posts referred to politics, sports, and other domestic work by developing a dataset that better reflects the defini-
matters between friends (Hosseinmardi et al. 2015). tions of cyberbullying presented by social scientists, and by
The problem of cyberbullying cuts deeper than merely proposing and evaluating a feature set that represents infor-
the exchange of aggressive language. The meaning and in- mation pertaining to the social processes that underlie cyber-
tent of an aggressive post is revealed through conversation bullying behavior.
and interaction between peers. Therefore, to properly distin-
guish cyberbullying from other uses of aggressive or pro- Curating a Comprehensive
fane language, future studies should incorporate key indica-
tors from the social context of each message. Specifically, Cyberbullying Dataset
researchers can measure the author’s status or social advan- Here, we provide an original annotation framework and a
tage, the author’s harmful intent, the presence of repeated new dataset for cyberbullying research, built to unify exist-
aggression in the thread, and the visibility of the thread ing methods of ground truth annotation. In this dataset, we
among peers (Hosseinmardi et al. 2015; Rosa et al. 2019; decompose the complex issue of cyberbullying into five key
Salawu, He, and Lumsden 2017). criteria, which were drawn from the social science and ma-
Since cyberbullying is an inherently social phenomenon, chine learning communities. These criteria can be combined
some studies have naturally considered social network mea- and adapted for revised definitions of cyberbullying.
sures for classification tasks. Several features have been de-
rived from the network representations of the message inter- Data Collection
actions. The degree and eigenvector centralities of nodes, the We collected a sample of 1.3 million unlabeled tweets from
k-core scores, and clustering of communities, as well as the the Twitter Filter API. Since cyberbullying is a social phe-
tie strength and betweenness centralities of mention edges nomenon, we chose to filter for tweets containing at least
have all been shown to improve text-based models (Huang, one “@” mention. To restrict our investigation to origi-
Singh, and Atrey 2014; Singh, Huang, and Atrey 2016). Ad- nal English content, we removed all non-English posts and
ditionally, bullies and victims can be more accurately iden- retweets (RTs), narrowing the size of our sample to 280,301
tified by their relative network positions. For example, the tweets.
Jaccard coefficient between neighborhood sets in bully and Since aggressive language is a key component of cyber-
victim networks has been found to be statistically significant bullying (Hosseinmardi et al. 2015), we ran the pre-trained
(Chelmis, Zois, and Yao 2017). The ratio of all messages classifier of Davidson et al. (2017) over our dataset to iden-
sent and received by each user was also significant. tify hate speech and aggressive language and increase the
These findings show promising directions for future work. prevalence of cyberbullying examples 2 . This gave us a fil-
Social network features may provide the information neces- tered set of 9,803 aggressive tweets.
sary to reliably classify cyberbullying. However, it may be We scraped both the user and timeline data for each author
prohibitively expensive to build out social networks for each in the aggressive set, as well as any users who were men-
user due to time constraints and the limitations of API calls tioned in one of the aggressive tweets. In total, we collected
(Yao et al. 2019). For this reason, alternative measurements data from 21,329 accounts. For each account, we saved the
of online social relationships should be considered.
In the present study, we leverage prior work by incorpo- 2
Without this step, our positive class balance would be pro-
rating linguistic signals into our classifiers. We extend prior hibitively small. See Appendix 1 for details.
full user object, including profile name, description, loca- Table 3: Analysis of Labeled Twitter Data
tion, verified status, and creation date. We also saved a com-
Criterion Positive Inter-annotator Cyberbullying
plete list of the user’s friends and followers, and a 6-month Balance Agreement Correlation
timeline of all their posts and mentions from January 1st aggression 74.8% 0.23 0.22
through June 10th , 2019. For author accounts, we extended repetition 6.6% 0.18 0.27
harmful intent 16.1% 0.42 0.68
our crawl to include up to four years of timeline content. visibility among peers 30.1% 0.51 0.07
Lastly, we collected metadata for all tweets belonging to the target power 34.3% 0.37 0.11
corresponding message thread for each aggressive message. author power 3.1% 0.10 -0.02
equal power 59.7% 0.22 -0.09
cyberbullying 0.7% 0.18 –
Annotation Task
We presented each tweet in the dataset to three separate an-
notators as a Human Intelligence Task (HIT) on Amazon’s there strong evidence that the author is more powerful than
Mechanical Turk (MTurk) platform. By the time of recruit- the target? Is the target more powerful? Or if there is not
ment, 6,897 of the 9,803 aggressive tweets were accessible any good evidence, just mark equal.” We recognized that an
from the Twitter web page. The remainder of the tweets had imbalance of power might arise in a number of different cir-
been removed, or the Twitter account had been locked or cumstances. Therefore, we did not restrict our definition to
suspended. just one form of power, such as follower count or popularity.
We asked our annotators to consider the full message
For instructional purposes, we provided five sample
thread for each tweet as displayed on Twitter’s web inter-
threads to demonstrate both positive and negative examples
face. We also gave them a list of up to 15 recent mentions
for each of the five criteria. Two of these threads are shown
by the author of the tweet, directed towards any of the other
here. The thread in Figure 1a displays bullying behavior that
accounts mentioned in the original thread. Then we asked
is targeted against the green user, with all five cyberbully-
annotators to interpret each tweet in light of this social con-
ing criteria displayed. The thread includes repeated use of
text, and had them provide us with labels for five key cy-
aggressive language such as “she really fucking tried” and
berbullying criteria. We defined these criteria in terms of the
“she knows she lost.” The bully’s harmful intent is evident
author account (“who posted the given tweet?”) and the tar-
in the victim’s defensive responses. And lastly, the thread is
get (“who was the tweet about?” – not necessarily the first
visible among four peers as three gang up against one, cre-
mention). We also stated that “if the target is not on Twit-
ating a power imbalance.
ter or their handle cannot be identified” the annotator should
“please write OTHER.” With this framework established, we The final tweet in Figure 1b shows the importance of con-
gave the definitions for our five cyberbullying criteria as fol- text in the annotation process. If we read only this individ-
lows. ual message, we might decide that the post is cyberbullying,
but given the social context here, we can confidently assert
1. Aggressive language: (AGGR) Regardless of the au- that this post is not cyberbullying. Although it contains the
thor’s intent, the language of the tweet could be seen aggressive phrase “FUCK YOU TOO BITCH”, the author
as aggressive. The user either addresses a group or in- does not intend harm. The message is part of a joking ex-
dividual, and the message contains at least one phrase change between two friends or equals, and no other peers
that could be described as confrontational, derogatory, have joined in the conversation or interacted with the thread.
insulting, threatening, hostile, violent, hateful, or sexu-
After asking workers to review these examples, we gave
ally abusive.
them a short 7-question quiz to test their knowledge. Work-
2. Repetition: (REP) The target user has received at least ers were given only one quiz attempt, and they were ex-
two aggressive messages in total (either from the author pected to score at least 6 out of 7 questions correctly before
or from another user in the visible thread). they could proceed to the paid HIT. Workers were then paid
3. Harmful intent: (HARM) The tweet was designed to tear $0.12 for each thread that they annotated.
down or disadvantage the target user by causing them We successfully recruited 170 workers to label all 6,897
distress or by harming their public image. The target available threads in our dataset. They labeled an average of
does not respond agreeably as to a joke or an otherwise 121.7 threads and a median of 7 threads each. They spent
lighthearted comment. an average time of 3 minutes 50 seconds, and a median time
4. Visibility among peers: (PEER) At least one other user of 61 seconds per thread. For each thread, we collected an-
besides the target has liked, retweeted, or responded to notations from three different workers, and from this data
at least one of the author’s messages. we computed our reliability metrics using Fleiss’s Kappa for
inter-annotator agreement as shown in Table 3.
5. Power imbalance: (POWER) Power is derived from au- We determined ground truth for our data using a 2 out
thority and perceived social advantage. Celebrities and of 3 majority vote as in Hosseinmardi et al. (2015). If the
public figures are more powerful than common users. message thread was missing or a target user could not be
Minorities and disadvantaged groups have less power. identified, we removed the entry from the dataset, since later
Bullies can also derive power from peer support. we would need to draw our features from both the thread
Each of these criteria was represented as a binary label, ex- and the target profile. After filtering in this way, we were
cept for power imbalance, which was ternary. We asked “Is left with 5,537 labeled tweets.
a ? t

(c) Downward overlap


a ? t

(d) Upward overlap


a ? t

(e) Inward overlap


a ? t

(f) Outward overlap


a ? t

(a) Cyberbullying (b) Not Cyberbullying (g) Bidirectional overlap

Figure 1: Cyberbullying or not. The leftmost thread demonstrates all five cyberbullying criteria. Although the thread in the
middle contains repeated use of aggressive language, there is no harmful intent, visibility among peers, or power imbalance.
Overlap measures. (right) Graphical representation of the neighborhood overlap measures of author a and target t.

Cyberbullying Transcends Cyberaggression the message author and target, using network and timeline
As discussed earlier, some experts have argued that cyber- similarities, expectations from language models, and other
bullying is different from online aggression (Hosseinmardi signals taken from the message thread.
et al. 2015; Rosa et al. 2019; Salawu, He, and Lumsden For each feature and each cyberbullying criterion, we
2017). We asked our annotators to weigh in on this is- compare the cumulative distributions of the positive and
sue by asking them the subjective question for each thread: negative class using the two-sample Kolmogorov-Smirnov
“Based on your own intuition, is this tweet an example of test. We report the Kolmogorov-Smirnov statistic D (a nor-
cyberbullying?” We did not use the cyberbullying label as malized distance between the CDF of the positive and nega-
ground truth for training models; we used this label to better tive class) as well as the p-value with α = 0.05 as our level
understand worker perceptions of cyberbullying. We found for statistical significance.
that our workers believed cyberbullying will depend on a
weighted combination of the five criteria presented in this Text-based Features
paper, with the strongest correlate being harmful intent as To construct realistic and competitive baseline models, we
shown in Table 3. consider a set of standard text-based features that have been
Furthermore, the annotators decided our dataset contained used widely throughout the literature. Specifically, we use
74.8% aggressive messages as shown in the Positive Bal- the NLTK library (Bird, Klein, and Loper 2009) to con-
ance column of Table 3. We found that a large majority of struct unigrams, bigrams, and trigrams for each labeled mes-
these aggressive tweets were not labeled as “cyberbullying.” sage. This parallels the work of Hosseinmardi et al. (2016),
Rather, only 10.5% were labeled by majority vote as cyber- Van Hee et al. (2018), and Xu et al. (2012). Following Zhang
bullying, and only 21.5% were considered harmful. From et al. (2016), we incorporate counts from the Linguistic In-
this data, we propose that cyberbullying and cyberaggres- quiry and Word Count (LIWC) dictionary to measure the
sion are not equivalent classes. Instead, cyberbullying tran- linguistic and psychological processes that are represented
scends cyberaggression. in the text (Pennebaker, Booth, and Francis 2007). We also
use a modified version of the Flesch-Kincaid Grade Level
Feature Engineering and Flesch Reading Ease scores as computed in Davidson et
We have established that cyberbullying is a complex social al. (2017). Lastly, we encode the sentiment scores for each
phenomenon, different from the simpler notion of cyberag- message using the Valence Aware Dictionary and sEntiment
gression. Standard Bag of Words (BoW) features based on Reasoner (VADER) of Hutto and Gilbert (2014).
single sentences, such as n-grams and word embeddings,
may thus lead machine learning algorithms to incorrectly Social Network Features
classify friendly or joking behavior as cyberbullying (Hos- Network features have been shown to improve text-based
seinmardi et al. 2015; Rosa et al. 2019; Salawu, He, and models (Huang and Chou 2010; Singh, Huang, and Atrey
Lumsden 2017). To more reliably capture the nuances of 2016), and they can help classifiers distinguish between bul-
repetition, harmful intent, visibility among peers, and power lies and victims (Chelmis, Zois, and Yao 2017). These fea-
imbalance, we designed a new set of features from the social tures may also capture some of the more social aspects of cy-
and linguistic traces of Twitter users. These measures allow berbullying, such as power imbalance and visibility among
our classifiers to encode the dynamic relationship between peers. However, many centrality measures and clustering
algorithms require detailed network representations. These
features may not be scalable for real-world applications.
We propose a set of low-complexity measurements that can
be used to encode important higher-order relations at scale.
Specifically, we measure the relative positions of the author
and target accounts in the directed following network by
computing modified versions of Jaccard’s similarity index
as we now explain. (a) Downward Overlap (b) Upward Overlap
+
Neighborhood Overlap Let N (u) be the set of all ac-
counts followed by user u and let N − (u) be the set of all
accounts that follow user u. Then N (u) = N + (u) ∪ N − (u)
is the neighborhood set of u. We consider five related mea-
surements of neighborhood overlap for a given author a and
target t, listed here.
|N + (a)∩N − (t)|
down(a, t) = |N + (a)∪N − (t)| (c) Inward Overlap (d) Outward Overlap
− +
|N (a)∩N (t)|
up(a, t) = |N − (a)∪N + (t)| Figure 2: Cumulative Distribution Functions for neighbor-
|N − (a)∩N − (t)|
hood overlap on relevant features. These measures are
in(a, t) = |N − (a)∪N − (t)| shown to be predictive of power imbalance and visibility
|N + (a)∩N + (t)| among peers.
out(a, t) = |N + (a)∪N + (t)|
|N (a)∩N (t)|
bi(a, t) = |N (a)∪N (t)| User-based features We also use basic user account met-
Downward overlap measures the number of two-hop paths rics drawn from the author and target profiles. Specifically,
from the author to the target along following relationships; we count the friends and followers of each user, their verified
upward overlap measures two-hop paths in the opposite di- status, and the number of tweets posted within six-month
rection. Inward overlap measures the similarity between the snapshots of their timelines, as in Al-garadi, Varathan, and
two users’ follower sets, and outward overlap measures the Ravana (2016), Chatzakou et al. (2017), and Hosseinmardi
similarity between their sets of friends. Bidirectional overlap et al. (2016).
then is a more generalized measure of social network sim-
ilarity. We provide a graphical depiction for each of these Timeline Features
features on the right side of Figure 1. Here, we consider linguistic features, drawn from both the
High downward overlap likely indicates that the target is author and target timelines. These are intended to capture
socially relevant to the author, as high upward overlap indi- the social relationship between each user, their common in-
cates the author is relevant to the target. Therefore, when the terests, and the surprise of a given message relative to the
author is more powerful, downward overlap is expected to be author’s timeline history.
lower and upward overlap is expected be higher. This trend
is slight but visible in the cumulative distribution functions Message Behavior To more clearly represent the social
of Figure 2 (a): downward overlap is indeed lower when relationship between the author and target users, we con-
the author is more powerful than when the users are equals sider the messages sent between them as follows:
(D = 0.143). However, there is not a significant difference - Downward mention count: How many messages has the
for upward overlap (p = 0.85). We also observe that, when author sent to the target?
the target is more powerful, downward and upward overlap - Upward mention count: How many messages has the tar-
are both significantly lower (D = 0.516 and D = 0.540 get sent to the author?
respectively). It is reasonable to assume that messages can
be sent to celebrities and other powerful figures without the - Mention overlap: Let Ma be the set of all accounts men-
need for common social connections. tioned by author a, and let Mt be the set of all accounts
Next, we consider inward and outward overlap. When the mentioned by target t. We compute the ratio |M a ∩Mt |
|Ma ∪Mt | .
inward overlap is high, the author and target could have
more common visibility. Similarly, if the outward overlap - Multiset mention overlap: Let M̂a be the multiset of all
is high, then the author and target both follow similar ac- accounts mentioned by author a (with repeats for each
counts, so they might have similar interests or belong to the mention), and let M̂t be the multiset of all accounts men-

same social circles. Both inward and outward overlaps are tioned by target t. We measure ||M̂a ∩ M̂t |
M̂a ∪M̂t |
where ∩∗ takes
expected to be higher when a post is visible among peers.
the multiplicity of each element to be the sum of the mul-
This is true of both distributions in Figure 2. The difference
in outward overlap is significant (D = 0.04, p = 0.03), tiplicity from M̂a and the multiplicity from M̂b
and the difference for inward overlap is short of significant The direct mention count measures the history of repeated
(D = 0.04, p = 0.08). communication between the author and the target. For harm-
(a) Downward Mentions (b) Upward Mentions (a) Timeline Similarity (b) Timeline Similarity

Figure 4: Cumulative Distribution Functions for timeline


similarity on relevant features. These measures are shown
to be predictive of power imbalance and harmful intent.

(c) Mention Overlap (d) Multiset Mention Overlap

Figure 3: Cumulative Distribution Functions for message


behavior on relevant features. These measures are shown
to be indicative of harmful intent and repetition.
(a) New Words Ratio (b) Cross Entropy

ful messages, downward overlap is higher (D = 0.178) and Figure 5: Cumulative Distribution Functions for language
upward overlap is lower (D = 0.374) than for harmless models on relevant features. These measures are shown to
messages, as shown in Figure 3. This means malicious au- be predictive of harmful intent.
thors tend to address the target repeatedly while the target
responds with relatively few messages.
Mention overlap is a measure of social similarity that is more powerful (p = 0.58). What we do observe is likely
based on shared conversations between the author and the caused by noise from extreme class imbalance and low inter-
target. Multiset mention overlap measures the frequency of annotator agreement on labels for author power.
communication within this shared space. These features may Turning to Figure 4 (b), we see that aggressive messages
help predict visibility among peers, or repeated aggression were less likely to harbor harmful intent if they were sent
due to pile-on bullying situations. We see in Figure 3 that re- between users with similar timelines (D = 0.285). Aggres-
peated aggression is linked to slightly greater mention over- sive banter between friends is generally harmless, so again,
lap (D = 0.07, p = 0.07), but the trend is significant only this confirms our intuitions.
for multiset mention overlap (D = 0.08, p = 0.03).
Timeline Similarity Timeline similarity is used to indi- Language Models Harmful intent is difficult to measure
cate common interests and shared topics of conversation be- in isolated messages because social context determines prag-
tween the author and target timelines. High similarity scores matic meaning. We attempt to approximate the author’s
might reflect users’ familiarity with one another, or suggest harmful intent by measuring the linguistic “surprise” of a
that they occupy similar social positions. This can be used given message relative to the author’s timeline history. We
to distinguish cyberbullying from harmless banter between do this in two ways: through a simple ratio of new words,
friends and associates. To compute this metric, we represent and through the use of language models.
the author and target timelines as TF-IDF vectors A ~ and T~ . To estimate historical language behavior, we count uni-
We then take the cosine similarity between the vectors as gram and bigram frequencies from a 4-year snapshot of the
author’s timeline. Then, after removing all URLs, punctua-
~ · T~
A tion, stop words, mentions, and hashtags from the original
cos θ = . post, we take the cardinality of the set unigrams in the post
~ T~ k
kAkk
having zero occurrences in the timeline. Lastly, we divide
A cosine similarity of 1 means that users’ timelines had this count by the length of the processed message to arrive
identical counts across all weighted terms; a cosine simi- at our new words ratio. We can also build a language model
larity of 0 means that their timelines did not contain any from the bigram frequencies, using Kneser-Ney smoothing
words in common. We expect higher similarity scores be- as implemented in NLTK (Bird, Klein, and Loper 2009).
tween friends and associates. From the language model, we compute the surprise of the
In Figure 4 (a), we see that the timelines were significantly original message m according to its cross-entropy, given
less similar when the target was in a position of greater by
power (D = 0.294). This is not surprising, since power can N
be derived from such differences between social groups. We 1 X
H(m) = − log P (bi )
do not observe the same dissimilarity when the author was N i=1
Table 4: Feature Combinations Table 6: Recall
Feature BoW Text User Proposed Combined Criterion BoW Text User Proposed Combined
n-grams 3 3 3 aggression 77.0% 84.8% 47.8% 51.6% 85.6%
LIWC, VADER, Flesch-Kincaid 3 3
Friend/following counts, tweet count, verified 3 3 3 repetition 17.6% 7.3% 49.5% 64.3% 26.2%
Neighborhood overlap measures 3 3 3 harmful intent 40.2% 44.4% 63.4% 67.7% 52.7%
Mention counts and overlaps 3 3 3 visibility among peers 34.8% 20.4% 47.1% 54.2% 33.7%
Timeline similarity 3 3 3 author power 6.5% 1.6% 74.1% 80.0% 11.9%
New words ratio, cross-entropy 3 3 3
Thread visibility features 3 3
target power 49.4% 43.3% 73.3% 80.8% 71.1%
Thread aggression features 3 3

Table 7: F1 Scores
Table 5: Precision
Criterion BoW Text User Proposed Combined
Criterion BoW Text User Proposed Combined aggression 79.7% 83.5% 59.0% 62.3% 84.1%
aggression 82.5% 82.3% 77.1% 78.7% 82.6% repetition 10.8% 9.4% 13.3% 24.7% 28.7%
repetition 7.8% 13.4% 7.7% 15.3% 31.7% harmful intent 34.1% 46.7% 38.7% 45.7% 53.8%
harmful intent 29.6% 49.4% 35.8% 34.5% 55.3% visibility among peers 32.7% 25.5% 39.5% 47.4% 45.5%
visibility among peers 30.8% 34.3% 34.0% 42.2% 46.8% author power 2.9% 2.2% 13.7% 17.5% 14.0%
author power 1.9% 3.6% 7.6% 9.8% 17.0% target power 46.2% 47.0% 75.3% 77.9% 73.9%
target power 43.5% 51.5% 77.6% 75.2% 77.0%

- Aggressive user count: Of the users who posted a reply


where m is composed of bigrams b1 , b2 , . . . , bN , and P (bi ) in the thread after the author first commented, count how
is the probability of the ith bigram from the language model. many had a message classified as aggressive
We see in Figure 5 that harmfully intended messages have
a greater density of new words (D = 0.06). This is intuitive,
since attacks may be staged around new topics of conversa-
Experimental Evaluation
tion. However, the cross entropy of these harmful messages Using our proposed features from the previous section and
is slightly lower than for harmless messages (D = 0.06). ground truth labels from our annotation task, we trained a
This may be due to harmless jokes, since joking messages separate Logistic Regression classifier for each of the five
might depart more from the standard syntax of the author’s cyberbullying criteria, and we report precision, recall, and
timeline. F1 measures over each binary label independently. We aver-
aged results using five-fold cross-validation, with 80% of the
Thread Features data allocated for training and 20% of the data allocated for
Finally, we turn to the messages of the thread itself to com- testing at each iteration. To account for the class imbalance
pute measures of visibility and repeated aggression. in the training data, we used the synthetic minority over-
sampling technique (SMOTE) (Chawla et al. 2002). We did
Visibility To determine the public visibility of the author’s not over-sample testing sets, however, to ensure that our tests
post, we collect basic measurements from the interactions of better match the class distributions obtained as we did by
other users in the thread. They are as follows. pre-filtering for aggressive directed Twitter messages.
- Message count: Count the messages posted in the thread We compare our results across the five different feature
combinations given in Table 4. Note that because we do not
- Reply message count: Count the replies posted in the
include thread features in the User set, it can be used for
thread after the author’s first comment.
cyberbullying prediction and early intervention. The Pro-
- Reply user count: Count the users who posted a reply in posed set can be used for detection, sinct it is a collection of
the thread after the author’s first comment. all newly proposed features, including thread features. The
- Maximum author favorites: The largest number of fa- Combined adds these to the baseline text features.
vorites the author received on a message in the thread. The performance of the different classifiers is summarized
in Tables 5, 6, and 7. Here, we see that Bag of Words and
- Maximum author retweets: The largest number of
text-based methods performed well on the aggressive lan-
retweets the author received on a message in the thread.
guage classification task, with an F1 score of 83.5%. This
Aggression To detect repeated aggression, we again em- was expected and the score aligns well with the success of
ploy the hate speech and offensive language classifier of other published results of Table 2.
Davidson et al. (2017). Each message is given a binary label Cyberbullying detection is more complex than simply
according to the classifier-assigned class: aggressive (classi- identifying aggressive text, however. We find that these same
fied as hate speech or offensive language) or non-aggressive baseline methods fail to reliably detect repetition, harm-
(classified as neither hate speech nor offensive language). ful intent, visibility among peers, and power imbalance, as
From these labels, we derive the following features. shown by the low recall scores in Table 6. We conclude that
our investigation of socially informed features was justified.
- Aggressive message count: Count the messages in the Our proposed set of features beats recall scores for lexi-
thread classified as aggressive cally trained baselines in all but the aggression criterion. We
- Aggressive author message count: Count the author’s also improve precision scores for repetition, visibility among
messages that were classified as aggressive peers, and power imbalance. When we combine all features,
we see our F1 scores beat baselines for each criterion. This The main contribution of our paper is not that we solved
demonstrates the effectiveness of our approach, using lin- the problem of cyberbullying detection. Instead, we have ex-
guistic similarity and community measurements to encode posed the challenge of defining and measuring cyberbully-
social characteristics for cyberbullying classification. ing activity, which has been historically overlooked in the
Similar results were obtained by replacing our logistic re- research community.
gression model with any of a random forest model, support
Future Directions Cyberbullying detection is an increas-
vector machine (SVM), AdaBoost, or Multilayer Perceptron
ingly important and yet challenging problem to tackle. A
(MLP). We report all precision, recall, and F1 scores in Ap-
lack of detailed and appropriate real-world datasets stymies
pendix 2, Tables 9-17. We chose to highlight logistic regres-
progress towards more reliable detection methods. With cy-
sion because it can be more easily interpreted. As a result,
berbullying being a systemic issue across social media plat-
we can identify the relative importance of our proposed fea-
forms, we urge the development of a methodology for data
tures. The feature weights are also given in Appendix 2, Ta-
sharing with researchers that provides adequate access to
bles 18-22. There we observe a trend. The aggressive lan-
rich data to improve on the early detection of cyberbully-
guage and repetition criteria are dominated by lexical fea-
ing while also addressing the sensitive privacy issues that
tures; the harmful intent is split between lexical and histor-
accompany such instances.
ical communication features; and the visibility among peers
and target power criteria are dominated by our proposed so-
cial features. Conclusion
Although we achieve moderately competitive scores in In this study, we produced an original dataset for cyberbul-
most categories, our classifiers are still over-classifying cy- lying detection research and an approach that leverages this
berbullying cases. Precision scores are generally much lower dataset to more accurately detect cyberbullying. Our label-
than recall scores across all models. To reduce our misclas- ing scheme was designed to accommodate the cyberbullying
sification of false positives and better distinguish between definitions that have been proposed throughout the literature.
joking or friendly banter and cyberbullying, it may be nec- In order to more accurately represent the nature of cyberbul-
essary to mine for additional social features. Overall, we lying, we decomposed this complex issue into five represen-
should work to increase all F1 scores to above 0.8 before tative characteristics. Our classes distinguish cyberbullying
we can consider our classifiers ready for real-world applica- from other related behaviors, such as isolated aggression or
tions (Rosa et al. 2019). crude joking. To help annotators infer these distinctions, we
provided them with the full context of each message’s reply
Discussion thread, along with a list of the author’s most recent mentions.
In this way, we secured a new set of labels for more reliable
Limitations Our study focuses on the Twitter ecosystem cyberbullying representations.
and a small part of its network. The initial sampling of From these ground truth labels, we designed a new set
tweets was based on a machine learning classifier of ag- of features to quantify each of the five cyberbullying crite-
gressive English language. This classifier has an F1 score ria. Unlike previous text-based or user-based features, our
of 0.90 (Davidson et al. 2017). Even with this filter, only features measure the relationship between a message author
0.7% of tweets were deemed by a majority of MTurk work- and target. We show that these features improve the perfor-
ers as cyberbullying (Table 3). This extreme class imbalance mance of standard text-based models. These results demon-
can disadvantage a wide range of machine learning mod- strate the relevance of social-network and language-based
els. Moreover, the MTurk workers exhibited only moderate measurements to account for the nuanced social characteris-
inter-annotator agreement (Table 3). We also acknowledge tics of cyberbullying.
that notions of harmful intent and power imbalance can be Despite improvements over baseline methods, our classi-
subjective, since they may depend on the particular conven- fiers have not attained the high levels of precision and recall
tions or social structure of a given community. For these rea- that should be expected of real-world detection systems. For
sons, we recognize that cyberbullying still has not been un- this reason, we argue that the challenging task of cyberbul-
ambiguously defined. Moreover, their underlying constructs lying detection remains an open research problem.
are difficult to identify. In this study, we did not train work-
ers to recognize subtle cues for interpersonal popularity, nor Acknowledgements
the role of anonymity in creating a power imbalance.
This material is based upon work supported by the De-
Furthermore, because we lack the authority to define cy-
fense Advanced Research Projects Agency (DARPA) under
berbullying, we cannot assert a two-way implication be-
Agreement No. HR0011890019, and by the National Sci-
tween cyberbullying and the five criteria outlined here. It
ence Foundation (NSF) under Grant No. 1659886 and Grant
may be possible for cyberbullying to exist with only one
No. 1553579.
criterion present, such as harmful intent. Our five criteria
also might not span all of the dimensions of cyberbullying.
However, they are representative of the literature in both the References
social science and machine learning communities, and they [Al-garadi, Varathan, and Ravana 2016] Al-garadi, M. A.;
can be used in weighted combinations to accommodate new Varathan, K. D.; and Ravana, S. D. 2016. Cybercrime de-
definitions. tection in online communications: The experimental case of
cyberbullying detection in the twitter network. Computers ysis of social media text. In Eighth international AAAI con-
in Human Behavior 63:433–443. ference on weblogs and social media.
[Bird, Klein, and Loper 2009] Bird, S.; Klein, E.; and Loper, [Kowalski and Limber 2013] Kowalski, R. M., and Limber,
E. 2009. Natural language processing with Python: analyz- S. P. 2013. Psychological, physical, and academic corre-
ing text with the natural language toolkit. ” O’Reilly Media, lates of cyberbullying and traditional bullying. Journal of
Inc.”. Adolescent Health 53(1):S13–S20.
[Chatzakou et al. 2017] Chatzakou, D.; Kourtellis, N.; [Li 2006] Li, Q. 2006. Cyberbullying in schools: A re-
Blackburn, J.; De Cristofaro, E.; Stringhini, G.; and Vakali, search of gender differences. School psychology interna-
A. 2017. Mean birds: Detecting aggression and bullying on tional 27(2):157–170.
twitter. In Proceedings of the 2017 ACM on web science [Miller 2016] Miller, K. 2016. Cyberbullying and its con-
conference, 13–22. ACM. sequences: How cyberbullying is contorting the minds of
[Chawla et al. 2002] Chawla, N. V.; Bowyer, K. W.; Hall, victims and bullies alike, and the law’s limited available re-
L. O.; and Kegelmeyer, W. P. 2002. SMOTE: synthetic dress. S. Cal. Interdisc. LJ 26:379.
minority over-sampling technique. JAIR 16:321–357. [Nahar et al. 2013] Nahar, V.; Li, X.; Pang, C.; and Zhang, Y.
[Chelmis, Zois, and Yao 2017] Chelmis, C.; Zois, D.-S.; and 2013. Cyberbullying detection based on text-stream classi-
Yao, M. 2017. Mining patterns of cyberbullying on twitter. fication. In The 11th Australasian Data Mining Conference
In ICDMW, 126–133. IEEE. (AusDM 2013).
[Dadvar et al. 2013] Dadvar, M.; Trieschnigg, D.; Ordelman, [Nahar et al. 2014] Nahar, V.; Al-Maskari, S.; Li, X.; and
R.; and de Jong, F. 2013. Improving cyberbullying detection Pang, C. 2014. Semi-supervised learning for cyberbully-
with user context. In European Conference on Information ing detection in social networks. In Australasian Database
Retrieval, 693–696. Springer. Conference, 160–171. Springer.
[David-Ferdon and Hertz 2009] David-Ferdon, C., and [Olweus 1994] Olweus, D. 1994. Bullying at school. In
Hertz, M. F. 2009. Electronic media and youth violence; a Aggressive behavior. Springer. 97–130.
CDC issue brief for researchers. [Olweus 2012] Olweus, D. 2012. Cyberbullying: An over-
[Davidson et al. 2017] Davidson, T.; Warmsley, D.; Macy, rated phenomenon? European Journal of Developmental
M.; and Weber, I. 2017. Automated hate speech detection Psychology 9(5):520–538.
and the problem of offensive language. In Eleventh Interna- [Pennebaker, Booth, and Francis 2007] Pennebaker, J. W.;
tional AAAI Conference on Web and Social Media. Booth, R. J.; and Francis, M. E. 2007. Liwc2007: Linguistic
[Dinakar, Reichart, and Lieberman 2011] Dinakar, K.; Re- inquiry and word count. Austin, Texas: liwc. net.
ichart, R.; and Lieberman, H. 2011. Modeling the detection [Price, Dalgleish, and others 2010] Price, M.; Dalgleish, J.;
of textual cyberbullying. In fifth international AAAI confer- et al. 2010. Cyberbullying: Experiences, impacts and cop-
ence on weblogs and social media. ing strategies as described by australian young people. Youth
[Hinduja and Patchin 2008] Hinduja, S., and Patchin, J. W. Studies Australia 29(2):51.
2008. Cyberbullying: An exploratory analysis of factors [Raskauskas and Stoltz 2007] Raskauskas, J., and Stoltz,
related to offending and victimization. Deviant behavior A. D. 2007. Involvement in traditional and electronic
29(2):129–156. bullying among adolescents. Developmental psychology
[Hosseinmardi et al. 2015] Hosseinmardi, H.; Mattson, 43(3):564.
S. A.; Rafiq, R. I.; Han, R.; Lv, Q.; and Mishra, S. 2015. [Reynolds, Kontostathis, and Edwards 2011] Reynolds, K.;
Analyzing labeled cyberbullying incidents on the instagram Kontostathis, A.; and Edwards, L. 2011. Using machine
social network. In International conference on social learning to detect cyberbullying. In 2011 10th Interna-
informatics, 49–66. Springer. tional Conference on Machine learning and applications
[Hosseinmardi et al. 2016] Hosseinmardi, H.; Rafiq, R. I.; and workshops, volume 2, 241–244. IEEE.
Han, R.; Lv, Q.; and Mishra, S. 2016. Prediction of cyber- [Rosa et al. 2019] Rosa, H.; Pereira, N.; Ribeiro, R.; Fer-
bullying incidents in a media-based social network. In 2016 reira, P.; Carvalho, J.; Oliveira, S.; Coheur, L.; Paulino, P.;
IEEE/ACM International Conference on Advances in Social Simão, A. V.; and Trancoso, I. 2019. Automatic cyberbul-
Networks Analysis and Mining (ASONAM), 186–192. IEEE. lying detection: A systematic review. Computers in Human
[Huang and Chou 2010] Huang, Y.-y., and Chou, C. 2010. Behavior 93:333–345.
An analysis of multiple factors of cyberbullying among ju- [Salawu, He, and Lumsden 2017] Salawu, S.; He, Y.; and
nior high school students in taiwan. Computers in Human Lumsden, J. 2017. Approaches to automated detection of
Behavior 26(6):1581–1590. cyberbullying: A survey. IEEE Transactions on Affective
[Huang, Singh, and Atrey 2014] Huang, Q.; Singh, V. K.; Computing.
and Atrey, P. K. 2014. Cyber bullying detection using social [Sampasa-Kanyinga, Roumeliotis, and Xu 2014] Sampasa-
and textual analysis. In Proceedings of the 3rd International Kanyinga, H.; Roumeliotis, P.; and Xu, H. 2014. As-
Workshop on Socially-Aware Multimedia, 3–6. ACM. sociations between cyberbullying and school bullying
[Hutto and Gilbert 2014] Hutto, C. J., and Gilbert, E. 2014. victimization and suicidal ideation, plans and attempts
Vader: A parsimonious rule-based model for sentiment anal- among canadian schoolchildren. PloS one 9(7):e102145.
[Singh, Huang, and Atrey 2016] Singh, V. K.; Huang, Q.; Twitter Decahose stream across the entire month of Octo-
and Atrey, P. K. 2016. Cyberbullying detection using prob- ber 2016. Using the same methodology given in the paper,
abilistic socio-textual information fusion. In Proceedings of we had these tweets labeled three times each on Amazon
the 2016 IEEE/ACM International Conference on Advances Mechanical Turk. Again, ground truth was determined using
in Social Networks Analysis and Mining, 884–887. IEEE 2 out of 3 majority vote. Upon analysis, we found that the
Press. positive class balance was prohibitively small, especially for
[Slonje and Smith 2008] Slonje, R., and Smith, P. K. 2008. repetition, harmful intent, visibility among peers, and author
Cyberbullying: Another main type of bullying? Scandina- power, which were all under 5%.
vian journal of psychology 49(2):147–154.
Table 8: Analysis of Unfiltered Decahose Data
[Slonje, Smith, and Frisén 2013] Slonje, R.; Smith, P. K.;
and Frisén, A. 2013. The nature of cyberbullying, and Criterion Positive Inter-annotator Cyberbullying
strategies for prevention. Computers in human behavior Balance Agreement Correlation
29(1):26–32. aggression 6.3% 0.23 0.68
repetition 0.9% 0.04 0.46
[Soni and Singh 2018] Soni, D., and Singh, V. 2018. Time harmful intent 1.4% 0.31 0.75
reveals all wounds: Modeling temporal characteristics of cy- visibility among peers 0.17% 0.51 0.11
target power 22.5% 0.23 0.11
berbullying. In Twelfth International AAAI Conference on author power 3.6% 0.04 0.06
Web and Social Media. equal power 64.7% 0.15 -0.14
cyberbullying 2.7% 0.25 -
[Sugandhi et al. 2016] Sugandhi, R.; Pande, A.; Agrawal, A.;
and Bhagat, H. 2016. Automatic monitoring and prevention
of cyberbullying. International Journal of Computer Appli-
cations 8:17–19. Appendix 2: Model Evaluation
[Van Hee et al. 2018] Van Hee, C.; Jacobs, G.; Emmery, C.; For the sake of comparison, we provide precision, recall,
Desmet, B.; Lefever, E.; Verhoeven, B.; De Pauw, G.; Daele- and F1 scores for five different machine learning models: k-
mans, W.; and Hoste, V. 2018. Automatic detection of cy- nearest neighbors (KNN), random forest, support vector ma-
berbullying in social media text. PloS one 13(10):e0203794. chine (SVM), AdaBoost, and Multilayer Perceptron (MLP).
Then we provide feature weights for our logistic regression
[Waasdorp and Bradshaw 2015] Waasdorp, T. E., and Brad- model trained on each of the five cyberbullying criteria.
shaw, C. P. 2015. The overlap between cyberbullying
and traditional bullying. Journal of Adolescent Health Table 9: Random Forest Precision
56(5):483–488.
[Xu et al. 2012] Xu, J.-M.; Jun, K.-S.; Zhu, X.; and Bell- Criterion BoW Text User Proposed Combined
more, A. 2012. Learning from bullying traces in social aggression 77.6% 80.1% 78.3% 78.7% 79.7%
repetition 6.5% 6.8% 7.7% 16.1% 10.8%
media. In Proceedings of the 2012 conference of the North harmful intent 18.4% 28.1% 33.2% 33.4% 43.1%
American chapter of the association for computational lin- visibility among peers 28.7% 32.7% 34.8% 42.8% 35.1%
guistics: Human language technologies, 656–666. Associa- target power 39.3% 43.3% 77.9% 74.5% 69.6%
tion for Computational Linguistics.
[Yao et al. 2019] Yao, M.; Chelmis, C.; Zois, D.; et al. 2019.
Cyberbullying ends here: Towards robust detection of cy- Table 10: AdaBoost Precision
berbullying in social media. In The World Wide Web Con- Criterion BoW Text User Proposed Combined
ference, 3427–3433. ACM. aggression 82.6% 81.6% 77.0% 77.5% 81.6%
[Zhang et al. 2016] Zhang, X.; Tong, J.; Vishwamitra, N.; repetition 7.8% 9.0% 7.3% 16.6% 25.8%
harmful intent 29.1% 46.4% 34.3% 39.9% 60.0%
Whittaker, E.; Mazer, J. P.; Kowalski, R.; Hu, H.; Luo, F.; visibility among peers 30.5% 32.9% 35.9% 45.8% 46.1%
Macbeth, J.; and Dillon, E. 2016. Cyberbullying detection target power 42.5% 46.5% 78.0% 78.2% 77.9%
with a pronunciation based convolutional neural network.
In 2016 15th IEEE International Conference on Machine
Learning and Applications (ICMLA), 740–745. IEEE. Table 11: MLP Precision
[Zhao, Zhou, and Mao 2016] Zhao, R.; Zhou, A.; and Mao,
Criterion BoW Text User Proposed Combined
K. 2016. Automatic detection of cyberbullying on social aggression 82.8% 78.8% 76.7% 77.4% 78.3%
networks based on bullying features. In Proceedings of the repetition 7.7% 8.7% 8.6% 16.9% 19.6%
17th international conference on distributed computing and harmful intent 27.4% 42.8% 37.3% 38.4% 46.8%
networking, 43. ACM. visibility among peers 30.1% 34.0% 34.3% 41.6% 38.5%
target power 39.6% 45.2% 74.3% 72.0% 68.6%

Appendix 1: Analysis of the Real-World Class


Distribution for Cyberbullying Criteria
To understand the real-world class distribution for the cy-
berbullying criteria, we randomly selected 222 directed En-
glish tweets from an unbiased sample of drawn from the
Table 18: Top Absolute Weights for Aggressive Language
Table 12: Random Forest Recall
Rank Feature Weight
Criterion BoW Text User Proposed Combined 1 affect (LIWC) -1.34
aggression 56.4% 78.5% 43.7% 45.3% 76.2% 2 sexual (LIWC) 1.07
repetition 36.2% 24.9% 46.3% 64.7% 29.9% 3 negemo (LIWC) 0.90
harmful intent 42.4% 35.1% 78.4% 78.2% 53.5% 4 maximum author retweets 0.86
visibility among peers 48.1% 30.6% 50.5% 49.9% 32.5% 5 relativ (LIWC) -0.75
target power 60.1% 38.0% 79.0% 81.9% 76.7% 6 bio (LIWC) -0.69
7 posemo (LIWC) 0.66
8 num chars -0.64
9 space (LIWC) 0.52
10 upward overlap 0.51
Table 13: AdaBoost Recall
Table 19: Top Absolute Weights for Repetition Features
Criterion BoW Text User Proposed Combined
aggression 75.0% 86.4% 65.9% 77.4% 86.3% Rank Feature Weight
repetition 23.8% 4.1% 26.8% 31.2% 17.8% 1 negemo (LIWC) 1.40
harmful intent 44.4% 37.8% 57.0% 52.8% 50.8% 2 author verified status -1.32
visibility among peers 41.0% 15.4% 42.8% 43.1% 32.0% 3 affect (LIWC) -1.24
target power 56.0% 39.4% 81.8% 81.0% 75.6% 4 cogmech (LIWC) -0.96
5 relativ (LIWC) -0.89
6 posemo (LIWC) 0.80
7 social (LIWC) 0.77
8 aggressive user count 0.63
Table 14: MLP Recall 9 upward overlap 0.62
10 number of unique terms 0.61
Criterion BoW Text User Proposed Combined
aggression 64.1% 86.5% 65.5% 68.0% 85.6% Table 20: Top Absolute Weights for Harmful Intent
repetition 26.8% 6.8% 22.5% 27.1% 12.6%
harmful intent 51.0% 33.3% 57.0% 57.0% 37.2%
visibility among peers 51.6% 23.5% 45.6% 50.2% 26.5% Rank Feature Weight
target power 61.6% 37.5% 76.5% 76.2% 65.6% 1 number of words -1.70
2 number of unique terms 1.41
3 bio (LIWC) -1.05
4 funct (LIWC) 0.95
5 author follower count -0.90
6 present (LIWC) 0.83
Table 15: Random Forest F1 7 you (LIWC) 0.83
8 message count 0.79
Criterion BoW Text User Proposed Combined 9 upward mention count -0.71
aggression 65.2% 79.3% 56.0% 57.5% 77.9% 10 verb (LIWC) -0.67
repetition 11.0% 10.6% 13.2% 25.8% 15.8%
harmful intent 25.6% 31.1% 46.6% 46.8% 47.7%
visibility among peers 35.7% 30.8% 41.2% 46.1% 33.6% Table 21: Top Absolute Weights for Visibility Among Peers
target power 47.4% 39.9% 78.4% 78.0% 72.8%
Rank Feature Weight
1 author follower count 6.29
2 maximum author retweets -1.63
3 maximum author favorites 1.46
4 aggressive user count -1.36
Table 16: AdaBoost F1 5 number of words -1.16
6 reply user count 1.03
Criterion BoW Text User Proposed Combined 7 number of unique terms 1.02
aggression 78.6% 83.9% 71.0% 77.5% 83.9% 8 reply message count -0.91
repetition 11.7% 5.6% 11.5% 21.6% 20.9% 9 message count 0.77
harmful intent 35.1% 41.6% 42.8% 45.4% 55.0% 10 affect (LIWC) -0.67
visibility among peers 34.9% 21.0% 39.1% 44.3% 37.8%
target power 48.3% 42.7% 79.8% 79.6% 76.7%
Table 22: Top Absolute Weights for Target Power

Rank Feature Weight


1 target follower count 2.28
2 author follower count -1.67
Table 17: MLP F1 3 bidirectional overlap -1.22
4 target verified status 1.20
Criterion BoW Text User Proposed Combined 5 upward overlap -1.11
aggression 72.2% 82.5% 70.7% 72.4% 81.8% 6 downward overlap 1.04
repetition 12.0% 7.6% 12.4% 20.7% 15.2% 7 relativ (LIWC) 0.76
harmful intent 35.7% 37.3% 45.0% 45.8% 41.3%
visibility among peers 38.0% 27.7% 39.2% 45.5% 31.4%
8 reply user count -0.69
target power 48.2% 41.0% 75.4% 74.0% 67.0% 9 space (LIWC) -0.68
10 message count -0.63

You might also like