Aggressive, Repetitive, Intentional, Visible, and Imbalanced: Refining Representations For Cyberbullying Classification
imbalance, or the repetitive nature of aggression with sufficiently high precision and recall. However, our proposed feature set improves F1 scores on all four of these social measures. Real-world detection systems can benefit from our proposed approach, incorporating the social aspects of cyberbullying into existing models and training these models on socially-informed ground truth labels.

Background

Existing approaches to cyberbullying detection generally follow a common workflow. Data is collected from social networks or other online sources, and ground truth is established through manual human annotation. Machine learning algorithms are trained on the labeled data using the message text or hand-selected features. Then results are typically reported using precision, recall, and F1 scores. Comparison across studies is difficult, however, because the definition of cyberbullying has not been standardized. Therefore, an important first step for the field is to establish an objective definition of cyberbullying.

Defining Cyberbullying

Some researchers view cyberbullying as an extension of more "traditional" bullying behaviors (Hinduja and Patchin 2008; Olweus 2012; Raskauskas and Stoltz 2007). In one widely-cited book, the psychologist Dan Olweus defines schoolyard bullying in terms of three criteria: repetition, harmful intent, and an imbalance of power (Olweus 1994). He then identifies bullies by their intention to "inflict injury or discomfort" upon a weaker victim through repeated acts of aggression.

Social scientists have extensively studied this form of bullying as it occurs among adolescents in school (Kowalski and Limber 2013; Li 2006). However, experts disagree whether cyberbullying should be studied as a form of traditional bullying or a fundamentally different phenomenon (Kowalski and Limber 2013; Olweus 2012). Some argue that, although cyberbullying might involve repeated acts of aggression, this condition might not necessarily hold in all cases, since a single message can be otherwise forwarded and publicly viewed without repeated actions from the author (Slonje, Smith, and Frisén 2013; Waasdorp and Bradshaw 2015). Similarly, the role of power imbalance is uncertain in online scenarios. Power imbalances of physical strength or numbers may be less relevant, whereas bully anonymity and the permanence of online messages may be sufficient to render the victim defenseless (Slonje and Smith 2008).

The machine learning community has not reached a unanimous definition of cyberbullying either. They have instead echoed the uncertainty of the social scientists. Moreover, some authors have neglected to publish any objective cyberbullying criteria or even a working definition for their annotators, and among those who do, the formulation varies. This disagreement has slowed progress in the field, since classifiers and datasets cannot be as easily compared. Upon review, however, we found that all available definitions contained a strict subset of the following criteria: aggression (AGGR), repetition (REP), harmful intent (HARM), visibility among peers (PEER), and power imbalance (POWER). The datasets built from these definitions are outlined in Table 1.

Existing Sources of Cyberbullying Data

According to Van Hee et al. (2018), data collection is the most restrictive "bottleneck" in cyberbullying research. Because there are very few publicly available datasets, some researchers have turned to crowdsourcing using Amazon Mechanical Turk or similar platforms.

In most studies to date, annotators labeled individual messages instead of message threads, ignoring social context altogether (Al-garadi, Varathan, and Ravana 2016; Huang, Singh, and Atrey 2014; Nahar et al. 2014; Reynolds, Kontostathis, and Edwards 2011; Singh, Huang, and Atrey 2016; Sugandhi et al. 2016). Only three of the papers that we reviewed incorporated social context in the annotation process. Chatzakou et al. (2017) considered batches of time-sorted tweets called sessions, which were grouped by user accounts, but they did not include message threads or any other form of context. Van Hee et al. (2018) presented "original conversation[s] when possible," but they did not explain when this information was available. Hosseinmardi et al. (2016) was the only study to label full message reply threads as they appeared in the original online source.

Modeling Cyberbullying Behavior

A large body of work has been published on cyberbullying detection and prediction, primarily through the use of natural language processing techniques. Most common approaches have relied on lexical features such as n-grams (Hosseinmardi et al. 2016; Van Hee et al. 2018; Xu et al. 2012), TF-IDF vectors (Dinakar, Reichart, and Lieberman 2011; Nahar et al. 2013; Sugandhi et al. 2016),
word embeddings (Zhao, Zhou, and Mao 2016), or phonetic representations of messages (Zhang et al. 2016), as well as dictionary-based counts on curse words, hateful or derogatory terms, pronouns, emoticons, and punctuation (Al-garadi, Varathan, and Ravana 2016; Dadvar et al. 2013; Reynolds, Kontostathis, and Edwards 2011; Singh, Huang, and Atrey 2016). Some studies have also used message sentiment (Singh, Huang, and Atrey 2016; Sugandhi et al. 2016; Van Hee et al. 2018) or the age, gender, personality, and psychological state of the message author according to text from their timelines (Al-garadi, Varathan, and Ravana 2016; Dadvar et al. 2013). These methods have been reported with appreciable success as shown in Table 2.

Table 2: State of the Art in Cyberbullying Detection. Here, results are reported on either the Cyberbullying (CB) class exclusively or on the entire (total) dataset.

Work | Model | Precision | Recall | F1 | Class
Zhang et al. (2016) | CNN | 99.1% | 97.0% | 98.0% | total
Al-garadi, Varathan, and Ravana (2016) | Random Forest | 94.1% | 93.9% | 93.6% | total
Nahar et al. (2014) | SVM | 87.0% | 97.0% | 92.0% | CB
Sugandhi et al. (2016) | SVM | 91.0% | 91.0% | 91.0% | total
Soni and Singh (2018) | Naïve Bayes | 80.2% | 80.2% | 80.2% | total
Zhao, Zhou, and Mao (2016) | SVM | 76.8% | 79.4% | 78.0% | total
Xu et al. (2012) | SVM | 76.0% | 79.0% | 77.0% | total
Hosseinmardi et al. (2016) | Logistic Regression | 78.0% | 72.0% | 75.0% | CB
Yao et al. (2019) | CONcISE | 69.5% | 79.4% | 74.1% | CB
Van Hee et al. (2018) | SVM | 73.3% | 57.2% | 64.3% | total
Singh, Huang, and Atrey (2016) | Proposed | 82.0% | 53.0% | 64.0% | CB
Rosa et al. (2019) | SVM | 46.0% | - | 45.0% | CB
Dadvar et al. (2013) | SVM | 31.0% | 15.0% | 20.0% | CB
Huang, Singh, and Atrey (2014) | Dagging | 76.3% | - | - | CB

Some researchers argue, however, that lexical features alone may not adequately represent the nuances of cyberbullying. Hosseinmardi et al. (2015) found that among Instagram media sessions containing profane or vulgar content, only 30% were acts of cyberbullying. They also found that while cyberbullying posts contained a moderate proportion of negative terms, the most negative posts were not considered cases of cyberbullying by the annotators. Instead, these negative posts referred to politics, sports, and other domestic matters between friends (Hosseinmardi et al. 2015).

The problem of cyberbullying cuts deeper than merely the exchange of aggressive language. The meaning and intent of an aggressive post is revealed through conversation and interaction between peers. Therefore, to properly distinguish cyberbullying from other uses of aggressive or profane language, future studies should incorporate key indicators from the social context of each message. Specifically, researchers can measure the author's status or social advantage, the author's harmful intent, the presence of repeated aggression in the thread, and the visibility of the thread among peers (Hosseinmardi et al. 2015; Rosa et al. 2019; Salawu, He, and Lumsden 2017).

Since cyberbullying is an inherently social phenomenon, some studies have naturally considered social network measures for classification tasks. Several features have been derived from the network representations of the message interactions. The degree and eigenvector centralities of nodes, the k-core scores, and clustering of communities, as well as the tie strength and betweenness centralities of mention edges have all been shown to improve text-based models (Huang, Singh, and Atrey 2014; Singh, Huang, and Atrey 2016). Additionally, bullies and victims can be more accurately identified by their relative network positions. For example, the Jaccard coefficient between neighborhood sets in bully and victim networks has been found to be statistically significant (Chelmis, Zois, and Yao 2017). The ratio of all messages sent and received by each user was also significant.

These findings show promising directions for future work. Social network features may provide the information necessary to reliably classify cyberbullying. However, it may be prohibitively expensive to build out social networks for each user due to time constraints and the limitations of API calls (Yao et al. 2019). For this reason, alternative measurements of online social relationships should be considered.

In the present study, we leverage prior work by incorporating linguistic signals into our classifiers. We extend prior work by developing a dataset that better reflects the definitions of cyberbullying presented by social scientists, and by proposing and evaluating a feature set that represents information pertaining to the social processes that underlie cyberbullying behavior.

Curating a Comprehensive Cyberbullying Dataset

Here, we provide an original annotation framework and a new dataset for cyberbullying research, built to unify existing methods of ground truth annotation. In this dataset, we decompose the complex issue of cyberbullying into five key criteria, which were drawn from the social science and machine learning communities. These criteria can be combined and adapted for revised definitions of cyberbullying.

Data Collection

We collected a sample of 1.3 million unlabeled tweets from the Twitter Filter API. Since cyberbullying is a social phenomenon, we chose to filter for tweets containing at least one "@" mention. To restrict our investigation to original English content, we removed all non-English posts and retweets (RTs), narrowing the size of our sample to 280,301 tweets.

Since aggressive language is a key component of cyberbullying (Hosseinmardi et al. 2015), we ran the pre-trained classifier of Davidson et al. (2017) over our dataset to identify hate speech and aggressive language and increase the prevalence of cyberbullying examples.² This gave us a filtered set of 9,803 aggressive tweets.

² Without this step, our positive class balance would be prohibitively small. See Appendix 1 for details.

We scraped both the user and timeline data for each author in the aggressive set, as well as any users who were mentioned in one of the aggressive tweets. In total, we collected data from 21,329 accounts. For each account, we saved the
full user object, including profile name, description, location, verified status, and creation date. We also saved a complete list of the user's friends and followers, and a 6-month timeline of all their posts and mentions from January 1st through June 10th, 2019. For author accounts, we extended our crawl to include up to four years of timeline content. Lastly, we collected metadata for all tweets belonging to the corresponding message thread for each aggressive message.

Table 3: Analysis of Labeled Twitter Data

Criterion | Positive Balance | Inter-annotator Agreement | Cyberbullying Correlation
aggression | 74.8% | 0.23 | 0.22
repetition | 6.6% | 0.18 | 0.27
harmful intent | 16.1% | 0.42 | 0.68
visibility among peers | 30.1% | 0.51 | 0.07
target power | 34.3% | 0.37 | 0.11
author power | 3.1% | 0.10 | -0.02
equal power | 59.7% | 0.22 | -0.09
cyberbullying | 0.7% | 0.18 | -

Annotation Task

We presented each tweet in the dataset to three separate annotators as a Human Intelligence Task (HIT) on Amazon's Mechanical Turk (MTurk) platform. By the time of recruitment, 6,897 of the 9,803 aggressive tweets were accessible from the Twitter web page. The remainder of the tweets had been removed, or the Twitter account had been locked or suspended.

We asked our annotators to consider the full message thread for each tweet as displayed on Twitter's web interface. We also gave them a list of up to 15 recent mentions by the author of the tweet, directed towards any of the other accounts mentioned in the original thread. Then we asked annotators to interpret each tweet in light of this social context, and had them provide us with labels for five key cyberbullying criteria. We defined these criteria in terms of the author account ("who posted the given tweet?") and the target ("who was the tweet about?" - not necessarily the first mention). We also stated that "if the target is not on Twitter or their handle cannot be identified" the annotator should "please write OTHER." With this framework established, we gave the definitions for our five cyberbullying criteria as follows.

1. Aggressive language (AGGR): Regardless of the author's intent, the language of the tweet could be seen as aggressive. The user either addresses a group or individual, and the message contains at least one phrase that could be described as confrontational, derogatory, insulting, threatening, hostile, violent, hateful, or sexually abusive.

2. Repetition (REP): The target user has received at least two aggressive messages in total (either from the author or from another user in the visible thread).

3. Harmful intent (HARM): The tweet was designed to tear down or disadvantage the target user by causing them distress or by harming their public image. The target does not respond agreeably as to a joke or an otherwise lighthearted comment.

4. Visibility among peers (PEER): At least one other user besides the target has liked, retweeted, or responded to at least one of the author's messages.

5. Power imbalance (POWER): Power is derived from authority and perceived social advantage. Celebrities and public figures are more powerful than common users. Minorities and disadvantaged groups have less power. Bullies can also derive power from peer support.

Each of these criteria was represented as a binary label, except for power imbalance, which was ternary. We asked "Is there strong evidence that the author is more powerful than the target? Is the target more powerful? Or if there is not any good evidence, just mark equal." We recognized that an imbalance of power might arise in a number of different circumstances. Therefore, we did not restrict our definition to just one form of power, such as follower count or popularity.

For instructional purposes, we provided five sample threads to demonstrate both positive and negative examples for each of the five criteria. Two of these threads are shown here. The thread in Figure 1a displays bullying behavior that is targeted against the green user, with all five cyberbullying criteria displayed. The thread includes repeated use of aggressive language such as "she really fucking tried" and "she knows she lost." The bully's harmful intent is evident in the victim's defensive responses. And lastly, the thread is visible among four peers as three gang up against one, creating a power imbalance.

The final tweet in Figure 1b shows the importance of context in the annotation process. If we read only this individual message, we might decide that the post is cyberbullying, but given the social context here, we can confidently assert that this post is not cyberbullying. Although it contains the aggressive phrase "FUCK YOU TOO BITCH", the author does not intend harm. The message is part of a joking exchange between two friends or equals, and no other peers have joined in the conversation or interacted with the thread.

After asking workers to review these examples, we gave them a short 7-question quiz to test their knowledge. Workers were given only one quiz attempt, and they were expected to score at least 6 out of 7 questions correctly before they could proceed to the paid HIT. Workers were then paid $0.12 for each thread that they annotated.

We successfully recruited 170 workers to label all 6,897 available threads in our dataset. They labeled an average of 121.7 threads and a median of 7 threads each. They spent an average time of 3 minutes 50 seconds, and a median time of 61 seconds per thread. For each thread, we collected annotations from three different workers, and from this data we computed our reliability metrics using Fleiss's Kappa for inter-annotator agreement as shown in Table 3.

We determined ground truth for our data using a 2 out of 3 majority vote as in Hosseinmardi et al. (2015). If the message thread was missing or a target user could not be identified, we removed the entry from the dataset, since later we would need to draw our features from both the thread and the target profile. After filtering in this way, we were left with 5,537 labeled tweets.
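The 2-of-3 majority vote and the Fleiss's Kappa reliability metric described above can be reproduced in a few lines. The sketch below is illustrative only: the `annotations` tuples are toy values rather than our data, and `fleiss_kappa` is a from-scratch helper (libraries such as statsmodels provide an equivalent).

```python
def majority_vote(labels):
    """Ground-truth label from a 2-of-3 majority vote over binary annotations."""
    return int(sum(labels) >= 2)

def fleiss_kappa(rows):
    """Fleiss's kappa from per-item category counts.

    Each row lists, for one thread, how many annotators chose each
    category; counts sum to the number of annotators (3 in our setting).
    """
    n_items = len(rows)
    n = sum(rows[0])                      # annotators per item
    total = n_items * n
    n_cats = len(rows[0])
    # Overall proportion of assignments falling in each category
    p_cat = [sum(row[c] for row in rows) / total for c in range(n_cats)]
    # Observed agreement per item, then averaged
    p_item = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in rows]
    p_bar = sum(p_item) / n_items
    # Expected (chance) agreement
    p_exp = sum(p * p for p in p_cat)
    return (p_bar - p_exp) / (1 - p_exp)

# Toy annotations: three workers label each thread for one binary criterion.
annotations = [(1, 1, 0), (0, 0, 0), (1, 1, 1), (1, 0, 0)]
labels = [majority_vote(a) for a in annotations]
rows = [[a.count(0), a.count(1)] for a in annotations]
kappa = fleiss_kappa(rows)
```

The same vote is applied per criterion, so a thread can be, for example, aggressive by unanimous agreement yet harmful only by a 2-1 split.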
Figure 1: Cyberbullying or not. The leftmost thread demonstrates all five cyberbullying criteria. Although the thread in the middle contains repeated use of aggressive language, there is no harmful intent, visibility among peers, or power imbalance. (Right) Graphical representation of the neighborhood overlap measures of author a and target t.
Cyberbullying Transcends Cyberaggression

As discussed earlier, some experts have argued that cyberbullying is different from online aggression (Hosseinmardi et al. 2015; Rosa et al. 2019; Salawu, He, and Lumsden 2017). We asked our annotators to weigh in on this issue by asking them the subjective question for each thread: "Based on your own intuition, is this tweet an example of cyberbullying?" We did not use the cyberbullying label as ground truth for training models; we used this label to better understand worker perceptions of cyberbullying. We found that our workers believed cyberbullying will depend on a weighted combination of the five criteria presented in this paper, with the strongest correlate being harmful intent as shown in Table 3.

Furthermore, the annotators decided our dataset contained 74.8% aggressive messages as shown in the Positive Balance column of Table 3. We found that a large majority of these aggressive tweets were not labeled as "cyberbullying." Rather, only 10.5% were labeled by majority vote as cyberbullying, and only 21.5% were considered harmful. From this data, we propose that cyberbullying and cyberaggression are not equivalent classes. Instead, cyberbullying transcends cyberaggression.

Feature Engineering

We have established that cyberbullying is a complex social phenomenon, different from the simpler notion of cyberaggression. Standard Bag of Words (BoW) features based on single sentences, such as n-grams and word embeddings, may thus lead machine learning algorithms to incorrectly classify friendly or joking behavior as cyberbullying (Hosseinmardi et al. 2015; Rosa et al. 2019; Salawu, He, and Lumsden 2017). To more reliably capture the nuances of repetition, harmful intent, visibility among peers, and power imbalance, we designed a new set of features from the social and linguistic traces of Twitter users. These measures allow our classifiers to encode the dynamic relationship between the message author and target, using network and timeline similarities, expectations from language models, and other signals taken from the message thread.

For each feature and each cyberbullying criterion, we compare the cumulative distributions of the positive and negative class using the two-sample Kolmogorov-Smirnov test. We report the Kolmogorov-Smirnov statistic D (a normalized distance between the CDF of the positive and negative class) as well as the p-value with α = 0.05 as our level for statistical significance.

Text-based Features

To construct realistic and competitive baseline models, we consider a set of standard text-based features that have been used widely throughout the literature. Specifically, we use the NLTK library (Bird, Klein, and Loper 2009) to construct unigrams, bigrams, and trigrams for each labeled message. This parallels the work of Hosseinmardi et al. (2016), Van Hee et al. (2018), and Xu et al. (2012). Following Zhang et al. (2016), we incorporate counts from the Linguistic Inquiry and Word Count (LIWC) dictionary to measure the linguistic and psychological processes that are represented in the text (Pennebaker, Booth, and Francis 2007). We also use a modified version of the Flesch-Kincaid Grade Level and Flesch Reading Ease scores as computed in Davidson et al. (2017). Lastly, we encode the sentiment scores for each message using the Valence Aware Dictionary and sEntiment Reasoner (VADER) of Hutto and Gilbert (2014).

Social Network Features

Network features have been shown to improve text-based models (Huang and Chou 2010; Singh, Huang, and Atrey 2016), and they can help classifiers distinguish between bullies and victims (Chelmis, Zois, and Yao 2017). These features may also capture some of the more social aspects of cyberbullying, such as power imbalance and visibility among peers. However, many centrality measures and clustering
algorithms require detailed network representations. These features may not be scalable for real-world applications. We propose a set of low-complexity measurements that can be used to encode important higher-order relations at scale. Specifically, we measure the relative positions of the author and target accounts in the directed following network by computing modified versions of Jaccard's similarity index as we now explain.

Neighborhood Overlap. Let N+(u) be the set of all accounts followed by user u and let N−(u) be the set of all accounts that follow user u. Then N(u) = N+(u) ∪ N−(u) is the neighborhood set of u. We consider five related measurements of neighborhood overlap for a given author a and target t, listed here:

down(a, t) = |N+(a) ∩ N−(t)| / |N+(a) ∪ N−(t)|
up(a, t) = |N−(a) ∩ N+(t)| / |N−(a) ∪ N+(t)|
in(a, t) = |N−(a) ∩ N−(t)| / |N−(a) ∪ N−(t)|
out(a, t) = |N+(a) ∩ N+(t)| / |N+(a) ∪ N+(t)|
bi(a, t) = |N(a) ∩ N(t)| / |N(a) ∪ N(t)|

Downward overlap measures the number of two-hop paths from the author to the target along following relationships; upward overlap measures two-hop paths in the opposite direction. Inward overlap measures the similarity between the two users' follower sets, and outward overlap measures the similarity between their sets of friends. Bidirectional overlap then is a more generalized measure of social network similarity. We provide a graphical depiction for each of these features on the right side of Figure 1.

High downward overlap likely indicates that the target is socially relevant to the author, as high upward overlap indicates the author is relevant to the target. Therefore, when the author is more powerful, downward overlap is expected to be lower and upward overlap is expected to be higher. This trend is slight but visible in the cumulative distribution functions of Figure 2 (a): downward overlap is indeed lower when the author is more powerful than when the users are equals (D = 0.143). However, there is not a significant difference for upward overlap (p = 0.85). We also observe that, when the target is more powerful, downward and upward overlap are both significantly lower (D = 0.516 and D = 0.540 respectively). It is reasonable to assume that messages can be sent to celebrities and other powerful figures without the need for common social connections.

Next, we consider inward and outward overlap. When the inward overlap is high, the author and target could have more common visibility. Similarly, if the outward overlap is high, then the author and target both follow similar accounts, so they might have similar interests or belong to the same social circles. Both inward and outward overlaps are expected to be higher when a post is visible among peers. This is true of both distributions in Figure 2. The difference in outward overlap is significant (D = 0.04, p = 0.03), and the difference for inward overlap is short of significant (D = 0.04, p = 0.08).

Figure 2: Cumulative Distribution Functions for neighborhood overlap on relevant features: (a) Downward Overlap, (b) Upward Overlap, (c) Inward Overlap, (d) Outward Overlap. These measures are shown to be predictive of power imbalance and visibility among peers.

User-based Features. We also use basic user account metrics drawn from the author and target profiles. Specifically, we count the friends and followers of each user, their verified status, and the number of tweets posted within six-month snapshots of their timelines, as in Al-garadi, Varathan, and Ravana (2016), Chatzakou et al. (2017), and Hosseinmardi et al. (2016).

Timeline Features

Here, we consider linguistic features, drawn from both the author and target timelines. These are intended to capture the social relationship between each user, their common interests, and the surprise of a given message relative to the author's timeline history.

Message Behavior. To more clearly represent the social relationship between the author and target users, we consider the messages sent between them as follows:

- Downward mention count: How many messages has the author sent to the target?
- Upward mention count: How many messages has the target sent to the author?
- Mention overlap: Let Ma be the set of all accounts mentioned by author a, and let Mt be the set of all accounts mentioned by target t. We compute the ratio |Ma ∩ Mt| / |Ma ∪ Mt|.
- Multiset mention overlap: Let M̂a be the multiset of all accounts mentioned by author a (with repeats for each mention), and let M̂t be the multiset of all accounts mentioned by target t. We measure |M̂a ∩* M̂t| / |M̂a ∪ M̂t|, where ∩* takes the multiplicity of each element to be the sum of the multiplicity from M̂a and the multiplicity from M̂t.

The direct mention counts measure the history of repeated communication between the author and the target. For harmful messages, downward overlap is higher (D = 0.178) and upward overlap is lower (D = 0.374) than for harmless messages, as shown in Figure 3. This means malicious authors tend to address the target repeatedly while the target responds with relatively few messages.

Mention overlap is a measure of social similarity that is based on shared conversations between the author and the target. Multiset mention overlap measures the frequency of communication within this shared space. These features may help predict visibility among peers, or repeated aggression due to pile-on bullying situations. We see in Figure 3 that repeated aggression is linked to slightly greater mention overlap (D = 0.07, p = 0.07), but the trend is significant only for multiset mention overlap (D = 0.08, p = 0.03).

[Figure 3 panels: (a) Downward Mentions, (b) Upward Mentions. Figure 4 panels: (a) Timeline Similarity, (b) Timeline Similarity.]

Timeline Similarity. Timeline similarity is used to indicate common interests and shared topics of conversation between the author and target timelines. High similarity scores might reflect users' familiarity with one another, or suggest that they occupy similar social positions. This can be used to distinguish cyberbullying from harmless banter between friends and associates. To compute this metric, we represent the author and target timelines as TF-IDF vectors A and T. We then take the cosine similarity between the vectors as

cos θ = (A · T) / (‖A‖ ‖T‖)

A cosine similarity of 1 means that users' timelines had identical counts across all weighted terms; a cosine similarity of 0 means that their timelines did not contain any words in common. We expect higher similarity scores between friends and associates.

In Figure 4 (a), we see that the timelines were significantly less similar when the target was in a position of greater power (D = 0.294). This is not surprising, since power can be derived from such differences between social groups. We do not observe the same dissimilarity when the author was more powerful (p = 0.58). What we do observe is likely caused by noise from extreme class imbalance and low inter-annotator agreement on labels for author power.

Turning to Figure 4 (b), we see that aggressive messages were less likely to harbor harmful intent if they were sent between users with similar timelines (D = 0.285). Aggressive banter between friends is generally harmless, so again, this confirms our intuitions.

Language Models. Harmful intent is difficult to measure in isolated messages because social context determines pragmatic meaning. We attempt to approximate the author's harmful intent by measuring the linguistic "surprise" of a given message relative to the author's timeline history. We do this in two ways: through a simple ratio of new words, and through the use of language models.

To estimate historical language behavior, we count unigram and bigram frequencies from a 4-year snapshot of the author's timeline. Then, after removing all URLs, punctuation, stop words, mentions, and hashtags from the original post, we take the cardinality of the set of unigrams in the post having zero occurrences in the timeline. Lastly, we divide this count by the length of the processed message to arrive at our new words ratio. We can also build a language model from the bigram frequencies, using Kneser-Ney smoothing as implemented in NLTK (Bird, Klein, and Loper 2009). From the language model, we compute the surprise of the original message m according to its cross-entropy, given by

H(m) = −(1/N) Σ_{i=1}^{N} log P(b_i)

Figure 5: Cumulative Distribution Functions for language models on relevant features. These measures are shown to be predictive of harmful intent.
Table 4: Feature Combinations

Feature | BoW | Text | User | Proposed | Combined
n-grams | ✓ | ✓ | - | - | ✓
LIWC, VADER, Flesch-Kincaid | - | ✓ | - | - | ✓
Friend/following counts, tweet count, verified | - | - | ✓ | ✓ | ✓
Neighborhood overlap measures | - | - | ✓ | ✓ | ✓
Mention counts and overlaps | - | - | ✓ | ✓ | ✓
Timeline similarity | - | - | ✓ | ✓ | ✓
New words ratio, cross-entropy | - | - | ✓ | ✓ | ✓
Thread visibility features | - | - | - | ✓ | ✓
Thread aggression features | - | - | - | ✓ | ✓

Table 5: Precision

Criterion | BoW | Text | User | Proposed | Combined
aggression | 82.5% | 82.3% | 77.1% | 78.7% | 82.6%
repetition | 7.8% | 13.4% | 7.7% | 15.3% | 31.7%
harmful intent | 29.6% | 49.4% | 35.8% | 34.5% | 55.3%
visibility among peers | 30.8% | 34.3% | 34.0% | 42.2% | 46.8%
author power | 1.9% | 3.6% | 7.6% | 9.8% | 17.0%
target power | 43.5% | 51.5% | 77.6% | 75.2% | 77.0%

Table 6: Recall

Criterion | BoW | Text | User | Proposed | Combined
aggression | 77.0% | 84.8% | 47.8% | 51.6% | 85.6%
repetition | 17.6% | 7.3% | 49.5% | 64.3% | 26.2%
harmful intent | 40.2% | 44.4% | 63.4% | 67.7% | 52.7%
visibility among peers | 34.8% | 20.4% | 47.1% | 54.2% | 33.7%
author power | 6.5% | 1.6% | 74.1% | 80.0% | 11.9%
target power | 49.4% | 43.3% | 73.3% | 80.8% | 71.1%

Table 7: F1 Scores

Criterion | BoW | Text | User | Proposed | Combined
aggression | 79.7% | 83.5% | 59.0% | 62.3% | 84.1%
repetition | 10.8% | 9.4% | 13.3% | 24.7% | 28.7%
harmful intent | 34.1% | 46.7% | 38.7% | 45.7% | 53.8%
visibility among peers | 32.7% | 25.5% | 39.5% | 47.4% | 45.5%
author power | 2.9% | 2.2% | 13.7% | 17.5% | 14.0%
target power | 46.2% | 47.0% | 75.3% | 77.9% | 73.9%
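For reference, the Kolmogorov-Smirnov D statistics and the precision, recall, and F1 scores reported throughout can be computed from feature samples and per-class confusion counts. The sketch below uses hypothetical values and only the standard library; `scipy.stats.ks_2samp` returns the same statistic together with a p-value.

```python
import bisect

def ks_statistic(pos, neg):
    """Two-sample Kolmogorov-Smirnov D: the largest gap between the
    empirical CDFs of the positive- and negative-class feature values."""
    pos, neg = sorted(pos), sorted(neg)
    d = 0.0
    for v in sorted(set(pos) | set(neg)):
        cdf_p = bisect.bisect_right(pos, v) / len(pos)
        cdf_n = bisect.bisect_right(neg, v) / len(neg)
        d = max(d, abs(cdf_p - cdf_n))
    return d

def precision_recall_f1(tp, fp, fn):
    """Per-class scores from confusion counts, as in Tables 5-7."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Hypothetical feature samples (e.g., downward overlap per class)
# and hypothetical confusion counts for one criterion.
d = ks_statistic([0.1, 0.2, 0.7, 0.9], [0.5, 0.6, 0.8, 0.95])
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```

Reporting scores per criterion, rather than a single cyberbullying score, is what lets Tables 5-7 show where each feature set helps or hurts.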