
Identity Construction in a Misogynist Incels Forum

Michael Miller Yoder¹, Chloe Perry², David West Brown³, Kathleen M. Carley¹, Meredith Pruden⁴

¹ Software and Societal Systems Dept., Carnegie Mellon University, Pittsburgh, PA, USA
² Dept. of American Culture, University of Michigan, Ann Arbor, MI, USA
³ Dept. of English, Carnegie Mellon University, Pittsburgh, PA, USA
⁴ School of Communication and Media, Kennesaw State University, Kennesaw, GA, USA
[email protected], [email protected], [email protected],
[email protected], [email protected]

The 7th Workshop on Online Abuse and Harms (WOAH), pages 1–13
July 13, 2023 ©2023 Association for Computational Linguistics

Abstract
Online communities of involuntary celibates (incels) are a prominent source of misogynist hate speech. In this paper, we use quantitative text and network analysis approaches to examine how identity groups are discussed on incels.is, the largest black-pilled incels forum. We find that this community produces a wide range of novel identity terms and, while terms for women are most common, mentions of other minoritized identities are increasing. An analysis of the associations made with identity groups suggests an essentialist ideology where physical appearance, as well as gender and racial hierarchies, determine human value. We discuss implications for research into automated misogynist hate speech detection.

1 Introduction
Warning: this paper contains content that is disturbing, offensive, and/or hateful.

Online communities of those calling themselves “involuntary celibates” (incels) are known for online misogynist hate speech and offline violence targeting women, including incidents of mass violence in Isla Vista, California, in 2014 and Toronto, Canada, in 2018, among others. Though some work in natural language processing (NLP) has focused on features of misogynist language in general (Anzovino et al., 2018; Samghabadi et al., 2020; Guest et al., 2021), online incel communities are known for significant lexical innovation (Farrell et al., 2020; Gothard, 2021). Training with data from incel forums would enable misogynist hate speech classifiers to identify the neologisms and novel ideological features of this dangerous form of online misogyny (Jaki et al., 2019).

In this paper, we provide hate speech researchers with a quantitative overview of trends and particularities of language in one of the largest misogynist incel communities, incels.is, which launched in 2017 following the r/incels ban from Reddit. We focus this analysis on mentions of identities, which are key to automatically identifying hate speech (Uyheng and Carley, 2021) and a window into the ideologies of social movements (Benford and Snow, 2000). We ground this analysis in theoretical approaches that focus on how identities are constructed in interaction (Bucholtz and Hall, 2005; Burr and Dick, 2017) and investigate the following research questions:

RQ1. How frequently are different identities, including novel terms for identities, mentioned in incels.is discourse?
RQ2. How do identity mentions in incels.is discourse change over time?
RQ3. How are identity mentions used differently by central incels.is users?
RQ4. What textual associations are made with identity groups on incels.is?

To address these research questions, we first measure the distribution of identity term mentions using a large generic list of identity terms combined with community-specific identity terms surfaced from a word embedding-based approach. We confirm that the most frequent identity mentions in this data are for women, with almost one-fourth of these being derogatory community-specific neologisms, such as “femoids.” Gender identities are mentioned much more often than in a comparative white supremacist dataset, a similar, commonly used source of unlabeled hate speech (Simons and Skillicorn, 2020; Alatawi et al., 2021). We find increasing mentions of other minoritized identities, such as Black, LGBTQ+, and Jewish people, on incels.is, suggesting a consolidation with broader far-right discourses. Users who are central to the network proportionally mention more of these other marginalized identities. From a quantitative analysis of the immediate contexts in which identity term mentions appear, a
pervasive hatred of women is clear, as well as reinforcement of stereotypes about other marginalized groups. The incel identity itself is often discussed with themes of victimhood and boundary-keeping for “true” versus “fake” incels.

Throughout our analysis, we find evidence of an essentialist, black-pilled ideological framework where physical appearance determines the value of individuals and groups. While rigid racial and gender hierarchies are not new (e.g., eugenics) and often circulate in far-right discourse (Miller-Idriss, 2022), this incel community attaches many novel measurements to appearance in relation to these hierarchies, re-entrenching and extending them. We argue that to detect such a particular form of extremism, hate speech researchers must heed both the jargon and the deeper ideologies of this movement.

2 Incels and Male Supremacism
Online misogynist incel communities are situated within a set of anti-feminist groups often termed the “manosphere.” These groups include Men’s Rights Activists, Pick Up Artists (PUAs), and Men Going Their Own Way (MGTOW). Such groups are often associated with a “red pill” ideology, a reference to seeing the hidden truth behind the world in the film The Matrix; in this far-right context, it denotes the view that feminism has brainwashed and subordinated men (Ging, 2019). In addition, incels often refer to a “black pill,” the idea that they are genetically predetermined to be incels and cannot improve their situation through work or self-improvement (Pruden, 2021). This leaves many black-pilled incels feeling that their only options are to cope, commit suicide, or commit mass violence (expressed in the common phrase “cope, rope or go ER [Elliot Rodger, an incel mass shooter]”). Among groups in the manosphere, incels are most associated with violent and high-profile events that demonstrate “extreme misogyny” (Ging, 2019).

Ribeiro et al. (2021) find that incel communities are both more extreme and more popular than older, more moderate male supremacist movements, while LaViolette and Hogan (2019) find that more extreme manosphere movements contain essentialist, deterministic ideologies of identity. We find evidence of this reductionist and biologically essentialist worldview in the associations made with identities on incels.is.

Qualitative research on male supremacist extremism frequently examines Men’s Rights Activists (Berger, 2018) and more moderate groups that are less niche and lack the emergent vocabulary common in incel spaces. We find this tendency extends to the data sources used in automated hate speech research and argue for the importance of attending to the particularly dangerous discourse of black-pilled incels such as those on incels.is.

Quantitative and computational studies of the manosphere often focus on the unique misogynist language use of these communities. Gothard (2021) and Jaki et al. (2019) surface incel jargon by comparing word frequencies in incel Reddit posts with those in subreddits and Wikipedia articles outside of the incel movement, while Farrell et al. (2020) find frequent incel terms not present in English dictionaries and expand their lexicon with a word embedding space. Word frequency analysis and hand-crafted lexicons are often used to measure and study misogyny in the manosphere (Heritage et al., 2019; Farrell et al., 2019; Jaki et al., 2019). Pruden (2021) and Perry and DeDeo (2021) use topic modeling to characterize narratives and map out user trajectories on incels.is and r/theRedPill, respectively. Jaki et al. (2019) use word frequency analysis to study identity construction on a similar forum, incels.me, though their 6-month dataset only enables limited time-series analysis. In contrast, our work focuses on the use and contexts of generic and community-specific identity terms beyond a sole focus on misogyny, as well as on how identity term use changes over time.

2.1 Automated misogyny detection
In early work on automated misogyny detection, Hewitt et al. (2016) and Waseem and Hovy (2016) developed small Twitter datasets annotated for sexism. Anzovino et al. (2018) proposed a keyword-based annotated dataset and taxonomy for misogyny detection on Twitter, with later shared NLP tasks (Fersini et al., 2018b,a; Basile et al., 2019; Bhattacharya et al., 2020). Data for these tasks came from Twitter posts and YouTube comments selected based on keywords, profile information, and YouTube video topics. Such data sources may capture mainstream misogyny but miss the unique linguistic characteristics of the incel movement.

Other annotated hate speech datasets have included data from manosphere subreddits, such as r/MensRights, r/MGTOW, r/incels, and r/TheRedPill, along with many other sources of online hate speech (Qian et al., 2019; Sap et al.,
2020; Mollas et al., 2020). Guest et al. (2021) propose a Reddit dataset annotated for misogyny by trained annotators, who would be more likely to understand community-specific jargon than crowdworkers with limited training. Though they include a variety of manosphere-related subreddits, absent from this dataset are banned black-pilled incel subreddits such as r/braincels, r/shortcels, and r/incels, the precursor of the more extreme incels.is.

3 Data
Our dataset contains 6,248,234 English-language public comments posted between the forum’s creation in November 2017 and scraping in April 2021.¹ It includes forum and thread names, as well as the date of posting, user names, and the comment’s full text. However, it does not contain images, which is a limitation.

¹ This dataset, without any private or identifying information, will be made available to vetted researchers upon publication of the main paper associated with it.

White supremacist dataset. We compare identity mentions on incels.is to another common source of unlabeled hate speech: white supremacist texts. From a large, multi-domain, English-language white supremacist dataset (Yoder et al., 2023), we select posts from online forums in a similar time frame as the incels data, 2015–2019 (2019 being the latest year available in the white supremacist dataset). This subset includes 3,410,623 posts from Stormfront, Iron March, and 4chan /pol/ in threads with fascist and white supremacist topics or posted by users choosing white supremacist, Nazi, Confederate, or fascist flags.

4 Methods
We take a quantitative approach to studying discursive identity construction (Bucholtz and Hall, 2005; Gee, 2011), borrowing a focus on in-group and out-group identity presentation from social identity theory (Tajfel, 1974; Seering et al., 2018). Specifically, we examine the use of identity terms and the immediate contexts in which they appear. A few mentions of an identity group may not represent the attitudes of participants, but associations repeatedly made over the course of a 6-million-post corpus are more likely to capture widely shared beliefs (Stubbs, 2001).

4.1 Measuring the use of identity terms
We first find identity terms using a generic lexicon combined from multiple sources: the extensive list of English identity terms from the NetMapper software (Joseph et al., 2016; Carley et al., 2018), as well as identity terms frequently found in hate speech (Yoder et al., 2022) and terms for LGBTQ+ and neurodiverse identities found online (Yoder et al., 2020; Yoder, 2021). This combined lexicon totals 19,050 unique identity terms. Ignoring case, 7,244 of these were present in the incels.is dataset.

Grouping identity terms. We aggregate identity terms referring to similar groups (such as LGBTQ+ people) and then further group those identities into broader demographic categories (such as gender/sexuality). To form these groupings, we adapt identity group labels used in hate speech research from Uyheng and Carley (2020) and Yoder et al. (2022).² Intersectional identity terms are counted for all groups indicated by the term; e.g., “white women” is counted for both “white” and “women.”

² Non-proprietary portions of identity term lexicons (including groupings and categorizations) and code for analyses in this paper are available at https://github.com/michaelmilleryoder/incels_identities.

Identity lexicon expansion. To capture the neologisms that incel communities are known for (Jaki et al., 2019; Gothard, 2021), we expand our generic identity lexicon to nearest neighbors in word embedding space, a common approach (Demszky et al., 2019; Simons and Skillicorn, 2020; Lai et al., 2021). We trained a 300-dimension word2vec model (Mikolov et al., 2013) over our data and manually examined terms appearing at least 1,000 times among the top 30 nearest neighbors by cosine distance to a) the 30 most frequent generic identity terms or b) the mean of identity term embeddings in an identity group. This resulted in 84 new terms, the most frequent of which are in Table 1.

Varieties of “incels.” It is common in incel discourse to refer to different types of incels with terms including a “cel” suffix (Gothard, 2021). For example, “tallcels” refers to tall incels, and the racist terms “currycels” and “ricecels” refer to South Asian and East Asian incels, respectively. Excluding usernames, over 1,500 unique words used in our incels.is dataset contained the string “cel,” many of which referred to varieties of incels. We examined the 100 most frequent words containing “cel”
and grouped words that referred to incel variants, except those referring to “fake” incels, within the incels identity group for further analysis.

Community-specific identity terms
foids, chads, manlets, stacies, boyo, femoids, ethnics, chadlites, roasties, holes, betabux, landwhales, waifus, jbs, chicks, noodlewhores, soyboy, br0, aspie, betas, thots, traps, beckies, m8, boi
“Cel” variants
truecels, fakecels, volcels, greycels, escortcelling, gymcelling, ricecels, mentalcels, currycels, fatcels, femcels, whitecels, framecels, youngcels, oldcels, blackcels, ethnicels, brocels, itcels, incelistan, nearcels, tallcels, shortcels, locationcels, bluecels

Table 1: Most frequent 25 novel identity and “cel” terms found in the incels.is dataset. Plural and singular mentions are combined, as are “-ing” terms with their roots.

Central forum users. We also analyze how forum leaders (prototypical incels) use identity terms. To find such leaders based on network structure, we construct an undirected graph where nodes are users and edges are weighted by the number of shared threads (out of 154,049 threads) between them. This graph contains 6,819 users and 3,889,054 links. We operationalize central users as the top 5% ranked by eigenvector centrality, which measures whether users share threads with other highly connected users. These central users had a roughly similar number of posts as the rest of the users combined.

4.2 Associations with identity terms
Beyond the occurrence of identity term mentions, we analyze associations made with identities in their immediate contexts. Specifically, we extract actions taken by or to these groups, as well as attributes associated with them, a simple approach to analyzing the presentation of entities in discourse (Bamman et al., 2013, 2014; Yoder, 2021). For actions, we extract verbs where an identity term is the subject or object from a dependency parse. Attributes are adjectives and appositives whose head word is an identity term.

We surface the actions and attributes most distinctively associated with each identity group with PMI3 (Daille, 1994; Role and Nadif, 2011), a variant of pointwise mutual information that lowers the ranking of low-frequency terms.

[Figure 1: bar chart of mentions per post (y-axis, 0 to 0.4) for the identity groups Women, Men, Youth, Neurodiverse, LGBTQ+, Asian, Black, White, and Jews, comparing three dataset/identity lexicon conditions: incels data with the expanded lexicon, incels data with the generic lexicon, and white supremacist data with the generic lexicon.]
Figure 1: Identity group mention frequencies.

[Figure 2: log word probability (y-axis, about −3.5 to −2) by month from January 2018 to January 2021 for the identity groups/categories Black, Jews, LGBTQ+, Men, Women, and Politics.]
Figure 2: Selected identity group and category mentions over time in the incels.is dataset. Mentions of other identity groups remain steady.

5 Results
5.1 Distribution of identity mentions (RQ1)
Figure 1 shows the prevalence of the most frequently mentioned identity groups in our incels.is dataset. Expanding the generic identity list with context-specific identity terms dramatically increases the detection of mentions of all identity groups, especially women, men, and neurodiverse people.³ Adding these context-specific identity terms increases the total number of mentions identified from 6.46 million to 8.91 million, a jump of 37.8%, demonstrating how common identity term innovations are in this community.

³ Many incels self-identify on the autism spectrum.
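To make the counting behind Figure 1 concrete, here is a minimal Python sketch of lexicon-based mention counting. It assumes simple case-insensitive regular-expression matching rather than the authors' actual tokenization, and the toy lexicons, posts, and the function name `count_group_mentions` are hypothetical. Each lexicon term maps to every group it names, so an intersectional term such as “white women” is credited to both “white” and “women,” and running the same posts through a generic and an expanded lexicon shows how community-specific terms raise the total count.

```python
import re
from collections import Counter

# Hypothetical toy lexicons (not the paper's 19,050-term lexicon).
# Each term maps to every identity group it names, so an intersectional
# term such as "white women" is credited to both "white" and "women".
GENERIC = {
    "women": ["women"],
    "men": ["men"],
    "white": ["white"],
    "white women": ["white", "women"],
}
# Hypothetical expansion with two community-specific neologisms from Table 1.
EXPANDED = {**GENERIC, "foids": ["women"], "chads": ["men"]}


def count_group_mentions(posts, lexicon):
    """Count identity-group mentions in `posts` (a list of strings).

    Longer terms are tried first in the alternation, so "white women" is
    matched as one intersectional mention instead of separate hits."""
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    counts = Counter()
    for post in posts:
        for match in pattern.finditer(post):
            for group in lexicon[match.group(1).lower()]:
                counts[group] += 1
    return counts


posts = [
    "Chads and foids again",
    "white women prefer chads",
    "men and women",
]
generic = count_group_mentions(posts, GENERIC)
expanded = count_group_mentions(posts, EXPANDED)

total_generic = sum(generic.values())    # 4
total_expanded = sum(expanded.values())  # 7
increase = (total_expanded - total_generic) / total_generic
# Mentions per post, the unit plotted in Figure 1.
mentions_per_post = {g: c / len(posts) for g, c in expanded.items()}
print(f"{increase:.0%}")  # 75%
```

On these three toy posts the expanded lexicon finds 7 mentions instead of 4; the 75% increase is an artifact of the toy data, not a reproduction of the paper's 37.8% figure.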
Also visible in Figure 1 is a comparison of identity group mention frequency with another common source of hate speech, white supremacist data. Mentions of women and men are much more frequent in the incel data than in the white supremacist data, surpassing 0.4 mentions/post for women. Mentions of racial identities and Jewish people are more commonly found in the white supremacist data. This confirms that discourse from incel communities can be a useful source of misogynist text, especially after recognizing the lexical innovations referring to women and others.

5.2 Identity mentions over time (RQ2)
Figure 2 displays the prevalence of identity group mentions in this forum over time, binned every month during the dataset range and identified with the expanded lexicon. To control for any systematic changes in post word count over time, we present log word probability (the logarithm of identity group mention counts normalized by total word count).

Though mentions of women and men are most frequent across the data range, they stay steady or slightly decrease over time. There is a steady rise, however, in mentions of LGBTQ+ identities. Except for a decrease in the latter half of 2019, mentions of Jewish people also steadily rise. There is a significant rise in mentions of Black people in 2020, reaching a peak in June (during the anti-racist uprising against police brutality) and remaining at an increased rate through 2020 and 2021. Mentions of political identities also rise.

5.3 Central users’ use of identity terms (RQ3)
Figure 3 shows the absolute difference in proportion of identity term mentions for the top 5% of users ranked by eigenvector centrality, compared to the rest. Proportionally, central users are less likely to mention identity terms for women, but are more likely to mention Black, LGBTQ+, and Jewish people. A concern for incel authenticity is also reflected in this central group’s increased use of “truecels” and “fakecels” compared to other users.

[Figure 3: horizontal bars of the difference in proportion of mentions for central users (x-axis ticks shown from −0.03 to 0) for Women, Men, Asian, Black, White, Jews, LGBTQ+, Neurodiverse, Incels, Truecels, and Fakecels.]
Figure 3: Absolute difference between the proportion of identity mentions used by top 5% central users in the shared thread network for each identity group and the proportion of mentions used by the rest of the users.
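The central-user analysis combines two steps: ranking users by eigenvector centrality in the shared-thread graph and comparing the identity-mention proportions of the top 5% with everyone else, as in Figure 3. The sketch below illustrates both under stated assumptions: input arrives as hypothetical (user, thread) and (user, group) pairs, centrality is computed by plain power iteration rather than with any particular graph library, and the top-5% cutoff rounds up, which the paper does not specify. All function names and data are illustrative.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def shared_thread_graph(posts):
    """Edge weight = number of threads two users both posted in.

    `posts` is an iterable of (user, thread_id) pairs (hypothetical format)."""
    users_by_thread = defaultdict(set)
    for user, thread in posts:
        users_by_thread[thread].add(user)
    weights = defaultdict(float)
    for users in users_by_thread.values():
        for a, b in combinations(sorted(users), 2):
            weights[(a, b)] += 1.0
    return weights


def eigenvector_centrality(weights, max_iter=500, tol=1e-12):
    """Power iteration on (A + I): same leading eigenvector as the weighted
    adjacency matrix A, but it avoids oscillation on bipartite graphs."""
    nbrs = defaultdict(list)
    for (a, b), w in weights.items():
        nbrs[a].append((b, w))
        nbrs[b].append((a, w))
    x = {n: 1.0 for n in nbrs}
    for _ in range(max_iter):
        y = {n: x[n] + sum(w * x[m] for m, w in nbrs[n]) for n in nbrs}
        norm = math.sqrt(sum(v * v for v in y.values()))
        y = {n: v / norm for n, v in y.items()}
        if max(abs(y[n] - x[n]) for n in y) < tol:
            return y
        x = y
    return x


def top_fraction(scores, fraction=0.05):
    """Highest-scoring users; rounding up is an assumption, not from the paper."""
    k = max(1, math.ceil(len(scores) * fraction))
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def proportion_difference(mentions, central):
    """Per identity group: share of central users' mentions minus share of
    the other users' mentions. `mentions` holds (user, group) pairs."""
    inside, outside = Counter(), Counter()
    for user, group in mentions:
        (inside if user in central else outside)[group] += 1
    n_in = sum(inside.values()) or 1
    n_out = sum(outside.values()) or 1
    return {g: inside[g] / n_in - outside[g] / n_out
            for g in inside.keys() | outside.keys()}


# Toy data (hypothetical).
posts = [("a", "t1"), ("b", "t1"), ("c", "t1"), ("a", "t2"), ("b", "t2"),
         ("a", "t3"), ("d", "t3"), ("d", "t4"), ("e", "t4")]
weights = shared_thread_graph(posts)
central = top_fraction(eigenvector_centrality(weights))
mentions = [("a", "Black"), ("a", "Women"), ("b", "Women"),
            ("c", "Women"), ("d", "Jews"), ("e", "Women")]
diff = proportion_difference(mentions, central)
```

In the toy data, user `a` shares the most threads with other well-connected users and is the only one selected. A positive difference (here for “Black”) means central users devote a larger share of their identity mentions to that group; a negative one (here for “Women”) means a smaller share. Under the round-up assumption, the top 5% of the paper's 6,819-user graph would be 341 users.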
Identity      Type  Top PMI3 terms
Women         Attr  white, old, single, fat, young, hot, fucking, ugly, cute, other, ethnic, average
              ActS  want, get, love, care, go, hate, think, like, fuck, say, give, look, find, wants
              ActO  get, fuck, hate, fucking, find, getting, having, attracted, see, want, fucked
Men           Attr  ugly, white, other, good, looking, average, tall, black, nice, short, chad, young
              ActS  get, looks, go, need, look, think, want, got, mogs, fuck, going, become, gets
              ActO  over, see, know, want, fuck, hate, fucking, love, seen, get, against, laid, date
Asian         Attr  south, central, >, east, average, half, ugly, other, skinned, northern, full
              ActS  look, hate, get, cope, worship, go, need, mog, tend, eat, seem, make, want, take
              ActO  learning, learn, speak, hate, seen, see, mog, killed, against, over, above, know
Black         Attr  north, real, west, other, fucking, man, stupid, dumb, ugly, dark, average
              ActS  get, got, commit, slay, aspire, look, developed, gon, need, go, tend, fuck, run
              ActO  free, hate, fuck, see, against, say, around, date, prefer, fucking, kill, sand
White         Attr  southern, northern, non, eastern, western, pure, white, nordic, other, average
              ActS  go, get, want, mog, look, hate, invented, need, going, tend, voted, did, age
              ActO  worship, hate, prefer, against, want, date, after, over, see, sought, towards
Jews          Attr  orthodox, fucking, rich, religious, secular, international, anglo, elite, greedy
              ActS  control, did, created, want, pushing, won, own, pushed, made, win, took, push
              ActO  hate, blame, ashkenazi, against, because, blaming, gas, kill, hated, hating
LGBTQ+        Attr  fucking, it, bluepilled, moral, low, normal, others, larping, stupid, ill, little
              ActS  get, exist, go, say, need, think, fuck, try, deserve, trying, look, make, want
              ActO  hate, fucking, coping, fuck, shut, ban, turning, kill, banned, larping, go
Neurodiverse  Attr  social, severe, functioning, fucking, extreme, crippling, sentence, complete
              ActS  exist, makes, worse, causes, affect, get, comes, means, make, sucks, goes, go
              ActO  because, due, diagnosed, cure, fucking, having, caused, cause, causes
Incels        Attr  fellow, other, blackpilled, true, real, actual, white, ugly, bluepilled, blackpill
              ActS  get, exist, means, need, ascend, go, cope, know, want, say, become, going, look
              ActO  help, against, hate, coping, create, see, creating, hates, bullying, die, ok, bullied

Table 2: Actions and attributes associated with identity group terms in the incels.is dataset. Attr refers to attributes, while ActS are actions for which the identity is a subject and ActO are actions for which the identity is an object.
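The rankings in Table 2 rely on PMI3. Below is a minimal sketch of PMI3 ranking, assuming the actions and attributes have already been extracted from a dependency parse as (identity group, term) pairs; the parsing step is omitted, and the function name and toy pairs are hypothetical. It uses the probability form log2(p(g, t)^3 / (p(g) p(t))), a common formulation; a count-based form differs from it only by a constant within one dataset and so yields the same ranking. In practice it would be run once per relation type (Attr, ActS, ActO).

```python
import math
from collections import Counter


def pmi3_rankings(pairs, top_n=3):
    """Rank the terms co-occurring with each identity group by PMI3.

    `pairs` lists (identity_group, term) co-occurrences for ONE relation type
    (e.g. verbs whose subject is an identity term). Scores use the
    probability form PMI3 = log2(p(g, t)**3 / (p(g) * p(t))); cubing the
    joint probability lowers the rank of rare pairs relative to plain PMI."""
    n = len(pairs)
    joint = Counter(pairs)
    p_group = {g: c / n for g, c in Counter(g for g, _ in pairs).items()}
    p_term = {t: c / n for t, c in Counter(t for _, t in pairs).items()}
    scores = {
        (g, t): math.log2((c / n) ** 3 / (p_group[g] * p_term[t]))
        for (g, t), c in joint.items()
    }
    ranking = {}
    for group in p_group:
        ranked = sorted(
            (t for g, t in scores if g == group),
            key=lambda t: (-scores[(group, t)], t),  # best score first, ties by name
        )
        ranking[group] = ranked[:top_n]
    return ranking, scores


# Hypothetical (group, verb) pairs, standing in for subject relations that a
# dependency parser would extract.
pairs = ([("women", "want")] * 4 + [("women", "hate")] * 2 + [("women", "worship")]
         + [("men", "hate")] * 2 + [("men", "mog")] * 3)
ranking, scores = pmi3_rankings(pairs)
print(ranking)  # {'women': ['want', 'hate', 'worship'], 'men': ['mog', 'hate']}
```

With plain PMI, “want” and “worship” would tie for women (both log2(12/7)), because each co-occurs only with that group; cubing the joint probability breaks the tie in favor of the more frequent verb.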

5.4 Associations with identities (RQ4)
Terms commonly associated with identity groups are presented in Table 2. Across groups, we find that the most frequent attributes relate to physical features (“ugly,” “short,” etc.). The use of these descriptors suggests hierarchies based on appearance, race, and gender. This focus on physical appearance is apparent in top terms used to describe women, including “young,” “fat,” and “hot.” Example uses show the hierarchies of appearance that incels apply to women: “some can’t tell a beta female apart from a hot whore and so lump all types of the female sub species together.”⁴ Actions for which women are subjects suggest incels’ speculation about what women “want,” “love,” or “hate.” Common actions for which women are grammatical objects include “fuck” and “hate,” evidence of the forum’s misogyny, casting women as things to be “attracted” or controlled.

⁴ Quotes are paraphrased for privacy (Williams et al., 2017).

Men are also discussed with an emphasis on physical appearance, as well as domination. “Ugly,” “looking,” “average,” and “tall” are all top attributes for men. Top actions include “get,” “need,” and “mogs,” an incel neologism meaning “dominate.” An example post that reinforces gender hierarchies reads, “men literally mog femoids across the board, yet the foids whine about it.”

Race is relevant in discussions of gendered identities. “White,” for example, is a top descriptor for both women and men, as is “black” for men. Top terms suggest a negotiated association of superiority with whiteness. White people are the grammatical objects of “worship” but also of “hate.” “Pure” and “nordic,” common white supremacist descriptors, are distinctive attributes used for white people.
In contrast, Asian people are cast as subjects of actions like “worship” and “cope.”

We find a range of common stereotypes for minoritized identities, particularly conspiratorial antisemitic tropes, as evidenced by terms suggesting a global Jewish conspiracy (e.g., “elite” and “control”) and derogatory associations with the Holocaust (“gas”). One example post reads, “the endgame is an global Jewish Communist dictatorship,” while another mixes antisemitic conspiracy theories with anti-feminism: “feminism is a subversive Jewish movement designed to ruin us.”

LGBTQ+ characterizations are negative and associated with inauthenticity (e.g., “larping,” or live action role playing). For example, one post reads, “Lesbians don’t exist. They’re just bisexual foids who like women but still can’t resist Chad.”

Violence is associated with Black people (“commit”), for example in one post that reads, “I don’t hate blacks because they’re ugly, I hate them because no matter where they are they commit crime.”

In-group identity associations. Victimhood, race, and authenticity are common themes associated with identity mentions of incels themselves and “incel” variants on the platform.

The most frequent lexical variations containing the “cel” suffix are in Table 1. Top terms relate to authenticity, including “fakecel,” “truecel,” and “volcel” (“voluntary celibate”), a focus that has also been observed in incel subreddits (Gothard, 2021). Platform affordances highlight distinctions between frequent and infrequent posters: “greycels,” who have posted fewer than 500 times, have a gray-colored username and are often deemed inauthentic. Variants related to race (“whitecels,” “ethnicels”) are also frequent, suggesting the importance of race in incel self-classification (Jaki et al., 2019; Farrell et al., 2020). Also visible in these “cel” variations is a set of categories based on the familiar theme of physical appearance (e.g., “fatcels” and “youngcels”). “Femcels,” or female incels, are frequently mentioned, usually derided as outside the inherent masculinity of inceldom.

Incels are cast as merely “existing” or “coping” (Table 2), while others “hate” them or are “against” them. This victimhood includes common masculine tropes, such as a supposed inability to control themselves. From one post: “we can’t control what we want, devaluation of women is a coping mechanism for not being able to elicit a biological response in them.” Common far-right narratives of victimhood at the hands of corporations, Jewish people, and the media are also present: “Jews and the media hate incels, and the gaming industry is full of SJWs [social justice warriors].”

Race is also important, and controversial, in associations made with incels. “White” is a top incels attribute on the forum, and both “ethnic” and “white” are associated with truecels (see Table 3 in Appendix A for top terms related to truecels and fakecels on the forum). There is controversy over which races occupy what positions in an assumed hierarchy, often centering around the “just be white” (JBW) theory that white men have access to sexual relationships with women of all races. Some posts support this theory, e.g., “being white is a +3 when it comes to noodles [Asian women], so a 4/10 white is better than a 6/10 ethnic.” Others challenge this notion: “a brown man with a chiseled face will mog a white incel everywhere.” Still others echo the white supremacist Great Replacement Theory, blaming JBW as a way for incels of color to “get whitecels out so sh**skins can take over.”

“Real” and “true” are top attributes associated with talk about incels, echoing a focus on authenticity in the top “cel” variants. This boundary-keeping is also visible in the words associated with fakecels (e.g., “detected” and “ban”). Authentic incels are victims of women’s hatred (“if women aren’t trying to kill you, you’re not a true incel”), post a lot (“graycels are a joke with their tiny post counts”), are unattractive (“I’m an incel, of course she said no to my hideous face”), have no female friends (“what true incel has a female friend? stupid newf*g”), and do not date (“normie spotted. real incels are doing this all weekend and have no dates”). They are also unable to “ascend” (i.e., have sex and leave inceldom) and are “mogged” by others. Jaki et al. (2019) found similar themes in an earlier incel dataset.

6 Discussion
Across our quantitative analysis of the distribution of, and associations made with, identity terms, we see evidence of an ideology where physical appearance determines human value, as has been found in prior work on incels (Maxwell et al., 2020; Baele et al., 2021; Pruden, 2021). This ideology essentializes social constructs, such as race and gender, as biological physical features impacting desirability, with controversy over the role of race.

We find strong evidence for gender as a central focus of incel discussion; mentions of men and women far surpass the number of mentions of any other identity. We find that this community commonly uses novel identity terms that may not appear in generic lists, including many derogatory terms for women (“foids,” “landwhales”).

Increases in mentions of LGBTQ+, political, Jewish, and Black identities, often with stereotypes and conspiracy theories, could suggest this community has incorporated broader far-right trends. An increasing politicization is reflected in this example post: “we don’t need society to completely accept the incel ideology, we just need to masquerade as normies and keep bashing women, jews and gays.” Our evidence from text analysis supports the common user movement from manosphere content to alt-right content on YouTube and Reddit found by Mamié et al. (2021). We find that many associations on incels.is reinforce stereotypes such as LGBTQ+ identities being fake, Black people being criminals, and antisemitic conspiracy theories. Users who are central in the forum’s shared thread network devote more identity mentions, proportionally, to Black, LGBTQ+, and Jewish people compared to average users, suggesting that leaders on the platform play a role in broadening the discussion to include mentions of marginalized identity groups other than women. We also find more mentions of neurodiversity and mental health in this online community than in a dataset of white supremacist online content, which may be part of a victimhood narrative.

The overarching black-pilled ideology of physical appearance determining human worth also extends to talk about incels themselves on incels.is. Jaki et al. (2019) also find this “negative self-image” on a precursor forum, which is theorized by Nagle (2015) and Ging (2019). Though incels are presented as occupying the lowest status among men, we find fierce gatekeeping around who can claim the identity and its perceived victimhood. This echoes theoretical work by Kleinke and Bös (2015) finding that online communities often disparage less typical members along with out-groups.

Central users are active in discussions of authenticity. Such victimhood could lead “authentic” misogynist incels to pursue symbolic, or material, action against “fake” incels in the community, but also against the perceived unjust system and the women they believe benefit from it.

Identities constructed in interaction are negotiated (Bucholtz and Hall, 2010); we find contention around race in inceldom. “White” is associated with both true and fake incels on the platform, often in connection with the folk JBW theory that white men appeal to women of all races.

Implications for automated hate speech detection. Central to many hate speech definitions is whether a text denigrates groups based on identity characteristics (Sellars, 2016; Sanguinetti et al., 2018; Poletto et al., 2021). Identity terms are, thus, a major indicator and concern for hate speech detection. In our analysis of identity construction on incels.is, we confirm that identity terms for men and women are mentioned much more frequently than in a similar source of unlabeled hate speech: white supremacist data. Incel texts, then, may be a good source of unlabeled or annotated data for misogyny detection. The dangerous black-pilled ideology in particular is missing from current misogyny datasets (Guest et al., 2021). Such data should be considered, but the broader issue is a need for subject matter expertise in building datasets for automated hate speech detection. Experts should be consulted to know where to look for training data so that specific types of hateful movements, with lexical or other linguistic innovations, are not overlooked.

We find that almost 30% of identity mentions in our dataset involve community-specific neologisms, often derogatory terms against women. Training hate speech classifiers on data that does not include these terms hinders the ability to detect this substantial source of contemporary online misogyny.

Our analysis also draws attention to the ideological associations being made with identities in this discourse space. We find problematic stereotypes against not only women, but also LGBTQ+, Black, and Jewish people. Thus, incel text data is not only a source of misogyny, but also reflects broader trends related to the mainstreaming of far-right beliefs. Particularly pernicious is a black-pilled ideology that physical appearance determines human value, a reinforcement and extension of essentialized gender and racial hierarchies. Hence, fatphobia, homophobia, ableism, and racism are all wrapped up in misogynist incel content. Automatically detecting this broader ideology may be unattainable or extremely difficult with machine learning techniques, but we emphasize that practitioners and researchers should be aware of this ideology. A narrow focus on hate against women from
does not capture attitudes held toward high-profile
these communities will miss these important–and members of those groups, which play a role in
increasing–trends toward politicization and hate circulating associations with identities (such as per-
against other groups. sonal attacks on women in gaming or the use of
“George Soros” as shorthand for antisemitic con-
7 Conclusion and Future Work spiracy theories). Future work may try to capture
and measure these attitudes.
The incel movement and the collective identity
around it is a relatively new expression of male Incels.is is a large, popular forum for black-
supremacism. In this paper, we use quantitative pilled incel discourse, which has a unique and ex-
text and network analysis techniques to investigate treme ideology that we argue is under-represented
how identities are constructed in discourse on one in current hate speech datasets. However, our anal-
of the largest incel forums. We study the identity ysis is limited to this forum, and the trends we
group mention frequency over time, as well as ac- identify may not apply to more moderate incel dis-
tions and attributes associated with them. course (e.g., r/IncelsWithoutHate) or related online
We find that talk about women and men dom- male supremacist movements, such as MGTOW
inates identity mentions on this forum, though and PUAs. Though these communities are known
mentions of marginalized identities commonly to have related, but distinct jargon (Farrell et al.,
targeted by far-right groups increase from 2017- 2020), we emphasize that researchers should recog-
2021, appearing in textual contexts that propagate nize these lexical innovations in their annotations
stereotypes. Many of these mentions use novel, for hate speech and include a variety of these com-
community-specific identity terms that would be munities in training datasets for misogyny.
missed with generic lists of identities or hate speech
training data from other contexts. Future work Ethics Statement
could systematically evaluate the ability of existing The Association of Internet Researchers (AoIR)
hate speech classifiers to handle this jargon, as well acknowledges internet research is complex, dy-
as the particularly dangerous black-pilled ideology. namic and often involves many gray areas– specif-
This ideology is apparent in discussions of iden- ically related to what constitutes human subjects,
tities, including in-group ones, that reinforce rigid private versus public spaces and data versus per-
physical hierarchies based on attractiveness, gen- sons (Markham and Buchanan, 2012). For this rea-
der, and race. We find race is a site of contention in son, the AoIR guidance recommends an inductive,
discussions of who are “true” incels. Gatekeeping ongoing and context-specific approach to ethics
around incel authenticity is common. throughout the research process. At all stages, this
Negotiation around race and inceldom, as well as involves being mindful of the vulnerability of the
intersectional racism and misogyny in incel forums community under study and taking efforts to pro-
would be a fruitful avenue for future work. This tect them where appropriate, while balancing their
dataset could also be compared with other incel rights with social benefits and the researcher’s right
discussions, such as incels.me and earlier banned to conduct research.
subreddits r/incels and r/braincels. The role of plat- Following this guidance, we subscribe to a util-
form affordances and informal mentorship on the itarian philosophy where we focus on doing the
platform could be further investigated, as Perry and greatest good for the greatest number of people.
DeDeo (2021) mapped different user pathways in In the case of black-pilled incels, we believe the
and out of r/TheRedPill. Further network analysis necessity to better understand this potentially dan-
could reveal how the behaviors we identify, includ- gerous group outweighs the possible damage to
ing a rise in mentions of marginalized and political forum members. For this reason, in addition to the
identities, were spread in this community and why. AoIR guidance outlined above, we have followed
some commonly accepted standards to protect par-
Limitations
ticipants and refrain from amplifying misogynist
Our approaches largely focus on explicit mentions voices.
of identity terms. This does not capture whether Data was collected only from publicly available
the identity term is the target of hate speech, which online message boards and no private or identifiable
would require further analysis. This approach also information has been included in this manuscript.
We are not publishing user names, though we did observe them in our analysis of central users. We also did not subscribe to any channels or recirculate any content, to ensure our work does not contribute to the monetization of the forum or associated accounts. Following the WOAH recommendation, we paraphrase posts to retain key aspects while protecting users’ privacy.

Acknowledgements

This work was supported in part by the Collaboratory Against Hate: Research and Action Center at Carnegie Mellon University and the University of Pittsburgh. The Center for Informed Democracy and Social Cybersecurity at Carnegie Mellon University also provided support.

References

Hind S. Alatawi, Areej M. Alhothali, and Kawthar M. Moria. 2021. Detecting White Supremacist Hate Speech Using Domain Specific Word Embedding with Deep Learning and BERT. IEEE Access, 9:106363–106374.

Maria Anzovino, Elisabetta Fersini, and Paolo Rosso. 2018. Automatic Identification and Classification of Misogynistic Language on Twitter. In Natural Language Processing and Information Systems, Lecture Notes in Computer Science, pages 57–64. Springer International Publishing.

Stephane J. Baele, Lewys Brace, and Travis G. Coan. 2021. From “Incel” to “Saint”: Analyzing the violent worldview behind the 2018 Toronto attack. Terrorism and Political Violence, 33(8):1667–1691.

David Bamman, Brendan O’Connor, and Noah A. Smith. 2013. Learning Latent Personas of Film Characters. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), pages 352–361.

David Bamman, Ted Underwood, and Noah A. Smith. 2014. A Bayesian Mixed Effects Model of Literary Character. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014), pages 370–379.

Valerio Basile, Cristina Bosco, Elisabetta Fersini, Debora Nozza, Viviana Patti, Francisco Rangel, Paolo Rosso, and Manuela Sanguinetti. 2019. SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter. In Proceedings of the 13th International Workshop on Semantic Evaluation (SemEval-2019), pages 54–63.

Robert D. Benford and David A. Snow. 2000. Framing Processes and Social Movements: An Overview and Assessment. Annual Review of Sociology, 26(1):611–639.

J. M. Berger. 2018. Extremism. MIT Press.

Shiladitya Bhattacharya, Siddharth Singh, Ritesh Kumar, Akanksha Bansal, Akash Bhagat, and Yogesh Dawer. 2020. Developing a Multilingual Annotated Corpus of Misogyny and Aggression. In Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying, pages 158–168, Marseille, France. European Language Resources Association (ELRA).

Mary Bucholtz and Kira Hall. 2005. Identity and interaction: A sociocultural linguistic approach. Discourse Studies, 7(4-5):585–614.

Mary Bucholtz and Kira Hall. 2010. Locating Identity in Language. In Carmen Llamas and Dominic Watt, editors, Language and Identities, pages 18–28. Edinburgh University Press, Edinburgh.

Viv Burr and Penny Dick. 2017. Social Constructionism. In Brendan Gough, editor, The Palgrave Handbook of Critical Social Psychology, pages 59–80. Palgrave Macmillan UK, London.

L. Richard Carley, Jeff Reminga, and Kathleen M. Carley. 2018. ORA & NetMapper. In International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation, volume 3. Springer.

Béatrice Daille. 1994. Approche mixte pour l’extraction automatique de terminologie: statistiques lexicales et filtres linguistiques. Ph.D. thesis, Paris Diderot University.

Dorottya Demszky, Nikhil Garg, Rob Voigt, James Zou, Matthew Gentzkow, Jesse Shapiro, and Dan Jurafsky. 2019. Analyzing Polarization in Social Media: Method and Application to Tweets on 21 Mass Shootings. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2970–3005.

Tracie Farrell, Oscar Araque, Miriam Fernandez, and Harith Alani. 2020. On the use of Jargon and Word Embeddings to Explore Subculture within the Reddit’s Manosphere. In 12th ACM Conference on Web Science, pages 221–230, New York, NY, USA. Association for Computing Machinery.

Tracie Farrell, Miriam Fernandez, Jakub Novotny, and Harith Alani. 2019. Exploring Misogyny across the Manosphere in Reddit. In Proceedings of the 10th ACM Conference on Web Science, pages 87–96, New York, NY, USA. Association for Computing Machinery.

Elisabetta Fersini, Debora Nozza, and Paolo Rosso. 2018a. Overview of the Evalita 2018 Task on Automatic Misogyny Identification (AMI). In Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2018), Turin, Italy.

Elisabetta Fersini, Paolo Rosso, and Maria Anzovino. 2018b. Overview of the Task on Automatic Misogyny Identification at IberEval 2018. In IberEval@SEPLN 2018, pages 214–228, Seville, Spain.

James Paul Gee. 2011. An Introduction to Discourse Analysis: Theory and Method. Routledge, New York.

Debbie Ging. 2019. Alphas, Betas, and Incels: Theorizing the Masculinities of the Manosphere. Men and Masculinities, 22(4):638–657.

Kelly Caroline Gothard. 2021. The Incel Lexicon: Deciphering the Emergent Cryptolect of a Global Misogynistic Community. Master’s thesis, The University of Vermont and State Agricultural College, Vermont, United States.

Ella Guest, Bertie Vidgen, Alexandros Mittos, Nishanth Sastry, Gareth Tyson, and Helen Margetts. 2021. An Expert Annotated Dataset for the Detection of Online Misogyny. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 1336–1350, Online. Association for Computational Linguistics.

Frazer Heritage, Veronika Koller, Alexandra Krendel, and Abi Hawtin. 2019. MANTRaP: A Corpus Approach to Researching Gender in Online Misogynist Communities. In 12th BAAL LGaS SIG.

Sarah Hewitt, T. Tiropanis, and C. Bokhove. 2016. The problem of identifying misogynist language on Twitter (and other online social spaces). In Proceedings of the 8th ACM Conference on Web Science, pages 333–335, New York, NY, USA. Association for Computing Machinery.

Sylvia Jaki, Tom De Smedt, Maja Gwóźdź, Rudresh Panchal, Alexander Rossa, and Guy De Pauw. 2019. Online hatred of women in the Incels.me forum: Linguistic analysis and automatic detection. Journal of Language Aggression and Conflict, 7(2):240–268.

Kenneth Joseph, Wei Wei, Matthew Benigni, and Kathleen M. Carley. 2016. A social-event based approach to sentiment analysis of identities and behaviors in text. The Journal of Mathematical Sociology, 40(3):137–166.

Sonja Kleinke and Birte Bös. 2015. Intergroup rudeness and the metapragmatics of its negotiation in online discussion fora. Pragmatics, 25(1):47–71.

Mirko Lai, Marco Antonio Stranisci, Cristina Bosco, Rossana Damiano, and Viviana Patti. 2021. HaMor at the Profiling Hate Speech Spreaders on Twitter Notebook for PAN at CLEF 2021. Technical report.

Jack LaViolette and Bernie Hogan. 2019. Using platform signals for distinguishing discourses: The case of men’s rights and men’s liberation on Reddit. In Proceedings of the 13th International Conference on Web and Social Media, ICWSM 2019, pages 323–334.

Robin Mamié, Manoel Horta Ribeiro, and Robert West. 2021. Are Anti-Feminist Communities Gateways to the Far Right? Evidence from Reddit and YouTube. In Proceedings of the 13th ACM Web Science Conference 2021, pages 139–147, New York, NY, USA. Association for Computing Machinery.

Annette Markham and Elizabeth Buchanan. 2012. Ethical Decision-Making and Internet Research: Recommendations from the AoIR Ethics Working Committee (Version 2.0). Technical report.

December Maxwell, Sarah R. Robinson, Jessica R. Williams, and Craig Keaton. 2020. “A Short Story of a Lonely Guy”: A Qualitative Thematic Analysis of Involuntary Celibacy Using Reddit. Sexuality & Culture, 24(6):1852–1874.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119.

Cynthia Miller-Idriss. 2022. Hate in the Homeland: The New Global Far Right. Princeton University Press.

Ioannis Mollas, Zoe Chrysopoulou, Stamatis Karlos, and Grigorios Tsoumakas. 2020. ETHOS: an Online Hate Speech Detection Dataset. ArXiv:2006.08328.

Angela Nagle. 2015. An investigation into contemporary online anti-feminist movements. Ph.D. thesis, Dublin City University, Dublin, Ireland.

Chloe Perry and Simon DeDeo. 2021. The Cognitive Science of Extremist Ideologies Online. ArXiv:2110.00626.

Fabio Poletto, Valerio Basile, Manuela Sanguinetti, Cristina Bosco, and Viviana Patti. 2021. Resources and benchmark corpora for hate speech detection: a systematic review. In Language Resources and Evaluation, volume 55, pages 477–523. Springer Science and Business Media.

Meredith L. Pruden. 2021. “Maintaining Frame” in the Incelosphere: Mapping the Discourses, Representations and Geographies of Involuntary Celibates Online. Ph.D. thesis, Georgia State University, Atlanta, Georgia, USA.

Jing Qian, Anna Bethke, Yinyin Liu, Elizabeth Belding, and William Yang Wang. 2019. A Benchmark Dataset for Learning to Intervene in Online Hate Speech. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pages 4754–4763.

Manoel Horta Ribeiro, Jeremy Blackburn, Barry Bradlyn, Emiliano De Cristofaro, Gianluca Stringhini, Summer Long, Stephanie Greenberg, and Savvas Zannettou. 2021. The Evolution of the Manosphere across the Web. In Proceedings of the International AAAI Conference on Web and Social Media, volume 15, pages 196–207.

François Role and Mohamed Nadif. 2011. Handling the Impact of Low Frequency Events on Co-occurrence based Measures of Word Similarity - A Case Study of Pointwise Mutual Information. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (KDIR-2011).

Niloofar Safi Samghabadi, Parth Patwa, Srinivas Pykl, Prerana Mukherjee, Amitava Das, and Thamar Solorio. 2020. Aggression and Misogyny Detection using BERT: A Multi-Task Approach. In Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying, pages 11–16.

Manuela Sanguinetti, Fabio Poletto, Cristina Bosco, Viviana Patti, and Marco Stranisci. 2018. An Italian Twitter Corpus of Hate Speech against Immigrants. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC’18), pages 2798–2895.

Maarten Sap, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, and Yejin Choi. 2020. Social Bias Frames: Reasoning about Social and Power Implications of Language. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5477–5490.

Joseph Seering, Felicia Ng, Zheng Yao, and Geoff Kaufman. 2018. Applications of Social Identity Theory to Research and Design in Social Computing. In Proceedings of the ACM Conference on Human-Computer Interaction, volume 2 of CSCW, pages 1–33. Issue: January.

Andrew Sellars. 2016. Defining Hate Speech. Technical report, Berkman Klein Center.

B. Simons and D. B. Skillicorn. 2020. A Bootstrapped Model to Detect Abuse and Intent in White Supremacist Corpora. In Proceedings - 2020 IEEE International Conference on Intelligence and Security Informatics, ISI 2020. Institute of Electrical and Electronics Engineers Inc.

Michael Stubbs. 2001. Words and phrases: Corpus studies of lexical semantics. Blackwell Publishers, Oxford.

Henri Tajfel. 1974. Social identity and intergroup behaviour. Social Science Information, 13(2):65–93.

Joshua Uyheng and Kathleen M. Carley. 2020. Bots and online hate during the COVID-19 pandemic: case studies in the United States and the Philippines. Journal of Computational Social Science, 3(2):445–468.

Joshua Uyheng and Kathleen M. Carley. 2021. An Identity-Based Framework for Generalizable Hate Speech Detection. In International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation, pages 121–130.

Zeerak Waseem and Dirk Hovy. 2016. Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter. In Proceedings of the NAACL-HLT 2016, pages 88–93.

Matthew L. Williams, Pete Burnap, and Luke Sloan. 2017. Towards an Ethical Framework for Publishing Twitter Data in Social Research: Taking into Account Users’ Views, Online Context and Algorithmic Estimation. Sociology, 51(6):1149–1168.

Michael Miller Yoder. 2021. Computational Models of Identity Presentation in Language. Ph.D. thesis, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.

Michael Miller Yoder, Ahmad Diab, David West Brown, and Kathleen M. Carley. 2023. A Weakly Supervised Classifier and Dataset of White Supremacist Language. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers).

Michael Miller Yoder, Lynnette Ng, David West Brown, and Kathleen Carley. 2022. How Hate Speech Varies by Target Identity: A Computational Analysis. In Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL), pages 27–39, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.

Michael Miller Yoder, Qinlan Shen, Yansen Wang, Alex Coda, Yunseok Jang, Yale Song, Kapil Thadani, and Carolyn P. Rosé. 2020. Phans, Stans and Cishets: Self-Presentation Effects on Content Propagation in Tumblr. In 12th ACM Conference on Web Science (WebSci ’20), pages 39–48.

A Additional Tables

Table 3 shows actions and attributes associated with “truecels” and “fakecels,” common incel variants mentioned on incels.is.

Identity   Type   Top PMI³ terms
Truecels   Attr   biggest, truest, real, actual, giga, ultimate, legit, confirmed, certified, blackpilled, ugly, absolute, genuine, hope, ethnic, white, old, automatic, fellow, bigger, other
Truecels   ActS   ascend, get, post, know, rise, knows, remain, go, looks, confirmed, relate, rot, understand, cope, use, tried, need, browse, make, suicide, spend, ldar, roped, take
Truecels   ActO   pleasure, confirmed, banning, mog, rejected, bluepilled, help, laid, banned, born, save, doomed, over, seen, calling, see, die, excluded, bullying, dude, mock, mocking
Fakecels   Attr   fucking, larping, biggest, volcel, inb4, obvious, banned, known, defending, other, massive, fuck, tbh, confirmed, gtfo, potential, normie, likely, one, looking, users
Fakecels   ActS   detected, gtfo, confirmed, spotted, get, ascend, post, say, need, banned, try, posting, larping, fuck, smh, come, leave, coming, bragging, go, worry, ruining, invade, piss
Fakecels   ActO   ban, calling, banned, gtfo, weed, defending, expose, fucking, larping, spot, call, exposed, defends, exposing, smell, purged, banning, defend, confirmed, found

Table 3: Representative actions and attributes associated with truecels and fakecels identity group terms in the
incels.is dataset. Attr refers to attributes, while ActS are actions for which the identity group is a subject and ActO
are actions for which the identity group is an object.
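Table 3 ranks associated terms by PMI³. As a minimal sketch only, assuming the (identity, word) pairs for one relation type (Attr, ActS, or ActO) have already been extracted and that probabilities are estimated as relative frequencies over those pairs (this section does not specify the estimation details), the function below computes PMI³(x, y) = log₂(p(x, y)³ / (p(x) p(y))) for one identity term x and returns the top-ranked words y. The name `pmi3_ranking`, its input format, and the base-2 logarithm are illustrative assumptions, not the authors' implementation; the log base does not change the ranking.

```python
import math
from collections import Counter


def pmi3_ranking(pairs, identity, top_n=10):
    """Rank words by PMI^3 association with one identity term.

    Hypothetical helper: `pairs` is a list of (identity_term, word) tuples,
    e.g. attribute or action relations already extracted from posts.
    PMI^3(x, y) = log2(p(x, y) ** 3 / (p(x) * p(y))), with every probability
    estimated as a relative frequency over all pairs.
    """
    pairs = list(pairs)
    total = len(pairs)
    if total == 0:
        return []

    joint_counts = Counter(pairs)                    # n(x, y)
    identity_counts = Counter(x for x, _ in pairs)   # n(x)
    word_counts = Counter(y for _, y in pairs)       # n(y)

    p_x = identity_counts[identity] / total
    scores = {}
    for (x, y), n_xy in joint_counts.items():
        if x != identity:
            continue
        p_xy = n_xy / total
        p_y = word_counts[y] / total
        # Cubing p(x, y) favours frequent pairs over rare ones. Because
        # p(x, y) <= min(p(x), p(y)) <= 1, scores are never positive;
        # only their relative order matters.
        scores[y] = math.log2(p_xy ** 3 / (p_x * p_y))

    # Highest (least negative) scores first.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    # Toy pairs for illustration only; not data from the paper.
    toy_pairs = (
        [("truecel", "ugly")] * 3
        + [("truecel", "real")] * 1
        + [("fakecel", "larping")] * 3
        + [("fakecel", "real")] * 3
    )
    for word, score in pmi3_ranking(toy_pairs, "truecel"):
        print(f"{word}\t{score:.2f}")
```

On the toy pairs, "ugly" ranks above "real" for "truecel", partly because "real" also co-occurs with "fakecel", which raises p(y). Cubing the joint probability is what separates PMI³ from plain PMI: it tempers plain PMI's preference for rare pairs, the low-frequency problem examined by Role and Nadif (2011), and follows the cubic association measure usually traced to Daille (1994).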
