Identity Construction in A Misogynist Incels Forum
Our dataset contains 6,248,234 English-language public comments posted between the forum’s creation in November 2017 and scraping in April 2021.1 It includes forum and thread names, as well as the date of posting, user names, and the comment’s full text. However, it does not contain images, which is a limitation.

White supremacist dataset We compare identity mentions on incels.is to another common source of unlabeled hate speech: white supremacist texts. From a large, multi-domain, English-language white supremacist dataset (Yoder et al., 2023), we select posts from online forums in a similar time frame as the incels data, 2015-2019 (the latest year available in the white supremacist dataset). This subset includes 3,410,623 posts from Stormfront, Iron March, and 4chan /pol/ in threads with fascist and white supremacist topics or posted by users choosing white supremacist, Nazi, Confederate, or fascist flags.

4 Methods

We take a quantitative approach to studying discursive identity construction (Bucholtz and Hall, 2005; Gee, 2011), borrowing a focus on in-group and out-group identity presentation from social identity theory (Tajfel, 1974; Seering et al., 2018). Specifically, we examine the use of identity terms and the immediate contexts in which they appear. A few mentions of an identity group may not represent the attitudes of participants, but associations made repeatedly over the course of a 6-million-post corpus are more likely to capture widely shared beliefs (Stubbs, 2001).

Grouping identity terms We aggregate identity terms referring to similar groups (such as LGBTQ+ people) and then further group those identities into broader demographic categories (such as gender/sexuality). To form these groupings, we adapt identity term group labels used in hate speech research from Uyheng and Carley (2020) and Yoder et al. (2022).2 Intersectional identity terms are counted for all groups indicated by the term. For example, “white women” is counted for both “white” and “women.”

Identity lexicon expansion To capture the neologisms that incel communities are known for (Jaki et al., 2019; Gothard, 2021), we expand our generic identity lexicon to nearest neighbors in word embedding space, a common approach (Demszky et al., 2019; Simons and Skillicorn, 2020; Lai et al., 2021). We trained a 300-dimension word2vec model (Mikolov et al., 2013) over our data. We then manually examined terms that appear at least 1,000 times and rank among the top 30 nearest neighbors by cosine distance to either a) the 30 most frequent generic identity terms or b) the mean of identity term embeddings in an identity group. This resulted in 84 new terms, the most frequent of which are in Table 1.

Varieties of “incels” It is common in incel discourse to refer to different types of incels with terms including a “cel” suffix (Gothard, 2021). For example, “tallcels” refers to tall incels, and the racist terms “currycels” and “ricecels” refer to South Asian and East Asian incels, respectively. Excluding usernames, over 1,500 unique words used in our incels.is dataset contained the string “cel,” many of which referred to varieties of incels. We examined the 100 most frequent words containing “cel”
and grouped words that referred to incel variants, except those referring to “fake” incels, within the incels identity group for further analysis.

Central forum users We also analyze how forum leaders (prototypical incels) use identity terms. To find such leaders based on network structure, we construct an undirected graph. Nodes are users, and edges are weighted by the number of threads (out of 154,049) that two users share. This graph contains 6,819 users and 3,889,054 links. We operationalize central users as the top 5% ranked by eigenvector centrality, which measures whether users share threads with other highly connected users. These central users had roughly as many posts as all other users combined.

4.2 Associations with identity terms

Beyond the occurrence of identity term mentions, we analyze associations made with identities in their immediate contexts. Specifically, we extract actions taken by or to these groups, as well as attributes associated with them, a simple approach to analyzing the presentation of entities in discourse (Bamman et al., 2013, 2014; Yoder, 2021). For actions, we use a dependency parse to extract verbs for which an identity term is the subject or object. Attributes are adjectives and appositives whose head word is an identity term.

We surface the actions and attributes most distinctively associated with each identity group with PMI³ (Daille, 1994; Role and Nadif, 2011), a variant of pointwise mutual information that lowers the ranking of low-frequency terms.

Table 1: The 25 most frequent novel identity terms and the 25 most frequent “cel” terms found in the incels.is dataset. Plural and singular mentions are combined, as are “-ing” terms with their roots.
Community-specific identity terms: foids, chads, manlets, stacies, boyo, femoids, ethnics, chadlites, roasties, holes, betabux, landwhales, waifus, jbs, chicks, noodlewhores, soyboy, br0, aspie, betas, thots, traps, beckies, m8, boi
“Cel” variants: truecels, fakecels, volcels, greycels, escortcelling, gymcelling, ricecels, mentalcels, currycels, fatcels, femcels, whitecels, framecels, youngcels, oldcels, blackcels, ethnicels, brocels, itcels, incelistan, nearcels, tallcels, shortcels, locationcels, bluecels

5 Results

5.1 Distribution of identity mentions (RQ1)

Figure 1 shows the prevalence of mentions of the most frequently mentioned identity groups in our incels.is dataset. Expanding the generic identity list with context-specific identity terms dramatically increases the detection of mentions of all identity groups, especially women, men, and neurodiverse people.3 Adding these context-specific identity terms increases the total number of mentions identified from 6.46 million to 8.91 million–a jump of 37.8%–

1 This dataset, without any private or identifying information, will be made available to vetted researchers upon publication of the main paper associated with it.
2 Non-proprietary portions of identity term lexicons (including groupings and categorizations) and code for analyses in this paper are available at https://github.com/michaelmilleryoder/incels_identities.
3 Many incels self-identify on the autism spectrum.

[Figure 1: mentions per post by identity group (Women, Men, Youth, Neurodiverse, LGBTQ+, Asian, Black, White, Jews), by dataset/identity lexicon (e.g., Incels data).]
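The candidate search behind the identity lexicon expansion in Section 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ code. The embeddings are assumed to be already trained (the paper uses a 300-dimension word2vec model) and held in a plain dictionary. The function names and toy vectors are hypothetical. The criterion is read as “take the 30 nearest neighbors, then keep those seen at least 1,000 times.” The output is only a candidate list; the paper examined candidates manually.

```python
import math
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine similarity (cosine distance is 1 minus this value)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Element-wise mean, e.g. of all term embeddings in one identity group."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def candidate_terms(
    query: Vector,
    embeddings: Dict[str, Vector],
    counts: Dict[str, int],
    lexicon: Set[str],
    k: int = 30,
    min_count: int = 1000,
) -> List[str]:
    """Top-k nearest neighbors of `query` not already in the lexicon,
    then filtered to words occurring at least `min_count` times."""
    neighbors = sorted(
        (w for w in embeddings if w not in lexicon),
        key=lambda w: cosine_similarity(query, embeddings[w]),
        reverse=True,
    )[:k]
    return [w for w in neighbors if counts.get(w, 0) >= min_count]


# Toy 3-dimensional embeddings and corpus counts (hypothetical values).
embeddings = {
    "women": [1.0, 0.1, 0.0],
    "men": [0.1, 1.0, 0.0],
    "foids": [0.9, 0.2, 0.1],
    "chads": [0.2, 0.9, 0.1],
    "gym": [0.0, 0.1, 1.0],
}
counts = {"women": 50000, "men": 48000, "foids": 20000, "chads": 15000, "gym": 3000}
lexicon = {"women", "men"}

# Query a) a single generic identity term, b) the mean of a group's embeddings.
print(candidate_terms(embeddings["women"], embeddings, counts, lexicon, k=2))
group_query = mean_vector([embeddings["women"], embeddings["men"]])
print(candidate_terms(group_query, embeddings, counts, lexicon, k=2, min_count=16000))
```

In practice the neighbor search would likely come from the embedding library itself (for example, gensim’s `KeyedVectors.most_similar`), but the ranking and frequency filter work the same way.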
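The central-user selection in Section 4 (shared-thread graph, eigenvector centrality, top 5%) can be sketched as below. This assumes posts are available as (user, thread) pairs. The helper names and toy data are hypothetical, not the authors’ implementation. Centrality is computed by power iteration on A + I, which has the same leading eigenvector as the weighted adjacency matrix A.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

Graph = Dict[str, Dict[str, int]]


def shared_thread_graph(posts: Iterable[Tuple[str, str]]) -> Graph:
    """Undirected user graph; edge weight = number of threads both users posted in."""
    users_by_thread = defaultdict(set)
    for user, thread in posts:
        users_by_thread[thread].add(user)
    graph: Graph = defaultdict(dict)
    for users in users_by_thread.values():
        for u, v in combinations(sorted(users), 2):
            graph[u][v] = graph[u].get(v, 0) + 1
            graph[v][u] = graph[v].get(u, 0) + 1
    return dict(graph)


def eigenvector_centrality(graph: Graph, max_iter: int = 1000, tol: float = 1e-12) -> Dict[str, float]:
    """Power iteration on (A + I): same leading eigenvector as the weighted
    adjacency matrix A, but it does not oscillate on bipartite-like structure."""
    nodes = list(graph)
    x = {n: 1.0 for n in nodes}
    for _ in range(max_iter):
        nxt = {n: x[n] + sum(w * x[m] for m, w in graph[n].items()) for n in nodes}
        norm = math.sqrt(sum(v * v for v in nxt.values())) or 1.0
        nxt = {n: v / norm for n, v in nxt.items()}
        if sum(abs(nxt[n] - x[n]) for n in nodes) < tol * len(nodes):
            return nxt
        x = nxt
    return x


def top_fraction(scores: Dict[str, float], fraction: float = 0.05) -> List[str]:
    """Highest-scoring users making up `fraction` of all users (at least one)."""
    k = max(1, math.ceil(len(scores) * fraction))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy (user, thread) pairs; user names and thread IDs are hypothetical.
posts = [("ann", "t1"), ("bob", "t1"), ("cat", "t1"),
         ("bob", "t2"), ("cat", "t2"), ("bob", "t3"), ("dan", "t3")]
graph = shared_thread_graph(posts)
scores = eigenvector_centrality(graph)
print(top_fraction(scores))  # top 5% of 4 users rounds up to 1 user
```

On a graph the size of the forum’s, a library routine such as networkx’s `eigenvector_centrality` (which accepts a `weight` argument) would be the more practical choice. The hand-rolled version only makes the computation explicit.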
Figure 2: Selected identity group and category mentions over time in the incels.is dataset. Mentions of other identity groups remain steady.
[Plot: log word probability (about −3.5 to −2) by month, Jan 2018 to Jan 2021, for the identity groups/categories Black, Jews, LGBTQ+, Men, Women, and Politics.]
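Figure 2’s y-axis is a log word probability per month. The sketch below shows one way such a series could be computed, under assumptions the paper does not state:

- the probability is taken as a group’s identity-term mentions divided by all tokens posted that month;
- the log is base 10;
- tokenization is plain whitespace splitting.

The mini-lexicon is hypothetical. Multi-word terms such as “white women” count toward every group they name, following the intersectional counting rule in Section 4.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical mini-lexicon: term (as a token tuple) -> identity groups it names.
LEXICON: Dict[Tuple[str, ...], List[str]] = {
    ("women",): ["women"],
    ("foids",): ["women"],
    ("men",): ["men"],
    ("jews",): ["jews"],
    ("white", "women"): ["white", "women"],  # intersectional: counts for both groups
}
MAX_TERM_LEN = max(len(term) for term in LEXICON)


def group_mentions(tokens: List[str]) -> Counter:
    """Greedy longest-match lookup of lexicon terms in a token list."""
    found = Counter()
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_TERM_LEN, len(tokens) - i), 0, -1):
            groups = LEXICON.get(tuple(tokens[i:i + n]))
            if groups:
                found.update(groups)
                i += n
                break
        else:  # no lexicon term starts at position i
            i += 1
    return found


def monthly_log_probability(posts: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """posts are (month, text) pairs; returns month -> group -> log10(mentions / tokens)."""
    mentions: Dict[str, Counter] = defaultdict(Counter)
    tokens_per_month: Counter = Counter()
    for month, text in posts:
        tokens = text.lower().split()
        tokens_per_month[month] += len(tokens)
        mentions[month].update(group_mentions(tokens))
    return {
        month: {g: math.log10(c / tokens_per_month[month]) for g, c in mentions[month].items()}
        for month in tokens_per_month
    }


posts = [
    ("2018-01", "white women and men"),
    ("2018-01", "foids everywhere"),
    ("2018-02", "jews and men"),
]
print(monthly_log_probability(posts))
```

Groups with no mentions in a month are simply absent from that month’s dictionary, since the log of zero is undefined. A plotting step would need to decide how to show such gaps.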
Table 2: Actions and attributes associated with identity group terms in incels.is dataset. Attr refers to attributes,
while ActS are actions for which the identity is a subject and ActO are actions for which the identity is an object.
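The Attr, ActS, and ActO columns in Table 2 come from the dependency-based extraction described in Section 4.2. Below is a minimal sketch. It assumes sentences have already been parsed into tokens with a head index and Universal Dependencies-style labels. The `Token` class, label sets, and hand-built parse are illustrative assumptions, not the authors’ parser output. Identity terms are matched on lemmas.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set

# Assumed Universal Dependencies-style relation labels.
SUBJECT_DEPS = {"nsubj"}
OBJECT_DEPS = {"obj", "dobj"}  # "dobj" covers older label sets
ATTRIBUTE_DEPS = {"amod", "appos"}


@dataclass
class Token:
    text: str
    lemma: str
    pos: str   # coarse part of speech, e.g. "VERB", "NOUN", "ADJ"
    dep: str   # relation to the head token
    head: int  # index of the head token; the root points to itself


def extract_associations(sentence: List[Token], identity_terms: Set[str]) -> Dict[str, Counter]:
    """Count (identity, verb) pairs for ActS/ActO and (identity, modifier) pairs for Attr."""
    found: Dict[str, Counter] = defaultdict(Counter)
    for tok in sentence:
        head = sentence[tok.head]
        # Actions: the identity term is the subject or object of a verb.
        if tok.lemma in identity_terms and head.pos == "VERB":
            if tok.dep in SUBJECT_DEPS:
                found["ActS"][(tok.lemma, head.lemma)] += 1
            elif tok.dep in OBJECT_DEPS:
                found["ActO"][(tok.lemma, head.lemma)] += 1
        # Attributes: adjectives or appositives whose head is an identity term.
        if tok.dep in ATTRIBUTE_DEPS and head.lemma in identity_terms:
            found["Attr"][(head.lemma, tok.lemma)] += 1
    return found


# Hand-built parse of the toy sentence "ugly men hate young women".
sentence = [
    Token("ugly", "ugly", "ADJ", "amod", 1),
    Token("men", "man", "NOUN", "nsubj", 2),
    Token("hate", "hate", "VERB", "ROOT", 2),
    Token("young", "young", "ADJ", "amod", 4),
    Token("women", "woman", "NOUN", "obj", 2),
]
assoc = extract_associations(sentence, {"man", "woman"})
print(dict(assoc))
```

Passive subjects (`nsubj:pass`) are left out here. Whether to count them as subjects or objects is a design choice the section does not specify.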
5.4 Associations with identities (RQ4)

Terms commonly associated with identity groups are presented in Table 2. Across groups, we find that the most frequent attributes relate to physical features (“ugly,” “short,” etc.). The use of these descriptors suggests hierarchies based on appearance, race, and gender. This focus on physical appearance is apparent in top terms used to describe women, including “young,” “fat,” and “hot.” Example uses show the hierarchies of appearance that incels apply to women: “some can’t tell a beta female apart from a hot whore and so lump all types of the female sub species together.”4 Actions for which women are subjects suggest incels’ speculation about what women “want,” “love,” or “hate.” Common actions for which women are grammatical objects include “fuck” and “hate.” These are evidence of the forum’s misogyny, which casts women as things to be “attracted” or controlled.

Men are also discussed with an emphasis on physical appearance, as well as domination. “Ugly,” “looking,” “average,” and “tall” are all top attributions for men. Top actions include “get,” “need,” and “mogs,” an incel neologism meaning “dominate.” An example post that reinforces gender hierarchies reads, “men literally mog femoids across the board, yet the foids whine about it.”

Race is relevant in discussions of gendered identities. “White,” for example, is a top descriptor for both women and men, as is “black” for men. Top terms suggest a negotiated association of superiority with whiteness. White people are the grammatical objects of “worship” but also of “hate.” “Pure” and “nordic,” common white supremacist descriptors, are distinctive attributes used for white people.

4 Quotes are paraphrased for privacy (Williams et al., 2017).
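Section 4.2 ranks actions and attributes with PMI³ (Daille, 1994; Role and Nadif, 2011). For a group g and word w, PMI^k(g, w) = log₂(P(g, w)^k / (P(g) P(w))). Ordinary PMI is k = 1. Raising the joint probability to the third power favors pairs that are frequent as well as exclusive. The sketch below assumes probabilities are estimated from (group, word) pair counts such as those a dependency extraction yields. The base-2 log and the toy counts are illustrative choices, not values from the paper.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def pmi_k(pair_counts: Counter, k: int = 3) -> Dict[Tuple[str, str], float]:
    """PMI^k for (group, word) pairs: log2(P(g, w)^k / (P(g) P(w))).
    k=1 is ordinary PMI; k=3 is the PMI^3 variant."""
    total = sum(pair_counts.values())
    group_counts: Counter = Counter()
    word_counts: Counter = Counter()
    for (group, word), c in pair_counts.items():
        group_counts[group] += c
        word_counts[word] += c
    # With P = count / total: P(g,w)^k / (P(g) P(w)) = c^k / (c_g * c_w * total^(k-2)).
    return {
        (group, word): math.log2(c ** k / (group_counts[group] * word_counts[word] * total ** (k - 2)))
        for (group, word), c in pair_counts.items()
    }


def top_words(scores: Dict[Tuple[str, str], float], group: str, n: int = 10) -> List[str]:
    """Words most distinctively associated with `group`, highest score first."""
    ranked = sorted(((w, s) for (g, w), s in scores.items() if g == group),
                    key=lambda ws: (-ws[1], ws[0]))
    return [w for w, _ in ranked[:n]]


# Toy (group, attribute) counts; all values are hypothetical.
pair_counts = Counter({
    ("women", "young"): 40, ("women", "fat"): 30, ("women", "gorgeous"): 1,
    ("men", "tall"): 40, ("men", "young"): 10, ("men", "ugly"): 30,
})
print(top_words(pmi_k(pair_counts, k=1), "women"))  # ordinary PMI
print(top_words(pmi_k(pair_counts, k=3), "women"))  # PMI^3
```

With these toy counts, ordinary PMI gives the one-off “gorgeous” essentially the same score as the frequent “fat”, since both occur only with “women”. PMI³ pushes “gorgeous” to the bottom, which is the behavior described in Section 4.2.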
In contrast, Asian people are cast as subjects of actions like “worship” and “cope.”

We find a range of common stereotypes for minoritized identities, particularly conspiratorial antisemitic tropes. These appear in terms suggesting a global Jewish conspiracy (e.g., “elite” and “control”) and in derogatory associations with the Holocaust (“gas”). One example post reads, “the endgame is a global Jewish Communist dictatorship,” while another mixes antisemitic conspiracy theories with anti-feminism: “feminism is a subversive Jewish movement designed to ruin us.”

LGBTQ+ characterizations are negative and associated with inauthenticity (e.g., “larping,” or live action role playing). For example, one post reads, “Lesbians don’t exist. They’re just bisexual foids who like women but still can’t resist Chad.”

Violence is associated with Black people (“commit”), for example in one post that reads, “I don’t hate blacks because they’re ugly, I hate them because no matter where they are they commit crime.”

In-group identity associations Victimhood, race, and authenticity are common themes associated with identity mentions of incels themselves and “incel” variants on the platform.

The most frequent lexical variations containing the “cel” suffix are in Table 1. Top terms relate to authenticity, including “fakecel,” “truecel,” and “volcel” (“voluntary celibate”), a focus that has also been observed in incel subreddits (Gothard, 2021). Platform affordances highlight distinctions between frequent and infrequent posters: “greycels,” who have posted fewer than 500 times, have a gray-colored username and are often deemed inauthentic. Variants related to race (“whitecels,” “ethnicels”) are also frequent, suggesting the importance of race in incel self-classification (Jaki et al., 2019; Farrell et al., 2020). Also visible in these “cel” variations is a set of categories based on the familiar theme of physical appearance (e.g., “fatcels” and “youngcels”). “Femcels,” or female incels, are frequently mentioned, usually derided as outside the inherent masculinity of inceldom.

Incels are cast as merely “existing” or “coping” (Table 2), while others “hate” them or are “against” them. This victimhood includes common masculine tropes, such as a supposed inability to control themselves. From one post: “we can’t control what we want, devaluation of women is a coping mechanism for not being able to elicit a biological response in them.” Common far-right narratives of victimhood at the hands of corporations, Jewish people, and the media are also present: “Jews and the media hate incels, and the gaming industry is full of SJWs [social justice warriors].”

Race is also important, and controversial, in associations made with incels. “White” is a top incels attribute on the forum, and both “ethnic” and “white” are associated with truecels (see Table 3 in Appendix A for top terms related to truecels and fakecels on the forum). There is controversy over which races occupy what positions in an assumed hierarchy. This controversy often centers on the “just be white” (JBW) theory, which holds that white men have access to sexual relationships with women of all races. Some posts support this theory, e.g., “being white is a +3 when it comes to noodles [Asian women], so a 4/10 white is better than a 6/10 ethnic.” Others challenge this notion: “a brown man with a chiseled face will mog a white incel everywhere.” Still others echo the white supremacist Great Replacement Theory, blaming JBW as a way for incels of color to “get whitecels out so sh**skins can take over.”

“Real” and “true” are top attributes associated with talk about incels, echoing a focus on authenticity in the top “cel” variants. This boundary-keeping is also visible in the words associated with fakecels (e.g., “detected” and “ban”). Authentic incels are portrayed as follows:
- they are victims of women’s hatred (“if women aren’t trying to kill you, you’re not a true incel”);
- they post a lot (“graycels are a joke with their tiny post counts”);
- they are unattractive (“I’m an incel, of course she said no to my hideous face”);
- they have no female friends (“what true incel has a female friend? stupid newf*g”);
- they do not date (“normie spotted. real incels are doing this all weekend and have no dates”).
They also are unable to “ascend” (i.e., have sex and leave inceldom) and are “mogged” by others. Jaki et al. (2019) found similar themes in an earlier incel dataset.

6 Discussion

Across our quantitative analysis of the distribution of and associations made with identity terms, we see evidence of an ideology where physical appearance determines human value, as prior work on incels has also found (Maxwell et al., 2020; Baele et al., 2021; Pruden, 2021). This ideology essentializes social constructs, such as race and gender, as biological physical features impacting desirability, with controversy over the role of race.

We find strong evidence for gender as a central focus of incel discussion; mentions of men and women far surpass the number of mentions of any other identity. We find that this community commonly uses novel identity terms that may not appear in generic lists, including many derogatory terms for women (“foids,” “landwhales”).

Increases in mentions of LGBTQ+, political, Jewish, and Black identities, often with stereotypes and conspiracy theories, could suggest this community has incorporated broader far-right trends. An increasing politicization is reflected in this example post: “we don’t need society to completely accept the incel ideology, we just need to masquerade as normies and keep bashing women, jews and gays.” Our evidence from text analysis supports the common user movement from manosphere content to alt-right content on YouTube and Reddit that Mamié et al. (2021) found. We find that many associations on incels.is reinforce stereotypes such as LGBTQ+ identities being fake, Black people being criminals, and antisemitic conspiracy theories. Users who are central in the forum’s shared-thread network devote proportionally more identity mentions to Black, LGBTQ+, and Jewish people than average users do. This suggests that leaders on the platform play a role in broadening the discussion to include marginalized identity groups other than women. We also find more mentions of neurodiversity and mental health in this online community than in a dataset of white supremacist online content, which may be part of a victimhood narrative.

The overarching black-pilled ideology of physical appearance determining human worth also extends to talk about incels themselves on incels.is. Jaki et al. (2019) also find this “negative self-image” on a precursor forum, a pattern theorized by Nagle (2015) and Ging (2019). Though incels are presented as occupying the lowest status among men, we find fierce gatekeeping around who can claim the identity and its perceived victimhood. This echoes theoretical work by Kleinke and Bös (2015) finding that online communities often disparage less typical members along with out-groups.

Central users are active in discussions of authenticity. Such victimhood could lead “authentic” misogynist incels to pursue symbolic or material action against “fake” incels in the community. It could also lead them to act against the perceived unjust system and the women they believe benefit from it.

Identities constructed in interaction are negotiated (Bucholtz and Hall, 2010); we find contention around race in inceldom. “White” is associated with both true and fake incels on the platform, often in connection with the folk JBW theory that white men appeal to women of all races.

Implications for automated hate speech detection Central to many hate speech definitions is whether a text denigrates groups based on identity characteristics (Sellars, 2016; Sanguinetti et al., 2018; Poletto et al., 2021). Identity terms are, thus, a major indicator and concern for hate speech detection. In our analysis of identity construction on incels.is, we confirm that men and women are mentioned much more frequently than in a similar source of unlabeled hate speech: white supremacist data. Incel texts, then, may be a good source of unlabeled or annotated data for misogyny detection. The dangerous black-pilled ideology in particular is missing from current misogyny datasets (Guest et al., 2021). Such data should be considered, but the broader issue is a need for subject matter expertise in building datasets for automated hate speech detection. Experts should be consulted about where to look for training data so that specific types of hateful movements, with lexical or other linguistic innovations, are not overlooked.

We find that almost 30% of identity mentions in our dataset involve community-specific neologisms, often derogatory terms against women. Training hate speech classifiers on data that does not include these terms hinders the ability to detect this substantial source of contemporary online misogyny.

Our analysis also draws attention to the ideological associations being made with identities in this discourse space. We find problematic stereotypes against not only women, but also LGBTQ+, Black, and Jewish people. Thus, incel text data is not only a source of misogyny, but also reflects broader trends related to the mainstreaming of far-right beliefs. Particularly pernicious is a black-pilled ideology that physical appearance determines human value, a reinforcement and extension of essentialized gender and racial hierarchies. Hence, fatphobia, homophobia, ableism, and racism are all wrapped up in misogynist incel content. Automatically detecting this broader ideology may be unattainable or extremely difficult with machine learning techniques, but we emphasize that practitioners and researchers should be aware of this ideology. A narrow focus on hate against women from these communities will miss these important and increasing trends toward politicization and hate against other groups.

7 Conclusion and Future Work

The incel movement and the collective identity around it are a relatively new expression of male supremacism. In this paper, we use quantitative text and network analysis techniques to investigate how identities are constructed in discourse on one of the largest incel forums. We study how often identity groups are mentioned over time, as well as the actions and attributes associated with them.

We find that talk about women and men dominates identity mentions on this forum. Even so, mentions of marginalized identities commonly targeted by far-right groups increase from 2017 to 2021, appearing in textual contexts that propagate stereotypes. Many of these mentions use novel, community-specific identity terms that would be missed with generic lists of identities or hate speech training data from other contexts. Future work could systematically evaluate the ability of existing hate speech classifiers to handle this jargon, as well as the particularly dangerous black-pilled ideology.

This ideology is apparent in discussions of identities, including in-group ones, that reinforce rigid physical hierarchies based on attractiveness, gender, and race. We find race is a site of contention in discussions of who are “true” incels. Gatekeeping around incel authenticity is common.

Negotiation around race and inceldom, as well as intersectional racism and misogyny in incel forums, would be a fruitful avenue for future work. This dataset could also be compared with other incel discussions, such as incels.me and the earlier banned subreddits r/incels and r/braincels. The role of platform affordances and informal mentorship on the platform could be further investigated, as Perry and DeDeo (2021) did in mapping different user pathways in and out of r/TheRedPill. Further network analysis could reveal how the behaviors we identify, including a rise in mentions of marginalized and political identities, spread in this community and why.

Limitations

Our approaches largely focus on explicit mentions of identity terms. This does not capture whether the identity term is the target of hate speech, which would require further analysis. This approach also does not capture attitudes held toward high-profile members of those groups, which play a role in circulating associations with identities (such as personal attacks on women in gaming or the use of “George Soros” as shorthand for antisemitic conspiracy theories). Future work may try to capture and measure these attitudes.

Incels.is is a large, popular forum for black-pilled incel discourse, which has a unique and extreme ideology that we argue is under-represented in current hate speech datasets. However, our analysis is limited to this forum. The trends we identify may not apply to more moderate incel discourse (e.g., r/IncelsWithoutHate) or to related online male supremacist movements, such as MGTOW and PUAs. These communities are known to have related but distinct jargon (Farrell et al., 2020). Even so, we emphasize that researchers should recognize these lexical innovations in their annotations for hate speech and include a variety of these communities in training datasets for misogyny.

Ethics Statement

The Association of Internet Researchers (AoIR) acknowledges that internet research is complex, dynamic, and often involves many gray areas. These are specifically related to what constitutes human subjects, private versus public spaces, and data versus persons (Markham and Buchanan, 2012). For this reason, the AoIR guidance recommends an inductive, ongoing, and context-specific approach to ethics throughout the research process. At all stages, this involves being mindful of the vulnerability of the community under study and taking efforts to protect them where appropriate. These efforts must be balanced against social benefits and the researcher’s right to conduct research.

Following this guidance, we subscribe to a utilitarian philosophy where we focus on doing the greatest good for the greatest number of people. In the case of black-pilled incels, we believe the necessity to better understand this potentially dangerous group outweighs the possible damage to forum members. For this reason, in addition to the AoIR guidance outlined above, we have followed some commonly accepted standards to protect participants and refrain from amplifying misogynist voices.

Data was collected only from publicly available online message boards, and no private or identifiable information has been included in this manuscript.
We are not publishing user names, though we did observe them in our analysis of central users. We also did not subscribe to any channels or recirculate any content, to ensure our work does not contribute to the monetization of the forum or associated accounts. Following the WOAH recommendation, we paraphrase posts to retain key aspects while protecting users’ privacy.

Acknowledgements

This work was supported in part by the Collaboratory Against Hate: Research and Action Center at Carnegie Mellon University and the University of Pittsburgh. The Center for Informed Democracy and Social Cybersecurity at Carnegie Mellon University also provided support.

References

Hind S. Alatawi, Areej M. Alhothali, and Kawthar M. Moria. 2021. Detecting White Supremacist Hate Speech Using Domain Specific Word Embedding with Deep Learning and BERT. IEEE Access, 9:106363–106374.
Maria Anzovino, Elisabetta Fersini, and Paolo Rosso. 2018. Automatic Identification and Classification of Misogynistic Language on Twitter. In Natural Language Processing and Information Systems, Lecture Notes in Computer Science, pages 57–64. Springer International Publishing.
Stephane J. Baele, Lewys Brace, and Travis G. Coan. 2021. From “Incel” to “Saint”: Analyzing the violent worldview behind the 2018 Toronto attack. Terrorism and Political Violence, 33(8):1667–1691.
David Bamman, Brendan O’Connor, and Noah A. Smith. 2013. Learning Latent Personas of Film Characters. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), pages 352–361.
David Bamman, Ted Underwood, and Noah A. Smith. 2014. A Bayesian Mixed Effects Model of Literary Character. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014), pages 370–379.
Valerio Basile, Cristina Bosco, Elisabetta Fersini, Debora Nozza, Viviana Patti, Francisco Rangel, Paolo Rosso, and Manuela Sanguinetti. 2019. SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter. In Proceedings of the 13th International Workshop on Semantic Evaluation (SemEval-2019), pages 54–63.
Robert D. Benford and David A. Snow. 2000. Framing Processes and Social Movements: An Overview and Assessment. Annual Review of Sociology, 26(1):611–639.
J. M. Berger. 2018. Extremism. MIT Press.
Shiladitya Bhattacharya, Siddharth Singh, Ritesh Kumar, Akanksha Bansal, Akash Bhagat, and Yogesh Dawer. 2020. Developing a Multilingual Annotated Corpus of Misogyny and Aggression. In Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying, pages 158–168, Marseille, France. European Language Resources Association (ELRA).
Mary Bucholtz and Kira Hall. 2005. Identity and interaction: A sociocultural linguistic approach. Discourse Studies, 7(4-5):585–614.
Mary Bucholtz and Kira Hall. 2010. Locating Identity in Language. In Carmen Llamas and Dominic Watt, editors, Language and Identities, pages 18–28. Edinburgh University Press, Edinburgh.
Viv Burr and Penny Dick. 2017. Social Constructionism. In Brendan Gough, editor, The Palgrave Handbook of Critical Social Psychology, pages 59–80. Palgrave Macmillan UK, London.
L. Richard Carley, Jeff Reminga, and Kathleen M. Carley. 2018. ORA & NetMapper. In International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation, volume 3. Springer.
Béatrice Daille. 1994. Approche mixte pour l’extraction automatique de terminologie: statistiques lexicales et filtres linguistiques. Ph.D. Thesis, Paris Diderot University.
Dorottya Demszky, Nikhil Garg, Rob Voigt, James Zou, Matthew Gentzkow, Jesse Shapiro, and Dan Jurafsky. 2019. Analyzing Polarization in Social Media: Method and Application to Tweets on 21 Mass Shootings. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2970–3005.
Tracie Farrell, Oscar Araque, Miriam Fernandez, and Harith Alani. 2020. On the use of Jargon and Word Embeddings to Explore Subculture within the Reddit’s Manosphere. In 12th ACM Conference on Web Science, pages 221–230, New York, NY, USA. Association for Computing Machinery.
Tracie Farrell, Miriam Fernandez, Jakub Novotny, and Harith Alani. 2019. Exploring Misogyny across the Manosphere in Reddit. In Proceedings of the 10th ACM Conference on Web Science, pages 87–96, New York, NY, USA. Association for Computing Machinery.
Elisabetta Fersini, Debora Nozza, and Paolo Rosso. 2018a. Overview of the Evalita 2018 Task on Automatic Misogyny Identification (AMI). In Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2018), Turin, Italy.
Elisabetta Fersini, Paolo Rosso, and Maria Anzovino. Robin Mamié, Manoel Horta Ribeiro, and Robert West.
2018b. Overview of the Task on Automatic 2021. Are Anti-Feminist Communities Gateways to
Misogyny Identification at IberEval 2018. In the Far Right? Evidence from Reddit and YouTube.
IberEval@SEPLN 2018, pages 214–228, Seville, In Proceedings of the 13th ACM Web Science Con-
Spain. ference 2021, pages 139–147, New York, NY, USA.
Association for Computing Machinery.
James Paul Gee. 2011. An Introduction to Discourse
Analysis: Theory and Method. Routledge, New York. Annette Markham and Elizabeth Buchanan. 2012. Ethi-
cal Decision-Making and Internet Research: Recom-
Debbie Ging. 2019. Alphas, Betas, and Incels: Theoriz- mendations from the AoIR Ethics Working Commit-
ing the Masculinities of the Manosphere. Men and tee (Version 2.0). Technical report.
Masculinities, 22(4):638–657.
December Maxwell, Sarah R. Robinson, Jessica R.
Kelly Caroline Gothard. 2021. The Incel Lexicon: Deci- Williams, and Craig Keaton. 2020. “A Short Story
phering the Emergent Cryptolect of a Global Misog- of a Lonely Guy”: A Qualitative Thematic Analysis
ynistic Community. Master’s thesis, The University of Involuntary Celibacy Using Reddit. Sexuality &
of Vermont and State Agricultural College, Vermont, Culture, 24(6):1852–1874.
United States.
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey
Ella Guest, Bertie Vidgen, Alexandros Mittos, Nishanth Dean. 2013. Distributed Representations of Words
Sastry, Gareth Tyson, and Helen Margetts. 2021. An and Phrases and their Compositionality. In Advances
Expert Annotated Dataset for the Detection of Online Misogyny. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 1336–1350, Online. Association for Computational Linguistics.
Frazer Heritage, Veronika Koller, Alexandra Krendel, and Abi Hawtin. 2019. MANTRaP: A Corpus Approach to Researching Gender in Online Misogynist Communities. In 12th BAAL LGaS SIG.
Sarah Hewitt, T. Tiropanis, and C. Bokhove. 2016. The problem of identifying misogynist language on Twitter (and other online social spaces). In Proceedings of the 8th ACM Conference on Web Science, pages 333–335, New York, NY, USA. Association for Computing Machinery.
Sylvia Jaki, Tom De Smedt, Maja Gwóźdź, Rudresh Panchal, Alexander Rossa, and Guy De Pauw. 2019. Online hatred of women in the Incels.me forum: Linguistic analysis and automatic detection. Journal of Language Aggression and Conflict, 7(2):240–268.
Kenneth Joseph, Wei Wei, Matthew Benigni, and Kathleen M. Carley. 2016. A social-event based approach to sentiment analysis of identities and behaviors in text. The Journal of Mathematical Sociology, 40(3):137–166.
Sonja Kleinke and Birte Bös. 2015. Intergroup rudeness and the metapragmatics of its negotiation in online discussion fora. Pragmatics, 25(1):47–71.
Mirko Lai, Marco Antonio Stranisci, Cristina Bosco, Rossana Damiano, and Viviana Patti. 2021. HaMor at the Profiling Hate Speech Spreaders on Twitter Notebook for PAN at CLEF 2021. Technical report.
Jack LaViolette and Bernie Hogan. 2019. Using platform signals for distinguishing discourses: The case of men’s rights and men’s liberation on Reddit. In Proceedings of the 13th International Conference on Web and Social Media, ICWSM 2019, pages 323–334.
in Neural Information Processing Systems, pages 3111–3119.
Cynthia Miller-Idriss. 2022. Hate in the Homeland: The New Global Far Right. Princeton University Press.
Ioannis Mollas, Zoe Chrysopoulou, Stamatis Karlos, and Grigorios Tsoumakas. 2020. ETHOS: an Online Hate Speech Detection Dataset. ArXiv:2006.08328.
Angela Nagle. 2015. An investigation into contemporary online anti-feminist movements. Ph.D. thesis, Dublin City University, Dublin, Ireland.
Chloe Perry and Simon DeDeo. 2021. The Cognitive Science of Extremist Ideologies Online. ArXiv:2110.00626.
Fabio Poletto, Valerio Basile, Manuela Sanguinetti, Cristina Bosco, and Viviana Patti. 2021. Resources and benchmark corpora for hate speech detection: a systematic review. In Language Resources and Evaluation, volume 55, pages 477–523. Springer Science and Business Media.
Meredith L. Pruden. 2021. “Maintaining Frame” in the Incelosphere: Mapping the Discourses, Representations and Geographies of Involuntary Celibates Online. Ph.D. thesis, Georgia State University, Atlanta, Georgia, USA.
Jing Qian, Anna Bethke, Yinyin Liu, Elizabeth Belding, and William Yang Wang. 2019. A Benchmark Dataset for Learning to Intervene in Online Hate Speech. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pages 4754–4763.
Manoel Horta Ribeiro, Jeremy Blackburn, Barry Bradlyn, Emiliano De Cristofaro, Gianluca Stringhini, Summer Long, Stephanie Greenberg, and Savvas Zannettou. 2021. The Evolution of the Manosphere across the Web. In Proceedings of the International AAAI Conference on Web and Social Media, volume 15, pages 196–207.
François Role and Mohamed Nadif. 2011. Handling the Impact of Low Frequency Events on Co-occurrence based Measures of Word Similarity - A Case Study of Pointwise Mutual Information. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (KDIR-2011).
Niloofar Safi Samghabadi, Parth Patwa, Srinivas Pykl, Prerana Mukherjee, Amitava Das, and Thamar Solorio. 2020. Aggression and Misogyny Detection using BERT: A Multi-Task Approach. In Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying, pages 11–16.
Manuela Sanguinetti, Fabio Poletto, Cristina Bosco, Viviana Patti, and Marco Stranisci. 2018. An Italian Twitter Corpus of Hate Speech against Immigrants. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC’18), pages 2798–2895.
Maarten Sap, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, and Yejin Choi. 2020. Social Bias Frames: Reasoning about Social and Power Implications of Language. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5477–5490.
Joseph Seering, Felicia Ng, Zheng Yao, and Geoff Kaufman. 2018. Applications of Social Identity Theory to Research and Design in Social Computing. In Proceedings of the ACM Conference on Human-Computer Interaction, volume 2 of CSCW, pages 1–33. Issue: January.
Andrew Sellars. 2016. Defining Hate Speech. Technical report, Berkman Klein Center.
B. Simons and D. B. Skillicorn. 2020. A Bootstrapped Model to Detect Abuse and Intent in White Supremacist Corpora. In Proceedings - 2020 IEEE International Conference on Intelligence and Security Informatics, ISI 2020. Institute of Electrical and Electronics Engineers Inc.
Zeerak Waseem and Dirk Hovy. 2016. Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter. In Proceedings of the NAACL-HLT 2016, pages 88–93.
Matthew L. Williams, Pete Burnap, and Luke Sloan. 2017. Towards an Ethical Framework for Publishing Twitter Data in Social Research: Taking into Account Users’ Views, Online Context and Algorithmic Estimation. Sociology, 51(6):1149–1168.
Michael Miller Yoder. 2021. Computational Models of Identity Presentation in Language. Ph.D. thesis, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.
Michael Miller Yoder, Ahmad Diab, David West Brown, and Kathleen M. Carley. 2023. A Weakly Supervised Classifier and Dataset of White Supremacist Language. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers).
Michael Miller Yoder, Lynnette Ng, David West Brown, and Kathleen Carley. 2022. How Hate Speech Varies by Target Identity: A Computational Analysis. In Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL), pages 27–39, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Michael Miller Yoder, Qinlan Shen, Yansen Wang, Alex Coda, Yunseok Jang, Yale Song, Kapil Thadani, and Carolyn P. Rosé. 2020. Phans, Stans and Cishets: Self-Presentation Effects on Content Propagation in Tumblr. In 12th ACM Conference on Web Science (WebSci ’20), pages 39–48.

A Additional Tables

Table 3 shows actions and attributes associated with “truecels” and “fakecels,” two common incel variants mentioned on incels.is.
Identity   Type  Top PMI³ terms
Truecels   Attr  biggest, truest, real, actual, giga, ultimate, legit, confirmed, certified, blackpilled, ugly, absolute, genuine, hope, ethnic, white, old, automatic, fellow, bigger, other
           ActS  ascend, get, post, know, rise, knows, remain, go, looks, confirmed, relate, rot, understand, cope, use, tried, need, browse, make, suicide, spend, ldar, roped, take
           ActO  pleasure, confirmed, banning, mog, rejected, bluepilled, help, laid, banned, born, save, doomed, over, seen, calling, see, die, excluded, bullying, dude, mock, mocking
Fakecels   Attr  fucking, larping, biggest, volcel, inb4, obvious, banned, known, defending, other, massive, fuck, tbh, confirmed, gtfo, potential, normie, likely, one, looking, users
           ActS  detected, gtfo, confirmed, spotted, get, ascend, post, say, need, banned, try, posting, larping, fuck, smh, come, leave, coming, bragging, go, worry, ruining, invade, piss
           ActO  ban, calling, banned, gtfo, weed, defending, expose, fucking, larping, spot, call, exposed, defends, exposing, smell, purged, banning, defend, confirmed, found
Table 3: Representative actions and attributes associated with truecels and fakecels identity group terms in the incels.is dataset. Attr denotes attributes, ActS denotes actions for which the identity group is the subject, and ActO denotes actions for which the identity group is the object.
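Terms in Table 3 are ranked by PMI³. As a rough illustration only, the sketch below assumes the usual cubed variant of pointwise mutual information, PMI³(x, y) = log(p(x, y)³ / (p(x) p(y))), with maximum-likelihood probabilities estimated from (identity, term) co-occurrence pairs. How attributes and actions are extracted as pairs is not shown and is assumed; the toy pairs and the helper names `pmi3_scores` and `top_terms` are hypothetical and do not come from the incels.is data or the authors' code.

```python
import math
from collections import Counter


def pmi3_scores(pairs):
    """Score each (identity, term) pair with PMI^3 = log(p(x,y)^3 / (p(x) * p(y))).

    Assumption of this sketch: probabilities are maximum-likelihood estimates
    computed from the list of co-occurrence pairs itself.
    """
    joint = Counter(pairs)                       # counts of (identity, term)
    ident = Counter(x for x, _ in pairs)         # marginal counts of identities
    term = Counter(y for _, y in pairs)          # marginal counts of terms
    n = len(pairs)
    scores = {}
    for (x, y), c in joint.items():
        p_xy, p_x, p_y = c / n, ident[x] / n, term[y] / n
        scores[(x, y)] = math.log(p_xy ** 3 / (p_x * p_y))
    return scores


def top_terms(scores, identity, k=5):
    """Return up to k terms for one identity, highest PMI^3 first (ties by term)."""
    ranked = sorted(
        ((s, y) for (x, y), s in scores.items() if x == identity),
        key=lambda t: (-t[0], t[1]),
    )
    return [y for _, y in ranked[:k]]


# Hypothetical toy co-occurrence pairs (not from the incels.is dataset).
pairs = (
    [("truecel", "ugly")] * 3
    + [("truecel", "ascend")]
    + [("fakecel", "larping")] * 2
    + [("fakecel", "ugly")]
)
scores = pmi3_scores(pairs)
print(top_terms(scores, "truecel"))  # ['ugly', 'ascend']
```

Cubing the joint probability is what separates PMI³ from plain PMI here: plain PMI would rank the single (truecel, ascend) pair above the three (truecel, ugly) pairs, whereas PMI³ reverses that order and so favors terms that co-occur with an identity frequently rather than only rarely.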