On the Challenges of Building Datasets for Hate Speech Detection

Bhandari, Vitthal

arXiv:2309.02912 (cs)

[Submitted on 6 Sep 2023]

Authors: Vitthal Bhandari

Abstract: Detection of hate speech has been formulated as a standalone application of NLP, and different approaches have been adopted for identifying the target groups, obtaining raw data, defining the labeling process, choosing the detection algorithm, and evaluating the performance in the desired setting. However, unlike other downstream tasks, hate speech detection suffers from a lack of large, carefully curated, generalizable datasets owing to the highly subjective nature of the task. In this paper, we first analyze the issues surrounding hate speech detection through a data-centric lens. We then outline a holistic framework to encapsulate the data creation pipeline across seven broad dimensions by taking the specific example of hate speech towards sexual minorities. We posit that practitioners would benefit from following this framework as a form of best practice when creating hate speech datasets in the future.

Comments:	12 pages, 1 figure
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2309.02912 [cs.CL]
	(or arXiv:2309.02912v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2309.02912

Submission history

From: Vitthal Bhandari
[v1] Wed, 6 Sep 2023 11:15:47 UTC (7,563 KB)
