Power of Explanations: Towards automatic debiasing in hate speech detection

Cai, Yi; Zimek, Arthur; Wunder, Gerhard; Ntoutsi, Eirini

arXiv:2209.09975 (cs)

[Submitted on 7 Sep 2022]

Abstract: Hate speech detection is a common downstream application of natural language processing (NLP) in the real world. Despite their increasing accuracy, current data-driven approaches can easily learn biases from the imbalanced data distributions that originate from humans. Deploying biased models could further reinforce existing social biases. Unlike with tabular data, defining and mitigating bias in text classifiers, which deal with unstructured data, is more challenging. A popular approach to improving machine learning fairness in NLP is to debias a model using a list of potentially discriminated words supplied by human annotators. Besides the risk of overlooking biased terms, exhaustively identifying bias with human annotators is unsustainable, since discrimination varies across datasets and may evolve over time. To this end, we propose an automatic misuse detector (MiD) that relies on an explanation method to detect potential bias. Building on it, we design an end-to-end debiasing framework with the proposed staged correction for text classifiers, requiring no external resources.

Comments:	IEEE DSAA'22
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2209.09975 [cs.CL]
	(or arXiv:2209.09975v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2209.09975

Submission history

From: Yi Cai
[v1] Wed, 7 Sep 2022 14:14:03 UTC (1,117 KB)
