Evading AI-Generated Content Detectors using Homoglyphs

Creo, Aldan; Pudasaini, Shushanta

arXiv:2406.11239v1 (cs)

[Submitted on 17 Jun 2024 (this version), latest version 28 Aug 2024 (v2)]

Abstract: The generation of text that is increasingly human-like has been enabled by the advent of large language models (LLMs). As the detection of AI-generated content holds significant importance in the fight against issues such as misinformation and academic cheating, numerous studies have been conducted to develop reliable LLM detectors. While promising results have been demonstrated by such detectors on test data, recent research has revealed that they can be circumvented by employing different techniques. In this article, homoglyph-based ($a \rightarrow {\alpha}$) attacks that can be used to circumvent existing LLM detectors are presented. The efficacy of the attacks is illustrated by analyzing how homoglyphs shift the tokenization of the text, and thus its token log-likelihoods. A comprehensive evaluation is conducted to assess the effectiveness of homoglyphs on state-of-the-art LLM detectors, including Binoculars, DetectGPT, OpenAI's detector, and watermarking techniques, on five different datasets. A significant reduction in the efficiency of all the studied configurations of detectors and datasets, down to an accuracy of 0.5 (random guessing), is demonstrated by the proposed approach. The results show that homoglyph-based attacks can effectively evade existing LLM detectors, and the implications of these findings are discussed along with possible defenses against such attacks.
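
To make the idea of a homoglyph substitution concrete, here is a minimal Python sketch. It is not the authors' implementation: the `HOMOGLYPHS` table, the `replace_homoglyphs` function, and its `rate` and `seed` parameters are hypothetical names chosen for illustration, and the mapping is a small hand-picked set of Latin-to-Cyrillic lookalikes. The point it shows is only that the attacked string looks the same to a reader while its underlying code points (and therefore the input a tokenizer sees) differ.

```python
import random

# Hypothetical, hand-picked mapping from Latin letters to visually similar
# Cyrillic code points. This is an illustrative assumption, not the
# character set evaluated in the paper.
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "x": "\u0445",  # CYRILLIC SMALL LETTER HA
    "y": "\u0443",  # CYRILLIC SMALL LETTER U
}


def replace_homoglyphs(text, rate=1.0, seed=0):
    """Swap each mappable character for its lookalike with probability `rate`."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    out = []
    for ch in text:
        sub = HOMOGLYPHS.get(ch)
        # rng.random() is in [0, 1), so rate=1.0 replaces every mappable
        # character and rate=0.0 replaces none.
        if sub is not None and rng.random() < rate:
            out.append(sub)
        else:
            out.append(ch)
    return "".join(out)


original = "a space opera"
attacked = replace_homoglyphs(original)

# Same number of characters, but different code points underneath.
print(len(original) == len(attacked))
print(original == attacked)
print([hex(ord(ch)) for ch in attacked[:3]])
```

Because the substituted characters fall outside the Latin range, a subword tokenizer will typically split the attacked text differently from the original, which is the mechanism the abstract describes for shifting token log-likelihoods. Checking that claim against a specific detector would require that detector's own tokenizer, which this sketch deliberately leaves out.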

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2406.11239 [cs.CL]
	(or arXiv:2406.11239v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2406.11239

Submission history

From: Aldan Creo
[v1] Mon, 17 Jun 2024 06:07:32 UTC (296 KB)
[v2] Wed, 28 Aug 2024 11:10:59 UTC (367 KB)
