CoCo: Coherence-Enhanced Machine-Generated Text Detection Under Data Limitation With Contrastive Learning

Liu, Xiaoming; Zhang, Zhaohan; Wang, Yichen; Pu, Hang; Lan, Yu; Shen, Chao

Computer Science > Computation and Language

arXiv:2212.10341 (cs)

[Submitted on 20 Dec 2022 (v1), last revised 20 Oct 2023 (this version, v2)]

Title:CoCo: Coherence-Enhanced Machine-Generated Text Detection Under Data Limitation With Contrastive Learning

Authors:Xiaoming Liu, Zhaohan Zhang, Yichen Wang, Hang Pu, Yu Lan, Chao Shen

View PDF

Abstract:Machine-Generated Text (MGT) detection, a task that discriminates MGT from Human-Written Text (HWT), plays a crucial role in preventing misuse of text generative models, which excel in mimicking human writing style recently. Latest proposed detectors usually take coarse text sequences as input and fine-tune pretrained models with standard cross-entropy loss. However, these methods fail to consider the linguistic structure of texts. Moreover, they lack the ability to handle the low-resource problem which could often happen in practice considering the enormous amount of textual data online. In this paper, we present a coherence-based contrastive learning model named CoCo to detect the possible MGT under low-resource scenario. To exploit the linguistic feature, we encode coherence information in form of graph into text representation. To tackle the challenges of low data resource, we employ a contrastive learning framework and propose an improved contrastive loss for preventing performance degradation brought by simple samples. The experiment results on two public datasets and two self-constructed datasets prove our approach outperforms the state-of-art methods significantly. Also, we surprisingly find that MGTs originated from up-to-date language models could be easier to detect than these from previous models, in our experiments. And we propose some preliminary explanations for this counter-intuitive phenomena. All the codes and datasets are open-sourced.

Comments:	Accepted by EMNLP 2023 main cofference
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2212.10341 [cs.CL]
	(or arXiv:2212.10341v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2212.10341

Submission history

From: Xiaoming Liu [view email]
[v1] Tue, 20 Dec 2022 15:26:19 UTC (9,973 KB)
[v2] Fri, 20 Oct 2023 13:21:51 UTC (7,940 KB)

Computer Science > Computation and Language

Title:CoCo: Coherence-Enhanced Machine-Generated Text Detection Under Data Limitation With Contrastive Learning

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computation and Language

Title:CoCo: Coherence-Enhanced Machine-Generated Text Detection Under Data Limitation With Contrastive Learning

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators