Task-adaptive Pre-training and Self-training are Complementary for Natural Language Understanding

Li, Shiyang; Yavuz, Semih; Chen, Wenhu; Yan, Xifeng

arXiv:2109.06466 (cs)

[Submitted on 14 Sep 2021 (v1), last revised 19 Feb 2023 (this version, v2)]

Abstract: Task-adaptive pre-training (TAPT) and self-training (ST) have emerged as the major semi-supervised approaches for improving natural language understanding (NLU) tasks with massive amounts of unlabeled data. However, it is unclear whether they learn similar representations or whether they can be effectively combined. In this paper, we show that TAPT and ST can be complementary under a simple protocol, TFS, which follows the sequence TAPT -> Finetuning -> Self-training. Experimental results show that the TFS protocol effectively utilizes unlabeled data to achieve strong combined gains consistently across six datasets covering sentiment classification, paraphrase identification, natural language inference, named entity recognition, and dialogue slot classification. We investigate various semi-supervised settings and consistently show that the gains from TAPT and ST can be strongly additive when the TFS procedure is followed. We hope that TFS can serve as an important semi-supervised baseline for future NLP studies.
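
The ordering of the three TFS stages can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify models or hyperparameters, so everything below is an assumption made for illustration. IDF weights learned from unlabeled text stand in for TAPT, a nearest-centroid classifier stands in for fine-tuning a pre-trained language model, and the function names (`tapt`, `finetune`, `self_train`, `tfs`), the `threshold` parameter, and the choice to retrain the student on labeled plus pseudo-labeled data are all hypothetical.

```python
import math
from collections import Counter


def tapt(unlabeled):
    """Stage 1 (toy stand-in for TAPT): learn IDF weights from unlabeled task text."""
    n = len(unlabeled)
    df = Counter(w for text in unlabeled for w in set(text.lower().split()))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}


def encode(text, idf):
    """Weighted bag of words; words unseen in stage 1 get a default weight of 1.0."""
    vec = Counter()
    for w in text.lower().split():
        vec[w] += idf.get(w, 1.0)
    return vec


def finetune(idf, examples):
    """Stage 2 (toy stand-in for fine-tuning): one centroid per label."""
    sums, counts = {}, Counter()
    for text, label in examples:
        sums.setdefault(label, Counter()).update(encode(text, idf))
        counts[label] += 1
    return {lab: {w: v / counts[lab] for w, v in s.items()} for lab, s in sums.items()}


def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def predict(idf, centroids, text):
    """Return (best_label, similarity_score) for a single text."""
    vec = encode(text, idf)
    scores = {lab: cosine(vec, c) for lab, c in centroids.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def self_train(idf, teacher, labeled, unlabeled, threshold=0.0):
    """Stage 3: the teacher pseudo-labels unlabeled text, then a student is refit.

    Assumption: only predictions scoring above `threshold` are kept, and the
    student is trained on labeled plus pseudo-labeled examples.
    """
    pseudo = []
    for text in unlabeled:
        label, score = predict(idf, teacher, text)
        if score > threshold:
            pseudo.append((text, label))
    return finetune(idf, labeled + pseudo)


def tfs(labeled, unlabeled):
    """Run the stages in TFS order: TAPT -> Finetuning -> Self-training."""
    idf = tapt(unlabeled)
    teacher = finetune(idf, labeled)
    student = self_train(idf, teacher, labeled, unlabeled)
    return idf, teacher, student


# Made-up toy data, for illustration only.
LABELED = [("great movie", "pos"), ("terrible movie", "neg")]
UNLABELED = [
    "great and wonderful acting",
    "terrible and boring plot",
    "wonderful story",
    "boring film",
]

idf, teacher, student = tfs(LABELED, UNLABELED)
```

In this toy setting the teacher has no vocabulary overlap with "boring film", while the student picks up "boring" from a pseudo-labeled example and can classify it. That shows only how the stages hand off to one another; it says nothing about the gains reported in the paper.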

Comments:	Findings of EMNLP 2021
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2109.06466 [cs.CL]
	(or arXiv:2109.06466v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2109.06466

Submission history

From: Shiyang Li
[v1] Tue, 14 Sep 2021 06:24:28 UTC (5,542 KB)
[v2] Sun, 19 Feb 2023 08:29:28 UTC (5,542 KB)

