Masking as an Efficient Alternative to Finetuning for Pretrained Language Models

Zhao, Mengjie; Lin, Tao; Mi, Fei; Jaggi, Martin; Schütze, Hinrich

arXiv:2004.12406 (cs)

[Submitted on 26 Apr 2020 (v1), last revised 11 Oct 2020 (this version, v2)]

Abstract: We present an efficient method of utilizing pretrained language models, where we learn selective binary masks for pretrained weights in lieu of modifying them through finetuning. Extensive evaluations of masking BERT and RoBERTa on a series of NLP tasks show that our masking scheme yields performance comparable to finetuning, yet has a much smaller memory footprint when several tasks need to be inferred simultaneously. Through intrinsic evaluations, we show that representations computed by masked language models encode information necessary for solving downstream tasks. Analyzing the loss landscape, we show that masking and finetuning produce models that reside in minima that can be connected by a line segment with nearly constant test accuracy. This confirms that masking can be utilized as an efficient alternative to finetuning.
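
The idea in the abstract can be illustrated with a small sketch: the pretrained weights stay frozen and shared, and each task only contributes a binary mask that is applied elementwise. The code below is a hypothetical illustration, not the authors' implementation. The function names, the thresholding of real-valued scores, the threshold value, and the 32-bit weight size are all assumptions made for the example; the abstract does not say how the masks are learned, and the learning step is omitted here.

```python
# Hypothetical sketch of per-task binary masking over frozen weights.
# All names, the threshold, and the toy values are assumptions for illustration.

def binarize(scores, threshold=0.0):
    """Turn real-valued scores into a 0/1 mask by thresholding (assumed scheme)."""
    return [[1 if s >= threshold else 0 for s in row] for row in scores]

def masked_linear(x, weights, mask):
    """Compute y = (W * M) x, where W is frozen and M is an elementwise binary mask."""
    return [
        sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
        for w_row, m_row in zip(weights, mask)
    ]

def storage_bits(num_weights, num_tasks, bits_per_weight=32):
    """Compare one finetuned copy per task against shared weights plus 1-bit masks."""
    finetuned = num_tasks * num_weights * bits_per_weight
    masked = num_weights * bits_per_weight + num_tasks * num_weights
    return finetuned, masked

# Frozen "pretrained" weights shared by every task (toy values).
W = [[0.5, -1.0, 2.0],
     [1.5, 0.25, -0.5]]

# Per-task scores; in practice these would be learned, here they are made up.
scores_task_a = [[0.3, -0.2, 0.1],
                 [-0.4, 0.6, 0.2]]

x = [1.0, 2.0, 3.0]
mask_a = binarize(scores_task_a)
y_a = masked_linear(x, W, mask_a)

finetuned_bits, masked_bits = storage_bits(num_weights=1000, num_tasks=4)
print(mask_a, y_a, finetuned_bits, masked_bits)
```

Under these assumptions, serving several tasks means storing one copy of the weights plus one bit per masked weight per task, which is where the smaller memory footprint mentioned in the abstract would come from.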

Comments:	EMNLP 2020; MZ and TL contribute equally
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2004.12406 [cs.CL]
	(or arXiv:2004.12406v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2004.12406

Submission history

From: Mengjie Zhao
[v1] Sun, 26 Apr 2020 15:03:47 UTC (334 KB)
[v2] Sun, 11 Oct 2020 11:52:08 UTC (565 KB)
