schuBERT: Optimizing Elements of BERT

Khetan, Ashish; Karnin, Zohar

arXiv:2005.06628 (cs)

[Submitted on 9 May 2020]

Abstract: Transformers \citep{vaswani2017attention} have gradually become a key component of many state-of-the-art natural language representation models. A recent Transformer-based model, BERT \citep{devlin2018bert}, achieved state-of-the-art results on various natural language processing tasks, including GLUE, SQuAD v1.1, and SQuAD v2.0. This model, however, is computationally prohibitive and has a huge number of parameters. In this work we revisit the architecture choices of BERT in an effort to obtain a lighter model. We focus on reducing the number of parameters, yet our methods can be applied towards other objectives such as FLOPs or latency. We show that much more efficient light BERT models can be obtained by reducing the right architecture design dimensions, chosen algorithmically, rather than reducing the number of Transformer encoder layers. In particular, our schuBERT gives $6.6\%$ higher average accuracy on the GLUE and SQuAD datasets than BERT with three encoder layers while having the same number of parameters.
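The comparison above is made at a fixed parameter budget. As a rough illustration of how that budget depends on individual design dimensions, the sketch below counts the parameters of a standard BERT-style encoder. It is not the authors' code: the vocabulary, position and segment sizes are those of the original BERT-base, treating the total attention width (heads × head size) as a knob separate from the hidden size is an assumption, and the exact set of dimensions schuBERT searches over is not stated in this abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderDims:
    """Design dimensions of a BERT-style encoder (illustrative assumption)."""
    layers: int = 12          # number of Transformer encoder layers
    hidden: int = 768         # hidden size H
    attn_inner: int = 768     # heads * per-head size; assumed free to differ from H
    ffn_inner: int = 3072     # feed-forward ("filter") inner size
    vocab: int = 30522        # WordPiece vocabulary size of BERT-base
    max_pos: int = 512        # learned position embeddings
    type_vocab: int = 2       # segment (token type) embeddings


def linear(n_in: int, n_out: int) -> int:
    """Weights plus bias of a dense layer."""
    return n_in * n_out + n_out


def count_params(d: EncoderDims) -> int:
    """Total trainable parameters: embeddings + encoder layers + pooler."""
    h = d.hidden
    layer_norm = 2 * h  # LayerNorm gain and bias
    embeddings = (d.vocab + d.max_pos + d.type_vocab) * h + layer_norm
    per_layer = (
        3 * linear(h, d.attn_inner)   # query, key, value projections
        + linear(d.attn_inner, h)     # attention output projection
        + layer_norm
        + linear(h, d.ffn_inner)      # feed-forward expansion
        + linear(d.ffn_inner, h)      # feed-forward contraction
        + layer_norm
    )
    pooler = linear(h, h)             # [CLS] pooler dense layer
    return embeddings + d.layers * per_layer + pooler


if __name__ == "__main__":
    base = EncoderDims()
    three_layer = EncoderDims(layers=3)
    print(f"BERT-base:    {count_params(base):,}")         # 109,482,240
    print(f"3-layer BERT: {count_params(three_layer):,}")  # 45,691,392
```

Under these assumptions the embedding tables alone account for about 23.8M of the roughly 45.7M parameters of the three-layer baseline, which illustrates why the choice of *which* dimension to shrink matters when the total budget is held fixed.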

Comments:	11 pages, 6 figures, Accepted for publication in ACL 2020 as a long paper
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2005.06628 [cs.CL]
	(or arXiv:2005.06628v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2005.06628

Submission history

From: Amazon Khetan
[v1] Sat, 9 May 2020 21:56:04 UTC (442 KB)
