MoEfication: Conditional Computation of Transformer Models for Efficient Inference

Zhang, Zhengyan; Lin, Yankai; Liu, Zhiyuan; Li, Peng; Sun, Maosong; Zhou, Jie

arXiv:2110.01786v2 (cs)

[Submitted on 5 Oct 2021 (v1), revised 15 Oct 2021 (this version, v2), latest version 5 Apr 2022 (v3)]

Abstract: Transformer-based pre-trained language models achieve superior performance on most NLP tasks thanks to their large parameter capacity, but they also incur a huge computation cost. Fortunately, we observe that most inputs activate only a tiny fraction of the neurons of large Transformer-based models during inference. Hence, we propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication, which can accelerate large-model inference through conditional computation based on this sparse activation phenomenon. MoEfication consists of two steps: (1) splitting the parameters of feed-forward neural networks (FFNs) into multiple parts as experts, and (2) building expert routers to decide which experts will be used for each input. Experimental results show that MoEfied models can significantly reduce computation cost, e.g., activating only 20% of the FFN parameters of a 700-million-parameter model without performance degradation on several downstream tasks, including text classification and machine reading comprehension.
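
The two steps named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say how neurons are grouped or how routers are built, so the contiguous split in `split_into_experts` and the centroid-based scoring in `route` are assumptions chosen for brevity, and all function names, sizes, and the ReLU activation are hypothetical. The sketch only shows how an FFN can be evaluated on a selected subset of its hidden neurons.

```python
import random

def relu(v):
    return [max(0.0, a) for a in v]

def matvec(M, x):
    return [sum(m * a for m, a in zip(row, x)) for row in M]

def dense_ffn(W1, b1, W2, x):
    """Standard FFN: y = W2 @ relu(W1 @ x + b1)."""
    h = relu([z + b for z, b in zip(matvec(W1, x), b1)])
    return matvec(W2, h)

def split_into_experts(d_hidden, n_experts):
    """Step 1 (assumed, simplified): partition hidden-neuron indices into
    equal contiguous groups. Requires d_hidden divisible by n_experts."""
    size = d_hidden // n_experts
    return [list(range(e * size, (e + 1) * size)) for e in range(n_experts)]

def route(W1, experts, x, top_k):
    """Step 2 (assumed, simplified): score each expert by the dot product of x
    with the mean of that expert's W1 rows, then keep the top_k experts."""
    scores = []
    for idx in experts:
        centroid = [sum(W1[i][j] for i in idx) / len(idx) for j in range(len(x))]
        scores.append(sum(c * a for c, a in zip(centroid, x)))
    ranked = sorted(range(len(experts)), key=lambda e: scores[e], reverse=True)
    return sorted(ranked[:top_k])

def moe_ffn(W1, b1, W2, experts, x, top_k):
    """Evaluate only the hidden neurons that belong to the selected experts."""
    selected = route(W1, experts, x, top_k)
    active = [i for e in selected for i in experts[e]]
    h = {i: max(0.0, sum(w * a for w, a in zip(W1[i], x)) + b1[i]) for i in active}
    y = [sum(W2[o][i] * h[i] for i in active) for o in range(len(W2))]
    return y, selected

# Toy sizes and random weights, purely for illustration.
random.seed(0)
d_model, d_hidden, n_experts = 4, 16, 4
W1 = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(d_hidden)]
b1 = [0.0] * d_hidden
W2 = [[random.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_model)]
x = [random.gauss(0, 1) for _ in range(d_model)]

experts = split_into_experts(d_hidden, n_experts)
y_dense = dense_ffn(W1, b1, W2, x)
# Selecting every expert reproduces the dense FFN output.
y_all, _ = moe_ffn(W1, b1, W2, experts, x, top_k=n_experts)
# Selecting one expert evaluates only a quarter of the hidden neurons.
y_one, chosen = moe_ffn(W1, b1, W2, experts, x, top_k=1)
```

With random weights like these, the single-expert output will generally differ from the dense output; the abstract's claim is that in trained models the activations are sparse enough that a well-chosen subset of experts preserves task performance.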

Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2110.01786 [cs.CL]
(or arXiv:2110.01786v2 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2110.01786

Submission history

From: Zhengyan Zhang
[v1] Tue, 5 Oct 2021 02:14:38 UTC (182 KB)
[v2] Fri, 15 Oct 2021 13:47:51 UTC (321 KB)
[v3] Tue, 5 Apr 2022 07:35:52 UTC (1,527 KB)
