DLO: Dynamic Layer Operation for Efficient Vertical Scaling of LLMs

Tan, Zhen; Dong, Daize; Zhao, Xinyu; Peng, Jie; Cheng, Yu; Chen, Tianlong

arXiv:2407.11030 (cs)

[Submitted on 3 Jul 2024]

Abstract: In this paper, we introduce Dynamic Layer Operations (DLO), a novel approach for vertically scaling transformer-based Large Language Models (LLMs) by dynamically expanding, activating, or skipping layers using a sophisticated routing policy based on layerwise feature similarity. Unlike traditional Mixture-of-Experts (MoE) methods that focus on extending the model width, our approach targets model depth, addressing the redundancy observed across layer representations for various input samples. Our framework is integrated with the Supervised Fine-Tuning (SFT) stage, eliminating the need for resource-intensive Continual Pre-Training (CPT). Experimental results demonstrate that DLO not only outperforms the original unscaled models but also achieves comparable results to densely expanded models with significantly improved efficiency. Our work offers a promising direction for building efficient yet powerful LLMs. We will release our implementation and model weights upon acceptance.
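The abstract does not specify the routing rule or how new layers are created, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes (1) depth is expanded by duplicating chosen layers, (2) a layer counts as redundant for a given sample when its output stays nearly parallel to its input (cosine similarity above a hypothetical threshold of 0.95), and (3) a skipped layer acts as the identity, as a residual pass-through would. The helpers `expand_by_duplication`, `similarity_skip_mask`, and `routed_forward`, and the toy `scale` and `rotate` layers, are hypothetical names for illustration. The mask here is an "oracle" that needs a dense pass; a deployable router would have to predict it from each layer's input.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]  # toy stand-in for a transformer block


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def expand_by_duplication(layers: Sequence[Layer], positions: Sequence[int]) -> List[Layer]:
    """Hypothetical depth expansion: insert a copy of layers[i] right after layer i.
    With real weights the copy would be a deep copy, not a shared reference."""
    dup = set(positions)
    expanded: List[Layer] = []
    for i, layer in enumerate(layers):
        expanded.append(layer)
        if i in dup:
            expanded.append(layer)
    return expanded


def similarity_skip_mask(hidden: Vector, layers: Sequence[Layer], threshold: float) -> List[bool]:
    """Assumed rule: mark a layer skippable for this sample when its output is
    nearly parallel to its input. Requires a full dense pass (oracle mask)."""
    mask: List[bool] = []
    h = hidden
    for layer in layers:
        out = layer(h)
        mask.append(cosine_similarity(h, out) >= threshold)
        h = out
    return mask


def routed_forward(hidden: Vector, layers: Sequence[Layer], skip_mask: Sequence[bool]) -> Vector:
    """Apply only active layers; skipped layers pass the hidden state through unchanged."""
    h = hidden
    for layer, skip in zip(layers, skip_mask):
        if not skip:
            h = layer(h)
    return h


def scale(h: Vector) -> Vector:
    """Toy layer that changes magnitude only (cosine 1.0 with its input)."""
    return [2.0 * x for x in h]


def rotate(h: Vector) -> Vector:
    """Toy 2-D layer that turns the vector by 90 degrees (cosine 0.0 with its input)."""
    return [h[1], -h[0]]


if __name__ == "__main__":
    layers = expand_by_duplication([scale, rotate], positions=[0])  # depth 2 -> 3
    x = [1.0, 0.0]
    mask = similarity_skip_mask(x, layers, threshold=0.95)
    print(mask)                            # [True, True, False]
    print(routed_forward(x, layers, mask))  # [0.0, -1.0]
```

In this toy run both the original and the duplicated `scale` layer only rescale the input, so they are skipped, and a single `rotate` call yields an output pointing the same way as the dense three-layer result `[0.0, -4.0]`. This mirrors, in miniature, the idea of adding depth while paying compute only for layers that change the representation.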

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2407.11030 [cs.LG]
	(or arXiv:2407.11030v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2407.11030

Submission history

From: Zhen Tan
[v1] Wed, 3 Jul 2024 18:34:08 UTC (2,049 KB)
