WallFacer: Guiding Transformer Model Training Out of the Long-Context Dark Forest with N-body Problem

Liu, Ziming; Wang, Shaoyu; Cheng, Shenggan; Zhao, Zhongkai; Zhao, Xuanlei; Demmel, James; You, Yang

arXiv:2407.00611v2 (cs)

[Submitted on 30 Jun 2024 (v1), revised 2 Jul 2024 (this version, v2), latest version 19 Sep 2024 (v3)]

Abstract: In recent years, Transformer-based Large Language Models (LLMs) have garnered significant attention for their exceptional performance across a variety of tasks. However, training these models on long sequences remains a substantial challenge in terms of efficiency and scalability. Current methods are constrained either by the number of attention heads, which limits scalability, or by excessive communication overhead. In this paper, we observe that attention computation can be viewed as a special case of the n-body problem with direct interactions. Building on this observation, we introduce WallFacer, an efficient long-sequence training system with a novel multi-dimensional ring sequence parallelism that provides an efficient communication paradigm and additional tuning space for arranging communication. Through comprehensive experiments under diverse environments and model settings, we demonstrate that WallFacer significantly surpasses the state-of-the-art method that supports near-infinite sequence length, achieving performance improvements of up to 77.12%.
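
To make the n-body view concrete, the following is a minimal single-process sketch, not WallFacer's implementation. It assumes a plain one-dimensional ring (the abstract's scheme is multi-dimensional and is not reproduced here) and uses a running softmax normalizer so that each query can accumulate its pairwise interactions with key/value blocks as they arrive. The function names `full_attention` and `ring_attention`, and the `num_devices` parameter, are hypothetical and chosen only for illustration.

```python
import math


def _score(q, k):
    """Scaled dot-product score for one query/key pair."""
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))


def full_attention(Q, K, V):
    """Reference: every query interacts directly with every key (all pairs)."""
    out = []
    for q in Q:
        scores = [_score(q, k) for k in K]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[j] for w, v in zip(weights, V)) / total
            for j in range(len(V[0]))
        ])
    return out


def ring_attention(Q, K, V, num_devices):
    """Simulate a 1D ring: each simulated device keeps its query block while
    key/value blocks rotate one position per step (assumed layout)."""
    n = len(Q)
    assert n % num_devices == 0, "sequence must split evenly across devices"
    block = n // num_devices
    dv = len(V[0])
    kv_blocks = [
        (K[i * block:(i + 1) * block], V[i * block:(i + 1) * block])
        for i in range(num_devices)
    ]
    # Per-query running state: max score, normalizer, weighted value sum.
    run_max = [-math.inf] * n
    run_sum = [0.0] * n
    acc = [[0.0] * dv for _ in range(n)]
    for step in range(num_devices):
        for dev in range(num_devices):
            # Block this device holds at this step of the rotation.
            k_blk, v_blk = kv_blocks[(dev + step) % num_devices]
            for qi in range(dev * block, (dev + 1) * block):
                for k, v in zip(k_blk, v_blk):
                    s = _score(Q[qi], k)
                    new_max = max(run_max[qi], s)
                    scale = math.exp(run_max[qi] - new_max)  # rescale old state
                    w = math.exp(s - new_max)
                    run_sum[qi] = run_sum[qi] * scale + w
                    acc[qi] = [a * scale + w * x for a, x in zip(acc[qi], v)]
                    run_max[qi] = new_max
    return [[a / run_sum[i] for a in acc[i]] for i in range(n)]
```

In this sketch each simulated device sees every key/value block exactly once after `num_devices` steps, so the result matches the all-pairs reference up to floating-point error. In a real distributed setting the block rotation would be a communication step, which is the cost the abstract refers to.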

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2407.00611 [cs.DC]
	(or arXiv:2407.00611v2 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2407.00611

Submission history

From: Ziming Liu
[v1] Sun, 30 Jun 2024 07:00:07 UTC (5,697 KB)
[v2] Tue, 2 Jul 2024 02:47:20 UTC (5,697 KB)
[v3] Thu, 19 Sep 2024 08:28:04 UTC (3,931 KB)
