Unlocking the Potential of Model Merging for Low-Resource Languages

Tao, Mingxu; Zhang, Chen; Huang, Quzhe; Ma, Tianyao; Huang, Songfang; Zhao, Dongyan; Feng, Yansong

arXiv:2407.03994 (cs)

[Submitted on 4 Jul 2024 (v1), last revised 6 Oct 2024 (this version, v3)]

Abstract: Adapting large language models (LLMs) to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT). However, this CT-then-SFT approach struggles when data are limited, as is the case for low-resource languages, and fails to balance language modeling and task-solving capabilities. We thus propose model merging as an alternative for low-resource languages: it combines models with distinct capabilities into a single model without additional training. We use model merging to develop task-solving LLMs for low-resource languages without any SFT data in the target languages. Our experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data. Observing that the performance of model merging saturates as the number of training tokens grows, we further analyze the merging process and introduce a slack variable into the model merging algorithm to mitigate the loss of important parameters, thereby enhancing performance. We hope that, with its higher data efficiency, model merging can benefit more human languages that suffer from data scarcity.
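
As a rough illustration of what "combining models with distinct capabilities into a single model without additional training" can mean, the sketch below implements two widely used merging rules, task arithmetic and TIES-style merging, on toy parameter dictionaries. It assumes every model shares the same base weights and architecture (for example, all derived from one base LLM), so each can be reduced to a "task vector" (model minus base). This is not the authors' implementation and does not include their slack variable; the names `task_vector`, `task_arithmetic`, `trim`, `ties_merge`, and the `density`/`scale` parameters, as well as the toy weights, are hypothetical.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def task_vector(model: Params, base: Params) -> Params:
    """Element-wise difference between a trained model and the shared base model."""
    return {k: [m - b for m, b in zip(model[k], base[k])] for k in base}


def task_arithmetic(base: Params, vectors: List[Params], scale: float = 1.0) -> Params:
    """Add the scaled sum of all task vectors back onto the base weights."""
    return {
        k: [w + scale * sum(v[k][i] for v in vectors) for i, w in enumerate(ws)]
        for k, ws in base.items()
    }


def trim(values: List[float], density: float) -> List[float]:
    """Zero out all but (roughly) the top `density` fraction of entries by magnitude."""
    n_keep = max(1, int(round(density * len(values))))
    threshold = sorted((abs(x) for x in values), reverse=True)[n_keep - 1]
    # Entries tied with the threshold are also kept, so slightly more may survive.
    return [x if abs(x) >= threshold else 0.0 for x in values]


def ties_merge(base: Params, vectors: List[Params], density: float = 0.5,
               scale: float = 1.0) -> Params:
    """TIES-style merge: trim, elect a sign per entry, then average agreeing entries."""
    merged: Params = {}
    for k, ws in base.items():
        trimmed = [trim(v[k], density) for v in vectors]
        out = []
        for i, w in enumerate(ws):
            column = [t[i] for t in trimmed]
            # Elected sign = sign of the summed (trimmed) values; ties default to +.
            positive = sum(column) >= 0
            # Average only the non-zero entries whose sign matches the elected sign.
            agreeing = [x for x in column if x != 0.0 and (x > 0) == positive]
            delta = sum(agreeing) / len(agreeing) if agreeing else 0.0
            out.append(w + scale * delta)
        merged[k] = out
    return merged


# Hypothetical toy models sharing one 4-entry parameter tensor.
base = {"w": [0.0, 0.0, 0.0, 0.0]}
language_model = {"w": [1.0, 0.25, -0.5, 0.0]}  # e.g. continually pre-trained on the target language
task_model = {"w": [0.5, -0.75, 0.0, 1.0]}      # e.g. fine-tuned for task solving

vectors = [task_vector(language_model, base), task_vector(task_model, base)]
print(task_arithmetic(base, vectors))  # {'w': [1.5, -0.5, -0.5, 1.0]}
print(ties_merge(base, vectors))       # {'w': [1.0, -0.75, -0.5, 1.0]}
```

In this toy run, the trim step discards the task model's 0.5 in the first entry and the language model's 0.25 in the second entry. That is one illustration of how a merging algorithm can drop parameters; the abstract does not specify how the authors' slack variable addresses such losses, so it is not modeled here.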

Comments:	To appear in EMNLP 2024 Findings
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2407.03994 [cs.CL]
	(or arXiv:2407.03994v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2407.03994

Submission history

From: Mingxu Tao
[v1] Thu, 4 Jul 2024 15:14:17 UTC (9,246 KB)
[v2] Tue, 9 Jul 2024 11:09:19 UTC (9,247 KB)
[v3] Sun, 6 Oct 2024 10:54:02 UTC (9,340 KB)
