Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model

Zhang, Hengyuan; Wu, Yanru; Li, Dawei; Yang, Sak; Zhao, Rui; Jiang, Yong; Tan, Fei

arXiv:2404.10306 (cs)

[Submitted on 16 Apr 2024 (v1), last revised 12 Aug 2024 (this version, v5)]

Abstract: Aligned Large Language Models (LLMs) showcase remarkable versatility, capable of handling diverse real-world tasks. Meanwhile, aligned LLMs are also expected to exhibit speciality, excelling in specific applications. However, fine-tuning with extra data, a common practice to gain speciality, often leads to catastrophic forgetting (CF) of previously acquired versatility, hindering the model's performance across diverse tasks. In response to this challenge, we propose CoFiTune, a coarse to fine framework in an attempt to strike the balance between speciality and versatility. At the coarse-grained level, an empirical tree-search algorithm is utilized to pinpoint and update specific modules that are crucial for speciality, while keeping other parameters frozen; at the fine-grained level, a soft-masking mechanism regulates the update to the LLMs, mitigating the CF issue without harming speciality. In an overall evaluation of both speciality and versatility, CoFiTune consistently outperforms baseline methods across diverse tasks and model scales. Compared to the full-parameter SFT, CoFiTune leads to about 14% versatility improvement and marginal speciality loss on a 13B model. Lastly, based on further analysis, we provide a speculative insight into the information forwarding process in LLMs, which helps explain the effectiveness of the proposed method. The code is available at this https URL.
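The two levels described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function name `soft_masked_update`, the dictionary layout, and the specific rule of scaling each gradient by `1 - importance` are assumptions chosen for illustration. The abstract only states that selected modules are updated while the others stay frozen, and that a soft mask regulates the update.

```python
from typing import Dict, List, Set


def soft_masked_update(
    params: Dict[str, List[float]],
    grads: Dict[str, List[float]],
    importance: Dict[str, List[float]],
    trainable: Set[str],
    lr: float = 0.1,
) -> Dict[str, List[float]]:
    """Toy SGD step combining module selection with a soft mask.

    Coarse level (assumed): only modules named in `trainable` are updated;
    every other module is copied unchanged, i.e. frozen.
    Fine level (assumed): inside a trainable module, each gradient is scaled
    by (1 - importance), where importance in [0, 1] is a hypothetical score
    for how much that weight matters to previously acquired versatility.
    """
    updated: Dict[str, List[float]] = {}
    for name, values in params.items():
        if name not in trainable:
            # Frozen module: keep the weights as they are.
            updated[name] = list(values)
            continue
        # Trainable module: damp the update of weights with high importance.
        updated[name] = [
            w - lr * (1.0 - imp) * g
            for w, g, imp in zip(values, grads[name], importance[name])
        ]
    return updated
```

In this sketch, a weight with importance 1.0 is left untouched even inside a trainable module, while a weight with importance 0.0 receives the full gradient step. How the trainable modules are chosen (the tree search) and how importance is estimated are left out here.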

Comments:	43 pages, 10 figures, accepted by ACL 2024
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2404.10306 [cs.CL]
	(or arXiv:2404.10306v5 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2404.10306

Submission history

From: Hengyuan Zhang
[v1] Tue, 16 Apr 2024 06:27:39 UTC (598 KB)
[v2] Sun, 28 Apr 2024 12:22:41 UTC (579 KB)
[v3] Thu, 16 May 2024 10:53:50 UTC (579 KB)
[v4] Mon, 3 Jun 2024 10:42:36 UTC (599 KB)
[v5] Mon, 12 Aug 2024 19:37:42 UTC (599 KB)
