Adaptive Feature-based Low-Rank Compression of Large Language Models via Bayesian Optimization

Yixin Ji, Yang Xiang, Juntao Li, Qingrong Xia, Zi Ye, Xinyu Duan, Zhefeng Wang, Kehai Chen, Min Zhang


Abstract
In recent years, large language models (LLMs) have driven advances in natural language processing. Their growing scale, however, has increased the computational burden, making it necessary to balance efficiency and performance. Low-rank compression is a promising technique that reduces non-essential parameters by decomposing each weight matrix into the product of two low-rank matrices, yet its application to LLMs has not been studied extensively. The key to low-rank compression lies in two components: low-rank factorization and the allocation of low-rank dimensions. To address the challenges of low-rank compression in LLMs, we conduct empirical research on the low-rank characteristics of large models and propose a low-rank compression method suited to LLMs. The approach combines precise estimation of feature distributions through pooled covariance matrices with a Bayesian optimization strategy for allocating low-rank dimensions. Experiments on the LLaMA-2 models demonstrate that our method outperforms existing strong structured pruning and low-rank compression techniques in maintaining model performance at the same compression ratio.
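As a rough illustration of the general idea named in the abstract (replacing one weight matrix with the product of two low-rank matrices), the sketch below factorizes a random matrix with a plain truncated SVD. It is not the paper's method: the abstract describes a feature-based factorization using pooled covariance matrices and a Bayesian optimization step for choosing ranks, neither of which is reproduced here. The function name `low_rank_factorize`, the matrix shape, and the rank are hypothetical choices made only for this example.

```python
import numpy as np


def low_rank_factorize(weight, rank):
    """Split `weight` (d_out x d_in) into A (d_out x rank) and B (rank x d_in).

    Uses a plain truncated SVD, so A @ B is the best rank-`rank`
    approximation of `weight` in the Frobenius norm. This is a generic
    weight-based baseline, assumed here for illustration only.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # fold the kept singular values into A
    b = vt[:rank, :]
    return a, b


# Hypothetical layer size and rank, chosen only to keep the example small.
rng = np.random.default_rng(0)
weight = rng.standard_normal((64, 48))
rank = 8

a, b = low_rank_factorize(weight, rank)

params_before = weight.size        # 64 * 48
params_after = a.size + b.size     # 64 * 8 + 8 * 48
rel_error = np.linalg.norm(weight - a @ b) / np.linalg.norm(weight)

print(f"parameters: {params_before} -> {params_after}")
print(f"relative Frobenius error: {rel_error:.3f}")
```

Storing A and B saves parameters only when rank * (d_out + d_in) is smaller than d_out * d_in. That is why the choice of rank for each matrix, the allocation problem the abstract mentions, determines the compression ratio.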
Anthology ID:
2024.findings-emnlp.240
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4152–4168
URL:
https://aclanthology.org/2024.findings-emnlp.240/
DOI:
10.18653/v1/2024.findings-emnlp.240
Cite (ACL):
Yixin Ji, Yang Xiang, Juntao Li, Qingrong Xia, Zi Ye, Xinyu Duan, Zhefeng Wang, Kehai Chen, and Min Zhang. 2024. Adaptive Feature-based Low-Rank Compression of Large Language Models via Bayesian Optimization. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 4152–4168, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Adaptive Feature-based Low-Rank Compression of Large Language Models via Bayesian Optimization (Ji et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-emnlp.240.pdf
Software:
 2024.findings-emnlp.240.software.zip