Asymmetry in low-rank adapters of foundation models
arXiv preprint arXiv:2402.16842, 2024
Parameter-efficient fine-tuning optimizes large, pre-trained foundation models by updating a subset of parameters; in this class, Low-Rank Adaptation (LoRA) is particularly effective. Inspired by an effort to investigate the different roles of LoRA matrices during fine-tuning, this paper characterizes and leverages unexpected asymmetry in the importance of low-rank adapter matrices. Specifically, when updating the parameter matrices of a neural network by adding a product $ BA $, we observe that the $ B $ and $ A $ matrices have distinct functions: $ A $ extracts features from the input, while $ B $ uses these features to create the desired output. Based on this observation, we demonstrate that fine-tuning $ B $ is inherently more effective than fine-tuning $ A $, and that a random untrained $ A $ should perform nearly as well as a fine-tuned one. Using an information-theoretic lens, we also bound the generalization of low-rank adapters, showing that the parameter savings of exclusively training $ B $ improves the bound. We support our conclusions with experiments on RoBERTa, BART-Large, LLaMA-2, and ViTs.
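To make the setup concrete, here is a minimal PyTorch-style sketch of the asymmetric variant the abstract describes: the pre-trained weight $ W $ stays frozen, $ A $ is a fixed random projection, and only $ B $ is trained. The class name, rank, scaling, and initialization choices are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class AsymmetricLoRALinear(nn.Module):
    """Sketch of a LoRA adapter where A is random and frozen, B is trained.

    Hypothetical wrapper for illustration; hyperparameters (rank, alpha)
    and initializations are assumptions, not the paper's exact recipe.
    """

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # frozen pre-trained weight W

        d_out, d_in = base.weight.shape
        self.scaling = alpha / rank

        # A: fixed random "feature extractor" (rank x d_in), never updated
        self.A = nn.Parameter(torch.randn(rank, d_in) / rank ** 0.5,
                              requires_grad=False)
        # B: trainable "output constructor" (d_out x rank), zero-initialized
        # so the adapted layer starts out identical to the pre-trained one
        self.B = nn.Parameter(torch.zeros(d_out, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # W x + scaling * B (A x): A projects the input down, B maps the
        # extracted features to the output space
        return self.base(x) + self.scaling * (x @ self.A.T) @ self.B.T
```

In use, such a wrapper would replace selected linear layers (for example, attention projections), and only the parameters with `requires_grad=True`, here just $ B $, would be handed to the optimizer; this halving of the trainable adapter parameters is the savings that the paper's information-theoretic generalization bound exploits.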