Slides
Umar Jamil
License: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0):
https://creativecommons.org/licenses/by-nc/4.0/legalcode
Fine-tuning means training a pre-trained network on new data to improve its performance on
a specific task. For example, we may take an LLM that was pre-trained on many programming languages
and fine-tune it for a new dialect of SQL.
Umar Jamil - https://github.com/hkproj/pytorch-lora
Problems with fine-tuning
1. We must train the full network, which is computationally expensive for the average user
when dealing with Large Language Models like GPT.
2. Storage requirements for the checkpoints are expensive, as we need to save the entire
model on disk for each checkpoint. If we also save the optimizer state (which we
usually do), the situation gets even worse!
3. If we have multiple fine-tuned models, we need to reload all the weights of the model
every time we want to switch between them, which can be expensive and slow. For
example, we may have one model fine-tuned for helping users write SQL queries and another
for helping users write JavaScript code.
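To make problem 2 concrete, here is a rough back-of-the-envelope storage estimate. The model size (7B parameters) and fp32 precision are assumptions for illustration; the factor of 3 comes from Adam keeping two extra fp32 tensors (momentum and variance) per parameter.

```python
# Rough checkpoint-size estimate for a hypothetical 7B-parameter model in fp32.
# Adam stores two extra fp32 tensors per parameter, roughly tripling disk usage.
params = 7_000_000_000
bytes_per_param = 4  # fp32

weights_gb = params * bytes_per_param / 1e9
with_adam_gb = weights_gb * 3  # weights + momentum + variance

print(weights_gb, with_adam_gb)  # 28.0 GB vs 84.0 GB per checkpoint
```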
[Figure: the LoRA decomposition. The pre-trained weight matrix W ∈ ℝ^(d×k) is FROZEN 🥶, while the two low-rank matrices B ∈ ℝ^(d×r) and A ∈ ℝ^(r×k) are trainable 🦾, with r ≪ min(d, k).]
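The decomposition in the figure can be sketched numerically. This is a minimal illustration, not the repo's implementation: the dimensions d, k, r are assumptions, and B is initialized to zero as in the LoRA paper, so the adapted layer starts out identical to the frozen one.

```python
import numpy as np

# Hypothetical dimensions for illustration (not from the slides)
d, k, r = 512, 256, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((d, k))   # pre-trained weights, frozen
B = np.zeros((d, r))              # trainable, initialized to zero (as in LoRA)
A = rng.standard_normal((r, k))   # trainable

x = rng.standard_normal(k)
h = W @ x + (B @ A) @ x           # adapted forward pass: W x + B A x

# Trainable parameters: full fine-tuning vs. LoRA
full_params = d * k               # 131072
lora_params = d * r + r * k       # 6144
```

With r = 8, the low-rank update trains roughly 5% of the parameters of W, and since B starts at zero, h equals W x before any training step.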
It basically means that the W matrix of a pre-trained model contains many parameters that convey
the same information as others (i.e., they can be obtained as a combination of the other weights), so
we can get rid of them without decreasing the performance of the model. Such matrices are
called rank-deficient (they do not have full rank).
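Rank deficiency is easy to demonstrate numerically. In this sketch (sizes are arbitrary assumptions), a 10×10 matrix is constructed from only 2 independent rows, so every row is a linear combination of those two and the rank stays far below full.

```python
import numpy as np

# Build a 10x10 matrix from just 2 independent directions: each row is a
# linear combination of the 2 basis rows, so the matrix is rank-deficient.
rng = np.random.default_rng(0)
basis = rng.standard_normal((2, 10))    # 2 independent rows
coeffs = rng.standard_normal((10, 2))   # mixing coefficients
W = coeffs @ basis                      # shape (10, 10), rank at most 2

print(np.linalg.matrix_rank(W))         # 2, far below the full rank of 10
```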