LLM Research Report
Models & Exploring Methods for Fine-Tuning
January 8, 2025
Contents
Abstract
Adapters
P-Tuning
Abstract
This report provides guidelines for selecting publicly available open-source models. It also explores methods for fine-tuning these models depending on the specific problem, available resources, and performance requirements.
• openai-community/gpt2: A general-purpose model that requires fine-tuning with task-specific instructions for optimal results.
https://huggingface.co/openai-community/gpt2
Parameter-Efficient Fine-Tuning (PEFT) refers to methods that adapt large pre-trained models to specific downstream tasks without retraining the full model. PEFT techniques save computational resources by reducing the number of trainable parameters.
LoRA freezes the original weight matrices and learns a low-rank update: the weight change is expressed as the product of two small matrices, ΔW = BA, where B ∈ R^(d×r) and A ∈ R^(r×k) with rank r ≪ min(d, k), and this product is added back onto the original weights. This lets the model retain its original capabilities while being fine-tuned for specific tasks. As a result, LoRA speeds up training while maintaining overall model performance [1].
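As a concrete illustration, here is a minimal sketch using the Hugging Face peft library to attach LoRA matrices to GPT-2; the hyperparameter values (rank, scaling, dropout) are illustrative assumptions, not tuned recommendations.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the pre-trained base model; its weights stay frozen.
base_model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")

# Inject rank-r update matrices into GPT-2's attention projection ("c_attn").
lora_config = LoraConfig(
    r=8,                       # rank of the low-rank update (illustrative)
    lora_alpha=16,             # scaling factor applied to the update
    target_modules=["c_attn"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the LoRA matrices are trainable
```

Training then proceeds as usual (e.g. with transformers' Trainer); only the LoRA parameters receive gradient updates.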
Figure 2: QLoRA
In QLoRA, the pre-trained model's weights are first quantized to a lower precision (4-bit in the original work), and the quantized, frozen model is then fine-tuned with Low-Rank Adaptation (LoRA). Combining reduced precision with LoRA yields significant savings in both training time and cost while retaining much of the original model's performance.
For more details, refer to the work on QLoRA, as described by Dettmers et al. [2].
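Below is a minimal sketch of this quantize-then-adapt recipe using transformers' bitsandbytes integration; the choice of base model and the LoRA hyperparameters are illustrative assumptions, while the 4-bit NF4 and double-quantization settings follow the QLoRA paper.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Step 1: load the base model with its weights quantized to 4-bit NF4.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",          # NormalFloat4 data type from the paper
    bnb_4bit_use_double_quant=True,     # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",                # illustrative choice of base model
    quantization_config=bnb_config,
)

# Step 2: attach trainable LoRA matrices on top of the frozen 4-bit weights.
lora_config = LoraConfig(
    r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```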
Figure 3: Prefix Fine-Tuning
Adapters
Instead of performing full fine-tuning, adapter layers are inserted between the model's existing layers. Full fine-tuning updates every parameter and therefore requires significant resources; with adapters, only small task-specific layers are trained, which makes the approach flexible across many use cases. While this saves resources, the additional layers make inference slightly slower because they are processed sequentially with the rest of the network.
For further information on adapters, refer to Hu et al. [4].
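A minimal PyTorch sketch of such a bottleneck adapter follows; the class name and bottleneck size are illustrative.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Small trainable layer inserted between frozen transformer layers."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)  # down-projection
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)    # up-projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection: the adapter learns a small correction on top
        # of the frozen layer's output.
        return x + self.up(self.act(self.down(x)))

adapter = BottleneckAdapter(hidden_dim=768)
out = adapter(torch.randn(1, 10, 768))  # (batch, seq_len, hidden_dim)
```

During fine-tuning only the adapter parameters are updated; the extra down/up projections in every layer are what add the small sequential inference cost noted above.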
Figure 4: Adapters
Prompt-Tuning
In prompt-tuning, trainable soft prompt embeddings are prepended to the model input while the base model stays frozen. For example, for sentiment classification of movie reviews:
- Setup: the input becomes [soft prompt embeddings] + "This movie review is:" + [review text].
- Training: the model adjusts the [soft prompt embeddings] to better contextualize the task.
- Inference input: [soft prompt embeddings] + "This movie review is:" + "An exciting roller-coaster ride full of suspense"
- Expected output: "Positive"

Alternatively, a hard prompt is written directly into the input text:
- Input: "Sentiment classification: Positive or Negative. Review: An exciting roller-coaster ride full of suspense."
- Output: "Positive"
Further details on prompt-tuning can be found in Shah [5].
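To make the soft-prompt setup above concrete, here is a minimal sketch using peft's prompt-tuning support; initializing the virtual tokens from the text "This movie review is:" mirrors the example, and the number of virtual tokens is an illustrative assumption.

```python
from transformers import AutoModelForCausalLM
from peft import PromptTuningConfig, PromptTuningInit, get_peft_model

config = PromptTuningConfig(
    task_type="CAUSAL_LM",
    num_virtual_tokens=8,                  # length of the soft prompt (illustrative)
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text="This movie review is:",
    tokenizer_name_or_path="openai-community/gpt2",
)

# Only the soft prompt embeddings are trainable; the base model stays frozen.
model = get_peft_model(
    AutoModelForCausalLM.from_pretrained("openai-community/gpt2"), config
)
model.print_trainable_parameters()
```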
P-Tuning
P-Tuning prepends trainable continuous prompt embeddings, produced by a small prompt encoder, to the task-specific input for prediction. For example, the input may be structured as: [P-tuning embeddings] + [original task input text].
It is worth noting that this approach works particularly well on large models but tends to produce suboptimal results on smaller ones.
For more information on P-Tuning, see Liu et al. [6].
Figure 5: P-Tuning
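A minimal sketch using peft's PromptEncoderConfig, which implements this prompt-encoder approach; the virtual-token count and encoder hidden size are illustrative assumptions.

```python
from transformers import AutoModelForCausalLM
from peft import PromptEncoderConfig, get_peft_model

# The prompt encoder (a small MLP) produces the continuous embeddings
# that are prepended to the task input.
config = PromptEncoderConfig(
    task_type="CAUSAL_LM",
    num_virtual_tokens=16,     # illustrative
    encoder_hidden_size=128,   # illustrative
)

model = get_peft_model(
    AutoModelForCausalLM.from_pretrained("openai-community/gpt2"), config
)
model.print_trainable_parameters()
```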
References
[1] E. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, "LoRA: Low-Rank Adaptation of Large Language Models," arXiv preprint arXiv:2106.09685, 2021. [Online]. Available: https://arxiv.org/abs/2106.09685v2. [Accessed: 7-Jan-2025].
[2] T. Dettmers, A. Pagnoni, A. Holtzman, and L. Zettlemoyer, "QLoRA: Efficient Finetuning of Quantized LLMs," arXiv preprint arXiv:2305.14314, May 2023. [Online]. Available: https://arxiv.org/pdf/2305.14314.
[4] Z. Hu, L. Wang, Y. Lan, W. Xu, E.-P. Lim, L. Bing, X. Xu, S. Poria, and R. K.-W. Lee, "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models," in Proc. of the 2023 Conf. on Empirical Methods in Natural Language Processing (EMNLP), Dec. 2023. [Online]. Available: https://aclanthology.org/2023.emnlp-main.319.pdf.
[5] S. Shah, "Prompt-Tuning: A Powerful Technique for Adapting LLMs to New Tasks," Medium. [Online]. Available: https://medium.com/@shahshreyansh20/prompt-tuning-a-powerful-technique-for-adapting-llms-to-new-tasks-6d6fd9b83557.
[6] X. Liu, K. Ji, Y. Fu, W. L. Tam, Z. Du, Z. Yang, and J. Tang, "P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-Tuning Universally Across Scales and Tasks," arXiv preprint arXiv:2110.07602, 2021. [Online]. Available: https://arxiv.org/pdf/2110.07602.