Stay Tuned: An Empirical Study of the Impact of Hyperparameters on LLM Tuning in Real-World Applications

Halfon, Alon; Gretz, Shai; Arviv, Ofir; Spector, Artem; Toledo-Ronen, Orith; Katz, Yoav; Ein-Dor, Liat; Shmueli-Scheuer, Michal; Slonim, Noam

Computer Science > Machine Learning

arXiv:2407.18990 (cs)

[Submitted on 25 Jul 2024 (v1), last revised 7 Aug 2024 (this version, v2)]

Title:Stay Tuned: An Empirical Study of the Impact of Hyperparameters on LLM Tuning in Real-World Applications

Authors:Alon Halfon, Shai Gretz, Ofir Arviv, Artem Spector, Orith Toledo-Ronen, Yoav Katz, Liat Ein-Dor, Michal Shmueli-Scheuer, Noam Slonim

View PDF HTML (experimental)

Abstract:Fine-tuning Large Language Models (LLMs) is an effective method to enhance their performance on downstream tasks. However, choosing the appropriate setting of tuning hyperparameters (HPs) is a labor-intensive and computationally expensive process. Here, we provide recommended HP configurations for practical use-cases that represent a better starting point for practitioners, when considering two SOTA LLMs and two commonly used tuning methods. We describe Coverage-based Search (CBS), a process for ranking HP configurations based on an offline extensive grid search, such that the top ranked configurations collectively provide a practical robust recommendation for a wide range of datasets and domains. We focus our experiments on Llama-3-8B and Mistral-7B, as well as full fine-tuning and LoRa, conducting a total of > 10,000 tuning experiments. Our results suggest that, in general, Llama-3-8B and LoRA should be preferred, when possible. Moreover, we show that for both models and tuning methods, exploring only a few HP configurations, as recommended by our analysis, can provide excellent results in practice, making this work a valuable resource for practitioners.

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2407.18990 [cs.LG]
	(or arXiv:2407.18990v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2407.18990

Submission history

From: Alon Halfon [view email]
[v1] Thu, 25 Jul 2024 12:07:55 UTC (285 KB)
[v2] Wed, 7 Aug 2024 07:46:39 UTC (285 KB)

Computer Science > Machine Learning

Title:Stay Tuned: An Empirical Study of the Impact of Hyperparameters on LLM Tuning in Real-World Applications

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Machine Learning

Title:Stay Tuned: An Empirical Study of the Impact of Hyperparameters on LLM Tuning in Real-World Applications

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators