Fine-tuning
Introduction
This guide is intended for users of the new OpenAI fine-tuning API. If you are a
legacy fine-tuning user, please refer to our legacy fine-tuning guide.
Fine-tuning lets you get more out of the models available through the API by
providing:
- Higher quality results than prompting
- Ability to train on more examples than can fit in a prompt
- Token savings due to shorter prompts
- Lower latency requests
GPT models have been pre-trained on a vast amount of text. To use the models
effectively, we include instructions and sometimes several examples in a
prompt. Using demonstrations to show how to perform a task is often called
"few-shot learning."
Visit our pricing page to learn more about how fine-tuned model training and
usage are billed.
Fine-tuning is currently available for the following models:
- gpt-3.5-turbo-0613 (recommended)
- babbage-002
- davinci-002
We generally recommend first attempting to get good results with prompt
engineering, since:
- There are many tasks at which our models may not initially appear to perform
well, but with better prompting we can achieve much better results and
potentially avoid fine-tuning altogether.
- Iterating over prompts and other tactics has a much faster feedback loop than
iterating with fine-tuning, which requires creating datasets and running
training jobs.
- In cases where fine-tuning is still necessary, initial prompt engineering work
is not wasted - we typically see the best results when using a good prompt in
the fine-tuning data (or combining prompt chaining / tool use with fine-tuning).
One high-level way to think about the cases where fine-tuning does help is when
it's easier to "show, not tell". In the sections to come, we will explore how to
set up data for fine-tuning and various examples where fine-tuning improves
performance over the baseline model.
Example format
In this example, our goal is to create a chatbot that occasionally gives sarcastic
responses. Here are three training examples (conversations) we could create for
such a dataset:
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role":
"user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone
doesn't know that already."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role":
"user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "Oh, just some guy
named William Shakespeare. Ever heard of him?"}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role":
"user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "Around 384,400
kilometers. Give or take a few, like that really matters."}]}
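If you are assembling a dataset like this in code, the following sketch shows one way to write examples in this chat format to a JSONL file; the conversations list and the mydata.jsonl filename are just placeholders for your own data:

import json

# Placeholder training conversations in the chat fine-tuning format
conversations = [
    {"messages": [
        {"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."},
        {"role": "user", "content": "What's the capital of France?"},
        {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}
    ]}
]

# Write one JSON object per line (JSONL), the file format the fine-tuning API expects
with open("mydata.jsonl", "w") as f:
    for example in conversations:
        f.write(json.dumps(example) + "\n")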
Crafting prompts
We generally recommend taking the set of instructions and prompts that you
found worked best for the model prior to fine-tuning, and including them in every
training example. This should let you reach the best and most general results,
especially if you have relatively few (e.g. under a hundred) training examples.
If you would like to shorten the instructions or prompts that are repeated in
every example to save costs, keep in mind that the model will likely behave as if
those instructions were included anyway, and it may be hard to get the model to
ignore those "baked-in" instructions at inference time.
It may take more training examples to arrive at good results, as the model has
to learn entirely through demonstration and without guided instructions.
You can compute token counts using our counting tokens notebook from the
OpenAI cookbook.
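As a rough sketch of what that looks like (the per-message overhead used here is an approximation; see the notebook for the exact accounting, and the mydata.jsonl filename is a placeholder), you can count tokens with the tiktoken library:

import json
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

def num_tokens_from_messages(messages, tokens_per_message=3):
    """Approximate token count for one chat example; the per-message overhead is an estimate."""
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for value in message.values():
            num_tokens += len(encoding.encode(value))
    return num_tokens + 3  # replies are primed with a few extra tokens

with open("mydata.jsonl") as f:
    dataset = [json.loads(line) for line in f]

total = sum(num_tokens_from_messages(example["messages"]) for example in dataset)
print(f"~{total} tokens in the training file")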
Estimate costs
To estimate the cost of a fine-tuning job, please refer to the pricing page for
details on the cost per 1k tokens, then use the following formula:
base cost per 1k tokens * number of tokens in the input file * number of epochs trained
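For example, here is a quick back-of-the-envelope calculation as a sketch; the rate, token count, and epoch count below are placeholder values, so substitute the current numbers from the pricing page and your own dataset:

# Rough cost estimate for a fine-tuning job (all figures below are placeholders)
base_cost_per_1k_tokens = 0.008    # assumed training rate in USD per 1k tokens; check the pricing page
tokens_in_training_file = 100_000  # total tokens across all training examples
n_epochs = 3                       # number of epochs the job trains for

estimated_cost = base_cost_per_1k_tokens * (tokens_in_training_file / 1000) * n_epochs
print(f"Estimated training cost: ~${estimated_cost:.2f}")  # ~$2.40 with these placeholder numbers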
Upload a training file
Once you have the data validated, the file needs to be uploaded using the Files
API in order to be used with a fine-tuning job:
import openai

openai.File.create(
  file=open("mydata.jsonl", "rb"),
  purpose='fine-tune'
)
Create a fine-tuned model
After you have uploaded the file, the next step is to create a fine-tuning job.
You can start one using the OpenAI SDK:
import os
import openai
openai.api_key = os.getenv("OPENAI_API_KEY")
# training_file is the ID returned when the file was uploaded
openai.FineTuningJob.create(training_file="file-abc123", model="gpt-3.5-turbo")
In addition to creating a fine-tuning job, you can also list existing jobs, retrieve
the status of a job, or cancel a job.
import openai

# List 10 fine-tuning jobs
openai.FineTuningJob.list(limit=10)

# Retrieve the state of a fine-tune
openai.FineTuningJob.retrieve("ft-abc123")

# Cancel a job
openai.FineTuningJob.cancel("ft-abc123")

# Delete a fine-tuned model (must be an owner of the org the model was created in)
openai.Model.delete("ft-abc123")
Use a fine-tuned model
When a fine-tuning job has succeeded, the name of your fine-tuned model is
returned, and you can specify it as a parameter in the Chat Completions API:
import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

completion = openai.ChatCompletion.create(
    model="ft:gpt-3.5-turbo:my-org:custom_suffix:id",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)

print(completion.choices[0].message)
You can start making requests by passing the model name as shown above
and in our GPT guide.
While training metrics can be helpful, we think that evaluating samples from the
fine-tuned model provides the most relevant sense of model quality. We
recommend generating samples from both the base model and the fine-tuned
model on a test set, and comparing the samples side by side. The test set should
ideally include the full distribution of inputs that you might send to the model
for inference. If manual evaluation is too time-consuming, consider using our
Evals library, which can use GPT-4 to perform evaluations automatically.
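A minimal sketch of such a side-by-side comparison is shown below; the test prompts, system message, and fine-tuned model name are placeholders for your own:

import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

# Placeholder test set and model names; replace with your own test distribution and model ID
test_prompts = ["What's the capital of France?", "How far is the Moon from Earth?"]
models = ["gpt-3.5-turbo", "ft:gpt-3.5-turbo:my-org:custom_suffix:id"]

for prompt in test_prompts:
    print(f"--- {prompt}")
    for model in models:
        completion = openai.ChatCompletion.create(
            model=model,
            messages=[
                {"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."},
                {"role": "user", "content": prompt}
            ]
        )
        print(f"[{model}] {completion.choices[0].message['content']}")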
Iterating on hyperparameters
We allow you to specify the number of epochs to fine-tune a model for. We
recommend initially training without specifying the number of epochs, allowing
us to pick a default for you based on dataset size, then adjusting if you observe
the following:
- If the model does not follow the training data as much as expected, increase
the number by 1 or 2 epochs.
  - This is more common for tasks for which there is a single ideal completion
(or a small set of ideal completions which are similar). Some examples include
classification, entity extraction, or structured parsing. These are often tasks
for which you can compute a final accuracy metric against a reference answer.
- If the model becomes less diverse than expected, decrease the number by 1 or
2 epochs.
  - This is more common for tasks for which there is a wide range of possible
good completions.
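If you do decide to adjust the epoch count, it can be set via the hyperparameters field when creating the job. A small sketch, where the training file ID is a placeholder for your own uploaded file:

import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

# Explicitly request 2 training epochs instead of the auto-selected default
openai.FineTuningJob.create(
    training_file="file-abc123",
    model="gpt-3.5-turbo",
    hyperparameters={"n_epochs": 2}
)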
Fine-tuning examples
Now that we have explored the basics of the fine-tuning API, let's walk through
the fine-tuning lifecycle for a few different use cases.
Style and tone
To begin, we create a sample set of messages demonstrating the style and tone
we want the model to follow, in this case Marv's sarcastic but factual responses:
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role":
"user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone
doesn't know that already."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role":
"user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "Oh, just some guy
named William Shakespeare. Ever heard of him?"}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role":
"user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "Around 384,400
kilometers. Give or take a few, like that really matters."}]}
If you want to follow along and create a fine-tuned model yourself, you will need
at least 10 examples.
After getting the data that will potentially improve the model, the next step is to
check if the data meets all the formatting requirements.
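A lightweight check along these lines can catch the most common mistakes; this is a simplified sketch, not the full validation script from the cookbook:

import json

# Simplified format check: every line must be a JSON object with a "messages" list
# whose entries each have a valid role and string content, including at least one assistant message.
with open("marv.jsonl") as f:
    for i, line in enumerate(f, start=1):
        example = json.loads(line)
        messages = example.get("messages")
        assert isinstance(messages, list) and messages, f"line {i}: missing messages list"
        for message in messages:
            assert message.get("role") in ("system", "user", "assistant"), f"line {i}: unexpected role"
            assert isinstance(message.get("content"), str), f"line {i}: content must be a string"
        assert any(m["role"] == "assistant" for m in messages), f"line {i}: no assistant message"
print("No formatting problems found")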
Now that we have the data formatted and validated, the final training step is to
kick off a job to create the fine-tuned model. You can do this via the OpenAI CLI
or one of our SDKs as shown below:
openai.File.create(file=open("marv.jsonl", "rb"), purpose='fine-tune')
openai.FineTuningJob.create(training_file="file-abc123", model="gpt-3.5-turbo")
Once the training job is done, you will be able to use your fine-tuned model.
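One way to find the new model's name is to retrieve the finished job and read its fine_tuned_model field; a small sketch, where the job ID is a placeholder:

import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

# Retrieve the completed job and read the name of the resulting fine-tuned model
job = openai.FineTuningJob.retrieve("ft-abc123")
print(job["fine_tuned_model"])  # e.g. "ft:gpt-3.5-turbo:my-org:custom_suffix:id"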
Structured output
Another type of use case which works really well with fine-tuning is getting the
model to provide structured information, in this case about sports headlines:
{"messages": [{"role": "system", "content": "Given a sports headline, provide the following fields in a
JSON dict, where applicable: "player" (full name)", "team", "sport", and "gender".},{"role": "user",
"content": "Sources: Colts grant RB Taylor OK to seek trade"},
{"role": "assistant", "content": "{"player": "Jonathan Taylor", "team": "Colts", "sport": "football",
"gender": "male" }"},]}
{"messages": [{"role": "system", "content": "Given a sports headline, provide the following fields in a
JSON dict, where applicable: "player" (full name)", "team", "sport", and "gender".},{"role": "user",
"content": "OSU 'split down middle' on starting QB battle"},
{"role": "assistant", "content": "{"player": null, "team": "OSU", "sport": "football", "gender": null }"},]}
If you want to follow along and create a fine-tuned model yourself, you will need
at least 10 examples.
After getting the data that will potentially improve the model, the next step is to
check if the data meets all the formatting requirements.
Now that we have the data formatted and validated, the final training step is to
kick off a job to create the fine-tuned model. You can do this via the OpenAI CLI
or one of our SDKs as shown below:
openai.File.create(file=open("sports-context.jsonl", "rb"), purpose='fine-tune')
openai.FineTuningJob.create(training_file="file-abc123", model="gpt-3.5-turbo")
Once the training job is done, you will be able to use your fine-tuned model and
make a request that looks like the following:
import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

completion = openai.ChatCompletion.create(
    model="ft:gpt-3.5-turbo:my-org:custom_suffix:id",
    messages=[
        {"role": "system", "content": "Given a sports headline, provide the following fields in a JSON dict, where applicable: player (full name), team, sport, and gender"},
        {"role": "user", "content": "Richardson wins 100m at worlds to cap comeback"}
    ]
)

print(completion.choices[0].message)
Based on the formatted training data, the response should look like the
following:
{"player": "Sha'Carri Richardson", "team": null", "sport": "track and field", "gender": "female"}
FAQ