OpenAI Developer Platform Guide

This document provides strategies and tactics for effective prompt engineering to enhance results from large language models like GPT-4o. Key strategies include writing clear instructions, providing reference text, splitting complex tasks into simpler subtasks, allowing the model time to think, using external tools, and systematically testing changes. The guide encourages experimentation with various tactics to optimize performance and improve the quality of outputs.

Prompt engineering

==================

Enhance results with prompt engineering strategies.

This guide shares strategies and tactics for getting better results from large
language models (sometimes referred to as GPT models) like GPT-4o. The methods
described here can sometimes be deployed in combination for greater effect. We
encourage experimentation to find the methods that work best for you.

You can also explore example prompts which showcase what our models are capable of:

[Prompt examples](/examples): explore prompt examples to learn what GPT models can do.

Six strategies for getting better results
-----------------------------------------

### Write clear instructions

These models can’t read your mind. If outputs are too long, ask for brief replies.
If outputs are too simple, ask for expert-level writing. If you dislike the format,
demonstrate the format you’d like to see. The less the model has to guess at what
you want, the more likely you’ll get it.

Tactics:

* [Include details in your query to get more relevant answers](#tactic-include-details-in-your-query-to-get-more-relevant-answers)
* [Ask the model to adopt a persona](#tactic-ask-the-model-to-adopt-a-persona)
* [Use delimiters to clearly indicate distinct parts of the input](#tactic-use-delimiters-to-clearly-indicate-distinct-parts-of-the-input)
* [Specify the steps required to complete a task](#tactic-specify-the-steps-required-to-complete-a-task)
* [Provide examples](#tactic-provide-examples)
* [Specify the desired length of the output](#tactic-specify-the-desired-length-of-the-output)

### Provide reference text

Language models can confidently invent fake answers, especially when asked about
esoteric topics or for citations and URLs. In the same way that a sheet of notes
can help a student do better on a test, providing reference text to these models
can help in answering with fewer fabrications.

Tactics:

* [Instruct the model to answer using a reference text](#tactic-instruct-the-model-to-answer-using-a-reference-text)
* [Instruct the model to answer with citations from a reference text](#tactic-instruct-the-model-to-answer-with-citations-from-a-reference-text)

### Split complex tasks into simpler subtasks


Just as it is good practice in software engineering to decompose a complex system
into a set of modular components, the same is true of tasks submitted to a language
model. Complex tasks tend to have higher error rates than simpler tasks.
Furthermore, complex tasks can often be re-defined as a workflow of simpler tasks
in which the outputs of earlier tasks are used to construct the inputs to later
tasks.

Tactics:

* [Use intent classification to identify the most relevant instructions for a user query](#tactic-use-intent-classification-to-identify-the-most-relevant-instructions-for-a-user-query)
* [For dialogue applications that require very long conversations, summarize or filter previous dialogue](#tactic-for-dialogue-applications-that-require-very-long-conversations-summarize-or-filter-previous-dialogue)
* [Summarize long documents piecewise and construct a full summary recursively](#tactic-summarize-long-documents-piecewise-and-construct-a-full-summary-recursively)

### Give the model time to "think"

If asked to multiply 17 by 28, you might not know it instantly, but can still work
it out with time. Similarly, models make more reasoning errors when trying to
answer right away, rather than taking time to work out an answer. Asking for a
"chain of thought" before an answer can help the model reason its way toward
correct answers more reliably.

Tactics:

* [Instruct the model to work out its own solution before rushing to a conclusion](#tactic-instruct-the-model-to-work-out-its-own-solution-before-rushing-to-a-conclusion)
* [Use inner monologue or a sequence of queries to hide the model's reasoning process](#tactic-use-inner-monologue-or-a-sequence-of-queries-to-hide-the-model-s-reasoning-process)
* [Ask the model if it missed anything on previous passes](#tactic-ask-the-model-if-it-missed-anything-on-previous-passes)

### Use external tools

Compensate for the weaknesses of the model by feeding it the outputs of other
tools. For example, a text retrieval system (sometimes called RAG or retrieval
augmented generation) can tell the model about relevant documents. A code execution
engine like OpenAI's Code Interpreter can help the model do math and run code. If a
task can be done more reliably or efficiently by a tool rather than by a language
model, offload it to get the best of both.

Tactics:

* [Use embeddings-based search to implement efficient knowledge retrieval](#tactic-use-embeddings-based-search-to-implement-efficient-knowledge-retrieval)
* [Use code execution to perform more accurate calculations or call external APIs](#tactic-use-code-execution-to-perform-more-accurate-calculations-or-call-external-apis)
* [Give the model access to specific functions](#tactic-give-the-model-access-to-specific-functions)

### Test changes systematically


Improving performance is easier if you can measure it. In some cases a modification
to a prompt will achieve better performance on a few isolated examples but lead to
worse overall performance on a more representative set of examples. Therefore, to be
sure that a change is net positive to performance, it may be necessary to define a
comprehensive test suite (also known as an "eval").

Tactic:

* [Evaluate model outputs with reference to gold-standard answers](#tactic-evaluate-model-outputs-with-reference-to-gold-standard-answers)

Tactics
-------

Each of the strategies listed above can be instantiated with specific tactics.
These tactics are meant to provide ideas for things to try. They are by no means
fully comprehensive, and you should feel free to try creative ideas not represented
here.

### Strategy: Write clear instructions

#### Tactic: Include details in your query to get more relevant answers

In order to get a highly relevant response, make sure that requests provide any
important details or context. Otherwise you are leaving it up to the model to guess
what you mean.

|Worse|Better|
|---|---|
|How do I add numbers in Excel?|How do I add up a row of dollar amounts in Excel? I want to do this automatically for a whole sheet of rows with all the totals ending up on the right in a column called "Total".|
|Who’s president?|Who was the president of Mexico in 2021, and how frequently are elections held?|
|Write code to calculate the Fibonacci sequence.|Write a TypeScript function to efficiently calculate the Fibonacci sequence. Comment the code liberally to explain what each piece does and why it's written that way.|
|Summarize the meeting notes.|Summarize the meeting notes in a single paragraph. Then write a markdown list of the speakers and each of their key points. Finally, list the next steps or action items suggested by the speakers, if any.|

#### Tactic: Ask the model to adopt a persona

The system message can be used to specify the persona used by the model in its
replies.

SYSTEM

When I ask for help to write something, you will reply with a document that
contains at least one joke or playful comment in every paragraph.

USER

Write a thank you note to my steel bolt vendor for getting the delivery in on time
and in short notice. This made it possible for us to deliver an important order.

#### Tactic: Use delimiters to clearly indicate distinct parts of the input

Delimiters like triple quotation marks, XML tags, section titles, etc. can help
demarcate sections of text to be treated differently.

USER

Summarize the text delimited by triple quotes with a haiku. """insert text here"""

SYSTEM

You will be provided with a pair of articles (delimited with XML tags) about the
same topic. First summarize the arguments of each article. Then indicate which of
them makes a better argument and explain why.

USER

<article> insert first article here </article> <article> insert second article here
</article>

SYSTEM

You will be provided with a thesis abstract and a suggested title for it. The
thesis title should give the reader a good idea of the topic of the thesis but
should also be eye-catching. If the title does not meet these criteria, suggest 5
alternatives.

USER

Abstract: insert abstract here Title: insert title here

For straightforward tasks such as these, using delimiters might not make a
difference in the output quality. However, the more complex a task is, the more
important it is to disambiguate task details. Don’t make the model work to
understand exactly what you are asking of it.

#### Tactic: Specify the steps required to complete a task

Some tasks are best specified as a sequence of steps. Writing the steps out
explicitly can make it easier for the model to follow them.

SYSTEM

Use the following step-by-step instructions to respond to user inputs.

Step 1 - The user will provide you with text in triple quotes. Summarize this text
in one sentence with a prefix that says "Summary: ".

Step 2 - Translate the summary from Step 1 into Spanish, with a prefix that says
"Translation: ".

USER

"""insert text here"""

#### Tactic: Provide examples

Providing general instructions that apply to all examples is generally more
efficient than demonstrating all permutations of a task by example, but in some
cases providing examples may be easier. For example, you may intend for the model to
copy a particular style of responding to user queries that is difficult to describe
explicitly. This is known as "few-shot" prompting.

SYSTEM

Answer in a consistent style.

USER

Teach me about patience.

ASSISTANT

The river that carves the deepest valley flows from a modest spring; the grandest
symphony originates from a single note; the most intricate tapestry begins with a
solitary thread.

USER

Teach me about the ocean.

#### Tactic: Specify the desired length of the output

You can ask the model to produce outputs that are of a given target length. The
targeted output length can be specified in terms of the count of words, sentences,
paragraphs, bullet points, etc. Note however that instructing the model to generate
a specific number of words does not work with high precision. The model can more
reliably generate outputs with a specific number of paragraphs or bullet points.

USER

Summarize the text delimited by triple quotes in about 50 words. """insert text
here"""

USER

Summarize the text delimited by triple quotes in 2 paragraphs. """insert text
here"""

USER

Summarize the text delimited by triple quotes in 3 bullet points. """insert text
here"""

### Strategy: Provide reference text

#### Tactic: Instruct the model to answer using a reference text

If we can provide a model with trusted information that is relevant to the current
query, then we can instruct the model to use the provided information to compose
its answer.

SYSTEM

Use the provided articles delimited by triple quotes to answer questions. If the
answer cannot be found in the articles, write "I could not find an answer."

USER

<insert articles, each delimited by triple quotes> Question: <insert question here>

Given that all models have limited context windows, we need some way to dynamically
look up information that is relevant to the question being asked.
[Embeddings](https://platform.openai.com/docs/guides/embeddings#what-are-embeddings)
can be used to implement efficient knowledge retrieval. See the tactic
["Use embeddings-based search to implement efficient knowledge retrieval"](#tactic-use-embeddings-based-search-to-implement-efficient-knowledge-retrieval)
for more details on how to implement this.

#### Tactic: Instruct the model to answer with citations from a reference text

If the input has been supplemented with relevant knowledge, it's straightforward to
request that the model add citations to its answers by referencing passages from
provided documents. Note that citations in the output can then be verified
programmatically by string matching within the provided documents.

SYSTEM

You will be provided with a document delimited by triple quotes and a question.
Your task is to answer the question using only the provided document and to cite
the passage(s) of the document used to answer the question. If the document does
not contain the information needed to answer this question then simply write:
"Insufficient information." If an answer to the question is provided, it must be
annotated with a citation. Use the following format for to cite relevant passages
({"citation": …}).

USER

"""<insert document here>""" Question: <insert question here>

### Strategy: Split complex tasks into simpler subtasks

#### Tactic: Use intent classification to identify the most relevant instructions
for a user query

For tasks in which lots of independent sets of instructions are needed to handle
different cases, it can be beneficial to first classify the type of query and to
use that classification to determine which instructions are needed. This can be
achieved by defining fixed categories and hardcoding instructions that are relevant
for handling tasks in a given category. This process can also be applied
recursively to decompose a task into a sequence of stages. The advantage of this
approach is that each query will contain only those instructions that are required
to perform the next stage of a task which can result in lower error rates compared
to using a single query to perform the whole task. This can also result in lower
costs since larger prompts cost more to run ([see pricing
information](https://openai.com/api/pricing)).

Suppose for example that for a customer service application, queries could be
usefully classified as follows:

SYSTEM

You will be provided with customer service queries. Classify each query into a
primary category and a secondary category. Provide your output in json format with
the keys: primary and secondary.

Primary categories: Billing, Technical Support, Account Management, or General
Inquiry.

Billing secondary categories:
- Unsubscribe or upgrade
- Add a payment method
- Explanation for charge
- Dispute a charge

Technical Support secondary categories:
- Troubleshooting
- Device compatibility
- Software updates

Account Management secondary categories:
- Password reset
- Update personal information
- Close account
- Account security

General Inquiry secondary categories:
- Product information
- Pricing
- Feedback
- Speak to a human

USER

I need to get my internet working again.

Based on the classification of the customer query, a set of more specific
instructions can be provided to a model for it to handle next steps. For example,
suppose the customer requires help with "troubleshooting".

SYSTEM

You will be provided with customer service inquiries that require troubleshooting
in a technical support context. Help the user by:

- Ask them to check that all cables to/from the router are connected. Note that it
is common for cables to come loose over time.
- If all cables are connected and the issue persists, ask them which router model
they are using.
- Now you will advise them how to restart their device:
  - If the model number is MTD-327J, advise them to push the red button and hold
it for 5 seconds, then wait 5 minutes before testing the connection.
  - If the model number is MTD-327S, advise them to unplug and replug it, then
wait 5 minutes before testing the connection.
- If the customer's issue persists after restarting the device and waiting 5
minutes, connect them to IT support by outputting {"IT support requested"}.
- If the user starts asking questions that are unrelated to this topic then
confirm if they would like to end the current chat about troubleshooting and
classify their request according to the following scheme:

<insert primary/secondary classification scheme from above here>

USER

I need to get my internet working again.

Notice that the model has been instructed to emit special strings to indicate when
the state of the conversation changes. This enables us to turn our system into a
state machine where the state determines which instructions are injected. By
keeping track of state, what instructions are relevant at that state, and also
optionally what state transitions are allowed from that state, we can put
guardrails around the user experience that would be hard to achieve with a less
structured approach.
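
As an illustration, the classify-then-route flow described above could be wired
together with the official Python SDK roughly as follows. The routing table, helper
names, and model choice are assumptions for the sketch; the placeholder strings
stand in for the system messages shown earlier.

```python
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical routing table: one instruction set per (primary, secondary) category
INSTRUCTIONS = {
    ("Technical Support", "Troubleshooting"): "<insert troubleshooting system message from above>",
    # ... one entry per category pair
}

CLASSIFIER_SYSTEM = "<insert classification system message from above>"

def classify(query: str) -> dict:
    """Stage 1: classify the query into primary/secondary categories as JSON."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": CLASSIFIER_SYSTEM},
            {"role": "user", "content": query},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

def handle(query: str) -> str:
    """Stage 2: answer using only the instructions relevant to the classified intent."""
    category = classify(query)
    system = INSTRUCTIONS[(category["primary"], category["secondary"])]
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": system}, {"role": "user", "content": query}],
    )
    return response.choices[0].message.content
```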

#### Tactic: For dialogue applications that require very long conversations,
summarize or filter previous dialogue

Since models have a fixed context length, dialogue between a user and an assistant
in which the entire conversation is included in the context window cannot continue
indefinitely.

There are various workarounds to this problem, one of which is to summarize
previous turns in the conversation. Once the size of the input reaches a
predetermined threshold length, this could trigger a query that summarizes part of
the conversation, and the summary of the prior conversation could be included as
part of the system message. Alternatively, prior conversation could be summarized
asynchronously in the background throughout the entire conversation.
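
As a minimal sketch of the threshold-triggered approach (the character threshold,
the number of verbatim turns kept, and the model are illustrative assumptions; a
real implementation would count tokens rather than characters):

```python
from openai import OpenAI

client = OpenAI()
MAX_HISTORY_CHARS = 8000  # rough stand-in for a token-based threshold

def compact_history(messages: list[dict]) -> list[dict]:
    """If the conversation is too long, replace older turns with a summary."""
    if sum(len(m["content"]) for m in messages) < MAX_HISTORY_CHARS:
        return messages
    old, recent = messages[:-4], messages[-4:]  # keep the last few turns verbatim
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    summary = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Summarize this conversation concisely:\n{transcript}"}],
    ).choices[0].message.content
    return [{"role": "system",
             "content": f"Summary of earlier conversation: {summary}"}] + recent
```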

An alternative solution is to dynamically select previous parts of the conversation
that are most relevant to the current query. See the tactic ["Use embeddings-based
search to implement efficient knowledge retrieval"](#tactic-use-embeddings-based-search-to-implement-efficient-knowledge-retrieval).

#### Tactic: Summarize long documents piecewise and construct a full summary
recursively

Since models have a fixed context length, they cannot be used to summarize a text
longer than the context length minus the length of the generated summary in a
single query.

To summarize a very long document such as a book we can use a sequence of queries
to summarize each section of the document. Section summaries can be concatenated
and summarized producing summaries of summaries. This process can proceed
recursively until an entire document is summarized. If it’s necessary to use
information about earlier sections in order to make sense of later sections, then a
further trick that can be useful is to include a running summary of the text that
precedes any given point in the book while summarizing content at that point. The
effectiveness of this procedure for summarizing books has been studied in previous
[research](https://openai.com/research/summarizing-books) by OpenAI using variants
of GPT-3.
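
A minimal recursive version of this procedure might look like the sketch below. It
splits on a fixed character budget for simplicity; a real implementation would split
on section boundaries and count tokens, and the chunk size and model are assumptions.

```python
from openai import OpenAI

client = OpenAI()
CHUNK_SIZE = 8000  # characters per chunk, chosen for illustration

def summarize(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Summarize the following text:\n\n{text}"}],
    )
    return response.choices[0].message.content

def summarize_document(text: str) -> str:
    """Summarize chunks, then recursively summarize the concatenated summaries."""
    if len(text) <= CHUNK_SIZE:
        return summarize(text)
    chunks = [text[i:i + CHUNK_SIZE] for i in range(0, len(text), CHUNK_SIZE)]
    partial = "\n\n".join(summarize(chunk) for chunk in chunks)
    return summarize_document(partial)
```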

### Strategy: Give models time to "think"

#### Tactic: Instruct the model to work out its own solution before rushing to a
conclusion

Sometimes we get better results when we explicitly instruct the model to reason
from first principles before coming to a conclusion. Suppose for example we want a
model to evaluate a student’s solution to a math problem. The most obvious way to
approach this is to simply ask the model if the student's solution is correct or
not.

SYSTEM

Determine if the student's solution is correct or not.

USER

Problem Statement: I'm building a solar power installation and I need help working
out the financials. - Land costs $100 / square foot - I can buy solar panels for
$250 / square foot - I negotiated a contract for maintenance that will cost me a
flat $100k per year, and an additional $10 / square foot What is the total cost for
the first year of operations as a function of the number of square feet. Student's
Solution: Let x be the size of the installation in square feet. 1. Land cost: 100x
2. Solar panel cost: 250x 3. Maintenance cost: 100,000 + 100x Total cost: 100x +
250x + 100,000 + 100x = 450x + 100,000

ASSISTANT

The student's solution is correct.

But the student's solution is actually not correct! We can get the model to
successfully notice this by prompting the model to generate its own solution first.

SYSTEM

First work out your own solution to the problem. Then compare your solution to the
student's solution and evaluate if the student's solution is correct or not. Don't
decide if the student's solution is correct until you have done the problem
yourself.

USER

Problem Statement: I'm building a solar power installation and I need help working
out the financials. - Land costs $100 / square foot - I can buy solar panels for
$250 / square foot - I negotiated a contract for maintenance that will cost me a
flat $100k per year, and an additional $10 / square foot What is the total cost for
the first year of operations as a function of the number of square feet. Student's
Solution: Let x be the size of the installation in square feet. 1. Land cost: 100x
2. Solar panel cost: 250x 3. Maintenance cost: 100,000 + 100x Total cost: 100x +
250x + 100,000 + 100x = 450x + 100,000

ASSISTANT

Let x be the size of the installation in square feet. 1. Land cost: 100x 2. Solar
panel cost: 250x 3. Maintenance cost: 100,000 + 10x Total cost: 100x + 250x +
100,000 + 10x = 360x + 100,000 The student's solution is incorrect. They made an
error in the maintenance cost calculation, using 100x instead of 10x. The correct
total cost for the first year of operations is 360x + 100,000.

#### Tactic: Use inner monologue or a sequence of queries to hide the model's
reasoning process

The previous tactic demonstrates that it is sometimes important for the model to
reason in detail about a problem before answering a specific question. For some
applications, the reasoning process that a model uses to arrive at a final answer
would be inappropriate to share with the user. For example, in tutoring
applications we may want to encourage students to work out their own answers, but a
model’s reasoning process about the student’s solution could reveal the answer to
the student.

Inner monologue is a tactic that can be used to mitigate this. The idea of inner
monologue is to instruct the model to put parts of the output that are meant to be
hidden from the user into a structured format that makes parsing them easy. Then
before presenting the output to the user, the output is parsed and only part of the
output is made visible.

SYSTEM

Follow these steps to answer the user queries.

Step 1 - First work out your own solution to the problem. Don't rely on the
student's solution since it may be incorrect. Enclose all your work for this step
within triple quotes (""").

Step 2 - Compare your solution to the student's solution and evaluate if the
student's solution is correct or not. Enclose all your work for this step within
triple quotes (""").

Step 3 - If the student made a mistake, determine what hint you could give the
student without giving away the answer. Enclose all your work for this step within
triple quotes (""").

Step 4 - If the student made a mistake, provide the hint from the previous step to
the student (outside of triple quotes). Instead of writing "Step 4 - ..." write
"Hint:".

USER

Problem Statement: <insert problem statement> Student Solution: <insert student
solution>
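
Given a response in the format requested above, the presentation layer only needs
to strip the triple-quoted sections before showing the output to the student. A
minimal sketch:

```python
import re

def visible_part(model_output: str) -> str:
    """Remove the triple-quoted working sections, leaving only the text meant
    for the student (e.g. the "Hint:" line from Step 4)."""
    return re.sub(r'"""[\s\S]*?"""', "", model_output).strip()
```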

Alternatively, this can be achieved with a sequence of queries in which all except
the last have their output hidden from the end user.

First, we can ask the model to solve the problem on its own. Since this initial
query doesn't require the student’s solution, it can be omitted. This provides the
additional advantage that there is no chance that the model’s solution will be
biased by the student’s attempted solution.

USER

<insert problem statement>

Next, we can have the model use all available information to assess the correctness
of the student’s solution.

SYSTEM

Compare your solution to the student's solution and evaluate if the student's
solution is correct or not.

USER

Problem statement: """<insert problem statement>""" Your solution: """<insert model


generated solution>""" Student’s solution: """<insert student's solution>"""

Finally, we can let the model use its own analysis to construct a reply in the
persona of a helpful tutor.

SYSTEM

You are a math tutor. If the student made an error, offer a hint to the student in
a way that does not reveal the answer. If the student did not make an error, simply
offer them an encouraging comment.

USER

Problem statement: """<insert problem statement>""" Your solution: """<insert model


generated solution>""" Student’s solution: """<insert student's solution>"""
Analysis: """<insert model generated analysis from previous step>"""

#### Tactic: Ask the model if it missed anything on previous passes

Suppose that we are using a model to list excerpts from a source which are relevant
to a particular question. After listing each excerpt the model needs to determine
if it should start writing another or if it should stop. If the source document is
large, it is common for a model to stop too early and fail to list all relevant
excerpts. In that case, better performance can often be obtained by prompting the
model with followup queries to find any excerpts it missed on previous passes.

SYSTEM

You will be provided with a document delimited by triple quotes. Your task is to
select excerpts which pertain to the following question: "What significant paradigm
shifts have occurred in the history of artificial intelligence." Ensure that
excerpts contain all relevant context needed to interpret them - in other words
don't extract small snippets that are missing important context. Provide output in
JSON format as follows: \[{"excerpt": "..."}, ... {"excerpt": "..."}\]

USER

"""<insert document here>"""

ASSISTANT

\[{"excerpt": "the model writes an excerpt here"}, ... {"excerpt": "the model
writes another excerpt here"}\]

USER

Are there more relevant excerpts? Take care not to repeat excerpts. Also ensure
that excerpts contain all relevant context needed to interpret them - in other
words don't extract small snippets that are missing important context.
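
This followup pattern can be automated by re-querying until the model returns an
empty list. A sketch, assuming the JSON excerpt format above and a cap on the number
of passes (the helper name and model are illustrative):

```python
import json
from openai import OpenAI

client = OpenAI()

FOLLOWUP = ("Are there more relevant excerpts? Take care not to repeat excerpts. "
            "Also ensure that excerpts contain all relevant context needed to "
            "interpret them.")

def collect_excerpts(messages: list[dict], max_passes: int = 3) -> list[dict]:
    """Ask for missed excerpts until the model returns an empty list."""
    excerpts = []
    for _ in range(max_passes):
        reply = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
        content = reply.choices[0].message.content
        batch = json.loads(content)  # expects the [{"excerpt": ...}] format above
        if not batch:
            break
        excerpts.extend(batch)
        messages = messages + [
            {"role": "assistant", "content": content},
            {"role": "user", "content": FOLLOWUP},
        ]
    return excerpts
```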

### Strategy: Use external tools

#### Tactic: Use embeddings-based search to implement efficient knowledge retrieval

A model can leverage external sources of information if provided as part of its
input. This can help the model to generate more informed and up-to-date responses.
For example, if a user asks a question about a specific movie, it may be useful to
add high quality information about the movie (e.g. actors, director, etc…) to the
model’s input. Embeddings can be used to implement efficient knowledge retrieval,
so that relevant information can be added to the model input dynamically at run-
time.

A text embedding is a vector that can measure the relatedness between text strings.
Similar or relevant strings will be closer together than unrelated strings. This
fact, along with the existence of fast vector search algorithms means that
embeddings can be used to implement efficient knowledge retrieval. In particular, a
text corpus can be split up into chunks, and each chunk can be embedded and stored.
Then a given query can be embedded and vector search can be performed to find the
embedded chunks of text from the corpus that are most related to the query (i.e.
closest together in the embedding space).
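
The chunk-embed-search loop described here can be sketched in a few lines with the
official Python SDK and NumPy. Exact cosine similarity over a small in-memory corpus
stands in for a real vector database, and the chunks shown are placeholders:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    response = client.embeddings.create(model="text-embedding-3-large", input=texts)
    return np.array([item.embedding for item in response.data])

# Index step: split the corpus into chunks and embed each one
chunks = ["<chunk one of the corpus>", "<chunk two of the corpus>"]  # placeholders
chunk_vectors = embed(chunks)

# Query step: embed the question and rank chunks by cosine similarity
def top_chunks(query: str, k: int = 3) -> list[str]:
    q = embed([query])[0]
    scores = (chunk_vectors @ q) / (
        np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(q)
    )
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]
```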

Example implementations can be found in the
[OpenAI Cookbook](https://cookbook.openai.com/examples/vector_databases/readme). See the
tactic ["Instruct the model to answer using a reference text"](#tactic-instruct-the-model-to-answer-using-a-reference-text)
for an example of how to use knowledge retrieval to minimize the likelihood that a
model will make up incorrect facts.

#### Tactic: Use code execution to perform more accurate calculations or call
external APIs

Language models cannot be relied upon to perform arithmetic or long calculations
accurately on their own. In cases where this is needed, a model can be instructed
to write and run code instead of making its own calculations. In particular, a
model can be instructed to put code that is meant to be run into a designated
format such as triple backticks. After an output is produced, the code can be
extracted and run. Finally, if necessary, the output from the code execution engine
(i.e. a Python interpreter) can be provided as an input to the model for the next
query.

SYSTEM

You can write and execute Python code by enclosing it in triple backticks, e.g.
\`\`\`code goes here\`\`\`. Use this to perform calculations.

USER

Find all real-valued roots of the following polynomial: 3\*x\*\*5 - 5\*x\*\*4 - 3\*x\*\*3 - 7\*x - 10.
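
For illustration, the extract-and-run step might look like the sketch below. The
fence-matching regex and the use of a subprocess are assumptions, and, per the
warning later in this section, untrusted code should only ever run in a sandbox:

```python
import re
import subprocess

FENCE = "`" * 3  # a literal triple backtick

def run_model_code(model_output: str) -> str:
    """Extract the first triple-backtick block from a completion and run it.
    WARNING: only run model-generated code inside a sandboxed environment."""
    match = re.search(FENCE + r"(?:python)?\n(.*?)" + FENCE, model_output, re.DOTALL)
    if match is None:
        return ""
    result = subprocess.run(
        ["python", "-c", match.group(1)],
        capture_output=True, text=True, timeout=30,
    )
    return result.stdout  # can be fed back to the model in a followup query
```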

Another good use case for code execution is calling external APIs. If a model is
instructed in the proper use of an API, it can write code that makes use of it. A
model can be instructed in how to use an API by providing it with documentation
and/or code samples showing how to use the API.

SYSTEM

You can write and execute Python code by enclosing it in triple backticks. Also
note that you have access to the following module to help users send messages to
their friends: \`\`\`python import message message.write(to="John", message="Hey,
want to meetup after work?")\`\`\`

**WARNING: Executing code produced by a model is not inherently safe and
precautions should be taken in any application that seeks to do this. In
particular, a sandboxed code execution environment is needed to limit the harm that
untrusted code could cause.**

#### Tactic: Give the model access to specific functions

The Chat Completions API allows passing a list of function descriptions in
requests. This enables models to generate function arguments according to the
provided schemas. Generated function arguments are returned by the API in JSON
format and can be used to execute function calls. Output provided by function calls
can then be fed back into a model in the following request to close the loop. This
is the recommended way of using OpenAI models to call external functions. To learn
more see the [function calling section](/docs/guides/function-calling) in our
introductory text generation guide and more
[function calling examples](https://cookbook.openai.com/examples/how_to_call_functions_with_chat_models)
in the OpenAI Cookbook.
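
A minimal function calling round trip with the Python SDK might look like the
sketch below. The weather function schema is a standard illustrative example, not a
real API:

```python
import json
from openai import OpenAI

client = OpenAI()

# A hypothetical function schema; the model generates arguments that match it
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string", "description": "City name"}},
            "required": ["location"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    tools=tools,
)

tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)  # e.g. {"location": "Boston"}
# Execute your real function with `args`, then send its result back in a followup
# request as a message with role "tool" to close the loop.
```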

### Strategy: Test changes systematically

Sometimes it can be hard to tell whether a change — e.g., a new instruction or a
new design — makes your system better or worse. Looking at a few examples may hint
at which is better, but with small sample sizes it can be hard to distinguish
between a true improvement and random luck. Maybe the change helps performance on
some inputs, but hurts performance on others.

Evaluation procedures (or "evals") are useful for optimizing system designs. Good
evals are:

* Representative of real-world usage (or at least diverse)
* Contain many test cases for greater statistical power (see table below for
guidelines)
* Easy to automate or repeat

|Difference to detect|Sample size needed for 95% confidence|
|---|---|
|30%|~10|
|10%|~100|
|3%|~1,000|
|1%|~10,000|

Evaluation of outputs can be done by computers, humans, or a mix. Computers can
automate evals with objective criteria (e.g., questions with single correct
answers) as well as some subjective or fuzzy criteria, in which model outputs are
evaluated by other model queries. [OpenAI Evals](https://github.com/openai/evals)
is an open-source software framework that provides tools for creating automated
evals.

Model-based evals can be useful when there exists a range of possible outputs that
would be considered equally high in quality (e.g. for questions with long answers).
The boundary between what can be realistically evaluated with a model-based eval
and what requires a human to evaluate is fuzzy and is constantly shifting as models
become more capable. We encourage experimentation to figure out how well model-
based evals can work for your use case.

#### Tactic: Evaluate model outputs with reference to gold-standard answers

Suppose it is known that the correct answer to a question should make reference to
a specific set of known facts. Then we can use a model query to count how many of
the required facts are included in the answer.

For example, using the following system message:

SYSTEM

You will be provided with text delimited by triple quotes that is supposed to be
the answer to a question. Check if the following pieces of information are directly
contained in the answer:

- Neil Armstrong was the first person to walk on the moon.
- The date Neil Armstrong first walked on the moon was July 21, 1969.

For each of these points perform the following steps:

1 - Restate the point.
2 - Provide a citation from the answer which is closest to this point.
3 - Consider if someone reading the citation who doesn't know the topic could
directly infer the point. Explain why or why not before making up your mind.
4 - Write "yes" if the answer to 3 was yes, otherwise write "no".

Finally, provide a count of how many "yes" answers there are. Provide this count as
{"count": <insert count here>}.

Here's an example input where both points are satisfied:

SYSTEM

<insert system message above>

USER

"""Neil Armstrong is famous for being the first human to set foot on the Moon. This
historic event took place on July 21, 1969, during the Apollo 11 mission."""

Here's an example input where only one point is satisfied:

SYSTEM

<insert system message above>

USER

"""Neil Armstrong made history when he stepped off the lunar module, becoming the
first person to walk on the moon."""

Here's an example input where none are satisfied:

SYSTEM

<insert system message above>

USER

"""In the summer of '69, a voyage grand, Apollo 11, bold as legend's hand.
Armstrong took a step, history unfurled, "One small step," he said, for a new
world."""

There are many possible variants on this type of model-based eval. Consider the
following variation which tracks the kind of overlap between the candidate answer
and the gold-standard answer, and also tracks whether the candidate answer
contradicts any part of the gold-standard answer.

SYSTEM

Use the following steps to respond to user inputs. Fully restate each step before
proceeding. i.e. "Step 1: Reason...".

Step 1: Reason step-by-step about whether the information in the submitted answer
compared to the expert answer is either: disjoint, equal, a subset, a superset, or
overlapping (i.e. some intersection but not subset/superset).

Step 2: Reason step-by-step about whether the submitted answer contradicts any
aspect of the expert answer.

Step 3: Output a JSON object structured like: {"type_of_overlap": "disjoint" or
"equal" or "subset" or "superset" or "overlapping", "contradiction": true or false}

Here's an example input with a substandard answer which nonetheless does not
contradict the expert answer:

SYSTEM

<insert system message above>

USER

Question: """What event is Neil Armstrong most famous for and on what date did it
occur? Assume UTC time.""" Submitted Answer: """Didn't he walk on the moon or
something?""" Expert Answer: """Neil Armstrong is most famous for being the first
person to walk on the moon. This historic event occurred on July 21, 1969."""

Here's an example input with answer that directly contradicts the expert answer:

SYSTEM

<insert system message above>

USER

Question: """What event is Neil Armstrong most famous for and on what date did it
occur? Assume UTC time.""" Submitted Answer: """On the 21st of July 1969, Neil
Armstrong became the second person to walk on the moon, following after Buzz
Aldrin.""" Expert Answer: """Neil Armstrong is most famous for being the first
person to walk on the moon. This historic event occurred on July 21, 1969."""

Here's an example input with a correct answer that also provides a bit more detail
than is necessary:

SYSTEM

<insert system message above>

USER

Question: """What event is Neil Armstrong most famous for and on what date did it
occur? Assume UTC time.""" Submitted Answer: """At approximately 02:56 UTC on July
21st 1969, Neil Armstrong became the first human to set foot on the lunar surface,
marking a monumental achievement in human history.""" Expert Answer: """Neil
Armstrong is most famous for being the first person to walk on the moon. This
historic event occurred on July 21, 1969."""
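
To run either of these model-based evals over a test set, the grading prompt can be
wrapped in a small helper that parses the trailing JSON object. A sketch, with the
system message left as a placeholder and the model choice assumed:

```python
import json
import re
from openai import OpenAI

client = OpenAI()
EVAL_SYSTEM = "<insert the overlap/contradiction system message from above>"

def grade(question: str, submitted: str, expert: str) -> dict:
    user = (f'Question: """{question}""" '
            f'Submitted Answer: """{submitted}""" '
            f'Expert Answer: """{expert}"""')
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": EVAL_SYSTEM},
                  {"role": "user", "content": user}],
    ).choices[0].message.content
    # The JSON object comes after the reasoning steps, so take the {...} span
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    return json.loads(match.group(0)) if match else {}
```
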
Other resources
---------------

For more inspiration, visit the [OpenAI Cookbook](https://cookbook.openai.com),
which contains example code and also links to third-party resources such as:

* [Prompting libraries & tools](https://cookbook.openai.com/related_resources#prompting-libraries--tools)
* [Prompting guides](https://cookbook.openai.com/related_resources#prompting-guides)
* [Video courses](https://cookbook.openai.com/related_resources#video-courses)
* [Papers on advanced prompting to improve reasoning](https://cookbook.openai.com/related_resources#papers-on-advanced-prompting-to-improve-reasoning)

Developer quickstart
====================

Learn how to make your first API request.

The OpenAI API provides a simple interface to state-of-the-art AI
[models](/docs/models) for natural language processing, image generation, semantic
search, and speech recognition. Follow this guide to learn how to generate human-
like responses to [natural language prompts](/docs/guides/text-generation), [create
vector embeddings](/docs/guides/embeddings) for semantic search, and [generate
images](/docs/guides/images) from textual descriptions.

Create and export an API key
----------------------------

[Create an API key in the dashboard here](/api-keys), which you’ll use to securely
[access the API](/docs/api-reference/authentication). Store the key in a safe
location, like a
[`.zshrc` file](https://www.freecodecamp.org/news/how-do-zsh-configuration-files-work/)
or another text file on your computer. Once you’ve generated an API key, export it
as an [environment variable](https://en.wikipedia.org/wiki/Environment_variable) in
your terminal.

macOS / Linux

Export an environment variable on macOS or Linux systems

```bash
export OPENAI_API_KEY="your_api_key_here"
```

Windows

Export an environment variable in PowerShell

```powershell
setx OPENAI_API_KEY "your_api_key_here"
```

Make your first API request
---------------------------

With your OpenAI API key exported as an environment variable, you're ready to make
your first API request. You can either use the [REST API](/docs/api-reference)
directly with the HTTP client of your choice, or use one of our [official
SDKs](/docs/libraries) as shown below.

JavaScript

To use the OpenAI API in server-side JavaScript environments like Node.js, Deno, or
Bun, you can use the official [OpenAI SDK for TypeScript and
JavaScript](https://github.com/openai/openai-node). Get started by installing the
SDK using [npm](https://www.npmjs.com/) or your preferred package manager:

Install the OpenAI SDK with npm

```bash
npm install openai
```

With the OpenAI SDK installed, create a file called `example.mjs` and copy one of
the following examples into it:

Generate text

Create a human-like response to a prompt

```javascript
import OpenAI from "openai";
const openai = new OpenAI();

const completion = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    {
      role: "user",
      content: "Write a haiku about recursion in programming.",
    },
  ],
  store: true,
});

console.log(completion.choices[0].message);
```

Generate an image

Generate an image based on a textual prompt

```javascript
import OpenAI from "openai";
const openai = new OpenAI();

const image = await openai.images.generate({ prompt: "A cute baby sea otter" });

console.log(image.data[0].url);
```

Create vector embeddings

Create vector embeddings for a string of text


```javascript
import OpenAI from "openai";
const openai = new OpenAI();

const embedding = await openai.embeddings.create({
  model: "text-embedding-3-large",
  input: "The quick brown fox jumped over the lazy dog",
});

console.log(embedding);
```

Execute the code with `node example.mjs` (or the equivalent command for Deno or
Bun). In a few moments, you should see the output of your API request!

Python

To use the OpenAI API in Python, you can use the official
[OpenAI SDK for Python](https://github.com/openai/openai-python). Get started by
installing the SDK using [pip](https://pypi.org/project/pip/):

Install the OpenAI SDK with pip

```bash
pip install openai
```

With the OpenAI SDK installed, create a file called `example.py` and copy one of
the following examples into it:

Generate text

Create a human-like response to a prompt

```python
from openai import OpenAI
client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": "Write a haiku about recursion in programming."
        }
    ]
)

print(completion.choices[0].message)
```

Generate an image

Generate an image based on a textual prompt

```python
from openai import OpenAI
client = OpenAI()
response = client.images.generate(
    prompt="A cute baby sea otter",
    n=2,
    size="1024x1024"
)

print(response.data[0].url)
```

Create vector embeddings

Create vector embeddings for a string of text

```python
from openai import OpenAI
client = OpenAI()

response = client.embeddings.create(
    model="text-embedding-3-large",
    input="The food was delicious and the waiter..."
)

print(response)
```

Execute the code with `python example.py`. In a few moments, you should see the
output of your API request!

curl

On Unix-based systems, you can test out the [OpenAI REST API](/docs/api-reference)
using [curl](https://curl.se/). The following commands assume that you have
exported the `OPENAI_API_KEY` system environment variable as shown above.

Generate text

Create a human-like response to a prompt

```bash
curl "https://api.openai.com/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4o-mini",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Write a haiku that explains the concept of recursion."
}
]
}'
```

Generate an image
Generate an image based on a textual prompt

```bash
curl "https://api.openai.com/v1/images/generations" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"prompt": "A cute baby sea otter",
"n": 2,
"size": "1024x1024"
}'
```

Create vector embeddings

Create vector embeddings for a string of text

```bash
curl "https://api.openai.com/v1/embeddings" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"input": "The food was delicious and the waiter...",
"model": "text-embedding-3-large"
}'
```

Execute the curl commands above in your terminal. In a few moments, you should see
the output of your API request!

Next steps
----------

Now that you've made your first OpenAI API request, you can explore the following
resources:

* [Chat Completions](/docs/guides/text-generation): learn more about generating text responses to natural language prompts.
* [Image Generation](/docs/guides/images): generate images using our DALL·E model.
* [Embeddings](/docs/guides/embeddings): create vector representations of text, used for similarity search.
* [Text-to-speech](/docs/guides/text-to-speech): generate human-like voice recordings with our text-to-speech model.
* [Speech-to-text](/docs/guides/speech-to-text): create transcriptions of voice recordings with our Whisper model.
* [Moderation](/docs/guides/moderation): analyze and filter user-created content with our moderation model.
* [Fine-tuning](/docs/guides/fine-tuning): fine-tune our models with your own data.
* [Batch](/docs/guides/batch): batch requests for async jobs.
* [Full API Reference](/docs/api-reference): view the full REST API reference for OpenAI.

Production best practices
=========================

Transition AI projects to production with best practices.

This guide provides a comprehensive set of best practices to help you transition
from prototype to production. Whether you are a seasoned machine learning engineer
or a recent enthusiast, this guide should provide you with the tools you need to
successfully put the platform to work in a production setting: from securing access
to our API to designing a robust architecture that can handle high traffic volumes.
Use this guide to help develop a plan for deploying your application as smoothly
and effectively as possible.

If you want to explore best practices for going into production further, please
check out our Developer Day talk:

Setting up your organization
----------------------------

Once you [log in](/login) to your OpenAI account, you can find your organization
name and ID in your [organization settings](/settings/organization/general). The
organization name is the label for your organization, shown in user interfaces. The
organization ID is the unique identifier for your organization which can be used in
API requests.

Users who belong to multiple organizations can
[pass a header](/docs/api-reference/requesting-organization) to specify which
organization is used for an API request. Usage from these API requests will count
against the specified
organization's quota. If no header is provided, the [default
organization](/settings/organization/api-keys) will be billed. You can change your
default organization in your [user settings](/settings/organization/api-keys).

You can invite new members to your organization from the [Team
page](/settings/organization/team). Members can be **readers** or **owners**.

Readers:

* Can make API requests.
* Can view basic organization information.
* Can create, update, and delete resources (like Assistants) in the organization,
unless otherwise noted.

Owners:

* Have all the permissions of readers.
* Can modify billing information.
* Can manage members within the organization.

### Managing billing limits

To begin using the OpenAI API, enter your [billing
information](/settings/organization/billing/overview). If no billing information is
entered, you will still have login access but will be unable to make API requests.

Once you’ve entered your billing information, you will have an approved usage limit
of $100 per month, which is set by OpenAI. Your quota limit will automatically
increase as your usage of the platform increases and you move from one [usage
tier](/docs/guides/rate-limits#usage-tiers) to another. You can review your current
usage limit on the [limits](/settings/organization/limits) page in your account
settings.

If you’d like to be notified when your usage exceeds a certain dollar amount, you
can set a notification threshold through the [usage
limits](/settings/organization/limits) page. When the notification threshold is
reached, the owners of the organization will receive an email notification. You can
also set a monthly budget so that, once the monthly budget is reached, any
subsequent API requests will be rejected. Note that these limits are best effort,
and there may be 5 to 10 minutes of delay between the usage and the limits being
enforced.

### API keys

The OpenAI API uses API keys for authentication. Visit your [API
keys](/settings/organization/api-keys) page to retrieve the API key you'll use in
your requests.

This is a relatively straightforward way to control access, but you must be
vigilant about securing these keys. Avoid exposing the API keys in your code or in
public repositories; instead, store them in a secure location. You should expose
your keys to your application using environment variables or a secret management
service, so that you don't need to hard-code them in your codebase. Read more in
our [Best practices for API key safety](https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety).

API key usage can be monitored on the [Usage page](/usage) once tracking is
enabled. If you are using an API key generated prior to Dec 20, 2023, tracking will
not be enabled by default. You can enable tracking going forward on the [API key
management dashboard](/api-keys). All API keys generated after Dec 20, 2023 have
tracking enabled. Any previous untracked usage will be displayed as `Untracked` in
the dashboard.

### Staging projects

As you scale, you may want to create separate projects for your staging and
production environments. You can create these projects in the dashboard, allowing
you to isolate your development and testing work, so you don't accidentally disrupt
your live application. You can also limit user access to your production project,
and set custom rate and spend limits per project.

Scaling your solution architecture
----------------------------------

When designing your application or service for production that uses our API, it's
important to consider how you will scale to meet traffic demands. There are a few
key areas you will need to consider regardless of the cloud service provider of
your choice:

* **Horizontal scaling**: You may want to scale your application out horizontally
to accommodate requests to your application that come from multiple sources. This
could involve deploying additional servers or containers to distribute the load. If
you opt for this type of scaling, make sure that your architecture is designed to
handle multiple nodes and that you have mechanisms in place to balance the load
between them.
* **Vertical scaling**: Another option is to scale your application up
vertically, meaning you can beef up the resources available to a single node. This
would involve upgrading your server's capabilities to handle the additional load.
If you opt for this type of scaling, make sure your application is designed to take
advantage of these additional resources.
* **Caching**: By storing frequently accessed data, you can improve response
times without needing to make repeated calls to our API. Your application will need
to be designed to use cached data whenever possible and invalidate the cache when
new information is added. There are a few different ways you could do this. For
example, you could store data in a database, filesystem, or in-memory cache,
depending on what makes the most sense for your application (a minimal caching
sketch follows this list).
* **Load balancing**: Finally, consider load-balancing techniques to ensure
requests are distributed evenly across your available servers. This could involve
using a load balancer in front of your servers or using DNS round-robin. Balancing
the load will help improve performance and reduce bottlenecks.
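
As a minimal illustration of the caching idea above, an in-memory cache keyed on a
hash of the request might look like the following sketch. A production system would
use a shared store such as Redis and an explicit invalidation policy:

```python
import hashlib
import json
from openai import OpenAI

client = OpenAI()
_cache: dict[str, str] = {}  # stand-in for a shared cache such as Redis

def cached_completion(messages: list[dict], model: str = "gpt-4o-mini") -> str:
    """Return a cached response when an identical request was seen before."""
    key = hashlib.sha256(json.dumps([model, messages]).encode()).hexdigest()
    if key not in _cache:
        response = client.chat.completions.create(model=model, messages=messages)
        _cache[key] = response.choices[0].message.content
    return _cache[key]
```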

### Managing rate limits

When using our API, it's important to understand and plan for [rate
limits](/docs/guides/rate-limits).
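
A common way to plan for rate limits is to retry rate-limited requests with
exponential backoff. A minimal sketch using the Python SDK's `RateLimitError`
(retry counts and delays are illustrative):

```python
import random
import time

import openai
from openai import OpenAI

client = OpenAI()

def completion_with_backoff(max_retries: int = 5, **kwargs):
    """Retry rate-limited requests, doubling the wait (plus jitter) each time."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except openai.RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt + random.random())
```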

Improving latencies
-------------------

Check out our most up-to-date guide on
[latency optimization](/docs/guides/latency-optimization).

Latency is the time it takes for a request to be processed and a response to be
returned. In this section, we will discuss some factors that influence the latency
of our text generation models and provide suggestions on how to reduce it.

The latency of a completion request is mostly influenced by two factors: the model
and the number of tokens generated. The life cycle of a completion request looks
like this:

1. Network: end user to API latency
2. Server: time to process prompt tokens
3. Server: time to sample/generate tokens
4. Network: API to end user latency

The bulk of the latency typically arises from the token generation step.

> **Intuition**: Prompt tokens add very little latency to completion calls. Time to
generate completion tokens is much longer, as tokens are generated one at a time.
Longer generation lengths will accumulate latency due to generation required for
each token.

### Common factors affecting latency and possible mitigation techniques

Now that we have looked at the basics of latency, let’s take a look at various
factors that can affect latency, broadly ordered from most impactful to least
impactful.

#### Model

Our API offers different models with varying levels of complexity and generality.
The most capable models, such as `gpt-4`, can generate more complex and diverse
completions, but they also take longer to process your query. Models such as
`gpt-4o-mini` can generate faster and cheaper chat completions, but they may generate
results that are less accurate or relevant for your query. You can choose the model
that best suits your use case and the trade-off between speed, cost, and quality.

#### Number of completion tokens

Requesting a large number of completion tokens can lead to increased latency:

* **Lower max tokens**: for requests with a similar token generation count, those
that have a lower `max_tokens` parameter incur less latency.
* **Include stop sequences**: to prevent generating unneeded tokens, add a stop
sequence. For example, you can use stop sequences to generate a list with a
specific number of items. In this case, by using `11.` as a stop sequence, you can
generate a list with only 10 items, since the completion will stop when `11.` is
reached. [Read our help article on stop
sequences](https://help.openai.com/en/articles/5072263-how-do-i-use-stop-sequences)
for more context on how you can do this.
* **Generate fewer completions**: lower the values of `n` and `best_of` when
possible where `n` refers to how many completions to generate for each prompt and
`best_of` is used to represent the result with the highest log probability per
token.

If `n` and `best_of` both equal 1 (which is the default), the number of generated
tokens will be at most `max_tokens`.

If `n` (the number of completions returned) or `best_of` (the number of completions
generated for consideration) are set to `> 1`, each request will create multiple
outputs. Here, you can consider the number of generated tokens as
`max_tokens * max(n, best_of)`.

#### Streaming

Setting `stream: true` in a request makes the model start returning tokens as soon
as they are available, instead of waiting for the full sequence of tokens to be
generated. It does not change the time to get all the tokens, but it reduces the
time to first token for applications where you want to show partial progress or
will stop generation early. This can be a better user experience, so it's worth
experimenting with streaming.
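
For example, with the Python SDK a streamed request looks like the sketch below;
each chunk carries a token delta that can be rendered as soon as it arrives:

```python
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a haiku about latency."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)  # show partial progress immediately
```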

#### Infrastructure

Our servers are currently located in the US. While we hope to have global
redundancy in the future, in the meantime you could consider locating the relevant
parts of your infrastructure in the US to minimize the roundtrip time between your
servers and the OpenAI servers.

#### Batching

Depending on your use case, batching _may help_. If you are sending multiple
requests to the same endpoint, you can
[batch the prompts](/docs/guides/rate-limits#batching-requests) to be sent in the
same request. This will reduce the
number of requests you need to make. The prompt parameter can hold up to 20 unique
prompts. We advise you to test out this method and see if it helps. In some cases,
you may end up increasing the number of generated tokens which will slow the
response time.

Managing costs
--------------

To monitor your costs, you can set a [notification
threshold](/settings/organization/limits) in your account to receive an email alert
once you pass a certain usage threshold. You can also set a [monthly
budget](/settings/organization/limits). Please be mindful of the potential for a
monthly budget to cause disruptions to your application/users. Use the [usage
tracking dashboard](/settings/organization/usage) to monitor your token usage
during the current and past billing cycles.

### Text generation

One of the challenges of moving your prototype into production is budgeting for the
costs associated with running your application. OpenAI offers a [pay-as-you-go
pricing model](https://openai.com/api/pricing/), with prices per 1,000 tokens
(roughly equal to 750 words). To estimate your costs, you will need to project the
token utilization. Consider factors such as traffic levels, the frequency with
which users will interact with your application, and the amount of data you will be
processing.

**One useful framework for thinking about reducing costs is to consider costs as a
function of the number of tokens and the cost per token.** There are two potential
avenues for reducing costs using this framework. First, you could work to reduce
the cost per token by switching to smaller models for some tasks in order to reduce
costs. Alternatively, you could try to reduce the number of tokens required. There
are a few ways you could do this, such as by using shorter prompts,
[fine-tuning](/docs/guides/fine-tuning) models, or caching common user queries so that they
don't need to be processed repeatedly.

You can experiment with our interactive [tokenizer tool](/tokenizer) to help you
estimate costs. The API and playground also return token counts as part of the
response. Once you’ve got things working with our most capable model, you can see
if the other models can produce the same results with lower latency and costs.
Learn more in our
[token usage help article](https://help.openai.com/en/articles/6614209-how-do-i-check-my-token-usage).
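
For example, you can count tokens locally with the
[tiktoken](https://github.com/openai/tiktoken) library (assuming a recent version
that knows the gpt-4o encodings):

```python
import tiktoken

# encoding_for_model maps a model name to its tokenizer (o200k_base for gpt-4o)
encoding = tiktoken.encoding_for_model("gpt-4o")
prompt = "The food was delicious and the waiter..."
print(len(encoding.encode(prompt)))  # number of tokens this text consumes
```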

MLOps strategy
--------------

As you move your prototype into production, you may want to consider developing an
MLOps strategy. MLOps (machine learning operations) refers to the process of
managing the end-to-end life cycle of your machine learning models, including any
models you may be fine-tuning using our API. There are a number of areas to
consider when designing your MLOps strategy. These include:

* Data and model management: managing the data used to train or fine-tune your
model and tracking versions and changes.
* Model monitoring: tracking your model's performance over time and detecting any
potential issues or degradation.
* Model retraining: ensuring your model stays up to date with changes in data or
evolving requirements and retraining or fine-tuning it as needed.
* Model deployment: automating the process of deploying your model and related
artifacts into production.

Thinking through these aspects of your application will help ensure your model
stays relevant and performs well over time.

Security and compliance
-----------------------

As you move your prototype into production, you will need to assess and address any
security and compliance requirements that may apply to your application. This will
involve examining the data you are handling, understanding how our API processes
data, and determining what regulations you must adhere to. Our
[security practices](https://www.openai.com/security) and [trust and compliance
portal](https://trust.openai.com/) provide our most comprehensive and up-to-date
documentation. For reference, here is our [Privacy
Policy](https://openai.com/privacy/) and [Terms of
Use](https://openai.com/api/policies/terms/).

Some common areas you'll need to consider include data storage, data transmission,
and data retention. You might also need to implement data privacy protections, such
as encryption or anonymization where possible. In addition, you should follow best
practices for secure coding, such as input sanitization and proper error handling.

### Safety best practices


When creating your application with our API, consider our
[safety best practices](/docs/guides/safety-best-practices) to ensure your application is safe and
successful. These recommendations highlight the importance of testing the product
extensively, being proactive about addressing potential issues, and limiting
opportunities for misuse.

Business considerations
-----------------------

As projects using AI move from prototype to production, it is important to consider
how to build a great product with AI and how that ties back to your core business.
We certainly don't have all the answers but a great starting place is a talk from
our Developer Day where we dive into this with some of our customers: