Rate Limits - OpenAI API 1
Rate limits
Overview
A rate limit is a restriction that an API imposes on the number of times a user or client can
access the server within a specified period of time.
Rate limits are a common practice for APIs, and they're put in place for a few different
reasons:
• They help protect against abuse or misuse of the API. For example, a malicious actor
could flood the API with requests in an attempt to overload it or cause disruptions in
service. By setting rate limits, OpenAI can prevent this kind of activity.
• Rate limits help ensure that everyone has fair access to the API. If one person or
organization makes an excessive number of requests, it could bog down the API for
everyone else. By throttling the number of requests that a single user can make, OpenAI
ensures that as many people as possible can use the API without
experiencing slowdowns.
• Rate limits can help OpenAI manage the aggregate load on its infrastructure. If
requests to the API increase dramatically, it could tax the servers and cause
performance issues. By setting rate limits, OpenAI can help maintain a smooth and
consistent experience for all users.
Please work through this document in its entirety to better understand how
OpenAI’s rate limit system works. We include code examples and possible solutions
to handle common issues. We recommend following this guidance before filling
out the Rate Limit Increase Request form; details on how to fill it out are in
the last section.
https://platform.openai.com/docs/guides/rate-limits/overview 1/6
3/1/23, 3:35 PM Rate Limits - OpenAI API
We enforce rate limits at the organization level, not the user level, based on the specific endpoint
used as well as the type of account you have. Rate limits are measured in two ways: RPM
(requests per minute) and TPM (tokens per minute). The table below highlights the default rate
limits for our API, but these limits can be increased depending on your use case after filling
out the Rate Limit Increase Request form.
The TPM (tokens per minute) unit is different depending on the model:
In practical terms, this means you can send approximately 200x more tokens per minute to
an ada model versus a davinci model.
                                      TEXT & EMBEDDING          CODEX                EDIT                  IMAGE
Free trial users                      20 RPM, 150,000 TPM       20 RPM, 40,000 TPM   20 RPM, 150,000 TPM   50 images/min
Pay-as-you-go users (first 48 hours)  60 RPM, 250,000 TPM*      20 RPM, 40,000 TPM   20 RPM, 150,000 TPM   50 images/min
Pay-as-you-go users (after 48 hours)  3,500 RPM, 350,000 TPM*   20 RPM, 40,000 TPM   20 RPM, 150,000 TPM   50 images/min
It is important to note that either limit can be hit, depending on which is reached
first. For example, you might send 20 requests of only 100 tokens each to the Codex endpoint, and
that would exhaust your request limit even though those 20 requests total far fewer than 40k tokens.
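The interaction between the two limits can be sketched with a little arithmetic. The helper below is a hypothetical illustration, not part of any OpenAI library; it assumes every request uses the same number of tokens, which is a simplification.

```python
def requests_until_limit(tokens_per_request, rpm_limit, tpm_limit):
    """Return (requests allowed per minute, name of the limit hit first).

    Assumes uniform request sizes; a tie is reported as "RPM".
    """
    # How many requests fit in the per-minute token budget.
    allowed_by_tokens = tpm_limit // tokens_per_request
    if rpm_limit <= allowed_by_tokens:
        return rpm_limit, "RPM"
    return allowed_by_tokens, "TPM"


# Codex defaults from the table: 20 RPM and 40,000 TPM.
small = requests_until_limit(100, rpm_limit=20, tpm_limit=40_000)
large = requests_until_limit(4_000, rpm_limit=20, tpm_limit=40_000)
print(small, large)
```

With 100-token requests the request cap is reached first; with 4,000-token requests the token cap is reached after 10 requests.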
If your rate limit is 60 requests per minute and 150k davinci tokens per minute, you’ll be
limited either by reaching the requests/min cap or running out of tokens—whichever happens
first. For example, if your max requests/min is 60, you should be able to send 1 request per
second. If you send 1 request every 800ms, then once you hit your rate limit you’d only need to
make your program sleep 200ms before sending one more request; otherwise, subsequent
requests would fail. With a limit of 3,000 requests/min, customers can effectively send 1
request every 20ms, or every 0.02 seconds.
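The pacing arithmetic above can be sketched as follows. This is a minimal illustration under stated assumptions: `send` is a placeholder for whatever function issues your request, and the `sleep` and `clock` parameters exist only so the pacing logic can be exercised without real delays.

```python
import time


def min_interval_seconds(requests_per_minute):
    """Smallest gap between requests that stays within the per-minute cap."""
    return 60.0 / requests_per_minute


def paced_calls(send, items, requests_per_minute,
                sleep=time.sleep, clock=time.monotonic):
    """Call send(item) for each item, sleeping so calls start at least one interval apart."""
    interval = min_interval_seconds(requests_per_minute)
    results = []
    last_sent = None
    for item in items:
        if last_sent is not None:
            # Sleep only for whatever is left of the interval since the last call started.
            remaining = interval - (clock() - last_sent)
            if remaining > 0:
                sleep(remaining)
        last_sent = clock()
        results.append(send(item))
    return results
```

At 60 requests/min the interval is 1 second, so a request that takes 800ms is followed by a sleep of roughly 200ms. Pacing like this only spaces out requests from a single process; it does not account for token limits or for other clients in the same organization.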
If you hit a rate limit, it means you've made too many requests in a short period of time, and
the API is refusing to fulfill further requests until a specified amount of time has passed.
Error Mitigation
You should also exercise caution when providing programmatic access, bulk processing
features, and automated social media posting; consider enabling these only for trusted
customers.
To protect against automated and high-volume misuse, set a usage limit for individual users
within a specified time frame (daily, weekly, or monthly). Consider implementing a hard cap or
a manual review process for users who exceed the limit.
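A per-user cap of this kind might look like the sketch below. The class name, the daily window, and the in-memory counter are all assumptions made for illustration; a production service would more likely keep counts in a shared store and expire old entries.

```python
import datetime
from collections import defaultdict


class DailyUsageLimiter:
    """Hypothetical in-memory hard cap on requests per user per calendar day."""

    def __init__(self, max_requests_per_day):
        self.max_requests_per_day = max_requests_per_day
        self._counts = defaultdict(int)  # (user_id, date) -> requests seen

    def allow(self, user_id, today=None):
        """Record one request and return True, or return False if the cap is reached."""
        today = today or datetime.date.today()
        key = (user_id, today)
        if self._counts[key] >= self.max_requests_per_day:
            return False  # hard cap; this is where a manual review could be triggered
        self._counts[key] += 1
        return True
```

Call `allow(user_id)` before forwarding a user's request to the API and reject or queue the request when it returns `False`.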
One easy way to avoid rate limit errors is to automatically retry requests with a random
exponential backoff. Retrying with exponential backoff means performing a short sleep when
a rate limit error is hit, then retrying the unsuccessful request. If the request is still
unsuccessful, the sleep length is increased and the process is repeated. This continues until
the request is successful or until a maximum number of retries is reached. This approach has
many benefits:
• Automatic retries mean you can recover from rate limit errors without crashes or missing
data
• Exponential backoff means that your first retries can be tried quickly, while still benefiting
from longer delays if your first few retries fail
• Adding random jitter to the delay helps prevent retries from all hitting at the same time
Below are a few example solutions for Python that use exponential backoff.
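One possible shape for such a solution is the hand-written wrapper below. It is a minimal sketch rather than the original examples: `RateLimitError` is a hypothetical stand-in for whatever exception your API client raises on a rate limit, and the default delay, base, and retry count are arbitrary assumptions.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the rate limit error raised by your API client."""


def retry_with_exponential_backoff(
    func,
    initial_delay=1.0,
    exponential_base=2.0,
    jitter=True,
    max_retries=10,
    errors=(RateLimitError,),
    sleep=time.sleep,
):
    """Wrap func so that the listed errors trigger a sleep and a retry."""

    def wrapper(*args, **kwargs):
        num_retries = 0
        delay = initial_delay
        while True:
            try:
                return func(*args, **kwargs)
            except errors:
                num_retries += 1
                if num_retries > max_retries:
                    raise  # give up and surface the last error
                # With jitter, stretch the current delay by a random factor in [1, 2).
                sleep(delay * (1 + random.random()) if jitter else delay)
                # Lengthen the delay for the next attempt.
                delay *= exponential_base

    return wrapper
```

To use it, wrap the function that makes the API call, for example `safe_call = retry_with_exponential_backoff(my_api_call, errors=(MyClientRateLimitError,))`, where both names are placeholders for your own code.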
Batching requests
The OpenAI API has separate limits for requests per minute and tokens per minute.
If you're hitting the limit on requests per minute, but have available capacity on tokens per
minute, you can increase your throughput by batching multiple tasks into each request. This
will allow you to process more tokens per minute, especially with our smaller models.
Sending in a batch of prompts works exactly the same as a normal API call, except you pass
in a list of strings to the prompt parameter instead of a single string.
Warning: the response object may not return completions in the order of the
prompts, so always remember to match responses back to prompts using the index
field.
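Matching by index can be sketched as below. The helper is hypothetical and makes no API call; it assumes each returned choice is a dict-like object with an `index` field and a `text` field, and that you pass in the same list of prompts that was sent in the request.

```python
def match_completions_to_prompts(prompts, choices):
    """Pair each prompt with its completion text using each choice's index field."""
    texts = [None] * len(prompts)
    for choice in choices:
        # The index refers to the prompt's position in the submitted list.
        texts[choice["index"]] = choice["text"]
    return list(zip(prompts, texts))


prompts = ["Once upon a time,", "In a galaxy far away,"]
# Example payload shape only; the order is deliberately shuffled.
choices = [
    {"index": 1, "text": " a ship drifted."},
    {"index": 0, "text": " there was a fox."},
]
for prompt, text in match_completions_to_prompts(prompts, choices):
    print(prompt + text)
```

Returning a list of pairs rather than a dictionary keeps duplicate prompts distinct.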
Request Increase
Keep in mind that rate limit increases can sometimes take 7-10 days, so it makes sense to
plan ahead and submit early if your current growth numbers indicate that you will reach
your rate limit.
I’ve implemented exponential backoff for my text/code APIs, but I’m still
hitting this error. How do I increase my rate limit?
Currently, we don’t support increasing rate limits on our free beta endpoints, such as the edit
endpoint. We also don’t increase ChatGPT rate limits, but you can join the waitlist for ChatGPT
Professional access.
We understand the frustration that restrictive rate limits can cause, and we would love to raise
the defaults for everyone. However, due to shared capacity constraints, we can only approve
rate limit increases for paid customers who have demonstrated a need through our Rate Limit
Increase Request form. To help us evaluate your needs properly, we ask that you please
provide statistics on your current usage or projections based on historic user activity in the
'Share evidence of need' section of the form. If this information is not available, we
recommend a phased release approach. Start by releasing the service to a subset of users at
your current rate limits, gather usage data for 10 business days, and then submit a formal rate
limit increase request based on that data for our review and approval.
We will review your request and, if it is approved, notify you within 7-10 business days.
Here are some examples of how you might fill out this form: