Rate Limits - OpenAI API 1

The document discusses OpenAI's API rate limits, which restrict the number of requests a user can make within a time period. Rate limits help protect against abuse, ensure fair access, and manage infrastructure load. The default limits are provided for different account types and models. If users exceed the limit, their requests will fail. The document recommends techniques like exponential backoff retries and batching requests to handle rate limit errors. It also provides guidance on when and how to apply for a rate limit increase.

Uploaded by

Jonathan


3/1/23, 3:35 PM Rate Limits - OpenAI API

Rate limits

Overview

What are rate limits?

A rate limit is a restriction that an API imposes on the number of times a user or client can
access the server within a specified period of time.

Why do we have rate limits?

Rate limits are a common practice for APIs, and they're put in place for a few different
reasons:

• They help protect against abuse or misuse of the API. For example, a malicious actor
  could flood the API with requests in an attempt to overload it or cause disruptions in
  service. By setting rate limits, OpenAI can prevent this kind of activity.
• Rate limits help ensure that everyone has fair access to the API. If one person or
  organization makes an excessive number of requests, it could bog down the API for
  everyone else. By throttling the number of requests that a single user can make, OpenAI
  ensures that as many people as possible can use the API without experiencing
  slowdowns.
• Rate limits can help OpenAI manage the aggregate load on its infrastructure. If
  requests to the API increase dramatically, it could tax the servers and cause
  performance issues. By setting rate limits, OpenAI can help maintain a smooth and
  consistent experience for all users.

Please work through this document in its entirety to better understand how
OpenAI’s rate limit system works. We include code examples and possible solutions
to handle common issues. We recommend following this guidance before filling
out the Rate Limit Increase Request form; the last section explains how to fill
it out.

What are the rate limits for our API?

https://platform.openai.com/docs/guides/rate-limits/overview 1/6

We enforce rate limits at the organization level, not user level, based on the specific endpoint
used as well as the type of account you have. Rate limits are measured in two ways: RPM
(requests per minute) and TPM (tokens per minute). The table below shows the default rate
limits for our API; these limits can be increased, depending on your use case, after you fill
out the Rate Limit Increase Request form.

The TPM (tokens per minute) unit is different depending on the model:

TYPE       1 TPM EQUALS

davinci    1 token per minute
curie      25 tokens per minute
babbage    100 tokens per minute
ada        200 tokens per minute

In practical terms, this means you can send approximately 200x more tokens per minute to
an ada model versus a davinci model.
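The conversion can be sketched in a few lines of Python. This is a minimal illustration that reads the table above as a per-model multiplier; the names TOKENS_PER_TPM_UNIT and tokens_per_minute are hypothetical and not part of any OpenAI library.

```python
# Tokens per minute represented by 1 TPM unit, copied from the table above.
TOKENS_PER_TPM_UNIT = {"davinci": 1, "curie": 25, "babbage": 100, "ada": 200}


def tokens_per_minute(tpm_limit, model_family):
    """Convert a TPM limit into the token budget for one model family."""
    return tpm_limit * TOKENS_PER_TPM_UNIT[model_family]


# Assumed example: a 150,000 TPM limit applied to davinci and to ada.
davinci_budget = tokens_per_minute(150_000, "davinci")
ada_budget = tokens_per_minute(150_000, "ada")
print(davinci_budget, ada_budget, ada_budget // davinci_budget)
```

Under this reading, the same TPM limit yields a token budget for ada that is 200 times the budget for davinci.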

                                       TEXT & EMBEDDING   CODEX           EDIT             IMAGE

Free trial users                       • 20 RPM           • 20 RPM        • 20 RPM         50 images / min
                                       • 150,000 TPM      • 40,000 TPM    • 150,000 TPM

Pay-as-you-go users (first 48 hours)   • 60 RPM           • 20 RPM        • 20 RPM         50 images / min
                                       • 250,000 TPM*     • 40,000 TPM    • 150,000 TPM

Pay-as-you-go users (after 48 hours)   • 3,500 RPM        • 20 RPM        • 20 RPM         50 images / min
                                       • 350,000 TPM*     • 40,000 TPM    • 150,000 TPM

Note that you can hit either limit, whichever is reached first. For example, you might send
20 requests with only 100 tokens to the Codex endpoint and that would fill your limit, even
if you did not send 40k tokens within those 20 requests.
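The Codex example can be checked with a short sketch. The function name limits_reached is hypothetical, and the figure of 100 tokens per request is an assumption about how to read the example; the conclusion is the same if the 100 tokens are a total.

```python
def limits_reached(requests_sent, tokens_sent, rpm_limit, tpm_limit):
    """Return the names of the per-minute limits that have been reached."""
    reached = []
    if requests_sent >= rpm_limit:
        reached.append("RPM")
    if tokens_sent >= tpm_limit:
        reached.append("TPM")
    return reached


# Codex defaults from the table: 20 RPM and 40,000 TPM.
# Assumed traffic: 20 requests of 100 tokens each within one minute.
print(limits_reached(requests_sent=20, tokens_sent=20 * 100,
                     rpm_limit=20, tpm_limit=40_000))
```

Only the request limit is reached here: 2,000 tokens is far below the 40,000 token budget, yet no further request can be sent in that minute.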

How do rate limits work?

If your rate limit is 60 requests per minute and 150k davinci tokens per minute, you’ll be
limited either by reaching the requests/min cap or by running out of tokens, whichever happens
first. For example, if your max requests/min is 60, you should be able to send 1 request per
second. If you send 1 request every 800ms, then once you hit your rate limit you’d only need to
make your program sleep 200ms before sending one more request; otherwise, subsequent
requests would fail. With the default of 3,000 requests/min, customers can effectively send 1
request every 20ms, or every 0.02 seconds.
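The spacing arithmetic above can be written as a small sketch. Both function names are hypothetical, and the model assumes requests are spread evenly across the minute, which real traffic rarely is.

```python
def min_interval_seconds(rpm_limit):
    """Smallest even spacing between requests that stays within rpm_limit."""
    return 60.0 / rpm_limit


def extra_sleep_seconds(rpm_limit, current_interval):
    """Extra sleep needed per request when sending every current_interval seconds."""
    return max(0.0, min_interval_seconds(rpm_limit) - current_interval)


print(min_interval_seconds(60))      # 1 request per second at 60 requests/min
print(extra_sleep_seconds(60, 0.8))  # roughly 0.2 s when sending every 800 ms
print(min_interval_seconds(3000))    # spacing at 3,000 requests/min
```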

What happens if I hit a rate limit error?

Rate limit errors look like this:

Rate limit reached for default-text-davinci-002 in organization org-{id} on requests per min. Limit: 20.000000 / min. Current: 24.000000 / min.

If you hit a rate limit, it means you've made too many requests in a short period of time, and
the API is refusing to fulfill further requests until a specified amount of time has passed.
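If you want to log how far over the limit you were, the figures can be pulled out of the message text. This is a rough sketch that assumes the message keeps the exact wording of the sample above, which may not hold for every error; parse_rate_limit_message is a hypothetical helper.

```python
import re

# Pattern based only on the sample message above; real messages may differ.
_RATE_LIMIT_PATTERN = re.compile(
    r"Limit: (?P<limit>\d+(?:\.\d+)?) / min\. "
    r"Current: (?P<current>\d+(?:\.\d+)?) / min\."
)


def parse_rate_limit_message(message):
    """Return (limit, current) as floats, or None if the message does not match."""
    match = _RATE_LIMIT_PATTERN.search(message)
    if match is None:
        return None
    return float(match["limit"]), float(match["current"])


sample = (
    "Rate limit reached for default-text-davinci-002 in organization org-{id} "
    "on requests per min. Limit: 20.000000 / min. Current: 24.000000 / min."
)
print(parse_rate_limit_message(sample))  # (20.0, 24.0)
```

Treat the parsed values as diagnostic only; retry logic should not depend on the wording of an error message.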

Rate limits vs max_tokens

Each model we offer has a limited number of tokens that can be passed in as input when
making a request. You cannot increase the maximum number of tokens a model takes in. For
example, if you are using text-ada-001 , the maximum number of tokens you can send to
this model is 2,048 tokens per request.

Error Mitigation

What are some steps I can take to mitigate this?

The OpenAI Cookbook has a Python notebook that explains in detail how to avoid rate limit
errors.

You should also exercise caution when providing programmatic access, bulk processing
features, and automated social media posting - consider only enabling these for trusted
customers.

To protect against automated and high-volume misuse, set a usage limit for individual users
within a specified time frame (daily, weekly, or monthly). Consider implementing a hard cap or
a manual review process for users who exceed the limit.
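A per-user cap of this kind could be sketched as a fixed-window counter. Everything here is an assumption for illustration: the class name UsageCap, the in-memory storage, and the choice of a hard cap over manual review. A production service would typically keep the counters in a shared store.

```python
import time


class UsageCap:
    """Hard cap on requests per user within a fixed time window (hypothetical helper)."""

    def __init__(self, max_requests, window_seconds, clock=time.time):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._windows = {}  # user_id -> (window_start, request_count)

    def allow(self, user_id):
        """Record one request for user_id and report whether it is within the cap."""
        now = self.clock()
        start, count = self._windows.get(user_id, (now, 0))
        if now - start >= self.window_seconds:
            # The previous window has expired, so start a fresh one.
            start, count = now, 0
        if count >= self.max_requests:
            self._windows[user_id] = (start, count)
            return False
        self._windows[user_id] = (start, count + 1)
        return True


# Assumed policy: at most 2 requests per user per day.
daily_cap = UsageCap(max_requests=2, window_seconds=24 * 60 * 60)
print([daily_cap.allow("alice") for _ in range(3)])
```

A request rejected by allow could be queued for manual review instead of being dropped, depending on which of the two policies you choose.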

Retrying with exponential backoff


One easy way to avoid rate limit errors is to automatically retry requests with a random
exponential backoff. Retrying with exponential backoff means performing a short sleep when
a rate limit error is hit, then retrying the unsuccessful request. If the request is still
unsuccessful, the sleep length is increased and the process is repeated. This continues until
the request is successful or until a maximum number of retries is reached. This approach has
many benefits:

• Automatic retries mean you can recover from rate limit errors without crashes or missing
  data.
• Exponential backoff means that your first retries can be tried quickly, while still benefiting
  from longer delays if your first few retries fail.
• Adding random jitter to the delay helps prevent retries from all hitting at the same time.

Note that unsuccessful requests contribute to your per-minute limit, so continuously
resending a request won’t work.

Below are a few example solutions for Python that use exponential backoff.

Example #1: Using the Tenacity library

Example #2: Using the backoff library

Example #3: Manual backoff implementation
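A minimal sketch of a manual implementation, using only the Python standard library. RateLimitError here is a hypothetical stand-in for whatever exception your API client raises on a rate limit error, and the parameter names and defaults are assumptions rather than recommended values.

```python
import functools
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the rate limit exception raised by your API client."""


def retry_with_exponential_backoff(
    func,
    initial_delay=1.0,
    exponential_base=2.0,
    jitter=True,
    max_retries=10,
    errors=(RateLimitError,),
    sleep=time.sleep,
):
    """Wrap func so that it is retried with random exponential backoff."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        delay = initial_delay
        for attempt in range(max_retries + 1):
            try:
                return func(*args, **kwargs)
            except errors:
                # Give up once the retry budget is spent.
                if attempt == max_retries:
                    raise
                # Random jitter spreads out clients that failed together.
                sleep(delay * (1 + random.random()) if jitter else delay)
                # Each further failure waits longer than the last.
                delay *= exponential_base

    return wrapper
```

To use it, wrap the function that makes the API call, for example safe_call = retry_with_exponential_backoff(make_request), where make_request is your own function. The sleep parameter exists so the wait can be replaced in tests.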

Batching requests

The OpenAI API has separate limits for requests per minute and tokens per minute.

If you're hitting the limit on requests per minute, but have available capacity on tokens per
minute, you can increase your throughput by batching multiple tasks into each request. This
will allow you to process more tokens per minute, especially with our smaller models.

Sending in a batch of prompts works exactly the same as a normal API call, except you pass
in a list of strings to the prompt parameter instead of a single string.


Example without batching

Example with batching

Warning: the response object may not return completions in the order of the
prompts, so always remember to match responses back to prompts using the index
field.
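Matching by index can be sketched as follows. The response is mocked as a plain dictionary; the choices list and the text key are assumptions about the response shape, while the index field is the one named in the warning above. The function name completions_in_prompt_order is hypothetical.

```python
def completions_in_prompt_order(prompts, response):
    """Pair each prompt with its completion using each choice's index field."""
    texts = [None] * len(prompts)
    for choice in response["choices"]:
        texts[choice["index"]] = choice["text"]
    return list(zip(prompts, texts))


prompts = ["Once upon a time", "In a galaxy far away", "It was a dark night"]

# Mocked response whose choices arrive out of order.
mock_response = {
    "choices": [
        {"index": 2, "text": " and the rain fell."},
        {"index": 0, "text": " there lived a king."},
        {"index": 1, "text": " a ship drifted."},
    ]
}

for prompt, text in completions_in_prompt_order(prompts, mock_response):
    print(prompt + text)
```

Relying on list position alone would attach the first returned completion to the first prompt, which is wrong for the mocked response above.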

Request Increase

When should I consider applying for a rate limit increase?

Our default rate limits help us maximize stability and prevent abuse of our API. We increase
limits to enable high-traffic applications, so the best time to apply for a rate limit increase is
when you feel that you have the necessary traffic data to support a strong case for increasing
the rate limit. Large rate limit increase requests without supporting data are not likely to be
approved. If you're gearing up for a product launch, please obtain the relevant data through a
phased release over 10 days.

Keep in mind that rate limit increases can sometimes take 7-10 days, so it makes sense to
plan ahead and submit early if your data shows that you will reach your rate limit at your
current growth rate.

Will my rate limit increase request be rejected?

A rate limit increase request is most often rejected because it lacks the data needed to justify
the increase. We have provided numerical examples below that show how to best support a
rate limit increase request. We try our best to approve all requests that align with our safety
policy and show supporting data. We are committed to enabling developers to scale and be
successful with our API.

I’ve implemented exponential backoff for my text/code APIs, but I’m still
hitting this error. How do I increase my rate limit?


Currently, we don’t support increasing rate limits for our free beta endpoints, such as the
edit endpoint. We also don’t increase ChatGPT rate limits, but you can join the waitlist for
ChatGPT Professional access.

We understand the frustration that restrictive rate limits can cause, and we would love to raise
the defaults for everyone. However, due to shared capacity constraints, we can only approve
rate limit increases for paid customers who have demonstrated a need through our Rate Limit
Increase Request form. To help us evaluate your needs properly, we ask that you please
provide statistics on your current usage or projections based on historic user activity in the
'Share evidence of need' section of the form. If this information is not available, we
recommend a phased release approach. Start by releasing the service to a subset of users at
your current rate limits, gather usage data for 10 business days, and then submit a formal rate
limit increase request based on that data for our review and approval.

We will review your request and if it is approved, we will notify you of the approval within a
period of 7-10 business days.

Here are some examples of how you might fill out this form:

DALL-E API examples

Language model examples
