
How to Design a Rate Limiter API | Learn System Design

A Rate Limiter API is a tool that developers can use to define rules
that specify how many requests can be made in a given time period
and what actions should be taken when these limits are exceeded.

Rate limiting is an essential technique used in software systems to control the rate of incoming requests. It prevents servers from being overloaded by limiting the number of requests that can be made in a given time frame, and stops a high volume of requests from overwhelming a server or API.

In this article, we will discuss the design of a rate limiter API, including its requirements, high-level design, and the algorithms used for rate limiting.

Why is rate limiting used?


Avoid resource starvation due to a Denial of Service (DoS) attack.
Ensure that servers are not overburdened; applying a rate restriction per user ensures fair and reasonable use without harming other users.
Control the flow of information, for example, by preventing a single worker from accumulating a backlog of unprocessed items while other workers are idle.

Requirements to Design a Rate Limiter API


The requirements of a rate limiter API can be classified into two categories:
functional and non-functional.

Functional requirements to Design a Rate Limiter API:

The API should allow the definition of multiple rate-limiting rules.
The API should provide the ability to customize the response to clients when rate limits are exceeded.
The API should allow for the storage and retrieval of rate-limit data.
The API should implement proper error handling: when the threshold limit of requests is crossed, whether for a single server or across different combinations, the client should receive a proper error message.

Non-functional requirements to Design a Rate Limiter API:

The API should be highly available and scalable. Availability is the main pillar for any request-serving API.
The API should be secure and protected against malicious attacks.
The API should be easy to integrate with existing systems.
The rate limiter should add only low latency to the system, as performance is one of the key factors for any system.

High Level Design (HLD) to Design a Rate Limiter API

Where to place the Rate Limiter – Client Side or Server Side?

A rate limiter should generally be implemented on the server side rather than on the client side, for the following reasons:

Positional Advantage: The server is in a better position to enforce rate limits across all clients, whereas client-side rate limiting would require every client to implement its own rate limiter, which would be difficult to coordinate and enforce consistently.
Security: Implementing rate limiting on the server side also provides better security, as it allows the server to prevent malicious clients from overwhelming the system with a large number of requests. If rate limiting were implemented on the client side, it would be easy for attackers to bypass the limit by simply modifying or disabling the client-side code.
Flexibility: Server-side rate limiting allows more flexibility in adjusting the rate limits and managing resources. The server can dynamically adjust the limits based on traffic patterns and resource availability, and can also prioritize certain types of requests or clients over others. This leads to better utilization of available resources and maintains good performance.

HLD of Rate Limiter API – rate limiter placed at server side

The overall basic structure of a rate limiter is relatively simple. We just need a counter associated with each user to track how many requests are being submitted in a particular timeframe. The request is rejected if the counter value hits the limit.

Memory Structure/Approximation
Now let's think about a data structure that might help us. Since we need fast retrieval of the counter values associated with each user, we can use a hash table. We have a key-value pair: the key would be a hash of each UserId, and the corresponding value would be a pair (or structure) of the counter and the startTime, e.g.,
UserId -> {counter, startTime}

Let's say each UserId takes 8 bytes (a long) and the counter takes 2 bytes (a short integer), which is enough to count up to our 50k limit. For the time, if we store only the minutes and seconds, it also takes 2 bytes. So in total, we need 12 bytes to store each user's data.

Assuming an overhead of 10 bytes for each record in our hash table, and that we need to track at least 5 million users at any time (traffic), the total memory needed would be:
(12 + 10) bytes × 5 million = 110 MB
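
As a rough illustration, the per-user record could be modeled as follows. This is a minimal sketch in Python; the names RateLimitRecord, counter, and start_time are assumptions for illustration, not part of the original design:

```python
import time
from dataclasses import dataclass

@dataclass
class RateLimitRecord:
    counter: int       # requests in the current window; 2 bytes can count up to 65,535
    start_time: float  # when the current window started

# Hash table keyed by (a hash of) the UserId: UserId -> {counter, startTime}
records: dict[int, RateLimitRecord] = {}
records[12345] = RateLimitRecord(counter=1, start_time=time.time())
```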

Key Components in the Rate Limiter

Define the rate limiting policy: The first step is to determine the policy
for rate limiting. This policy should include the maximum number of
requests allowed per unit of time, the time window for measuring
requests, and the actions to be taken when a limit is exceeded (e.g., return
an error code or delay the request).
Store request counts: The rate limiter API should keep track of the
number of requests made by each client. One way to do this is to use a
database, such as Redis or Cassandra, to store the request counts.
Identify the client: The API must identify each client that makes a
request. This can be done using a unique identifier such as an IP address
or an API key.
Handle incoming requests: When a client makes a request, the API
should first check if the client has exceeded their request limit within the
specified time window. If the limit has been reached, the API can take the
action specified in the rate-limiting policy (e.g., return an error code). If the
limit has not been reached, the API should update the request count for
the client and allow the request to proceed.
Set headers: When a request is allowed, the API should set appropriate
headers in the response to indicate the remaining number of requests that
the client can make within the time window, as well as the time at which
the limit will be reset.
Expose an endpoint: Finally, the rate limiter API should expose an
endpoint for clients to check their current rate limit status. This endpoint
can return the number of requests remaining within the time window, as
well as the time at which the limit will be reset.
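
Putting these components together, here is a minimal fixed-window sketch in Python that identifies the client, checks the count, and builds the response headers described above. The class name and storage layout are illustrative assumptions; a production limiter would sit in middleware and use shared storage such as Redis:

```python
import time

class FixedWindowRateLimiter:
    """Counts requests per client in fixed time windows (illustrative sketch)."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit            # max requests allowed per window
        self.window = window_seconds  # window length in seconds
        self.counts = {}              # client_id -> (window_start, count)

    def allow_request(self, client_id: str):
        now = time.time()
        window_start, count = self.counts.get(client_id, (now, 0))
        if now - window_start >= self.window:
            window_start, count = now, 0      # new window: reset the counter
        allowed = count < self.limit
        if allowed:
            count += 1
            self.counts[client_id] = (window_start, count)
        # Headers indicating the remaining quota and when the limit resets
        headers = {
            "X-RateLimit-Remaining": str(max(self.limit - count, 0)),
            "X-RateLimit-Reset": str(int(window_start + self.window)),
        }
        return allowed, headers

limiter = FixedWindowRateLimiter(limit=100, window_seconds=60)
allowed, headers = limiter.allow_request("client-42")
print(allowed, headers)  # e.g. True {'X-RateLimit-Remaining': '99', ...}
```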

Where should we keep the counters?

Due to the slowness of disk-based database operations, a traditional database is not a smart option for us. This problem can be handled by an in-memory cache such as Redis. It is quick and already supports a built-in time-based expiration technique.

We can rely on two commands used with this in-memory storage:

INCR: This is used for increasing the stored counter by 1.
EXPIRE: This is used for setting a timeout on the stored counter. The counter is automatically deleted from the storage when the timeout expires.
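
As a sketch of how these two commands combine, assuming the redis-py client and an illustrative key scheme (rate:<user_id>):

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def is_allowed(user_id: str, limit: int = 100, window_seconds: int = 60) -> bool:
    key = f"rate:{user_id}"
    count = r.incr(key)                # INCR: atomically add 1 to the counter
    if count == 1:                     # first request in this window
        r.expire(key, window_seconds)  # EXPIRE: drop the counter when the window ends
    return count <= limit
```

In practice, the INCR and EXPIRE calls are usually wrapped in a pipeline or a Lua script so the expiry is always set atomically with the first increment.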

In this design, client requests pass through a rate limiter middleware, which checks them against the configured rate limits. The rate limiter module stores and retrieves rate-limit data from a backend storage system. If a client exceeds a rate limit, the rate limiter module returns an appropriate response to the client.

Algorithms to Design a Rate Limiter API


Several algorithms are used for rate limiting, including:

Token bucket,
Leaky bucket,
Sliding window logs, and
Sliding window counters.

Let's discuss each algorithm in detail:

Token Bucket

The token bucket algorithm is a simple algorithm that uses a fixed-size token bucket to limit the rate of incoming requests. The bucket is filled with tokens at a fixed rate, and each request requires a token to be processed. If the bucket is empty, the request is rejected.

The token bucket algorithm can be implemented using the following steps:

Initialize the token bucket with a fixed number of tokens.
For each request, remove a token from the bucket.
If there are no tokens left in the bucket, reject the request.
Add tokens to the bucket at a fixed rate.

Thus, by allocating a bucket with a predetermined number of tokens for each user, we successfully limit the number of requests per user per time unit. When a user's token counter drops to 0, we know that they have reached the maximum number of requests allowed in the current timeframe. The bucket is automatically refilled whenever a new timeframe starts.

Token bucket example with initial bucket token count of 3 for each user in one minute
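
A minimal sketch of this algorithm in Python, assuming a continuous refill rate rather than a per-timeframe refill (class and parameter names are illustrative):

```python
import time

class TokenBucket:
    """Refills tokens at a fixed rate; each request consumes one token."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity        # maximum tokens the bucket can hold
        self.tokens = float(capacity)   # start with a full bucket
        self.refill_rate = refill_rate  # tokens added per second
        self.last_refill = time.monotonic()

    def allow_request(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to the time elapsed, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # bucket empty: reject the request
```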

Leaky Bucket
It is based on the idea that if the average rate at which water is poured
exceeds the rate at which the bucket leaks, the bucket will overflow.

The leaky bucket algorithm is similar to the token bucket algorithm, but
instead of using a fixed-size token bucket, it uses a leaky bucket that
empties at a fixed rate. Each incoming request adds to the bucket’s depth,
and if the bucket overflows, the request is rejected.

One way to implement this is using a queue, which corresponds to the bucket that will contain the incoming requests. Whenever a new request is made, it is added to the end of the queue. If the queue is full at any time, the additional requests are discarded.

The leaky bucket algorithm can be separated into the following concepts:

Initialize the leaky bucket with a fixed depth and a rate at which it leaks.
For each request, add to the bucket’s depth.
If the bucket’s depth exceeds its capacity, reject the request.
Leak the bucket at a fixed rate.

Leaky bucket example where the token count per user per minute is 3, which is the queue size.
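
A minimal queue-based sketch in Python, under the assumption that "leaking" simply dequeues requests at a fixed rate (names are illustrative):

```python
import time
from collections import deque

class LeakyBucket:
    """Queue-based leaky bucket: requests queue up and leak out at a fixed rate."""

    def __init__(self, capacity: int, leak_rate: float):
        self.capacity = capacity    # maximum queued requests (bucket depth)
        self.leak_rate = leak_rate  # requests processed per second
        self.queue = deque()
        self.last_leak = time.monotonic()

    def allow_request(self) -> bool:
        now = time.monotonic()
        # Leak: remove requests that have been processed since the last check
        leaked = int((now - self.last_leak) * self.leak_rate)
        if leaked > 0:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()
            # Advance the clock only by the whole requests actually leaked
            self.last_leak += leaked / self.leak_rate
        if len(self.queue) < self.capacity:
            self.queue.append(now)  # enqueue the incoming request
            return True
        return False                # bucket full: discard the request
```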

Sliding Window Logs

Another approach to rate limiting is to use sliding window logs. This data
structure involves a “window” of fixed size that slides along a timeline of
events, storing information about the events that fall within the window at
any given time.
The window can be thought of as a buffer of limited size that holds the most
recent events or changes that have occurred. As new events or changes
occur, they are added to the buffer, and old events that fall outside of the
window are removed. This ensures that the buffer stays within its fixed size,
and only contains the most recent events.

This rate limiter keeps track of each client's requests in a time-stamped log. These logs are normally stored in a time-sorted hash set or table.

The sliding window logs algorithm can be implemented using the following
steps:

A time-sorted queue (or hash table) of timestamps within the most recent window is maintained for each client making requests.
Whenever a new request comes in (or when the queue reaches a certain length, or after a certain number of minutes), a check is done for any timestamps older than the current window, and those are removed.
The queue is then updated with the timestamp of the incoming request; if the number of elements in the queue does not exceed the authorized count, the request proceeds, otherwise an exception is triggered.

Sliding window logs in a timeframe of 1 minute
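
A minimal sketch of the sliding window log in Python, using a deque as the time-sorted queue per client (names are illustrative):

```python
import time
from collections import deque

class SlidingWindowLog:
    """Keeps a time-sorted log of request timestamps per client."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.logs = {}  # client_id -> deque of request timestamps

    def allow_request(self, client_id: str) -> bool:
        now = time.monotonic()
        log = self.logs.setdefault(client_id, deque())
        # Evict timestamps that have slid out of the current window
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) < self.limit:
            log.append(now)
            return True
        return False  # too many requests within the window
```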

Sliding Window Counters

The sliding window counter algorithm is an optimization over sliding window logs. As we saw in the previous approach, memory usage is high: to manage numerous users or large window timeframes, all the request timestamps must be kept for a window of time, which eventually uses a huge amount of memory. Removing numerous timestamps older than a particular timeframe also has a high time complexity.

To reduce surges of traffic, this algorithm accounts for a weighted value of the previous window's requests based on the timeframe. If we have a one-minute rate limit, we can record the counter for each second and calculate the sum of all counters in the previous minute whenever we get a new request to determine the throttling limit.

The sliding window counters can be separated into the following concepts:

Remove all counters that are more than 1 minute old.
If a request comes in that falls in the current bucket, the counter is increased.
If a request comes in when the current bucket has reached its threshold limit, the request is blocked.

Sliding window counters with a timeframe of 20 seconds
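
A minimal sketch in Python of the weighted variant described above, tracking just the current and previous window counters (names are illustrative; a real limiter would keep these per client):

```python
import time

class SlidingWindowCounter:
    """Approximates a sliding window from the current and previous window counts."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.current_start = time.monotonic()
        self.current_count = 0
        self.previous_count = 0

    def allow_request(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.current_start
        if elapsed >= self.window:
            # Roll the windows forward; counters older than one window are dropped
            self.previous_count = self.current_count if elapsed < 2 * self.window else 0
            self.current_count = 0
            self.current_start = now - (elapsed % self.window)
            elapsed = now - self.current_start
        # Weight the previous window by how much of it the sliding window still covers
        weight = (self.window - elapsed) / self.window
        estimated = self.previous_count * weight + self.current_count
        if estimated < self.limit:
            self.current_count += 1
            return True
        return False  # estimated request rate has reached the limit
```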

Examples of Rate Limiting APIs used worldwide


Google Cloud Endpoints: It is a platform for building APIs that includes a
built-in rate limiter to help prevent excessive API usage.
AWS API Gateway: Amazon Web Services (AWS) API Gateway includes
a feature called Usage Plans that allows for rate limiting and throttling of
API requests.
Akamai API Gateway: Akamai API Gateway is a cloud-based platform
that includes a rate limiter feature for controlling API requests.
Cloudflare Rate Limiting: Cloudflare’s Rate Limiting feature helps
prevent DDoS attacks and other types of abusive traffic by limiting the
number of requests that can be made to an API.
Redis: It is an in-memory data structure store that can be used as a
database, cache, and message broker. It includes several features that
make it useful for implementing a rate limiter, such as its ability to store
data in memory for fast access and its support for atomic operations.

Whether you're preparing for your first job interview or aiming to upskill in
this ever-evolving tech landscape, GeeksforGeeks Courses are your key to
success. We provide top-quality content at affordable prices, all geared
towards accelerating your growth in a time-bound manner. Join the millions
we've already empowered, and we're here to do the same for you. Don't
miss out - check it out now!

Last Updated: 16 Mar, 2023
