How to Design a Rate Limiter API | Learn System Design - GeeksforGeeks
A Rate Limiter API is a tool that developers can use to define rules
that specify how many requests can be made in a given time period
and what actions should be taken when these limits are exceeded.
The API should be highly available and scalable; availability is critical for
any API that sits in the request path.
The API should be secure and protected against malicious attacks.
The API should be easy to integrate with existing systems.
The rate limiter should add as little latency as possible to the system, since
performance is a key factor for any system.
The overall basic structure of a rate limiter is relatively simple. We just
need a counter associated with each user to track how many requests are
being submitted in a particular timeframe. The request is rejected once
the counter value hits the limit.
Memory Structure/Approximation
Now let's think about the data structure that might help us. Since we need
fast retrieval of the counter value associated with each user, a hash table
is a natural fit. Each entry is a key-value pair: the key is a hash of the
UserId, and the corresponding value is a pair or structure holding the
counter and the window start time, e.g.,
UserId -> {counter, startTime}
Suppose each UserId takes 8 bytes (a 64-bit integer) and the counter takes
2 bytes, which is enough to count to our 50k limit. If we store only the
minutes and seconds of the start time, that takes another 2 bytes. So in
total, we need 12 bytes to store each user's data.
Adding an overhead of roughly 10 bytes for each record in our hash table,
and assuming we need to track at least 5 million users at any time (traffic),
the total memory needed would be:
(12 + 10) bytes × 5 million = 110 MB
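The arithmetic above can be checked with a short back-of-the-envelope script; the field sizes follow the assumptions stated in the text:

```python
# Back-of-the-envelope memory estimate for the per-user rate-limit table.
USER_ID_BYTES = 8      # UserId stored as a 64-bit integer
COUNTER_BYTES = 2      # enough to count up to the 50k limit
TIME_BYTES = 2         # minutes and seconds of the window start
OVERHEAD_BYTES = 10    # assumed hash-table bookkeeping per record

ACTIVE_USERS = 5_000_000

per_record = USER_ID_BYTES + COUNTER_BYTES + TIME_BYTES + OVERHEAD_BYTES
total_bytes = per_record * ACTIVE_USERS
print(per_record)               # 22 bytes per user
print(total_bytes / 1_000_000)  # 110.0 MB
```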
Define the rate limiting policy: The first step is to determine the policy
for rate limiting. This policy should include the maximum number of
requests allowed per unit of time, the time window for measuring
requests, and the actions to be taken when a limit is exceeded (e.g., return
an error code or delay the request).
Store request counts: The rate limiter API should keep track of the
number of requests made by each client. One way to do this is to use a
database, such as Redis or Cassandra, to store the request counts.
Identify the client: The API must identify each client that makes a
request. This can be done using a unique identifier such as an IP address
or an API key.
Handle incoming requests: When a client makes a request, the API
should first check if the client has exceeded their request limit within the
specified time window. If the limit has been reached, the API can take the
action specified in the rate-limiting policy (e.g., return an error code). If the
limit has not been reached, the API should update the request count for
the client and allow the request to proceed.
Set headers: When a request is allowed, the API should set appropriate
headers in the response to indicate the remaining number of requests that
the client can make within the time window, as well as the time at which
the limit will be reset.
Expose an endpoint: Finally, the rate limiter API should expose an
endpoint for clients to check their current rate limit status. This endpoint
can return the number of requests remaining within the time window, as
well as the time at which the limit will be reset.
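The steps above can be sketched as a simple in-memory fixed-window limiter. The class and header names here are illustrative, not a standard API, and a real deployment would back the counters with a shared store such as Redis:

```python
import time

class FixedWindowRateLimiter:
    """Illustrative fixed-window limiter: per-client counters reset
    each time the window expires."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # client_id -> (window_start, count)

    def handle_request(self, client_id):
        now = time.time()
        start, count = self.counters.get(client_id, (now, 0))
        if now - start >= self.window:   # window expired: reset the counter
            start, count = now, 0
        reset_at = int(start + self.window)
        if count >= self.limit:          # limit reached: reject the request
            return False, {"X-RateLimit-Remaining": 0,
                           "X-RateLimit-Reset": reset_at}
        self.counters[client_id] = (start, count + 1)
        return True, {"X-RateLimit-Remaining": self.limit - count - 1,
                      "X-RateLimit-Reset": reset_at}

limiter = FixedWindowRateLimiter(limit=3, window_seconds=60)
for _ in range(4):
    allowed, headers = limiter.handle_request("client-1")
    print(allowed, headers["X-RateLimit-Remaining"])
# The fourth request within the window is rejected with 0 remaining.
```

The returned headers correspond to the "Set headers" step: the client learns how many requests it has left and when the window resets.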
Token Bucket
The token bucket algorithm gives each client a bucket that holds a fixed
number of tokens and is refilled at a constant rate. Each request consumes
one token, and a request that arrives when the bucket is empty is rejected.
Token bucket example with initial bucket token count of 3 for each user in one minute
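A minimal sketch of a token bucket matching the example above (a capacity of 3 tokens, refilled at 3 tokens per minute; names are illustrative):

```python
import time

class TokenBucket:
    def __init__(self, capacity, refill_rate_per_sec):
        self.capacity = capacity
        self.refill_rate = refill_rate_per_sec
        self.tokens = float(capacity)      # start with a full bucket
        self.last_refill = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to the elapsed time, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1               # the request consumes one token
            return True
        return False

bucket = TokenBucket(capacity=3, refill_rate_per_sec=3 / 60)  # 3 tokens/minute
print([bucket.allow() for _ in range(4)])  # [True, True, True, False]
```

Because refilling is continuous, the token bucket tolerates short bursts up to the bucket capacity while enforcing the average rate.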
Leaky Bucket
It is based on the idea that if the average rate at which water is poured
exceeds the rate at which the bucket leaks, the bucket will overflow.
The leaky bucket algorithm is similar to the token bucket algorithm, but
instead of using a fixed-size token bucket, it uses a leaky bucket that
empties at a fixed rate. Each incoming request adds to the bucket’s depth,
and if the bucket overflows, the request is rejected.
The leaky bucket algorithm can be separated into the following concepts:
Initialize the leaky bucket with a fixed depth and a rate at which it leaks.
For each request, add to the bucket’s depth.
If the bucket’s depth exceeds its capacity, reject the request.
Leak the bucket at a fixed rate.
Leaky bucket example with a queue size of 3 requests per user per minute.
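The four concepts above can be sketched as follows, assuming a bucket of depth 3 that leaks 3 requests per minute (names are illustrative):

```python
import time

class LeakyBucket:
    def __init__(self, capacity, leak_rate_per_sec):
        self.capacity = capacity
        self.leak_rate = leak_rate_per_sec
        self.depth = 0.0                   # current fill level of the bucket
        self.last_leak = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Leak at a fixed rate since the last check, never below empty.
        elapsed = now - self.last_leak
        self.depth = max(0.0, self.depth - elapsed * self.leak_rate)
        self.last_leak = now
        if self.depth + 1 > self.capacity:  # bucket would overflow: reject
            return False
        self.depth += 1                     # the request adds to the depth
        return True

bucket = LeakyBucket(capacity=3, leak_rate_per_sec=3 / 60)  # leaks 3/minute
print([bucket.allow() for _ in range(4)])  # [True, True, True, False]
```

Unlike the token bucket, the leaky bucket smooths traffic into a steady outflow, so bursts beyond the bucket's depth are rejected immediately.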
Sliding Window Logs
Another approach to rate limiting is to use sliding window logs. This data
structure involves a “window” of fixed size that slides along a timeline of
events, storing information about the events that fall within the window at
any given time.
The window can be thought of as a buffer of limited size that holds the most
recent events or changes that have occurred. As new events or changes
occur, they are added to the buffer, and old events that fall outside of the
window are removed. This ensures that the buffer stays within its fixed size,
and only contains the most recent events.
The sliding window log algorithm keeps a timestamped log of each client's
requests: on every new request, timestamps older than the window are
evicted, and the request is allowed only if the remaining log size is below
the limit. A related variant, sliding window counters, approximates this by
keeping per-window counters instead of storing every timestamp, trading
some accuracy for much lower memory use.
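A minimal sketch of the sliding window log, assuming a limit of 3 requests per 60-second window (names are illustrative):

```python
import time
from collections import deque

class SlidingWindowLog:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()  # timestamps of accepted requests, oldest first

    def allow(self):
        now = time.monotonic()
        # Evict timestamps that have slid out of the window.
        while self.log and now - self.log[0] >= self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False

limiter = SlidingWindowLog(limit=3, window_seconds=60)
print([limiter.allow() for _ in range(4)])  # [True, True, True, False]
```

Because the window slides continuously, this approach avoids the burst at window boundaries that a fixed-window counter allows, at the cost of storing one timestamp per accepted request.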