SD Roadmap PDF

Road Map 🏁
[Visual roadmap: a mind map of every topic covered in this guide; see the Table of Contents below for the full, ordered list.]
Table of Contents

Introduction
    What is System Design?
    How To Approach System Design?
    For Whom Is This Guide?
Fundamentals
    Latency vs Throughput
    Availability vs Consistency
    CAP Theorem
    Reliability vs Resilience
Scalability
    Horizontal Scaling vs Vertical Scaling
    Stateless Systems vs Stateful Systems
    Eventual Consistency
    Caching
    Load Balancing vs Reverse Proxy
    Database Scaling
Consistency
    Consistency Types
        Strong Consistency
        Weak Consistency
        Eventual Consistency
    Consistency Patterns
        Full Mesh
        Coordinator Service
        Distributed Cache
        Gossip Protocol
        Random Leader Election
Availability
    Availability Patterns
        Replication
        Failover
    Throttling / Rate Limiting
    Queue-based Load Leveling
    Health Checks
Resilience
    Bulkhead
    Circuit Breaker
    The Exponential Backoff and Retry
    Fallback Mechanism
    The Dead Letter Queue (DLQ)
Design Patterns
    Bloom Filters
    Consistent Hashing
    Checksum
    Merkle Trees
    Quorum
Load Balancer
    Load Balancer vs Reverse Proxy
    Load Balancing Algorithms
    Layer 4 Load Balancing vs Layer 7 Load Balancing
    Domain Name System (DNS)
    Proxy (Forward Proxy)
    Content Delivery Network (CDN)
Application Layer
    N-tier Applications
    Microservices
    Service Discovery
    API Gateway
    Event Driven Architecture (EDA)
    Event Sourcing
    Command Query Responsibility Segregation (CQRS)
    Saga Pattern
    Push vs Pull
Caching
    Caching Layers
        Client Caching
        CDN (Content Delivery Network) Caching
        Web Server Caching
        Database Caching
        Application Caching
    Caching Strategies
        Refresh-Ahead (Read-Ahead) Caching
        Write-Behind Caching
        Write-Through Caching
        Cache-Aside (Lazy Loading) Caching
    Cache Cleanup
        Cache Invalidation
        Cache Eviction
Databases
    Normalization vs Denormalization
    NoSQL Databases
        Key-Value Store
        Document Store
        Wide-Column Store (Column-Family Store)
        Graph Database
    ACID vs BASE
    Replication
    Sharding and Partitioning
    Data Modeling Factors
    Storage
        Block Storage
        File Storage
        Object Storage
Asynchronism
    Idempotent Operations
    Message Queues
    Reactive Programming
    Back Pressure Handling
    HTTP & REST
    WebSocket vs gRPC
    GraphQL
Security
    Data Encryption
        Data Encryption in Transit
        Data Encryption at Rest
    Checksums and Hashing
    Token-Based Authentication and Sessions
        Token-Based Authentication
        Session-Based Authentication
    API Security
Introduction
What is System Design?
System design is basically the process of figuring out all the parts of a system and how
they work together to meet certain requirements. It's like solving a big puzzle, where
you break down a problem into smaller parts and design each part to work seamlessly
with the others.
Oh, and heads up: if you're interviewing for a software engineering job, you'll probably
have to do a system design interview round. That means you'll have to design a whole
system from scratch and talk about your choices along the way. It's all about finding
the right balance and weighing the pros and cons of each option.
System design is an iterative process, meaning you'll be testing and refining your
design multiple times until it meets all the requirements. You'll also take a look at any
current systems in place and see what's working and what's not.
+---------------------+
| Gather Requirements |
+---------------------+
|
V
+-----------------+
| Handle the Data |
+-----------------+
|
V
+-------------------+
| Define Components |
+-------------------+
|
V
+---------------------+
| Identify Trade-offs |
+---------------------+
How To Approach System Design?
When you're diving into a system design puzzle, here's a chill list of things to keep in mind:
• Functional Requirements
• Nonfunctional Requirements
• Read-heavy or Write-heavy
• Consistency Levels
• Component Interactions
• Constraints
• Cost Optimization
• Performance Optimization
Understand the Problem:
First, wrap your head around what you're building. No rocket science here, just
get what it's all about.
Gather Requirements:
Chat with the folks involved and collect all the "wants" and "needs." Think of it as
making a wish list for your project.
Scalability:
Ensure your system can grow without breaking a sweat. Imagine it like hosting a
party and making sure you have enough snacks for everyone!
Define Components:
Break down your system into smaller parts. It's like assembling a jigsaw puzzle -
each piece has a role.
Data Flow:
Visualize how data will move within your system. Imagine it as a flowchart for
your digital river.
Database Design:
Figure out how your data will be stored and accessed. Think of it as organizing
your digital filing cabinet.
System Architecture:
Plan how everything will fit together. It's like sketching a rough blueprint for
your dream house.
Security:
Guard your system like a fort. Keep out unwanted visitors and protect your
precious data.
Testing:
Tinker, test, and tune. Make sure everything runs like a well-oiled machine.
Iterate:
Check your work and keep improving. It's like refining a recipe to make it
perfect.
Remember, system design is like building your very own digital playground - it's an art
and a science mixed with a dash of creativity and lots of problem-solving fun! 🎨🛠🚀
Fundamentals
Performance vs Scalability
Performance:
Goal: The goal of improving performance is to ensure that the system's operations are
completed quickly and efficiently, providing a smooth and responsive user experience.
Scalability:
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Server │ │ Server │ │ Server │
└─────────┘ └─────────┘ └─────────┘
In summary, while both performance and scalability aim to improve the efficiency and
effectiveness of a system, they address different aspects.
• If your system is slow for a single user, you have a performance issue.
• If your system is fast for a single user but slows down when under heavy load,
you have a scalability issue.
Latency vs Throughput
Latency:
▲
|
| Latency
|
▼
Latency, often referred to as response time, is the amount of time it takes for a single
request to travel from the sender to the receiver and back, including any processing
time along the way. It is essentially the delay experienced by a single operation or
request. Low latency is generally desirable, as it means that requests are being handled
quickly and users are getting timely responses.
• Example: Consider a web server receiving requests from clients. The latency for
a particular request is the time it takes for the server to process the request and
send a response back to the client.
Throughput:
--> --> --> -->
Throughput, on the other hand, refers to the number of operations or requests that a
system can handle in a given amount of time. It is a measure of the system's processing
capacity. High throughput indicates that the system can handle a large volume of
operations effectively.
• Example: A database system might be able to handle 1000 queries per second.
This means its throughput is 1000 queries per second.
Availability vs Consistency
Availability:
╔════════════════════╗
║                    ║
║       System       ║
║    Operational     ║
║                    ║
╚════════════════════╝
Availability refers to the ability of a system to remain operational and accessible to
users, even when failures occur. Highly available systems aim to minimize downtime so
that requests can be served whenever they arrive.
Consistency:
╔════════════════════╗
║ ╱╲ ╱╲ ║
║ ╱ ╲ ╱ ╲ ║
║ ╱ ╲ ╱ ╲ ║
╚════════════════════╝
Consistency refers to the requirement that all nodes in a distributed system see the
same data at any given point in time, regardless of where the data is accessed. In other
words, updates made to the data in one part of the system should eventually be
reflected in all other parts of the system. Maintaining consistency can be challenging in
distributed systems, especially in the face of network partitions and failures.
It's important to note that there's a trade-off between availability and consistency,
often referred to as the CAP theorem (Consistency, Availability, Partition Tolerance).
Systems that prioritize high availability may sacrifice consistency, while systems that
prioritize consistency may sacrifice availability.
CAP Theorem
According to CAP theorem, in a distributed system, you can only support two of the
following guarantees:
C
/ \
/ \
/ \
A-------P
Consistency:
This means everyone sees the same data at the same time. Like in a game, if you shoot a
monster, everyone playing should see the monster getting hit right away.
Availability:
This means your system keeps working and doesn't crash, even if something goes
wrong. Like in the game, even if one part of the game slows down, you can still play in
other parts.
Partition Tolerance:
This means your system can handle things like the internet having problems and parts
of your system not being able to talk to each other for a bit. It's like your game still
working even if some players can't talk to each other for a moment.
CA:
You choose Consistency and Availability. This means everyone sees the same data, and
the system keeps working well. But if there's a problem with the network, your system
might slow down or stop working.
CP:
You choose Consistency and Partition Tolerance. This means everyone sees the same
data, even if some parts of the system can't talk to each other for a bit. But during that
time, your system might slow down.
AP:
You choose Availability and Partition Tolerance. This means your system keeps
working even if the network has issues. But because different parts of the system
might not have the exact same data right away, things might seem a little inconsistent
for a short time.
So, the CAP theorem helps you decide what's most important for your system when
things get tough: having everyone see the same data, keeping the system working, or
handling problems in the network. You can't have all three perfect at once, so you need
to choose what fits best for what you're building.
Reliability vs Resilience
Reliability:
┌─────────┐    ┌─────────┐
│         │    │         │
│   Up    │    │   Up    │
│ Server  │    │ Server  │
│         │    │         │
└─────────┘    └─────────┘
   (🕑)           (🕠)
Reliability is the ability of a system to perform its intended function correctly and
consistently over time: the server that is up at two o'clock is still up at five.
Resilience:
┌─────────┐    ┌─────────┐
│         │    │         │
│ Failed  │    │   Up    │
│ Server  │ -> │ Server  │
│         │    │         │
└─────────┘    └─────────┘
   (🕑)           (🕠)
Resilience is the ability of a system to recover from failures and keep functioning: a
server that failed at two o'clock is back up and serving traffic by five.
Scalability
Horizontal Scaling vs Vertical Scaling
Horizontal Scaling (Scaling Out):
Horizontal scaling means adding more machines to share the workload.
• Example: If a website is experiencing high traffic, you might add more web
servers to the system. Each server handles a portion of the incoming requests,
allowing the system to collectively handle a higher number of users.
Vertical Scaling (Scaling Up):
┌─────────┐
│         │
│         │
│ Server  │
│         │
│         │
└─────────┘
     ▲
     │
┌─────────┐
│         │
│ Server  │
│         │
└─────────┘
Vertical scaling means adding more power (CPU, RAM, storage) to a single existing
machine so it can handle a bigger load on its own.
What to choose?
The choice between these two approaches depends on factors like the nature of the
application, available resources, budget, and scalability goals. Horizontal scaling is
often favored in modern cloud-based and distributed systems, as it provides greater
flexibility and fault tolerance by distributing the workload across multiple machines.
Stateless Systems
A stateless system is designed in a way that each request from a client contains all the
information necessary for the server to fulfill that request. The server doesn't rely on
any previous interactions or stored data to process the current request. This means
that each request is independent, and the server doesn't maintain any memory of past
requests.
• Example: A web server that serves static content like images or CSS files is
typically stateless. It simply responds to incoming requests without requiring
any information about previous requests.
Stateful Systems
┌─────────┐ ┌─────────┐
│ │ <---- │ │
│ │ │ Request │
│ ▄▄▄▄▄▄▄ │ │ Request │
│ ▄▄▄▄▄▄▄ │ │ Request │
└─────────┘ └─────────┘
(Stateful)
A stateful system, on the other hand, maintains and relies on state information about
past interactions. It keeps track of user sessions, historical data, or other contextual
information that's relevant to fulfilling requests. Stateful systems often store this
information in databases, caches, or other persistent storage.
What to choose?
Choosing between a stateless and a stateful approach depends on the specific
requirements of the application.
Stateless systems are often preferred in modern microservices architectures for their
scalability and simplicity, while stateful systems are chosen when applications need to
maintain context or state between interactions, even though they can introduce more
complexity in terms of data management and synchronization.
Eventual Consistency
┌─────────┐      ┌─────────┐      ┌─────────┐
│ Node A  │ ---> │ Node B  │ ---> │ Node C  │
└─────────┘      └─────────┘      └─────────┘
Eventual consistency is a consistency model in which updates propagate to all replicas
over time: nodes may briefly see different versions of the data, but if no new updates
are made, they all converge to the same state.
This concept is particularly relevant when dealing with large-scale distributed systems
where maintaining strong consistency across all nodes in real-time can be challenging
due to factors like network delays, node failures, and high traffic. Eventual consistency
offers a trade-off between immediate consistency and system performance and
availability.
Eventual consistency can help with scalability by allowing the system to:
Reduce Latency:
Achieving strong consistency in a distributed system often requires coordination and
communication between nodes, which can introduce latency. Eventual consistency
reduces the need for constant synchronization, helping to keep response times lower.
Improve Availability:
By relaxing the requirement for immediate consistency, systems can remain
operational even when a subset of nodes experiences issues. This improves overall
system availability and fault tolerance.
It's important to note that eventual consistency might not be suitable for all types of
applications. Systems that require strong consistency guarantees for critical
operations (like financial transactions) might not benefit from the eventual consistency
model. However, many applications, such as social media feeds, recommendation
engines, and analytics platforms, can leverage eventual consistency to achieve high
levels of scalability without compromising the user experience.
Caching
┌─────────┐ ┌─────────┐ ┌─────────┐
│ │ │ Cache │ │ │
│ Client │ ---> │ │ ---> │ Server │
│ │ │ │ │ │
└─────────┘ └─────────┘ └─────────┘
Caching stores copies of frequently accessed data in a fast-access layer so that future
requests can be served more quickly, without repeatedly hitting slower backend systems.
Enhanced Scalability:
Caching can be especially effective in distributed systems and microservices
architectures. By reducing the load on shared resources, caching allows each individual
service to scale more easily without putting too much strain on the backend systems.
It's important to note that caching requires careful management to ensure that cached
data remains accurate and up to date. Strategies like cache expiration, cache eviction,
and cache invalidation are used to maintain the integrity of cached data.
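To make this concrete, here's a minimal sketch of cache expiration in Python, assuming a simple in-memory dictionary (a production system would typically use a dedicated store like Redis or Memcached):

import time

class TTLCache:
    """A tiny in-memory cache where each entry expires after ttl_seconds."""
    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expiry_timestamp)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None                  # cache miss
        value, expires_at = entry
        if time.time() > expires_at:
            del self.store[key]          # expired: evict and report a miss
            return None
        return value                     # cache hit

    def set(self, key, value):
        self.store[key] = (value, time.time() + self.ttl)

cache = TTLCache(ttl_seconds=30)
cache.set("user:42", {"name": "Alice"})
print(cache.get("user:42"))  # served from the cache until the TTL lapses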
Load Balancing vs Reverse Proxy
Load Balancer:
A load balancer distributes incoming requests across multiple backend servers so that
no single server becomes a bottleneck.
Load balancers can operate at different layers of the network stack, including the
transport (Layer 4) and application (Layer 7) layers. Load balancers monitor the health
and capacity of the backend servers and direct incoming requests to the least busy
server or the one with the best performance metrics.
• Example: A website receiving a large number of user requests might use a load
balancer to distribute those requests among multiple web servers. This ensures
that each server handles a manageable amount of traffic, improving response
times and preventing server overloads.
Reverse Proxy
┌─────────┐
│ Client │
└─────────┘
▲ ▲
│ │
┌─────────┐
│ Reverse │
│ Proxy │
└─────────┘
▼ ▼
┌─────────┐
│ Server │
└─────────┘
A reverse proxy is a server that sits between client devices and backend servers. It
handles incoming requests from clients, forwards those requests to appropriate
backend servers, and then returns the response to the clients. Reverse proxies often
provide additional features like caching, security, SSL termination, and content
compression.
Reverse proxies are commonly used to improve security, performance, and
manageability. They can offload tasks from backend servers, centralize access control,
and optimize network traffic by serving cached content directly to clients.
Database Scaling
There are several main techniques to scale a database, each with its own benefits,
challenges, and considerations.
Database Sharding:
┌─────────────────────┐
│ │ │
│ Shard 1 | Shard 2 │
│ │ │
└─────────────────────┘
Sharding involves partitioning the data into smaller subsets, called shards, and distributing
these shards across multiple servers. Each server holds a portion of the data, reducing the
load on individual servers and improving performance. Sharding requires careful planning to
ensure that data is evenly distributed, and query routing mechanisms are in place to direct
queries to the appropriate shard.
Database Replication:
┌─────────┐        ┌─────────┐
│ │ │ │
│ Master ─┼──────▶│ Replica │
└─────────┘ └─────────┘
Replication involves creating copies (replicas) of the database on multiple servers.
Each replica can handle read requests, distributing the read workload and improving
performance. Load balancers distribute incoming requests across the replicas,
ensuring even distribution of traffic.
Data Partitioning:
Data partitioning involves breaking a single table into smaller partitions based on a
certain criterion (e.g., range, hash, list). Each partition resides on a separate server,
allowing for efficient data storage and retrieval. This technique can improve query
performance and maintenance.
Consistency
Consistency Types
Strong Consistency:
┌─────────┐      ┌─────────┐      ┌─────────┐
│ Node A  │ <--> │ Node B  │ <--> │ Node C  │
└─────────┘      └─────────┘      └─────────┘
Strong consistency ensures that all nodes in a distributed system see the same data at
the same time. This means that once a write operation is confirmed, all subsequent
read operations will return the updated data. Strong consistency provides a guarantee
that there is no time lag or discrepancy in the data seen by different nodes.
Weak Consistency:
Weak consistency allows for a certain level of inconsistency in data views across
different nodes. Nodes might see different versions of data for some time after an
update, but eventually, all nodes will converge to a consistent state. Weak consistency
provides higher availability and performance compared to strong consistency but
allows for temporary inconsistencies.
When to use weak consistency:
• Applications where high availability and low latency are top priorities.
• Systems that can tolerate minor inconsistencies for a short duration, and where
eventual consistency is acceptable.
• Use cases like social media feeds, where immediate consistency is not critical,
and users are okay with seeing slightly different data for a brief period.
Eventual Consistency:
┌─────────┐      ┌─────────┐      ┌─────────┐
│ Node A  │ ---> │ Node B  │ ---> │ Node C  │
└─────────┘      └─────────┘      └─────────┘
Updates propagate between nodes asynchronously: replicas may briefly diverge, but
they converge to the same state over time (see the Eventual Consistency section in the
Scalability chapter).
Consistency Patterns
Consistency patterns are approaches or techniques used in distributed systems to
manage data consistency, communication, and coordination among nodes.
Each pattern addresses specific challenges and scenarios. Let's explore each one:
Full Mesh:
In a full mesh topology, every node directly communicates with every other node in the
system. This approach facilitates direct communication and data sharing between
nodes. While it provides strong connectivity, it can lead to increased network traffic
and complexity as the number of nodes grows.
Coordinator Service:
A coordinator service acts as a central point of control and coordination for distributed
operations. Nodes communicate with the coordinator to perform tasks, acquire locks,
or obtain information. The coordinator orchestrates actions and ensures consistency
across the system.
Distributed Cache:
A distributed cache stores frequently accessed data in memory across multiple nodes.
This pattern improves data retrieval performance by reducing the need to fetch data
from slower storage systems like databases. Nodes can retrieve data directly from the
cache, improving response times.
When to use the distributed cache pattern: when the same data is read frequently by
many nodes, and a short window of staleness is acceptable in exchange for faster reads.
Gossip Protocol:
The gossip protocol is a method for spreading information across nodes in a network in
a decentralized way. Nodes periodically exchange information with a few randomly
chosen peers. Over time, information spreads through the network, achieving eventual
consistency.
Availability
Availability Patterns
Replication:
Client ---> Master ---> Replica
Data is continuously copied from a primary (master) node to one or more replicas, so
an up-to-date backup always exists.
Failover:
Client ---> Failed Master ---> (switch to) Replica
When the active component fails, traffic is automatically switched over to a healthy
standby.
The backup system is kept up to date through replication. Failover ensures minimal
downtime and seamless transitions when failures occur.
Both replication and failover patterns are used to improve system availability, but they
tackle availability challenges from different angles.
Throttling / Rate Limiting
Throttling can be implemented in various ways, such as limiting the number of requests
a user can make within a certain time period, delaying requests that exceed a certain
rate, or temporarily blocking access for users who violate predefined limits.
Prevent Overload:
Throttling prevents a single user or client from sending an excessive number of
requests in a short period. Without throttling, a sudden influx of requests, whether
intentional or due to a malfunction, could overwhelm the system's resources, causing
slowdowns or outages.
Stabilize Performance:
By controlling the rate of incoming requests, throttling helps maintain stable and
consistent performance. It prevents spikes in traffic that could strain resources and
cause performance degradation.
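One common way to implement rate limiting is a token bucket. Here's a minimal Python sketch; the capacity and refill rate are illustrative values, not recommendations:

import time

class TokenBucket:
    """Allows bursts up to `capacity` requests, refilled at `rate` tokens/second."""
    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last_refill = time.time()

    def allow_request(self):
        now = time.time()
        # Refill tokens in proportion to the time elapsed since the last check.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True   # within the limit
        return False      # throttled

bucket = TokenBucket(capacity=10, rate=5)  # 10-request bursts, 5 req/sec sustained
if not bucket.allow_request():
    print("429 Too Many Requests")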
Queue-based Load Leveling
Prevent Overloads:
During periods of high traffic or sudden bursts of requests, the system may become
overloaded if it tries to process all requests simultaneously. Queue-based Load
Leveling staggers the processing of requests, preventing spikes in usage that could lead
to resource exhaustion or downtime.
Enhance Scalability:
Queue-based Load Leveling enables the system to scale more effectively. Instead of
provisioning resources to handle the highest possible load, the system can dynamically
adjust to the current traffic while maintaining availability.
Implementing Queue-based Load Leveling involves creating a buffer (queue) that holds
incoming requests and processes them in a controlled manner. Techniques like rate
limiting, throttling, and dynamic scaling can be employed to adjust the rate at which
requests are dequeued and processed based on the system's capacity and available
resources.
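As a rough Python sketch of the idea, using a bounded in-process queue and a single worker (a real system would typically use a message broker such as RabbitMQ or SQS):

import queue, threading, time

request_queue = queue.Queue(maxsize=100)   # the buffer that levels the load

def worker():
    while True:
        request = request_queue.get()      # dequeue at a controlled pace
        time.sleep(0.05)                   # stand-in for real processing work
        print(f"processed {request}")
        request_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

# A sudden burst of requests is absorbed by the queue instead of
# hitting the backend all at once.
for i in range(20):
    request_queue.put(f"request-{i}")
request_queue.join()                       # wait until the backlog is drained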
Health Checks
♥ ♥ ♥
Health checks are periodic probes that verify whether a component or service is alive
and functioning correctly, so the system can react before users are affected.
Proactive Maintenance:
Health checks enable proactive maintenance. If a component is detected as unhealthy,
appropriate actions can be taken to address the issue before it leads to service
disruptions.
Fault Isolation:
When a component fails or becomes unhealthy, health checks help pinpoint the exact
component or service that is causing the problem. This makes it easier to isolate and
address the issue, reducing downtime and impact on other parts of the system.
Graceful Degradation:
Health checks can be used to implement graceful degradation strategies. If a certain
service or component is found to be unhealthy, the system can automatically adjust its
behavior to minimize the impact on users while the issue is being addressed.
Automatic Recovery:
When an unhealthy component recovers, health checks can trigger automatic
processes to bring it back into the rotation, ensuring that the system fully recovers
without manual intervention.
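Here's a minimal Python sketch of an HTTP health check loop; the hostnames and the /health endpoint are hypothetical, though many services expose a path like this:

import urllib.request

def is_healthy(url, timeout=2):
    """Return True if the component answers its health endpoint with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:          # connection refused, DNS failure, timeout, ...
        return False

servers = ["http://app-1.internal/health", "http://app-2.internal/health"]
healthy = [s for s in servers if is_healthy(s)]
print(f"{len(healthy)}/{len(servers)} servers healthy")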
Resilience
Bulkhead
🛍 📦 📞
The Bulkhead Pattern is a fancy way to say that we can make our system extra tough
by separating out different parts of it. It's like how ships have different compartments
to keep water from flooding the whole thing if there's a leak.
Basically, we divide our system into different "bulkheads," each with its own job or
group of users. That way, if one part fails, it won't take down the whole system. It's like
having different backup plans to keep things running smoothly.
Isolation of Failures:
If one part of the system experiences a failure, it won't cascade to other parts of the
system, reducing the potential for widespread downtime or disruptions.
Improved Availability:
Even if one section experiences issues, other sections can continue to serve users,
enhancing the overall availability of the system.
Easier Maintenance:
Isolating sections can simplify maintenance and updates, as changes in one section are
less likely to affect others.
An example of applying the Bulkhead Pattern could be in a microservices architecture,
where each microservice is encapsulated within its own bulkhead. If one microservice
experiences a surge in traffic or a failure, it won't overload or bring down other
microservices. Instead, they remain operational, providing a degree of fault tolerance.
The e-commerce application has three separate backend services. These separate
services serve as bulkheads, isolating different functionalities.
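As a rough sketch of the idea in Python, using one semaphore per downstream dependency as the "compartment" (the service names and pool sizes are illustrative):

import threading

# One semaphore per downstream dependency: each compartment has its own
# fixed pool of concurrent calls, so one slow service can't drain them all.
bulkheads = {
    "payments":  threading.BoundedSemaphore(5),
    "inventory": threading.BoundedSemaphore(10),
}

def call_service(name, fn):
    sem = bulkheads[name]
    if not sem.acquire(blocking=False):
        raise RuntimeError(f"{name} bulkhead full; rejecting the call early")
    try:
        return fn()
    finally:
        sem.release()

# Even if 'payments' hangs and its 5 slots fill up, calls that go through
# the 'inventory' bulkhead still succeed.
print(call_service("inventory", lambda: "stock checked"))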
Circuit Breaker
⚡ ⚠
⚠ 🟢
The Circuit Breaker Pattern is a design pattern used in system architecture to enhance
the resilience of a system by preventing repeated requests to a failing component or
service. It is inspired by the concept of an electrical circuit breaker that stops the flow
of electricity when there is an overload or fault to prevent damage.
In software, the Circuit Breaker Pattern serves as a mechanism to detect and manage
failures, allowing a component to temporarily "open" the circuit and stop requests to a
failing service. This prevents the system from overloading or becoming unresponsive
due to continuous requests to a component that's already struggling.
Key characteristics of the Circuit Breaker Pattern and how it enhances resilience
include:
Failure Detection:
The Circuit Breaker continuously monitors the responses from a component or
service. If a predefined threshold of failures or errors is reached, the Circuit Breaker
transitions to a "broken" state.
Fast Failures:
When the Circuit Breaker detects a broken state, it prevents further requests to the
failing component. This helps the system to quickly fail-fast, reducing the load on the
failing service.
Fallback Mechanism:
In case of a broken circuit, the Circuit Breaker can provide a fallback mechanism. This
can include using cached data, alternative services, or default responses to continue
serving user requests without waiting for the failing service to recover.
System Resilience:
By preventing continuous requests to a failing component, the Circuit Breaker helps to
protect other parts of the system from being affected by the failure.
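Here's a minimal Python sketch of a circuit breaker; the thresholds and half-open behavior are simplified, and the failing service call is a stand-in:

import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout   # seconds to keep the circuit open
        self.failures = 0
        self.opened_at = None                # None means the circuit is closed

    def call(self, fn, fallback=None):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                return fallback              # fail fast while the circuit is open
            self.opened_at = None            # half-open: let one request try again
        try:
            result = fn()
            self.failures = 0                # a success closes the circuit
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.time() # trip the breaker
            return fallback

def fetch_from_service():
    raise TimeoutError("downstream service is struggling")  # stand-in failure

breaker = CircuitBreaker()
print(breaker.call(fetch_from_service, fallback="cached response"))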
The Exponential Backoff and Retry
⚠ Initial retry (delay 1s)
⚠ Retry with exponential backoff (delay doubles on each failed attempt)
🟢 Successful retry after recovery
The Exponential Backoff and Retry Pattern is a design pattern used in system
architecture to enhance the resilience of a system by handling transient failures that
may occur during communication with external services or components.
This pattern is especially useful in distributed systems and when interacting with
unreliable or intermittently available resources.
Retry Mechanism:
When a request to an external service or component fails, the system automatically
retries the operation after a brief delay.
Exponential Backoff:
If the initial retry attempt fails, the system increases the delay before the next retry
exponentially. This means that subsequent retries are spaced farther apart in time.
Jitter:
To avoid synchronization and reduce the likelihood of multiple systems retrying
simultaneously, a random "jitter" is often introduced to the backoff period.
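Putting retry, exponential backoff, and jitter together, a minimal Python sketch might look like this (the delays and attempt count are illustrative):

import random, time

def call_with_retries(operation, max_attempts=5, base_delay=1.0):
    """Retry `operation`, doubling the delay each attempt and adding random jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise                                # out of retries: surface the error
            delay = base_delay * (2 ** attempt)      # 1s, 2s, 4s, 8s, ...
            delay += random.uniform(0, delay)        # jitter: avoid synchronized retries
            time.sleep(delay)

# Usage: wrap any flaky call, e.g. a request to an intermittently available API.
# call_with_retries(lambda: fetch_remote_data())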
Fallback Mechanism
⚠ 🔄
The key idea behind the Fallback Pattern is to have a predefined fallback mechanism or
compensating transaction that can be executed if the primary operation cannot be
completed successfully. This secondary operation aims to undo or compensate for the
effects of the primary operation, ensuring that the system remains in a consistent state
despite the failure.
Graceful Degradation:
When a primary operation fails, the system can immediately trigger the fallback
mechanism to handle the situation, preventing cascading failures and maintaining
system functionality.
Data Consistency:
Compensating transactions are designed to bring the system back to a consistent
state, ensuring that any partial changes made by the primary operation are
appropriately rolled back or compensated.
Error Recovery:
The pattern offers a structured approach to handling errors, reducing the chances of
leaving the system in an undefined state after a failure.
Availability:
By providing an alternative way to achieve the same outcome, the system can remain
available and operational even if the primary operation encounters issues.
Predictable Behaviour:
Users and applications can rely on the fact that, regardless of whether the primary
operation succeeds or fails, the system will always maintain its integrity through the
use of compensating transactions.
Example use cases for the Fallback Pattern include e-commerce checkout processes,
financial transactions, and any scenario where multiple steps are involved and failure
of any step could potentially leave the system in an undesirable state.
The Dead Letter Queue (DLQ)
📨 📬 ⚠ 🔄 🗑
The Dead Letter Queue (DLQ) is a design pattern commonly used in messaging
systems to handle messages that cannot be successfully processed or delivered to
their intended recipients. It's a mechanism that captures messages that encounter
issues during processing, such as errors, malformed content, or unavailable recipients.
Instead of discarding these problematic messages, they are redirected to the DLQ for
further analysis and potential resolution.
How the Dead Letter Queue pattern enhances resilience:
Error Isolation:
Messages that encounter errors or failures don't disrupt the main processing flow.
They are isolated in the DLQ, preventing the entire system from being affected by a
single message's issues.
Preservation of Data:
Instead of losing valuable data due to errors, the DLQ stores problematic messages for
further investigation. This is especially useful for compliance, auditing, and debugging
purposes.
Guaranteed Delivery:
DLQs can be used to ensure that messages are not lost even in cases of temporary
communication failures.
DLQs are especially valuable in scenarios with high message volumes, asynchronous
communication, and complex workflows. They provide a safety net for the system by
allowing it to continue processing without being blocked by occasional failures, and
they provide a mechanism for addressing errors and improving system performance
over time.
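Here's a rough Python sketch of the flow, using in-process queues as stand-ins for a real message broker and retrying a message a few times before parking it in the DLQ:

import queue

main_queue = queue.Queue()
dead_letter_queue = queue.Queue()   # problematic messages land here
MAX_ATTEMPTS = 3

def handle(message):
    if message.get("malformed"):
        raise ValueError("cannot parse message")
    print(f"delivered: {message['body']}")

def consume():
    while not main_queue.empty():
        message = main_queue.get()
        try:
            handle(message)
        except Exception as err:
            message["attempts"] = message.get("attempts", 0) + 1
            if message["attempts"] >= MAX_ATTEMPTS:
                message["error"] = str(err)
                dead_letter_queue.put(message)   # park it for later inspection
            else:
                main_queue.put(message)          # transient failure: retry

main_queue.put({"body": "order #1001"})
main_queue.put({"body": "???", "malformed": True})
consume()
print(f"{dead_letter_queue.qsize()} message(s) in the DLQ")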
Design Patterns
Bloom Filters
+---+---+---+---+---+---+---+---+---+---+
| 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
+---+---+---+---+---+---+---+---+---+---+
Bloom filters are a probabilistic data structure used in computer science and system
design to quickly check whether an element is a member of a set. They provide an
efficient way to perform membership queries with low memory usage.
Distributed Systems: Bloom filters can help reduce the amount of network
traffic in distributed systems. They can be used to determine which nodes might have
certain data, avoiding unnecessary requests to nodes that definitely don't have it.
Spam Filtering: In email systems, Bloom filters can be employed to quickly
identify whether an incoming email matches known spam patterns. This helps reduce
the processing load for legitimate emails.
Big Data: Bloom filters can help optimize data processing pipelines by allowing
quick filtering of data that is likely not relevant for a particular analysis or operation.
Web Crawling: In web crawling, Bloom filters can be used to prevent revisiting
URLs that have already been crawled, saving bandwidth and time.
It's important to note that while Bloom filters provide fast membership queries, they
can sometimes produce false positives (indicating an element is present when it's not).
The probability of false positives can be controlled by adjusting the size of the filter
and the number of hash functions used. The trade-off lies between memory usage and
the acceptable level of false positives.
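A minimal Python sketch of a Bloom filter, using salted SHA-256 digests as the k hash functions (the size and hash count are illustrative; real deployments size them from the expected item count and the target false-positive rate):

import hashlib

class BloomFilter:
    def __init__(self, size=1000, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [0] * size

    def _positions(self, item):
        # Derive k bit positions from k salted hashes of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means definitely absent; True means *probably* present.
        return all(self.bits[pos] for pos in self._positions(item))

seen_urls = BloomFilter()
seen_urls.add("https://example.com/page1")
print(seen_urls.might_contain("https://example.com/page1"))  # True
print(seen_urls.might_contain("https://example.com/other"))  # almost certainly False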
Consistent Hashing
[Hash ring: hash values 0 through n-1 arranged in a circle, with servers s1, s2, and s3
placed at their hashed positions on the ring]
Consistent hashing maps both servers and data keys onto the same circular hash space,
so that adding or removing a server relocates only a small fraction of the keys.
Node Placement: Nodes are placed on the ring using a hash function. The hash
values of nodes determine their positions on the ring.
Data Placement: Each data item is also hashed using the same hash function to
find its position on the ring. The data item is placed on the node that comes after its
position.
Finding Nodes: When a request for data comes in, the same hash function is
applied to the data's key. The request is then routed to the node that comes after the
hashed value on the ring.
Load Balancing: The distribution of data across nodes is more even, which
prevents hotspots where some nodes are overloaded while others are underutilized.
Scalability: When nodes are added or removed, only a small portion of data needs
to be remapped, minimizing data migration and disruption.
Fault Tolerance: If a node fails, its data can be quickly reassigned to the next
available node without redistributing all data.
Caching: In caching systems, consistent hashing helps determine which cache node
to use for storing or retrieving data.
Distributed Hash Tables: Consistent hashing forms the basis for distributed
hash tables (DHTs), which allow efficient decentralized data storage and retrieval.
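Here's a rough Python sketch of a consistent hash ring with virtual nodes (the replica count and the MD5 hash choice are illustrative):

import bisect, hashlib

class ConsistentHashRing:
    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas        # virtual nodes per server, for evenness
        self.ring = []                  # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def get_node(self, key):
        if not self.ring:
            return None
        # Walk clockwise: pick the first virtual node at or after the key's hash.
        idx = bisect.bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["s1", "s2", "s3"])
print(ring.get_node("user:42"))   # the server responsible for this key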
Checksum
+---+---+---+---+
Data| 1 | 2 | 3 | 4 |
+---+---+---+---+
Checksum: 10
Concept of Checksum:
A checksum is like a unique fingerprint for a piece of data. When data is sent or stored,
a checksum value is calculated using a specific algorithm, often involving mathematical
operations. This checksum value is then sent along with the actual data.
When the data reaches its destination or is retrieved from storage, the recipient
calculates the checksum of the received data using the same algorithm. If the
calculated checksum matches the received checksum, it indicates that the data is likely
intact and hasn't been corrupted during transmission or storage.
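A minimal Python sketch of the idea, using SHA-256 as the checksum algorithm (real transport protocols often use cheaper checksums such as CRC32 for this purpose):

import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

payload = b"hello, distributed world"
sent = (payload, checksum(payload))          # transmit the data and checksum together

received_data, received_sum = sent
if checksum(received_data) == received_sum:
    print("data intact")
else:
    print("corruption detected, request retransmission")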
Usefulness of Checksums:
Checksums are extremely useful in distributed systems for several reasons:
Data Recovery: In case of data corruption, checksums can help identify which
portions of the data are affected, aiding in recovery efforts.
Overall, checksums play a vital role in maintaining data reliability and integrity within
distributed systems. They're a fundamental tool to ensure that the data being
processed, transmitted, or stored remains accurate and unaltered, even in the face of
potential errors or disruptions.
Merkle Trees
                Top hash
              hash(ABCD)
             /          \
      hash(AB)          hash(CD)
      /      \          /      \
  hash(A)  hash(B)  hash(C)  hash(D)
     |        |        |        |
  Block A  Block B  Block C  Block D
Leaf Nodes: The original data is divided into fixed-size chunks or blocks. Each block
is hashed individually to create leaf nodes of the tree.
Intermediate Nodes: The leaf nodes are then paired and hashed together to create
intermediate nodes. This process continues, pairing and hashing nodes at each level,
until only a single root hash remains.
Root Hash: The root hash is a condensed representation of the entire dataset. Any
change in the data, no matter how small, will result in a completely different root hash.
Data Integrity Verification: By comparing root hashes, parties can quickly verify
whether a large dataset has been tampered with or modified.
Digital Signatures: Merkle Trees are used to create digital signatures that prove
the authenticity of a document without revealing its content.
Efficient Proofs: Merkle Trees allow the creation of succinct proofs that a specific
piece of data is included in a larger dataset without needing to share the entire dataset.
Overall, Merkle Trees are a powerful tool for maintaining data integrity, facilitating
efficient data verification, and enabling secure communication in various distributed
systems. They provide a way to efficiently prove the authenticity of data while
minimizing the need to transmit or store large amounts of information.
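Here's a minimal Python sketch of computing a Merkle root, pairing and hashing nodes upward as described above (duplicating the last node when a level has an odd count is one common convention):

import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blocks):
    """Hash each block, then pair-and-hash upward until one root remains."""
    level = [h(block) for block in blocks]           # leaf nodes
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])                  # duplicate the odd node out
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()

data = [b"block A", b"block B", b"block C", b"block D"]
print(merkle_root(data))
# Changing any single byte of any block yields a completely different root:
data[2] = b"block C (tampered)"
print(merkle_root(data))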
Quorum
👍 Agree   👍 Agree   👎 Disagree  ->  2 of 3 nodes agree, so the operation proceeds
A quorum is the minimum number of nodes in a distributed system that must
acknowledge or agree on an operation before it is considered successful.
For example, in a distributed database or storage system, a read quorum might require
that data is read from a minimum number of nodes before returning a result to ensure
consistency. Similarly, a write quorum might require that data is written to a certain
number of nodes before considering a write operation successful.
Usefulness of Quorums:
Quorums provide several benefits in distributed systems:
Consistency: Quorums help maintain data consistency by ensuring that a sufficient
number of nodes agree on the state of the system. This prevents situations where
nodes have conflicting views of the data.
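The usual rule of thumb can be expressed in one line: with N replicas, a write quorum of W nodes, and a read quorum of R nodes, reads are guaranteed to overlap the latest write whenever R + W > N. A tiny Python sketch:

def is_strongly_consistent(n, w, r):
    """With N replicas, writes acked by W nodes and reads from R nodes
    overlap in at least one up-to-date node whenever R + W > N."""
    return r + w > n

# Typical configuration: N=3 replicas, W=2, R=2 -> overlap guaranteed.
print(is_strongly_consistent(n=3, w=2, r=2))  # True
print(is_strongly_consistent(n=3, w=1, r=1))  # False: a read may miss the latest write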
Load Balancer
Load Balancer vs Reverse Proxy
Load Balancer:
Health Monitoring: Load balancers regularly check the health of backend servers,
removing or redirecting traffic from unhealthy servers.
Reverse Proxy:
┌─────────┐
│ Client │
└─────────┘
▲ ▲
│ │
┌─────────┐
│ Reverse │
│ Proxy │
└─────────┘
▼ ▼
┌─────────┐
│ Server │
└─────────┘
Security: Reverse proxies shield backend servers from direct client access, providing
an extra layer of security by filtering out malicious traffic and protecting sensitive data.
Caching: Reverse proxies can cache static content, such as images and files, reducing
server load and improving response times for frequently accessed resources.
SSL Termination: Reverse proxies can handle SSL encryption and decryption,
offloading this resource-intensive task from backend servers.
URL Routing and Rewriting: Reverse proxies can route requests based on URL
patterns and rewrite URLs for cleaner, user-friendly paths.
Comparison:
In summary, while both Load Balancers and Reverse Proxies contribute to better
system performance and security, they focus on different aspects of optimization:
In many real-world scenarios, Load Balancers and Reverse Proxies are used together,
with the Load Balancer distributing traffic among multiple backend servers, and the
Reverse Proxy providing additional security, caching, and optimization features.
Load Balancing Algorithms
Round Robin: Requests are distributed across the servers sequentially, cycling back
to the first server after the last one. It's simple and works well when servers have
similar capacity.
Least Connections: Servers with the fewest active connections receive new
requests. This algorithm helps to distribute traffic more evenly based on the current
load of each server.
Weighted Round Robin: Similar to Round Robin, but servers are assigned
different weights based on their capacity. Servers with higher weights receive more
traffic, which is useful for balancing servers with varying capabilities.
IP Hash: The IP address of the client is used to determine which server to send the
request to. This is useful for ensuring that requests from the same client consistently
go to the same server.
Least Response Time: Servers with the lowest response time to the client receive
new requests. This algorithm aims to direct traffic to the server that can respond most
quickly.
Different load balancing algorithms have their strengths and weaknesses. The choice
of algorithm depends on factors such as the architecture of the application, the
characteristics of the backend servers, and the desired behavior in terms of traffic
distribution and server load.
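To make the algorithms concrete, here are minimal Python sketches of three of them (the server addresses are placeholders, and a real load balancer would track connection counts from live traffic):

import itertools, zlib

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]   # placeholder backend addresses

# Round Robin: hand out servers in order, wrapping around at the end.
round_robin = itertools.cycle(servers)

# Least Connections: pick the server with the fewest active connections.
active_connections = {s: 0 for s in servers}

def least_connections():
    return min(servers, key=lambda s: active_connections[s])

# IP Hash: a deterministic hash keeps each client pinned to one server.
def ip_hash(client_ip):
    return servers[zlib.crc32(client_ip.encode()) % len(servers)]

print(next(round_robin))            # 10.0.0.1, then .2, then .3, then .1 ...
print(least_connections())
print(ip_hash("203.0.113.7"))       # the same IP always maps to the same server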
Layer 4 Load Balancing vs Layer 7 Load
Balancing
+---------------------------+
| Application |
+---------------------------+
| Presentation |
+---------------------------+
| Session |
+---------------------------+
| Transport |
+---------------------------+
| Network |
+---------------------------+
| Data Link |
+---------------------------+
| Physical |
+---------------------------+
Layer 4 Load Balancer and Layer 7 Load Balancer are two types of load balancers used
in system design to distribute incoming network traffic to backend servers. They
operate at different layers of the network stack and offer distinct functionalities:
Layer 4 Load Balancers (Transport Layer):
Network-Level Load Balancing: They are well-suited for scenarios where the
backend servers host similar applications or services that can be identified by IP and
port.
Fast Routing Decisions: Layer 4 load balancers make routing decisions quickly
because they don't analyze application-level content.
Layer 7 Load Balancers (Application Layer):
Content Inspection: They can inspect HTTP headers, cookies, and other
application-level data to determine the appropriate backend server.
Load Balancing Strategies: Layer 7 load balancers can employ more complex
load balancing algorithms that take into account server health, response times, and
user sessions.
In summary, the primary difference between Layer 4 and Layer 7 Load Balancers lies in
the layer of the network stack at which they operate and the type of information they
use to make routing decisions. Layer 4 Load Balancers focus on network-level
information, while Layer 7 Load Balancers analyze application-level content for more
sophisticated and context-aware routing decisions. The choice between the two
depends on the specific requirements of the application and the desired level of traffic
distribution granularity.
Domain Name System (DNS)
The Domain Name System (DNS) is a critical component of the internet infrastructure
that translates human-readable domain names (like www.example.com) into IP
addresses (such as 192.0.2.1) that computers use to identify and communicate with
each other. DNS serves as a distributed, hierarchical, and highly available naming
system, providing a way to locate resources on the internet.
Name Resolution: DNS allows users to access websites and services using easily
memorable domain names instead of numeric IP addresses. It simplifies the process of
accessing online resources.
Load Distribution: DNS can be used for load distribution and balancing by
distributing incoming traffic among multiple IP addresses associated with a single
domain name.
Reverse DNS Lookup: DNS also supports reverse lookup, which allows you to
find the domain name associated with a given IP address. This is useful for network
diagnostics and security purposes.
Proxy (Forward Proxy)
A forward proxy is a server that sits between client devices and the internet,
forwarding client requests to external servers on the clients' behalf.
Privacy and Anonymity: Clients can use a forward proxy to access resources on
the internet without revealing their own IP addresses. This can help protect user
privacy and anonymity.
Content Filtering and Access Control: Forward proxies can enforce content
filtering policies by blocking or allowing access to specific websites or content
categories. They can also restrict access to certain domains or URLs, improving
security and compliance.
Bandwidth Savings: By caching and serving resources locally, forward proxies can
reduce the amount of data transferred between clients and servers, leading to reduced
bandwidth usage and faster loading times.
Access to Restricted Content: Forward proxies can be used to access resources
that might be geographically restricted, allowing clients to bypass content restrictions
imposed by websites.
Security and Anomaly Detection: Forward proxies can analyze incoming and
outgoing traffic for security threats, helping to detect and prevent malicious activities.
They can also log and analyze network traffic patterns for anomaly detection.
Content Delivery Network (CDN)
🇮🇳 🇺🇸 🇧🇷
The primary purpose of a CDN is to improve the delivery of web content, such as
images, videos, stylesheets, and scripts, to users by reducing latency, increasing
availability, and optimizing content delivery. CDNs work by caching and delivering
content from the server location that is closest to the user's geographic location.
Latency Reduction: CDNs reduce the distance between the user and the server,
which minimizes the latency and improves the loading speed of web content.
Content Caching: CDNs cache copies of content in various server locations. When
a user requests a resource, the CDN serves the cached copy, reducing the load on the
origin server and improving response times.
Distributed Network: CDNs have servers in multiple regions and data centers,
providing redundancy and increased availability. This is particularly useful in mitigating
the impact of server failures or network outages.
Pull CDN:
• In a Pull CDN, the CDN servers retrieve content from the origin server on-
demand when a user requests that content.
• When a user requests a resource, the CDN server fetches the content from the
origin server, caches it, and serves it to the user.
• Benefits include reduced load on the origin server and efficient utilization of
storage on the CDN servers.
Push CDN:
• In a Push CDN, content is manually uploaded or pushed to the CDN servers in
advance of user requests.
• The content is uploaded to the CDN servers' cache proactively, regardless of
whether users have requested it yet.
• Push CDNs are suitable for static content that doesn't change frequently.
• Benefits include immediate content availability and control over when content
is distributed to the CDN.
In summary, CDNs play a crucial role in system design by optimizing content delivery,
reducing latency, improving availability, and enhancing the overall user experience.
They can significantly improve the performance and scalability of websites and
applications by leveraging a distributed network of servers strategically located across
the globe. The choice between Pull and Push CDNs depends on the nature of the
content, update frequency, and the desired level of control over content distribution.
Application Layer
N-tier applications
+-------------------------+
| |
| Presentation Layer |
| |
+-------------------------+
| |
| Application Layer |
| |
+-------------------------+
| |
| Business Logic Layer |
| |
+-------------------------+
| |
| Data Access Layer |
| |
+-------------------------+
| |
| Integration Layer |
| |
+-------------------------+
Modularity: The separation of concerns into distinct tiers makes the application
more modular and easier to develop, test, and maintain. Changes in one tier have
minimal impact on others.
Maintenance: Changes, updates, and enhancements can be made more easily due to
the clear separation of functionality into distinct tiers.
Example:
Let's say a user visits an online shopping website to purchase items. Here's how the
n-tier architecture would be applied:
Presentation Tier (Client): The user opens a web browser and interacts with the
website's user interface. They browse through products, view product details, and add
items to their shopping cart.
Application Tier (Business Logic): The application tier receives requests from the
presentation tier. It processes user actions, such as adding items to the cart, calculating
the total price, and validating user inputs. It communicates with the services tier for
tasks like authenticating users and processing payments.
Services Tier: The services tier provides authentication services to verify user
credentials. It also handles payment processing by interfacing with external payment
gateways.
Data Tier (Database): The data tier stores information about products, user accounts,
and orders. It retrieves product details, updates cart contents, and records order
history.
Microservices
+------------------------+
| Microservice A |
| +--------------+ |
| | | |
| | Func A | |
| | | |
| +--------------+ |
| |
+------------------------+
+------------------------+
| Microservice B |
| +--------------+ |
| | | |
| | Func B | |
| | | |
| +--------------+ |
| |
+------------------------+
+------------------------+
| Microservice C |
| +--------------+ |
| | | |
| | Func C | |
| | | |
| +--------------+ |
| |
+------------------------+
Each microservice is responsible for a specific business capability or function, and they
can be developed, deployed, and scaled independently. This approach contrasts with
monolithic architectures where the entire application is tightly integrated into a single
codebase.
Polyglot Persistence: Each microservice can use its own data storage technology,
allowing for the best fit for different data storage requirements.
Challenges: microservices also bring operational complexity, since inter-service
communication, deployment, and monitoring of many small services must be managed
carefully.
Service Discovery
+------------------------+
| Service Registry |
| |
| +------+ +------+ |
| | Svc1 | | Svc2 | |
| | | | | |
| | | | | |
| +------+ +------+ |
| |
+------------------------+
|
| Discover
|
+------------------------+
| Service |
| Consumer |
| |
+------------------------+
It helps applications and services discover the network addresses and locations of
other services they need to interact with.
Some of the most popular systems for service discovery are etcd, Consul, ZooKeeper,
and Kubernetes' native service discovery.
Decoupling: Services can interact without needing to know the specific network
addresses or locations of each other, promoting loose coupling.
Resilience: In the face of service failures or outages, Service Discovery can help
reroute traffic to healthy instances of the service.
API Gateway
+------------------------+
| API Gateway |
| |
| +-------------+ |
| | Service | |
| | Routing | |
| | and Proxy | |
| | Logic | |
| +-------------+ |
| |
+------------------------+
| | |
| | |
+---|-------|-------|---+
| | | | |
| | | | |
| +---+ +---+ +---+ |
| | S1| | S2| | S3| |
| | | | | | | |
| +---+ +---+ +---+ |
| |
+-----------------------+
In system design, an API Gateway is a server or service that acts as an entry point for
client requests to a microservices architecture or distributed system. It provides a
centralized point of control and management for APIs (Application Programming
Interfaces) offered by various services.
The API Gateway handles tasks such as routing, authentication, security, load
balancing, and protocol translation, streamlining the communication between clients
and the underlying services.
Centralized Entry Point: The API Gateway serves as a single entry point for client
applications to access various APIs provided by different microservices.
Routing and Load Balancing: The API Gateway routes incoming requests to the
appropriate microservices based on the requested endpoint. It can also distribute
incoming traffic across multiple instances of the same microservice to ensure load
balancing.
Authentication and Authorization: The API Gateway handles user
authentication and authorization, enforcing security policies and ensuring only
authorized users access the APIs.
Protocol Translation: The API Gateway can translate requests and responses
between different protocols and formats, allowing clients and services to communicate
in their preferred formats.
Aggregation: The API Gateway can aggregate data from multiple services to fulfill a
single client request, reducing the number of calls the client needs to make.
Mobile and Web Applications: API Gateways simplify the interaction between
mobile apps, web clients, and backend services.
Example:
Consider an e-commerce platform with various microservices for product catalog, user
management, and order processing.
The API Gateway receives incoming requests from clients, such as mobile apps and
web browsers. It authenticates users, routes requests to the appropriate microservices
(e.g., retrieving product information or placing orders), and aggregates data if needed.
The API Gateway also handles rate limiting and caching to improve performance.
Event Driven Architecture (EDA)
+----------------------------------+
| |
| Event Producer |
| |
+----------------------------------+
|
| Publishes Events
|
v
+----------------------------------+
| |
| Event Broker |
| |
+----------------------------------+
|
| Routes Events
|
v
+----------------------------------+
| |
| Event Consumers |
| |
+----------------------------------+
In an event-driven architecture, components communicate by producing and consuming
events through an event broker, rather than calling each other directly.
Modularity: Components are isolated, making it easier to develop, test, and maintain
individual parts of the system.
Event Sourcing: EDA can be used for event sourcing, where the system's state is
derived from a sequence of events.
Real-Time Analytics: Events can trigger real-time data analysis and reporting.
Example:
Consider an e-commerce platform. When a user places an order, an "Order Placed"
event is generated.
This event can trigger various actions, such as updating inventory, sending a
confirmation email, and initiating payment processing. Different services in the system
can subscribe to this event and react accordingly, without needing to be tightly
coupled.
Event Sourcing
+----------------------------------+
| |
| Command Handlers |
| |
+----------------------------------+
|
| Generates Events
|
v
+----------------------------------+
| |
| Event Store |
| |
+----------------------------------+
|
| Stores Events
|
v
+----------------------------------+
| |
| Event Handlers |
| |
+----------------------------------+
|
| Processes Events
|
v
+----------------------------------+
| |
| Aggregate State |
| |
+----------------------------------+
In system design, Event Sourcing is a pattern that involves capturing changes in the
state of an application as a sequence of events.
Instead of storing the current state of an object or system, Event Sourcing maintains a
log of events that have occurred over time.
These events are used to reconstruct the current state at any point in time. This
pattern offers several benefits and is particularly useful in scenarios where data
history, audit trails, and complex state transitions are important.
Key Concepts of Event Sourcing:
Events: Events represent significant occurrences or state changes in the system. Each
event is a record of what happened, including the type of event and the relevant data.
Event Log: Events are stored in an event log or event store. The event log serves as
the source of truth and can be used to reconstruct the current state of the system.
Immutable: Events are immutable once recorded. They cannot be modified, which
ensures data integrity and traceability.
Historical Audit: Event Sourcing provides a complete audit trail of all changes and
actions taken in the system, making it suitable for compliance and debugging.
Flexible Queries: Data can be restructured and optimized for various queries,
enabling better performance for specific use cases.
Gaming: Storing game events, player interactions, and progress for multiplayer and
single-player games.
Example:
Consider an e-commerce application using Event Sourcing. Instead of directly updating
the stock quantity when an order is placed, the system records an "Order Placed"
event with relevant data.
This event is stored in the event log. Subsequent events, such as "Order Shipped" and
"Order Cancelled," are also recorded. The current stock quantity can be derived by
replaying these events and calculating the resulting stock changes.
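Here's a minimal Python sketch of that replay, assuming a simple in-memory event log and the convention that placing an order reserves stock while cancelling returns it:

event_log = [
    {"type": "OrderPlaced",    "sku": "ABC", "qty": 3},
    {"type": "OrderShipped",   "sku": "ABC", "qty": 2},
    {"type": "OrderCancelled", "sku": "ABC", "qty": 1},
]

def current_stock(sku, starting_stock, events):
    """Derive the current state by replaying the immutable event log."""
    stock = starting_stock
    for event in events:
        if event["sku"] != sku:
            continue
        if event["type"] == "OrderPlaced":
            stock -= event["qty"]     # placing an order reserves stock
        elif event["type"] == "OrderCancelled":
            stock += event["qty"]     # cancelling returns the reserved stock
        # "OrderShipped" doesn't change stock here: it was already reserved.
    return stock

print(current_stock("ABC", starting_stock=10, events=event_log))   # 10 - 3 + 1 = 8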
Command Query Responsibility Segregation (CQRS)
+----------------------------------+
| |
| Command Side |
| |
+----------------------------------+
|
| Sends Commands
|
v
+----------------------------------+
| |
| Command Handlers |
| |
+----------------------------------+
|
| Processes Commands
|
v
+----------------------------------+
| |
| Event Store |
| |
+----------------------------------+
CQRS separates the write side of an application (commands) from the read side
(queries), so each can be modeled, optimized, and scaled independently.
Command Model: The command model handles write operations, such as creating,
updating, or deleting data. It focuses on enforcing business rules and maintaining data
consistency.
Query Model: The query model is optimized for read operations, providing a
denormalized view of the data that is tailored to specific query requirements. It aims to
deliver optimal performance for querying data.
Data Duplication: CQRS often involves duplicating data between the command
and query models. This allows each model to be designed and optimized
independently.
Scalability: The command and query models can be scaled independently based on
their respective workloads.
Flexibility: The query model can be adapted and denormalized to meet the specific
needs of various read scenarios, improving responsiveness.
Support for Event Sourcing: CQRS aligns well with Event Sourcing, as events
generated by write operations can be captured and processed by the command model.
Real-Time Analytics: CQRS can optimize the query model to support real-time
analytics and reporting.
E-Commerce: Separating read and write operations can improve the performance
of product catalog browsing while maintaining data consistency in the order
processing.
Collaborative Systems: In systems with heavy read and write workloads, CQRS
can balance the demands on different parts of the application.
Example:
Consider a banking application using CQRS. When a customer makes a fund transfer (a
write operation), the command model validates the transaction, debits the source
account, and credits the destination account. The event generated by this transaction
is captured and processed asynchronously. The query model, responsible for
displaying account balances, is updated with the new balance.
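A highly simplified, synchronous Python sketch of that split (in practice the read model lives in a separate store and is updated asynchronously from the events):

# Write side: the command model enforces business rules and records events.
events = []
accounts = {"alice": 100, "bob": 50}

def handle_transfer(src, dst, amount):
    if accounts[src] < amount:
        raise ValueError("insufficient funds")   # rule checked on the write path
    accounts[src] -= amount
    accounts[dst] += amount
    events.append({"type": "FundsTransferred", "src": src,
                   "dst": dst, "amount": amount})

# Read side: a denormalized view rebuilt from events, optimized for queries.
def build_balance_view(event_stream, initial):
    view = dict(initial)
    for e in event_stream:
        if e["type"] == "FundsTransferred":
            view[e["src"]] -= e["amount"]
            view[e["dst"]] += e["amount"]
    return view

handle_transfer("alice", "bob", 30)
print(build_balance_view(events, {"alice": 100, "bob": 50}))  # {'alice': 70, 'bob': 80}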
Saga Pattern
+----------------------------------+
| |
| Saga Coordinator |
| |
+----------------------------------+
|
| Orchestrates Sagas
|
v
+----------------------------------+
| |
| Saga Participants |
| |
+----------------------------------+
/ | | \
/ | | \
/ | | \
| | | |
| +---------+ +---------+ +---------+
| | Step 1 | | Step 2 | | Step 3 |
| | | | | | |
| +---------+ +---------+ +---------+
| | | |
\ | | /
\ | | /
\ | | /
+----------------------------------+
| |
| External Systems or Events |
| |
+----------------------------------+
In system design, the Saga pattern is a technique used to manage long-running and
distributed transactions in a way that ensures data consistency across multiple
services or components. Traditional ACID transactions are often impractical here,
because they would require locking and coordination across independently owned
services and databases.
The Saga pattern addresses this challenge by breaking down a complex transaction
into a series of smaller, localized transactions known as "saga steps."
Each step represents a single operation and is associated with its own compensating
action in case of failures.
Saga Steps: A saga is composed of multiple saga steps, each representing a single
transactional operation. Each step has a corresponding compensating action that can
undo the effects of the operation.
Atomicity and Consistency: The saga ensures that each individual step is atomic
(it either fully completes or fully compensates) and that the overall saga maintains data
consistency.
Data Consistency: Saga ensures that the overall system remains consistent despite
failures during the transaction process.
Decoupling: Saga reduces tight coupling between services by orchestrating
interactions through events.
Travel Booking: Coordinating flights, hotels, and car rentals when a user makes a
travel booking.
Example:
Consider an e-commerce application where a user places an order.
The saga involves several steps, including deducting payment, updating inventory, and
sending order confirmation. If any step fails, the saga orchestrator triggers
compensating actions, such as refunding payment or restocking inventory. This
ensures that even if one step fails, the system remains consistent.
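A minimal orchestration sketch of this flow, with illustrative step and compensation
functions, might look like the following; a production saga would also persist its
progress so it can resume after a crash:

# Run steps in order; on failure, run the compensations of the steps that
# already completed, in reverse order.
def run_saga(steps):
    completed = []
    for action, compensate in steps:
        try:
            action()
            completed.append(compensate)
        except Exception:
            for undo in reversed(completed):
                undo()                      # compensating actions
            raise

def deduct_payment():    print("payment deducted")
def refund_payment():    print("payment refunded")
def update_inventory():  raise RuntimeError("out of stock")
def restock_inventory(): print("inventory restored")

try:
    run_saga([(deduct_payment, refund_payment),
              (update_inventory, restock_inventory)])
except RuntimeError:
    print("saga rolled back")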
Push vs Pull
In system design, "Push" and "Pull" are two contrasting approaches to data or event
distribution between components, services, or systems. These approaches determine
how information is transmitted and when it's initiated. Each approach has its own
advantages and use cases:
Push Approach: In the "Push" approach, the sender actively pushes data or events to
one or more recipients without waiting for the recipients to request or poll for the
information. When new data or an event becomes available, the sender initiates the
communication and delivers it to the recipients.
Advantages of Push:
+----------------------+
|     Data Source      |
|                      |
|      +-------+       |
|      | Data  |       |
|      | Event |       |
|      +-------+       |
|          |           |
|          |           |
|          v           |
|      +-------+       |
|      | Push  |       |
|      | Data  |       |
|      +-------+       |
+----------------------+
Lower Latency: Push can reduce communication latency since the information is
delivered as soon as it's available.
Advantages of Pull:
+----------------------+
|     Data Source      |
|                      |
|      +-------+       |
|      | Data  |       |
|      | Store |       |
|      +-------+       |
|          |           |
|          |           |
|          v           |
|      +-------+       |
|      | Pull  |       |
|      | Data  |       |
|      +-------+       |
+----------------------+
Controlled Consumption: Recipients can control when and how often they
retrieve data, preventing information overload.
Easier Load Management: Since recipients initiate requests, load spikes on the
sender's side can be managed better.
Choosing Between Push and Pull: The choice between push and pull depends on the
specific requirements of your system and use case:
• Push is suitable for real-time or event-driven scenarios where immediate
updates are essential.
• Pull is useful when recipients need to manage their own consumption rate, or
when data updates are less frequent.
In many cases, a combination of push and pull approaches can be used to achieve the
desired behavior. For example, a system might use push for real-time updates and pull
for less time-sensitive data retrieval.
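As a toy illustration of the difference (the class names are illustrative), push
delivers data to registered recipients as soon as it arrives, while pull leaves
recipients to fetch data on their own schedule:

# Push: the source initiates delivery to its subscribers.
class PushSource:
    def __init__(self):
        self.subscribers = []
    def subscribe(self, callback):
        self.subscribers.append(callback)
    def publish(self, data):
        for callback in self.subscribers:
            callback(data)

# Pull: recipients initiate retrieval when they are ready.
class PullSource:
    def __init__(self):
        self.latest = None
    def set(self, data):
        self.latest = data
    def fetch(self):
        return self.latest

push = PushSource()
push.subscribe(lambda d: print("pushed:", d))
push.publish("event-1")           # delivered immediately

pull = PullSource()
pull.set("event-2")
print("pulled:", pull.fetch())    # retrieved when the recipient asks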
caching
Caching Layers
In system design, various caching mechanisms are employed at different layers of a
software architecture to improve performance, reduce latency, and optimize resource
utilization.
Let's explore the differences between client caching, CDN caching, web server caching,
database caching, and application caching, along with their respective benefits:
Client Caching:
Client caching occurs on the user's device (browser, mobile app) and involves storing
copies of frequently accessed resources locally. These resources include HTML, CSS,
JavaScript, images, and other media files.
Useful For: Reducing page load times for returning users, minimizing server load, and
improving user experience.
Benefits: Faster load times for returning users, reduced bandwidth usage, and
decreased server load.
CDN (Content Delivery Network) Caching:
CDN caching stores copies of content on geographically distributed edge servers so
that users are served from a location close to them.
Useful For: Accelerating content delivery to users across different regions, improving
global performance, and handling traffic spikes.
Web Server Caching:
Web servers (or reverse proxies in front of them) cache static assets and generated
responses so that repeated requests can be served without re-processing.
Database Caching:
Database caching involves storing frequently accessed database query results in
memory to reduce the need for repeated database queries. It's particularly beneficial
for read-heavy workloads.
Useful For: Optimizing database performance and reducing query response times for
read operations.
Benefits: Faster query execution, reduced database load, improved scalability, and
enhanced application responsiveness.
Application Caching:
Application caching stores computed results, objects, or session data in memory
within the application itself or in an external in-memory store.
Useful For: Avoiding repeated expensive computations and speeding up frequently
executed operations.
Benefits: Faster response times, reduced resource consumption, and improved overall
application efficiency.
Key Takeaways:
• Client Caching improves user experience by storing resources on the user's
device.
• CDN Caching enhances content delivery by replicating content across a
network of servers.
• Web Server Caching optimizes response times by serving cached content
directly from the web server.
• Database Caching speeds up query execution by storing frequently accessed
data in memory.
• Application Caching improves application performance by caching computed
results.
Caching Strategies
Refresh Ahead (Read-Ahead) Caching:
Refresh ahead caching involves proactively fetching data from the backend or storage
before it's requested by users. This strategy aims to reduce the latency associated with
retrieving data on demand.
Useful For: Improving read performance by anticipating user needs and minimizing
the delay caused by fetching data.
Benefits: Reduced read latency, improved user experience, and efficient utilization of
cache.
Write Behind Caching:
Write behind (write-back) caching writes data to the cache first and updates the
backend storage asynchronously at a later point.
Useful For: Enhancing write performance by reducing the immediate impact of write
operations on the backend.
Write-Through Caching:
Write-through caching writes data to the cache and the backend storage at the same
time, keeping the two consistent at the cost of slower writes.
Useful For: Maintaining data consistency and integrity between the cache and the
backend storage.
Cache-Aside (Lazy Loading) Caching:
With cache-aside, the application manages the cache directly: it checks the cache
first and, on a miss, loads the data from the backend and stores it in the cache.
Useful For: Providing fine-grained control over cached data and minimizing the risk of
stale data.
Benefits: Control over cache management, reduced risk of stale data, and efficient
utilization of cache.
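A minimal sketch of the cache-aside strategy described above; the user-lookup names
are illustrative stand-ins for real database access:

cache = {}

def fetch_user_from_db(user_id):
    # Stand-in for a real database query.
    return {"id": user_id, "name": "user-%d" % user_id}

def get_user(user_id):
    if user_id in cache:                    # cache hit
        return cache[user_id]
    user = fetch_user_from_db(user_id)      # cache miss: load from backend
    cache[user_id] = user                   # populate the cache for next time
    return user

print(get_user(42))   # miss: loads from the database and caches the result
print(get_user(42))   # hit: served from the cache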
Key Takeaways:
• Refresh Ahead Caching proactively fetches data before it's requested to
reduce read latency.
• Write Behind Caching defers write operations to the cache and
asynchronously updates the backend.
• Write-Through Caching maintains data consistency by immediately updating
both the cache and the backend.
• Cache-Aside Caching allows the application to manage cache interactions
directly.
In summary, these caching strategies offer different ways to optimize data access and
improve performance. The choice of strategy depends on the specific requirements of
the application, the trade-offs between data consistency and performance, and the
data access patterns.
Cache Cleanup
In system design, both cache invalidation and cache eviction are strategies used to
manage and optimize the behavior of caches. They serve different purposes and are
used in different situations:
Cache Invalidation:
Purpose: Cache invalidation is a strategy used to remove specific items or entries from
the cache when the data they represent becomes outdated or no longer valid.
Usage: Cache invalidation is typically used when you have a way to detect changes in
the underlying data source. When the data in the source changes, you invalidate the
corresponding cached item(s) to ensure that future requests will fetch fresh, up-to-
date data.
Use Cases: It's especially useful in scenarios where data changes frequently and you
want to maintain cache consistency. For example, in a web application, you might
invalidate the cache for a user's profile when they update their information.
Advantages:
Keeps cached data consistent with the source of truth and ensures that clients are
served fresh data after updates.
Cache Eviction:
Purpose: Cache eviction is a strategy used to remove items or entries from the cache
to make room for new data when the cache becomes full.
Usage: Cache eviction is employed when you have a limited amount of memory or
storage allocated for caching. When the cache reaches its capacity, you need to decide
which items to remove to accommodate new data.
Use Cases: It's particularly useful in scenarios where you need to balance between
maximizing cache hits (serving data from the cache) and minimizing cache misses
(removing the least valuable items when the cache is full).
Advantages:
Keeps the cache within its memory or storage budget while retaining the items most
likely to be requested again, typically using policies such as LRU (least recently
used), LFU (least frequently used), or FIFO.
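As a small sketch of an eviction policy, here is an LRU cache built on Python's
OrderedDict; the capacity and keys are illustrative:

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" becomes the most recently used entry
cache.put("c", 3)      # capacity exceeded: "b" is evicted
print(cache.get("b"))  # None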
In practice, both strategies are often used in combination to create an efficient and
well-behaved caching system. Cache invalidation ensures data consistency, while
cache eviction manages the cache's resource consumption. The choice of which
strategy or combination to use depends on the specific requirements and
characteristics of your system.
DATABASES
Normalization vs Denormalization
Normalization and denormalization are two opposing database design techniques used
to optimize database schemas for different purposes. They have distinct advantages
and trade-offs in terms of data integrity, storage efficiency, and query performance:
Normalization:
+---------+   +---------+
| Table A |   | Table B |
+---------+   +---------+
      +---------+
      | Table C |
      +---------+
Purpose:
Normalization organizes data into multiple related tables to reduce redundancy and
preserve data integrity.
Use Cases:
Normalization is typical in transactional (OLTP) systems where data is frequently
inserted and updated and consistency is critical.
Advantages:
Data Integrity: Redundancy is minimized, so each fact is updated in one place and
anomalies are avoided.
Easier Maintenance: Changes to data are easier to manage since data is stored
in a structured and modular way.
Drawbacks:
Query Complexity: Reads often require joins across several tables, which can slow
down complex queries.
Denormalization:
+--------------------+
| Denormalized Table |
+--------------------+
Purpose:
Denormalization combines data into fewer, wider tables, deliberately introducing
redundancy to speed up read queries.
Use Cases:
Denormalization is often used in data warehousing, reporting, and analytics
databases where read operations are frequent, and query performance is a top
priority.
Advantages:
Faster Reads: Common queries require fewer joins and can be answered more
quickly.
Drawbacks:
Storage Overhead: Redundant data occupies more storage space, which can be
a concern for large datasets.
Use Cases:
Use normalization when data integrity and consistency are the priority, as in
write-heavy transactional systems.
Use denormalization when read performance is the priority, as in reporting and
analytics workloads.
NoSQL Databases
In the context of database systems, different types of databases are designed to handle
various data models and use cases. Here's a comparison of key-value stores, document
stores, wide-column stores, and graph databases, along with their respective use cases
and advantages:
Key-Value Store:
Data Model:
Stores data as a collection of key-value pairs, where each key is unique and
corresponds to a value.
Use Cases:
Caching: Ideal for caching frequently accessed data due to fast read and write
operations.
Advantages:
High Performance: Extremely fast read and write operations, especially when
data access is predictable.
Document Store:
Data Model:
Stores data as documents (typically JSON or BSON), where each document can have
its own structure.
Use Cases:
Content and Catalogs: Well suited to user profiles, product catalogs, and content
whose schema evolves over time.
Advantages:
Flexible Schema: Documents can vary in structure, making it easy to adapt the
data model as requirements change.
Wide-Column Store (Column-Family Store):
Data Model:
Organizes data into columns rather than rows, with a focus on optimizing read-
heavy workloads.
Use Cases:
Time-Series Data: Suitable for storing time-series data like logs and sensor
data.
Advantages:
Scalability: Scales well for analytical use cases with large datasets.
Graph Database:
Data Model:
Represents data as nodes, edges, and properties, allowing for efficient traversal
of relationships.
Use Cases:
Relationship-Heavy Data: Social networks, recommendation engines, and fraud
detection, where the connections between entities drive the queries.
Advantages:
Efficient Traversal: Relationship queries that would require many joins in a
relational database are answered by walking the graph directly.
Use Cases:
• Choose a Key-Value Store for simple data retrieval and caching scenarios
where read and write performance are critical, and the data structure is
relatively straightforward.
• Opt for a Document Store when dealing with semi-structured data, dynamic
schemas, and applications where flexibility in data representation is required.
• Consider a Wide-Column Store when handling large volumes of data, especially
for analytics or time-series data, where read performance and horizontal
scalability are essential.
• Select a Graph Database when your data involves complex relationships, and
you need efficient traversal of these relationships to answer queries or make
recommendations.
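To contrast the first two models, here is a toy sketch using plain dictionaries as
stand-ins for real database clients:

# Key-value store: opaque values looked up by a unique key.
kv_store = {}
kv_store["session:42"] = "serialized-session-data"
print(kv_store["session:42"])

# Document store: values are structured documents that can be queried by
# their fields, and documents may differ in shape.
documents = [
    {"_id": 1, "name": "Alice", "tags": ["admin"]},
    {"_id": 2, "name": "Bob", "city": "Berlin"},   # different fields
]
admins = [d for d in documents if "admin" in d.get("tags", [])]
print(admins)  # [{'_id': 1, 'name': 'Alice', 'tags': ['admin']}]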
ACID vs BASE
ACID and BASE are two contrasting sets of properties that define different
approaches to data consistency and system behavior in distributed database systems:
ACID properties are designed to ensure strong data consistency and reliability
in database transactions.
Atomicity:
Ensures that all operations within a transaction either complete fully or have no
effect at all; there are no partial updates.
Consistency:
Guarantees that a transaction brings the database from one consistent state to
another, preserving integrity constraints.
Isolation:
Ensures that concurrent transactions do not interfere with each other and that
they appear to execute serially, even when running concurrently.
Durability:
Ensures that once a transaction is committed, its changes are permanent and
will survive system failures.
BASE properties relax strict consistency in favor of availability and scalability
in distributed systems.
Basically Available:
The system remains available for read and write operations, even in the
presence of network partitions or failures.
Soft state:
The state of the system may change over time, even without new input, as replicas
exchange updates and converge.
Eventually Consistent:
Over time, the system will converge to a consistent state as all changes
propagate and conflicts are resolved.
Use Cases:
• ACID is suitable for applications where data integrity and reliability are of
utmost importance, such as financial systems, healthcare databases, and
scenarios where correctness and consistency are non-negotiable.
• BASE is useful in distributed systems where high availability and fault tolerance
are critical, and some degree of temporary inconsistency can be tolerated. It is
commonly used in NoSQL databases, content delivery networks, and systems
that prioritize scalability and responsiveness over strict consistency.
Key Differences:
• ACID prioritizes strong consistency and data integrity, while BASE prioritizes
availability and fault tolerance.
• ACID guarantees immediate consistency and durability, whereas BASE allows
for temporary inconsistencies that will eventually be resolved.
• ACID is often used in traditional relational databases, while BASE is commonly
associated with NoSQL and distributed databases.
The choice between ACID and BASE depends on the specific requirements of the
system and the trade-offs that can be made. In practice, some systems may use a
combination of both approaches, applying ACID principles where strong consistency is
necessary and BASE principles where high availability and scalability are required.
Replication
Replication copies data from a primary (master) database to one or more replicas,
which stay in sync with the primary and can serve traffic alongside it.
            +--------+
            | Master |
            +--------+
             /      \
    +---------+  +---------+
    | Replica |  | Replica |
    +---------+  +---------+
Replication is useful for:
High Availability:
If the primary fails, a replica can be promoted to take its place, keeping the
system online.
Load Balancing:
Read traffic can be spread across replicas to reduce the load on any single server.
Disaster Recovery:
Replicas in separate locations hold up-to-date copies of the data that can be used
for recovery after a catastrophic failure.
Geographic Distribution:
Reduces latency for users in different regions by placing data closer to them.
Read Scaling:
Adding replicas increases the total read throughput the system can serve.
Offline Processing:
Supports data analysis, reporting, and other offline tasks without impacting the
primary database.
Sharding and Partitioning
Sharding:
Sharding splits a dataset horizontally across multiple independent database
instances (shards), with each shard holding a subset of the data.
+----------------------------------+
| Sharded Database Instance |
| |
| +---------+ +---------+ |
| | Shard 1 | | Shard 2 | |
| | | | | |
| +---------+ +---------+ |
| |
+----------------------------------+
+----------------------------------+
| Sharded Database Instance |
| |
| +---------+ +---------+ |
| | Shard 3 | | Shard 4 | |
| | | | | |
| +---------+ +---------+ |
| |
+----------------------------------+
Data Isolation: Sharding can isolate different types of data, such as customer data
or data from different geographical regions.
Advantages:
Horizontal Scalability: Adding shards increases the total capacity for data and
traffic.
Smaller Working Sets: Each shard holds less data, so indexes and queries within a
shard stay fast.
Partitioning:
Partitioning divides a large table into smaller, more manageable pieces (partitions)
within a single database instance.
+-------------------------------------+
|          Partitioned Table          |
|                                     |
|  +-------------+  +-------------+   |
|  | Partition 1 |  | Partition 2 |   |
|  |             |  |             |   |
|  +-------------+  +-------------+   |
|                                     |
+-------------------------------------+
Use Cases:
Data Distribution: It can also be used to distribute data across storage devices
or file systems for efficiency.
Advantages:
Improved Query Performance: Queries can be restricted to the relevant partitions
(partition pruning) instead of scanning the whole table.
Easier Maintenance: Individual partitions can be archived, dropped, or rebuilt
without touching the rest of the table.
Key Differences:
Scope: Sharding typically involves distributing data across multiple independent
databases or database clusters, while partitioning operates within a single database.
Use Cases: Sharding is primarily used for achieving horizontal scalability and data
isolation in distributed systems, while partitioning is used for organizing data within a
single database for improved data management and query optimization.
Use Cases:
Use sharding when you need to horizontally scale your database across multiple
database instances to handle large data volumes and high traffic loads in a distributed
environment.
Use partitioning when you want to organize and optimize data within a single
database, making it easier to manage, improve query performance, and facilitate
maintenance tasks.
Both sharding and partitioning are valuable techniques for managing data in
databases, and the choice between them depends on your specific scalability and data
management requirements. In some cases, you may even use both techniques in
combination to achieve the desired results.
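A minimal sketch of routing keys to shards by hashing; the shard names are
illustrative, and production systems typically use consistent hashing so that adding
a shard does not remap most keys:

import hashlib

SHARDS = ["shard-1", "shard-2", "shard-3", "shard-4"]

def shard_for(key):
    # Hash the key and map it onto one of the shards.
    digest = hashlib.md5(key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

for user_id in ["user-17", "user-42", "user-99"]:
    print(user_id, "->", shard_for(user_id))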
Data Modeling Factors
When starting to create a data model for an application or when designing one in an
interview, consider the following important factors:
Normalization:
Decide how far to normalize the schema based on the expected balance of reads and
writes.
Data Integrity:
Enforce data integrity constraints, such as primary keys, foreign keys, unique
constraints, and check constraints, to maintain data accuracy and consistency.
Indexes:
Add indexes on frequently queried columns to speed up lookups, keeping in mind
their write and storage overhead.
Partitioning:
Plan how large tables will be split into partitions to keep queries and maintenance
manageable.
Denormalization:
Consider selectively denormalizing data for read-heavy access paths where join
costs would hurt performance.
Concurrency Control:
Consider how the database will handle concurrent access and implement
appropriate locking or isolation mechanisms.
Scalability:
Plan for future scalability by designing a schema that can accommodate
increased data volume and user load.
Joins:
Consider how many joins typical queries will require and whether the schema can
reduce expensive join operations.
Write Patterns:
Consider how the data model will handle write-intensive operations and
implement strategies for minimizing contention.
Remember that designing an effective data model requires a balance between various
factors, including data integrity, query performance, and scalability, and it often
involves trade-offs. Clear communication and justifications for your design choices are
crucial in both real-world projects and interview scenarios.
Storage
Block storage, file storage, object storage, and redundant disk arrays are different
storage technologies, each with its own characteristics and use cases:
Block Storage:
+------------------+
| Block 1 |
| (Data) |
+------------------+
| Block 2 |
| (Data) |
+------------------+
| Block 3 |
| (Data) |
+------------------+
| Block 4 |
| (Data) |
+------------------+
Data Structure:
Stores data as fixed-size blocks, each addressed individually, with no built-in
file or object semantics.
Use Cases:
Used in storage area networks (SANs) and cloud computing to provide high-
performance, low-latency storage for applications and databases.
Ideal for scenarios where precise control over data placement and direct access
to individual blocks is required.
Advantages:
Low-Level Access: Allows direct access to storage blocks, making it suitable for
databases and virtualization.
File Storage:
+--------------------+
| Directory 1        |
|  +--------------+  |
|  | File 1       |  |
|  | File 2       |  |
|  +--------------+  |
+--------------------+
| Directory 2        |
|  +--------------+  |
|  | File 3       |  |
|  | Directory 3  |  |
|  |  +--------+  |  |
|  |  | File 4 |  |  |
|  |  +--------+  |  |
|  +--------------+  |
+--------------------+
Data Structure:
Organizes data as files within a hierarchy of directories, accessed through a file
system interface.
Use Cases:
Suitable for shared file access and data organization where files need to be
accessed concurrently by multiple users or applications.
Advantages:
File-Level Access: Provides a familiar file system interface for organizing and
accessing data.
Object Storage:
+------------+  +------------+  +------------+
|  Object 1  |  |  Object 2  |  |  Object 3  |
| (Data +    |  | (Data +    |  | (Data +    |
|  Metadata) |  |  Metadata) |  |  Metadata) |
+------------+  +------------+  +------------+
Data Structure:
Stores data as objects, each with its metadata and a unique identifier.
Use Cases:
Frequently used in cloud storage, content delivery, and web applications for
storing and serving unstructured data like images, videos, and documents.
Advantages:
Scalability: Scales to very large amounts of unstructured data in a flat,
globally addressable namespace.
Rich Metadata: Each object carries metadata, enabling search, lifecycle policies,
and access control.
Redundant Disk Arrays (RAID):
RAID combines multiple physical disks into a single logical unit to improve
redundancy, performance, or both.
Commonly used in servers and storage systems to enhance data availability and
reliability.
Advantages:
Data Redundancy: Protects against data loss by duplicating data across multiple
drives.
Performance: Some RAID levels can improve read and write performance.
Block Storage is ideal when applications need low-latency, low-level access to raw
storage volumes, as with databases and virtual machine disks.
File Storage is beneficial when multiple users or applications need to share and
organize data using a hierarchical file system structure, as seen in traditional file
servers and NAS devices.
Object Storage is valuable for storing and serving large volumes of unstructured data
on a global scale, making it suitable for cloud storage and content delivery.
Redundant Disk Arrays (RAID) are essential for enhancing data availability and
reliability in storage systems. Different RAID levels provide various trade-offs
between redundancy and performance.
Asynchronism
Idempotent operations
+-------------------------+
| |
| Operation |
| |
+-------------------------+
|
| Performs
|
+-------------------------+
| |
| System State |
| |
+-------------------------+
An operation is idempotent if performing it once or performing it multiple times
produces the same result, so repeated or retried executions leave the system in the
same state.
Idempotent APIs:
Idempotence is a desirable property for public APIs and web services. It allows
clients to safely repeat requests without fear of unintended consequences. For
example, HTTP methods like GET and PUT are idempotent.
Retry Mechanisms:
Idempotence simplifies the implementation of retry mechanisms in systems.
When an operation fails or times out, it can be retried without worrying about
the state or result of previous attempts.
Memoization:
Idempotent operations are suitable for caching and memoization. When a result
is calculated once, it can be cached and reused for subsequent requests with the
same inputs.
Financial Transactions:
Payment systems use idempotency keys so that a retried payment request is
processed at most once, preventing duplicate charges.
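A minimal sketch of enforcing idempotency with a key; the store and function names
are illustrative, and a real system would keep the processed keys in durable storage:

# Remember the result for each idempotency key so that retries of the same
# request return the original result instead of repeating the side effect.
processed = {}

def charge(idempotency_key, account, amount):
    if idempotency_key in processed:
        return processed[idempotency_key]    # retry: return cached result
    result = {"account": account, "charged": amount}  # stand-in side effect
    processed[idempotency_key] = result
    return result

print(charge("key-123", "alice", 50))  # performs the charge
print(charge("key-123", "alice", 50))  # safe retry: no double charge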
In summary, the concept of idempotent operations is essential for building reliable and
fault-tolerant systems. It simplifies error handling, enables retries, and ensures that
the system remains in a consistent and predictable state, even in the face of network
failures or other unforeseen issues.
Idempotence is a valuable property for designing systems that can handle unexpected
events gracefully.
Message Queues
Message queues let components communicate asynchronously: producers place
messages on a queue, and consumers pick them up and process them independently.
Loose Coupling:
Producers and consumers do not need to know about each other or be available at
the same time.
Scalability:
Consumers can be added or removed to match the message volume, with work
distributed among them.
Resilience:
Messages are stored in the queue until they are processed. This enables
systems to handle temporary spikes in traffic or service failures without losing
data.
Event-Driven Architecture:
Queues deliver events from producers to interested consumers, forming the
backbone of event-driven systems.
Cross-System Integration:
Queues connect heterogeneous systems, letting them exchange data without direct
dependencies on each other.
Message Transformation:
Messages can be enriched or converted between formats as they pass through the
messaging layer.
Event Sourcing:
Recorded events can be published on a queue so downstream consumers can build
their own views of the data.
Batch Processing:
Message Queues are suitable for handling batch processing jobs, where data
processing can be deferred and optimized.
Task Queues:
Message Queues can manage task queues, distributing tasks to workers and
ensuring efficient utilization of resources.
Example:
Consider an e-commerce application. When a customer places an order, the order
service can publish a message to a Message Queue. The payment service subscribes to
the queue and processes payment for the order. This decouples the order and payment
processes, allowing them to scale independently and ensuring that orders are not lost
even if the payment service experiences downtime.
In summary, Message Queues provide an efficient and reliable way for distributed
components to communicate asynchronously, promoting loose coupling, scalability,
and resilience in various system design scenarios.
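A minimal in-process sketch using Python's thread-safe queue as a stand-in for a
real broker such as RabbitMQ or Kafka:

import queue
import threading

orders = queue.Queue()   # the "message queue" between the two services

def payment_worker():
    while True:
        order = orders.get()
        if order is None:                # sentinel: shut down the worker
            break
        print("processing payment for order", order["order_id"])

worker = threading.Thread(target=payment_worker)
worker.start()

# Producer side: the order service publishes messages and moves on.
orders.put({"order_id": 1, "amount": 99})
orders.put({"order_id": 2, "amount": 25})
orders.put(None)
worker.join()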
Reactive programming
+------------------------+
| |
| Reactive Programming |
| |
+------------------------+
| | |
| | |
+--------v---v---v--------+
| Data Streams |
| (Observable, Stream) |
+--------------------------+
| | |
| | |
+--------v---v---v--------+
| Operators |
| (Transformations) |
+--------------------------+
Reactive programming is a paradigm built around asynchronous data streams: values
are emitted over time, and components declaratively react to them as they arrive.
It's particularly useful for building systems that need to react to and process real-time
events, user interactions, or data from various sources. Here's a closer look at the
concept and its utility:
Observer Pattern:
Reactive programming generalizes the observer pattern: consumers subscribe to a
stream and are notified whenever it emits a new value, an error, or completion.
Backpressure Handling:
In scenarios where data streams can produce data faster than they can be
consumed, reactive programming frameworks often provide mechanisms for
handling backpressure, ensuring that data is processed without overwhelming
the system.
User Interfaces:
In web and mobile app development, reactive frameworks like React, Angular,
and Vue.js allow for the creation of responsive and interactive user interfaces
that update in real-time based on user actions and data changes.
Scalability:
Reactive systems can be designed to scale horizontally to handle high loads and
increased data volumes. They are often used in distributed and microservices
architectures.
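A hand-rolled miniature of the idea follows; real systems would use a library such
as RxJava or RxPY, and these class names are illustrative:

# Subscribers react to each value an observable stream emits.
class Observable:
    def __init__(self):
        self.subscribers = []

    def subscribe(self, on_next):
        self.subscribers.append(on_next)

    def emit(self, value):
        for on_next in self.subscribers:
            on_next(value)

    def map(self, fn):
        # Operator: returns a new stream of transformed values.
        out = Observable()
        self.subscribe(lambda v: out.emit(fn(v)))
        return out

clicks = Observable()
clicks.map(lambda pos: ("clicked at", pos)).subscribe(print)
clicks.emit((10, 20))   # -> ('clicked at', (10, 20))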
Back Pressure Handling
Back pressure is a mechanism by which consumers signal fast producers to slow down
so that data is not produced faster than it can be processed.
+-------------------------+
| |
| Data Producer |
| |
+-------------------------+
| | |
| | |
+---------v---v---v-------+
| Data Channel |
| (e.g., Queue, Stream) |
+-------------------------+
| | |
| | |
+---------v---v---v-------+
| Data Processor |
| (e.g., Consumer) |
+-------------------------+
Purpose:
Back pressure ensures that the system remains within its capacity limits,
preventing data loss, resource exhaustion, and system instability. It allows the
system to adapt to fluctuations in workloads.
Common back pressure strategies include:
Rate Limiting:
The producer is throttled so that it emits data no faster than consumers can
handle.
Dynamic Scaling:
Additional consumers are started when queues grow, increasing processing capacity
to absorb the load.
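A minimal illustration using a bounded queue, where a full buffer blocks the
producer; the sizes and timings are illustrative:

import queue
import threading
import time

buffer = queue.Queue(maxsize=3)   # bounded buffer between the two sides

def slow_consumer():
    while True:
        item = buffer.get()
        if item is None:
            break
        time.sleep(0.1)            # simulate slow processing
        print("consumed", item)

consumer = threading.Thread(target=slow_consumer)
consumer.start()

for i in range(10):
    buffer.put(i)                  # blocks while the buffer is full:
    print("produced", i)           # back pressure on the producer
buffer.put(None)
consumer.join()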
HTTP & REST
Client              Server
  |                   |
  |      Request      |
  |------------------>|
  |                   |
  |     Response      |
  |<------------------|
  |                   |
HTTP (Hypertext Transfer Protocol) and REST (Representational State Transfer) are
fundamental concepts in web-based system design. Let's explore what they are, how
they are implemented, and their utility:
HTTP is an application layer protocol used for transmitting and receiving data
over the World Wide Web. It serves as the foundation of communication
between clients (such as web browsers) and web servers.
Implementation:
Request-Response:
HTTP follows a request-response model: a client sends a request to a server, and
the server returns a response containing the result.
Methods:
HTTP defines various request methods, including GET (retrieve data), POST
(create data), PUT (update data), DELETE (remove data), and more. These
methods determine the action to be performed on a resource.
Status Codes:
HTTP responses include status codes, such as 200 (OK), 404 (Not Found), and
500 (Internal Server Error), to indicate the outcome of a request.
Stateless:
HTTP is stateless: each request is independent, and the server does not retain
client state between requests.
Use Cases:
Web Browsing:
HTTP is the protocol used for web browsing, allowing users to access and
interact with websites.
APIs:
Many web services and APIs use HTTP as the underlying protocol to expose
their functionalities, making it possible for clients to access and manipulate data
remotely.
Resource Retrieval:
HTTP is used for retrieving resources like web pages, images, videos, and
documents.
REST (Representational State Transfer) is an architectural style for designing web
services around resources, uniform interfaces, and standard HTTP methods.
Implementation:
Resources:
Everything is modeled as a resource identified by a URL, such as /users/42.
HTTP Methods:
RESTful services use HTTP methods to perform CRUD (Create, Read, Update,
Delete) operations on resources. For example, GET retrieves data, POST
creates data, PUT updates data, and DELETE removes data.
Statelessness:
REST services are stateless, meaning each request from a client to a server must
contain all the information needed to understand and process the request.
Servers do not store client state.
Representations:
Resources are exchanged as representations, commonly JSON or XML, chosen via
content negotiation.
Hypermedia:
Responses can include links to related resources and actions (HATEOAS), letting
clients navigate the API dynamically.
Use Cases:
Web APIs:
REST is commonly used to build web APIs for mobile and web applications. It
provides a scalable and easy-to-use approach for exposing data and services.
Microservices:
REST is a common choice for communication between microservices thanks to its
simplicity and broad tooling support.
Resource Management:
RESTful URLs map naturally to CRUD operations, making resource management
uniform and predictable.
Interoperability:
RESTful services can be used across different platforms and technologies due to
their reliance on standard HTTP methods and formats.
In summary, HTTP is the protocol that powers the World Wide Web, and REST is an
architectural style for designing scalable and stateless web services. REST leverages
HTTP methods and resources to create APIs that are widely used for web and mobile
applications, microservices, and distributed systems. It provides a standardized and
flexible approach to building web-based systems.
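A small sketch of consuming a REST API with Python's requests library; the URLs are
placeholders:

import requests

# GET retrieves a representation of a resource.
response = requests.get("https://api.example.com/users/42")
print(response.status_code)        # e.g., 200 (OK) or 404 (Not Found)
if response.ok:
    print(response.json())         # parse the JSON representation

# POST creates a new resource under the collection.
created = requests.post(
    "https://api.example.com/users",
    json={"name": "Alice"},
)
print(created.status_code)         # e.g., 201 (Created)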
WebSocket vs gRPC
WebSocket and gRPC are two distinct communication technologies used in system
design, each with its own characteristics, implementation, and use cases:
WebSocket:
Definition:
WebSocket is a communication protocol that provides a full-duplex, persistent
connection between a client and a server over a single TCP connection.
Implementation:
Protocol:
WebSocket defines a protocol for handshake and data framing, allowing both
the client and server to send messages to each other without the need for a
request-response pattern.
Persistent Connection:
Once established, the connection stays open, so either side can send messages at
any time without re-establishing a connection.
Server Libraries:
WebSocket client and server libraries are available for most languages and
platforms, simplifying adoption.
Use Cases:
Live Dashboards: WebSocket can be used to update live dashboards with real-
time data from various sources, providing users with up-to-the-minute
information.
Streaming Services: It's suitable for streaming data services where clients need
to receive continuous updates, such as stock market updates, weather feeds, or
social media feeds.
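A minimal client sketch using the Python websockets library; the URL and message
format are placeholders:

import asyncio
import websockets

async def listen():
    # The connection stays open; the server can push messages at any time.
    async with websockets.connect("wss://example.com/feed") as ws:
        await ws.send("subscribe:prices")
        async for message in ws:      # messages arrive without polling
            print("received:", message)

asyncio.run(listen())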
gRPC:
Definition:
gRPC is a high-performance, open-source RPC framework, originally developed at
Google, that uses HTTP/2 for transport and Protocol Buffers as its interface
definition language (IDL).
Implementation:
IDL:
Developers define service methods and message structures in a .proto file using
Protocol Buffers, which serves as a contract between clients and servers.
Code Generation:
gRPC generates client and server code in various programming languages from
the .proto file, making it easy to develop and maintain consistent APIs across
different platforms.
HTTP/2:
gRPC communicates over HTTP/2, a modern and efficient protocol that offers
multiplexing, header compression, and other features to improve performance.
Use Cases:
Microservices:
gRPC is well suited to low-latency, strongly typed service-to-service communication
in microservices architectures.
Cross-Language Communication:
Generated clients and servers in many languages allow services written in different
languages to interoperate.
Efficiency:
It can be used to build APIs for web applications and backend services, where
efficient and type-safe communication is desired.
Streaming:
gRPC supports both unary and bidirectional streaming, allowing for efficient
streaming of data between clients and servers.
GraphQL
GraphQL is a query language for APIs and a runtime environment for executing those
queries by utilizing a type system you define for your data. GraphQL is designed to be a
more efficient, powerful, and flexible alternative to traditional RESTful APIs. Let's
explore what GraphQL is, how it is typically implemented, and why it is useful in system
design:
Key Functions:
Query Language:
GraphQL allows clients to specify the shape and structure of the data they need
by sending queries to the server. Clients can request only the data they require,
avoiding over-fetching or under-fetching of data.
Strongly Typed:
GraphQL uses a type system to define the structure of your data. You specify
the types, their relationships, and the available queries and mutations in a
schema.
Single Endpoint:
Unlike REST, which typically exposes multiple endpoints for different resources,
GraphQL uses a single endpoint for all data queries and mutations.
Real-Time Data:
GraphQL subscriptions let clients receive real-time updates from the server when
the data they care about changes.
Implementation:
Schema Definition:
In GraphQL, you start by defining a schema. The schema specifies the types
available, their relationships, and the queries and mutations that can be
performed. This schema serves as a contract between clients and servers.
Resolvers:
For each field in your schema, you provide resolver functions. Resolvers are
responsible for fetching the data for their corresponding fields. They determine
how to retrieve data from databases, APIs, or other sources.
Query Execution:
When a client sends a query, the GraphQL server parses the query, validates it
against the schema, and executes the appropriate resolvers to fetch the
requested data.
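A small sketch of sending a GraphQL query over HTTP with Python's requests; the
endpoint and field names are placeholders:

import requests

# The client asks for exactly the fields it needs, nothing more.
query = """
query {
  user(id: "42") {
    name
    email
  }
}
"""

# GraphQL typically exposes a single endpoint for all queries.
response = requests.post(
    "https://api.example.com/graphql",
    json={"query": query},
)
print(response.json())   # {"data": {"user": {...}}} on success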
SECURITY
Data Encryption
Data encryption is a crucial aspect of system design, providing security for data both in
transit (while it's being transmitted between systems) and at rest (when it's stored on
storage devices).
Achieving data encryption in transit and at rest involves different mechanisms and
considerations:
Data Encryption in Transit:
TLS/SSL protocols are used to encrypt data during transit over networks, such
as the internet.
Data Encryption at Rest:
Full Disk Encryption (FDE):
FDE encrypts the entire storage device (e.g., hard drive, SSD) at the block level.
When data is written to the disk, it's automatically encrypted, and when it's
read, it's decrypted on-the-fly.
FDE ensures that if the physical storage device is stolen or compromised, the
data remains inaccessible without the encryption key.
File-Level Encryption:
It provides more granular control over which files are encrypted and can be
useful for encrypting specific sensitive files while leaving others unencrypted.
Database Encryption:
This can include encrypting the entire database, specific tables, or even
individual columns containing sensitive data.
In the event of a data breach, encrypted data is much less valuable to attackers
because they cannot easily access the information without the decryption key.
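As a small illustration of symmetric encryption at rest, here is a sketch using the
Python cryptography library's Fernet recipe; key management is deliberately
simplified:

from cryptography.fernet import Fernet

# In practice the key lives in a key-management system, never next to the data.
key = Fernet.generate_key()
fernet = Fernet(key)

plaintext = b"account=42;balance=1000"
ciphertext = fernet.encrypt(plaintext)    # what gets stored at rest
print(ciphertext)

restored = fernet.decrypt(ciphertext)     # decrypted on read
assert restored == plaintext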
Checksums and Hashing
Checksums and hashing are fundamental concepts in computer science and system
design, playing crucial roles in ensuring data integrity, security, and error detection.
Here's how they work and why they are important:
Process:
A hash or checksum function takes input data of any size and produces a fixed-size
value; even a tiny change in the input yields a very different output.
Checksums and hashing are essential for verifying the integrity of data. They help
detect errors or unauthorized changes during data transmission or storage.
Error Detection:
Checksums detect accidental corruption: if the recomputed checksum of received or
stored data does not match the original, the data has been altered.
Data Verification:
Hashing allows for data verification. When transmitting or storing data, a hash
value is computed and transmitted or stored alongside the data. The recipient can
verify the data's authenticity by rehashing and comparing it with the received hash.
Security:
Cryptographic hash functions help store passwords securely (hashed with a salt)
and detect tampering with data.
Efficient Data Retrieval:
Hashing is used in data structures like hash tables for efficient data retrieval. It
enables quick lookup and indexing of data based on a hash value.
Digital Signatures:
A document's hash is signed with a private key to create a digital signature that
proves authenticity and integrity.
Data Deduplication:
Storage systems detect duplicate blocks or files by comparing their hashes, storing
each unique piece of data only once.
Cryptography:
Hash functions underpin many cryptographic constructions, including HMACs and
Merkle trees.
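A minimal integrity-check sketch with Python's hashlib:

import hashlib

# Compute a SHA-256 checksum when the data is produced.
data = b"important payload"
checksum = hashlib.sha256(data).hexdigest()
print(checksum)

# After transmission or storage, recompute and compare.
received = b"important payload"
if hashlib.sha256(received).hexdigest() == checksum:
    print("integrity verified")
else:
    print("data corrupted or tampered with")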
Token-based authentication and sessions
Token-Based Authentication:
User                    Server
  |                       |
  |     Authentication    |
  |---------------------->|
  |                       |
  |      Token Issued     |
  |<----------------------|
  |                       |
  |       Token Sent      |
  |---------------------->|
  |                       |
  |     Access Granted    |
  |<----------------------|
  |                       |
Principle:
Tokens are usually generated upon successful login and are used to prove the
user's identity in subsequent requests.
Stateless:
The server does not store session state; each request from the client must include
the token for authentication.
Scalability:
Because no per-user state is kept on the server, token-based authentication scales
easily across multiple servers.
Implementation:
The server generates a token (e.g., JSON Web Token or JWT) upon successful
login and sends it to the client.
The client stores the token (often in local storage or cookies) and sends it with
each subsequent request.
The server validates the token by checking its signature and claims. If valid, the
user is authenticated.
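A minimal sketch of issuing and validating a token with the PyJWT library; the
secret and claims are placeholders, and real systems load keys from secure
configuration:

import datetime
import jwt  # PyJWT

SECRET = "placeholder-secret"

# On successful login: issue a signed token with an expiration claim.
token = jwt.encode(
    {
        "sub": "user-42",
        "exp": datetime.datetime.utcnow() + datetime.timedelta(hours=1),
    },
    SECRET,
    algorithm="HS256",
)

# On each request: validate the signature and expiration.
claims = jwt.decode(token, SECRET, algorithms=["HS256"])
print(claims["sub"])   # "user-42"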
Session-Based Authentication:
User                    Server
  |                       |
  |     Authentication    |
  |---------------------->|
  |                       |
  |     Session Created   |
  |<----------------------|
  |                       |
  |    Session ID Sent    |
  |---------------------->|
  |                       |
  |   Session ID Checked  |
  |<----------------------|
  |                       |
  |     Access Granted    |
  |<----------------------|
  |                       |
Principle:
Session-based authentication relies on server-side storage of user session data.
Upon successful login, the server creates a session for the user and associates it
with a session identifier (usually a cookie or URL parameter).
Stateful:
Sessions are stateful because the server stores session data on its side.
Implementation:
After a user logs in, the server creates a session and sends a session identifier to
the client, often in the form of a session cookie.
The client automatically includes the session identifier with each request.
The server looks up the session data associated with the identifier to
authenticate the user.
Key Differences:
Storage:
Token-based authentication stores user identity and claims in the token itself,
while session-based authentication stores session data on the server.
State:
Token-based authentication is stateless, while session-based authentication is
stateful and depends on server-side session storage.
Scalability:
Token-based authentication scales more easily across servers, since no shared
session store is required.
Storage Location:
Token-based authentication stores the token on the client (e.g., in cookies or local
storage), while session-based authentication stores session data on the server.
Logout Handling:
In session-based authentication, server-side sessions can be invalidated easily for
user logout. Token-based authentication requires additional measures, such as token
blacklists or short expiration times, to invalidate tokens on logout.
Expiration:
Tokens carry a built-in expiration time that is enforced during validation, while
session lifetimes are controlled and extended on the server.
API Security
Securing a REST API is crucial to protect sensitive data and ensure that your
application is not vulnerable to various security threats. Here are some best practices
for securing a REST API:
Use HTTPS:
Ensure that all communication between clients and the API is encrypted using
HTTPS. This prevents eavesdropping and man-in-the-middle attacks.
Authentication:
Use strong and up-to-date authentication protocols (e.g., OAuth 2.0, OpenID
Connect) for user authentication and authorization.
Authorization:
Enforce fine-grained authorization (e.g., role-based access control) so that
authenticated users can access only the resources they are permitted to.
Token-Based Authentication:
When using token-based authentication, ensure that tokens have a limited
lifespan (expiration time) and are securely stored and transmitted.
Input Validation:
Validate all input data to prevent attacks such as SQL injection, cross-site
scripting (XSS), and cross-site request forgery (CSRF).
Error Handling:
Return generic error messages to clients; log detailed internal errors server-side
and never expose stack traces.
Security Headers:
Set appropriate security headers (e.g., Strict-Transport-Security,
X-Content-Type-Options) to harden responses.
Data Encryption:
Encrypt sensitive data at rest in addition to encrypting traffic with HTTPS.
API Gateway:
Use an API gateway to centralize authentication, rate limiting, logging, and
monitoring across your APIs.
Security Standards:
Follow security standards and frameworks like OWASP API Security Top Ten
and OWASP Web Security Testing Guide to guide your security efforts.
Remember that security is an ongoing process, and it's essential to stay vigilant and
adapt to evolving threats. Regularly review and update your security practices to
mitigate new risks effectively.