DS Unit - 4
Fault Tolerance: Introduction, Process Resilience, Reliable Client-Server Communication, Reliable Group
Communication, Distributed Commit, Recovery.
Fault Tolerance
Fault Tolerance in Distributed System
Fault tolerance in distributed systems is the capability to continue operating smoothly despite failures or errors in
one or more of its components. This resilience is crucial for maintaining system reliability, availability, and
consistency. By implementing strategies like redundancy, replication, and error detection, distributed systems can
handle various types of failures, ensuring uninterrupted service and data integrity.
In distributed systems, three related types of problems occur: faults, errors, and failures.
Fault: A fault is a weakness or shortcoming in the system or in any hardware or software component. The presence of a fault can lead to an error and, eventually, to a failure.
Error: An error is the part of the system's state that deviates from the correct state as a result of a fault.
Failure: A failure is the outcome in which the system does not achieve its assigned goal.
Fault Tolerance is defined as the ability of the system to function properly even in the presence of any
failure. Distributed systems consist of multiple components due to which there is a high risk of faults occurring. Due
to the presence of faults, the overall performance may degrade.
Types of Faults
Transient Faults: Transient faults occur once and then disappear. They do not harm the system to a great extent, but they are very difficult to locate. A momentary processor fault is an example of a transient fault.
Intermittent Faults: Intermittent faults occur repeatedly: the fault appears, vanishes on its own, and then reappears. A working computer that occasionally hangs and then resumes is an example of an intermittent fault.
Permanent Faults: Permanent Faults are the type of faults that remain in the system until the component is
replaced by another. These types of faults can cause very severe damage to the system but are easy to
identify. A burnt-out chip is an example of a permanent fault.
Characteristics of a Fault-Tolerant System
1. Availability: Availability is defined as the property where the system is readily available for use at any time.
2. Reliability: Reliability is defined as the property where the system can work continuously without any
failure.
3. Safety: Safety is defined as the property where the system can remain safe from unauthorized access even if
any failure occurs.
4. Maintainability: Maintainability describes how easily and quickly a failed node or system can be repaired.
Fault Tolerance in Distributed Systems
To implement fault-tolerance techniques in distributed systems, the design, configuration, and relevant applications need to be considered. Below are the phases carried out for fault tolerance in any distributed system.
1. Fault Detection
Fault Detection is the first phase, in which the system is monitored continuously and the outcomes are compared with the expected output. If any faults are identified during monitoring, they are reported. These faults can occur for various reasons such as hardware failure, network failure, and software issues. The main aim of this first phase is to detect faults as soon as they occur so that the assigned work is not delayed.
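As an illustration of this phase, the sketch below is a minimal, hypothetical heartbeat-based fault detector in Python: every node is expected to report within a timeout, and any node that misses the deadline is flagged as suspect. The timeout value and node identifiers are assumptions made for the example.

```python
import time

# Hypothetical heartbeat-based fault detector: a node is suspected faulty
# if it has not reported a heartbeat within the timeout window.
HEARTBEAT_TIMEOUT = 5.0        # seconds without a heartbeat before suspicion
last_seen = {}                 # node id -> timestamp of the last heartbeat

def record_heartbeat(node_id):
    """Called whenever a heartbeat message arrives from a node."""
    last_seen[node_id] = time.time()

def suspected_faulty(now=None):
    """Return the nodes whose heartbeats are overdue (possible faults)."""
    now = now if now is not None else time.time()
    return [node for node, ts in last_seen.items() if now - ts > HEARTBEAT_TIMEOUT]

# Example: node B stops reporting and is eventually flagged as suspect.
record_heartbeat("A")
last_seen["B"] = time.time() - 10    # simulate a node that went silent 10 s ago
print(suspected_faulty())            # ['B']
```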
2. Fault Diagnosis
Fault diagnosis is the process in which the fault identified in the first phase is analyzed to determine its root cause and its likely nature. Fault diagnosis can be done manually by the administrator or by using automated techniques, so that the fault can be resolved and the given task performed.
3. Evidence Generation
Evidence generation is defined as the process where the report of the fault is prepared based on the diagnosis done
in an earlier phase. This report involves the details of the causes of the fault, the nature of faults, the solutions that
can be used for fixing, and other alternatives and preventions that need to be considered.
4. Assessment
Assessment is the process where the damages caused by the faults are analyzed. It can be determined with the help
of messages that are being passed from the component that has encountered the fault. Based on the assessment
further decisions are made.
5. Recovery
Recovery is the process whose aim is to make the system fault-free and restore it to a consistent state, either by rolling the system back to an earlier state (backward recovery) or by moving it forward to a new correct state (forward recovery). Common recovery techniques such as reconfiguration and resynchronization can be used.
Types of Fault Tolerance in Distributed Systems
1. Hardware Fault Tolerance: Hardware Fault Tolerance involves keeping a backup plan for hardware devices
such as memory, hard disk, CPU, and other hardware peripheral devices. Hardware Fault Tolerance is a type
of fault tolerance that does not examine faults and runtime errors but can only provide hardware backup.
The two different approaches that are used in Hardware Fault Tolerance are fault-masking and dynamic
recovery.
2. Software Fault Tolerance: Software Fault Tolerance is a type of fault tolerance where dedicated software is
used in order to detect invalid output, runtime, and programming errors. Software Fault Tolerance makes
use of static and dynamic methods for detecting faults and providing solutions. Software Fault Tolerance also includes additional mechanisms such as rollback recovery and checkpoints.
3. System Fault Tolerance: System Fault Tolerance covers the system as a whole. Its advantage is that it stores not only checkpoints but also memory blocks and program state, and it detects errors in applications automatically. If the system encounters any type of fault or error, it provides the required mechanism for recovery, which makes system fault tolerance reliable and efficient.
Process resilience
Process Resilience is a critical aspect of fault tolerance in distributed systems. It refers to the system's ability to
handle and recover from process failures while maintaining availability, reliability, and consistent operation. The goal
is to ensure that individual process failures do not compromise the overall functionality of the system.
1. Key Mechanisms:
o Failure Detection: Identifies failed processes using mechanisms like heartbeats, timeout checks, or monitoring tools.
o Recovery: Restarts or replaces failed processes and restores their state so that work can continue.
o Redundancy: Ensures critical tasks are handled by multiple processes to avoid single points of failure.
2. Process Failures:
o Crash Failures: A process halts and does not resume (no further communication is received).
o Byzantine Failures: A process behaves arbitrarily or maliciously, possibly sending conflicting or incorrect information.
o Omission Failures: A process fails to send or receive messages it was expected to handle.
o Timing Failures: A process responds, but outside the expected time bounds.
3. Design Principles:
o Idempotence: Ensure that re-executing a process produces the same result (e.g., retrying a failed transaction).
o Isolation: Contain the effect of a failure so that one faulty process cannot corrupt the others.
o Graceful Degradation: Continue to offer reduced but useful service when some processes have failed.
Techniques for Process Resilience:
1. Replication:
o Run multiple copies of a critical process so that a replica can take over when one fails.
o Types: active replication (all replicas process every request) and passive (primary-backup) replication.
2. Process Monitoring:
o Continuously observe the health of processes so that failures are noticed quickly.
o Types: heartbeat-based monitoring and supervisor (watchdog) processes.
3. Process Migration:
o Move a process to another node if the current node fails or becomes overloaded.
4. Leader Election:
o Choose a new coordinator among the surviving processes when the current one fails.
o Protocols:
Bully Algorithm: The process with the highest priority becomes the leader (a short sketch is given below, after the challenges list).
5. Failover Mechanisms:
o Automatically redirect work from a failed process to a standby replica.
6. Consensus Protocols:
o Protocols such as Paxos allow the remaining processes to agree on a single value or state despite failures.
Goals: Process resilience preserves reliability and, above all, availability: processes remain able to serve requests despite individual failures.
Challenges:
1. Network Partitioning:
o Partitioning can isolate processes, making it difficult to detect failures or to recover from them.
2. Byzantine Failures:
o Handling malicious or arbitrary failures requires complex protocols like Byzantine Fault Tolerance (BFT).
3. State Synchronization:
o Keeping the state of replicas consistent after failures and recoveries is difficult and costly.
4. Resource Overheads:
o Replication, monitoring, and checkpointing consume additional CPU, memory, and network bandwidth.
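The Bully Algorithm mentioned above can be sketched very compactly: among the processes that are still alive, the one with the highest identifier wins the election. The single-function version below collapses the message exchange into one step and uses illustrative process identifiers; it is a sketch, not a full protocol implementation.

```python
# Minimal sketch of the Bully Algorithm: among the processes that are still
# alive, the one with the highest identifier becomes the leader.
def bully_election(initiator, processes, alive):
    """Return the id of the new leader elected by the bully algorithm."""
    # The initiator challenges every live process with a higher id.
    higher = [p for p in processes if p > initiator and p in alive]
    if not higher:
        return initiator          # nobody higher answered: the initiator wins
    # Otherwise the highest alive process eventually declares itself leader.
    return max(higher)

processes = [1, 2, 3, 4, 5]
alive = {1, 2, 3}                 # processes 4 and 5 have crashed
print(bully_election(initiator=1, processes=processes, alive=alive))   # 3
```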
Applications
Financial Systems: Require high resilience to handle transaction processing without downtime.
E-commerce: Essential for handling user requests and maintaining consistent inventory.
Process resilience is vital for building robust distributed systems, ensuring continuous operation despite failures, and
improving user trust in system reliability.
Reliable Client-Server Communication in a Distributed System
Reliable client-server communication in a distributed system refers to the dependable exchange of data between
clients and servers across a network. Ensuring this reliability is critical for maintaining system integrity, consistency,
and performance.
Challenges like network latency, packet loss, and data corruption can hinder effective communication.
Addressing these issues involves using robust protocols and error-handling techniques.
In this article, we will explore the importance of reliable communication, common challenges, and the best
practices for achieving it in distributed systems.
Reliable communication is vital for ensuring the smooth operation of distributed systems. It guarantees that data
transmitted between clients and servers remains accurate and consistent. Here are several key reasons why reliable
communication is essential:
Data Integrity: Ensuring data integrity means that the information sent is received without errors. This is
crucial for applications like financial transactions where accuracy is paramount.
Consistency: Consistent communication prevents data mismatches across different parts of the system. This
helps maintain a unified state across distributed nodes.
Security: Reliable protocols often include security features that protect data from interception and
tampering. This ensures that sensitive information remains confidential and intact.
Scalability: As systems grow, maintaining reliable communication becomes more challenging. Reliable
communication strategies support scalable solutions that can handle increased load without compromising
performance.
Maintaining reliable client-server communication in distributed systems can be complex due to various inherent
challenges. These challenges can impact the system's performance, data integrity, and overall user experience. Here
are some common issues faced in client-server communication:
Network Latency: Delays in data transmission can slow down system responses. High latency can degrade
user experience and hinder real-time processing.
Packet Loss: Data packets may get lost during transmission due to network issues. Packet loss can lead to
incomplete or corrupted messages, affecting data integrity.
Data Corruption: Errors during transmission can corrupt data, rendering it unusable. Ensuring data integrity
requires robust error detection and correction mechanisms.
Concurrency Issues: Simultaneous data requests can cause conflicts and inconsistencies. Managing
concurrent requests effectively is crucial for maintaining data consistency.
Scalability: As the system grows, ensuring reliable communication becomes more challenging. Increased
traffic can strain network resources and lead to performance bottlenecks.
Security Threats: Data transmitted over the network can be intercepted or tampered with. Implementing
strong encryption and security measures is essential to protect sensitive information.
Protocols and Techniques for Reliable Communication
Ensuring reliable communication in a distributed system requires a combination of robust protocols and effective
techniques. Here are several key methods and protocols that help achieve dependable client-server communication:
Transmission Control Protocol (TCP): TCP ensures reliable, ordered, and error-checked delivery of data
between applications. It manages packet loss by retransmitting lost packets and ensures data integrity
through checksums.
HTTP/2 and HTTP/3: These protocols improve performance and reliability with features like multiplexing,
which allows multiple requests and responses simultaneously over a single connection. They also include
header compression to reduce overhead.
Message Queues: Systems like RabbitMQ and Apache Kafka help manage message delivery. They queue
messages and retry sending them if they fail, ensuring no message is lost even if the server is temporarily
unavailable.
Automatic Repeat reQuest (ARQ): ARQ is a protocol for error control that automatically retransmits lost or
corrupted packets. This technique ensures that all data reaches its destination intact.
Forward Error Correction (FEC): FEC adds redundant data to the original message. This allows the receiver to
detect and correct errors without needing a retransmission.
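As an illustration of reliability layered on top of TCP, the sketch below retries a request when the connection times out or drops. The host, port, and message format are placeholders invented for the example, not part of any real service.

```python
import socket

HOST, PORT = "127.0.0.1", 9000    # placeholder address of a hypothetical server
MAX_RETRIES = 3
TIMEOUT = 2.0                     # seconds to wait for a reply

def send_with_retries(payload):
    """Send a request over TCP and retry on timeouts or connection errors."""
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            with socket.create_connection((HOST, PORT), timeout=TIMEOUT) as sock:
                sock.sendall(payload)
                return sock.recv(4096)        # wait for the server's reply
        except (socket.timeout, ConnectionError) as exc:
            print(f"attempt {attempt} failed: {exc}")
    raise RuntimeError("server unreachable after retries")
```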
Error Detection and Correction Mechanisms
Error detection and correction mechanisms are essential for maintaining data integrity in client-server communication. They ensure that any data corrupted during transmission is identified and corrected.
Checksums: Checksums generate a small value from a block of data. The sender includes this value with the
data, and the receiver recalculates it to verify integrity.
Cyclic Redundancy Check (CRC): CRC is a more advanced form of checksum. It uses polynomial division to
detect errors in transmitted messages.
Parity Bits: Parity bits add an extra bit to data to make the number of set bits either even or odd. This helps
detect single-bit errors.
Hamming Code: Hamming code adds redundant bits to data. It detects and corrects single-bit errors and
detects two-bit errors.
Automatic Repeat reQuest (ARQ): ARQ protocols, like Stop-and-Wait and Go-Back-N, request
retransmission of corrupted or lost packets. This ensures reliable delivery.
Forward Error Correction (FEC): FEC adds redundant data to enable the receiver to detect and correct errors
without needing retransmission.
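As a small illustration of checksum-based error detection, the sketch below appends a CRC-32 value to each message using Python's standard zlib module and verifies it on receipt. The framing format (a 4-byte big-endian CRC appended to the payload) is an assumption made for the example.

```python
import zlib

def frame(data):
    """Sender side: append a CRC-32 checksum to the payload."""
    crc = zlib.crc32(data)
    return data + crc.to_bytes(4, "big")

def verify(message):
    """Receiver side: recompute the CRC and reject corrupted frames."""
    data, received_crc = message[:-4], int.from_bytes(message[-4:], "big")
    if zlib.crc32(data) != received_crc:
        raise ValueError("checksum mismatch: request retransmission (ARQ)")
    return data

msg = frame(b"transfer 100 to account 42")
print(verify(msg))                    # payload passes the integrity check

corrupted = bytearray(msg)
corrupted[0] ^= 0xFF                  # flip some bits "in transit"
try:
    verify(bytes(corrupted))
except ValueError as exc:
    print(exc)                        # checksum mismatch: request retransmission (ARQ)
```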
Examples of Reliable Client-Server Communication
Reliable client-server communication is crucial for various real-world applications where data integrity and
performance are paramount. Below are some examples demonstrating its importance:
Financial Systems: In banking and stock trading platforms, reliable communication ensures transaction
accuracy and data consistency. A single error can lead to significant financial loss and undermine trust.
E-commerce Platforms: Online shopping sites rely on dependable communication for inventory
management and payment processing. This ensures users have a smooth and secure shopping experience.
Healthcare Systems: Electronic health records and telemedicine services require accurate and timely data
exchange. Reliable communication ensures patient information is correct and up-to-date.
Cloud Services: Cloud platforms like AWS and Google Cloud maintain data consistency and availability across
distributed servers. This enables seamless access and high availability for users.
Gaming Applications: Multiplayer online games need real-time data synchronization to ensure a fair and
enjoyable experience. Reliable communication minimizes lag and prevents data discrepancies.
IoT Devices: Smart home systems and industrial IoT applications rely on consistent data transmission. This
ensures devices function correctly and respond promptly to commands.
Reliable Group Communication
Reliable group communication is used when you want to send messages to a group of processes or nodes, ensuring
that the messages are delivered correctly, even if some of the processes fail or the network experiences issues. It
involves mechanisms that guarantee reliable message delivery, message ordering, and fault tolerance for all
members of the group.
Key Challenges in Reliable Group Communication:
1. Message Loss: Messages sent to the group may be lost due to network issues or process crashes.
2. Message Duplication: A message might be delivered more than once, causing inconsistencies.
3. Out-of-Order Delivery: Messages may arrive out of order, violating the intended sequence.
4. Network Partitions: The system might become divided into separate groups, with some members unable to
communicate with others.
5. Fault Tolerance: Some processes in the group may fail, and the system should still ensure the message
delivery to the surviving members.
Types of Reliable Group Communication
1. Atomic Broadcast
Definition: An atomic broadcast ensures that all members of the group either receive a message or none at
all. This means that the message is either delivered to all members of the group or none, even in the
presence of network failures or process crashes.
Key Properties:
o Uniformity: All processes in the group either deliver the message or do not.
o Order: Messages are delivered in the same order to all processes (i.e., no reordering).
o Fault Tolerance: Even if some processes crash or the network partitions, the system ensures that a message is either delivered to all surviving processes or to none of them.
Example: A leader election protocol in a distributed system might use atomic broadcast to ensure that when the
leader is elected, all nodes receive the same information about the new leader at the same time.
2. Causal Ordering
Definition: Causal ordering ensures that messages are delivered in a way that respects their causal
relationships. If one message causes another (for example, process A sends a message to process B, and
process B sends a response back to process A), the order in which these messages are delivered must
respect the causal chain.
Key Properties:
o Causal Consistency: The system respects the causal relationships between events (i.e., a cause
precedes its effect).
o Concurrency: If two messages are not causally related, they may be delivered in any order.
Example: If a client sends a request to a server, and the server sends a response back, the client must receive the
response after receiving the original request. If two requests are independent of each other, they may be delivered
in any order.
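Causal relationships like the one in this example are commonly tracked with vector clocks. The sketch below is a minimal illustration assuming two processes, A and B; the helper functions are hypothetical, not a library API.

```python
# Minimal vector-clock sketch: each process keeps one counter per process,
# increments its own entry on a send, and merges clocks on receive.
def new_clock(processes):
    return {p: 0 for p in processes}

def on_send(clock, sender):
    clock[sender] += 1
    return dict(clock)                # snapshot attached to the outgoing message

def on_receive(clock, msg_clock, receiver):
    for p in clock:
        clock[p] = max(clock[p], msg_clock[p])
    clock[receiver] += 1

def happened_before(a, b):
    """True if the event with clock `a` causally precedes the event with clock `b`."""
    return all(a[p] <= b[p] for p in a) and a != b

procs = ["A", "B"]
a_clock, b_clock = new_clock(procs), new_clock(procs)
m1 = on_send(a_clock, "A")            # A sends a request to B
on_receive(b_clock, m1, "B")
m2 = on_send(b_clock, "B")            # B's reply causally depends on the request
print(happened_before(m1, m2))        # True: the request precedes the response
```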
3. Total Ordering
Definition: Total ordering ensures that all members of the group receive messages in the same order. This is
particularly important when the order of message processing matters (e.g., in transactions).
Key Properties:
o Global Agreement: Every process receives messages in the same order.
o Consistency: It ensures that there is no divergence in the order in which messages are processed
across the group.
Example: If a distributed ledger system needs to process transactions in a specific order (e.g., in a blockchain), total
ordering ensures that all participants process the transactions in the same sequence.
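One common way to realize total ordering is a sequencer that stamps every message with a global sequence number, so every member delivers in the same order. The sketch below is an illustrative simplification in which in-memory lists stand in for the group members.

```python
import itertools

# Sequencer-based total-ordering sketch: every message gets a global
# sequence number, and each member delivers strictly in that order.
sequence = itertools.count(1)

def broadcast(message, members):
    seq = next(sequence)              # the sequencer assigns the global order
    for member in members:
        member.append((seq, message))

def deliver_in_order(inbox):
    return [msg for _, msg in sorted(inbox)]    # identical order at every member

node1, node2 = [], []
broadcast("tx: debit A", [node1, node2])
broadcast("tx: credit B", [node1, node2])
print(deliver_in_order(node1) == deliver_in_order(node2))   # True
```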
4. Message Delivery Guarantees
Reliable Delivery: Messages sent to a group are reliably delivered to all group members, even in the face of
network failures or crashes. The system ensures that if a message is sent, all group members either receive it
or none do.
No Duplicates: A message should not be delivered more than once, even if retries or retransmissions occur
due to network failure.
No Loss: Messages must not be lost. If a message is sent to the group, all processes in the group should
eventually receive it.
Timeouts and Retries: In case of message loss or delay, the system should have mechanisms for retrying
delivery until all processes receive the message.
5. Handling Network Partitions (Partition Tolerance)
Problem: In a distributed system, network failures might cause a partition, where some nodes become
isolated from the rest of the group. During a partition, some processes may not receive messages from
others, which can lead to inconsistencies.
Solution: To ensure reliable communication in the presence of partitions, systems often use Quorum-based
protocols or Leader-based protocols to make decisions about message delivery and partition resolution.
Quorum-based Protocol: A majority (quorum) of the group must agree to deliver a message. This ensures
consistency while handling partitions.
Leader-based Protocol: A leader makes decisions for the group, and if the leader fails, a new leader is
elected to avoid split-brain scenarios.
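As a tiny illustration of the quorum idea, the check below accepts a decision only when a strict majority of the group has acknowledged it; the group size and acknowledgement counts are illustrative.

```python
def has_quorum(acks, group_size):
    """A decision stands only if a strict majority of the group acknowledged it."""
    return acks > group_size // 2

group_size = 5
print(has_quorum(3, group_size))   # True: 3 of 5 is a majority
print(has_quorum(2, group_size))   # False: a minority partition cannot decide
```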
Distributed Commit
A distributed commit protocol ensures that a transaction spanning several nodes either commits everywhere or aborts everywhere. Key requirements:
1. Atomicity:
o A transaction must be executed completely or not at all across all participating nodes.
2. Consistency:
o All participants must agree on the outcome (commit or abort) to ensure data integrity.
3. Coordination:
o A coordinator process orchestrates the decision, collecting votes from all participants.
4. Fault Tolerance:
o The protocol must handle failures gracefully, ensuring the system remains consistent.
1. Two-Phase Commit (2PC)
Overview:
A widely used protocol for achieving consensus among distributed nodes on whether to commit a transaction.
Phases:
1. Prepare Phase: The coordinator sends a PREPARE message to all participants, and each participant votes YES if it is able to commit or NO otherwise.
2. Commit Phase: If every participant voted YES, the coordinator sends COMMIT; if any participant voted NO, it sends ABORT, and all participants apply the decision.
Advantages:
Simple to implement and guarantees atomicity as long as the coordinator remains available.
Disadvantages:
Blocking: If the coordinator fails after sending PREPARE, participants cannot proceed until it recovers.
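To make the control flow concrete, here is a compressed sketch of the two-phase decision logic. The Participant class and its prepare/commit/abort methods are illustrative assumptions; real 2PC implementations also log decisions to stable storage and handle timeouts.

```python
# Compressed Two-Phase Commit sketch: the coordinator collects votes in the
# prepare phase and commits only if every participant voted YES.
def two_phase_commit(participants):
    # Phase 1 (prepare): ask every participant whether it can commit.
    votes = [p.prepare() for p in participants]
    decision = "COMMIT" if all(votes) else "ABORT"
    # Phase 2 (commit/abort): announce the global decision to everyone.
    for p in participants:
        if decision == "COMMIT":
            p.commit()
        else:
            p.abort()
    return decision

class Participant:
    def __init__(self, can_commit):
        self.can_commit = can_commit
    def prepare(self):                 # vote YES only if the local transaction is ready
        return self.can_commit
    def commit(self): print("local commit")
    def abort(self):  print("local rollback")

print(two_phase_commit([Participant(True), Participant(True)]))    # COMMIT
print(two_phase_commit([Participant(True), Participant(False)]))   # ABORT
```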
2. Three-Phase Commit (3PC)
Overview:
An extension of 2PC designed to avoid blocking during coordinator failures.
Phases:
1. Prepare Phase:
o Same as in 2PC.
2. Pre-Commit Phase:
o If all participants voted YES, the coordinator sends a PRE-COMMIT message and participants acknowledge it, so every participant knows the transaction is about to commit.
3. Commit Phase:
o The coordinator sends a COMMIT message, and participants finalize the transaction.
Advantages:
Non-blocking under a single coordinator failure, because the pre-commit state lets participants decide without the coordinator.
Disadvantages:
Requires an extra round of messages and can still behave incorrectly under network partitions.
3. Paxos-Based Commit
Overview:
Uses the Paxos consensus algorithm to achieve a distributed commit decision.
Steps:
Participants propose values (commit/abort), and Paxos ensures agreement on the outcome.
Advantages:
Guarantees consistency.
Disadvantages:
Complex implementation.
4. Quorum-Based Commit
Overview:
Relies on quorum-based voting to decide commit or abort.
Steps:
Participants vote, and the transaction commits only if a commit quorum (typically a majority) is reached; otherwise it aborts.
Disadvantages:
Requires careful choice of quorum sizes and adds extra voting messages.
Challenges in Distributed Commit
1. Coordinator Failures: A failed coordinator can leave participants blocked or uncertain about the outcome.
2. Network Partitions: A partition can separate participants from the coordinator and delay or prevent the decision.
3. Performance Overhead: The additional message rounds and logging slow down transaction processing.
4. Concurrency Control: Commit protocols must be combined with locking or other concurrency-control mechanisms to keep concurrent transactions consistent.
Applications
1. Distributed Databases: Committing a transaction that updates data stored on several database nodes.
2. Financial Systems: Ensuring that transfers and payments either complete on every involved system or not at all.
Recovery
Recovery in distributed systems focuses on maintaining functionality and data integrity despite failures. It involves
strategies for detecting faults, restoring state, and ensuring continuity across interconnected nodes. This article
delves into techniques for handling various types of failures—such as network issues and node crashes—by
implementing robust recovery mechanisms. Understanding these principles helps in designing resilient systems that
can quickly recover from disruptions and maintain consistent operations.
Effective recovery in distributed systems is crucial for ensuring system reliability, availability, and fault tolerance.
When a component fails or an error occurs, the system must recover quickly and correctly to minimize downtime
and data loss. Effective recovery mechanisms, such as checkpointing, rollback, and forward recovery, help maintain
system consistency, prevent cascading failures, and ensure that the system can continue to function even in the
presence of faults.
Recovery techniques in distributed systems are essential for ensuring that the system can return to a stable state
after encountering errors or failures. These techniques can be broadly categorized into the following:
Checkpointing: Periodically saving the system’s state to a stable storage, so that in the event of a failure, the
system can be restored to the last known good state. Checkpointing is a key aspect of backward recovery.
Rollback Recovery: Involves reverting the system to a previous checkpointed state upon detecting an error.
This technique is useful for undoing the effects of errors and is often combined with checkpointing.
Forward Recovery: Instead of reverting to a previous state, forward recovery attempts to move the system
from an erroneous state to a new, correct state. This requires anticipating possible errors and having
strategies in place to correct them on the fly.
Logging and Replay: Keeping logs of system operations and replaying them from a certain point to recover
the system’s state. This is useful in scenarios where a complete rollback might not be feasible.
Replication: Maintaining multiple copies of data or system components across different nodes. If one
component fails, another can take over, ensuring continuity of service.
Error Detection and Correction: Incorporating mechanisms that detect errors and automatically correct
them before they lead to system failure. This is a proactive approach that enhances system resilience.
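The checkpointing and rollback techniques listed above can be illustrated with a minimal sketch that uses a JSON file as a stand-in for stable storage; the file name and state layout are illustrative assumptions.

```python
import json, os

CHECKPOINT_FILE = "checkpoint.json"    # stands in for stable storage

def take_checkpoint(state):
    """Periodically persist the current state so the system can roll back to it."""
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump(state, f)

def rollback():
    """Backward recovery: restore the last known good state after a failure."""
    with open(CHECKPOINT_FILE) as f:
        return json.load(f)

state = {"processed_orders": 120, "balance": 5000}
take_checkpoint(state)

state["balance"] = -999                # a fault corrupts the in-memory state
state = rollback()                     # revert to the last checkpoint
print(state)                           # {'processed_orders': 120, 'balance': 5000}
os.remove(CHECKPOINT_FILE)             # cleanup for the example
```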
Recovery from an error is essential to fault tolerance; an error is the part of the system's state that could result in a failure. The whole idea of error recovery is to replace an erroneous state with an error-free state. Error recovery can be broadly divided into two categories.
Backward Recovery: This involves rolling the system back to a previously known good state, using
checkpoints to periodically save the system’s state. When an error occurs, the system can revert to one of
these saved states to recover from the error.
Forward Recovery: This approach focuses on moving the system from an erroneous state to a new, correct
state without reverting to a previous checkpoint. It requires anticipation of potential errors and the ability to
correct them, allowing the system to continue functioning.
These two categories are fundamental to understanding how distributed systems handle errors and maintain fault
tolerance. The distinction between backward and forward recovery highlights different strategies for ensuring
system resilience in the face of failures.
Introduction to Security in Distributed Systems
Security in distributed systems refers to the protection of system resources and data from unauthorized access,
modification, or destruction. Distributed systems consist of multiple components, often running on different
machines or even different geographical locations, which increases the attack surface for potential security
breaches. Effective security in these systems typically involves:
Confidentiality: Ensuring that data is accessible only to authorized parties and is kept secret from everyone else.
Integrity: Ensuring that data is not tampered with, altered, or corrupted during transmission or storage.
Availability: Ensuring that the system remains operational and accessible even under attack (e.g., denial-of-service attacks).
Authentication: Verifying the identity of users, processes, or nodes before they interact with the system.
Authorization: Ensuring that an authenticated entity has the necessary permissions to access resources.
Non-repudiation: Ensuring that actions cannot be denied after they have been performed (e.g., signing a
document).
Securing distributed systems poses several significant challenges due to their complexity, scale, and dynamic nature.
Here are the key challenges in distributed system security:
Network Complexity: Increases the attack surface and complexity of managing security configurations and
updates across diverse network environments.
Data Protection and Encryption: Vulnerabilities in encryption implementations or weak key management
practices can lead to data breaches and unauthorized access.
Diverse Technologies and Platforms: Compatibility issues, differing security postures, and varying levels of
support for security standards can introduce vulnerabilities and complexities in maintaining a consistent
security posture.
Scalability and Performance: Security measures such as encryption and authentication may introduce
latency and overhead, affecting system performance and responsiveness, especially under high load
conditions.
Communication over insecure networks: Messages can be intercepted or altered during transmission.
Data consistency and integrity: Protecting data from being modified or corrupted by malicious actors.
Secure Channels
A secure channel is a communication link between two or more entities in a distributed system that is designed to
protect the data being transmitted. The goal is to prevent eavesdropping, tampering, and unauthorized access.
1. Encryption: Encrypting data before transmission ensures that even if the data is intercepted, it cannot be
read or altered without the appropriate decryption key.
o Symmetric Encryption: The same key is used for both encryption and decryption (e.g., AES). It is fast
but requires secure key distribution.
o Asymmetric Encryption: Uses a pair of keys: a public key to encrypt and a private key to decrypt
(e.g., RSA). It is more secure but computationally expensive.
2. TLS/SSL (Transport Layer Security / Secure Sockets Layer): These are protocols that provide secure
communication over a computer network by using encryption, integrity checks, and authentication. TLS/SSL
is widely used for securing HTTP connections (HTTPS).
o TLS ensures that data is encrypted between the client and server, preventing man-in-the-middle
(MITM) attacks.
o Handshake: During the TLS handshake, the client and server exchange cryptographic keys and
authenticate each other.
3. Digital Signatures: A digital signature is a cryptographic technique that verifies the authenticity and integrity
of a message. It ensures that the message has not been tampered with and that it was sent by the legitimate
sender.
o Example: A distributed blockchain system uses digital signatures to verify the authenticity of
transactions.
4. Perfect Forward Secrecy (PFS): This ensures that even if the private key of a server is compromised in the
future, past communications will remain secure because the session keys are not derived from the server’s
private key. Each session uses unique session keys.
5. VPNs (Virtual Private Networks): A VPN provides an encrypted communication channel between remote
clients and servers over a public network, ensuring privacy and security.
In a distributed cloud-based storage system, when a user uploads or downloads sensitive data, the communication
between the user’s device and the cloud server should occur over a secure channel, such as HTTPS, to prevent
eavesdropping or man-in-the-middle attacks.
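To make the symmetric-encryption idea from the list above concrete, the sketch below uses the Fernet recipe from the third-party cryptography package (an assumed dependency, installable with pip install cryptography). In a real secure channel the shared key itself would have to be distributed securely, for example during a TLS handshake.

```python
from cryptography.fernet import Fernet   # third-party: pip install cryptography

# Symmetric encryption sketch: the same key encrypts on one end of the
# channel and decrypts on the other, so the key must be shared securely.
key = Fernet.generate_key()
channel = Fernet(key)

ciphertext = channel.encrypt(b"account=42;amount=100")   # what travels on the wire
print(channel.decrypt(ciphertext))                        # b'account=42;amount=100'

# An eavesdropper without the key sees only opaque ciphertext, and any
# tampering causes decryption to fail with an InvalidToken error.
```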
Access Control
Access control refers to the mechanisms that restrict access to system resources based on the identity of the user or
process and their permissions. It ensures that only authorized users or entities can perform specific operations on
system resources.
1. Discretionary Access Control (DAC):
o Definition: In DAC, the resource owner has control over who can access their resources and what actions they can perform.
o Example: A user may decide to share a file with specific users or groups, granting them read or write
access.
o Downside: DAC is considered less secure because the resource owner can grant permissions to
anyone.
2. Mandatory Access Control (MAC):
o Definition: In MAC, access decisions are made by a central authority, not the resource owner. The system enforces strict policies about who can access resources.
o Example: A classified government document system where access is based on security labels (e.g.,
"Top Secret," "Confidential").
o Key Advantage: MAC provides a more secure and enforceable access control policy, reducing the
risk of unauthorized access.
3. Role-Based Access Control (RBAC):
o Definition: RBAC is based on roles assigned to users. Each role has specific permissions, and users are assigned to roles based on their job responsibilities.
o Example: In a corporate setting, the "Admin" role might have full access to the system, while the
"Employee" role might have limited access.
o Key Advantage: RBAC is scalable and easy to manage as permissions are granted based on roles
rather than individual users.
4. Attribute-Based Access Control (ABAC):
o Definition: ABAC uses attributes (characteristics) of the user, resource, and environment to determine access. This model allows fine-grained access control.
o Example: A system might allow access to resources based on the user's location, time of day, or
department.
In an online banking application, access to sensitive data (e.g., account balance, transaction history) is controlled
through RBAC. A regular user can view their own account data, but an admin might have broader access to all users’
data.
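A minimal sketch of the RBAC check described in this banking example is shown below; the role names, users, and permission strings are illustrative assumptions, not a real API.

```python
# Minimal RBAC sketch for the banking example above: permissions are
# attached to roles, and users acquire permissions only through their role.
ROLE_PERMISSIONS = {
    "admin":    {"view_any_account", "view_own_account", "freeze_account"},
    "employee": {"view_own_account"},
}

USER_ROLES = {"alice": "admin", "bob": "employee"}    # illustrative users

def is_allowed(user, permission):
    role = USER_ROLES.get(user)
    return permission in ROLE_PERMISSIONS.get(role, set())

print(is_allowed("bob", "view_own_account"))    # True
print(is_allowed("bob", "view_any_account"))    # False: not in the employee role
```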
Security Management
Security management involves the processes and tools used to ensure that security policies are effectively
implemented, monitored, and enforced in distributed systems. It covers everything from identifying threats to
responding to security incidents.
1. Security Policies:
o Definition: A security policy is a document that outlines the rules and guidelines for maintaining the
security of a system. It includes policies for data protection, user access, and incident response.
o Example: A security policy might dictate that all sensitive data must be encrypted at rest and during
transit.
2. Authentication and Identity Management:
o Definition: Authentication is the process of verifying a user's identity, typically through credentials like usernames and passwords, biometrics, or security tokens. Identity management systems handle the creation, storage, and management of user identities and their associated roles and permissions.
o Example: Single Sign-On (SSO) systems allow users to authenticate once and access multiple services
without having to log in separately to each one.
3. Auditing and Monitoring:
o Definition: Auditing involves tracking and logging all access to system resources, and monitoring involves continuously checking the system for suspicious activity or potential threats.
o Example: In a distributed system, auditing logs could track all user access to sensitive data, while
monitoring could involve detecting unusual traffic patterns that might indicate an ongoing
Distributed Denial of Service (DDoS) attack.
4. Incident Response:
o Definition: Incident response refers to the steps taken to detect, respond to, and recover from
security incidents or breaches.
o Example: If a breach is detected in a system, the response might include isolating the affected
system, identifying the cause of the breach, notifying affected users, and restoring systems from
backups.
5. Patching and Updates:
o Definition: Regular patching and updating of software components is essential for security. Vulnerabilities in software are often discovered, and patches must be applied to mitigate potential exploits.
o Example: An organization might have a policy to apply critical security patches to all servers within
24 hours of release.
6. Key Management:
o Definition: Key management covers the secure generation, distribution, storage, rotation, and revocation of cryptographic keys.
o Example: In a system using asymmetric encryption, key management ensures that private keys are
protected and only authorized users have access to public keys.