Unit 4 Distributed Systems

Distributed File Systems: Introduction

A Distributed File System (DFS) is a file system that allows access to files and data over a network,
enabling multiple users and applications to read, write, and share data as if it were located on a local
disk. Unlike traditional file systems that are tied to a single machine, a DFS distributes files across
multiple servers, providing benefits such as increased availability, scalability, and fault tolerance.

Key Characteristics of Distributed File Systems

1.Location Transparency:

Users access files without needing to know their physical location. The system abstracts the details
of where data is stored, making it easier for users and applications to interact with files seamlessly.

2.Scalability:

DFS can handle increasing amounts of data and user requests by adding more servers and storage
devices. This scalability allows organizations to grow their storage capabilities without significant
redesign.

3.Fault Tolerance:

Distributed file systems often include redundancy and replication mechanisms. If one server fails,
data can still be accessed from another server, ensuring continuous availability and reliability.

4.Concurrency Control:

A DFS manages concurrent access to files by multiple users or applications. It implements protocols
to ensure data consistency and integrity, preventing conflicts when multiple clients attempt to read
or write the same file simultaneously.

5.Data Replication:

To enhance reliability and performance, DFS often replicates data across multiple nodes. This not
only provides fault tolerance but also allows for load balancing by distributing read requests across
replicas.

6.Network Transparency:

Users interact with the file system without being aware of the underlying network. The system
manages network communication and data transfer, allowing users to focus on file management
rather than connectivity issues.

Common Use Cases

• Cloud Storage Solutions: Services like Google Drive, Dropbox, and Amazon S3 use
distributed file systems to provide scalable and accessible storage for users.
• Big Data Applications: Distributed file systems are essential in big data frameworks (e.g.,
Hadoop Distributed File System) that manage large datasets across many servers.
• Collaborative Environments: Environments that require multiple users to access and edit
files concurrently, such as shared project directories in organizations.
Examples of Distributed File Systems

1.Google File System (GFS):

Designed to support Google's data-intensive applications, GFS provides fault tolerance, high
throughput, and supports large files.

2.Hadoop Distributed File System (HDFS):

Part of the Apache Hadoop project, HDFS is designed for high-throughput access to large datasets,
supporting big data processing applications.

3.Ceph:

A unified distributed storage system that provides object, block, and file storage in a scalable and
fault-tolerant manner.

4.NFS (Network File System):

Originally developed by Sun Microsystems, NFS allows users to access files over a network as if they
were on local storage, though it is primarily designed for smaller networks compared to more
modern distributed file systems.

Distributed File Systems play a crucial role in modern computing by providing efficient, scalable, and
fault-tolerant access to data across a network. Their ability to abstract physical storage locations
while ensuring data consistency and availability makes them essential for cloud computing, big data
applications, and collaborative environments. As the demand for data storage and access continues
to grow, distributed file systems will remain integral to the architecture of distributed computing
solutions.

File service architecture

The file service architecture described here provides a structured, distributed file service with a clear
separation of responsibilities among components. This architecture consists of three primary
components: the Client Module, the Directory Service, and the Flat File Service. Here is an in-depth
explanation of each component and their interactions:

1. Client Module

Role: The client module runs on the client computer and acts as an intermediary between the client
applications and the server services (Flat File Service and Directory Service).

Functionality:

Provides a unified interface that combines the capabilities of the Flat File Service and Directory
Service, making it easier for applications to interact with files in a way similar to conventional file
systems (e.g., UNIX).

Manages file operations requested by the application programs, such as reading, writing, and
creating files.

Interprets file names and translates them into unique file identifiers (UFIDs) through iterative
communication with the Directory Service.

Caches recently accessed file data locally to enhance performance and reduce the number of
requests sent to the server.

Stores information about the network locations of the Flat File Service and Directory Service,
enabling direct communication with them.

2. Directory Service

Role: The Directory Service is responsible for managing file names and their mapping to unique file
identifiers (UFIDs).

Functionality:

Acts as a mapping system, translating human-readable file names into UFIDs. This allows the system
to use UFIDs to uniquely identify files across the network.

Supports directory-related operations, such as creating directories, adding new file names, and
retrieving UFIDs for files.

Uses a hierarchical naming system, similar to that in UNIX, where directories can reference other
directories, enabling complex file structures.

Stores directory data within the Flat File Service, meaning directory information is kept as files in the
flat file system, providing a uniform approach to data storage.

Interaction with Flat File Service:

The Directory Service functions as a client of the Flat File Service, where it manages and stores
directory-related data as regular files.
3. Flat File Service

Role: The Flat File Service provides low-level access to the actual file contents and handles operations
on the data within each file.

Functionality:

Manages files directly by UFIDs, which ensure each file is uniquely identified and accessed.

Provides basic file operations, such as:

• Read: Fetches data from a specified position in the file.
• Write: Modifies or extends data within the file.
• Create: Generates a new file and assigns it a unique identifier.
• Delete: Removes a file from storage.
• GetAttributes and SetAttributes: Access and modify file metadata.

Implements these functions through a Remote Procedure Call (RPC) interface, allowing the Client
Module to request operations without direct file manipulation.

Access Control:

Operations in the Flat File Service check for appropriate access rights before allowing access to or
modification of file data. Invalid file identifiers, positions, or access rights cause the operation to fail
with an exception (e.g., BadPosition for an out-of-range file position).
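
The sketch below illustrates how such a UFID-keyed flat file service could look in Python. It is a minimal in-memory sketch, not the architecture's actual implementation: the uuid-based UFIDs, the dictionaries used for storage, and the AccessError/BadPosition exceptions are all illustrative assumptions.

```python
# Minimal sketch of a UFID-keyed flat file service (illustrative only).
# The in-memory dicts and the AccessError/BadPosition exceptions are assumptions.
import uuid

class BadPosition(Exception): pass
class AccessError(Exception): pass

class FlatFileService:
    def __init__(self):
        self._files = {}          # UFID -> bytearray of file contents
        self._attributes = {}     # UFID -> metadata dict (e.g., owner, size)

    def create(self, owner):
        ufid = uuid.uuid4().hex   # unique file identifier
        self._files[ufid] = bytearray()
        self._attributes[ufid] = {"owner": owner, "size": 0}
        return ufid

    def read(self, ufid, position, length):
        data = self._files.get(ufid)
        if data is None:
            raise AccessError("unknown UFID")
        if position < 0 or position > len(data):
            raise BadPosition(position)
        return bytes(data[position:position + length])

    def write(self, ufid, position, new_data):
        data = self._files.get(ufid)
        if data is None:
            raise AccessError("unknown UFID")
        if position < 0 or position > len(data):
            raise BadPosition(position)
        data[position:position + len(new_data)] = new_data
        self._attributes[ufid]["size"] = len(data)

    def delete(self, ufid):
        self._files.pop(ufid, None)
        self._attributes.pop(ufid, None)

    def get_attributes(self, ufid):
        return dict(self._attributes[ufid])
```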

How the Components Work Together

• File Naming and Access: When an application program requests access to a file, it specifies
the file name. The Client Module contacts the Directory Service to resolve the file name to a
UFID.

• File Data Operations: Once the Client Module has the UFID, it uses the Flat File Service to
perform file operations like reading, writing, or deleting file contents.

• Caching: The Client Module may cache recently accessed files or data blocks locally,
improving performance by reducing repeated requests to the server.

• Separation of Concerns: The Flat File Service focuses only on file content management, while
the Directory Service manages the mapping of file names to UFIDs. This separation makes the
system modular and easier to scale and maintain.
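
The following sketch shows the name-resolution and file-access flow just described, reusing the FlatFileService sketch above. The in-process DirectoryService and ClientModule classes, their method names, and the client-side UFID cache are illustrative assumptions; a real deployment would issue these calls as RPCs over the network.

```python
# Sketch of the client-side flow: resolve a name to a UFID, then operate on the file.
class DirectoryService:
    def __init__(self, flat_file_service):
        self._names = {}                 # file name -> UFID
        # In the real architecture, directory contents would themselves be stored
        # as files via the flat file service; kept here only to suggest that link.
        self._ffs = flat_file_service

    def add_name(self, name, ufid):
        self._names[name] = ufid

    def lookup(self, name):
        return self._names[name]         # raises KeyError if the name is unknown

class ClientModule:
    def __init__(self, directory_service, flat_file_service):
        self._dir = directory_service
        self._ffs = flat_file_service
        self._ufid_cache = {}            # client-side cache of name -> UFID

    def read_file(self, name, position=0, length=1024):
        ufid = self._ufid_cache.get(name) or self._dir.lookup(name)
        self._ufid_cache[name] = ufid
        return self._ffs.read(ufid, position, length)

# Usage: create a file, register its name, then read it through the client module.
ffs = FlatFileService()
dirsvc = DirectoryService(ffs)
ufid = ffs.create(owner="alice")
ffs.write(ufid, 0, b"hello")
dirsvc.add_name("/docs/greeting.txt", ufid)
client = ClientModule(dirsvc, ffs)
print(client.read_file("/docs/greeting.txt"))   # b"hello"
```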

Advantages of This Architecture

1. Modularity: The separation between file data operations (Flat File Service) and file
naming/directory management (Directory Service) allows each component to be maintained
and scaled independently.

2. Flexibility: Different client modules can be implemented for various client systems, adapting
to specific operating system conventions and performance requirements.

3. Scalability: By separating directory and file data management, the system can handle more
clients and files more efficiently, making it well-suited for large distributed environments.
4. Caching for Performance: The client-side caching helps reduce latency by storing frequently
accessed data locally, which reduces the load on the server and improves user experience.

5. Unified Access: The Client Module provides a single interface for application programs,
abstracting the complexities of managing directories, UFIDs, and network communication.

In summary, this file service architecture effectively divides responsibilities to provide a
comprehensive and efficient distributed file management system, supporting both flexibility and
scalability for a variety of client and server setups.

Peer-to-peer (P2P) systems


Peer-to-peer (P2P) systems are decentralized networks where participants (peers) share resources
directly with each other, rather than relying on a centralized server. Unlike traditional client-server
architectures, where servers provide resources and clients consume them, P2P networks distribute
both the resource-sharing and computing responsibilities among all participants. This approach
enables P2P networks to scale efficiently and makes them resilient to single points of failure, which
are typical in centralized systems.

Key Characteristics of Peer-to-Peer Systems

1.Decentralization:

In P2P systems, there is no central authority or server that manages resources or connections. Each
peer is both a client and a server, capable of initiating requests and fulfilling others' requests.

This decentralized structure improves scalability and fault tolerance, as there’s no single point of
failure.

2.Resource Sharing:

Peers share resources, such as processing power, storage, or bandwidth, directly with one another.
For instance, in file-sharing P2P systems, peers store and share files with others.

Sharing resources at the peer level allows P2P networks to handle high volumes of traffic and
accommodate large numbers of users.

3.Scalability:

P2P systems are inherently scalable because as more peers join, they bring additional resources,
helping the network grow organically.

The network’s ability to handle traffic increases with the number of active peers.

4.Self-Organization:

Peers autonomously join and leave the network without requiring permission or coordination from
a central server. This characteristic makes P2P networks robust and adaptable.

5.Fault Tolerance:

P2P networks can remain operational even if several peers fail, as there is no central point of
dependency. The network dynamically adapts by rerouting tasks or data through active peers.
Redundant copies of data or resources are often distributed across multiple peers, ensuring
availability even when some nodes go offline.

Types of Peer-to-Peer Systems

1.Pure P2P Systems:

In pure P2P systems, all peers have equal roles, and there is no central authority or hierarchy.

Examples include early file-sharing networks like Gnutella and Freenet. (Napster, by contrast, relied on a centralized index and is better described as a hybrid system.)

2.Hybrid P2P Systems:

Hybrid P2P systems use a combination of P2P and client-server architectures. Some peers or servers
may have more control, often to handle specific tasks like indexing or authentication.

Examples include BitTorrent, where "tracker" servers help locate peers but do not participate in file
sharing.

3.Structured vs. Unstructured P2P Systems:

Structured: Peers follow specific protocols to organize themselves and manage resources. For
example, Distributed Hash Tables (DHTs) assign data locations based on hash functions, allowing
efficient searches.

Unstructured: Peers connect randomly or semi-randomly, without any fixed organization. While this
makes them easier to set up, they can be inefficient in finding specific resources.

Applications of Peer-to-Peer Systems

P2P systems have been applied across various domains, including:

1.File Sharing:

P2P technology is widely used in file-sharing systems, such as BitTorrent, where users share files by
downloading and uploading file chunks to each other.

2.Distributed Computing:

In systems like SETI@home, peers contribute their processing power to perform computational
tasks, aggregating resources from millions of users.

3.Content Distribution:

Peer-assisted delivery, used by streaming services or large-scale content distribution networks


(CDNs), allows users to download content directly from other viewers, reducing server load.

4.Blockchain and Cryptocurrencies:

Blockchain-based cryptocurrencies like Bitcoin and Ethereum use P2P networks to verify transactions
and maintain a distributed ledger. Each peer in the network validates transactions and helps maintain
the security and integrity of the ledger.
5.Communication and Social Media:

P2P technology can be used for secure messaging, voice calls, and video calls, where messages travel
directly between users instead of through central servers. Applications like Skype initially used a P2P-
based architecture.

Advantages of Peer-to-Peer Systems

1.Scalability:

Because resources increase as more peers join, P2P networks are naturally scalable. This contrasts
with client-server systems, where more users increase the load on centralized servers.

2.Cost Efficiency:

P2P networks reduce infrastructure costs as there’s no need for powerful central servers. Each peer
provides some storage, processing, or bandwidth.

3.Resilience and Availability:

P2P systems are resilient to failures. If some peers go offline, the network can still function, as
resources are spread across multiple nodes.

4.Anonymity and Privacy:

Many P2P networks allow users to connect and share resources directly, without revealing their
identity to a central authority. This can provide a level of anonymity.

Challenges of Peer-to-Peer Systems

1.Security:

P2P systems are vulnerable to attacks like data corruption, poisoning, and Sybil attacks, where
malicious users create multiple fake identities to manipulate the network.

2.Data Integrity:

Without central oversight, ensuring data integrity and authenticity can be challenging. Malicious
peers may introduce corrupted or fake data.

3.Legal Issues:

Due to the decentralized nature of file-sharing, P2P networks are often associated with piracy and
copyright infringement.

4.Management and Coordination:

Since peers can join and leave at will, coordinating and managing resources in a P2P network is
complex, especially for unstructured systems.

Peer-to-peer systems offer a powerful, decentralized approach to resource sharing, capable of
supporting large and resilient networks. Their ability to scale organically and adapt dynamically
makes them suitable for applications ranging from file sharing to distributed computing and
blockchain. However, they also pose challenges in terms of security, data integrity, and legal
compliance, requiring careful consideration in their design and implementation.

Napster
Napster was one of the first major peer-to-peer (P2P) file-sharing systems, introduced in 1999 by
Shawn Fanning. It allowed users to share music files over the Internet, igniting a massive cultural
shift in how music was distributed and consumed. At its peak, Napster attracted millions of users
and transformed the music industry by making digital music widely accessible. Despite its impact,
Napster’s centralized indexing approach and legal challenges led to its shutdown, but it paved the
way for future P2P systems and innovations in content distribution.

How Napster Worked

Napster’s architecture blended a centralized and decentralized approach. Users’ computers, or
"peers," stored and shared files, while Napster maintained a centralized server to index the location
of these files.

The operational flow included:

1. File Location Request: A user searching for a song would send a request to Napster's
centralized server.

2. List of Peers: The server returned a list of peers who had the requested file, identified by their
IP addresses.

3. File Request: The client then directly contacted one of these peers to request the file.

4. File Delivered: The peer with the file sent it directly to the requesting client.

5. Index Update: Each user updated Napster’s index with the available files on their computer.

This combination of central indexing and decentralized file hosting allowed Napster to scale quickly,
enabling massive simultaneous file-sharing activity.
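
The toy sketch below captures the central-index idea from the flow above: the server only maps file names to the peers that host them, while the transfer itself happens peer-to-peer. The class name, peer addresses, and method names are illustrative, not Napster's actual protocol.

```python
# Toy model of Napster-style central indexing: the server maps file names to the
# peers that host them; the file transfer itself is carried out peer-to-peer.
from collections import defaultdict

class CentralIndex:
    def __init__(self):
        self._index = defaultdict(set)   # file name -> set of peer addresses

    def publish(self, peer_addr, file_names):
        # Each peer advertises the files it currently shares.
        for name in file_names:
            self._index[name].add(peer_addr)

    def search(self, file_name):
        # Return the peers a client should contact directly for the download.
        return list(self._index.get(file_name, ()))

index = CentralIndex()
index.publish("10.0.0.5:6699", ["songA.mp3", "songB.mp3"])
index.publish("10.0.0.7:6699", ["songA.mp3"])
print(index.search("songA.mp3"))
```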

Legacy and Impact of Napster

1.Introduction of Large-Scale P2P Networks:

Napster demonstrated that a decentralized network could leverage the resources of ordinary
Internet users to create a large-scale information-sharing service.
This approach showed the power of distributing storage and computing across multiple nodes,
laying the foundation for future P2P systems like BitTorrent.

2.Legal and Ethical Implications:

Napster’s reliance on copyrighted music files led to extensive legal battles with the music industry,
culminating in a court-ordered shutdown in 2001.

The case highlighted intellectual property issues in the digital age, influencing copyright laws and
leading to the development of legal digital music services like iTunes and Spotify.

3.Architectural Lessons:

Napster’s use of a centralized index showed both the strengths and limitations of this approach.
While it made file location efficient, it also created a single point of failure.

Later P2P systems adopted fully decentralized indexing (e.g., Distributed Hash Tables in BitTorrent)
to improve resilience and remove reliance on a single server.

4.Anonymity and Decentralization:

Napster’s structure offered minimal anonymity since the centralized index tracked file locations. This
spurred interest in anonymity for P2P systems, leading to projects like Freenet, which emphasize
privacy and resilience against censorship.

5.Application-Specific Design:

Napster benefited from application-specific properties: music files were static (no updates), and
temporary unavailability was acceptable since users could download files later. This simplicity
facilitated scalability but was not suitable for applications requiring strict data consistency.

Napster’s Influence on Future Systems

Napster’s popularity and scalability influenced future P2P architectures, such as:

• Freenet and FreeHaven: P2P systems focusing on anonymous and censorship-resistant file
storage.

• BitTorrent: Decentralized file-sharing with a focus on efficient resource distribution and load
management.

• Blockchain: Decentralized ledgers that rely on distributed networks, where each peer plays a
role in maintaining data integrity.

Napster’s blend of centralized indexing and decentralized storage allowed it to scale rapidly,
transforming digital media distribution and inspiring future innovations in P2P networking. Despite
its legal challenges and eventual shutdown, Napster’s impact on digital culture, technology, and the
music industry remains profound.
Peer-to-peer (P2P) middleware
Peer-to-peer (P2P) middleware is a specialized software layer that enables the efficient creation and
management of distributed applications across a network of hosts, with the primary purpose of
allowing clients to locate and access data resources quickly, regardless of where they are in the
network. Unlike traditional client-server systems, P2P systems use decentralized approaches,
distributing both the workload and data storage across multiple participating nodes or "peers."

Functional Requirements of Peer-to-Peer Middleware

1.Resource Location and Communication: P2P middleware must enable clients to locate and
interact with any resource distributed across the network, even though these resources may be
widely spread among nodes.

2.Dynamic Addition and Removal of Resources: Resources and hosts can join or leave the network
dynamically. Middleware must handle these changes without manual reconfiguration.

3.Simplified Programming Interface: The middleware should provide a simple API, allowing
developers to interact with distributed resources without needing to understand the complexities of
the underlying network.

Non-functional Requirements

To achieve high performance, P2P middleware must address several critical non-functional aspects:

1.Global Scalability:

P2P systems aim to leverage the hardware resources of a vast number of Internet-connected hosts,
often reaching thousands or millions of nodes.

The middleware must support this scale by efficiently managing resource discovery and distribution.

2.Load Balancing:

Efficient resource usage across nodes depends on an even workload distribution. Middleware should
ensure random placement of resources and replication for heavily used files to prevent overloading
individual nodes.

3.Optimization for Local Interactions:

Minimizing the "network distance" (latency) between interacting nodes helps reduce delays in
resource access and lowers network traffic.

Middleware should ideally place frequently accessed resources closer to the requesting nodes.

4.Dynamic Host Availability:

P2P networks operate with nodes that can join or leave at any time, a factor driven by the fact that
hosts in P2P systems are not centrally managed and may face connectivity issues.

Middleware should automatically detect and adjust to the addition or removal of nodes,
redistributing load and resources accordingly.
5.Data Security in a Heterogeneous Trust Environment:

Given the varied ownership and potential lack of trust between nodes, security is a major concern.
Middleware should employ authentication, encryption, and other mechanisms to maintain data
integrity and privacy.

6.Anonymity, Deniability, and Resistance to Censorship:

Middleware should provide anonymity for users and enable hosts to plausibly deny responsibility
for holding or supplying certain data. This is crucial in resisting censorship and protecting user
privacy.

Architectural Challenges

Given the need for scalability and high availability, maintaining a unified database of resource
locations across all nodes is impractical. Instead, P2P middleware relies on:

Partitioned and Distributed Indexing: Knowledge of data locations is divided across the network,
with each node managing a portion of the namespace (a segment of the global resource directory).

Topology Awareness: Nodes maintain limited knowledge of the overall network topology,
enhancing both efficiency and fault tolerance.

Replication: High replication levels (e.g., 16 copies of data) ensure system resilience against node
unavailability or network disruptions.

Examples and Legacy

First-generation P2P systems like Napster used a centralized index, whereas second-generation
systems like Gnutella and Freenet moved to fully decentralized approaches. The evolution of P2P
middleware addresses the challenges of dynamic, large-scale networks by distributing control and
data management across multiple nodes, forming the foundation for modern P2P applications such
as file-sharing platforms, decentralized storage solutions, and even blockchain technologies.
Routing overlays
Routing overlays are a foundational concept in peer-to-peer (P2P) systems and distributed
networks, providing a mechanism for locating and retrieving data in a network where data is
decentralized and spread across multiple nodes (peers). Routing overlays manage how nodes
communicate with each other and how resources or data are located across the network without
relying on a central server. They provide efficient paths for data requests and responses, even as
nodes frequently join or leave the network.

Types of Routing Overlays

There are two main types of routing overlays:

structured and unstructured.

Structured Overlays

Structured overlays use specific algorithms to maintain a strict topology and organize data. This
structure enables efficient, deterministic routing.

Chord: Organizes nodes in a circular ring. Each node is responsible for a specific range of data, and
data lookup is achieved in O(log N) hops, where N is the number of nodes.

Pastry: Routes requests by progressively finding nodes with closer IDs, organizing nodes in a circular
ID space and assigning a unique numeric identifier to each node.

Tapestry: Uses a similar approach to Pastry, routing based on the "prefix" of the destination node’s
ID, with a focus on optimizing proximity between nodes.

Kademlia: Relies on XOR-based distance between node IDs, allowing nodes to find data based on
shortest logical distance. Known for its robustness and used by BitTorrent.
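
As a small illustration of the structured approach, the sketch below shows Kademlia-style XOR distance between identifiers and how a key is routed to the closest known node. The 32-bit ID size, the node names, and the flat table of known nodes are illustrative assumptions; a real DHT would use larger IDs and per-node routing tables.

```python
# Illustration of Kademlia-style XOR distance: a key is stored on / looked up via
# the node whose ID is closest to the key under the XOR metric.
import hashlib

def node_id(name, bits=32):
    # Hash a name into a fixed-size integer identifier (32 bits here for readability).
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % (1 << bits)

def xor_distance(a, b):
    return a ^ b

known_nodes = {node_id(n): n for n in ["peerA", "peerB", "peerC", "peerD"]}
key = node_id("some-file.dat")

# Route towards the node whose ID has the smallest XOR distance to the key.
closest = min(known_nodes, key=lambda nid: xor_distance(nid, key))
print("store/lookup at:", known_nodes[closest])
```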

Unstructured Overlays

Unstructured overlays do not impose a strict topology on the network, allowing nodes to join and
leave freely. These systems are generally more resilient but may require more resources to locate
data.
Flooding and Random Walks: Nodes in unstructured overlays often use methods like flooding
(broadcasting requests to all neighbors) or random walks (sending requests to random neighbors)
to locate data.

Examples: Early Gnutella networks and Freenet, which lack structured routing and rely on
broadcasting or searching through known nodes.
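
The sketch below illustrates flooding in an unstructured overlay, under simple assumptions: an adjacency-list graph, a per-node set of locally held data, and a TTL (time-to-live) to bound how far the query propagates.

```python
# Flooding search in an unstructured overlay: forward the query to all neighbours
# until the resource is found or the TTL expires. Graph and data are illustrative.
def flood_search(graph, data_at, start, wanted, ttl=3):
    visited = set()
    frontier = [(start, ttl)]
    while frontier:
        node, remaining = frontier.pop(0)
        if node in visited:
            continue
        visited.add(node)
        if wanted in data_at.get(node, ()):
            return node                      # first peer found holding the resource
        if remaining > 0:
            frontier.extend((nbr, remaining - 1) for nbr in graph[node])
    return None

graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
data_at = {"D": {"file.mp3"}}
print(flood_search(graph, data_at, "A", "file.mp3"))   # 'D'
```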

Operation of Routing Overlays

The process of finding data in a routing overlay typically involves:

1.Joining the Network:

When a node joins, it is assigned a unique identifier and is introduced to the overlay structure by
connecting to existing nodes.

In structured overlays, the new node is integrated into the routing structure and takes responsibility
for a specific range of data keys.

2.Routing and Data Lookup:

When a node requests data, it routes the request through the overlay structure. Using its knowledge
of neighbor nodes, it forwards the request closer to the node responsible for the data key.

Structured overlays typically have predictable routing paths, achieving lookups within O(log N) hops.

3.Handling Node Failures:

Routing overlays use redundancy and replication to handle node failures. For example, each data
item might be replicated on multiple nodes, or alternative paths can be used if the primary path
fails.

Nodes periodically update their neighbor information to account for changes in the network.

Advantages of Routing Overlays


Scalability: Routing overlays are designed to efficiently handle large, decentralized networks,
making them suitable for applications with millions of users or devices.

Fault Tolerance: By distributing data and routing paths, routing overlays can remain operational
despite node churn and network failures.
Load Distribution: Many routing overlays use DHTs to ensure that data is evenly distributed across
nodes, preventing hotspots and balancing the load.

Challenges in Routing Overlays


Node Churn: Frequent joining and leaving of nodes can affect routing efficiency and data
availability, especially in unstructured networks.

Latency and Network Distance: Although DHTs provide efficient routing, network distance can still
introduce delays. Routing overlays often attempt to optimize for local interactions to minimize
latency.

Security and Anonymity: Ensuring data integrity, preventing malicious nodes from disrupting
routing, and protecting user privacy are ongoing challenges in decentralized networks.

Applications of Routing Overlays


Routing overlays are foundational for many decentralized applications:

File Sharing (e.g., BitTorrent): Distributes file chunks across peers, where Kademlia-based DHT
helps locate peers with required chunks.

Blockchain and Distributed Ledgers: Nodes use routing overlays to share transaction data and
synchronize distributed ledgers.

Content Distribution Networks (CDNs): Nodes cache and route content, improving latency and
load distribution.

Decentralized Web: Projects like IPFS (InterPlanetary File System) use DHTs to locate and retrieve
content in a decentralized manner.

Routing overlays are essential for decentralized, scalable, and resilient data access in distributed
networks. By providing structured or unstructured mechanisms for data location, they enable
efficient resource sharing in P2P networks, handling challenges like node churn, load balancing, and
network distance. Routing overlays continue to play a critical role in P2P systems and inspire new
decentralized technologies, from file-sharing applications to blockchain networks.
Coordination and Agreement
Coordination and Agreement are crucial aspects in distributed systems, where multiple nodes (or
processes) work together to achieve common objectives. Due to the lack of a single centralized
control in distributed systems, these nodes must coordinate to reach a consensus on shared data,
task assignments, or the order of operations. This coordination ensures consistency, reliability, and
correctness in the presence of network delays, faults, or varying speeds of different nodes.

Importance of Coordination and Agreement in Distributed Systems

1. Consistency: Distributed systems often require data consistency across nodes. For example,
in a distributed database, if multiple nodes are updating or reading shared data, they must
agree on a consistent view of that data.

2. Fault Tolerance: Distributed systems are prone to partial failures. Nodes may crash, networks
may partition, or messages may be delayed or lost. Agreement protocols help ensure that the
system can handle these failures without losing coherence.

3. Order of Operations: In distributed systems, the order in which operations are executed can
significantly impact the final outcome. For example, in a banking application, the order of
transactions matters for account balances. Agreement protocols help nodes maintain a
consistent order for operations.

4. Load Distribution: Coordination is needed for efficient distribution of tasks and resources
across nodes. Agreement on task allocation helps balance the load, prevent redundant work,
and ensure that all nodes are used effectively.

Key Concepts in Coordination and Agreement

1. Consensus: The fundamental problem in distributed systems is achieving consensus — a
situation where all participating nodes agree on a single value or decision. Consensus
algorithms allow nodes to agree on the same data values or operations despite failures or
network delays.

2. Mutual Exclusion: In some cases, only one node should access a resource at a time. Mutual
exclusion protocols allow nodes to coordinate and ensure that only one node can access a
resource or perform a specific operation at any given time.

3. Leader Election: In distributed systems, it is often necessary to have one node act as a
coordinator (leader) to make decisions on behalf of others. Leader election algorithms help
nodes decide which among them should take on this role.

4. Atomic Commitment: In transactions that span multiple nodes, atomic commitment ensures
that all nodes agree on whether to commit or abort a transaction. If any part of the transaction
fails, the entire transaction should be rolled back to maintain consistency.

5. Fault Models: Distributed systems must handle various types of faults, including:

o Crash faults: Nodes stop functioning.

o Omission faults: Messages or actions are lost.


o Byzantine faults: Nodes exhibit arbitrary or malicious behavior.

6. Quorums: Quorum-based methods are often used to ensure consistency. By requiring that a
certain number of nodes (a quorum) agree on a value before it is accepted, systems can
achieve a level of fault tolerance and consistency.

Coordination and Agreement Protocols

Several protocols and algorithms help achieve coordination and agreement in distributed systems.
These include:

1.Two-Phase Commit (2PC):

Used in distributed transactions to achieve atomic commitment.

In the first phase (prepare phase), the coordinator node asks all participants if they can commit. If all
agree, the transaction proceeds to the second phase (commit phase) where all nodes commit.

2PC is blocking and can leave the system waiting indefinitely if the coordinator crashes.
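
The compact sketch below shows the two phases from the coordinator's point of view. The Participant class and its prepare/commit/abort methods are illustrative stand-ins for RPCs to remote nodes; timeouts and logging, which a real 2PC needs, are omitted.

```python
# Sketch of a Two-Phase Commit coordinator over illustrative participant objects.
def two_phase_commit(participants):
    # Phase 1 (prepare): ask every participant whether it can commit.
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2 (commit): everyone voted yes, so tell all participants to commit.
        for p in participants:
            p.commit()
        return "committed"
    # Any "no" vote aborts the whole transaction at every participant.
    for p in participants:
        p.abort()
    return "aborted"

class Participant:
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit = name, can_commit
    def prepare(self): return self.can_commit
    def commit(self): print(self.name, "commit")
    def abort(self): print(self.name, "abort")

print(two_phase_commit([Participant("db1"), Participant("db2")]))   # "committed"
```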

2.Three-Phase Commit (3PC):

An extension of 2PC designed to address the blocking problem.

3PC adds an additional phase to ensure that even if the coordinator crashes, nodes can reach a
consistent decision without being left in an uncertain state.

3.Paxos Algorithm:

A consensus algorithm designed for distributed systems to achieve agreement on a single value.

Paxos is fault-tolerant, handling node failures and network partitions, and is widely used in
applications requiring high availability.

4.Raft Algorithm:

A consensus algorithm that is easier to understand than Paxos and achieves similar goals.

Raft elects a leader among nodes, and all changes to the distributed state are handled through this
leader. If the leader fails, a new one is elected.

5.Byzantine Fault Tolerance (BFT):

Byzantine Fault Tolerance algorithms are designed to handle Byzantine faults, where nodes may act
maliciously or arbitrarily.

Practical Byzantine Fault Tolerance (PBFT) is an example, used in systems where a high level of
reliability and security is required, such as blockchain networks.

6.Vector Clocks and Lamport Timestamps:

Used for ordering events in a distributed system.

Lamport timestamps provide a logical clock to order events without requiring synchronized physical
clocks, which helps achieve a consistent view of events across nodes.
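
A minimal Lamport clock sketch follows: each process keeps a counter, increments it on local events and sends, and on receipt advances to the maximum of its own counter and the message timestamp, plus one. The class and the two-process usage are illustrative.

```python
# Minimal Lamport logical clock. Timestamps give a consistent ordering of events
# without requiring synchronized physical clocks.
class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):
        self.time += 1
        return self.time            # timestamp attached to the outgoing message

    def receive(self, msg_time):
        self.time = max(self.time, msg_time) + 1
        return self.time

p1, p2 = LamportClock(), LamportClock()
t = p1.send()                       # p1 sends a message carrying timestamp t
p2.receive(t)                       # p2's clock jumps past t, preserving causality
print(p1.time, p2.time)             # 1 2
```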

Challenges in Coordination and Agreement


1. Network Partitions: Distributed systems often experience network partitions, where groups
of nodes become isolated. This makes it challenging to achieve global consensus.

2. Fault Tolerance vs. Consistency: Achieving agreement often requires trading off between
consistency, availability, and partition tolerance (CAP theorem). Consensus protocols typically
favor consistency over availability in case of network partitions.

3. Latency: Communication delays can cause inconsistencies and make agreement difficult to
reach quickly, especially in geographically distributed systems.

4. Scalability: As the number of nodes increases, achieving agreement becomes more complex
and time-consuming. Protocols must be designed to scale efficiently.

5. Security and Trust: In some distributed systems, nodes may act maliciously. Byzantine fault-
tolerant protocols are necessary in such environments, but they are complex and resource-
intensive.

Applications of Coordination and Agreement

• Distributed Databases: Ensure data consistency and transaction atomicity.
• Blockchain and Cryptocurrencies: Use consensus protocols (e.g., Proof of Work, PBFT) to
validate transactions and secure the network.
• Cloud Services and Microservices: Achieve fault tolerance and data consistency across
distributed instances.
• Telecommunications Networks: Manage network resources and ensure reliable data
delivery.

Coordination and agreement are essential for ensuring that distributed systems operate reliably
and consistently. Through consensus algorithms, fault tolerance mechanisms, and protocols for
resource allocation, distributed systems can achieve coordination even in complex environments.
The field continues to evolve, addressing challenges of scalability, security, and efficiency, with
newer algorithms and models like Raft, Paxos, and BFT being applied to large-scale applications
across industries.

Distributed Mutual Exclusion


Distributed Mutual Exclusion is a fundamental problem in distributed systems where multiple
nodes (or processes) need to coordinate access to shared resources in such a way that only one
node can access the resource at a time. This is critical in distributed environments where there is no
centralized authority, yet there is a need to prevent race conditions and ensure data consistency.

In centralized systems, mutual exclusion can be achieved with locking mechanisms. However, in
distributed systems, the lack of a single coordinating process and the unreliable nature of network
communication add complexity to the problem.
Key Requirements for Distributed Mutual Exclusion

1. Mutual Exclusion: Only one node should be allowed to enter the critical section (access the
shared resource) at any given time.

2. Freedom from Deadlock: The system should prevent situations where two or more nodes
wait indefinitely for each other to release the critical section.

3. Freedom from Starvation: Every request to enter the critical section should eventually be
granted to avoid indefinite waiting.

4. Fault Tolerance: The system should continue to function even if some nodes or network
connections fail.

5. Fairness: Requests should be granted in the order they are made, or based on some fairness
criterion, to prevent any node from having an unfair advantage.

Algorithms for Distributed Mutual Exclusion

Several algorithms have been developed to address distributed mutual exclusion, each with its own
advantages and trade-offs. Here are some widely used approaches:

1. Centralized Algorithm

• In the centralized approach, a single node acts as a coordinator.

• When a node wants to enter the critical section, it sends a request to the coordinator.

• The coordinator grants the request if no other node is in the critical section. Otherwise, it
queues the request.

• Pros: Simple to implement, with a low message overhead.

• Cons: The coordinator is a single point of failure and may become a performance bottleneck
in large systems.

2. Ring-Based Algorithm
• Structure: Nodes are arranged in a logical ring topology, where each node is connected to
two neighboring nodes, forming a circular structure. There is no central coordinator, and
communication is typically unidirectional.

• Token Passing: A token (a special control message) circulates around the ring. A node must
possess the token to access the shared resource. When a node wants to access the resource,
it waits for the token, processes its request, and then passes the token to the next node in
the ring.

• Pros:

o Simplicity: The algorithm is relatively simple to implement, as it relies on
straightforward token-passing semantics, making it easy for nodes to manage access
to shared resources.

o Fairness: Each node gets an equal opportunity to access the resource, as the token is
passed in a circular manner, ensuring no node is starved of access.

• Cons:

o Message Overhead: The need to pass the token around the entire ring can lead to
increased message overhead, especially if the number of nodes is large or if the token
is far from the requesting node.

o Single Point of Failure: If the token is lost or if a node fails while holding the token,
the entire system can be disrupted, requiring mechanisms for token recovery or
regeneration.

o Latency: The time it takes for the token to circulate can lead to higher latency,
particularly if nodes are far apart in the ring.
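
The sketch below simulates token passing around a logical ring in a single process (a real system would carry the token in network messages). The node names, the pending-request set, and the fixed number of circulation rounds are illustrative assumptions.

```python
# Simulation of token passing around a logical ring for mutual exclusion.
# A node may enter its critical section only while it holds the token.
def circulate_token(ring, wants_cs, rounds=1):
    holder = 0                                   # index of the node holding the token
    for _ in range(rounds * len(ring)):
        node = ring[holder]
        if node in wants_cs:
            print(node, "enters critical section")   # only the token holder may enter
            wants_cs.discard(node)
        holder = (holder + 1) % len(ring)        # pass the token to the next node

circulate_token(ring=["N1", "N2", "N3", "N4"], wants_cs={"N2", "N4"})
```
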
Elections in Coordination and Agreement in Distributed Systems

Elections in the context of coordination and agreement in distributed systems are crucial for
achieving consensus among multiple nodes, especially when determining a leader or coordinator
for tasks such as resource management, decision-making, and fault tolerance. Below is an overview
of the election process and its significance in distributed systems:

Purpose: Elections are used to select a coordinator or leader node among a set of distributed nodes.
The selected leader is responsible for coordinating activities, managing resources, and ensuring that
consensus is reached among the nodes.

Common Election Algorithms:

1.Bully Algorithm:

Mechanism: Each node has a unique identifier. When a node wants to initiate an election, it sends
an election message to nodes with higher identifiers. If no higher identifier responds, it declares
itself the leader.

Pros: Simple and straightforward; ensures a single leader is elected.

Cons: High message overhead and can be inefficient in large systems due to multiple messages
being sent.
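
A compact sketch of the Bully algorithm's core rule is given below: a node becomes leader only if no live node with a higher identifier responds, otherwise the highest live node eventually wins. The integer node IDs and the alive() callback are illustrative stand-ins for real election messages and timeouts.

```python
# Sketch of the Bully election outcome: the highest-ID live node becomes leader.
def bully_election(initiator, all_nodes, alive):
    higher = [n for n in all_nodes if n > initiator and alive(n)]
    if not higher:
        return initiator                 # no live node with a higher ID: I am leader
    # Otherwise the highest live node eventually declares itself coordinator.
    return max(higher)

nodes = [1, 2, 3, 4, 5]
crashed = {5}                            # suppose node 5 has failed
leader = bully_election(initiator=2, all_nodes=nodes, alive=lambda n: n not in crashed)
print("leader:", leader)                 # 4
```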

2.Ring Algorithm:

Mechanism: Nodes are arranged in a logical ring. When a node wants to start an election, it sends
a message containing its identifier around the ring. Nodes update the message with their identifiers
and forward it until it returns to the initiating node, which then selects the highest identifier as the
leader.

Pros: Reduces message complexity compared to the Bully Algorithm; efficient for systems with many
nodes.

Cons: Can have higher latency due to the message needing to circulate the entire ring.

3.Paxos Algorithm:

Mechanism: Involves multiple phases where nodes propose values, agree on a value, and commit
to that value. A leader is elected based on the highest proposal number.

Pros: Highly fault-tolerant and can handle network partitions.

Cons: Complex to implement and understand; can have performance bottlenecks.

4.Raft Consensus Algorithm:

Mechanism: Organizes nodes into a leader and followers. The leader handles all client requests, and
elections occur if the leader fails. Nodes vote for candidates based on log completeness.

Pros: Easier to understand and implement than Paxos; provides strong consistency guarantees.

Cons: Requires more stable leader election processes, which can be a challenge in dynamic
environments.
Challenges in Election:

• Fault Tolerance: The system must be able to handle node failures gracefully. Elections should
ensure that a new leader can be elected if the current leader fails.
• Network Partitions: In the case of network splits, different parts of the system may elect
different leaders. The system must reconcile these states when the network is restored.
• Scalability: As the number of nodes increases, the election process must remain efficient to
prevent performance degradation.

Applications:

• Distributed Databases: To ensure consistency and coordination of transactions.


• Resource Management: To allocate resources fairly among nodes.
• Fault Recovery: To re-establish coordination after a failure.

Elections play a vital role in achieving coordination and agreement in distributed systems. By
selecting a leader, the system can effectively manage resources, ensure consistent state across
nodes, and provide fault tolerance. The choice of election algorithm can significantly impact the
system's performance, scalability, and resilience. Understanding these dynamics is essential for
designing robust distributed applications.

Multicast communication
The term “multicast” refers to a method of sending a single message to a group of recipients while
making efficient use of available network bandwidth and conserving system resources. In other
words, multicast communication is a technique that delivers packets from one source to many
receivers simultaneously.

Many everyday applications, such as audio/video conferencing, online gaming, and IPTV, rely on
multicast communication.

Multicasting can be regarded as a selective form of broadcasting: it operates like broadcasting but
sends information only to selected or targeted network participants. The same task could be
accomplished by sending a separate copy to each user or node, but doing so wastes bandwidth and
can increase network latency. Multicasting addresses these drawbacks by allowing a single
transmission to be shared among numerous receivers.

Ethernet Multicast

Ethernet multicast is multicasting at the data link layer in Ethernet networks. Ethernet frames are
sent to a group of destination devices that share a common multicast MAC address. These frames are
identified by setting the least significant bit of the first byte of the destination address to 1, which
differentiates them from unicast frames.

IP Multicast

IP multicast is a communication method that allows one-to-many communication over an IP
network. Instead of sending multiple individual unicast packets to each recipient, the sender sends a
single packet that is replicated by routers in the network to reach multiple receivers. This reduces
network traffic and conserves bandwidth. Destination nodes indicate their interest in receiving
multicast messages by sending join and leave messages to the network routers. This way, the sender
only needs to transmit the data packet once, and the routers take care of replicating and delivering it
to the intended recipients.

A multicast group consists of all hosts that have been configured to receive packets on a specific
address.

Multicast Groups

When a host is configured to receive datagrams sent to a multicast address, it is added to the
multicast group for that address.

A group may contain anywhere from one to an unlimited number of hosts. The list of individual
group members is not maintained by the hosts or by routers.

A host can belong to several multicast groups and can send multicast messages to various multicast
addresses.

A host can send datagrams to a multicast group address even if there are no members in that group,
and a host does not need to be a member of a group to send multicast datagrams to that group.
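
For example, a host can join an IP multicast group as a receiver using standard socket options, as in the sketch below. The group address 224.1.1.1 and port 5007 are arbitrary examples, not addresses prescribed by the text.

```python
# Joining an IP multicast group as a receiver using standard socket options.
import socket
import struct

MCAST_GRP, MCAST_PORT = "224.1.1.1", 5007   # example group address and port

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", MCAST_PORT))

# IP_ADD_MEMBERSHIP tells the kernel (and, via IGMP, the local router) that this
# host wants to receive datagrams sent to the group address.
mreq = struct.pack("4sl", socket.inet_aton(MCAST_GRP), socket.INADDR_ANY)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

data, addr = sock.recvfrom(1024)      # blocks until a multicast datagram arrives
print("received", data, "from", addr)
```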

Note: Multicast packets are forwarded between networks by multicast-capable routers; within a local network, switches relay them to attached hosts.

Multicasting on the Internet

A router uses IGMP (Internet Group Management Protocol) to check whether any hosts on a locally
connected network are configured to accept multicast datagrams.

The router periodically listens for IGMP messages and sends queries on the local subnet, using the
multicast group address 224.0.0.1 (reserved for all hosts).

Multicast routers do not keep a record of which hosts are members of a group; they only need to
know whether any host on that subnet is part of a group.

If a router gets a multicast datagram from another network and does not have any members for that
group address on any of its subnets, the packet is dropped.
Unit IV Part-II
Transactions and Replications in Distributed Systems: An Introduction

In distributed systems, transactions and replication are two fundamental concepts that enhance the
reliability, consistency, and availability of data across multiple nodes. They address challenges posed
by network latency, node failures, and concurrent access to shared resources.

Transactions

A transaction is a sequence of operations performed as a single logical unit of work. Transactions
are designed to ensure data integrity, consistency, and durability in the presence of failures. They
are governed by the ACID properties:

• Atomicity: Transactions are all-or-nothing. If any part of the transaction fails, the entire
transaction is aborted, and changes are rolled back.

• Consistency: A transaction brings the system from one valid state to another, maintaining
database invariants.

• Isolation: Transactions execute independently, and the intermediate states of a transaction
are invisible to other transactions.

• Durability: Once a transaction is committed, its changes are permanent, even in the case of
a system failure.

Importance

• Data Integrity: Ensures that data remains accurate and consistent across distributed nodes.

• Concurrency Control: Manages simultaneous access to data by multiple transactions,
preventing conflicts and ensuring isolation.

• Failure Recovery: Provides mechanisms to recover from failures, maintaining a consistent
state.

Replication

Replication involves creating copies of data across multiple nodes in a distributed system. The
primary goal of replication is to enhance data availability, fault tolerance, and performance by
distributing the data across various locations.

Types of Replication

1.Synchronous Replication:

Updates are made to all replicas simultaneously. This ensures strong consistency but can introduce
latency, as the system must wait for all replicas to acknowledge the update.

2.Asynchronous Replication:

Updates are propagated to replicas after the primary operation completes. This improves
performance and reduces latency but may lead to temporary inconsistencies among replicas.

3.Quorum-Based Replication:
Requires a majority (or quorum) of replicas to agree on an update before it is considered committed.
This balances consistency and availability.
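
The quorum idea can be checked with a couple of inequalities: with N replicas, read and write quorum sizes R and W must satisfy R + W > N so that every read quorum overlaps every write quorum (and 2W > N so that two writes cannot succeed concurrently on disjoint replica sets). The small sketch below, with example numbers, is illustrative.

```python
# Quorum rule check: read and write quorums must overlap (R + W > N), and two
# write quorums must overlap (2W > N) to avoid conflicting concurrent writes.
def quorum_ok(n_replicas, read_quorum, write_quorum):
    return (read_quorum + write_quorum > n_replicas) and (2 * write_quorum > n_replicas)

print(quorum_ok(n_replicas=5, read_quorum=2, write_quorum=4))   # True
print(quorum_ok(n_replicas=5, read_quorum=2, write_quorum=3))   # False: a read may miss the latest write
```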

Importance

• Fault Tolerance: If one node fails, other replicas can continue to provide access to data,
ensuring system resilience.

• Load Balancing: Distributing read operations among multiple replicas can enhance
performance and reduce the load on any single node.

• High Availability: Replication helps ensure that data is always accessible, even in the event
of node failures or network partitions.

Interaction Between Transactions and Replication

• Consistency Models: When transactions are used in conjunction with replication, consistency
models (like eventual consistency, strong consistency, and causal consistency) dictate how
replicas synchronize and maintain data integrity.

• Conflict Resolution: In replicated environments, concurrent transactions may lead to
conflicts. Mechanisms such as versioning, timestamps, and distributed locking are employed
to resolve these conflicts.

• Distributed Transactions: Transactions that span multiple nodes introduce additional
complexity, requiring coordination protocols (like Two-Phase Commit) to ensure that all
participating nodes either commit or abort the transaction.

Transactions and replication are essential components of distributed systems, enabling reliable and
consistent data management across multiple nodes. By understanding and effectively implementing
these concepts, distributed systems can achieve higher levels of availability, fault tolerance, and
performance. As distributed applications become more prevalent, the need for robust transaction
management and effective replication strategies will continue to grow.

System Model:
The system model for managing replicated data in distributed systems consists of clients, front
ends, and replica managers. Clients make requests to access or modify data, which are handled by
front ends that act as intermediaries. The front ends send these requests to replica managers, which
store the actual copies of the data. Replica managers communicate with each other to keep the data
consistent across all replicas. This model is designed to ensure data availability, consistency, and
fault tolerance in distributed environments.

Role of Group Communication:


Group communication is essential in replicated data systems as it enables synchronization and
consistency across replicas. It allows replica managers to send updates to each other through
broadcast or multicast methods, ensuring that changes made in one replica are propagated to all
others. This communication model helps maintain data consistency, provides fault tolerance by
managing failures within replica managers, and enforces operation ordering to prevent conflicts.
Group communication ensures that all replicas remain coordinated, reliable, and up-to-date in
distributed systems.

This system model is designed to ensure data consistency across multiple copies, or "replicas," stored
in different locations. It typically consists of three main components:

• Clients (C): The clients are the entities that make requests to the system to either retrieve or
update data. They don’t interact directly with the replica managers but instead go through
an intermediary.

• Front Ends (FE): The front ends act as intermediaries between the clients and the replica
managers. They receive requests from clients, forward these requests to the appropriate
replica managers, and relay the responses back to the clients. Front ends play a crucial role in
ensuring requests are handled efficiently and that clients receive consistent data.

• Replica Managers (RM): The replica managers are responsible for storing the actual copies
(replicas) of the data. They communicate with each other to keep their replicas synchronized.
In case of updates, each replica manager must ensure that the change is propagated to other
replica managers to maintain data consistency. The network of replica managers is called the
"Service."

Group communication is fundamental to managing replicated data in distributed systems. Here’s
how it works and why it’s essential:

• Ensures Consistency: When an update is made to a replica, group communication protocols
ensure that all replica managers receive the update. This is typically achieved through
techniques like broadcast or multicast, where messages are sent to multiple replica managers
simultaneously.

• Failure Tolerance: Group communication helps to manage the failure of individual replica
managers. If one replica manager becomes unreachable, the others can still continue
operating, ensuring the system remains available to serve client requests. Additionally, group
communication can help detect failures and redistribute requests to healthy replicas.
• Ordering of Operations: In distributed systems, the order in which updates are applied
across replicas is crucial for consistency. Group communication protocols often enforce
ordering guarantees (like total order or causal order), ensuring that all replicas process
updates in the same sequence, which prevents inconsistencies from arising due to out-of-
order updates.

• Coordination and Synchronization: When multiple clients try to update the same data
simultaneously, the replica managers need to coordinate to determine the final state of the
data. Group communication protocols help in achieving this by enabling synchronized
communication between replica managers, so they can agree on the update to be applied.

This system model uses group communication to enable efficient and consistent data replication,
ensuring that multiple copies of data remain synchronized and resilient to failures. Group
communication protocols provide the necessary infrastructure for replication, data consistency, and
fault tolerance across distributed replica managers.

Concurrency control in distributed transactions


Concurrency control in distributed transactions ensures that multiple transactions can occur
simultaneously in a distributed system without leading to inconsistencies in the data. Here’s an
overview of the primary methods used for concurrency control:

1. Locking

Locking is a widely used technique to control access to data in distributed transactions by restricting
simultaneous access to resources.

Mechanism: In a locking mechanism, a transaction must acquire a lock on the data it wants to read
or write. There are typically two types of locks:

• Shared Lock (Read Lock): Multiple transactions can hold a shared lock on a resource,
allowing them to read but not write to it.
• Exclusive Lock (Write Lock): Only one transaction can hold an exclusive lock on a resource,
preventing other transactions from reading or writing to it.

Two-Phase Locking (2PL): To prevent conflicts, many distributed systems use the two-phase
locking protocol, which has two phases:

• Growing Phase: The transaction acquires all the locks it needs.


• Shrinking Phase: After a transaction releases a lock, it cannot acquire any more locks.

Benefits and Limitations: Locking provides strong consistency but can lead to deadlocks (where
two or more transactions are waiting for each other’s locks) and reduced concurrency due to lock
contention.
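
The minimal lock-manager sketch below shows the shared/exclusive compatibility rule just described; queuing of blocked requests, deadlock handling, and distribution across servers are deliberately omitted, and the class and transaction names are illustrative.

```python
# Minimal lock manager illustrating shared (read) vs. exclusive (write) locks.
class LockManager:
    def __init__(self):
        self._locks = {}    # item -> (mode, set of holder transaction ids)

    def acquire(self, txn, item, mode):
        held = self._locks.get(item)
        if held is None:
            self._locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":
            holders.add(txn)                     # shared locks are compatible
            return True
        if holders == {txn}:
            # Same transaction re-requests the item: keep the stronger mode.
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self._locks[item] = (new_mode, holders)
            return True
        return False                             # conflict: caller must wait or abort

    def release(self, txn, item):
        mode, holders = self._locks.get(item, (None, set()))
        holders.discard(txn)
        if not holders:
            self._locks.pop(item, None)

lm = LockManager()
print(lm.acquire("T1", "x", "S"))   # True
print(lm.acquire("T2", "x", "S"))   # True: both transactions may read
print(lm.acquire("T3", "x", "X"))   # False: a write conflicts with the readers
```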
2. Timestamp Ordering Concurrency Control

Timestamp ordering is a non-locking concurrency control technique that uses timestamps to
manage access to data.

Mechanism: Each transaction is assigned a unique timestamp when it starts. Transactions are
ordered based on their timestamps, and operations are executed according to this order:

If a transaction tries to read or write data that another newer transaction has accessed, it may be
aborted and restarted to maintain timestamp order.

Types:

Basic Timestamp Ordering: Ensures that conflicting operations are executed according to the
timestamp order of transactions.

Multiversion Timestamp Ordering (MVTO): Keeps multiple versions of a data item, each
associated with the timestamp of the transaction that created it. This allows transactions to read
different versions of the same data, increasing concurrency.

Benefits and Limitations: Timestamp ordering reduces the risk of deadlocks and increases
concurrency. However, it may lead to frequent transaction restarts if timestamps conflict,
especially in high-contention environments.
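
A minimal sketch of the basic timestamp-ordering rule is shown below (hypothetical names, single data item): each item remembers the largest read and write timestamps that have accessed it, and any operation arriving "too late" causes its transaction to be aborted and restarted with a new timestamp.

```python
class AbortTransaction(Exception):
    """The transaction must be aborted and restarted with a new timestamp."""

class DataItem:
    def __init__(self, value=None):
        self.value = value
        self.read_ts = 0     # largest timestamp that has read this item
        self.write_ts = 0    # largest timestamp that has written this item

    def read(self, ts):
        if ts < self.write_ts:                       # a younger transaction already wrote it
            raise AbortTransaction(f"read by ts={ts} rejected (write_ts={self.write_ts})")
        self.read_ts = max(self.read_ts, ts)
        return self.value

    def write(self, ts, value):
        if ts < self.read_ts or ts < self.write_ts:  # a younger transaction already read or wrote it
            raise AbortTransaction(f"write by ts={ts} rejected")
        self.write_ts = ts
        self.value = value

# Usage: an older transaction (ts=1) writing after a younger one (ts=2) is aborted
x = DataItem(0)
x.write(2, 10)
try:
    x.write(1, 99)
except AbortTransaction as e:
    print(e)
```
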

3. Optimistic Concurrency Control (OCC)

Optimistic concurrency control assumes conflicts are rare and allows transactions to execute without
restrictions until they are ready to commit.

Mechanism: OCC divides a transaction into three phases:

• Read Phase: The transaction reads data items without acquiring any locks.
• Validation Phase: Before committing, the transaction checks if other concurrent transactions
have modified the data it has read. If conflicts are detected, the transaction is aborted and
restarted.
• Write Phase: If the transaction passes validation, it writes its changes to the database.

Benefits and Limitations: OCC is beneficial in environments with low contention, as it allows high
concurrency without the need for locks. However, in systems with high contention, OCC can lead to
frequent rollbacks, as transactions may fail validation more often.
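
The following sketch (simplified, hypothetical API) illustrates the three OCC phases with backward validation: the transaction buffers its writes and records what it reads, and at commit time it is aborted if any item in its read set was changed by another committed transaction after it started.

```python
class ValidationFailure(Exception):
    pass

class OptimisticStore:
    def __init__(self):
        self.data = {}       # key -> value
        self.versions = {}   # key -> version number, bumped on every committed write

    def begin(self):
        # Remember the versions visible when the transaction starts
        return {"snapshot": dict(self.versions), "reads": set(), "writes": {}}

    def read(self, txn, key):             # read phase: no locks, just remember what was read
        txn["reads"].add(key)
        return self.data.get(key)

    def write(self, txn, key, value):     # writes are buffered locally until commit
        txn["writes"][key] = value

    def commit(self, txn):                # validation phase, then write phase
        for key in txn["reads"]:
            if self.versions.get(key, 0) != txn["snapshot"].get(key, 0):
                raise ValidationFailure(f"{key} changed since the transaction started")
        for key, value in txn["writes"].items():
            self.data[key] = value
            self.versions[key] = self.versions.get(key, 0) + 1

# Usage: T2 fails validation because T1 committed a change to x that T2 had read
store = OptimisticStore()
store.data["x"], store.versions["x"] = 10, 1
t1, t2 = store.begin(), store.begin()
store.read(t2, "x")
store.write(t2, "y", 5)
store.write(t1, "x", 20)
store.commit(t1)          # T1 commits first
try:
    store.commit(t2)      # T2's read of x is now stale, so it is aborted and would be restarted
except ValidationFailure as e:
    print(e)
```
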
Distributed deadlock
Distributed deadlock occurs in a distributed system when two or more transactions or processes,
running on different servers, wait indefinitely for resources held by each other, forming a circular
dependency. This makes it impossible for any of the transactions involved to proceed, leading to a
system stall.

Types of Distributed Deadlock:

There are two types of deadlocks in distributed systems:

Resource Deadlock: A resource deadlock occurs when two or more processes wait permanently for resources held by each other.

• A process requires certain resources for its execution and cannot proceed until it has acquired all of them.
• It proceeds with execution only once every required resource has been granted.
• This can be modeled as an AND condition, since the process executes only if it holds all the required resources.
• Example: Process 1 holds R1 and R2 and requests R3. It cannot execute if any one of them is missing; it proceeds only when it has acquired all the requested resources, i.e., R1, R2, and R3.

Communication Deadlock: A communication deadlock occurs among a set of processes when each is blocked waiting for messages from other processes in the set before it can proceed, yet there are no messages in transit between them. When no messages are in transit between any pair of processes in the set, none of the processes will ever receive a message, so every process in the set is deadlocked. Communication deadlocks can be modeled with wait-for graphs (WFGs) that indicate which processes are waiting to receive messages from which other processes. Hence, communication deadlocks can be detected in the same manner as deadlocks in systems that have only one unit of each resource type.
Example of Distributed Deadlock:

Consider three servers X, Y, and Z managing objects A, B, C, and D.

Three transactions U, V, and W are operating on these objects:

• Transaction U running on server X locks object A and wants to access B, which is locked by V.
• Transaction V running on server Y locks object B and wants to access C, which is locked by W.
• Transaction W running on server Z locks object C and wants to access A, which is locked by U.

Circular Dependency:

U → V → W → U creates a cycle, where:

U waits for V (for B).

V waits for W (for C).

W waits for U (for A).

This cycle indicates a distributed deadlock where none of the transactions can proceed unless
one of them is aborted and releases its lock.

Resolving Distributed Deadlock:

To resolve a distributed deadlock, one of the following methods is used:

Timeouts: Transactions are aborted if they wait too long.

Deadlock Detection: Servers communicate to build a global wait-for graph and detect cycles,
aborting one of the transactions involved in the cycle.
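
Deadlock detection can be sketched as cycle detection on the global wait-for graph. The fragment below (hypothetical; it assumes each server has already reported its local waits-for edges, and that each transaction waits for at most one other) merges the local edges and finds the cycle U → V → W → U from the example above.

```python
def find_cycle(wait_for):
    """Depth-first walk over a wait-for graph {txn: txn_it_waits_for}; returns a cycle or None."""
    for start in wait_for:
        seen, node = [], start
        while node in wait_for:
            if node in seen:
                return seen[seen.index(node):]   # e.g. ['U', 'V', 'W']
            seen.append(node)
            node = wait_for[node]
    return None

# Local edges reported by servers X, Y and Z (from the example above)
local_edges = [{"U": "V"}, {"V": "W"}, {"W": "U"}]
global_wfg = {}
for edges in local_edges:
    global_wfg.update(edges)

cycle = find_cycle(global_wfg)
print(cycle)   # ['U', 'V', 'W'] -> abort one of these transactions to break the deadlock
```
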
Transaction Recovery in Distributed Systems

In a distributed system, transactions are a sequence of operations that need to be executed in a consistent, reliable, and fault-tolerant manner. A transaction typically involves multiple nodes or processes, and these operations must either complete successfully or fail entirely to ensure data integrity. If a failure occurs during a transaction (e.g., due to network issues, hardware failures, or crashes), the system must be able to recover from it, ensuring that no partial, inconsistent state is left behind.

Transaction recovery refers to the process of restoring a system to a consistent state after a
failure during or after the execution of a transaction. It ensures that a transaction either commits
(completes successfully) or aborts (reverts all changes), and that no partial results are visible to
the system.

Transaction Properties: ACID

For a distributed transaction recovery to work, the transaction must adhere to the ACID (Atomicity,
Consistency, Isolation, Durability) properties:

Atomicity: The entire transaction is treated as a single unit. If any part of the transaction fails, the
entire transaction fails.

Consistency: The system transitions from one consistent state to another. Transactions should bring
the system to a valid state according to predefined rules.

Isolation: The effects of one transaction should be isolated from other concurrent transactions until
the transaction is complete.

Durability: Once a transaction is committed, its effects are permanent, even in the event of a crash.

The goal of transaction recovery is to ensure that, despite failures, transactions will always leave the
system in a consistent state, either by successfully committing or by aborting and undoing any
partial changes made.

Types of Failures in Transaction Recovery

System Crash: When the system or a process crashes unexpectedly.

Media Failure: Failure of disk or storage devices where transaction logs and data are stored.

Network Partition: Communication failure between different nodes or components of a distributed system.

Node Crash: A node in the distributed system crashes, leading to data inconsistency or the
transaction not being fully processed.

Transaction Recovery Techniques

To achieve transaction recovery, several mechanisms and techniques are employed in distributed
systems. These include logging, checkpointing, and two-phase commit protocols. Below are the
primary techniques used for transaction recovery:
1. Write-Ahead Logging (WAL)

Write-Ahead Logging (WAL) ensures that before any changes are made to the actual data, the
changes are first written to a log. This log records the sequence of transaction operations and their
effects. If a failure occurs, the system can use this log to determine which operations were completed
and which were not.

The recovery process involves:

Redoing: Reapplying the changes from the log that were committed but not yet reflected in the
system (i.e., recovering committed transactions).

Undoing: Reverting the changes of uncommitted transactions to ensure the system is consistent.
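
A minimal write-ahead-log sketch is shown below (hypothetical record format; a real log also tracks pages, log sequence numbers, and checkpoints): every update is logged with its old and new values before the data is changed, and recovery redoes the updates of committed transactions and undoes those of uncommitted ones.

```python
# Each log record is (txn, kind, key, old_value, new_value); kind is "update", "commit" or "abort"
def recover(log):
    data = {}
    committed = {record[0] for record in log if record[1] == "commit"}

    # Redo pass: reapply updates of committed transactions, in log order
    for txn, kind, *rest in log:
        if kind == "update" and txn in committed:
            key, _old, new = rest
            data[key] = new

    # Undo pass: roll back updates of uncommitted transactions, newest first
    for txn, kind, *rest in reversed(log):
        if kind == "update" and txn not in committed:
            key, old, _new = rest
            data[key] = old
    return data

# Usage: T1 committed, T2 did not (the crash happened before its commit record was written)
log = [
    ("T1", "update", "x", 0, 10),
    ("T2", "update", "y", 5, 99),
    ("T1", "commit"),
]
print(recover(log))   # {'x': 10, 'y': 5} -> T1 is redone, T2 is undone
```
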

2. Two-Phase Commit Protocol (2PC)

The Two-Phase Commit Protocol (2PC) is an atomic commitment protocol used to ensure that a transaction is either fully committed across all participating nodes or fully aborted. It operates in two phases:

Prepare Phase: The coordinator sends a "prepare" message to all participants (nodes), asking them
if they are ready to commit the transaction.

Commit/Abort Phase: If all participants respond with "yes," the coordinator sends a "commit"
message to all participants, and they make the transaction permanent. If any participant responds
with "no," the coordinator sends an "abort" message, and all participants roll back the transaction.

2PC guarantees atomicity but has limitations, such as blocking in case of failures (if the coordinator
or participants crash).
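
The coordinator's decision logic can be sketched as follows (hypothetical participant interface; messaging, timeouts, and crash recovery are omitted): the coordinator commits only if every participant votes "yes," forces the decision to its log, and then broadcasts it.

```python
class Participant:
    """A hypothetical participant that votes in phase 1 and applies the decision in phase 2."""
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit, self.state = name, can_commit, "active"

    def prepare(self):
        # A real participant would force its changes and its vote to stable storage here
        return "yes" if self.can_commit else "no"

    def finish(self, decision):
        self.state = decision          # "commit" or "abort"

def two_phase_commit(coordinator_log, participants):
    # Phase 1 (prepare): ask every participant whether it can commit
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(v == "yes" for v in votes) else "abort"

    # Force the decision to the coordinator's log before telling anyone (crash safety)
    coordinator_log.append(decision)

    # Phase 2 (commit/abort): broadcast the decision
    for p in participants:
        p.finish(decision)
    return decision

# Usage: a single "no" vote forces a global abort
log = []
nodes = [Participant("A"), Participant("B", can_commit=False)]
print(two_phase_commit(log, nodes))        # abort
print([(p.name, p.state) for p in nodes])  # every participant ends in the same state
```
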

3. Three-Phase Commit Protocol (3PC)

Three-Phase Commit Protocol (3PC) improves upon 2PC by addressing its blocking problem. In
3PC, there is an additional phase after the "prepare" phase, allowing for more robust recovery in the
event of failures.

Phase 1 (CanCommit): The coordinator asks all participants whether they can commit, and each participant votes "yes" or "no."

Phase 2 (PreCommit): If every participant votes "yes," the coordinator sends a "pre-commit" message, and the participants acknowledge that they are prepared to commit. If any participant votes "no" or fails to respond, the coordinator sends an "abort" message instead.

Phase 3 (DoCommit): Once the coordinator has received the acknowledgments, it sends the final "commit" message, and all participants make the transaction permanent.

This protocol reduces blocking by introducing a "safe" pre-commit state after the second phase: every surviving participant already knows the group's decision, so the protocol can be completed even if the coordinator or a participant crashes.
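
A simplified sketch of the 3PC message flow is shown below (hypothetical classes; the timeout and recovery handling that is the heart of the real protocol is omitted). The extra pre-commit round means every participant learns that the group agreed to commit before anyone actually commits, which is what allows the survivors to finish without blocking on a crashed coordinator.

```python
class Node:
    """Hypothetical participant; a real one would log every state transition."""
    def __init__(self, ok=True):
        self.ok, self.state = ok, "init"
    def can_commit(self):  return self.ok
    def pre_commit(self):  self.state = "pre-committed"
    def do_commit(self):   self.state = "committed"
    def abort(self):       self.state = "aborted"

def three_phase_commit(participants):
    # Phase 1 (CanCommit): collect votes
    if not all(p.can_commit() for p in participants):
        for p in participants:
            p.abort()
        return "abort"

    # Phase 2 (PreCommit): everyone learns that the group agreed to commit
    for p in participants:
        p.pre_commit()

    # Phase 3 (DoCommit): apply the commit
    for p in participants:
        p.do_commit()
    return "commit"

# Usage
print(three_phase_commit([Node(), Node()]))          # commit
print(three_phase_commit([Node(), Node(ok=False)]))  # abort
```
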

4. Compensation or Forward Recovery

In some cases, instead of rolling back a transaction to a previous state (backward recovery),
compensation or forward recovery is used. In forward recovery, once an error is detected, the
system does not revert to an earlier state but instead makes corrective actions to move forward and
maintain consistency.

For instance, if a transaction partially commits and later encounters an error, the system might apply
compensating transactions to neutralize the effects of the failed transaction.
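
As a sketch of compensation (a hypothetical, saga-style example rather than any specific product's API): each completed step registers a compensating action, and if a later step fails, the registered compensations are executed in reverse order instead of rolling the system back to an old state.

```python
def run_with_compensation(steps):
    """steps: list of (do_action, compensating_action) pairs."""
    done = []
    try:
        for do, compensate in steps:
            do()
            done.append(compensate)      # remember how to neutralize this step
    except Exception as err:
        print(f"step failed ({err}); applying compensations")
        for compensate in reversed(done):
            compensate()                 # move forward by cancelling the effect, not the history

# Usage: a hypothetical booking that reserves a seat and then fails to charge the card
def reserve_seat():  print("seat reserved")
def release_seat():  print("seat released (compensation)")
def charge_card():   raise RuntimeError("payment declined")
def refund_card():   print("refund issued (compensation)")

run_with_compensation([(reserve_seat, release_seat), (charge_card, refund_card)])
```
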
5. Distributed Snapshot and Global State

Distributed Snapshot is a technique used to create a consistent global state of all nodes in a
distributed system at a certain point in time. During recovery, these snapshots can be used to restore
the system’s state to a consistent point in time, undoing any incomplete or erroneous transactions.

This technique is particularly useful in distributed systems with multiple processes that need to be
synchronized to prevent inconsistent states.
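
The sketch below is a heavily simplified, two-process illustration in the spirit of the Chandy-Lamport marker algorithm (hypothetical class names; FIFO channels assumed): the initiator records its own state and sends markers, each process records its state on the first marker it sees, and messages that arrive on a channel before that channel's marker are recorded as "in transit" so the global snapshot stays consistent.

```python
from collections import deque

class Channel:
    """A FIFO channel between two processes."""
    def __init__(self):
        self.queue = deque()
    def send(self, msg):
        self.queue.append(msg)
    def recv(self):
        return self.queue.popleft()

class Process:
    def __init__(self, name, balance):
        self.name, self.balance = name, balance
        self.snapshot = None       # recorded local state
        self.recording = {}        # incoming channels still being recorded -> messages seen
        self.channel_state = {}    # finished channel recordings

    def start_snapshot(self, out_channels, in_channels):
        self.snapshot = self.balance
        self.recording = {ch: [] for ch in in_channels}
        for ch in out_channels:
            ch.send("MARKER")

    def handle(self, ch, msg, out_channels, in_channels):
        if msg == "MARKER":
            if self.snapshot is None:                  # first marker: record state, relay markers
                self.start_snapshot(out_channels, in_channels)
            self.channel_state[ch] = self.recording.pop(ch, [])  # recording on this channel ends
            return
        if ch in self.recording:
            self.recording[ch].append(msg)             # message was in transit at snapshot time
        if msg[0] == "transfer":
            self.balance += msg[1]                     # normal processing continues regardless

# Usage: P starts a snapshot while a transfer from Q is still in transit on channel Q -> P
pq, qp = Channel(), Channel()               # pq: P -> Q, qp: Q -> P
p, q = Process("P", 100), Process("Q", 50)
qp.send(("transfer", 10))                                       # already in flight
p.start_snapshot(out_channels=[pq], in_channels=[qp])
q.handle(pq, pq.recv(), out_channels=[qp], in_channels=[pq])    # Q records its state, relays marker
p.handle(qp, qp.recv(), out_channels=[pq], in_channels=[qp])    # transfer recorded as in transit
p.handle(qp, qp.recv(), out_channels=[pq], in_channels=[qp])    # Q's marker ends the recording
print(p.snapshot, q.snapshot, p.channel_state[qp])              # 100 50 [('transfer', 10)]
```
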

Replication

Replication is the process of duplicating data, operations, or services across multiple systems or
locations to improve performance, reliability, and fault tolerance. In the context of distributed
systems or databases, replication ensures that copies of data are consistently maintained across
several nodes, so that even if one node fails, the system can continue functioning normally by relying
on the other copies.

There are two primary types of replication strategies: active replication and passive replication.
Both have distinct characteristics and are used in different scenarios based on system requirements,
such as fault tolerance, consistency, and performance.

1. Active Replication

Active replication, also known as state machine replication, involves multiple replicas (or copies) of a service that are all actively involved in processing requests. In this model, all replicas perform the same operations concurrently and are kept synchronized in real time.

Key characteristics of Active Replication:

Concurrency: All replicas are active and handle requests simultaneously, meaning that multiple
replicas are processing operations at the same time.

Fault Tolerance: If one or more replicas fail, the system can continue to operate as long as there are
other replicas that can process the requests. There’s no single point of failure.

Consistency: In active replication, it is crucial that all replicas maintain a consistent state. This is
typically achieved through consensus algorithms or synchronization techniques like Paxos or Raft,
which ensure that all replicas agree on the sequence of operations.

Performance: Active replication can improve performance in read-heavy workloads by distributing the load across multiple replicas. However, write operations must be coordinated across replicas to maintain consistency.

Examples of Active Replication:

Distributed databases where read queries can be served by any replica, but write queries must be
synchronized across all replicas to maintain consistency.

Load-balancing systems where requests are distributed across multiple servers, each of which
processes requests concurrently.
2. Passive Replication

In passive replication, also known as primary-backup or master-slave replication, there is typically one active replica (the primary or master), while the other replicas (the secondaries or slaves) are passive and act as backups. The passive replicas take over only if the primary replica fails.

Key characteristics of Passive Replication:

Single Primary Replica: Only one replica, the "primary," handles all requests and updates to the
system.

Failover Mechanism: If the primary replica fails, one of the passive replicas (usually the most up-
to-date one) is promoted to be the new primary. This failover process ensures that the system can
continue operating.

Consistency and Availability: Passive replication ensures consistency since the primary replica is
the only one making updates to the system. However, it might have lower availability compared to
active replication, as there is downtime during the failover process.

Performance: Passive replication can be more efficient in write-heavy workloads, as only one node
handles write requests, reducing synchronization overhead. However, read performance can be
improved if read requests are distributed to passive replicas.

Examples of Passive Replication:

Database systems where one master database handles all write operations, and read queries can
be distributed to read-only replicas.

Distributed file systems where there is a master node for updates, but backup nodes (passive
replicas) store copies of the data.
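
A failover in passive replication can be sketched as follows (a hypothetical in-memory model, not any real database's replication protocol): the primary applies every write and propagates it to the backups, and when the primary fails, the most up-to-date backup is promoted.

```python
class Replica:
    def __init__(self, name):
        self.name, self.data, self.applied = name, {}, 0   # applied = number of updates applied

    def apply(self, key, value):
        self.data[key] = value
        self.applied += 1

class PassiveGroup:
    def __init__(self, primary, backups):
        self.primary, self.backups = primary, backups

    def write(self, key, value):
        self.primary.apply(key, value)     # only the primary handles writes...
        for b in self.backups:
            b.apply(key, value)            # ...and propagates them to the backups

    def read(self, key):
        return self.primary.data.get(key)  # reads could also be served by the backups

    def fail_over(self):
        """Promote the most up-to-date backup when the primary fails."""
        new_primary = max(self.backups, key=lambda r: r.applied)
        self.backups = [r for r in self.backups if r is not new_primary]
        self.primary = new_primary
        return new_primary.name

# Usage: after the primary fails, a backup takes over without losing the committed write
group = PassiveGroup(Replica("primary"), [Replica("backup1"), Replica("backup2")])
group.write("x", 1)
print(group.fail_over())       # e.g. backup1 becomes the new primary
print(group.read("x"))         # 1
```
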

Comparison Between Active and Passive Replication:

• Replica Activity: In active replication, all replicas are active and process requests; in passive replication, only one replica (the primary) processes requests while the others remain passive.
• Fault Tolerance: Active replication is high and can tolerate multiple failures; passive replication is lower, since only one replica is active and a failover is required when it fails.
• Consistency: Active replication achieves consistency through coordination (e.g., consensus algorithms); passive replication is easier to keep consistent, as only the primary replica updates the data.
• Performance: Active replication is good for read-heavy workloads, but write synchronization can be slow; passive replication is good for write-heavy workloads, but failover can introduce delays.
• Availability: Active replication remains highly available even during failures; in passive replication, availability may drop during failover if the primary fails.
• Use Cases: Active replication suits distributed databases, load balancing, and cloud services; passive replication suits master-slave databases and backup systems.
