Unit 4
Replication in distributed systems involves creating multiple copies of data or services across
different nodes to improve reliability, availability, and performance. System models in
replication typically describe the behavior, consistency, and architectural aspects of
replication. These models can be broadly categorized as follows:
1. Replication Architectures
These models define how replicas are organized and managed.
Master-Slave Model
o One master replica handles all write operations and propagates changes to
slave replicas.
o Slave replicas handle read requests.
o Advantage: Simplified conflict resolution.
o Disadvantage: Single point of failure at the master.
Multi-Master Model
o Multiple replicas can accept write operations.
o Updates are propagated and synchronized between masters.
o Advantage: Improved availability and fault tolerance.
o Disadvantage: Requires complex conflict resolution mechanisms.
Peer-to-Peer Model
o All replicas are equal, and any replica can serve read and write requests.
o Replicas coordinate to maintain consistency.
o Advantage: High fault tolerance and scalability.
o Disadvantage: Synchronization overhead.
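To make the master-slave model above concrete, here is a minimal sketch (class and method names are illustrative, not from any particular system): the master applies each write locally and propagates it to every slave, while slaves serve reads only.

```python
# Minimal sketch of the master-slave replication model.
# The master handles all writes and pushes them to slaves;
# slaves handle only reads. Names are illustrative.

class Slave:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):          # called by the master only
        self.data[key] = value

    def read(self, key):                  # slaves serve read requests
        return self.data.get(key)

class Master:
    def __init__(self, slaves):
        self.data = {}
        self.slaves = slaves

    def write(self, key, value):
        self.data[key] = value            # apply locally first
        for s in self.slaves:             # then propagate to every slave
            s.apply(key, value)

slaves = [Slave(), Slave()]
master = Master(slaves)
master.write("x", 42)
print(all(s.read("x") == 42 for s in slaves))  # every replica sees the write
```

Because only the master accepts writes, there are no write-write conflicts to resolve, which is exactly the simplification (and the single point of failure) noted above.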
2. Consistency Models
Consistency models specify the guarantees provided about the visibility of updates across
replicas.
Strong Consistency
o All replicas reflect the same data at any given time.
o Ensured through protocols like Two-Phase Commit (2PC) or Paxos.
o Advantage: Simplified programming model.
o Disadvantage: High latency and lower availability.
Eventual Consistency
o Replicas may diverge temporarily but will eventually converge to the same
state.
o Suitable for systems with high availability needs, like DNS.
o Advantage: High performance and scalability.
o Disadvantage: Temporary inconsistency can lead to anomalies.
Causal Consistency
o Guarantees that causally related operations are seen in the same order by all
replicas.
o Advantage: Reduces anomalies while being more efficient than strong
consistency.
o Disadvantage: Requires maintaining causal history, which can be complex.
Quorum-Based Consistency
o Operations must be acknowledged by a majority (quorum) of replicas to
succeed.
o Used in systems like Apache Cassandra and Amazon DynamoDB.
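A minimal sketch of the quorum idea, assuming N = 5 replicas with write quorum W = 3 and read quorum R = 3 (all names and values are illustrative): because R + W > N, every read quorum overlaps every write quorum, so a read always contacts at least one replica holding the latest acknowledged write.

```python
# Toy quorum-based read/write over N replicas, with versioned values.
# A write succeeds once W replicas acknowledge it; a read consults R
# replicas and returns the value with the highest version.

N, W, R = 5, 3, 3
replicas = [{"version": 0, "value": None} for _ in range(N)]

def write(value, version):
    acks = 0
    for rep in replicas:
        if acks == W:                 # stop once the write quorum is reached
            break
        rep["version"], rep["value"] = version, value
        acks += 1
    return acks >= W

def read():
    contacted = replicas[-R:]         # any R replicas; overlap with W is guaranteed
    newest = max(contacted, key=lambda rep: rep["version"])
    return newest["value"]

write("v1", version=1)
print(read())  # -> v1, because R + W > N forces quorum overlap
```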
3. Failure Models
These models address how systems handle faults during replication.
Crash Failure Model
o Nodes may stop functioning but do not exhibit arbitrary behavior.
o Replication systems handle such failures using techniques like failover.
Byzantine Failure Model
o Nodes may behave maliciously or unpredictably.
o Requires robust protocols like Byzantine Fault Tolerance (BFT) to maintain
correctness.
Network Partition Model
o Network failures may isolate parts of the system, leading to partition
tolerance challenges.
o The CAP theorem highlights the trade-off between Consistency, Availability,
and Partition tolerance.
Group communication in distributed systems refers to the process where multiple nodes or entities
communicate with each other as a group.
This method is essential for coordinating actions, sharing data, and ensuring that all
participants in the system are informed and synchronized. It’s particularly useful in scenarios
like collaborative applications and real-time updates.
Group communication is critically important in distributed systems due to several key reasons:
Multiple nodes must collaborate and synchronize their actions. Group communication helps
them exchange information and stay updated.
Different nodes can create data that needs to be shared. Group communication helps quickly
send this information to everyone involved, reducing delays and keeping data consistent.
1. Unicast Communication
Unicast communication is the point-to-point transmission of data between two nodes in a network.
In the context of distributed systems:
Unicast is when a sender sends a message to a specific recipient, using their unique network
address.
Each message targets one recipient, creating a direct connection between the sender and
the receiver.
You commonly see unicast in client-server setups, where a client makes requests and
receives responses, as well as in direct connections between peers.
This method makes good use of network resources, is easy to implement, and keeps latency
low because messages go straight to the intended recipient.
Unicast isn’t efficient for sending messages to many recipients at once, as it requires
separate messages for each one, leading to more work.
2. Multicast Communication
Multicast communication involves sending a single message from one sender to multiple receivers
simultaneously within a network. It is particularly useful in distributed systems where broadcasting
information to a group of nodes is necessary:
Multicast lets a sender share a message with a specific group of receivers that have
joined the group. This way, the sender can reach many receivers at once, which is more
efficient than sending separate messages.
By sending data just once to a group, multicast saves bandwidth, simplifies communication,
and can easily handle a larger number of recipients.
Managing group membership is necessary to ensure reliable message delivery, and multicast
can run into issues if there are network problems that affect everyone in the group.
3. Broadcast Communication
Broadcast communication involves sending a message from one sender to all nodes in the network,
ensuring that every node receives the message:
Broadcast is when a sender sends a message to every node in the network without targeting
specific recipients.
Messages are delivered to all nodes at once using a special address designed for this
purpose.
It’s often used for network management tasks, like sending status updates, or for emergency
alerts that need to reach everyone quickly.
Broadcast ensures that every node receives the message without needing to specify who the
recipients are, making it efficient for sharing information widely.
It can cause network congestion in larger networks and raises security concerns since anyone
on the network can access the broadcast message, which might lead to unauthorized access.
Messages sent from a sender to multiple recipients should be delivered reliably, consistently, and in
a specified order. Types of Reliable Multicast Protocols include:
FIFO Ordering:
o Ensures that messages are delivered to all group members in the order they were
sent by the sender.
Causal Ordering:
o Ensures that messages are delivered in an order that respects the causal
dependencies between them.
Total (Atomic) Ordering:
o Guarantees that all group members receive messages in the same global order.
o Ensures that operations based on the multicast messages (like updates to shared
data) appear atomic or indivisible to all recipients.
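The FIFO ordering described above can be sketched with per-sender sequence numbers (a simplified illustration; all names are made up): each sender numbers its messages, and a receiver delivers message n+1 from a sender only after delivering message n, buffering anything that arrives early.

```python
# Toy FIFO-ordered delivery: out-of-order messages from a sender are
# buffered until the gap before them is filled.

class FifoReceiver:
    def __init__(self):
        self.next_seq = {}     # per-sender: next sequence number expected
        self.buffer = {}       # per-sender: held-back out-of-order messages
        self.delivered = []

    def receive(self, sender, seq, msg):
        expected = self.next_seq.setdefault(sender, 1)
        self.buffer.setdefault(sender, {})[seq] = msg
        # deliver as many consecutive messages as are now available
        while expected in self.buffer[sender]:
            self.delivered.append(self.buffer[sender].pop(expected))
            expected += 1
        self.next_seq[sender] = expected

r = FifoReceiver()
r.receive("p1", 2, "b")       # arrives early: buffered, not delivered
r.receive("p1", 1, "a")       # fills the gap, releasing both messages
print(r.delivered)            # -> ['a', 'b']
```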
Scalability and performance are vital for effective group communication in distributed systems. It’s
essential for the system to handle more nodes, messages, and participants while still operating
efficiently. Here’s a closer look at these important aspects:
1. Scalability
Scalability refers to the system’s ability to grow without losing efficiency. This includes:
Handling more messages exchanged among group members, ensuring timely and responsive
communication.
Supporting connections across distant nodes or networks, which can introduce latency and
bandwidth issues.
2. Challenges in Scalability
The management of group membership and message routing becomes more complex,
adding overhead.
The network must have enough bandwidth to support the higher traffic from a larger group
to avoid congestion.
Keeping distributed nodes consistent and synchronized gets more complicated as the system
scales.
3. Strategies for Scalability
Partitioning and Sharding: Breaking the system into smaller parts can make communication
and management more manageable.
Load Balancing: Evenly distributing the workload across nodes helps prevent bottlenecks and
optimizes resource use.
Replication and Caching: Duplicating data or messages across nodes can lower access times
and enhance fault tolerance, aiding scalability.
Performance in group communication is crucial for optimizing message speed, resource utilization,
and addressing challenges like message ordering and concurrent access, ensuring efficient
collaboration in distributed systems.
1. Performance
It’s crucial to minimize the time it takes for messages to reach their intended recipients.
We want to enhance the rate at which messages are handled and sent out.
Efficient use of bandwidth, CPU, and memory helps keep communication fast and effective.
2. Challenges in Performance
Making sure messages arrive in the right order can be tough, especially with strict protocols.
Handling multiple users trying to access shared resources at the same time can lead to
slowdowns.
Communication needs to adjust based on changing conditions, like slow bandwidth or lost
packets.
3. Strategies for Performance
Smart Routing: Using efficient routing methods can reduce delays by cutting down on the
number of hops messages take.
Pre-storing Data: Keeping frequently accessed messages or data ready can help lower delays
and speed up responses.
Group communication in distributed systems comes with several challenges due to the need to
coordinate multiple nodes that may be spread out or connected through unreliable networks. Key
challenges include:
Reliability: Messages must reach all intended recipients, even during network failures or
node crashes, which can be complicated when nodes frequently join or leave.
Concurrency and Consistency: Keeping shared data consistent while allowing simultaneous
updates is tricky, requiring strong synchronization to avoid conflicts.
Fault Tolerance: The system must handle node failures and communication issues without
losing reliability. This means having mechanisms to detect failures and manage changes in
group membership.
In distributed systems, three types of problems occur: faults, errors, and failures. All three
are related.
Fault: A fault is a weakness or shortcoming in the system or in any hardware or
software component. The presence of a fault can lead to an error and then to a failure.
Error: An error is the part of the system state that deviates from the correct state as a
result of a fault and that may lead to a failure.
Failure: A failure is the outcome in which the assigned goal is not achieved.
Types of Faults
Transient Faults: Transient faults occur once and then disappear. These faults do not
harm the system to a great extent but are very difficult to find or locate. A momentary
processor fault is an example of a transient fault.
Intermittent Faults: Intermittent faults occur again and again: the fault appears, vanishes
on its own, and then reappears. A working computer that occasionally hangs is an example
of an intermittent fault.
Permanent Faults: Permanent faults remain in the system until the faulty component is
replaced. These faults can cause very severe damage to the system but are easy to
identify. A burnt-out chip is an example of a permanent fault.
1. Availability: Availability is defined as the property where the system is readily available for its
use at any time.
2. Reliability: Reliability is defined as the property where the system can work continuously
without any failure.
3. Safety: Safety is defined as the property where the system can remain safe from
unauthorized access even if any failure occurs.
4. Maintainability: Maintainability is defined as the property that states how easily and
quickly the failed node or system can be repaired.
In order to implement the techniques for fault tolerance in distributed systems, the design,
configuration and relevant applications need to be considered. Below are the phases carried out for
fault tolerance in any distributed systems.
1. Fault Detection
Fault detection is the first phase, in which the system is monitored continuously and the
outcomes are compared with the expected output. If any faults are identified during
monitoring, they are reported. These faults can occur for various reasons such as hardware
failure, network failure, and software issues. The main aim of the first phase is to detect
faults as soon as they occur so that the assigned work is not delayed.
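The fault-detection phase can be sketched with a simple heartbeat monitor (all timing values and node names are illustrative): each node reports periodically, and the monitor flags any node whose last report is older than a timeout.

```python
# Toy heartbeat-based fault detector: a node is suspected faulty if its
# last heartbeat is older than `timeout` seconds.

def detect_faults(last_heartbeat, now, timeout):
    """Return the nodes whose heartbeat is older than `timeout` seconds."""
    return [node for node, t in last_heartbeat.items() if now - t > timeout]

last_heartbeat = {"node-a": 100.0, "node-b": 94.0, "node-c": 99.5}
suspects = detect_faults(last_heartbeat, now=101.0, timeout=5.0)
print(suspects)  # -> ['node-b']  (no report for 7 seconds)
```

A real detector must distinguish a crashed node from a slow network; timeouts only make the suspicion, which is why the diagnosis phase below follows detection.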
2. Fault Diagnosis
Fault diagnosis is the process where the fault that is identified in the first phase will be diagnosed
properly in order to get the root cause and possible nature of the faults. Fault diagnosis can be done
manually by the administrator or by using automated Techniques in order to solve the fault and
perform the given task.
3. Evidence Generation
Evidence generation is defined as the process where the report of the fault is prepared based on the
diagnosis done in an earlier phase. This report involves the details of the causes of the fault, the
nature of faults, the solutions that can be used for fixing, and other alternatives and preventions that
need to be considered.
4. Assessment
Assessment is the process where the damages caused by the faults are analyzed. It can be
determined with the help of messages that are being passed from the component that has
encountered the fault. Based on the assessment further decisions are made.
5. Recovery
Recovery is the process whose aim is to make the system fault free and restore it to a
correct state, through forward recovery or backward recovery. Common recovery
techniques such as reconfiguration and resynchronization can be used.
1. Hardware Fault Tolerance: Hardware Fault Tolerance involves keeping a backup plan for
hardware devices such as memory, hard disk, CPU, and other hardware peripheral devices.
Hardware Fault Tolerance is a type of fault tolerance that does not examine faults and
runtime errors but can only provide hardware backup. The two different approaches that are
used in Hardware Fault Tolerance are fault-masking and dynamic recovery.
2. Software Fault Tolerance: Software Fault Tolerance is a type of fault tolerance where
dedicated software is used in order to detect invalid output, runtime, and programming
errors. Software Fault Tolerance makes use of static and dynamic methods for detecting and
providing the solution. Software Fault Tolerance also consists of additional data points such
as recovery rollback and checkpoints.
3. System Fault Tolerance: System Fault Tolerance is a type of fault tolerance that consists of a
whole system. It has the advantage that it not only stores the checkpoints but also the
memory block, and program checkpoints and detects the errors in applications
automatically. If the system encounters any type of fault or error it does provide the required
mechanism for the solution. Thus system fault tolerance is reliable and efficient.
Fault Tolerance Strategies
Fault tolerance strategies are essential for ensuring that distributed systems continue to operate
smoothly even when components fail. Here are the key strategies commonly used:
Checkpointing and Recovery Mechanisms
o Checkpointing: Periodic saving of the system’s state so that if a failure occurs, the
system can be restored to the last saved state.
o Rollback Recovery: The system reverts to a previous state after detecting an error,
using saved checkpoints or logs.
o Forward Recovery: The system attempts to correct or compensate for the failure to
continue operating. This may involve reprocessing or reconstructing data.
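Checkpointing with rollback recovery can be sketched as follows (a toy example; class and method names are illustrative): the system saves its state periodically and, on error, reverts to the last checkpoint instead of restarting from scratch.

```python
# Toy checkpoint/rollback: state is saved at a checkpoint and restored
# after a simulated fault.
import copy

class CheckpointedCounter:
    def __init__(self):
        self.state = {"total": 0}
        self.checkpoint_state = copy.deepcopy(self.state)

    def checkpoint(self):
        self.checkpoint_state = copy.deepcopy(self.state)  # save known-good state

    def rollback(self):
        self.state = copy.deepcopy(self.checkpoint_state)  # revert to it

    def add(self, n):
        if not isinstance(n, int):
            raise ValueError("bad input")
        self.state["total"] += n

c = CheckpointedCounter()
c.add(10)
c.checkpoint()                 # save a known-good state
try:
    c.add(5)
    c.add("oops")              # simulated fault mid-operation
except ValueError:
    c.rollback()               # revert to the last saved state
print(c.state["total"])        # -> 10
```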
Design patterns for fault tolerance help in creating systems that can handle failures gracefully and
maintain reliable operations. Here are some key fault tolerance design patterns:
1. Circuit Breaker Pattern
This pattern prevents a system from making calls to a failing service by wrapping it in a “circuit
breaker.” When the service fails, the circuit breaker trips, causing further calls to fail fast instead of
repeatedly trying to connect to a failing service.
Useful in scenarios where services might experience temporary outages. For example, a microservices
architecture where a downstream service might be unreliable.
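A minimal sketch of the circuit breaker pattern, with an illustrative failure threshold (names are made up): after a threshold of consecutive failures the breaker “trips,” and later calls fail fast without touching the unreliable service.

```python
# Toy circuit breaker: trips to the open state after `threshold`
# consecutive failures; open-state calls fail fast.

class CircuitOpenError(Exception):
    pass

class CircuitBreaker:
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0
        self.open = False

    def call(self, fn):
        if self.open:
            raise CircuitOpenError("failing fast: circuit is open")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.open = True       # trip the breaker
            raise
        self.failures = 0              # any success resets the count
        return result

def flaky():
    raise RuntimeError("downstream service unavailable")

breaker = CircuitBreaker(threshold=2)
for _ in range(2):
    try:
        breaker.call(flaky)
    except RuntimeError:
        pass
print(breaker.open)  # -> True: further calls now raise CircuitOpenError
```

Production breakers also add a “half-open” state that periodically lets one probe call through to test whether the service has recovered; that is omitted here for brevity.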
2. Bulkhead Pattern
This pattern isolates different components or services to prevent a failure in one part of the system
from affecting others. It’s similar to the bulkheads in a ship that prevent flooding in one
compartment from sinking the entire vessel.
Essential in systems where failures in one service should not impact others. For instance, an e-
commerce platform might use bulkhead isolation to separate payment processing from inventory
management.
3. Retry Pattern
This pattern involves automatically retrying an operation that has failed due to transient errors. The
retries are typically done with exponential backoff to avoid overwhelming the system.
Suitable for scenarios where operations might fail intermittently due to temporary issues like network
glitches or service overloads.
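The retry pattern with exponential backoff can be sketched like this (names are illustrative; the sleep function is injectable so the backoff schedule is visible without real delays):

```python
# Toy retry with exponential backoff: the wait doubles after each
# failed attempt (base_delay, 2*base_delay, 4*base_delay, ...).

def retry(op, attempts=4, base_delay=0.1, sleep=None):
    delays = []
    for i in range(attempts):
        try:
            return op(), delays
        except Exception:
            if i == attempts - 1:
                raise                      # out of attempts: give up
            wait = base_delay * (2 ** i)   # 0.1, 0.2, 0.4, ...
            delays.append(wait)
            if sleep:
                sleep(wait)

calls = {"n": 0}
def sometimes_fails():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient glitch")
    return "ok"

result, delays = retry(sometimes_fails)
print(result, delays)  # -> ok [0.1, 0.2]
```

Real implementations usually also add random jitter to the delays so that many clients retrying at once do not synchronize into load spikes.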
4. Rate Limiting Pattern
This pattern controls the number of requests a system or service can handle within a specific time
window to prevent overload and ensure fair usage.
Essential for APIs and services that might be susceptible to abuse or excessive traffic. It helps in
maintaining system stability and performance.
5. Failover Pattern
This pattern involves switching to a backup system or component when the primary one fails. It
ensures continuity of service by having redundant systems ready to take over.
Distributed Shared Memory (DSM) implements the shared memory model in a distributed
system that has no physically shared memory. The DSM model provides a virtual address
space shared among all nodes, overcoming the high cost of communication in distributed
systems. DSM systems move data to the location of access: data moves between main
memory and secondary memory (within a node) and between the main memories of
different nodes. Every data object is owned by a node; the initial owner is the node that
created the object, and ownership can change as the object moves from node to node.
When a process accesses data in the shared address space, the mapping manager maps the
shared memory address to physical memory (local or remote).
DSM allows programs running on separate machines to share data without the programmer
having to deal with sending messages; instead, the underlying runtime sends the messages
needed to keep the DSM consistent between machines. DSM also allows programs that
used to run on the same computer to be easily adapted to run on separate machines.
Programs access what appears to them to be ordinary memory. Hence, programs that use
DSM are usually shorter and easier to understand than programs that use message passing.
However, DSM is not suitable for all situations. Client-server systems are generally less
suited to DSM, although a server may be used to help provide DSM functionality for data
shared between clients.
The architecture of a Distributed Shared Memory (DSM) system typically consists of several key
components that work together to provide the illusion of a shared memory space across distributed
nodes. The components of the architecture of Distributed Shared Memory are:
1. Nodes: Each node in the distributed system consists of one or more CPUs and a memory unit.
These nodes are connected via a high-speed communication network.
2. Memory Mapping Manager Unit: The memory mapping manager routine in each node is
responsible for mapping the local memory onto the shared memory space. This involves dividing the
shared memory space into blocks and managing the mapping of these blocks to the physical memory
of the node.
Caching is employed to reduce operation latency. Each node uses its local memory to cache portions
of the shared memory space. The memory mapping manager treats the local memory as a cache for
the shared memory space, with memory blocks as the basic unit of caching.
3. Communication Network Unit: This unit facilitates communication between nodes. When a process
accesses data in the shared address space, the memory mapping manager maps the shared memory
address to physical memory. The communication network unit handles the communication of data
between nodes, ensuring that data can be accessed remotely when necessary.
A layer of code, either implemented in the operating system kernel or as a runtime routine, is
responsible for managing the mapping between shared memory addresses and physical memory
locations.
Each node’s physical memory holds pages of the shared virtual address space. Some pages are local
to the node, while others are remote and stored in the memory of other nodes.
In summary, the architecture of a DSM system includes nodes with CPUs and memory, a memory
mapping manager responsible for mapping local memory to the shared memory space, caching
mechanisms to reduce latency, a communication network unit for inter-node communication, and a
mapped layer to manage the mapping between shared memory addresses and physical memory
locations.
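The memory mapping manager's address translation can be illustrated with a toy sketch, assuming a made-up block size and ownership map (a real DSM would fetch or migrate remote blocks rather than just naming their owner):

```python
# Toy DSM address mapping: the shared address space is split into
# fixed-size blocks, each owned by a node; the manager resolves a
# shared address to (owning node, block, offset).

BLOCK_SIZE = 4096
owner_of_block = {0: "node-a", 1: "node-b", 2: "node-a"}  # block -> owning node

def map_address(shared_addr):
    """Map a shared-memory address to the node holding it and the offset."""
    block = shared_addr // BLOCK_SIZE
    offset = shared_addr % BLOCK_SIZE
    node = owner_of_block[block]       # a real DSM would fetch/migrate here
    return node, block, offset

print(map_address(5000))  # -> ('node-b', 1, 904)
```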
Distributed systems are a collection of independent computers that appear to their users as a single
coherent system. These computers collaborate to achieve a common goal and communicate over a
network.
What is Consistency?
Consistency is a fundamental property of distributed systems that ensures that all replicas of a
shared data store have the same value. In a distributed system, data is typically replicated across
multiple nodes to improve availability and fault tolerance. However, maintaining consistency across
all replicas can be challenging, especially in the presence of concurrent updates and network delays.
When multiple processes access shared resources concurrently, where the order of execution is not
deterministic, the problem arises due to the need for maintaining consistency across these
processes. This challenge is further complicated by the fact that distributed systems are prone to
various types of failures, such as message losses, network delays, and crashes, which can lead to
inconsistencies in the execution order.
There are several consistency models that can be used in distributed systems to ensure
that shared resources are accessed in a consistent order.
These models define the level of consistency that is guaranteed for shared resources in the
presence of concurrent access by multiple processes.
Every concurrent execution of a program should ensure that the execution results are in some
sequential order. It provides an intuitive model of how the distributed systems should behave, which
makes it easier for users to understand and reason about the behavior of the system.
Sequential consistency is a consistency model that ensures that the order in which operations are
executed by different processes in the system appears to be consistent with a global order of
execution. It ensures that the result of any execution of a distributed system is the same as if the
operations of all the processes in the system were executed in some sequential order, one after the
other.
Below is an example of sequential consistency in a distributed system with three processes, P1, P2,
and P3, interacting with a shared variable X. The processes perform the following operations:
o P1: W(X)5 — writes the value 5 to X.
o P3: W(X)10 — writes the value 10 to X.
o P2: R(X), R(X) — reads X twice.
Sequential consistency requires that the result of the operations appear as if they were executed in
some global sequence that respects the program order of operations within each process. Possible
sequential orders are:
Sequential Order 1:
o P2: R(X) — Read 5 (as 5 was the last write before this read).
o P2: R(X) — Read 10 (as 10 was the last write before this read).
Consistency Check:
o The second read by P2 returns 10, which is the value written by P3.
o This order maintains sequential consistency as it reflects a global sequence where
operations are ordered as if they occurred one after the other.
Sequential Order 2:
o P2: R(X) — Read 10 (as 10 was the latest write before this read).
o P2: R(X) — Read 10 (as 10 was the latest write before this read).
Consistency Check:
o The first read by P2 returns 10, reflecting that it reads the value written by P3, which
was the latest write.
o The second read by P2 also returns 10, consistent with the latest write.
o This order is valid because it maintains a consistent view of the latest write
operation, reflecting a valid global order.
The following techniques can be used to implement sequential consistency in distributed systems:
Two-Phase Locking:
o Once a process acquires a lock, it holds the lock until it has completed its operation.
o Two-phase locking ensures that no two processes can access the same data at the
same time, which can prevent inconsistencies in the data.
Timestamp Ordering:
o The system ensures that all operations are performed in the order of their
timestamps.
o Timestamp ordering ensures that the order of operations is preserved across all
nodes.
Quorum-based Replication:
o To ensure consistency, the system requires that a majority of nodes agree on the
value of the data.
o Quorum-based replication ensures that the data is replicated across different nodes,
and inconsistencies in the data are prevented.
Vector Clocks:
o Vector clocks are a technique used to ensure that operations are performed in the
same order across all nodes.
o In this technique, each node maintains a vector clock that contains the timestamp of
each operation performed by the node.
o The system ensures that all operations are performed in the order of their vector
clocks.
o Vector clocks ensure that the order of operations is preserved across all nodes.
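A minimal sketch of vector clocks (function names are illustrative): each node keeps a counter per node, increments its own entry on a local event, and merges element-wise maxima on message receipt; comparing vectors then yields the causal (“happened-before”) partial order.

```python
# Toy vector clocks: clocks are dicts mapping node name -> counter.

def local_event(clock, node):
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1       # tick own entry
    return clock

def on_receive(clock, node, sender_clock):
    # merge: element-wise max of the two clocks, then tick own entry
    merged = {k: max(clock.get(k, 0), sender_clock.get(k, 0))
              for k in set(clock) | set(sender_clock)}
    return local_event(merged, node)

def happened_before(a, b):
    keys = set(a) | set(b)
    return (all(a.get(k, 0) <= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) < b.get(k, 0) for k in keys))

p1 = local_event({}, "p1")              # p1 sends a message after this event
p2 = on_receive({}, "p2", p1)           # p2 receives it
print(happened_before(p1, p2))          # -> True: the send precedes the receive
```

Note that vector clocks give only a partial order: two events whose clocks are incomparable are concurrent, which is exactly the information causal consistency needs.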
Challenges in implementing sequential consistency include:
Network Latency: Communication between different nodes in a distributed system can take
time, and this latency can result in inconsistencies in the data.
Node Failure: In a distributed system, nodes can fail, which can result in data inconsistencies.
Concurrent Access: Concurrent access to shared data can lead to inconsistencies in the data.
Replication: Replication of data across different nodes can lead to inconsistencies in the
data.
Best practices for sequential consistency include:
Design for Global Order: Use global timestamps or logical clocks to order operations across
processes.
Utilize Consensus Algorithms: Apply consensus protocols like Two-Phase Commit (2PC) or
Three-Phase Commit (3PC) to agree on the order of transactions.
Monitor and Test Consistency: Continuously monitor for consistency and perform regular
testing to validate that operations adhere to sequential consistency.
Use Logical Clocks: Implement Lamport timestamps or vector clocks to provide a partial
ordering of events.
Ivy is a multi-user read/write peer-to-peer file system. Ivy has no centralized or dedicated
components, and it provides useful integrity properties without requiring users to fully trust either
the underlying peer-to-peer storage system or the other users of the file system.
An Ivy file system consists solely of a set of logs, one log per participant. Ivy stores its logs in the
DHash distributed hash table. Each participant finds data by consulting all logs, but performs
modifications by appending only to its own log. This arrangement allows Ivy to maintain meta-data
consistency without locking. Ivy users can choose which other logs to trust, an appropriate
arrangement in a semi-open peer-to-peer system.
Ivy presents applications with a conventional file system interface. When the underlying network is
fully connected, Ivy provides NFS-like semantics, such as close-to-open consistency. Ivy detects
conflicting modifications made during a partition, and provides relevant version information to
application-specific conflict resolvers. Performance measurements on a wide-area network show
that Ivy is two to three times slower than NFS.
Difference between Write Invalidate Protocol & Write Update Protocol (Bus Snooping)
Cache coherence becomes an issue in today’s multiprocessor systems, where all the
processors use a common memory. The two key protocols that achieve this coherency
through bus snooping are the write invalidate protocol and the write update protocol. To
manage memory efficiently and achieve better performance, the differences between these
protocols should be made clear.
What is Cache Coherence Protocol?
To keep data consistent, cache coherence protocols are used. These protocols update
cache copies in multiprocessor systems. In bus-snooping mechanisms, processors
snoop on (monitor) the bus and take appropriate action on relevant events (such as a data
update) to ensure data consistency.
The two protocols usually used to update cache copies are:
1. Write-update protocol
2. Write-invalidate protocol
What is Write Invalidate Protocol?
Wherever multiple caches share a common memory, there must be a protocol that
keeps these caches in harmony when sharing data, and this is where the Write
Invalidate Protocol comes into play. When a processor wants to write to a certain
memory location, this protocol guarantees that any other cache in the system holding the
same memory line will invalidate its copy. This avoids the situation where other
processors read stale data from their caches and ensures that only one cache holds
the latest data.
Advantages of Write Invalidate Protocol
Reduces Bandwidth Usage: Only invalidation messages are transmitted on the bus,
and these are smaller than the actual data.
Simpler Implementation: The protocol is easy to implement because it simplifies
the problem of cache coherence.
Prevents Stale Data Access: By invalidating other caches’ copies, it ensures that no
processor reads old data from its own cache.
Disadvantages of Write Invalidate Protocol
Increased Cache Misses: Invalidating caches may result in more cache misses, in the
sense that other processors may have to read the data in from either the main
memory or from the cache of the writing processor.
Performance Overhead: Frequent invalidations can add latency when there is high
write contention.
What is Write Update Protocol?
The Write Update Protocol, also called the Write Broadcast Protocol, is another cache
coherence strategy applied in multiprocessor systems. Unlike the Write Invalidate Protocol,
on a write operation the cache sends the updated data to all other caches that hold that
memory line. This ensures that all the caches hold the new value and no cache copy has
to be invalidated.
Advantages of Write Update Protocol
Reduced Cache Misses: Because the updated data is available in all the caches,
they do not have to fetch it again, reducing the number of cache misses.
Improved Performance for Read-Heavy Workloads: Applications with frequent
reads benefit most, since the latest data is already in all the caches.
Consistent Data Across Caches: Ensures all caches have up-to-date data, thus
improving data synchronization.
Disadvantages of Write Update Protocol
Higher Bandwidth Consumption: Broadcasting updated data to all caches requires
more bus bandwidth than sending invalidation messages only.
Complex Implementation: Coordinating all the caches and ensuring that they hold
the correct data can be more challenging.
Potential for Increased Traffic: When multiple processors are involved, the volume of
update messages can grow large, causing congestion on the bus.
Difference between Write Invalidate Protocol & Write Update Protocol
1. Invalidations vs. broadcasts: With write invalidate, only one initial invalidation is
required when multiple writes to the same word are done with no intervening reads. With
write update, multiple write broadcasts are required in the same situation.
2. Granularity: With multiword cache blocks, write invalidate needs to generate an
invalidate only on the first write to any word in the block; the protocol works on cache
blocks. Write update requires a write broadcast each time a word in the block is written;
the protocol works on individual words (bytes).
3. Read latency after a write: Under write invalidate, the written data is not instantly
updated in the reader’s cache, so the delay between writing a word in one processor and
reading the written value in another processor is larger than with write update. Under
write update, the written data is instantly updated in the reader’s cache, so this delay is
normally shorter.
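The write invalidate behavior can be illustrated with a toy simulation (structure and names are made up): on a write, every other cache holding the line is invalidated, so a later read in another processor misses and re-fetches the fresh value.

```python
# Toy write-invalidate simulation with two CPU caches over one memory.

memory = {"X": 0}
caches = {"cpu0": {}, "cpu1": {}}

def read(cpu, addr):
    if addr not in caches[cpu]:            # cache miss: fetch from memory
        caches[cpu][addr] = memory[addr]
    return caches[cpu][addr]

def write(cpu, addr, value):
    caches[cpu][addr] = value
    memory[addr] = value
    for other, cache in caches.items():    # snoop: invalidate other copies
        if other != cpu:
            cache.pop(addr, None)

read("cpu1", "X")                          # cpu1 caches X = 0
write("cpu0", "X", 7)                      # invalidates cpu1's copy
print("X" in caches["cpu1"], read("cpu1", "X"))  # -> False 7 (miss, then fresh value)
```

This makes the extra cache miss noted above visible: cpu1 loses its copy on the invalidation and must re-read, whereas a write update protocol would have pushed the value 7 into cpu1’s cache directly.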
Conclusion
Write invalidate and write update protocols both keep caches coherent in multiprocessor
systems, but in contrasting ways. The Write Invalidate Protocol is generally more
bandwidth-efficient and simpler, so it is useful for systems where writes are comparatively
less common. The Write Update Protocol keeps data synchronized across different caches
and helps programs that do mostly reads, but this comes at the expense of more
bandwidth use and a larger implementation cost. The decision between them has to be
based on the workload of the particular system.
Munin
Munin is a distributed shared memory (DSM) system that implements release consistency
with optimizations for parallel applications. It’s designed to provide a balance between
performance and consistency by leveraging RC and a few additional techniques:
1. Multiple Consistency Protocols: Munin uses different protocols tailored to different
types of data. For example, read-mostly data might be handled differently from data
that is frequently updated, optimizing communication for specific access patterns.
2. Delayed Update Propagation: Using release consistency, Munin delays the
propagation of updates until synchronization points. This can make it more efficient
in distributed environments where immediate consistency isn’t necessary.
3. Lazy Release Consistency (LRC): Munin incorporates Lazy Release Consistency,
which further reduces overhead by delaying the propagation of updates until they
are explicitly needed, rather than at each release point.
4. Minimizing False Sharing: Munin also employs strategies to minimize false sharing,
where two processes frequently access different parts of the same memory page,
leading to unnecessary synchronization.
In summary, Munin was one of the first DSM systems to explore the release consistency
model, demonstrating that RC could enhance the performance of distributed systems by
minimizing synchronization costs and adjusting consistency protocols based on application
needs.
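The delayed update propagation of release consistency can be sketched as follows (a toy model in the spirit of Munin, not its actual API; all names are illustrative): writes inside a critical section are buffered locally and only pushed to other replicas at the release point.

```python
# Toy release-consistency replica: writes are buffered in `pending`
# and propagated to peers only at release (a synchronization point).

class RCReplica:
    def __init__(self, peers=()):
        self.data = {}
        self.pending = {}              # writes buffered until release
        self.peers = list(peers)

    def write(self, key, value):
        self.pending[key] = value      # no communication yet

    def release(self):
        # synchronization point: propagate all buffered updates at once
        self.data.update(self.pending)
        for peer in self.peers:
            peer.data.update(self.pending)
        self.pending = {}

other = RCReplica()
local = RCReplica(peers=[other])
local.write("a", 1)
local.write("b", 2)
print(other.data)                      # -> {}: updates not yet visible
local.release()
print(other.data)                      # -> {'a': 1, 'b': 2}
```

Batching the two writes into one propagation at release is exactly the communication saving that release consistency (and, more aggressively, lazy release consistency) exploits.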