DC - Notes

Designing distributed operating systems is complex due to the lack of complete system knowledge and challenges in resource management, synchronization, and scheduling. Key design issues include transparency, reliability, fault tolerance, flexibility, performance, scalability, heterogeneity, and security, each requiring specific strategies to ensure efficient and resilient operation. The document also discusses hardware concepts and classifications relevant to distributed systems, emphasizing the importance of effective communication and resource management.

ISSUES IN DISTRIBUTED SYSTEMS

Designing a distributed operating system is more challenging than designing a centralized
operating system due to several complexities. In a centralized system, the operating system has
complete and accurate information about the environment, ensuring consistency in
decision-making. However, a distributed operating system must function without complete
system knowledge, as resources are physically separated, there is no common clock, message
delivery can be delayed, and messages may even be lost.

These factors lead to difficulties in resource management, synchronization, and processor
scheduling, as the system lacks up-to-date and consistent state information. Despite these
challenges, a distributed operating system must still provide the advantages of a distributed
system, allowing users to perceive it as a virtual centralized system that is flexible, efficient,
reliable, secure, and user-friendly. To achieve this, designers must address several critical
design issues, which are explored in detail in subsequent discussions.

1.​ Transparency: One of the primary goals of a distributed operating system is to make
multiple computers appear as a single system to users, ensuring transparency. This
means a collection of distinct machines connected by a communication subsystem
should function as a virtual uniprocessor. Achieving complete transparency is
challenging and requires support for multiple aspects of transparency. According to the
International Standards Organization's Reference Model for Open Distributed
Processing [ISO 1992], the eight forms of transparency are access transparency,
location transparency, replication transparency, failure transparency, migration
transparency, concurrency transparency, performance transparency, and scaling
transparency.
2.​ Reliability: Distributed systems are generally expected to be more reliable than
centralized systems due to the availability of multiple resource instances. However,
simply having multiple instances does not guarantee reliability. The distributed operating
system must be designed effectively to leverage this characteristic and enhance system
reliability.
A fault is a mechanical or algorithmic defect that can lead to an error, ultimately causing
system failure. System failures can be categorized into two types: fail-stop failures
[Schlichting and Schneider, 1983] and Byzantine failures [Lamport et al., 1982]. In a
fail-stop failure, the system ceases to function but does so in a detectable manner. In
contrast, a Byzantine failure allows the system to continue running while producing
incorrect results, often caused by undetected software bugs, making it more challenging
to handle than fail-stop failures.
To achieve higher reliability, the fault-handling mechanisms in a distributed operating
system must be designed to prevent faults, tolerate faults, and detect and recover from
faults. Various methods are commonly used to address these challenges.
3.​ Fault Avoidance: Fault avoidance focuses on designing system components to
minimize the occurrence of faults. This is often achieved through conservative design
practices, such as using high-reliability components, to enhance system reliability. While
a distributed operating system has little direct influence on the fault avoidance capability
of hardware components, its software components must be thoroughly tested to ensure
high reliability. Effective testing and robust design of the software components play a
crucial role in reducing faults and improving overall system stability.
4.​ Fault Tolerance: Fault tolerance enables a system to function despite partial failures.
Two key approaches in distributed operating systems are:
1. Redundancy Techniques: Replicating critical components (processes, files, or
storage) prevents single points of failure. A system is k-fault tolerant with k+1
replicas for fail-stop failures and 2k+1 replicas for Byzantine failures (using majority
voting). However, redundancy increases system overhead, requiring a balance between
reliability and efficiency.
2. Distributed Control: Decentralizing services (file management, scheduling, name
resolution) avoids single points of failure. Independent servers ensure reliability,
preventing system-wide failures.
These strategies enhance resilience and ensure continued operation in distributed
systems; a small voting sketch is shown below.
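
The replica counts above can be checked with a small, illustrative sketch (not from the notes): majority_vote is a hypothetical helper that applies majority voting to the replies collected from replicas.

```python
# Minimal sketch: majority voting over replica replies.
# With 2k+1 replicas, up to k Byzantine (arbitrarily wrong) replies can be outvoted;
# with k+1 replicas, up to k fail-stop (silent) replicas still leave one correct reply.
from collections import Counter

def majority_vote(replies):
    """Return the value reported by a majority of replicas, or None if there is no majority."""
    if not replies:
        return None
    value, count = Counter(replies).most_common(1)[0]
    return value if count > len(replies) // 2 else None

# Example: k = 1 Byzantine failure tolerated with 2k + 1 = 3 replicas.
print(majority_vote([42, 42, 99]))   # -> 42 (the faulty reply 99 is outvoted)
print(majority_vote([42, 99, 7]))    # -> None (no majority; more replicas needed)
```
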
5. Fault Detection and Recovery: To enhance reliability, distributed operating systems use
fault detection and recovery techniques:
Atomic Transactions – Ensure operations occur entirely or not at all, preventing
inconsistent data states after failures. Transactions simplify crash recovery by maintaining
data integrity.
Stateless Servers – Unlike stateful servers, stateless servers do not retain client history,
making crash recovery simpler by eliminating complex state management.
Acknowledgments and Timeouts – Lost messages due to failures are detected via
acknowledgment messages and retransmissions. Sequence numbers help avoid duplicate
messages.
These mechanisms improve reliability but introduce system overhead, requiring a balance
between efficiency and fault tolerance.
6.​ Flexibility: Flexibility in distributed operating systems is crucial for ease of modification
and enhancement. A flexible design allows seamless updates to fix bugs, adapt to
changing environments, and incorporate new functionalities. The choice of kernel model
significantly impacts flexibility—microkernel architecture offers higher modularity, making
it easier to modify and add services without system downtime. While it may introduce
slight performance overhead due to interprocess communication, its advantages in
maintainability, scalability, and customization outweigh this drawback. Modern distributed
OS designs prefer microkernels for their adaptability and user-centric configurability.
7.​ Performance: For a distributed system to be effective, its performance must match or
exceed that of a centralized system. Proper design of system components is essential to
avoid inefficiencies. Key principles for optimizing performance include:

Batch Processing – Sending data in large chunks and piggybacking acknowledgments improves
efficiency.

Caching – Storing frequently accessed data locally reduces network usage and speeds up
operations.

Minimizing Data Copying – Reducing unnecessary data transfers between memory and devices
decreases CPU overhead.
Reducing Network Traffic – Process migration and clustering minimize internode communication
costs.

Fine-Grained Parallelism – Using threads and concurrency mechanisms enhances
responsiveness and resource utilization.

Applying these strategies ensures better speed, lower latency, and improved overall system
performance.
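
As an illustration of the caching principle above, here is a minimal sketch; fetch_remote is a made-up stand-in for any expensive remote operation, and functools.lru_cache plays the role of the local cache.

```python
# Minimal sketch (illustrative only): a client-side cache that avoids repeated
# "remote" fetches, one of the performance principles above.
import functools

REMOTE_CALLS = 0

def fetch_remote(key):
    """Pretend to fetch a value over the network (expensive)."""
    global REMOTE_CALLS
    REMOTE_CALLS += 1
    return f"value-of-{key}"

@functools.lru_cache(maxsize=1024)     # keep recently used results locally
def cached_fetch(key):
    return fetch_remote(key)

for k in ["a", "b", "a", "a", "b"]:
    cached_fetch(k)

print(REMOTE_CALLS)   # -> 2: only the first access to each key goes over the network
```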

8.​ Scalability: Scalability is the ability of a system to handle increasing service loads,
making it a crucial consideration for distributed systems that are expected to grow over
time. A well-designed distributed operating system should accommodate this growth
without significant performance degradation or service disruptions. Below are key
principles for designing scalable systems:
1. Avoid Centralized Entities​
Centralized components like a single file server or database can create bottlenecks as
the system scales. The failure of such entities can lead to system-wide failures, and
increased contention for resources can saturate network capacity. To avoid these issues,
distributed systems should employ techniques like resource replication and decentralized
control, ensuring that all nodes share an equal role in system operation.​

2. Avoid Centralized Algorithms​


Centralized algorithms that collect global data from all nodes and process it on a single
node are inefficient for large systems. For instance, a scheduling algorithm that requires
collecting and processing information from all nodes leads to heavy network traffic and
increased latency as the system grows. Decentralized algorithms, which rely on local
information and avoid global state collection, are more scalable.​

3. Perform Operations on Client Workstations​


By offloading operations to client workstations rather than relying on shared server
resources, systems can minimize contention for central resources. This improves
scalability by reducing the load on servers and allowing the system to gracefully scale as
the number of clients increases. Caching is a common technique that facilitates this
principle by enabling faster access to locally stored data.​

Applying these principles ensures that distributed systems can effectively scale to meet
increasing demands while maintaining performance and resilience.

9. Heterogeneity: Heterogeneity in distributed systems arises from the use of different hardware, software,
communication protocols, and data formats across interconnected nodes. This diversity leads to
challenges in compatibility, requiring data translation between incompatible systems. The
complexity increases with the number of formats, making it difficult to manage and scale. Using
an intermediate standard data format for conversion reduces software complexity and improves
system interoperability.

10. Security: Enforcing security in distributed systems is more challenging than in centralized
systems due to the lack of a single control point and the use of insecure networks for
communication. Unlike centralized systems, where user authentication is straightforward, a
distributed system requires methods to authenticate both the client and server, ensuring that
messages are received by the intended recipient and are unaltered during transmission.

Key security requirements in a distributed system include:

1.​ Ensuring the sender knows the message was received by the intended receiver.
2.​ Ensuring the receiver knows the message was sent by the genuine sender.
3.​ Guaranteeing that the contents of the message remain unchanged during transfer.

Cryptography is essential for addressing these issues by encrypting messages, ensuring
confidentiality and integrity. Additionally, focusing security on a smaller number of entities (such
as servers) instead of clients helps maintain system security as it scales.
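
A minimal sketch of how cryptography can cover the three requirements above, assuming the sender and receiver already share a secret key (key distribution is not shown). It uses Python's standard hmac module; the key and messages are illustrative.

```python
# Minimal sketch (assumed setup): sender and receiver share a secret key and attach an
# HMAC tag so the receiver can check both who sent the message and that it was not
# altered in transit.
import hmac, hashlib

SHARED_KEY = b"pre-shared-secret"     # assumed to be exchanged securely beforehand

def make_packet(message):
    tag = hmac.new(SHARED_KEY, message, hashlib.sha256).digest()
    return message, tag

def verify_packet(message, tag):
    expected = hmac.new(SHARED_KEY, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)   # constant-time comparison

msg, tag = make_packet(b"transfer 100 units")
print(verify_packet(msg, tag))                    # True: genuine and unmodified
print(verify_packet(b"transfer 900 units", tag))  # False: tampered contents detected
```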

Types of Transparency:

1. Access Transparency: Access transparency ensures that users cannot distinguish between
local and remote resources in a distributed system. The system should provide a uniform
interface where system calls remain the same regardless of resource location. While complete
access transparency is challenging due to communication failures, global resource naming has
been successfully developed. Distributed shared memory also aids in access transparency but
has performance limitations for certain applications.

2. Location Transparency: Location transparency in a distributed system ensures seamless
access to resources without concern for their physical location. It has two key aspects:

1.​ Name Transparency – Resource names should not reveal their physical location and
must remain unchanged even if resources move within the system. Names should be
unique systemwide.​

2.​ User Mobility – Users should be able to access resources using the same name
regardless of which machine they log into, without requiring additional effort.​
Both aspects rely on a global resource naming facility.

3. Replication Transparency: Replication Transparency ensures that users are unaware of the
existence and management of multiple copies of a resource in a distributed system. It involves:

1.​ Naming of Replicas – The system maps user-supplied resource names to appropriate
replicas without user intervention.
2.​ Replication Control – The system automatically handles decisions on the number,
placement, creation, and deletion of replicas for performance and reliability.
These tasks are managed entirely by the system to maintain seamless access.

4. Failure Transparency: Failure Transparency ensures that users remain unaware of partial
system failures, such as communication failures, machine crashes, or storage failures. A
distributed operating system with failure transparency continues functioning, possibly in a
degraded form.

For example, a failure-transparent file service can be implemented using multiple file servers
that cooperate to ensure uninterrupted access, even if some servers fail. However, designing
such systems requires balancing redundancy and overhead.

While complete failure transparency is impractical due to network failures and cost constraints,
partial failure handling improves system reliability.

5. Migration Transparency: Migration Transparency ensures that object movement (e.g., files
or processes) in a distributed system happens automatically without user awareness. Key
aspects include:

1.​ Automated Migration Decisions – The system determines which objects to move and
where.
2.​ Name Preservation – Objects retain their original names after migration.
3.​ Seamless Communication – Messages reach migrating processes without requiring
resending.

This enhances performance, reliability, and security while maintaining uninterrupted


operation.

6. Concurrency Transparency: Concurrency Transparency ensures that users in a distributed


system experience resource sharing as if they were the sole users, without interference from
others. To achieve this, the system must enforce:

1.​ Event Ordering – Ensures consistent access sequencing for all users.
2.​ Mutual Exclusion – Prevents multiple processes from simultaneously accessing
resources that require exclusive use.
3.​ No Starvation – Guarantees that every process requesting a resource eventually gets
access.
4.​ No Deadlock – Prevents situations where processes block each other indefinitely.

These properties maintain system integrity and efficient resource sharing.

7. Performance Transparency: Performance Transparency ensures that a distributed system


dynamically reconfigures itself to optimize performance. Key aspects include:

1.​ Load Balancing – Prevents some processors from being overloaded while others remain
idle.
2.​ Intelligent Resource Allocation – Efficiently distributes system resources among active
jobs.
3.​ Process Migration – Moves processes between nodes to maintain optimal workload
distribution.

These mechanisms enhance system efficiency and responsiveness.

8. Scaling Transparency: Scaling Transparency ensures that a distributed system can expand
without disrupting users. It requires:

1.​ Open-System Architecture – Allows seamless integration of new components.


2.​ Scalable Algorithms – Ensures system components perform efficiently as scale
increases.

These design principles enable smooth growth and adaptability in distributed systems.

Hardware Concepts:

Hardware Concepts in Distributed Systems

Distributed systems consist of multiple CPUs, but their organization varies based on how they
interconnect and communicate. A fundamental classification method for multi-CPU systems is
Flynn’s taxonomy, which categorizes systems based on the number of instruction and data
streams.

Flynn’s Classification

1.​ SISD (Single Instruction Stream, Single Data Stream)​

○​ Traditional uniprocessor computers, including personal computers and


mainframes.
2.​ SIMD (Single Instruction Stream, Multiple Data Streams)​

○​ Parallel processing with multiple data units executing the same instruction.
○​ Common in supercomputers and applications requiring repetitive calculations,
such as vector processing.
3.​ MISD (Multiple Instruction Streams, Single Data Stream)​

○​ Theoretical model; no known practical implementations.


4.​ MIMD (Multiple Instruction Streams, Multiple Data Streams)​

○​ Comprises multiple independent computers, each with its own program and data.
○​ All distributed systems belong to this category.

Further Classification of MIMD Systems


MIMD systems can be further divided into:

●​ Multiprocessors (Shared Memory)​

○​ A single virtual address space shared by all CPUs.


○​ If one CPU writes data, others can immediately access the updated value.
●​ Multicomputers (Distributed Memory)​

○​ Each CPU has its own private memory, requiring message passing for
communication.

This classification helps in understanding how distributed systems operate and manage
communication among processors.

Bus-based Multiprocessor

1. Basic Structure of a Bus-Based Multiprocessor

1.​ Common Bus


○​ Multiple CPUs are all connected to a single, shared bus.
○​ A memory module is also attached to the bus.
○​ A simple configuration might have a high-speed backplane or motherboard into
which CPU and memory cards can be inserted.
2.​ Address, Data, and Control Lines
○​ Typical buses have 32 or 64 address lines and the same number of data lines
(e.g., 32 or 64).
○​ There are also several control lines to coordinate operations.
○​ During a memory read, the CPU puts the desired address on the bus address
lines and asserts the appropriate control signals. Memory then places the data on
the bus data lines for the CPU to read.
○​ Write operations follow a similar pattern, with the CPU placing both the address
and data onto the bus, and then asserting the relevant control lines to write into
memory.

2. Performance Challenges

1.​ Bus Overload with Multiple CPUs


○​ As soon as more than a few CPUs (e.g., 4 or 5) are connected, the bus can
become a bottleneck.
○​ Every CPU’s request for memory travels over the same shared bus, causing
contention and potentially reducing overall system performance.
2.​ Coherence Issues
○​ When multiple CPUs can read and write the same memory locations, it is crucial
to maintain a consistent view of memory (cache coherence).
○​ If one CPU writes to a memory location and another CPU reads it soon after, the
second CPU should see the updated value.
○​ Maintaining coherence becomes more complex when each CPU has its own
cache (described next).

3. The Role of Caches

1.​ Adding High-Speed Caches


○​ To reduce bus traffic, each CPU is typically equipped with a cache that stores
recently accessed words from memory.
○​ When a CPU needs a word, it checks its cache first:
■​ Cache Hit: If the data is in the cache, no bus request is needed, so the
bus is not used, reducing contention.
■​ Cache Miss: If the data is not in the cache, the CPU issues a request on
the bus to fetch it from memory.
2.​ Impact on Bus Traffic
○​ With effective caching, a high percentage of CPU requests can be served from
local caches.
○​ This lowers the frequency of bus usage and thus mitigates the bottleneck
problem.
○​ Typical cache sizes might range from 64 KB to 1 MB, which often yields a high hit
rate.
3.​ Cache Coherence
○​ Once caches are introduced, the system must ensure that all CPUs see the
correct, up-to-date data.
○​ If one CPU modifies a cached word, other CPUs that also have that word in their
caches must be made aware of the change.
○​ Various cache-coherence protocols (e.g., MESI, MOESI) handle this by
invalidating or updating other caches as needed.

4. Overall Summary

●​ Design Goal: Achieve high parallel performance by allowing multiple CPUs to operate
simultaneously on shared data, while minimizing the contention for a single bus.
●​ Key Trade-Off: A single bus is simpler but becomes a bottleneck as the number of
CPUs grows. Caches greatly reduce bus traffic but introduce the complexity of
maintaining coherence.
●​ Practical Approach: Equip each CPU with a private cache, implement a coherence
protocol to keep data consistent, and rely on the bus for misses and cache coherence
signals.

Why This Matters

Bus-based multiprocessors with caches are one of the foundational designs for shared-memory
parallel systems. While straightforward to implement for a small number of CPUs, they require
careful design of cache-coherence mechanisms and bus arbitration to scale effectively.

By including caches, overall performance improves significantly (due to fewer bus transactions),
but ensuring coherence is key to correct program behavior. Different hardware protocols and
optimizations exist to handle coherence efficiently, especially as systems scale in CPU count
and complexity.

Using caches in this way, it is possible to put about 32 or 64 CPUs on a single bus.

Switched Multiprocessors:

1.​ Crossbar Switch:​

○​ A method to connect multiple CPUs to memory using a crossbar switch.


○​ Each CPU and memory module has a dedicated connection, with crosspoint
switches at intersections.
○​ These switches momentarily close when a CPU accesses memory.
○​ Multiple CPUs can access memory simultaneously unless they target the same
memory module, causing delays.
2.​ Disadvantages of Crossbar Switch:​

○ Requires n² crosspoint switches for n CPUs and n memory modules.


○ Becomes impractical for large values of n due to cost and complexity.
3.​ Omega Switching Network:​

○​ Uses fewer switches than a crossbar, reducing cost and complexity.


○​ Composed of multiple 2×2 switches that can route inputs to different outputs.
○ Requires log₂ n switching stages, each containing n/2 switches, for a total of
(n log₂ n)/2 switches.
○​ More efficient than the crossbar switch but still incurs substantial switching
delays.
4.​ Delay Issues:​

○ For n = 1024, a request must traverse 20 switching stages (10 outbound,
10 inbound).
○ If a CPU runs at 100 MIPS (10 ns per instruction), the total switching delay must fit
within one instruction time, i.e., 500 picoseconds (0.5 ns) per stage.
○​ A large multiprocessor would require thousands of high-speed switches, making
it costly.
5.​ NUMA (Non-Uniform Memory Access) Machines:​

○​ Introduces a hierarchical system where each CPU has its local memory.
○​ Accessing local memory is fast, but accessing other CPUs’ memory is slower.
○​ Requires careful software optimization to ensure most memory accesses are
local.
6.​ Overall Conclusion:​

○​ Bus-based multiprocessors are limited to 64 CPUs due to bus capacity


constraints.
○​ To scale beyond this, switching networks like crossbar or omega are needed.
○​ Large crossbar switches are expensive, and large omega networks are slow.
○​ NUMA machines offer better average access times but require complex software
management.
○​ Building a large, tightly-coupled, shared memory multiprocessor is possible but
challenging and expensive.
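
A quick, illustrative computation of the switch counts discussed above for n = 1024, using the formulas stated in the list:

```python
# Quick check of the switch-count formulas above (illustrative arithmetic only).
import math

def crossbar_switches(n):
    return n * n                          # n^2 crosspoint switches

def omega_switches(n):
    return (n * int(math.log2(n))) // 2   # log2(n) stages of n/2 two-by-two switches each

n = 1024
print(crossbar_switches(n))   # 1048576 crosspoint switches for a full crossbar
print(omega_switches(n))      # 5120 two-by-two switches in an omega network
print(2 * int(math.log2(n)))  # 20 stages traversed per memory request (out and back)
```
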
Module 2

Interprocess Communication (IPC):

1.​ Process and Communication in Distributed Systems:​

○​ A process is a program in execution.


○​ In a distributed system, two computers communicate through their respective
processes.
○​ Processes on different computers need to communicate to achieve a common
goal.
○​ Example: A resource manager process on each computer monitors local
resources, and these managers communicate to balance the system load
dynamically.
2.​ Interprocess Communication (IPC) Mechanisms:​

○​ The operating system must provide IPC mechanisms to enable communication


between processes.
○​ IPC involves information sharing among processes.
○​ There are two fundamental approaches to information sharing:
■​ Shared-data approach (Original Sharing)
■​ Message-passing approach (Copy Sharing)
3.​ Shared-data Approach:​

○​ Information is placed in a common memory area.


○​ All processes involved in IPC can access this shared memory.
○​ Conceptual model illustrated in Figure 3.1(a).
4.​ Message-passing Approach:​

○​ Information is physically copied from the sender’s address space to the receiver’s
address space.
○​ Data is transmitted in the form of messages.
○​ Conceptual model illustrated in Figure 3.1(b).
5.​ IPC in Distributed Systems:​

○​ Since computers in a network do not share memory, the shared-data approach is


not feasible.
○​ Distributed systems primarily use message passing for IPC.
6.​ Message-Passing System in Distributed OS:​

○​ A message-passing system is a subsystem of a distributed operating system


that provides message-based IPC.
○​ It abstracts network protocols and platform heterogeneity.
○​ Allows processes to communicate using simple primitives like send and receive.
○​ Serves as a foundation for higher-level IPC mechanisms like:
■​ Remote Procedure Call (RPC) (explored in Chapter 4)
■​ Distributed Shared Memory (DSM) (explored in Chapter 5)

Conclusion:

In distributed systems, message passing is the primary IPC mechanism because shared
memory is not available. A message-passing system simplifies communication by handling
network complexities and provides a foundation for advanced IPC methods like RPC and DSM.
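
A minimal sketch of message-passing send and receive primitives, expressed here with UDP datagram sockets; the address and port are assumptions, and real message-passing systems layer reliability, buffering, and naming on top of such primitives.

```python
# Minimal sketch (assumed addresses/ports): the send and receive primitives of a
# message-passing system expressed with UDP datagram sockets.
import socket

RECEIVER_ADDR = ("127.0.0.1", 50007)   # assumed address of the receiving process

def send(dest, message):
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(message, dest)          # copy the message to the receiver's node

def receive(addr, bufsize=4096):
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.bind(addr)
        data, sender = s.recvfrom(bufsize)   # blocks until a message arrives
        return data

# Typical use: one process calls receive(RECEIVER_ADDR), another calls
# send(RECEIVER_ADDR, b"hello") — here they would run as separate processes.
```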

DESIRABLE FEATURES OF A GOOD MESSAGE-PASSING SYSTEM


Message Block in IPC:
Design Issues and Solutions:

1.​ Synchronization:

Synchronization in Message-Passing Systems

Synchronization plays a crucial role in message-passing systems, as it determines how
communication between processes is coordinated. The semantics of synchronization are
broadly classified into blocking and nonblocking types, which impact the way processes send
and receive messages.

1. Blocking vs. Nonblocking Communication


Blocking Semantics

A primitive is considered blocking if its invocation halts the execution of the process until a
certain condition is met.
●​ Blocking Send:​

○​ The sender process executes the send statement and gets blocked until it
receives an acknowledgment from the receiver confirming that the message has
been received.
○​ If the receiver is not ready or fails, the sender remains blocked.
●​ Blocking Receive:​

○​ The receiver process executes the receive statement and gets blocked until a
message arrives.
○​ The receiver cannot proceed until it successfully obtains the message.

Nonblocking Semantics

A primitive is considered nonblocking if its invocation does not stop execution. The process
continues running immediately after invoking the primitive.

●​ Nonblocking Send:​

○​ The sender places the message into a buffer and continues execution without
waiting for an acknowledgment.
○​ It improves concurrency but requires additional mechanisms to ensure message
delivery.
●​ Nonblocking Receive:​

○​ The receiver executes the receive statement, but instead of waiting for a
message, it continues execution immediately.
○​ The system must provide a way for the receiver to check if a message has
arrived.

2. Handling Nonblocking Receive


Since a nonblocking receive does not halt execution, the receiver must determine when a
message has arrived. There are two common methods for this:

1.​ Polling:​

○​ The receiver periodically checks the buffer using a test primitive to see if a
message is available.
○​ If no message has arrived, it continues execution and checks again later.
○​ This method can be inefficient due to frequent checks, which may waste CPU
resources.
2.​ Interrupts:​

○​ When a message arrives in the buffer, a software interrupt notifies the receiving
process.
○​ This method allows the receiver to continue execution without unnecessary
polling.
○​ It enables maximum parallelism but introduces complexity, as user-level
interrupts can be difficult to handle.

Conditional Receive Primitive

●​ A variation of nonblocking receive that returns control immediately with either:


○​ A message (if available).
○​ An indicator that no message is currently available.
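
A minimal sketch contrasting blocking, polling-based nonblocking, and conditional receive, using a local queue.Queue as a stand-in for the kernel's message buffer (the sender here is just a timer thread; all names are illustrative).

```python
# Minimal sketch: blocking receive, conditional receive, and polling-based
# nonblocking receive over a local queue standing in for the message buffer.
import queue, threading, time

mailbox = queue.Queue()

def blocking_receive():
    return mailbox.get()               # halts the caller until a message arrives

def conditional_receive():
    try:
        return mailbox.get_nowait()    # returns immediately with a message...
    except queue.Empty:
        return None                    # ...or an indication that none is available

def polling_receive(interval=0.01):
    while True:                        # test-and-continue loop (can waste CPU)
        msg = conditional_receive()
        if msg is not None:
            return msg
        time.sleep(interval)           # do other work / back off between polls

print(conditional_receive())           # None: no message available yet
threading.Timer(0.1, lambda: mailbox.put("hello")).start()  # a sender, 100 ms later
print(polling_receive())               # "hello": found by polling once the sender has run
```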

3. Synchronous vs. Asynchronous Communication


Synchronous Communication

●​ Occurs when both send and receive primitives use blocking semantics.​

●​ The sender and receiver must synchronize before exchanging messages.​

●​ Execution flow:​

1.​ The sender process sends a message and waits for an acknowledgment
before proceeding.
2.​ The receiver executes the receive statement and remains blocked until the
message arrives.
3.​ The receiver then sends an acknowledgment message to the sender.
4.​ The sender resumes execution only after receiving the acknowledgment.
●​ Illustration of Synchronous Communication:​

●​ Advantages of Synchronous Communication:

✅ Simple and easy to implement.​


✅ Ensures reliability, as the sender knows its message has been received before proceeding.​
✅ No need for backward error recovery since undelivered messages are detected
immediately.

Disadvantages of Synchronous Communication:

❌ Limits concurrency because processes must wait for each other.
❌ Risk of communication deadlocks, where both sender and receiver wait indefinitely
(discussed in Chapter 6).
❌ Less flexibility, as the sender must always wait for an acknowledgment, even when
unnecessary.

●​ Thread-based Optimization:
○​ In systems supporting multiple threads within a process (discussed in Chapter
8), blocking primitives can be used without significantly reducing concurrency.
○​ One thread may be blocked on message communication while others continue
execution.

Asynchronous Communication

●​ At least one operation (send or receive) is nonblocking.


●​ Allows higher concurrency because the sender does not wait for an acknowledgment,
and the receiver does not wait for a message.
●​ Requires additional mechanisms to ensure message delivery, such as buffering and
error handling.

4. Timeout Mechanism in Blocking Communication


A blocking send or receive may cause a process to be blocked indefinitely in certain
situations:

●​ The intended receiver process crashes before receiving the message.


●​ The message gets lost in the network due to a communication failure.

Solution: Timeout Values

●​ A timeout value is set to define a maximum waiting period.


●​ If the message is not received within this period, the operation terminates with an error
status.
●​ Timeout values can be:
○​ Predefined default values.
○​ User-specified parameters in the send or receive primitive.

Timeout values prevent infinite blocking and enhance system reliability.
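
A minimal sketch of a timeout-bounded blocking receive, again using a local queue as a stand-in for the message buffer; the timeout value is arbitrary.

```python
# Minimal sketch: a blocking receive bounded by a timeout so the caller is never
# blocked indefinitely if the sender has crashed or the message was lost.
import queue

mailbox = queue.Queue()

def receive_with_timeout(timeout=2.0):
    try:
        return mailbox.get(timeout=timeout)   # wait at most `timeout` seconds
    except queue.Empty:
        return "ERROR: timed out waiting for message"

print(receive_with_timeout(timeout=0.5))   # no sender here, so the call returns an error status
```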

5. Flexibility in Message Passing Systems


A well-designed message-passing system should:

●​ Support both blocking and nonblocking send/receive primitives.


●​ Allow developers to choose the best synchronization method based on application
requirements.
●​ Balance concurrency, reliability, and efficiency through appropriate synchronization
strategies.

Conclusion
Synchronization in message-passing systems plays a vital role in determining system
performance, concurrency, and reliability.

●​ Synchronous communication (blocking send/receive) ensures message delivery


but can lead to communication deadlocks and reduced concurrency.
●​ Asynchronous communication (nonblocking send/receive) improves concurrency
but requires additional mechanisms for message handling.
●​ Nonblocking receive operations can use polling (inefficient) or interrupts (efficient
but complex).
●​ Timeout mechanisms prevent indefinite blocking in case of process crashes or network
failures.
●​ A flexible system supports both blocking and nonblocking primitives to
accommodate different use cases.

By understanding and carefully choosing synchronization strategies, developers can


design efficient, reliable, and scalable distributed systems.

2. Buffering:

Buffering Strategies in Message Passing

In interprocess communication (IPC), messages can be transmitted from one process to
another by copying the message body from the sender’s address space to the receiver’s
address space. This transfer may happen directly or via intermediate storage in the operating
system’s kernel.

However, the receiver may not always be ready to accept the message at the time of
transmission. In such cases, the operating system must provide buffering mechanisms to
temporarily store messages until the receiver is ready.

Buffering strategies play a crucial role in synchronization between communicating processes.


Synchronous and asynchronous communication correspond to two extreme buffering
strategies:

●​ Null Buffering (No Buffering)


●​ Unbounded Buffering (Infinite Buffer Capacity)

Additionally, two other buffering strategies are commonly used:

●​ Single-Message Buffering
●​ Finite-Bound Buffering (Multiple-Message Buffers)
3.5.1 Null Buffer (No Buffering)
In this strategy, no intermediate storage is used between the sender and receiver. Since there is
no place to temporarily store the message, one of the following methods is used:

1. Delayed Message Transmission (Blocking Sender)

●​ The message remains in the sender’s address space, and the sender is blocked until
the receiver executes a corresponding receive() operation.
●​ The sender process is suspended and restarts the send() operation once the receiver
is ready.
●​ The receiver executes receive(), sending an acknowledgment to the sender’s
kernel.
●​ Upon receiving the acknowledgment, the sender is unblocked and retries the send
operation.

2. Discard and Retransmit (Timeout Mechanism)

●​ The message is discarded if the receiver is not ready.


●​ The sender waits for an acknowledgment from the receiver. If no acknowledgment is
received within a timeout period, the sender resends the message.
●​ The sender retries multiple times before giving up if the receiver remains
unresponsive.


Characteristics of Null Buffering:​


Minimal memory usage (no extra storage required).​


Ensures tight synchronization between sender and receiver.​


High synchronization overhead (sender and receiver must execute at the same time).​
Potential message loss (in case of timeout-based retransmissions).

📌 Illustration (Figure 3.4a)


●​ In null buffering, message transfer happens directly from sender to receiver in a single
copy operation.

3.5.2 Single-Message Buffering


To overcome the inefficiencies of null buffering, single-message buffering is used in distributed
systems.

How It Works:
●​ A single buffer is allocated at the receiver’s node.
●​ If the receiver is not ready, the message is temporarily stored in the buffer.
●​ The message remains ready for retrieval when the receiver executes receive().
●​ The buffer can be located in:
○​ The kernel’s address space (managed by the OS).
○​ The receiver’s address space (managed by the process).


Characteristics of Single-Message Buffering:​
Reduces synchronization constraints (sender and receiver don’t need to be active


simultaneously).​


Ensures message reliability (no immediate loss if the receiver is busy).​


Still limited to one message at a time (cannot handle high message traffic).​
Involves two copy operations (sender → buffer, buffer → receiver).

📌 Illustration (Figure 3.4b)


●​ In this case, message transfer involves two copy operations:
○​ Sender → Buffer (Receiver Node)
○​ Buffer → Receiver Process

3.5.3 Unbounded-Capacity Buffering


In asynchronous communication, the sender does not wait for the receiver to be ready. Multiple
messages may be pending before being received, requiring an unbounded message
buffer.

How It Works:

●​ The system maintains a message queue of unlimited size for each receiver.
●​ Messages remain stored until explicitly retrieved by the receiver.
●​ The sender can send messages at any time, and they will be queued up for
processing.


Characteristics of Unbounded Buffering:​


Ideal for high-throughput systems (messages won’t be lost).​


Maximum flexibility (no blocking on sender or receiver).​


Impractical in real systems (memory is finite).​
Requires complex memory management (to prevent excessive storage usage).

3.5.4 Finite-Bound Buffering (Multiple-Message Buffers)


Since an unbounded buffer is unrealistic, most systems implement finite-bound buffers with
a fixed storage capacity.

Handling Buffer Overflows

When the buffer reaches its limit, the system must decide what to do with new incoming
messages. Two strategies are commonly used:

1.​ Unsuccessful Communication (Message Drop)​

○​ If the buffer is full, new messages are discarded.


○​ The sender receives an error notification and may choose to retry later.
○​ Reduces reliability, as messages may be lost.
2.​ Flow-Controlled Communication (Blocking Sender)​

○​ The sender blocks until the receiver processes some messages, creating space
in the buffer.


○​ Synchronization is enforced, preventing message loss.
○​ May cause deadlocks if not handled properly.

Implementation Considerations

●​ Buffers (also called mailboxes or ports) can be allocated in:


○​ Kernel address space → Managed by OS, limited per process.
○​ Receiver process address space → Managed by the process, requiring
additional memory handling.
●​ When using mailboxes, a system call (create_buffer) allows receivers to define
buffer size.


Characteristics of Finite-Bound Buffering:​


More efficient than unbounded buffering (prevents unlimited memory growth).​


More reliable than null/single-message buffering (can hold multiple messages).​


Risk of overflow (must manage buffer size carefully).​
Extra overhead in buffer management (memory allocation, message ordering, etc.).

📌 Illustration (Figure 3.4c)


●​ Message transfer involves two copy operations:
○​ Sender → Buffer (Mailbox in Receiver Node)
○​ Buffer → Receiver Process
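
A minimal sketch of a finite-bound buffer, using queue.Queue(maxsize=...) as a stand-in: a blocking put models flow-controlled communication, while put_nowait models the unsuccessful-communication (message-drop) policy.

```python
# Minimal sketch: a finite-bound message buffer with the two overflow policies above.
import queue

buffer = queue.Queue(maxsize=2)        # finite-bound buffer with room for 2 messages

def send_flow_controlled(msg):
    buffer.put(msg)                    # blocks the sender until a slot is free

def send_or_fail(msg):
    try:
        buffer.put_nowait(msg)
        return True
    except queue.Full:
        return False                   # unsuccessful communication: caller may retry later

print(send_or_fail("m1"), send_or_fail("m2"), send_or_fail("m3"))  # True True False
print(buffer.get())                    # receiver drains "m1", freeing a slot
print(send_or_fail("m3"))              # True: retry now succeeds
```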

Comparison of Buffering Strategies


| Buffering Strategy | Synchronization | Buffer Size | Message Reliability | Blocking | Use Case |
|---|---|---|---|---|---|
| Null Buffering (No Buffering) | Tight | None | Low (Possible Message Loss) | Sender Blocks | Simple IPC |
| Single-Message Buffering | Moderate | 1 Message | Medium (No Overflow Handling) | Receiver Blocks | Basic Distributed Systems |
| Unbounded-Capacity Buffering | Loose (Asynchronous) | Unlimited | High (No Message Loss) | None | High-Throughput Systems |
| Finite-Bound (Multiple-Message) Buffering | Flexible (Controlled) | Fixed | Medium (Risk of Overflow) | May Block (Flow-Controlled) | General Asynchronous IPC |

Conclusion
Choosing the right buffering strategy depends on:

●​ Synchronization needs → Blocking vs. Nonblocking.


●​ Memory constraints → Unbounded buffers are impractical.
●​ Message reliability requirements → Avoiding message loss vs. handling buffer
overflow.

📌 Key Takeaways:
●​ Null buffering is the simplest but least flexible method.
●​ Single-message buffering improves performance but still has synchronization
limitations.
●​ Unbounded buffering is ideal in theory but impractical due to memory constraints.
●​ Finite-bound buffering is the most commonly used strategy in real-world systems.

🚀
By understanding these buffering mechanisms, developers can design efficient IPC
models tailored to their system requirements.
3. Multidatagram Messages and Maximum Transfer Unit (MTU)

1. Understanding MTU (Maximum Transfer Unit)


●​ In computer networks, MTU refers to the maximum size of data that can be transmitted
in a single packet.
●​ Almost all networks have an upper limit on the size of a data packet.
●​ Any message larger than the MTU must be fragmented into multiple smaller packets.

2. Types of Messages Based on MTU

1.​ Single-Datagram Messages​

○​ If the message size is less than or equal to the MTU, it can be sent in a single
packet.
○​ These messages do not require fragmentation and can be transmitted directly.
2.​ Multidatagram Messages​

○​ If the message size is greater than the MTU, it must be fragmented into multiple
packets.
○​ Each fragment is sent separately and contains both control information and
message data.
○​ The order of packets is important since the receiver must reassemble them
correctly.

3. Fragmentation and Reassembly Process

●​ On the sender’s side​

○​ A large message is broken into multiple smaller packets (fragments).


○​ Each fragment is assigned a sequence number to maintain order.
○​ Packets are transmitted separately over the network.
●​ On the receiver’s side​

○​ The receiver collects all fragments.


○​ It uses the sequence numbers to reassemble the original message.
○​ Once all fragments are received, the complete message is reconstructed.

4. Importance of Message Fragmentation

✅ Ensures compatibility with network constraints (MTU limitations).
✅ Allows large data transmission by breaking it into smaller packets.
✅ Improves network efficiency by reducing transmission failures due to oversized messages.
❌ Increases overhead (extra control information in each packet).
❌ Requires additional processing for reassembly at the receiver’s end.


5. Role of Message-Passing System

●​ The message-passing system (such as TCP/IP or UDP) is responsible for:


○​ Dividing large messages into multiple packets (fragmentation).
○​ Adding control information (such as headers and sequence numbers).
○​ Ensuring correct order and reassembly of fragmented packets at the receiver.
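
A minimal sketch of fragmentation and reassembly with sequence numbers; the MTU value and the (seq, total, chunk) packet layout are illustrative, not those of any real protocol.

```python
# Minimal sketch: fragmenting a large message into MTU-sized datagrams with sequence
# numbers, then reassembling it even when packets arrive out of order.
import random

MTU = 8   # unrealistically small, just to force fragmentation in the example

def fragment(message):
    chunks = [message[i:i + MTU] for i in range(0, len(message), MTU)]
    total = len(chunks)
    return [(seq, total, chunk) for seq, chunk in enumerate(chunks)]

def reassemble(packets):
    total = packets[0][1]
    assert len(packets) == total, "missing fragments"
    ordered = sorted(packets, key=lambda p: p[0])      # sequence numbers restore order
    return b"".join(chunk for _, _, chunk in ordered)

packets = fragment(b"a multidatagram message larger than the MTU")
random.shuffle(packets)                                 # simulate out-of-order delivery
print(reassemble(packets))
```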

📌 Conclusion:​
Multidatagram messages play a vital role in enabling the transmission of large data across

🚀
networks while adhering to MTU constraints. Efficient fragmentation and reassembly are
crucial for maintaining data integrity and reliable communication in distributed systems.

4. Encoding and Decoding in Message Passing

1. Need for Encoding and Decoding

●​ A message should be meaningful to the receiving process.


●​ Program objects should ideally retain their structure when transmitted.
●​ However, challenges arise due to:
○​ Heterogeneous Systems: Different computer architectures handle data
differently.
○​ Homogeneous Systems: Even in similar systems, absolute pointers and
varying data sizes create difficulties.

2. Challenges in Transmitting Program Objects

1.​ Absolute Pointers Lose Meaning​

○​ An absolute memory address is valid only within a specific process’s


address space.
○​ Transmitting a pointer as-is will make it useless in another process.
○​ Solution:
■​ Convert complex structures (e.g., trees, linked lists) into a flat, ordered
format.
■​ Send object-type information for correct reconstruction at the receiver’s
side.
2.​ Varying Storage Requirements​

○​ Messages contain multiple data types (e.g., integers, strings, floating


points).
○​ The receiver needs to understand the data structure inside the message
buffer.
○​ Solution:
■​ Include metadata to describe data type and size.

3. Encoding and Decoding Process

●​ Encoding (Sender Side): Converts data into a stream format suitable for transmission.
●​ Decoding (Receiver Side): Converts the received data back into program objects.
●​ Ensures correct data representation across different architectures and systems.

4. Encoding Methods

There are two main encoding techniques:

1.​ Tagged Representation​

○​ Each data element includes both its value and type.


○​ Self-descriptive → The receiver can directly interpret the data.
○​ Example Standards:
■​ ASN.1 (Abstract Syntax Notation)
■​ Mach Distributed Operating System
○​ Pros: Easier decoding.
○​ Cons: Increases data size and processing time.
2.​ Untagged Representation​

○​ Only the raw data is sent (no type information included).


○​ The receiver must already know the expected format.
○​ Example Standards:
■​ Sun XDR (External Data Representation)
■​ Courier (Xerox)
○​ Pros: More efficient (smaller message size, faster processing).
○​ Cons: Requires predefined knowledge of the message structure.
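
A minimal sketch contrasting the two styles: JSON stands in for a tagged, self-describing representation, while struct.pack with an agreed format string stands in for an untagged layout in the spirit of Sun XDR. The record and format string are made up.

```python
# Minimal sketch: tagged vs. untagged encoding of the same record.
import json, struct

record = {"id": 7, "balance": 120.5}

# Tagged: field names/types travel with the data, so any receiver can interpret it.
tagged = json.dumps(record).encode()

# Untagged: just the raw values in an agreed-upon layout (here: int32 + float64,
# network byte order). The format string "!id" is the shared contract between
# sender and receiver.
untagged = struct.pack("!id", record["id"], record["balance"])

print(len(tagged), len(untagged))      # tagged form is larger but self-descriptive
print(struct.unpack("!id", untagged))  # receiver must already know the format to decode
print(json.loads(tagged.decode()))
```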

5. Symmetry in Encoding and Decoding

●​ Encoding and decoding are perfectly symmetrical.


●​ The receiver must decode exactly what was encoded.
●​ Differences in local representations (e.g., byte order, alignment) are handled.

6. Error Handling in Decoding

●​ If the receiver gets badly encoded data (e.g., exceeds max length), it:
○​ Fails to decode the message.
○​ Returns an error message to the sender.

📌 Conclusion:​
Encoding and decoding are critical for reliable message passing in distributed systems. The

🚀
choice between tagged and untagged representation depends on a trade-off between
efficiency and flexibility.

5. Process Addressing in Message-Based Communication


1. The Need for Addressing in Message Passing

●​ A sender must specify to whom it wants to send a message.


●​ A receiver must determine from whom it will accept a message.
●​ Types of Addressing:
○​ Explicit Addressing: The sender directly specifies the receiver’s process ID.
○​ Implicit Addressing: The sender specifies a service instead of a process.

2. Explicit vs. Implicit Addressing

| Type | Description | Use Case |
|---|---|---|
| Explicit Addressing | The sender specifies the exact process ID. | Direct communication between two processes. |
| Implicit Addressing | The sender specifies a service, not a process. | Client-server communication, where any available server can respond. |

📌 Primitives for Process Addressing:


●​ Explicit
○​ send(process_id, message): Sends a message to a specific process.
○​ receive(process_id, message): Receives a message from a specific
process.
●​ Implicit
○​ send_any(service_id, message): Sends a message to any process
providing a service.
○​ receive_any(process_id, message): Receives a message from any
sender.

3. Process Addressing Methods

(A) Machine-Based Addressing

●​ Each process is identified by machine_id@local_id.


●​ The machine_id directs the message to the correct machine.
●​ The local_id identifies the receiving process on that machine.
●​ Example: Used in Berkeley UNIX (32-bit Internet address + 16-bit process ID).
✅ Pros:
●​ No need for global coordination.
●​ Local IDs can be assigned independently by each machine.

❌ Cons:
●​ Process Migration is Impossible: If a process moves to another machine, its original
address becomes invalid.

(B) Link-Based Addressing (Supports Process Migration)

●​ Uses three fields: machine_id@local_id@machine_id


○​ First field: Original machine where the process was created.
○​ Second field: Unique local ID assigned at creation.
○​ Third field: Current location of the process.
●​ How it Works:
○​ When a process migrates, it leaves a link on the old machine pointing to the
new machine.
○​ The message follows these links until it reaches the process.
○​ The sender caches the new location for future direct communication.
●​ Example: Used in the DEMOS/MP and Charlotte distributed systems.

✅ Pros:
●​ Supports Process Migration.
●​ Caches last known location to improve efficiency.

❌ Cons:
●​ Overhead: If a process moves frequently, finding it may take multiple hops.
●​ Failure Risk: If an intermediate machine crashes, the process may become
unreachable.

(C) Location-Transparent Addressing (Preferred in Distributed Systems)

●​ Goal: Users should not need to know the physical location of a process.
●​ Solution: Do not embed the machine ID in process identifiers.

✅ Methods to Achieve Location Transparency:


1️⃣ Centralized Process Identifier Allocation

●​ A single central counter generates unique process IDs.


●​ Issue: Not scalable or reliable (single point of failure).

2️⃣ Two-Level Naming Scheme (Using a Name Server)


●​ Each process has:
○​ High-Level Name (e.g., ServiceA) → Machine-independent.
○​ Low-Level Name (machine_id@local_id) → Machine-dependent.
●​ Process Lookup:
○​ The sender queries a name server to get the process’s current low-level name.
○​ The message is sent using this information.
○​ The sender caches the mapping for future use.

✅ Advantages:
●​ Supports process migration without modifying programs.
●​ Works for functional addressing (e.g., mapping a service name to multiple
processes).

❌ Disadvantages:
●​ Single point of failure (if the name server crashes).
●​ Scalability issues (high demand on the name server).
●​ Solution: Replicate the name server (but requires synchronization).
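
A minimal sketch of the two-level naming scheme with client-side caching; the class and service names are made up, and a real system would also replicate the name server and invalidate stale cache entries when a send fails.

```python
# Minimal sketch: a name server maps a high-level, machine-independent service name
# to its current low-level (machine_id, local_id) address, and clients cache the mapping.
class NameServer:
    def __init__(self):
        self.table = {}                       # high-level name -> (machine_id, local_id)

    def register(self, name, machine_id, local_id):
        self.table[name] = (machine_id, local_id)

    def lookup(self, name):
        return self.table.get(name)

class Client:
    def __init__(self, name_server):
        self.ns = name_server
        self.cache = {}

    def resolve(self, name):
        if name not in self.cache:            # ask the name server only on a cache miss
            self.cache[name] = self.ns.lookup(name)
        return self.cache[name]

ns = NameServer()
ns.register("printer-service", machine_id="m3", local_id=42)
c = Client(ns)
print(c.resolve("printer-service"))           # ('m3', 42)
ns.register("printer-service", "m7", 42)      # the process migrates to machine m7
print(c.resolve("printer-service"))           # still ('m3', 42): stale cache entry must be refreshed on failure
```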

Conclusion

●​ Explicit vs. Implicit Addressing:


○​ Use explicit when specific process communication is needed.
○​ Use implicit when service-based communication is required.
●​ Process Addressing Methods:
○​ Machine-based: Simple but does not support migration.
○​ Link-based: Supports migration but adds overhead.
○​ Location-transparent (Name Server): Best for distributed systems but needs
replication.

🚀 Final Thought: Distributed systems should prioritize location transparency and scalability
for efficient process communication!
6. Failure Handling in Message-Based Communication

In a distributed system, interprocess communication (IPC) is vulnerable to partial failures, such
as node crashes or communication link failures. These failures can lead to several issues,
including:

1.​ Loss of Request Message – The message may be lost due to a communication link
failure or because the receiver's node is down.
2.​ Loss of Response Message – The response may be lost due to network failure or if the
sender's node crashes.
3.​ Unsuccessful Request Execution – If the receiver's node crashes during request
processing, the request execution may be incomplete.

To address these issues, reliable IPC protocols use internal retransmissions and
acknowledgment messages to ensure message delivery. The sender's kernel retransmits a
message if no acknowledgment is received within a specified timeout period.

Reliable IPC Protocols

1.​ Four-Message Reliable IPC Protocol​

○​ The client sends a request message to the server.


○​ The server's kernel acknowledges the request message. If no acknowledgment is
received, the client retransmits the request.
○​ The server processes the request and sends a reply message.
○​ The client's kernel acknowledges the reply. If no acknowledgment is received, the
server retransmits the reply.
2.​ Three-Message Reliable IPC Protocol​

○​ The client sends a request to the server.


○​ The server processes the request and sends a reply. If the client does not receive
the reply within the timeout period, it retransmits the request.
○​ The client acknowledges the reply. If no acknowledgment is received, the server
retransmits the reply.
○​ This method is efficient but may cause unnecessary retransmissions if the
request processing time is long.
3.​ Enhanced Three-Message Protocol (Handles Long Processing Time)​

○​ The server starts a timer upon receiving the request. If it finishes before the timer
expires, the reply serves as an acknowledgment. Otherwise, a separate
acknowledgment is sent.
○​ If no acknowledgment is received, the client retransmits the request.
○​ The client acknowledges the reply to prevent unnecessary retransmissions.
4.​ Two-Message IPC Protocol (Used in Many Systems)​
○​ The client sends a request and waits for a reply.
○​ The server processes the request and sends a response. If the client does not
receive the reply within the timeout, it retransmits the request.
○​ This protocol follows at-least-once semantics, ensuring that the request is
executed at least once. However, it may cause duplicate executions, leading to
inconsistent results.

Example of Failure Handling

A client-server communication using the two-message IPC protocol may experience the
following:

●​ The request message is lost, and the client retransmits it.


●​ The server crashes during execution, restarts, and reprocesses the request.
●​ The response is lost, prompting the client to resend the request.
●​ The server may process the request multiple times, producing different results.
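
A minimal sketch of the retransmission idea behind these protocols: the client resends on a (simulated) loss until it gets a reply, giving at-least-once delivery, and the server suppresses duplicates by caching replies per request_id. The loss model, names, and parameters are illustrative.

```python
# Minimal sketch: retransmission on timeout plus duplicate suppression at the server.
import random

class Server:
    def __init__(self):
        self.handled = {}                       # request_id -> cached reply

    def handle(self, request_id, payload):
        if request_id in self.handled:          # duplicate caused by a retransmission
            return self.handled[request_id]
        reply = f"result-of-{payload}"          # execute the request once
        self.handled[request_id] = reply
        return reply

def unreliable_send(server, request_id, payload, loss_rate=0.5):
    if random.random() < loss_rate:             # request or reply lost in the network
        return None
    return server.handle(request_id, payload)

def client_call(server, request_id, payload, max_retries=10):
    for _ in range(max_retries):                # retransmit until a reply arrives
        reply = unreliable_send(server, request_id, payload)
        if reply is not None:
            return reply
    raise TimeoutError("server unreachable")

server = Server()
print(client_call(server, request_id=1, payload="debit-100"))   # executed only once despite retries
```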

Conclusion

Reliable IPC protocols mitigate failures through retransmissions and acknowledgments. The
choice of protocol depends on the application’s tolerance for duplicate executions and network
overhead considerations.

THE RPC MODEL

Explanation of the Remote Procedure Call (RPC) Model

Remote Procedure Call (RPC) is an extension of the traditional procedure call mechanism that
allows a process to execute a procedure located in a different address space, possibly on
another computer. Unlike a regular procedure call, where the caller and the callee share
memory, in RPC, the caller (client process) and callee (server process) operate in separate
memory spaces and exchange information through message passing.

How RPC Works:

1.​ Call Initiation:​

○​ The client process sends a request message to the server, containing the
procedure name and parameters.
○​ The client then waits (blocks) for a response from the server.
2.​ Procedure Execution:​
○​ The server receives the request, extracts the parameters, and executes the
requested procedure.
○​ Once execution is complete, the server sends the result back to the client in a
reply message.
3.​ Response Handling:​

○​ The client receives the reply message, extracts the result, and resumes
execution from the calling point.

Characteristics of RPC:

●​ Message Passing: Since the client and server do not share memory, they communicate
via request and response messages.
●​ Blocking by Default: The client usually waits (blocks) until it receives a response.
However, asynchronous RPC models allow the client to continue executing while waiting
for a response.
●​ Concurrency Models: The server can process requests sequentially or create threads
to handle multiple requests concurrently.
●​ Location Transparency: The client does not need to know the exact location of the
procedure—it only makes a request as if calling a local function.

RPC enables distributed computing by allowing processes to communicate and execute
functions across different systems while abstracting the complexities of network communication.
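
A minimal sketch of the RPC shape using Python's standard xmlrpc modules; the host, port, and add procedure are illustrative, but the client-side call really does travel as a request/reply message exchange handled by the library.

```python
# Minimal sketch: the client calls proxy.add(2, 3) as if it were local, while the
# xmlrpc library packs the parameters into a request message and unpacks the reply.
import threading
from xmlrpc.server import SimpleXMLRPCServer
import xmlrpc.client

def add(x, y):                      # the remote procedure executed by the server
    return x + y

server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
server.register_function(add, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
print(proxy.add(2, 3))              # looks like a local call; actually a request/reply exchange
```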

Transparency in Remote Procedure Calls (RPC)

A key design goal of RPC mechanisms is transparency, which means making remote
procedure calls behave as similarly as possible to local procedure calls. Transparency can be
categorized into two types:

1.​ Syntactic Transparency – The syntax of a remote procedure call should be identical to
that of a local procedure call. This ensures that programmers do not need to learn a
different way of writing function calls when using RPC.
2.​ Semantic Transparency – The behavior (semantics) of a remote procedure call should
match that of a local procedure call, ensuring that function execution, argument passing,
and return values work in the same way.

Challenges in Achieving Full Transparency

Despite efforts to make RPC resemble local procedure calls, several fundamental differences
make full transparency difficult:

1.​ Disjoint Address Spaces​

○​ In local calls, functions can access variables and memory of the caller.
○​ In RPC, the caller and callee operate in separate memory spaces (possibly on
different machines).
○​ Passing pointers (e.g., linked lists, graphs) is problematic because memory
addresses are meaningful only within a single process. Workarounds like copying
values before sending them may change program behavior.
2.​ Increased Vulnerability to Failures​

○​ Local procedure calls execute within the same process, reducing failure risks.
○​ RPCs involve multiple processes, a network, and potentially multiple computers,
making them prone to failures such as:
■​ Network communication errors
■​ Server crashes
■​ Delayed responses due to congestion
3.​ Higher Latency​

○​ Local function calls execute almost instantly.


○​ RPCs rely on network communication, making them 100 to 1000 times slower
than local calls.
○​ Delays can be caused by network congestion, server load, or communication
failures.

Debate on RPC Transparency

Due to these differences, some researchers argue that RPC should not be fully transparent:

●​ Hamilton (1984) suggested that remote procedures should be explicitly distinguished


from local ones, leading to a nontransparent RPC model.
●​ Argus RPC designers (Liskov & Scheifler, 1983) believed that while RPC should
abstract low-level details, it should not hide failures and delays from programmers,
allowing applications to handle them as needed.

Conclusion

While complete semantic transparency in RPC is impossible, enough abstraction can be provided to make distributed programming easier while ensuring that programmers can handle failures and network-related issues effectively.
Group Communication:

❖​ One-to-Many Communication in Message Passing Systems

One-to-many communication, also known as multicast communication, allows a single sender to transmit a message to multiple receivers. A special case of this is broadcast communication, where a message is sent to all processors connected to a network.

Applications of Multicast/Broadcast Communication

Multicast and broadcast communication play crucial roles in distributed systems. Some key
applications include:

●	Server Management: A server manager can multicast a request to multiple server processes, allowing a free server to respond and handle the request without keeping track of available servers.
●​ Service Discovery: A process can broadcast an inquiry message to locate a processor
offering a specific service, reducing the need for tracking all service providers.

Group Management in One-to-Many Communication

In multicast communication, receiver processes form groups. There are two main types of
groups:

1.​ Closed Groups:​

○​ Only members of the group can send messages to the group.


○​ External processes can send messages to individual members but not to the
entire group.
○​ Suitable for processes working on a common problem that do not need to
communicate with external entities.
2.​ Open Groups:​

○​ Any process in the system can send messages to the group.


○​ Commonly used in distributed client-server architectures, where client
processes send requests to replicated servers.

A flexible message-passing system should support both closed and open groups, depending
on application needs.

Dynamic Group Management


A robust message-passing system should allow:

●​ Dynamic group creation and deletion.


●​ Processes to join or leave groups at any time.
●​ Efficient management of group membership information.

A simple approach to managing groups is through a centralized group server that handles:

●​ Group creation and deletion requests.


●​ Addition and removal of members.
●​ Maintaining up-to-date membership records.
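
As a sketch of the bookkeeping such a centralized group server would keep, the class below maps group names to member sets. The class and method names are illustrative assumptions, not a standard API.

```python
# Sketch of a centralized group server's membership bookkeeping.
class GroupServer:
    def __init__(self):
        self.groups = {}              # group name -> set of member process ids

    def create_group(self, name):
        self.groups.setdefault(name, set())

    def delete_group(self, name):
        self.groups.pop(name, None)

    def join(self, name, pid):
        self.groups[name].add(pid)

    def leave(self, name, pid):
        self.groups[name].discard(pid)

    def members(self, name):
        # Used by the multicast layer to expand a group address
        # into the current list of receivers.
        return set(self.groups.get(name, ()))
```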

However, a centralized approach has drawbacks:

●​ Poor reliability – If the central group server fails, the entire system is affected.
●​ Limited scalability – As the number of groups grows, the centralized server may
become a bottleneck.

To improve reliability, the group server can be replicated, but this introduces challenges in
maintaining consistency across all replicated servers.

❖​ Many-to-One Communication in Message Passing Systems

Many-to-one communication involves multiple senders transmitting messages to a single receiver. The receiver can operate in two ways:

1.​ Selective Receiver – Specifies a unique sender and only exchanges messages with
that sender.
2.​ Nonselective Receiver – Accepts messages from any sender within a predefined set.

Key Issue: Nondeterminism

Since the receiver does not know which sender will send a message first, the communication is
nondeterministic. This is especially useful in scenarios where:

●​ The receiver waits for information from any available sender in a group.
●​ The receiver dynamically adjusts which senders it accepts messages from.

Example: Producer-Consumer Model


A buffer process in a producer-consumer system follows many-to-one communication:

●​ It accepts a message from a producer when the buffer is not full.


●​ It accepts a message from a consumer when the buffer is not empty.

Such a system requires a way to handle nondeterministic behavior, often achieved using
guarded commands introduced by Dijkstra (1975). These allow conditions to dynamically
control message acceptance, ensuring synchronization between senders and the receiver.
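
The guarded acceptance just described can be sketched in code. The following is an illustrative Python simulation that expresses the two guards with a condition variable rather than Dijkstra's notation; the class and method names are assumptions.

```python
# Buffer process sketch: accept a deposit only when not full, accept a fetch
# only when not empty (many-to-one communication with guarded acceptance).
import threading
from collections import deque

class BoundedBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.cond = threading.Condition()

    def deposit(self, item):                  # called by any producer
        with self.cond:
            # guard: buffer not full
            self.cond.wait_for(lambda: len(self.items) < self.capacity)
            self.items.append(item)
            self.cond.notify_all()

    def fetch(self):                          # called by any consumer
        with self.cond:
            # guard: buffer not empty
            self.cond.wait_for(lambda: len(self.items) > 0)
            item = self.items.popleft()
            self.cond.notify_all()
            return item
```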

❖​ Many-to-Many Communication in Message Passing Systems

Many-to-many communication involves multiple senders transmitting messages to multiple receivers. Since it includes elements of one-to-many and many-to-one communication, the issues discussed in those schemes also apply here.

Key Issue: Ordered Message Delivery

●​ Ensures that all messages are delivered to all receivers in a consistent order
acceptable to the application.
●​ Required for applications such as database replication, where receiving updates in
different orders can lead to data inconsistencies.

Message Sequencing in Different Schemes

1.​ One-to-Many Communication:​

○​ Ensuring order is simple if the sender waits for confirmation before sending the
next message.
2.​ Many-to-One Communication:​

○​ Messages are delivered in the order they arrive at the receiver’s machine.
○​ Ordering is handled naturally by the receiver.

Challenges in Many-to-Many Communication

●​ A message from one sender may arrive at a receiver before another sender’s
message, while the order may be reversed at a different receiver.
●​ Causes:
○​ LAN contention: Multiple processes compete for network access, making
message order nondeterministic.
○​ WAN routing differences: Messages take different, unpredictable paths to the
same destination.

To resolve these challenges, a special message-handling mechanism is required for ordered message delivery in a many-to-many communication scheme.

Absolute Ordering in Message Delivery

Absolute ordering ensures that all messages are delivered to all receiver processes in the
exact order in which they were sent.

Implementation Using Global Timestamps

●​ Each machine in the system has a synchronized clock.


●​ When a sender transmits a message, a timestamp is assigned as its identifier.
●​ This timestamp is embedded in the message.

Message Queue and Sliding-Window Mechanism

●	The kernel at each receiver’s machine stores incoming messages in a separate queue.
●​ A sliding-window mechanism ensures periodic delivery of messages.
○​ A fixed time interval (window size) is chosen.
○​ Messages within the current window are delivered to the receiver.
○​ Messages outside the window remain in the queue to accommodate late
arrivals with lower timestamps.
●​ The window size is selected based on the maximum possible delay in message
transmission across the network.
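
A simplified receiver-side sketch of this queue-and-window rule is shown below. The window size and the clock source are assumptions; a real system would use the globally synchronized clocks described above.

```python
# Receiver-side absolute ordering sketch: hold each message until the sliding
# window (maximum expected network delay) has passed, then deliver messages
# in timestamp order.
import heapq
import time

WINDOW = 0.5   # seconds; chosen from the maximum expected transmission delay

class AbsoluteOrderQueue:
    def __init__(self):
        self.heap = []                           # (timestamp, message)

    def on_receive(self, timestamp, message):
        heapq.heappush(self.heap, (timestamp, message))

    def deliverable(self, now=None):
        """Pop messages whose timestamps fall outside the current window."""
        now = time.time() if now is None else now
        ready = []
        while self.heap and self.heap[0][0] <= now - WINDOW:
            ready.append(heapq.heappop(self.heap))
        return ready                             # already sorted by timestamp
```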

Consistent Ordering in Message Delivery

Absolute ordering requires globally synchronized clocks, which are difficult to implement.
However, many applications do not need absolute ordering. Instead, consistent ordering
ensures that all receiver processes receive messages in the same order, though this order
may differ from the original sending order.

Sequencer-Based Implementation

●	The many-to-many scheme is transformed into a many-to-one and one-to-many scheme.
●​ A sequencer assigns a sequence number to each message before multicasting it.
●​ Each receiver stores incoming messages in a separate queue and delivers them
immediately, unless there is a gap in the sequence numbers.
●​ Messages after the gap are not delivered until the missing ones arrive.

Limitation: The sequencer introduces a single point of failure and has poor reliability.
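
The receiver-side "deliver unless there is a gap in the sequence numbers" rule can be sketched as follows; the class and method names are illustrative assumptions.

```python
# Receiver sketch for sequencer-based consistent ordering: deliver strictly in
# sequence-number order, buffering anything that arrives after a gap until the
# missing message shows up.
class SequencedReceiver:
    def __init__(self):
        self.next_seq = 0        # sequence number expected next
        self.pending = {}        # seq -> message, waiting for a gap to close

    def on_receive(self, seq, message):
        delivered = []
        self.pending[seq] = message
        # Deliver as long as the next expected sequence number is present.
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered
```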

ABCAST Protocol for Consistent Ordering


A distributed approach (ABCAST protocol) avoids the single point of failure and works as
follows:

1.​ Sender assigns a temporary sequence number to the message, ensuring it is larger
than any previously used number.

2.	The message, tagged with this temporary number, is multicast to all group members; each member replies to the sender with a proposed sequence number that is larger than any it has previously seen or proposed.
3.​ Sender selects the largest proposed sequence number as the final one and sends a
commit message to all members.
4.​ Each member assigns the final sequence number to the message upon receiving the
commit message.
5.​ Messages are delivered in the order of their final sequence numbers.

The ABCAST protocol ensures consistent ordering semantics by achieving distributed agreement on sequence numbers.
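
The agreement steps can be sketched in one process for clarity; in a real ABCAST implementation the proposal and commit phases are carried by messages over the network, and the Member/abcast names below are assumptions.

```python
# Sketch of ABCAST-style agreement on a final sequence number.
class Member:
    def __init__(self):
        self.max_seen = 0            # largest sequence number seen or proposed

    def propose(self):
        self.max_seen += 1           # propose something larger than anything seen
        return self.max_seen

    def commit(self, final_seq):
        self.max_seen = max(self.max_seen, final_seq)
        return final_seq             # the message is ordered by final_seq

def abcast(members):
    proposals = [m.propose() for m in members]   # phase 1: collect proposals
    final_seq = max(proposals)                   # sender picks the largest
    for m in members:                            # phase 2: commit message
        m.commit(final_seq)
    return final_seq

group = [Member() for _ in range(3)]
print(abcast(group))   # 1
print(abcast(group))   # 2 -- final numbers increase and are identical everywhere
```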

Causal Ordering in Message Delivery

For some applications, consistent-ordering semantics is unnecessary, and a weaker ordering semantics, such as causal ordering, can improve performance.

Definition of Causal Ordering


Causal ordering ensures that if one message-sending event is causally related to another (i.e.,
the second event is influenced by the first), the messages are delivered in the correct order.
However, if two messages are not causally related, they can be delivered in any order.

Example of Causal Ordering

●	A sender S1 sends message m1 to receivers R1, R2, and R3.
●	R1 receives m1, inspects it, and then creates message m3, sending it to R2 and R3.
●	Since m3 is derived from m1, it must be delivered after m1.
●	Another sender S2 sends message m2 to R2 and R3, but since m2 is unrelated to m1 or m3, it can be delivered at any time.

CBCAST Protocol for Implementing Causal Ordering

1.​ Vector Timestamps:​

○​ Each process maintains a vector with one component for each group member.
○	The ith component tracks the last message received from the corresponding sender.
2.​ Sending a Message:​

○​ The sender increments its own vector component and attaches the updated
vector to the message.
3.​ Message Delivery Conditions:​

○	Let S be the sender’s vector (attached to the message) and R be the receiver’s vector.
○	Let i be the sender’s identifier.
○	The message is delivered only if both conditions hold:
1.	S[i] = R[i] + 1 (ensures no message from the sender has been missed).
2.	S[j] ≤ R[j] for all j ≠ i (ensures no causal dependency is violated).
4.​ Buffering Mechanism:​

○​ If the conditions fail, the message is buffered and rechecked when a new
message arrives.
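
These two conditions and the buffering rule translate almost directly into code. The sketch below keeps only the vector check and the buffer, not the transport; names are illustrative assumptions.

```python
# CBCAST delivery test sketch: deliver a message only if it is the next one
# expected from its sender (S[i] == R[i] + 1) and it introduces no unseen
# causal dependencies (S[j] <= R[j] for all j != i).

def can_deliver(S, R, i):
    if S[i] != R[i] + 1:
        return False
    return all(S[j] <= R[j] for j in range(len(S)) if j != i)

class CbcastReceiver:
    def __init__(self, n):
        self.R = [0] * n          # messages delivered so far, per sender
        self.buffer = []          # (sender, vector, message) awaiting delivery

    def on_receive(self, sender, S, message):
        self.buffer.append((sender, S, message))
        delivered = []
        progress = True
        while progress:           # recheck buffered messages after each delivery
            progress = False
            for entry in list(self.buffer):
                sender_i, S_i, msg = entry
                if can_deliver(S_i, self.R, sender_i):
                    self.R[sender_i] += 1
                    delivered.append(msg)
                    self.buffer.remove(entry)
                    progress = True
        return delivered
```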

Example of CBCAST Algorithm

●	Four processes A, B, C, and D maintain vector timestamps.
●	A sends a new message, updating its vector.
●	Message delivery results:
○	B can receive the message immediately.
○	C must delay the message because it missed a prior message from A.
○	D must delay the message because it missed a message received by A.

A good message-passing system should support consistent and causal ordering semantics, allowing users to choose based on application needs.
Module 3

Election Algorithms in Distributed Systems

In distributed systems, certain processes need to take on special roles such as coordinator,
initiator, or sequencer. Election algorithms are used to select a coordinator among multiple
processes.

1.​ Purpose of Election Algorithms​

○​ Election algorithms ensure that when an election starts, all processes agree on a
single coordinator.
○​ The goal is to locate the process with the highest unique identifier (e.g., network
address) and designate it as the coordinator.
2.​ Assumptions in Election Algorithms​

○​ Each process has a unique number, typically a network address.


○​ Every process knows the process numbers of all others but does not know which
ones are active or down.
3.​ Key Characteristics​

○	If all processes are identical with no distinguishing features, there is no natural leader, making an election necessary.
○​ The election process must conclude with agreement on a new coordinator.
4.​ Examples of Election Algorithms​

○	Several election algorithms exist, including those proposed by Fredrickson and Lynch (1987), Garcia-Molina (1982), and Singh & Kurose (1994).

Bully Algorithm:

The Bully Algorithm for Coordinator Election

The Bully Algorithm, devised by Garcia-Molina (1982), is an election algorithm used in distributed systems to select a new coordinator when the existing one fails. It follows a hierarchical approach where the highest-numbered process always wins.

Steps of the Bully Algorithm

When a process, P, detects that the current coordinator is unresponsive, it initiates an election:

1.​ Election Initiation:​


○​ P sends an ELECTION message to all processes with a higher number.
2.​ Handling Responses:​

○​ If no one responds, P wins the election and becomes the new coordinator.
○	If a higher-numbered process responds, it takes over the election, and P stops participating.
3.​ Election Continuation:​

○	When a higher-numbered process receives an ELECTION message, it replies with an OK message, indicating that it is alive.
○	The responding process then initiates its own election, repeating the process.
4.​ Coordinator Announcement:​

○​ The highest-numbered active process eventually wins.


○​ It sends a COORDINATOR message to all other processes, announcing its role.
5.​ Handling Process Recovery:​

○​ If a previously failed process rejoins and has the highest number, it will initiate
an election and take over as the coordinator, "bullying" its way to the top.

Example Scenario (Fig. 3-12 Explanation)

Consider a system with eight processes (0 to 7), where process 7 was the coordinator but has
crashed.

1.​ Process 4 detects the failure and starts an election, sending messages to 5, 6, and 7.
2.​ Processes 5 and 6 respond, indicating they are alive, and take over the election.
3.​ Process 6 wins and announces itself as the new coordinator by sending a
COORDINATOR message to all processes.

If process 7 restarts, it will initiate an election and become the coordinator again since it has
the highest number.
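
This scenario can be simulated with a short sketch. In a real implementation, ELECTION, OK, and COORDINATOR messages are exchanged over the network with timeouts to detect silence; here, alive flags stand in for reachability, and the class and variable names are assumptions.

```python
# Simulation sketch of the Bully algorithm for the 8-process example
# (processes 0..7, where process 7, the old coordinator, has crashed).
class BullySystem:
    def __init__(self, n):
        self.alive = [True] * n

    def election(self, initiator):
        higher = [p for p in range(initiator + 1, len(self.alive)) if self.alive[p]]
        if not higher:
            # No higher-numbered process answered: the initiator wins.
            return self.announce(initiator)
        # Each live higher process replies OK and runs its own election;
        # the net effect is that the highest live process wins.
        return self.election(max(higher))

    def announce(self, winner):
        # A COORDINATOR message would be sent to every process here.
        return winner

system = BullySystem(8)
system.alive[7] = False          # old coordinator has crashed
print(system.election(4))        # process 4 detects the failure -> 6 wins
```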

Key Characteristics of the Bully Algorithm

✅ Ensures the highest-priority process becomes the coordinator
✅ Quick recovery from failures
✅ Works in synchronous and asynchronous systems

🔴 Disadvantages:
❌ High message complexity (multiple elections cause overhead)
❌ Higher-numbered processes dominate (lower ones get overruled quickly)
The Bully Algorithm is widely used in distributed systems for fault-tolerant leader election,
ensuring continuity and recovery when failures occur.

The Ring Algorithm for Coordinator Election

The Ring Algorithm is an election algorithm used in distributed systems to select a new
coordinator when the existing one fails. It operates in a logically ordered ring of processes,
ensuring that every process knows its successor. Unlike token-based ring algorithms, it does not
require a special token for operation.

Steps of the Ring Algorithm

1.​ Detection of Coordinator Failure​

○​ A process detects that the coordinator is unresponsive.


○​ It initiates an ELECTION message containing its own process number and
sends it to its successor.
2.​ Handling Unresponsive Successors​

○​ If the successor is down, the sender skips it and moves to the next active
process in the ring.
○​ Each process that receives the ELECTION message adds its own process
number to the list and forwards it further.
3.​ Completion of the Election​

○​ The ELECTION message circulates around the ring until it reaches the process
that started it.
○​ The initiating process identifies the highest process number in the list as the
new coordinator.
4.​ Coordinator Announcement​

○	The message type is changed to COORDINATOR and is circulated around the ring again.
○​ This informs all processes about the new coordinator and finalizes the election.
○​ Once the message completes a full round, it is removed, and all processes
resume normal operations.
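
The circulation of the ELECTION message can be sketched as a single pass around the ring. In a real system each process forwards the message to its successor; here one function walks the ring, skipping dead processes, and the function and variable names are assumptions.

```python
# Simulation sketch of the Ring algorithm: the ELECTION message collects live
# process numbers as it travels around the ring; the initiator then picks the
# highest number and a COORDINATOR message would circulate with the result.
def ring_election(process_ids, alive, initiator):
    n = len(process_ids)
    collected = [process_ids[initiator]]
    pos = (initiator + 1) % n
    while pos != initiator:
        if alive[pos]:                       # skip unresponsive successors
            collected.append(process_ids[pos])
        pos = (pos + 1) % n
    coordinator = max(collected)             # highest live process number wins
    return coordinator

ids = [0, 1, 2, 3, 4, 5, 6, 7]
alive = [True] * 8
alive[7] = False                             # old coordinator has crashed
print(ring_election(ids, alive, 2))          # process 2 starts -> 6 wins
```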

Example Scenario (Fig. 3-13 Explanation)

●​ In Fig. 3-13, two processes, 2 and 5, detect the failure of the previous coordinator
(process 7) simultaneously.
●​ Both independently initiate an election and circulate their messages.
●​ Since the election messages travel around the ring, both reach their starting points,
where they are converted into COORDINATOR messages.
●​ These COORDINATOR messages also circulate and inform all processes about the new
coordinator.
●​ Once both messages complete their rounds, they are removed, causing only minor
bandwidth overhead.
Key Characteristics of the Ring Algorithm

✅ Efficient for structured networks (logically ordered rings)
✅ Ensures a fair election based on process numbers
✅ Handles multiple simultaneous election initiations smoothly

🔴 Disadvantages:
❌ Relies on a predefined logical order (not ideal for dynamic systems)
❌ Slower than the Bully Algorithm (requires full circulation of messages)
❌ Message overhead in large rings (as every process forwards messages)

Comparison with the Bully Algorithm

| Feature | Bully Algorithm | Ring Algorithm |
| --- | --- | --- |
| Selection criterion | Highest process number | Highest process number |
| Message complexity | High (multiple responses & elections) | Moderate (one full round-trip) |
| Speed | Faster | Slower |
| Handling of failures | Quick, but high overhead | Slower, but fair |
| Multiple elections | Causes conflicts but stabilizes | No conflicts; extra messages are just redundant |
Conclusion
The Ring Algorithm is a structured approach to coordinator election, ideal for systems with
logically ordered processes. While it introduces extra message overhead, it ensures a fair
and distributed election process. However, in dynamic or large-scale systems, the Bully
Algorithm may be preferred due to its faster convergence.
Lamport’s Algorithm:
Ricart–Agrawala algorithm:
Maekawa’s Algorithm
