
CS3551- DISTRIBUTED COMPUTING

UNIT I INTRODUCTION

Introduction: Definition – Relation to Computer System Components – Motivation – Message-Passing Systems versus Shared Memory Systems – Primitives for Distributed Communication – Synchronous versus Asynchronous Executions – Design Issues and Challenges; A Model of Distributed Computations: A Distributed Program – A Model of Distributed Executions – Models of Communication Networks – Global State of a Distributed System.

UNIT II LOGICAL TIME AND GLOBAL STATE

Logical Time: Physical Clock Synchronization: NTP – A Framework for a System of Logical Clocks
– Scalar Time – Vector Time; Message Ordering and Group Communication: Message Ordering
Paradigms – Asynchronous Execution with Synchronous Communication – Synchronous Program
Order on Asynchronous System – Group Communication – Causal Order – Total Order; Global
State and Snapshot Recording Algorithms: Introduction – System Model and Definitions – Snapshot
Algorithms for FIFO Channels

UNIT III DISTRIBUTED MUTEX AND DEADLOCK

Distributed Mutual Exclusion Algorithms: Introduction – Preliminaries – Lamport's Algorithm – Ricart–Agrawala Algorithm – Token-Based Algorithms – Suzuki–Kasami's Broadcast Algorithm; Deadlock Detection in Distributed Systems: Introduction – System Model – Preliminaries – Models of Deadlocks – Chandy–Misra–Haas Algorithm for the AND Model and OR Model.

UNIT IV CONSENSUS AND RECOVERY

Consensus and Agreement Algorithms: Problem Definition – Overview of Results – Agreement in a Failure-Free System (Synchronous and Asynchronous) – Agreement in Synchronous Systems with Failures; Checkpointing and Rollback Recovery: Introduction – Background and Definitions – Issues in Failure Recovery – Checkpoint-based Recovery – Coordinated Checkpointing Algorithm – Algorithm for Asynchronous Checkpointing and Recovery

UNIT V CLOUD COMPUTING

Definition of Cloud Computing – Characteristics of Cloud – Cloud Deployment Models – Cloud Service Models – Driving Factors and Challenges of Cloud – Virtualization – Load Balancing – Scalability and Elasticity – Replication – Monitoring – Cloud Services and Platforms: Compute Services – Storage Services – Application Services
UNIT I
INTRODUCTION

Computation began on single-processor machines; such uni-processor computing is termed centralized computing.
A distributed system is a collection of independent computers, interconnected via a network, capable of collaborating on a task. Distributed computing is computing performed in a distributed system.

A distributed system is a collection of independent entities that cooperate to solve a problem that cannot be solved individually. Distributed computing is widely used thanks to advances in machine performance and faster, cheaper networks. In a distributed system, the entire network is viewed as a single computer: the multiple systems connected to the network appear as one system to the user.
Features of Distributed Systems:
No common physical clock - It introduces the element of “distribution” in the system and
gives rise to the inherent asynchrony amongst the processors.
No shared memory - A key feature that requires message-passing for communication. This
feature implies the absence of the common physical clock.
Geographical separation – The wider apart the processors are geographically, the more representative the system is of a distributed system.
Autonomy and heterogeneity – Here the processors are “loosely coupled” in that they have
different speeds and each can be running a different operating system.

Issues in distributed systems


Heterogeneity
Openness
Security
Scalability
Failure handling
Concurrency
Transparency
Quality of service

1.2 Relation to Computer System Components

Fig 1.1: Example of a Distributed System


As shown in Fig 1.1, each computer has a memory-processing unit, and the computers are connected by a communication network. Each system connected to the distributed network hosts distributed software, a middleware technology. The middleware drives the Distributed System (DS) while preserving its heterogeneity. A computation or run in a distributed system is the execution of processes to achieve a common goal.

Fig 1.2: Interaction of layers of network

The interaction of the layers of the network with the operating system and
middleware is shown in Fig 1.2. The middleware contains important library functions for
facilitating the operations of DS.
The distributed system uses a layered architecture to break down the complexity of system
design. The middleware is the distributed software that drives the distributed system, while
providing transparency of heterogeneity at the platform level.

Examples of middleware: the Object Management Group's (OMG) Common Object Request Broker Architecture (CORBA) [36], Remote Procedure Call (RPC), and the Message Passing Interface (MPI).

1.3 Motivation

The following key points act as driving forces behind DS:

Inherently distributed computations: DS can process computations at geographically remote locations.
Resource sharing: Hardware, databases, and special libraries can be shared between systems without each system owning a dedicated copy or replica. This is cost-effective and reliable.
Access to geographically remote data and resources: Resources such as centralized
servers can also be accessed from distant locations.
Enhanced reliability: DS provides enhanced reliability, since they run on multiple copies of
resources.
The term reliability comprises:
1. Availability: the resource/service provided should be accessible at all times.
2. Integrity: the value/state of the resource should be correct and consistent.
3. Fault-Tolerance: Ability to recover from system failures
Increased performance/cost ratio: The resource sharing and remote access features of DS
naturally increase the performance / cost ratio.
Scalable: The number of systems operating in a distributed environment can be increased as
the demand increases.

1.4 MESSAGE-PASSING SYSTEMS VERSUS SHARED MEMORY SYSTEMS


Inter-task communication in multiprocessor systems takes place through two main modes:

Message passing systems:


• This allows multiple processes to read and write data to the message queue
without being connected to each other.
• Messages are stored on the queue until their recipient retrieves them.
Shared memory systems:
• The shared memory is the memory that can be simultaneously accessed by
multiple processes. This is done so that the processes can communicate with each
other.
• Communication among processors takes place through shared data variables, and
control variables for synchronization among the processors.
• Semaphores and monitors are common synchronization mechanisms on shared
memory systems.
• When shared memory model is implemented in a distributed environment, it is
termed as distributed shared memory.
Differences between message passing and shared memory models

Services offered:
• Message passing: variables have to be marshalled at the sending process, transmitted, and unmarshalled into variables at the receiving process. DSM: the processes share variables directly, so no marshalling and unmarshalling is needed; shared variables can be named, stored and accessed in DSM.
• Message passing: processes can communicate with other processes and can be protected from one another by having private address spaces. DSM: a process does not have a private address space, so one process can alter the execution of another.
• Message passing: this technique can be used between heterogeneous computers. DSM: this cannot be used between heterogeneous computers.
• Message passing: synchronization between processes is through message-passing primitives. DSM: synchronization is through locks and semaphores.
• Message passing: processes communicating via message passing must execute at the same time. DSM: processes communicating through DSM may execute with non-overlapping lifetimes.

Efficiency:
• Message passing: all remote data accesses are explicit, so the programmer is always aware of whether a particular operation is in-process or involves the expense of communication. DSM: any particular read or update may or may not involve communication by the underlying runtime support.

Emulating message-passing on a shared memory system (MP → SM)


• The shared memory system can be made to act as message passing system. The
shared address space can be partitioned into disjoint parts, one part being
assigned to each processor.
• Send and receive operations are implemented by writing to and reading from the
destination/sender processor’s address space. The read and write operations are
synchronized.
• Specifically, a separate location can be reserved as the mailbox for each ordered
pair of processes.
Emulating shared memory on a message-passing system (SM → MP)
• This is also implemented through read and write operations. Each shared
location can be modeled as a separate process. Write to a shared location is
emulated by sending an update message to the corresponding owner process and
read operation to a shared location is emulated by sending a query message to the
owner process.
• This emulation is expensive as a process has to gain access to another process's memory location. The latencies involved in read and write operations may be
high even when using shared memory emulation because the read and write
operations are implemented by using network-wide communication.
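A minimal sketch of the MP → SM mailbox scheme described above, using Python threads in place of processors and a lock-protected list in place of a reserved shared-memory region. The class and method names are illustrative, not from the text.

import threading

class SharedMemoryMailboxes:
    """One reserved 'mailbox' per ordered (sender, receiver) pair."""
    def __init__(self, n_procs):
        self.boxes = {(i, j): ([], threading.Condition())
                      for i in range(n_procs) for j in range(n_procs) if i != j}

    def send(self, src, dst, msg):
        box, cond = self.boxes[(src, dst)]
        with cond:               # synchronized write to the shared location
            box.append(msg)
            cond.notify()

    def receive(self, src, dst):
        box, cond = self.boxes[(src, dst)]
        with cond:               # synchronized read from the shared location
            while not box:       # block until the sender has written
                cond.wait()
            return box.pop(0)

mb = SharedMemoryMailboxes(2)
mb.send(0, 1, "hello")
print(mb.receive(0, 1))          # "hello"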

1.5 PRIMITIVES FOR DISTRIBUTED COMMUNICATION

Blocking / Non blocking / Synchronous / Asynchronous


• Message send and message receive communication primitives are done through
Send() and Receive(), respectively.
• A Send primitive has two parameters: the destination, and the buffer in the user
space that holds the data to be sent.
• The Receive primitive also has two parameters: the source from which the data is
to be received and the user buffer into which the data is to be received.
There are two ways of sending data when the Send primitive is called:

• Buffered: The standard option copies the data from the user buffer to the kernel
buffer. The data later gets copied from the kernel buffer onto the network. For the
Receive primitive, the buffered option is usually required because the data may
already have arrived when the primitive is invoked, and needs a storage place in
the kernel.
• Unbuffered: The data gets copied directly from the user buffer onto the network.
Blocking primitives
• The primitive commands wait for the message to be delivered. The execution of
the processes is blocked.
• The sending process must wait after a send until an acknowledgement is made by the receiver.
• The receiving process must wait for the expected message from the sending
process
• A primitive is blocking if control returns to the invoking process after the
processing for the primitive completes.
Non Blocking primitives
• If send is nonblocking, it returns control to the caller immediately, before the
message is sent.
• The advantage of this scheme is that the sending process can continue computing
in parallel with the message transmission, instead of having the CPU go idle.
• This is a form of asynchronous communication.
• A primitive is non-blocking if control returns back to the invoking process
immediately after invocation, even though the operation has not completed.
• For a non-blocking Send, control returns to the process even before the data is copied out of the user buffer.
• For a non-blocking Receive, control returns to the process even before the data may have arrived from the sender.
Synchronous
• A Send or a Receive primitive is synchronous if both the Send() and Receive() handshake with each other.
• The processing for the Send primitive completes only after the invoking processor learns that the corresponding Receive primitive has also been invoked and the receive operation has completed.
• The processing for the Receive primitive completes when the data to be received is copied into the receiver's user buffer.
Asynchronous
• A Send primitive is said to be asynchronous, if control returns back to the
invoking process after the data item to be sent has been copied out of the user-
specified buffer.
• For non-blocking primitives, a return parameter on the primitive call returns a
system-generated handle which can be later used to check the status of
completion of the call.
• The process can check for the completion:
o checking if the handle has been flagged or posted
o issue a Wait with a list of handles as parameters: usually blocks until one
of the parameter handles is posted.
The send and receive primitives can be implemented in four modes:
• Blocking synchronous
• Non- blocking synchronous
• Blocking asynchronous
• Non- blocking asynchronous

Four modes of send operation


Blocking synchronous Send:
• The data gets copied from the user buffer to the kernel buffer and is then sent over
the network.
• After the data is copied to the receiver’s system buffer and a Receive call has been
issued, an acknowledgement back to the sender causes control to return to the
process that invoked the Send operation and completes the Send.
Non-blocking synchronous Send:
• Control returns back to the invoking process as soon as the copy of data from the user
buffer to the kernel buffer is initiated.
• A parameter in the non-blocking call also gets set with the handle of a location that
the user process can later check for the completion of the synchronous send
operation.
• The location gets posted after an acknowledgement returns from the receiver.
• The user process can keep checking for the completion of the non-blocking
synchronous Send by testing the returned handle, or it can invoke the blocking Wait
operation on the returned handle
Blocking asynchronous Send:
• The user process that invokes the Send is blocked until the data is copied from the
user’s buffer to the kernel buffer.
Non-blocking asynchronous Send:
• The user process that invokes the Send is blocked until the transfer of the data from
the user’s buffer to the kernel buffer is initiated.
• Control returns to the user process as soon as this transfer is initiated, and a parameter
in the non-blocking call also gets set with the handle of a location that the user
process can check later using the Wait operation for the completion of the
asynchronous Send.
The asynchronous Send completes when the data has been copied out of the user’s
buffer. The checking for the completion may be necessary if the user wants to reuse the
buffer from which the data was sent.
Modes of receive operation
Blocking Receive:
The Receive call blocks until the data expected arrives and is written in the specified
user buffer. Then control is returned to the user process.
Non-blocking Receive:
• The Receive call will cause the kernel to register the call and return the handle
of a location that the user process can later check for the completion of the
non-blocking Receive operation.
• This location gets posted by the kernel after the expected data arrives and is copied to the user-specified buffer. The user process can check for the completion of the non-blocking Receive by invoking the Wait operation on the returned handle.
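A hedged sketch of a non-blocking Send that returns a system-generated handle, supporting both polling and a blocking Wait. Handle, nb_send and the queue-based "network" are illustrative stand-ins, not a real communication library.

import queue
import threading

class Handle:
    """Completion handle: gets 'posted' (flagged) when the operation completes."""
    def __init__(self):
        self._event = threading.Event()
    def post(self):
        self._event.set()
    def is_posted(self):             # poll for completion
        return self._event.is_set()
    def wait(self):                  # blocking Wait on this handle
        self._event.wait()

def nb_send(channel, data):
    """Non-blocking Send: control returns immediately with a handle."""
    handle = Handle()
    def do_send():
        channel.put(data)            # copy the data "onto the network"
        handle.post()                # post the handle: send has completed
    threading.Thread(target=do_send).start()
    return handle

channel = queue.Queue()
h = nb_send(channel, "payload")
# ... the sender can continue computing in parallel with the transmission ...
h.wait()                             # block until the send has completed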
Processor Synchrony

Processor synchrony indicates that all the processors execute in lock-step with their clocks
synchronized.

This ensures that no processor begins executing the next step of code until all the processors have completed executing the previous steps of code assigned to each of them.

Libraries and standards


There exists a wide range of primitives for message passing. The Message Passing Interface (MPI) library and the PVM (Parallel Virtual Machine) library are largely used by the scientific community.
• Message Passing Interface (MPI): This is a standardized and portable message-
passing system to function on a wide variety of parallel computers. MPI primarily
addresses the message-passing parallel programming model: data is moved from the
address space of one process to that of another process through cooperative
operations on each process.
• Parallel Virtual Machine (PVM): It is a software tool for parallel networking of
computers. It is designed to allow a network of heterogeneous Unix and/or Windows
machines to be used as a single distributed parallel processor.
• Remote Procedure Call (RPC): RPC is a common request–reply protocol and a powerful technique for constructing distributed, client–server based applications. In RPC, the called procedure need not exist in the same address space as the calling procedure. The two processes may be on the same system, or on different systems connected by a network.
• Remote Method Invocation (RMI): RMI is a way for a programmer to write object-oriented programs in which objects on different computers can interact in a distributed network.
• Common Object Request Broker Architecture (CORBA): CORBA describes a
messaging mechanism by which objects distributed over a network can communicate with
each other irrespective of the platform and language used to develop those objects.
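To make the RPC request–reply model concrete, here is a minimal sketch using Python's standard-library XML-RPC modules, one of many possible RPC mechanisms; the port number and the add function are invented for illustration.

from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# Server side: the procedure lives in the server's address space.
def add(x, y):
    return x + y

server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
server.register_function(add, "add")
# server.serve_forever()        # run this in the server process

# Client side (another process): the call looks local but crosses the network.
proxy = ServerProxy("http://localhost:8000")
# print(proxy.add(2, 3))        # -> 5, computed in the server's address space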
1.6 SYNCHRONOUS VS ASYNCHRONOUS EXECUTIONS
The execution of processes in distributed systems may be synchronous or asynchronous.

Asynchronous Execution:
Communication among processes is considered asynchronous when every communicating process can have a different observation of the order of the messages being exchanged. In an asynchronous execution:
• there is no processor synchrony and there is no bound on the drift rate of processor clocks
• message delays are finite but unbounded
• there is no upper bound on the time taken by a process to execute a step

Fig: Asynchronous execution in message passing system

Synchronous Execution:
Communication among processes is considered synchronous when every process observes the same order of messages within the system. In a synchronous execution:
• processors are synchronized and the clock drift rate between any two processors is
bounded
• message delivery times are such that they occur in one logical step or round
• upper bound on the time taken by a process to execute a step.
Emulating an asynchronous system by a synchronous system (A → S)
An asynchronous program can be emulated on a synchronous system fairly trivially as the
synchronous system is a special case of an asynchronous system – all communication
finishes within the same round in which it is initiated.

Emulating a synchronous system by an asynchronous system (S → A)


A synchronous program can be emulated on an asynchronous system using a tool called a synchronizer.

Emulation for a fault-free system

Fig 1.15: Emulations in a failure free message passing system


If system A can be emulated by system B, denoted A/B, and if a problem is not solvable in
B, then it is also not solvable in A. If a problem is solvable in A, it is also solvable in B.
Hence, in a sense, all four classes are equivalent in terms of computability in failure-free
systems.

1.7 DESIGN ISSUES AND CHALLENGES IN DISTRIBUTED SYSTEMS


The design of distributed systems has numerous challenges. They can be categorized
into:
• Issues related to system and operating systems design
• Issues related to algorithm design
• Issues arising due to emerging technologies
The above three classes are not mutually exclusive.

1.7.1 Issues related to system and operating systems design


The following are some of the common challenges to be addressed in designing a
distributed system from system perspective:
➢ Communication: This task involves designing suitable communication mechanisms
among the various processes in the networks.
Examples: RPC, RMI

➢ Processes: The main challenges involved are: process and thread management at
both client and server environments, migration of code between systems, design of software
and mobile agents.
➢ Naming: Devising easy to use and robust schemes for names, identifiers, and
addresses is essential for locating resources and processes in a transparent and scalable
manner. The remote and highly varied geographical locations make this task difficult.
➢ Synchronization: Mutual exclusion, leader election, deploying physical clocks,
global state recording are some synchronization mechanisms.
➢ Data storage and access schemes: Designing file systems for easy and efficient data storage with implicit accessing mechanisms is essential for distributed operation.
➢ Consistency and replication: The notion of distributed systems goes hand in hand with the replication of data, to provide a high degree of scalability. The replicas should be handled with care, since data consistency is a prime issue.
➢ Fault tolerance: This requires maintenance of fail proof links, nodes, and processes.
Some of the common fault tolerant techniques are resilience, reliable communication,
distributed commit, checkpointing and recovery, agreement and consensus, failure detection,
and self-stabilization.
➢ Security: Cryptography, secure channels, access control, key management (generation and distribution), authorization, and secure group management are some of the security measures imposed on distributed systems.
➢ Applications Programming Interface (API) and transparency: The user
friendliness and ease of use is very important to make the distributed services to be used by
wide community. Transparency, which is hiding inner implementation policy from users, is
of the following types:
▪ Access transparency: hides differences in data representation across systems.
▪ Location transparency: hides the location of resources by providing uniform access to data located at remote locations.
▪ Migration transparency: allows relocating resources without changing names.
▪ Replication transparency: Makes the user unaware whether he is working on
original or replicated data.
▪ Concurrency transparency: Masks the concurrent use of shared resources for the
user.
▪ Failure transparency: system being reliable and fault-tolerant.
➢ Scalability and modularity: The algorithms, data and services must be as distributed
as possible. Various techniques such as replication, caching and cache management, and
asynchronous processing help to achieve scalability.

1.7.2 Algorithmic challenges in distributed computing


➢ Designing useful execution models and frameworks
The interleaving model, partial order model, input/output automata model and the Temporal
Logic of Actions (TLA) are some examples of models that provide different degrees of
infrastructure.
➢ Dynamic distributed graph algorithms and distributed routing algorithms
• The distributed system is generally modeled as a distributed graph.
• Hence graph algorithms are the basis for a large number of higher-level communication, data dissemination, object location, and object search functions.
• These algorithms must have the capacity to deal with highly dynamic graph
characteristics. They are expected to function like routing algorithms.
• The performance of these algorithms has direct impact on user-perceived latency, data
traffic and load in the network.
➢ Time and global state in a distributed system

• Geographically remote resources demand synchronization based on logical time.
• Logical time is relative and eliminates the overhead of providing physical time for applications. Logical time can
(i) capture the logic and inter-process dependencies, and
(ii) track the relative progress at each process.
• Maintaining the global state of the system across space involves the role of the time dimension for consistency. This can be done with extra effort in a coordinated manner.
• Deriving appropriate measures of concurrency also involves the time dimension, as the execution and communication speeds of threads may vary a lot.
➢ Synchronization/coordination mechanisms
• Synchronization is essential for the distributed processes to facilitate concurrent
execution without affecting other processes.
• The synchronization mechanisms also involve resource management and concurrency management mechanisms.
• Some techniques for providing synchronization are:
✓ Physical clock synchronization: Physical clocks usually diverge in their values due
to hardware limitations. Keeping them synchronized is a fundamental challenge to maintain
common time.
✓ Leader election: All the processes need to agree on which process will play the
role of a distinguished process or a leader process. A leader is necessary even for many
distributed algorithms because there is often some asymmetry.
✓ Mutual exclusion: Access to the critical resource(s) has to be coordinated.
✓ Deadlock detection and resolution: Deadlock detection should be coordinated to avoid duplicate work, and deadlock resolution should be coordinated to avoid unnecessary aborts of processes.
✓ Termination detection: cooperation among the processes to detect the specific global
state of quiescence.
✓ Garbage collection: Detecting garbage requires coordination among the processes.
➢ Group communication, multicast, and ordered message delivery
• A group is a collection of processes that share a common context and collaborate on a
common task within an application domain. Group management protocols are needed for
group communication wherein processes can join and leave groups dynamically, or fail.
➢ Monitoring distributed events and predicates
• Predicates defined on program variables that are local to different processes are used
for specifying conditions on the global system state.
• On-line algorithms for monitoring such predicates are hence important.
• The specification of such predicates uses physical or logical time relationships.
➢ Distributed program design and verification tools
Methodically designed and verifiably correct programs can greatly reduce the overhead of
software design, debugging, and engineering. Designing these is a big challenge.
➢ Debugging distributed programs
Debugging distributed programs is much harder because of concurrency and replication. Adequate debugging mechanisms and tools are the need of the hour.
➢ Data replication, consistency models, and caching
• Fast access to data and other resources is important in distributed systems.
• Managing replicas and their updates faces concurrency problems.
• Placement of the replicas in the system is also a challenge, because resources usually cannot be freely replicated.
➢ World Wide Web design – caching, searching, scheduling
• The WWW is a commonly known distributed system.
• The issues of object replication and caching, and the prefetching of objects, arise on the WWW as well.
• Object search and navigation on the web are important functions in the operation of the web.
➢ Distributed shared memory abstraction
• A shared memory abstraction is easier to program with, since it does not involve managing the communication tasks.
• The communication is done by the middleware via message passing.
• The overhead of shared memory is dealt with by the middleware technology.
• Some of the methodologies that handle communication in shared memory distributed systems are:
✓ Wait-free algorithms: A wait-free algorithm guarantees that a process can complete its execution irrespective of the actions of other processes. Such algorithms control access to shared resources in the shared memory abstraction, but they are expensive.
✓ Mutual exclusion: Concurrent access by processes to a shared resource or data is executed in a mutually exclusive manner; only one process is allowed to execute the critical section at any given time. In a distributed system, shared variables or a local kernel cannot be used to implement mutual exclusion: message passing is the sole means for implementing distributed mutual exclusion.
✓ Register constructions: Registers must be designed to allow concurrent access without any restrictions on the concurrency permitted.
➢ Reliable and fault-tolerant distributed systems
The following are some of the fault tolerant strategies:
✓ Consensus algorithms: Consensus algorithms allow correctly functioning processes
to reach agreement among themselves in spite of the existence of malicious processes. The
goal of the malicious processes is to prevent the correctly functioning processes from
reaching agreement. The malicious processes operate by sending messages with misleading
information, to confuse the correctly functioning processes.
✓ Replication and replica management: The Triple Modular Redundancy (TMR)
technique is used in software and hardware implementation. TMR is a fault-tolerant form of
N-modular redundancy, in which three systems perform a process and that result is
processed by a majority-voting system to produce a single output.
✓ Voting and quorum systems: Providing redundancy in the active or passive components in the system and then performing voting based on some quorum criterion is a classical way of dealing with fault tolerance. Designing efficient algorithms for this purpose is the challenge.
✓ Distributed databases and distributed commit: The distributed databases should
also follow atomicity, consistency, isolation and durability (ACID) properties.
✓ Self-stabilizing systems: A self-stabilizing algorithm guarantees to take the system to a good state even if a bad state arises due to some error. Self-stabilizing algorithms require some built-in redundancy to track additional state variables and do extra work.
✓ Checkpointing and recovery algorithms: Checkpointing is periodically recording the current state on secondary storage so that, in case of a failure, the entire computation is not lost but can be recovered from one of the recently taken checkpoints. Checkpointing in a distributed environment is difficult: if the checkpoints at the different processes are not coordinated, the local checkpoints may become useless because they are inconsistent with the checkpoints at other processes.
✓ Failure detectors: Asynchronous distributed systems do not have a bound on message transmission time. This makes waiting for messages difficult, since the receiver does not know how long to wait. Failure detectors probabilistically suspect another process as having failed and then converge on a determination of the up/down status of the suspected process.
➢ Load balancing
The objective of load balancing is to gain higher throughput and reduce user-perceived latency. Load balancing may be necessary because of a variety of factors, such as high network traffic or a high request rate causing the network connection to become a bottleneck, or high computational load. The following are some forms of load balancing:
✓ Data migration: The ability to move data around in the system, based on the access
pattern of the users
✓ Computation migration: The ability to relocate processes in order to perform a redistribution of the workload.
✓ Distributed scheduling: This achieves a better turnaround time for the users by using idle processing power in the system more efficiently.
➢ Real-time scheduling
Real-time scheduling becomes more challenging when a global view of the system state is absent and on-line or dynamic changes are more frequent. The message propagation delays, which are network-dependent, are hard to control or predict; this is a hindrance to meeting the QoS requirements of the network.

➢ Performance
User perceived latency in distributed systems must be reduced. The common issues in
performance:
✓ Metrics: Appropriate metrics must be defined for measuring the performance of theoretical distributed algorithms and of their implementations.
✓ Measurement methods/tools: As a distributed system is a complex entity, appropriate methodology and tools must be developed for measuring the performance metrics.
1.7.3 Applications of distributed computing and newer challenges
The deployment environment of distributed systems ranges from mobile systems to
cloud storage. All the environments have their own challenges:
➢ Mobile systems
o Mobile systems which use wireless communication in shared broadcast
medium have issues related to physical layer such as transmission range,
power, battery power consumption, interfacing with wired internet, signal
processing and interference.
o The issues pertaining to other higher layers include routing, location
management, channel allocation, localization and position estimation, and
mobility management.
o Apart from the above-mentioned common challenges, the architectural differences of mobile networks demand varied treatment. The two architectures are:
✓ Base-station approach (cellular approach): The geographical region is divided into
hexagonal physical locations called cells. The powerful base station transmits signals to all
other nodes in its range
✓ Ad-hoc network approach: This is an infrastructure-less approach which does not have any base station to transmit signals; instead, all the responsibility is distributed among the mobile nodes.
✓ Both approaches work in different environments with different principles of communication. Designing a distributed system to cater to these varied needs is a great challenge.

➢ Sensor networks
o A sensor is a processor with an electro-mechanical interface that is capable of
sensing physical parameters.
o They are low cost equipment with limited computational power and battery
life. They are designed to handle streaming data and route it to external
computer network and processes.
o They are susceptible to faults and have to reconfigure themselves.
o These features introduce a whole new set of challenges, such as position estimation and time estimation, when designing a distributed system.
➢ Ubiquitous or pervasive computing
o In Ubiquitous systems the processors are embedded in the environment to
perform application functions in the background.
o Examples: Intelligent devices, smart homes etc.
o They are distributed systems with recent advancements operating in wireless
environments through actuator mechanisms.
o They can be self-organizing and network-centric with limited resources.
➢ Peer-to-peer computing
o Peer-to-peer (P2P) computing is computing over an application-layer network where all interactions among the processors are at the same level.
o This is a form of symmetric computation, in contrast to the client–server paradigm.
o They are self-organizing with or without regular structure to the network.
o Some of the key challenges include: object storage mechanisms, efficient
object lookup, and retrieval in a scalable manner; dynamic reconfiguration
with nodes as well as objects joining and leaving the network randomly;
replication strategies to expedite object search; tradeoffs between object size
latency and table sizes; anonymity, privacy, and security.
➢ Publish-subscribe, content distribution, and multimedia
o Present-day users require only the information of interest. In a dynamic environment where information constantly fluctuates, there is great demand for:
o Publish: an efficient mechanism for distributing this information;
o Subscribe: an efficient mechanism to allow end users to indicate interest in receiving specific kinds of information;
o An efficient mechanism for aggregating large volumes of published information and filtering it as per the user's subscription filter.
o Content distribution refers to a mechanism that categorizes the information
based on parameters.
o The publish–subscribe and content distribution paradigms overlap each other.
o Multimedia data introduces special issues because of its large size.
➢ Distributed agents
o Agents are software processes or sometimes robots that move around the
system to do specific tasks for which they are programmed.
o Agents collect and process information and can exchange such information with other agents.
o Challenges in distributed agent systems include coordination mechanisms
among the agents, controlling the mobility of the agents, their software design
and interfaces.
➢ Distributed data mining
o Data mining algorithms process large amounts of data to detect patterns and trends in the data, to mine or extract useful information.
o The mining can be done by applying database and artificial intelligence
techniques to a data repository.
➢ Grid computing
• Grid computing is deployed to manage resources. For instance, idle CPU
cycles of machines connected to the network will be available to others.
• The challenges include: scheduling jobs, a framework for implementing quality of service, real-time guarantees, and security.
➢ Security in distributed systems
The challenges of security in a distributed setting include confidentiality, authentication, and availability. These must be addressed using efficient and scalable solutions.
1.8 A MODEL OF DISTRIBUTED COMPUTATIONS: DISTRIBUTED PROGRAM
• A distributed program is composed of a set of asynchronous processes that communicate by message passing over the communication network. Each process may run on a different processor.
• The processes do not share a global memory and communicate solely by passing
messages. These processes do not share a global clock that is instantaneously
accessible to these processes.
• Process execution and message transfer are asynchronous – a process may execute an
action spontaneously and a process sending a message does not wait for the delivery
of the message to be complete.
• The global state of a distributed computation is composed of the states of the
processes and the communication channels. The state of a process is characterized by
the state of its local memory and depends upon the context.
• The state of a channel is characterized by the set of messages in transit in the channel.
A MODEL OF DISTRIBUTED EXECUTIONS

• The execution of a process consists of a sequential execution of its actions.


• The actions are atomic and the actions of a process are modeled as three types of
events: internal events, message send events, and message receive events.
• An internal event changes the state of the process at which it occurs.
• A send event changes the state of the process that sends the message and the state of
the channel on which the message is sent.
• The execution of process pi produces a sequence of events e_i^1, e_i^2, e_i^3, …, denoted by Hi = (hi, →i), where hi is the set of events produced by pi and the binary relation →i defines a linear (causal) order on these events.
• →msg indicates the dependency that exists due to message passing between two events.
Fig: Space–time diagram of a distributed execution
• A receive event changes the state of the process that receives the message and the state of the channel on which the message is received.
Causal Precedence Relations
Causal message ordering is a partial ordering of messages in a distributed computing
environment. It is the delivery of messages to a process in the order in which they were
transmitted to that process.

It places a restriction on communication between processes by requiring that if the transmission of


message mi to process pk necessarily preceded the transmission of message mj to the same process,
then the delivery of these messages to that process must be ordered such that mi is delivered before
mj.

Happened-Before Relation


The partial order obtained by generalizing the relationship between two processes is called the happened-before relation (also causal ordering or potential causal ordering). This term was coined by Lamport. Happens-before defines a partial order of events in a distributed system; some events cannot be placed in the order. We write A → B if A happens before B. A → B is defined using the following rules:
✓ Local ordering: A and B occur on the same process and A occurs before B.
✓ Messages: send(m) → receive(m) for any message m
✓ Transitivity: e → e’’ if e → e’ and e’ → e’’
• Ordering can be based on two situations:
1. If two events occur in the same process, then they occurred in the order observed.
2. During message passing, the event of sending a message occurred before the event of receiving it.

Lamport's ordering is the happened-before relation, denoted by →:

• a→b, if a and b are events in the same process and a occurred before b.
• a→b, if a is the event of sending a message m in a process and b is the event of the same message m being received by another process.
• If a→b and b→c, then a→c; Lamport's relation satisfies the transitivity property.

When any of the above conditions is satisfied, it can be concluded that a→b, i.e., the events are causally related. Consider two events c and d: if both c→d and d→c are false, i.e., they are not causally related, then c and d are said to be concurrent events, denoted c||d.
Fig: Communication between processes
Fig 1.22 shows the communication of messages m1 and m2 between three processes p1, p2 and p3, with events a, b, c, d, e and f. It can be inferred from the diagram that a→b; c→d; e→f; b→c; d→f; a→d; a→f; b→d; b→f. Also a||e and c||e.
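The relations in Fig 1.22 can be checked mechanically. The sketch below (event names follow the figure; the code itself is illustrative) encodes the direct local and message orderings and computes the transitive closure of →, so that derived pairs such as a→f and concurrent pairs such as a||e can be tested.

direct = {("a", "b"), ("c", "d"), ("e", "f"),   # local order within each process
          ("b", "c"), ("d", "f")}               # send(m) -> receive(m) for m1, m2

hb = set(direct)
changed = True
while changed:                                  # transitivity: x->y, y->z => x->z
    changed = False
    for (x, y) in list(hb):
        for (y2, z) in list(hb):
            if y == y2 and (x, z) not in hb:
                hb.add((x, z))
                changed = True

def concurrent(x, y):
    return (x, y) not in hb and (y, x) not in hb

print(("a", "f") in hb)        # True: a -> f
print(concurrent("a", "e"))    # True: a || e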

Logical vs physical concurrency


Physical and logical concurrency are two notions that are easily confused in distributed systems.
Physical concurrency: several program units of the same program execute simultaneously on different processors.
Logical concurrency: the execution of program units is interleaved on a single processor, giving the programmer the illusion of simultaneous execution.

Differences between logical and physical concurrency

• Logical concurrency: several units of the same program execute "simultaneously" on the same processor, giving the programmer the illusion that they are executing on multiple processors. It is implemented through interleaving.
• Physical concurrency: several program units of the same program execute at the same time on different processors. It is implemented with multiple CPUs, uni-processors with I/O channels, or networks of uni- or multi-CPU machines.

MODELS OF COMMUNICATION NETWORK


The three main types of communication models in distributed systems are:
FIFO (first-in, first-out): each channel acts as a FIFO message queue.
Non-FIFO (N-FIFO): a channel acts like a set into which a sender process adds messages and from which the receiver removes messages in random order.
Causal Ordering (CO): message delivery follows Lamport's happened-before relation.
o The relation between the three models is given by CO ⊂ FIFO ⊂ N-FIFO.

A system that supports the causal ordering model satisfies the following property: for any two messages m1 and m2 sent to the same destination, if send(m1) → send(m2), then rec(m1) → rec(m2).

GLOBAL STATE

Distributed Snapshot represents a state in which the distributed system might have been in. A
snapshot of the system is a single configuration of the system.
• The global state of a distributed system is a collection of the local states of its
components, namely, the processes and the communication channels.
• The state of a process at any time is defined by the contents of processor registers,
stacks, local memory, etc. and depends on the local context of the distributed
application.
• The state of a channel is given by the set of messages in transit in the channel.

The state of a channel is difficult to state formally because a channel is a distributed entity and its state depends upon the states of the processes it connects. Let SCij denote the state of a channel Cij, defined as the set of messages in transit:
SCij = { mij | send(mij) ∈ LSi and rec(mij) ∉ LSj },
where LSi and LSj denote the recorded local states of the sending and receiving processes.

For a successful global state recording, all recorded states must be consistent:


• If we have recorded that a process P has received a message from a process Q, then
we should have also recorded that process Q had actually send that message.
• Otherwise, a snapshot will contain the recording of messages that have been received
but never sent.
• The reverse condition (Q has sent a message that P has not received) is allowed.

Consistent states: The states should not violate causality. Such states are called consistent
global states and are meaningful global states.
Inconsistent global states: They are not meaningful, in the sense that a distributed system can never actually be in an inconsistent state.
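A small sketch of this consistency rule: a recorded global state is consistent if every message recorded as received is also recorded as sent. The per-process logs and message ids below are invented for illustration.

def is_consistent(sent_logs, received_logs):
    all_sent = set().union(*sent_logs.values())
    all_received = set().union(*received_logs.values())
    # Received-but-never-sent violates causality; sent-but-not-received is
    # allowed -- such messages are simply in transit on the channel.
    return all_received <= all_sent

sent = {"P": set(), "Q": {"m1"}}
received = {"P": {"m1"}, "Q": set()}
print(is_consistent(sent, received))                      # True
print(is_consistent({"P": set(), "Q": set()}, received))  # False: m1 never sent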

UNIT II

LOGICAL TIME & GLOBAL STATE

2.1 LOGICAL TIME

Logical clocks are based on capturing chronological and causal relationships of processes and
ordering events based on these relationships.

Three types of logical clock are maintained in distributed systems:


• Scalar clock
• Vector clock
• Matrix clock

In a system of logical clocks, every process has a logical clock that is advanced using a set
of rules. Every event is assigned a timestamp and the causality relation between events can
be generally inferred from their timestamps.
The timestamps assigned to events obey the fundamental monotonicity property; that is, if
an event a causally affects an event b, then the timestamp of a is smaller than the timestamp
of b.

Differences between physical and logical clocks

• Physical clock: a physical process combined with a method of measuring that process to record the passage of time. Physical clocks are based on cyclic processes such as celestial rotation.
• Logical clock: a mechanism for capturing chronological and causal relationships in a distributed system. A logical clock allows a global ordering on events from different processes.

A Framework for a system of logical clocks

A system of logical clocks consists of a time domain T and a logical clock C. Elements of T form a
partially ordered set over a relation <. This relation is usually called the happened before or
causal precedence.

The logical clock C is a function that maps an event e in a distributed system to an element in the time domain T, denoted C(e), such that
for any two events ei and ej: ei → ej ⇒ C(ei) < C(ej).
This monotonicity property is called the clock consistency condition. When T and C satisfy the stronger condition
ei → ej ⇔ C(ei) < C(ej),
then the system of clocks is strongly consistent.

Implementing logical clocks


The two major issues in implementing logical clocks are:
Data structures: the representation of time local to each process.
Protocols: rules for updating the data structures to ensure consistent conditions.

Data structures:
Each process pi maintains data structures with the given capabilities:
• A local logical clock (lci), that helps process pi measure its own progress.
• A logical global clock (gci), that represents process pi's local view of the logical global time. It allows this process to assign consistent timestamps to its local events.

Protocol:
The protocol ensures that a process's logical clock, and thus its view of the global time, is managed consistently with the following rules:
Rule 1: Decides the updates of the logical clock by a process. It controls send, receive and
other operations.
Rule 2: Decides how a process updates its global logical clock to update its view of the
global time and global progress. It dictates what information about the logical time is
piggybacked in a message and how this information is used by the receiving process to
update its view of the global time.

2.1.1 SCALAR TIME


Scalar time was proposed by Lamport to totally order events in distributed systems. A Lamport logical clock is an incrementing counter maintained in each process. When a process receives a message, it resynchronizes its logical clock with the sender, maintaining the causal relationship.
The algorithm of Lamport timestamps can be captured in a few rules:
• All the process counters start with value 0.
• A process increments its counter for each event (internal event, message sending,
message receiving) in that process.
• When a process sends a message, it includes its (incremented) counter value with the
message.
• On receiving a message, the counter of the recipient is updated to the greater of its
current counter and the timestamp in the received message, and then incremented by
one.

• If Ci is the local clock for process Pi then,


• if a and b are two successive events in Pi, then Ci(b) = Ci(a) + d1, where d1 > 0
• if a is the sending of message m by Pi, then m is assigned timestamp tm = Ci(a)
• if b is the receipt of m by Pj, then Cj(b) = max{Cj(b), tm + d2}, where d2 > 0

Rules of Lamport’s clock


Rule 1: Ci(b) = Ci(a) + d1, where d1 > 0
Rule 2: The following actions are implemented when pi receives a message m with timestamp Cm:
a) Ci= max(Ci, Cm)
b) Execute Rule 1
c) deliver the message

Fig 1.20: Evolution of scalar time

Basic properties of scalar time:


1. Consistency property: Scalar clocks always satisfy monotonicity: a monotonic clock only increments its timestamp and never jumps. Hence scalar time is consistent.

2. Total ordering: Scalar clocks order the events in a distributed system, but distinct events may be assigned an identical timestamp, so a tie-breaking mechanism is essential to totally order the events. Ties are broken by:
• linearly ordering the process identifiers;
• giving the process with the lower identifier value the higher priority.

The term (t, i) indicates the timestamp of an event, where t is its time of occurrence and i is the identity of the process where it occurred.
The total order relation ≺ over two events x and y with timestamps (h, i) and (k, j) is given by:
x ≺ y ⇔ (h < k) or (h = k and i < j)

A total order is generally used to ensure liveness properties in distributed algorithms.

3. Event Counting
If event e has timestamp h, then h − 1 represents the minimum logical duration, counted in units of events, required before producing the event e; this is called the height of the event e. That is, h − 1 events have been produced sequentially before the event e, regardless of the processes that produced these events.

4. No strong consistency
Scalar clocks are not strongly consistent because the logical local clock and the logical global clock of a process are squashed into one, resulting in the loss of causal dependency information among events at different processes.
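A minimal sketch of the scalar clock rules above with increment d = 1; the class is illustrative and independent of any particular message transport.

class LamportClock:
    def __init__(self):
        self.time = 0                      # every counter starts at 0

    def tick(self):                        # Rule 1: increment before any event
        self.time += 1
        return self.time

    def send_event(self):
        return self.tick()                 # timestamp piggybacked on the message

    def receive_event(self, msg_time):
        # Rule 2: take the max of the local clock and the message timestamp,
        # then increment (and deliver the message).
        self.time = max(self.time, msg_time)
        return self.tick()

p1, p2 = LamportClock(), LamportClock()
t = p1.send_event()                        # p1's clock: 1
print(p2.receive_event(t))                 # p2's clock: max(0, 1) + 1 = 2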

2.1.2 VECTOR TIME


The ordering obtained from Lamport's clocks is not enough to guarantee that two events whose timestamps are ordered are also causally related. Vector clocks use
a vector counter instead of an integer counter. The vector clock of a system with N processes
is a vector of N counters, one counter per process. Vector counters have to follow the
following update rules:
• Initially, all counters are zero.
• Each time a process experiences an event, it increments its own counter in the vector
by one.
• Each time a process sends a message, it includes a copy of its own (incremented)
vector in the message.
• Each time a process receives a message, it increments its own counter in the vector by
one and updates each element in its vector by taking the maximum of the value in its
own vector counter and the value in the vector in the received message.

The time domain is represented by a set of n-dimensional non-negative integer vectors in vector
time.

Rules of Vector Time


Rule 1: Before executing an event, process pi updates its local logical time as follows:
vti[i] := vti[i] + d    (d > 0)

Rule 2: Each message m is piggybacked with the vector clock vt of the sender process at sending time. On the receipt of such a message (m, vt), process pi executes the following sequence of actions:
1. update its global logical time: vti[k] := max(vti[k], vt[k]) for 1 ≤ k ≤ n
2. execute Rule 1
3. deliver the message m

Fig 1.21: Evolution of vector time


Basic properties of vector time
1. Isomorphism:
• “→” induces a partial order on the set of events that are produced by a distributed execution.
• If events x and y are timestamped as vh and vk, then:
x → y ⇔ vh < vk, and x ∥ y ⇔ vh ∥ vk (neither vector dominates the other).
• If the processes at which events x and y occurred (say pi and pj) are known, the test to compare the two timestamps can be simplified as:
x → y ⇔ vh[i] ≤ vk[i], and x ∥ y ⇔ vh[i] > vk[i] and vh[j] < vk[j].

2. Strong consistency
The system of vector clocks is strongly consistent; thus, by examining the vector timestamp
of two events, we can determine if the events are causally related.
3. Event counting
If an event e has timestamp vh, then vh[j] denotes the number of events executed by process pj that causally precede e.
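A minimal sketch of the vector clock rules for a system of n processes, together with the causality test described above; the class and helper names are illustrative.

class VectorClock:
    def __init__(self, n, pid):
        self.v = [0] * n                   # initially all counters are zero
        self.pid = pid

    def tick(self):                        # Rule 1: increment own component
        self.v[self.pid] += 1

    def send(self):
        self.tick()
        return list(self.v)                # piggyback a copy on the message

    def receive(self, msg_vec):
        # Rule 2: componentwise max with the piggybacked vector, then Rule 1
        self.v = [max(a, b) for a, b in zip(self.v, msg_vec)]
        self.tick()

def happened_before(vh, vk):
    return all(a <= b for a, b in zip(vh, vk)) and vh != vk

p0, p1 = VectorClock(2, 0), VectorClock(2, 1)
m = p0.send()                              # p0: [1, 0]
p1.receive(m)                              # p1: [1, 1]
print(happened_before(m, p1.v))            # True: the send precedes the receive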

2.2 PHYSICAL CLOCK SYNCHRONIZATION: NETWORK TIME PROTOCOL (NTP)
Centralized systems do not need clock synchronization, as they work under a common
clock. But distributed systems have no common clock: each system functions based
on its own internal clock and its own notion of time. The time in distributed systems is
measured in the following contexts:
• The time of the day at which an event happened on a specific machine in the network.
• The time interval between two events that happened on different machines in the
network.
• The relative ordering of events that happened on different machines in the network.

Clock synchronization is the process of ensuring that physically distributed processors have a
common notion of time.

Due to differing clock rates, the clocks at various sites may diverge with time, and
periodically a clock synchronization must be performed to correct this clock skew in
distributed systems. Clocks are synchronized to an accurate real-time standard like UTC
(Universal Coordinated Time). Clocks that must not only be synchronized with each other
but also have to adhere to physical time are termed physical clocks. This degree of
synchronization additionally enables to coordinate and schedule actions between multiple
computers connected to a common network.

Basic terminologies:
If Ca and Cb are two different clocks, then:
• Time: The time of a clock in a machine p is given by the function Cp(t), where Cp(t) = t for a perfect clock.
• Frequency: Frequency is the rate at which a clock progresses. The frequency at time t of clock Ca is Ca'(t).
• Offset: Clock offset is the difference between the time reported by a clock and the real time. The offset of clock Ca is given by Ca(t) − t. The offset of clock Ca relative to Cb at time t ≥ 0 is given by Ca(t) − Cb(t).
• Skew: The skew of a clock is the difference in the frequencies of the clock and the perfect clock. The skew of clock Ca relative to clock Cb at time t is Ca'(t) − Cb'(t).
• Drift (rate): The drift of clock Ca is the second derivative of the clock value with respect to time, namely Ca''(t). The drift of Ca relative to Cb at time t is Ca''(t) − Cb''(t).
Clocking Inaccuracies
Physical clocks are synchronized to an accurate real-time standard like UTC (Universal Coordinated Time). Due to the clock inaccuracies discussed above, a timer (clock) is said to be working within its specification if
1 − ρ ≤ dC/dt ≤ 1 + ρ,
where the constant ρ is the maximum skew rate specified by the manufacturer.

1. Offset delay estimation


NTP is a time service for the Internet: it synchronizes clients to UTC, obtains reliability from redundant paths and servers, is scalable, and authenticates time sources. The design of NTP involves a hierarchical tree of time servers, with a primary server at the root that synchronizes with UTC. The next level contains secondary servers, which act as backups to the primary server. At the lowest level is the synchronization subnet, which has the clients.

2. Clock offset and delay estimation


A source node cannot accurately estimate the local time on the target node due to
varying message or network delays between the nodes. This protocol employs a very
common practice of performing several trials and choosing the trial with the minimum delay.
Fig : Behavior of clocks

Fig a) Offset and delay estimation between processes from the same server
Fig b) Offset and delay estimation between processes from different servers

Let T1, T2, T3, T4 be the values of the four most recent timestamps, as shown in the figure. Assume clocks A and B are stable and running at the same speed. Let a = T1 − T3 and b = T2 − T4. If the network delay difference from A to B and from B to A, called the differential delay, is small, the clock offset θ and roundtrip delay δ of B relative to A at time T4 are approximately given by:
θ = (a + b)/2,    δ = a − b

Each NTP message includes the latest three timestamps T1, T2, and T3, while T4 is determined upon arrival.
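A small sketch of the computation above; the timestamp values are invented for illustration (B's clock runs 5 s ahead of A's, and each message takes 1 s).

def ntp_offset_delay(T1, T2, T3, T4):
    a = T1 - T3
    b = T2 - T4
    theta = (a + b) / 2        # clock offset of B relative to A
    delta = a - b              # round-trip delay
    return theta, delta

# T3: A sends (A-time 10); T1: B receives (B-time 16);
# T2: B replies (B-time 17); T4: A receives (A-time 13).
print(ntp_offset_delay(T1=16, T2=17, T3=10, T4=13))   # (5.0, 2)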

2.3 MESSAGE ORDERING AND GROUP COMMUNICATION


As the distributed systems are a network of systems at various physical locations, the
coordination between them should always be preserved. The message ordering means the
order of delivering the messages to the intended recipients. The common message order
schemes are First in First out (FIFO), non FIFO, causal order and synchronous order. In case
of group communication with multicasting, the causal and total ordering scheme is followed.
It is also essential to define the behaviour of the system in case of failures. The following
are the notations that are widely used in this chapter:
• Distributed systems are denoted by a graph (N, L).
• The set of events is represented by the event set (E, ≺).
• A message is denoted mi; its send and receive events are si and ri respectively.
• send(M) and receive(M) indicate that message M is sent and received.
• a ∼ b denotes that events a and b occur at the same process.
• The set of send–receive pairs is T = {(s, r) ∈ Ei × Ej | send event s corresponds to receive event r}.
2.3.1 MESSAGE ORDERING PARADIGMS
The message orderings are

(i) non-FIFO
(ii) FIFO
(iii) causal order
(iv) synchronous order

There is always a trade-off between concurrency and ease of use and implementation.

Asynchronous Executions
An asynchronous execution (or A-execution) is an execution (E, ≺) for which the causality relation is a partial order.
• There cannot be any causality cycles in an asynchronous execution.
• The messages can be delivered in any order; the channels are non-FIFO.
• Though a physical link delivers the messages sent on it in FIFO order due to the physical properties of the medium, a logical link may be formed as a composite of physical links, and multiple paths may exist between the two end points of the logical link.
Fig 2.1: a) FIFO executions b) non FIFO executions

FIFO executions

A FIFO execution is an A-execution in which, for all (s, r), (s′, r′) ∈ T:
(s ∼ s′ and r ∼ r′ and s ≺ s′) ⟹ r ≺ r′

• On any logical link between two nodes, messages are delivered in the order in which they are sent.
• A FIFO logical channel can be created over a non-FIFO channel by using a separate numbering scheme to sequence the messages on each logical channel.
• The sender assigns and appends a <sequence_num, connection_id> tuple to each message.
• The receiver uses a buffer to order the incoming messages as per the sender’s sequence numbers, and accepts only the “next” message in sequence, as the sketch below shows.
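A minimal sketch of this sequencing scheme in Python (the class and method names are illustrative, not part of any standard API):

    # Creating a FIFO logical channel over a non-FIFO channel: the sender
    # numbers messages; the receiver buffers out-of-order arrivals and
    # releases only the "next" message in sequence.
    class FifoSender:
        def __init__(self, connection_id):
            self.connection_id = connection_id
            self.seq = 0

        def wrap(self, msg):
            self.seq += 1
            return (self.seq, self.connection_id, msg)

    class FifoReceiver:
        def __init__(self):
            self.expected = 1
            self.buffer = {}              # seq -> message, held until in order

        def on_arrival(self, packet):
            seq, _conn, msg = packet
            self.buffer[seq] = msg
            delivered = []
            while self.expected in self.buffer:    # deliver contiguous prefix
                delivered.append(self.buffer.pop(self.expected))
                self.expected += 1
            return delivered

    s, r = FifoSender("c12"), FifoReceiver()
    p1, p2, p3 = s.wrap("a"), s.wrap("b"), s.wrap("c")
    print(r.on_arrival(p2))   # []         (buffered; waiting for seq 1)
    print(r.on_arrival(p1))   # ['a', 'b'] (in-order prefix released)
    print(r.on_arrival(p3))   # ['c']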

Causally Ordered (CO) executions

A CO execution is an A-execution in which, for all (s, r), (s′, r′) ∈ T:
(r ∼ r′ and s ≺ s′) ⟹ r ≺ r′

Fig: CO Execution
• If two send events s and s′ are related by causality ordering (not physical time ordering), then a causally ordered execution requires that their corresponding receive events r and r′ occur in the same order at all common destinations.
Applications of causal order:
Applications that require updates to shared data (as in distributed shared memory) and fair resource allocation (as in distributed mutual exclusion).

Causal Order(CO) for Implementations:

If send(m1) ≺ send(m2), then for each common destination d of messages m1 and m2, deliver_d(m1) ≺ deliver_d(m2) must be satisfied.
Other properties of causal ordering
1. Message Order (MO): A MO execution is an A-execution in which, for all (s, r), (s′, r′) ∈ T:
s ≺ s′ ⟹ ¬(r′ ≺ r)

Fig: Not a CO execution

2. Empty-Interval (EI) Execution: An execution (E, ≺) is an empty-interval execution if, for each pair of events (s, r) ∈ T, the open interval set {x ∈ E | s ≺ x ≺ r} in the partial order is empty.


An execution (E, ≺) is CO if and only if, for each pair of events (s, r) ∈ T and each event e ∈ E:
• weak common past: e ≺ r ⟹ ¬(s ≺ e)
• weak common future: s ≺ e ⟹ ¬(e ≺ r)

Synchronous Execution

• When all the communication between pairs of processes uses synchronous send and
receives primitives, the resulting order is the synchronous order.
• The synchronous communication always involves a handshake between the receiver
and the sender, the handshake events may appear to be occurring instantaneously and
atomically.

Causality in a synchronous execution


The synchronous causality relation << on E is the smallest transitive relation that satisfies the following:
S1: If x occurs before y at the same process, then x << y.
S2: If (s, r) ∈ T, then for all x ∈ E, [(x << s ⟺ x << r) and (s << x ⟺ r << x)].
S3: If x << y and y << z, then x << z.

Synchronous Execution:
A synchronous execution (or S-execution) is an execution (E, <<) for which the causality relation << is a partial order.
Fig) Execution in an asynchronous system Fig) Execution in a synchronous system

Timestamping a synchronous execution

An execution (E, ≺) is synchronous if and only if there exists a mapping from E to T (scalar timestamps) such that:
• for any message M, T(s(M)) = T(r(M));
• for each process Pi, if ei ≺ ei′ then T(ei) < T(ei′).

2.4 ASYNCHRONOUS EXECUTION WITH SYNCHRONOUS COMMUNICATION


When all the communication between pairs of processes is by using synchronous send and receive primitives, the resulting order is the synchronous order. If a program is written for an asynchronous system, say a FIFO system, will it still execute correctly if the communication is done by synchronous primitives instead? There is a possibility that the program may deadlock, as illustrated below.

Fig) A communication program for an asynchronous system deadlocks when using synchronous primitives

Fig) Illustration of an asynchronous execution and of crowns: crown of size 2 and crown of size 3
Realizable Synchronous Communication (RSC)
An A-execution that can be realized under synchronous communication is called a realizable with synchronous communication (RSC) execution.
An execution can be modeled to give a total order that extends the partial order (E, ≺). In an A-execution, the messages can be made to appear instantaneous if there exists a linear extension of the execution such that each send event is immediately followed by its corresponding receive event in this linear extension.
A non-separated linear extension of (E, ≺) is a linear extension of (E, ≺) such that, for each pair (s, r) ∈ T, the interval {x ∈ E | s ≺ x ≺ r} is empty.
An A-execution (E, ≺) is an RSC execution if and only if there exists a non-separated linear extension of the partial order (E, ≺).
In the non-separated linear extension, if the adjacent send event and its corresponding receive event are viewed atomically, then that pair of events shares a common past and a common future with each other.
Crown
Let E be an execution. A crown of size k in E is a sequence <(si, ri), i ∈ {0, …, k−1}> of pairs of corresponding send and receive events such that: s0 ≺ r1, s1 ≺ r2, …, sk−2 ≺ rk−1, sk−1 ≺ r0.
In the figure above, the crown is <(s1, r1), (s2, r2)> as we have s1 ≺ r2 and s2 ≺ r1. Cyclic dependencies may exist in a crown. The crown criterion states that an A-computation is RSC, i.e., it can be realized on a system with synchronous communication, if and only if it contains no crown.
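The crown criterion lends itself to a mechanical test: build a graph with one vertex per (s, r) pair and an edge from pair i to pair j whenever si ≺ rj; a crown is then exactly a cycle in this graph. A Python sketch, under the assumption that the causality test ≺ is supplied as a function (e.g., computed from vector timestamps):

    # Crown detection as cycle detection. `pairs` is a list of
    # (send, receive) event pairs; `precedes(a, b)` is assumed to decide
    # the causality relation a ≺ b.
    def is_rsc(pairs, precedes):
        n = len(pairs)
        # Edge i -> j whenever s_i precedes r_j (i != j); a cycle is a crown.
        adj = [[j for j in range(n)
                if i != j and precedes(pairs[i][0], pairs[j][1])]
               for i in range(n)]

        WHITE, GREY, BLACK = 0, 1, 2
        color = [WHITE] * n

        def has_cycle(u):
            color[u] = GREY
            for v in adj[u]:
                if color[v] == GREY:            # back edge: crown found
                    return True
                if color[v] == WHITE and has_cycle(v):
                    return True
            color[u] = BLACK
            return False

        # The A-execution is RSC iff the graph contains no cycle (no crown).
        return not any(color[i] == WHITE and has_cycle(i) for i in range(n))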
Timestamp criterion for RSC execution
An execution (E, ≺) is RSC if and only if there exists a mapping from E to T (scalar timestamps) such that:
• for any message M, T(s(M)) = T(r(M));
• for each (a, b) in (E × E) \ T, a ≺ b ⟹ T(a) < T(b).
Synchronous programs on asynchronous systems


− A (valid) S-execution can be trivially realized on an asynchronous system by
scheduling the messages in the order in which they appear in the S-execution.
− The partial order of the S-execution remains unchanged but the communication
occurs on an asynchronous system that uses asynchronous communication primitives.
− Once a message send event is scheduled, the middleware layer waits for
acknowledgment; after the ack is received, the synchronous send primitive completes.

2.5 SYNCHRONOUS PROGRAM ORDER ON AN ASYNCHRONOUS SYSTEM

Non deterministic programs


In a deterministic program, repeated runs of the same program produce the same partial order of messages, thus preserving the deterministic nature of the execution. But sometimes distributed systems exhibit non-determinism:
• A receive call can receive a message from any sender who has sent a message, if the
expected sender is not specified.
• Multiple send and receive calls which are enabled at a process can be executed in an
interchangeable order.
• If i sends to j, and j sends to i concurrently using blocking synchronous calls, a deadlock results.
• There is no semantic dependency between the send and the immediately following
receive at each of the processes. If the receive call at one of the processes can be
scheduled before the send call, then there is no deadlock.

Rendezvous
Rendezvous systems are a form of synchronous communication among an arbitrary
number of asynchronous processes. All the processes involved meet with each other, i.e.,
communicate synchronously with each other at one time. Two types of rendezvous systems
are possible:
• Binary rendezvous: When two processes agree to synchronize.
• Multi-way rendezvous: When more than two processes agree to synchronize.

Features of binary rendezvous:


• For the receive command, the sender must be specified. However, multiple receive commands can exist. A type check on the data is implicitly performed.

• Send and receive commands may be individually disabled or enabled. A command is


disabled if it is guarded and the guard evaluates to false. The guard would likely
contain an expression on some local variables.
• Synchronous communication is implemented by scheduling messages under the
covers using asynchronous communication.

• Scheduling involves pairing of matching send and receives commands that are both
enabled. The communication events for the control messages under the covers do not
alter the partial order of the execution.

Binary rendezvous: Bagrodia’s algorithm


If multiple interactions are enabled, a process chooses one of them and tries to
synchronize with the partner process. The problem reduces to one of scheduling messages
satisfying the following constraints:
• Schedule on-line, atomically, and in a distributed manner.
• Schedule in a deadlock-free manner (i.e., crown-free).
• Schedule to satisfy the progress property in addition to the safety property.

Steps in Bagrodia algorithm


1. Receive commands are forever enabled from all processes.
2. A send command, once enabled, remains enabled until it completes, i.e., it is not possible that a send command gets disabled before the send is executed.
3. To prevent deadlock, process identifiers are used to introduce asymmetry to break potential crowns that arise.
4. Each process attempts to schedule only one send event at any time.
The message types used are: M, ack(M), request(M), and permission(M). Execution events in the synchronous execution are only the send of the message M and the receive of the message M. The send and receive events for the other message types – ack(M), request(M), and permission(M) – are control messages and do not count as execution events. The messages request(M), ack(M), and permission(M) use M’s unique tag; the message M itself is not included in these messages.

(message types)

M, ack(M), request(M), permission(M)

(1) Pi wants to execute SEND(M) to a lower priority process Pj:

Pi executes send(M) and blocks until it receives ack(M) from Pj . The send event SEND(M) now
completes.

Any M’ message (from a higher priority processes) and request(M’) request for
synchronization (from a lower priority processes) received during the blocking period are
queued.

(2) Pi wants to execute SEND(M) to a higher priority process Pj:

(2a) Pi seeks permission from Pj by executing send(request(M)).

(2b) While Pi is waiting for permission, it remains unblocked.

(i) If a message M’ arrives from a higher priority process Pk, Pi accepts M’ by scheduling a RECEIVE(M’) event and then executes send(ack(M’)) to Pk.

(ii) If a request(M’) arrives from a lower priority process Pk, Pi executes send(permission(M’)) to Pk and blocks waiting for the message M’. When M’ arrives, the RECEIVE(M’) event is executed.

(2c) When the permission(M) arrives, Pi knows partner Pj is synchronized and Pi executes
send(M). The SEND(M) now completes.

(3) request(M) arrival at Pi from a lower priority process Pj:

At the time a request(M) is processed by Pi, process Pi executes send(permission(M)) to Pj


and blocks waiting for the message M. When M arrives, the RECEIVE(M) event is executed
and the process unblocks.
(4) Message M arrival at Pi from a higher priority process Pj:

At the time a message M is processed by Pi, process Pi executes RECEIVE(M) (which is


assumed to be always enabled) and then send(ack(M)) to Pj .

(5) Processing when Pi is unblocked:

When Pi is unblocked, it dequeues the next (if any) message from the queue and processes it
as a message arrival (as per rules 3 or 4).
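The rules above can be rendered schematically as follows (a Python sketch only, not Bagrodia’s full algorithm: priorities are process ids, and send(dest, kind, tag) is an assumed asynchronous messaging primitive supplied by the runtime):

    # Decision logic of the binary rendezvous rules at one process.
    class Process:
        def __init__(self, pid, send):
            self.pid, self.send = pid, send
            self.blocked_on = None    # (kind, tag) we are blocked waiting for
            self.queue = []           # arrivals deferred while blocked

        def SEND(self, M, dest):
            if dest < self.pid:                     # rule 1: lower priority partner
                self.send(dest, "M", M)
                self.blocked_on = ("ack", M)        # block until ack(M)
            else:                                   # rule 2: higher priority partner
                self.send(dest, "request", M)       # seek permission; stay unblocked

        def on_message(self, src, kind, tag):
            if self.blocked_on:
                if (kind, tag) == self.blocked_on:  # awaited ack(M) or M arrived
                    self.blocked_on = None          # SEND or RECEIVE completes
                    while self.queue and not self.blocked_on:   # rule 5
                        self.on_message(*self.queue.pop(0))
                else:
                    self.queue.append((src, kind, tag))          # defer (rule 1)
            elif kind == "M":                       # rules 2b(i) and 4
                self.send(src, "ack", tag)          # RECEIVE(M), then ack
            elif kind == "request":                 # rules 2b(ii) and 3
                self.send(src, "permission", tag)
                self.blocked_on = ("M", tag)        # block until M arrives
            elif kind == "permission":              # rule 2c
                self.send(src, "M", tag)            # SEND(M) completes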

Fig 2.5: Bagrodia Algorithm

2.6 GROUP COMMUNICATION


Group communication is done by broadcasting of messages. A message broadcast is the sending of a message to all members in the distributed system. The communication may also be:
• Multicast: A message is sent to a certain subset, or a group.
• Unicast: A point-to-point message communication.

The network layer protocol cannot provide the following functionalities:

▪ Application-specific ordering semantics on the order of delivery of messages.
▪ Adapting groups to dynamically changing membership.
▪ Sending multicasts to an arbitrary set of processes at each send event.
▪ Providing various fault-tolerance semantics.

The multicast algorithms can be open-group or closed-group algorithms.

2.7 CAUSAL ORDER (CO)


In the context of group communication, there are two modes of communication:
causal order and total order. Given a system with FIFO channels, causal order needs to be
explicitly enforced by a protocol. The following two criteria must be met by a causal
ordering protocol:
• Safety: In order to prevent causal order from being violated, a message M that arrives at a process may need to be buffered until all system-wide messages sent in the causal past of the send(M) event to that same destination have already arrived. The arrival of a message is transparent to the application process. The delivery event corresponds to the receive event in the execution model.
• Liveness: A message that arrives at a process must eventually be delivered to the
process.

The Raynal–Schiper–Toueg algorithm


• Each message M should carry a log of all other messages sent causally before M’s send event, and sent to the same destination dest(M).
• This log can then be examined to ensure whether it is safe to deliver a message.
• The Raynal–Schiper–Toueg algorithm is a canonical representative; several other algorithms reduce the size of the local space and message space overhead by various techniques.
• All algorithms aim to reduce this log overhead, and the space and time overhead of maintaining the log information at the processes.
• To distribute this log information, broadcast and multicast communication is used.
• To distribute this log information, broadcast and multicast communication is used.
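The essence of such log-based causal ordering can be sketched compactly in Python (this is a simplification, not the full Raynal–Schiper–Toueg algorithm: the log is compressed to a matrix SENT of send counts, and the membership is assumed static):

    import copy

    # Each process keeps SENT[j][k] (messages sent by pj to pk, as known
    # here) and DELIV[k] (messages from pk delivered here). A message
    # piggybacks the sender's SENT matrix; delivery waits until the
    # message's causal past has been delivered locally.
    class CausalProcess:
        def __init__(self, pid, n):
            self.pid, self.n = pid, n
            self.SENT = [[0] * n for _ in range(n)]
            self.DELIV = [0] * n
            self.pending = []

        def send(self, dest):
            log = copy.deepcopy(self.SENT)   # causal past, excluding this message
            self.SENT[self.pid][dest] += 1
            return (self.pid, log)           # piggybacked log travels with M

        def receive(self, msg):
            self.pending.append(msg)
            self._try_deliver()

        def _deliverable(self, msg):
            src, st = msg
            # Safe iff every message sent to us in the causal past of msg
            # has already been delivered here.
            return all(self.DELIV[k] >= st[k][self.pid] for k in range(self.n))

        def _try_deliver(self):
            progress = True
            while progress:
                progress = False
                for msg in list(self.pending):
                    if self._deliverable(msg):
                        src, st = msg
                        self.pending.remove(msg)
                        self.DELIV[src] += 1
                        for j in range(self.n):          # merge the logs
                            for k in range(self.n):
                                self.SENT[j][k] = max(self.SENT[j][k], st[j][k])
                        progress = True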

The Kshemkalyani–Singhal Optimal Algorithm


An optimal CO algorithm stores in local message logs and propagates on messages information of the form “d is a destination of M” about a message M sent in the causal past, as long as and only as long as:

Propagation Constraint I: it is not known that the message M is delivered to d.

Propagation Constraint II: it is not known that a message has been sent to d in the causal future of send(M), and hence it is not guaranteed using a reasoning based on transitivity that the message M will be delivered to d in CO.

Fig : Conditions for causal ordering


The Propagation Constraints also imply that if either (I) or (II) is false, the information “d ∈ M.Dests” must not be stored or propagated, even to remember that (I) or (II) has been falsified. The information must thus be:
▪ not in the causal future of Deliver_d(Mi,a)
▪ not in the causal future of e_k,c, where d ∈ Mk,c.Dests and there is no other message sent causally between Mi,a and Mk,c to the same destination d.

Information about messages:


(i) not known to be delivered, and
(ii) not guaranteed to be delivered in CO,
is explicitly tracked by the algorithm using (source, timestamp, destination) information.
Information about messages already delivered and messages guaranteed to be delivered in CO is implicitly tracked without storing or propagating it, and is derived from the explicit information. The algorithms for the send and receive operations are given in Fig. 2.7 a) and b). Procedure SND is executed atomically. Procedure RCV is executed atomically except for a possible interruption in line 2a, where a non-blocking wait is required to meet the Delivery Condition.
Fig 2.7 a) Send algorithm by Kshemkalyani–Singhal to optimally implement causal
ordering

Fig b) Receive algorithm by Kshemkalyani–Singhal to optimally implement causal


ordering

The data structures maintained are sorted row–major and then column–major:

1. Explicit tracking:
▪ Tracking of (source, timestamp, destination) information for messages (i) not known to be delivered and (ii) not guaranteed to be delivered in CO, is done explicitly using the l.Dests field of entries in local logs at nodes and the o.Dests field of entries in messages.
▪ Sets li,a.Dests and oi,a.Dests contain explicit information of destinations to which Mi,a is not guaranteed to be delivered in CO and is not known to be delivered.
▪ The information about d ∈ Mi,a.Dests is propagated up to the earliest events on all causal paths from (i, a) at which it is known that Mi,a is delivered to d or is guaranteed to be delivered to d in CO.

2. Implicit tracking:
▪ Tracking of messages that are either (i) already delivered, or (ii) guaranteed to be
delivered in CO, is performed implicitly.
▪ The information about messages (i) already delivered or (ii) guaranteed to be
delivered in CO is deleted and not propagated because it is redundant as far as
enforcing CO is concerned.
▪ It is useful in determining what information that is being carried in other messages
and is being stored in logs at other nodes has become redundant and thus can be
purged.
▪ The semantics are implicitly stored and propagated. This information about messages
that are (i) already delivered or (ii) guaranteed to be delivered in CO is tracked
without explicitly storing it.
▪ The algorithm derives it from the existing explicit information about messages (i) not known to be delivered and (ii) not guaranteed to be delivered in CO, by examining only oi,a.Dests or li,a.Dests, which is a part of the explicit information.

Fig 2.8: Illustration of propagation constraints

Multicasts M5,1 and M4,1

Message M5,1 sent to processes P4 and P6 contains the piggybacked information M5,1.Dests = {P4, P6}. Additionally, at the send event (5, 1), the information M5,1.Dests = {P4, P6} is also inserted in the local log Log5. When M5,1 is delivered to P6, the (new) piggybacked information P4 ∈ M5,1.Dests is stored in Log6 as M5,1.Dests = {P4}; the information about P6 ∈ M5,1.Dests, which was needed for routing, must not be stored in Log6 because of constraint I.
In the same way, when M5,1 is delivered to process P4 at event (4, 1), only the new piggybacked information P6 ∈ M5,1.Dests is inserted in Log4 as M5,1.Dests = {P6}, which is later propagated during multicast M4,2.

Multicast M4,3
At event (4, 3), the information P6 ∈ M5,1.Dests in Log4 is propagated on multicast M4,3 only to process P6, to ensure causal delivery using the Delivery Condition. The piggybacked information on message M4,3 sent to process P3 must not contain this information because of constraint II. As long as any future message sent to P6 is delivered in causal order w.r.t. M4,3 sent to P6, it will also be delivered in causal order w.r.t. M5,1. And as M5,1 is already delivered to P4, the information M5,1.Dests = ∅ is piggybacked on M4,3 sent to P3.
Similarly, the information P6 ∈ M5,1.Dests must be deleted from Log4 as it will no longer be needed, because of constraint II. M5,1.Dests = ∅ is stored in Log4 to remember that M5,1 has been delivered or is guaranteed to be delivered in causal order to all its destinations.
Learning implicit information at P2 and P3
When message M4,2 is received by processes P2 and P3, they insert the (new) piggybacked information in their local logs as M5,1.Dests = {P6}. They both continue to store this in Log2 and Log3 and propagate this information on multicasts until they learn, at events (2, 4) and (3, 2) on receipt of messages M3,3 and M4,3, respectively, that any future message is expected to be delivered in causal order to process P6 w.r.t. M5,1 sent to P6. Hence, by constraint II, this information must be deleted from Log2 and Log3. The flow of events is given by:
• When M4,3 with piggybacked information M5,1.Dests = ∅ is received by P3 at (3, 2), this is inferred to be valid current implicit information about multicast M5,1, because the log Log3 already contains explicit information P6 ∈ M5,1.Dests about that multicast. Therefore, the explicit information in Log3 is inferred to be old and must be deleted to achieve optimality. M5,1.Dests is set to ∅ in Log3.
• The logic by which P2 learns this implicit knowledge on the arrival of M3,3 is identical.

Processing at P6
When message M5,1 is delivered to P6, only M5,1.Dests = {P4} is added to Log6. Further, P6 propagates only M5,1.Dests = {P4} on message M6,2, and this conveys the current implicit information that M5,1 has been delivered to P6 by its very absence in the explicit information.
• When the information P6 ∈ M5,1.Dests arrives on M4,3, piggybacked as M5,1.Dests = {P6}, it is used only to ensure causal delivery of M4,3 using the Delivery Condition, and is not inserted in Log6 (constraint I). Further, the presence of M5,1.Dests = {P4} in Log6 implies the implicit information that M5,1 has already been delivered to P6. Also, the absence of P4 in M5,1.Dests in the explicit piggybacked information implies the implicit information that M5,1 has been delivered or is guaranteed to be delivered in causal order to P4; therefore, M5,1.Dests is set to ∅ in Log6.
• When the information P6 ∈ M5,1.Dests arrives on M5,2, piggybacked as M5,1.Dests = {P4, P6}, it is used only to ensure causal delivery of M5,2 using the Delivery Condition, and is not inserted in Log6 because Log6 contains M5,1.Dests = ∅, which gives the implicit information that M5,1 has been delivered or is guaranteed to be delivered in causal order to both P4 and P6.

Processing at P1
• When M2,2 arrives carrying piggybacked information M5,1.Dests = {P6}, this (new) information is inserted in Log1.
• When M6,2 arrives with piggybacked information M5,1.Dests = {P4}, P1 learns the implicit information that M5,1 has been delivered to P6 by the very absence of explicit information P6 ∈ M5,1.Dests in the piggybacked information, and hence marks the information P6 ∈ M5,1.Dests for deletion from Log1.
• The information P6 ∈ M5,1.Dests piggybacked on M2,3, which arrives at P1, is inferred to be outdated using the implicit knowledge derived from M5,1.Dests = ∅ in Log1.
2.8 TOTAL ORDER

For each pair of processes Pi and Pj and for each pair of messages Mx and My that are delivered to both the processes, Pi is delivered Mx before My if and only if Pj is delivered Mx before My.

Centralized Algorithm for total ordering

Each process sends the message it wants to broadcast to a centralized process, which
relays all the messages it receives to every other process over FIFO channels.

Complexity: Each message transmission takes two message hops and exactly n messages
in a system of n processes.

Drawbacks: A centralized algorithm has a single point of failure and congestion, and is
not an elegant solution.
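A sketch of the centralized scheme in Python (class names are illustrative):

    # Centralized total ordering: every process hands its broadcast to a
    # coordinator, which relays it to all members over FIFO channels, so
    # all members deliver messages in the single order the coordinator chose.
    class Member:
        def __init__(self, name):
            self.name, self.log = name, []

        def deliver(self, sender, msg):
            self.log.append((sender, msg))

    class Coordinator:
        def __init__(self, members):
            self.members = members

        def broadcast(self, sender, msg):
            for m in self.members:      # one relay order fixes the total order
                m.deliver(sender, msg)

    p, q = Member("P"), Member("Q")
    c = Coordinator([p, q])
    c.broadcast("P", "x")
    c.broadcast("Q", "y")
    assert p.log == q.log               # identical delivery order everywhere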

Three phase distributed algorithm

Three phases can be seen in both sender and receiver side.

Sender

Phase 1
• In the first phase, a process multicasts the message M with a locally unique tag and
the local timestamp to the group members.

Phase 2
• The sender process awaits a reply from all the group members who respond with a
tentative proposal for a revised timestamp for that message M.
• The await call is non-blocking.

Phase 3
• The process multicasts the final timestamp to the group.
Fig 2.9: Sender side of the three-phase distributed algorithm

Receiver Side
Phase 1
• The receiver receives the message with a tentative timestamp. It updates the variable
priority that tracks the highest proposed timestamp, then revises the proposed
timestamp to the priority, and places the message with its tag and the revised
timestamp at the tail of the queue temp_Q. In the queue, the entry is marked as
undeliverable.

Phase 2
• The receiver sends the revised timestamp back to the sender. The receiver then waits
in a non-blocking manner for the final timestamp.

Phase 3
• The final timestamp is received from the multicaster. The corresponding message entry in temp_Q is identified using the tag, and is marked as deliverable after the revised timestamp is overwritten by the final timestamp.
• The queue is then resorted using the timestamp field of the entries as the key. As the queue is already sorted except for the modified entry for the message under consideration, that message entry has to be placed in its sorted position in the queue.
• If the message entry is at the head of the temp_Q, that entry, and all consecutive subsequent entries that are also marked as deliverable, are dequeued from temp_Q and enqueued in deliver_Q, as the sketch below illustrates.
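A Python sketch of the receiver-side queue handling (simplified: single-threaded, tags are unique integers, and tie-breaking by process id is omitted):

    # Receiver side of the three-phase algorithm. temp_Q maps a message tag
    # to [timestamp, deliverable]; deliver_Q receives messages once the
    # head of temp_Q (smallest timestamp) is marked deliverable.
    class Receiver:
        def __init__(self):
            self.priority = 0             # highest timestamp proposed so far
            self.temp_Q = {}
            self.deliver_Q = []

        def on_revise_ts(self, tag, ts):                  # phases 1 and 2
            self.priority = max(self.priority, ts)
            self.temp_Q[tag] = [self.priority, False]     # undeliverable
            return self.priority                          # proposed timestamp

        def on_final_ts(self, tag, final_ts):             # phase 3
            self.temp_Q[tag] = [final_ts, True]           # mark deliverable
            self._flush()

        def _flush(self):
            # Dequeue from the head while the smallest-timestamp entry is
            # marked deliverable.
            while self.temp_Q:
                tag, (ts, ok) = min(self.temp_Q.items(), key=lambda e: e[1][0])
                if not ok:
                    break
                del self.temp_Q[tag]
                self.deliver_Q.append((tag, ts))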

Complexity
This algorithm uses three phases and, to send a message to n − 1 processes, it uses 3(n − 1) messages and incurs a delay of three message hops.
Example
An example execution to illustrate the algorithm is given in the figure below. Here, A and B multicast to a set of destinations, and C and D are the common destinations for both multicasts.
Figure (a) The main sequence of steps is as follows:
1. A sends a REVISE_TS(7) message, having timestamp 7. B sends a REVISE_TS(9)
message, having timestamp 9.
2. C receives A’s REVISE_TS(7), enters the corresponding message in temp_Q, and marks
it as undeliverable; priority = 7. C then sends PROPOSED_TS(7) message to A
3. D receives B’s REVISE_TS(9), enters the corresponding message in temp_Q, and marks
it as undeliverable; priority = 9. D then sends PROPOSED_TS(9) message to B.
4. C receives B’s REVISE_TS(9), enters the corresponding message in temp_Q, and marks
it as undeliverable; priority = 9. C then sends PROPOSED_TS(9) message to B.
5. D receives A’s REVISE_TS(7), enters the corresponding message in temp_Q, and marks it as undeliverable; priority = 10. D assigns a tentative timestamp value of 10, which is greater than all of the timestamps on REVISE_TSs seen so far, and then sends a PROPOSED_TS(10) message to A.
The state of the system is as shown in Figure (a).

Fig) An example to illustrate the three-phase total ordering algorithm. (a) A snapshot for
PROPOSED_TS and REVISE_TS messages. The dashed lines show the further execution
after the snapshot. (b) The FINAL_TS messages in the example.
Figure (b) The continuing sequence of main steps is as follows:
6. When A receives PROPOSED_TS(7) from C and PROPOSED_TS(10) from D, it computes the final timestamp as max(7, 10) = 10, and sends FINAL_TS(10) to C and D.
7. When B receives PROPOSED_TS(9) from C and PROPOSED_TS(9) from D, it computes the final timestamp as max(9, 9) = 9, and sends FINAL_TS(9) to C and D.
8. C receives FINAL_TS(10) from A, updates the corresponding entry in temp_Q with the
timestamp, resorts the queue, and marks the message as deliverable. As the message is not
at the head of the queue, and some entry ahead of it is still undeliverable, the message is
not moved to delivery_Q.
9. D receives FINAL_TS(9) from B, updates the corresponding entry in temp_Q by
marking the corresponding message as deliverable, and resorts the queue. As the message
is at the head of the queue, it is moved to delivery_Q. This is the system snapshot shown in
Figure (b).
The following further steps will occur:
10. When C receives FINAL_TS(9) from B, it will update the corresponding entry in temp_Q by marking the corresponding message as deliverable. As the message is at the head of the queue, it is moved to the delivery_Q, and the next message (of A), which is also deliverable, is also moved to the delivery_Q.
11. When D receives FINAL_TS(10) from A, it will update the corresponding entry in temp_Q by marking the corresponding message as deliverable. As the message is at the head of the queue, it is moved to the delivery_Q.

2.9 GLOBAL STATE AND SNAPSHOT RECORDING ALGORITHMS


• A distributed computing system consists of processes that do not share a common memory and communicate asynchronously with each other by message passing.
• Each component of the system has a local state. The state of a process is its local memory and a history of its activity.
• The state of a channel is characterized by the set of messages sent along the channel less the messages received along the channel. The global state of a distributed system is a collection of the local states of its components.
• If shared memory were available, an up-to-date state of the entire system would be available to the processes sharing the memory.
• The absence of shared memory necessitates ways of getting a coherent and complete view of the system based on the local states of individual processes.
• A meaningful global snapshot can be obtained if the components of the distributed system record their local states at the same time.
• This would be possible if the local clocks at processes were perfectly synchronized or if there were a global system clock that could be instantaneously read by the processes.
• If processes read time from a single common clock, various indeterminate transmission delays during the read operation will cause the processes to identify various physical instants as the same time.

2.9.1 System Model


• The system consists of a collection of n processes, p1, p2, …, pn, that are connected by channels.
• Let Cij denote the channel from process pi to process pj.
• Processes and channels have states associated with them.
• The state of a process at any time is defined by the contents of processor registers, stacks, local memory, etc., and may be highly dependent on the local context of the distributed application.
• The state of channel Cij, denoted by SCij, is given by the set of messages in transit in the channel.
• The events that may happen are: internal events, send (send(mij)) and receive (rec(mij)) events.
• The occurrences of events cause changes in the process state.
• A channel is a distributed entity and its state depends on the local states of the processes on which it is incident.

• The transit function records the state of the channel Cij:
transit(LSi, LSj) = {mij | send(mij) ∈ LSi and rec(mij) ∉ LSj}


• In the FIFO model, each channel acts as a first-in first-out message queue and,
thus, message ordering is preserved by a channel.
• In the non-FIFO model, a channel acts like a set in which the sender process
adds messages and the receiver process removes messages from it in a random
order.

2.9.2 A consistent global state


The global state of a distributed system is a collection of the local states of the processes and the channels. The global state is given by:

GS = {∪i LSi, ∪i,j SCij}

The two conditions for a consistent global state are:

C1: send(mij) ∈ LSi ⟹ mij ∈ SCij ⊕ rec(mij) ∈ LSj (where ⊕ is the exclusive-or)
C2: send(mij) ∉ LSi ⟹ mij ∉ SCij and rec(mij) ∉ LSj

Condition C1 preserves the law of conservation of messages. Condition C2 states that in the collected global state, for every effect, its cause must be present.

Law of conservation of messages: Every message mij that is recorded as sent in the local state of a process pi must be captured in the state of the channel Cij or in the collected local state of the receiver process pj.

➢ In a consistent global state, every message that is recorded as received is also recorded
as sent. Such a global state captures the notion of causality that a message cannot be
received if it was not sent.
➢ Consistent global states are meaningful global states and inconsistent global states are
not meaningful in the sense that a distributed system can never be in an inconsistent
state.
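These conditions translate into a simple mechanical check on a recorded state (a Python sketch; messages are represented by ids, and the recorded sends, receives, and channel states are plain sets):

    # Checking conditions C1 and C2 on a recorded global state.
    # sent[i][j]: messages recorded as sent by pi on channel Cij
    # recd[j][i]: messages recorded as received by pj on channel Cij
    # sc[i][j]  : recorded state of channel Cij (messages in transit)
    def consistent(sent, recd, sc, n):
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                for m in sent[i][j]:
                    # C1: every sent message is in the channel XOR received.
                    if (m in sc[i][j]) == (m in recd[j][i]):
                        return False
                for m in sc[i][j] | recd[j][i]:
                    # C2: nothing is captured unless its send was recorded.
                    if m not in sent[i][j]:
                        return False
        return True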

2.9.3 Interpretation of cuts


• Cuts in a space–time diagram provide a powerful graphical aid in representing and reasoning about the global states of a computation. A cut is a line joining an arbitrary point on each process line that slices the space–time diagram into a PAST and a FUTURE.
• A consistent global state corresponds to a cut in which every message received in the PAST of the cut has been sent in the PAST of that cut. Such a cut is known as a consistent cut.
• In a consistent snapshot, all the recorded local states of processes are concurrent; that is, the recorded local state of no process causally affects the recorded local state of any other process.

Issues in recording global state


The non-availability of a global clock in a distributed system raises the following issues:
Issue 1:
How to distinguish between the messages to be recorded in the snapshot and those not to be recorded?
Answer:
• Any message that is sent by a process before recording its snapshot must be recorded in the global snapshot (from C1).
• Any message that is sent by a process after recording its snapshot must not be recorded in the global snapshot (from C2).

Issue 2:
How to determine the instant when a process takes its snapshot?
Answer:
A process pj must record its snapshot before processing a message mij that was sent by process pi after recording its snapshot.

2.9.4 SNAPSHOT ALGORITHMS FOR FIFO CHANNELS


Each distributed application has a number of processes running on different physical servers. These processes communicate with each other through messaging channels.

A snapshot captures the local states of each process along with the state of each communication channel.

Snapshots are required for:
• Checkpointing
• Garbage collection
• Deadlock detection
• Debugging

Chandy–Lamport algorithm
• The algorithm records a global snapshot consisting of the state of each process and each channel.
• The Chandy–Lamport algorithm uses a control message, called a marker.
• After a site has recorded its snapshot, it sends a marker along all of its outgoing channels before sending out any more messages.
• Since channels are FIFO, a marker separates the messages in the channel into those to be included in the snapshot from those not to be recorded in the snapshot.

• This addresses Issue 1. The role of markers in a FIFO system is to act as delimiters for the messages in the channels so that the channel state recorded by the process at the receiving end of the channel satisfies the condition C2.

Fig 2.10: Chandy–Lamport algorithm

Initiating a snapshot
• Process Pi initiates the snapshot.
• Pi records its own state and prepares a special marker message.
• Pi sends the marker message on all of its outgoing channels.
• Pi starts recording all incoming messages from channels Cji for j ≠ i.

Propagating a snapshot
• Each process Pj handles a marker arriving on channel Ckj as follows.
• If the marker message is seen for the first time:
− Pj records its own state and marks Ckj as empty.
− Pj sends the marker message on all of its outgoing channels.
− Pj records all incoming messages from channels Clj for l ≠ j, k.
• Else, Pj adds all arriving messages from inbound channels to the recorded channel states.

Terminating a snapshot
• All processes have received a marker (and recorded their own states).
• All processes have received a marker on all the N − 1 incoming channels (and recorded the channel states).
• A central server can gather the partial states to build a global snapshot.
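A condensed Python sketch of the marker-handling logic at one process (channel transport is abstracted away; send_markers is an assumed primitive that sends a marker on every outgoing channel):

    MARKER = "MARKER"

    # Chandy-Lamport marker handling at a single process over FIFO channels.
    class SnapshotProcess:
        def __init__(self, incoming_channels, send_markers):
            self.state = None                       # recorded local state
            self.recording = {c: False for c in incoming_channels}
            self.chan_state = {c: [] for c in incoming_channels}
            self.send_markers = send_markers

        def initiate(self, local_state):
            self._record(local_state)               # initiator records first

        def on_message(self, channel, msg, local_state):
            if msg == MARKER:
                if self.state is None:              # first marker seen anywhere
                    self._record(local_state)
                    self.chan_state[channel] = []   # this channel recorded empty
                self.recording[channel] = False     # stop recording this channel
            elif self.state is not None and self.recording[channel]:
                self.chan_state[channel].append(msg)   # in-transit message
            # Application messages are processed normally in either case.

        def _record(self, local_state):
            self.state = local_state
            self.send_markers()                     # markers before further sends
            for c in self.recording:                # start recording on all
                self.recording[c] = True            # other incoming channels

Recording at this process is complete once its state is recorded and a marker has arrived on every incoming channel; a collector can then assemble the recorded states and channel states from all processes into the global snapshot.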

Correctness of the algorithm


• Since a process records its snapshot when it receives the first marker on any
incoming channel, no messages that follow markers on the channels incoming to it are
recorded in the process’s snapshot.
• A process stops recording the state of an incoming channel when a marker is received
on that channel.
• Due to FIFO property of channels, it follows that no message sent after the marker on that
channel is recorded in the channel state. Thus, condition C2 is satisfied.
• When a process pj receives message mij that precedes the marker on channel Cij, it acts
as follows: if process pj has not taken its snapshot yet, then it includes mij in its recorded
snapshot. Otherwise, it records mij in the state of the channel Cij. Thus, condition C1
is satisfied.

Complexity
The recording part of a single instance of the algorithm requires O(e) messages and O(d) time, where e is the number of edges in the network and d is the diameter of the network.

Properties of the recorded global state

Fig) Timing diagram of two possible executions of the banking example


1. (Markers shown using dashed-and-dotted arrows.) Let site S1 initiate the algorithm just after t1. Site S1 records its local state (account A = $550) and sends a marker to site S2. The marker is received by site S2 after t4. When site S2 receives the marker, it records its local state (account B = $170), the state of channel C12 as $0, and sends a marker along channel C21. When site S1 receives this marker, it records the state of channel C21 as $80. The $800 amount in the system is conserved in the recorded global state:
A = $550, B = $170, C12 = $0, C21 = $80

2. (Markers shown using dotted arrows.) Let site S1 initiate the algorithm just after t0 and before sending the $50 to S2. Site S1 records its local state (account A = $600) and sends a marker to S2. The marker is received by site S2 between t2 and t3. When site S2 receives the marker, it records its local state (account B = $120), the state of channel C12 as $0, and sends a marker along channel C21. When site S1 receives this marker, it records the state of channel C21 as $80. The $800 amount in the system is conserved in the recorded global state:
A = $600, B = $120, C12 = $0, C21 = $80

The recorded global state may not correspond to any of the global states that occurred
during the computation.
This happens because a process can change its state asynchronously before the markers it
sent are received by other sites and the other sites record their states.
But the system could have passed through the recorded global states in some equivalent
executions.
The recorded global state is a valid state in an equivalent execution and if a stable property
(i.e., a property that persists) holds in the system before the snapshot algorithm begins, it holds in
the recorded global snapshot.
Therefore, a recorded global state is useful in detecting stable properties.
