DC_Module2_Notes

The document discusses logical time in distributed computing, emphasizing the importance of causality and the implementation of logical clocks through scalar, vector, and matrix time. It also covers the leader election algorithm, specifically the Bully and Ring algorithms, detailing their processes, advantages, and challenges. The document highlights the safety and liveness conditions necessary for effective leader election in distributed systems.


CST402 DISTRIBUTED COMPUTING

MODULE: 2

Part1: Logical Time


Introduction
➢ In everyday settings, causality among events is tracked using physical time.
➢ In distributed systems, it is not possible to have global physical time; it is possible to
realize only an approximation of it.
➢ Three ways to implement logical time:
❖ Scalar time
❖ Vector time
❖ Matrix time
➢ Causality (or the causal precedence relation) among events in a distributed system is a
powerful concept in reasoning, analyzing, and drawing inferences about a computation.
➢ The knowledge of the causal precedence relation among the events of processes helps
solve a variety of problems in distributed systems such as:
❖ Distributed algorithms design
❖ Tracking of dependent events
❖ Knowledge about the progress
❖ Concurrency measure
➢ In a system of logical clocks, every process has a logical clock that is advanced using
a set of rules.
➢ Every event is assigned a timestamp and the causality relation between events can be
generally inferred from their timestamps.
➢ The timestamps assigned to events obey the fundamental monotonicity property; that
is, if an event a causally affects an event b, then the timestamp of a is smaller than the
timestamp of b.

2.1 A framework for a system of logical clocks


2.1.1 Definition:
➢ A system of logical clocks consists of a time domain T and a logical clock C.
➢ Elements of T form a partially ordered set over a relation <.
➢ This relation is usually called the happened before or causal precedence.
➢ The logical clock C is a function that maps an event e in a distributed system to an
element in the time domain T, denoted as C(e) and called the timestamp of e, and is
defined as follows:

    C : H → T   (where H denotes the set of events)

such that the following property is satisfied:

    for two events ei and ej,  ei → ej ⇒ C(ei) < C(ej).

➢ This monotonicity property is called the clock consistency condition.
➢ When T and C satisfy the following condition,

    for two events ei and ej,  ei → ej ⇔ C(ei) < C(ej),

the system of clocks is said to be strongly consistent.

2.1.2 Implementing logical clocks


➢ Implementation of logical clocks requires addressing two issues:
❖ Data structures local to every process to represent logical time
❖ A protocol (set of rules) to update the data structures to ensure the consistency
condition
1. Data structures:

➢ Each process pi maintains data structures that allow it the following two capabilities:
❖ A local logical clock, denoted by lci, that helps process pi measure its own
progress.
❖ A logical global clock, denoted by gci, that is a representation of process pi’s
local view of the logical global time.
➢ It allows this process to assign consistent timestamps to its local events.
➢ Typically, lci is a part of gci.

2. Protocol:
➢ The protocol ensures that a process’s logical clock, and thus its view of the global time,
is managed consistently.
➢ The protocol consists of the following two rules:
❖ R1: This rule governs how the local logical clock is updated by a process when
it executes an event (send, receive, or internal).
❖ R2: This rule governs how a process updates its global logical clock to update
its view of the global time and global progress.
➢ It dictates what information about the logical time is piggybacked in a message.
➢ All logical clock systems implement rules R1 and R2 and consequently ensure the
fundamental monotonicity property associated with causality.

2.2 Scalar time


2.2.1 Definition:
➢ The scalar time representation was proposed by Lamport in 1978 as an attempt to
totally order events in a distributed system.

➢ Time domain in this representation is the set of non-negative integers.
➢ The logical local clock of a process pi and its local view of the global time are squashed
into one integer variable Ci.
➢ Rules R1 and R2 to update the clocks are as follows:
R1: Before executing an event (send, receive, or internal), process pi executes
the following:

    Ci := Ci + d      (d > 0)
❖ In general, every time R1 is executed, d can have a different value, and
this value may be application-dependent.
❖ Typically, d is kept at 1.
R2: Each message piggybacks the clock value of its sender at sending time.
❖ When a process pi receives a message with timestamp Cmsg, it executes
the following actions:
1. Ci := max(Ci, Cmsg);
2. execute R1;
3. deliver the message.

Fig 2.1 Evolution of scalar time
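Rules R1 and R2 for scalar time can be sketched in code. The following is a minimal, illustrative Python sketch; the class and method names are assumptions, not part of the original algorithm statement:

```python
# A minimal sketch of scalar (Lamport) clocks; names are illustrative.

class ScalarClock:
    def __init__(self, d=1):
        self.c = 0      # the single integer Ci (local clock + global view)
        self.d = d      # increment value, typically 1

    def tick(self):
        # R1: before executing any event, advance the clock
        self.c += self.d
        return self.c

    def send(self):
        # R1 applies to the send event; the value is piggybacked on the message
        return self.tick()

    def receive(self, c_msg):
        # R2: take the max with the piggybacked timestamp, then apply R1
        self.c = max(self.c, c_msg)
        return self.tick()

p1, p2 = ScalarClock(), ScalarClock()
t_send = p1.send()            # p1's clock becomes 1
t_recv = p2.receive(t_send)   # p2's clock becomes max(0, 1) + 1 = 2
```

Note that p2's receive yields a timestamp strictly larger than the send's, preserving monotonicity across the send–receive pair.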

2.2.2 Basic properties


1. Consistency property:
➢ Scalar clocks satisfy the monotonicity and hence the consistency property:

    for two events ei and ej,  ei → ej ⇒ C(ei) < C(ej).
2. Total ordering:
➢ Scalar clocks can be used to totally order events in a distributed system.
➢ The main problem in totally ordering events is that two or more events at different
processes may have an identical timestamp.
➢ Note that for two events e1 and e2, C(e1) = C(e2) is possible even when e1 ≠ e2.
➢ A tie-breaking mechanism is needed to order such events.
➢ Typically, a tie is broken as follows: process identifiers are linearly ordered and a tie
among events with identical scalar timestamp is broken on the basis of their process
identifiers.
➢ The lower the process identifier in the ranking, the higher the priority.
➢ The total order relation ≺ on two events x and y with timestamps (h,i) and (k,j),
respectively, is defined as follows:

    x ≺ y  ⇔  h < k  or  (h = k and i < j)
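The tie-breaking rule can be expressed as a comparison on (timestamp, process id) pairs; a small illustrative Python sketch (the function name is an assumption):

```python
def totally_ordered_before(x, y):
    """x precedes y for events timestamped (h, i) and (k, j):
    true iff h < k, or h == k and i < j (lower process id wins ties)."""
    h, i = x
    k, j = y
    return h < k or (h == k and i < j)

# Two events with identical scalar timestamp 3 are ordered by process id:
result = totally_ordered_before((3, 1), (3, 2))
```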
3. Event counting:
➢ If the increment value d is always 1, the scalar time has the following interesting
property: if event e has a timestamp h, then h−1 represents the minimum logical
duration, counted in units of events, required before producing the event e; we call it
the height of the event e.
4. No strong consistency:
➢ The system of scalar clocks is not strongly consistent; that is, for two events ei and ej,

    C(ei) < C(ej) does not imply ei → ej.
2.3 Vector time


2.3.1 Definition
➢ In the system of vector clocks, the time domain is represented by a set of n-dimensional
non-negative integer vectors.
➢ Each process pi maintains a vector vti[1..n], where vti[i] is the local logical clock of pi
and describes the logical time progress at process pi.
➢ vti[j] represents process pi's latest knowledge of process pj's local time.
➢ If vti[j] = x, then process pi knows that the local time at process pj has progressed up to x.
➢ The entire vector vti constitutes pi’s view of the global logical time and is used to
timestamp events.
➢ Process pi uses the following two rules R1 and R2 to update its clock:
❖ R1: Before executing an event, process pi updates its local logical time as
follows:

    vti[i] := vti[i] + d      (d > 0)
❖ R2: Each message m is piggybacked with the vector clock vt of the sender
process at sending time.
❖ On the receipt of such a message (m,vt), process pi executes the following
sequence of actions:
1. update its global logical time as follows:

    for 1 ≤ k ≤ n :  vti[k] := max(vti[k], vt[k]);
2. execute R1;
3. deliver the message m.
➢ The timestamp associated with an event is the value of the vector clock of its process
when the event is executed.

➢ Figure 2.2 shows an example of vector clock progress with the increment value d = 1.
Initially, a vector clock is [0, 0, …, 0].

Figure 2.2 Evolution of vector time
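Rules R1 and R2 for vector clocks can be sketched as follows (a simplified, zero-indexed Python illustration; class and method names are assumptions):

```python
# Sketch of vector clocks for n processes; indices are zero-based here.

class VectorClock:
    def __init__(self, n, i, d=1):
        self.vt = [0] * n   # vti[1..n] from the notes, zero-based in code
        self.i = i          # this process's own index
        self.d = d

    def tick(self):
        # R1: advance own component before executing any event
        self.vt[self.i] += self.d
        return list(self.vt)

    def send(self):
        # R1 applies to the send event; a copy of vt is piggybacked
        return self.tick()

    def receive(self, vt_msg):
        # R2: component-wise max with the piggybacked vector, then R1
        self.vt = [max(a, b) for a, b in zip(self.vt, vt_msg)]
        return self.tick()

p0, p1 = VectorClock(3, 0), VectorClock(3, 1)
m = p0.send()        # p0's vector becomes [1, 0, 0]
ts = p1.receive(m)   # p1: component-wise max, then tick -> [1, 1, 0]
```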

2.3.2 Basic properties


1. Isomorphism:
➢ If events in a distributed system are timestamped using a system of vector clocks, we
have the following property:
❖ If two events x and y have timestamps vh and vk, respectively, then

    x → y  ⇔  vh < vk        and        x ∥ y  ⇔  vh ∥ vk

(where vh < vk means vh[m] ≤ vk[m] for every component m and vh ≠ vk).
➢ Thus, there is an isomorphism between the set of partially ordered events produced by
a distributed computation and their vector timestamps.
➢ If the process at which an event occurred is known, the test to compare two timestamps
can be simplified as follows: if events x and y respectively occurred at processes pi and
pj and are assigned timestamps vh and vk, respectively, then

    x → y  ⇔  vh[i] ≤ vk[i]
    x ∥ y  ⇔  vh[i] > vk[i]  and  vh[j] < vk[j]
2. Strong consistency
➢ The system of vector clocks is strongly consistent; thus, by examining the vector
timestamp of two events, we can determine if the events are causally related.
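The strong-consistency test can be written directly on vector timestamps; a small illustrative sketch (function names are assumptions):

```python
def happened_before(vh, vk):
    # vh -> vk iff vh <= vk component-wise and the vectors differ
    return all(a <= b for a, b in zip(vh, vk)) and vh != vk

def concurrent(vh, vk):
    # Neither causally precedes the other
    return not happened_before(vh, vk) and not happened_before(vk, vh)

# [1,0,0] causally precedes [2,1,0]; [1,0,0] and [0,1,0] are concurrent.
a = happened_before([1, 0, 0], [2, 1, 0])
b = concurrent([1, 0, 0], [0, 1, 0])
```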
3. Event counting

➢ If d is always 1 in rule R1, then the ith component of vector clock at process pi, vti[i],
denotes the number of events that have occurred at pi until that instant.
➢ So, if an event e has timestamp vh, vh[j] denotes the number of events executed by
process pj that causally precede e.
➢ Clearly, Σj vh[j] − 1 represents the total number of events that causally precede e in
the distributed computation.
Applications of Vector time

➢ Distributed debugging
➢ Implementations of causal ordering communication and causal distributed shared
memory
➢ Establishment of global breakpoints and determination of the consistency of checkpoints
in optimistic recovery.

Part2: Leader election algorithm

2.4 Leader Election Algorithm


➢ An algorithm for choosing a unique process to play a particular role (coordinator) is
called an election algorithm.
➢ An election algorithm assumes that every active process in the system has a unique
priority number.
➢ The process with the highest priority number will be chosen as the coordinator.
➢ When a coordinator fails, the algorithm elects the active process with the highest
priority number.
➢ This number is then sent to every active process in the distributed system.
➢ Two types:
1. Bully Algorithm
2. Ring Algorithm

2.4.1 Bully Algorithm


Messages in Bully Algorithm
➢ There can be three types of messages that processes exchange with each other in
the bully algorithm:
1. Election message: Sent to announce election.
2. OK (Alive) message: Responds to the Election message.
3. Coordinator (Victory) message: Sent by winner of the election to announce the
new coordinator.
➢ A process can begin an election by sending an election message to processes with
higher priority numbers and waiting for OK messages in response.
➢ If none arrives within time T, the process considers itself the coordinator and sends a
coordinator message to all processes with lower identifiers announcing this.
➢ Otherwise, the responding process starts a new election for a coordinator.
➢ If the coordinator does not respond within a time interval T, then it is assumed
that the coordinator has failed.
➢ A process P then sends an election message to every process with a higher priority number.
➢ It waits for responses; if no one responds within time interval T, then process P
elects itself as the coordinator.
➢ It then sends a message to all lower priority number processes announcing that it is
their new coordinator.
➢ If a process that was previously down/failed comes back, it takes over the coordinator
job (it holds an election and wins if it has the highest priority number).
➢ The biggest guy always wins, hence the name bully algorithm.

Steps involved in Bully Algorithm


Step 1: Suppose Process P2 sends a message to coordinator P5 and P5 doesn't respond in a
desired time T (possible reasons could be a crash, being down, etc.).

Step 2: Then process P2 sends an election message to all processes with Process ID greater
than P2 (i.e., P3, P4 & P5) and awaits a response from those processes.

Step 3: If no one responds, P2 wins the election and becomes the coordinator.

Step 4: If any of the processes with Process ID higher than 2 responds with OK, P2's job is
done and that process will take over the election.

Step 5: The responding process then restarts the election by sending its own election message.

Step 6: Process P4 responds to P3 with an OK message to confirm it is alive. Process
P4 then figures out that process P5 has crashed, and the process with the highest ID
among the active processes is P4.

Step 7: The process that receives an election message sends a coordinator message if it is
the process with the highest ID (in this case it is P4).

Step 8: If a process which was previously down (i.e., P5) comes back, it holds an election,
and if it has the highest Process ID then it becomes the new coordinator and sends a message
to all other processes.
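The outcome of the steps above can be sketched with a simplified, synchronous model; real implementations rely on timeouts and asynchronous messages, so the function below is an illustrative assumption, not the full protocol:

```python
def bully_election(initiator, alive_ids):
    """initiator starts an election among the set of alive process ids;
    returns the id of the new coordinator."""
    higher = [p for p in alive_ids if p > initiator]
    if not higher:
        return initiator      # no OK replies arrive: the initiator wins
    # Some higher process replies OK and takes over the election; the
    # highest alive id eventually wins and announces itself.
    return max(alive_ids)

# P5 has crashed; P2 notices and starts an election among {P1..P4}:
new_coordinator = bully_election(2, {1, 2, 3, 4})
```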

Practical Applications of the Bully Algorithm

➢ Cluster Management
➢ Resource Allocation
➢ Leader Election
Pros of the Bully algorithm
➢ Simple
➢ Effective in small networks
➢ Fault-tolerant
Cons of the Bully algorithm

➢ Inefficient in large networks


➢ Risk of starvation
➢ Initialization challenges
➢ Lack of preemption

Safety:
➢ Safety ensures that no incorrect process will ever be elected as the coordinator.
➢ In the bully algorithm, this is guaranteed because only processes with higher IDs
respond to election messages.
➢ Therefore, the process with the highest ID among the active processes will always
become the coordinator.
➢ Hence, the safety condition is met.
Liveness:
➢ Liveness ensures that the system eventually makes progress.
➢ In the bully algorithm, the election process continues until a new coordinator is elected.
➢ If a coordinator fails, an election is triggered, and eventually, a new coordinator will be
elected.
➢ However, it's important to note that if there's a continuous failure of higher-ID processes,
the election process might keep restarting without reaching a stable state.
➢ Therefore, the algorithm generally meets liveness conditions.

2.4.2 Ring-based election algorithm


➢ Each process pi has a communication channel to the next process in the ring,
p(i+1) mod N.
➢ The goal of this algorithm is to elect a single process called the coordinator.
➢ In this algorithm we assume that the links between the processes are unidirectional.
➢ Every process can send messages to other processes in the clockwise direction only.
➢ Initially every process is marked as a non-participant in an election.
➢ Any process can begin an election.
➢ It proceeds by marking itself as a participant, placing its identifier in an election
message and sending it to its clockwise neighbour.
➢ When a process receives an election message, it compares the identifier in the
message with its own.
➢ If the arrived identifier is greater, then it forwards the message to its neighbour.
➢ If the arrived identifier is smaller, then it substitutes its own identifier in the
message and forwards it.
➢ If the received identifier is that of the receiver itself, then this process's identifier
must be the greatest, and it becomes the coordinator.
➢ The coordinator marks itself as a non-participant once more and sends an elected
message to its neighbour, announcing its election and enclosing its identity.
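The message circulation described above can be simulated by walking the ring; an illustrative Python sketch (the list-based ring and the function name are assumptions):

```python
def ring_election(ring, start_index):
    """ring: process ids in clockwise order; start_index: position of the
    initiator. Returns the elected coordinator (the highest id in the ring)."""
    n = len(ring)
    msg = ring[start_index]          # initiator places its own id in the message
    pos = (start_index + 1) % n      # send to the clockwise neighbour
    while True:
        own = ring[pos]
        if msg == own:
            return own               # received its own id back: it is elected
        msg = max(msg, own)          # forward the larger identifier
        pos = (pos + 1) % n

ring = [17, 24, 1, 28, 15, 9, 4, 3]  # the example ring below, clockwise
elected = ring_election(ring, 0)     # Process 17 initiates
```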
Example:

➢ Initiator (Process 17) starts the election:


❖ Process 17 marks itself as a participant and sends an election message with
its identifier (17) to its clockwise neighbour, which is Process 24.
➢ Process 24 receives the election message:
❖ Process 24 compares the received identifier (17) with its own (24).
❖ Since 17 < 24, Process 24 substitutes its own identifier and forwards the message
with identifier 24 to its clockwise neighbour, which is Process 1.
➢ Process 1 receives the election message:
❖ Process 1 compares the received identifier (24) with its own (1).
❖ Since 24 > 1, Process 1 forwards the message with identifier 24 to its
clockwise neighbour, which is Process 28.
➢ Process 28 receives the election message:
❖ Process 28 compares the received identifier (24) with its own (28).
❖ Since 24 < 28, Process 28 substitutes its own identifier and forwards the message
with identifier 28 to its clockwise neighbour, which is Process 15.
➢ Process 15 receives the election message:
❖ Process 15 compares the received identifier (28) with its own (15).
❖ Since 28 > 15, Process 15 forwards the message with identifier 28 to its
clockwise neighbour, which is Process 9.
➢ Process 9 receives the election message:
❖ Process 9 compares the received identifier (28) with its own (9).
❖ Since 28 > 9, Process 9 forwards the message with identifier 28 to its
clockwise neighbour, which is Process 4.
➢ Process 4 receives the election message:
❖ Process 4 compares the received identifier (28) with its own (4).
❖ Since 28 > 4, Process 4 forwards the message with identifier 28 to its
clockwise neighbour, which is Process 3.
➢ Process 3 receives the election message:
❖ Process 3 compares the received identifier (28) with its own (3).
❖ Since 28 > 3, Process 3 forwards the message with identifier 28 to its
clockwise neighbour, which is Process 17.
➢ Process 17 receives the election message again:
❖ Process 17 compares the received identifier (28) with its own (17).
❖ Since 28 > 17, Process 17 forwards the message with identifier 28 to its
clockwise neighbour, which is Process 24.
➢ Process 24 receives the election message again:
❖ Process 24 compares the received identifier (28) with its own (24).
❖ Since 28 > 24, Process 24 forwards the message with identifier 28 to its
clockwise neighbour, which is Process 1.
➢ Process 1 receives the election message again:
❖ Process 1 compares the received identifier (28) with its own (1).
❖ Since 28 > 1, Process 1 forwards the message with identifier 28 to its
clockwise neighbour, which is Process 28.
➢ Process 28 receives its own election message:
❖ Process 28 compares the received identifier (28) with its own (28).
❖ Process 28 realizes that it has received its own identifier.
❖ It concludes that it has the highest identifier in the ring and becomes
the coordinator.
➢ Process 28 sends an elected message:
❖ Process 28 marks itself as a non-participant and sends an elected message
with its identity (28) to its neighbour.

Part3: Global state and snapshot recording algorithms


2.5 Global state and snapshot recording algorithms
Introduction
➢ A distributed computing system is composed of spatially separated processes.
➢ Processes in this system do not share a common memory.
➢ Communication among processes is asynchronous and occurs through message
passing over communication channels.
➢ Each component (process) in a distributed system has a local state.

➢ Channels facilitate communication between processes.
➢ The state of a channel is characterized by the set of messages sent along the channel
less the messages received along the channel.
➢ The global state of a distributed system is a collection of the local states of all its
components.
➢ It represents the combined state of all processes in the distributed system.

2.5.1 System model and definitions


➢ System Components:
❖ The system comprises a collection of n processes (p1, p2, ..., pn) connected by
channels.
❖ Communication between processes occurs exclusively through message passing.
❖ No globally shared memory exists within the system.
➢ Communication Channels:
❖ The system is represented as a directed graph, with vertices representing
processes and edges representing unidirectional communication channels.
❖ The channel from process pi to process pj is denoted as Cij.
➢ States of Processes and Channels:
❖ The state of a process includes the contents of processor registers, stacks, local
memory, etc.
❖ The state of a channel (Cij) is defined by the set of messages currently in transit
through the channel.
➢ Events in the System:
❖ Three types of events model the actions performed by processes.
❖ Internal events occur within a process and change its state.
❖ Message send events send (mij) and message receive events rec (mij) change the
states of the sending and receiving processes and the corresponding channel.
❖ Events at a process are linearly ordered based on their occurrence.
➢ Global System State Transitions:
❖ Events cause transitions in the global system state.
❖ Internal events alter the state of the process in which they occur, while send and
receive events impact both the sending and receiving processes and the associated
channel state.
➢ Process State Representation:
❖ The state of process pi, denoted by LSi, is a result of the sequence of all events
executed by pi up to that instant.
❖ For an event e and a process state LSi, e ∈ LSi iff e belongs to the sequence of
events that have taken process pi to state LSi.
❖ For an event e and a process state LSi, e ∉ LSi iff e does not belong to the sequence
of events that have taken process pi to state LSi.
➢ Channel as a Distributed Entity:
❖ A channel is considered a distributed entity.

❖ The state of a channel (Cij) is influenced by the local states of the processes (pi
and pj).
❖ The set of messages in transit in a channel Cij can be defined based on the local
states of processes pi and pj:

    transit(LSi, LSj) = { mij | send(mij) ∈ LSi ∧ rec(mij) ∉ LSj }
➢ Snapshot Recording Algorithm:


❖ The state recording algorithm captures the state of processes pi and pj as LSi and
LSj, respectively.
❖ The state of the channel Cij must be recorded as transit (LSi, LSj).
➢ Communication Models and Snapshot Algorithms:
❖ In the FIFO model, channels operate as first-in, first-out message queues,
preserving message ordering.
❖ In the non-FIFO model, a channel functions as a set where the sender adds
messages, and the receiver removes messages in a random order.
❖ A system supporting causal delivery satisfies the property: "for any two
messages mij and mkj, if send(mij) → send(mkj), then rec(mij) → rec(mkj).”
❖ The causal ordering model is beneficial in developing distributed algorithms
and may simplify algorithm design.
2.5.2 A consistent global state

➢ Definition of Global State:


❖ The global state of a distributed system is a compilation of the local states of
processes and channels.
❖ Notationally, the global state (GS) is defined as

    GS = { ∪i LSi, ∪i,j SCij }
➢ Consistent Global State Conditions:


❖ A global state GS is considered consistent if it satisfies the following two conditions:

    C1: send(mij) ∈ LSi ⇒ mij ∈ SCij ⊕ rec(mij) ∈ LSj   (⊕ is exclusive-or)
    C2: send(mij) ∉ LSi ⇒ mij ∉ SCij ∧ rec(mij) ∉ LSj
➢ Interpretation of Conditions:
❖ Condition C1 ensures that every sent message (mij) recorded in the local state
of a process pi must be present either in the state of the channel Cij or in the collected
local state of the receiver process pj.
❖ Condition C2 states that, in the collected global state, every effect must have its
cause. If a message mij is not recorded as sent in the local state of process pi, then it
must not be present in the state of the channel Cij or in the collected local state of the
receiver process pj.
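Conditions C1 and C2 can be checked mechanically if each message is described by whether its send was recorded, whether it sits in the recorded channel state, and whether its receive was recorded; the tuple encoding below is an assumption made for illustration:

```python
# Hedged sketch: each message is a triple of booleans
# (sent_recorded, in_channel_state, recv_recorded).

def consistent(messages):
    """Returns True iff conditions C1 and C2 hold for every message."""
    for sent, in_ch, recvd in messages:
        if sent and not (in_ch or recvd):
            return False    # C1 violated: a sent message vanished
        if not sent and (in_ch or recvd):
            return False    # C2 violated: an effect without its cause
    return True

# One message in transit and one delivered: a consistent state.
ok = consistent([(True, True, False), (True, False, True)])
# A message recorded as received but never sent: inconsistent.
bad = consistent([(False, False, True)])
```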
➢ Causality in Consistent Global State:
❖ In a consistent global state, every message recorded as received must also be
recorded as sent.

❖ This captures the causality principle that a message cannot be received if it
was not sent.
➢ Meaningful Global States:
❖ Consistent global states are considered meaningful, while inconsistent global
states are not meaningful.

2.5.3 Interpretation in terms of cuts

➢ Cuts in Space-Time Diagram:


❖ Cuts in a space-time diagram serve as a powerful graphical tool for
representing and reasoning about the global states of a computation.
❖ A cut is a line that connects an arbitrary point on each process line, dividing the
space-time diagram into a PAST and a FUTURE.
❖ Each cut corresponds to a global state, and conversely, every global state can be
graphically represented by a cut in the space-time diagram.
➢ Consistent Global States and Cuts:
❖ A consistent global state aligns with a cut in which every message received in
the PAST of the cut has been sent in the PAST.
❖ Such a cut is termed a consistent cut, and messages crossing the cut from the
PAST to the FUTURE are captured in the corresponding channel state.
➢ Illustration with Space-Time Diagram:

Fig 2.3 An interpretation in terms of a cut

❖ In the example illustrated in Figure 2.3, cut C1 is inconsistent as message
m1 flows from the FUTURE to the PAST.
❖ Cut C2 is consistent, and message m4 must be captured in the state of channel
C21.
➢ Concurrent Local States in Consistent Snapshot:
❖ In a consistent snapshot, all recorded local states of processes are concurrent.
❖ The recorded local state of one process does not causally affect the recorded
local state of any other process.

❖ Causality, extending from the set of events to the set of recorded local states, is
a fundamental concept in maintaining consistency.
2.5.4 Issues in recording a global state

➢ Procedure with a Global Physical Clock:


❖ A consistent global snapshot could be obtained using a global physical clock.
❖ The initiator sets a future time for the snapshot, broadcasts it to all processes,
and each process takes a local snapshot at that global time.
❖ The snapshot of channel Cij includes those messages that process pj receives after
the snapshot time but whose send timestamps are smaller than the snapshot time.
➢ Challenges without a Global Physical Clock:
❖ In the absence of a global clock, two challenges arise in recording a consistent
global snapshot:

• I1: Distinguishing Messages to be Recorded:


✓ Messages sent before recording the snapshot must be included, and
those sent after must be excluded (Conditions C1 and C2).
• I2: Determining the Instant for Snapshot:
✓ A process pj must record its snapshot before processing a message mij that
process pi sent after recording its own snapshot (Condition C2).
2.6 Snapshot algorithms for FIFO channels

➢ Global Snapshot = Global State = collection of individual local states of each process
in the distributed system + individual state of each communication channel in the
distributed system
➢ Each distributed application has a number of processes running on a number of physical
servers.
➢ These processes communicate with each other via channels.
➢ A snapshot captures the local states of each process along with the state of each
communication channel.
➢ Need for taking snapshots or recording global state of a system:
1. Checkpointing: the snapshot will be used as a checkpoint, to restart the application in
case of failure.
2. Collecting garbage: used to remove objects that don’t have any references.
3. Detecting deadlocks: used to examine the current application state.
4. Termination detection
5. Other Debugging
➢ Various snapshot algorithms are discussed for distributed systems, each assuming
different inter process communication capabilities.
➢ Two types of messages are identified: computation messages (from the application)
and control messages (exchanged by the snapshot algorithm).

➢ Execution of a snapshot algorithm is transparent to the underlying application,
except for occasional delays in application actions.

Example of global snapshots

➢ Two processes: P1 and P2

➢ Channel C12 from P1 to P2


➢ Channel C21 from P2 to P1

➢ Process states for P1 and P2

➢ Channel states (i.e., messages) for C12 and C21


➢ This is initial global state
➢ Also, a global snapshot

➢ P1 tells P2 to change its state variable, X2, from 1 to 4


➢ This is another global snapshot

➢ P2 receives the message from P1
➢ Another global snapshot

➢ P2 changes its state variable, X2, from 1 to 4


➢ And another global snapshot

2.6.1 Chandy–Lamport algorithm

➢ The Chandy-Lamport algorithm uses a control message, called a marker.


➢ After a site has recorded its snapshot, it sends a marker along all of its outgoing channels
before sending out any more messages.
➢ Markers act as delimiters in FIFO channels, separating messages to be included in
the snapshot from those not to be recorded.
➢ This addresses issue I1.
➢ A process must record its snapshot no later than when it receives a marker on any of its
incoming channels.
➢ This addresses issue I2.
➢ The algorithm is initiated by a process executing the marker sending rule.
➢ A process records its local state and sends a marker on each outgoing channel.
➢ Upon receiving a marker, a process executes the marker receiving rule.
➢ The algorithm terminates when each process has received a marker on all incoming
channels.

The Chandy–Lamport algorithm

Marker sending rule for process pi


(1) Process pi records its state.
(2) For each outgoing channel C on which a marker has not been sent, pi sends
a marker along C before pi sends further messages along C.

Marker receiving rule for process pj


On receiving a marker along channel C:
    if pj has not recorded its state then
        record the state of C as the empty set
        execute the "marker sending rule"
    else
        record the state of C as the set of messages received along C after pj's
        state was recorded and before pj received the marker along C.
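The two marker rules can be sketched for a single process as follows; the `send` callback and the data structures are assumptions made for illustration, not a full distributed implementation:

```python
# A minimal sketch of the Chandy-Lamport marker rules for one process.

class Process:
    def __init__(self, name, channels_in, channels_out):
        self.name = name
        self.state = None                               # recorded local state (None = not yet)
        self.chan_state = {c: [] for c in channels_in}  # recorded channel states
        self.recording = set()                          # inbound channels still being recorded
        self.channels_out = channels_out

    def record_state(self, local_state, send):
        # Marker sending rule: record own state, then send a marker on every
        # outgoing channel before any further messages.
        self.state = local_state
        self.recording = set(self.chan_state)
        for c in self.channels_out:
            send(c, "MARKER")

    def on_marker(self, channel, local_state, send):
        # Marker receiving rule.
        if self.state is None:
            self.record_state(local_state, send)
            self.chan_state[channel] = []    # record the state of C as empty
        self.recording.discard(channel)      # recording along this channel is done

    def on_basic_message(self, channel, msg):
        # Basic messages arriving on a still-recorded channel after the local
        # snapshot belong to that channel's recorded state.
        if self.state is not None and channel in self.recording:
            self.chan_state[channel].append(msg)

# Example: p2 (inbound C12, outbound C21) receives its first marker.
sent = []
p2 = Process("p2", channels_in=["C12"], channels_out=["C21"])
p2.on_marker("C12", local_state={"x": 1}, send=lambda c, m: sent.append((c, m)))
```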

➢ Correctness:
❖ Conditions C1 and C2 must be satisfied for correctness.
❖ C1: Messages following markers on incoming channels are not recorded in a
process's snapshot.
❖ C2: No messages sent after a marker on a channel are recorded in the channel
state.
❖ The algorithm ensures these conditions are met.
➢ Complexity:
❖ Recording a single instance of the algorithm requires O(e) messages and O(d)
time.
❖ e is the number of edges in the network, and d is the diameter of the network.

Phases in Chandy–Lamport algorithm

1. Initiating a snapshot
➢ Process Pi initiates the snapshot
➢ Pi records its own state and prepares a special marker message.
➢ Send the marker message to all other processes (using N-1 outbound channels).
➢ Start recording all incoming messages from channels Cji for j not equal to i.
2. Propagating a snapshot
➢ For all processes Pj, consider a message on channel Ckj.
➢ If the marker message is seen for the first time:
❖ Pj records its own state and marks Ckj as empty.
❖ Send the marker message to all other processes (using N-1 outbound channels).
❖ Record all incoming messages from channels Clj for l not equal to j or k.
➢ Else, add all messages from inbound channels to the recorded channel state.
3. Terminating a snapshot
➢ All processes have received a marker.
➢ All processes have received a marker on all of their N-1 incoming channels.
➢ A central server can gather the partial state to build a global snapshot.
Example:

➢ P1 initiates a snapshot

➢ First, P1 records its state

➢ Then, P1 sends a marker message to P2 and begins recording all messages on inbound
channels.
➢ Meanwhile, P2 sent a message to P1.

➢ P2 receives a marker message for the first time, so records its state.
➢ P2 then sends a marker message to P1.

➢ P1 has already recorded its state, so when it receives the marker it records the messages
it received on that inbound channel as the channel's state.

➢ Both processes have recorded their state and all the state of all incoming channels.

2.6.2 Properties of the recorded global state

➢ The recorded global state may not correspond to any actual global state during the
computation.
➢ Asynchronous changes in process states may result in recorded states that did not occur
sequentially.
➢ If a stable property (e.g., termination or deadlock) holds before the snapshot algorithm,
it holds in the recorded global snapshot.
➢ The recorded global state is valuable for detecting stable properties in the system.
➢ The recorded process states can be viewed as all occurring simultaneously at one physical
instant, as if the process lines were stretched like an elastic band.
➢ All recorded process states are mutually concurrent, meaning one state does not
causally depend on another.
➢ Logically, all these process states are viewed as occurring simultaneously, despite
differences in physical time.

Part4: Termination detection


2.7 Termination detection
➢ Detecting the termination of a distributed computation is challenging due to the absence
of complete knowledge of the global state and the non-existence of global time.

➢ Global termination is defined as every process being locally terminated with no
messages in transit between processes.
➢ A "locally terminated" state occurs when a process finishes its computation and won't
restart any action unless it receives a message.
➢ Termination detection involves inferring when the underlying computation has
terminated.
➢ Two distributed computations occur simultaneously:
❖ The underlying computation and
❖ The termination detection algorithm.
➢ Messages used in the underlying computation are basic messages, while those used for
termination detection are control messages.
➢ A Termination Detection (TD) algorithm must ensure:
❖ Execution of the TD algorithm cannot indefinitely delay the underlying
computation.
❖ The TD algorithm must not require the addition of new communication
channels between processes.

2.7.1 System model of a distributed computation


➢ Processes communicate solely through message passing in a distributed computation.
➢ Messages are received correctly after a finite delay, and communication is
asynchronous.
➢ No waiting for the receiver to be ready before sending a message, and FIFO ordering
is not guaranteed for messages on the same channel.
➢ Characteristics of Distributed Computation:

❖ Two states for a process: active (busy, doing local computation) and idle
(passive, temporarily finished local computation).
❖ An active process can become idle at any time, indicating completion of local
computation and processing of received messages.
❖ An idle process can become active only upon receiving a message from another
process.
❖ Only active processes can send messages.
❖ Messages can be received in both active and idle states, and an idle process
becomes active upon message receipt.
❖ Sending and receiving messages are atomic actions.

➢ Definition of termination detection:

Let pi(t) denote the state (active or idle) of process pi at instant t and ci,j(t) denote the
number of messages in transit in the channel at instant t from process pi to process pj.
A distributed computation is said to be terminated at time instant t0 iff:

(∀ i :: pi(t0) = idle) ∧ (∀ i, j :: ci,j(t0) = 0)

i.e., every process is idle and no basic message is in transit on any channel.

2.8 Termination detection using distributed snapshots
➢ The algorithm relies on the concept that a consistent snapshot of a distributed system
can capture stable properties.
➢ Termination of a distributed computation is considered a stable property.
➢ If a consistent snapshot is taken after the distributed computation has terminated, it will
capture the termination of the computation.
➢ The algorithm assumes the existence of logical bidirectional communication channels
between every pair of processes.
➢ Communication channels are reliable but non-FIFO.
➢ Message delay is arbitrary but finite in the algorithm.

2.8.1 Informal description


➢ The algorithm is based on the concept that when a computation terminates, there is a
unique process that became idle last.
➢ When a process transitions from active to idle, it issues a request to all other processes
and itself to take a local snapshot.
➢ Upon receiving a request, a process grants it if it agrees that the requester became idle
before itself, taking a local snapshot for the request.
➢ A request is considered successful if all processes have taken local snapshots for it.
➢ The requester or an external agent may collect all local snapshots of a successful
request.
➢ If a request is successful, a global snapshot can be obtained, indicating the termination
of the computation. In the recorded snapshot, all processes are idle, and there are no
messages in transit to any of the processes.
2.8.2 Formal description
➢ The algorithm utilizes logical time to order requests, with each process i maintaining a
logical clock denoted by x, initialized to zero at the beginning of the computation.
➢ Processes increment their logical clock x by one each time they become idle.
➢ A basic message sent by a process at its logical time x is in the form of B(x).
➢ A control message requesting processes to take a local snapshot, issued by process i at
its logical time x, is in the form of R(x, i).
➢ Processes synchronize their logical clocks x loosely with those on other processes,
ensuring it is the maximum of clock values ever received or sent in messages.
➢ In addition to the logical clock x, a process maintains a variable k such that, when the
process is idle, (x, k) is the maximum (under the ordering below) of the values (x', k')
over all request messages R(x', k') it has ever received or sent.

➢ Logical time is compared based on the tuple (x, k) as follows: (x, k) > (x', k') if (x >
x') or ((x = x') and (k > k')), breaking ties using the process identification numbers k
and k'.
➢ The algorithm is defined by the following four rules.
➢ Each process i applies one of the rules whenever it is applicable.
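The tuple ordering above translates directly into code. The function name below is illustrative, not from the algorithm text:

```python
def later(a, b):
    """True iff timestamp a = (x, k) is logically later than b = (x2, k2):
    compare the logical clocks first, then break ties on process identifiers."""
    (x, k), (x2, k2) = a, b
    return x > x2 or (x == x2 and k > k2)
```

A request R(x, i) supersedes R(x', k') exactly when `later((x, i), (x', k'))` holds, so every pair of requests is totally ordered.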

2.9 Termination detection by weight throwing


➢ Termination detection by weight throwing involves a controlling agent monitoring the
computation, with communication channels between each process and the controlling
agent, as well as between every pair of processes.
➢ The basic idea includes initializing all processes in the idle state, assigning a weight of
zero to each process, and a weight of 1 to the controlling agent.
➢ The computation begins when the controlling agent sends a basic message to one of the
processes, making it active.
➢ A non-zero weight (0 < W ≤ 1) is assigned to each process in the active state and each
message in transit.
➢ Processes send a part of their weight in a message, and upon receiving a message, they
add the received weight to their own.
➢ The sum of weights on all processes and messages in transit is always 1.

➢ When a process becomes passive, it sends its weight to the controlling agent in a control
message, which adds it to its own weight.
➢ Termination is concluded by the controlling agent when its weight becomes 1.
➢ Notation:
❖ W: represents the weight on the controlling agent and a process.
❖ B(DW): A basic message B is sent as part of the computation, with DW
representing the assigned weight.
❖ C(DW): A control message C is sent from a process to the controlling agent,
with DW as the assigned weight.
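The weight-exchange rules above can be simulated in a few lines. The sketch below is illustrative (class and method names are assumptions, and control-message delay is collapsed to zero for brevity); it uses exact fractions so that the sum of all weights remains exactly 1.

```python
from fractions import Fraction

class WeightThrowing:
    """Hedged simulation sketch of termination detection by weight throwing."""

    def __init__(self, n):
        self.agent_weight = Fraction(1)        # controlling agent starts with weight 1
        self.proc_weight = [Fraction(0)] * n   # every process starts idle with weight 0
        self.active = [False] * n
        self.in_transit = []                   # basic messages B(DW), stored as (dest, DW)

    def start(self, i):
        # The controlling agent sends a basic message carrying part of its weight.
        dw = self.agent_weight / 2
        self.agent_weight -= dw
        self.in_transit.append((i, dw))

    def send_basic(self, sender, dest):
        # An active process sends part of its weight in a basic message.
        dw = self.proc_weight[sender] / 2
        self.proc_weight[sender] -= dw
        self.in_transit.append((dest, dw))

    def deliver(self):
        # Receiving a basic message adds its weight and activates the receiver.
        dest, dw = self.in_transit.pop(0)
        self.proc_weight[dest] += dw
        self.active[dest] = True

    def become_passive(self, i):
        # A passive process returns its weight to the agent in a control message C(DW).
        self.agent_weight += self.proc_weight[i]
        self.proc_weight[i] = Fraction(0)
        self.active[i] = False

    def terminated(self):
        # Termination is detected when the agent's weight returns to 1.
        return self.agent_weight == 1
```

The key property visible in the sketch: weight is only ever split and merged, never created or destroyed, so `terminated()` can only become true once every process is passive and no basic message is in transit.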

2.9.1 Formal description


The algorithm is defined by the following four rules:

2.9.2 Correctness of the algorithm


➢ To prove the correctness of the algorithm, the following sets are defined:
A: set of weights on all active processes;
B: set of weights on all basic messages in transit;
C: set of weights on all control messages in transit;
Wc: weight on the controlling agent.
Two invariants I1 and I2 are defined for the algorithm:

➢ Invariant I1 states that “the sum of weights at the controlling process, at all active
processes, on all basic messages in transit, and on all control messages in transit is
always equal to 1”.
➢ Invariant I2 states that “weight at each active process, on each basic message in transit,
and on each control message in transit is non-zero.”
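Using the sets defined above, the two invariants can be written symbolically (a reconstruction from the prose statements, since the original formulas are not reproduced in these notes):

```latex
% I1: the total weight in the system is conserved
W_c \;+\; \sum_{W \in A} W \;+\; \sum_{W \in B} W \;+\; \sum_{W \in C} W \;=\; 1

% I2: every outstanding weight is strictly positive
\forall\, W \in A \cup B \cup C :\; W > 0
```

Together, I1 and I2 imply that Wc = 1 can hold only when A, B, and C are all empty, i.e., when the computation has terminated.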

➢ Since the message delay is finite, after the computation has terminated, eventually
Wc = 1.
➢ Thus, the algorithm detects a termination in finite time.
2.10 A spanning-tree-based termination detection algorithm
➢ The algorithm assumes there are N processes Pi, 0 ≤ i ≤ N − 1, which are modelled as the
nodes i, 0 ≤ i ≤ N − 1, of a fixed connected undirected graph.
➢ Edges: communication channels
➢ Uses a fixed spanning tree with process P0 at its root.
➢ Process P0 communicates with other processes.
➢ All leaf nodes report to their parents, if they have terminated.
➢ A parent node will similarly report to its parent.
➢ The root concludes that termination has occurred if it has terminated and all of its
immediate children have also terminated.
➢ Two waves of signals are generated:
1. Tokens: a contracting wave of signals that moves inward from the leaves to the root.
2. Repeat signals: an expanding wave of signals that moves outward from the root through the spanning tree.
A Simple Algorithm
➢ Initially, each leaf process is given a token.
➢ Each leaf process, after it has terminated sends its token to its parent.
➢ When a parent process terminates and after it has received a token from each of its
children, it sends a token to its parent.
➢ This way, each process indicates to its parent process that the subtree below it has
become idle.
➢ In a similar manner, the tokens get propagated to the root.
➢ The root of the tree concludes that termination has occurred, after it has become idle
and has received a token from each of its children.
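The token propagation just described amounts to a recursive check over the spanning tree. A minimal sketch, with illustrative function and argument names:

```python
def subtree_terminated(node, children, terminated):
    """A node forwards its token (its subtree is done) once it has terminated
    and holds a token from each of its children; the root detecting global
    termination is the same check applied at node 0."""
    return terminated[node] and all(
        subtree_terminated(c, children, terminated) for c in children.get(node, [])
    )
```

Calling `subtree_terminated(0, children, terminated)` mirrors the root’s decision: it is true only when every node in the tree has terminated.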
A Problem with the algorithm
➢ This simple algorithm fails under some circumstances.

➢ After a process has sent its token to its parent, it may receive a basic message from some
other process, which makes it active again; the tokens already collected then wrongly
suggest that its subtree is idle.

Fig 2.4 An Example of the problem

The Correct Algorithm


➢ It was developed by Topor.
➢ Main idea is to color the processes and tokens and change the color when messages
are involved.
➢ The algorithm works as follows:
❖ Initially, each leaf process is provided with a token.
❖ A set S is used for book-keeping: it records which processes currently hold a token.
❖ Initially, S is therefore the set of all leaves in the tree.
➢ Initially, all processes and tokens are colored “white”.
➢ When a leaf node terminates, it sends the token it holds to its parent process.
➢ A parent process will collect the token sent by each of its children.
➢ After it has received a token from all of its children and after it has terminated, the
parent process sends a token to its parent.
➢ A process turns black when it sends a message to some other process.
➢ When a process terminates, if its color is black, it sends a black token to its parent.
➢ A black process turns back to white, after it has sent a black token to its parent.
➢ A parent process holding a black token, sends only a black token to its parent.
➢ The root, upon receiving a black token, will know that a process in the tree had sent a
message to some other process.
➢ Hence, it restarts the algorithm by sending a Repeat signal to all its children.
➢ Each child of the root propagates the Repeat signal to each of its children and so on,
until the signal reaches the leaves.
➢ The leaf nodes restart the algorithm on receiving the Repeat signal.
➢ The root concludes that termination has occurred, if:
❖ it is white,
❖ it is idle, and
❖ it received a white token from each of its children.

An example
Step: 1
➢ Initially, all nodes 0 to 6 are white.
➢ Leaf nodes 3, 4, 5, and 6 are each given a token.
➢ Node 3 has token T3, node 4 has token T4, node 5 has token T5, and node 6 has token
T6.
➢ Hence, S = {3, 4, 5, 6}.

Step: 2
➢ When node 3 terminates, it transmits T3 to node 1.
➢ Now S changes to {1, 4, 5, 6}.
➢ When node 4 terminates, it transmits T4 to node 1.
➢ Hence, S changes to {1, 5, 6}.

Step: 3
➢ Node 1 has received a token from each of its children and, when it terminates, it
transmits a token T1 to its parent.
➢ S changes to {0, 5, 6}.

Step:4
➢ Suppose node 5 sends a message to node 1, causing node 1 to again become active.
➢ Since node 1 had already sent a token to its parent node 0, the new message makes the
system inconsistent as far as termination detection is concerned.
➢ Node 5 is colored black, since it sent a message to node 1.

Step:5
➢ When node 5 terminates, it sends a black token T5 to node 2.
➢ So, S changes to {0, 2, 6}.
➢ After node 5 sends its token, it turns white.
➢ When node 6 terminates, it sends the white token T6 to node 2.
➢ Hence, S changes to {0, 2}.

Step:6
➢ When node 2 terminates, it sends a black token T2 to node 0, since it holds a black
token T5 from node 5.
➢ Since node 0 has received a black token T2 from node 2, it knows that there was a
message sent by one or more of its children in the tree and hence sends a repeat signal
to each of its children.
➢ The repeat signal is propagated to the leaf nodes and the algorithm is repeated.

➢ Node 0 concludes that termination has occurred if it is white, it is idle, and it has
received a white token from each of its children.
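One token wave of the colored-token scheme in this example can be sketched recursively. The function and argument names below are illustrative, and the repeat/restart machinery is omitted:

```python
def token_color(node, children, turned_black):
    """Color of the token a node passes upward in one wave of Topor's scheme:
    black if the node itself turned black (it sent a basic message) or if any
    token collected from a child is black; white otherwise."""
    child_colors = [token_color(c, children, turned_black)
                    for c in children.get(node, [])]
    if turned_black.get(node) or "black" in child_colors:
        return "black"
    return "white"
```

In the example, node 5 turned black, so node 2 forwards a black token and the root must issue a Repeat signal; once no node turns black during a wave, the root sees only white tokens.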

Performance
➢ Best case message complexity: O(N)
✓ N is the number of processes
➢ Worst case complexity: O(N*M)
✓ M is the number of computation messages exchanged

*****************************
