
mod-2

This document covers key concepts in distributed computing, including logical time frameworks, leader election algorithms, global state recording, and termination detection methods. It details scalar and vector time properties, the bully and ring-based election algorithms, and provides insights into implementing logical clocks and their consistency. Additionally, it discusses challenges in recording global states and the significance of causality in distributed systems.

Uploaded by

arathipc2
Copyright
© © All Rights Reserved


MOD-2

Distributed Computing (APJ Abdul Kalam Technological University)


MOD-2
19 May 2024 05:33 PM

Module – 2 (Election algorithm, Global state and Termination detection)


Logical time – A framework for a system of logical clocks, Scalar time, Vector time.
Leader election algorithm – Bully algorithm, Ring algorithm.
Global state and snapshot recording algorithms – System model and definitions, Snapshot algorithm for FIFO channels –
Chandy Lamport algorithm.
Termination detection – System model of a distributed computation, Termination detection using distributed snapshots,
Termination detection by weight throwing, Spanning-tree-based algorithm.

1. What are the basic properties of scalar time. -3


2. Explain about Termination Detection.-3
3. Illustrate the Working of Spanning Tree Algorithm-10
4. Define properties of Vector time.-4
5. Explain Ring based Election Algorithm in Detail.-8
6. Explain how logical clock is implemented.-6
7. Define logical clock and explain the implementation of the logical clock.
8. Apply ring-based leader election algorithm with 10 processes in the worst-performing case. Count the number of messages needed.
9. Apply the spanning tree-based termination detection algorithm in the following scenario. The nodes are processes 0 to 6. Leaf nodes 3, 4, 5, and 6 are each given tokens T3,
T4, T5 and T6 respectively. Leaf nodes 3, 4, 5 and 6 terminate in that order, but before terminating, node 5 sends a message to node 1.
10. Specify the issues in recording a global state. -3


11. Explain the rules used to update clocks in scalar time representation. -3
12. Illustrate bully algorithm for electing a new leader. Does the algorithm meet liveness and safety conditions? -7
13. Clearly mentioning assumptions, explain the rules of termination detection using distributed snapshots. -7
14. In Chandy-Lamport algorithm for recording global snapshots, explain how the recorded local snapshots can be put together to create the global snapshot. Can multiple
processes initiate the algorithm concurrently?
15. Illustrate the working of spanning tree based termination detection algorithm.

Logical Clock
• In distributed systems, it is not possible to have global physical time
• It is possible to realize only an approximation of it .
• As asynchronous distributed computations make progress in spurts, it turns out that logical time, which advances in jumps, is sufficient to capture the
fundamental monotonicity property (order) associated with causality in distributed systems.
• Causality (or the causal precedence relation) among events in a distributed system is a powerful concept in reasoning, analysing, and drawing inferences about
a computation .
• The knowledge of the causal precedence relation among the events of processes helps solve a variety of problems in distributed systems .
• However, in distributed computing systems, the rate of occurrence of events is several magnitudes higher and the event execution time is several magnitudes
smaller than in day-to-day life.
• Consequently, if the physical clocks are not precisely synchronized, the causality relation between events may not be accurately captured
• Network Time Protocols which can maintain time accurate to a few tens of milliseconds on the Internet, are not adequate to capture the causality relation in
distributed systems.
• However, in a distributed computation, generally the progress is made in spurts and the interaction between processes occurs in spurts
• In a system of logical clocks, every process has a logical clock that is advanced using a set of rules.
• Every event is assigned a timestamp and the causality relation between events can be generally inferred from their timestamps.
• The timestamps assigned to events obey the fundamental monotonicity property; that is, if an event a causally affects an event b, then the timestamp of a is
smaller than the timestamp of b.

Framework for a system of Logical clocks


• Definition: A system of logical clocks consists of a time domain T and a logical clock C.
• Elements of T form a partially ordered set over a relation <. This relation is usually called the happened before or causal precedence relation.
• Intuitively, this relation is analogous to the earlier than relation provided by the physical time.
• The logical clock C is a function that maps an event e in a distributed system to an element in the time domain T, denoted as C(e) and called the timestamp of
e, and is defined as follows:
○ C : H → T, where H is the set of events,
○ such that the following property is satisfied:
▪ For two events ei and ej, ei → ej ⇒ C(ei) < C(ej).
○ This monotonicity property is called the clock consistency condition. When T and C satisfy the following stronger condition,
▪ For two events ei and ej, ei → ej ⇔ C(ei) < C(ej),
▪ the system of clocks is said to be strongly consistent.

Implementing Logical Clocks


Implementation of logical clocks requires addressing two issues:



1. data structures local to every process to represent logical time
2. protocol (set of rules) to update the data structures to ensure the consistency condition.
• Each process pi maintains data structures that allow it the following two capabilities:
○ A local logical clock, denoted by lci, that helps process pi measure its own progress.
○ A logical global clock, denoted by gci, that is a representation of process pi’s local view of the logical global time. It allows this process to assign
consistent timestamps to its local events. Typically, lci is a part of gci
• The protocol ensures that a process’s logical clock, and thus its view of the global time, is managed consistently. The protocol consists of the following two
rules:
○ R1: This rule governs how the local logical clock is updated by a process when it executes an event (send, receive, or internal).
○ R2: This rule governs how a process updates its global logical clock to update its view of the global time and global progress. It dictates what information
about the logical time is piggybacked in a message and how this information is used by the receiving process to update its view of the global time.

Scalar Time - Explain the rules used to update clocks in scalar time representation.
• Time domain is the set of non-negative integers.
• The logical local clock of a process pi and its local view of the global time are squashed into one integer variable Ci .
• Rules R1 and R2 to update the clocks are as follows:
○ R1: Before executing an event (send, receive, or internal), process pi executes the following: Ci := Ci + d (d > 0)
▪ In general, every time R1 is executed, d can have a different value; however, typically d is kept at 1.
○ R2: Each message piggybacks the clock value of its sender at sending time.
• When a process pi receives a message with timestamp Cmsg , it executes the following actions:
○ Ci := max(Ci, Cmsg )
○ Execute R1.
○ Deliver the message.
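
The rules R1 and R2 above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the class name `ScalarClock` and its method names are assumptions introduced here, and d defaults to 1 as the notes say is typical.

```python
class ScalarClock:
    """Scalar logical clock of one process (hypothetical helper class)."""

    def __init__(self):
        self.c = 0  # Ci: local clock and local view of global time in one integer

    def tick(self, d=1):
        # R1: executed before every event (send, receive, or internal).
        self.c += d
        return self.c

    def send(self):
        # R2: the message piggybacks the sender's clock value at sending time.
        return self.tick()

    def receive(self, c_msg):
        # On receipt: Ci := max(Ci, Cmsg), then execute R1, then deliver.
        self.c = max(self.c, c_msg)
        return self.tick()


p1, p2 = ScalarClock(), ScalarClock()
p1.tick()                    # internal event at p1, timestamp 1
p1.tick()                    # internal event at p1, timestamp 2
c_msg = p1.send()            # send event at p1, timestamp 3
recv_ts = p2.receive(c_msg)  # receive event at p2, timestamp max(0, 3) + 1 = 4
```

The receive event gets a timestamp larger than the send event, which is exactly the consistency condition for this pair of causally related events.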

Basic Properties of Scalar Time


1. Consistency Property
○ Scalar clocks satisfy the monotonicity and hence the consistency property: for two events ei and ej , ei → ej ⇒ C(ei) < C(ej).
2. Total Ordering
○ Scalar clocks can be used to totally order events in a distributed system.
○ The main problem in totally ordering events is that two or more events at different processes may have identical timestamp.
▪ For two events e1 and e2, C(e1) = C(e2) ⇒ e1 || e2.
○ A tie-breaking mechanism is needed to order such events.
○ A tie is broken as follows:
▪ Process identifiers are linearly ordered and tie among events with identical scalar timestamp is broken on the basis of their process identifiers.
▪ The lower the process identifier in the ranking, the higher the priority.
▪ The timestamp of an event is denoted by a tuple (t, i) where t is its time of occurrence and i is the identity of the process where it occurred.
▪ The total order relation ≺ on two events x and y with timestamps (h,i) and (k,j), respectively, is defined as follows:
x ≺ y ⇔(h < k or (h = k and i < j))
3. Event counting
○ If the increment value d is always 1, the scalar time has the following interesting property:
▪ if event e has a timestamp h, then h-1 represents the minimum logical duration, counted in units of events, required before producing the event e;
○ We call it the height of the event e.
○ In other words, h-1 events have been produced sequentially before the event e regardless of the processes that produced these events.
4. No Strong Consistency
○ The system of scalar clocks is not strongly consistent; that is,
▪ for two events ei and ej, C(ei) < C(ej) does not imply ei → ej.
○ The reason that scalar clocks are not strongly consistent is that the logical local clock and logical global clock of a process are squashed into one,
resulting in the loss of causal dependency information among events at different processes.
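
Two of the properties above can be checked with a small sketch: the total order ≺ on (t, i) timestamps, and a counterexample showing why scalar clocks are not strongly consistent. The function name `precedes` and the sample timestamps are assumptions made for illustration only.

```python
def precedes(x, y):
    """Total order on scalar timestamps: x = (h, i), y = (k, j)."""
    (h, i), (k, j) = x, y
    return h < k or (h == k and i < j)


# Hypothetical timestamps (t, i): event time t at process i.
events = [(2, 3), (1, 2), (2, 1), (1, 1)]
# Python compares tuples lexicographically, which coincides with the relation above.
total_order = sorted(events)

# Counterexample to strong consistency (assumed scenario, d = 1, no messages):
# p1 executes two internal events (clock 1, then 2); p2 executes one (clock 1).
a = (1, 2)  # the only event at p2
b = (2, 1)  # the second event at p1
# C(a) < C(b) holds, yet a and b are concurrent because no message links p1 and p2.
a_before_b_by_clock = a[0] < b[0]
```

Sorting by the tuple gives a valid total order, but the order between `a` and `b` reflects only the clock values, not causality.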

Vector Time
• In the system of vector clocks, the time domain is represented by a set of n-dimensional non-negative integer vectors.
• Each process pi maintains a vector vti [1..n], where vti[i] is the local logical clock of pi and describes the logical time progress at process pi .
• vti[j] represents process pi ’s latest knowledge of process pj local time.
• If vti[j]=x, then process pi knows that local time at process pj has progressed till x.
• The entire vector vti constitutes pi ’s view of the global logical time and is used to timestamp events.
• Process pi uses the following two rules R1 and R2 to update its clock:
○ R1: Before executing an event, process pi updates its local logical time as follows:
▪ vti [i] := vti [i] + d ; (d > 0)
○ R2: Each message m is piggybacked with the vector clock vt of the sender process at sending time.
• On the receipt of such a message (m,vt), process pi executes the following sequence of actions:
1. Update its global logical time as follows: ∀k, 1 ≤ k ≤ n : vti[k] := max(vti[k], vt[k])
2. Execute R1.
3. Deliver the message m.
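
A minimal sketch of the vector clock update rules follows. The class `VectorClock` and its method names are assumptions introduced for illustration, processes are indexed from 0 rather than 1, and d defaults to 1.

```python
class VectorClock:
    """Vector clock of process `pid` in a system of `n` processes (hypothetical helper)."""

    def __init__(self, pid, n):
        self.pid = pid
        self.vt = [0] * n

    def tick(self, d=1):
        # R1: advance the local component before executing an event.
        self.vt[self.pid] += d
        return list(self.vt)

    def send(self):
        # R2: the message carries a copy of the sender's vector at sending time.
        return self.tick()

    def receive(self, vt_msg):
        # Component-wise maximum, then R1, then the message is delivered.
        self.vt = [max(own, other) for own, other in zip(self.vt, vt_msg)]
        return self.tick()


p0, p1 = VectorClock(0, 3), VectorClock(1, 3)
vt_msg = p0.send()             # [1, 0, 0]
p1.tick()                      # [0, 1, 0]
recv_ts = p1.receive(vt_msg)   # max -> [1, 1, 0], then R1 -> [1, 2, 0]
```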

Properties of Vector Time


1. Isomorphism
○ If events in a distributed system are timestamped using a system of vector clocks, we have the following property.
▪ If two events x and y have timestamps vh and vk, respectively,then
□ x → y ⇔ vh < vk
□ x || y ⇔ vh || vk.
○ Thus, there is an isomorphism between the set of partially ordered events produced by a distributed computation and their vector timestamps
2. Strong Consistency
○ The system of vector clocks is strongly consistent; thus, by examining the vector timestamp of two events, we can determine if the events are causally
related.
○ However, Charron-Bost showed that the dimension of vector clocks cannot be less than n, the total number of processes in the distributed computation,
for this property to hold.
3. Event Counting
○ If d = 1 (in rule R1), then the i-th component of the vector clock at process pi, vti[i], denotes the number of events that have occurred at pi until that instant.
○ So, if an event e has timestamp vh, vh[j] denotes the number of events executed by process pj that causally precede e. Clearly, ∑vh[j] − 1 represents the
total number of events that causally precede e in the distributed computation.
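
The isomorphism and event-counting properties can be illustrated with a short sketch that compares vector timestamps. The function names are assumptions; the comparison uses the usual component-wise definition (vh < vk when vh ≤ vk in every component and vh ≠ vk), and the counting function is valid only when d = 1.

```python
def leq(vh, vk):
    """vh <= vk: every component of vh is at most the matching component of vk."""
    return all(a <= b for a, b in zip(vh, vk))


def happened_before(vh, vk):
    """x -> y iff vh < vk."""
    return leq(vh, vk) and vh != vk


def concurrent(vh, vk):
    """x || y iff neither timestamp is <= the other."""
    return not leq(vh, vk) and not leq(vk, vh)


def causal_predecessors(vh):
    """Number of events that causally precede the event stamped vh (assumes d = 1)."""
    return sum(vh) - 1
```

For example, `happened_before([1, 0, 0], [1, 2, 0])` holds, while `[1, 0, 0]` and `[0, 1, 0]` are concurrent.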


Leader Election Algorithm


• In order to perform coordination, distributed systems employ the concept of coordinators.
• An algorithm for choosing a unique process to play a particular role (coordinator) is called an election algorithm.
• An election algorithm assumes that every active process in the system has a unique priority number.
• The process with the highest priority will be chosen as the coordinator.
• When a coordinator fails, the algorithm elects the active process that has the highest priority number.
• This number is then sent to every active process in the distributed system.

Bully Algorithm
• There are 3 types of messages in bully algorithm:
○ Election message – announces an election
○ Ok message – response to an election message
○ Coordinator message – announce the identity of the elected process
Steps:-
1. A process can begin an election by sending an election message to every process with a higher priority number and waiting for ok messages in response.
2. If none arrives within time T, the process considers itself the coordinator and sends a coordinator message to all processes with lower identifiers
announcing this.
3. Otherwise, the higher-priority processes that replied start their own elections, and the initiating process waits for a coordinator message.
4. An election is triggered when a process P notices that the coordinator does not respond within a time interval T; P then assumes that the coordinator has failed.
5. Process P then sends an election message to every process with a higher priority number.
6. It waits for responses; if no one responds within time interval T, process P elects itself as the coordinator.
7. It then sends a message to all processes with lower priority numbers announcing that it is their new coordinator.
8. If a process that was previously down comes back up, it holds an election and takes over the coordinator job if it has the highest priority number among the active processes.
9. The biggest guy always wins, hence the name bully algorithm.


• EG: The process identifiers are 0, 4, 2, 1, 5, 6, 3, 7. P7 was the initial coordinator and has crashed. Illustrate the bully algorithm if P4 initiates the election, and
calculate the total number of election messages and coordinator messages.
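
One way to work through this example is a small message-counting simulation. The sketch below is an illustration under stated assumptions, not the only way to count: every live process that receives an election message replies with ok and starts its own election once, timeouts are not modelled, and the coordinator message goes to every process with a lower identifier. The function name `bully_election` is hypothetical.

```python
def bully_election(pids, crashed, initiator):
    """Return (coordinator, election_msgs, ok_msgs, coordinator_msgs)."""
    alive = set(pids) - set(crashed)
    election_msgs = ok_msgs = 0
    pending = [initiator]   # processes that still have to run their election
    started = set()
    coordinator = None
    while pending:
        p = pending.pop(0)
        if p in started:
            continue        # each process runs its election only once
        started.add(p)
        higher = [q for q in pids if q > p]
        election_msgs += len(higher)            # election message to every higher id
        responders = [q for q in higher if q in alive]
        ok_msgs += len(responders)              # each live higher process answers ok
        if responders:
            pending.extend(responders)          # they take over the election
        else:
            coordinator = p                     # nobody answered within T
    coordinator_msgs = sum(1 for q in pids if q < coordinator)
    return coordinator, election_msgs, ok_msgs, coordinator_msgs


result = bully_election([0, 4, 2, 1, 5, 6, 3, 7], crashed=[7], initiator=4)
```

Under these assumptions P4 sends 3 election messages, P5 sends 2 and P6 sends 1, so `result` is `(6, 6, 3, 6)`: P6 becomes coordinator after 6 election messages and 3 ok messages, and announces itself with 6 coordinator messages.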

Ring Based Algorithm


• This algorithm applies to systems organized as a logical ring.
• In this algorithm we assume that the links between the processes are unidirectional.
• Every process can send messages only to its neighbour in the clockwise direction.
• Initially, every process is marked as a non-participant in an election.
• Any process can begin an election.
• It proceeds by marking itself as a participant, placing its identifier in an election message, and sending it to its clockwise neighbour.
• When a process receives an election message it compares the identifier in the message with its own.
• If the arrived identifier is greater, then it forwards the message to its neighbour.

• If the arrived identifier is smaller, then it substitutes its own identifier in the message and forwards it.
• If the received identifier is that of the receiver itself, then this process’s identifier must be the greatest, and it becomes the coordinator.
• The coordinator marks itself as a coordinator and sends an elected message to its neighbour.

EG:

• The election was started by process 17.
• Each process forwards the election message to its neighbour, carrying the greatest identifier seen so far.
• The election message currently contains 24 and is being forwarded.
• Process 28 will replace 24 with its own identifier when the message reaches it.
• The election message then contains 28 and is forwarded until the received identifier is that of the receiver itself.
• That process becomes the coordinator and sends a coordinator (elected) message to its neighbour.

EG: In a ring topology, 7 processes are connected with different IDs as shown: P20->P5->P10->P18->P3->P16->P9. If process P10 initiates an election, after how many
message passes will the coordinator be elected and known to all the processes? What modification will take place to the election message as it passes through all the
processes? Calculate the total number of election messages and coordinator messages.
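
The example can be traced with a short simulation of the ring rules described above. This is a sketch under assumptions: a single initiator, no failures, and an elected message that travels once around the ring and returns to the coordinator (n messages). The function name `ring_election` is hypothetical.

```python
def ring_election(ring, initiator):
    """ring lists identifiers in clockwise order.

    Returns (coordinator, election_msgs, elected_msgs).
    """
    n = len(ring)
    pos = ring.index(initiator)
    carried = initiator          # identifier currently in the election message
    election_msgs = 0
    while True:
        pos = (pos + 1) % n      # pass the message to the clockwise neighbour
        election_msgs += 1
        receiver = ring[pos]
        if carried == receiver:
            break                # own identifier came back: receiver is the coordinator
        if carried < receiver:
            carried = receiver   # receiver substitutes its own, larger identifier
    elected_msgs = n             # assumption: elected message circulates the ring once
    return receiver, election_msgs, elected_msgs


example = ring_election([20, 5, 10, 18, 3, 16, 9], initiator=10)
worst_case = ring_election([10] + list(range(1, 10)), initiator=1)
```

For the ring above, the message carries 10, then 18 (substituted by P18), then 20 (substituted by P20), and `example` is `(20, 12, 7)`, i.e. 19 messages in total. The second call places the initiator immediately after the largest identifier in a ring of 10 processes, which gives `(10, 19, 10)`, i.e. 3N − 1 = 29 messages under the same assumptions.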


Global State
• A distributed computing system consists of spatially separated processes that do not share a common memory and communicate asynchronously with each other by
message passing over communication channels.
• Each component of a distributed system has a local state.
• The state of a process is characterized by the state of its local memory and a history of its activity.
• The state of a channel is characterized by the set of messages sent along the channel and the messages received along the channel.
• The global state of a distributed system is a collection of the local states of its components.

System Model
• The system consists of a collection of n processes p1, p2, ..., pn that are connected by channels.
• There is no globally shared memory and no physical global clock; processes communicate by passing messages through communication channels.
• Cij denotes the channel from process pi to process pj and its state is denoted by SCij .
• The actions performed by a process are modelled as three types of events:
○ Internal events,
○ the message send event and
○ the message receive event.
• For a message mij that is sent by process pi to process pj ,
○ let send(mij) and rec(mij) denote its send and receive events.
○ At any instant, the state of process pi , denoted by LSi , is a result of the sequence of all the events executed by pi till that instant.
• For an event e and a process state LSi , e∈LSi iff e belongs to the sequence of events that have taken process pi to state LSi .
• For an event e and a process state LSi , e∉LSi iff e does not belong to the sequence of events that have taken process pi to state LSi .
• For a channel Cij , the following set of messages can be defined based on the local states of the processes pi and pj
○ Transit: transit(LSi, LSj) = { mij | send(mij) ∈ LSi ∧ rec(mij) ∉ LSj }
• Thus, if a snapshot recording algorithm records the state of processes pi and pj as LSi and LSj, respectively, then it must record the state of channel Cij as
transit(LSi, LSj).
• In FIFO model, each channel acts as a first-in first-out message queue and thus, message ordering is preserved by a channel.
• In non-FIFO model, a channel acts like a set in which the sender process adds messages and the receiver process removes messages from it in a random order.
• A system that supports causal delivery of messages satisfies the following property:
○ “For any two messages mij and mkj , if send(mij) → send(mkj), then rec(mij) → rec(mkj)”

Consistent Global State


• Notationally, global state GS is defined as GS = { ⋃i LSi , ⋃i,j SCij }
• A global state GS is a consistent global state iff it satisfies the following two conditions :
○ C1: send(mij)∈LSi ⇒ mij∈SCij ⊕ rec(mij)∈LSj . (⊕ is Ex-OR operator.)
○ C2: send(mij)∉LSi ⇒ mij ∉SCij ∧ rec(mij)∉LSj .
• Condition C1 states the law of conservation of messages. Every message mij that is recorded as sent in the local state of a process pi must be captured in the state of
the channel Cij or in the collected local state of the receiver process pj .
• Condition C2 states that in the collected global state, for every effect, its cause must be present. If a message mij is not recorded as sent in the local state of process
pi, then it must neither be present in the state of the channel Cij nor in the collected local state of the receiver process pj . In a consistent global state, every message
that is recorded as received is also recorded as sent.
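
Conditions C1 and C2 can be checked mechanically for a single channel Cij. The sketch below is an illustration only: it represents the recorded send events of pi, the recorded receive events of pj, and the recorded channel state as Python sets of message identifiers, and the function names are assumptions.

```python
def transit(sent_i, received_j):
    """Messages recorded as sent by pi on Cij but not recorded as received by pj."""
    return sent_i - received_j


def is_consistent(sent_i, received_j, channel_ij):
    """Check C1 and C2 for one channel Cij."""
    # C1: a message recorded as sent is in the channel state XOR in the receiver's state.
    c1 = all((m in channel_ij) != (m in received_j) for m in sent_i)
    # C2: a message not recorded as sent is neither in the channel nor recorded as received.
    c2 = all(m in sent_i for m in channel_ij | received_j)
    return c1 and c2


sent = {"m1", "m2"}
received = {"m1"}
channel = transit(sent, received)           # {"m2"}
ok = is_consistent(sent, received, channel)

# "m3" is recorded as received but never recorded as sent: an effect without its cause.
bad = is_consistent(sent, {"m1", "m3"}, channel)
```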

Interpretation in terms of cuts:


• A cut in a space-time diagram is a line joining an arbitrary point on each process line that slices the space-time diagram into a PAST and a FUTURE.
• A consistent global state corresponds to a cut in which every message received in the PAST of the cut was sent in the PAST of that cut. Such a cut is known as a
consistent cut.
• For example, consider the space-time diagram for the computation illustrated in Figure.

• Cut C1 is inconsistent because message m1 is flowing from the FUTURE to the PAST. Cut C2 is consistent and message m4 must be captured in the state of channel
C21.
• Note that in a consistent snapshot, all the recorded local states of processes are concurrent; that is, the recorded local state of no process causally affects the
recorded local state of any other process.

Issues in Recording a global state


• If a global physical clock were available, the following simple procedure could be used to record a consistent global snapshot of a distributed system.
○ In this, the initiator of the snapshot collection decides a future time at which the snapshot is to be taken and broadcasts this time to every process.
○ All processes take their local snapshots at that instant in the global time.
○ The snapshot of channel Cij includes all the messages that process pj receives after taking the snapshot and whose timestamp is smaller than the time of the
snapshot. (All messages are timestamped with the sender’s clock.)
○ Clearly, if channels are not FIFO, a termination detection scheme will be needed to determine when to stop waiting for messages on channels.
• However, a global physical clock is not available in a distributed system and the following two issues need to be addressed in recording of a consistent global
snapshot of a distributed system
○ I1: How to distinguish between the messages to be recorded in the snapshot from those not to be recorded. Any message that is sent by a process before
recording its snapshot, must be recorded in the global snapshot (from C1). Any message that is sent by a process after recording its snapshot, must not be
recorded in the global snapshot (from C2).
○ I2: How to determine the instant when a process takes its snapshot. A process pj must record its snapshot before processing a message mij that was sent by
process pi after recording its snapshot.

Snapshot Algorithms for FIFO channels



• Global Snapshot = Global State = collection of individual local states of each process in the distributed system + individual state of each communication channel in
the distributed system
• Snapshot:- is a photograph of a process taken or recorded quickly
• Need for taking snapshots or recording global state of a system
○ 1. Check pointing:- snapshot will be used as a checkpoint, to restart the application in case of failure
○ 2. Collecting garbage:- used to remove objects that don’t have any references
○ 3. Detecting deadlocks:- used to examine the current application state.
○ 4. Termination detection

Chandy Lamport Algorithm


• The Chandy-Lamport algorithm uses a control message, called a marker whose role in a FIFO system is to separate messages in the channels.
• After a site has recorded its snapshot, it sends a marker, along all of its outgoing channels before sending out any more messages.
• A marker separates the messages in the channel into those to be included in the snapshot from those not to be recorded in the snapshot.
• A process must record its snapshot no later than when it receives a marker on any of its incoming channels.
• The algorithm can be initiated by any process by executing the “Marker Sending Rule” by which it records its local state and sends a marker on each outgoing channel.
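• A process executes the “Marker Receiving Rule” on receiving a marker. If the process has not yet recorded its local state, it records the state of the channel on which
the marker is received as empty and executes the “Marker Sending Rule” to record its local state. Otherwise, it records the state of that channel as the set of messages
received on it after the process recorded its local state and before it received the marker.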
• The algorithm terminates after each process has received a marker on all of its incoming channels.
• All the local snapshots get disseminated to all other processes and all the processes can determine the global state.
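
The marker rules can be exercised with a small single-threaded simulation. This is a sketch under assumptions, not the algorithm's canonical form: the application is a hypothetical money-transfer system in which each process state is a balance, channels are FIFO queues between every ordered pair of processes, and all class and function names are invented for the example.

```python
from collections import deque

MARKER = "MARKER"  # control message; every other message is a transfer amount


class Process:
    def __init__(self, balance):
        self.balance = balance
        self.recorded = None     # recorded local state (None until recorded)
        self.channel_state = {}  # sender -> messages recorded for that incoming channel
        self.recording = set()   # incoming channels whose marker has not arrived yet


class System:
    def __init__(self, balances):
        self.procs = {pid: Process(b) for pid, b in balances.items()}
        # One FIFO channel per ordered pair (src, dst).
        self.chan = {(s, d): deque() for s in balances for d in balances if s != d}

    def send(self, src, dst, amount):
        self.procs[src].balance -= amount
        self.chan[(src, dst)].append(amount)

    def record(self, pid):
        """Marker Sending Rule: record local state, then send a marker on each outgoing channel."""
        p = self.procs[pid]
        p.recorded = p.balance
        p.recording = {s for (s, d) in self.chan if d == pid}
        p.channel_state = {s: [] for s in p.recording}
        for (s, d), queue in self.chan.items():
            if s == pid:
                queue.append(MARKER)

    def deliver(self, src, dst):
        """Deliver the head of channel src -> dst; a marker triggers the Marker Receiving Rule."""
        msg = self.chan[(src, dst)].popleft()
        p = self.procs[dst]
        if msg == MARKER:
            if p.recorded is None:
                self.record(dst)      # state of this channel is recorded as empty
            p.recording.discard(src)  # stop recording this channel
        else:
            p.balance += msg
            if src in p.recording:
                p.channel_state[src].append(msg)  # message was in transit at snapshot time

    def finished(self):
        return all(p.recorded is not None and not p.recording for p in self.procs.values())


def snapshot_total(system):
    """Sum of recorded balances plus recorded in-transit amounts."""
    in_procs = sum(p.recorded for p in system.procs.values())
    in_channels = sum(sum(msgs) for p in system.procs.values() for msgs in p.channel_state.values())
    return in_procs + in_channels


def run_demo():
    system = System({1: 100, 2: 50, 3: 30})  # 180 units in total
    system.send(1, 2, 10)
    system.send(3, 1, 5)
    system.record(1)                         # process 1 initiates the snapshot
    system.send(2, 3, 20)
    while not system.finished():
        for (src, dst), queue in system.chan.items():
            if queue:
                system.deliver(src, dst)
    return system
```

In `run_demo`, the recorded balances and recorded channel contents add up to the 180 units the system started with, which is what conditions C1 and C2 require of a consistent snapshot in this setting.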

Correctness:
○ Due to FIFO property of channels, it follows that no message sent after the marker on that channel is recorded in the channel state. Thus, condition C2 is
satisfied.
○ When a process pj receives message mij that precedes the marker on channel Cij , it acts as follows: If process pj has not taken its snapshot yet, then it includes
mij in its recorded snapshot. Otherwise, it records mij in the state of the channel Cij . Thus, condition C1 is satisfied.
Complexity:
○ The recording part of a single instance of the algorithm requires O(e) messages and O(d) time, where e is the number of edges in the network and d is the
diameter of the network.

Termination Detection
• A fundamental problem in distributed systems is to determine if a distributed computation has terminated.
• The detection of the termination of a distributed computation is non-trivial since no process has complete knowledge of the global state, and global time does not
exist.
• A distributed computation is considered to be globally terminated if every process is locally terminated and there is no message in transit between any processes.
• A “locally terminated” state is a state in which a process has finished its computation and will not restart any action unless it receives a message.
• In the termination detection problem, a particular process (or all of the processes) must infer when the underlying computation has terminated.
• Messages used in the underlying computation are called basic messages, and messages used for the purpose of termination detection (by a termination detection
algorithm) are called control messages.
• A termination detection (TD) algorithm must ensure the following:
○ 1. Execution of a TD algorithm cannot indefinitely delay the underlying computation; that is, execution of the termination detection algorithm must not freeze
the underlying computation.
○ 2. The termination detection algorithm must not require addition of new communication channels between processes.

System Model of Distributed Computation


• A distributed computation has the following characteristics:
○ At any given time, a process can be in only one of the two states: active, where it is doing local computation and idle, where the process has (temporarily)
finished the execution of its local computation and will be reactivated only on the receipt of a message from another process.
• An active process can become idle at any time. An idle process can become active only on the receipt of a message from another process.
• Only active processes can send messages.
• A message can be received by a process when the process is in either of the two states, i.e., active or idle.
• On the receipt of a message, an idle process becomes active.
• The sending of a message and the receipt of a message occur as atomic actions
• Definition of termination detection:
○ Let pi(t) denote the state (active or idle) of process pi at instant t and
○ ci,j(t) denote the number of messages in transit in the channel at instant t from process pi to process pj .
○ A distributed computation is said to be terminated at time instant t0 iff:
▪ (∀i :: pi(t0) = idle) ∧ (∀i, j :: ci,j(t0) = 0)



○ Thus, a distributed computation has terminated iff all processes have become idle and there is no message in transit in any channel

Termination Detection using Distributed Snapshot


• The algorithm assumes that there is a logical bidirectional communication channel between every pair of processes.
• Communication channels are reliable but non-FIFO.
• Message delay is arbitrary but finite.
• Informal description:
○ when a computation terminates, there must exist a unique process which became idle last.
○ When a process goes from active to idle, it issues a request to all other processes to take a local snapshot, and also requests itself to take a local snapshot.
○ When a process receives the request, if it agrees that the requester became idle before itself, it grants the request by taking a local snapshot for the request.
○ A request is said to be successful if all processes have taken a local snapshot for it.
○ The requester or any external agent may collect all the local snapshots of a request.
○ If a request is successful, global snapshot of the request can thus be obtained and the recorded state will indicate termination of the computation
• Formal description:
○ Each process i maintains a logical clock denoted by x, initialized to zero at the start of the computation.
○ A process increments its x by one each time it becomes idle.
○ A basic message sent by a process at its logical time x is of the form B(x).
○ A control message that requests processes to take local snapshot issued by process i at its logical time x is of the form R(x, i).
○ Each process synchronizes its logical clock x loosely with the logical clocks on other processes in such a way that it is the maximum of clock values ever
received or sent in messages.
○ A process also maintains a variable k such that when the process is idle, (x,k) is the maximum of the values (x, k) on all messages R(x, k) ever received or sent by
the process.
○ Logical time is compared as follows: (x, k) > (x’, k’) iff (x > x’) or ((x=x’) and (k>k’)), i.e., a tie between x and x’ is broken by the process identification numbers k
and k’
The algorithm is defined by the following four rules.

• The last process to terminate will have the largest clock value. Therefore, every process will take a snapshot for it, however, it will not take a snapshot for any other
process.

Termination detection by Weight Throwing


• In termination detection by weight throwing, a process called controlling agent monitors the computation.
• A communication channel exists between each of the processes and the controlling agent and also between every pair of processes.
• Basic Idea:
○ Initially, all processes are in the idle state.
○ The weight at each process is zero and the weight at the controlling agent is 1.
○ The computation starts when the controlling agent sends a basic message to one of the processes.
○ A non-zero weight W (0 < W ≤ 1) is assigned to each process in the active state and to each message in transit in the following manner:
▪ When a process sends a message, it sends a part of its weight in the message.
▪ When a process receives a message, it adds the weight received in the message to its weight.
▪ Thus, the sum of weights on all the processes and on all the messages in transit is always 1.
▪ When a process becomes idle, it sends its weight to the controlling agent in a control message, which the controlling agent adds to its weight.
▪ The controlling agent concludes termination if its weight becomes 1.
• Notation:
○ The weight on the controlling agent and a process is in general represented by W.
○ B(DW): A basic message B is sent as a part of the computation, where DW is the weight assigned to it.
○ C(DW): A control message C is sent from a process to the controlling agent where DW is the weight assigned to it.
• Formal Description:

• Correctness:

○ A: set of weights on all active processes
○ B: set of weights on all basic messages in transit
○ C: set of weights on all control messages in transit
○ Wc : weight on the controlling agent
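The weight-throwing scheme can be sketched as a small Python model that tracks weights only. This is an illustration under stated assumptions, not the formal description: the class `WeightThrowing` and its method names are hypothetical, a sender always gives away half of its weight (any split into two positive parts would do), and messages are delivered in FIFO order. Exact fractions are used so that the check "weight becomes 1" is not affected by rounding.

```python
from fractions import Fraction


class WeightThrowing:
    """Toy model of weight throwing: tracks weights, not real messages."""

    def __init__(self, n):
        self.wc = Fraction(1)          # Wc: controlling agent starts with 1
        self.w = [Fraction(0)] * n     # every process starts idle, weight 0
        self.basic = []                # B: basic messages in transit (dst, DW)
        self.control = []              # C: control messages in transit (DW)

    def send_from_agent(self, dst):
        # Assumption: the agent keeps half and sends half in B(DW).
        dw = self.wc / 2
        self.wc -= dw
        self.basic.append((dst, dw))

    def send(self, src, dst):
        # Only an active process (weight > 0) may send a basic message.
        assert self.w[src] > 0
        dw = self.w[src] / 2
        self.w[src] -= dw
        self.basic.append((dst, dw))

    def deliver_basic(self):
        # The receiver adds the weight carried by the message to its own.
        dst, dw = self.basic.pop(0)
        self.w[dst] += dw

    def become_idle(self, p):
        # A process that turns passive returns all its weight in C(DW).
        self.control.append(self.w[p])
        self.w[p] = Fraction(0)

    def deliver_control(self):
        # The agent adds DW and concludes termination when Wc reaches 1.
        self.wc += self.control.pop(0)
        return self.wc == 1

    def total(self):
        # Sum of Wc and all weights in A, B and C; it should always be 1.
        in_basic = sum(dw for _, dw in self.basic)
        return self.wc + sum(self.w) + in_basic + sum(self.control)


sim = WeightThrowing(3)
sim.send_from_agent(0)                  # agent starts the computation
sim.deliver_basic()                     # process 0 becomes active
sim.send(0, 1)                          # process 0 sends work to process 1
sim.become_idle(0)
first = sim.deliver_control()           # message to process 1 still in transit
sim.deliver_basic()
sim.become_idle(1)
second = sim.deliver_control()          # all weight is back at the agent
```

In this run `first` is False, because part of the weight is still travelling in a basic message, and `second` is True once every process is idle and no message is in transit.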

Spanning-Tree-Based Termination Detection Algorithm


• There are N processes Pi, 0 ≤ i ≤ N-1, which are modelled as the nodes i, 0 ≤ i ≤ N-1, of a fixed connected undirected graph.
• The edges of the graph represent the communication channels.
• The algorithm uses a fixed spanning tree of the graph with process P0 at its root which is responsible for termination detection.
• Process P0 communicates with other processes to determine their states through signals.
• Each leaf node reports to its parent once it has terminated.
• A parent node similarly reports to its parent when it has completed processing and all of its immediate children have terminated, and so on up the tree.
• The root concludes that termination has occurred if it has terminated and all of its immediate children have also terminated.

• Two waves of signals are generated: one moves inward and the other outward through the spanning tree.
• Initially, a contracting wave of signals, called tokens, moves inward from the leaves to the root.
• If this token wave reaches the root without discovering that termination has occurred, the root initiates a second, outward wave of repeat signals.
• As this repeat wave reaches the leaves, the token wave gradually forms again and starts moving inward. This sequence of events is repeated until termination is detected.
• Initially, each leaf process is given a token.
• Each leaf process sends its token to its parent after it has terminated.
• A parent process sends a token to its parent once it has terminated and has received a token from each of its children.
• This way, each process indicates to its parent process that the subtree below it has become idle.
• In a similar manner, the tokens get propagated to the root.
• The root of the tree concludes that termination has occurred after it has become idle and has received a token from each of its children.

The algorithm works as follows:


• Initially, each leaf process is provided with a token.
• A set S is used for book-keeping to record which processes hold a token. Hence, S is initially the set of all leaves in the tree.
• Initially, all processes and tokens are coloured white.
• When a leaf node terminates, it sends the token it holds to its parent process.
• A parent process will collect the token sent by each of its children. After it has received a token from all of its children and after it has terminated, the parent process
sends a token to its parent.
• A process turns black when it sends a message to some other process.
• When a process terminates, if its color is black, it sends a black token to its parent.
• A black process turns white again after it has sent a black token to its parent.
• A parent process holding a black token (from one of its children) sends only a black token to its parent, to indicate that message passing was involved in its subtree.
• Tokens are propagated to the root in this fashion.
• The root, upon receiving a black token, will know that a process in the tree had sent a message to some other process. Hence, it restarts the algorithm by sending a
Repeat signal to all its children.
• Each child of the root propagates the Repeat signal to each of its children and so on, until the signal reaches the leaves.
• The leaf nodes restart the algorithm on receiving the Repeat signal.
• The root concludes that termination has occurred if it is white, it is idle, and it has received a white token from each of its children.
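One inward token wave of the coloured-token scheme can be sketched in Python as follows. This is a simplified illustration, not the full algorithm: the function names `token_colour` and `run_wave` are hypothetical, the tree shape used at the end is an assumed seven-node tree (root 0, inner nodes 1 and 2, leaves 3 to 6), and the sketch assumes that every process has already terminated by the time its token moves up.

```python
def token_colour(tree, node, sent_message):
    """Colour of the token that `node` passes to its parent.

    tree         -- dict mapping a node to the list of its children
    sent_message -- set of nodes that sent a message during this wave
                    (these nodes are black)
    """
    child_tokens = [token_colour(tree, c, sent_message)
                    for c in tree.get(node, [])]
    # A black process, or a process holding a black token from a child,
    # sends only a black token upward.
    if node in sent_message or "black" in child_tokens:
        return "black"
    return "white"


def run_wave(tree, root, sent_message):
    """Decision taken by the root at the end of one inward wave."""
    child_tokens = [token_colour(tree, c, sent_message)
                    for c in tree.get(root, [])]
    root_is_white = root not in sent_message
    if root_is_white and all(t == "white" for t in child_tokens):
        return "terminated"
    # Otherwise the root sends a Repeat signal down the tree.
    return "repeat"


# Assumed tree shape, for illustration only.
tree = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
first_wave = run_wave(tree, 0, {5})      # node 5 sent a message
second_wave = run_wave(tree, 0, set())   # no messages in the repeated wave
```

With node 5 black, its token blackens the token of node 2, so the root asks for a repeat. Black processes turn white after sending their token, so a second wave with no further messages lets the root conclude termination.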

• Eg:
Apply the spanning-tree-based termination detection algorithm in the following scenario. The nodes are processes 0 to 6. Leaf nodes 3, 4, 5 and 6 are given tokens T3, T4, T5 and T6 respectively. Leaf nodes 3, 4, 5 and 6 terminate in that order, but before terminating, node 5 sends a message to node
