DC_Module2_Notes
MODULE: 2
➢ For two events ei and ej, ei → ej ⟹ C(ei) < C(ej); this monotonicity property is called the clock consistency condition.
➢ When T and C satisfy the condition ei → ej ⟺ C(ei) < C(ej), the system of clocks is said to be strongly consistent.
➢ Each process pi maintains data structures that allow it the following two capabilities:
❖ A local logical clock, denoted by lci, that helps process pi measure its own
progress.
❖ A logical global clock, denoted by gci, that is a representation of process pi’s
local view of the logical global time.
➢ It allows this process to assign consistent timestamps to its local events.
➢ Typically, lci is a part of gci.
2. Protocol:
➢ The protocol ensures that a process’s logical clock, and thus its view of the global time,
is managed consistently.
➢ The protocol consists of the following two rules:
❖ R1: This rule governs how the local logical clock is updated by a process when
it executes an event (send, receive, or internal).
❖ R2: This rule governs how a process updates its global logical clock in order to update its view of the global time and global progress.
➢ It dictates what information about the logical time is piggybacked in a message.
➢ All logical clock systems implement rules R1 and R2 and consequently ensure the
fundamental monotonicity property associated with causality.
➢ Time domain in this representation is the set of non-negative integers.
➢ The logical local clock of a process pi and its local view of the global time are squashed
into one integer variable Ci.
➢ Rules R1 and R2 to update the clocks are as follows:
❖ R1: Before executing an event (send, receive, or internal), process pi executes the following: Ci := Ci + d (d > 0).
❖ R2: Each message piggybacks the clock value of its sender at sending time. When process pi receives a message with timestamp Cmsg, it executes Ci := max(Ci, Cmsg), then executes R1, and then delivers the message.
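The scalar-clock rules can be sketched as follows; this is a minimal single-machine illustration, and the class and variable names are illustrative rather than part of the original algorithm description.

```python
# Sketch of a scalar (Lamport) clock; names are illustrative.

class ScalarClock:
    def __init__(self, d=1):
        self.c = 0      # Ci: combined local clock and view of global time
        self.d = d      # increment value, d > 0

    def tick(self):
        """R1: executed before every send, receive, or internal event."""
        self.c += self.d
        return self.c

    def on_receive(self, c_msg):
        """R2: take the max with the piggybacked timestamp, then apply R1."""
        self.c = max(self.c, c_msg)
        return self.tick()

p1, p2 = ScalarClock(), ScalarClock()
t_send = p1.tick()              # p1 sends a message timestamped 1
p2.tick(); p2.tick()            # p2 executes two internal events (clock = 2)
t_recv = p2.on_receive(t_send)  # max(2, 1) + 1 = 3
```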
2. Total Ordering:
➢ Scalar clocks can be used to totally order events in a distributed system.
➢ The main problem in totally ordering events is that two or more events at different
processes may have an identical timestamp.
➢ Note that for two events e1 and e2, C(e1) = C(e2) ⟹ e1 ∥ e2; that is, events with identical scalar timestamps must be concurrent.
➢ A tie-breaking mechanism is needed to order such events.
➢ Typically, a tie is broken as follows: process identifiers are linearly ordered and a tie
among events with identical scalar timestamp is broken on the basis of their process
identifiers.
➢ The lower the process identifier in the ranking, the higher the priority.
➢ The total order relation ≺ on two events x and y with timestamps (h,i) and (k,j), respectively, is defined as follows: x ≺ y ⟺ (h < k) ∨ (h = k ∧ i < j).
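The total order, including the tie-break on process identifiers, can be written as a small comparison function; `precedes` is an illustrative name.

```python
# Total order on events timestamped (h, i): compare scalar time first,
# then break ties with the process identifier (lower id = higher priority).

def precedes(ts_x, ts_y):
    h, i = ts_x
    k, j = ts_y
    return h < k or (h == k and i < j)

assert precedes((2, 1), (3, 5))     # smaller scalar time wins
assert precedes((2, 1), (2, 4))     # tie broken by process id
assert not precedes((2, 4), (2, 1))
```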
3. Event counting:
➢ If the increment value d is always 1, the scalar time has the following interesting
property: if event e has a timestamp h, then h−1 represents the minimum logical
duration, counted in units of events, required before producing the event e; we call it
the height of the event e.
4. No strong consistency:
➢ The system of scalar clocks is not strongly consistent; that is, for two events ei and ej, C(ei) < C(ej) does not imply ei → ej.
Vector time:
➢ In the system of vector clocks, each process pi maintains a vector vti[1..n], where vti[i] is pi's own local logical clock and vti[j] is pi's latest knowledge of process pj's local clock.
➢ Rules R1 and R2 to update the clocks are as follows:
❖ R1: Before executing an event, process pi updates its local logical time: vti[i] := vti[i] + d (d > 0).
❖ R2: Each message m is piggybacked with the vector clock vt of the sender
process at sending time.
❖ On the receipt of such a message (m,vt), process pi executes the following
sequence of actions:
1. update its global logical time: vti[k] := max(vti[k], vt[k]), for 1 ≤ k ≤ n;
2. execute R1;
3. deliver the message m.
➢ The timestamp associated with an event is the value of the vector clock of its process
when the event is executed.
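Rules R1 and R2 for vector clocks can be sketched as follows; this is a minimal sketch, and the class name and in-memory message passing are illustrative.

```python
# Sketch of vector clocks for n processes; names are illustrative.

class VectorClock:
    def __init__(self, pid, n, d=1):
        self.pid, self.d = pid, d
        self.vt = [0] * n          # initially [0, 0, ..., 0]

    def tick(self):
        """R1: increment own component before any event."""
        self.vt[self.pid] += self.d
        return list(self.vt)

    def on_receive(self, vt_msg):
        """R2: component-wise max with the piggybacked vector, then R1."""
        self.vt = [max(a, b) for a, b in zip(self.vt, vt_msg)]
        return self.tick()

p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
m = p0.tick()             # p0 sends a message stamped [1, 0]
stamp = p1.on_receive(m)  # p1 receives: max([0,0],[1,0]), then R1 -> [1, 1]
```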
➢ Figure 2.2 shows an example of vector clock progress with the increment value d = 1. Initially, each vector clock is [0, 0, …, 0].
➢ Thus, there is an isomorphism between the set of partially ordered events produced by
a distributed computation and their vector timestamps.
➢ If the process at which an event occurred is known, the test to compare two timestamps can be simplified as follows: if events x and y respectively occurred at processes pi and pj and are assigned timestamps vh and vk, respectively, then x → y ⟺ vh[i] ≤ vk[i], and x ∥ y ⟺ vh[i] > vk[i] ∧ vh[j] < vk[j].
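Assuming the standard simplified test (x → y iff vh[i] ≤ vk[i], and x ∥ y iff vh[i] > vk[i] and vh[j] < vk[j]), the comparison can be coded directly; the function names are illustrative.

```python
# Simplified comparison of vector timestamps; vh is the timestamp of event x
# at process pi, vk that of event y at process pj (distinct events assumed).

def happened_before(vh, i, vk):
    """x -> y iff pi's own component in vh did not outrun vk."""
    return vh[i] <= vk[i]

def concurrent(vh, i, vk, j):
    """x || y iff neither event causally precedes the other."""
    return vh[i] > vk[i] and vh[j] < vk[j]

assert happened_before([2, 0], 0, [3, 1])   # x at p0 precedes y at p1
assert concurrent([2, 0], 0, [0, 1], 1)     # neither precedes the other
```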
2. Strong consistency
➢ The system of vector clocks is strongly consistent; thus, by examining the vector
timestamp of two events, we can determine if the events are causally related.
3. Event counting
➢ If d is always 1 in rule R1, then the ith component of vector clock at process pi, vti[i],
denotes the number of events that have occurred at pi until that instant.
➢ So, if an event e has timestamp vh, vh[j] denotes the number of events executed by
process pj that causally precede e.
➢ Clearly, Σj vh[j] − 1 represents the total number of events that causally precede e in the distributed computation (the event e itself is excluded from the count).
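A quick illustration of event counting with d = 1; the timestamp values below are made up for the example.

```python
# With d = 1, component vh[j] of an event's timestamp counts the events at
# process pj that causally precede the event e (including e itself at its
# own process), so sum(vh) - 1 counts all events that causally precede e.

vh = [3, 1, 2]                 # hypothetical timestamp of an event e at p0
events_before_e = sum(vh) - 1  # subtract 1 to exclude the event e itself
```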
Applications of Vector time
➢ Distributed debugging
➢ Implementations of causal ordering communication and causal distributed shared
memory
➢ Establishment of global breakpoints and in determining the consistency of checkpoints
in optimistic recovery.
➢ If a process knows it has the highest identifier among the live processes, it elects itself as coordinator and sends a coordinator message to all processes with lower identifiers announcing this.
➢ Otherwise, another process starts an election for a coordinator.
➢ If the coordinator does not respond within a time interval T, it is assumed that the coordinator has failed.
➢ Process P then sends an election message to every process with a higher priority number.
➢ It waits for responses; if no one responds within time interval T, process P elects itself as the coordinator.
➢ It then sends a message to all processes with lower priority numbers announcing that it is elected as their new coordinator.
➢ If a process that was previously down/failed comes back, it takes over the coordinator job.
➢ The biggest guy always wins; hence the name "bully algorithm".
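The election logic above can be sketched as a toy function; a real implementation exchanges ELECTION/OK/COORDINATOR messages with timeouts, which this sketch abstracts away, modelling only which process ends up as coordinator.

```python
# Toy model of a bully election.

def bully_election(initiator, alive):
    """Return the id of the coordinator elected when `initiator` (a member
    of `alive`, the set of process ids currently up) starts an election."""
    higher = [p for p in alive if p > initiator]
    if not higher:
        return initiator          # no higher process responds: initiator wins
    # some higher process responds with OK and takes over the election
    return bully_election(min(higher), alive)

# P5 has crashed; P2 notices and starts the election: P4 wins.
assert bully_election(2, alive={1, 2, 3, 4}) == 4
```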
Step 2: Process P2 sends an election message to all processes with a process ID greater than P2's (i.e., P3, P4, and P5) and awaits a response from them.
Step 3: If no one responds, P2 wins the election and becomes the coordinator.
Step 4: If any of the processes with a process ID higher than 2 responds with OK, P2's job is done and that process takes over the election.
Step 6: Process P4 responds to P3 with an OK message to confirm that it is alive. Process P4 figures out that process P5 has crashed, and the new process with the highest ID is process P4.
Step 7: The process that receives an election message sends a coordinator message if it is the process with the highest ID (in this case, P4).
Step 8: If a process which was previously down (i.e., P5) comes back, it holds an election; if it has the highest process ID, it becomes the new coordinator and sends a message to all other processes.
Practical Applications of the Bully Algorithm
➢ Cluster Management
➢ Resource Allocation
➢ Leader Election
Pros of the Bully algorithm
➢ Simple
➢ Effective in small networks
➢ Fault-tolerant
Cons of the Bully algorithm
Safety:
➢ Safety ensures that no incorrect process will ever be elected as the coordinator.
➢ In the bully algorithm, this is guaranteed because only processes with higher IDs respond to election messages.
➢ Therefore, the process with the highest ID among the active processes will always
become the coordinator.
➢ Hence, the safety condition is met.
Liveness:
➢ Liveness ensures that the system eventually makes progress.
➢ In the bully algorithm, the election process continues until a new coordinator is elected.
➢ If a coordinator fails, an election is triggered, and eventually, a new coordinator will be
elected.
➢ However, it's important to note that if there's a continuous failure of higher-ID processes,
the election process might keep restarting without reaching a stable state.
➢ Therefore, the algorithm generally meets liveness conditions.
➢ A process begins an election by marking itself as a participant, placing its own identifier in an election message and sending it to its clockwise neighbour.
➢ When a process receives an election message it compares the identifier in the
message with its own.
➢ If the arrived identifier is greater, then it forwards the message to its neighbour.
➢ If the arrived identifier is smaller, then it substitutes its own identifier in the message and forwards it.
➢ If the received identifier is that of the receiver itself, then this process’s identifier
must be the greatest, and it becomes the coordinator.
➢ The coordinator marks itself as a non-participant once more and sends an elected message to its neighbour, announcing its election and enclosing its identity.
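The forwarding rules above can be sketched as a toy simulation; `ring_election` is an illustrative name, and a real implementation passes messages between nodes rather than iterating over a list.

```python
# Toy simulation of the ring election; `ring` lists the process identifiers
# in clockwise order and `start` is the index of the initiating process.

def ring_election(ring, start):
    carried = ring[start]                # initiator puts its own id in the message
    i = (start + 1) % len(ring)
    while ring[i] != carried:            # stop when a process sees its own id
        carried = max(carried, ring[i])  # a greater arrived id is forwarded,
        i = (i + 1) % len(ring)          # a smaller one is substituted
    return carried                       # that process becomes coordinator

# The example from these notes: the message with identifier 28 circulates
# 9 -> 4 -> 3 -> 17 -> 24 -> 1 and returns to 28.
assert ring_election([28, 9, 4, 3, 17, 24, 1], start=0) == 28
```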
Example:
➢ Process 9 receives the election message:
❖ Process 9 compares the received identifier (28) with its own (9).
❖ Since 28 > 9, Process 9 forwards the message with identifier 28 to its
clockwise neighbour, which is Process 4.
➢ Process 4 receives the election message:
❖ Process 4 compares the received identifier (28) with its own (4).
❖ Since 28 > 4, Process 4 forwards the message with identifier 28 to its clockwise neighbour, which is Process 3.
➢ Process 3 receives the election message:
❖ Process 3 compares the received identifier (28) with its own (3).
❖ Since 28 > 3, Process 3 forwards the message with identifier 28 to its clockwise neighbour, which is Process 17.
➢ Process 17 receives the election message:
❖ Process 17 compares the received identifier (28) with its own (17).
❖ Since 28 > 17, Process 17 forwards the message with identifier 28 to its clockwise neighbour, which is Process 24.
➢ Process 24 receives the election message:
❖ Process 24 compares the received identifier (28) with its own (24).
❖ Since 28 > 24, Process 24 forwards the message with identifier 28 to its clockwise neighbour, which is Process 1.
➢ Process 1 receives the election message:
❖ Process 1 compares the received identifier (28) with its own (1).
❖ Since 28 > 1, Process 1 forwards the message with identifier 28 to its clockwise neighbour, which is Process 28.
➢ Process 28 receives its own election message:
❖ Process 28 compares the received identifier (28) with its own (28).
❖ Process 28 realizes that it has received its own identifier.
❖ It concludes that it has the highest identifier in the ring and becomes the coordinator.
➢ Process 28 sends an elected message:
❖ Process 28 marks itself as a non-participant and sends an elected message with its identity (28) to its neighbour.
➢ Channels facilitate communication between processes.
➢ The state of a channel is characterized by the set of messages sent along the channel
less the messages received along the channel.
➢ The global state of a distributed system is a collection of the local states of all its
components.
➢ It represents the combined state of all processes in the distributed system.
❖ The state of a channel (Cij) is influenced by the local states of the processes (pi
and pj).
❖ The set of messages in transit in a channel Cij can be defined from the local states of processes pi and pj as transit(LSi, LSj) = { mij | send(mij) ∈ LSi ∧ rec(mij) ∉ LSj }.
➢ Interpretation of Conditions:
❖ Condition C1 ensures that every sent message (mij) recorded in the local state
of a process pi must be present either in the state of the channel Cij or in the collected
local state of the receiver process pj.
❖ Condition C2 states that, in the collected global state, every effect must have its
cause. If a message mij is not recorded as sent in the local state of process pi, then it
must not be present in the state of the channel Cij or in the collected local state of the
receiver process pj.
➢ Causality in Consistent Global State:
❖ In a consistent global state, every message recorded as received must also be
recorded as sent.
❖ This captures the causality principle that a message cannot be received if it
was not sent.
➢ Meaningful Global States:
❖ Consistent global states are considered meaningful, while inconsistent global
states are not meaningful.
❖ Causality, extending from the set of events to the set of recorded local states, is
a fundamental concept in maintaining consistency.
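Conditions C1 and C2 can be checked mechanically over a collected global state; the set-based representation below (sets of message ids per channel) is a hypothetical encoding chosen for this sketch.

```python
# Consistency check for a collected global state: sent[(i, j)],
# received[(i, j)] and channel[(i, j)] are the sets of message ids recorded
# for channel Cij in the local states and the channel state, respectively.

def is_consistent(sent, received, channel):
    for cij in channel:
        # C1: every recorded send is either still in Cij or already received
        if not sent[cij] <= (channel[cij] | received[cij]):
            return False
        # C2: every effect has its cause: no receipt without a recorded send
        if not (channel[cij] | received[cij]) <= sent[cij]:
            return False
    return True

# m1 is still in transit, m2 was sent and received: consistent.
assert is_consistent({(0, 1): {"m1", "m2"}},
                     {(0, 1): {"m2"}},
                     {(0, 1): {"m1"}})
# m3 recorded as received but never recorded as sent: inconsistent.
assert not is_consistent({(0, 1): set()},
                         {(0, 1): {"m3"}},
                         {(0, 1): set()})
```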
2.5.4 Issues in recording a global state
➢ Global Snapshot = Global State = collection of individual local states of each process
in the distributed system + individual state of each communication channel in the
distributed system
➢ Each distributed application has a number of processes running on a number of physical
servers.
➢ These processes communicate with each other via channels.
➢ A snapshot captures the local states of each process along with the state of each
communication channel.
➢ Need for taking snapshots or recording global state of a system:
1. Check pointing: snapshot will be used as a checkpoint, to restart the application in
case of failure.
2. Collecting garbage: used to remove objects that don’t have any references.
3. Detecting deadlocks: used to examine the current application state.
4. Termination detection
5. Other Debugging
➢ Various snapshot algorithms are discussed for distributed systems, each assuming
different inter process communication capabilities.
➢ Two types of messages are identified: computation messages (from the application)
and control messages (exchanged by the snapshot algorithm).
➢ Execution of a snapshot algorithm is transparent to the underlying application,
except for occasional delays in application actions.
➢ P2 receives the message from P1.
➢ Another global snapshot is taken.
The Chandy–Lamport algorithm
➢ Correctness:
❖ Conditions C1 and C2 must be satisfied for correctness.
❖ C1: Messages following markers on incoming channels are not recorded in a
process's snapshot.
❖ C2: No messages sent after a marker on a channel are recorded in the channel
state.
❖ The algorithm ensures these conditions are met.
➢ Complexity:
❖ Recording a single instance of the algorithm requires O(e) messages and O(d)
time.
❖ e is the number of edges in the network, and d is the diameter of the network.
1. Initiating a snapshot
➢ Process Pi initiates the snapshot
➢ Pi records its own state and prepares a special marker message.
➢ Send the marker message to all other processes (using N-1 outbound channels).
➢ Start recording all incoming messages on channels Cji for j not equal to i.
2. Propagating a snapshot
➢ For each process Pj, consider a marker message arriving on channel Ckj.
➢ If the marker message is seen for the first time:
❖ Pj records its own state and marks Ckj as empty.
❖ Pj sends the marker message to all other processes (using N−1 outbound channels).
❖ Pj records all incoming messages on channels Clj for l not equal to k.
➢ Otherwise, Pj adds the messages that have arrived on Ckj since Pj recorded its own state to the recorded state of that channel.
3. Terminating a snapshot
➢ All processes have received a marker (and recorded their own states).
➢ All processes have received a marker on all of their N−1 incoming channels (and recorded the channels' states).
➢ A central server can gather the partial state to build a global snapshot.
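The marker-handling rules can be sketched as follows; this is a simplified single-machine illustration in which delivery, channels, and the collection server are all assumed, and all names are illustrative.

```python
# Sketch of marker handling in the Chandy-Lamport algorithm.

class SnapshotProcess:
    def __init__(self, pid, peers):
        self.pid, self.peers = pid, peers
        self.state, self.recorded = None, False
        self.channel_state = {}     # j -> messages recorded on channel Cji
        self.recording = set()      # incoming channels currently being recorded

    def initiate(self, local_state, send):
        self._record(local_state, send)

    def on_message(self, sender, msg, local_state, send):
        if msg == "MARKER":
            if not self.recorded:
                self._record(local_state, send)  # first marker seen
                self.channel_state[sender] = []  # that channel recorded empty
            self.recording.discard(sender)       # stop recording this channel
        elif sender in self.recording:
            self.channel_state[sender].append(msg)  # in-transit message

    def _record(self, local_state, send):
        self.state, self.recorded = local_state, True
        self.recording = set(self.peers)
        self.channel_state = {j: [] for j in self.peers}
        for j in self.peers:
            send(j, "MARKER")        # marker on every outgoing channel

msgs = []
send = lambda j, m: msgs.append((j, m))
p1 = SnapshotProcess(1, peers=[2])
p2 = SnapshotProcess(2, peers=[1])
p1.initiate("s1", send)                  # p1 records "s1" and sends a marker
p2.on_message(1, "app-msg", "s2", send)  # p2 not recording yet: not captured
p2.on_message(1, "MARKER", "s2", send)   # p2 records "s2"; C12 recorded empty
```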
Example:
➢ P1 initiates a snapshot
➢ Then, P1 sends a marker message to P2 and begins recording all messages on inbound
channels.
➢ Meanwhile, P2 sent a message to P1.
➢ P2 receives a marker message for the first time, so records its state.
➢ P2 then sends a marker message to P1.
➢ P1 has already sent a marker message, so it records all messages it received on inbound
channels to the appropriate channel’s state.
➢ Both processes have recorded their states and the states of all their incoming channels.
➢ The recorded global state may not correspond to any actual global state during the
computation.
➢ Asynchronous changes in process states may result in recorded states that did not occur
sequentially.
➢ If a stable property (e.g., termination or deadlock) holds before the snapshot algorithm,
it holds in the recorded global snapshot.
➢ The recorded global state is valuable for detecting stable properties in the system.
➢ The recorded process states can be viewed as all occurring at one physical instant, as if the time axis were stretched like an elastic band.
➢ All recorded process states are mutually concurrent, meaning one state does not
causally depend on another.
➢ Logically, all these process states are viewed as occurring simultaneously, despite
differences in physical time.
➢ Global termination is defined as every process being locally terminated with no
messages in transit between processes.
➢ A "locally terminated" state occurs when a process finishes its computation and won't
restart any action unless it receives a message.
➢ Termination detection involves inferring when the underlying computation has
terminated.
➢ Two distributed computations occur simultaneously:
❖ The underlying computation and
❖ The termination detection algorithm.
➢ Messages used in the underlying computation are basic messages, while those used for
termination detection are control messages.
➢ A Termination Detection (TD) algorithm must ensure:
❖ Execution of the TD algorithm cannot indefinitely delay the underlying
computation.
❖ The TD algorithm must not require the addition of new communication
channels between processes.
❖ Two states for a process: active (busy, doing local computation) and idle
(passive, temporarily finished local computation).
❖ An active process can become idle at any time, indicating completion of local
computation and processing of received messages.
❖ An idle process can become active only upon receiving a message from another
process.
❖ Only active processes can send messages.
❖ Messages can be received in both active and idle states, and an idle process
becomes active upon message receipt.
❖ Sending and receiving messages are atomic actions.
Let pi(t) denote the state (active or idle) of process pi at instant t and ci,j(t) denote the
number of messages in transit in the channel at instant t from process pi to process pj.
A distributed computation is said to be terminated at time instant t0 iff: (∀ i :: pi(t0) = idle) ∧ (∀ i, j :: ci,j(t0) = 0).
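This termination condition can be checked directly over a recorded global state; the dictionary encoding below is an assumption made for this sketch.

```python
# Direct check of the termination condition over a recorded global state.

def is_terminated(process_states, in_transit):
    """process_states[i] is "active" or "idle"; in_transit[(i, j)] is the
    number of messages in transit from pi to pj (the ci,j(t0) above)."""
    return (all(s == "idle" for s in process_states.values())
            and all(n == 0 for n in in_transit.values()))

assert is_terminated({1: "idle", 2: "idle"}, {(1, 2): 0, (2, 1): 0})
assert not is_terminated({1: "active", 2: "idle"}, {(1, 2): 0, (2, 1): 0})
```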
2.8 Termination detection using distributed snapshots
➢ The algorithm relies on the concept that a consistent snapshot of a distributed system
can capture stable properties.
➢ Termination of a distributed computation is considered a stable property.
➢ If a consistent snapshot is taken after the distributed computation has terminated, it will
capture the termination of the computation.
➢ The algorithm assumes the existence of logical bidirectional communication channels
between every pair of processes.
➢ Communication channels are reliable but non-FIFO.
➢ Message delay is arbitrary but finite in the algorithm.
➢ Logical time is compared based on the tuple (x, k) as follows: (x, k) > (x', k') if (x >
x') or ((x = x') and (k > k')), breaking ties using the process identification numbers k
and k'.
➢ The algorithm is defined by the following four rules.
➢ Each process i applies one of the rules whenever it is applicable.
➢ When a process becomes passive, it sends its weight to the controlling agent in a control
message, which adds it to its own weight.
➢ Termination is concluded by the controlling agent when its weight becomes 1.
➢ Notation:
❖ W: represents the weight on the controlling agent and a process.
❖ B(DW): A basic message B is sent as part of the computation, with DW
representing the assigned weight.
❖ C(DW): A control message C is sent from a process to the controlling agent,
with DW as the assigned weight.
➢ Invariant I1 states that “the sum of weights at the controlling process, at all active
processes, on all basic messages in transit, and on all control messages in transit is
always equal to 1”.
➢ Invariant I2 states that “weight at each active process, on each basic message in transit,
and on each control message in transit is non-zero.”
➢ Since the message delay is finite, after the computation has terminated, eventually
Wc = 1.
➢ Thus, the algorithm detects a termination in finite time.
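The weight-throwing scheme can be sketched as follows; the class and method names are illustrative, and the weights in the usage example are arbitrary binary fractions so the floating-point arithmetic stays exact.

```python
# Sketch of weight-throwing termination detection: the controlling agent
# starts with weight 1 and lends part of it with each activation.

class ControllingAgent:
    def __init__(self):
        self.weight = 1.0

    def lend(self, dw):
        """Give part of the agent's weight to a process it activates."""
        self.weight -= dw
        return dw

    def on_control(self, dw):
        """C(DW): a process turning passive returns its weight."""
        self.weight += dw

    def terminated(self):
        return self.weight == 1.0

agent = ControllingAgent()
w1 = agent.lend(0.5)        # the agent activates p1 with weight 0.5
w_msg = 0.25                # p1 sends B(0.25): its own weight drops to 0.25
w1 -= w_msg
w2 = w_msg                  # p2 becomes active with the message's weight
agent.on_control(w1)        # p1 turns passive and sends C(0.25)
agent.on_control(w2)        # p2 turns passive and sends C(0.25)
assert agent.terminated()   # weight is back to 1: termination detected
```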
2.10 A spanning-tree-based termination detection algorithm
➢ The algorithm assumes there are N processes Pi, 0 ≤ i < N, which are modelled as the nodes i, 0 ≤ i < N, of a fixed connected undirected graph.
➢ Edges: communication channels
➢ Uses a fixed spanning tree with process P0 at its root.
➢ Process P0 communicates with other processes.
➢ Each leaf node reports to its parent when it has terminated.
➢ A parent node will similarly report to its parent.
➢ The root concludes that termination has occurred if it has terminated and all of its
immediate children have also terminated.
➢ Two waves of signals generated:
1. Tokens: a contracting wave of signals that move inward from the leaves to the root.
2. Repeat signal: move outward through the spanning tree.
A Simple Algorithm
➢ Initially, each leaf process is given a token.
➢ Each leaf process, after it has terminated sends its token to its parent.
➢ When a parent process terminates and after it has received a token from each of its
children, it sends a token to its parent.
➢ This way, each process indicates to its parent process that the subtree below it has
become idle.
➢ In a similar manner, the tokens get propagated to the root.
➢ The root of the tree concludes that termination has occurred, after it has become idle
and has received a token from each of its children.
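The simple token wave can be sketched as a recursive check over the spanning tree; this models only the final condition at the root (every subtree idle and all tokens received), not the token messages themselves, and it ignores the re-activation problem discussed next.

```python
# Recursive check of the condition the token wave establishes at the root.

def detect_termination(children, terminated, node=0):
    """`children[i]` lists node i's children in the fixed spanning tree;
    `terminated[i]` says whether process Pi is locally terminated."""
    return terminated[node] and all(
        detect_termination(children, terminated, c) for c in children[node])

# A 7-node tree like the one in the example below (root 0; leaves 3, 4, 5, 6).
children = {0: [1, 2], 1: [3, 4], 2: [5, 6], 3: [], 4: [], 5: [], 6: []}
assert detect_termination(children, [True] * 7)
assert not detect_termination(children,
                              [True, True, True, True, False, True, True])
```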
A Problem with the algorithm
➢ This simple algorithm fails under some circumstances.
➢ After a process has sent its token to its parent, it may receive a message from some other process, which could cause it to become active again.
An example
Step: 1
➢ Initially, all nodes 0 to 6 are white.
➢ Leaf nodes 3, 4, 5, and 6 are each given a token.
➢ Node 3 has token T3, node 4 has token T4, node 5 has token T5, and node 6 has token
T6.
➢ Hence, S = {3, 4, 5, 6}.
Step: 2
➢ When node 3 terminates, it transmits T3 to node 1.
➢ Now S changes to {1, 4, 5, 6}.
➢ When node 4 terminates, it transmits T4 to node 1.
➢ Hence, S changes to {1, 5, 6}.
Step: 3
➢ Node 1 has received a token from each of its children and, when it terminates, it
transmits a token T1 to its parent.
➢ S changes to {0, 5, 6}.
Step 4:
➢ Suppose node 5 sends a message to node 1, causing node 1 to again become active.
➢ Since node 1 had already sent a token to its parent node 0, the new message makes the
system inconsistent as far as termination detection is concerned.
➢ Node 5 is colored black, since it sent a message to node 1.
Step 5:
➢ When node 5 terminates, it sends a black token T5 to node 2.
➢ So, S changes to {0, 2, 6}.
➢ After node 5 sends its token, it turns white.
➢ When node 6 terminates, it sends the white token T6 to node 2.
➢ Hence, S changes to {0, 2}.
Step 6:
➢ When node 2 terminates, it sends a black token T2 to node 0, since it holds a black
token T5 from node 5.
➢ Since node 0 has received a black token T2 from node 2, it knows that there was a
message sent by one or more of its children in the tree and hence sends a repeat signal
to each of its children.
➢ The repeat signal is propagated to the leaf nodes and the algorithm is repeated.
➢ Node 0 concludes that termination has occurred if it is white, it is idle, and it has
received a white token from each of its children.
Performance
➢ Best case message complexity: O(N)
✓ N is the number of processes
➢ Worst case complexity: O(N*M)
✓ M is the number of computation messages exchanged
*****************************