Causal Ordering

The document discusses the importance of global clocks for causally ordering events in distributed systems, highlighting Lamport’s Happened Before relationship and the limitations of Lamport's logical clocks. It introduces vector clocks as a solution for true causal ordering and outlines the Chandy-Lamport algorithm for global state recording. Additionally, it covers termination detection in distributed systems using Huang’s algorithm and weight distribution for recovery.

Logical Clocks and Causal Ordering

Why do we need global clocks?


• For causally ordering events in a distributed system
– Example:
• Transaction T transfers Rs 10,000 from S1 to S2
• Consider the situation when:
– State of S1 is recorded after the deduction and state of
S2 is recorded before the addition
– State of S1 is recorded before the deduction and state
of S2 is recorded after the addition
• Should not be confused with the clock-synchronization problem

[Figure: the same transmitted waveform read against two different clocks. Against one clock the data reads 0101; against the other it reads 01110001. The receiver needs to know the clock of the sender.]


Ordering of Events
Lamport’s Happened Before relationship:

For two events a and b, a → b if


• a and b are events in the same process and a occurred before b, or
• a is a send event of a message m and b is the corresponding receive event at the destination process, or
• a → c and c → b for some event c
Causally Related versus Concurrent
Causally related events:
Event a causally affects event b if a → b

Concurrent events:
• Two distinct events a and b are said to be concurrent (denoted by a || b) if
a ↛ b and b ↛ a
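
The definitions above can be checked mechanically. The following is a minimal sketch, assuming a history is given as explicit edges (process order plus send/receive pairs); the event names and the helpers `happened_before` and `concurrent` are hypothetical, chosen only for illustration.

```python
from itertools import product

def happened_before(events, edges):
    """Return all pairs (a, b) with a -> b: the transitive closure of edges."""
    hb = set(edges)
    # Warshall-style closure; product() varies its first element slowest,
    # so the intermediate event k is the outermost loop, as required.
    for k, a, b in product(events, repeat=3):
        if (a, k) in hb and (k, b) in hb:
            hb.add((a, b))
    return hb

def concurrent(a, b, hb):
    """Two distinct events are concurrent if neither happened before the other."""
    return a != b and (a, b) not in hb and (b, a) not in hb

# Hypothetical history: P1 runs a1 then a2, P2 runs b1 then b2,
# and a2 sends a message that is received at b2.
events = ["a1", "a2", "b1", "b2"]
edges = {("a1", "a2"), ("b1", "b2"), ("a2", "b2")}
hb = happened_before(events, edges)

print(("a1", "b2") in hb)          # a1 -> b2 follows by transitivity
print(concurrent("a1", "b1", hb))  # no path in either direction
```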

[Figure: a space-time diagram with process P1 (events e11, e12, e13, e14) and process P2 (events e21, e22, e23, e24). In the diagram, e11 and e21 are concurrent, e14 and e23 are concurrent, and e22 causally affects e14.]
Lamport’s Logical Clock
Each process i keeps a clock Ci

• Each event a in i is time-stamped Ci(a), the value of Ci when a occurred

• Ci is incremented by 1 for each event in i

• In addition, if a is a send of message m from process i to j, then on receipt of m,
Cj = max(Cj, Ci(a) + 1)
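
The rules above could be sketched as follows. This is an illustrative sketch, not a definitive implementation: the class name `LamportClock` and its methods are assumptions, and a receive is treated as an event that first increments the clock and then applies the max rule.

```python
class LamportClock:
    """Scalar logical clock for one process (hypothetical helper)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Local or send event: increment by 1 and return the timestamp."""
        self.time += 1
        return self.time

    def receive(self, msg_time):
        """Receive event: count the event, then Cj = max(Cj, Ci(a) + 1)."""
        self.time += 1
        self.time = max(self.time, msg_time + 1)
        return self.time

# Hypothetical run: P1 performs six events, the sixth being a send.
p1, p2 = LamportClock(), LamportClock()
for _ in range(6):
    sent_at = p1.tick()
# P2 performs four local events, then receives P1's message.
for _ in range(4):
    p2.tick()
received_at = p2.receive(sent_at)
print(sent_at, received_at)
```

In this run the receive is stamped later than the send, as a → b requires, even though P2 had only seen four events of its own.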
How Lamport’s clocks advance

[Figure: space-time diagram of Lamport clock values. P1: events e11 to e17 with timestamps (1) (2) (3) (4) (5) (6) (7). P2: events e21 to e25 with timestamps (1) (2) (3) (4) (7).]
Points to note

• if a → b, then C(a) < C(b)

• → is a partial order

• A total ordering is possible by arbitrarily ordering concurrent events by process numbers:
• If a is any event in process Pi and b is any event in process Pj, then a => b if and only if
Ci(a) < Cj(b), or
Ci(a) = Cj(b) and Pi < Pj,
where < is an arbitrary relation that totally orders the processes to break ties. A simple way to implement < is to assign a unique identification number to each process and then define
Pi < Pj if i < j
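
One way to realize the relation => is to sort on the pair (timestamp, process id). The sketch below assumes each event is a hypothetical tuple `(name, lamport_timestamp, process_id)`; Python compares tuples lexicographically, which matches the tie-breaking rule above.

```python
def total_order_key(event):
    """Key for =>: Lamport timestamp first, process id breaks ties."""
    _name, timestamp, pid = event
    return (timestamp, pid)

# Hypothetical events: (name, Lamport timestamp, process id).
events = [("e21", 1, 2), ("e12", 2, 1), ("e11", 1, 1)]
ordered = sorted(events, key=total_order_key)
print([name for name, _, _ in ordered])
```

Here e11 and e21 carry the same timestamp, so the process number alone decides their relative position.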
Limitation of Lamport’s Clock
a → b implies C(a) < C(b)

BUT

C(a) < C(b) doesn’t imply a → b !!

So not a true clock !!


Solution: Vector Clocks

Each process Pi has a clock Ci, which is a vector of size n


The clock Ci assigns a vector Ci(a) to any event a at Pi

Update rules:

• Ci[i]++ for every event at process i

• If a is the send of message m from i to j with vector timestamp tm, then on receipt of m:
Cj[k] = max(Cj[k], tm[k]) for all k
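
The update rules above could be sketched as below. The class `VectorClock` and its method names are assumptions made for illustration; processes are indexed from 0, and a receive counts as an event at the receiver before the component-wise max is taken.

```python
class VectorClock:
    """Vector clock for process pid in a system of n processes (hypothetical helper)."""

    def __init__(self, pid, n):
        self.pid = pid
        self.v = [0] * n

    def event(self):
        """Local or send event: Ci[i]++ and return a copy as the timestamp."""
        self.v[self.pid] += 1
        return list(self.v)

    def receive(self, tm):
        """Receive event: count it, then Cj[k] = max(Cj[k], tm[k]) for all k."""
        self.v[self.pid] += 1
        self.v = [max(own, other) for own, other in zip(self.v, tm)]
        return list(self.v)

# Hypothetical run with two processes.
p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
tm = p0.event()            # P0 sends m with timestamp tm
local = p1.event()         # P1 does one local event
after = p1.receive(tm)     # P1 receives m
print(tm, local, after)
```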
Partial Order between Timestamps
For events a and b with vector timestamps ta and tb,

• Equal: ta = tb iff ∀i, ta[i] = tb[i]

• Not Equal: ta ≠ tb iff ∃i, ta[i] ≠ tb[i]

• Less or equal: ta ≤ tb iff ∀i, ta[i] ≤ tb[i]

• Not less or equal: ta ≰ tb iff ∃i, ta[i] > tb[i]

• Less than: ta < tb iff (ta ≤ tb and ta ≠ tb)

• Not less than: ta ≮ tb iff ¬(ta ≤ tb and ta ≠ tb)

• Concurrent: ta || tb iff (ta ≮ tb and tb ≮ ta)
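
These relations translate almost directly into code. The following is a small sketch with hypothetical function names (`leq`, `less`, `concurrent`); timestamps are plain lists of equal length.

```python
def leq(ta, tb):
    """ta <= tb iff every component of ta is <= the matching component of tb."""
    return all(x <= y for x, y in zip(ta, tb))

def less(ta, tb):
    """ta < tb iff ta <= tb and ta != tb."""
    return leq(ta, tb) and ta != tb

def concurrent(ta, tb):
    """ta || tb iff neither timestamp is less than the other.
    Assumes ta and tb belong to two distinct events."""
    return not less(ta, tb) and not less(tb, ta)

print(less([1, 0], [1, 2]))        # ordered
print(concurrent([1, 0], [0, 1]))  # incomparable
```

The second pair is the case a scalar Lamport clock cannot express: neither vector dominates the other, so the events are reported as concurrent rather than being forced into an order.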


Causal Ordering
• a → b iff ta < tb

• Events a and b are causally related iff ta < tb or tb < ta, else they are
concurrent

• Note that this is still not a total order


Use of Vector Clocks in Causal Ordering of Messages

• If send(m1) → send(m2), then every recipient of both messages m1 and m2 must “deliver” m1 before m2.

– “deliver”: when the message is actually given to the application for processing
Birman-Schiper-Stephenson Protocol
• To broadcast m from process i, increment Ci[i], and timestamp m
with VTm = Ci

• When j ≠ i receives m, j delays delivery of m until

– Cj[i] = VTm[i] – 1 and
– Cj[k] ≥ VTm[k] for all k ≠ i
– Delayed messages are queued in j sorted by vector time. Concurrent
messages are sorted by receive time.

• When m is delivered at j, Cj is updated according to the vector clock rule.
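
The delivery condition above could be sketched as follows. This is an illustrative sketch under stated assumptions: the class `BSSProcess` and its method names are hypothetical, clocks count broadcasts only (so delivery applies just the component-wise max), and the pending queue is scanned repeatedly rather than kept sorted by vector time.

```python
class BSSProcess:
    """One process in a causal-broadcast group of n processes (hypothetical helper)."""

    def __init__(self, pid, n):
        self.pid = pid
        self.clock = [0] * n
        self.pending = []      # received but not yet deliverable
        self.delivered = []    # payloads handed to the application, in order

    def broadcast(self, payload):
        """Increment Ci[i] and timestamp the message with VTm = Ci."""
        self.clock[self.pid] += 1
        return (self.pid, list(self.clock), payload)

    def _deliverable(self, sender, vt):
        """Cj[i] = VTm[i] - 1 and Cj[k] >= VTm[k] for all k != i."""
        return (self.clock[sender] == vt[sender] - 1 and
                all(self.clock[k] >= vt[k]
                    for k in range(len(vt)) if k != sender))

    def receive(self, msg):
        """Queue msg, then deliver every queued message whose condition holds."""
        self.pending.append(msg)
        progress = True
        while progress:
            progress = False
            for m in list(self.pending):
                sender, vt, payload = m
                if self._deliverable(sender, vt):
                    self.pending.remove(m)
                    self.clock = [max(a, b) for a, b in zip(self.clock, vt)]
                    self.delivered.append(payload)
                    progress = True

# Hypothetical run: P0 broadcasts m1; P1 delivers it and then broadcasts m2.
p0, p1, p2 = (BSSProcess(i, 3) for i in range(3))
m1 = p0.broadcast("m1")
p1.receive(m1)
m2 = p1.broadcast("m2")
# P2 receives the messages out of causal order: m2 arrives first and is delayed.
p2.receive(m2)
held_back = list(p2.pending)
p2.receive(m1)
print(p2.delivered)
```

Since send(m1) → send(m2), P2 holds m2 until m1 has been delivered, so the application sees the two messages in causal order.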
Problem of Vector Clock

• Message size increases since each message needs to be tagged with the
vector

• Size can be reduced in some cases by only sending values that have
changed
Global State Recording
Global State Collection
• Applications:
– Checking “stable” properties, checkpoint &
recovery

• Issues:
– Need to capture both node and channel
states
– system cannot be stopped
– no global clock
Notations
Some notations:
– LSi: Local state of process i
– send(mij) : Send event of message mij from
process i to process j
– rec(mij) : Similar, receive instead of send
– time(x) : Time at which state x was recorded
– time (send(m)) : Time at which send(m)
occurred
Definitions
• send(mij) ∈ LSi iff time(send(mij)) < time(LSi)

• rec(mij) ∈ LSj iff time(rec(mij)) < time(LSj)

• transit(LSi, LSj)
= { mij | send(mij) ∈ LSi and rec(mij) ∉ LSj }

• inconsistent(LSi, LSj)
= { mij | send(mij) ∉ LSi and rec(mij) ∈ LSj }
Definitions
• Global state: collection of local states
GS = {LS1, LS2,…, LSn}

• GS is consistent iff
for all i, j, 1 ≤ i, j ≤ n,
inconsistent(LSi, LSj) = ∅

• GS is transitless iff
for all i, j, 1 ≤ i, j ≤ n,
transit(LSi, LSj) = ∅

• GS is strongly consistent if it is consistent and transitless.
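
The definitions above can be tried out on the money-transfer example from the start of the deck. The sketch below is illustrative only: it assumes a notional global time for each send, receive, and state recording, and the names `Msg`, `in_state`, `transit`, `inconsistent`, `is_consistent`, and `is_transitless` are hypothetical.

```python
from typing import NamedTuple, Optional

class Msg(NamedTuple):
    src: int
    dst: int
    sent: float             # time(send(m))
    recv: Optional[float]   # time(rec(m)); None if not yet received

def in_state(event_time, record_time):
    """An event is in a local state iff it occurred before the state was recorded."""
    return event_time is not None and event_time < record_time

def transit(msgs, t, i, j):
    """Messages sent in LSi but not received in LSj."""
    return [m for m in msgs if (m.src, m.dst) == (i, j)
            and in_state(m.sent, t[i]) and not in_state(m.recv, t[j])]

def inconsistent(msgs, t, i, j):
    """Messages received in LSj but not sent in LSi."""
    return [m for m in msgs if (m.src, m.dst) == (i, j)
            and not in_state(m.sent, t[i]) and in_state(m.recv, t[j])]

def is_consistent(msgs, t):
    return all(not inconsistent(msgs, t, i, j) for i in t for j in t)

def is_transitless(msgs, t):
    return all(not transit(msgs, t, i, j) for i in t for j in t)

# Hypothetical timeline: S1 sends the transfer at time 5, S2 receives it at time 8.
transfer = [Msg(src=1, dst=2, sent=5, recv=8)]

in_flight = {1: 6, 2: 7}   # S1 recorded after deduction, S2 before addition
doubled = {1: 4, 2: 9}     # S1 recorded before deduction, S2 after addition
settled = {1: 6, 2: 9}     # both recorded after the transfer completed

for name, t in [("in_flight", in_flight), ("doubled", doubled), ("settled", settled)]:
    print(name, is_consistent(transfer, t), is_transitless(transfer, t))
```

The first recording is consistent but not transitless (the money is in the channel), the second is inconsistent (the money is counted twice), and the third is strongly consistent.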


Chandy-Lamport’s Algorithm
• Uses special marker messages.

• One process acts as initiator, starts the state collection by following the
marker sending rule below.

• Marker sending rule for process P:

– P records its state and
– For each outgoing channel C from P on which a marker has not been sent already, P sends a marker along C before any further message is sent on C
Chandy Lamport’s Algorithm contd..
• When Q receives a marker along a channel C:

– If Q has not recorded its state then Q records the state of C as empty; Q then follows the marker sending rule

– If Q has already recorded its state, it records the state of C as the sequence of messages received along C after Q’s state was recorded and before Q received the marker along C
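
The two rules could be simulated as below. This is a simplified sketch, not a definitive implementation: the classes `Process` and `Network`, the money-transfer application, and the explicit `deliver` calls that stand in for the network are all assumptions made for illustration. Channels are FIFO queues.

```python
from collections import deque

MARKER = "MARKER"

class Process:
    def __init__(self, balance):
        self.balance = balance
        self.recorded = None     # recorded local state, None until recorded
        self.chan_state = {}     # (src, dst) -> messages recorded for that channel
        self.recording = set()   # senders whose incoming channel is still open

class Network:
    def __init__(self, balances):
        self.procs = {pid: Process(b) for pid, b in balances.items()}
        self.chan = {(i, j): deque() for i in balances for j in balances if i != j}

    def send(self, i, j, amount):
        """Application message: move money from i into the channel i -> j."""
        self.procs[i].balance -= amount
        self.chan[(i, j)].append(amount)

    def record(self, pid):
        """Marker sending rule: record state, then send a marker on each outgoing channel."""
        p = self.procs[pid]
        p.recorded = p.balance
        p.recording = {src for (src, dst) in self.chan if dst == pid}
        for (src, _dst), queue in self.chan.items():
            if src == pid:
                queue.append(MARKER)

    def deliver(self, i, j):
        """Deliver the oldest message on channel i -> j to process j."""
        msg = self.chan[(i, j)].popleft()
        p = self.procs[j]
        if msg == MARKER:
            if p.recorded is None:
                self.record(j)                   # first marker: record own state
            p.chan_state.setdefault((i, j), [])  # empty unless messages were logged
            p.recording.discard(i)               # channel i -> j is now closed
        else:
            p.balance += msg
            if p.recorded is not None and i in p.recording:
                p.chan_state.setdefault((i, j), []).append(msg)

# Hypothetical run: P2 sends 10 to P1, then P1 initiates the snapshot.
net = Network({1: 100, 2: 50})
net.send(2, 1, 10)
net.record(1)
net.deliver(2, 1)   # the 10 arrives after P1 recorded: part of channel state
net.deliver(1, 2)   # P2 gets the marker, records 40, sends its own marker
net.deliver(2, 1)   # P1 gets P2's marker, closing channel 2 -> 1

total = (sum(p.recorded for p in net.procs.values())
         + sum(sum(msgs) for p in net.procs.values() for msgs in p.chan_state.values()))
print(total)
```

The recorded process states plus the recorded channel contents add up to the original total, which is what a consistent snapshot of this application should preserve.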
Notable Points
• Markers sent on a channel distinguish the messages sent on the channel
before the sender recorded its state from the messages sent after the
sender recorded its state

• The state collected may not be any state that actually happened in reality,
rather a state that “could have” happened

• Requires FIFO channels

• Message complexity O(|E|), where E is the set of links


Termination Detection
Termination Detection
• Model
– processes can be active or idle
– only active processes send messages
– idle process can become active on receiving a
computation message
– active process can become idle at any time

– Termination: all processes are idle and no computation messages are in transit
– Can also use a global snapshot to detect termination
Huang’s Algorithm
• One controlling agent, which has weight 1 initially
• All other processes are idle initially and have weight 0
• Computation starts when controlling agent sends a computation message
to a process
• An idle process becomes active on receiving a computation message
• B(DW) – computation message with weight DW. Can be sent only by the
controlling agent or an active process
• C(DW) – control message with weight DW, sent by active processes to
controlling agent when they are about to become idle
Weight Distribution and Recovery
• Let current weight at process = W
• Send of B(DW):
– Find W1, W2 such that W1 > 0, W2 > 0, W1 + W2 = W
– Set W = W1 and send B(W2)

• Receive of B(DW):
– W = W + DW;
– if idle, become active

• Send of C(DW):
– send C(W) to controlling agent
– Become idle

• Receive of C(DW):
– W = W + DW
– if W = 1, declare “termination”
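
The four rules above could be sketched as follows. This is a minimal sketch under stated assumptions: the class `Node` and the functions `send_b`, `recv_b`, `send_c`, and `recv_c` are hypothetical names, the split W1 = W2 = W/2 is an arbitrary choice, and exact fractions are used so that repeated halving never rounds a weight to zero.

```python
from fractions import Fraction

class Node:
    def __init__(self, weight=Fraction(0)):
        self.weight = weight
        self.active = False

def send_b(sender):
    """Send of B(DW): split W into W1 + W2, keep W1, return DW = W2."""
    dw = sender.weight / 2
    sender.weight -= dw
    return dw

def recv_b(node, dw):
    """Receive of B(DW): add the weight and become active."""
    node.weight += dw
    node.active = True

def send_c(node):
    """Send of C(DW): return the whole weight to the controlling agent, become idle."""
    dw = node.weight
    node.weight = Fraction(0)
    node.active = False
    return dw

def recv_c(agent, dw):
    """Receive of C(DW): add the weight; True means termination is declared."""
    agent.weight += dw
    return agent.weight == 1

# Hypothetical run: the agent activates p1, which in turn activates p2.
agent, p1, p2 = Node(Fraction(1)), Node(), Node()
recv_b(p1, send_b(agent))             # agent keeps 1/2, p1 gets 1/2
recv_b(p2, send_b(p1))                # p1 keeps 1/4, p2 gets 1/4
first = recv_c(agent, send_c(p1))     # agent now holds 3/4
second = recv_c(agent, send_c(p2))    # agent now holds 1
print(first, second)
```

The total weight in the system, held by nodes or carried by messages, stays at 1 throughout, so the agent reaching weight 1 means no other process is active and no computation message is in transit.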
