Unit-2 1

The document discusses theoretical foundations for distributed systems including ordering of events, logical clocks, vector clocks, and capturing global states. It describes Lamport's happened before relationship and causal ordering of events. Logical clocks like Lamport's clocks and vector clocks are presented as ways to order events in a distributed system. The Chandy-Lamport algorithm for capturing a consistent global state of a distributed system is summarized, which uses marker messages and records local states at each process.

Uploaded by

Kr Nishant
Copyright
© Attribution Non-Commercial (BY-NC)

DISTRIBUTED SYSTEM

UNIT-2

Theoretical Foundation for Distributed Systems

Prepared By: G.S.Mishra

LIMITATIONS OF DISTRIBUTED SYSTEMS

Absence of shared memory
Absence of a global clock

Solution:

Ordering of events & logical clocks

ORDERING OF EVENTS

Lamport's Happened Before relationship:

For two events a and b, a → b if

a and b are events in the same process and a occurred before b, or
a is the event of sending a message m and b is the corresponding receive event at the destination process.

If a → b and b → c for some event c, then a → c (transitive relation).

ORDERING OF EVENTS CONTD..

a → b implies a is a potential cause of b.
Causal ordering: potential dependencies.

The Happened Before relationship causally orders events:

If a → b, then a causally affects b.
If a ↛ b and b ↛ a, then a and b are concurrent, i.e. a || b.

LOGICAL CLOCK

Each process Pi keeps a clock Ci.

IR1: Clock Ci is incremented between any two successive events in process Pi: Ci := Ci + d (d > 0). If a and b are two successive events in Pi and a → b, then Ci(b) = Ci(a) + d.

IR2: If event a is the sending of message m by process Pi, then message m is assigned a timestamp tm = Ci(a). On receiving the same message m, process Pj sets Cj to a value greater than or equal to its present value and greater than tm: Cj := max(Cj, tm + d) (d > 0).
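Rules IR1 and IR2 can be sketched in a few lines of Python. This is only an illustration: the class name `LamportClock` and its methods are hypothetical, d is taken to be 1, and the receive step applies IR2 exactly as written above without a separate IR1 increment.

```python
class LamportClock:
    """Scalar logical clock; names here are illustrative, not from the slides."""

    def __init__(self, d=1):
        self.d = d          # increment d > 0
        self.time = 0       # current clock value Ci

    def tick(self):
        # IR1: Ci := Ci + d between successive events.
        self.time += self.d
        return self.time

    def send(self):
        # A send is an event, so IR1 applies; the result is the timestamp tm.
        return self.tick()

    def receive(self, tm):
        # IR2: Cj := max(Cj, tm + d).
        self.time = max(self.time, tm + self.d)
        return self.time


p1, p2 = LamportClock(), LamportClock()
p1.tick()                      # local event on P1, C1 = 1
tm = p1.send()                 # send event on P1, tm = 2
p2.tick()                      # local event on P2, C2 = 1
received_at = p2.receive(tm)   # C2 = max(1, 2 + 1) = 3
```

The receive event ends up with a larger clock value than the send event, which is exactly what a → b implies C(a) < C(b) requires.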

EXAMPLE-1 [space-time diagram not reproduced]

EXAMPLE-2 [space-time diagram not reproduced]

Find the clock value of each process at its last event.

LIMITATION OF LAMPORT'S CLOCK

a → b implies C(a) < C(b)

BUT

C(a) < C(b) doesn't imply a → b !!

So not a true clock !!

VECTOR CLOCKS

For process Pi, Ci is a vector of size n (no. of processes).

Update rules:

[IR1] Clock Ci is incremented between any two successive events in process Pi:

Ci[i] := Ci[i] + d (d > 0)

[IR2] If a is the sending of message m from Pi to Pj with vector timestamp tm = Ci(a), then on receipt of m:

Cj[k] := max(Cj[k], tm[k]) for all k
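A minimal Python sketch of the two vector clock rules follows. The class `VectorClock` is hypothetical, process ids are 0-based, d = 1, and the receive step is assumed to count as an event of its own (IR1) before the component-wise maximum of IR2 is taken.

```python
class VectorClock:
    """Vector clock for one process; an illustrative sketch only."""

    def __init__(self, pid, n, d=1):
        self.pid = pid        # index i of this process (0-based, an assumption)
        self.d = d
        self.v = [0] * n      # Ci, one entry per process

    def tick(self):
        # IR1: Ci[i] := Ci[i] + d
        self.v[self.pid] += self.d
        return list(self.v)

    def send(self):
        # The send event advances the clock; the copy returned is tm.
        return self.tick()

    def receive(self, tm):
        # Assumption: the receive is itself an event, so IR1 is applied first.
        self.tick()
        # IR2: Cj[k] := max(Cj[k], tm[k]) for all k
        self.v = [max(a, b) for a, b in zip(self.v, tm)]
        return list(self.v)


p0, p1, p2 = VectorClock(0, 3), VectorClock(1, 3), VectorClock(2, 3)
t1 = p0.send()        # [1, 0, 0]
r1 = p1.receive(t1)   # [1, 1, 0]
t2 = p1.send()        # [1, 2, 0]
r2 = p2.receive(t2)   # [1, 2, 1]
```

After the second receive, P2's vector records one event at P0 and two at P1, even though P2 never heard from P0 directly.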

EXAMPLE-3

EXAMPLE-4

COMPARISON OF TIME STAMPS

For events a and b with vector timestamps ta and tb:

ta = tb iff for all i, ta[i] = tb[i]
ta ≠ tb iff for some i, ta[i] ≠ tb[i]
ta ≤ tb iff for all i, ta[i] ≤ tb[i]
ta ≰ tb iff for some i, ta[i] > tb[i]
ta < tb iff (ta ≤ tb and ta ≠ tb)
ta || tb iff (ta ≮ tb and tb ≮ ta)
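The comparison rules translate directly into code. The helper names `vc_leq`, `vc_less` and `vc_concurrent` below are hypothetical; timestamps are assumed to be equal-length tuples of integers.

```python
def vc_leq(ta, tb):
    # ta <= tb iff every component of ta is <= the matching component of tb.
    return all(x <= y for x, y in zip(ta, tb))


def vc_less(ta, tb):
    # ta < tb iff ta <= tb and ta != tb.
    return vc_leq(ta, tb) and tuple(ta) != tuple(tb)


def vc_concurrent(ta, tb):
    # ta || tb iff neither timestamp is less than the other.
    return not vc_less(ta, tb) and not vc_less(tb, ta)


before = vc_less((1, 0, 0), (1, 1, 1))          # True: causally ordered
parallel = vc_concurrent((0, 1, 1), (1, 0, 0))  # True: neither precedes the other
```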

CONCLUSION

In the system of vector clocks, a → b iff ta < tb.

Events a and b are causally related iff ta < tb or tb < ta; else they are concurrent.

So the vector clocks causally order events.

Now the vector clocks can be used to order the messages.

CAUSAL ORDERING OF MESSAGES: APPLICATION OF VECTOR CLOCKS

If send(m1) → send(m2), then every recipient of both messages m1 and m2 must deliver m1 before m2.

"Deliver" means the message is actually given to the application for processing.

Send(M1) → Send(M2) requires Receive(M1) → Receive(M2)

VIOLATION OF CAUSAL ORDERING OF MESSAGES

SOLUTION

Upon arrival of a message at a process, buffer (delay delivery of) the message until the message immediately preceding it is delivered.

Birman-Schiper-Stephenson Protocol: Enforcing Causal Ordering of Messages

Assumes broadcast communication channels that do not lose or corrupt messages.
Uses vector clocks to "count" the number of messages, with d = 1.
There are n processes.

VECTOR TIME

When Pi begins to execute, Ci is initialized to zero.

For each event send(m) at Pi, Ci[i] is incremented by 1.

Timestamp tm = Ci is sent along with m.

When process Pj delivers a message m from Pi, Pj updates its vector clock:

for all k in {1, 2, ..., n}: Cj[k] = max(Cj[k], tm[k])

(Note: Recv(m) → Deliver(m))

THE PROTOCOL:

Process Pi updates vector time Ci and broadcasts message m with timestamp tm = Ci.

So Ci[i] - 1 is the number of messages sent before m.

Process Pj (j ≠ i), upon receiving message m with timestamp tm, buffers the message until both conditions hold:

All messages sent by Pi preceding m have arrived, i.e. Cj[i] = tm[i] - 1.

Pj has received all messages that Pi had received before sending m, i.e. Cj[k] ≥ tm[k] for all k = 1, 2, ..., n, k ≠ i.

When the message is finally delivered at Pj, vector time Cj is adjusted according to vector clock rule 2.
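The buffering test of the protocol can be sketched as follows. The function names `can_deliver` and `deliver` are hypothetical, process indices are 0-based, and the scenario (P0 broadcasts m1, P1 delivers it and then broadcasts m2, but m2 reaches P2 first) is made up for illustration.

```python
def can_deliver(cj, tm, i):
    """True if a message from sender i with timestamp tm is deliverable at Pj."""
    # All earlier messages from the sender have been delivered.
    next_from_sender = cj[i] == tm[i] - 1
    # Pj has seen everything the sender had seen before sending.
    seen_dependencies = all(cj[k] >= tm[k] for k in range(len(cj)) if k != i)
    return next_from_sender and seen_dependencies


def deliver(cj, tm):
    # Vector clock rule 2: component-wise maximum.
    return [max(a, b) for a, b in zip(cj, tm)]


c2 = [0, 0, 0]                 # vector time at P2
m1 = (0, [1, 0, 0])            # (sender, timestamp) broadcast by P0
m2 = (1, [1, 1, 0])            # broadcast by P1 after delivering m1
buffered, delivered = [], []

for arrival in (m2, m1):       # m2 overtakes m1 on the way to P2
    buffered.append(arrival)
    progress = True
    while progress:            # re-check the buffer after every delivery
        progress = False
        for item in list(buffered):
            sender, tm = item
            if can_deliver(c2, tm, sender):
                c2 = deliver(c2, tm)
                delivered.append(tm)
                buffered.remove(item)
                progress = True
```

Although m2 arrives first, it stays in the buffer until m1 has been delivered, so the delivery order respects send(m1) → send(m2).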

EXAMPLE

SES ALGORITHM

SES: Schiper-Eggli-Sandoz Algorithm. No need for broadcast messages.

Each process maintains a vector V_P of size N - 1, where N is the number of processes in the system. V_P is a vector of tuples (P, t): P is the destination process id and t is a vector timestamp.

Tm: logical time of sending message m
Tpi: present logical time at Pi

Sending a message:
Send message M, timestamped tm, along with V_P1 to P2.
Insert (P2, tm) into V_P1. Overwrite the previous value of (P2, t), if any.
Any future message carrying (P2, tm) in V_P1 cannot be delivered to P2 until tm < Tp2.

Delivering a message:
If V_M (in the message) does not contain any pair (P2, t), it can be delivered.
Otherwise (P2, t) exists: if t ≮ Tp2, buffer the message (don't deliver); else deliver it.

On delivering the message:

Merge V_M (in the message) with V_P2 as follows:
If (P, t) is not there in V_P2, merge.
If (P, t) is present in V_P2, t is updated with max(t in V_M, t in V_P2).

Update site P2's local logical clock.

Check buffered messages after the local logical clock update.

What does the condition t ≮ Tp2 imply?

t is the message's vector timestamp. t ≮ Tp2 typically means that for some j, t[j] > Tp2[j].

This implies some events occurred in other processes without P2's knowledge. So P2 decides to buffer the message.

When t < Tp2, the message is delivered and Tp2 is updated with the help of V_P2 (after the merge operation).
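The delivery test and the merge step can be sketched like this. It is a simplified illustration, not the full algorithm: V_M and V_P are modelled as dictionaries from destination process id to vector timestamp, the function names are hypothetical, and the merge is assumed to skip the receiver's own entry because V_P holds only the other N - 1 processes.

```python
def vc_less(ta, tb):
    # ta < tb: every component <=, and the timestamps differ.
    return all(x <= y for x, y in zip(ta, tb)) and tuple(ta) != tuple(tb)


def ses_can_deliver(v_m, dest, t_dest):
    """v_m: vector carried by the message; t_dest: logical time at dest."""
    t = v_m.get(dest)
    if t is None:
        return True               # no pair (dest, t): deliver at once
    return vc_less(t, t_dest)     # otherwise deliver only if t < T_dest


def ses_merge(v_p, v_m, dest):
    """Merge the message's vector into the receiver's vector."""
    merged = dict(v_p)
    for pid, t in v_m.items():
        if pid == dest:
            continue              # assumption: no entry is kept for oneself
        if pid in merged:
            merged[pid] = tuple(max(a, b) for a, b in zip(merged[pid], t))
        else:
            merged[pid] = tuple(t)
    return merged


v_m = {1: (0, 1, 1)}                                # message says: P1 was sent something at (0, 1, 1)
too_early = ses_can_deliver(v_m, 1, (1, 0, 0))      # False: buffer the message
ready = ses_can_deliver(v_m, 1, (1, 1, 1))          # True: deliver it
v_p1 = ses_merge({3: (1, 0, 0)}, v_m, dest=1)       # {3: (1, 0, 0)}
```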

EXAMPLE

e31: P3 sends message m3,1 to P2. C3 = (0, 0, 1); t3,1 = (0, 0, 1), V3,1 = (?, ?, ?); V3 = [ ?, (0, 0, 1), ? ]

e21: P2 receives message m3,1 from P3. As V3,1[2] = (?, ?, ?)[2] is uninitialized, the message is accepted. V2 = [ ?, ?, ? ] and C2 = max[(0, 0, 0), (0, 0, 1)] = (0, 0, 1)

e22: P2 sends message m2,1 to P1. C2 = (0, 1, 1); t2,1 = (0, 1, 1), V2,1 = [ ?, ?, ? ]; V2 = [ (0, 1, 1), ?, ? ]

e11: P1 sends message m1,1 to P3. C1 = (1, 0, 0); t1,1 = (1, 0, 0), V1,1 = ( ?, ?, ? ); V1 = [ ?, ?, (1, 0, 0) ]

e32: P3 receives message m1,1 from P1. As V1,1[3] = ( ?, ?, ? )[3] is uninitialized, the message is accepted. V3 = [ ?, (0, 0, 1), ? ] and C3 = max[(0, 0, 1), (1, 0, 0)] = (1, 0, 1).

e12: P1 receives message m2,1 from P2. As V2,1[1] = [ ?, ?, ? ][1] is uninitialized, the message is accepted. V1 = [ ?, ?, (1, 0, 0) ] and C1 = max[(1, 0, 0), (0, 1, 1)] = (1, 1, 1)

e23: P2 sends message m2,2 to P1. C2 = (0, 2, 1); t2,2 = (0, 2, 1), V2,2 = [ (0, 1, 1), ?, ? ]; V2 = [ (0, 2, 1), ?, ? ]

e13: P1 receives message m2,2 from P2. As V2,2[1] = (0, 1, 1) < (1, 1, 1) = C1, the message is accepted. V1 = [ ?, ?, (1, 0, 0) ] and C1 = max[(0, 2, 1), (1, 1, 1)] = (1, 2, 1)

The same events, but with m2,2 reaching P1 before m2,1:

e31: P3 sends message m3,1 to P2. C3 = (0, 0, 1); t3,1 = (0, 0, 1), V3,1 = (?, ?, ?); V3 = [ ?, (0, 0, 1), ? ]

e21: P2 receives message m3,1 from P3. As V3,1[2] = (?, ?, ?)[2] is uninitialized, the message is accepted. V2 = [ ?, ?, ? ] and C2 = max[(0, 0, 0), (0, 0, 1)] = (0, 0, 1)

e22: P2 sends message m2,1 to P1. C2 = (0, 1, 1); t2,1 = (0, 1, 1), V2,1 = [ ?, ?, ? ]; V2 = [ (0, 1, 1), ?, ? ]

e11: P1 sends message m1,1 to P3. C1 = (1, 0, 0); t1,1 = (1, 0, 0), V1,1 = ( ?, ?, ? ); V1 = [ ?, ?, (1, 0, 0) ]

e32: P3 receives message m1,1 from P1. As V1,1[3] = ( ?, ?, ? )[3] is uninitialized, the message is accepted. V3 = [ ?, (0, 0, 1), ? ] and C3 = max[(0, 0, 1), (1, 0, 0)] = (1, 0, 1).

e23: P2 sends message m2,2 to P1. C2 = (0, 2, 1); t2,2 = (0, 2, 1), V2,2 = [ (0, 1, 1), ?, ? ]; V2 = [ (0, 2, 1), ?, ? ]

e12: P1 receives message m2,2 from P2. But V2,2[1] = (0, 1, 1) ≮ (1, 0, 0) = C1, so the message is queued.

e13: P1 receives message m2,1 from P2. As V2,1[1] = [ ?, ?, ? ][1] is uninitialized, the message is accepted. V1 = [ ?, ?, (1, 0, 0) ] and C1 = max[(1, 0, 0), (0, 1, 1)] = (1, 1, 1).

The message on the queue is now checked. As V2,2[1] = (0, 1, 1) < (1, 1, 1) = C1, the message is now accepted. V1 = [ ?, ?, (1, 0, 0) ] and C1 is set to (1, 2, 1).

PROBLEM OF VECTOR CLOCK

Message size increases, since each message needs to be tagged with the vector.
Size can be reduced in some cases by sending only the values that have changed.

Capturing Global State

GLOBAL STATE COLLECTION

Applications:
Checking stable properties, checkpoint & recovery

Issues:
Need to capture both node and channel states
System cannot be stopped
No global clock

Some notations:
LSi : local state of process i
send(mij) : send event of message mij from process i to process j
rec(mij) : similar, receive instead of send
time(x) : time at which state x was recorded
time(send(m)) : time at which send(m) occurred

send(mij) ∈ LSi iff time(send(mij)) < time(LSi)
rec(mij) ∈ LSj iff time(rec(mij)) < time(LSj)
transit(LSi, LSj) = { mij | send(mij) ∈ LSi and rec(mij) ∉ LSj }
inconsistent(LSi, LSj) = { mij | send(mij) ∉ LSi and rec(mij) ∈ LSj }

Global state: collection of local states, GS = {LS1, LS2, ..., LSn}
GS is consistent iff for all i, j, 1 ≤ i, j ≤ n, inconsistent(LSi, LSj) = ∅
GS is transitless iff for all i, j, 1 ≤ i, j ≤ n, transit(LSi, LSj) = ∅
GS is strongly consistent if it is consistent and transitless.
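These definitions can be checked mechanically on a small made-up trace. In the sketch below, `Msg`, `in_state` and the cut dictionaries are hypothetical names; `sent` and `recv` stand for time(send(m)) and time(rec(m)), and each cut maps a process id to time(LSi).

```python
from collections import namedtuple
from itertools import permutations

# recv is None when the message has not been received at all.
Msg = namedtuple("Msg", "src dst sent recv")


def in_state(event_time, ls_time):
    # An event belongs to LS iff it happened before LS was recorded.
    return event_time is not None and event_time < ls_time


def transit(msgs, ls, i, j):
    # Sent before LSi was recorded, but not received before LSj was recorded.
    return [m for m in msgs if (m.src, m.dst) == (i, j)
            and in_state(m.sent, ls[i]) and not in_state(m.recv, ls[j])]


def inconsistent(msgs, ls, i, j):
    # Received in LSj although the send is not part of LSi.
    return [m for m in msgs if (m.src, m.dst) == (i, j)
            and not in_state(m.sent, ls[i]) and in_state(m.recv, ls[j])]


def is_consistent(msgs, ls):
    return all(not inconsistent(msgs, ls, i, j) for i, j in permutations(ls, 2))


def is_transitless(msgs, ls):
    return all(not transit(msgs, ls, i, j) for i, j in permutations(ls, 2))


msgs = [Msg(src=1, dst=2, sent=2, recv=6)]
cut_a = {1: 3, 2: 4}   # send recorded, receive not: message in transit
cut_b = {1: 1, 2: 7}   # receive recorded without its send: inconsistent
cut_c = {1: 3, 2: 7}   # both recorded: strongly consistent
```

Only `cut_b` is rejected: it records the effect of the message without its cause.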

CHANDY-LAMPORT'S ALGORITHM

Global-State-Detection Algorithm: send a special message called a marker.

Chandy-Lamport Global State Recording Protocol (Snapshot Algorithm)

The goal of this distributed algorithm is to capture a consistent global state. It assumes all communication channels are FIFO. It uses a distinguished message called a marker to start the algorithm.

Pi sends marker:
Pi records its local state.
For each channel Cij on which Pi has not already sent a marker, Pi sends a marker before sending other messages.

CHANDY-LAMPORT ALGO.. CONTD.

Pj receives marker from Pi:

If Pj has not recorded its state:
a) Records the state of Cij as empty
b) Sends the marker as described above (Note: it records its local state before sending out the marker)

If Pj has already recorded its local state LSj:
a) Records the state of Cij to be the sequence of messages received between the computation of LSj and the marker from Cij.
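The marker rules can be exercised in a small single-threaded simulation. Everything here is an assumption made for illustration: three fully connected processes, each holding a number as its local state, FIFO channels modelled as deques, application messages that simply add their value to the receiver, and one message already in flight from P2 to P1 when P1 starts the snapshot.

```python
from collections import deque

MARKER = "MARKER"


class Process:
    def __init__(self, pid, peers, state):
        self.pid = pid
        self.peers = peers            # ids of the other processes
        self.state = state            # application state (a number here)
        self.recorded_state = None    # LS, once recorded
        self.channel_state = {}       # sender id -> messages recorded for C(sender, pid)
        self.recording = set()        # incoming channels still being recorded

    def record_and_send_markers(self, net):
        # Record the local state first, then send a marker on every outgoing channel.
        self.recorded_state = self.state
        self.recording = set(self.peers)
        self.channel_state = {p: [] for p in self.peers}
        for p in self.peers:
            net[(self.pid, p)].append(MARKER)

    def on_message(self, src, msg, net):
        if msg == MARKER:
            if self.recorded_state is None:
                # First marker: record state; C(src, pid) stays empty.
                self.record_and_send_markers(net)
            # In both cases, recording of channel C(src, pid) ends here.
            self.recording.discard(src)
        else:
            if src in self.recording:
                # Arrived after LS was recorded and before the marker.
                self.channel_state[src].append(msg)
            self.state += msg


def run_example():
    ids = [1, 2, 3]
    procs = {i: Process(i, [j for j in ids if j != i], 10) for i in ids}
    net = {(i, j): deque() for i in ids for j in ids if i != j}
    procs[2].state -= 3                     # P2 sends 3 units to P1 ...
    net[(2, 1)].append(3)                   # ... still in transit on C21
    procs[1].record_and_send_markers(net)   # P1 initiates the snapshot
    while any(net.values()):                # deliver until every channel is empty
        for (src, dst), channel in net.items():
            if channel:
                procs[dst].on_message(src, channel.popleft(), net)
    return procs


procs = run_example()
snapshot_total = (sum(p.recorded_state for p in procs.values())
                  + sum(sum(msgs) for p in procs.values()
                        for msgs in p.channel_state.values()))
```

The recorded local states are 10, 7 and 10, the in-flight value 3 is captured as the state of C21, and the snapshot total equals the 30 units the system started with.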

Points to Note:

Markers sent on a channel distinguish messages sent on the channel before the sender recorded its state from the messages sent after the sender recorded its state.

The state collected may not be any state that actually happened in reality, rather a state that could have happened.

Requires FIFO channels.

Network should be strongly connected (works obviously for connected, undirected also).

Message complexity O(|E|), where E = no. of links.


EXAMPLE

Here, all processes are connected by communication channels Cij. Messages being sent over the channels are represented by arrows between the processes.

Snapshot s1:

P1 records LS1, sends markers on C12 and C13.

P2 receives marker from P1 on C12; it records its state LS2, records state of C12 as empty, and sends markers on C21 and C23.

P3 receives marker from P1 on C13; it records its state LS3, records state of C13 as empty, and sends markers on C31 and C32.

P1 receives marker from P2 on C21; as LS1 is recorded, it records the state of C21 as empty.

P1 receives marker from P3 on C31; as LS1 is recorded, it records the state of C31 as empty.

P2 receives marker from P3 on C32; as LS2 is recorded, it records the state of C32 as empty.

P3 receives marker from P2 on C23; as LS3 is recorded, it records the state of C23 as empty.

Snapshot s2: now messages are in transit on C12 and C21.

P1 records LS1, sends markers on C12 and C13.

P2 receives marker from P1 on C12 after the message from P1 arrives; it records its state LS2, records state of C12 as empty, and sends markers on C21 and C23.

P3 receives marker from P1 on C13; it records its state LS3, records state of C13 as empty, and sends markers on C31 and C32.

P1 receives marker from P2 on C21; as LS1 is recorded, and a message has arrived since LS1 was recorded, it records the state of C21 as containing that message.

P1 receives marker from P3 on C31; as LS1 is recorded, it records the state of C31 as empty.

P2 receives marker from P3 on C32; as LS2 is recorded, it records the state of C32 as empty.

P3 receives marker from P2 on C23; as LS3 is recorded, it records the state of C23 as empty.

TERMINATION DETECTION

Protocol

Pi sends a computation message to Pj:
1. Set Wi' and Wj to values such that Wi' + Wj = Wi, Wi' > 0, Wj > 0. (Wi' is the new weight of Pi.)
2. Send B(Wj) to Pj.

Pj receives a computation message B(W) from Pi:
1. Wj = Wj + W
2. If Pj is idle, Pj becomes active.

Pi becomes idle:
1. Send C(Wi) to P0
2. Wi = 0
3. Pi becomes idle

Pi receives a control message C(W):
1. Wi = Wi + W
2. If Wi = 1, the computation has completed.

EXAMPLE

The picture shows a process P0, designated the controlling agent, with W0 = 1. It asks P1 and P2 to do some computation. It sets W1 to 0.2, W2 to 0.3, and W0 to 0.5. P2 in turn asks P3 and P4 to do some computations. It sets W3 to 0.1 and W4 to 0.1, leaving W2 = 0.1.

When P3 terminates, it sends C(W3) = C(0.1) to P2, which changes W2 to 0.1 + 0.1 = 0.2.
When P2 terminates, it sends C(W2) = C(0.2) to P0, which changes W0 to 0.5 + 0.2 = 0.7.
When P4 terminates, it sends C(W4) = C(0.1) to P0, which changes W0 to 0.7 + 0.1 = 0.8.
When P1 terminates, it sends C(W1) = C(0.2) to P0, which changes W0 to 0.8 + 0.2 = 1.

P0 thereupon concludes that the computation is finished. Total number of messages passed: 8 (one to start each computation, one to return the weight).
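The weight bookkeeping of this example can be replayed in Python. The class `WeightedProcess` and its methods are hypothetical, messages are modelled as direct method calls, and exact fractions are used so that the final test W0 = 1 is not disturbed by floating-point rounding. The `collector` argument is an assumption that lets P3 return its weight to P2, as the example does.

```python
from fractions import Fraction


class WeightedProcess:
    def __init__(self, name, weight=Fraction(0)):
        self.name = name
        self.weight = weight
        self.active = weight > 0

    def send_computation(self, other, share):
        # Split the weight: new Wi + Wj = old Wi, with both parts positive.
        assert 0 < share < self.weight
        self.weight -= share
        # Receipt of B(share) at the other process: add weight, become active.
        other.weight += share
        other.active = True

    def become_idle(self, collector):
        # Send C(Wi) to the collector, then drop to weight 0.
        collector.weight += self.weight
        self.weight = Fraction(0)
        self.active = False


tenth = Fraction(1, 10)
p0 = WeightedProcess("P0", Fraction(1))          # controlling agent
p1, p2, p3, p4 = (WeightedProcess(n) for n in ("P1", "P2", "P3", "P4"))

p0.send_computation(p1, 2 * tenth)               # W1 = 0.2
p0.send_computation(p2, 3 * tenth)               # W2 = 0.3, W0 = 0.5
p2.send_computation(p3, tenth)                   # W3 = 0.1
p2.send_computation(p4, tenth)                   # W4 = 0.1, W2 = 0.1

p3.become_idle(p2)                               # W2 = 0.2
p2.become_idle(p0)                               # W0 = 0.7
p4.become_idle(p0)                               # W0 = 0.8
w0_before_last = p0.weight
p1.become_idle(p0)                               # W0 = 1
terminated = p0.weight == 1
```

The total weight in the system stays at 1 throughout, so W0 returning to 1 means no other process holds any weight and none is in transit.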
