0% found this document useful (0 votes)
10 views

Distributed System Lecture 5

The lecture discusses the challenges of recording the global state of distributed systems due to the absence of shared memory and global clocks. It introduces concepts like consistent global states, transitless states, and snapshot algorithms, particularly focusing on the Chandy-Lamport algorithm for FIFO channels. The algorithm allows for the collection of consistent global states by using marker messages to distinguish between messages that should and should not be included in the snapshot.

Uploaded by

vaishg2023
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
10 views

Distributed System Lecture 5

The lecture discusses the challenges of recording the global state of distributed systems due to the absence of shared memory and global clocks. It introduces concepts like consistent global states, transitless states, and snapshot algorithms, particularly focusing on the Chandy-Lamport algorithm for FIFO channels. The algorithm allows for the collection of consistent global states by using marker messages to distinguish between messages that should and should not be included in the snapshot.

Uploaded by

vaishg2023
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 24

LECTURE 5

GLOBAL STATE AND CHECKPOINTS

Prof. D. S. Yadav
Department of Computer Science 1
IET Lucknow
Distributed Computing: Principles, Algorithms, and Systems

INTRODUCTION

 Recording global state of a distributed system on-the-fly is an


important step.
 The lack of globally shared memory, global clock and
unpredictable message delays in a distributed system make
this problem non-trivial.
 We first defines consistent global states and discusses issues
to be addressed to compute consistent distributed snapshots.
 Then several algorithms to determine state on-the-fly, such
snapshots are presented for several types of networks.

2
Distributed Computing: Principles, Algorithms, and Systems

BANKING EXAMPLE : FUND TRANSFER

C1 : Empty

500 200

C2 : Empty

Site S1 Site S2
Account A Account B
Global State A

3
Distributed Computing: Principles, Algorithms, and Systems

BANKING EXAMPLE : FUND TRANSFER

C1 : 50 ( In Transit)

450 200

C2 : Empty

Site S1 Site S2
Account A Account B
Global State B

4
Distributed Computing: Principles, Algorithms, and Systems

BANKING EXAMPLE : FUND TRANSFER

C1 : Empty

450 250

C2 : Empty

Site S1 Site S2
Account A Account B
Global State C

5
SNAPSHOT ALGORITHM: BANKINGEXAMPLE
SNAPSHOT ALGORITHM: BANKINGEXAMPLE
Distributed Computing: Principles, Algorithms, and Systems

SYSTEM MODEL
The system consists of a collection of n processes p1, p2, ..., pn that
are connected by channels.
There are no globally shared memory and physical global clock and
processes communicate by passing messages through communication
channels.
Cij denotes the channel from process pi to process pj and its state is
denoted by SCij .
The actions performed by a process are modeled as three types of
events: Internal events, the message send event and the message
receive event.
For a message mij that is sent by process pi to process pj , let
send (mij ) and rec (mij ) denote its send and receive events.

8
Distributed Computing: Principles, Algorithms, and Systems

SYSTEM MODEL

At any instant, the state of process pi , denoted by LSi , is a result of the


sequence of all the events executed by pi till that instant.
For an event e and a process state LSi , e LSi iff e belongs to the
sequence of events that have taken process pi to state LSi .
For an event e and a process state LSi , e LSi iff e does not belong
to the sequence of events that have taken process pi to state LSi .
For a channel Cij , the following set of messages can be defined based on
the local states of the processes pi and pj

Transit: transit(LSi , LSj ) = {mij |send (mij ) LSi & rec (mij ) LSj }

9
GLOBAL STATE COLLECTION
 Applications:
– Checking “stable” properties, checkpoint & recovery
 Issues:
– Need to capture both node and channel states
– system cannot be stopped
– no global clock

Some notations:
– LSi: Local state of process I
– send(mij) : Send event of message mij from process i to process j
– rec(mij) : Similar, receive instead of send
– time(x) : Time at which state x was recorded
– time (send(m)) : Time at which send(m) occurred

10
DEFINITIONS

 send(mij) LSi iff time(send(mij)) < time(LSi)

 rec(mij) LSj iff time(rec(mij)) < time(LSj)

 transit(LSi, LSj) = { mij | send(mij) LSi & rec(mij)  LSj }

 inconsistent(LSi, LSj) =
{ mij | send(mij)  LSi & rec(mij) LSj }

11
DEFINITIONS
 Global state: A Global State is defined as collection of local
states of its sites.
GS = {LS1, LS2,…, LSn}

 Global State is known as consistent global state iff


for all i, j, 1 ≤ i, j ≤ n,
inconsistent(LSi, LSj) = Ф

 Global State is known as transitless global state iff


for all i, j, 1 ≤ i, j ≤ n,
transit(LSi, LSj) = Ф

 Global state is known as strongly consistent if it is both


consistent and transitless.

12
Distributed Computing: Principles, Algorithms, and Systems

MODELS OF COMMUNICATION
Recall, there are three models of communication: FIFO, non-
FIFO, and CO.
In FIFO model, each channel acts as a first-in first-out
message queue and thus, message ordering is preserved by a
channel.
In non-FIFO model, a channel acts like a set in which the
sender process adds messages and the receiver process removes
messages from it in a random order.
A system that supports causal delivery of messages satisfies
the following property:
“For any two messages mij and mkj ,
if send (mij ) → send (mkj ), then rec (mij ) → rec (mkj )”.

13
Distributed Computing: Principles, Algorithms, and Systems

CONSISTENT GLOBAL STATE

The global state of a distributed system is a collection of the local states


of the processes and the channels.
Notationally, global state GS is defined as,
GS = { Ui LSi , Ui,j SCij }

A global state GS is a consistent global state iff it satisfies the following


two conditions :
C1: send(mij ) LSi mij SCij rec(mij ) LSj .
( is Ex-OR operator.)
C2: send(mij ) LSi mij SCij rec(mij ) LSj .

14
Distributed Computing: Principles, Algorithms, and Systems

INTERPRETATION IN TERMS OF CUTS


A cut in a space-time diagram is a line joining an arbitrary
point on each process line that slices the space-time diagram
into a PAST and a FUTURE.
A consistent global state corresponds to a cut in which every
message received in the PAST of the cut was sent in the PAST of
that cut.
Such a cut is known as a consistent cut.

For example, consider the space-time diagram for the


computation illustrated in Figure 1.
Cut C1 is inconsistent because message m1 is flowing from the
FUTURE to the PAST.
Cut C2 is consistent and message m4 must be captured in the
state of channel C21.

15
Distributed Computing: Principles, Algorithms, and Systems

C1 C2
1 2 e3 e4
e1 e1 1 1
p
1 m1
4
e12 e22 e23 e2 m
m5
p2 4
m2
e1 e 23 e 33 e34 e35
p3 3

m3
e1 e2
4 4
p4

time

Figure 1: An Interpretation in Terms of a Cut.

16
Distributed Computing: Principles, Algorithms, and Systems

ISSUES IN RECORDING A GLOBALSTATE


The following two issues need to be addressed:
I1: How to distinguish between the messages to be recorded in
the snapshot from those not to be recorded.

-Any message that is sent by a process before recording its


snapshot, must be recorded in the global snapshot (from C1).
-Any message that is sent by a process after recording its
snapshot,
must not be recorded in the global snapshot (from C2).
I2: How to determine the instant when a process takes its
snapshot.
- A process pj must record its snapshot before processing a
message mij that was sent by process pi after recording its snapshot.

17
Distributed Computing: Principles, Algorithms, and Systems

SNAPSHOT ALGORITHMS FOR FIFO CHANNELS


Chandy-Lamport algorithm

The Chandy-Lamport algorithm uses a control message, called a


marker whose role in a FIFO system is to separate messages in the
channels.

After a site has recorded its snapshot, it sends a marker, along all
of its outgoing channels before sending out any more messages.

A marker separates the messages in the channel into those to be


included in the snapshot from those not to be recorded in the
snapshot.

A process must record its snapshot no later than when it receives a


marker on any of its incoming channels.
18
CHANDY-LAMPORT’S ALGORITHM

• Uses special marker messages.

• One process acts as initiator, starts the state collection


by following the marker sending rule below.

• Marker sending rule for process P:


– P records its state and
– For each outgoing channel C from P on which a marker has not
been sent already, P sends a marker along C before any further
message is sent on C

19
CHANDY LAMPORT’S ALGORITHM CONTD..

• When Q receives a marker along a channel C:

– If Q has not recorded its state then Q records the


state of C as empty; Q then follows the marker
sending rule.

– If Q has already recorded its state, it records the state


of C as the sequence of messages received along C
after Q’s state was recorded and before Q received
the marker along C

20
NOTABLE POINTS

• Markers sent on a channel distinguish messages sent on


the channel before the sender recorded its states and the
messages sent after the sender recorded its state
• The state collected may not be any state that actually
happened in reality, rather a state that “could have”
happened

• Requires FIFO channels


• Message complexity O(|E|), where E = no. of links

21
Distributed Computing: Principles, Algorithms, and Systems

CHANDY-LAMPORT ALGORITHM
The algorithm can be initiated by any process by executing the
“Marker Sending Rule” by which it records its local state and
sends a marker on each outgoing channel.
A process executes the “Marker Receiving Rule” on receiving a
marker. If the process has not yet recorded its local state, it
records the state of the channel on which the marker is received
as empty and executes the “Marker Sending Rule” to record its
local state.
The algorithm terminates after each process has received a
marker on all of its incoming channels.
All the local snapshots get disseminated to all other processes
and all the processes can determine the global state.

22
Distributed Computing: Principles, Algorithms, and Systems

CORRECTNESS ANDCOMPLEXITY

Correctness
Due to FIFO property of channels, it follows that no message sent
after the marker on that channel is recorded in the channel state.
Thus, condition C2 is satisfied.
When a process pj receives message mij that precedes the marker on
channel Cij , it acts as follows:
 If process pj has not taken its snapshot yet, then it includes
mij in its recorded snapshot.
 Otherwise, it records mij in the state of the channel Cij . Thus,
condition C1 is satisfied.
Complexity
The recording part of a single instance of the algorithm requires
O(e) messages and O(d ) time, where e is the number of edges in the
network and d is the diameter of the network.

23

You might also like