Chapter 14 Slides

Chapter 14 discusses the concept of logical time and causality in distributed systems, emphasizing the importance of tracking event precedence without a global physical time. It introduces systems of logical clocks, including scalar and vector clocks, detailing their implementation and properties, such as consistency and total ordering. The chapter concludes with the challenges of efficiently implementing vector clocks in large distributed systems.

Uploaded by

sarkarmayuk2002

Slides for Chapter 14:

Time and Global States

From Coulouris, Dollimore, Kindberg and Blair
Distributed Systems: Concepts and Design
Edition 5, © Addison-Wesley 2012
Distributed Computing: Principles, Algorithms, and Systems

Introduction

The concept of causality between events is fundamental to the design and analysis of parallel and distributed computing and operating systems.
Usually causality is tracked using physical time.
In distributed systems, it is not possible to have a global physical time.
As asynchronous distributed computations make progress in spurts, the logical time is sufficient to capture the fundamental monotonicity property associated with causality in distributed systems.

A. Kshemkalyani and M. Singhal (Distributed Comput
Logical Time CUP 2008 2 / 67


Introduction

Causality among events in a distributed system is a powerful concept in reasoning, analyzing, and drawing inferences about a computation.
The knowledge of the causal precedence relation among the events of processes helps solve a variety of problems in distributed systems, such as distributed algorithms design, tracking of dependent events, knowledge about the progress of a computation, and concurrency measures.


A Framework for a System of Logical Clocks

Definition
A system of logical clocks consists of a time domain T and a logical clock C.
Elements of T form a partially ordered set over a relation <.
Relation < is called the happened before or causal precedence relation. Intuitively, this relation is analogous to the earlier than relation provided by physical time.
The logical clock C is a function that maps an event e in a distributed system to an element in the time domain T, denoted as C(e) and called the timestamp of e, and is defined as follows:

C : H → T (H is the set of events in the computation)

such that the following property is satisfied:
for two events ei and ej, ei → ej ⇒ C(ei) < C(ej).


A Framework for a System of Logical Clocks

This monotonicity property is called the clock consistency condition.
When T and C satisfy the following condition,
for two events ei and ej, ei → ej ⇔ C(ei) < C(ej),
the system of clocks is said to be strongly consistent.

Implementing Logical Clocks

Implementation of logical clocks requires addressing two issues: data structures local to every process to represent logical time and a protocol to update the data structures to ensure the consistency condition.
Each process pi maintains data structures that give it the following two capabilities:
◮ A local logical clock, denoted by lci, that helps process pi measure its own progress.
◮ A logical global clock, denoted by gci, that is a representation of process pi's local view of the logical global time. Typically, lci is a part of gci.
The protocol ensures that a process's logical clock, and thus its view of the global time, is managed consistently. The protocol consists of the following two rules:
R1: This rule governs how the local logical clock is updated by a process when it executes an event.
R2: This rule governs how a process updates its global logical clock to update its view of the global time and global progress.
Systems of logical clocks differ in their representation of logical time and also in the protocol to update the logical clocks.


Scalar Time

Proposed by Lamport in 1978 as an attempt to totally order events in a distributed system.
Time domain is the set of non-negative integers.
The logical local clock of a process pi and its local view of the global time are squashed into one integer variable Ci.
Rules R1 and R2 to update the clocks are as follows:
R1: Before executing an event (send, receive, or internal), process pi executes the following:
Ci := Ci + d (d > 0)
In general, every time R1 is executed, d can have a different value; however, typically d is kept at 1.
R2: Each message piggybacks the clock value of its sender at sending time. When a process pi receives a message with timestamp Cmsg, it executes the following actions:
◮ Ci := max(Ci, Cmsg)
◮ Execute R1.
◮ Deliver the message.

Figure 3.1 shows evolution of scalar time.
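Rules R1 and R2 for scalar time can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the class name ScalarClock and its method names are assumptions, and message transport is reduced to passing the piggybacked integer by hand.

```python
class ScalarClock:
    """Scalar (Lamport) clock of one process; names are illustrative."""

    def __init__(self, d=1):
        self.time = 0   # Ci, initially 0
        self.d = d      # increment used by R1, must be > 0

    def tick(self):
        """R1: advance the clock before executing any event."""
        self.time += self.d
        return self.time

    def send(self):
        """A send is an event: apply R1, then piggyback the clock value."""
        return self.tick()

    def receive(self, c_msg):
        """R2: take the max with the piggybacked value, then apply R1."""
        self.time = max(self.time, c_msg)
        return self.tick()


p1, p2 = ScalarClock(), ScalarClock()
p1.tick()                      # internal event at p1, clock becomes 1
stamp = p1.send()              # send event at p1, clock becomes 2
p2.tick()                      # internal event at p2, clock becomes 1
recv_time = p2.receive(stamp)  # max(1, 2) + 1 = 3
```

The receive event gets a timestamp larger than the send event, which is the consistency condition for that pair of events.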


Scalar Time
Evolution of scalar time:

[Space-time diagram of processes p1, p2 and p3 with scalar timestamps on the events; event b is marked.]

Figure 3.1: The space-time diagram of a distributed execution.


Basic Properties

Consistency Property
Scalar clocks satisfy the monotonicity and hence the consistency property: for two events ei and ej, ei → ej ⇒ C(ei) < C(ej).
Total Ordering
Scalar clocks can be used to totally order events in a distributed system.
The main problem in totally ordering events is that two or more events at different processes may have identical timestamps.
For example, in Figure 3.1, the third event of process P1 and the second event of process P2 have identical scalar timestamps.
Total Ordering

A tie-breaking mechanism is needed to order such events. A tie is broken as follows:
Process identifiers are linearly ordered, and a tie among events with identical scalar timestamps is broken on the basis of their process identifiers.
The lower the process identifier in the ranking, the higher the priority.
The timestamp of an event is denoted by a tuple (t, i), where t is its time of occurrence and i is the identity of the process where it occurred.
The total order relation ≺ on two events x and y with timestamps (h, i) and (k, j), respectively, is defined as follows:

x ≺ y ⇔ (h < k or (h = k and i < j))
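The relation ≺ is a lexicographic comparison of (time, process id) tuples. A small sketch follows; the function name precedes and the sample timestamps are made up for illustration.

```python
def precedes(x, y):
    """x ≺ y for timestamps x = (h, i) and y = (k, j)."""
    (h, i), (k, j) = x, y
    return h < k or (h == k and i < j)


# Hypothetical timestamps (scalar time, process id).
events = [(3, 2), (1, 3), (3, 1), (2, 1)]

# Python compares tuples lexicographically, which is the same rule.
ordered = sorted(events)
```

The two events with scalar time 3 are ordered by process identifier, so (3, 1) comes before (3, 2).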
Properties...

Event counting
If the increment value d is always 1, the scalar time has the following interesting property: if event e has a timestamp h, then h-1 represents the minimum logical duration, counted in units of events, required before producing the event e.
We call it the height of the event e.
In other words, h-1 events have been produced sequentially before the event e, regardless of the processes that produced these events.
For example, in Figure 3.1, five events precede event b on the longest causal path ending at b.
Properties...

No Strong Consistency
The system of scalar clocks is not strongly consistent; that is, for two events ei and ej, C(ei) < C(ej) ⇏ ei → ej.
For example, in Figure 3.1, the third event of process P1 has a smaller scalar timestamp than the third event of process P2. However, the former did not happen before the latter.
The reason that scalar clocks are not strongly consistent is that the logical local clock and logical global clock of a process are squashed into one, resulting in the loss of causal dependency information among events at different processes.
For example, in Figure 3.1, when process P2 receives the first message from process P1, it updates its clock to 3, forgetting that the timestamp of the latest event at P1 on which it depends is 2.
Vector Time

The system of vector clocks was developed independently by Fidge, Mattern and Schmuck.
In the system of vector clocks, the time domain is represented by a set of n-dimensional non-negative integer vectors.
Each process pi maintains a vector vti[1..n], where vti[i] is the local logical clock of pi and describes the logical time progress at process pi.
vti[j] represents process pi's latest knowledge of process pj's local time.
If vti[j] = x, then process pi knows that local time at process pj has progressed till x.
The entire vector vti constitutes pi's view of the global logical time and is used to timestamp events.
Vector Time

Process pi uses the following two rules R1 and R2 to update its clock:
R1: Before executing an event, process pi updates its local logical time as follows:
vti[i] := vti[i] + d (d > 0)
R2: Each message m is piggybacked with the vector clock vt of the sender process at sending time. On the receipt of such a message (m, vt), process pi executes the following sequence of actions:
◮ Update its global logical time as follows:
1 ≤ k ≤ n : vti[k] := max(vti[k], vt[k])
◮ Execute R1.
◮ Deliver the message m.
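The vector clock rules can be sketched the same way as the scalar ones. This is an illustrative sketch under stated assumptions: the class name VectorClock is hypothetical, process indices are 0-based rather than 1..n, and the message is passed by hand.

```python
class VectorClock:
    """Vector clock of process i in a system of n processes (0-based i)."""

    def __init__(self, n, i, d=1):
        self.vt = [0] * n   # initially [0, 0, ..., 0]
        self.i = i
        self.d = d          # increment used by R1, must be > 0

    def tick(self):
        """R1: advance the local component before executing an event."""
        self.vt[self.i] += self.d
        return list(self.vt)   # copy, used as the event's timestamp

    def send(self):
        """A send is an event: apply R1, then piggyback the whole vector."""
        return self.tick()

    def receive(self, vt_msg):
        """R2: component-wise max with the piggybacked vector, then R1."""
        self.vt = [max(a, b) for a, b in zip(self.vt, vt_msg)]
        return self.tick()


p1, p2, p3 = (VectorClock(3, i) for i in range(3))
m = p1.send()            # timestamp [1, 0, 0]
p2.tick()                # timestamp [0, 1, 0]
after = p2.receive(m)    # max gives [1, 1, 0], then R1 gives [1, 2, 0]
```

Unlike the scalar clock, the receiver keeps the sender's component separately, so the dependency on p1's first event is not lost.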
Vector Time

The timestamp of an event is the value of the vector clock of its process when the event is executed.
Figure 3.2 shows an example of vector clocks progress with the increment value d=1.
Initially, a vector clock is [0, 0, 0, ..., 0].
Vector Time
An Example of Vector Clocks

[Space-time diagram of processes p1, p2 and p3 with a three-component vector timestamp on each event.]

Figure 3.2: Evolution of vector time.
Vector Time
Comparing Vector Timestamps

The following relations are defined to compare two vector timestamps, vh and vk:
vh = vk ⇔ ∀x : vh[x] = vk[x]
vh ≤ vk ⇔ ∀x : vh[x] ≤ vk[x]
vh < vk ⇔ vh ≤ vk and ∃x : vh[x] < vk[x]
vh ǁ vk ⇔ ¬(vh < vk) ∧ ¬(vk < vh)

If the process at which an event occurred is known, the test to compare two timestamps can be simplified as follows: if events x and y occurred at processes pi and pj and are assigned timestamps vh and vk, respectively, then

x → y ⇔ vh[i] ≤ vk[i]
x ǁ y ⇔ vh[i] > vk[i] ∧ vh[j] < vk[j]
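The comparison relations translate directly into code. The sketch below uses hypothetical function names and 0-based process indices; the simplified test assumes x and y are distinct events.

```python
def vt_leq(vh, vk):
    """vh ≤ vk: every component of vh is ≤ the matching one of vk."""
    return all(a <= b for a, b in zip(vh, vk))


def vt_less(vh, vk):
    """vh < vk: vh ≤ vk and at least one component is strictly smaller."""
    return vt_leq(vh, vk) and any(a < b for a, b in zip(vh, vk))


def vt_concurrent(vh, vk):
    """vh ǁ vk: neither vector is less than the other."""
    return not vt_less(vh, vk) and not vt_less(vk, vh)


def happened_before(vh, i, vk):
    """Simplified test for x → y when x occurred at process i (0-based)."""
    return vh[i] <= vk[i]


send_ts = [1, 0, 0]   # event at process 0
recv_ts = [1, 2, 0]   # event at process 1 that depends on it
other_ts = [0, 0, 1]  # unrelated event at process 2
```

With these sample timestamps, send_ts is less than recv_ts, while other_ts is concurrent with both.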
Vector Time
Properties of Vector Time

Isomorphism
If events in a distributed system are timestamped using a system of vector clocks, we have the following property.
If two events x and y have timestamps vh and vk, respectively, then

x → y ⇔ vh < vk
x ǁ y ⇔ vh ǁ vk

Thus, there is an isomorphism between the set of partially ordered events produced by a distributed computation and their vector timestamps.
Vector Time

Strong Consistency
The system of vector clocks is strongly consistent; thus, by examining the vector timestamps of two events, we can determine if the events are causally related.
However, Charron-Bost showed that the dimension of vector clocks cannot be less than n, the total number of processes in the distributed computation, for this property to hold.

Event Counting
If d=1 (in rule R1), then the i-th component of the vector clock at process pi, vti[i], denotes the number of events that have occurred at pi until that instant.
So, if an event e has timestamp vh, vh[j] denotes the number of events executed by process pj that causally precede e. Clearly, Σj vh[j] − 1 represents the total number of events that causally precede e in the distributed computation.
Efficient Implementations of Vector Clocks

If the number of processes in a distributed computation is large, then vector clocks will require piggybacking of a huge amount of information in messages.
The message overhead grows linearly with the number of processors in the system, and when there are thousands of processors in the system, the message size becomes huge even if there are only a few events occurring in a few processors.
We discuss an efficient way to maintain vector clocks.
Charron-Bost showed that if vector clocks have to satisfy the strong consistency property, then in general vector timestamps must be at least of size n, the total number of processes.
However, optimizations are possible, and next we discuss a technique to implement vector clocks efficiently.
Figure 14.1
Skew between computer clocks in a distributed system

Instructor’s Guide for Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design Edn. 5
© Pearson Education 2012
Figure 14.2
Clock synchronization using a time server
[Diagram labels: messages mr and mt between process p and time server S.]

Figure 14.3
An example synchronization subnet in an NTP implementation
Note: Arrows denote synchronization control, numbers denote strata.

Figure 14.4
Messages exchanged between a pair of NTP peers
[Diagram labels: Server B timeline with Ti-2 and Ti-1; Server A timeline with Ti-3 and Ti; messages m and m'.]

Figure 14.5
Events occurring at three processes

Figure 14.6
Lamport timestamps for the events shown in Figure 14.5

Figure 14.7
Vector timestamps for the events shown in Figure 14.5

Figure 14.8
Detecting global properties

Figure 14.9
Cuts
[Diagram labels: four events at p1 and three events at p2 along a physical time axis; messages m1 and m2; one inconsistent cut and one consistent cut.]
Global State and Snapshot Recording Algorithms

Recording the global state of a distributed system on-the-fly is an important paradigm.
The lack of globally shared memory, global clock and unpredictable message delays in a distributed system make this problem non-trivial.
System model

The system consists of a collection of n processes p1, p2, ..., pn that are connected by channels.
There is no globally shared memory and no physical global clock; processes communicate by passing messages through communication channels.
Cij denotes the channel from process pi to process pj, and its state is denoted by SCij.
The actions performed by a process are modeled as three types of events: internal events, message send events, and message receive events.
For a message mij that is sent by process pi to process pj, let send(mij) and rec(mij) denote its send and receive events.
System model

At any instant, the state of process pi, denoted by LSi, is a result of the sequence of all the events executed by pi till that instant.
For an event e and a process state LSi, e ∈ LSi iff e belongs to the sequence of events that have taken process pi to state LSi.
For an event e and a process state LSi, e ∉ LSi iff e does not belong to the sequence of events that have taken process pi to state LSi.
For a channel Cij, the following set of messages can be defined based on the local states of the processes pi and pj:

Transit: transit(LSi, LSj) = {mij | send(mij) ∈ LSi ∧ rec(mij) ∉ LSj}
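The transit set can be computed directly once local states are written down as event sequences. In this sketch the representation is an assumption: each local state is a list of (kind, message id) pairs, and every send in ls_i and every receive in ls_j is taken to be on channel Cij.

```python
def transit(ls_i, ls_j):
    """Messages sent according to LSi but not received according to LSj."""
    sent = {m for kind, m in ls_i if kind == "send"}
    received = {m for kind, m in ls_j if kind == "rec"}
    return sent - received


# Hypothetical local states: pi has sent m1 and m2, pj has received only m1.
ls_i = [("send", "m1"), ("internal", None), ("send", "m2")]
ls_j = [("rec", "m1")]

in_transit = transit(ls_i, ls_j)
```

Here m2 is the only message in transit on the channel for this pair of local states.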
Models of communication

Recall, there are three models of communication: FIFO, non-FIFO, and CO (causal ordering).
In the FIFO model, each channel acts as a first-in first-out message queue and thus, message ordering is preserved by a channel.
In the non-FIFO model, a channel acts like a set in which the sender process adds messages and the receiver process removes messages from it in a random order.
A system that supports causal delivery of messages satisfies the following property: “For any two messages mij and mkj, if send(mij) → send(mkj), then rec(mij) → rec(mkj)”.
Consistent global state

The global state of a distributed system is a collection of the local states of the processes and the channels.
Notationally, global state GS is defined as
GS = { ∪i LSi, ∪i,j SCij }
A global state GS is a consistent global state iff it satisfies the following two conditions:
C1: send(mij) ∈ LSi ⇒ mij ∈ SCij ⊕ rec(mij) ∈ LSj. (⊕ is the Ex-OR operator.)
C2: send(mij) ∉ LSi ⇒ mij ∉ SCij ∧ rec(mij) ∉ LSj.
Interpretation in terms of cuts

A cut in a space-time diagram is a line joining an arbitrary point on each process line that slices the space-time diagram into a PAST and a FUTURE.
A consistent global state corresponds to a cut in which every message received in the PAST of the cut was sent in the PAST of that cut.
Such a cut is known as a consistent cut.
For example, consider the space-time diagram for the computation illustrated in Figure 4.1.
Cut C1 is inconsistent because message m1 is flowing from the FUTURE to the PAST.
Cut C2 is consistent and message m4 must be captured in the state of channel C21.
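The consistent cut condition can be checked mechanically. The following sketch makes several assumptions that are not in the slides: a cut is a mapping from process name to the number of its events that lie in the PAST, and each message is a tuple giving the sender, the 1-based index of the send event, the receiver, and the 1-based index of the receive event.

```python
def is_consistent_cut(cut, messages):
    """True iff every message received in the PAST was also sent in the PAST."""
    for sender, send_idx, receiver, recv_idx in messages:
        received_in_past = recv_idx <= cut[receiver]
        sent_in_past = send_idx <= cut[sender]
        if received_in_past and not sent_in_past:
            return False   # message flows from the FUTURE to the PAST
    return True


# Hypothetical computation: p1's 2nd event sends a message that
# p2 receives as its 1st event.
messages = [("p1", 2, "p2", 1)]

bad_cut = {"p1": 1, "p2": 1}      # receive is in the PAST, send is not
good_cut = {"p1": 2, "p2": 1}     # send and receive both in the PAST
transit_cut = {"p1": 2, "p2": 0}  # sent but not received: in the channel
```

A message that is sent in the PAST and received in the FUTURE does not break consistency; it belongs to the channel state, as with m4 in the example above.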
[Space-time diagram of processes p1, p2, p3 and p4 along a time axis, with messages m1 to m5 and cuts C1 and C2.]

Figure 4.1: An Interpretation in Terms of a Cut.
Issues in recording a global state

The following two issues need to be addressed:
I1: How to distinguish between the messages to be recorded in the snapshot from those not to be recorded.
◮ Any message that is sent by a process before recording its snapshot must be recorded in the global snapshot (from C1).
◮ Any message that is sent by a process after recording its snapshot must not be recorded in the global snapshot (from C2).
I2: How to determine the instant when a process takes its snapshot.
◮ A process pj must record its snapshot before processing a message mij that was sent by process pi after recording its snapshot.
Snapshot algorithms for FIFO channels

Chandy-Lamport algorithm
The Chandy-Lamport algorithm uses a control message, called a marker, whose role in a FIFO system is to separate messages in the channels.
After a site has recorded its snapshot, it sends a marker along all of its outgoing channels before sending out any more messages.
A marker separates the messages in the channel into those to be included in the snapshot from those not to be recorded in the snapshot.
A process must record its snapshot no later than when it receives a marker on any of its incoming channels.
Chandy-Lamport algorithm

The algorithm can be initiated by any process by executing the “Marker Sending Rule” by which it records its local state and sends a marker on each outgoing channel.
A process executes the “Marker Receiving Rule” on receiving a marker. If the process has not yet recorded its local state, it records the state of the channel on which the marker is received as empty and executes the “Marker Sending Rule” to record its local state.
The algorithm terminates after each process has received a marker on all of its incoming channels.
All the local snapshots get disseminated to all other processes and all the processes can determine the global state.
Chandy-Lamport algorithm

Marker Sending Rule for process i
1 Process i records its state.
2 For each outgoing channel C on which a marker has not been sent, i sends a marker along C before i sends further messages along C.

Marker Receiving Rule for process j
On receiving a marker along channel C:
  if j has not recorded its state then
    Record the state of C as the empty set
    Follow the “Marker Sending Rule”
  else
    Record the state of C as the set of messages received along C after j’s state was recorded and before j received the marker along C
Figure 14.10
Chandy and Lamport’s ‘snapshot’ algorithm

Marker receiving rule for process pi
  On pi’s receipt of a marker message over channel c:
    if (pi has not yet recorded its state) it
      records its process state now;
      records the state of c as the empty set;
      turns on recording of messages arriving over other incoming channels;
    else
      pi records the state of c as the set of messages it has received over c since it saved its state.
    end if
Marker sending rule for process pi
  After pi has recorded its state, for each outgoing channel c:
    pi sends one marker message over c (before it sends any other message over c).
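The two marker rules can be exercised in a small single-threaded simulation. Everything below is an illustrative sketch rather than the authors' code: the class names, the MARKER sentinel, the use of an integer balance as process state, and the deque-based FIFO channels are all assumptions, and message delivery is driven by hand.

```python
from collections import deque

MARKER = "MARKER"   # hypothetical sentinel for the control message


class Process:
    def __init__(self, state):
        self.state = state          # application state (an integer balance)
        self.recorded_state = None  # local snapshot, None until recorded
        self.channel_state = {}     # sender name -> messages recorded
        self.recording = set()      # incoming channels still being recorded
        self.incoming = []          # names of processes that send to us
        self.outgoing = []          # names of processes we send to


class Network:
    def __init__(self):
        self.procs = {}
        self.channels = {}          # (src, dst) -> FIFO queue

    def add_process(self, name, state):
        self.procs[name] = Process(state)

    def add_channel(self, src, dst):
        self.channels[(src, dst)] = deque()
        self.procs[src].outgoing.append(dst)
        self.procs[dst].incoming.append(src)

    def send(self, src, dst, amount):
        """Application message: transfer `amount` from src to dst."""
        self.procs[src].state -= amount
        self.channels[(src, dst)].append(amount)

    def record_and_send_markers(self, name):
        """Marker sending rule: record state, then a marker on each channel."""
        p = self.procs[name]
        p.recorded_state = p.state
        p.recording = set(p.incoming)
        p.channel_state = {src: [] for src in p.incoming}
        for dst in p.outgoing:
            self.channels[(name, dst)].append(MARKER)

    def deliver(self, src, dst):
        """Deliver the head of channel src -> dst (marker receiving rule)."""
        msg = self.channels[(src, dst)].popleft()
        p = self.procs[dst]
        if msg == MARKER:
            if p.recorded_state is None:
                self.record_and_send_markers(dst)
            p.recording.discard(src)  # state of this channel is now final
        else:
            p.state += msg
            if src in p.recording:    # arrived after snapshot, before marker
                p.channel_state[src].append(msg)


net = Network()
net.add_process("A", 100)
net.add_process("B", 50)
net.add_channel("A", "B")
net.add_channel("B", "A")

net.send("A", "B", 10)             # A = 90, message in channel A -> B
net.record_and_send_markers("A")   # A initiates: records 90
net.send("B", "A", 20)             # B = 30, message in channel B -> A
net.deliver("A", "B")              # B receives 10, B = 40
net.deliver("A", "B")              # B receives marker: records 40
net.deliver("B", "A")              # A receives 20 while recording B -> A
net.deliver("B", "A")              # A receives marker: recording stops

a, b = net.procs["A"], net.procs["B"]
total = a.recorded_state + b.recorded_state + sum(a.channel_state["B"])
```

In this run the recorded states are 90 and 40 with 20 in channel B to A, so the recorded total matches the initial 150 even though the system never held exactly that combination of values at one instant.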
Correctness and Complexity

Correctness
Due to the FIFO property of channels, it follows that no message sent after the marker on that channel is recorded in the channel state. Thus, condition C2 is satisfied.
When a process pj receives message mij that precedes the marker on channel Cij, it acts as follows: If process pj has not taken its snapshot yet, then it includes mij in its recorded snapshot. Otherwise, it records mij in the state of the channel Cij. Thus, condition C1 is satisfied.

Complexity
The recording part of a single instance of the algorithm requires O(e) messages and O(d) time, where e is the number of edges in the network and d is the diameter of the network.
Properties of the recorded global state

The recorded global state may not correspond to any of the global states that occurred during the computation.
This happens because a process can change its state asynchronously before the markers it sent are received by other sites and the other sites record their states.
But the system could have passed through the recorded global states in some equivalent executions.
◮ The recorded global state is a valid state in an equivalent execution and if a stable property (i.e., a property that persists) holds in the system before the snapshot algorithm begins, it holds in the recorded global snapshot.
◮ Therefore, a recorded global state is useful in detecting stable properties.
Figure 14.11
Two processes and their initial states

Figure 14.12
The execution of the processes in Figure 14.11

Figure 14.13
Reachability between states in the snapshot algorithm
[Diagram labels: actual execution e0, e1, ... from Sinit to Sfinal, with the points where recording begins and ends; Ssnap is reached from Sinit by pre-snap events e'0, e'1, ..., e'R-1 and leads to Sfinal by post-snap events e'R, e'R+1, ...]

Figure 14.14
Vector timestamps and variable values for the execution of Figure 14.9
p1: (1,0) x1 = 1; (2,0) x1 = 100; (3,0) x1 = 105; (4,3) x1 = 90
p2: (2,1) x2 = 100; (2,2) x2 = 95; (2,3) x2 = 90
[Diagram labels: messages m1 and m2, physical time axis, cuts C1 and C2.]
Snapshot algorithms for non-FIFO channels

In a non-FIFO system, a marker cannot be used to delineate messages into those to be recorded in the global state from those not to be recorded in the global state.
In a non-FIFO system, either some degree of inhibition or piggybacking of control information on computation messages is needed to capture out-of-sequence messages.
Lai-Yang algorithm

The Lai-Yang algorithm fulfills this role of a marker in a non-FIFO system by using a coloring scheme on computation messages that works as follows:
1 Every process is initially white and turns red while taking a snapshot. The equivalent of the “Marker Sending Rule” is executed when a process turns red.
2 Every message sent by a white (red) process is colored white (red).
3 Thus, a white (red) message is a message that was sent before (after) the sender of that message recorded its local snapshot.
4 Every white process takes its snapshot at its convenience, but no later than the instant it receives a red message.
Lai-Yang algorithm

5 Every white process records a history of all white messages sent or received by it along each channel.
6 When a process turns red, it sends these histories along with its snapshot to the initiator process that collects the global snapshot.
7 The initiator process evaluates transit(LSi, LSj) to compute the state of a channel Cij as given below:

SCij = white messages sent by pi on Cij − white messages received by pj on Cij
     = {mij | send(mij) ∈ LSi} − {mij | rec(mij) ∈ LSj}.
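The initiator's computation of a channel state is a set difference over the reported histories. A minimal sketch follows, assuming each white message carries a unique identifier; the function name and the sample histories are hypothetical.

```python
def channel_state(white_sent_by_pi, white_received_by_pj):
    """SCij: white messages pi sent on Cij that pj had not received."""
    received = set(white_received_by_pj)
    # Keep the send order for readability; the channel itself is non-FIFO.
    return [m for m in white_sent_by_pi if m not in received]


# Hypothetical histories reported to the initiator for channel Cij.
sent_history = ["m1", "m2", "m3"]   # white messages pi sent on Cij
received_history = ["m2", "m1"]     # white messages pj received on Cij

sc_ij = channel_state(sent_history, received_history)
```

The receive history may be in a different order than the send history because the channel is non-FIFO, which is why a set difference is used instead of a prefix comparison.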
Mattern’s algorithm

Mattern’s algorithm is based on vector clocks and assumes a single initiator process and works as follows:
1 The initiator “ticks” its local clock and selects a future vector time s at which it would like a global snapshot to be recorded. It then broadcasts this time s and freezes all activity until it receives all acknowledgements of the receipt of this broadcast.
2 When a process receives the broadcast, it remembers the value s and returns an acknowledgement to the initiator.
3 After having received an acknowledgement from every process, the initiator increases its vector clock to s and broadcasts a dummy message to all processes.
4 The receipt of this dummy message forces each recipient to increase its clock to a value ≥ s if not already ≥ s.
5 Each process takes a local snapshot and sends it to the initiator when (just before) its clock increases from a value less than s to a value ≥ s.
6 The state of Cij is all messages sent along Cij whose timestamp is smaller than s and which are received by pj after recording LSj.
Mattern’s algorithm

A termination detection scheme for non-FIFO channels is required to detect that no white messages are in transit.
One of the following schemes can be used for termination detection:
First method:
Each process i keeps a counter cntri that indicates the difference between the number of white messages it has sent and received before recording its snapshot.
It reports this value to the initiator process along with its snapshot and forwards all white messages it receives henceforth to the initiator.
Snapshot collection terminates when the initiator has received Σi cntri forwarded white messages.
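The first method works because the counters sum to the number of white messages still in transit when the snapshots were taken: each white message already received cancels its own send. A small sketch with made-up counter values; the function names are hypothetical.

```python
def white_in_transit(cntr):
    """Sum of (white sent - white received) over all processes."""
    return sum(cntr.values())


def collection_done(cntr, forwarded_so_far):
    """True once the initiator has seen every outstanding white message."""
    return forwarded_so_far == white_in_transit(cntr)


# Hypothetical reports: p1 sent 2 white messages, p2 received 1 of them,
# p3 neither sent nor received any white message.
cntr = {"p1": 2, "p2": -1, "p3": 0}

outstanding = white_in_transit(cntr)
```

With these reports, one white message is still in transit, so the initiator waits for exactly one forwarded white message before terminating.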
Mattern’s algorithm

Second method:
Each red message sent by a process carries a piggybacked value of the number of white messages sent on that channel before the local state recording.
Each process keeps a counter for the number of white messages received on each channel.
A process can detect termination of recording the states of incoming channels when it receives as many white messages on each channel as the value piggybacked on red messages received on that channel.
