Chapter 14 Slides
Introduction
Definition
A system of logical clocks consists of a time domain T and a logical clock C.
Elements of T form a partially ordered set over a relation <.
The relation < is called happened before or causal precedence. Intuitively, this
relation is analogous to the earlier than relation provided by physical time.
The logical clock C is a function that maps an event e in a distributed system to
an element in the time domain T, denoted as C(e) and called the timestamp of e.
A. Kshemkalyani and M. Singhal (Distributed Comput
R1: This rule governs how the local logical clock is updated by a process
when it executes an event.
R2: This rule governs how a process updates its global logical clock to
update its view of the global time and global progress.
Systems of logical clocks differ in their representation of logical time and
also in the protocol to update the logical clocks.
Scalar Time
R2: Each message piggybacks the clock value of its sender at sending
time. When a process pi receives a message with timestamp Cmsg , it
executes the following actions:
◮ Ci := max (Ci , Cmsg )
◮ Execute R1.
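The two scalar-clock rules can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the class name `ScalarClock` and its methods are hypothetical, and R1 is assumed to have its usual form Ci := Ci + d with d > 0, since the surviving text only refers to it by name.

```python
class ScalarClock:
    """Scalar (Lamport) clock of one process; illustrative sketch only."""

    def __init__(self, d=1):
        self.d = d        # increment value, assumed to be > 0
        self.time = 0     # Ci, the local scalar clock

    def tick(self):
        """R1 (assumed form): Ci := Ci + d before executing an event."""
        self.time += self.d
        return self.time

    def send(self):
        """A send is an event: apply R1, then piggyback the clock value."""
        return self.tick()

    def receive(self, c_msg):
        """R2: Ci := max(Ci, Cmsg), then execute R1."""
        self.time = max(self.time, c_msg)
        return self.tick()


p1, p2 = ScalarClock(), ScalarClock()
p1.tick()                          # local event at p1, clock becomes 1
stamp = p1.send()                  # send event at p1, clock becomes 2
p2.tick()                          # local event at p2, clock becomes 1
received_at = p2.receive(stamp)    # max(1, 2) + 1 = 3
print(stamp, received_at)
```

The receive event ends up with a larger timestamp than the send event, which is what the consistency property requires.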
Scalar Time
Evolution of scalar time:
[Figure: space-time diagram of processes p1, p2, and p3 showing the scalar timestamp of each event, including the event labeled b.]
Basic Properties
Consistency Property
Scalar clocks satisfy the monotonicity and hence the consistency property: for
two events ei and ej, ei → ej ⟹ C(ei) < C(ej).
Total Ordering
Scalar clocks can be used to totally order events in a distributed system.
The main problem in totally ordering events is that two or more events at
different processes may have identical timestamps.
For example, in Figure 3.1, the third event of process P1 and the second event of
process P2 have identical scalar timestamps.
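One common way to resolve such ties, assumed here because the tie-breaking rule itself did not survive in these slides, is to order events by the pair (timestamp, process identifier) and compare pairs lexicographically. The event tuples and timestamp values below are hypothetical.

```python
def total_order_key(event):
    """Sort key (timestamp, process id): equal timestamps fall back to the process id."""
    timestamp, pid, _label = event
    return (timestamp, pid)


# Hypothetical events as (scalar timestamp, process id, label).
events = [
    (3, 2, "p2: second event"),
    (3, 1, "p1: third event"),
    (1, 1, "p1: first event"),
]
ordered = sorted(events, key=total_order_key)
labels = [label for _, _, label in ordered]
print(labels)
```

Because process identifiers are unique, no two events compare equal under this key, so the order is total.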
Logical Time
Distributed Computing: Principles, Algorithms, and Systems
Properties...
Event counting
If the increment value d is always 1, scalar time has the following
interesting property: if event e has a timestamp h, then h-1 represents the
minimum logical duration, counted in units of events, required before producing
the event e.
We call it the height of the event e.
In other words, h-1 events have been produced sequentially before the event e,
regardless of the processes that produced these events.
For example, in Figure 3.1, five events precede event b on the longest causal
path ending at b.
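The height property can be checked on a small example. The sketch below uses a hypothetical five-event execution (not the one in Figure 3.1), assigns scalar timestamps with d = 1, and compares each timestamp with the length of the longest causal chain ending at that event.

```python
from functools import lru_cache

# Hypothetical execution, listed in an order that respects causality.
# Each entry: (event name, process, matching send event or None).
execution = [
    ("a1", "p1", None),
    ("a2", "p1", None),     # send of a message to p2
    ("b1", "p2", None),
    ("b2", "p2", "a2"),     # receive of the message sent at a2
    ("b3", "p2", None),
]

clock, stamp, preds, last_event = {}, {}, {}, {}
for name, proc, send in execution:
    c_msg = stamp[send] if send else 0
    # Scalar clock with d = 1: Ci := max(Ci, Cmsg), then Ci := Ci + 1.
    clock[proc] = max(clock.get(proc, 0), c_msg) + 1
    stamp[name] = clock[proc]
    # Immediate causal predecessors: previous local event and matching send.
    preds[name] = [e for e in (last_event.get(proc), send) if e]
    last_event[proc] = name


@lru_cache(maxsize=None)
def height(event):
    """Length of the longest causal chain ending at, and including, event."""
    return 1 + max((height(p) for p in preds[event]), default=0)


for name in stamp:
    assert stamp[name] == height(name)
print(stamp["b3"], stamp["b3"] - 1)   # timestamp h and h - 1 preceding events
```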
Properties...
No Strong Consistency
The system of scalar clocks is not strongly consistent; that is, for two events
ei and ej, C(ei) < C(ej) ⇏ ei → ej.
For example, in Figure 3.1, the third event of process P1 has a smaller scalar
timestamp than the third event of process P2. However, the former did not
happen before the latter.
Vector Time
Process pi uses the following two rules R1 and R2 to update its clock:
R1: Before executing an event, process pi updates its local logical time as
follows:
vti[i] := vti[i] + d (d > 0)
R2: Each message m is piggybacked with the vector clock vt of the sender
process at sending time. On receipt of such a message (m, vt), process pi
executes the following sequence of actions:
◮ Update its global logical time: for 1 ≤ k ≤ n, vti[k] := max(vti[k], vt[k]).
◮ Execute R1.
◮ Deliver the message m.
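A minimal sketch of these vector-clock rules follows. The class `VectorClock` and its method names are hypothetical, processes are indexed from 0, and the merge step in R2 is taken to be the usual component-wise maximum of the local vector and the piggybacked vector.

```python
class VectorClock:
    """Vector clock of process pid in a system of n processes (sketch)."""

    def __init__(self, pid, n, d=1):
        self.pid = pid
        self.d = d               # increment value, assumed to be > 0
        self.vt = [0] * n

    def tick(self):
        """R1: vt[pid] := vt[pid] + d before executing an event."""
        self.vt[self.pid] += self.d
        return list(self.vt)     # copy, so the timestamp is not mutated later

    def send(self):
        """A send is an event: apply R1 and piggyback the resulting vector."""
        return self.tick()

    def receive(self, vt_msg):
        """R2: component-wise max with the piggybacked vector, then R1."""
        self.vt = [max(a, b) for a, b in zip(self.vt, vt_msg)]
        return self.tick()


p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
p0.tick()                  # local event at p0: [1, 0]
msg = p0.send()            # send event at p0: [2, 0]
p1.tick()                  # local event at p1: [0, 1]
after = p1.receive(msg)    # max gives [2, 1], then R1 gives [2, 2]
print(msg, after)
```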
The timestamp of an event is the value of the vector clock of its process
when the event is executed.
Figure 3.2 shows an example of the progress of vector clocks.
An Example of Vector Clocks
[Figure 3.2: space-time diagram of processes p1, p2, and p3 with the vector timestamp of each event.]
Comparing Vector Timestamps
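If events x and y occurred at processes pi and pj and are assigned timestamps vh
and vk, respectively, then:
x → y ⇔ vh[i] ≤ vk[i]
x ∥ y ⇔ vh[i] > vk[i] ∧ vh[j] < vk[j]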
Properties of Vector Time
Isomorphism
If events in a distributed system are timestamped using a system of vector
clocks, we have the following property.
If two events x and y have timestamps vh and vk, respectively, then
x → y ⇔ vh < vk
x ∥ y ⇔ vh ∥ vk.
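The isomorphism suggests a direct test for causality. In the sketch below, vh < vk is taken to mean that every component of vh is at most the matching component of vk and the vectors differ; that definition is the usual one but is an assumption as far as the surviving text goes. The function names are hypothetical.

```python
def vt_leq(vh, vk):
    """vh <= vk: every component of vh is at most the matching one of vk."""
    return all(a <= b for a, b in zip(vh, vk))


def vt_less(vh, vk):
    """vh < vk: vh <= vk and the two vectors differ in some component."""
    return vt_leq(vh, vk) and tuple(vh) != tuple(vk)


def vt_concurrent(vh, vk):
    """vh || vk: neither vector is less than the other.

    Meant for timestamps of distinct events, which are never equal.
    """
    return not vt_less(vh, vk) and not vt_less(vk, vh)


print(vt_less((2, 0, 0), (2, 3, 0)))        # True: the first event precedes the second
print(vt_concurrent((1, 0, 0), (0, 1, 0)))  # True: neither precedes the other
```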
Strong Consistency
The system of vector clocks is strongly consistent; thus, by examining the vector
timestamp of two events, we can determine if the events are causally related.
However, Charron-Bost showed that the dimension of vector clocks cannot be
less than n, the total number of processes in the distributed computation, for
this property to hold.
Event Counting
If d=1 (in rule R1), then the ith component of the vector clock at process pi, vti[i],
denotes the number of events that have occurred at pi until that instant.
So, if an event e has timestamp vh, vh[j] denotes the number of events
executed by process pj that causally precede e. Clearly, Σj vh[j] − 1 represents
the total number of events that causally precede e in the distributed
computation.
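As a small arithmetic check, assuming d = 1: an event with the hypothetical timestamp (2, 2) in a two-process system is preceded by the two events of the first process and the one earlier event of its own process, three events in all, which is the component sum minus one. The function name is illustrative.

```python
def events_before(vh):
    """Total number of events that causally precede the event stamped vh (d = 1)."""
    return sum(vh) - 1


vh = (2, 2)          # hypothetical timestamp of a receive event at the second process
count = events_before(vh)
print(count)         # 2 + 2 - 1
```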
Figure 14.1
Skew between computer clocks in a distributed system
Instructor’s Guide for Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design Edn. 5
© Pearson Education 2012
Figure 14.2
Clock synchronization using a time server
[Figure: process p exchanges messages mr and mt with the time server S.]
Figure 14.3
An example synchronization subnet in an NTP implementation
[Figure: hierarchy of servers labeled with numbers; the labels 2 and 3 survive from the diagram.]
Figure 14.4
Messages exchanged between a pair of NTP peers
[Figure: messages m and m' exchanged between two servers along a time axis; surviving labels are Server A, Ti-3, and Ti.]
Figure 14.5
Events occurring at three processes
Figure 14.6
Lamport timestamps for the events shown in Figure 14.5
Figure 14.7
Vector timestamps for the events shown in Figure 14.5
Figure 14.8
Detecting global properties
Figure 14.9
Cuts
[Figure: four events at p1 and three events at p2 along a physical time axis, with messages m1 and m2; one inconsistent cut and one consistent cut are marked.]
System model
Transit: transit(LSi, LSj) = {mij | send(mij) ∈ LSi ∧ rec(mij) ∉ LSj}
Models of communication
Recall, there are three models of communication: FIFO, non-FIFO, and causal ordering (CO).
In the FIFO model, each channel acts as a first-in first-out message queue, and thus
message ordering is preserved by a channel.
In the non-FIFO model, a channel acts like a set in which the sender process adds
messages and the receiver process removes messages from it in a random
order.
A system that supports causal delivery of messages satisfies the following
property: “For any two messages mij and mkj, if send(mij) → send(mkj), then
rec(mij) → rec(mkj)”.
A global state is consistent if and only if it satisfies the following two conditions:
C1: send(mij) ∈ LSi ⇒ mij ∈ SCij ⊕ rec(mij) ∈ LSj. (⊕ is the Ex-OR operator.)
C2: send(mij) ∉ LSi ⇒ mij ∉ SCij ∧ rec(mij) ∉ LSj.
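Conditions C1 and C2 can be checked mechanically once a recorded global state is reduced to three sets of message identifiers. The sketch below is an illustration under that assumption; the function `is_consistent` and its parameter names are hypothetical.

```python
def is_consistent(messages, sent_in_ls, received_in_ls, channel_state):
    """Check C1 and C2 for every message identifier.

    sent_in_ls:     messages whose send event is in the sender's recorded state
    received_in_ls: messages whose receive event is in the receiver's recorded state
    channel_state:  messages recorded as in transit in a channel state SCij
    """
    for m in messages:
        in_channel = m in channel_state
        received = m in received_in_ls
        if m in sent_in_ls:
            if in_channel == received:     # C1: exactly one must hold (XOR)
                return False
        elif in_channel or received:       # C2: neither may hold
            return False
    return True


messages = {"m1", "m2", "m3"}
good = is_consistent(messages, {"m1", "m2"}, {"m1"}, {"m2"})
bad = is_consistent(messages, {"m1", "m2"}, {"m1", "m3"}, {"m2"})
print(good, bad)
```

In the second call, m3 is recorded as received although its send is not recorded, which violates C2.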
[Figure: space-time diagram of processes p1 to p4 exchanging messages m1 to m5 along a time axis, with two cuts C1 and C2 marked.]
Chandy-Lamport algorithm
The Chandy-Lamport algorithm uses a control message, called a marker,
whose role in a FIFO system is to separate messages in the channels.
After a site has recorded its snapshot, it sends a marker along all of its
outgoing channels before sending out any more messages.
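The marker mechanism can be illustrated with a small single-threaded simulation. This is a sketch under stated assumptions, not the authors' code: three hypothetical processes hold account balances, channels are FIFO queues, and the marker-receiving rule is the standard one (record the local state on the first marker, and record as channel state every message that arrives on a channel after the local recording but before that channel's marker).

```python
from collections import deque

N = 3
MARKER = "marker"
balance = [100] * N
# One FIFO channel per ordered pair of distinct processes.
channels = {(i, j): deque() for i in range(N) for j in range(N) if i != j}
recorded_state = [None] * N                     # recorded local states
channel_state = {key: [] for key in channels}   # recorded channel states
recording = [set() for _ in range(N)]           # incoming channels still being recorded


def send_money(src, dst, amount):
    balance[src] -= amount
    channels[(src, dst)].append(amount)


def take_snapshot(p):
    """Record the local state, then send a marker on every outgoing channel."""
    recorded_state[p] = balance[p]
    recording[p] = {q for q in range(N) if q != p}
    for q in range(N):
        if q != p:
            channels[(p, q)].append(MARKER)


def deliver(src, dst):
    """Deliver the message at the head of channel (src, dst)."""
    msg = channels[(src, dst)].popleft()
    if msg == MARKER:
        if recorded_state[dst] is None:
            take_snapshot(dst)            # first marker seen: record state now
        recording[dst].discard(src)       # channel (src, dst) is fully recorded
    else:
        balance[dst] += msg
        if src in recording[dst]:         # after local recording, before the marker
            channel_state[(src, dst)].append(msg)


send_money(0, 1, 10)
take_snapshot(0)          # process 0 initiates the snapshot
send_money(1, 0, 20)
send_money(2, 0, 5)
while any(channels.values()):             # drain every channel in a fixed order
    for (src, dst) in sorted(channels):
        if channels[(src, dst)]:
            deliver(src, dst)

total = sum(recorded_state) + sum(sum(v) for v in channel_state.values())
print(recorded_state, total)
```

The recorded local states plus the recorded channel states add up to the initial total of 300, so no money is lost or counted twice in the snapshot.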
A. Kshemkalyani and M. Singhal (Distributed Computing), Global State and Snapshot Recording Algorithms, CUP 2008
Correctness
Due to the FIFO property of channels, it follows that no message sent after the
marker on that channel is recorded in the channel state. Thus, condition C2 is
satisfied.
When a process pj receives message mij that precedes the marker on channel Cij,
it acts as follows: If process pj has not taken its snapshot yet, then it includes
mij in its recorded snapshot. Otherwise, it records mij in the state of the channel
Cij. Thus, condition C1 is satisfied.
Complexity
The recording part of a single instance of the algorithm requires O(e) messages
and O(d) time, where e is the number of edges in the network and d is the
diameter of the network.
The recorded global state may not correspond to any of the global states
that occurred during the computation.
This happens because a process can change its state asynchronously before
the markers it sent are received by other sites and the other sites record
their states.
But the system could have passed through the recorded global state in some
equivalent execution.
◮ The recorded global state is a valid state in an equivalent execution, and if a stable
property (i.e., a property that persists) holds in the system before the snapshot
algorithm begins, it holds in the recorded global snapshot.
◮ Therefore, a recorded global state is useful in detecting stable properties.
Figure 14.11
Two processes and their initial states
Figure 14.12
The execution of the processes in Figure 14.11
Figure 14.13
Reachability between states in the snapshot algorithm
[Figure: the recorded state Ssnap lies between the pre-snap events e'0, e'1, ..., e'R-1 and the post-snap events e'R, e'R+1, ...]
Figure 14.14
Vector timestamps and variable values for the execution of Figure 14.9
[Figure: events along a physical time axis with messages m1 and m2; surviving labels for p2 are x2 = 100, 95, 90 and vector timestamps (2,1), (2,2), (2,3); cuts C1 and C2 are marked.]
Lai-Yang algorithm
3 Thus, a white (red) message is a message that was sent before (after)
the sender of that message recorded its local snapshot.
4 Every white process takes its snapshot at its convenience, but no later
than the instant it receives a red message.
Mattern’s algorithm
Mattern’s algorithm is based on vector clocks and assumes a single
initiator process and works as follows:
1 The initiator “ticks” its local clock and selects a future vector time s at
which it would like a global snapshot to be recorded. It then broadcasts this
time s and freezes all activity until it receives all acknowledgements of the
receipt of this broadcast.
2 When a process receives the broadcast, it remembers the value s and
returns an acknowledgement to the initiator.
First method:
Each process i keeps a counter cntri that indicates the difference between
the number of white messages it has sent and received before recording its
snapshot.
It reports this value to the initiator process along with its snapshot and
forwards all white messages it receives henceforth to the initiator.
Snapshot collection terminates when the initiator has received Σi cntri
forwarded white messages.
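The counter-based termination test can be sketched as follows. The class `Process` and its methods are hypothetical, message colors are plain strings, and the sketch leaves out the rule that a white process must take its snapshot before accepting a red message; it only tracks the counters and the forwarded white messages.

```python
class Process:
    """One process in a colored-message snapshot scheme (illustrative sketch)."""

    def __init__(self):
        self.red = False    # white until the local snapshot is recorded
        self.cntr = 0       # white messages sent minus white messages received

    def send(self):
        """Return the color carried by an outgoing message."""
        if self.red:
            return "red"
        self.cntr += 1
        return "white"

    def take_snapshot(self):
        """Turn red and report the counter along with the snapshot."""
        self.red = True
        return self.cntr

    def receive(self, color, forwarded):
        if color != "white":
            return                       # red messages do not affect the counter
        if self.red:
            forwarded.append(color)      # in-transit message: forward to initiator
        else:
            self.cntr -= 1


p, q = Process(), Process()
forwarded = []                           # white messages forwarded to the initiator
in_flight = [p.send(), p.send()]         # two white messages from p to q
q.receive(in_flight.pop(0), forwarded)   # the first arrives before q's snapshot
expected = p.take_snapshot() + q.take_snapshot()   # 2 + (-1) = 1
done_before = len(forwarded) == expected
q.receive(in_flight.pop(0), forwarded)   # the second arrives after q's snapshot
done_after = len(forwarded) == expected
print(expected, done_before, done_after)
```

The sum of the reported counters equals the number of white messages still in transit, so the initiator knows collection is complete once it has been forwarded that many.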
Second method:
Each red message sent by a process carries a piggybacked value of the number
of white messages sent on that channel before the local state recording.
Each process keeps a counter for the number of white messages received on
each channel.