Distributed Synchronization
Chapter 6
Why Synchronize?
• Often important to control access to a single,
shared resource.
• Also often important to agree on the ordering of
events.
• Synchronization in Distributed Systems is much
more difficult than in uniprocessor systems.
• We will study:
1. Synchronization based on “Actual Time”.
2. Synchronization based on “Relative Time”.
3. Synchronization based on Co-ordination (with Election Algorithms).
4. Distributed Mutual Exclusion.
5. Distributed Transactions.
Clock Synchronization
• Synchronization based on “Actual Time”.
• Note: time is really easy on a uniprocessor system.
• Achieving agreement on time in a DS is not trivial.
• Question: is it even possible to synchronize all the
clocks in a Distributed System?
• With multiple computers, “clock skew” ensures that
no two machines have the same value for the “current
time”. But, how do we measure time?
How Do We Measure Time?
• Turns out that we have only been
measuring time accurately with a “global”
atomic clock since Jan. 1st, 1958 (the
“beginning of time”).
• Bottom Line: measuring time is not as
easy as one might think it should be.
• Algorithms based on the current time
(from some Physical Clock) have been
devised for use within a DS.
Clock Synchronization
Cristian's Algorithm
• Getting the current time from a “time server”, using periodic client
requests.
• Major problem: if the time returned by the time server is behind the
client’s current clock, naively adopting it would make time run
backwards on the client! (Which cannot happen – time does not go
backwards.)
• Minor problem: the network request/response introduces a delay –
latency – which must be estimated and compensated for.
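The round-trip compensation can be sketched in a few lines of Python. This is a minimal sketch, not a full implementation: `request_time_from_server` and `local_clock` are assumed, caller-supplied functions, and the returned offset would in practice be applied gradually (slewing) rather than by stepping the clock, so the client’s time never jumps backwards.

```python
def cristian_sync(request_time_from_server, local_clock):
    """One round of Cristian's algorithm (sketch).

    request_time_from_server() returns the server's current time;
    local_clock() returns the client's current time. Both are
    assumptions supplied by the caller.
    """
    t0 = local_clock()                      # client time when request is sent
    server_time = request_time_from_server()
    t1 = local_clock()                      # client time when reply arrives
    rtt = t1 - t0
    # Assume the reply spent roughly half the round trip in transit,
    # so the server's clock read rtt/2 before the reply arrived.
    estimated_now = server_time + rtt / 2
    # The client should adjust by this offset (positive or negative),
    # slewing gradually if the offset is negative.
    return estimated_now - t1
```

If the offset comes back negative, the client runs its clock slightly slower until the difference is absorbed, rather than setting it back.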
The Berkeley Algorithm (1)
Lamport’s Logical Clocks
• First point: if two processes do not interact, then their
clocks do not need to be synchronized – they can
operate concurrently without fear of interfering with
each other.
• Second (critical) point: it does not matter that two
processes share a common notion of what the “real”
current time is. What does matter is that the processes
have some agreement on the order in which certain
events occur.
• Lamport used these two observations to define the
“happens-before” relation (also often referred to within
the context of Lamport’s Timestamps).
Lamport logical clock timestamp
• Let timestamp(a) be the Lamport logical clock
timestamp of event a. Then:
a → b ⇒ timestamp(a) < timestamp(b)
(if a happens before b, then Lamport_timestamp(a) <
Lamport_timestamp(b))
The “Happens-Before” Relation (4)
• The question to ask is:
– How can some event that “happens-before” some other
event possibly have occurred at a later time??
• The answer is: it can’t!
• So, Lamport’s solution is to have the receiving process
adjust its clock forward to one more than the sending
timestamp value. This allows the “happens-before”
relation to hold, and also keeps all the clocks running
in a synchronized state. The clocks are all kept in sync
relative to each other.
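The two clock rules above can be captured in a minimal Python sketch (class and method names are illustrative, not from any particular library):

```python
class LamportClock:
    """Sketch of Lamport's logical clock rules."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Rule 1: increment the clock before each local event.
        self.time += 1
        return self.time

    def send(self):
        # A send is a local event; the message carries the new timestamp.
        return self.tick()

    def receive(self, msg_timestamp):
        # Rule 2: on receipt, jump forward to one more than the larger of
        # the local clock and the sender's timestamp, so "happens-before"
        # always implies a smaller timestamp.
        self.time = max(self.time, msg_timestamp) + 1
        return self.time
```

A receiver whose clock is already ahead simply ticks past the message; one that is behind jumps forward, which is exactly the adjustment described above.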
Lamport’s Logical Clocks (1)
Totally-Ordered Multicasting, Revisited
[Figure: three processes p1, p2, p3 plotted against physical time.
p1 has events a (1,0,0) and b (2,0,0), then sends message m1 to p2;
p2 receives m1 at event c (2,1,0), has event d (2,2,0), then sends
message m2 to p3; p3 has event e (0,0,1) and receives m2 at event
f (2,2,2).]
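The vector timestamps in the figure follow from two simple update rules, sketched below in Python (the class is illustrative; `pid` is this process’s index among `n` processes):

```python
class VectorClock:
    """Sketch of a vector clock for n processes."""

    def __init__(self, pid, n):
        self.pid = pid
        self.v = [0] * n

    def local_event(self):
        # Each local event increments only this process's component.
        self.v[self.pid] += 1

    def send(self):
        # A send is a local event; the message carries a copy of the vector.
        self.local_event()
        return list(self.v)

    def receive(self, msg_v):
        # Take the component-wise maximum with the message's vector,
        # then count the receive as a local event.
        self.v = [max(x, y) for x, y in zip(self.v, msg_v)]
        self.v[self.pid] += 1
```

Replaying the figure’s scenario with this class reproduces its timestamps, e.g. p3’s receive of m2 yields (2,2,2).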
Example: Vector Logical Time
[Figure: vector-timestamped events on several processes, plotted
against physical time.]
Mutual Exclusion within Distributed Systems
Mutual Exclusion
A Centralized Algorithm (1)
Mutual Exclusion: Distributed Algorithm
1. When a process (the “requesting process”) decides to enter a critical
region, a message is sent to all processes in the Distributed System
(including itself).
2. What happens at each process depends on the “state” of the critical region.
3. If not in the critical region (and not waiting to enter it), a process sends
back an OK to the requesting process.
4. If in the critical region, a process will queue the request and will not send
a reply to the requesting process.
5. If waiting to enter the critical region, a process will:
a) Compare the timestamp of the new message with that in its queue
(note that the lowest timestamp wins).
b) If the received timestamp wins, an OK is sent back, otherwise the
request is queued (and no reply is sent back).
6. When all the processes send OK, the requesting process can safely enter
the critical region.
7. When the requesting process leaves the critical region, it sends an OK to
all the processes in its queue, then empties its queue.
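The per-process reply decision (steps 3–5 above) can be sketched as a single Python function. This is a sketch of the decision rule only, not the full messaging protocol; the state names and the `(timestamp, pid)` tie-breaking pairs are illustrative assumptions:

```python
RELEASED, WANTED, HELD = "released", "wanted", "held"

def on_request(state, my_request_ts, incoming_ts, queue):
    """Decide how a process replies to a critical-region request (sketch).

    Returns True if an OK should be sent back immediately; otherwise the
    request is appended to `queue` and no reply is sent. Timestamps are
    (lamport_time, pid) pairs, so comparisons never tie.
    """
    if state == RELEASED:
        # Not in the region and not waiting: OK immediately.
        return True
    if state == HELD:
        # Currently in the region: defer the reply.
        queue.append(incoming_ts)
        return False
    # WANTED: both processes are competing; the lowest timestamp wins.
    if incoming_ts < my_request_ts:
        return True
    queue.append(incoming_ts)
    return False
```

A requesting process enters the critical region once every other process has answered OK, and on exit sends OK to everything in its queue.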
Distributed Algorithm (1)
• Three different cases:
1. If the receiver is not accessing the resource and does
not want to access it, it sends back an OK message
to the sender.
2. If the receiver already has access to the resource,
it simply does not reply. Instead, it queues the
request.
3. If the receiver wants to access the resource as well
but has not yet done so, it compares the timestamp
of the incoming message with the one contained in
the message that it has sent everyone. The lowest
one wins.
Distributed Algorithm (2)
Algorithm     Messages per entry/exit   Delay before entry   Problems
Distributed   2(n – 1)                  2(n – 1)             Crash of any process
Token-Ring    1 to ∞                    0 to n – 1           Lost token, process crash
• None are perfect – they all have their problems!
• The “Centralized” algorithm is simple and efficient, but suffers from a single
point-of-failure.
• The “Distributed” algorithm has nothing going for it – it is slow,
complicated, wasteful of network bandwidth, and not very robust.
It “sucks”!
• The “Token-Ring” algorithm suffers from the fact that it can sometimes take
a long time to reenter a critical region having just exited it.