2016 DistributedSystems 1B L4

This lecture discusses clock synchronization in distributed systems, addressing issues such as clock drift and skew. It covers various algorithms for synchronization, including Cristian's Algorithm, the Berkeley Algorithm, and the Network Time Protocol (NTP), as well as the concept of logical clocks and the happens-before relation. The lecture concludes with an overview of vector clocks and their use in establishing causal ordering between events.

Distributed systems

Lecture 4: Clock synchronisation; logical clocks

Dr Robert N. M. Watson

1
Last time
• Started to look at time in distributed systems
– Coordinating actions between processes
• Physical clocks ‘tick’ based on physical processes (e.g.
oscillations in quartz crystals, atomic transitions)
– Imperfect, so gain/lose time over time
– (wrt nominal perfect ‘reference’ clock (such as UTC))
• The process of gaining/losing time is clock drift
• The difference between two clocks is called clock skew
• Clock synchronization aims to minimize clock skew
between two (or a set of) different clocks

2
From last lecture
The clock synchronization problem
• In distributed systems, we’d like all the different
nodes to have the same notion of time, but
– quartz oscillators oscillate at slightly different
frequencies (time, temperature, manufacture)
• Hence clocks tick at different rates:
– create ever-widening gap in perceived time
– this is called clock drift
• The difference between two clocks at a given
point in time is called clock skew
• Clock synchronization aims to minimize clock
skew between two (or a set of) different clocks
3
Dealing with drift
• A clock can have positive or negative drift with
respect to a reference clock (e.g. UTC)
– Need to [re]synchronize periodically
• Can’t just set clock to ‘correct’ time
– Jumps (particularly backward!) can confuse apps
• Instead aim for gradual compensation
– If clock fast, make it run slower until correct
– If clock slow, make it run faster until correct

4
Compensation
• Most systems relate real-time to cycle counters or periodic
interrupt sources
– E.g. calibrate CPU Time-Stamp Counter (TSC) against CMOS
Real-Time Clock (RTC) at boot, and compute scaling factor (e.g.
cycles per ms)
– Can now convert TSC differences to real-time
– Similarly can determine how much real-time passes between
periodic interrupts: call this delta
– On interrupt, add delta to software real-time clock
• Making small changes to delta gradually adjusts time
– Once synchronized, change delta back to original value
– (Or try to estimate drift & continually adjust delta)
– Minimise time discontinuities from stepping
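
A minimal sketch of the delta-adjustment idea above, in Python. The class name `SlewedClock`, the `max_slew` parameter, and the per-tick cap are illustrative assumptions, not part of any real kernel interface. It shows how nudging delta on each interrupt removes an error gradually, so the clock never steps backwards.

```python
class SlewedClock:
    """Software clock advanced by roughly `nominal_delta` seconds per interrupt."""

    def __init__(self, nominal_delta):
        self.nominal_delta = nominal_delta  # real time between interrupts
        self.now = 0.0                      # software real-time clock
        self.remaining = 0.0                # correction still to apply (seconds)
        self.max_step = 0.0                 # largest change to delta per tick

    def begin_adjust(self, error, max_slew=0.05):
        """Start correcting `error` seconds (positive: clock is slow).

        max_slew is the fraction of delta that may be added or removed per
        tick (hypothetical knob; must stay below 1 to keep time monotonic).
        """
        self.remaining = error
        self.max_step = self.nominal_delta * max_slew

    def on_interrupt(self):
        # Use a slightly larger or smaller delta until the error is gone,
        # then fall back to the nominal delta automatically.
        step = max(-self.max_step, min(self.max_step, self.remaining))
        self.now += self.nominal_delta + step
        self.remaining -= step
        return self.now


clock = SlewedClock(nominal_delta=0.010)
clock.begin_adjust(error=-0.002)  # clock is 2 ms fast, so run slow for a while
readings = [clock.on_interrupt() for _ in range(10)]
```

With these assumed values the 2 ms error is absorbed over the first four ticks, and every reading is still later than the one before it.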

5
Obtaining accurate time
• Of course, need some way to know correct time
(e.g. UTC) in order to adjust clock!
– could attach a GPS receiver (or GOES receiver) to
computer, and get ±1ms (or ±0.1ms) accuracy…
– …but too expensive/clunky for general use
– (RF in server rooms and data centres non-ideal)
• Instead can ask some machine with a more
accurate clock over the network: a time server
– e.g. send RPC getTime() to server
– What’s the problem here?

6
Cristian’s Algorithm (1989)
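[Figure: timeline. The client sends a request at local time T0; the server puts its time Ts into the reply; the client receives the reply at local time T1.]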

• Attempt to compensate for network delays


– Remember local time just before sending: T0
– Server gets request, and puts Ts into response
– When client receives reply, notes local time: T1
– Correct time is then approximately (Ts + (T1- T0) / 2)
– (assumes symmetric behaviour...)
7
Cristian’s Algorithm: Example
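[Figure: timeline. Client C sends the request at T0 = 08:02:01.670; server S replies with Ts = 08:02:04.325; client C receives the reply at T1 = 08:02:02.130.]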

• RTT = 460ms, so one way delay is [approx] 230ms.


• Estimate correct time as (08:02:04.325 + 230ms) = 08:02:04.555
• Client gradually adjusts local clock to gain 2.425 seconds

8
Berkeley Algorithm (1989)
• Don’t assume we have an accurate time server
• Try to synchronize a set of clocks to the average
– One machine, M, is designated the master
– M periodically polls all other machines for their time
– (can use Cristian’s technique to account for delays)
– Master computes average (including itself, but ignoring
outliers), and sends an adjustment to each machine
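[Figure: master M polls machines A, B and C, then sends each an adjustment. Worked example, with times 08:mm:ss written as mm:ss:
Avg = (01:17 + 01:12 + 02:01) / 3 = 04:30 / 3 = 01:30
Adjustments shown: +00:00:13 for the clock reading 08:01:17, and -00:00:31 for the clock reading 08:02:01.]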
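
The averaging step can be sketched in Python as follows. The function name, the machine labels `p`, `q`, `r`, and the median-based outlier rule (`max_deviation`) are assumptions for illustration; the slide only says that outliers are ignored, not how. Readings are seconds past 08:00:00 and are assumed to be already compensated for network delay.

```python
from statistics import median


def berkeley_adjustments(readings, max_deviation=None):
    """readings: dict of machine name -> clock reading in seconds.

    Returns (average, dict of machine name -> adjustment in seconds).
    Readings further than max_deviation from the median are left out of
    the average, but those machines still receive an adjustment.
    """
    values = list(readings.values())
    if max_deviation is None:
        trusted = values
    else:
        mid = median(values)
        trusted = [v for v in values if abs(v - mid) <= max_deviation]
    if not trusted:
        raise ValueError("no readings within max_deviation of the median")
    average = sum(trusted) / len(trusted)
    return average, {name: average - v for name, v in readings.items()}


# 08:01:17, 08:01:12 and 08:02:01 as seconds past 08:00:00
average, adjustments = berkeley_adjustments({"p": 77, "q": 72, "r": 121})
print(average, adjustments)
```

Sending each machine a relative adjustment rather than an absolute time means the delay of the adjustment message itself does not matter.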
9
Network Time Protocol (NTP)
• Previous schemes designed for LANs; in practice
today’s systems use NTP:
– Global service designed to enable clients to stay
within (hopefully) a few ms of UTC
• Hierarchy of clocks arranged into strata
– Stratum0 = atomic clocks (or maybe GPS, GOES)
– Stratum1 = servers directly attached to stratum0 clock
– Stratum2 = servers that synchronize with stratum1
– … and so on
• Timestamps made up of seconds and ‘fraction’
– e.g. 32-bit seconds-since-epoch; 32-bit fraction of a second (resolution about 233 picoseconds)
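
A small sketch of the seconds-plus-fraction representation, assuming the fraction field counts units of 2^-32 seconds as in the 64-bit timestamp format of RFC 5905. The function names are illustrative, and epoch handling and era rollover are ignored.

```python
def to_ntp(t):
    """Split a non-negative time in seconds into (32-bit seconds, 32-bit fraction)."""
    seconds = int(t)
    fraction = int((t - seconds) * 2**32)  # units of 2**-32 s
    return seconds & 0xFFFFFFFF, fraction & 0xFFFFFFFF


def from_ntp(seconds, fraction):
    """Recombine the two 32-bit fields into seconds as a float."""
    return seconds + fraction / 2**32


print(to_ntp(1.5))     # half a second is 2**31 fraction units
print(from_ntp(0, 1))  # smallest representable step, in seconds
```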

10
NTP algorithm
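[Figure: timeline. The client sends the request at T0; the server receives it at T1 and sends the reply at T2; the client receives the reply at T3.]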

• UDP/IP messages with slots for four timestamps


– systems insert timestamps at earliest/latest opportunity
• Client computes:
– Offset O = ((T1-T0) + (T2-T3)) / 2
(measured difference in average timestamps: (T1+T2)/2 – (T0+T3)/2)
– Delay D = (T3-T0) – (T2-T1)
(estimated two-way communication delay minus processing time)
• Relies on symmetric messaging delays to be correct
(but now excludes variable processing delay at server)
11
NTP example
[Figure: client timeline with ticks 02 to 13 drawn above a server timeline with ticks 35 to 46, showing two request/reply pairs.]

• First request/reply pair:


– Total message delay is ((6-3) - (38-37)) = 2
– Offset is ((37-3) + (38-6)) / 2 = 33
• Second request/reply pair:
– Total message delay is ((13-8) - (45-42)) = 2
– Offset is ((42-8) + (45-13)) / 2 = 33
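
Both request/reply pairs can be checked with a few lines of Python. The function name `ntp_offset_delay` is illustrative; the formulas are the offset and delay definitions from the NTP algorithm slide, with T0 and T3 read on the client clock and T1 and T2 on the server clock.

```python
def ntp_offset_delay(t0, t1, t2, t3):
    """Return (offset, delay) from the four timestamps of one exchange."""
    offset = ((t1 - t0) + (t2 - t3)) / 2  # how far the server is ahead of the client
    delay = (t3 - t0) - (t2 - t1)         # round trip minus server processing time
    return offset, delay


print(ntp_offset_delay(3, 37, 38, 6))    # first request/reply pair
print(ntp_offset_delay(8, 42, 45, 13))   # second request/reply pair
```

The second exchange spends longer at the server (3 ticks rather than 1), yet yields the same delay and offset, which is the point of subtracting T2-T1.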

12
NTP: additional details (1)
• NTP uses multiple requests per server
– Remember <offset, delay> in each case
– Calculate the filter dispersion of the offsets & discard
outliers
– Choose the remaining candidate with the smallest delay
• NTP can also use multiple servers
– Servers report synchronization dispersion = estimate
of their quality relative to the root (stratum 0)
– Combined procedure to select best samples from best
servers (see RFC 5905 for the gory details)
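
As a rough illustration of the per-server filtering idea, the sketch below keeps the <offset, delay> samples whose offset is close to the median and then picks the one with the smallest delay. This is a deliberate simplification and not the RFC 5905 clock filter; the function name and the `max_offset_spread` threshold are assumptions.

```python
from statistics import median


def pick_sample(samples, max_offset_spread):
    """samples: list of (offset, delay) pairs from repeated requests.

    Discard samples whose offset is further than max_offset_spread from the
    median offset, then return the remaining sample with the smallest delay.
    """
    mid = median(offset for offset, _ in samples)
    kept = [(o, d) for o, d in samples if abs(o - mid) <= max_offset_spread]
    if not kept:
        raise ValueError("all samples rejected as outliers")
    return min(kept, key=lambda sample: sample[1])


samples = [(33.0, 2.0), (33.5, 6.0), (32.8, 1.5), (60.0, 1.0)]
print(pick_sample(samples, max_offset_spread=2.0))
```

A small delay leaves less room for asymmetric network delay to distort the offset, which is why the low-delay sample is preferred once outliers are gone.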

13
NTP: additional details (2)
• Various operating modes:
– Broadcast (“multicast”): server advertises current
time
– Client-server (“procedure call”): as described on the
previous slides
– Symmetric: between a set of NTP servers
• Security is supported
– Authenticate server, prevent replays
– Cryptographic cost compensated for

14
Physical clocks: summary
• Physical devices exhibit clock drift
– Even if initially correct, they tick too fast or too slow, and
hence time ends up being wrong
– Drift rates depend on the specific device, and can vary
with time, temperature, acceleration, …
• Instantaneous difference between clocks is clock skew
• Clock synchronization algorithms attempt to minimize
the skew between a set of clocks
– Decide upon a target correct time (atomic, or average)
– Communicate to agree, compensating for delays
– In reality, will still have 1-10ms skew after sync ;-(

15
Ordering
• One use of time is to provide ordering
– If I withdrew £100 cash at 23:59:44…
– And the bank computes interest at 00:00:00…
– Then interest calculation shouldn’t include the £100
• But in distributed systems we can’t perfectly
synchronize time => cannot use this for ordering
– Clock skew can be large, and may not be trusted
– And over large distances, relativistic events mean that
ordering depends on the observer
– (similar effect due to finite ‘speed of Internet’ ;-)

16
The “happens-before” relation
• Often don’t need to know when event a occurred
– Just need to know if a occurred before or after b
• Define the happens-before relation, a → b
– If events a and b are within the same process, then
a→ b if a occurs with an earlier local timestamp
– Messages between processes are ordered causally,
i.e. the event send(m) → the event receive(m)
– Transitivity: i.e. if a→ b and b→ c, then a→ c
• Note that this only provides a partial order:
– Possible for neither a→ b nor b→ a to hold
– We say that a and b are concurrent and write a ~ b

17
Example
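[Figure: three process timelines against physical time. P1 has events a and b; P2 has events c and d; P3 has events e and f. Message m1 is sent by P1 at b and received by P2 at c; message m2 is sent by P2 at d and received by P3 at f.]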

• Three processes (each with 2 events), and 2 messages


– Due to process order, we know a→ b, c→ d and e→ f
– Causal order tells us b→ c and d→ f
– And by transitivity a→ c, a→ d, a→ f, b→ d, b→ f, c→ f
• However event e is concurrent with a, b, c and d
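
The derivation above can be reproduced mechanically. The following Python sketch takes the process-order and message-order pairs from the example, computes their transitive closure, and tests for concurrency. The function names are illustrative, and the naive closure loop is only meant for tiny examples like this one.

```python
from itertools import product


def happens_before(direct):
    """Return the transitive closure of a set of (earlier, later) pairs."""
    closure = set(direct)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def concurrent(hb, x, y):
    """x ~ y: distinct events with neither x -> y nor y -> x."""
    return x != y and (x, y) not in hb and (y, x) not in hb


process_order = {("a", "b"), ("c", "d"), ("e", "f")}
message_order = {("b", "c"), ("d", "f")}  # send(m1) -> receive(m1), same for m2
hb = happens_before(process_order | message_order)

print(sorted(hb - process_order - message_order))  # pairs found by transitivity
print([x for x in "abcd" if concurrent(hb, "e", x)])
```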

18
Implementing Happens-Before
• One early scheme due to Lamport [1978]
– Each process Pi has a logical clock Li
• Li can simply be an integer, initialized to 0
– Li is incremented on every local event e
• We write Li(e) or L(e) as the timestamp of e
– When Pi sends a message, it increments Li and copies
the value into the packet
– When Pi receives a message from Pj, it extracts Lj and
sets Li := max(Li,Lj), and then increments Li
• Guarantees that if a → b, then L(a) < L(b)
– However if L(x) < L(y), this doesn’t imply x → y !

19
Lamport Clocks: Example
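[Figure: the same three process timelines, annotated with Lamport timestamps. P1: a = 1, b = 2 (b sends m1). P2: c = 3 (c receives m1), d = 4 (d sends m2). P3: e = 1, f = 5 (f receives m2).]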
• When P2 receives m1, it extracts timestamp 2 and sets its
clock to max(0, 2) before increment
• Possible for events to have duplicate timestamps
– e.g. event e has the same timestamp as event a
• If desired can break ties by looking at pids, IP addresses, …
– this gives a total order, but doesn’t imply happens-before!
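
A minimal Lamport clock in Python that reproduces the timestamps in this example. The class and method names are illustrative; the rules are the ones on the previous slide (increment on local events and sends, take the maximum and then increment on receive).

```python
class LamportClock:
    def __init__(self):
        self.time = 0  # Li, initialised to 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event: increment, then copy the value into the message.
        self.time += 1
        return self.time

    def receive(self, message_time):
        # Li := max(Li, Lj), then increment.
        self.time = max(self.time, message_time) + 1
        return self.time


p1, p2, p3 = LamportClock(), LamportClock(), LamportClock()
a = p1.local_event()
b = p1.send()        # m1 carries b's timestamp
c = p2.receive(b)
d = p2.send()        # m2 carries d's timestamp
e = p3.local_event()
f = p3.receive(d)
print(a, b, c, d, e, f)

# Breaking ties by process id gives a total order (ids here are arbitrary).
total_order = sorted([(a, 1, "a"), (e, 3, "e"), (b, 1, "b")])
```

Events a and e share a timestamp; the tie-break places a first purely because of the process id, not because a happened before e.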

20
Vector clocks
• With Lamport clocks, given L(a) and L(b), we
can’t tell if a→ b or b→ a or a ~ b
• One solution is vector clocks:
– An ordered list of logical clocks, one per-process
– Each process Pi maintains Vi[], initially all zeroes
– On a local event e, Pi increments Vi[i]
• If the event is message send, new Vi[] copied into packet
– If Pi receives a message from Pj then, for all k = 0, 1, …,
it sets Vi[k] := max(Vj[k], Vi[k]), and increments Vi[i]
• Intuitively Vi[k] captures the number of events at
process Pk that have been observed by Pi

21
Vector clocks: example
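[Figure: the same three process timelines, annotated with vector timestamps. P1: a = (1,0,0), b = (2,0,0) (b sends m1). P2: c = (2,1,0) (c receives m1), d = (2,2,0) (d sends m2). P3: e = (0,0,1), f = (2,2,2) (f receives m2).]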
• When P2 receives m1, it merges the entries from P1’s clock
– choose the maximum value in each position
• Similarly when P3 receives m2, it merges in P2’s clock
– this incorporates the changes from P1 that P2 already saw
• Vector clocks explicitly track the transitive causal order: f’s
timestamp captures the history of a, b, c & d
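
The timestamps in this example can be regenerated with a small Python sketch of the update rules from the previous slide. The class and method names are illustrative, and process indices start at 0 (so P1 is index 0).

```python
class VectorClock:
    def __init__(self, pid, n):
        self.pid = pid        # index of this process in the vector
        self.v = [0] * n      # Vi[], initially all zeroes

    def local_event(self):
        self.v[self.pid] += 1
        return tuple(self.v)

    def send(self):
        # A send is a local event; the new vector is copied into the message.
        return self.local_event()

    def receive(self, other):
        # Merge: take the maximum in each position, then count the receive event.
        self.v = [max(mine, theirs) for mine, theirs in zip(self.v, other)]
        self.v[self.pid] += 1
        return tuple(self.v)


p1, p2, p3 = VectorClock(0, 3), VectorClock(1, 3), VectorClock(2, 3)
a = p1.local_event()
b = p1.send()        # m1 carries b's vector
c = p2.receive(b)
d = p2.send()        # m2 carries d's vector
e = p3.local_event()
f = p3.receive(d)
print(a, b, c, d, e, f)
```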

22
Using vector clocks for ordering
• Can compare vector clocks piecewise:
– Vi = Vj iff Vi[k] = Vj[k] for k = 0, 1, 2, …
– Vi ≤ Vj iff Vi[k] ≤ Vj[k] for k = 0, 1, 2, …
– Vi < Vj iff Vi ≤ Vj and Vi ≠ Vj
– Vi ~ Vj otherwise, e.g. [2,0,0] versus [0,0,1]
• For any two event timestamps T(a) and T(b)
– if a → b then T(a) < T(b) ; and
– if T(a) < T(b) then a → b
• Hence can use timestamps to determine if there
is a causal ordering between any two events
– i.e. determine whether a → b, b → a or a ~ b
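
The comparison rules translate directly into Python. The function names and the string results are illustrative choices; the timestamps in the usage lines are the ones from the vector clock example (a = (1,0,0), b = (2,0,0), e = (0,0,1), f = (2,2,2)).

```python
def vc_leq(u, v):
    """u <= v iff every component of u is <= the matching component of v."""
    return all(x <= y for x, y in zip(u, v))


def vc_less(u, v):
    """u < v iff u <= v and u != v."""
    return vc_leq(u, v) and tuple(u) != tuple(v)


def relation(u, v):
    """Classify two event timestamps."""
    if tuple(u) == tuple(v):
        return "equal"
    if vc_less(u, v):
        return "before"       # the event stamped u happens-before the one stamped v
    if vc_less(v, u):
        return "after"
    return "concurrent"


print(relation((1, 0, 0), (2, 2, 2)))  # a versus f
print(relation((2, 0, 0), (0, 0, 1)))  # b versus e
```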
Does this seem familiar? Recall Time-Stamp Ordering and Optimistic Concurrency Control for transactions last term.

23
Summary + next time (ironically)
• The clock synchronisation problem
• Cristian’s Algorithm, Berkeley Algorithm, NTP
• Logical time via the happens-before relation
• Vector clocks

• More on vector clocks


• Consistent cuts
• Group communication
• Enforcing ordering vs. asynchrony
• Distributed mutual exclusion

24
