MODULE 3 Synchronization
Distributed Computing
Harish Tiwari
SIR PADAMPAT SINGHANIA UNIVERSITY
CS-4010 DISTRIBUTED COMPUTING
Contents
Chapter 3 Synchronization in Distributed Systems
In single CPU systems, critical regions, mutual exclusion, and other synchronization
problems are solved using methods such as semaphores. These methods will not
work in distributed systems because they implicitly rely on the existence of shared
memory.
Another example is that multiple processes may sometimes need to agree on the
ordering of events, such as whether message m1 from process P was sent before or
after message m2 from process Q.
In a centralized system, time is unambiguous: if process A asks for the time, and then a little later process B asks for the time, the value that B gets will be higher than (or possibly equal to) the value A got. It will certainly not be lower.
Just think, for a moment, about the implications of the lack of global time on the
UNIX make program, as a single example. Normally, in UNIX, large programs are
split up into multiple source files, so that a change to one source file only requires
one file to be recompiled, not all the files. If a program consists of 100 files, not
having to recompile everything because one file has been changed greatly increases
the speed at which programmers can work.
The way make normally works is simple. When the programmer has finished
changing all the source files, he runs make, which examines the times at which all
the source and object files were last modified. If the source file input.c has time
2151 and the corresponding object file input.o has time 2150, make knows that
input.c has been changed since input.o was created, and thus input.c must be
recompiled. On the other hand, if output.c has time 2144 and output.o has time
2145, no compilation is needed. Thus make goes through all the source files to find
out which ones need to be recompiled and calls the compiler to recompile them.
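As an illustration, the timestamp test that make applies can be sketched in a few lines of Python (the file names are the ones from the example; the actual compiler invocation is elided):

```python
import os

def needs_recompile(source: str, obj: str) -> bool:
    """True if 'source' changed after 'obj' was built (or 'obj' is missing)."""
    if not os.path.exists(obj):
        return True
    return os.path.getmtime(source) > os.path.getmtime(obj)

# Hypothetical project with two translation units, as in the example.
for src in ("input.c", "output.c"):
    obj = src[:-2] + ".o"
    if os.path.exists(src) and needs_recompile(src, obj):
        print("recompiling", src)  # here make would call the compiler
```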
Now imagine what could happen in a distributed system in which there were no
global agreement on time. Suppose that output.o has time 2144 as above, and
shortly thereafter output.c is modified but is assigned time 2143 because the clock
on its machine is slightly behind, as shown in Fig. 1. Make will not call the compiler.
The resulting executable binary program will then contain a mixture of object files
from the old sources and the new sources. It will probably crash and the programmer
will go crazy trying to understand what is wrong with the code.
Figure 1 When each machine has its own clock, an event that occurred after another event may nevertheless be
assigned an earlier time.
Clock synchronization can be achieved in two ways: external and internal clock synchronization.
A computer timer is usually a precisely machined quartz crystal. When kept under
tension, quartz crystals oscillate at a well-defined frequency that depends on the kind
of crystal, how it is cut, and the amount of tension. Associated with each crystal are
two registers, a counter and a holding register. Each oscillation of the crystal
decrements the counter by one. When the counter gets to zero, an interrupt is
generated, and the counter is reloaded from the holding register. In this way, it is
possible to program a timer to generate an interrupt 60 times a second, or at any
other desired frequency. Each interrupt is called one clock tick.
When the system is booted, it usually asks the user to enter the date and time, which
is then converted to the number of ticks after some known starting date and stored in
memory. Most computers have a special battery-backed up CMOS RAM so that the
date and time need not be entered on subsequent boots. At every clock tick, the
interrupt service procedure adds one to the time stored in memory. In this way, the
(software) clock is kept up to date.
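A small simulation of this counter/holding-register mechanism makes the bookkeeping concrete (the reload value here is illustrative; real hardware decrements the counter in silicon, not in software):

```python
HOLDING_REGISTER = 1_000  # oscillations between interrupts (illustrative)

class SoftwareClock:
    def __init__(self, boot_ticks: int):
        # Ticks since a known starting date, entered at boot
        # (or read from battery-backed CMOS RAM).
        self.ticks = boot_ticks
        self.counter = HOLDING_REGISTER

    def oscillation(self):
        # Each crystal oscillation decrements the counter; when it
        # reaches zero, an interrupt fires and the counter is reloaded.
        self.counter -= 1
        if self.counter == 0:
            self.counter = HOLDING_REGISTER
            self.clock_tick_interrupt()

    def clock_tick_interrupt(self):
        # The interrupt service procedure adds one to the stored time.
        self.ticks += 1

clock = SoftwareClock(boot_ticks=0)
for _ in range(2 * HOLDING_REGISTER):  # simulate two interrupts' worth
    clock.oscillation()
print(clock.ticks)  # -> 2
```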
With a single computer and a single clock, it does not matter much if this clock is off
by a small amount. Since all processes on the machine use the same clock, they will still be internally consistent.
As soon as multiple CPUs are introduced, each with its own clock, the situation
changes radically. Although the frequency at which a crystal oscillator runs is usually
fairly stable, it is impossible to guarantee that the crystals in different computers all
run at exactly the same frequency. In practice, when a system has n computers, all n
crystals will run at slightly different rates, causing the (software) clocks gradually to
get out of synch and give different values when read out. This difference in time
values is called clock skew.
As a consequence of this clock skew, programs that expect the time associated with
a file, object, process, or message to be correct and independent of the machine on
which it was generated (i.e., which clock it used) can fail.
To avoid the clock skew problem, two types of clocks are used: physical clocks and logical clocks.
In some systems (e.g., real-time systems), the actual clock time is important. Under these circumstances, external physical clocks are needed. For reasons of efficiency and redundancy, multiple physical clocks are generally considered desirable, which yields two problems: how do we synchronize them with real-world clocks, and how do we synchronize the clocks with each other?
Sometimes we simply need the exact time, not just an ordering of events. The solution is UTC (Universal Coordinated Time), which is based on the number of transitions per second of the cesium-133 atom (pretty accurate). At present, the real time is taken as the average of some 50 cesium clocks around the world; a leap second is introduced from time to time to compensate for the fact that days are getting longer.
To provide UTC to people who need precise time, the National Institute of Standards and Technology (NIST) operates a shortwave radio station with call letters WWV from Fort Collins, Colorado. WWV broadcasts a short pulse at the start of each UTC second.
The accuracy of WWV itself is about ±1 msec, but due to random atmospheric
fluctuations that can affect the length of the signal path, in practice the accuracy is
no better than ±10 msec.
• Every machine has a timer that generates an interrupt H times per second.
• There is a clock in machine p that ticks on each timer interrupt. Denote the value of that clock by Cp(t), where t is UTC time.
• Ideally, we have that for each machine p, Cp(t) = t, or, in other words, dC/dt = 1.
• In practice, 1 − ρ ≤ dC/dt ≤ 1 + ρ, where ρ is the maximum drift rate.
• To guarantee that no two clocks ever differ by more than δ time units, clocks must be resynchronized at least every δ/(2ρ) seconds.
Figure 2 The relation between clock time and UTC when clocks tick at different rates
The constant ρ is specified by the manufacturer and is known as the maximum drift rate. Note that the maximum drift rate specifies to what extent a clock's skew is allowed to fluctuate. Slow, perfect, and fast clocks are shown in Fig. 2.
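As a quick worked example of the δ/(2ρ) bound (the ρ and δ values here are assumptions, not from the text): two clocks drifting in opposite directions can separate at rate 2ρ, which is where the factor of two comes from.

```python
rho = 1e-5    # maximum drift rate, e.g., 10 ppm (assumed)
delta = 1e-3  # maximum tolerated skew: 1 ms (assumed)

# Two clocks drifting in opposite directions diverge at rate 2*rho,
# so they must be resynchronized at least every delta / (2*rho) seconds.
interval = delta / (2 * rho)
print(f"resynchronize at least every {interval:.0f} s")  # -> 50 s
```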
• Cristian’s Algorithm
• The Berkeley Algorithm
In Cristian's algorithm, a client gets the current time from a time server. The basic principle of this algorithm is as follows: the client sends a request to the time server at time T0 (measured on its own clock); the server replies with its current time CUTC; and the reply arrives at time T1. The client then sets its clock to CUTC + (T1 − T0)/2, using half the measured round-trip time as an estimate of how long the reply spent in transit.
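A minimal sketch of this estimate in Python, where get_server_time() is a hypothetical stub standing in for the actual request to the time server:

```python
import time

def get_server_time() -> float:
    # Hypothetical stub; a real client would query the time server here.
    return time.time()

def cristian_sync() -> float:
    t0 = time.monotonic()           # request sent
    server_time = get_server_time()
    t1 = time.monotonic()           # reply received
    # Assume the reply spent half the round trip in transit.
    return server_time + (t1 - t0) / 2

print(cristian_sync())
```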
In Berkeley UNIX, exactly the opposite approach is taken. In the Berkeley algorithm, the time server (actually, a time daemon) is active, polling every machine from time to time to ask what time it is there. Based on the answers, it computes an average time and tells all the other machines to advance their clocks to the new time or slow their clocks down until some specified reduction has been achieved.
The Berkeley algorithm was developed to solve the problems of Cristian's algorithm. This algorithm does not need external synchronization. A master-slave approach is used here.
• The master polls the slaves periodically about their clock readings.
• Estimates of the slaves' local clock times are calculated using the round-trip times.
• An average value is obtained from the group of processes.
• This method cancels out individual clocks' tendencies to run fast or slow, and the master tells each slave process by what amount of time to adjust its local clock.
• In case of master failure, a master election algorithm is used to choose a new master.
This method is suitable for a system in which no machine has a WWV receiver. The
time daemon's time must be set manually by the operator periodically. The method is
illustrated in Fig.3.
In Fig. 3 (a), at 3:00, the time daemon tells the other machines its time and asks for
theirs. In Fig. 3(b), they respond with how far ahead or behind the time daemon they
are. Armed with these numbers, the time daemon computes the average and tells
each machine how to adjust its clock [see Fig. 3(c)].
• The time daemon asks all the other machines for their clock values.
• The machines answer the request.
• The time daemon tells everyone how to adjust their clock.
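The daemon's averaging step can be sketched as follows (a sketch with illustrative values in the spirit of Fig. 3; a real daemon would first estimate these offsets from round-trip measurements):

```python
# Offsets reported in step (b): how far ahead (+) or behind (-) the
# time daemon each machine is, in seconds (illustrative values).
offsets = {"daemon": 0, "machine_1": +25, "machine_2": -10}

average = sum(offsets.values()) / len(offsets)  # -> +5.0

# Step (c): tell every machine (daemon included) how to adjust its clock.
adjustments = {name: average - off for name, off in offsets.items()}
print(adjustments)
# -> {'daemon': 5.0, 'machine_1': -20.0, 'machine_2': 15.0}
# machine_1 must slow down by 20 s; machine_2 must advance by 15 s.
```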
Features of NTP:
The NTP service is provided by a network of servers located across the Internet. Primary servers are connected directly to a time source such as a radio clock receiving UTC. Secondary servers are synchronized with primary servers. The servers are connected in a logical hierarchy called a synchronization subnet; in a diagram of the subnet, arrows denote synchronization control and numbers denote the levels, which are called strata.
NTP uses a hierarchical, semi-layered system of time sources. Each level of this
hierarchy is termed a stratum and is assigned a number starting with zero for the
reference clock at the top. A server synchronized to a stratum n server runs at
stratum n + 1. The number represents the distance from the reference clock and is
used to prevent cyclical dependencies in the hierarchy.
In Figure 4 above, yellow arrows indicate a direct connection and red arrows indicate a network connection. A brief description of strata 0, 1, 2, and 3 is provided below.
• Stratum 0
These are high-precision timekeeping devices such as atomic clocks, GNSS
(including GPS) or other radio clocks. They generate a very accurate pulse per
second signal that triggers an interrupt and timestamp on a connected computer.
Stratum 0 devices are also known as reference clocks. NTP servers cannot
advertise themselves as stratum 0.
• Stratum 1
These are computers whose system time is synchronized to within a few
microseconds of their attached stratum 0 devices. Stratum 1 servers may peer
with other stratum 1 servers for sanity check and backup. They are also referred
to as primary time servers.
• Stratum 2
These are computers that are synchronized over a network to stratum 1 servers.
Often a stratum 2 computer queries several stratum 1 servers. Stratum 2
computers may also peer with other stratum 2 computers to provide more stable
and robust time for all devices in the peer group.
• Stratum 3
These are computers that are synchronized to stratum 2 servers. They employ
the same algorithms for peering and data sampling as stratum 2, and can
themselves act as servers for stratum 4 computers, and so on.
• In many systems, the coordinator is chosen by hand (e.g., file servers). This leads to centralized solutions with a single point of failure.
• If a coordinator is chosen dynamically, to what extent can one speak of a centralized or distributed solution? Having a central coordinator does not necessarily make an algorithm non-distributed.
In general, election algorithms attempt to locate the process with the highest process
number and designate it as coordinator. The algorithms differ in the way they do the
location.
We also assume that every process knows the process number of every other process. What the processes do not know is which ones are currently up and which ones are currently down. The goal of an election algorithm is to ensure that when an election starts, it concludes with all processes agreeing on who the new coordinator is to be. There are many algorithms and variations. Two example election algorithms, the bully algorithm and the ring algorithm, are described below.
Procedure:
When any process notices that the coordinator is no longer responding to requests, it initiates an election. A process P holds an election as follows:
• P sends an ELECTION message to all processes with higher numbers.
• If no one responds, P wins the election and becomes coordinator.
• If one of the higher-ups answers, it takes over, and P's job is done.
At any moment, a process can get an ELECTION message from one of its lower-numbered colleagues. When such a message arrives, the receiver sends an OK message back to the sender to indicate that it is alive and will take over. The receiver then holds an election, unless it is already holding one. Eventually, all processes give up but one, and that one is the new coordinator. It announces its victory by sending all processes a message telling them that starting immediately it is the new coordinator.
If a process that was previously down comes back up, it holds an election. If it
happens to be the highest-numbered process currently running, it will win the
election and take over the coordinator's job. Thus the biggest guy in town always
wins, hence the name "bully algorithm."
Figure 5 The bully election algorithm. (a) Process 4 holds an election. (b) Processes 5 and 6 respond, telling 4 to stop. (c) Now 5 and 6 each hold an election. (d) Process 6 tells 5 to stop. (e) Process 6 wins and tells everyone.
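A compact single-process simulation of this behavior is sketched below (synchronous function calls stand in for the ELECTION/OK messages, and the alive set stands in for crash detection):

```python
def bully_election(initiator: int, processes: list[int], alive: set[int]) -> int:
    """Return the new coordinator when 'initiator' starts an election."""
    higher = [p for p in processes if p > initiator and p in alive]
    if not higher:
        # No higher-numbered process is up: the initiator wins and
        # announces COORDINATOR to everyone.
        return initiator
    # Every live higher-numbered process replies OK and holds its own
    # election; the highest-numbered live process eventually wins.
    return max(higher)

processes = [0, 1, 2, 3, 4, 5, 6, 7]
alive = {0, 1, 2, 3, 4, 5, 6}               # process 7 (old coordinator) crashed
print(bully_election(4, processes, alive))  # -> 6, the biggest guy in town
```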
Another election algorithm is based on the use of a ring. Unlike some ring algorithms, this one does not use a token. All the processes are arranged in a logical ring, and process priority is determined by their position in it. We assume that the processes are physically or logically ordered, so that each process knows who its successor is.
In this algorithm, too, the process with the highest priority is elected as coordinator. When any process notices that the coordinator is not functioning, it can initiate an election.
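A sketch of one common formulation of the ring election: the initiator sends an ELECTION message containing its number to its successor; each live process appends its own number and forwards the message, dead successors being skipped; when the message returns to the initiator, the highest collected number is announced as coordinator. The simulation below models this (the ring order and alive set are illustrative):

```python
def ring_election(initiator: int, ring: list[int], alive: set[int]) -> int:
    """Simulate one ELECTION message travelling around a logical ring."""
    n = len(ring)
    members = []                    # process numbers collected en route
    i = ring.index(initiator)
    while True:
        p = ring[i % n]
        if p in alive:
            if p == initiator and members:
                break               # the message has come full circle
            members.append(p)
        i += 1                      # dead successors are simply skipped
    # The message is now turned into COORDINATOR: highest number wins.
    return max(members)

ring = [3, 6, 0, 8, 1, 2, 4, 5, 7]  # logical ring order (illustrative)
print(ring_election(5, ring, alive={0, 1, 2, 3, 4, 5, 6}))  # -> 6
```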
Mutual exclusion makes sure that concurrent processes access shared resources or data in a serialized way. If a process, say Pi, is executing in its critical section, no other process may execute in its critical section at the same time.
3.6.1 Overview
Distributed mutual exclusion algorithms can be classified into two different categories: token-based solutions and permission-based solutions.
• When the token is lost (e.g., because the process holding it crashed), a complex distributed procedure needs to be started to ensure that a new token is created, but above all, that it is also the only token.
In the centralized algorithm, one process is elected as the coordinator. Whenever a process wants to access a shared resource, it sends a request message to the coordinator, stating which resource it wants and asking for permission. If no other process is currently accessing that resource, the coordinator sends back a reply granting permission, as shown in Fig. 7(a).
Now suppose that another process, 2 in Fig. 7(b), asks for permission to access the resource. The coordinator knows that a different process is already at the resource, so it cannot grant permission. The exact method used to deny permission is system dependent. In Fig. 7(b), the coordinator just refrains from replying, thus blocking process 2, which is waiting for a reply. Alternatively, it could send a reply saying "permission denied." Either way, it queues the request from 2 for the time being and waits for more messages.
When process 1 is finished with the resource, it sends a message to the coordinator
releasing its exclusive access, as shown in Fig.7(c). The coordinator takes the first
item off the queue of deferred requests and sends that process a grant message. If
the process was still blocked (i.e., this is the first message to it), it unblocks and
accesses the resource.
It is easy to see that the algorithm guarantees mutual exclusion: the coordinator lets only one process at a time access the resource.
• It is also fair, since requests are granted in the order in which they are received.
• No process ever waits forever (no starvation).
• The scheme is easy to implement, too, and requires only three messages per use of a resource (request, grant, release).
• Its simplicity makes it an attractive solution for many practical situations.
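The coordinator's bookkeeping can be sketched in a few lines of Python (an illustration only: method calls stand in for the request, grant, and release messages):

```python
from collections import deque

class Coordinator:
    def __init__(self):
        self.holder = None       # process currently at the resource
        self.queue = deque()     # deferred requests, in arrival order

    def request(self, pid) -> bool:
        """Return True if permission is granted immediately."""
        if self.holder is None:
            self.holder = pid
            return True          # grant message sent back
        self.queue.append(pid)   # defer: no reply, so the requester blocks
        return False

    def release(self, pid):
        assert pid == self.holder
        self.holder = None
        if self.queue:           # grant the oldest deferred request
            self.holder = self.queue.popleft()
            print(f"grant sent to {self.holder}")  # unblocks that process

c = Coordinator()
print(c.request(1))  # True: resource was free
print(c.request(2))  # False: process 2 is queued and blocks
c.release(1)         # the first deferred request (2) now gets a grant
```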
Ricart and Agrawala's algorithm requires that there be a total ordering of all events in
the system. That is, for any pair of events, such as messages, it must be
unambiguous which one happened first.
When a process wants to access a shared resource, it builds a message containing the name of the resource, its process number, and the current (logical) time, and sends the message to all other processes. When a process receives a request message, there are three cases: if the receiver is not accessing the resource and does not want to, it sends back an OK message; if the receiver already has access to the resource, it does not reply but queues the request; and if the receiver wants to access the resource too but has not yet done so, it compares the timestamp of the incoming message with its own, and the lowest one wins.
After sending out requests asking permission, a process sits back and waits until everyone else has given permission. As soon as all the permissions are in, it may go ahead. When it is finished, it sends OK messages to all processes on its queue and deletes them all from the queue.
Figure 8 The Ricart and Agrawala algorithm. (a) Two processes want to access a shared resource at the same moment. (b) Process 0 has the lower timestamp, so it wins. (c) When process 0 is done, it sends an OK, so process 2 can now go ahead.
Let us try to understand why the algorithm works. If there is no conflict, it clearly
works. However, suppose that two processes try to simultaneously access the
resource, as shown in Fig. 8(a).
Process 0 sends everyone a request with timestamp 8, while at the same time,
process 2 sends everyone a request with timestamp 12. Process 1 is not interested
in the resource, so it sends OK to both senders. Processes 0 and 2 both see the
conflict and compare timestamps. Process 2 sees that it has lost, so it grants
permission to 0 by sending OK. Process 0 now queues the request from 2 for later
processing and accesses the resource, as shown in Fig. 8(b). When it is finished, it
removes the request from 2 from its queue and sends an OK message to process 2,
allowing the latter to go ahead, as shown in Fig. 8(c). The algorithm works because
in the case of a conflict, the lowest timestamp wins and everyone agrees on the
ordering of the timestamps.
The number of messages required per entry is now 2(n − 1), where n is the total number of processes in the system. Best of all, no single point of failure exists. Unfortunately, the single point of failure has been replaced by n points of failure: if any process crashes, it will fail to respond to requests.
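The receiver-side decision rule that this explanation relies on can be sketched as follows (an illustration assuming Lamport timestamps with ties broken by process number; the function name and state labels are hypothetical):

```python
def on_request(my_state, my_stamp, my_pid, req_stamp, req_pid, deferred):
    """Receiver's rule in Ricart-Agrawala (a sketch).

    my_state is 'RELEASED', 'HELD', or 'WANTED'. Returns True if an OK
    reply goes out at once; otherwise the request is queued until release.
    """
    if my_state == 'RELEASED':
        return True                      # not interested: send OK
    if my_state == 'HELD':
        deferred.append(req_pid)         # using the resource: defer reply
        return False
    # WANTED: both want it; the lower (timestamp, pid) pair wins.
    if (req_stamp, req_pid) < (my_stamp, my_pid):
        return True                      # requester wins: send OK
    deferred.append(req_pid)             # we win: defer the reply
    return False

deferred = []
# Process 0 (stamp 8) receives process 2's request stamped 12:
print(on_request('WANTED', 8, 0, 12, 2, deferred))  # False: 2 is deferred
print(deferred)  # [2] -- OK is sent to 2 only after 0 releases
```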
Figure 9 (a) An unordered group of processes on a network (b) A logical ring constructed in software
Here we have a bus network, as shown in Fig. 9(a) (e.g., Ethernet), with no inherent ordering of the processes. In software, a logical ring is constructed in which each process is assigned a position in the ring, as shown in Fig. 9(b). The ring positions may be allocated in numerical order of network addresses or by some other means. It does not matter what the ordering is. All that matters is that each process knows who is next in line after itself.
• When the ring is initialized, process 0 is given a token. The token circulates
around the ring. It is passed from process k to process k +1 (modulo the ring
size) in point-to-point messages.
• When a process acquires the token from its neighbor, it checks to see if it needs to access the shared resource. If so, the process goes ahead, does all the work it needs to, and releases the resource. After it has finished, it passes the token along the ring. It is not permitted to immediately enter the resource again using the same token.
• If a process is handed the token by its neighbor and is not interested in the
resource, it just passes the token along. As a consequence, when no
processes need the resource, the token just circulates at high speed around
the ring.
• When the token is lost (e.g., because the process holding it crashed), a complex distributed procedure needs to be started to ensure that a new token is created, but above all, that it is also the only token. A minimal simulation of the circulating token appears after this list.
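The sketch below simulates the circulating token in a single process (illustrative only: wants_resource is an assumed predicate, and a real implementation would forward the token in point-to-point messages between machines):

```python
def circulate(ring_size: int, wants_resource, rounds: int = 1):
    """Pass the token around the ring; a holder that wants the resource
    enters its critical section once, then forwards the token."""
    token_at = 0  # process 0 is given the token at initialization
    for _ in range(rounds * ring_size):
        if wants_resource(token_at):
            print(f"process {token_at} enters critical section")
            # ... use the shared resource, then release it ...
        # Forward the token to the next process (modulo the ring size);
        # re-entering with the same token is not permitted.
        token_at = (token_at + 1) % ring_size

circulate(ring_size=5, wants_resource=lambda p: p in {1, 3})
# -> processes 1 and 3 each enter exactly once per round; when no
#    process wants the resource, the token just circulates.
```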