0% found this document useful (0 votes)
12 views45 pages

3 Synchronization

The document discusses synchronization in distributed systems, highlighting the challenges of mutual exclusion and event ordering without shared memory. It covers clock synchronization, logical clocks, and algorithms like Lamport's for establishing event order, as well as methods for determining global state and electing coordinators. The document also addresses mutual exclusion protocols, including centralized and distributed algorithms, emphasizing their advantages and shortcomings.

Uploaded by

Anant Krishna
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
12 views45 pages

3 Synchronization

The document discusses synchronization in distributed systems, highlighting the challenges of mutual exclusion and event ordering without shared memory. It covers clock synchronization, logical clocks, and algorithms like Lamport's for establishing event order, as well as methods for determining global state and electing coordinators. The document also addresses mutual exclusion protocols, including centralized and distributed algorithms, emphasizing their advantages and shortcomings.

Uploaded by

Anant Krishna
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 45

Module 3

Synchronization

CS755 6-1
Synchronization Problem
• How processes cooperate and synchronize with one another in a
distributed system
➡ In single CPU systems, critical regions, mutual exclusion, and other
synchronization problems are solved using methods such as semaphores.
➡ These methods will not work in distributed systems because they
implicitly rely on the existence of shared memory.
➡ Examples:
◆ Two processes interacting using a semaphore must both be able to access the
semaphore. In a centralized system, the semaphore is stored in the kernel and
accessed by the processes using system calls
◆ If two events occur in a distributed system, it is difficult to determine which
event occurred first.

• How to decide on relative ordering of events


CS755 ➡ Does one event precede another event? 6-2
Issues
• Part 1 - Clocks
➡ How to synchronize events based on actual time?
◆ Clock synchronization
➡ How to determine relative ordering?
◆ Logical clocks

• Part 2 - Global State and Election


➡ What is the “global state” of a distributed system?
➡ How do we determine the “coordinator” of a distributed
system?
• Part 3 - How do we synchronize for sharing
CS755 6-3
Clock synchronization
• In a centralized system:
➡ Time is unambiguous: A process gets the time by issuing a system call to the
kernel. If process A gets the time and latter process B gets the time. The value
B gets is higher than (or possibly equal to) the value A got
➡ Example: UNIX make examines the times at which all the source and
object files were last modified:
◆ If time (input.c) > time(input.o) then recompile input.c
◆ If time (input.c) < time(input.o) then no compilation is needed

CS755 6-4
Clock synchronization (2)
• In a distributed system:
➡ Achieving agreement on time is not
trivial!
Computer on which 2144 2145 2146 2147 Time
compiler runs according to
input.o created
local clock
2143 2144
Computer on which 2142 2145 Time
editor runs according to
input.c created local clock

■ Is it possible to synchronize all the clocks in a distributed system?

CS755 6-5
Logical vs Physical Clocks
• Clock synchronization need not be absolute! (due to Lamport, 1978):
➡ If two processes do not interact, their clocks need not be synchronized.
➡ What matters is not that all processes agree on exactly what time is it, but
rather, that they agree on the order in which events occur.
➡ We will discuss this later under “logical clock synchronization”.

• For algorithms where only internal consistency of clocks matters (not


whether clocks are close to real time), we speak of logical clocks.
• For algorithms where clocks must not only be the same, but also must
not deviate from real-time, we speak of physical clocks.

CS755 6-6
Logical Clock Synchronization
• Happens-before relation
➡ If a and b are events in the same process, and a happens-before b, then
a→b is true
➡ If a is the event of a message being sent by one process, and b is the event
of the same message being received by another process, then a→b is also
true
➡ If two events, a and b, happen in different processes that do not exchange
messages , then a→b is not true, but neither is b→a. These events are said
to be concurrent
p
(a⎪⎪b).
1
m1a→b and b→c ⇒ a→c
➡ happens-before isa transitive:
b

p2 Physical
c d time
m2

p3
e f
From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755 © Addison-Wesley Publishers 2000
6-7
Lamportʼs Algorithm
• Capturing happens-before relation
• Each process pi has a local monotonically increasing counter, called its
logical clock Li
• Each event e that occurs at process pi is assigned a Lamport timestamp Li(e)
• Rules:
➡ Li is incremented before event e is issued at pi: Li := Li + 1
receive(m) 1 2

➡ When pi sends
p1 message m, it adds t = L : (m, t) [this is event send(m)]
a b m i 1

➡ On receiving (m, t), pj computes 3Lj := max(L


4
j
, t); Lj := Lj + Physical
1; timestamp event
p2
c d time
m2

1 5
p3
e f
From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755 © Addison-Wesley Publishers 2000
6-8
0 1
Example 1 2
2 0
0 0 0 0 0 0
6 A 8 10 6 A 8 10
12 16 20 12 16 20
18 24 B 30 18 24 B 30
24 32 40 24 32 40
30 40 50 30 40 50
36 48 C 60 36 48 C 60
42 56 70 42 61 70
48 D 64 80 48 D 69 80
54 72 90 70 77 90
60 80 10 0 76 85 10 0
a) Three processes, each with its own clock. The clocks run at different rates.

b) Lamport's algorithm corrects the clocks.

• Lamport solution:
➡Between every two events, the clock must tick at least once
From Tanenbaum and van Steen, Distributed Systems: Principles and Paradigms, 2nd edition
CS755 © Prentice-Hall, Inc. 2007 6-9
Lamport Timestamps

a) The updates have to be ordered the same way across the replicas.

b) This can be achieved by a totally ordered multicast algorithm.

c)
CS755
Totally ordered multicasting can be achieved by means of Lamport clocks.
nd
From Tanenbaum and van Steen, Distributed Systems: Principles and Paradigms, 2 edition
© Prentice-Hall, Inc. 2007 6-10
Part 2
Global State & Election Algorithms

CS755 6-1
Global State
• Sometimes it is useful to know the global state of a distributed system
➡ Garbage
p1 p2
collection
Object message
referenc garbage object
➡ e
Deadlocks p1 wait-for p2

wait-for

Termination p1 p2
activate
passive passive

• Global state = local state of each process + msgʼs in transit


➡ Local state depends on the process

From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755 © Addison-Wesley Publishers 2000 6-12
Distributed Snapshot
• Represents a state in which the distributed system might have been.
➡ Consistent global state: If A sends a message to B and and the system
records B receiving
0 1
the
2
message, it should also record A sending it.
3
e1 e
1
e
1
e
1

p1
m1 m2
Physical
p2
0 1 2 time
e2 e 2 e2

Inconsistent cut
Consistent cut

• Cut of the systemʼs execution: subset of its global history identifying


set of events e.
➡ Cut C is consistent iff ∀e ∈ C, f → e ⇒ f ∈ C
From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755 © Addison-Wesley Publishers 2000 6-13
Election Problem
• Many algorithms require one process to act as a coordinator
◆ Example: Coordinator in the centralized mutual exclusion algorithm

• In general, it does not matter which process takes on this special


responsibility, but one has to do it.
• How to elect a coordinator?

CS755 6-14
General Approach
• Assumptions:
➡ Each process has a unique number (e.g., its network address) to distinguish
them and hence to select one
➡ One process per machine: For simplicity
➡ Every process knows the process number of every other process

➡ Processes do not know which processes are currently up and which ones are
currently down

• Approach:
➡ Locate the process with the highest process number and designate it as
coordinator.
➡ Election algorithms differ in the way they do the location
CS755 6-15
The Bully Algorithm
• When a process, P, notices that the coordinator is no longer responding
to requests, it initiates an election:
➡ P sends an ELECTION message to all processes with higher numbers
➡ If no one responds, P wins the election and becomes coordinator
➡ If one of the higher-ups answers, it takes over. Pʼs job is done

• When a process gets an ELECTION message from one of its lower-


numbered colleagues:
➡ Receiver sends an OK message back to the sender to indicate that he is
alive and will take over
➡ Receiver holds an election, unless it is already holding one
➡ Eventually, all processes give up but one, and that one is the new
coordinator.
➡ New coordinator announces its victory by sending all processes a
message telling them that starting immediately it is the new coordinator
• If a process that was previously down comes back:
CS755 ➡ It holds an election. If it happens to be the highest process currently 6-16
Example
2 1 2 1 2 1
Election OK
4 5 4 5 4 5
OK
Elec ion

Election
t

0 6 0 6 0 6

7 3 7 3 7 3
Previous
coordinator
has crashed
2 1 2 1

4 5 4 5
OK Coordinator
0 6 0 6

7 3 7 3
Coordinator

From Tanenbaum and van Steen, Distributed Systems: Principles and Paradigms, 2nd edition
CS755 © Prentice-Hall, Inc. 2002 6-17
A Ring Algorithm
• Use a ring (processes are physically or logically ordered, so that each
process knows who its successor is). No token is used.
• Algorithm:
➡ When any process notices that coordinator is not functioning:
◆ Builds an ELECTION message (containing its own process number)
◆ Sends the message to its successor (if successor is down, sender skips over it and
goes to next member along the ring, or the one after that, until a running process
is located)
◆ At each step, sender adds its own process number to the list in the message
➡ When the message gets back to the process that started it all:
◆ Process recognizes the message containing its own process number
◆ Changes message type to COORDINATOR
◆ Circulates message once again to inform everyone else: Who the new
CS755
coordinator is (list member with highest number); Who the members of the new
6-18
Example
5 Election message
6 1 2 2
0

0 3
2
5 3
6
Previous 7 4
coordinator
has crashed No response
6 5
5
Election
message

From Tanenbaum and van Steen, Distributed Systems: Principles and Paradigms, 2nd edition
CS755 © Prentice-Hall, Inc. 2007 6-19
Part 3
Sharing in Distributed
Systems

CS755 6-20
Mutual Exclusion Problem
• Systems involving multiple processes are often programmed using critical
regions.
• When a process has to read or update certain shared data structure, it first
enters a critical region to achieve mutual exclusion, i.e., ensure that no
other process will use the shared data structure at the same time
• In single-processor systems, critical regions are protected using
semaphores, and monitors or similar constructs. How to accomplish the
same in distributed systems?
➡ We assume, for now, that transactions are not supported by the system

CS755 6-21
Protocols & Requirements
• Application-level protocol
enter() //enter critical section - block if necessary
resourceAccesses() // access shared resources in critical section
exit() // leave critical section - other processes may enter

• Requirements
1. At most one process may execute in the critical section (safety)
2. Requests to enter and exit the critical section eventually succeed
(liveness)
3. If one request to enter the critical section happened-before another,
then entry to the critical section is granted in that order (→ order)

CS755 6-22
A Centralized Algorithm
(Central Server Algorithm)

One process is elected as the coordinator:


a) Process 1 asks the coordinator for permission to enter a critical region.
Permission is granted
b)
Process 2 then asks permission to enter the same critical region. The coordinator
does not reply.
c)
When process 1 exits the critical region, it tells the coordinator, when then
replies to 2
From Tanenbaum and van Steen, Distributed Systems: Principles and Paradigms, 2nd edition
CS755 © Prentice-Hall, Inc. 2002 6-23
A Centralized Algorithm (2)
• Advantages
➡ Obviously it guarantees mutual exclusion
➡ It is fair (requests are granted in the order in which they are received)

➡ No starvation (no process ever waits forever)


➡ Easy to implement (only 3 messages: request, grant and release)

• Shortcomings
➡ Coordinator: A single point of failure; A performance bottleneck

➡ If processes normally block after making a request, they cannot distinguish a


dead coordinator from “permission denied”

CS755 6-24
A Distributed Algorithm
• Based on a total ordering of events in a system (happens-before
relation)
• Algorithm:
➡ When a process wants to enter a critical region:
◆ Builds a message: {name of critical region; process number; current time}
◆ Sends the message to all other processes (assuming reliable transfer)
➡ When a process receives a request message from another process:
◆ If the receiver is not in the critical region and does not want to enter it, it
sends back an OK message to the sender
◆ If the receiver is already in the critical region, it does not reply. Instead it
queues the request
◆ If the receiver wants to enter the critical region but has not yet done so, it
compares the timestamp with the one contained in the message it has sent
everyone: If the incoming message is lower, the receiver sends back an OK
CS755 6-25
Example

a) Two processes want to enter the same critical region at the same
moment.
b) Process 0 has the lowest timestamp, so it wins.
c) When process 0 is done, it sends an OK also, so 2 can now enter
the critical region.
From Tanenbaum and van Steen, Distributed Systems: Principles and Paradigms, 2nd edition
CS755 © Prentice-Hall, Inc. 2007 6-26
A Token Ring Algorithm
Processes
0 2 4 9 7 1 6 5 8 3

Network 2
1 3

0 4

9 5

Token holder may enter 6


critical region or pass the 8 7
8
token
From Tanenbaum and van Steen, Distributed Systems: Principles and Paradigms, 2nd edition
CS755 © Prentice-Hall, Inc. 2007 6-27
A Token Ring Algorithm (2)
• Pros:
➡ Guarantees mutual exclusion (only 1 process has the token at any instant)
➡ No starvation (token circulates among processes in a well-defined order)

• Cons:
➡ Regeneration of the token if it is ever lost: Detecting that the token is lost is
difficult, since the amount of time between successive appearances of the
token on the network is unbounded
➡ Recovery if a process crashes: It is easier than in other cases though. If we
require a process receiving the token to acknowledge receipt. A dead process
is detected when its neighbor tries to give the token and fails. Dead process is
then removed and the token handed to the next process down the line.

CS755 6-28
Comparison of the 3
Algorithms
Messages per Delay before entry
Algorithm Problems
entry/exit (in message times)

Centralized 3 2 Coordinator crash

Crash of any
Distributed 2(n–1) 2(n–1)
process

Lost token,
Token ring 1 to ∞ 0 to n – 1
process crash

• Centralized Algorithm: Simplest and most efficient. A request and a grant to


enter, and a release to exit. Only 2 message times to enter.

• Decentralized Algorithm: Requires n-1 request messages, one to each of the


other processes, and an additional n-1 grant messages. Entry takes 2(n-1)
message times assuming that messages are passed sequentially over a LAN

• Token Ring Algorithm: If every process constantly wants to enter a


critical region (each token pass will result in one entry). The token may
sometimes circulate for hours without anyone being interested in it
CS755 (number of messages per entry is unbounded). Entry delay varies from 0 6-29
Transaction
A transaction is a collection of actions that make consistent
transformations of system states while preserving system consistency.
➡ concurrency transparency
➡ failure transparency

System may be
System in a temporarily in an System in a
consistent inconsistent state consistent
state during execution state

Begin Execution of End


Transaction Transaction Transaction

CS755 6-30
Transaction Primitives
• Special primitives required for programming using transactions
➡ Supplied by the operating system or by the language runtime system

• Examples of transaction primitives:


➡ BEGIN_TRANSACTION: Mark the start of a transaction

➡ END_TRANSACTION (EOT): Terminate the transaction and try to commit


(there may or may not be a separate COMMIT command)
➡ ABORT_TRANSACTION: Kill the transaction and restore the old values
➡ READ: Read data from a file (or other object)
➡ WRITE: Write data to a file (or other object)

CS755 6-31
Centralized Transaction
Execution
User User

Application Application

Begin_Transaction, Results &


Read, Write, Abort, User Notifications
Transactio
EOT
n Manager
(TM)

Read,
Results
Write,
Abort, EOT Scheduler
(SC)
Scheduled
Results
Operations
Recovery
Manager
(RM)

CS755 6-32
Distributed Transaction
Execution
User application

Begin_transaction, Results &


User notifications Distributed
Read, Write, EOT, Transaction
Abort
Execution Model
TM TM
Replica Control
Read, Protocol
Write,
EOT, Abort Distributed
SC SC Concurrency
Control Protocol

Local
RM RM Recovery
Protocol

CS755 6-33
Properties of Transactions
ATOMICITY
➡ All or nothing
➡ Multiple operations combined as an atomic
transaction

CONSISTENCY
➡ No violation of integrity constraints
➡ Transactions are correct programs

ISOLATION 🡺 Our focus in this module

➡ Concurrent changes invisible 🡺 serializable

DURABILITY
CS755 6-34
Transactions Provide…
• Atomic and reliable execution in the presence of failures

• Correct execution in the presence of multiple user accesses


• Correct management of replicas (if they support it)

CS755 6-35
Lost Update Problem
Initial values: A=100, B=200, C=300
Transaction T : Transaction U:
balance = b.getBalance(); balance = b.getBalance();
b.setBalance(balance*1.1); b.setBalance(balance*1.1);
a.withdraw(balance/10); c.withdraw(balance/10);

balance = b.getBalance();
$200
balance = b.getBalance();
$200

b.setBalance(balance*1.1);
$220
b.setBalance(balance*1.1);
$220
a.withdraw(balance/10);
$80 $280
c.withdraw(balance/10);
From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755 © Addison-Wesley Publishers 2000 6-36
Inconsistent Retrievals Problem
Initial values: A=200, B=200

Transaction V: Transaction W:
a.withdraw(100);
aBranch.branchTotal();
b.deposit(100);

a.withdraw(100);
$100
total = a.getBalance(); total =
$100
total+b.getBalance(); total =
$300
total+c.getBalance();
b.deposit(100);
$300
From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755 © Addison-Wesley Publishers 2000 6-37
Conflict Rules

Operations of Conflic Reason


different t
transactions No Because the effect of a pair of read operations
does not depend on the order in which they are
read read
executed
read write Yes Because the effect of a read and a write operation
depends on the order of their execution
write write Yes Because the effect of a pair of write operations
depends on the order of their execution

From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755 © Addison-Wesley Publishers 2000 6-38
Resolving Lost Update Example
Transaction T: Transaction U:
balance = b.getBalance(); balance = b.getBalance();
b.setBalance(balance*1.1); b.setBalance(balance*1.1);
a.withdraw(balance/10); c.withdraw(balance/10);

balance = b.getBalance(); $200


b.setBalance(balance*1.1); $220
balance = b.getBalance(); $220
b.setBalance(balance*1.1); $242
a.withdraw(balance/10); $80
c.withdraw(balance/10); $278

From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755 © Addison-Wesley Publishers 2000 6-39
Resolving Inconsistent Retrieval
Problem
Transaction V: Transaction W:
a.withdraw(100);
aBranch.branchTotal();
b.deposit(100);

a.withdraw(100); $100
b.deposit(100); $300
$100
total = a.getBalance();
$400
total = total+b.getBalance();
total = total+c.getBalance();
...

From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755 © Addison-Wesley Publishers 2000 6-40
Distributed Transaction
Serializability
• Somewhat more involved. Two execution histories have to be considered:
➡ local histories
➡ global history

• For global transactions (i.e., global history) to be correct, two


conditions are necessary:
➡ Each local history should be correct (serializable).

➡ Two conflicting operations should be in the same relative order in all of the
local histories where they appear together.

CS755 6-41
Locking-Based Algorithms
• Transactions indicate their intentions by requesting locks from the
scheduler (called lock manager).
• Locks are either read lock (rl) [also called shared lock] or write lock (wl)
[also called exclusive lock]
• Read locks and write locks conflict (because Read and Write operations are
incompatible)
rl wl
rl yes no
wl no no

• Locking works nicely to allow concurrent processing of transactions.

CS755 6-42
Two-Phase Locking (2PL)
1. A transaction locks an object before using it.
2. When an object is locked by another transaction, the requesting
transaction must wait.
3. When a transaction releases a lock, it may not request another
lock.

Lock point
Obtain lock
No. of locks

Release lock

Phase 1 Phase 2
BEGIN END
CS755 6-43
Strict 2PL
Hold locks until the end.

No. of locks

Obtain lock

Release lock

Transactio
BEGIN END
n duration
period of
use

CS755 6-44
Locking Example
Transaction T: Transaction U:
balance = b.getBalance(); balance = b.getBalance();
b.setBalance(bal*1.1); b.setBalance(bal*1.1);
a.withdraw(bal/10); c.withdraw(bal/10);
Operations Locks Operations Locks
openTransaction
bal = b.getBalance(); write lock B
b.setBalance(bal*1.1); openTransaction
a.withdraw(bal/10); write lock A bal = b.getBalance();waits for T’s
lock on B
closeTransaction unlock A, B

write lock B
b.setBalance(bal*1.1);
c.withdraw(bal/10); write lock C

closeTransaction
unlock B, C
CS755 6-45

You might also like