Coordination Algorithms
COURSE NAME: PARALLEL & DISTRIBUTED COMPUTING
COURSE CODE: 22CS4106 R
A distributed algorithm
Mutual exclusion: Ricart & Agrawala
Example with three processes
(a) Two processes want to access a shared resource at the same moment.
(b) P0 has the lowest timestamp, so it wins.
(c) When process P0 is done, it also sends an OK to P2, so P2 can now go ahead.
A distributed algorithm
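To make the exchange above concrete, here is a minimal Python sketch of the request handling in the Ricart & Agrawala algorithm. It assumes a hypothetical send(destination, message) primitive and that every process knows the identifiers of all the others; it is an illustration of the idea, not a complete implementation.

class RicartAgrawala:
    def __init__(self, my_id, others, send):
        self.my_id = my_id
        self.others = others          # ids of all other processes
        self.send = send              # send(dest_id, message): assumed primitive
        self.clock = 0                # Lamport clock
        self.state = "RELEASED"       # RELEASED, WANTED, or HELD
        self.request_ts = None
        self.deferred = []            # requests to answer after leaving the CS
        self.ok_count = 0

    def request_cs(self):
        self.state = "WANTED"
        self.clock += 1
        self.request_ts = (self.clock, self.my_id)
        self.ok_count = 0
        for p in self.others:         # ask every other process for permission
            self.send(p, ("REQUEST", self.request_ts, self.my_id))

    def on_request(self, their_ts, sender):
        self.clock = max(self.clock, their_ts[0]) + 1
        # Defer the reply if we hold the resource, or if we also want it and
        # our (timestamp, id) pair is lower, i.e., we win the tie.
        if self.state == "HELD" or (self.state == "WANTED" and self.request_ts < their_ts):
            self.deferred.append(sender)
        else:
            self.send(sender, ("OK", self.my_id))

    def on_ok(self, sender):
        self.ok_count += 1
        if self.ok_count == len(self.others):
            self.state = "HELD"       # all permissions received: enter the CS

    def release_cs(self):
        self.state = "RELEASED"
        for p in self.deferred:       # now answer everyone we kept waiting
            self.send(p, ("OK", self.my_id))
        self.deferred = []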
MUTUAL EXCLUSION: TOKEN RING ALGORITHM
Essence
Organize processes in a logical ring, and let a token be passed between
them. The one that holds the token is allowed to enter the critical region (if it
wants to).
A token-ring algorithm
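As an illustration of the essence above, a minimal Python sketch of one process on the ring follows. The pass_token and critical_region primitives are assumed; handling of lost tokens or crashed processes is omitted.

class TokenRingProcess:
    # One process on the logical ring. Only the token holder may enter the
    # critical region. pass_token(next_id) is an assumed send primitive.

    def __init__(self, my_id, n_processes, pass_token):
        self.my_id = my_id
        self.next_id = (my_id + 1) % n_processes     # successor on the ring
        self.pass_token = pass_token
        self.wants_cs = False

    def on_token(self):
        if self.wants_cs:
            self.critical_region()                   # we hold the token: allowed in
            self.wants_cs = False
        self.pass_token(self.next_id)                # forward the token either way

    def critical_region(self):
        pass                                         # access the shared resource here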
DECENTRALIZED MUTUAL EXCLUSION
Principle
Assume every resource is replicated N times, with each replica having its own
coordinator ⇒ access requires a majority vote from m > N/2 coordinators.
A coordinator always responds immediately to a request.
Assumption
When a coordinator crashes, it will recover quickly, but will have forgotten
about permissions it had granted.
A decentralized algorithm
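A minimal Python sketch of the client side of this voting scheme is given below; ask_permission is an assumed RPC to one coordinator that returns True or False. A real implementation would also tell coordinators to release any permissions it did obtain before backing off.

import random
import time

# A resource is replicated N times, each replica with its own coordinator.
# Access is granted only if a majority m > N/2 of the coordinators votes yes.

def acquire(coordinators, ask_permission, m):
    while True:
        votes = sum(1 for c in coordinators if ask_permission(c))
        if votes >= m:
            return          # majority obtained: the resource may be accessed
        # Too few votes: release what was obtained (not shown), then back off
        # for a random amount of time and try again.
        time.sleep(random.uniform(0.1, 1.0))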
Decentralized mutual exclusion
Violation probabilities for various parameter values
N    m    p             Violation   |   N    m    p              Violation
8    5    3 sec/hour    < 10^-5     |   8    5    30 sec/hour    < 10^-3
8    6    3 sec/hour                |   8    6    30 sec/hour    < 10^-7
16   9    3 sec/hour    < 10^-4     |   16   9    30 sec/hour    < 10^-2
16   12   3 sec/hour                |   16   12   30 sec/hour
32   17   3 sec/hour    < 10^-4     |   32   17   30 sec/hour    < 10^-2
32   24   3 sec/hour                |   32   24   30 sec/hour
So....
What can we conclude?
A decentralized algorithm
Mutual exclusion: comparison
EXAMPLE: ZOOKEEPER
Note
ZooKeeper allows a client to be notified when a node, or a branch in the tree, changes. This may easily
lead to race conditions.
Solution
Use version numbers
Example: Simple locking with ZooKeeper
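The message exchange of the locking example is not reproduced here. As an illustration of the same idea, the following is a minimal sketch using the kazoo Python client (the server address and the path /lock are arbitrary choices): try to create an ephemeral znode and, if it already exists, set a watch and retry once it changes.

import threading
from kazoo.client import KazooClient
from kazoo.exceptions import NodeExistsError

zk = KazooClient(hosts="127.0.0.1:2181")   # address of the ZooKeeper ensemble (assumed)
zk.start()

def acquire_lock(path="/lock"):
    while True:
        try:
            # An ephemeral znode disappears automatically when its creator
            # dies, so a crashed client cannot hold the lock forever.
            zk.create(path, b"", ephemeral=True)
            return                                  # lock acquired
        except NodeExistsError:
            changed = threading.Event()
            # Somebody else holds the lock: ask to be notified of changes.
            if zk.exists(path, watch=lambda event: changed.set()):
                changed.wait()                      # block until the watch fires
            # Retry; another client may grab the lock first, hence the loop.

def release_lock(path="/lock"):
    zk.delete(path)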
ZooKeeper versioning
Notations
• W(n, k)a: request to write value a to node n, assuming its current version is k.
• R(n, k): the current version of node n is k.
• R(n): the client wants to know the current value of node n.
• R(n, k)a: value a from node n is returned along with its current version k.
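A minimal sketch of how these version numbers prevent lost updates, again assuming the kazoo Python client: a write W(n, k)a only succeeds if node n is still at version k.

from kazoo.client import KazooClient
from kazoo.exceptions import BadVersionError

zk = KazooClient(hosts="127.0.0.1:2181")   # address of the ZooKeeper ensemble (assumed)
zk.start()

def read_modify_write(path, update):
    while True:
        value, stat = zk.get(path)                   # R(n): returns value and version k
        try:
            # W(n, k)a: rejected if node n is no longer at version stat.version,
            # i.e., if another client wrote to it after our read.
            zk.set(path, update(value), version=stat.version)
            return
        except BadVersionError:
            continue                                 # lost the race: re-read and retry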
ELECTION ALGORITHMS
Principle
Many algorithms require that some process acts as a coordinator. The question is how to select this special process dynamically.
Note
In many systems, the coordinator is chosen manually (e.g., file servers).
This leads to centralized solutions ⇒ single point of failure.
Teasers
1. If a coordinator is chosen dynamically, to what extent can we speak
about a centralized or distributed solution?
2. Is a fully distributed solution, i.e. one without a coordinator, always more
robust than any centralized/coordinated solution?
Basic assumptions
Principle
Consider N processes {P0, . . . , PN−1} and let id(Pk) = k. When a process Pk notices that the coordinator is no longer responding to requests, it initiates an election:
1. Pk sends an ELECTION message to all processes with higher identifiers: Pk+1, Pk+2, . . . , PN−1.
2. If no one responds, Pk wins the election and becomes the new coordinator.
3. If one of the higher-numbered processes answers, it takes over the election and Pk's job is done.
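A minimal Python sketch of this election logic from Pk's point of view is shown below. The send primitive and the wait_for_answers timeout check are assumptions, and the message names are illustrative.

class BullyProcess:
    # Election by bullying for process Pk among processes P0, ..., P_{N-1}.
    # send(dest, msg) is an assumed primitive; wait_for_answers() is assumed
    # to return True if some higher-numbered process replied within a timeout.

    def __init__(self, k, n, send, wait_for_answers):
        self.k = k                                   # own identifier
        self.n = n                                   # total number of processes
        self.send = send
        self.wait_for_answers = wait_for_answers
        self.coordinator = n - 1                     # highest id starts as coordinator

    def start_election(self):
        for p in range(self.k + 1, self.n):          # P_{k+1}, ..., P_{N-1}
            self.send(p, ("ELECTION", self.k))
        if not self.wait_for_answers():
            # No higher-numbered process is alive: Pk wins and announces itself.
            self.coordinator = self.k
            for p in range(self.n):
                if p != self.k:
                    self.send(p, ("COORDINATOR", self.k))
        # Otherwise some higher-numbered process has taken over the election.

    def on_election(self, sender):
        self.send(sender, ("OK", self.k))            # bully the lower-numbered sender
        self.start_election()                        # and hold an election ourselves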
Principle
Process priority is obtained by organizing processes into a (logical) ring. The process with
the highest priority should be elected as coordinator.
• Any process can start an election by sending an election message to its successor. If a
successor is down, the message is passed on to the next successor.
• If a message is passed on, the sender adds itself to the list. When it gets back to the
initiator, every process has had a chance to make its presence known.
• The initiator sends a coordinator message around the ring containing a list of all living
processes. The one with the highest priority is elected as coordinator.
A ring algorithm
Election in a ring
Election algorithm using a ring
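The ring figure is not reproduced here. As an illustration, a minimal Python sketch of the election follows, assuming a hypothetical send_to_successor primitive that skips crashed processes.

class RingProcess:
    # Election in a logical ring. send_to_successor(msg) is an assumed
    # primitive that forwards a message to the first live successor.

    def __init__(self, my_id, send_to_successor):
        self.my_id = my_id
        self.send = send_to_successor
        self.coordinator = None

    def start_election(self):
        self.send(("ELECTION", [self.my_id]))        # start the list with ourselves

    def on_election(self, candidates):
        if self.my_id in candidates:
            # The message returned to the initiator: every live process has had
            # a chance to add itself. Announce the highest id as coordinator.
            self.coordinator = max(candidates)
            self.send(("COORDINATOR", self.coordinator, self.my_id))
        else:
            self.send(("ELECTION", candidates + [self.my_id]))

    def on_coordinator(self, coordinator_id, initiator):
        self.coordinator = coordinator_id            # highest priority wins
        if self.my_id != initiator:
            self.send(("COORDINATOR", coordinator_id, initiator))   # pass it along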
EXAMPLE: LEADER ELECTION IN A ZOOKEEPER SERVER GROUP
Basics
• Each server s in the server group has an identifier id(s)
• Each server has a monotonically increasing counter tx(s) of the latest transaction it handled (i.e.,
series of operations on the namespace).
• When a follower s suspects that the leader has crashed, it broadcasts an ELECTION
message, along with the pair (voteID, voteTX ). Initially,
• voteID ← id(s)
• voteTX ← tx(s)
• Each server s maintains two variables:
• leader(s): records the server that s believes may become the final leader. Initially, leader(s) ← id(s).
• lastTX(s): what s knows to be the most recent transaction. Initially, lastTX(s) ← tx(s).
Note
When s∗ believes it should be the leader, it broadcasts ⟨id(s∗), tx(s∗)⟩. Essentially, we’re bullying.
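A minimal Python sketch of the vote-handling rule a server could apply when such an ELECTION message arrives is given below. It follows the description above (the most recent transaction wins, ties broken by the higher server id); it is not ZooKeeper's actual implementation.

def update_vote(state, vote_id, vote_tx):
    # state holds leader(s) and lastTX(s) for server s, initialized to id(s)
    # and tx(s). A received vote wins if it has seen a more recent transaction,
    # or the same transaction but comes from a server with a higher id.
    if (vote_tx, vote_id) > (state["lastTX"], state["leader"]):
        state["leader"] = vote_id
        state["lastTX"] = vote_tx
        return True          # belief changed: rebroadcast our own (new) vote
    return False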