Coordination And Agreement


Introduction

 Fundamental issue: for a set of processes, how to coordinate their actions or to agree on one or more values?
 even with no fixed master-slave relationship between the components
 Further issue: how to consider and deal with failures when designing algorithms
 Topics covered
 mutual exclusion
 elections: how to elect one of a collection of processes to perform a special role
 multicast communication
 agreement problems: consensus and Byzantine agreement

Failure Assumptions and Failure Detectors

 Failure assumptions of this chapter
 reliable communication channels
 processes only fail by crashing, unless stated otherwise
 Failure detector: object/code in a process that detects failures of other processes
 unreliable failure detector
 outputs one of two values: unsuspected or suspected
 i.e. evidence of possible failures, not a definite verdict
 example: most practical systems
 each process sends "alive/I'm here" messages to everyone else
 if no "alive" message is received from a process within a timeout, that process is suspected
 the suspect may in fact be functioning correctly, e.g. the network may be partitioned
 reliable failure detector
 outputs one of two accurate values: unsuspected or failed; few practical systems qualify
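A timeout-based unreliable failure detector can be sketched as below; this is a minimal illustration in Python, and the class name, timeout value, and heartbeat plumbing are assumptions of the example, not part of the original notes:

import time

class HeartbeatDetector:
    def __init__(self, peers, timeout):
        self.timeout = timeout
        self.last_seen = {p: time.monotonic() for p in peers}

    def on_alive(self, peer):
        # record the latest "alive/I'm here" heartbeat from peer
        self.last_seen[peer] = time.monotonic()

    def status(self, peer):
        # unreliable verdict: "suspected" may be a false positive,
        # e.g. the process is fine but the network is partitioned
        late = time.monotonic() - self.last_seen[peer] > self.timeout
        return "suspected" if late else "unsuspected"

d = HeartbeatDetector(peers=["p1", "p2"], timeout=0.05)
d.on_alive("p1")
time.sleep(0.1)                  # no heartbeat within the timeout window
print(d.status("p1"))            # -> suspected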

Distributed Mutual Exclusion
 Process coordination in a multitasking OS
 race condition: several processes access and manipulate the same data concurrently, and the outcome of the execution depends on the particular order in which the accesses take place
 critical section: when one process is executing in a critical section, no other process is to be allowed to execute in its critical section
 mutual exclusion: if a process is executing in its critical section, then no other processes can be executing in their critical sections
 Distributed mutual exclusion
 provide critical regions in a distributed environment
 based on message passing
 for example, file locking with the lockd daemon in UNIX (NFS is stateless, so there is no file locking at the NFS level)
Algorithms for mutual exclusion
 Problem: an asynchronous system of N processes
 processes don't fail
 message delivery is reliable; processes do not share variables
 a single critical region, for simplicity
 application-level protocol: enter(), resourceAccesses(), exit()
 Requirements for mutual exclusion
 Essential
 [ME1] safety: only one process at a time
 [ME2] liveness: requests to enter and exit eventually succeed
 Additional
 [ME3] happened-before ordering: entry is granted in the happened-before order of the enter() requests
 Performance evaluation
 overhead and bandwidth consumption: # of messages sent
 client delay incurred by a process at entry and exit
 throughput, measured by the synchronization delay: the delay between one process's exit and the next process's entry
A central server algorithm

 the server manages a token: permission to enter the critical section
 a process requests the token from the server
 the server grants the token if it currently holds it; otherwise it queues the request
 a process enters once it receives the token, otherwise it waits
 when done, the process sends a release message to the server and exits (see the sketch below)
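As a concrete illustration, here is a minimal in-process simulation of the protocol; the TokenServer and Client names, and the use of threads and queues in place of real network messages, are assumptions of this sketch:

import queue, threading

class TokenServer:
    def __init__(self):
        self.has_token = True            # the server holds the token at start
        self.waiting = queue.Queue()     # FIFO queue of blocked requesters

    def request(self, client):
        if self.has_token:
            self.has_token = False
            client.grant()               # grant: hand the token to the client
        else:
            self.waiting.put(client)     # queue the request until a release

    def release(self):
        if self.waiting.empty():
            self.has_token = True        # token returns to the server
        else:
            self.waiting.get().grant()   # pass the token to the next waiter

class Client:
    def __init__(self, server):
        self.server = server
        self.granted = threading.Event()

    def grant(self):
        self.granted.set()

    def enter(self):                     # enter(): request, then wait for grant
        self.server.request(self)
        self.granted.wait()

    def exit(self):                      # exit(): return the token via release
        self.granted.clear()
        self.server.release()

# usage: c = Client(TokenServer()); c.enter(); ...critical section...; c.exit()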

A central server algorithm: discussion

 Properties
 safety, why?
 liveness, why?
 HB ordering not guaranteed, why?
 Performance
 enter overhead: two messages (request and grant)
 enter delay: time between request and grant
 exit overhead: one message (release)
 exit delay: none
 synchronization delay: between release and grant
 the centralized server is a bottleneck

A ring-based algorithm
 Arrange processes in a logical ring around which a token rotates
 the ring could be unrelated to the physical configuration
 pi sends messages to p(i+1) mod N
 when a process requires entry to the critical section, it waits for the token
 when a process holds the token
 if it requires entry to the critical section, it enters; on exit, it releases the token by sending it to its neighbor
 if it does not, it immediately forwardsds the token to its neighbor
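A minimal thread-based simulation of the rotating token is sketched below; the channel layout, the value of N, and the wants_entry set are illustrative assumptions, and a real system would use network links between neighbors:

import queue, threading, time

N = 4
channels = [queue.Queue() for _ in range(N)]   # channel i feeds process pi
wants_entry = {2}                              # pids waiting to enter the CS

def process(i):
    while True:
        token = channels[i].get()              # block until the token arrives
        if i in wants_entry:
            print(f"p{i} enters the critical section")
            wants_entry.discard(i)             # ... do the critical work ...
            print(f"p{i} exits and releases the token")
        channels[(i + 1) % N].put(token)       # forward token to the neighbor

for i in range(N):
    threading.Thread(target=process, args=(i,), daemon=True).start()
channels[0].put("TOKEN")                       # inject the single token
time.sleep(0.5)                                # let the token circulate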

An algorithm using multicast and logical clocks

 Multicast a request message to all processes (Ricart and Agrawala [1981])
 enter only after all the other processes have replied
 totally-ordered timestamps: <T, pi>
 Each process keeps a state: RELEASED, HELD, WANTED
 if all have state = RELEASED, all reply, and the requester can enter
 if a process has state = HELD, it doesn't reply until it exits
 if more than one process has state = WANTED, the process with the lowest timestamp will get all N-1 replies first
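The per-process state and the reply rule above can be condensed into the sketch below; the message transport is abstracted behind a send callback, and the class and message names are assumptions of this illustration:

from enum import Enum

class State(Enum):
    RELEASED, WANTED, HELD = 1, 2, 3

class RAProcess:
    def __init__(self, pid, peers, send):
        self.pid, self.peers, self.send = pid, peers, send
        self.state = State.RELEASED
        self.clock = 0                   # Lamport clock for <T, pi> timestamps
        self.request_ts = None
        self.deferred = []               # replies postponed while HELD/WANTED

    def request_entry(self):
        self.state = State.WANTED
        self.clock += 1
        self.request_ts = (self.clock, self.pid)      # totally ordered <T, pi>
        for q in self.peers:
            self.send(q, ("request", self.request_ts, self.pid))
        # enter the critical section once all N-1 replies have arrived

    def on_request(self, ts, sender):
        self.clock = max(self.clock, ts[0]) + 1
        if self.state == State.HELD or (
                self.state == State.WANTED and self.request_ts < ts):
            self.deferred.append(sender)              # our request has priority
        else:
            self.send(sender, ("reply", self.pid))    # reply immediately

    def exit_cs(self):
        self.state = State.RELEASED
        for q in self.deferred:                       # answer deferred requests
            self.send(q, ("reply", self.pid))
        self.deferred.clear()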
An algorithm using multicast: discussion
 Properties
 safety, why?
 liveness, why?
 HB ordering, why?
 Performance
 bandwidth consumption: no token circulates continuously
 entry overhead: 2(N-1) messages, why? [with multicast support: 1 + (N-1) = N]
 entry delay: delay between the request and the arrival of all replies
 exit overhead: 0 to N-1 messages
 exit delay: none
 synchronization delay: the delay of one message (the last reply from the previous holder)

Maekawa's voting algorithm
 Observation: a process need not collect grants from all of its peers
 it suffices to obtain permission from subsets (voting sets) that overlap for any two processes
 Maekawa's approach
 voting sets Vi, Vj for processes Pi, Pj
 Pi ∈ Vi, Pj ∈ Vj
 Vi ∩ Vj ≠ ∅: there is at least one common member
 |Vi| = K for every i; to be fair, each process's voting set has the same size
 Pi cannot enter the critical section until it has received all K reply messages
 Choosing a subset
 simple way (K ≈ 2√N): place the processes in a √N by √N matrix and let Vi be the union of the row and column containing Pi (sketched below)
 deadlock-prone: if P1, P2 and P3 concurrently request entry to the critical section, it is possible that each has received one reply (its own) out of two, and none can proceed; the algorithm was adapted and made deadlock-free by [Sanders 1987]
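One way to build such overlapping voting sets is the grid construction just described. A small sketch, assuming for simplicity that N is a perfect square:

import math

def voting_sets(N):
    k = math.isqrt(N)
    assert k * k == N, "this simple construction assumes N is a perfect square"
    sets = []
    for p in range(N):
        row, col = divmod(p, k)
        row_members = {row * k + c for c in range(k)}
        col_members = {r * k + col for r in range(k)}
        sets.append(row_members | col_members)   # |Vi| = 2*sqrt(N) - 1 here
    return sets

# any two voting sets overlap: a row and a column always intersect
V = voting_sets(16)
assert all(V[i] & V[j] for i in range(16) for j in range(16))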

Elections

Election: choosing a unique process for a particular role

 All the processes agree on the unique choice
 for example, the server in distributed mutual exclusion
 Assumptions
 each process can call only one election at a time, but multiple concurrent elections can be called by different processes
 participant: a process that engages in an election
 each process pi has a variable electedi = ⊥ (don't know) initially
 the process with the largest identifier wins; the (unique) identifier could be any useful value
 Properties
 [E1] safety: electedi of a participant process must be P (the non-crashed process with the largest identifier) or ⊥ (undefined)
 [E2] liveness: all processes participate and eventually set electedi != ⊥ (or crash)
 Performance
 overhead (bandwidth consumption): # of messages
 turnaround time: # of messages to complete an election

A ring-based election algorithm

 Arrange processes in a logical ring
 pi sends messages to p(i+1) mod N
 the ring could be unrelated to the physical configuration
 elect the coordinator with the largest id
 assume no failures
 Initially, every process is a non-participant. Any process can call an election:
 it marks itself as participant
 places its id in an election message
 sends the message to its neighbor
 Receiving an election message
 if id > myid: forward the msg, mark itself participant
 if id < myid
 non-participant: replace id with myid, forward the msg, mark itself participant
 participant: stop forwarding (why? later, with multiple elections)
 if id = myid: coordinator found; mark itself non-participant, set electedi := id, send an elected message with myid
 Receiving an elected message
 if id != myid: mark itself non-participant, set electedi := id, forward the msg
 if id = myid: stop forwarding

Figure 12.7 A ring-based election in progress
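A compact single-election simulation of these rules is sketched below; the ids, the starter, and the synchronous message loop are assumptions of this sketch. The stop-forwarding case for participants only matters when several elections run concurrently, so it does not arise here:

def ring_election(ids, starter):
    """ids[i] is the identifier of process i; its neighbour is (i+1) % N."""
    N = len(ids)
    participant = [False] * N
    elected = [None] * N
    participant[starter] = True                  # caller marks itself
    pos, kind, mid = (starter + 1) % N, "election", ids[starter]
    while True:
        me = ids[pos]
        if kind == "election":
            if mid < me and not participant[pos]:
                mid = me                         # substitute own (larger) id
            elif mid == me:                      # own id came full circle:
                kind = "elected"                 # this process is coordinator
            participant[pos] = kind == "election"
            if kind == "elected":
                elected[pos] = me
        else:                                    # an elected announcement
            if mid == me:
                return elected                   # announcement completed a lap
            participant[pos], elected[pos] = False, mid
        pos = (pos + 1) % N                      # forward to the next neighbour

print(ring_election([3, 17, 9, 28, 4], starter=2))   # every entry becomes 28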

A ring-based election algorithm: discussion

 Properties
 safety: only the process with the largest id can send an elected message
 liveness: every process in the ring eventually participates in the election; extra elections are stopped
 Performance
 one election, best case, when?
 N election messages
 N elected messages
 turnaround: 2N messages
 one election, worst case, when?
 2N - 1 election messages
 N elected messages
 turnaround: 3N - 1 messages
 can't tolerate failures, so not very practical
The bully election algorithm
• Assumptions
– each process knows which processes have higher identifiers, and it can communicate with all such processes
• Compared with the ring-based election
– processes can crash and be detected by timeouts
• synchronous system
• timeout T = 2 Ttransmitting (max transmission delay) + Tprocessing (max processing delay)
• Three types of messages
– Election: announce an election
– Answer: in response to Election
– Coordinator: announce the identity of the elected process

The bully election algorithm: howto

• A process starts an election when it detects that the coordinator has failed, or when it decides to replace a coordinator that has a lower identifier
– it sends an election message to all processes with higher ids and waits for answers (except the failed coordinator/process)
• If no answers arrive within time T
– it considers itself the coordinator
– it sends a coordinator message (with its id) to all processes with lower ids
• else
– it waits for a coordinator message, and starts a new election if a timeout T' expires first
– to become coordinator, a process has to start an election
• A higher-id process can replace the current coordinator (hence "bully")
– the highest one directly sends a coordinator message to all processes with lower identifiers
• Receiving an election message
– send an answer message back
– start an election if it hasn't started one: send election messages to all higher-id processes (including the "failed" coordinator, which might be up again by now)
• Receiving a coordinator message
– set electedi to the new coordinator
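The per-process logic might look like the sketch below; transport and timeout handling are abstracted away, wait_for is a placeholder, and all class and message names are assumptions of this illustration:

class BullyProcess:
    def __init__(self, pid, all_ids, send, timeout):
        self.pid, self.all_ids = pid, all_ids
        self.send, self.timeout = send, timeout
        self.coordinator = None

    def start_election(self):
        higher = [q for q in self.all_ids if q > self.pid]
        if not higher:                        # highest id: bully directly
            self.announce()
            return
        for q in higher:                      # election message to higher ids
            self.send(q, ("election", self.pid))
        if not self.wait_for("answer", self.timeout):
            self.announce()                   # no answer in time T: coordinator
        elif not self.wait_for("coordinator", self.timeout):
            self.start_election()             # answered, but never announced

    def announce(self):
        self.coordinator = self.pid           # coordinator message to lower ids
        for q in self.all_ids:
            if q < self.pid:
                self.send(q, ("coordinator", self.pid))

    def on_election(self, sender):
        self.send(sender, ("answer", self.pid))
        self.start_election()                 # take over the election

    def on_coordinator(self, new_id):
        self.coordinator = new_id             # set elected_i

    def wait_for(self, kind, t):
        # placeholder: a real implementation blocks on the transport for
        # up to t seconds and reports whether such a message arrived
        return False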

The bully election algorithm: discussion

 Properties
 safety: a lower-id process always yields to a higher-id process
 however, safety is not guaranteed
 if processes that have crashed are replaced by processes with the same identifier, since message delivery order might not be guaranteed, and
 failure detection might be unreliable
 liveness: all processes participate and know the coordinator at the end
 Performance
 best case: when?
 overhead: N-2 coordinator messages
 turnaround delay: no election/answer messages

Multicast Communication
 Group (multicast) communication: for each of a group of processes to receive copies of the messages sent to the group, often with delivery guarantees
 guarantees on the set of messages that every process of the group should receive
 guarantees on the delivery ordering across the group members
 Challenges
 efficiency concerns include minimizing overhead activities and increasing throughput and bandwidth utilization
 delivery guarantees ensure that operations are completed
 Types of group
 static or dynamic: whether joining or leaving is considered
 closed or open: a group is said to be closed if only members of the group can multicast to it

Reliable Multicast
 Simple basic multicast (B-multicast): sending a message to every process that is a member of a defined group
 B-multicast(g, m): for each process p ∈ group g, send(p, m)
 on receive(m) at p: B-deliver(m) at p
 Reliable multicast (R-multicast) requires these properties
 Integrity: a correct process delivers a message at most once, and only if some process actually multicast it to the group
 Validity: if a correct process multicasts a message, it will eventually be delivered
 Agreement: if a message is delivered to one correct process, all other correct processes in the group will deliver it
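The standard construction of R-multicast on top of B-multicast (each receiver re-multicasts the message once before delivering it) can be sketched as below; this is an in-process simulation, and the Node class and message format are assumptions of the example:

class Node:
    def __init__(self, pid):
        self.pid = pid
        self.received = set()            # ids of messages already delivered
        self.group = []                  # all group members, filled in below

    def b_multicast(self, msg):
        for p in self.group:             # basic multicast: send to every
            p.b_deliver(msg)             # member, including ourselves

    def r_multicast(self, msg):
        msg["from"] = self.pid
        self.b_multicast(msg)

    def b_deliver(self, msg):
        if msg["id"] in self.received:   # integrity: deliver at most once
            return
        self.received.add(msg["id"])
        if msg["from"] != self.pid:      # agreement: re-multicast once, so all
            self.b_multicast(msg)        # deliver even if the sender crashes
        print(f"p{self.pid} R-delivers {msg['body']}")

nodes = [Node(i) for i in range(3)]
for n in nodes:
    n.group = nodes
nodes[0].r_multicast({"id": 1, "body": "hello"})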
Types of message ordering
Three types of message ordering
– FIFO (first-in, first-out) ordering: if a correct process delivers one message before another from the same sender, every correct process will deliver the first message before the other
– Causal ordering: if the multicast of one message happened-before the multicast of another, any correct process that delivers the second message will deliver the first message before it
– Total ordering: if a correct process delivers a message before another, any other correct process that delivers the second message will deliver the first message first
• Note that
– FIFO ordering and causal ordering are only partial orders
– not all messages are sent by the same sending process
– some multicasts are concurrent, not able to be ordered by happened-before
– total order demands consistency, but not a particular order

Figure 12.12 Total, FIFO and causal ordering of multicast messages

Notice
 the consistent ordering of totally ordered messages T1 and T2,
 the FIFO-related messages F1 and F2,
 the causally related messages C1 and C3, and
 the otherwise arbitrary delivery ordering of messages

Note that T1 and T2 are delivered in the opposite order to the physical time of message creation
Bulletin board example (FIFO ordering)
• A bulletin board such as Web Board at NJIT illustrates the desirability of
consistency and FIFO ordering. A user can best refer to preceding messages if
they are delivered in order. Message 25 in Figure 12.13 refers to message 24,
and message 27 refers to message 23.

• Note the further flexibility that Web Board allows by permitting messages to begin threads by replying to a particular message. Thus messages do not have to be displayed in the same order in which they are delivered
Implementing total ordering
• The normal approach to total ordering is to assign totally ordered identifiers to
multicast messages, using the identifiers to make ordering decisions.
• One possible implementation is to use a sequencer process to assign identifiers (see Figure 12.14). A drawback of this is that the sequencer can become a bottleneck.
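A toy version of the sequencer approach, with receivers holding back messages until their sequence number comes up; the class names and in-process wiring are assumptions of this sketch:

import itertools

class Sequencer:
    def __init__(self):
        self.counter = itertools.count()

    def order(self, msg):
        return next(self.counter)         # consecutive global sequence numbers

class Receiver:
    def __init__(self):
        self.next_seq = 0
        self.pending = {}                 # seq -> message held back

    def on_order(self, seq, msg):
        self.pending[seq] = msg
        while self.next_seq in self.pending:          # deliver in seq order
            print("deliver", self.pending.pop(self.next_seq))
            self.next_seq += 1

seqr, r = Sequencer(), Receiver()
orders = [(seqr.order(m), m) for m in ["a", "b", "c"]]
for s, m in reversed(orders):             # arrivals out of order at a receiver
    r.on_order(s, m)                      # still delivered as a, b, c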
• An alternative is to have the processes collectively agree on identifiers. A
simple algorithm is shown in Figure 12.15.

Figure 12.15 The ISIS algorithm for total ordering

Each process q in group g keeps
• Aq,g: the largest agreed sequence number it has observed so far for group g
• Pq,g: its own largest proposed sequence number

Algorithm for process p to multicast a message m to group g

1. p B-multicasts <m, i> to g, where i is a unique identifier for m
2. Each process q replies to the sender p with a proposal for the message's agreed sequence number: Pq,g := Max(Aq,g, Pq,g) + 1
3. p collects all the proposed sequence numbers and selects the largest one, a, as the next agreed sequence number. It then B-multicasts <i, a> to g.
4. Each process q in g sets Aq,g := Max(Aq,g, a) and attaches a to the message identified by i
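Condensed into code, the propose/agree exchange looks roughly like this; the exchange is run synchronously in one process for clarity, hold-back queue reordering on delivery is elided, and the names are illustrative:

class IsisProcess:
    def __init__(self, pid):
        self.pid = pid
        self.agreed = 0      # A: largest agreed sequence number seen
        self.proposed = 0    # P: largest sequence number proposed

    def propose(self, msg_id):
        # step 2: propose Max(A, P) + 1 for the incoming message
        self.proposed = max(self.agreed, self.proposed) + 1
        return (self.proposed, self.pid)   # tie-break on process id

    def commit(self, msg_id, a):
        # step 4: adopt the agreed number; hold-back reordering elided
        self.agreed = max(self.agreed, a[0])

def isis_multicast(group, msg_id):
    # steps 1 and 3, from the sender's point of view
    proposals = [q.propose(msg_id) for q in group]   # B-multicast + replies
    agreed = max(proposals)                          # largest proposal wins
    for q in group:
        q.commit(msg_id, agreed)                     # B-multicast <i, a>
    return agreed

group = [IsisProcess(i) for i in range(3)]
print(isis_multicast(group, "m1"))   # e.g. (1, 2)
print(isis_multicast(group, "m2"))   # strictly larger agreed number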

Implementing causal ordering

• Causal ordering using vector timestamps (Figure 12.16)
– only orders multicasts, and ignores one-to-one messages between processes
– each process updates its vector timestamp before delivering a message, to maintain the count of precedent messages
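A sketch of the vector-timestamp delivery rule: deliver a message from sender j once its timestamp is exactly one ahead in entry j and not ahead anywhere else. N, the class layout, and the demo scenario are assumptions of this illustration:

N = 3

class CausalProcess:
    def __init__(self, pid):
        self.pid = pid
        self.V = [0] * N                  # delivered-message count per sender
        self.holdback = []                # messages awaiting predecessors

    def multicast_stamp(self):
        self.V[self.pid] += 1             # timestamp the outgoing multicast
        return list(self.V)               # (the sender delivers m itself)

    def receive(self, sender, ts, body):
        self.holdback.append((sender, ts, body))
        self._try_deliver()

    def _deliverable(self, sender, ts):
        return ts[sender] == self.V[sender] + 1 and all(
            ts[k] <= self.V[k] for k in range(N) if k != sender)

    def _try_deliver(self):
        progress = True
        while progress:
            progress = False
            for m in list(self.holdback):
                sender, ts, body = m
                if self._deliverable(sender, ts):
                    self.holdback.remove(m)
                    self.V[sender] += 1   # one more delivery from sender
                    print(f"p{self.pid} delivers {body}")
                    progress = True

# demo: p2 receives m_b (sent after m_a was delivered) before m_a itself
p0, p1, p2 = CausalProcess(0), CausalProcess(1), CausalProcess(2)
ts_a = p0.multicast_stamp()
p1.receive(0, ts_a, "m_a")                # p1 delivers m_a at once
ts_b = p1.multicast_stamp()               # m_b causally follows m_a
p2.receive(1, ts_b, "m_b")                # held back: m_a not yet delivered
p2.receive(0, ts_a, "m_a")                # m_a, then m_b, are delivered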
Consensus and related problems
• Problems of agreement
– for processes to agree on a value (consensus) after one or more of the processes has proposed what that value should be
– covered topics: Byzantine generals, interactive consistency, totally ordered multicast
• The Byzantine generals problem: a decision whether multiple armies should attack or retreat, assuming that united action will be more successful than some attacking and some retreating
• Another example might be spaceship controllers deciding whether to proceed or abort. Failure handling during consensus is a key concern
• Assumptions
– communication (by message passing) is reliable
– processes may fail
• sometimes up to f of the N processes are faulty

Consensus Process
1. Each process pi begins in an undecided state and proposes a single value vi,
drawn from a set D (i=1…N)
2. Processes communicate with each other, exchanging values
3. Each process then sets the value of a decision variable di and enters the decided
state
Requirements for Consensus

• Three requirements of a consensus algorithm


– Termination: eventually every correct process sets its decision variable
– Agreement: the decision value of all correct processes is the same: if pi and pj are correct and have entered the decided state, then di = dj (i, j = 1, 2, …, N)
– Integrity: if the correct processes all proposed the same value, then any correct process in the decided state has chosen that value

The byzantine generals problem

• Problem description
– Three or more generals must agree to attack or to retreat
– One general, the commander, issues the order
– Other generals, the lieutenants, must decide to attack or retreat
– One or more generals may be treacherous
• A treacherous general tells one general to attack and another to retreat
• Difference from consensus is that a single process supplies the value to agree
on
• Requirements
– Termination: eventually each correct process sets its decision variable
– Agreement: the decision variable of all correct processes is the same
– Integrity: if the commander is correct, then all correct processes agree on the
value that the commander has proposed (but the commander need not be
correct)

The interactive consistency problem


• Interactive consistency: all correct processes agree on a vector of values, one
for each process. This is called the decision vector
– Another variant of consensus
• Requirements
– Termination: eventually each correct process sets its decision variable
– Agreement: the decision vector of all correct processes is the same
– Integrity: if any process is correct, then all correct processes decide the correct value for that process

Relating consensus to other problems

• Consensus (C), Byzantine Generals (BG), and Interactive Consistency (IC) are all problems concerned with making decisions in the context of arbitrary or crash failures
• We can sometimes generate solutions for one problem in terms of another. For example
– we can derive IC from BG by running BG N times, once for each process, with that process acting as commander
– we can derive C from IC by running IC to produce a vector of values at each process, then applying a function to the vector's values to derive a single value
– we can derive BG from C by
• having the commander send its proposed value to itself and each remaining process
• having all processes run C with the values they receive
• deriving the BG decision from the value on which C decides

Consensus in a Synchronous System

• Assume up to f of the N processes may suffer crash failures; the algorithm proceeds in f+1 rounds
• During each round, each of the correct processes multicasts any values it has not yet sent to the others
• The algorithm guarantees that all surviving correct processes are in a position to agree at the end of round f+1
• Note: any algorithm that tolerates up to f crash failures requires at least f+1 rounds to reach agreement
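A small simulation of the f+1-round exchange; the crash timing, the minimum-value decision rule, and the simplification that a crashing process reaches either everyone or no one in a round are assumptions of this sketch:

def synchronous_consensus(initial, f, crashed_after_round):
    N = len(initial)
    known = [{v} for v in initial]             # values each process has seen
    alive = set(range(N))
    for r in range(1, f + 2):                  # f + 1 rounds
        msgs = {p: set(known[p]) for p in alive}    # multicast known values
        alive -= crashed_after_round.get(r, set())  # crashes between rounds
        for p in alive:
            for q, vals in msgs.items():
                known[p] |= vals               # merge everything received
    return {p: min(known[p]) for p in alive}   # decision rule: minimum value

# five processes, tolerate f = 1 crash; process 0 crashes after round 1
print(synchronous_consensus([4, 7, 2, 9, 5], f=1,
                            crashed_after_round={1: {0}}))
# -> the surviving processes all decide 2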
Limits for solutions to Byzantine Generals

• Some cases of the Byzantine generals problem have no solution
– Lamport et al. found that there is no solution with only three processes, one of which is faulty
– Pease et al. generalized this: if the total number of processes N is less than three times the number of failures f plus one (N < 3f + 1), there is no solution
• Thus there is a solution with 4 processes and 1 failure, using two rounds
– in the first, the commander sends the values
– in the second, each lieutenant sends the values it received

Figure 12.20 Four Byzantine generals
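A toy run of the two-round, four-general case just described: the commander sends a value, each lieutenant relays what it received, and every correct lieutenant decides by majority. The particular faulty behaviour modelled here is an illustrative assumption:

from collections import Counter

def byzantine_round(commander_sends, relay):
    """commander_sends[i]: value sent to lieutenant i (values may differ if
    the commander is a traitor); relay[i][j]: what lieutenant i tells j."""
    decisions = {}
    for j in commander_sends:            # each lieutenant j decides
        heard = [commander_sends[j]] + [
            relay[i][j] for i in commander_sends if i != j]
        decisions[j] = Counter(heard).most_common(1)[0][0]   # majority vote
    return decisions

# correct commander says "attack"; lieutenant 2 is the traitor and lies
sends = {1: "attack", 2: "attack", 3: "attack"}
relay = {1: {2: "attack", 3: "attack"},
         2: {1: "retreat", 3: "retreat"},     # traitor relays a lie
         3: {1: "attack", 2: "attack"}}
print(byzantine_round(sends, relay))   # correct lieutenants 1, 3 pick "attack"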
Asynchronous Systems

• The guaranteed solutions to consensus and Byzantine generals above apply only to synchronous systems
• Fischer et al. found that no algorithm can guarantee agreement in an asynchronous system, even with just one (crash) failure
• This impossibility is circumvented in practice by masking faults or by using (imperfect) failure detection
• There is also a partial solution, assuming an adversary process, based on introducing random values into the protocol to prevent an effective thwarting strategy. This does not always reach consensus
