Lecture 7: Distributed Algorithms
Memory-wall challenge:
Memory already limits single-processor performance. How can we
design a memory system that provides a bandwidth of several
terabytes/s for data-intensive high-performance applications?
Power-wall challenge:
When there are millions of processing nodes, each drawing a few watts
of power, we are faced with the energy bill and cooling challenges of
MWs of power dissipation, even ignoring the power needs of the
interconnection network and peripheral devices.
Reliability-wall challenge:
Ensuring continuous and correct functioning of a system with many
thousands or even millions of processing nodes is non-trivial, given that
a few of the nodes are bound to malfunction at any given time.
Understand orders of magnitude in computer
performance
GigaFLOPS
A 1 gigaFLOPS (GFLOPS) computer system is capable of performing one billion (10^9)
floating-point operations per second. To match what a 1 GFLOPS computer system can
do in just one second, you'd have to perform one calculation every second for 31.69
years.
TeraFLOPS
A 1 teraFLOPS (TFLOPS) computer system is capable of performing one trillion (10^12)
floating-point operations per second. The rate 1 TFLOPS is equivalent to 1,000 GFLOPS.
To match what a 1 TFLOPS computer system can do in just one second, you'd have to
perform one calculation every second for 31,688.77 years.
PetaFLOPS
A 1 petaFLOPS (PFLOPS) computer system is capable of performing one quadrillion
(10^15) floating-point operations per second. The rate 1 PFLOPS is equivalent to 1,000
TFLOPS. To match what a 1 PFLOPS computer system can do in just one second, you'd
have to perform one calculation every second for 31,688,765 years.
ExaFLOPS
A 1 exaFLOPS (EFLOPS) computer system is capable of performing one quintillion (10^18)
floating-point operations per second. The rate 1 EFLOPS is equivalent to 1,000 PFLOPS.
To match what a 1 EFLOPS computer system can do in just one second, you'd have to
perform one calculation every second for 31,688,765,000 years.
Distributed Algorithm: Distributed System vs
Parallel System
Process Synchronization
Set of techniques that are used to coordinate execution amongst
processes.
For example, a process may wish to run only up to a certain point, at
which it stops and waits for another process to finish certain actions.
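The stop-and-wait pattern described above can be sketched with an event flag. A minimal Python threading sketch (the names `worker` and `waiter` are illustrative assumptions, not from the slides):

```python
import threading

done = threading.Event()
results = []

def worker():
    results.append("worker finished")
    done.set()                      # signal that the action is complete

def waiter():
    done.wait()                     # block until worker signals
    results.append("waiter resumed")

t1 = threading.Thread(target=waiter)
t2 = threading.Thread(target=worker)
t1.start(); t2.start()
t1.join(); t2.join()
assert results == ["worker finished", "waiter resumed"]
```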
Mutual Exclusion in Distributed Systems
Mutual exclusion ensures that no other process will use a shared
resource at the same time.
One process is elected as coordinator.
Whenever a process wants to enter a critical region, it sends a request
message to the coordinator asking for permission.
If no other process is currently in that critical region, the coordinator
sends back a reply granting permission.
When reply arrives, the requesting process enters the critical region.
If the coordinator knows that a different process is already in the
critical region, it does not grant permission.
Centralized Algorithm
In the centralized algorithm, one process is elected as the
coordinator, which may be the machine with the highest network address.
Central Server algorithm
Mimics a single-processor system; one process is elected as coordinator C.
A process P:
1. Requests the resource: P sends request(R) to C
2. Waits for the response
3. Receives grant(R)
4. Accesses the resource
5. Releases the resource: P sends release(R) to C
Central Server Algorithm
When a process wants to enter a critical section, it sends a request
message (identifying the critical section, if there is more than one) to the
coordinator. If the critical section is free, the coordinator replies with a
grant message.
If, however, another process has previously claimed the critical section, the
server simply does not reply, so the requesting process is blocked.
When the critical section is released, the coordinator can send a grant
message to the next process in its queue of waiting processes (if any).
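The coordinator's behaviour above can be sketched as a queue of blocked requesters. A minimal sketch, assuming a single critical section (class and method names are illustrative):

```python
from collections import deque

class Coordinator:
    """Central-server mutual exclusion sketch (single critical section)."""
    def __init__(self):
        self.holder = None          # process currently in the critical section
        self.waiting = deque()      # FIFO queue of blocked requesters

    def request(self, pid):
        """Grant immediately if free; otherwise queue and send no reply."""
        if self.holder is None:
            self.holder = pid
            return True              # grant message sent
        self.waiting.append(pid)     # no reply: requester stays blocked
        return False

    def release(self, pid):
        """Holder releases; the grant passes to the next queued process."""
        assert self.holder == pid
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder           # process that now receives the grant

c = Coordinator()
assert c.request("P1") is True       # P1 enters immediately
assert c.request("P2") is False      # P2 is queued, blocked
assert c.release("P1") == "P2"       # grant passes to P2 in FIFO order
```

The FIFO queue is what makes the scheme fair: requests are granted in arrival order.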
Central Server algorithm
Benefits
Fair
All requests processed in order
Easy to implement, understand, verify
Problems
Process cannot distinguish being blocked
from a dead coordinator
Centralized server can be a bottleneck
Ricart & Agrawala algorithm
Distributed algorithm using reliable multicast
and logical clocks
Process wants to enter critical section:
Compose message containing:
Identifier (machine ID, process ID)
Name of resource
Timestamp (totally-ordered Lamport)
Send request to all processes in group
Wait until everyone gives permission
Enter critical section / use resource
Ricart & Agrawala algorithm
When a process receives a request:
If the receiver is not interested:
Send OK to sender
If the receiver is currently in the critical section:
Do not reply; queue the request
If the receiver also wants to enter:
Earliest (totally-ordered Lamport) timestamp wins; the later
request is queued and answered only on exit
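The receiver's decision rule can be sketched as a pure function of its state and the two timestamps. A minimal sketch (state names and the tuple tie-break are standard for Ricart & Agrawala; function name is illustrative):

```python
def on_request(my_state, my_ts, my_id, req_ts, req_id):
    """Decide whether to reply OK now or defer the reply.
    States: 'RELEASED' (not interested), 'WANTED', 'HELD' (in the CS)."""
    if my_state == 'RELEASED':
        return 'OK'                       # not interested: reply immediately
    if my_state == 'HELD':
        return 'DEFER'                    # in the CS: queue the request
    # Both want the CS: totally-ordered Lamport timestamps break the tie,
    # with process id as tiebreaker on equal timestamps.
    if (req_ts, req_id) < (my_ts, my_id):
        return 'OK'                       # requester is earlier: it wins
    return 'DEFER'

assert on_request('RELEASED', None, 2, 5, 1) == 'OK'
assert on_request('HELD', 3, 2, 5, 1) == 'DEFER'
assert on_request('WANTED', 7, 2, 5, 1) == 'OK'     # (5,1) earlier than (7,2)
assert on_request('WANTED', 5, 2, 5, 3) == 'DEFER'  # equal ts: lower id wins
```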
Lamport’s Mutual Exclusion
Each process maintains request queue
Contains mutual exclusion requests
Lamport’s Mutual Exclusion
Entering the critical section (accessing the resource), Pi may enter when:
Pi has received a message (ack or release) from every other
process with a timestamp larger than Ti, and
Pi's request has the earliest timestamp in its queue
Difference from Ricart-Agrawala:
Everyone responds … always - no hold-back
Process decides to go based on whether its
request is the earliest in its queue
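The two entry conditions above can be sketched as a boolean test over a process's local state. A minimal sketch (data-structure names are illustrative assumptions):

```python
def can_enter(pid, my_req_ts, queue, latest_ts_from):
    """Lamport mutual exclusion entry test for process `pid`.
    queue: list of (timestamp, pid) requests in pid's local queue.
    latest_ts_from: maps every OTHER process -> timestamp of the last
    message (ack or release) received from it."""
    earliest = min(queue) == (my_req_ts, pid)               # rule 1
    heard_later = all(ts > my_req_ts for ts in latest_ts_from.values())
    return earliest and heard_later                         # rule 2

queue = [(3, 'P1'), (5, 'P2')]
# P1's request (ts=3) is earliest, and P2/P3 have both replied later:
assert can_enter('P1', 3, queue, {'P2': 4, 'P3': 6}) is True
# P2's request (ts=5) is not the earliest in the queue:
assert can_enter('P2', 5, queue, {'P1': 4, 'P3': 6}) is False
```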
Lamport’s Mutual Exclusion
Releasing critical section:
Remove request from its own queue
Send a timestamped release message
Token Ring Algorithm
Assumes a group of processes with no inherent ordering, but on
which some ordering can be imposed.
For example, we can identify each process by its
machine address and process ID to obtain an ordering.
Using this imposed ordering, a logical ring is
constructed in software.
Each process is assigned a position in the ring, and each process
must know who is next to it in the ring (see figure).
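With the ring in place, mutual exclusion follows from circulating a single token: only the holder may enter the critical section, and it forwards the token to its successor when done. A minimal sketch (class name is illustrative):

```python
class TokenRing:
    """Token-ring mutual exclusion sketch: only the token holder may
    enter the critical section."""
    def __init__(self, pids):
        self.ring = pids            # imposed logical order, e.g. by (addr, pid)
        self.token_at = 0           # index of the current token holder

    def holder(self):
        return self.ring[self.token_at]

    def pass_token(self):
        """Holder forwards the token to its successor on the ring."""
        self.token_at = (self.token_at + 1) % len(self.ring)
        return self.holder()

r = TokenRing(["P1", "P2", "P3"])
assert r.holder() == "P1"
assert r.pass_token() == "P2"
assert r.pass_token() == "P3"
assert r.pass_token() == "P1"       # token wraps around the ring
```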
Election Algorithms
Example 1: Your Bank maintains multiple servers, but for each
customer, one of the servers is responsible, i.e., is the leader
What have we Learnt?
Coordination requires a leader process, e.g.,
sequencer for total ordering in multicasts, bank
database example, coordinator-based mutual
exclusion.
The leader process might fail, so we need to (re-)elect a leader process.
Two Algorithms:
Bully algorithm
Ring Algorithm
Election by the Bully Algorithm
Assumptions:
Synchronous system.
All messages arrive within Ttrans units of time.
A reply is dispatched within Tprocess units of time after the receipt of a
message.
If no response is received within 2Ttrans + Tprocess, the node is assumed to be
faulty (crashed).
The election attribute is the process id: the highest id wins.
Bully algorithm
Selects the process with the largest identifier as the coordinator.
It works as follows:
When a process p detects that the coordinator is not responding to
requests, it initiates an election:
a. p sends an election message to all processes with higher numbers.
b. If nobody responds, then p wins and takes over.
c. If one of the processes answers, then p's job is done.
If a process receives an election message from a lower-numbered process
at any time, it:
a. sends an OK message back.
b. holds an election (unless it's already holding one).
A process announces its victory by sending all processes a message telling
them that it is the new coordinator.
If a process that has been down recovers, it holds an election.
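The outcome of the steps above is that the highest live id always ends up as coordinator. A minimal sketch of that outcome, ignoring message timing (function name is illustrative):

```python
def bully_election(initiator, alive):
    """Bully election outcome sketch. `alive` is the set of live process
    ids (including the initiator). The initiator messages all higher ids;
    if none answers it wins, otherwise the highest live id takes over."""
    higher = [p for p in alive if p > initiator]
    if not higher:
        return initiator            # nobody responds: initiator takes over
    # Each responder holds its own election; the highest live id wins.
    return max(higher)

assert bully_election(3, {1, 2, 3, 5, 7}) == 7   # 5 and 7 answer; 7 wins
assert bully_election(7, {1, 2, 3, 7}) == 7      # no higher id is alive
```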
Performance of Bully Algorithm
Best case scenario: The process with the second highest
id notices the failure of the coordinator and elects itself.
N-2 coordinator messages are sent.
Turnaround time is one message transmission time.
Worst case scenario: When the process with the least id
detects the failure.
N-1 processes altogether may begin elections, each sending
messages to processes with higher ids, so the message
overhead is O(N^2).
Turnaround time is approximately 5 message
transmission times if there are no failures during the
run: election, answer, election, answer, coordinator
Ring Algorithm
Uses the same ring arrangement as in the token ring
mutual exclusion algorithm, but does not employ a token.
Processes are physically or logically ordered so that each
knows its successor.
If any process detects failure, it constructs an election
message with its process I.D. (e.g. network address and
local process I.D.) and sends it to its successor.
If the successor is down, it skips over it and sends the
message to the next party. This process is repeated until a
running process is located.
At each step, the process adds its own process I.D. to the
list in the message. When the message returns to the initiator,
the highest I.D. in the list is elected coordinator, and a
coordinator message announcing the result is circulated around the ring.
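The circulation of the election message can be sketched as a walk around the ring that skips dead processes and collects live ids, electing the highest when the message returns to the initiator (function name and data layout are illustrative assumptions):

```python
def ring_election(ring, alive, detector):
    """Ring election sketch. `ring` lists processes in ring order; the
    election message starts at `detector` (which must be alive), skips
    dead successors, and collects live ids until it comes back."""
    n = len(ring)
    i = ring.index(detector)
    collected = []
    while True:
        pid = ring[i]
        if pid in alive:
            if collected and pid == detector:
                break               # message returned to the initiator
            collected.append(pid)
        i = (i + 1) % n             # dead processes are skipped over
    return max(collected)           # highest collected id becomes coordinator

# Ring order 1 -> 4 -> 2 -> 5 -> 3; process 4 is down; 2 detects the failure.
assert ring_election([1, 4, 2, 5, 3], {1, 2, 3, 5}, 2) == 5
```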
Thank you