SIC 2416 Cloud Computing and Distributed Systems
Lecture 7: Distributed Algorithms
Dr. Cheruiyot w.k, PhD
JKUAT
SCIT
Routing Algorithms in Computer Networks -
Distance Vector, Link State
Data is converted into packets in computer networks before being
transferred from source to destination.
The network layer chooses the best path for data packet
transmission.
The network layer provides a routing protocol: a routing
algorithm that determines the best and shortest path for
transmitting data from source to destination.
Routing algorithms are an essential component of computer
networks.
Without them, data cannot flow between different parts of the
network.
This lecture will look at the various types of routing algorithms and
how they work.
What is a Routing Algorithm?
A routing algorithm is a routing protocol used by
the network layer for transmitting data packets
from source to destination.
The algorithm determines the best or least-cost path
for data transmission from sender/source to
receiver/destination.
The network layer performs operations that effectively
and efficiently regulate internet traffic. In computer
networks, the procedure that mathematically determines
the best path or route is known as a routing algorithm.
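In link-state routing, this least-cost computation is typically Dijkstra's algorithm. A minimal sketch, with an illustrative graph and costs (not any particular network):

```python
import heapq

def shortest_path_cost(graph, src, dst):
    """Dijkstra's algorithm: least-cost path in a weighted graph
    given as {node: {neighbour: cost}}."""
    dist = {src: 0}
    heap = [(0, src)]                    # (cost so far, node)
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            return d                     # cheapest route to dst found
        if d > dist.get(node, float("inf")):
            continue                     # stale queue entry, skip it
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd           # found a cheaper route to nbr
                heapq.heappush(heap, (nd, nbr))
    return float("inf")                  # dst unreachable

# Illustrative 4-router network with link costs
net = {"A": {"B": 2, "C": 5}, "B": {"C": 1, "D": 4},
       "C": {"D": 1}, "D": {}}
print(shortest_path_cost(net, "A", "D"))  # 4, via A -> B -> C -> D
```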
What is an Algorithm?
In computer programming terms, an algorithm
is a set of well-defined instructions to solve a
particular problem. It takes a set of input(s)
and produces the desired output.
Qualities of a good algorithm:
Input and output should be defined precisely.
Each step in the algorithm should be clear and unambiguous.
An algorithm should be the most effective among the many different
ways to solve a problem.
An algorithm shouldn't include computer code. Instead, the
algorithm should be written in such a way that it can be used in
different programming languages.
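As a small illustration of these qualities, the hypothetical function below has a precisely defined input (a non-empty list of numbers), a precisely defined output (their maximum), and unambiguous steps:

```python
def find_max(numbers):
    """Return the largest value in a non-empty list of numbers."""
    largest = numbers[0]          # start with the first element
    for n in numbers[1:]:         # examine each remaining element
        if n > largest:           # keep the larger of the two
            largest = n
    return largest

print(find_max([3, 41, 7, 19]))   # prints 41
```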
From the perspective of Distributed Systems…
A distributed algorithm is an algorithm
designed to run on computer hardware
constructed from interconnected
processors.
Distributed algorithms are algorithms
designed to run on multiple processors,
without tight centralized control. In
general, they are harder to design and harder
to understand than single-processor
sequential algorithms.
Distributed Algorithms
Distributed algorithms are a sub-type of parallel algorithm, typically
executed concurrently, with separate parts of the algorithm being run
simultaneously on independent processors, and having limited
information about what the other parts of the algorithm are doing.
One of the major challenges in developing and implementing
distributed algorithms is successfully coordinating the behavior of the
independent parts of the algorithm in the face of processor failures
and unreliable communications links.
The choice of an appropriate distributed algorithm to solve a given
problem depends on both the characteristics of the problem and the
characteristics of the system the algorithm will run on, such as the
type and probability of processor or link failures, the kind of inter-
process communication that can be performed, and the level of
timing synchronization between separate processes.
Definition…
Distributed algorithms are algorithms designed to run on a large but
limited set of processors that run similar programs.
They run on a distributed system that does not assume the
prior existence of a central coordinator.
A distributed system is a group of processors that do not share
memory or a clock.
Each process has its own memory, and the processors use
communication networks to communicate.
Distributed algorithms are used in various practical systems,
ranging from large computer networks to multiprocessor shared-
memory systems.
The development of distributed algorithms is different in nature
from the development of centralized algorithms, because a distributed
system differs from a centralized system.
Standard problems
Atomic commit
An atomic commit is an operation where a set of distinct changes is
applied as a single operation.
If the atomic commit succeeds, it means that all the changes have been applied. If
there is a failure before the atomic commit can be completed, the "commit" is aborted
and no changes will be applied.
Algorithms for solving the atomic commit problem include the two-phase commit
protocol and the three-phase commit protocol.
Consensus
Consensus algorithms try to solve the problem of a number of processes
agreeing on a common decision.
More precisely, a Consensus protocol must satisfy the four formal properties below.
Termination: every correct process decides some value.
Validity: if all processes propose the same value v, then every correct process decides v.
Integrity: every correct process decides at most one value, and if it decides some value
v, then v must have been proposed by some process.
Agreement: if a correct process decides v, then every correct process decides v.
Common algorithms for solving consensus are the Paxos algorithm and the Raft algorithm.
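The two-phase idea behind atomic commit can be sketched as follows. The class and method names are illustrative, and messaging is reduced to method calls; a real protocol must also handle timeouts and coordinator failure, which is what motivates three-phase commit:

```python
# Minimal two-phase commit sketch: a coordinator asks every participant
# to vote (phase 1) and commits only if all vote "yes" (phase 2).
# Names and structure are illustrative, not a production protocol.

class Participant:
    def __init__(self, will_commit):
        self.will_commit = will_commit
        self.state = "init"

    def vote(self):                      # phase 1: voting
        return "yes" if self.will_commit else "no"

    def finish(self, decision):          # phase 2: apply the decision
        self.state = decision

def two_phase_commit(participants):
    votes = [p.vote() for p in participants]              # phase 1
    decision = "commit" if all(v == "yes" for v in votes) else "abort"
    for p in participants:                                # phase 2
        p.finish(decision)
    return decision

print(two_phase_commit([Participant(True), Participant(True)]))   # commit
print(two_phase_commit([Participant(True), Participant(False)]))  # abort
```

A single "no" vote, or any failure before the decision, aborts the whole operation, so either all changes apply or none do.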
Distributed Systems Compared to Parallel Systems
The Three Walls of High-Performance Computing:
Memory-wall challenge:
Memory already limits single-processor performance. How can we
design a memory system that provides a bandwidth of several
terabytes/s for data-intensive high-performance applications?
Power-wall challenge:
When there are millions of processing nodes, each drawing a few watts
of power, we are faced with the energy bill and cooling challenges of
MWs of power dissipation, even ignoring the power needs of the
interconnection network and peripheral devices.
Reliability-wall challenge:
Ensuring continuous and correct functioning of a system with many
thousands or even millions of processing nodes is non-trivial, given that
a few of the nodes are bound to malfunction at any given time.
Understand orders of magnitude in computer
performance
GigaFLOPS
A 1 gigaFLOPS (GFLOPS) computer system is capable of performing one billion (10⁹)
floating-point operations per second. To match what a 1 GFLOPS computer system can
do in just one second, you'd have to perform one calculation every second for 31.69
years.
TeraFLOPS
A 1 teraFLOPS (TFLOPS) computer system is capable of performing one trillion (10¹²)
floating-point operations per second. The rate 1 TFLOPS is equivalent to 1,000 GFLOPS.
To match what a 1 TFLOPS computer system can do in just one second, you'd have to
perform one calculation every second for 31,688.77 years.
PetaFLOPS
A 1 petaFLOPS (PFLOPS) computer system is capable of performing one quadrillion
(10¹⁵) floating-point operations per second. The rate 1 PFLOPS is equivalent to 1,000
TFLOPS. To match what a 1 PFLOPS computer system can do in just one second, you'd
have to perform one calculation every second for 31,688,765 years.
ExaFLOPS
A 1 exaFLOPS (EFLOPS) computer system is capable of performing one quintillion (10¹⁸)
floating-point operations per second. The rate 1 EFLOPS is equivalent to 1,000 PFLOPS.
To match what a 1 EFLOPS computer system can do in just one second, you'd have to
perform one calculation every second for 31,688,765,000 years.
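These equivalences are just the operation count divided by the number of seconds in a year. The sketch below reproduces them, assuming the figures are based on a mean tropical year of about 31,556,926 seconds:

```python
# Reproduce the "one calculation per second" equivalences above.
# Assumption: the figures use a mean tropical year (~31,556,926 s).
SECONDS_PER_YEAR = 31_556_926

for name, ops in [("GFLOPS", 1e9), ("TFLOPS", 1e12),
                  ("PFLOPS", 1e15), ("EFLOPS", 1e18)]:
    years = ops / SECONDS_PER_YEAR   # ops done in one second, at 1 op/s
    print(f"1 {name}: {years:,.2f} years")
```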
Distributed Algorithm: Distributed System vs
Parallel System
Distributed system refers to all computer applications where several
computers or processors are interconnected in some way.
It incorporates multiprocessor computers in which each processor has its
own control unit, wide-area computer communication networks, local-area
networks and systems of cooperating processes.
Following are some points of comparison between Distributed System and
Parallel System:
Process Synchronization
Set of techniques that are used to coordinate execution amongst
processes.
For example, a process may wish to run only to a certain point, at
which it will stop and wait for another process to finish certain actions.
A common resource (such as a device or a location in memory) may
require exclusive access and processes have to coordinate amongst
themselves to ensure that access is fair and exclusive.
In centralized systems, it was common to enforce exclusive access to
shared code.
Mutual exclusion was accomplished through mechanisms such as test-
and-set locks in hardware, and semaphores (variables or abstract data
types that control access by multiple processes to a common resource
in a multi-user environment), messages, and condition variables in software.
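A minimal single-machine illustration of semaphore-based mutual exclusion, using Python's standard library; the shared counter is a stand-in for any shared resource:

```python
import threading

counter = 0
lock = threading.Semaphore(1)   # binary semaphore guarding the counter

def worker():
    global counter
    for _ in range(100_000):
        with lock:              # exclusive access to the shared resource
            counter += 1        # read-modify-write, now done atomically

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # 400000 - without the semaphore, updates could be lost
```

The same coordination problem, without shared memory or hardware locks, is what the distributed mutual exclusion algorithms later in this lecture solve with messages.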
Why synchronize clocks in Distributed
Systems?
You want to catch the 10 bodaboda in Voi Town at
6.05 pm, but your watch is off by 5 minutes
What if your watch is fast by 5 minutes?
What if your watch is slow by 5 minutes?
Synchronizing clocks helps us
Time-stamping events (provide ‘Fairness’)
Ordering events (provide ‘Correctness’)
Time Sources
De Facto Primary Standard – International Atomic Time (TAI)
Keeping of TAI started in 1955
1 atomic second = 9,192,631,770 periods of the radiation from the hyperfine transition of Cs 133 (Caesium)
86400 atomic seconds = 1 solar day – 3 ms
Coordinated Universal Time (UTC) – International Standard
Keeping of UTC started 1961
Derived from TAI by adding leap seconds to keep it close to solar time
UTC source signals are synchronized
UTC time is re-transmitted by GPS satellites
Local clocks are based on oscillators
Terminology
Distributed System (DS) consists of N processes p1, p2, .. pN
Process pi, i ∈ {1,…,N}
State: values of local variables including time
Ci(t): the reading of the local clock at process pi when the real time is t
Actions: send message [send(m)], receive message [recv(m)], compute
[comp]
Occurrence of an action is called an event
Terminology
Events within process pi can be assigned timestamp and thus ordered
Events across different processes in DS need to be ordered, but
Clocks in DS across processes are not synchronized
Process clocks can be different
Need algorithms for either
time synchronization or
telling which event happened before which
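A standard tool for the second option, ordering events without synchronized clocks, is Lamport's logical clock: tick the counter on every local event, and on receipt set it to max(local, message timestamp) + 1. A minimal sketch, with messaging reduced to passing timestamps directly:

```python
class LamportClock:
    """Minimal Lamport logical clock: local events tick the counter,
    and a receive advances it past the sender's timestamp."""
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):                  # a send is also a local event
        self.time += 1
        return self.time             # timestamp carried by the message

    def receive(self, msg_time):     # jump past the sender's clock
        self.time = max(self.time, msg_time) + 1
        return self.time

p1, p2 = LamportClock(), LamportClock()
t_send = p1.send()             # p1's clock: 1
p2.local_event()               # p2's clock: 1
t_recv = p2.receive(t_send)    # p2's clock: max(1, 1) + 1 = 2
print(t_send, t_recv)          # 1 2: the send is ordered before the receive
```

The rule guarantees that if event a happened before event b, then a's timestamp is smaller, which is exactly the ordering the later mutual exclusion algorithms rely on.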
Distributed Mutual Exclusion
We assume that there is group agreement on
how a critical section (or exclusive resource)
is identified (e.g. name, number) and that this
identifier is passed as a parameter with any
requests.
Create an algorithm to allow a process to
obtain exclusive access to a resource.
Mutual Exclusion in Distributed
System:
Mutual exclusion ensures that no other process will use shared
resources at the same time.
One process is elected as coordinator.
Whenever a process wants to enter a critical region, it sends a request
message to the coordinator asking for permission.
If no other process is currently in that critical region, the coordinator
sends back a reply granting permission.
When the reply arrives, the requesting process enters the critical region.
If the coordinator knows that a different process is already in the critical
region, permission is not granted.
Centralized Algorithm
In the centralized algorithm, one process is
elected as the coordinator, which may be the
machine with the highest network address.
Central Server algorithm
Mimics a single-processor system
One process is elected as coordinator
A process P interacts with the coordinator C as follows:
1. Request the resource: send request(R) to C
2. Wait for response
3. Receive grant(R)
4. Access the resource
5. Release the resource: send release(R) to C
Central Server Algorithm
When a process wants to enter a critical section, it sends a request
message (identifying the critical section, if there are more than one) to the
coordinator.
If nobody is currently in the section, the coordinator sends back a grant
message and marks that process as using the critical section.
If, however, another process has previously claimed the critical section, the
server simply does not reply, so the requesting process is blocked.
When a process is done with its critical section, it sends a release
message to the coordinator.
The coordinator then can send a grant message to the next process in its
queue of processes requesting a critical section (if any).
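The coordinator's bookkeeping described above can be sketched as follows; message passing is reduced to method calls, and the names are illustrative:

```python
from collections import deque

class Coordinator:
    """Central-server mutual exclusion sketch for a single critical
    section; messaging is simplified to direct method calls."""
    def __init__(self):
        self.holder = None        # process currently in the section
        self.queue = deque()      # waiting requesters, in arrival order

    def request(self, pid):
        if self.holder is None:   # section free: grant at once
            self.holder = pid
            return "grant"
        self.queue.append(pid)    # otherwise the caller stays blocked
        return "wait"

    def release(self, pid):
        assert self.holder == pid
        # grant to the next queued process, if any
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder

c = Coordinator()
print(c.request("p1"))   # grant
print(c.request("p2"))   # wait: p1 holds the section
print(c.release("p1"))   # p2: the queued request is granted next
```

Because the queue is served in arrival order, the fairness property claimed on the next slide falls out directly.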
Central Server algorithm
Benefits
Fair
All requests processed in order
Easy to implement, understand, verify
Problems
Process cannot distinguish being blocked
from a dead coordinator
Centralized server can be a bottleneck
Ricart & Agrawala algorithm
Distributed algorithm using reliable multicast
and logical clocks
Process wants to enter critical section:
Compose message containing:
Identifier (machine ID, process ID)
Name of resource
Timestamp (totally-ordered Lamport)
Send request to all processes in group
Wait until everyone gives permission
Enter critical section / use resource
Ricart & Agrawala algorithm
When process receives request:
If receiver not interested:
Send OK to sender
If receiver is in critical section
Do not reply; add request to queue
If receiver just sent a request as well:
Compare timestamps: received & sent messages
Earliest wins
If receiver is loser, send OK
If receiver is winner, do not reply, queue
When done with critical section
Send OK to all queued requests
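The receiver's three cases can be sketched as follows, assuming (timestamp, process id) pairs for the totally-ordered Lamport comparison; messaging is reduced to method calls and the names are illustrative:

```python
class RAProcess:
    """Ricart & Agrawala request handling; state is one of
    'released', 'wanted', 'held'. Messaging is abstracted away."""
    def __init__(self, pid):
        self.pid = pid
        self.state = "released"
        self.request_ts = None
        self.deferred = []                 # requests queued while busy

    def want(self, ts):
        self.state = "wanted"
        self.request_ts = (ts, self.pid)   # totally-ordered Lamport pair

    def on_request(self, ts, sender):
        their = (ts, sender)
        if self.state == "held" or (
                self.state == "wanted" and self.request_ts < their):
            self.deferred.append(sender)   # in CS or earlier request: defer
            return None
        return "OK"                        # not interested, or we lose

p1, p2 = RAProcess(1), RAProcess(2)
p1.want(5); p2.want(7)             # both request; p1's timestamp is earlier
print(p2.on_request(5, 1))   # OK: p2's request (7) loses to p1's (5)
print(p1.on_request(7, 2))   # None: p1 defers, replies after its CS
print(p1.deferred)           # [2]
```

The deferred list is exactly the queue emptied with OK messages when the process leaves its critical section.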
Ricart & Agrawala algorithm: Limitations
A single point of failure has now been
replaced with n points of failure: a poor
algorithm has been replaced with one that is
essentially n times worse.
There is a lot of messaging traffic.
It nonetheless demonstrates that a fully
distributed algorithm is possible.
Lamport’s Mutual Exclusion
Each process maintains a request queue
containing mutual exclusion requests
Requesting critical section:
Process Pi sends request(i, Ti) to all nodes,
where Ti is the Lamport time
Pi places the request on its own queue
When a process Pj receives a request, it
returns a timestamped ack
Lamport’s Mutual Exclusion
Entering critical section (accessing resource), Pi may enter when:
Pi has received a message (ack or release) from
every other process with a timestamp larger
than Ti, and
Pi’s request has the earliest timestamp in its
queue
Difference from Ricart-Agrawala:
Everyone responds … always - no hold-back
Process decides to go based on whether its
request is the earliest in its queue
Lamport’s Mutual Exclusion
Releasing critical section:
Remove request from its own queue
Send a timestamped release message
When a process receives a release message
Removes request for that process from its queue
This may cause its own entry to have the earliest
timestamp in the queue, enabling it to access the
critical section
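The entry condition above can be sketched as follows, with messaging abstracted into recorded timestamps; the class and method names are illustrative:

```python
import heapq

class LamportMutex:
    """Sketch of Lamport's entry condition: a process may enter when
    its own request heads its queue and every other process has been
    heard from with a later timestamp. Messaging is abstracted away."""
    def __init__(self, pid, n):
        self.pid, self.n = pid, n
        self.queue = []                  # (timestamp, pid) requests
        self.last_seen = {}              # latest timestamp from each peer

    def request(self, ts):
        self.my_req = (ts, self.pid)
        heapq.heappush(self.queue, self.my_req)

    def on_message(self, ts, sender):    # ack, request, or release
        self.last_seen[sender] = ts

    def on_request(self, ts, sender):    # peer request also joins queue
        heapq.heappush(self.queue, (ts, sender))
        self.on_message(ts, sender)

    def can_enter(self):
        heads = self.queue and self.queue[0] == self.my_req
        heard_all = (len(self.last_seen) == self.n - 1 and
                     all(ts > self.my_req[0]
                         for ts in self.last_seen.values()))
        return bool(heads and heard_all)

m = LamportMutex(pid=1, n=3)
m.request(ts=4)
m.on_message(6, sender=2)    # ack from p2
print(m.can_enter())         # False: no word from p3 yet
m.on_message(7, sender=3)    # ack from p3
print(m.can_enter())         # True: earliest in queue, everyone acked
```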
Token Ring Algorithm
Assumes a group of processes with no inherent
ordering, but on which some ordering can be
imposed.
For example, we can identify each process by its
machine address and process ID to obtain an ordering.
Using this imposed ordering, a logical ring is
constructed in software.
Each process is assigned a position in the ring, and
each process must know who is next to it in the ring
(see figure).
Token Ring Algorithm
The ring is initialized by giving a token to process 0. The token
circulates around the ring (process n passes it to process
(n+1) mod ringsize).
When a process acquires the token, it checks to see if it is
attempting to enter the critical section. If so, it enters and does
its work. On exit, it passes the token to its neighbor.
If a process isn't interested in entering a critical section, it simply
passes the token along.
Token Ring Algorithm
Only one process has the token at a time and it must
have the token to work on a critical section, so mutual
exclusion is guaranteed. Order is also well-defined,
so starvation cannot occur.
The biggest drawback of this algorithm is that if the
token is lost, it will have to be regenerated. Determining
that a token is lost can be difficult.
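A minimal simulation of the token's circulation; the process ids and the `wants` set are illustrative:

```python
class TokenRing:
    """Token-ring mutual exclusion sketch: the token visits each
    process in ring order, and a process enters the critical section
    only while it holds the token."""
    def __init__(self, n):
        self.n = n
        self.holder = 0          # token starts at process 0
        self.wants = set()       # processes waiting for the CS
        self.log = []            # order in which the CS was entered

    def step(self):
        if self.holder in self.wants:    # holder enters, works, exits
            self.log.append(self.holder)
            self.wants.discard(self.holder)
        self.holder = (self.holder + 1) % self.n   # pass the token on

ring = TokenRing(4)
ring.wants = {1, 3}
for _ in range(4):               # one full circulation of the token
    ring.step()
print(ring.log)   # [1, 3]: each waiting process entered exactly once
```

Since only one process holds the token at a time, mutual exclusion holds by construction, and the fixed circulation order is why starvation cannot occur.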
Election Algorithms
Example 1: Your Bank maintains multiple servers, but for each
customer, one of the servers is responsible, i.e., is the leader
Example 2: In the sequencer-based algorithm for total ordering
of multicasts,
What happens if the “special” sequencer process fails?
Example 3: Coordinator-based distributed mutual exclusion:
need to elect (and keep) one coordinator
In a group of processes, elect a Leader to undertake special
tasks. This makes algorithm design easy.
But leader may fail (crash)
Some process detects this
Then what?
Assumptions and Requirements
Any process can call for an election.
A process can call for at most one election at a time.
Multiple processes can call an election simultaneously.
The result of an election should not depend on which process calls
for it.
Each process has
Variable called elected
An attribute value called attr, e.g., id, MAC address, CPU
The non-faulty process with the best (highest) election attribute value
(e.g., highest id or address, or fastest cpu, etc.) is elected.
Requirement: a run (execution) of the election algorithm must always
guarantee, at the end, that all non-faulty processes agree on the same
elected leader.
Election algorithms
We often need one process to act as a coordinator. It
may not matter which process does this, but there
should be group agreement on only one.
An assumption in election algorithms is that all
processes are otherwise identical, with no
distinguishing characteristics beyond their identifiers.
Each process can obtain a unique identifier (for
example, a machine address and process ID) and
each process knows of every other process but does
not know which is up and which is down.
What have we Learnt?
Coordination requires a leader process, e.g.,
sequencer for total ordering in multicasts, bank
database example, coordinator-based mutual
exclusion.
Leader process might fail
Need to (re-) elect leader process
Two Algorithms:
Bully algorithm
Ring Algorithm
Election by the Bully Algorithm
Assumptions:
Synchronous system
All messages arrive within Ttrans units of time.
A reply is dispatched within Tprocess units of time after the receipt of a
message.
If no response is received within 2Ttrans + Tprocess, the node is assumed to be
faulty (crashed).
attr=id
Each process knows all the other processes in the system
(and thus their id’s)
Bully algorithm
Selects the process with the largest identifier as the coordinator.
It works as follows:
When a process p detects that the coordinator is not responding to
requests, it initiates an election:
a. p sends an election message to all processes with higher numbers.
b. If nobody responds, then p wins and takes over.
c. If one of the processes answers, then p's job is done.
If a process receives an election message from a lower-numbered process
at any time, it:
a. sends an OK message back.
b. holds an election (unless it's already holding one).
A process announces its victory by sending all processes a message telling
them that it is the new coordinator.
If a process that has been down recovers, it holds an election.
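A minimal sketch of the bully rule. A real implementation exchanges election/OK/coordinator messages with timeouts; here that hand-off is collapsed into recursion over the set of live process ids, so the highest live id always ends up as coordinator:

```python
def bully_election(alive, initiator):
    """Bully election sketch: the initiator challenges all higher ids
    in `alive`; a live higher process replies OK and runs its own
    election. Timeouts and real messaging are abstracted away."""
    higher = sorted(p for p in alive if p > initiator)
    if not higher:
        return initiator               # nobody higher is alive: I win
    return bully_election(alive, higher[0])   # a higher process takes over

# processes 3 and 6 have crashed; process 1 notices and starts an election
print(bully_election(alive={1, 2, 4, 5}, initiator=1))   # 5
```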
Performance of Bully Algorithm
Best case scenario: The process with the second highest
id notices the failure of the coordinator and elects itself.
N-2 coordinator messages are sent.
Turnaround time is one message transmission time.
Worst case scenario: When the process with the least id
detects the failure.
N-1 processes altogether begin elections, each sending
messages to processes with higher ids.
There will be O(N²) message overhead.
Turnaround time is approximately 5 message
transmission times if there are no failures during the
run: election, answer, election, answer, coordinator
Ring Algorithm
Uses the same ring arrangement as in the token ring
mutual exclusion algorithm, but does not employ a token.
Processes are physically or logically ordered so that each
knows its successor.
If any process detects failure, it constructs an election
message with its process I.D. (e.g. network address and
local process I.D.) and sends it to its successor.
If the successor is down, it skips over it and sends the
message to the next party. This process is repeated until a
running process is located.
At each step, the process adds its own process I.D. to the
list in the message.
Ring Algorithm
Eventually, the message comes back to the process that
started it:
1. The process sees its ID in the list.
2. It changes the message type to coordinator.
3. The list is circulated again, with each process selecting
the highest numbered ID in the list to act as coordinator.
4. When the coordinator message has circulated fully, it
is deleted.
Multiple messages may circulate if multiple processes
detected failure. This creates a bit of overhead but
produces the same results.
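The two circulations can be sketched as follows, with the ring given as a list of live process ids in ring order (the ids are illustrative, and skipping of crashed successors is assumed already done):

```python
def ring_election(ring, start):
    """Ring election sketch: the election message makes one full
    circuit collecting ids, then the highest collected id is chosen
    as coordinator on the second (coordinator-message) circuit."""
    n = len(ring)
    collected = []
    i = ring.index(start)
    for _ in range(n):               # first circuit: gather every id
        collected.append(ring[i])    # each process appends its own id
        i = (i + 1) % n
    return max(collected)            # second circuit picks the highest

print(ring_election(ring=[3, 17, 9, 12], start=9))   # 17
```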
Summary
Distributed algorithms coordinate independent
processors without tight centralized control,
despite process and link failures
Synchronized or logical clocks let us timestamp
and order events across processes
Mutual exclusion can be achieved with a central
server, the Ricart & Agrawala or Lamport
algorithms, or a token ring
When a coordinator fails, the Bully and Ring
election algorithms choose a new leader
Thank you