Lecture 7 Distributed Algorithms

The document discusses routing algorithms in computer networks, focusing on distance vector and link state algorithms that determine the best paths for data transmission. It explains distributed algorithms, which run on interconnected processors without centralized control, and highlights challenges such as coordination and reliability. Additionally, it covers mutual exclusion in distributed systems, comparing centralized and distributed approaches, and introduces various algorithms for managing access to shared resources.

Uploaded by

brianshiaba33
Copyright © All Rights Reserved

SIC 2416 Cloud Computing and Distributed Systems

Lecture 7: Distributed Algorithms

Dr. Cheruiyot W.K., PhD


JKUAT
SCIT
Routing Algorithms in Computer Networks -
Distance Vector, Link State

 Data is converted into packets in computer networks before being transferred from source to destination.
 The network layer chooses the best path for data packet transmission.
 The network layer provides a routing protocol: a routing algorithm that determines the best and shortest path for transmitting data from source to destination.
 Routing algorithms are an essential component of computer networks. Without them, data cannot flow between different parts of the network.
 This lecture will look at the various types of routing algorithms and how they work.
2
What is a Routing Algorithm?

 A routing algorithm is a routing protocol, determined by the network layer, for transmitting data packets from source to destination.
 This algorithm determines the best or least-cost path for data transmission from sender/source to receiver/destination.
 The network layer performs operations that effectively and efficiently regulate internet traffic; the mechanism that determines the best path or route mathematically is known as a routing algorithm.
3
What is an Algorithm?

 In computer programming terms, an algorithm is a set of well-defined instructions to solve a particular problem. It takes a set of inputs and produces the desired output.
Qualities of a good algorithm:
 Input and output should be defined precisely.
 Each step in the algorithm should be clear and unambiguous.
 An algorithm should be the most effective among the many different ways to solve a problem.
 An algorithm shouldn't include computer code. Instead, it should be written in such a way that it can be used in different programming languages.
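As a small illustration of these qualities (an added example, not from the original slide), here is an algorithm with precisely defined input and output and unambiguous steps, expressed in Python:

```python
def find_max(values):
    """Input: a non-empty list of numbers. Output: the largest number."""
    largest = values[0]          # step 1: assume the first element is largest
    for v in values[1:]:         # step 2: examine every remaining element
        if v > largest:          # step 3: keep the bigger of the two
            largest = v
    return largest               # step 4: the output is well defined

print(find_max([3, 41, 7, 19]))  # → 41
```

The same steps could be written in any programming language, which is exactly the point of the last bullet.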
4
From the perspective of Distributed Systems…

 A distributed algorithm is an algorithm designed to run on computer hardware constructed from interconnected processors.
 Distributed algorithms are designed to run on multiple processors, without tight centralized control. In general, they are harder to design and harder to understand than single-processor sequential algorithms.
5
Distributed Algorithms

 Distributed algorithms are a sub-type of parallel algorithm, typically executed concurrently, with separate parts of the algorithm running simultaneously on independent processors and having limited information about what the other parts of the algorithm are doing.
 One of the major challenges in developing and implementing distributed algorithms is successfully coordinating the behavior of the independent parts of the algorithm in the face of processor failures and unreliable communication links.
 The choice of an appropriate distributed algorithm to solve a given problem depends on both the characteristics of the problem and characteristics of the system the algorithm will run on, such as the type and probability of processor or link failures, the kind of inter-process communication that can be performed, and the level of timing synchronization between separate processes.
6
Definition…
 Distributed algorithms are algorithms designed to run on a large but limited set of processors that run similar programs.
 They run on a distributed system that does not assume the previous existence of a central coordinator.
 A distributed system is a group of processors that do not share memory or a clock.
 Each process has its own memory, and the processors use communication networks to communicate.
 Distributed algorithms are used in various practical systems, ranging from large computer networks to multiprocessor shared-memory systems.
 The development of distributed algorithms is different in nature from the development of centralized algorithms, because a distributed system differs from a centralized system.
7
Standard problems

 Atomic commit: an atomic commit is an operation where a set of distinct changes is applied as a single operation.
 If the atomic commit succeeds, all the changes have been applied. If there is a failure before the atomic commit can be completed, the "commit" is aborted and no changes will be applied.
 Algorithms for solving the atomic commit problem include the two-phase commit protocol and the three-phase commit protocol.
 Consensus: consensus algorithms try to solve the problem of a number of processes agreeing on a common decision.
 More precisely, a consensus protocol must satisfy the four formal properties below:
 Termination: every correct process decides some value.
 Validity: if all processes propose the same value v, then every correct process decides v.
 Integrity: every correct process decides at most one value, and if it decides some value v, then v must have been proposed by some process.
 Agreement: if a correct process decides v, then every correct process decides v.
 Common algorithms for solving consensus are the Paxos algorithm and the Raft algorithm.
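The decision rule at the heart of two-phase commit can be sketched in a few lines of Python. This is a simplified illustration, not a fault-tolerant implementation: the `Participant` class and method names are invented for the example, and timeouts and crash recovery are ignored.

```python
def two_phase_commit(participants):
    """Phase 1 (voting): collect a vote from every participant.
    Phase 2 (decision): commit only if all voted 'yes'; otherwise abort."""
    votes = [p.prepare() for p in participants]          # phase 1
    decision = "commit" if all(v == "yes" for v in votes) else "abort"
    for p in participants:                               # phase 2
        p.finish(decision)
    return decision

class Participant:
    """Hypothetical participant used only for this illustration."""
    def __init__(self, vote):
        self.vote, self.state = vote, None
    def prepare(self):
        return self.vote                 # vote 'yes' or 'no' when asked
    def finish(self, decision):
        self.state = decision            # apply the coordinator's decision

ps = [Participant("yes"), Participant("yes"), Participant("no")]
print(two_phase_commit(ps))  # → abort: a single 'no' vote aborts everything
```

Note how atomicity follows from the rule: either every participant receives "commit", or none does.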
Distributed Systems Compared to Parallel Systems

The Three Walls of High-Performance Computing:

 Memory-wall challenge:
 Memory already limits single-processor performance. How can we design a memory system that provides a bandwidth of several terabytes/s for data-intensive high-performance applications?

 Power-wall challenge:
 When there are millions of processing nodes, each drawing a few watts of power, we are faced with the energy bill and cooling challenges of MWs of power dissipation, even ignoring the power needs of the interconnection network and peripheral devices.

 Reliability-wall challenge:
 Ensuring continuous and correct functioning of a system with many thousands or even millions of processing nodes is non-trivial, given that a few of the nodes are bound to malfunction at any given time.

9
Understand orders of magnitude in computer performance

GigaFLOPS
A 1 gigaFLOPS (GFLOPS) computer system is capable of performing one billion (10^9) floating-point operations per second. To match what a 1 GFLOPS computer system can do in just one second, you'd have to perform one calculation every second for 31.69 years.
TeraFLOPS
A 1 teraFLOPS (TFLOPS) computer system is capable of performing one trillion (10^12) floating-point operations per second. The rate 1 TFLOPS is equivalent to 1,000 GFLOPS. To match what a 1 TFLOPS computer system can do in just one second, you'd have to perform one calculation every second for 31,688.77 years.
PetaFLOPS
A 1 petaFLOPS (PFLOPS) computer system is capable of performing one quadrillion (10^15) floating-point operations per second. The rate 1 PFLOPS is equivalent to 1,000 TFLOPS. To match what a 1 PFLOPS computer system can do in just one second, you'd have to perform one calculation every second for 31,688,765 years.
ExaFLOPS
A 1 exaFLOPS (EFLOPS) computer system is capable of performing one quintillion (10^18) floating-point operations per second. The rate 1 EFLOPS is equivalent to 1,000 PFLOPS. To match what a 1 EFLOPS computer system can do in just one second, you'd have to perform one calculation every second for 31,688,765,000 years.
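The "years to match one second" figures above are plain arithmetic. A short Python check (assuming a year of 31,556,926 seconds, the convention that reproduces the slide's numbers):

```python
SECONDS_PER_YEAR = 31_556_926   # ~one tropical year, matching the slide's figures

for name, rate in [("GFLOPS", 1e9), ("TFLOPS", 1e12),
                   ("PFLOPS", 1e15), ("EFLOPS", 1e18)]:
    years = rate / SECONDS_PER_YEAR   # one hand calculation per second
    print(f"1 {name}-second = {years:,.2f} years of one calculation per second")
```

For example, 10^9 / 31,556,926 ≈ 31.69, the GFLOPS figure above.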
Distributed Algorithm: Distributed System vs
Parallel System

 Distributed system refers to all computer applications where several computers or processors are interconnected in some way.
 It incorporates multiprocessor computers in which each processor has its own control unit, wide-area computer communication networks, local-area networks, and systems of cooperating processes.
 Following are some points of comparison between Distributed System and Parallel System:

11
Process Synchronization
 A set of techniques that are used to coordinate execution amongst processes.
 For example, a process may wish to run only to a certain point, at which it will stop and wait for another process to finish certain actions.
 A common resource (such as a device or a location in memory) may require exclusive access, and processes have to coordinate amongst themselves to ensure that access is fair and exclusive.
 In centralized systems, it was common to enforce exclusive access to shared code.
 Mutual exclusion was accomplished through mechanisms such as test-and-set locks in hardware and, in software, semaphores (a variable or abstract data type for controlling access by multiple processes to a common resource in a multi-user environment), messages, and condition variables.
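On a single machine, the semaphore mechanism described above can be demonstrated with Python's threading module. This is a local (shared-memory) illustration of mutual exclusion, not a distributed protocol:

```python
import threading

counter = 0
lock = threading.Semaphore(1)   # binary semaphore guarding the shared resource

def worker():
    global counter
    for _ in range(100_000):
        with lock:              # enter critical section (P / wait)
            counter += 1        # exclusive access to the shared variable
        # leaving the 'with' block releases the semaphore (V / signal)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # → 400000: with the semaphore, no update is lost
```

Without the semaphore, concurrent `counter += 1` operations could interleave and lose updates; the whole point of the sections that follow is achieving the same guarantee without shared memory.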

12
Why synchronize clocks in Distributed
Systems?
 You want to catch the 10 bodaboda at Voi Town at 6.05 pm, but your watch is off by 5 minutes
 What if your watch is fast by 5 minutes?
 What if your watch is slow by 5 minutes?

 Synchronizing clocks helps us with:
 Time-stamping events (provides 'Fairness')
 Ordering events (provides 'Correctness')
Time Sources

 De facto primary standard – International Atomic Time (TAI)
 Keeping of TAI started in 1955
 1 atomic second = 9,192,631,770 periods of the radiation from the hyperfine transition of Cs-133 (Caesium)
 86,400 atomic seconds = 1 solar day – 3 ms
 Coordinated Universal Time (UTC) – international standard
 Keeping of UTC started in 1961
 Derived from TAI by adding leap seconds to keep it close to solar time
 UTC source signals are synchronized
 UTC time is re-transmitted by GPS satellites

 Local clocks are based on oscillators


Terminology
 A Distributed System (DS) consists of N processes p1, p2, …, pN
 Process pi, i ∈ {1, …, N}
 State: values of local variables, including time
 Ci(t): the reading of the local clock at process i when the real time is t
 Actions: send message [send(m)], receive message [recv(m)], compute [comp]
 Occurrence of an action is called an event
Terminology
 Events within process pi can be assigned timestamps and thus ordered
 Events across different processes in a DS need to be ordered, but:
 Clocks across processes in a DS are not synchronized
 Process clocks can be different
 Need algorithms for either:
 time synchronization, or
 telling which event happened before which
Distributed Mutual Exclusion
 We assume that there is group agreement on
how a critical section (or exclusive resource)
is identified (e.g. name, number) and that this
identifier is passed as a parameter with any
requests.
 Create an algorithm to allow a process to
obtain exclusive access to a resource.

18
Mutual Exclusion in Distributed
Systems:
 Mutual exclusion ensures that no other process will use a shared resource at the same time.
 One process is elected as coordinator.
 Whenever a process wants to enter a critical region, it sends a request message to the coordinator asking for permission.
 If no other process is currently in that critical region, the coordinator sends back a reply granting permission.
 When the reply arrives, the requesting process enters the critical region.
 If the coordinator knows that a different process is already in the critical region, permission cannot be granted.

19
Centralized Algorithm
 In the centralized algorithm, one process is elected as the coordinator, which may be the machine with the highest network address.

20
Central Server algorithm
 Mimics a single-processor system
 One process is elected as coordinator C; a process P proceeds as follows:
1. Request the resource: send request(R) to C
2. Wait for a response
3. Receive grant(R)
4. Access the resource
5. Release the resource: send release(R) to C

21
Central Server Algorithm
 When a process wants to enter a critical section, it sends a request message (identifying the critical section, if there is more than one) to the coordinator.

 If nobody is currently in the section, the coordinator sends back a grant message and marks that process as using the critical section.

 If, however, another process has previously claimed the critical section, the server simply does not reply, so the requesting process is blocked.

 When a process is done with its critical section, it sends a release message to the coordinator.

 The coordinator then can send a grant message to the next process in its queue of processes requesting a critical section (if any).
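The coordinator's behaviour described above amounts to a small state machine. A sketch under simplified assumptions (one critical section, reliable messages; the class name is illustrative):

```python
from collections import deque

class Coordinator:
    """Central-server mutual exclusion: grant, queue, release."""
    def __init__(self):
        self.holder = None          # process currently in the critical section
        self.queue = deque()        # blocked requesters, in arrival order

    def request(self, pid):
        if self.holder is None:     # section is free: grant immediately
            self.holder = pid
            return "grant"
        self.queue.append(pid)      # otherwise queue it; requester blocks
        return None                 # no reply is sent yet

    def release(self, pid):
        assert pid == self.holder
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder          # the next process granted, if any

c = Coordinator()
print(c.request("P1"))   # → grant
print(c.request("P2"))   # → None (P2 blocks in the queue)
print(c.release("P1"))   # → P2 (the coordinator grants to the next in the queue)
```

Fairness follows from the FIFO queue; the weaknesses listed on the next slide (blocked vs dead coordinator, bottleneck) are exactly what the distributed algorithms below try to remove.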

22
Central Server algorithm
Benefits
 Fair: all requests are processed in order
 Easy to implement, understand, verify

Problems
 A process cannot distinguish being blocked from a dead coordinator
 The centralized server can be a bottleneck
23
Ricart & Agrawala algorithm
 A distributed algorithm using reliable multicast and logical clocks
 A process that wants to enter a critical section:
 Composes a message containing:
 Identifier (machine ID, process ID)
 Name of the resource
 Timestamp (totally-ordered Lamport)
 Sends the request to all processes in the group
 Waits until everyone gives permission
 Enters the critical section / uses the resource
24
Ricart & Agrawala algorithm
 When a process receives a request:
 If the receiver is not interested:
 Send OK to the sender
 If the receiver is in the critical section:
 Do not reply; add the request to a queue
 If the receiver has just sent a request as well:
 Compare the timestamps of the received and sent messages
 The earliest wins
 If the receiver is the loser, send OK
 If the receiver is the winner, do not reply; queue the request
 When done with the critical section:
 Send OK to all queued requests
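The receive-side rules above reduce to a single decision function. A sketch (state names and the (time, pid) tuple encoding of totally-ordered Lamport timestamps are illustrative):

```python
def on_request(state, my_ts, req_ts, queue):
    """Decide the reply to an incoming request with timestamp req_ts.
    state: 'idle', 'in_cs', or 'wanting'.
    my_ts: this process's own pending request as a (Lamport time, pid)
    tuple, or None if it has none. Returns 'ok' or 'defer'."""
    if state == "idle":            # not interested: always reply OK
        return "ok"
    if state == "in_cs":           # using the resource: defer the request
        queue.append(req_ts)
        return "defer"
    # state == 'wanting': the earlier (totally ordered) timestamp wins
    if req_ts < my_ts:             # their request is earlier: we lose, send OK
        return "ok"
    queue.append(req_ts)           # we win: defer their request until we finish
    return "defer"

q = []
print(on_request("wanting", (5, "P1"), (3, "P2"), q))  # → ok (P2's ts is earlier)
print(on_request("wanting", (5, "P1"), (7, "P3"), q))  # → defer (P1 wins; P3 queued)
```

Tuple comparison breaks timestamp ties by process id, which is what makes the Lamport ordering total and the winner unambiguous.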
25
Ricart & Agrawala algorithm: Limitations

 A single point of failure has now been replaced with n points of failure. A poor algorithm has been replaced with one that is essentially n times worse.
 A lot of messaging traffic
 Demonstrates that a fully distributed algorithm is possible

26
Lamport’s Mutual Exclusion
Each process maintains a request queue
 Contains mutual exclusion requests

Requesting the critical section:
 Process Pi sends request(i, Ti) to all nodes, where Ti is the Lamport time of the request
 Places the request on its own queue
 When a process Pj receives a request, it returns a timestamped ack

27
Lamport’s Mutual Exclusion
Entering the critical section (accessing the resource), Pi may proceed when:
 Pi has received a message (ack or release) from every other process with a timestamp larger than Ti
 Pi's request has the earliest timestamp in its queue
Difference from Ricart-Agrawala:
 Everyone responds … always – no hold-back
 A process decides to go based on whether its request is the earliest in its queue
28
Lamport’s Mutual Exclusion
Releasing the critical section:
 Remove the request from its own queue
 Send a timestamped release message

 When a process receives a release message:
 It removes the request for that process from its queue
 This may cause its own entry to have the earliest timestamp in the queue, enabling it to access the critical section
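The entry condition above reduces to a predicate over Pi's queue and the latest timestamp heard from every other process. A sketch (the function and parameter names are illustrative, and requests are encoded as (timestamp, pid) tuples):

```python
def can_enter(my_request, queue, latest_seen):
    """my_request: (timestamp, pid) of Pi's own pending request.
    queue: all requests Pi currently knows about (including its own).
    latest_seen: for every other process, the largest timestamp received
    from it (via ack or release messages).
    Pi may enter iff its request is earliest in its queue AND every other
    process has sent something timestamped later than Pi's request."""
    earliest = min(queue) == my_request
    everyone_later = all(ts > my_request[0] for ts in latest_seen.values())
    return earliest and everyone_later

queue = [(2, "P1"), (5, "P3")]
print(can_enter((2, "P1"), queue, {"P2": 4, "P3": 5}))  # → True
print(can_enter((2, "P1"), queue, {"P2": 1, "P3": 5}))  # → False (nothing later from P2 yet)
```

The second check is what replaces Ricart-Agrawala's hold-back: instead of withholding replies, every process always answers, and the requester waits until those answers prove no earlier request can still arrive.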

29
Token Ring Algorithm
 Assumes a group of processes with no inherent ordering, but that some ordering can be imposed on the group.
 For example, we can identify each process by its machine address and process ID to obtain an ordering.
 Using this imposed ordering, a logical ring is constructed in software.
 Each process is assigned a position in the ring, and each process must know who is next to it in the ring (Figure).

30
Token Ring Algorithm

 The ring is initialized by giving a token to process 0. The token circulates around the ring: process n passes it to process (n+1) mod ringsize.
 When a process acquires the token, it checks to see if it is attempting to enter the critical section. If so, it enters and does its work. On exit, it passes the token to its neighbor.
 If a process isn't interested in entering a critical section, it simply passes the token along.
31
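One circulation of the token can be simulated in a few lines. A sketch assuming a reliable ring and no lost token (function and parameter names are illustrative):

```python
def pass_token(ring_size, start, wants_cs):
    """Circulate the token once around the ring, beginning at 'start'.
    wants_cs: set of processes waiting to enter their critical section.
    Returns the processes that entered, in the order the token reached them."""
    entered = []
    for step in range(ring_size):
        holder = (start + step) % ring_size   # token moves to (n+1) mod ringsize
        if holder in wants_cs:                # interested? enter, work, release
            entered.append(holder)
        # otherwise the process simply forwards the token
    return entered

print(pass_token(5, 0, {1, 3}))  # → [1, 3]: access follows the ring order
```

Because access strictly follows ring order, every waiting process is served within one circulation, which is why starvation cannot occur.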
Token Ring Algorithm

 Only one process has the token at a time, and it must have the token to work on a critical section, so mutual exclusion is guaranteed. Order is also well-defined, so starvation cannot occur.
 The biggest drawback of this algorithm is that if the token is lost, it will have to be regenerated. Determining that a token is lost can be difficult.

32
Election Algorithms
 Example 1: Your bank maintains multiple servers, but for each customer, one of the servers is responsible, i.e., is the leader.

 Example 2: In the sequencer-based algorithm for total ordering of multicasts, what happens if the "special" sequencer process fails?

 Example 3: Coordinator-based distributed mutual exclusion: we need to elect (and keep) one coordinator.

 In a group of processes, elect a leader to undertake special tasks. This makes algorithm design easy.

 But the leader may fail (crash)
 Some process detects this
 Then what?

33
Assumptions and Requirements

 Any process can call for an election.
 A process can call for at most one election at a time.
 Multiple processes can call an election simultaneously.
 The result of an election should not depend on which process calls for it.
 Each process has:
 A variable called elected
 An attribute value called attr, e.g., id, MAC address, CPU speed
 The non-faulty process with the best (highest) election attribute value (e.g., highest id or address, or fastest CPU, etc.) is elected.
 Requirement: A run (execution) of the election algorithm must always guarantee that, at the end, all non-faulty processes agree on the same elected leader.

34
Election algorithms

 We often need one process to act as a coordinator. It may not matter which process does this, but there should be group agreement on only one.
 An assumption in election algorithms is that all processes are exactly the same, with no distinguishing characteristics apart from an identifier.
 Each process can obtain a unique identifier (for example, a machine address and process ID), and each process knows of every other process but does not know which is up and which is down.

35
What have we Learnt?
 Coordination requires a leader process, e.g.,
sequencer for total ordering in multicasts, bank
database example, coordinator-based mutual
exclusion.
 Leader process might fail
 Need to (re-) elect leader process
 Two Algorithms:
 Bully algorithm
 Ring Algorithm

36
Election by the Bully Algorithm
 Assumptions:
 Synchronous system
 All messages arrive within Ttrans units of time.
 A reply is dispatched within Tprocess units of time after the receipt of a message.
 If no response is received within 2Ttrans + Tprocess, the node is assumed to be faulty (crashed).

 attr = id
 Each process knows all the other processes in the system (and thus their ids)

37
Bully algorithm
Selects the process with the largest identifier as the coordinator. It works as follows:
When a process p detects that the coordinator is not responding to requests, it initiates an election:
a. p sends an election message to all processes with higher numbers.
b. If nobody responds, then p wins and takes over.
c. If one of the processes answers, then p's job is done.
If a process receives an election message from a lower-numbered process at any time, it:
a. sends an OK message back.
b. holds an election (unless it's already holding one).
A process announces its victory by sending all processes a message telling them that it is the new coordinator.
If a process that has been down recovers, it holds an election.
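Under the assumptions above (a synchronous system in which a crashed process never answers within the timeout), one run of the bully election can be sketched as follows. The `alive` set stands in for timeout-based failure detection, and the message exchange is abstracted away:

```python
def bully_election(initiator, processes, alive):
    """processes: ids of all known processes; alive: ids that answer in time.
    Returns the id that ends up announcing itself as coordinator."""
    candidate = initiator
    while True:
        # send 'election' to everyone with a higher id; collect responders
        higher = [p for p in processes if p > candidate and p in alive]
        if not higher:           # nobody higher answered: candidate wins
            return candidate     # it broadcasts the 'coordinator' message
        # someone higher answered OK and takes over the election;
        # model the takeover by continuing with the highest live responder
        candidate = max(higher)

print(bully_election(2, [1, 2, 3, 4, 5], alive={1, 2, 3, 4}))  # → 4 (5 has crashed)
```

The result is always the highest-id live process, regardless of who initiated, which is the "result should not depend on the caller" requirement from the earlier slide.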
38
Performance of Bully Algorithm
 Best-case scenario: the process with the second-highest id notices the failure of the coordinator and elects itself.
 N-2 coordinator messages are sent.
 Turnaround time is one message transmission time.
 Worst-case scenario: the process with the lowest id detects the failure.
 N-1 processes altogether begin elections, each sending messages to processes with higher ids.
 The message overhead is O(N²).
 Turnaround time is approximately 5 message transmission times if there are no failures during the run: election, answer, election, answer, coordinator.

39
Ring Algorithm
 Uses the same ring arrangement as in the token-ring mutual exclusion algorithm, but does not employ a token.
 Processes are physically or logically ordered so that each knows its successor.
 If any process detects failure, it constructs an election message with its process ID (e.g. network address and local process ID) and sends it to its successor.
 If the successor is down, it skips over it and sends the message to the next party. This is repeated until a running process is located.
 At each step, the process adds its own process ID to the list in the message.
40
Ring Algorithm

 Eventually, the message comes back to the process that started it:
 1. The process sees its ID in the list.
 2. It changes the message type to coordinator.
 3. The list is circulated again, with each process selecting the highest-numbered ID in the list to act as coordinator.
 4. When the coordinator message has circulated fully, it is deleted.
Multiple messages may circulate if multiple processes detected the failure. This creates a bit of overhead but produces the same result.
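The first circulation above can be simulated over a list of process ids, skipping crashed successors (a sketch; failure detection is abstracted into an `alive` set, and the initiator is assumed to be alive since it detected the failure):

```python
def ring_election(ring, initiator, alive):
    """ring: process ids in ring order; alive: ids still running.
    The election message collects live ids until it returns to the
    initiator; the highest collected id becomes coordinator."""
    ids = []
    n = len(ring)
    i = ring.index(initiator)
    while True:
        p = ring[i % n]
        if p in alive:
            if ids and p == ids[0]:   # message has returned to the initiator
                break
            ids.append(p)             # each live process appends its own id
        i += 1                        # crashed successors are skipped over

    return max(ids)                   # the second circulation announces this id

ring = [3, 6, 0, 8, 2]
print(ring_election(ring, initiator=6, alive={3, 6, 0, 2}))  # → 6 (8 has crashed)
```

Whichever process starts the election, the collected list contains exactly the live ids, so every run names the same coordinator.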
41
Summary
 Distributed algorithms run on interconnected processors without tight centralized control, and must cope with processor failures and unreliable links
 Clock synchronization and logical (Lamport) time allow events to be ordered across processes
 Mutual exclusion can be achieved with a central coordinator, fully distributed protocols (Ricart & Agrawala, Lamport), or a token ring
 Election algorithms (bully, ring) choose a new coordinator when the leader fails

42
Thank you

43
