CSC423 - Lec12 - Distributed and Parallel Computer Systems
CSC 423
Spring 2021-2022
Lecture 12
Instructor
Dr / Ayman Soliman
➢ Contents
1) Penalty points
2) Hierarchical Algorithm
3) Sender-Initiated Distributed Heuristic Algorithm
4) Receiver-Initiated Distributed Heuristic Algorithm
5) Bidding Algorithm
6) SCHEDULING IN DISTRIBUTED SYSTEMS
7) FAULT TOLERANCE
8) System Failures
9) Synchronous Vs Asynchronous Systems
10) Agreement in Faulty Systems
• The key players in the economy are the processes, which must buy
CPU time to get their work done, and processors, which auction their
cycles off to the highest bidder.
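The auction idea can be sketched in a few lines. This is only an illustration, not the algorithm's full pricing scheme: the function name `run_auction`, the process names, and the bid values are all hypothetical, and ties are assumed to go to the earliest bidder.

```python
def run_auction(bids):
    """Return (winner, price) for the highest bid, or None if nobody bids.

    `bids` maps a process name to the amount it offers for the processor's
    next slice of CPU time. Ties go to the process that bid first.
    """
    if not bids:
        return None
    winner = max(bids, key=bids.get)  # max() keeps the first maximal key
    return winner, bids[winner]


# Hypothetical example: three processes bid for one processor's cycles.
bids = {"P1": 5, "P2": 9, "P3": 7}
print(run_auction(bids))  # ('P2', 9)
```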
Component Faults
• Computer systems can fail due to a fault in some component,
such as a processor, memory, I/O device, cable, or software.
FAULT TOLERANCE
• A permanent fault is one that continues to exist until the faulty component is repaired.
• The goal of designing and building fault-tolerant systems is to ensure that the
system as a whole continues to function correctly, even in the presence of faults.
System Failures
• In a critical distributed system, we are interested in making the system
able to survive component (in particular, processor) faults.
• In a synchronous system, failure to get a reply within the known time bound means that the receiving system has crashed.
Synchronous Vs Asynchronous Systems
• If all three inputs are different, the output is undefined. This kind of design is known as TMR
(Triple Modular Redundancy).
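A TMR voter can be sketched as a majority function over three inputs. The name `tmr_vote` is hypothetical, and returning `None` to stand for the "undefined" output is an assumption made for this sketch.

```python
from collections import Counter


def tmr_vote(a, b, c):
    """Return the value that at least two of the three inputs agree on.

    If all three inputs differ, the output is undefined; this sketch
    represents that case with None.
    """
    value, votes = Counter([a, b, c]).most_common(1)[0]
    return value if votes >= 2 else None


print(tmr_vote(1, 1, 0))  # 1    (one faulty input is masked)
print(tmr_vote(1, 2, 3))  # None (all three differ)
```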
• First, it is simpler during normal operation since messages go to just one server
(the primary) and not to a whole group.
• Two-army problem
• The sender of the last message does not know whether the last message arrived.
• Now let us assume that the communication is perfect but the processors are not.
• The classical problem occurs in a military setting and is called the Byzantine
generals problem.
• The goal of the problem is for the generals to exchange troop strengths, so that at
the end of the algorithm, each general has a vector of length n corresponding to
all the armies.
Agreement in Faulty Systems
• We illustrate the working of the algorithm for the case of n = 4 and m = 1. For these
parameters, the algorithm operates in four steps.
(1, 2, UNKNOWN, 4)
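The n = 4, m = 1 case can be reproduced with a short simulation. It assumes the four steps are the usual textbook ones: (1) every general announces its troop strength to the others, (2) each general collects the announcements into a vector, (3) every general passes its vector to every other general, and (4) each general takes the majority value per position, or UNKNOWN when there is none. The function name, the choice of general 3 as the traitor, and the way lies are generated (unique values from 100 upward) are assumptions for illustration only.

```python
from collections import Counter
from itertools import count

UNKNOWN = "UNKNOWN"


def byzantine_agreement(strengths, traitor):
    """Return {general number: result vector} for every loyal general.

    `strengths[j]` is the true troop strength of general j + 1;
    `traitor` is the 0-based index of the single faulty general.
    """
    n = len(strengths)
    generals = range(n)
    lies = count(100)  # assumption: every lie is unique and matches no real strength

    # Steps 1-2: general i collects what every general j announced to it.
    vectors = {
        i: [next(lies) if j == traitor else strengths[j] for j in generals]
        for i in generals
    }

    results = {}
    for i in generals:
        if i == traitor:
            continue
        # Step 3: general i receives a vector from every other general;
        # the traitor forwards a vector of arbitrary values.
        received = [
            [next(lies) for _ in generals] if j == traitor else vectors[j]
            for j in generals
            if j != i
        ]
        # Step 4: keep a value only if a strict majority of vectors agree on it.
        result = []
        for k in generals:
            value, votes = Counter(v[k] for v in received).most_common(1)[0]
            result.append(value if 2 * votes > len(received) else UNKNOWN)
        results[i + 1] = tuple(result)
    return results


outcome = byzantine_agreement([1, 2, 3, 4], traitor=2)  # general 3 is faulty
print(outcome[1])  # (1, 2, 'UNKNOWN', 4)
```

All three loyal generals end up with the same vector, which is the agreement property; the traitor's own entry stays UNKNOWN because no two reports about it match.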
• Lamport et al. (1982) proved that in a system with m faulty processors, agreement
can be achieved only if 2m + 1 correctly functioning processors are present, for a total of 3m + 1.
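The bound can be turned into a quick feasibility check. Adding the m faulty processors to the 2m + 1 correct ones gives a minimum system size of 3m + 1; the function names below are hypothetical.

```python
def min_total_processors(m):
    """Smallest system size that can tolerate m faulty processors."""
    return 3 * m + 1


def can_reach_agreement(n, m):
    """True if n processors, m of them faulty, leave at least 2m + 1 correct ones."""
    return n - m >= 2 * m + 1


print(can_reach_agreement(4, 1))  # True  (the n = 4, m = 1 case)
print(can_reach_agreement(3, 1))  # False (only 2 correct processors, 3 needed)
```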
5/18/2022 Dr/ Ayman Soliman