Distributed and Parallel Computer Systems

CSC 423
Spring 2021-2022

Lecture 12

Distributed Systems' Processes-2

Instructor
Dr / Ayman Soliman
➢ Contents
1) Penalty points
2) Hierarchical Algorithm
3) Sender-Initiated Distributed Heuristic Algorithm
4) Receiver-Initiated Distributed Heuristic Algorithm
5) Bidding Algorithm
6) Scheduling in Distributed Systems
7) Fault Tolerance
8) System Failures
9) Synchronous Vs Asynchronous Systems
10) Agreement in Faulty Systems

5/18/2022 Dr/ Ayman Soliman 2

❑ Penalty points
➢ When a workstation owner is running processes on other people's
machines, it accumulates penalty points, a fixed number per second.
These points are added to its usage table entry.

➢ Usage table entries can be positive, zero, or negative.

o A positive score indicates that the workstation is a net user of
system resources,
o A negative score means that it needs resources.
o A zero score is neutral.
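The bookkeeping described above can be sketched roughly as follows. Only the rule that points accumulate while a workstation runs processes remotely comes from the slide. The rate, the `UsageTable` name, and the rule that a freed processor goes to the requester with the lowest score are illustrative assumptions.

```python
from collections import defaultdict

PENALTY_PER_SECOND = 1  # assumed rate; the slide only says "a fixed number per second"


class UsageTable:
    """Hypothetical usage table: one signed score per workstation."""

    def __init__(self):
        self.score = defaultdict(int)  # every workstation starts neutral (0)

    def charge(self, workstation, seconds):
        # Penalty points accumulate while processes run on other machines.
        self.score[workstation] += PENALTY_PER_SECOND * seconds

    def pick_winner(self, requesters):
        # Assumption: a freed processor goes to the requester with the lowest score.
        return min(requesters, key=lambda w: self.score[w])


table = UsageTable()
table.charge("ws_a", seconds=10)              # ws_a becomes a net user (+10)
winner = table.pick_winner(["ws_a", "ws_b"])  # ws_b is still neutral
```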


❑ Hierarchical Algorithm
➢ Centralized algorithms, such as up-down, do not scale well to large
systems. The central node soon becomes a bottleneck, not to mention
a single point of failure.
➢ The hierarchical approach organizes the machines the way people are
organized in corporate, military, academic, and other real-world hierarchies.
o Some of the machines are workers and others are managers.


Sender-Initiated Distributed Heuristic Algorithm

• When a process is created, the machine on which it originates sends a
probe message to a randomly chosen machine, asking whether its load is
below some threshold value. If so, the process is sent there.

• Under conditions of heavy load, all machines will constantly send probes
to other machines in a futile attempt to find one that is willing to accept
more work.
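A minimal sketch of the probing step, under stated assumptions: the `loads` dictionary, the `place_new_process` name, and the `max_probes` cap (after which the process simply runs locally) are hypothetical and not taken from the slide.

```python
import random


def place_new_process(origin, loads, threshold, max_probes=3, rng=random):
    """Return the machine that should run a process created on `origin`.

    loads: dict mapping machine name -> current load (hypothetical metric).
    """
    others = [m for m in loads if m != origin]
    for _ in range(max_probes):
        if not others:
            break
        target = rng.choice(others)    # probe a randomly chosen machine
        if loads[target] < threshold:  # target reports a load below the threshold
            return target              # send the process there
    return origin                      # every probe failed: run locally
```

With every machine at or above the threshold, each probe fails and the process stays where it was created, which is the futile probing the second bullet warns about.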
Receiver-Initiated Distributed Heuristic Algorithm

• This algorithm is initiated by an underloaded receiver.

• Whenever a process finishes, the system checks to see if it has enough
work. If not, it picks some machine at random and asks it for work.
• An advantage of this algorithm is that it does not put extra load on the
system at critical times.
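The receiver side can be sketched in the same style. The `queues` structure, the `min_queue` notion of "enough work", and the `max_tries` cap are assumptions made for illustration.

```python
import random


def request_work(idle, queues, min_queue=1, max_tries=3, rng=random):
    """Called when a process finishes on `idle`; may pull one job from a peer.

    queues: dict mapping machine name -> list of waiting jobs (hypothetical).
    Returns the job that was pulled, or None.
    """
    if len(queues[idle]) >= min_queue:  # still has enough work
        return None
    others = [m for m in queues if m != idle]
    for _ in range(max_tries):
        if not others:
            break
        donor = rng.choice(others)      # pick some machine at random
        if queues[donor]:               # donor has work it can hand over
            job = queues[donor].pop()
            queues[idle].append(job)
            return job
    return None
```

Note that the probing is done by machines that have nothing else to do, so the extra messages cost little when the system is busy.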
Bidding Algorithm

• The key players in the economy are the processes, which must buy
CPU time to get their work done, and processors, which auction their
cycles off to the highest bidder.

• Each processor advertises its approximate price by putting it in a
publicly readable file.
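The two sides of this small economy might look like the sketch below. The function names are hypothetical, and choosing the cheapest affordable processor is only one possible policy; a process could equally prefer the fastest machine.

```python
def choose_processor(advertised_prices, budget):
    """Process side: pick the cheapest advertised processor within budget."""
    affordable = {p: price for p, price in advertised_prices.items() if price <= budget}
    return min(affordable, key=affordable.get) if affordable else None


def run_auction(bids):
    """Processor side: award the next slice of CPU time to the highest bidder."""
    return max(bids, key=bids.get) if bids else None
```

Here `advertised_prices` stands in for the publicly readable price file, and `bids` maps each bidding process to the amount it offers.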
SCHEDULING IN DISTRIBUTED SYSTEMS

• Each processor does its own local scheduling (assuming that it has
multiple processes running on it), without regard to what the other
processors are doing.

• When a group of related, heavily interacting processes are all running on
different processors, independent scheduling is not always the most
efficient way.

• The basic difficulty can be illustrated by an example in which processes A
and B run on one processor and processes C and D run on another.
SCHEDULING IN DISTRIBUTED SYSTEMS

• Ousterhout (1982) proposed several algorithms based on a concept he calls
co-scheduling, which takes interprocess communication patterns into account
while scheduling to ensure that all members of a group run at the same time.

• The first algorithm uses a conceptual matrix in which each column is the
process table for one processor.
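A small sketch of the matrix idea, assuming that each row is a time slot and that all processors step through the rows together in round-robin order. The `run_slot` helper and the choice of A and D as the heavily communicating pair are illustrative assumptions.

```python
def run_slot(matrix, slot):
    """Return what each processor runs during time slot `slot`.

    matrix[row][col]: row = time slot, col = processor; None marks an empty entry.
    """
    row = matrix[slot % len(matrix)]  # all processors advance through rows together
    return {cpu: proc for cpu, proc in enumerate(row) if proc is not None}


# Column 0 is the process table of processor 0 (A, B); column 1 that of
# processor 1 (D, C). Placing A and D in the same row makes them run together.
matrix = [
    ["A", "D"],
    ["B", "C"],
]
```

Because members of a group share a row, a message sent by A during its slot finds D running and able to reply immediately.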
FAULT TOLERANCE

• A system is said to fail when it does not meet its specification.

Component Faults
• Computer systems can fail due to a fault in some component,
such as a processor, memory, I/O device, cable, or software.
FAULT TOLERANCE

• Faults are generally classified as transient, intermittent, or permanent.

• Transient faults occur once and then disappear.

• An intermittent fault occurs, then vanishes, then reappears, and so on.

• A permanent fault is one that continues to exist until the faulty component is repaired.

• The goal of designing and building fault-tolerant systems is to ensure that the
system as a whole continues to function correctly, even in the presence of faults.
System Failures
• In a critical distributed system, we are interested in making the system
able to survive component (in particular, processor) faults.

• Two types of processor faults can be distinguished:

1. Fail-silent faults.
A faulty processor just stops and does not respond to subsequent input or produce
further output.
2. Byzantine faults.
A faulty processor continues to run, issuing wrong answers to questions.
Synchronous Vs Asynchronous Systems

• Suppose that, whenever one processor sends a message to another, it is
guaranteed to get a reply within a time T known in advance.

• Failure to get a reply then means that the receiving system has crashed.
Synchronous Vs Asynchronous Systems

• A system that always responds to a message within a known finite bound,
provided it is working, is said to be synchronous.

• A system not having this property is said to be asynchronous.

• Asynchronous systems are harder to deal with than synchronous ones.
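The practical difference shows up in how a missing reply is interpreted. The sketch below uses a local `queue.Queue` as a stand-in for a network inbox; `await_reply` and `bound_t` are hypothetical names.

```python
import queue


def await_reply(inbox, bound_t):
    """Wait up to bound_t seconds for a reply; return (alive, reply)."""
    try:
        return True, inbox.get(timeout=bound_t)
    except queue.Empty:
        # In a synchronous system, no reply within the bound implies a crash.
        # In an asynchronous system it would only mean "no reply yet".
        return False, None
```

In an asynchronous system the `False` result cannot be trusted, since a slow processor and a dead one look the same to the sender.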
Use of Redundancy
• The general approach to fault tolerance is to use redundancy.
• Three kinds are possible:
• Information redundancy:
  • Extra bits are added to allow recovery from garbled bits.
• Time redundancy:
  • An action is performed and then, if need be, performed again.
  • Time redundancy is especially helpful when the faults are transient or intermittent.
• Physical redundancy:
  • Extra equipment is added so that the system as a whole can tolerate the loss or
  malfunctioning of some components (permanent faults).
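Time redundancy is the easiest of the three to show in code. The following is a minimal retry sketch; `with_retries`, the attempt count, and the `flaky` action that fails twice before succeeding are all made up for illustration.

```python
def with_retries(action, attempts=3):
    """Perform `action`; if it fails, perform it again, up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return action()
        except Exception as err:  # treat any failure as a possibly transient fault
            last_error = err
    raise last_error              # the fault persisted across every attempt


calls = {"n": 0}


def flaky():
    # Simulated transient fault: the first two calls fail, the third succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("transient fault")
    return "ok"


result = with_retries(flaky)
```

Retrying does nothing for a permanent fault: the loop simply runs out of attempts and re-raises the last error.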
Fault Tolerance Using Active Replication
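• Active replication is a well-known technique for providing fault tolerance using physical redundancy.

• It is used in biology (mammals have two eyes, two ears, etc.).

• In a circuit, each device can be replicated three times, with each stage followed by voters. A voter
has three inputs and one output: if two or three of the inputs are the same, the output equals that
input. If all three inputs are different, the output is undefined. This kind of design is known as TMR
(Triple Modular Redundancy).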

Triple modular redundancy
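A TMR voter is small enough to write out in full. In this sketch the "undefined" output for three different inputs is represented by `None`, which is a choice made here and not part of the slide.

```python
def vote(a, b, c):
    """Majority voter: return the value at least two inputs agree on, else None."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    return None  # all three inputs differ: the output is undefined
```

A single faulty replica is therefore masked, because the two correct replicas always outvote it.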

Fault Tolerance Using Primary Backup

• The essential idea of the primary-backup method is that at any one
instant, one server is the primary and does all the work. If the primary
fails, the backup takes over.
Fault Tolerance Using Primary Backup

• Primary-backup fault tolerance has two major advantages over active
replication.

• First, it is simpler during normal operation, since messages go to just one server
(the primary) and not to a whole group.

• The problems associated with ordering these messages also disappear.

• Second, in practice it requires fewer machines, because at any instant only one
primary and one backup are needed.
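A toy sketch of the failover idea follows. The `Server` and `PrimaryBackupPair` classes are hypothetical, failure is simulated by a flag, and the way the primary forwards each request to the backup is a simplification, not the full protocol.

```python
class Server:
    """Hypothetical server that records every request it has applied."""

    def __init__(self, name):
        self.name, self.alive, self.log = name, True, []

    def handle(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.log.append(request)
        return f"{self.name} handled {request}"


class PrimaryBackupPair:
    def __init__(self, primary, backup):
        self.primary, self.backup = primary, backup

    def send(self, request):
        try:
            reply = self.primary.handle(request)  # the primary does all the work
            if self.backup.alive:
                self.backup.log.append(request)   # keep the backup up to date
            return reply
        except ConnectionError:
            # The primary failed: the backup takes over as the new primary.
            self.primary, self.backup = self.backup, self.primary
            return self.primary.handle(request)
```

Clients talk to one server at a time, so there is no group of replicas whose incoming messages would need to be ordered.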
Agreement in Faulty Systems

• The general goal of distributed agreement algorithms is to have all the
non-faulty processors reach consensus on some issue, and do that
within a finite number of steps.

• Examples are electing a coordinator, deciding whether to commit a transaction
or not, dividing up tasks among workers, synchronization, and so on.
Agreement in Faulty Systems

• Different cases are possible depending on system parameters, including:

1. Are messages delivered reliably all the time?

2. Can processes crash?

- If so, are the failures fail-silent or Byzantine?

3. Is the system synchronous or asynchronous?

Agreement in Faulty Systems

• Let us look at the "easy" case of perfect processors but communication
lines that can lose messages. There is a famous problem, known as the
two-army problem.

• In the two-army problem, every message, including every acknowledgement,
may be lost, so the sender of the last message never knows whether that
message arrived.

• Even with nonfaulty processors (generals), agreement between even two
processes is not possible in the face of unreliable communication.
Agreement in Faulty Systems

• Now let us assume that the communication is perfect but the processors are not.

• The classical problem occurs in a military setting and is called the Byzantine
generals problem.

• The goal of the problem is for the generals to exchange troop strengths, so that at
the end of the algorithm, each general has a vector of length n corresponding to
all the armies.
Agreement in Faulty Systems

• If general i is loyal, then element i is his troop strength; otherwise, it is undefined.

• A recursive algorithm solves this problem under certain conditions.

• We illustrate the working of the algorithm for the case of n = 4 generals and m = 1 traitor.
For these parameters, the algorithm operates in four steps.
Agreement in Faulty Systems

• In step 4, each general examines the ith element of each of the
newly received vectors. If any value has a majority, that value is
put into the result vector.
• If no value has a majority, the corresponding element of the
result vector is marked unknown. From Fig. (c) we see that
generals 1, 2, and 4 all come to agreement on

(1, 2, UNKNOWN, 4)
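The four steps can be replayed in a short simulation for n = 4 with general 3 as the traitor. The troop strengths, the strings used as lies, and the helper names are assumptions made for illustration. The sketch assumes that in step 1 every general announces a strength to the others, in step 2 each general assembles the announcements into a vector, and in step 3 each general passes its vector to every other general.

```python
from collections import Counter

UNKNOWN = "UNKNOWN"
STRENGTH = {1: 1, 2: 2, 3: 3, 4: 4}  # assumed true troop strengths
TRAITOR = 3
GENERALS = sorted(STRENGTH)


def lie(tag, receiver):
    # Assumption: the traitor tells every receiver a different made-up value.
    return f"{tag}{receiver}"


# Steps 1-2: every general announces a strength; each receiver builds a vector.
vector = {
    g: tuple(
        STRENGTH[s] if (s != TRAITOR or s == g) else lie("x", g)
        for s in GENERALS
    )
    for g in GENERALS
}


def relayed(sender, receiver):
    # Step 3: loyal generals pass on their vector; the traitor sends garbage.
    if sender == TRAITOR:
        return tuple(lie(f"v{i}_", receiver) for i in GENERALS)
    return vector[sender]


def decide(receiver):
    # Step 4: take the majority of the ith element over the received vectors.
    received = [relayed(s, receiver) for s in GENERALS if s != receiver]
    result = []
    for i in range(len(GENERALS)):
        value, count = Counter(v[i] for v in received).most_common(1)[0]
        result.append(value if count > len(received) / 2 else UNKNOWN)
    return tuple(result)


print([decide(g) for g in GENERALS if g != TRAITOR])
```

Each loyal general receives two honest vectors and one arbitrary one, so the honest values win the vote for elements 1, 2, and 4, while element 3 has no majority.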
Agreement in Faulty Systems

• Lamport et al. (1982) proved that in a system with m faulty processors, agreement
can be achieved only if 2m + 1 correctly functioning processors are present, for a
total of 3m + 1.
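The bound can be written as a one-line check. With m faulty processors and 2m + 1 correct ones, the system has at least 3m + 1 processors in total; the function name below is hypothetical.

```python
def agreement_possible(total, faulty):
    """True if the correct processors number at least 2 * faulty + 1."""
    return total - faulty >= 2 * faulty + 1
```

For example, four processors can tolerate one traitor, but three cannot.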
