Ds Chapter 7

The document provides an introduction to distributed systems with a focus on fault tolerance, outlining key concepts such as types of faults, errors, and failures. It discusses methods for achieving fault tolerance, including redundancy and process resilience, as well as reliable communication strategies in client-server and group communication contexts. Additionally, it touches on failure models and the challenges associated with remote procedure calls (RPC) in the presence of failures.

Uploaded by esubalewsintie1302

Bahir Dar Institute of Technology

Faculty of Computing Network and Internet Chair

Introduction to Distributed Systems

Content
• Fault Tolerance
• Introduction to Fault Tolerance
• Process Resilience
• Reliable Client-Server Communication
• Reliable Group Communication
• Distributed Commit
• Recovery
Basic Concepts

• Fault tolerance is closely related to the notion of "dependability".
• In distributed systems, dependability is characterized by a number of requirements:
  • Availability – the system is ready to be used immediately (it is operating correctly at any given moment).
  • Reliability – the system can run continuously without failure (over a given time interval).
  • Safety – if the system temporarily fails to operate correctly, nothing catastrophic happens.
  • Maintainability – when the system fails, it can be repaired easily and quickly.
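Availability and reliability are easy to confuse. The following is a minimal Python sketch, assuming availability is measured as the fraction of an observation window during which the system was up (a standard definition, not stated on the slide); the function `availability` and the hypothetical uptime/downtime log are illustrative only.

```python
def availability(uptimes: list[float], downtimes: list[float]) -> float:
    """Measured availability: fraction of the observation window the system was up."""
    up = sum(uptimes)
    total = up + sum(downtimes)
    if total <= 0:
        raise ValueError("observation window must be non-empty")
    return up / total


# Hypothetical log: every hour the system runs 3599.999 s, then is down for 1 ms.
hours = 24
a = availability([3599.999] * hours, [0.001] * hours)
print(f"availability = {a:.7f}")  # above 99.9999 %, yet the system fails every hour
```

The example system is highly available but not reliable: it never runs for a full hour without failing.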
What Is a Fault?

• A fault is a defect or flaw in a system's hardware, software, or design that has the potential to cause an error.
• Faults can be classified into several types, including:
  • Hardware faults
  • Software faults
  • Design faults
  • Operational faults
Errors in Distributed Systems

• An error is the manifestation of a fault: an incorrect state or condition within the system that can potentially lead to a failure.
• Errors can be transient, intermittent, or permanent:
  • Transient errors: temporary errors that disappear without intervention.
  • Intermittent errors: errors that occur sporadically and unpredictably.
  • Permanent errors: persistent errors that continue until corrective action is taken.
Failures in Distributed Systems
• A system is said to fail when it cannot meet its promises (the services it specifies).
• A failure is brought about by the existence of errors in the system.
• The cause of an error is called a fault. The causal chain is therefore fault → error → failure.

Types of Faults
There are three main types of fault:

• Transient fault – appears once, then disappears.
• Intermittent fault – occurs, vanishes, then reappears, following no real pattern (the worst kind, since it is hard to reproduce and diagnose).
• Permanent fault – persists once it occurs; only replacing or repairing the faulty component allows the distributed system to function normally again.
Failure Models

• Crash failure: the system stops functioning and does not respond to any inputs.
• Omission failure: a server fails to respond to incoming requests.
  • Receive omission: a server fails to receive incoming messages; e.g., perhaps no thread is listening.
  • Send omission: a server fails to send messages.
• Timing failure: the system's response is either too early or too late, violating timing constraints.
• Response failure: the system produces an incorrect response or output.
  • Value failure: the value of the response is wrong; e.g., a search engine returning the wrong Web pages as the result of a search.
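The failure models above can be told apart by what an observer of a single request/reply exchange sees. The sketch below is a simplified, hypothetical classifier (`Failure`, `NO_REPLY`, and `classify` are illustrative names), assuming the observer knows the correct reply, an acceptable latency window, and whether the server still answers later requests.

```python
from enum import Enum


class Failure(Enum):
    NONE = "correct"
    CRASH = "crash failure"
    OMISSION = "omission failure"
    TIMING = "timing failure"
    VALUE = "response (value) failure"


NO_REPLY = object()  # sentinel: no reply arrived at all


def classify(expected, reply, latency, window, still_serving):
    """Classify a single request/reply exchange.

    expected      -- the correct reply value
    reply         -- what came back, or NO_REPLY
    latency       -- seconds until the reply arrived (ignored if NO_REPLY)
    window        -- (earliest, latest) acceptable latency
    still_serving -- whether the server still answers later requests
    """
    if reply is NO_REPLY:
        # Silent from now on: crash; silent only for this request: omission.
        return Failure.OMISSION if still_serving else Failure.CRASH
    earliest, latest = window
    if not earliest <= latency <= latest:
        return Failure.TIMING
    if reply != expected:
        return Failure.VALUE
    return Failure.NONE


print(classify(42, 41, 0.2, (0.0, 1.0), still_serving=True))  # Failure.VALUE
```

In practice an observer cannot easily tell whether a silent server has crashed or is merely slow, which is why this sketch takes `still_serving` as given. It also checks timing before value, so a late and wrong reply is reported as a timing failure.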
Failure Masking by Redundancy
• If a system is to be fault tolerant, the best it can do is to try to hide the occurrence of failures from other processes. The key technique for masking failures is redundancy:
  • Information redundancy: add extra bits to allow recovery from garbled bits (error correction).
  • Time redundancy: perform an action again if needed, e.g., redo an aborted transaction; useful for transient and intermittent faults.
  • Physical redundancy: add extra equipment (hardware) or processes (software).
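A common form of physical redundancy is triple modular redundancy (TMR): three replicas compute the same result and a voter passes on the majority value, masking one faulty replica. TMR is a standard technique rather than something these slides describe; `majority_vote` is a hypothetical helper for illustration.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def majority_vote(outputs: Sequence[Hashable]):
    """Return the value produced by a strict majority of replicas.

    With 2k+1 replicas this masks up to k replicas that return wrong values.
    Raises RuntimeError when no value has a strict majority.
    """
    if not outputs:
        raise ValueError("no replica outputs")
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority: too many replicas disagree")
    return value


# Three replicas (TMR); replica 2 suffers a value failure.
print(majority_vote([7, 13, 7]))  # prints 7
```

The voter itself is a single point of failure; real TMR designs often replicate the voters as well.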
Process Resilience
• Processes can be made fault tolerant by arranging them into a group of identical processes.
• A message sent to the group is delivered to all of the "copies" of the process (the group members), and then only one of them performs the required service.
• If one of the processes fails, it is assumed that one of the others will still be able to function (and service any pending request or operation).
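The group idea above can be sketched in a single Python process. Assumptions not taken from the slides: group members are local objects rather than remote processes, a crashed member raises `ConnectionError`, and the first live member in a fixed order performs the service. `Replica` and `ProcessGroup` are hypothetical names.

```python
class Replica:
    """A hypothetical identical group member; `alive` simulates crash failures."""

    def __init__(self, name: str):
        self.name = name
        self.alive = True
        self.log: list[str] = []  # messages delivered to this copy

    def deliver(self, msg: str) -> None:
        if self.alive:
            self.log.append(msg)

    def serve(self, msg: str) -> str:
        if not self.alive:
            raise ConnectionError(f"{self.name} has crashed")
        return f"{self.name} handled {msg}"


class ProcessGroup:
    def __init__(self, members: list[Replica]):
        self.members = members

    def request(self, msg: str) -> str:
        # 1. Deliver the message to every copy in the group.
        for m in self.members:
            m.deliver(msg)
        # 2. Only one member performs the service; if it has crashed,
        #    the next member takes over the pending request.
        for m in self.members:
            try:
                return m.serve(msg)
            except ConnectionError:
                continue
        raise RuntimeError("all group members have failed")


group = ProcessGroup([Replica("p1"), Replica("p2"), Replica("p3")])
group.members[0].alive = False  # p1 crashes
print(group.request("withdraw 100"))  # prints "p2 handled withdraw 100"
```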
Flat vs. Hierarchical Groups
a) Communication in a flat group – all the processes are equal and decisions are made collectively. Note: there is no single point of failure; however, decision making is complicated because consensus is required.
b) Communication in a simple hierarchical group – one of the processes is elected to be the coordinator, which selects another process (a worker) to perform the operation. Note: the coordinator is a single point of failure; however, decisions are made easily and quickly by the coordinator without first having to reach consensus.
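The trade-off between the two group structures can be sketched as two decision procedures. Assumptions made for illustration only: in the flat group a proposal is accepted with a strict majority of all members, and in the hierarchical group the live process with the highest ID becomes coordinator (a simple election rule) and picks the lowest-numbered other live process as worker. All function names are hypothetical.

```python
def flat_decide(votes: dict[int, bool], group_size: int) -> bool:
    """Flat group: accept only if a strict majority of ALL members votes yes.
    Every member must be consulted, which is what makes this costly."""
    yes = sum(1 for v in votes.values() if v)
    return yes * 2 > group_size


def elect_coordinator(alive: set[int]) -> int:
    """Hierarchical group: pick the live process with the highest ID."""
    if not alive:
        raise RuntimeError("no live processes")
    return max(alive)


def hierarchical_dispatch(alive: set[int]) -> tuple[int, int]:
    """The coordinator picks a worker on its own, without consulting the others."""
    coord = elect_coordinator(alive)
    workers = sorted(alive - {coord})
    worker = workers[0] if workers else coord
    return coord, worker


print(flat_decide({1: True, 2: True, 3: False, 4: False}, group_size=5))  # False
print(hierarchical_dispatch({1, 2, 3, 5}))  # (5, 1)
# If coordinator 5 crashes, a new coordinator must be elected before work continues.
print(hierarchical_dispatch({1, 2, 3}))  # (3, 1)
```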
Reliable Client-Server Communication
• Fault tolerance in distributed systems concentrates on faulty processes,
• but communication failures also have to be considered.
• A communication channel may exhibit failures in the form of:
  • crash
  • omission
  • timing
  • arbitrary (e.g., duplicate messages resulting from buffering at nodes and the sender retransmitting)
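Duplicate messages, one kind of arbitrary channel failure, can be masked at the receiver. A minimal sketch, assuming every message carries a (sender ID, sequence number) pair; the `DuplicateFilter` class is hypothetical.

```python
class DuplicateFilter:
    """Delivers each (sender, seq) message at most once."""

    def __init__(self):
        self.seen: set[tuple[str, int]] = set()

    def accept(self, sender: str, seq: int, payload: str) -> bool:
        key = (sender, seq)
        if key in self.seen:
            return False  # duplicate from buffering/retransmission: drop it
        self.seen.add(key)
        return True  # first copy: deliver to the application


f = DuplicateFilter()
stream = [("A", 1, "x"), ("A", 2, "y"), ("A", 1, "x"), ("B", 1, "z")]
delivered = [p for s, n, p in stream if f.accept(s, n, p)]
print(delivered)  # ['x', 'y', 'z']
```

The `seen` set grows without bound here; a practical receiver would keep only a per-sender window of recent sequence numbers.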
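Point-to-Point Communication
• Reliable transport protocols such as TCP can be used; they mask most communication failures, such as omissions (lost messages), by using acknowledgements and retransmissions. Crash failures of the connection itself, however, are often not masked; the client is typically notified, e.g., through an exception.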
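The acknowledgement-and-retransmission idea can be illustrated with a much simpler stop-and-wait protocol. This is not TCP (which uses sliding windows, cumulative ACKs, and adaptive timeouts); the lossy channel is simulated with a seeded random generator, and `LossyChannel` and `send_reliably` are hypothetical names.

```python
import random


class LossyChannel:
    """Simulated channel that drops each frame with probability `loss`."""

    def __init__(self, loss: float, seed: int = 1):
        self.loss = loss
        self.rng = random.Random(seed)

    def transmit(self, frame):
        return None if self.rng.random() < self.loss else frame


def send_reliably(messages, channel, max_tries=50):
    """Stop-and-wait ARQ: resend each message until its ACK gets through."""
    received = []  # what the receiver delivered, in order
    expected = 0  # receiver: next sequence number it expects
    transmissions = 0
    for seq, payload in enumerate(messages):
        for _ in range(max_tries):
            transmissions += 1
            frame = channel.transmit((seq, payload))
            if frame is not None:
                if frame[0] == expected:  # new frame: deliver it once
                    received.append(frame[1])
                    expected += 1
                ack = channel.transmit(frame[0])  # receiver ACKs every copy it gets
                if ack == seq:
                    break  # sender got the ACK: move on to the next message
            # frame or ACK lost: sender times out and retransmits
        else:
            raise TimeoutError(f"message {seq} not acknowledged")
    return received, transmissions


msgs = ["m0", "m1", "m2", "m3"]
got, sent = send_reliably(msgs, LossyChannel(loss=0.3))
print(got == msgs, sent)  # True, followed by the number of transmissions
```

A lost ACK causes a retransmission of a frame the receiver already has; the receiver recognizes it by its sequence number and acknowledges it again without delivering it twice.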
RPC Semantics in the Presence of Failures

• The goal of RPC is to hide communication by making remote procedure calls look like local ones.
• Five different classes of failures can occur in RPC systems, each requiring a different solution:
  • The client cannot locate the server, so no request can be sent.
  • The client's request to the server is lost, so no response is returned by the server to the waiting client.
  • The server crashes after receiving the request, and the service request is left acknowledged but not carried out.
  • The server's reply is lost on its way to the client: the service has completed, but the results never arrive at the client.
  • The client crashes after sending its request, and the server sends a reply to a newly restarted client that may not be expecting it.
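A lost request and a lost reply look identical to the client: it simply times out. A common remedy is for the client stub to retransmit with the same request ID and for the server to cache its reply per ID, so a retransmitted non-idempotent request is not executed twice (this only holds while the server does not crash and lose its cache). The sketch below simulates this in one process; `Server`, `call_with_retry`, and the drop flags are hypothetical.

```python
import itertools


class Server:
    """Hypothetical bank server; adding to the balance is NOT idempotent."""

    def __init__(self):
        self.balance = 0
        self.replies: dict[int, int] = {}  # request ID -> cached reply

    def handle(self, req_id: int, amount: int) -> int:
        if req_id in self.replies:  # retransmission: do not execute again
            return self.replies[req_id]
        self.balance += amount
        self.replies[req_id] = self.balance
        return self.balance


_ids = itertools.count(1)


def call_with_retry(server, amount, drops, max_tries=5):
    """Client stub: resend with the SAME request ID until a reply arrives.

    `drops` yields (lose_request, lose_reply) flags that simulate the network
    for each attempt; when exhausted, the network is assumed to work.
    """
    req_id = next(_ids)
    for _ in range(max_tries):
        lose_request, lose_reply = next(drops, (False, False))
        if lose_request:
            continue  # request lost: client times out
        reply = server.handle(req_id, amount)
        if lose_reply:
            continue  # reply lost: client times out
        return reply
    raise TimeoutError("server unreachable or crashed")


s = Server()
# Attempt 1: request lost. Attempt 2: executed, but reply lost. Attempt 3: OK.
net = iter([(True, False), (False, True), (False, False)])
print(call_with_retry(s, 100, net))  # prints 100, not 200
```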
Reliable Group Communication
• Reliable multicast services guarantee that all messages are delivered to all members of a process group.
• A simple solution to reliable multicasting exists when all receivers are known and are assumed not to fail:
• The sending process assigns a sequence number to outgoing messages (making it easy to spot when a message is missing) and keeps each message in a history buffer so that it can be retransmitted.
a) Message transmission – note that the third receiver is still expecting message 24, i.e., it has missed a message.
b) Reporting feedback – the third receiver informs the sender, which retransmits the missing message from its history buffer.
c) But how long does the sender keep its history buffer populated?
d) Also, such schemes perform poorly as the group grows: there are too many ACKs.
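A minimal sketch of the sequence-number scheme above, assuming receivers are known and do not fail, and that a receiver that detects a gap asks the sender to retransmit from its history buffer. `Sender`, `Receiver`, and the `lost_at` loss simulation are hypothetical; the history buffer is deliberately left unbounded, which is exactly question (c).

```python
class Receiver:
    def __init__(self, name: str):
        self.name = name
        self.expected = 0  # next sequence number expected
        self.delivered: list[str] = []

    def on_message(self, seq: int, payload: str, sender: "Sender") -> None:
        if seq < self.expected:
            return  # duplicate: ignore
        while seq > self.expected:  # gap detected, e.g. expecting 24 but got 25
            missing = sender.retransmit(self.expected)  # feedback to the sender
            self.delivered.append(missing)
            self.expected += 1
        self.delivered.append(payload)
        self.expected += 1


class Sender:
    def __init__(self, receivers: list[Receiver]):
        self.receivers = receivers
        self.next_seq = 0
        self.history: dict[int, str] = {}  # kept for retransmissions

    def multicast(self, payload: str, lost_at: frozenset = frozenset()) -> None:
        """Send to all receivers; receivers named in `lost_at` miss this message."""
        seq = self.next_seq
        self.next_seq += 1
        self.history[seq] = payload
        for r in self.receivers:
            if r.name not in lost_at:
                r.on_message(seq, payload, self)

    def retransmit(self, seq: int) -> str:
        return self.history[seq]


rs = [Receiver("r1"), Receiver("r2"), Receiver("r3")]
s = Sender(rs)
s.multicast("m0")
s.multicast("m1", lost_at=frozenset({"r3"}))  # r3 misses m1
s.multicast("m2")  # r3 sees the gap and recovers m1
print([r.delivered for r in rs])
```

A receiver only notices a lost message when a later one arrives, so a loss at the very end of a stream goes undetected in this sketch.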
Read Assignment

• What are distributed commit and recovery in distributed systems?
• What is three-phase commit (3PC) in distributed systems?
Thank You!!!
