Week-04
Introduction to Fault Tolerance
Objectives
Fault Classification.
Failure Classification.
Failure Masking.
Fault Tolerance
Objectives
Failure Masking and Replications.
Replicated-write protocols are used in the form of active replication, as well as by means of quorum-based protocols.
Solutions correspond to organizing a collection of identical processes into a flat group.
These groups have no single point of failure, at the cost of distributed coordination.
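As a rough illustration of the quorum-based protocols mentioned above, the sketch below checks whether a choice of read and write quorum sizes is safe for N replicas. The two constraints used (read quorum + write quorum > N, and write quorum > N/2) are the ones from Gifford's voting scheme; the slide does not name a specific scheme, so this choice and the function name are assumptions.

```python
def is_valid_quorum(n: int, read_quorum: int, write_quorum: int) -> bool:
    """Return True if the quorum sizes prevent read-write and write-write conflicts.

    Assumed constraints (Gifford-style voting):
      read_quorum + write_quorum > n   -> every read overlaps the latest write
      2 * write_quorum > n             -> two writes cannot both succeed at once
    """
    if n <= 0:
        return False
    if not (1 <= read_quorum <= n and 1 <= write_quorum <= n):
        return False
    reads_see_latest_write = read_quorum + write_quorum > n
    writes_are_exclusive = 2 * write_quorum > n
    return reads_see_latest_write and writes_are_exclusive
```

With 12 replicas, a read quorum of 3 and a write quorum of 10 passes both checks, while a read quorum of 7 with a write quorum of 6 fails the second one, because two disjoint groups of 6 replicas could accept conflicting writes.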
Reliable Client-Server Communication
Objectives
RPC Semantics in the Presence of Failures.
Reliable Client-Server Communication
Reliable point-to-point communication is established by making use of a reliable transport protocol, such as TCP.
TCP masks omission failures, which occur in the form of lost messages, by using acknowledgments and retransmissions.
Crash failures of connections are not masked. The only way to mask such failures is to let the distributed system attempt to automatically set up a new connection.
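The idea of masking a connection crash by automatically setting up a new connection can be sketched as a small retry loop. This is only a sketch under stated assumptions: `connect_with_retry`, its parameters, and the exponential backoff policy are hypothetical and not taken from the slides.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_retry(connect: Callable[[], T], attempts: int = 3, delay: float = 0.0) -> T:
    """Call connect() until it succeeds or the attempts are used up.

    connect is any zero-argument callable that returns a connection object
    and raises OSError (for example ConnectionRefusedError) on failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[OSError] = None
    for attempt in range(attempts):
        try:
            return connect()
        except OSError as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Back off before the next attempt (assumed policy: doubling delay).
                time.sleep(delay * (2 ** attempt))
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

In a real client, `connect` might be something like `lambda: socket.create_connection((host, port))` with a hypothetical `host` and `port`. Taking the connect function as a parameter keeps the retry logic independent of the transport.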
RPC Semantics in the Presence of Failures
The Remote Procedure Call (RPC) mechanism works well as long as both the client and the server function perfectly.
Five classes of RPC failure can be identified:
• The client is unable to locate the server.
• The request message from the client to the server is lost.
• The server crashes after receiving a request.
• The reply message from the server to the client is lost.
• The client crashes after sending a request.
Server in Client-Server Communication
Server crashes are dealt with by adopting one of three possible implementation philosophies:
• At least once semantics: A guarantee is given that the RPC occurred at least once, but (also) possibly more than once.
• At most once semantics: A guarantee is given that the RPC occurred at most once, but possibly not at all.
• No semantics: Nothing is guaranteed, and clients and servers take their chances.
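The practical difference between at-least-once and at-most-once behaviour shows up when the operation is not idempotent. The toy sketch below uses a counter server and a client stub that retransmits until a reply arrives. All names are hypothetical, and the failure is simulated as lost replies rather than a real server crash; a crashed server would also lose the in-memory reply cache unless it were stored persistently.

```python
class CounterServer:
    """Toy server whose only operation, increment, is not idempotent."""

    def __init__(self, dedupe: bool):
        self.dedupe = dedupe   # True: filter duplicate requests by request id
        self.counter = 0
        self.replies = {}      # request id -> cached reply

    def handle(self, request_id: int) -> int:
        if self.dedupe and request_id in self.replies:
            # Duplicate request: return the cached reply, do not execute again.
            return self.replies[request_id]
        self.counter += 1
        self.replies[request_id] = self.counter
        return self.counter


def call_with_retransmit(server: CounterServer, request_id: int, replies_lost: int) -> int:
    """Client stub: resend the same request until a reply gets through.

    replies_lost is the number of replies the simulated network drops first.
    """
    while True:
        reply = server.handle(request_id)
        if replies_lost > 0:
            replies_lost -= 1  # reply lost: the client times out and resends
            continue
        return reply


at_least_once = CounterServer(dedupe=False)
call_with_retransmit(at_least_once, request_id=1, replies_lost=2)

at_most_once = CounterServer(dedupe=True)
call_with_retransmit(at_most_once, request_id=1, replies_lost=2)
```

After both calls, `at_least_once.counter` is 3 because every retransmission was executed, while `at_most_once.counter` is 1 because the duplicates were filtered.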
Client in Client-Server Communication
Objectives
Reliable-Multicasting Schemes.
Reliable Group Communication
(a) Message transmission – note that the third receiver is expecting 24.
(b) Reporting feedback – the third receiver informs the sender.
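The two figure captions above describe a feedback-based scheme: the sender keeps a history of sent messages, and a receiver that detects a gap in the sequence numbers reports it so the missing message can be retransmitted. The following is a minimal sketch of that idea; the class and method names are hypothetical, and message loss is modelled simply by starting the third receiver one message behind.

```python
class Receiver:
    def __init__(self, name: str, last_received: int):
        self.name = name
        self.last = last_received   # highest sequence number delivered so far
        self.delivered = []

    def receive(self, seq: int, payload: str):
        expected = self.last + 1
        if seq == expected:
            self.last = seq
            self.delivered.append(payload)
            return ("ack", seq)
        if seq > expected:
            return ("missed", expected)   # gap detected: report it to the sender
        return ("ack", seq)               # duplicate, already delivered


class Sender:
    def __init__(self):
        self.history = {}   # seq -> payload, kept for retransmission

    def multicast(self, seq: int, payload: str, receivers):
        self.history[seq] = payload
        return {r.name: r.receive(seq, payload) for r in receivers}

    def repair(self, seq: int, receiver: Receiver, feedback):
        # Retransmit missing messages until the receiver acknowledges seq.
        while feedback[0] == "missed":
            missing = feedback[1]
            receiver.receive(missing, self.history[missing])
            feedback = receiver.receive(seq, self.history[seq])
        return feedback


sender = Sender()
sender.history[24] = "M24"   # assume message 24 was multicast earlier
receivers = [Receiver("r1", 24), Receiver("r2", 24),
             Receiver("r3", 23), Receiver("r4", 24)]
feedback = sender.multicast(25, "M25", receivers)
final = sender.repair(25, receivers[2], feedback["r3"])
```

Here `r3` is still expecting message 24, so its feedback for message 25 is a "missed 24" report; after the repair step it has delivered both messages in order.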
Distributed Commit
Objectives
Distributed Commit Protocol Phases.
Distributed Commit
In the case of reliable multicasting, the operation is the delivery of a message.
With distributed transactions, the operation may be the commit of a transaction at a single site that takes part in the transaction.
Other examples of distributed commit, and how it can be solved, are discussed in Tanisch (2000).
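One common way to structure a distributed commit is in two phases: a coordinator first collects votes from every participating site and then announces a global decision. The sketch below assumes this two-phase scheme, which the slides do not spell out; it ignores crashes and timeouts, and all names are hypothetical.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "VOTE_COMMIT"
    ABORT = "VOTE_ABORT"


class Participant:
    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit
        self.state = "INIT"

    def prepare(self) -> Vote:
        # Voting phase: say whether the local part of the transaction can commit.
        self.state = "READY" if self.can_commit else "ABORT"
        return Vote.COMMIT if self.can_commit else Vote.ABORT

    def finish(self, decision: str) -> None:
        # Decision phase: apply the coordinator's global decision.
        self.state = decision


def two_phase_commit(participants) -> str:
    votes = [p.prepare() for p in participants]          # phase 1: collect votes
    unanimous = all(v is Vote.COMMIT for v in votes)
    decision = "COMMIT" if unanimous else "ABORT"
    for p in participants:                               # phase 2: announce decision
        p.finish(decision)
    return decision
```

A single participant that cannot commit forces every site to abort, which is what makes the operation all-or-nothing across the group.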
Distributed Commit Cont …
Commit protocols are divided into three types:
Objectives
Types of Recovery.
Recovery
Recovery Disadvantages: