
Chapter 8 - Fault Tolerance

1
 Introduction
 a major difference between distributed systems and single-
machine systems is that with the former, partial failure is
possible, i.e., one component of the distributed system may fail
 such a failure may affect some components while others
continue to function properly
 an important goal of distributed systems design is to
construct a system that can automatically recover from
partial failure
 it should tolerate faults and continue to operate to some
extent

2
 Objectives of the Chapter
 we discuss
 fault tolerance, and making distributed systems fault
tolerant
 process resilience (techniques by which one or more
processes can fail without seriously disturbing the rest of
the system)
 reliable multicasting to keep processes synchronized (by
which message transmission to a collection of processes
is guaranteed to succeed)
 distributed commit protocols for ensuring atomicity in
distributed systems
 failure recovery by saving the state of a distributed
system (when and how)

3
8.1 Introduction to Fault Tolerance
 Basic Concepts
 fault tolerance is strongly related to dependable systems
 dependability covers the following
 availability
 refers to the probability that the system is operating
correctly at any given time; defined in terms of an
instant in time
 reliability
 a property that a system can run continuously
without failure; defined in terms of a time interval
 safety
 refers to the situation that even when a system
temporarily fails to operate correctly, nothing
catastrophic happens
 maintainability
 how easily a failed system can be repaired

4
 dependable systems are also required to provide a high degree
of security
 a system is said to fail when it cannot meet its promises; for
instance, failing to provide its users one or more of the services
it promises
 an error is a part of a system’s state that may lead to a failure;
e.g., damaged packets in communication
 the cause of an error is called a fault
 building dependable systems closely relates to controlling
faults
 a distinction is made between preventing, removing, and
forecasting faults
 a fault tolerant system is a system that can provide its services
even in the presence of faults

5
 faults are classified into three
 transient
 occurs once and then disappears; if the operation is
repeated, the fault goes away; e.g., a bird flying through a
beam of a microwave transmitter may cause some lost
bits
 intermittent
 it occurs, then vanishes of its own accord, then
reappears, ...; e.g., a loose connection; difficult to
diagnose; analogy: you take your sick child to the nearest
clinic, but the child shows no sign of sickness by the time
you get there
 permanent
 one that continues to exist until the faulty component is
repaired; e.g, disk head crash, software bug

6
 Failure Modes - 5 of them
 Crash failure: a server halts, but was working correctly until it
stopped
 Omission failure: a server fails to respond to incoming requests
 Receive omission: a server fails to receive incoming
messages; e.g., maybe no thread is listening
 Send omission: a server fails to send messages;
e.g., its send buffer overflows
 Timing failure: a server's response lies outside the specified time
interval; e.g., maybe it is too fast, flooding the receiver, or
too slow
 Response failure: the server's response is incorrect
 Value failure: the value of the response is wrong; e.g., a search
engine returning wrong Web pages as a result of a search
 State transition failure: the server deviates from the correct
flow of control; e.g., taking default actions when it fails to
understand the request

7
 Arbitrary failure (or Byzantine failure): a server may
produce arbitrary responses at arbitrary times; most
serious
 Failure Masking by Redundancy
 to be fault tolerant, the system tries to hide the occurrence
of failures from other processes - masking
 the key technique for masking faults is redundancy
 three kinds are possible
 information redundancy; add extra bits to allow recovery
from garbled bits (error correction)
 time redundancy: an action is performed more than once
if needed; e.g., redo an aborted transaction; useful for
transient and intermittent faults
 physical redundancy: add (replicate) extra equipment
(hardware) or processes (software)

8
8.2 Process Resilience
 how can fault tolerance be achieved in distributed
systems?
 one method is protection against process failures by
replicating processes into groups
 we discuss
 what are the general design issues of process groups
 what actually is a fault tolerant group
 how to reach agreement within a process group when
one or more of its members cannot be trusted to give
correct answers

9
 Design Issues
 the key approach to tolerating a faulty process is to organize
several identical processes into a group
 all members of a group receive a message hoping that if one
process fails, another one will take over
 process groups may be dynamic
 new groups can be created and old groups can be destroyed
 a process can join or leave a group
 a process can be a member of several groups at the same
time
 hence group management and membership mechanisms are
required
 groups may be flat (all processes are equal) or hierarchical (a
coordinator and several workers)

10
(a) communication in a flat group
(b) communication in a simple hierarchical group

 the flat group has no single point of failure, but decision making is
more complicated (voting may be required for decision making)
 the hierarchical group has the opposite properties
 group membership may be handled
 through a group server where all requests (joining, leaving, ...)
are sent; it has a single point of failure
 in a distributed way (membership is multicasted)

11
 another important issue is how much replication is needed
 a system is said to be k fault tolerant if it can survive faults
in k components and still meet its specifications
 if the processes fail silently, then having k+1 replicas is
enough; if k of them fail, the remaining one can still function
 if processes exhibit Byzantine failure, 2k+1 replicas are
required; the k faulty processes may all generate the same
(wrong) reply, but the k+1 correct processes will also produce
the same (correct) answer; the client cannot tell which of the
two answers is right except by believing the majority, and with
2k+1 replicas the majority is always formed by the correct
processes
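
a minimal sketch (not from the slides) of the voting idea behind the 2k+1
bound: with at most k Byzantine replicas, the k+1 correct replicas always
form a majority, so the client can simply take the most common reply; the
function name majority_reply is illustrative

    from collections import Counter

    def majority_reply(replies, k):
        # pick the reply returned by a majority of the 2k+1 replicas;
        # with at most k faulty replicas, the k+1 correct ones always
        # outnumber them, so the most common value is the correct one
        assert len(replies) == 2 * k + 1, "need 2k+1 replies to tolerate k faults"
        value, count = Counter(replies).most_common(1)[0]
        assert count >= k + 1, "no majority: more than k replicas misbehaved"
        return value

    # example with k = 1 (3 replicas); one Byzantine replica answers 99
    print(majority_reply([42, 42, 99], k=1))   # -> 42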

12
8.3 Reliable Client-Server Communication
 fault tolerance in distributed systems concentrates on faulty
processes
 but communication failures also have to be considered
 a communication channel may exhibit failures in the form of
 crash
 omission
 timing
 arbitrary (duplicate messages as a result of buffering at
nodes and the sender retransmitting)
 Point-to-Point Communication
 reliable transport protocols such as TCP can be used that
mask most communication failures such as omissions (lost
messages) using acknowledgements and retransmissions
 however, crash failures are often not masked; e.g., a
broken TCP connection; in that case the system may
automatically set up a new connection

13
 RPC Semantics in the Presence of Failures
 the goal of RPC is to hide communication by making
remote procedure calls look like local ones; may be difficult
to mask remote calls when there are failures
 five different classes of failures can occur in RPC systems,
each requiring a different solution
 the client is unable to locate the server
 the client’s request to the server is lost
 the server receives the request and crashes
 the reply message from the server to the client is lost
 the client crashes after sending a request

14
1. the client is unable to locate the server
 maybe the client cannot find a suitable server, or the
server is down
 or an interface at the server has changed, making the
client’s stub obsolete (e.g., the client was not used for a long
time), so the binder fails to match it with the server
 one solution: let the client raise an exception and have
exception handlers; inserted by the programmer
 examples are exceptions in Java or signal handlers in C
 but
 not every language may provide exceptions or signals
 the major objective of having transparency is violated

2. the client’s request to the server is lost


 let the OS or the client stub start a timer when sending a
request; if the timer expires before a reply or
acknowledgement comes back, the message is retransmitted
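
a minimal sketch of such a client-side timer, assuming a UDP-style
transport; the socket timeout stands in for the OS timer, and
call_with_retries, the address argument and the retry limit are
illustrative choices, not part of the slides

    import socket

    def call_with_retries(request, addr, timeout=1.0, max_tries=5):
        # send the request and retransmit it each time the reply timer expires;
        # after max_tries unanswered transmissions, give up and report an error
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(timeout)
        try:
            for attempt in range(max_tries):
                sock.sendto(request, addr)           # (re)transmit the request
                try:
                    reply, _ = sock.recvfrom(4096)   # wait for reply or ack
                    return reply
                except socket.timeout:
                    continue                         # timer expired: retransmit
            raise TimeoutError("no reply after %d attempts" % max_tries)
        finally:
            sock.close()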
15
3. the server receives the request and crashes
 the failure can happen before or after execution

a server in client-server communication; (a) normal case, (b) crash after execution, (c)
crash before execution

 in (b), let the system report the failure back to the client (raise
an exception)
 in (c), retransmit the request
 but the client’s OS cannot tell which of the two cases occurred;
all it knows is that a timer has expired

16
note that there are three events that can happen at the server: send the
completion message (M), print the text (P), and crash (C). These events can
occur in six different orderings:
 M →P →C: A crash occurs after sending the completion
message and printing the text.
 M →C (→P): A crash happens after sending the completion
message, but before the text could be printed.
 P →M →C: A crash occurs after printing the text and sending
the completion message.
 P →C (→M): The text was printed, after which a crash occurred
before the completion message could be sent.
 C (→P →M): A crash happens before the server could do
anything.
 C (→M →P): A crash happens before the server could do
anything.
17
 three possible solutions
 the client tries the operation again after the server has
restarted (or rebooted) or rebinds to a new server, called at
least once semantics; keep on trying until a reply is
received; it guarantees that the RPC has been carried out at
least one time, but possibly more
 the client gives up after the first attempt and reports an error,
called at most once semantics; it guarantees that the RPC
has been carried out at most one time, but possibly not at
all
 no guarantee at all (the RPC may have been carried out
anywhere from 0 to N times)
 none of the above is the right solution; what is required is
exactly once semantics, but this is difficult to achieve
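
the difference between the two semantics can be made concrete with a small
sketch (an illustration, not a definitive mechanism); send_once is a
hypothetical helper that transmits the request once and returns None when
its timer expires

    def at_least_once(send_once, request):
        # retry until some reply arrives: the operation executes one or more
        # times (in practice retries would be bounded, or the client rebinds)
        while True:
            reply = send_once(request)
            if reply is not None:
                return reply

    def at_most_once(send_once, request):
        # send a single request and give up on timeout: the operation executes
        # at most once, possibly not at all
        reply = send_once(request)
        if reply is None:
            raise TimeoutError("no reply; the request may or may not have run")
        return reply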

18
4. the reply message from the server to the client is lost
 the client sets a timer and resends the request
 there is a risk that the request is sent, and hence executed,
more than once; maybe the server was just slow
 however, if the operation is idempotent (an operation such as
requesting a read that can be repeated without any harm), it
does not create any problem; but what if the request is to
transfer money from one account to another?
 let the client assign a sequence number to each request so
that the server can recognize a duplicate; but this requires the
server to maintain information on each client
 in addition, a bit in the message header can be used to
distinguish initial requests from retransmissions, so that the
server can take extra care with retransmissions
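
a sketch of such duplicate filtering on the server side, assuming each
client has at most one outstanding request so that caching only the last
reply is enough; DedupServer, client_id and the retransmission flag are
illustrative names

    class DedupServer:
        # filter retransmitted requests using per-client sequence numbers and
        # answer duplicates from a cached reply instead of re-executing them
        def __init__(self, handler):
            self.handler = handler      # function that really executes a request
            self.last = {}              # client_id -> (seq, cached reply)

        def handle(self, client_id, seq, is_retransmission, request):
            # is_retransmission mirrors the header bit mentioned above; a real
            # server could use it to apply extra checks (not shown here)
            seen = self.last.get(client_id)
            if seen is not None and seq <= seen[0]:
                return seen[1]          # duplicate: do not execute again
            reply = self.handler(request)        # execute once for this seq
            self.last[client_id] = (seq, reply)
            return reply

    # a money transfer must not be executed twice for the same request
    server = DedupServer(lambda req: "transferred " + req)
    print(server.handle("c1", 1, False, "100"))  # executed
    print(server.handle("c1", 1, True,  "100"))  # retransmission: cached reply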

19
5. the client crashes after sending a request
 in this case a computation may be done without a
parent waiting for the result; such unwanted
computations are called orphans
 orphans
 waste CPU cycles
 can lock valuable resources such as files
 can cause confusion if the client reboots and does the RPC
again and the reply from the orphan comes back
immediately afterward
 proposed solutions
 extermination: let the client stub maintain a log before
sending an RPC message; after a reboot, inspect the log
and kill the orphan; some problems
 the expense of writing a log for every RPC
 it may not work if the orphans themselves do RPC, creating
grandorphans that are difficult to locate
20
 reincarnation: divide time into sequentially numbered epochs;
when a client reboots, it broadcasts a message to all machines
declaring the start of a new epoch; when such a broadcast
comes in, all remote computations on behalf of that client are
killed
 gentle reincarnation: when an epoch broadcast comes in, each
machine checks to see if it has remote computations, and if so,
tries to locate their owner; a computation is killed only if the
owner cannot be found
 expiration: each RPC is given a standard amount of time, T, to
do the job; if it cannot finish, it must explicitly ask for another
quantum; if after a crash the client waits a time T before
rebooting, all orphans are killed; the problem is how to choose
a reasonable value of T, since RPCs have differing needs

21
Group 8 topic assignment:
Reliable Group Communication

22
8.4 Reliable Group Communication
 how to reliably deliver messages to a process group
(multicasting)
 Basic Reliable-Multicasting Schemes
 reliable multicasting means a message sent to a process
group should be delivered to each member of that group
 transport protocols do not offer reliable communication
to a collection of processes
 problems:
 what happens if a process joins a group during
communication?
 what happens if a (sending) process crashes during
communication?
 what if there are faulty processes?
 a weaker solution, assuming that all receivers are known
(and their number is limited) and that none will fail, is for
the sending process to assign a sequence number to each
message and to buffer all messages so that lost ones
can be retransmitted

23
a simple solution to reliable multicasting when all receivers are known and are assumed not to fail; (a) message
transmission, (b) reporting feedback
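
a sketch of the sender side of this weaker scheme, under its stated
assumptions (a fixed, known, non-failing set of receivers); ReliableSender
and the send callback are illustrative stand-ins for the real communication
layer

    class ReliableSender:
        # number every multicast message and keep it buffered so that a
        # receiver reporting a gap (lost message) can have it retransmitted
        def __init__(self, receivers, send):
            self.receivers = receivers   # fixed, known group of receivers
            self.send = send             # send(receiver, seq, msg): network stand-in
            self.next_seq = 0
            self.history = {}            # seq -> message, kept for retransmission

        def multicast(self, msg):
            seq, self.next_seq = self.next_seq, self.next_seq + 1
            self.history[seq] = msg
            for r in self.receivers:
                self.send(r, seq, msg)

        def on_nack(self, receiver, missing_seq):
            # a receiver detected a missing sequence number and asks for it again
            self.send(receiver, missing_seq, self.history[missing_seq])

        def on_acked_by_all(self, seq):
            # once every receiver has acknowledged seq, the buffer entry can go
            self.history.pop(seq, None)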

24
 Scalability in Reliable Multicasting
 a basic reliable multicast scheme does not support a large
number of receivers; the sender will be swamped with
feedback messages (feedback implosion)
 two approaches for solving the problem
1. Nonhierarchical Feedback Control
 reduce the number of feedback messages returned to the
sender using feedback suppression
 the Scalable Reliable Multicasting Protocol (SRM) uses
this principle
 only negative acks are returned as feedback and they are
multicasted to all so that others that missed the message
will refrain from sending a negative ack
 a receiver R that did not receive message M schedules a
feedback message (negative ack) with some random delay T;
if a retransmission request for M from another receiver
reaches R first, R suppresses its own feedback; otherwise it
transmits it
 ideally, the sender receives only a single negative ack and
retransmits M to all
25
several receivers have scheduled a request for retransmission, but the first
retransmission request leads to the suppression of others
 feedback suppression scales well
 disadvantages
 ensuring that only one feedback will reach the sender is
difficult; there is a possibility of many receivers sending
their feedback at the same time
 receivers that successfully receive a message waste their
time processing negative acks and the duplicated message
26
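
a sketch of the feedback-suppression idea described above, using a timer
per missing message; SrmReceiver, the delay bound and the callbacks are
illustrative, and the race between a firing timer and its cancellation is
ignored for brevity

    import random
    import threading

    class SrmReceiver:
        # schedule a negative ack for a missing message after a random delay
        # and cancel it if another receiver's NACK for the same message is
        # seen first
        def __init__(self, multicast_nack, max_delay=0.5):
            self.multicast_nack = multicast_nack   # multicasts the NACK to the group
            self.max_delay = max_delay
            self.pending = {}                      # seq -> threading.Timer

        def detected_missing(self, seq):
            delay = random.uniform(0, self.max_delay)
            timer = threading.Timer(delay, self._send_nack, args=(seq,))
            self.pending[seq] = timer
            timer.start()

        def _send_nack(self, seq):
            self.pending.pop(seq, None)
            self.multicast_nack(seq)               # every receiver sees this request

        def on_nack_from_other(self, seq):
            timer = self.pending.pop(seq, None)    # suppress our own feedback
            if timer is not None:
                timer.cancel()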
2. Hierarchical Feedback Control
 hierarchical solutions are better to achieve scalability for
very large groups of receivers
 the group of receivers is partitioned into a number of
subgroups, subsequently organized into a tree; the
subgroup containing the sender forms the root of the tree
 within each subgroup, use any reliable multicasting scheme
that works for small groups
 each subgroup appoints a local coordinator, responsible for
handling retransmission requests in its subgroup; hence it has
its own history buffer
 dynamically constructing the tree is a problem
 building reliable multicast schemes that can scale to a large
number of receivers spread across a WAN is a difficult
problem; more research is required

27
the essence of hierarchical reliable multicasting; each local coordinator forwards the message to its children
and later handles retransmission requests
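
a sketch of what a local coordinator might look like, keeping its own
history buffer so that retransmission requests stay inside the subgroup;
LocalCoordinator and the send callback are illustrative names, and
escalation of unknown requests towards the root is left out

    class LocalCoordinator:
        # forward each incoming message to the members/child coordinators of
        # this subgroup and serve retransmission requests from a local buffer
        def __init__(self, send):
            self.send = send        # send(destination, seq, msg): network stand-in
            self.children = []      # subgroup members and child coordinators
            self.history = {}       # seq -> msg

        def on_message(self, seq, msg):
            self.history[seq] = msg
            for child in self.children:
                self.send(child, seq, msg)

        def on_retransmit_request(self, child, seq):
            if seq in self.history:
                self.send(child, seq, self.history[seq])
            # otherwise the request would be escalated towards the root (not shown)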

28
 Atomic Multicast
 how to achieve reliable multicasting in the presence of
process failures
 for example, in a replicated database, how to handle update
operations when a replica crashes during update operations
 the atomic multicast problem: to guarantee that a message is
delivered to either all processes or none at all and that
messages are delivered in the same order to all processes

29
 Virtual Synchrony
 consider the following model in which the distributed
system consists of a communication layer for sending and
receiving messages; a received message is locally
buffered in this layer until it is delivered to the application

the logical organization of a distributed system to distinguish between message receipt and message delivery

30
 the whole idea of atomic multicasting is that a multicast
message m is uniquely associated with a list of processes to
which it should be delivered; this delivery list corresponds to a
group view G, the view on the set of processes contained in the
group, which the sender had at the time message m was
multicast
 now suppose that, while the multicast of m is taking place, a
view change is announced by multicasting a message vc that
signals the joining or leaving of a process
 then there are two messages in transit: m and vc; we need a
guarantee that m is either delivered to all processes in G before
each of them receives vc, or m is not delivered at all
 a reliable multicast guarantees that a message multicast to
group G is delivered to each non-faulty process in G; if the
sender crashes during the multicast, the message may either
be delivered to all remaining processes or ignored by each of
them; a reliable multicast with this property is said to be
virtually synchronous
31
 e.g., 4 processes; P3 crashes after successfully multicasting a
message to P2 and P4 but not to P1; virtual synchrony guarantees
that the message is not delivered at all; later P3 can join after its
state has been brought up to date

 hence, all multicasts take place between view changes; a view
change acts as a barrier across which no multicast can pass;
similar to synchronization variables used in distributed data
stores
32
 Message ordering
 there are four different orderings
1. Unordered multicasts
 a reliable, unordered multicast is a virtual synchronous
multicast in which no guarantees are given concerning
the order in which received messages (by the
communication layer) are delivered by different
processes

Process P1    Process P2     Process P3
sends m1      receives m1    receives m2
sends m2      receives m2    receives m1
three communicating processes in the same group; the ordering of events per process is
shown along the vertical axis

33
2. FIFO-ordered multicasts
 for reliable FIFO-ordered multicasts, the communication
layer is forced to deliver incoming messages from the same
process in the same order as they have been sent
 in the example below, m1 must always be delivered
before m2; the same holds for m3 and m4; but there is no
constraint on messages originating from different processes
Process P1    Process P2     Process P3     Process P4
sends m1      receives m1    receives m3    sends m3
sends m2      receives m3    receives m1    sends m4
              receives m2    receives m2
              receives m4    receives m4
four processes in the same group with two different senders, and a possible delivery
order of messages under FIFO-ordered multicasting
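
a sketch of how a communication layer can enforce FIFO-ordered delivery
with a per-sender hold-back queue; message headers are assumed to carry
(sender, sequence number), and FifoDelivery and the deliver callback are
illustrative names

    class FifoDelivery:
        # hold back a received message until every earlier message from the
        # same sender has been delivered; no ordering across different senders
        def __init__(self, deliver):
            self.deliver = deliver    # hands a message up to the application
            self.expected = {}        # sender -> next sequence number to deliver
            self.held = {}            # sender -> {seq: msg} waiting in the queue

        def on_receive(self, sender, seq, msg):
            self.held.setdefault(sender, {})[seq] = msg
            nxt = self.expected.get(sender, 0)
            while nxt in self.held[sender]:        # deliver while no gap remains
                self.deliver(sender, self.held[sender].pop(nxt))
                nxt += 1
            self.expected[sender] = nxt

    # m2 from P1 arrives before m1 but is still delivered after it
    layer = FifoDelivery(lambda s, m: print("delivered", m, "from", s))
    layer.on_receive("P1", 1, "m2")   # held back
    layer.on_receive("P1", 0, "m1")   # delivers m1, then m2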
3. Causally-ordered multicasts
 reliable causally-ordered multicast delivers messages so that
potential causality between different messages is preserved

34
4. Totally-Ordered Multicasts
 besides the above three, a further constraint can be used;
regardless of whether message delivery is unordered, FIFO
ordered, or causally ordered, when messages are delivered,
they are delivered in the same order to all group members
 i.e., totally-ordered multicast is used in combination with
one of the three
 in the previous example of FIFO-ordered multicast, P2 and
P3 must deliver messages m1, m2, m3, and m4 in the same
order (still respecting FIFO ordering)
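
one common way (a sketch, not necessarily the scheme intended by the
slides) to obtain totally-ordered delivery is a sequencer: every multicast
is given a global number, and each member delivers strictly in
global-number order; the class names are illustrative

    class Sequencer:
        # assigns a single global order to all multicasts in the group
        def __init__(self):
            self.next_global = 0

        def order(self, msg):
            g, self.next_global = self.next_global, self.next_global + 1
            return g, msg            # (g, msg) is then multicast to all members


    class TotalOrderMember:
        # delivers messages strictly in the order chosen by the sequencer
        def __init__(self, deliver):
            self.deliver = deliver
            self.expected = 0
            self.held = {}

        def on_receive(self, global_seq, msg):
            self.held[global_seq] = msg
            while self.expected in self.held:
                self.deliver(self.held.pop(self.expected))
                self.expected += 1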

35
 virtually synchronous reliable multicasting offering totally-ordered delivery
of messages is called atomic multicasting
 hence, there are a total of six forms of reliable multicasting (each of the
three basic message orderings, with or without totally-ordered delivery)

Multicast                     Basic Message Ordering     Total-ordered Delivery?
Reliable unordered multicast  None                       No
FIFO multicast                FIFO-ordered delivery      No
Causal multicast              Causal-ordered delivery    No
Atomic multicast              None                       Yes
FIFO atomic multicast         FIFO-ordered delivery      Yes
Causal atomic multicast       Causal-ordered delivery    Yes

six different versions of virtually synchronous reliable multicasting

36
Thank you!
?
