Chap 5
• Caveat/Caution:
• Gain in performance
• Cost of increased bandwidth for maintaining replication
More on Replication
• A “consistency model” is a CONTRACT between a DS data-store and its processes.
• Wi(x)a – a write by process ‘i’ to item ‘x’ with a value of ‘a’. That
is, ‘x’ is set to ‘a’.
(Note: The process is often shown as ‘Pi’).
• Ri(x)b – a read by process ‘i’ from item ‘x’ producing the value
‘b’. That is, reading ‘x’ returns ‘b’.
• Time moves from left to right in all diagrams.
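The notation above can be read mechanically. A minimal sketch, assuming operations are written as plain strings such as `W1(x)a`; the names `Op`, `parse_op` and `describe` are hypothetical helpers introduced only for illustration:

```python
import re
from typing import NamedTuple

class Op(NamedTuple):
    kind: str     # "W" for a write, "R" for a read
    process: int  # the index i of process Pi
    item: str     # the data item, e.g. "x"
    value: str    # the value written, or the value the read returned

_PATTERN = re.compile(r"([WR])(\d+)\((\w+)\)(\w+)")

def parse_op(text: str) -> Op:
    """Parse a string such as 'W1(x)a' or 'R2(x)b' into an Op."""
    match = _PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid operation: {text!r}")
    kind, process, item, value = match.groups()
    return Op(kind, int(process), item, value)

def describe(op: Op) -> str:
    """Spell the operation out in words."""
    if op.kind == "W":
        return f"P{op.process} sets {op.item} to {op.value}"
    return f"P{op.process} reads {op.item} and gets {op.value}"
```

For example, `describe(parse_op("W1(x)a"))` spells out that P1 sets x to a.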
Consistency Models
• With Strict Consistency, all writes are instantaneously visible to all processes and absolute global
time order is maintained throughout the DS. This is the consistency model “Holy Grail” – not at
all easy in the real world, and all but impossible within a DS.
• So other, weaker (or “less strict”) models have been developed; two of them are the Sequential and Causal Consistency Models.
• A weaker consistency model represents a relaxation of the rules, which also makes it possible to implement.
Sequential Consistency (1)
Four valid execution sequences for these processes. The vertical axis is time.
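The processes behind this figure are not reproduced in the text, so the sketch below assumes three hypothetical processes: each one writes 1 to its own variable and then prints the other two. It enumerates every interleaving that preserves each process's program order, which is what sequential consistency permits, and records the output each interleaving produces.

```python
from itertools import permutations

# Assumed programs (not taken from the slide): write own variable, print the other two.
PROGRAMS = {
    "P1": [("write", "x"), ("print", ("y", "z"))],
    "P2": [("write", "y"), ("print", ("x", "z"))],
    "P3": [("write", "z"), ("print", ("x", "y"))],
}

def interleavings(programs):
    """Yield every global schedule that keeps each process's own statement order."""
    slots = [name for name, ops in programs.items() for _ in ops]
    for order in sorted(set(permutations(slots))):
        position = {name: 0 for name in programs}
        schedule = []
        for name in order:
            schedule.append((name, programs[name][position[name]]))
            position[name] += 1
        yield schedule

def run(schedule):
    """Execute a schedule on one shared store; return P1's, P2's and P3's output joined."""
    store = {"x": 0, "y": 0, "z": 0}
    output = {}
    for name, (kind, arg) in schedule:
        if kind == "write":
            store[arg] = 1
        else:
            output[name] = "".join(str(store[var]) for var in arg)
    return output["P1"] + output["P2"] + output["P3"]

schedules = list(interleavings(PROGRAMS))
signatures = {run(schedule) for schedule in schedules}
```

With six statements split two per process there are 6!/(2!·2!·2!) = 90 such schedules. The output `000000` never appears in `signatures`, because every print follows at least one write by the same process.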
Causal Consistency
Basic Idea:
• You don’t care that reads and writes of a series of operations are immediately
known to other processes. You just want the effect of the series itself to be
known.
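One way to picture this basic idea is a store that hides the individual writes of a series and publishes only their combined effect. This is a minimal single-process sketch under that assumption; `GroupedStore` and its `acquire`/`release` methods are hypothetical names, not an API from the slides.

```python
class GroupedStore:
    """Writes made between acquire() and release() become visible only at release()."""

    def __init__(self):
        self.shared = {}      # what other processes are able to read
        self._pending = None  # writes of the series currently in progress

    def acquire(self):
        self._pending = {}

    def write(self, item, value):
        if self._pending is None:
            raise RuntimeError("write outside a grouped series")
        self._pending[item] = value

    def release(self):
        # Publish the effect of the whole series in one step.
        self.shared.update(self._pending)
        self._pending = None

    def read(self, item):
        return self.shared.get(item)
```

A reader calling `read("x")` before `release()` still sees the old value, even if the series has already written `x`.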
Grouping Operations
Server-Initiated Replicas
• Introduction
• Process resilience
• Reliable client-server and group communication
• Recovery
Basic Concepts (1/3)
• What is Failure?
– System is said to be in failure state when it cannot meet its promise.
• Why do failures occur?
– Failures occur because of an error state of the system.
• What is the reason for an error?
– The cause of an error is called a fault.
• Is there such a thing as ‘Partial Failure’?
• Faults can be Prevented, Removed and Forecasted.
• Can Faults be Tolerated by a system also?
Basic Concepts (2/3)
• What characteristics make a system Fault Tolerant?
– Availability: System is ready to be used immediately.
– Reliability: System can run continuously without failure.
– Safety: Nothing catastrophic happens if a system temporarily fails.
– Maintainability: How easily a failed system can be repaired.
Timing Failure: A server's response lies outside the specified time interval
• Flat Group
• Advantage: Symmetrical and has no single point of failure
• Disadvantage: Decision making is more complicated; it requires voting
• Hierarchical Group
• Advantage: The coordinator makes decisions without bothering others
• Disadvantage: If the coordinator is lost, the entire group halts
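The voting that a flat group needs can be sketched as a strict-majority count. The function name `flat_group_decide` and the dictionary of member votes are assumptions made for illustration; real groups also have to cope with missing or late votes.

```python
from collections import Counter

def flat_group_decide(votes):
    """Return the value backed by a strict majority of members, or None if there is none.

    `votes` maps a member name to the value that member voted for.
    """
    if not votes:
        return None
    value, count = Counter(votes.values()).most_common(1)[0]
    return value if count > len(votes) / 2 else None
```

In a hierarchical group the coordinator would simply return its own choice, which is cheaper but fails with the coordinator.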
Process Groups (2/2)
Group Membership
• Group Server (Client Server Model)
– Straightforward, simple and easy to implement
– Major disadvantage: single point of failure
• Distributed Approach (P2P Model)
– Broadcast message to join and leave the group
– In case of a fault, how to distinguish a really dead member from a very slow one
– Joining and leaving must be synchronized; on joining, send all previous messages to the new member
– Another issue is how to create a new group?
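The dead-versus-slow problem shows up clearly in a timeout-based detector. This is a minimal sketch, assuming members announce themselves with periodic heartbeats; `HeartbeatDetector` is a hypothetical class, and times are passed in explicitly so the behaviour is deterministic.

```python
class HeartbeatDetector:
    """Suspect any member whose last heartbeat is older than `timeout`."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def heartbeat(self, member, now):
        # A heartbeat doubles as a join message for an unknown member.
        self.last_seen[member] = now

    def leave(self, member):
        self.last_seen.pop(member, None)

    def suspected(self, now):
        # A suspected member may be dead or merely slow; the detector cannot tell.
        return sorted(m for m, t in self.last_seen.items() if now - t > self.timeout)
```

A short timeout wrongly suspects slow members, while a long one delays the detection of real crashes.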
Failure Masking & Replication
Figure: Red Troop (5000) facing two troops of 3000 each, both labelled “Attack”. Messages exchanged between the two troops:
1. Let us attack at 6 AM.
2. Ok, that is good.
3. I got your message.
4. I knew that you got my message.
The same as in previous slide, except now with 2 loyal generals and one traitor.
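Why two loyal generals and one traitor are not enough can be seen in a small simulation. This is a sketch of one vector-exchange agreement scheme, not necessarily the exact one on the slide: every general announces a value, collects a vector, forwards the vector, and keeps an element only if a strict majority of the received vectors agree on it. The traitor's behaviour (a different lie to every receiver) and the name `exchange` are assumptions.

```python
from collections import Counter

def exchange(n, traitors):
    """Run one vector-exchange round; return each loyal general's result vector."""
    generals = range(n)

    def told(sender, receiver):
        # Loyal generals report their true value; a traitor lies differently to each receiver.
        return f"lie-{sender}-{receiver}" if sender in traitors else f"v{sender}"

    # Steps 1 and 2: everyone announces a value and each general collects a vector.
    vectors = {i: [told(j, i) for j in generals] for i in generals}

    results = {}
    for i in generals:
        if i in traitors:
            continue
        # Step 3: receive one vector from every other general (traitors send garbage).
        received = [
            [f"lie-{j}-{i}-{k}" for k in generals] if j in traitors else vectors[j]
            for j in generals if j != i
        ]
        # Step 4: keep an element only if a strict majority of the vectors agree on it.
        result = []
        for k in generals:
            value, count = Counter(vec[k] for vec in received).most_common(1)[0]
            result.append(value if count > len(received) / 2 else "UNKNOWN")
        results[i] = result
    return results
```

With `exchange(3, {2})` each loyal general receives only two vectors that disagree everywhere, so every element ends up `UNKNOWN`. With `exchange(4, {2})` the three loyal generals all recover the same values for each other and mark only the traitor's element `UNKNOWN`.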
Server Crashes (3)
Server Crashes (4)
• Figure. Different combinations of client and server strategies in the presence of server crashes.
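The figure itself is not reproduced here, so the sketch below rebuilds such a table under stated assumptions: the server handles a print request by sending a completion message (M) and printing (P) in one of two orders, a crash can strike at any point, and the client uses one of four reissue strategies. The event letters, strategy names and the function `outcome` are all illustrative choices.

```python
SERVER_ORDERS = ("MP", "PM")  # M = send completion message, P = print the text
CLIENT_STRATEGIES = ("always", "never", "when_acked", "when_not_acked")

def crash_points(order):
    """Events completed before the crash: both, only the first, or none."""
    return [order, order[:1], ""]

def outcome(done_before_crash, strategy):
    """Return 'OK' (printed once), 'DUP' (printed twice) or 'ZERO' (never printed)."""
    printed = "P" in done_before_crash
    acked = "M" in done_before_crash
    reissue = {
        "always": True,
        "never": False,
        "when_acked": acked,
        "when_not_acked": not acked,
    }[strategy]
    # A reissued request is assumed to be printed by the recovered server.
    prints = int(printed) + int(reissue)
    return {0: "ZERO", 1: "OK", 2: "DUP"}[prints]

table = {
    (order, strategy): [outcome(done, strategy) for done in crash_points(order)]
    for order in SERVER_ORDERS
    for strategy in CLIENT_STRATEGIES
}
```

Under these assumptions every row of `table` contains at least one `DUP` or `ZERO`, so no pairing of client and server strategy prints exactly once for every crash point.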
Basic Reliable-Multicasting Schemes
Figure: A simple solution to reliable multicasting when all receivers are known and are assumed not to fail. (a) Message transmission. (b) Reporting feedback.
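The scheme in this figure can be sketched with sequence numbers and a history buffer at the sender. The class names, the `("ack", n)` and `("missed", n)` feedback tuples, and the in-memory "network" are assumptions made for illustration.

```python
class Sender:
    def __init__(self):
        self.history = {}  # kept so that lost messages can be retransmitted
        self.next_seq = 0

    def multicast(self, payload):
        seq = self.next_seq
        self.history[seq] = payload
        self.next_seq += 1
        return seq, payload

    def retransmit(self, seq):
        return seq, self.history[seq]

class Receiver:
    def __init__(self):
        self.expected = 0  # sequence number of the next message to deliver
        self.delivered = []

    def receive(self, seq, payload):
        """Deliver in order; report feedback as ('ack', seq) or ('missed', seq)."""
        if seq < self.expected:
            return ("ack", seq)  # duplicate of something already delivered
        if seq > self.expected:
            return ("missed", self.expected)  # a gap: ask for the missing message
        self.delivered.append(payload)
        self.expected += 1
        return ("ack", seq)
```

A receiver that sees message 2 while still expecting message 1 reports `("missed", 1)`, and the sender answers with `retransmit(1)` from its history buffer.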
Nonhierarchical Feedback Control
Figure: Several receivers have scheduled a request for retransmission, but the first retransmission request leads to the suppression of others.
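The suppression effect in this figure can be sketched as a race between random timers. The function `feedback_round`, the uniform delay and the `max_delay` parameter are assumptions; the sketch also assumes the first request reaches every other receiver before its timer fires.

```python
import random

def feedback_round(receivers_missing, rng, max_delay=1.0):
    """Each receiver that missed a message schedules a request after a random delay.

    The earliest request is multicast to the group; the others see it and cancel
    their own. Returns (requesting receiver, sorted list of suppressed receivers).
    """
    timers = {r: rng.uniform(0, max_delay) for r in receivers_missing}
    first = min(timers, key=timers.get)
    suppressed = sorted(r for r in timers if r != first)
    return first, suppressed

first, suppressed = feedback_round(["r1", "r2", "r3"], random.Random(0))
```

Whatever the random delays turn out to be, only one of the three requests reaches the sender in this idealised round.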
Recovery
Introduction
❑ Goal: replace an erroneous/wrong state with an error-free state
❑ Forward recovery: an attempt is made to bring the system into a correct new state from
which it can continue to execute
Stable Storage
a) Stable storage
b) Crash after drive 1 is updated
c) Bad spot
Stable storage is well suited to applications that require a high degree of fault tolerance.
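The three cases (a) to (c) can be sketched with two in-memory "drives" that store each block together with a checksum. The class `StableStorage`, the use of CRC32 and the `crash_after_drive1` flag are assumptions made for illustration, not details given on the slide.

```python
import zlib

class StableStorage:
    def __init__(self, blocks):
        # Each entry is (data, checksum), or None if the block was never written.
        self.drives = [[None] * blocks, [None] * blocks]

    @staticmethod
    def _valid(entry):
        return entry is not None and zlib.crc32(entry[0]) == entry[1]

    def write(self, block, data, crash_after_drive1=False):
        entry = (data, zlib.crc32(data))
        self.drives[0][block] = entry  # always update drive 1 first
        if crash_after_drive1:
            return  # simulated crash before drive 2 is updated
        self.drives[1][block] = entry

    def recover(self):
        for block in range(len(self.drives[0])):
            first, second = self.drives[0][block], self.drives[1][block]
            if self._valid(first):
                if first != second:
                    self.drives[1][block] = first  # case (b), or a bad spot on drive 2
            elif self._valid(second):
                self.drives[0][block] = second     # case (c): bad spot on drive 1

    def read(self, block):
        for drive in self.drives:
            if self._valid(drive[block]):
                return drive[block][0]
        return None
```

Because drive 1 is always written first, a valid block on drive 1 that differs from drive 2 is taken to be the newer one during recovery.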
Checkpointing
❑ Fault tolerance is defined as the characteristic by which a system can mask the
occurrence of, and recovery from, failures
❑ There exist several types of failures: Crash failure, Omission failure, Timing failure,
Response failure, Arbitrary/Byzantine failure
❑ Group membership changes require agreement on the same list of members, using a commit
protocol