
Chapter 5

Consistency, Replication and Fault Tolerance


Contents

5.1 Introduction to replication and consistency, Data-Centric and Client-Centric Consistency Models, Replica Management
5.2 Fault Tolerance: Introduction, Process resilience, Reliable client-server and group communication, Recovery
Reasons for Replication

• Data are replicated to increase the reliability of a system.


• Replication for performance:
• Scaling in numbers
• Scaling in geographical area

• Caveat/Caution:
• The gain in performance must be weighed against the cost of the increased bandwidth needed to maintain the replicas.
More on Replication

• Replicas allow remote sites to continue working in the event of local failures.
• It is also possible to protect against data corruption.
• Replicas allow data to reside close to where it is used.
• Even a large number of replicated “local” systems can improve
performance: think of clusters.
• This directly supports the distributed systems goal of enhanced
scalability.
Replication and Scalability
• Replication is a widely-used scalability technique: think of Web clients and Web
proxies.
• When systems scale, the first problems to surface are those associated with
performance – as systems get bigger (e.g., more users), they often get slower.
• Replicating the data and moving it closer to where it is needed helps to solve this
scalability problem.
• A problem remains: how to efficiently synchronize all of the replicas created to
solve the scalability issue?
• Dilemma: adding replicas improves scalability, but incurs the (oftentimes
considerable) overhead of keeping the replicas up-to-date!!!
• As we shall see, the solution often results in a relaxation of any consistency
constraints.
Replication and Consistency

• But if there are many replicas of the same thing, how do we


keep all of them up-to-date? How do we keep the replicas
consistent?
• Consistency can be achieved in a number of ways; however, it is not easy to keep
all the replicas consistent.
• We will study a number of consistency models
Replication and Consistency Models

• Data-Centric Consistency Models


• Client-Centric Consistency Models
Data-centric Consistency Models

• A “consistency model” is a CONTRACT between a DS data-store and its processes.
• If the processes agree to the rules, the data-store will perform properly and as advertised.

Figure: The general organization of a logical data store, physically distributed and replicated across multiple processes.
Consistency Model Diagram Notation

• Wi(x)a – a write by process ‘i’ to item ‘x’ with a value of ‘a’. That
is, ‘x’ is set to ‘a’.
(Note: The process is often shown as ‘Pi’).
• Ri(x)b – a read by process ‘i’ from item ‘x’ producing the value
‘b’. That is, reading ‘x’ returns ‘b’.
• Time moves from left to right in all diagrams.
Consistency Models

• Behavior of two processes operating on the same data item:


• A strictly consistent data-store.
• A data-store that is not strictly consistent.

• With Strict Consistency, all writes are instantaneously visible to all processes and an absolute global
time order is maintained throughout the DS. This is the consistency model “Holy Grail” – not at
all easy in the real world, and all but impossible within a DS.
• So, other, weaker (or “less strict”) models have been developed; two important ones are the
Sequential and Causal Consistency models.
• A weaker consistency model represents a relaxation of the rules, and it is also practical to
implement.
Sequential Consistency (1)

• Definition of “Sequential Consistency”:


• The result of any execution is the same as if the (read and write)
operations by all processes on the data-store were executed in the same
sequential order and the operations of each individual process appear in
this sequence in the order specified by its program.
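To make the definition concrete, here is a small sketch of my own (not from the slides): it brute-forces every interleaving of the per-process histories and checks whether at least one of them explains all the values read, using the W(x)a / R(x)b notation from the earlier slide.

```python
def interleavings(histories):
    """Yield every merge of the per-process histories that preserves each program order."""
    if all(not h for h in histories):
        yield []
        return
    for i, h in enumerate(histories):
        if h:
            rest = histories[:i] + [h[1:]] + histories[i + 1:]
            for tail in interleavings(rest):
                yield [h[0]] + tail

def reads_are_explained(seq):
    """A read R(x)v is valid only if v is the most recently written value of x."""
    store = {}
    for kind, item, value in seq:
        if kind == 'W':
            store[item] = value
        elif store.get(item) != value:
            return False
    return True

def is_sequentially_consistent(histories):
    return any(reads_are_explained(seq) for seq in interleavings(histories))

# P1: W(x)a   P2: W(x)b   P3: R(x)b R(x)a   P4: R(x)b R(x)a
p1 = [('W', 'x', 'a')]
p2 = [('W', 'x', 'b')]
p3 = [('R', 'x', 'b'), ('R', 'x', 'a')]
p4 = [('R', 'x', 'b'), ('R', 'x', 'a')]
print(is_sequentially_consistent([p1, p2, p3, p4]))        # True: both readers see one order
p4_bad = [('R', 'x', 'a'), ('R', 'x', 'b')]
print(is_sequentially_consistent([p1, p2, p3, p4_bad]))    # False: the readers disagree
```

The brute force is exponential; it only mirrors the definition and is not meant as an efficient checker.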
Sequential Consistency (2)

(a) A sequentially consistent data store.


(b) A data store that is not sequentially consistent.
Sequential Consistency (3)

Three concurrently-executing processes


Sequential Consistency (4)

Four valid execution sequences for these processes. The vertical axis is time.
Causal Consistency

• This model distinguishes between events that are “causally


related” and those that are not.
• If event B is caused or influenced by an earlier event A, then
causal consistency requires that every other process see event A,
then event B.
• Operations that are not causally related are said to be concurrent.
Causal Consistency (1)

• For a data store to be considered causally consistent, it is necessary


that the store obeys the following conditions:
• Writes that are potentially causally related
• must be seen by all processes in the same order.
• Concurrent writes
• may be seen in a different order on different machines.
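One common way to enforce these two rules (a sketch of my own, assuming a fixed set of replicas; it is not code given in the slides) is to tag each write with the writer's vector clock and to delay applying a remote write until every write that causally precedes it has been applied:

```python
class CausalReplica:
    """Minimal vector-clock sketch: apply a remote write only after every
    write that causally precedes it has been applied locally."""
    def __init__(self, replica_id, n_replicas):
        self.id = replica_id
        self.vc = [0] * n_replicas          # writes applied so far, per origin replica
        self.store = {}
        self.pending = []

    def local_write(self, item, value):
        self.vc[self.id] += 1
        # The message carries the writer's vector clock as its causal dependency set.
        return {'origin': self.id, 'vc': list(self.vc), 'item': item, 'value': value}

    def deliverable(self, msg):
        origin, vc = msg['origin'], msg['vc']
        # Next expected write from the origin, and no missing causal predecessors elsewhere.
        return (vc[origin] == self.vc[origin] + 1 and
                all(vc[k] <= self.vc[k] for k in range(len(vc)) if k != origin))

    def receive(self, msg):
        self.pending.append(msg)
        progress = True
        while progress:                     # apply every pending write that became deliverable
            progress = False
            for m in list(self.pending):
                if self.deliverable(m):
                    self.store[m['item']] = m['value']
                    self.vc[m['origin']] = m['vc'][m['origin']]
                    self.pending.remove(m)
                    progress = True

a, b, c = (CausalReplica(i, 3) for i in range(3))
m1 = a.local_write('x', 'a')          # W1(x)a
b.receive(m1)
m2 = b.local_write('x', 'b')          # W2(x)b causally depends on W1(x)a
c.receive(m2)                         # held back: its causal predecessor is missing
c.receive(m1)                         # now both writes are applied, in causal order
print(c.store['x'])                   # 'b'
```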
Causal Consistency

This sequence is allowed with a


causally-consistent store, but not with a sequentially consistent store
Causal Consistency

(a) A violation of a causally-consistent store


Causal Consistency (3)

(b) A correct sequence of events in a causally-consistent store


Grouping Operations

• Accesses to locks are sequentially consistent.


• No access to a lock is allowed to be performed until all previous writes have
completed everywhere.
• No data access is allowed to be performed until all previous accesses to locks
have been performed.

Basic Idea:
• You don’t care that reads and writes of a series of operations are immediately
known to other processes. You just want the effect of the series itself to be
known.
Grouping Operations

• At an acquire, all remote changes to guarded data must be brought up to


date.
• Before a write to a data item, a process must ensure that no other process
is trying to write at the same time.

• Locks are associated with individual data items, as opposed to the entire
data-store.
• Note: P2’s read on ‘y’ returns NIL as no locks have been requested.
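As a rough illustration of these rules (my own sketch, not the slides' example), imagine one lock per data item: an acquire pulls the guarded item up to date from the lock's last holder, and a release publishes the holder's updates before anyone else may acquire.

```python
class Lock:
    """Toy entry-consistency lock: one lock guards one data item.
    acquire() brings the guarded item up to date; release() publishes updates."""
    def __init__(self, item):
        self.item = item
        self.master_copy = None     # authoritative value, refreshed on every release
        self.held_by = None

    def acquire(self, process):
        assert self.held_by is None, "another process holds the lock"
        self.held_by = process
        # Bring all remote changes to the guarded data up to date at the acquirer.
        process.cache[self.item] = self.master_copy

    def release(self, process):
        assert self.held_by is process
        # Publish the holder's updates before anyone else may acquire.
        self.master_copy = process.cache.get(self.item)
        self.held_by = None

class Process:
    def __init__(self, name):
        self.name = name
        self.cache = {}

lock_x = Lock('x')
p1, p2 = Process('P1'), Process('P2')
lock_x.acquire(p1); p1.cache['x'] = 'a'; lock_x.release(p1)   # P1: Acq(x) W(x)a Rel(x)
lock_x.acquire(p2); print(p2.cache['x']); lock_x.release(p2)  # P2: Acq(x) R(x) -> 'a'
print(p2.cache.get('y'))   # None – reading 'y' without acquiring its lock may return NIL, as in the slide
```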
Client-Centric Consistency Models

• The previously studied consistency models concern themselves with


maintaining a consistent (globally accessible) data-store in the
presence of concurrent read/write operations.

• Here, the emphasis is more on maintaining a consistent view of


things for the individual client process that is currently operating on
the data-store.
More Client-Centric Consistency

• How fast should updates (writes) be made available to read-only


processes?
• Think of most database systems: mainly read.
• Think of the DNS: write-write conflicts do not occur, only read-write conflicts.
• Think of WWW: as with DNS, except that heavy use of client-side caching is
present: even the return of stale pages is acceptable to most users.

• These systems all exhibit a high degree of acceptable


inconsistency … with the replicas gradually becoming consistent
over time.
Toward Eventual Consistency

• The only requirement is that all replicas will eventually be the


same.
• All updates must be guaranteed to propagate to all replicas …
eventually!
• This works well if every client always updates the same replica.
• Things are a little difficult if the clients are mobile.
Eventual Consistency

The principle of a mobile user accessing different replicas of a distributed database


Monotonic Reads (1)

A data store is said to provide monotonic-read


consistency if the following condition holds:
• If a process reads the value of a data item x, any
successive read operation on x by that process
will always return that same value or a more
recent value.
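A minimal sketch of my own (not from the slides) of how a client library could enforce this guarantee by remembering, per item, the highest version it has already read:

```python
class MonotonicReadClient:
    """Sketch: the client remembers the highest version it has read per item,
    so a later read from a stale replica is rejected instead of going backwards."""
    def __init__(self):
        self.last_seen = {}                  # item -> highest version already read

    def read(self, replica, item):
        version, value = replica.get(item)   # a replica returns (version, value)
        if version < self.last_seen.get(item, 0):
            raise RuntimeError("stale replica: would violate monotonic reads")
        self.last_seen[item] = version
        return value

class Replica:
    def __init__(self, data):
        self.data = data                     # item -> (version, value)
    def get(self, item):
        return self.data[item]

client = MonotonicReadClient()
up_to_date = Replica({'x': (2, 'new')})
lagging    = Replica({'x': (1, 'old')})
print(client.read(up_to_date, 'x'))          # 'new' (version 2 is recorded)
try:
    client.read(lagging, 'x')                # version 1 < 2 already seen
except RuntimeError as e:
    print("rejected:", e)                    # the client would retry elsewhere instead of reading stale data
```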
Monotonic Reads (2)

The read operations performed by a single process P at two different local


copies of the same data store.
(a) A monotonic-read consistent data store.
Monotonic Reads (3)

The read operations performed by a single process P at two different local


copies of the same data store.
(b) A data store that does not provide monotonic reads.
Monotonic Writes (1)

In a monotonic-write consistent store, the following condition


holds:
■ A write operation by a process on a data item x is
completed before any successive write operation on x by
the same process.
Monotonic Writes (2)

The write operations performed by a single process P at two different local


copies of the same data store.
(a) A monotonic-write consistent data store.
Monotonic Writes (3)

The write operations performed by a single process P at two different local


copies of the same data store.
(b) A data store that does not provide monotonic-write consistency.
Read Your Writes (1)

□ A data store is said to provide read-your-writes


consistency, if the following condition holds:
■ The effect of a write operation by a process on data
item x will always be seen by a successive read
operation on x by the same process.
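Again as a hedged sketch of my own (not the slides' mechanism): the client remembers the version produced by each of its own writes and refuses to read from a replica that has not yet applied that write.

```python
class ReadYourWritesClient:
    """Sketch: the client remembers the version of each of its own writes and
    only reads from a replica that has already applied that write."""
    def __init__(self):
        self.my_writes = {}                  # item -> version this client wrote

    def write(self, replica, item, value):
        self.my_writes[item] = replica.apply_write(item, value)

    def read(self, replica, item):
        version, value = replica.get(item)
        if version < self.my_writes.get(item, 0):
            raise RuntimeError("replica has not yet seen my own write")
        return value

class Replica:
    def __init__(self):
        self.data = {}                       # item -> (version, value)
    def apply_write(self, item, value):
        version = self.data.get(item, (0, None))[0] + 1
        self.data[item] = (version, value)
        return version
    def get(self, item):
        return self.data.get(item, (0, None))

home, away = Replica(), Replica()
c = ReadYourWritesClient()
c.write(home, 'x', 'draft-2')
print(c.read(home, 'x'))                     # 'draft-2'
try:
    c.read(away, 'x')                        # the other replica lags behind this client's write
except RuntimeError as e:
    print("rejected:", e)
```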
Read Your Writes (2)

(a) A data store that provides read-your-writes consistency


Read Your Writes (3)

(b) A data store that does not provide read-your-writes consistency.
Writes Follow Reads (1)

• A data store is said to provide writes-follow-reads


consistency, if the following holds:
A write operation by a process on a data item x following a
previous read operation on x by the same process is guaranteed
to take place on the same or a more recent value of x that was
read.
Writes Follow Reads (2)

(a) A writes-follow-reads consistent data store


Writes Follow Reads (3)

(b) A data store that does not provide writes-follow-reads consistency


Replica Management

Content Replication and Placement


Regardless of which consistency model is chosen, we need to decide
where, when and by whom copies of the data-store are to be placed.

Choosing a proper cell size for server placement


Replica Management

There are three Replica Placement Types


1. Permanent replicas: tend to be small in number, organized as COWs
(Clusters of Workstations) or mirrored systems.
2. Server-initiated replicas: used to enhance performance, created at the
initiative of the owner of the data-store. Typically used by web
hosting companies to locate replicas geographically close
to where they are needed most (often referred to
as “push caches”).
3. Client-initiated replicas: created as a result of client requests – think
of browser caches. Works well assuming, of course, that the cached
data does not go stale too soon.
Replica Management

Server-Initiated Replicas

Counting access requests from different clients


Replica Management
Update Propagation
• When a client initiates an update to a distributed data-store, what gets propagated?
• There are three possibilities:
• Propagate a notification of the update to the other replicas – this is an
“invalidation protocol”, which indicates that the replica’s data is no longer
up-to-date. Works well when there are many writes relative to reads (a small sketch follows below).
• Transfer the data from one replica to another – works well when there are
many reads relative to writes.
• Propagate the update operation to the other replicas – this is “active replication”, and
shifts the workload to each of the replicas upon an “initial write”.
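A minimal sketch of an invalidation protocol (my own illustration; names such as InvalidatingReplica are made up, not from any real library): a write sends only small "stale" notifications to the peers, and the data itself moves only when a stale copy is actually read.

```python
class InvalidatingReplica:
    """Sketch of an invalidation protocol: a write at one replica only tells the
    others that their copy is stale; the data moves only when it is read."""
    def __init__(self):
        self.data = {}
        self.stale_from = {}                 # item -> replica that holds the fresh copy
        self.peers = []

    def write(self, item, value):
        self.data[item] = value
        self.stale_from.pop(item, None)
        for peer in self.peers:              # send only a notification, not the data
            peer.invalidate(item, origin=self)

    def invalidate(self, item, origin):
        self.stale_from[item] = origin

    def read(self, item):
        if item in self.stale_from:          # pull fresh data only when actually read
            self.data[item] = self.stale_from.pop(item).data[item]
        return self.data.get(item)

a, b = InvalidatingReplica(), InvalidatingReplica()
a.peers, b.peers = [b], [a]
a.write('x', 1)        # b only learns that 'x' is stale
a.write('x', 2)        # again just a notification – cheap when writes outnumber reads
print(b.read('x'))     # 2: the data is pulled from a on the first read
```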
Replica Management
Push vs. Pull Protocols
Another design issue is whether updates are pushed or pulled:
1. Push-based/Server-based Approach: sent “automatically” by server, the
client does not request the update. This approach is useful when a high
degree of consistency is needed. Often used between permanent and
server-initiated replicas.
2. Pull-based/Client-based Approach: used by client caches (e.g.,
browsers), updates are requested by the client from the server. No
request, no update!
5.2 Fault Tolerance

• Introduction,
• Process resilience,
• Reliable client-server and group communication, Recovery
Basic Concepts (1/3)
• What is a failure?
– A system is said to be in a failure state when it cannot meet its promise.
• Why do failures occur?
– Failures occur because the system is in an erroneous state.
• What is the reason for an error?
– The cause of an error is called a fault.
• Is there such a thing as a ‘partial failure’?
• Faults can be prevented, removed and forecast.
• Can faults also be tolerated by a system?
Basic Concepts (2/3)
• What characteristics make a system fault tolerant?
– Availability: the system is ready to be used immediately.
– Reliability: the system can run continuously without failure.
– Safety: nothing catastrophic happens if the system temporarily fails.
– Maintainability: how easily a failed system can be repaired.

• What are the availability and reliability of the following systems? (See the worked numbers below.)
– A system that goes down for one millisecond every hour.
– A system that never crashes but is shut down for one week every March.
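A quick worked calculation (my own numbers, based on the downtimes stated above) makes the contrast concrete:

```python
# System 1: goes down for 1 ms every hour – fails often, but is almost always up.
availability_1 = 1 - 0.001 / 3600          # ≈ 0.99999972, better than "six nines"

# System 2: never crashes, but is shut down for one week every March.
availability_2 = 1 - 7 / 365               # ≈ 0.981, roughly 98% availability

print(f"{availability_1:.8f} {availability_2:.3f}")
# System 1 is highly available but not very reliable (it fails every hour);
# System 2 is highly reliable (it never fails unexpectedly) but less available.
```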
Basic Concepts (3/3)
• Classification of faults
– Transient: occurs once and then disappears (for example, a network fault might cause a request
sent from one node to another to time out or fail).
– Intermittent: occurs, vanishes of its own accord, then reappears, and so on (e.g., in medical
life-support equipment).
– Permanent: occurs and does not vanish until fixed manually.
• Can you classify the faults caused by the following situations?
– A flying bird obstructing the transmitted wave signals
– A loosely connected power plug
– A burnt-out chip
– Software bugs
• Which fault do you think is more difficult to detect, and why?
Faults in Distributed Systems
• If a fault occurs in a distributed system, the error may be in any of
– the collection of servers, or
– the communication channel, or
– even both.
• However, an out-of-order server itself may not always be the fault we are looking for.
Why?
• Dependency relations appear in abundance in DS.
• Hence, we need to classify failures to know how serious a failure actually is.
Failure Models

• Crash failure: A server halts, but was working correctly until it halted.
• Omission failure: A server fails to respond to incoming requests.
   - Receive omission: A server fails to receive incoming messages.
   - Send omission: A server fails to send messages.
• Timing failure: A server’s response lies outside the specified time interval.
• Response failure: The server’s response is incorrect.
   - Value failure: The value of the response is wrong.
   - State-transition failure: The server deviates from the correct flow of control.
• Arbitrary (Byzantine) failure: A server may produce arbitrary responses at arbitrary times.

Failure Masking by Redundancy (1/3)
• For a system to be fault tolerant, the best it can do is try to hide the occurrence of
failures from other processes.
• The key technique for masking faults is redundancy.
– Information redundancy: Extra bits are added to allow recovery from garbled bits
– Time redundancy: An action is performed, and then, if need be, it is performed again.
– Physical redundancy: Extra equipment or processes are added
• Issue: How much redundancy is needed?
Failure Masking by Redundancy (2/3)
• Some Examples of Redundancy Schemes
– Hamming Code
– Transactions
– Replicated Processes or Components
– An aircraft has four engines but can fly with only three.
– A sports game has an extra referee.
Failure Masking by Redundancy (3/3)

Figure: Fault Tolerance in Electronic Circuits

• Triple modular redundancy:


– If two or three of the inputs are the same, the output is equal to that input.
– If all three inputs are different, the output is undefined.
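The voter at the heart of triple modular redundancy is easy to sketch (my own illustration, not the slides' circuit):

```python
from collections import Counter

def tmr_vote(a, b, c):
    """Triple modular redundancy voter: return the majority of three inputs,
    or None (undefined) when all three disagree."""
    value, count = Counter([a, b, c]).most_common(1)[0]
    return value if count >= 2 else None

print(tmr_vote(1, 1, 1))   # 1: all agree
print(tmr_vote(1, 0, 1))   # 1: one faulty input is out-voted
print(tmr_vote(1, 0, 2))   # None: all three differ, output undefined
```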
Process Resilience
• Problem:
– How is fault tolerance achieved in a distributed system, especially against process failures?
• Solution:
– Replicate processes and organize them into groups.
– Groups are analogous to social organizations.
– Consider a collection of processes as a single abstraction.
– All members of the group receive the same message; if one process fails, the others can take over
for it.
– Process groups are dynamic, and a process can be a member of several groups.
– Hence we need some management scheme for groups.
Process Groups (1/2)
Flat Group vs. Hierarchical Group

• Flat Group
• Advantage: symmetrical and has no single point of failure.
• Disadvantage: decision making is more complicated (requires voting).
• Hierarchical Group
• Advantage: the coordinator can make decisions without bothering the others.
• Disadvantage: if the coordinator is lost, the entire group halts.
Process Groups (2/2)
Group Membership
• Group Server (Client-Server Model)
– Straightforward, simple and easy to implement.
– Major disadvantage: a single point of failure.
• Distributed Approach (P2P Model)
– Broadcast a message to join or leave the group.
– In case of a fault, how to distinguish between a member that is really dead and one that is merely very slow?
– Joining and leaving must be synchronized: on joining, all previous messages are sent to the new member.
– Another issue is how to create a new group.
Failure Masking & Replication

• Replicate processes and organize them into groups.
• Replace a single vulnerable process with a whole fault-tolerant group.
• A system is said to be K-fault tolerant if it can survive faults in K components and still
meet its specifications.
• How much replication is needed to support K-fault tolerance?
– K+1 or 2K+1?
Agreement in Faulty Systems
• Why do we need agreement?
• Goal of Agreement
– Make all the non-faulty processes reach consensus on some issue
– Establish that consensus within a finite number of steps.
• Two problem cases:
– Good process, but unreliable communication
• Example: Two-army problem
– Good communication, but crashed process
• Example: Byzantine generals problem
Two-army problem

Figure: A red army of 5,000 troops faces two blue armies of 3,000 troops each, commanded by X and Y, who can communicate only over an unreliable channel.
1. X: “Let us attack at 6 AM.”
2. Y: “OK, that is good.”
3. X: “I got your message.”
4. Y: “I knew that you got my message.” … and so on.

→ It is easy to show that X and Y will never reach agreement, no matter how many
acknowledgements they send (due to unreliable communication).
Byzantine generals problem

The Byzantine generals problem for 3 loyal generals and 1 traitor.


a) The generals announce their troop strengths (in units of 1 thousand soldiers).
b) The vectors that each general assembles based on (a)
c) The vectors that each general receives in step 3.
Go forward one more step

More than two-thirds agreement

The same as the previous slide, except now with 2 loyal generals and 1 traitor.

Lamport proved that in a system with m faulty processes, agreement can be


achieved only if 2m+1 correctly functioning processes are present, for a total of
3m+1.
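A small sketch of the majority step (my own illustration with made-up numbers): with 4 generals and 1 traitor, each loyal general takes the element-wise majority over the vectors it received, and the loyal entries come out the same everywhere.

```python
from collections import Counter

def majority(values):
    """Return the strict-majority value, or None when there is no majority."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) // 2 else None

# 4 generals; general 3 is the traitor. Step 1: the loyal generals announce their
# true strengths (1, 2, 4); the traitor reports a different value to each of them.
# Step 2: each loyal general assembles the vector of what it was told.
vectors_at_loyal = [
    {0: 1, 1: 2, 2: 4, 3: 7},     # vector assembled by general 0
    {0: 1, 1: 2, 2: 4, 3: 8},     # vector assembled by general 1
    {0: 1, 1: 2, 2: 4, 3: 9},     # vector assembled by general 2
]

# Step 3: the generals exchange their vectors; the traitor forwards garbage,
# so each loyal general takes the element-wise majority of what it received.
def decision(received):
    return {g: majority([vec[g] for vec in received]) for g in range(4)}

traitor_vector = {0: 0, 1: 0, 2: 0, 3: 0}
print(decision(vectors_at_loyal + [traitor_vector]))
# {0: 1, 1: 2, 2: 4, 3: None}: every loyal general agrees on the loyal strengths,
# and the traitor's slot stays unknown. With only 2 loyal generals this majority
# step fails, matching the 3m+1 bound quoted above.
```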
Reliable client-server communication

What about reliable point-to-point transport protocols?

• TCP masks omission failures


– … by using ACKs & retransmissions
• … but it does not mask crash failures!
– E.g., when a connection is broken, the client is only notified via an exception.
RPC Semantics in the Presence of Failures

Five different classes of failures that can occur in RPC systems:


1. The client is unable to locate the server.
2. The request message from the client to the server is lost.
3. The server crashes after receiving a request.
4. The reply message from the server to the client is lost.
5. The client crashes after sending a request.
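Lost requests and lost replies (classes 2 and 4) are commonly handled by client retransmission combined with server-side duplicate filtering; the following is a hedged sketch of my own (it does not address server or client crashes):

```python
import uuid

class Server:
    """Sketch: the server caches replies by request id so a retransmitted
    request (e.g. after a lost reply) is not executed a second time."""
    def __init__(self):
        self.done = {}                       # request_id -> cached reply
        self.counter = 0

    def handle(self, request_id, op):
        if request_id in self.done:          # duplicate: return cached reply, do not re-execute
            return self.done[request_id]
        self.counter += 1                    # the (non-idempotent) operation itself
        reply = f"{op} #{self.counter}"
        self.done[request_id] = reply
        return reply

def call_with_retry(server, op, lose_first_reply=False):
    """Client side: retransmit the same request id until a reply arrives."""
    request_id = str(uuid.uuid4())
    reply = server.handle(request_id, op)
    if lose_first_reply:                     # simulate failure class 4: the reply is lost
        reply = server.handle(request_id, op)   # retransmission of the same request
    return reply

s = Server()
print(call_with_retry(s, "print", lose_first_reply=True))   # "print #1": executed exactly once
print(s.counter)                                            # 1
```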
Server Crashes (1)

Figure 8-7. A server in client-server communication.


(a) The normal case.
(b) Crash after execution.
(c) Crash before execution.
Server Crashes (2)

• Three events that can happen at the server:


• Send the completion message (M),
• Print the text (P),
• Crash (C).

Server Crashes (3)

• These events can occur in six different orderings:


1. M → P → C: A crash occurs after sending the completion message and printing the text.
2. M → C (→ P): A crash happens after sending the completion message, but before the text
could be printed.
3. P → M → C: A crash occurs after printing the text and sending the completion message.
4. P → C (→ M): The text is printed, after which a crash occurs before the completion message
could be sent.
5. C (→ P → M): A crash happens before the server could do anything.
6. C (→ M → P): A crash happens before the server could do anything.

Server Crashes (4)

• Figure. Different combinations of client and server strategies in the presence of server crashes.
Basic Reliable-Multicasting Schemes

Figure: A simple solution to reliable multicasting when all receivers are known and are assumed not to fail.
(a) Message transmission. (b) Reporting feedback.

Nonhierarchical Feedback Control

Figure: Several receivers have scheduled a request for retransmission, but the first retransmission request leads to the
suppression of the others.
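The suppression idea can be sketched as follows (my own simplified simulation, not real multicast code): each receiver that misses a message schedules its NAK after a random delay, and whoever fires first multicasts the NAK, causing the others to cancel theirs.

```python
import random

def nak_delay(max_delay=0.5):
    """Each receiver waits a random time before multicasting its retransmission request."""
    return random.uniform(0, max_delay)

class Receiver:
    def __init__(self, name):
        self.name = name
        self.suppressed = False

    def hear_nak(self, seqno):
        # Someone else's NAK for the same missing message arrived first: cancel ours.
        self.suppressed = True

missing_seqno = 7
receivers = [Receiver(f"R{i}") for i in range(4)]
timers = sorted(((nak_delay(), r) for r in receivers), key=lambda t: t[0])
_, first = timers[0]                       # the receiver whose timer expires first
for _, other in timers[1:]:
    other.hear_nak(missing_seqno)          # its multicast NAK suppresses all the others
print(first.name, "sends the only NAK;",
      sum(r.suppressed for r in receivers), "receivers stayed silent")
```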
Recovery
Introduction
❑ Goal: replace an erroneous state with an error-free state.

❑ Backward recovery: restore a previously recorded correct state (a checkpoint) when things go wrong.

Figure: Combining checkpoints and message logging.

❑ Forward recovery: an attempt is made to bring the system into a correct new state from
which it can continue to execute.
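A tiny sketch of backward recovery using a checkpoint plus a message log (my own illustration, assuming deterministic message handling):

```python
import copy

class LoggingProcess:
    """Sketch of backward recovery: checkpoint the state periodically, log every
    message received after the checkpoint, and on a crash restore the checkpoint
    and replay the logged messages in their original order."""
    def __init__(self):
        self.state = {'balance': 0}
        self.checkpoint = copy.deepcopy(self.state)
        self.message_log = []

    def handle(self, msg):
        self.message_log.append(msg)            # log before (or atomically with) processing
        self.state['balance'] += msg['amount']

    def take_checkpoint(self):
        self.checkpoint = copy.deepcopy(self.state)
        self.message_log.clear()                # messages before the checkpoint are no longer needed

    def recover(self):
        self.state = copy.deepcopy(self.checkpoint)
        for msg in self.message_log:            # deterministic replay rebuilds the pre-crash state
            self.state['balance'] += msg['amount']

p = LoggingProcess()
p.handle({'amount': 10})
p.take_checkpoint()
p.handle({'amount': 5})
p.state = {'balance': -999}                     # simulate a crash corrupting the in-memory state
p.recover()
print(p.state['balance'])                       # 15: checkpoint (10) plus the replayed message (5)
```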
Stable Storage

(a) Stable storage. (b) Crash after drive 1 is updated. (c) Bad spot.

Stable storage is well suited to applications that require a high degree of fault tolerance.
Checkpointing

A recovery line: the most recent distributed snapshot


Independent Checkpointing

The domino effect.


Message Logging

Incorrect replay of messages after recovery, leading to an orphan process.


Summarization (I)

❑ Fault tolerance is the characteristic by which a system can mask the occurrence of
failures and recover from them.

❑ There are several types of failures: crash failure, omission failure, timing failure,
response failure, and arbitrary (Byzantine) failure.

❑ Redundancy is the key technique needed to achieve fault tolerance

❑ Reliable group communication is suitable for small groups


Summarization (II)
❑ Atomic multicasting can be precisely formulated in terms of a virtual synchronous
execution model

❑ A group membership change requires that the members reach agreement on the same list
of members, typically by using a commit protocol.

❑ Recovery in fault-tolerant systems is invariably achieved by checkpointing, combined with
message logging.
References

• Andrew S. Tanenbaum and Maarten van Steen, “Distributed Systems: Principles and
Paradigms”, 2nd edition, Pearson Education.

• Pradeep K. Sinha, “Distributed Operating Systems”, PHI Publications, 2008.
