
Consistency and Replication

Chapter 7

Consistency and Replication
An important issue in distributed systems is the replication of data. Data are
generally replicated to enhance reliability or to improve performance. One of the
major problems is keeping replicas consistent: when one copy is updated, we
need to ensure that the other copies are updated as well; otherwise the replicas
will no longer be the same. In this chapter, we take a detailed look at what
consistency of replicated data actually means and the various ways in which
consistency can be achieved.
More on Replication
• Replicas allow remote sites to continue working in
the event of local failures.
• Replication also makes it possible to protect against data corruption.
• Replicas allow data to reside close to where
it is used.
• Even a large number of replicated “local” systems
can improve performance: think of clusters.
• This directly supports the distributed systems goal
of enhanced scalability.
Replication and Scalability
• Replication is a widely used scalability technique: think of
Web clients and Web proxies.
• When systems scale, the first problems to surface are those
associated with performance – as systems get bigger
(e.g., more users), they often get slower.
• Replicating the data and moving it closer to where it is
needed helps to solve this scalability problem.
• A problem remains: how do we efficiently synchronize all of
the replicas created to solve the scalability issue?
• Dilemma: adding replicas improves scalability, but incurs
the (oftentimes considerable) overhead of keeping the
replicas up-to-date!
• As we shall see, the solution often results in a relaxation of
any consistency constraints.
Replication and Consistency
• But if there are many replicas of the same thing,
how do we keep all of them up-to-date? How
do we keep the replicas consistent?
• Consistency can be achieved in a number of
ways. We will study a number of consistency
models, as well as protocols for implementing
the models.
• So, what’s the catch?
– It is not easy to keep all those replicas consistent.
Reasons for Replication
1. Performance enhancement
Copies of the data are placed in multiple locations, so a client can get the data
from a nearby location. This decreases the time taken to access the data and
enhances the performance of the distributed system. Multiple servers located at
different locations provide the same service to clients, which allows parallel
processing of client requests to the resource.
2. Increased availability
Replication is a technique for automatically maintaining the availability of data
despite server failures.
3. Fault tolerance
If a server fails, the data can be accessed from other servers.
Challenges in Replication
1. Placement (where to place replicas)
• Permanent replicas: clusters of servers that may be
geographically dispersed.
• Server-initiated replicas: replicas placed by the hosting
servers themselves, including server caches.
• Client-initiated replicas: for example, Web browser caches.
2. Propagation of updates among the replicas
3. Keeping the replicas consistent
Data-centric Consistency Models

The general organization of a logical data store,
physically distributed and replicated across
multiple processes
What is a Consistency Model?
• A “consistency model” is a CONTRACT
between a DS data-store and its processes.
• If the processes agree to the rules, the
data-store will perform properly and as
advertised.
• We start with Strict Consistency, which is
defined as:
– Any read on a data item ‘x’ returns a value
corresponding to the result of the most recent write
on ‘x’ (regardless of where the write occurred).
Consistency Model Diagram Notation
• Wi(x)a – a write by process ‘i’ to item ‘x’ with
a value of ‘a’. That is, ‘x’ is set to ‘a’.
• (Note: The process is often shown as ‘Pi’).
• Ri(x)b – a read by process ‘i’ from item ‘x’
producing the value ‘b’. That is, reading ‘x’
returns ‘b’.
• Time moves from left to right in all diagrams.

Strict Consistency

• Behavior of two processes, operating on same data item:


a) A strictly consistent data-store.
b) A data-store that is not strictly consistent.
• With Strict Consistency, all writes are instantaneously
visible to all processes and absolute global time order is
maintained throughout the DS. This is the consistency
model “Holy Grail” – not at all easy in the real world,
and all but impossible within a DS. So, other, less strict
(or “weaker”) models have been developed …
Sequential Consistency (1)
• A weaker consistency model, which represents
a relaxation of the rules.
• It is also much easier (i.e., possible) to implement.
• Definition of “Sequential Consistency”:
– The result of any execution is the same as if the
(read and write) operations by all processes on the
data-store were executed in some sequential
order, and the operations of each individual process
appear in this sequence in the order specified by its
program.
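The definition above can be made concrete with a brute-force checker: a history is sequentially consistent if at least one interleaving that preserves every process's program order explains all the reads. The following is a minimal sketch; the history encoding is illustrative, and the exhaustive search is only feasible for tiny histories:

```python
from itertools import permutations

def explains(seq):
    """Replay seq against a single store; True if every read returns
    the value of the most recent write to that item."""
    store = {}
    for _proc, kind, item, val in seq:
        if kind == "W":
            store[item] = val
        elif store.get(item) != val:
            return False
    return True

def is_sequentially_consistent(processes):
    """processes: one op list per process, each op = (kind, item, value)."""
    # tag each op with its process id so program order can be checked
    tagged = [[(i,) + op for op in ops] for i, ops in enumerate(processes)]
    flat = [op for ops in tagged for op in ops]
    for perm in permutations(flat):
        # keep only interleavings that preserve each program order
        if all([op for op in perm if op[0] == i] == ops
               for i, ops in enumerate(tagged)):
            if explains(perm):
                return True
    return False
```

For example, with P1: W(x)a, P2: W(x)b, and two readers that both observe b before a, the interleaving W(x)b, both reads of b, W(x)a, both reads of a explains the history, so it is sequentially consistent; if the two readers disagree on the order of a and b, no single interleaving works.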
Sequential Consistency (2)

(a) A sequentially consistent data store.
(b) A data store that is not sequentially consistent.
Causal Consistency
• This model distinguishes between events that
are “causally related” and those that are not.
• If event B is caused or influenced by an earlier
event A, then causal consistency requires that
every other process see event A, then event B.
• Operations that are not causally related are
said to be concurrent.
Causal Consistency (1)
• For a data store to be considered causally
consistent, it is necessary that the store
obeys the following conditions:
– Writes that are potentially causally related
• must be seen by all processes in the same
order.
– Concurrent writes
• may be seen in a different order on different
machines.
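One standard way to decide whether two writes are potentially causally related (not named on this slide, but widely used for exactly this purpose) is vector clocks: write A causally precedes write B if A's clock is componentwise less than or equal to B's and the two differ; if neither precedes the other, the writes are concurrent. A minimal sketch, with illustrative clock values:

```python
def happened_before(a, b):
    """Vector clock a causally precedes vector clock b."""
    return all(x <= y for x, y in zip(a, b)) and a != b

def concurrent(a, b):
    """Neither write causally precedes the other."""
    return not happened_before(a, b) and not happened_before(b, a)
```

Causally related writes (e.g., clock [1, 0] before [1, 1]) must be applied in that order at every replica; concurrent writes (e.g., [1, 0] and [0, 1]) may be applied in different orders on different machines.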
Causal Consistency (2)

(a) A violation of a causally-consistent store
Causal Consistency (3)

(b) A correct sequence of events in
a causally-consistent store
Causal Consistency (4)

This sequence is allowed with a
causally-consistent store, but not with a
sequentially consistent store
Client-Centric Consistency Models
• The previously studied consistency models concern
themselves with maintaining a consistent (globally
accessible) data-store in the presence of concurrent
read/write operations.
• Another class of distributed data-store is
characterized by the lack of simultaneous updates.
Here, the emphasis is more on maintaining a
consistent view of things for the individual client
process that is currently operating on the data-store.
More Client-Centric Consistency
• How fast should updates (writes) be made
available to read-only processes?
– Think of most database systems: mainly reads.
– Think of the DNS: write-write conflicts do not
occur, only read-write conflicts.
– Think of the WWW: as with DNS, except that heavy
use of client-side caching is present; even the
return of stale pages is acceptable to most users.
• These systems all exhibit a high degree of
acceptable inconsistency … with the replicas
gradually becoming consistent over time.
Toward Eventual Consistency
• The only requirement is that all replicas will
eventually be the same.
• All updates must be guaranteed to propagate to
all replicas … eventually!
• This works well if every client always updates
the same replica.
• Things are a little difficult if the clients are
mobile.

Eventual Consistency

The principle of a mobile user accessing
different replicas of a distributed database
Monotonic Reads (1)
• A data store is said to provide
monotonic-read consistency if the
following condition holds:
– If a process reads the value of a data item x,
any successive read operation on x by
that process will always return that same
value or a more recent value.
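One common way to implement this guarantee is a client-side session token: the client remembers the highest version it has read, and a replica may serve the next read only if it is at least that current. A sketch, where the replica representation and version numbers are illustrative:

```python
class MonotonicReadSession:
    def __init__(self):
        self.seen = 0                    # highest version read so far

    def read(self, replica):
        # refuse replicas older than what this client already saw
        if replica["version"] < self.seen:
            raise RuntimeError("replica too stale for monotonic reads")
        self.seen = replica["version"]
        return replica["value"]
```

After reading version 3 from one replica, a mobile client that reconnects to a replica still at version 2 must either wait or be redirected; it never silently reads the older value.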
Monotonic Reads (2)

Monotonic Writes (1)

• In a monotonic-write consistent
store, the following condition holds:
– A write operation by a process on a
data item x is completed before any
successive write operation on x by the same
process.

Monotonic Writes (2)

Read Your Writes (1)
• A data store is said to provide
read-your-writes consistency if the
following condition holds:
– The effect of a write operation by a
process on data item x will always be
seen by a successive read operation on x by
the same process.
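This guarantee can be sketched the same way as monotonic reads: the client records the version its own write produced, and only reads from replicas that have already applied it. The versioning scheme below is illustrative:

```python
class ReadYourWritesClient:
    def __init__(self):
        self.written = 0                 # version of this client's last write

    def write(self, replica, value):
        replica["version"] += 1
        replica["value"] = value
        self.written = replica["version"]

    def read(self, replica):
        # only accept replicas that have applied this client's writes
        if replica["version"] < self.written:
            raise RuntimeError("replica has not applied my writes yet")
        return replica["value"]
```

A replica that has not yet received the client's update is rejected, so the client can never observe the data-store as if its own write had not happened.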
Writes Follow Reads (1)
• A data store is said to provide
writes-follow-reads consistency
if the following condition holds:
– A write operation by a process on a
data item x following a previous read
operation on x by the same process is
guaranteed to take place on the same or a
more recent value of x than the one that was read.
Writes Follow Reads (2)
Content Replication and Placement
• Regardless of which consistency model is
chosen, we need to decide where, when and by
whom copies of the data-store are to be placed.

Replica Placement Types
• There are three types of replica:
1. Permanent replicas: tend to be small in number,
organized as COWs (Clusters of Workstations) or
mirrored systems.
2. Server-initiated replicas: used to enhance
performance at the initiation of the owner of the
data-store. Typically used by web hosting
companies to geographically locate replicas close
to where they are needed most. (Often referred to
as “push caches”).
3. Client-initiated replicas: created as a result of client
requests – think of browser caches. Works well
assuming, of course, that the cached data does not
go stale too soon.
Content Distribution
• When a client initiates an update to a distributed
data-store, what gets propagated?
• There are three possibilities:
1. Propagate a notification of the update to the other
replicas – this is an “invalidation protocol”, which
indicates that the replica’s data is no longer
up-to-date. It can work well when there are many writes.
2. Transfer the data from one replica to another –
works well when there are many reads.
3. Propagate the update to the other replicas – this is
“active replication”, and it shifts the workload to each
of the replicas upon an “initial write”.
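Option 1 can be sketched in a few lines: a write at the server only marks each replica's copy invalid, and the data itself moves lazily on the replica's next read. The class and field names below are illustrative:

```python
class Server:
    def __init__(self):
        self.value = None
        self.replicas = []

    def write(self, value):
        self.value = value
        for r in self.replicas:          # propagate only a notification
            r.valid = False

class Replica:
    def __init__(self, server):
        self.server = server
        self.valid, self.value = False, None
        server.replicas.append(self)

    def read(self):
        if not self.valid:               # stale copy: fetch on demand
            self.value, self.valid = self.server.value, True
        return self.value
```

If a value is overwritten several times between reads, only the small invalidation messages travel and the data itself is transferred once, which is why this scheme pays off when writes are frequent relative to reads.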
Push vs. Pull Protocols
• Another design issue is whether updates are
pushed or pulled.
1. Push-based/server-based approach: updates are sent
“automatically” by the server; the client does not request
them. This approach is useful when a high
degree of consistency is needed. Often used between
permanent and server-initiated replicas.
2. Pull-based/client-based approach: used by client
caches (e.g., browsers); updates are requested by the
client from the server. No request, no update!
Pull versus Push Protocols

A comparison between push-based and pull-based
protocols in the case of multiple-client,
single-server systems
Consistency Protocols
• A consistency protocol is a specific implementation
of a consistency model.
• The most widely implemented models
are:
1. Sequential consistency.
2. Weak consistency (with synchronization variables).
3. Atomic transactions.
Primary-Based Protocols
• Each data item is associated with a “primary”
replica.
• The primary is responsible for coordinating
writes to the data item.
• There are two types of primary-based
protocol:
1. Remote-write
2. Local-write
Remote-Write Protocols

The principle of a primary-backup protocol

• Good: the benefit of this scheme is that, as the primary is
in control, all writes can be sent to each backup replica
IN THE SAME ORDER, making it easy to implement
sequential consistency.
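The ordering argument can be seen directly in code: because every write funnels through the primary, the primary can forward operations to all backups in one identical sequence. The classes below are an illustrative sketch, not a full protocol with acknowledgements or failover:

```python
class Backup:
    def __init__(self):
        self.log = []

    def apply(self, op):
        self.log.append(op)

class Primary:
    def __init__(self, backups):
        self.backups = backups
        self.log = []

    def write(self, op):
        self.log.append(op)
        for b in self.backups:           # every backup sees the same order
            b.apply(op)
```

Since there is a single point of serialization, no two backups can ever disagree about the order of writes, which is exactly what sequential consistency requires.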
Local-Write Protocols
• In this protocol, a single copy of the data item
is still maintained.
• Upon a write, the data item gets transferred to
the replica that is writing.
• That is, the status of primary for a data item is
transferable.
• This is also called a “fully migrating approach”.

Local-Write Protocols

Primary-backup protocol in which the primary migrates to the
process wanting to perform an update, then updates the
backups. Consequently, reads are much more efficient.
Replicated-Write Protocols
• With these protocols, writes can be
carried out at any replica.
• Another name might be: “Distributed-
Write Protocols”
• There are two types:
1. Active Replication
2. Majority Voting (Quorums)

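Majority voting can be sketched with the classic quorum condition R + W > N: every read quorum then overlaps every write quorum, so at least one contacted replica holds the newest version. The values of N, R, W and the replica layout below are illustrative:

```python
class QuorumStore:
    def __init__(self, n=5, w=3, r=3):
        assert r + w > n                  # quorums must overlap
        self.replicas = [{"version": 0, "value": None} for _ in range(n)]
        self.w, self.r = w, r

    def write(self, value):
        # a real protocol would consult only W replicas for the version
        version = max(rep["version"] for rep in self.replicas) + 1
        for rep in self.replicas[:self.w]:        # any W replicas suffice
            rep.update(version=version, value=value)

    def read(self):
        quorum = self.replicas[-self.r:]          # any R replicas suffice
        # the highest-versioned replica in the quorum has the latest write
        return max(quorum, key=lambda rep: rep["version"])["value"]
```

With N = 5, W = 3, R = 3, a write touches three replicas and a read contacts three; by the pigeonhole principle the two sets share at least one replica, and picking the highest version in the read quorum returns the latest write.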
Cache-coherence protocols
Caches form a special case of replication, in the sense that they are generally
controlled by clients instead of servers. However, cache-coherence protocols,
which ensure that a cache is consistent with the server-initiated replicas are, in
principle, not very different from the consistency protocols discussed so far.
First, caching solutions may differ in their coherence detection strategy, that
is, when inconsistencies are actually detected. In static solutions, a compiler is
assumed to perform the necessary analysis prior to execution, and to determine
which data may actually lead to inconsistencies because they may be cached.
Another design issue for cache-coherence protocols is the coherence
enforcement strategy, which determines how caches are kept consistent with the
copies stored at servers. The simplest solution is to disallow shared data from being
cached at all. Instead, shared data are kept only at the servers, which maintain
consistency using one of the primary-based or replicated-write protocols
discussed above. Clients are allowed to cache only private data. Obviously, this
solution can offer only limited performance improvements.
Caching and replication in the Web
The Web is arguably the largest distributed system ever built. Originating from
a relatively simple client-server architecture, it is now a sophisticated system
consisting of many techniques to ensure stringent performance and availability
requirements. These requirements have led to numerous proposals for caching
and replicating Web content.
Proxy.
A Web proxy accepts requests from local clients and passes these to Web
servers. When a response comes in, the result is passed to the client. The
advantage of this approach is that the proxy can cache the result and return that
result to another client, if necessary.
In addition to caching at browsers and proxies, ISPs generally also place caches
in their networks. Such schemes are mainly used to reduce network traffic
(which is good for the ISP) and to improve performance (which is good for end
users).
