Chapter 6 - Consistency and Replication
o We discuss
Why replication is useful and its relation with scalability; in
particular object-based replication
Consistency models
• Data-centric consistency models
• Client-centric consistency models
How consistency and replication are implemented
6.1 Reasons for Replication
o Two major reasons: reliability and performance
Reliability: with multiple copies, the system can continue
working after one replica crashes; replication also protects
against corrupted data
Performance: replication helps the system scale in numbers
and geographical area (discussed next)
Replication as Scaling Technique
Replication and caching are widely applied as scaling
techniques
o processes can use local copies and limit access time, thereby
improving performance
2. keeping the copies consistent may itself be subject to
serious scalability problems
o intuitively, a read operation made on any copy should
return the same value (the copies are always the same)
o thus, when an update operation is performed on one copy,
it needs to be propagated to all other copies before any
subsequent operation takes place, no matter at which copy
that operation is initiated
dilemma
scalability problems can be alleviated by applying
replication and caching, leading to a better performance
but, keeping copies consistent requires global
synchronization, which is generally costly in terms of
performance
solution: loosen the consistency constraints
updates do not need to be executed as atomic operations
(no more instantaneous global synchronization); but
copies may not be always the same everywhere
to what extent the consistency can be loosened depends
on the specific application (the purpose of data as well as
access and update patterns)
6.2 Data-Centric Consistency Models
o Consistency has always been discussed in terms of read and
write operations on shared data
1. Strict Consistency
the most stringent model; a data store is strictly consistent
if it satisfies the following condition:
Any read on a data item x returns a value corresponding
to the result of the most recent write on x.
this relies on absolute global time, which is impossible to
realize in a distributed system; e.g., suppose x is stored
only on machine B, a process on machine A reads x, then a
message is sent to B and a process on machine B does a
write on x; without a global clock it cannot be decided
which operation was the "most recent" one
the following notations and assumptions will be used
Wi(x)a means a write by Pi to data item x with the value a
has been done
Ri(x)b means a read by Pi on data item x returning the value
b has been done
the index may be omitted when there is no confusion as to
which process performed the operation
2. Sequential Consistency
strict consistency is the ideal but impossible to implement;
sequential consistency is a slightly weaker model, defined by the
following condition:
The result of any execution is the same as if the (read and
write) operations by all processes on the data store were
executed in some sequential order, and the operations of
each individual process appear in this sequence in the
order specified by its program.
time does not play a role; no reference to the "most recent" write
operation
example: four processes operating on the same data item x
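the four-process example can be checked mechanically; the sketch
below is our own brute-force illustration (not an algorithm from this
chapter): it searches all interleavings that preserve each process's
program order for one in which every read returns the most recently
written value

```python
def legal(sequence):
    """A serial order is legal if every read sees the latest written value."""
    current = None  # assumed initial value of x
    for op, value in sequence:
        if op == 'W':
            current = value
        elif current != value:  # a read must return the most recent write
            return False
    return True

def interleavings(histories):
    """Yield every merge of the per-process histories preserving program order."""
    if all(len(h) == 0 for h in histories):
        yield []
        return
    for i, h in enumerate(histories):
        if h:
            rest = histories[:i] + [h[1:]] + histories[i + 1:]
            for tail in interleavings(rest):
                yield [h[0]] + tail

def sequentially_consistent(histories):
    # exponential brute force; fine for toy histories only
    return any(legal(seq) for seq in interleavings(histories))

p1 = [('W', 'a')]
p2 = [('W', 'b')]
ok  = [p1, p2, [('R', 'b'), ('R', 'a')], [('R', 'b'), ('R', 'a')]]
bad = [p1, p2, [('R', 'b'), ('R', 'a')], [('R', 'a'), ('R', 'b')]]
print(sequentially_consistent(ok))   # True: order Wb, Rb, Rb, Wa, Ra, Ra works
print(sequentially_consistent(bad))  # False: P3 and P4 disagree on write order
```

in the second history the two reading processes see the writes in
different orders, so no single serial order can explain the execution.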
3. Causal Consistency
it is a weakening of sequential consistency; a data store is
causally consistent if it obeys the following condition:
Writes that are potentially causally related must be seen
by all processes in the same order; concurrent writes may
be seen in a different order on different machines.
example
W2(x)b and W1(x)c are concurrent, so there is no requirement
for processes to see them in the same order (in the figure, CR
marks the causally-related writes and Conc the concurrent ones)
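the causally-related / concurrent distinction can be tracked with
vector clocks; the sketch below is our own illustration (process names
and clock values are assumptions, not from the chapter)

```python
def happened_before(vc_a, vc_b):
    """vc_a causally precedes vc_b: no component larger, not identical."""
    return all(a <= b for a, b in zip(vc_a, vc_b)) and vc_a != vc_b

def concurrent(vc_a, vc_b):
    """Neither event causally precedes the other."""
    return not happened_before(vc_a, vc_b) and not happened_before(vc_b, vc_a)

# Two processes; component i counts the events known from process Pi.
w1a = [1, 0]  # W1(x)a
w2b = [1, 1]  # W2(x)b, issued after P2 read a, so it depends on W1(x)a
w1c = [2, 0]  # W1(x)c, issued by P1 without having seen b

print(happened_before(w1a, w2b))  # True: must be seen in this order everywhere
print(concurrent(w2b, w1c))       # True: processes may see them in any order
```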
Models with Synchronization Operations
4. Weak Consistency
FIFO consistency is still unnecessarily restrictive for many
applications: other processes do not need to see all
intermediate writes, e.g., those performed inside a critical
section
6. Entry Consistency
like release consistency, it requires an acquire and release
to be used at the start and end of a critical section
however, it requires each ordinary shared data item to be
associated with some synchronization variable such as
a lock
if it is desired that elements of an array be accessed
independently in parallel, then different array elements may
be associated with different locks
synchronization variable ownership
each synchronization variable has a current owner, the
process that acquired it last
the owner may enter and exit critical sections
repeatedly without sending messages
other processes must send a message to the current
owner asking for ownership and the current values of
the data associated with that synchronization variable
several processes can also simultaneously own a
synchronization variable, but only for reading
a data store exhibits entry consistency if it meets all the
following conditions:
An acquire access of a synchronization variable is not
allowed to perform with respect to a process until all
updates to the guarded shared data have been performed
with respect to that process. (at an acquire, all remote
changes to the guarded data must be made visible)
Before an exclusive mode access to a synchronization
variable by a process is allowed to perform with respect to
that process, no other process may hold the
synchronization variable, not even in nonexclusive mode.
After an exclusive mode access to a synchronization
variable has been performed, any other process's next
nonexclusive mode access to that synchronization variable
may not be performed until it has performed with respect to
that variable's owner. (it must first fetch the most recent
copies of the guarded shared data)
a valid event sequence for entry consistency
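the ownership rules above can be illustrated with a toy sketch (the
class names and structure are our assumptions): acquiring a
synchronization variable pulls the current values of the guarded data
from the variable, and releasing publishes the owner's updates for the
next acquirer

```python
class SyncVariable:
    def __init__(self, guarded):
        self.guarded = dict(guarded)  # data items guarded by this variable
        self.owner = None             # process that acquired it last

class Process:
    def __init__(self, name):
        self.name = name
        self.local = {}               # local copies of guarded data

    def acquire(self, sv):
        if sv.owner is not self:      # ownership transfer: fetch remote updates
            self.local.update(sv.guarded)
            sv.owner = self
        # a repeated acquire by the current owner needs no messages
        return self.local

    def release(self, sv, updates):
        self.local.update(updates)
        sv.guarded.update(updates)    # make updates visible to the next owner

lock = SyncVariable({'x': 0})
p1, p2 = Process('P1'), Process('P2')
p1.acquire(lock)
p1.release(lock, {'x': 42})
print(p2.acquire(lock)['x'])          # 42: P2 sees the update on acquiring
```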
Summary of Data-Centric Consistency Models
6.3 Client-Centric Consistency Models
with many applications, updates happen very rarely, and
clients can tolerate a relatively high degree of
inconsistency in such systems
Eventual Consistency
if no updates take place for a long time, all replicas
gradually become consistent, i.e., eventually hold the
same data
the principle of a mobile user accessing different replicas of a distributed
database
the solution is to introduce client-centric consistency
it provides guarantees for a single client concerning the
consistency of accesses to a data store by that client; no
guarantees are given concerning concurrent accesses by
different clients
there are four client-centric consistency models
consider a data store that is physically distributed across
multiple machines
a process reads and writes to a locally available copy and
updates are propagated
assume that data items have an associated owner, the only
process permitted to modify that item, hence write-write
conflicts are avoided
the following notations are used
xi[t] denotes the version of the data item x at local copy
Li at time t
version xi[t] is the result of a series of write operations at
Li that took place since initialization; denote this set by
WS(xi[t])
if operations in WS(xi[t1]) have also been performed at
local copy Lj at a later time t2, we write WS(xi[t1];xj[t2]); it
means that WS(xi[t1]) is part of WS(xj[t2])
the time index may be omitted if the ordering of operations
is clear from the context
1. Monotonic Reads
a data store is said to provide monotonic-read consistency
if the following condition holds:
If a process reads the value of a data item x, any
successive read operation on x by that process will always
return that same value or a more recent value.
2. Monotonic Writes
a data store is said to provide monotonic-write consistency
if the following condition holds:
A write operation by a process on a data item x is
completed before any successive write operation on x by
the same process.
completing the earlier write first may not be necessary if a
later write operation completely overwrites the present value
x = 78;
x = 90;
no need to make sure that x has been first changed to 78
it is important only if part of the state of the data item
changes
e.g., a software library, where one or more functions are
replaced, leading to a new version
4. Writes Follow Reads
a data store is said to provide writes-follow-reads
consistency if the following condition holds:
A write operation by a process on a data item x following a
previous read operation on x by the same process is
guaranteed to take place on the same or a more recent
value of x that was read.
a) a writes-follow-reads consistent data store
b) a data store that does not provide writes-follow-reads consistency
implementation: each write operation is assigned a globally
unique identifier, and for each client the system tracks a
read set and a write set of such identifiers; when the client
performs a write, its identifier is added to the write set,
along with the identifiers in the read set (which have now
become relevant for the write operation just performed)
problem: in naive implementation, the read and write sets can
become very large
to improve efficiency, read and write operations can be
grouped into sessions, clearing the sets when the session
ends
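as an illustration of the read-set mechanism (all names and structure
below are our own assumptions), a replica can refuse to serve a read
until it has performed every write whose identifier appears in the
client's read set, which is exactly the monotonic-reads guarantee

```python
class Replica:
    def __init__(self):
        self.performed = set()   # ids of writes performed at this copy
        self.values = {}

    def write(self, wid, item, value):
        self.performed.add(wid)
        self.values[item] = value

    def read(self, item, read_set):
        # monotonic reads: all writes behind the client's earlier reads
        # must already have been performed locally
        missing = read_set - self.performed
        if missing:
            raise RuntimeError(f"replica must first fetch writes {missing}")
        return self.values.get(item)

l1, l2 = Replica(), Replica()
l1.write("w1", "x", 78)
read_set = {"w1"}                # client read x at L1 and recorded w1
print(l1.read("x", read_set))    # 78: L1 has performed w1
try:
    l2.read("x", read_set)       # L2 has not yet performed w1
except RuntimeError as err:
    print(err)                   # L2 must synchronize before serving
```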
6.4 Distribution Protocols
there are different ways of propagating, i.e., distributing
updates to replicas, independent of the consistency
model
we will discuss
replica placement
update propagation
epidemic protocols
a. Replica Placement
a major design issue for distributed data stores is
deciding where, when, and by whom copies of the data
store are to be placed
three types of copies:
permanent replicas
server-initiated replicas
client-initiated replicas
the logical organization of different kinds of copies of a data store into three
concentric rings
1. Permanent Replicas
the initial set of replicas that constitute a distributed
data store; normally a small number of replicas
e.g., a Web site: two forms
the files that constitute a site are replicated across a
limited number of servers on a LAN; a request is
forwarded to one of the servers
mirroring: a Web site is copied to a limited number
of servers, called mirror sites, which are
geographically spread across the Internet; clients
choose one of the mirror sites
3. Client-Initiated Replicas (client caches or simply caches)
used to improve data access time
managing the cache is left entirely to the client; the data
store from which the data have been fetched has nothing
to do with keeping cached data consistent
b. Update Propagation
updates are initiated at a client, forwarded to one of the
copies, and then propagated to the other replicas
a design issue is what is actually propagated; possibilities:
propagate only a notification of an update (invalidation
protocols)
transfer data from one copy to another (useful when the
read-to-write ratio is high)
transfer the update operation (also called active
replication)
c. Epidemic Protocols
update propagation in eventual consistency is often
implemented by epidemic protocols, in which an update
spreads between replicas the way an infectious disease
spreads among people
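a toy anti-entropy simulation of epidemic spreading (our own sketch
under simple assumptions, not this chapter's protocol): in each round
every replica exchanges its known updates with one randomly chosen
partner (push-pull), so a single update reaches all replicas within a
few rounds

```python
import random

def gossip_rounds(n_replicas, seed=1):
    """Rounds of push-pull anti-entropy until every replica has the update."""
    random.seed(seed)                      # deterministic toy run
    replicas = [set() for _ in range(n_replicas)]
    replicas[0].add("update-1")            # one initially "infected" replica
    rounds = 0
    while not all("update-1" in r for r in replicas):
        for i in range(n_replicas):
            j = random.randrange(n_replicas)
            merged = replicas[i] | replicas[j]  # exchange updates both ways
            replicas[i] = set(merged)
            replicas[j] = set(merged)
        rounds += 1
    return rounds

print(gossip_rounds(32))  # typically only a handful of rounds for 32 replicas
```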
6.5 Consistency Protocols
so far we have concentrated on various consistency models;
a consistency protocol describes an implementation of a
specific consistency model; we discuss
primary-based protocols
remote-write protocols
local-write protocols
replicated-write protocols
active replication
quorum-based protocols
cache-coherence protocols
1. Primary-Based Protocols
each data item x in the data store has an associated
primary, which is responsible for coordinating write
operations on x; there are two types of primary-based
protocols
a. Remote-Write Protocols
all read and write operations are carried out at a single
(remote) server; in effect, data are not replicated
primary-based remote-write protocol with a fixed server to which all read
and write operations are forwarded
another approach is primary-backup protocols where reads
can be made from local backup servers while writes should
be made directly on the primary server
the backup servers are updated each time the primary is
updated
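a minimal sketch of this primary-backup scheme (class names and
structure are our assumptions): the primary applies a write locally,
blocks until every backup has applied it, and only then acknowledges,
so a subsequent read at any backup is safe

```python
class Backup:
    def __init__(self):
        self.store = {}

    def read(self, item):
        return self.store.get(item)

    def apply(self, item, value):
        self.store[item] = value

class Primary(Backup):
    def __init__(self, backups):
        super().__init__()
        self.backups = backups

    def write(self, item, value):
        self.apply(item, value)      # update the primary's own copy
        for b in self.backups:       # blocking update of every backup
            b.apply(item, value)
        return "ack"                 # ack only after all backups are updated

backups = [Backup(), Backup()]
primary = Primary(backups)
primary.write("x", 7)
print(backups[0].read("x"))          # 7: a local read at a backup is current
```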
b. Local-Write Protocols
two approaches
i. primary-based local-write protocol in which a single copy is migrated
between processes
since there is only a single copy at any time, keeping the
data consistent is straightforward
keeping track of the current location of each data item is a
major problem
ii. primary-backup local-write protocol
the primary migrates between processes that wish to
perform an update; after the primary has been updated, a
protocol to update the backups is followed
primary-backup protocol in which the primary migrates to the process
wanting to perform an update
2. Replicated-Write Protocols
unlike primary-based protocols, write operations can be
carried out at multiple replicas
a. Active Replication
each replica has an associated process that carries out
update operations
updates are generally propagated by means of the write
operation itself; the operations must be performed in the
same order everywhere, which requires a totally-ordered
multicast, implementable with Lamport's timestamps, or
by using a central sequencer
a) forwarding an invocation request from a replicated object
b) returning a reply to a replicated object
b. Quorum-Based Protocols
use of voting: clients are required to request and acquire
the permission of multiple servers before reading or
writing a replicated data item
e.g., consider a file replicated on N servers
to update the file, a client must first contact at least half + 1
(majority) of the servers and get them to agree to perform
the update
more generally, a client assembles a read quorum of NR
servers for reading and a write quorum of NW servers for
writing; the values of NR and NW are subject to the
following two constraints
NR + NW > N : to prevent read-write conflicts
NW > N/2 : to prevent write-write conflicts
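the two constraints are easy to check numerically; the helper below is
our own sketch (the example quorum sizes for N = 12 are assumptions
chosen to illustrate the three typical cases)

```python
def valid_quorum(n, n_read, n_write):
    """Check the two quorum constraints for n replicas."""
    read_write_ok = n_read + n_write > n   # every read quorum overlaps every write quorum
    write_write_ok = n_write > n / 2       # any two write quorums overlap
    return read_write_ok and write_write_ok

N = 12
print(valid_quorum(N, 3, 10))   # True: cheap reads, expensive writes
print(valid_quorum(N, 7, 6))    # False: two disjoint write quorums possible
print(valid_quorum(N, 1, 12))   # True: read-one, write-all (ROWA)
```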
3. Cache-Coherence Protocols
coherence detection strategy: when are inconsistencies
actually detected
static solution: prior to execution, a compiler performs an
analysis to determine which data may lead to
inconsistencies if cached
coherence enforcement strategy: how caches are kept
consistent with the copies stored at the servers
simplest solution: do not allow shared data to be cached;
shared data are kept only at the servers