
Chapter 6-Consistency and Replication

Distributed systems, chapter 7

Uploaded by

gutataye5


Chapter 6 - Consistency and Replication

Objectives of the Chapter

 we discuss
 why replication is useful and how it relates to scalability, in particular object-based replication
 consistency models
 data-centric consistency models
 client-centric consistency models
 how consistency and replication are implemented

6.1 Reasons for Replication
 there are two major reasons for replication: reliability and performance
 reliability
 if a file is replicated, we can switch to another replica when the replica we are using crashes
 with several copies we can provide better protection against corrupted data, similar to mirroring in non-distributed systems
 performance
 needed when the system has to scale in size and in geographical area
 placing a copy of the data close to the processes that use it reduces access time and increases performance; for example, a Web server may be accessed by thousands of clients from all over the world

 Replication as a Scaling Technique
 replication and caching are widely applied as scaling techniques
 processes can use local copies, which limits access time and network traffic
 however, the copies must be kept consistent, and this may require more network bandwidth
 if the copies are refreshed more often than they are used (a low access-to-update ratio), the cost in bandwidth exceeds the benefits
 Dilemma (trade-off)
 scalability problems can be alleviated by applying replication and caching, leading to better performance
 but keeping copies consistent requires global synchronization, which is generally costly in terms of performance
 solution: loosen the consistency constraints
 updates then no longer need to be executed as atomic operations (no instantaneous global synchronization), but copies may not always be the same everywhere
 the extent to which consistency can be loosened depends on the specific application (the purpose of the data as well as its access and update patterns)
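The trade-off above can be made concrete with a small count. The following Python sketch is illustrative only: it assumes a hypothetical trace for a single replica in which 'U' means an update is pushed to the replica and 'R' means a local read, and it counts how many pushed updates are overwritten before anyone reads them.

```python
def wasted_transfers(events: str) -> int:
    """Count pushed updates that are overwritten before they are ever read.

    `events` is a hypothetical trace for one replica:
    'U' = an update is pushed to the replica, 'R' = a local read.
    """
    wasted = 0
    unread = False  # does the replica hold a pushed version nobody has read yet?
    for event in events:
        if event == "U":
            if unread:
                wasted += 1  # the previous version is replaced without being read
            unread = True
        elif event == "R":
            unread = False
        else:
            raise ValueError(f"unknown event {event!r}")
    return wasted


# Low access-to-update ratio: 3 of the 4 transfers are never read.
print(wasted_transfers("UUUUR"))   # 3
# High access-to-update ratio: every transfer is read at least once.
print(wasted_transfers("URRURR"))  # 0
```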
6.2 Data-Centric Consistency Models
 consistency has traditionally been discussed
 in terms of read and write operations on shared data, made available by means of (distributed) shared memory, a (distributed) shared database, or a (distributed) file system
 we use the broader term data store, which may be physically distributed across multiple machines
 we also assume that each process has a local copy of the data store and that write operations are propagated to the other copies

the general organization of a logical data store, physically distributed and replicated across multiple processes
 a consistency model is a contract between processes and the data store
 processes agree to obey certain rules
 in return, the data store promises to work correctly
 ideally, a process that reads a data item expects a value that shows the result of the last write operation on that data
 in a distributed system, in the absence of a global clock and with several copies, it is difficult to know which write operation is the last one
 to simplify the implementation, each consistency model restricts what read operations may return
 data-centric consistency models to be discussed
1. strict consistency
2. sequential consistency
3. causal consistency
4. weak consistency
5. release consistency
6. entry consistency
1. Strict Consistency
 strict consistency is the most stringent consistency model and is defined by the following condition:
 Any read on a data item x returns a value corresponding to the result of the most recent write on x.
 the following notation and assumptions will be used
 Wi(x)a means that a write by Pi to data item x with the value a has been done
 Ri(x)b means that a read by Pi of data item x returning the value b has been done
 assume that initially each data item is NIL
 consider the following example; write operations are done locally and later propagated to other replicas

behavior of two processes operating on the same data item
a) a strictly consistent data store
b) a data store that is not strictly consistent; P2's first read may happen, for example, 1 nanosecond after P1's write
 the solution is to relax absolute time and consider time intervals
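As an illustration, the following Python sketch checks strict consistency for a recorded history. It assumes an idealized absolute global clock that stamps every operation with a distinct time, which is exactly what a real distributed system lacks; the event tuples, the function name, and the two traces (modeled on cases a and b above) are hypothetical.

```python
from typing import List, Optional, Tuple

# (time, process, op, item, value); op is "W" or "R".
Event = Tuple[float, str, str, str, Optional[str]]
NIL = None  # initial value of every data item


def is_strictly_consistent(history: List[Event]) -> bool:
    """Replay the history in absolute-time order and check that every read
    returns the value of the most recent write on the same item."""
    latest = {}  # item -> value of the most recent write seen so far
    for _time, _proc, op, item, value in sorted(history, key=lambda e: e[0]):
        if op == "W":
            latest[item] = value
        elif op == "R" and latest.get(item, NIL) != value:
            return False  # this read missed the most recent write
    return True


# Case (a): P2 reads x after W1(x)a and sees a.
case_a = [(1.0, "P1", "W", "x", "a"), (2.0, "P2", "R", "x", "a")]
# Case (b): P2's first read comes 1 ns after W1(x)a but still returns NIL.
case_b = [(1.0, "P1", "W", "x", "a"),
          (1.0 + 1e-9, "P2", "R", "x", NIL),
          (2.0, "P2", "R", "x", "a")]
print(is_strictly_consistent(case_a))  # True
print(is_strictly_consistent(case_b))  # False
```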
2. Sequential Consistency
 strict consistency is the ideal, but it is impossible to implement in a distributed system because it relies on absolute global time
 fortunately, most programs do not need strict consistency
 sequential consistency is a slightly weaker consistency model
 a data store is said to be sequentially consistent when it satisfies the following condition:
 The result of any execution is the same as if the (read and write) operations by all processes on the data store were executed in some sequential order and the operations of each individual process appear in this sequence in the order specified by its program.
 i.e., all processes see the same interleaving of operations
 time does not play a role; there is no reference to the “most recent” write operation

 example: four processes operating on the same data item x

a) a sequentially consistent data store
 the write operation of P2 appears to have taken place before that of P1, and it appears so to all processes
b) a data store that is not sequentially consistent
 to P3, it appears as if the data item was first changed to b and later to a, but P4 concludes that the final value is b
 not all processes see the same interleaving of write operations
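Since sequential consistency only asks whether some legal interleaving exists, a tiny history can be checked by brute force. The following Python sketch (hypothetical names and data layout) searches the interleavings that keep each process's program order; the two histories are assumptions modeled on the four-process example, with P1 writing a, P2 writing b, and P3 and P4 each reading x twice. The search is exponential in the worst case, so it only suits small examples.

```python
from functools import lru_cache
from typing import Dict, List, Optional, Tuple

Op = Tuple[str, str, Optional[str]]  # ("W" or "R", item, value)
NIL = None  # initial value of every data item


def is_sequentially_consistent(programs: Dict[str, List[Op]]) -> bool:
    """Return True if some interleaving of all operations keeps each process's
    program order and makes every read return the latest preceding write."""
    procs = list(programs)

    @lru_cache(maxsize=None)
    def search(positions: Tuple[int, ...], store: Tuple[Tuple[str, Optional[str]], ...]) -> bool:
        if all(pos == len(programs[p]) for p, pos in zip(procs, positions)):
            return True  # every operation has been placed in the sequence
        values = dict(store)
        for i, p in enumerate(procs):
            pos = positions[i]
            if pos == len(programs[p]):
                continue  # this process has no operations left
            op, item, value = programs[p][pos]
            advanced = positions[:i] + (pos + 1,) + positions[i + 1:]
            if op == "W":
                new_store = tuple(sorted({**values, item: value}.items()))
                if search(advanced, new_store):
                    return True
            elif values.get(item, NIL) == value and search(advanced, store):
                return True  # the read is legal at this point in the sequence
        return False

    return search(tuple(0 for _ in procs), ())


valid = {
    "P1": [("W", "x", "a")],
    "P2": [("W", "x", "b")],
    "P3": [("R", "x", "b"), ("R", "x", "a")],
    "P4": [("R", "x", "b"), ("R", "x", "a")],
}
invalid = {
    "P1": [("W", "x", "a")],
    "P2": [("W", "x", "b")],
    "P3": [("R", "x", "b"), ("R", "x", "a")],
    "P4": [("R", "x", "a"), ("R", "x", "b")],
}
print(is_sequentially_consistent(valid))    # True
print(is_sequentially_consistent(invalid))  # False
```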
3. Weak Consistency
 there is no need to worry about intermediate results in a critical section, since other processes will not see the data until the process leaves the critical section; only the final result needs to be seen by other processes
 this can be done with a synchronization variable S that has only a single associated operation, synchronize(S), which synchronizes all local copies of the data store
 a process performs operations only on its locally available copy of the store
 when the data store is synchronized, all local writes by process P are propagated to the other copies, and writes by other processes are brought into P's copy

 this leads to weak consistency models, which have three properties
1. Accesses to synchronization variables associated with a data store are sequentially consistent (all processes see all operations on synchronization variables in the same order).
2. No operation on a synchronization variable is allowed to be performed until all previous writes have been completed everywhere.
3. No read or write operation on data items is allowed to be performed until all previous operations on synchronization variables have been performed (all previous synchronizations will have been completed; by doing a synchronization, a process can be sure of getting the most recent values).

 weak consistency enforces consistency on a group of operations, not on individual reads and writes
 e.g., in the following sequences S stands for synchronize; it means that a local copy of a data store is brought up to date

a) a valid sequence of events for weak consistency
b) an invalid sequence for weak consistency; P2 should get b
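A minimal Python model of weak consistency, assuming a single synchronization variable and representing "propagated to the other copies" by a shared dictionary that holds the state agreed at synchronization points (an assumption made for brevity). All names are hypothetical. The usage replays the requirement of case b: once P1 and then P2 have synchronized, P2 must read b.

```python
from typing import Dict, List, Optional


class WeakConsistencyStore:
    """Toy model: each process works on its own local copy; only
    synchronize() exchanges writes with the other copies."""

    def __init__(self, processes: List[str]) -> None:
        self.local: Dict[str, Dict[str, str]] = {p: {} for p in processes}
        self.pending: Dict[str, Dict[str, str]] = {p: {} for p in processes}
        # Assumption: stands in for the state agreed at synchronization points.
        self.shared: Dict[str, str] = {}

    def write(self, p: str, item: str, value: str) -> None:
        self.local[p][item] = value
        self.pending[p][item] = value  # not visible to others until synchronize

    def read(self, p: str, item: str) -> Optional[str]:
        return self.local[p].get(item)  # may be stale between synchronizations

    def synchronize(self, p: str) -> None:
        # Propagate p's local writes to the other copies ...
        self.shared.update(self.pending[p])
        self.pending[p].clear()
        # ... and bring the writes of other processes into p's copy.
        self.local[p] = dict(self.shared)


store = WeakConsistencyStore(["P1", "P2"])
store.write("P1", "x", "a")
store.write("P1", "x", "b")
print(store.read("P2", "x"))  # None: P1 has not synchronized yet
store.synchronize("P1")
store.synchronize("P2")
print(store.read("P2", "x"))  # b
```

Between synchronizations the model simply returns whatever the local copy holds; a real implementation is free to return older or newer values there, since weak consistency makes no promise about them.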
4. Release Consistency
 with the weak consistency model, when a synchronization variable is accessed, the data store does not know whether this is done because the process has finished writing the shared data or is about to start reading it
 if we can separate the two (entering a critical section and leaving it), a more efficient implementation might be possible
 the idea is to selectively guard shared data; the shared data that are kept consistent are said to be protected
 release consistency provides mechanisms to separate the two kinds of operations, i.e., two kinds of synchronization variables
 an acquire operation is used to tell the data store that a critical region is about to be entered
 a release operation is used to tell the data store that a critical region has just been exited

 when a process does an acquire, the store ensures that all the local copies of the protected data are brought up to date to be consistent with the remote ones; it does not guarantee that locally made changes will be sent to other local copies immediately
 when a release is done, protected data that have been changed are propagated out to the other local copies of the store; it does not necessarily import changes from other copies

a valid event sequence for release consistency

 a distributed data store is release consistent if it obeys the following rules:
 Before a read or write operation on shared data is performed, all previous acquires done by the process must have completed successfully.
 Before a release is allowed to be performed, all previous reads and writes by the process must have been completed.
 implementation algorithms:
i. Eager release consistency
 to do an acquire, a process sends a message to a central synchronization manager requesting an acquire on a particular lock
 if there is no competition, the request is granted
 the process then does reads and writes on the shared data locally
 when the release is done, the modified data are sent to the other copies that use them
 after each copy has acknowledged receipt of the data, the synchronization manager is informed of the release
ii. Lazy release consistency
 at the time of a release, nothing is sent anywhere
 instead, when an acquire is done, the process trying to do the acquire has to get the most recent values of the data
 this avoids sending values to processes that do not need them, thereby reducing wasted bandwidth

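The difference between the two algorithms shows up in how much data is moved. The following Python toy model is a sketch under stated assumptions: one lock, no contention, no central synchronization manager, and one "message" counted for each copy that receives data; the class and method names are hypothetical.

```python
from typing import Dict, List, Optional


class ReleaseConsistentStore:
    """Toy model: one lock protects the shared data; every process has a copy.

    Assumptions: acquire() is granted immediately (no contention), and the
    central synchronization manager and acknowledgements are left out.
    """

    def __init__(self, processes: List[str], lazy: bool) -> None:
        self.lazy = lazy
        self.copies: Dict[str, Dict[str, int]] = {p: {} for p in processes}
        self.dirty: Dict[str, Dict[str, int]] = {p: {} for p in processes}
        self.last_releaser: Optional[str] = None  # lazy: holder of the newest values
        self.messages = 0  # number of copies that received data

    def acquire(self, p: str) -> None:
        if self.lazy and self.last_releaser not in (None, p):
            # Lazy: fetch the most recent values only now, when they are needed.
            self.copies[p].update(self.copies[self.last_releaser])
            self.messages += 1

    def read(self, p: str, item: str) -> int:
        return self.copies[p].get(item, 0)

    def write(self, p: str, item: str, value: int) -> None:
        self.copies[p][item] = value
        self.dirty[p][item] = value

    def release(self, p: str) -> None:
        if self.lazy:
            self.last_releaser = p  # nothing is sent anywhere
        else:
            # Eager: push the modified data to every other copy right away.
            for q in self.copies:
                if q != p:
                    self.copies[q].update(self.dirty[p])
                    self.messages += 1
        self.dirty[p].clear()


def run(lazy: bool):
    store = ReleaseConsistentStore(["P1", "P2", "P3", "P4"], lazy=lazy)
    store.acquire("P1")
    store.write("P1", "x", 1)
    store.release("P1")
    store.acquire("P2")
    seen = store.read("P2", "x")  # P2 must see P1's write after its acquire
    store.write("P2", "x", 2)
    store.release("P2")
    return seen, store.messages


print(run(lazy=False))  # (1, 6)
print(run(lazy=True))   # (1, 1)
```

With four processes, eager release pushes P1's and P2's changes to all three other copies each time, while lazy release moves data only to P2, the one process that actually acquires next.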
6.3 Client-Centric Consistency Models
 with many applications, updates happen very rarely
 for these applications, data-centric models, which place high importance on updates, are not suitable
 Eventual Consistency
 there are many applications where few processes (or a single process) update the data while many read it, and there are no write-write conflicts; we need to handle only read-write conflicts; e.g., DNS, Web sites
 for such applications, it is even acceptable for readers to see old versions of the data (e.g., cached versions of a Web page) until the new version is propagated
 with eventual consistency, it is only required that updates are guaranteed to gradually propagate to all replicas
 the problem with eventual consistency arises when different replicas are accessed; e.g., a mobile client accessing a distributed database may get an older version of the data when it uses a new replica as a result of changing location
the principle of a mobile user accessing different replicas of a distributed database

 the solution is to introduce client-centric consistency
 it provides guarantees for a single client concerning the consistency of accesses to a data store by that client; no guarantees are given concerning concurrent accesses by different clients
1. Monotonic Reads
 a data store is said to provide monotonic-read consistency if the following condition holds:
 If a process reads the value of a data item x, any successive read operation on x by that process will always return that same value or a more recent value.
 i.e., a process never sees a version of the data older than what it has already seen
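One way a client-side session could enforce monotonic reads is to remember the highest version of each item it has read and to reject any replica that is older. The following Python sketch assumes per-item version numbers and hypothetical class names; it replays the mobile-user scenario with two replicas at locations L1 and L2.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Replica:
    name: str
    data: Dict[str, Tuple[int, str]] = field(default_factory=dict)  # item -> (version, value)


@dataclass
class Client:
    """Client session that enforces monotonic reads."""
    seen: Dict[str, int] = field(default_factory=dict)  # item -> highest version read so far

    def read(self, replica: Replica, item: str) -> Optional[str]:
        version, value = replica.data.get(item, (0, None))
        if version < self.seen.get(item, 0):
            # Returning this value would move the client back in time.
            raise LookupError(f"{replica.name} is older than what this client has seen")
        self.seen[item] = version
        return value


l1 = Replica("L1", {"x": (2, "v2")})
l2 = Replica("L2", {"x": (1, "v1")})  # the newest update has not reached L2 yet
client = Client()
print(client.read(l1, "x"))  # v2
try:
    client.read(l2, "x")     # the mobile client moved to location L2
except LookupError as err:
    print(err)               # L2 is older than what this client has seen
```

A real system would rather forward the read to an up-to-date replica, or wait until L2 has caught up, than fail.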
2. Writes Follow Reads
 a data store is said to provide writes-follow-reads consistency if:
 A write operation by a process on a data item x following a previous read operation on x by the same process is guaranteed to take place on the same or a more recent value of x than the one that was read.
 i.e., any successive write operation by a process on a data item x will be performed on a copy of x that is up to date with the value most recently read by that process
 this guarantees, for example, that users of a newsgroup see a posted reaction to an article only after they have seen the original article; if B is a response to message A, writes-follow-reads consistency guarantees that B will be written to any copy only after A has been written
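A sketch of writes-follow-reads for the newsgroup example, in Python. It assumes that the client session records which messages it has read and that, before a write, the target replica first fetches those messages from a replica that already has them (the `source` parameter). The names are hypothetical.

```python
from typing import Set


class NewsReplica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.messages: Set[str] = set()


class Session:
    """Client session that enforces writes-follow-reads."""

    def __init__(self) -> None:
        self.read_set: Set[str] = set()  # messages this client has read

    def read(self, replica: NewsReplica, msg: str) -> bool:
        if msg in replica.messages:
            self.read_set.add(msg)
            return True
        return False

    def write(self, replica: NewsReplica, msg: str, source: NewsReplica) -> None:
        # Bring the replica up to date with everything this client has read
        # before performing the write; `source` is assumed to hold those messages.
        missing = self.read_set - replica.messages
        if not missing <= source.messages:
            raise LookupError("source replica lacks messages the client has read")
        replica.messages |= missing
        replica.messages.add(msg)


r1, r2 = NewsReplica("R1"), NewsReplica("R2")
r1.messages.add("A")          # article A has not reached R2 yet
s = Session()
s.read(r1, "A")
s.write(r2, "B", source=r1)   # reaction B to A, written at R2
print(sorted(r2.messages))    # ['A', 'B']: A is present wherever B is
```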
6.4 Distribution Protocols
 there are different ways of propagating, i.e., distributing, updates to replicas, independent of the consistency model
 we will discuss
1. replica placement
2. update propagation
3. epidemic protocols
1. Replica Placement
 a major design issue for distributed data stores is deciding where, when, and by whom copies of the data store are to be placed
 three types of copies:
i. permanent replicas
ii. server-initiated replicas
iii. client-initiated replicas
i. Permanent Replicas
 the initial set of replicas that constitute a distributed data store; normally a small number of replicas
 e.g., a Web site, which can be replicated in two forms
 the files that constitute a site are replicated across a limited number of servers on a LAN; a request is forwarded to one of the servers
 mirroring: a Web site is copied to a limited number of servers, called mirror sites, which are geographically spread across the Internet; clients choose one of the mirror sites
ii. Server-Initiated Replicas (push caches)
 Web hosting companies dynamically create replicas to improve performance (e.g., they create a replica near hosts that use the Web site very often)
iii. Client-Initiated Replicas (client caches or simply caches)
 used to improve access time
 a cache is a local storage facility used by a client to temporarily store a copy of the data it has just received
 managing the cache is left entirely to the client; the data store from which the data have been fetched has nothing to do with keeping cached data consistent
2. Update Propagation
 updates are initiated at a client, forwarded to one of the copies, and propagated from there to the other replicas in a way that ensures consistency
 some design issues in propagating updates
i. state versus operations
ii. pull versus push protocols
iii. unicasting versus multicasting
i. State versus Operations
 what is actually propagated? There are three possibilities:
 send only a notification of the update (used by invalidation protocols; useful when the read/write ratio is small); uses little bandwidth
 transfer the modified data (useful when the read/write ratio is high)
 transfer the update operation (also called active replication); this assumes that each machine knows how to perform the operation; uses little bandwidth, but more processing power is needed at each replica
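The three possibilities can be contrasted in a few lines of Python. This is a sketch only: the message classes and the `append` operation are hypothetical, and a "replica" is just a dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Union


@dataclass
class Invalidate:  # 1. notification only: "your copy of item is out of date"
    item: str


@dataclass
class NewState:    # 2. the modified data itself
    item: str
    value: List[int]


@dataclass
class Operation:   # 3. the update operation (active replication)
    item: str
    name: str
    arg: int


# Operations that every replica is assumed to know how to perform.
OPERATIONS: Dict[str, Callable[[List[int], int], List[int]]] = {
    "append": lambda value, arg: value + [arg],
}


class Replica:
    def __init__(self) -> None:
        self.data: Dict[str, Optional[List[int]]] = {}

    def apply(self, msg: Union[Invalidate, NewState, Operation]) -> None:
        if isinstance(msg, Invalidate):
            self.data[msg.item] = None  # must be fetched again before the next read
        elif isinstance(msg, NewState):
            self.data[msg.item] = list(msg.value)  # the whole value travels
        else:
            # Only the operation travels; the replica spends CPU time re-executing it.
            self.data[msg.item] = OPERATIONS[msg.name](self.data[msg.item], msg.arg)


r = Replica()
r.apply(NewState("log", [1, 2, 3]))
r.apply(Operation("log", "append", 4))
print(r.data["log"])  # [1, 2, 3, 4]
r.apply(Invalidate("log"))
print(r.data["log"])  # None
```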
ii. Pull versus Push Protocols
 push-based approach (also called server-based protocols): updates are propagated to other replicas without those replicas even asking for them (used when a high degree of consistency is required and the read/write ratio is high)
 pull-based approach (also called client-based protocols): often used by client caches; a client or a server requests updates from the server whenever needed (used when the read/write ratio is low)
 a comparison between push-based and pull-based protocols; for simplicity, assume multiple clients and a single server

iii. Unicasting versus Multicasting
 multicasting can be combined with the push-based approach; the underlying network takes care of sending a message to multiple receivers
 unicasting is the natural choice for the pull-based approach; the server sends a separate message to each requesting receiver

3. Epidemic Protocols
 update propagation in eventual consistency is often implemented by a class of algorithms known as epidemic protocols
 updates are aggregated into a single message and then exchanged between two servers
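A minimal Python simulation of an epidemic (anti-entropy) protocol, under these assumptions: every item carries a timestamp, the newer version wins, and in each round every server performs a push-pull exchange of all its items with one randomly chosen peer, so updates travel aggregated in a single exchange. The function names and the ten-server setup are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Version = Tuple[int, str]  # (timestamp, value); a larger timestamp is newer


def exchange(a: Dict[str, Version], b: Dict[str, Version]) -> None:
    """Push-pull anti-entropy: both servers keep the newer version of every item."""
    for item in set(a) | set(b):
        newest = max(a.get(item, (0, "")), b.get(item, (0, "")))
        a[item] = newest
        b[item] = newest


def gossip_until_consistent(servers: List[Dict[str, Version]], seed: int = 1) -> int:
    """Each round, every server exchanges updates with one randomly chosen peer.
    Returns the number of rounds until all servers hold the same state."""
    rng = random.Random(seed)
    rounds = 0
    while any(s != servers[0] for s in servers):
        for i, server in enumerate(servers):
            j = rng.choice([k for k in range(len(servers)) if k != i])
            exchange(server, servers[j])
        rounds += 1
    return rounds


servers: List[Dict[str, Version]] = [{} for _ in range(10)]
servers[0]["x"] = (1, "old")
servers[3]["x"] = (2, "new")  # a later update entered at another server
rounds = gossip_until_consistent(servers)
print(all(s["x"] == (2, "new") for s in servers))  # True
```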
6.5 Consistency Protocols
 so far we have concentrated on various consistency models and general design issues
 consistency protocols describe an implementation of a specific consistency model
 there are three types
1. primary-based protocols
 remote-write protocols
 local-write protocols
2. replicated-write protocols
 active replication
 quorum-based protocols
3. cache-coherence protocols
1. Primary-Based Protocols
 each data item x in the data store has an associated primary, which is responsible for coordinating write operations on x
 two approaches: remote-write protocols and local-write protocols
a. Remote-Write Protocols
 all read and write operations are carried out at a single (remote) server; in effect, data are not replicated; traditionally used in client-server systems, where the server may possibly be distributed

primary-based remote-write protocol with a fixed server to which all read and write operations are forwarded
 another approach is primary-backup protocols, where reads can be made from local backup servers while writes must be made directly on the primary server
 the backup servers are updated each time the primary is updated

the principle of primary-backup protocol

 this may lead to performance problems, since it may take time before the process that initiated the write operation is allowed to continue: updates are blocking
 primary-backup protocols provide a straightforward implementation of sequential consistency, since the primary can order all incoming writes
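The following Python sketch shows a blocking primary-backup write. It is a simplification: network messages are replaced by direct method calls, acknowledgements are return values, and all names are hypothetical. The list `order` records the single order in which the primary applied writes, which is why this scheme gives sequential consistency.

```python
from typing import Dict, List, Optional


class Backup:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}

    def update(self, item: str, value: str) -> bool:
        self.data[item] = value
        return True  # acknowledgement sent back to the primary


class Primary(Backup):
    def __init__(self, name: str, backups: List[Backup]) -> None:
        super().__init__(name)
        self.backups = backups
        self.order: List[str] = []  # the single global order of all writes

    def write(self, item: str, value: str) -> None:
        self.data[item] = value
        self.order.append(f"W({item}){value}")
        # Blocking update: the write returns only after every backup acknowledged.
        if not all(b.update(item, value) for b in self.backups):
            raise RuntimeError("a backup did not acknowledge the update")


def read(replica: Backup, item: str) -> Optional[str]:
    return replica.data.get(item)  # reads go to any (local) copy


backups = [Backup("B1"), Backup("B2")]
primary = Primary("P", backups)
primary.write("x", "a")       # returns after B1 and B2 both hold x = a
print(read(backups[1], "x"))  # a
```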

b. Local-Write Protocols
 two approaches
i. there is a single copy; no replicas
 when a process wants to perform an operation on some data item, the single copy of the data item is transferred to the process, after which the operation is performed

primary-based local-write protocol in which a single copy is migrated between processes

 consistency is straightforward
 keeping track of the current location of each data item is a major problem

ii. primary-backup local-write protocol
 the primary migrates between processes that wish to perform a write operation
 multiple, successive write operations can be carried out locally, while (other) reading processes can still access their local copy
 such an improvement is possible only if a nonblocking protocol is followed

primary-backup protocol in which the primary migrates to the process wanting to perform an update

2. Replicated-Write Protocols
 unlike primary-based protocols, write operations can be carried out at multiple replicas; two approaches: active replication and quorum-based protocols
a. Active Replication
 each replica has an associated process that carries out update operations
 updates are generally propagated by means of the write operation itself (the operation is propagated); it is also possible to send the updated data
 the operations need to be done in the same order everywhere, which requires a totally-ordered multicast
 two possibilities to ensure that this order is followed
 Lamport's timestamps, or
 a central sequencer that assigns a unique sequence number to each operation; the operation is first sent to the sequencer, which then forwards it to all replicas
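A sketch of the central-sequencer approach in Python. The hold-back queue, which delays an operation until all lower-numbered operations have been applied, is an assumption about how a replica would use the sequence numbers; the class names and the operations are hypothetical, and message delivery is simulated by calling `deliver` in different orders.

```python
import itertools
from typing import Dict, List, Tuple


class Sequencer:
    """Central sequencer: stamps each operation with a unique sequence number."""

    def __init__(self) -> None:
        self._next = itertools.count(1)

    def stamp(self, op: str) -> Tuple[int, str]:
        return next(self._next), op


class Replica:
    """Applies operations strictly in sequence-number order, whatever the
    order in which the network delivers them."""

    def __init__(self) -> None:
        self.expected = 1
        self.holdback: Dict[int, str] = {}
        self.applied: List[str] = []

    def deliver(self, seq: int, op: str) -> None:
        self.holdback[seq] = op
        # Apply every operation whose predecessors have all been applied.
        while self.expected in self.holdback:
            self.applied.append(self.holdback.pop(self.expected))
            self.expected += 1


seq = Sequencer()
ops = [seq.stamp("x += 1"), seq.stamp("x *= 2"), seq.stamp("x -= 3")]
r1, r2 = Replica(), Replica()
for s, op in ops:
    r1.deliver(s, op)
for s, op in reversed(ops):  # r2 receives the messages in the opposite order
    r2.deliver(s, op)
print(r1.applied == r2.applied)  # True
```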
 a problem is replicated invocations
 suppose object A invokes B, and B invokes C; if object B is replicated, each replica of B will invoke C independently
 this may create inconsistency and other effects; e.g., what if the operation on C is to transfer $10? It would be carried out once per replica of B

the problem of replicated invocations

 one solution is to have a replication-aware communication layer that avoids the same invocation being sent more than once
 when a replicated object B invokes another replicated object C, the invocation request is first assigned the same unique identifier by each replica of B
 a coordinator of the replicas of B forwards its request to all replicas of object C; the other replicas of object B hold back; hence only a single request is sent to each replica of C
 the same mechanism is used to ensure that only a single reply message is returned to the replicas of B

a) forwarding an invocation request from a replicated object
b) returning a reply to a replicated object
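The following Python sketch models the replication-aware layer for the $10 transfer. Assumptions: every replica of B has already computed the same request identifier, the coordinator of B is fixed in advance, and the reply from the first replica of C stands in for the single reply returned by C's coordinator. All names are hypothetical.

```python
from typing import Dict, List, Optional


class AccountReplica:
    """One replica of object C; executes each transfer it is sent."""

    def __init__(self) -> None:
        self.balance = 0

    def transfer(self, amount: int) -> int:
        self.balance += amount
        return self.balance


class ReplicationAwareLayer:
    """Sits between the replicas of B and the replicas of C."""

    def __init__(self, c_replicas: List[AccountReplica], coordinator: str) -> None:
        self.c_replicas = c_replicas
        self.coordinator = coordinator     # the replica of B allowed to send
        self.replies: Dict[str, int] = {}  # request id -> the single reply

    def invoke(self, caller: str, request_id: str, amount: int) -> Optional[int]:
        if caller == self.coordinator and request_id not in self.replies:
            # Only the coordinator's request is forwarded, once, to every replica of C.
            results = [c.transfer(amount) for c in self.c_replicas]
            self.replies[request_id] = results[0]
        # Every replica of B receives the same single reply once it exists;
        # a replica asking before the coordinator has sent gets None (held back).
        return self.replies.get(request_id)


c_replicas = [AccountReplica(), AccountReplica()]
layer = ReplicationAwareLayer(c_replicas, coordinator="B1")
replies = [layer.invoke(b, "req-42", 10) for b in ["B1", "B2", "B3"]]
print([c.balance for c in c_replicas])  # [10, 10]
print(replies)                          # [10, 10, 10]
```

Without the layer, each of the three replicas of B would trigger its own transfer, and every replica of C would end up with a balance of 30.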
3. Cache-Coherence Protocols
 caches form a special case of replication, as they are controlled by clients instead of servers
 cache-coherence protocols ensure that a cache is consistent with the server-initiated replicas
 two design issues in implementing caches: coherence detection and coherence enforcement
 coherence detection strategy: when inconsistencies are actually detected
 static solution: prior to execution, a compiler performs the analysis to determine which data may lead to inconsistencies if cached, and inserts instructions that avoid inconsistencies
 dynamic solution: at runtime, a check is made with the server to see whether cached data have been modified since they were cached

 coherence enforcement strategy: how caches are kept consistent with the copies stored at the servers
 simplest solution: do not allow shared data to be cached; this gives up the performance improvement that caching would bring
 allow shared data to be cached and either
 let a server send an invalidation to all caches whenever a data item is modified, or
 propagate the update
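Both enforcement options can be sketched in Python with a server that knows which caches hold each item. The class names and the `propagate_updates` flag are hypothetical; direct method calls stand in for the invalidation and update messages.

```python
from typing import Dict, List, Optional


class Cache:
    def __init__(self, server: "Server") -> None:
        self.server = server
        self.entries: Dict[str, str] = {}
        server.caches.append(self)

    def read(self, item: str) -> Optional[str]:
        if item not in self.entries:  # miss, or the entry was invalidated
            value = self.server.data.get(item)
            if value is not None:
                self.entries[item] = value
            return value
        return self.entries[item]


class Server:
    def __init__(self, propagate_updates: bool = False) -> None:
        self.data: Dict[str, str] = {}
        self.caches: List[Cache] = []
        self.propagate_updates = propagate_updates

    def write(self, item: str, value: str) -> None:
        self.data[item] = value
        for cache in self.caches:
            if item in cache.entries:
                if self.propagate_updates:
                    cache.entries[item] = value  # propagate the update
                else:
                    del cache.entries[item]      # send an invalidation


server = Server()       # invalidation strategy
cache = Cache(server)
server.write("x", "a")
cache.read("x")         # x is now cached
server.write("x", "b")  # the cached copy of x is invalidated
print("x" in cache.entries, cache.read("x"))  # False b
```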
