CS476

Parallel and Distributed Computing

Module 8
Consistency & Replication

Contents
1. Consistency: Performance and scalability, Consistency models, serializability,
transactions, Basic architecture, Web-cache consistency.

2. Replication: Replica placement, Content replication, Managing replicated
objects, Replicated-write protocols, Implementing client-centric consistency,
and alternatives for caching and replication.

Weekly Learning Outcomes
1. Learn about Consistency and Replication.

Required Reading
Chapter 7: Consistency and Replication (Distributed Systems, 4th Edition, Version 4.01).
Author(s): Maarten van Steen, Andrew S. Tanenbaum. Publisher: CreateSpace; 3.01 edition
(January 2023). ISBN: 978-90-815406-3-6 (printed version).

Recommended Reading
https://research.iaun.ac.ir/pd/faramarz_safi/pdfs/UploadFile_9481.pdf

https://www.youtube.com/watch?v=pdxGtahoqlY

Consistency and Replication
An important issue in distributed systems is the replication of data. Data
are generally replicated to enhance reliability or improve performance.
One of the major problems is keeping replicas consistent. Informally, this
means that when one copy is updated, we need to ensure that the other
copies are updated as well; otherwise, the replicas will no longer be the
same. The main questions here are why replication is useful, how it
relates to scalability, and what consistency actually means. We start
by concentrating on managing replicas, which involves not only the
placement of replica servers, but also how content is distributed to these
servers. The second issue is how replicas are kept consistent. In most
cases, applications require a strong form of consistency. Informally, this
means that updates are to be propagated more or less immediately
between replicas.
Replication
Why replicate
Assume a simple model in which we make a copy of a specific part of a system
(meaning code and data).
Increase reliability: if one copy does not live up to specifications, switch over to the
other copy while repairing the failing one.
Performance: simply spread requests between different replicated parts to keep
load balanced, or to ensure quick responses by taking proximity into account.
The problem
Having multiple copies means that when any copy changes, that change should
be made at all copies: replicas need to be kept the same, that is, be kept
consistent.
Performance and scalability
Main issue
To keep replicas consistent, we generally need to ensure that all
conflicting operations are done in the same order everywhere.
Conflicting operations: From the world of transactions
Read–write conflict: a read operation and a write operation act
concurrently
Write–write conflict: two concurrent write operations
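
The two conflict types can be illustrated with a small classifier. This is only a sketch: the `Op` record and its field names are hypothetical, and it assumes (as is usual for transactions) that two concurrent operations conflict only when they come from different processes and touch the same data item.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Op:
    process: str  # issuing process (hypothetical label)
    kind: str     # "R" for read, "W" for write
    item: str     # data item the operation touches


def conflict_type(a: Op, b: Op) -> Optional[str]:
    """Classify two concurrent operations; None means they do not conflict."""
    # Assumption: same process or different data items never conflict.
    if a.process == b.process or a.item != b.item:
        return None
    kinds = {a.kind, b.kind}
    if kinds == {"W"}:
        return "write-write"
    if kinds == {"R", "W"}:
        return "read-write"
    return None  # two reads never conflict
```

Only pairs for which `conflict_type` returns a value need to be ordered identically at every replica; all other pairs can be applied in any order.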

Issue
Guaranteeing global ordering on conflicting operations may be a costly
operation, downgrading scalability. Solution: weaken consistency
requirements so that hopefully global synchronization can be avoided
Data-centric consistency models
Consistency model
A contract between a (distributed) data store and processes, in which the data
store specifies precisely what the results of read and write operations are in the
presence of concurrency.
Essential
A data store is a distributed collection of storages.
Sequential consistency
Definition
The result of any execution is the same as if the operations of all processes
were executed in some sequential order, and the operations of each individual
process appear in this sequence in the order specified by its program.

A sequentially consistent data store

A data store that is not sequentially consistent


Causal consistency
Definition
Writes that are potentially causally related must be seen by all processes in the
same order. Concurrent writes may be seen in a different order by different
processes.

A violation of a causally-consistent store

A correct sequence of events in a causally-consistent store


Consistency models, serializability, transactions
Sequential Consistency
Overwhelming, but often already known
Again, from the world of transactions: can we order the execution of all operations
in a set of transactions in such a way that the final result matches a serial
execution of those transactions? The keyword is serializability.
Transaction T1       Transaction T2       Transaction T3
BEGIN TRANSACTION    BEGIN TRANSACTION    BEGIN TRANSACTION
x = 0                x = 0                x = 0
x = x + 1            x = x + 2            x = x + 3
END TRANSACTION      END TRANSACTION      END TRANSACTION

A number of schedules (time runs from left to right)

S1   x = 0   x = x + 1   x = 0       x = x + 2   x = 0       x = x + 3   Legal
S2   x = 0   x = 0       x = x + 1   x = x + 2   x = 0       x = x + 3   Legal
S3   x = 0   x = 0       x = x + 1   x = 0       x = x + 2   x = x + 3   Illegal
S4   x = 0   x = 0       x = x + 3   x = 0       x = x + 1   x = x + 2   Illegal
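
As a rough illustration, the sketch below replays the first three schedules and compares the final value of x with the values that the serial executions of T1, T2, and T3 can produce. Comparing final values is only a necessary check, not a full serializability test, and the `("set", v)` / `("add", v)` step encoding is an assumption made for this example.

```python
from itertools import permutations

# Each transaction as a list of steps on x (encoding is hypothetical).
TRANSACTIONS = {
    "T1": [("set", 0), ("add", 1)],
    "T2": [("set", 0), ("add", 2)],
    "T3": [("set", 0), ("add", 3)],
}


def run(steps):
    """Execute the steps in order and return the final value of x."""
    x = None
    for op, value in steps:
        x = value if op == "set" else x + value
    return x


def serial_results():
    """Final values of x over all serial orders of the transactions."""
    results = set()
    for order in permutations(TRANSACTIONS):
        steps = [step for name in order for step in TRANSACTIONS[name]]
        results.add(run(steps))
    return results


S1 = [("set", 0), ("add", 1), ("set", 0), ("add", 2), ("set", 0), ("add", 3)]
S2 = [("set", 0), ("set", 0), ("add", 1), ("add", 2), ("set", 0), ("add", 3)]
S3 = [("set", 0), ("set", 0), ("add", 1), ("set", 0), ("add", 2), ("add", 3)]

for name, schedule in [("S1", S1), ("S2", S2), ("S3", S3)]:
    final = run(schedule)
    print(name, final, "matches a serial result:", final in serial_results())
```

S1 and S2 end with a value that some serial order also produces, whereas S3 ends with x = 5, which no serial execution can yield.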


Eventual consistency WhatsApp

Definition
Consider a collection of data stores and (concurrent) write operations. The stores are
eventually consistent when, in the absence of updates from a certain moment on, all updates up to
that point are propagated in such a way that replicas will have the same data stored (until
updates are accepted again).
Strong eventual consistency
Basic idea: if there are conflicting updates, have a globally determined resolution mechanism
(for example, using NTP, the Network Time Protocol, simply let the “most recent” update win).
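
A minimal sketch of such a "most recent update wins" rule is shown below. The `Versioned` record and the replica-id tie-breaker are assumptions added for illustration; the slide only states that timestamps (for example from NTP-synchronized clocks) decide the winner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float  # assumed to come from NTP-synchronized clocks
    replica_id: str   # assumed tie-breaker so all replicas pick the same winner


def merge(a: Versioned, b: Versioned) -> Versioned:
    """Resolve two conflicting updates: latest timestamp wins, ties go to the larger id."""
    return max(a, b, key=lambda v: (v.timestamp, v.replica_id))
```

Because `merge` gives the same answer regardless of argument order, every replica that has seen the same set of updates ends up with the same value without any global synchronization.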
Program consistency
P is a monotonic problem if for any input sets S and T with S ⊆ T, P(S) ⊆ P(T). Observation: A
program solving a monotonic problem can start with incomplete information, but is guaranteed not
to have to roll back when missing information becomes available. Example: filling a shopping cart.
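
The shopping-cart example can be sketched as follows, assuming the input is a set of hypothetical `("add", item)` events and the program's output is the set of items in the cart.

```python
def cart_contents(events):
    """Monotonic program: the output is the set of all items added so far."""
    return frozenset(item for action, item in events if action == "add")


# Incomplete input S and a later, more complete input T with S ⊆ T.
partial = {("add", "book"), ("add", "pen")}
complete = partial | {("add", "lamp")}

# Output for the partial input is contained in the output for the complete
# input, so nothing computed early ever has to be rolled back.
print(cart_contents(partial) <= cart_contents(complete))
```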
Important observation
In all cases, we are avoiding global synchronization.
Consistency for mobile users
Example
Consider a distributed database to which you have access through your notebook. Assume your
notebook acts as a front end to the database.
At location A you access the database doing reads and updates.
At location B you continue your work, but unless you access the same server as the one at location
A, you may detect inconsistencies:
your updates at A may not have yet been propagated to B
you may be reading newer entries than the ones available at A
your updates at B may eventually conflict with those at A

Note
The only thing you really want is that the entries you updated and/or read at A, are in B the way you
left them in A. In that case, the database will appear to be consistent to you.
Basic architecture
The principle of a mobile user accessing different replicas of a
distributed database
Example: ZooKeeper consistency
Yet another model?
ZooKeeper’s consistency model mixes elements of data-centric and
client-centric models
Take a naive example
Replica placement
Essence
Figure out what the best K places are out of N possible locations.
- Select the best location out of N − K for which the average distance to clients is
minimal. Then choose the next best server. (Note: The first chosen location
minimizes the average distance to all clients.) Computationally expensive.
- Select the K-th largest autonomous system and place a server at the best-
connected host. Computationally expensive.
- Position nodes in a d-dimensional geometric space, where distance reflects
latency. Identify the K regions with highest density and place a server in every
one. Computationally cheap.
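
The first approach (pick the best location, then the next best, and so on) can be sketched as a greedy loop. The distance matrix and the function name below are made up for illustration; `distance[c][s]` is assumed to hold the latency from client `c` to candidate site `s`.

```python
def greedy_placement(distance, k):
    """Choose k sites one at a time, each time adding the site that minimizes
    the average distance from every client to its nearest chosen site."""
    n_sites = len(distance[0])
    chosen = []
    for _ in range(k):
        def avg_cost(candidate):
            sites = chosen + [candidate]
            total = sum(min(row[s] for s in sites) for row in distance)
            return total / len(distance)

        remaining = [s for s in range(n_sites) if s not in chosen]
        chosen.append(min(remaining, key=avg_cost))
    return chosen


# Four clients (rows) and three candidate sites (columns); values are invented.
distance = [
    [1, 4, 9],
    [2, 4, 8],
    [9, 4, 1],
    [8, 4, 3],
]
print(greedy_placement(distance, 2))
```

Every round re-evaluates all remaining candidates against all clients, which hints at why this approach is labeled computationally expensive.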
Content replication

Distinguish different processes
A process is capable of hosting a replica of an object or data:
- Permanent replicas: Process/machine always having a replica.
- Server-initiated replica: Process that can dynamically host a replica on
request of another server in the data store.
- Client-initiated replica: Process that can dynamically host a replica on
request of a client (client cache).
Content replication
The logical organization of different kinds of copies of
a data store into three concentric rings.
Server-initiated replicas
Counting access requests from different clients
Keep track of access counts per file, aggregated by considering the
server closest to requesting clients:
- Number of accesses drops below threshold D ⇒ drop file
- Number of accesses exceeds threshold R ⇒ replicate file
- Number of accesses between D and R ⇒ migrate file
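
The three threshold rules can be written as a small decision function. The function and parameter names are hypothetical, and treating a count exactly equal to D or R as "migrate" is an assumption, since the slide does not say what happens on the boundaries.

```python
def replica_action(access_count, deletion_threshold, replication_threshold):
    """Decide what a server does with a file, given its access count.

    deletion_threshold plays the role of D and replication_threshold of R.
    """
    if deletion_threshold >= replication_threshold:
        raise ValueError("D must be smaller than R")
    if access_count < deletion_threshold:
        return "drop"
    if access_count > replication_threshold:
        return "replicate"
    return "migrate"  # counts between D and R (boundaries included by assumption)
```

For example, with D = 10 and R = 50, a file with 3 accesses is dropped, one with 80 accesses is replicated, and one with 25 accesses becomes a candidate for migration.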
Managing replicated objects
Prevent concurrent execution of multiple invocations on the same object:
access to the internal data of an object has to be serialized. Local
locking mechanisms are sufficient.
Ensure that all changes to the replicated state of the object are the same:
no two independent method invocations take place on different replicas at
the same time: we need deterministic thread scheduling.
Replicated-object invocations
Problem when invoking a replicated object
Replicated-object invocations

Forwarding a request Returning the reply


Primary-based protocols
Primary-backup protocol

Example primary-backup protocol


Traditionally applied in distributed databases and file systems that require a high degree
of fault tolerance. Replicas are often placed on the same LAN.
Replicated-write protocols
Quorum-based protocols
Assume N replicas. Ensure that each operation is carried out in such a way that a
majority vote is established: distinguish read quorum NR and write quorum NW . Ensure:
1. NR + NW > N (prevent read-write conflicts)
2. NW > N/2 (prevent write-write conflicts)

Figure panels: a correct choice; a choice that may lead to a write-write conflict; a correct choice (ROWA, read one, write all).
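
The two quorum constraints are easy to check mechanically. In the sketch below the function names are hypothetical, and the sample values (N = 12 with three different quorum choices) are illustrative numbers picked to produce one correct choice, one with a possible write-write conflict, and one ROWA configuration.

```python
def check_quorum(n, n_r, n_w):
    """Evaluate both constraints for N replicas, read quorum NR, write quorum NW."""
    return {
        "read_write_safe": n_r + n_w > n,   # constraint 1: NR + NW > N
        "write_write_safe": n_w > n / 2,    # constraint 2: NW > N/2
    }


def is_valid_quorum(n, n_r, n_w):
    return all(check_quorum(n, n_r, n_w).values())


for n_r, n_w in [(3, 10), (7, 6), (1, 12)]:
    print(n_r, n_w, check_quorum(12, n_r, n_w))
```

With NR = 7 and NW = 6 the first constraint holds but the second does not, because two disjoint groups of 6 replicas could each accept a different write.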
Continuous consistency: Numerical errors

Principal operation
Every server Si has a log, denoted as Li.
Consider a data item x and let val(W) denote the
numerical change in its value after a write operation W.
Assume that
∀W : val(W) > 0

W is initially forwarded to one of the N replicas,
denoted as origin(W). TW[i, j] are the writes executed
by server Si that originated from Sj:

TW[i, j] = ∑ { val(W) | origin(W) = Sj ∧ W ∈ Li }
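
The definition of TW[i, j] can be computed directly from the logs. This is a sketch under the assumption that each log entry records the index of its origin server and its numerical change; the `Write` record and the sample logs are invented for the example.

```python
from collections import namedtuple

# origin: index j of the server S_j where the write was first submitted
# val:    numerical change val(W), assumed to be strictly positive
Write = namedtuple("Write", ["origin", "val"])


def tw_matrix(logs):
    """logs[i] is the log L_i of server S_i; returns TW as a list of rows."""
    n = len(logs)
    tw = [[0] * n for _ in range(n)]
    for i, log in enumerate(logs):
        for w in log:
            if w.val <= 0:
                raise ValueError("val(W) must be positive")
            tw[i][w.origin] += w.val
    return tw


# Two servers: S0 has executed three writes, S1 has executed two.
logs = [
    [Write(0, 2), Write(0, 1), Write(1, 5)],
    [Write(1, 5), Write(0, 2)],
]
print(tw_matrix(logs))
```

In this sample, TW[0][0] = 3 while TW[1][0] = 2, so S1 has not yet seen a write from S0 that changes x by 1.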


Implementing client-centric consistency
Keeping it simple
Each write operation W is assigned a globally unique identifier by its origin server. For each
client, we keep track of two sets of writes:
Read set: the (identifiers of the) writes relevant for that client’s read operations
Write set: the (identifiers of the) client’s write operations.

Monotonic-read consistency
When client C wants to read at server S, C passes its read set. S can pull in any updates before
executing the read operation, after which the read set is updated.
Monotonic-write consistency
When client C wants to write at server S, C passes its write set. S can pull in any updates,
executes them in the correct order, and then executes the write operation, after which the write set
is updated.
Implementing client-centric consistency

Read-your-writes consistency
When client C wants to read at server S, C passes its write set. S can
pull in any updates before executing the read operation, after which the
read set is updated.
Writes-follow-reads consistency
When client C wants to write at server S, C passes its read set. S can
pull in any updates, executes them in the correct order, and then
executes the write operation, after which the write set is updated.
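
The read-set and write-set bookkeeping can be sketched with two small classes. Everything here is a simplification for illustration: the class and method names are hypothetical, write identifiers are assumed to be (counter, server name) pairs, the write with the largest identifier is assumed to win, and the client passes both sets on every operation so that all four guarantees are covered at once.

```python
class Server:
    """A replica holding writes keyed by a globally unique id (counter, name)."""

    def __init__(self, name):
        self.name = name
        self.writes = {}  # write id -> (key, value)

    def new_write(self, key, value):
        # The origin server assigns the id; the counter exceeds every id seen here.
        counter = 1 + max((wid[0] for wid in self.writes), default=0)
        wid = (counter, self.name)
        self.writes[wid] = (key, value)
        return wid

    def pull(self, wids, peers):
        # Fetch every listed write that this replica has not executed yet.
        for wid in sorted(wids):
            if wid not in self.writes:
                source = next(p for p in peers if wid in p.writes)
                self.writes[wid] = source.writes[wid]

    def read(self, key):
        # Assumed ordering rule: the write with the largest id on this key wins.
        wids = [wid for wid, (k, _) in self.writes.items() if k == key]
        if not wids:
            return None, None
        latest = max(wids)
        return self.writes[latest][1], latest


class Client:
    def __init__(self):
        self.read_set, self.write_set = set(), set()

    def read(self, server, key, peers):
        # Monotonic reads pass the read set; read-your-writes passes the write set.
        server.pull(self.read_set | self.write_set, peers)
        value, wid = server.read(key)
        if wid is not None:
            self.read_set.add(wid)
        return value

    def write(self, server, key, value, peers):
        # Monotonic writes pass the write set; writes-follow-reads passes the read set.
        server.pull(self.read_set | self.write_set, peers)
        self.write_set.add(server.new_write(key, value))
```

A client that writes x at server A and then reads x at server B gets its own value back, because B first pulls in the writes listed in the client's sets.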
Example: replication in the Web
Client-side caches
In the browser
At a client’s site, notably through a Web proxy

Caches at ISPs
Internet Service Providers also place caches to (1) reduce
cross-ISP traffic and (2) improve client-side performance. May
get nasty when a request needs to pass many ISPs.
Cooperative caching
Web-cache consistency
How to guarantee freshness?
To prevent stale information from being returned to a client:
Option 1: let the cache contact the original server to see if content is
still up to date.
Option 2: Assign an expiration time Texpire that depends on how long
ago the document was last modified when it is cached. If Tlast_modified is
the last modification time of a document (as recorded by its owner),
and Tcached is the time it was cached, then

Texpire = α(Tcached − Tlast_modified) + Tcached

with α = 0.2. Until Texpire, the document is considered valid.
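
The expiration rule translates directly into a one-line computation. The function name and the use of plain numbers of seconds as timestamps are assumptions made for this sketch.

```python
def expiration_time(t_last_modified, t_cached, alpha=0.2):
    """Texpire = alpha * (Tcached - Tlast_modified) + Tcached."""
    if t_cached < t_last_modified:
        raise ValueError("a document cannot be cached before it was last modified")
    return alpha * (t_cached - t_last_modified) + t_cached


# A document last modified at t = 0 s and cached at t = 1000 s stays valid
# for a further 20% of its age at caching time.
print(expiration_time(0, 1000))
```

Documents that have not changed for a long time therefore get a long validity period, while recently modified documents expire quickly.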


Alternatives for caching and replication

Database copy: the edge server holds the same database as the origin server.
Content-aware cache: check if a (normal) query can be answered with cached data. Requires
that the server knows which data is cached at the edge.
Content-blind cache: store a query, and its result. When the exact same query is issued
again, return the result from the cache.
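
A content-blind cache can be sketched as a dictionary keyed by the exact query text. The class name and the `origin` callable (standing in for a request to the origin server) are hypothetical.

```python
class ContentBlindCache:
    """Edge cache that stores query results keyed by the exact query string."""

    def __init__(self, origin):
        self.origin = origin  # callable that runs a query at the origin server
        self.store = {}
        self.hits = 0

    def query(self, text):
        if text in self.store:      # exact same query seen before
            self.hits += 1
            return self.store[text]
        result = self.origin(text)  # cache miss: forward to the origin
        self.store[text] = result
        return result
```

Because the key is the literal query text, two queries that differ only in whitespace or ordering are treated as different, which is the price of not inspecting the content.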
