Slides
Module 8
Consistency & Replication
Contents
1. Consistency: performance and scalability, consistency models, serializability,
transactions, basic architecture, Web-cache consistency.
Weekly Learning Outcomes
1. Learn about Consistency and Replication.
Required Reading
Chapter 7: Consistency and Replication, in Distributed Systems, 4th Edition, Version 4.01.
Author(s): Maarten van Steen, Andrew S. Tanenbaum. Publisher: CreateSpace; 3.01 edition
(January 2023). ISBN: 978-90-815406-3-6 (printed version).
Recommended Reading
https://research.iaun.ac.ir/pd/faramarz_safi/pdfs/UploadFile_9481.pdf
https://www.youtube.com/watch?v=pdxGtahoqlY
Consistency and Replication
An important issue in distributed systems is the replication of data. Data
are generally replicated to enhance reliability or improve performance.
One of the major problems is keeping replicas consistent. Informally, this
means that when one copy is updated, we need to ensure that the other
copies are updated as well; otherwise, the replicas will no longer be the
same. The main questions here are: why is replication useful, how does it
relate to scalability, and what does consistency actually mean? First, we
concentrate on managing replicas, which considers not only the
placement of replica servers, but also how content is distributed to these
servers. The second issue is how replicas are kept consistent. In most
cases, applications require a strong form of consistency. Informally, this
means that updates are to be propagated more or less immediately
between replicas.
Replication
Why replicate
Assume a simple model in which we make a copy of a specific part of a system
(meaning code and data).
Increase reliability: if one copy does not live up to specifications, switch over to the
other copy while repairing the failing one.
Performance: simply spread requests between different replicated parts to keep
load balanced, or to ensure quick responses by taking proximity into account.
The problem
Having multiple copies means that when any copy changes, that change should
be made at all copies: replicas need to be kept the same, that is, be kept
consistent.
Performance and scalability
Main issue
To keep replicas consistent, we generally need to ensure that all
conflicting operations are done in the same order everywhere.
Conflicting operations: From the world of transactions
Read–write conflict: a read operation and a write operation act
concurrently
Write–write conflict: two concurrent write operations
Issue
Guaranteeing a global ordering on conflicting operations may be costly,
which hurts scalability. Solution: weaken consistency
requirements so that, hopefully, global synchronization can be avoided.
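The two conflict types above can be made concrete with a small classifier. This is a minimal sketch, assuming the two operations are already known to be concurrent; the `Op` record and the `conflict` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Op:
    process: str  # issuing process, e.g. "P1" (illustrative)
    kind: str     # "R" for read, "W" for write
    item: str     # data item the operation acts on, e.g. "x"


def conflict(a: Op, b: Op) -> Optional[str]:
    """Classify two concurrent operations; None means they do not conflict."""
    # Operations on different items, or from the same process, are not
    # treated as conflicting in this sketch.
    if a.item != b.item or a.process == b.process:
        return None
    kinds = {a.kind, b.kind}
    if kinds == {"W"}:
        return "write-write"
    if kinds == {"R", "W"}:
        return "read-write"
    return None  # two reads never conflict
```

Only pairs for which `conflict` returns a value need to be ordered identically at every replica; all other pairs can be applied in any order.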
Data-centric consistency models
Consistency model
A contract between a (distributed) data store and processes, in which the data
store specifies precisely what the results of read and write operations are in the
presence of concurrency.
Essential
A data store is a distributed collection of storages.
Sequential consistency
Definition
The result of any execution is the same as if the operations of all processes
were executed in some sequential order, and the operations of each individual
process appear in this sequence in the order specified by its program.
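The definition can be checked by brute force for tiny executions. The sketch below is an illustration under stated assumptions, not an efficient algorithm: each process history is a list of hypothetical tuples `("W", item, value)` or `("R", item, value_returned)`, every item starts out as `None`, and the function searches for one interleaving that respects each process's program order and in which every read returns the most recently written value.

```python
from typing import Dict, List, Optional, Tuple

# ("W", item, value) or ("R", item, value_returned); illustrative encoding.
Op = Tuple[str, str, Optional[str]]


def sequentially_consistent(histories: List[List[Op]]) -> bool:
    """Return True if some legal interleaving explains all observed reads."""

    def search(positions: Tuple[int, ...], store: Dict[str, Optional[str]]) -> bool:
        if all(p == len(h) for p, h in zip(positions, histories)):
            return True  # every operation has been placed in the sequence
        for i, history in enumerate(histories):
            p = positions[i]
            if p == len(history):
                continue  # this process has no operations left
            kind, item, value = history[p]
            if kind == "R" and store.get(item) != value:
                continue  # this read cannot be the next operation
            new_store = dict(store)
            if kind == "W":
                new_store[item] = value
            nxt = positions[:i] + (p + 1,) + positions[i + 1:]
            if search(nxt, new_store):
                return True
        return False

    return search(tuple(0 for _ in histories), {})
```

For example, if two readers both see `b` and then `a` for item `x`, an ordering exists; if one sees `b` then `a` and the other sees `a` then `b`, no single sequential order can explain both, and the function returns `False`.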
A number of schedules (figure omitted; time runs from left to right)
Eventual consistency
Definition
Consider a collection of data stores and (concurrent) write operations. The stores are
eventually consistent when, in the absence of updates from a certain moment on, all updates up to that point
are propagated in such a way that replicas will have the same data stored (until updates are
accepted again).
Strong eventual consistency
Basic idea: if there are conflicting updates, have a globally determined resolution mechanism
(for example, using NTP, the Network Time Protocol, simply let the “most recent” update win).
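A "most recent update wins" rule can be sketched as a deterministic merge function. This is a minimal illustration, assuming each update carries a timestamp from an (approximately) NTP-synchronized clock; the `Stamped` record and the replica-id tie-breaker are assumptions added here so that two updates with equal timestamps still resolve the same way everywhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stamped:
    value: str
    timestamp: float  # assumed to come from an NTP-synchronized clock
    replica_id: str   # hypothetical tie-breaker for equal timestamps


def merge(a: Stamped, b: Stamped) -> Stamped:
    """Resolve a conflict: the update with the larger (timestamp, id) wins."""
    return max(a, b, key=lambda s: (s.timestamp, s.replica_id))
```

Because `merge` is commutative, associative, and idempotent, replicas that have received the same set of updates end up with the same value regardless of delivery order, without any global synchronization.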
Program consistency
P is a monotonic problem if for any input sets S and T with S ⊆ T, P(S) ⊆ P(T). Observation: A program
solving a monotonic problem can start with incomplete information, but is guaranteed not to
have to roll back when missing information becomes available. Example: filling a shopping cart.
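The shopping-cart example can be sketched as follows, assuming the cart only ever receives "add item" messages (removals would break monotonicity). The function name `cart_contents` is hypothetical.

```python
from itertools import permutations
from typing import FrozenSet, Iterable


def cart_contents(additions: Iterable[str]) -> FrozenSet[str]:
    """Compute the cart from the 'add' messages seen so far."""
    cart = set()
    for item in additions:
        cart.add(item)  # adding never invalidates anything already in the cart
    return frozenset(cart)


adds = ["milk", "bread", "eggs"]
full = cart_contents(adds)

# Any delivery order yields the same cart ...
same_everywhere = all(cart_contents(p) == full for p in permutations(adds))
# ... and a result computed from partial input is a subset of the final
# result, so it never has to be rolled back.
partial_is_subset = cart_contents(adds[:2]) <= full
```

Both `same_everywhere` and `partial_is_subset` hold here because set union only grows with more input, which is what the monotonicity condition demands.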
Important observation
In all cases, we are avoiding global synchronization.
Consistency for mobile users
Example
Consider a distributed database to which you have access through your notebook. Assume your
notebook acts as a front end to the database.
At location A you access the database doing reads and updates.
At location B you continue your work, but unless you access the same server as the one at location
A, you may detect inconsistencies:
your updates at A may not have yet been propagated to B
you may be reading newer entries than the ones available at A
your updates at B may eventually conflict with those at A
Note
The only thing you really want is that the entries you updated and/or read at A, are in B the way you
left them in A. In that case, the database will appear to be consistent to you.
Basic architecture
The principle of a mobile user accessing different replicas of a
distributed database
Example: ZooKeeper consistency
Yet another model?
ZooKeeper’s consistency model mixes elements of data-centric and
client-centric models
Take a naive example
Replica placement
Essence
Figure out what the best K places are out of N possible locations.
Select the best location out of N − K for which the average distance to clients is
minimal. Then choose the next best server. (Note: The first chosen location
minimizes the average distance to all clients.) Computationally expensive.
Select the K-th largest autonomous system and place a server at the
best-connected host. Computationally expensive.
Position nodes in a d-dimensional geometric space, where distance reflects
latency. Identify the K regions with highest density and place a server in every
one. Computationally cheap.
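The first strategy (repeatedly pick the location that minimizes the average client distance) can be sketched as a greedy loop. This is an illustration under assumptions: `distance` is a hypothetical table mapping each candidate location to its distance to every client, and each client is assumed to be served by the nearest chosen location.

```python
from typing import Dict, List


def greedy_placement(distance: Dict[str, List[float]], k: int) -> List[str]:
    """Pick k locations one at a time, each minimizing the average distance."""
    n_clients = len(next(iter(distance.values())))
    chosen: List[str] = []
    remaining = set(distance)

    def avg_cost(candidates: List[str]) -> float:
        # Every client is served by its nearest location among the candidates.
        total = sum(min(distance[loc][c] for loc in candidates)
                    for c in range(n_clients))
        return total / n_clients

    while len(chosen) < k and remaining:
        # sorted() makes tie-breaking deterministic.
        best = min(sorted(remaining), key=lambda loc: avg_cost(chosen + [loc]))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

Each round re-evaluates every remaining location against every client, which is where the "computationally expensive" label comes from.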
Content replication
Principal operation
Every server S_i has a log, denoted as L_i.
Consider a data item x and let val(W) denote the
numerical change in its value after a write operation W.
Assume that
∀W : val(W) > 0
Monotonic-read consistency
When client C wants to read at server S, C passes its read set. S can pull in any updates before
executing the read operation, after which the read set is updated.
Monotonic-write consistency
When client C wants to write at server S, C passes its write set. S can pull in any updates,
executes them in the correct order, and then executes the write operation, after which the write set
is updated.
Implementing client-centric consistency
Read-your-writes consistency
When client C wants to read at server S, C passes its write set. S can
pull in any updates before executing the read operation, after which the
read set is updated.
Writes-follow-reads consistency
When client C wants to write at server S, C passes its read set. S can
pull in any updates, executes them in the correct order, and then
executes the write operation, after which the write set is updated.
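The four client-centric guarantees share one mechanism: the client carries a read set and a write set, and the server pulls in whatever is missing before serving the request. The sketch below is a simplified illustration, not a definitive implementation. It assumes globally ordered integer write identifiers (a stand-in for real timestamps), and the `Server` and `Client` classes, their methods, and the `peers` argument are all hypothetical. For brevity, both sets are passed on every operation, which enforces all four guarantees at once.

```python
import itertools

_ids = itertools.count(1)  # hypothetical source of globally ordered write ids


class Server:
    def __init__(self):
        self.log = {}   # write id -> (key, value) for every applied write
        self.data = {}  # key -> (write id, value) of the newest applied write

    def apply(self, wid, key, value):
        if wid in self.log:
            return
        self.log[wid] = (key, value)
        # Never let an older write overwrite a newer one.
        if key not in self.data or self.data[key][0] < wid:
            self.data[key] = (wid, value)

    def pull(self, needed, peers):
        """Fetch and apply, in id order, the needed writes this server lacks."""
        for wid in sorted(needed - set(self.log)):
            for peer in peers:
                if wid in peer.log:
                    self.apply(wid, *peer.log[wid])
                    break


class Client:
    def __init__(self):
        self.read_set = set()   # ids of writes whose effects were read
        self.write_set = set()  # ids of writes issued by this client

    def read(self, server, key, peers):
        # Read set: monotonic reads. Write set: read-your-writes.
        server.pull(self.read_set | self.write_set, peers)
        wid, value = server.data.get(key, (None, None))
        if wid is not None:
            self.read_set.add(wid)
        return value

    def write(self, server, key, value, peers):
        # Write set: monotonic writes. Read set: writes-follow-reads.
        server.pull(self.read_set | self.write_set, peers)
        wid = next(_ids)
        server.apply(wid, key, value)
        self.write_set.add(wid)
        return wid
```

A client that writes at one server and then reads at another sees its own write, because the second server pulls the missing update before answering.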
Example: replication in the Web
Client-side caches
In the browser
At a client’s site, notably through a Web proxy
Caches at ISPs
Internet Service Providers also place caches to (1) reduce
cross-ISP traffic and (2) improve client-side performance. May
get nasty when a request needs to pass many ISPs.
Cooperative caching
Web-cache consistency
How to guarantee freshness?
To prevent stale information from being returned to a client:
Option 1: let the cache contact the original server to see if content is
still up to date.
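Option 2: Assign an expiration time T_expire that depends on how long
ago the document was last modified when it is cached. If T_last_modified is
the last modification time of a document (as recorded by its owner),
and T_cached is the time it was cached, then
T_expire = α (T_cached − T_last_modified) + T_cached
for a weighting factor α, and the cached copy is renewed when T_expire is due.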
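Option 2 can be sketched in a few lines. This is a minimal illustration that assumes all times are plain numbers (for example, seconds since some epoch) and that the age of the document at caching time is scaled by a weighting factor `alpha`; the default of 0.2 is an assumed, commonly quoted value, not something fixed by the text above.

```python
def expiration_time(t_last_modified: float, t_cached: float,
                    alpha: float = 0.2) -> float:
    """A document that was stable for a long time is assumed to stay valid longer."""
    age_when_cached = t_cached - t_last_modified
    return t_cached + alpha * age_when_cached


def is_fresh(now: float, t_last_modified: float, t_cached: float,
             alpha: float = 0.2) -> bool:
    """True while the cached copy may still be served without revalidation."""
    return now < expiration_time(t_last_modified, t_cached, alpha)
```

With this rule, a document last modified 100 time units before it was cached stays fresh for a further 20 units, while one modified just before caching expires almost immediately.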
Database copy: the edge server holds the same database as the origin server.
Content-aware cache: check if a (normal) query can be answered with cached data. Requires
that the server knows which data is cached at the edge.
Content-blind cache: store a query, and its result. When the exact same query is issued
again, return the result from the cache.
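A content-blind cache is essentially a lookup table keyed by the literal query text. The sketch below is a minimal illustration; the `ContentBlindCache` class and the `origin` callable (standing in for the origin server) are hypothetical names.

```python
from typing import Callable, Dict


class ContentBlindCache:
    def __init__(self, origin: Callable[[str], str]):
        self.origin = origin              # forwards a query to the origin server
        self.store: Dict[str, str] = {}   # exact query text -> cached result
        self.hits = 0

    def query(self, q: str) -> str:
        if q in self.store:
            self.hits += 1
            return self.store[q]          # exact same query: answer from cache
        result = self.origin(q)           # miss: ask the origin and remember
        self.store[q] = result
        return result
```

Because the cache does not interpret queries, two queries that mean the same thing but differ textually (even by whitespace) are treated as unrelated and both go to the origin.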