Introduction To Distributed Computing
Introduction to Replication and Consistency
• Reasons for Replication
There are two primary reasons for replicating data: reliability and performance.
File system replication makes it possible to continue working after one replica
crashes by simply switching to one of the other replicas.
Improving performance:
Replication for performance is important when the distributed system needs to
scale in numbers and geographical area.
Scaling in numbers:
for example, when an increasing number of processes need to access data that
are managed by a single server.
In that case, performance can be improved by replicating the server and
subsequently dividing the work.
Scaling in geographical area:
The basic idea is that by placing a copy of data in the proximity of the process
using them, the time to access the data decreases.
As a consequence, the performance as perceived by that process increases.
• Replication for Improving Performance
• Example Applications
Caching web pages at the client's browser
Caching IP addresses at clients and DNS name servers
Caching in Content Delivery Networks (CDNs): commonly accessed content, such as software and
streaming media, is cached at various network locations.
• Replication for High-Availability
Availability can be increased by storing the data at replicated locations (instead
of storing a single copy at one server).
Example: the Google File System replicates data on machines across different
racks, clusters, and data centers.
If a machine, rack, or cluster fails, the data can still be accessed from another
replica.
• Replication for Enhancing Scalability
Distributing the data across replicated servers helps avoid bottlenecks at
the main server.
It balances the load between the main server and the replicas.
Example: Content Delivery Networks decrease the load on a website's main
servers.
• An example:
In an e-commerce application, the bank database has been replicated across two
servers.
Maintaining consistency of replicated data is a challenge.
• Replication as Scaling Technique
Keeping multiple copies consistent may itself be subject to serious scalability
problems.
A collection of copies is consistent when the copies are always the same:
A read operation performed at any copy will always return the same result,
and
When an update operation is performed on one copy, the update should be
propagated to all copies before a subsequent operation takes place, no matter at
which copy that operation is initiated or performed.
This type of consistency is informally referred to as tight consistency or
synchronous replication.
The key idea is that an update is performed at all copies as a single atomic
operation, or transaction.
To keep all copies consistent generally requires global synchronization, which is
inherently costly in terms of performance.
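To make that cost concrete, below is a minimal sketch of synchronous ("tight") replication; all names are hypothetical. A single global lock stands in for the expensive global synchronization, and a write is applied to every copy as one atomic step before any other operation may proceed.

```python
# A minimal sketch of synchronous replication (hypothetical names):
# an update is performed at all copies as a single atomic operation.
import threading

class SyncReplicatedStore:
    def __init__(self, n_copies):
        self.copies = [{} for _ in range(n_copies)]
        self.global_lock = threading.Lock()     # stands in for global synchronization

    def write(self, key, value):
        with self.global_lock:                  # block all other operations
            for copy in self.copies:            # update every copy before the
                copy[key] = value               # write is considered complete

    def read(self, key, copy_index):
        with self.global_lock:
            return self.copies[copy_index].get(key)

store = SyncReplicatedStore(3)
store.write("x", 1)
print(store.read("x", 2))   # 1: a read at any copy returns the same result
```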
Consistency Models
• Data-Centric Consistency Models:
These models define how data updates are propagated across the replicas to keep them
consistent:
Continuous Consistency
Consistent Ordering of Operations
Fig: The general organization of a logical data store, physically distributed and
replicated across multiple processes
Continuous Consistency
• There is no such thing as a best solution to replicating data.
• Replicating data poses consistency problems that cannot be solved efficiently in a
general way.
• The continuous consistency model is used to measure inconsistencies and to express
what inconsistencies can be expected in the system.
• There are different ways for applications to specify what inconsistencies they
can tolerate.
• There are three independent axes for defining inconsistencies:
1. Numerical Deviation: Deviation in numerical values between replicas,
2. Staleness Deviation: Deviation in staleness between replicas, and
3. Order Deviation: Deviation with respect to the ordering of update
operations.
1. Deviation in numerical values between replicas:
For example, an application may bound the number of updates a replica has not yet
seen, or their total numerical effect (e.g., copies of a stock record may be allowed to
differ by at most a few cents).
2. Deviation in staleness between replicas:
For example, an application may bound how long a replica may lag behind the most
recent update (e.g., a replicated weather report may be allowed to be at most a few
hours old).
3. Deviation with respect to the ordering of update operations
There are classes of applications in which the ordering of updates is allowed
to differ at the various replicas, as long as the differences remain
bounded.
One way of looking at these updates is that they are applied tentatively to a
local copy, awaiting global agreement from all replicas.
As a consequence, some updates may need to be rolled back and applied in a
different order before becoming permanent.
The Notion of a Conit (Consistency Unit)
To define inconsistencies, a consistency unit, abbreviated to Conit, is defined.
A Conit specifies the unit over which consistency is to be measured.
For example, in our stock-exchange example, a Conit could be defined as a
record representing a single stock.
Another example is an individual weather report.
Consider two replicas, each operating on a local copy, as shown in the figure below.
Each replica i maintains a two-dimensional vector clock VCi.
The notation ⟨t, i⟩ expresses an operation that was carried out by replica i at its logical
time t.
Figure: Example conit over data items x and y, maintained by two replicas A and B (the numbers in the vector clocks are timestamps).
• At replica A: the order deviation is 1 (A missed 1 operation of B), and the maximum drift of y with respect to the other replica is 5 (the tentative operations at B will bring y to 5).
• At replica B: the order deviation is 3 (B missed 3 operations of A), and the maximum drift of x with respect to the other replica is 6 (the tentative operations at A will bring x to 6).
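As a rough illustration, the deviations in the figure can be computed by comparing the update logs of the two replicas. This is a loose simplification of the conit bookkeeping, not the actual protocol; all names are hypothetical.

```python
# A minimal sketch: order deviation = remote operations not yet seen locally;
# numerical deviation = the total drift those unseen operations would cause.
def deviations(local_log, remote_log):
    missed = [op for op in remote_log if op not in local_log]
    order_dev = len(missed)                         # operations not yet seen
    numeric_dev = sum(delta for _, delta in missed) # pending numerical drift
    return order_dev, numeric_dev

# Updates are (item, delta) pairs, matching the figure above.
log_A = [("x", 1), ("x", 2), ("x", 3)]              # operations performed at A
log_B = [("y", 5)]                                  # operations performed at B
print(deviations(log_A, log_B))                     # (1, 5): A missed 1 op of B
print(deviations(log_B, log_A))                     # (3, 6): B missed 3 ops of A
```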
Consistent Ordering of Operations
• Work on the consistent ordering of operations across replicas has led to at least one
important consistency model that is widely used: sequential consistency.
Sequential Consistency
It is a weaker consistency model than strict consistency.
It assumes that all operations are executed in some sequential order and that each
process issues its operations in program order.
Any valid interleaving of operations is allowed, but all processes must agree on
the same interleaving.
Each process preserves its program order.
Nothing is said about the "most recent" write.
(a) A sequentially consistent data store. (b) A data store that is not sequentially consistent.
• To make the notion of sequential consistency more concrete, consider three
concurrently executing processes P1, P2 and P3 executing multiple instructions
on three shared variables x, y and z.
Four valid execution sequences for the processes P1, P2, and P3 will be as
follows:
If we concatenate the output of P1, P2, and P3 in that order, we get a 6-bit
string that characterizes a particular interleaving of statements.
This is the string listed as the Signature in the figure above.
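The same enumeration can be done programmatically. The following sketch is hypothetical code modeled on the classic example in which each process assigns 1 to its own shared variable and then prints the other two; it generates every interleaving that respects program order and collects the resulting signatures.

```python
# A minimal sketch: enumerate all interleavings of P1, P2, P3 that preserve
# each process's program order, and collect the 6-bit signatures they produce.
from itertools import permutations

def run(schedule):
    """Execute one global interleaving and return its 6-bit signature."""
    mem = {"x": 0, "y": 0, "z": 0}
    out = {1: "", 2: "", 3: ""}
    for pid, step in schedule:
        var = {1: "x", 2: "y", 3: "z"}[pid]
        if step == 0:                          # first statement: assignment
            mem[var] = 1
        else:                                  # second statement: print other two
            others = [v for v in ("x", "y", "z") if v != var]
            out[pid] += "".join(str(mem[v]) for v in others)
    return out[1] + out[2] + out[3]            # concatenate P1, P2, P3 output

# Each process issues (assign, print) in program order: step 0 before step 1.
events = [(p, s) for p in (1, 2, 3) for s in (0, 1)]
signatures = set()
for perm in permutations(events):
    if all(perm.index((p, 0)) < perm.index((p, 1)) for p in (1, 2, 3)):
        signatures.add(run(perm))
print(sorted(signatures))
```

Any signature printed by this program could be produced by a sequentially consistent store; a signature outside this set could not.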
Causal Consistency
• Suppose that process P1 writes a data item x. Then P2 reads x and writes y. Here
the reading of x and the writing of y are potentially causally related because
the computation of y may have depended on the value of x as read by P2.
• On the other hand, if two processes spontaneously and simultaneously write
two different data items, these are not causally related.
• Operations that are not causally related are said to be concurrent.
• For a data store to be considered causally consistent, it is necessary that the store
obeys the following condition:
Writes that are potentially causally related must be seen by all processes in
the same order. Concurrent writes may be seen in a different order on
different machines.
• If event b is caused or influenced by an earlier event a, causality requires that
everyone else first see a, then see b.
Figure: This sequence is allowed with a causally-consistent store, but not with a sequentially consistent store.
a) A violation of a causally-consistent store. W2(x)b may be related to
W1(x)a because b may be the result of a computation involving the value
read by R2(x)a. The two writes are causally related, so all processes must
see them in the same order.
b) On the other hand, in Fig. (b) the read has been removed; W1(x)a and
W2(x)b are concurrent writes. A causally-consistent store does not require
concurrent writes to be globally ordered, so Fig. (b) is correct. This
situation would not be acceptable for a sequentially consistent store.
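One common way to decide whether two writes are causally related or concurrent is to timestamp them with vector clocks. The following minimal sketch, with hypothetical clock values, applies the standard happened-before test to the two scenarios above.

```python
# A minimal sketch: w1 causally precedes w2 iff w1's vector clock is
# component-wise <= w2's and strictly smaller in at least one entry;
# otherwise the two writes are concurrent.
def happens_before(vc1, vc2):
    return all(a <= b for a, b in zip(vc1, vc2)) and vc1 != vc2

w1 = (1, 0)   # W1(x)a by process P1
w2 = (1, 1)   # W2(x)b by P2 after R2(x)a: inherits P1's clock entry
w3 = (0, 1)   # W2(x)b by P2 without having read a

print(happens_before(w1, w2))  # True: causally related, same order everywhere
print(happens_before(w1, w3))  # False
print(happens_before(w3, w1))  # False: concurrent, order may differ per replica
```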
Grouping Operations
Sequential and causal consistency are defined at the level of individual read and write
operations.
Concurrency between programs sharing data is generally kept under control through
synchronization mechanisms for mutual exclusion and transactions.
For the operations like ENTER_CS and LEAVE_CS, the semantics can be formulated in
terms of shared synchronization variables.
The following criteria should be met in this regard:
1. An acquire access of a synchronization variable is not allowed to perform with respect
to a process until all updates to the guarded shared data have been performed with
respect to that process.
2. Before an exclusive mode access to a synchronization variable by a process is
allowed to perform with respect to that process, no other process may hold the
synchronization variable, not even in nonexclusive mode.
3. After an exclusive mode access to a synchronization variable has been performed,
any other process' next nonexclusive mode access to that synchronization variable may
not be performed until it has performed with respect to that variable's owner.
Figure below shows an example of what is known as entry consistency.
Instead of operating on the entire shared data, in this example we associate
locks with each data item.
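A minimal sketch of this idea follows; the class and method names are hypothetical, and an ordinary per-item mutex stands in for a synchronization variable. A process sees the current value of a data item only after acquiring that item's lock.

```python
# A minimal sketch of entry consistency: each shared data item is guarded
# by its own synchronization variable rather than one global lock.
import threading

class GuardedItem:
    def __init__(self, value=None):
        self._lock = threading.Lock()
        self._value = value

    def acquire(self):
        # Criterion 1 (approximated): updates guarded by this variable are
        # visible to the caller once the acquire completes.
        self._lock.acquire()
        return self._value

    def release(self, new_value):
        self._value = new_value
        self._lock.release()

x = GuardedItem(0)
val = x.acquire()      # Acq(Lx): see the latest value of x
x.release(val + 1)     # Rel(Lx): publish W(x)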
Client-Centric Consistency Models
Being able to handle concurrent operations on shared data while maintaining
sequential consistency is fundamental to distributed systems.
For performance reasons, sequential consistency may possibly be guaranteed only
when processes use synchronization mechanisms such as transactions or locks.
We take a look at a special class of distributed data stores.
The data stores we consider are characterized by the lack of simultaneous
updates, or when such updates happen, they can easily be resolved.
Most operations involve reading data.
By introducing special client-centric consistency models, it turns out that many
inconsistencies can be hidden in a relatively cheap way.
Client Centric Consistency Models: Examples
• Example 1:
In many database systems, most processes hardly ever perform update
operations; they mostly read data from the database. Only one, or very few,
processes perform update operations. The question then is how fast updates should
be made available to the reading processes.
• Example 2:
Consider a worldwide naming system such as DNS. The DNS name space is
partitioned into domains, where each domain is assigned to a naming authority,
which acts as owner of that domain. Only that authority is allowed to update its
part of the name space.
Consequently, conflicts resulting from two operations that both want to perform
an update on the same data (i.e., write-write conflicts), never occur.
The only situation that needs to be handled are read-write conflicts. As it turns
out, it is often acceptable to propagate an update in a lazy fashion, meaning that a
reading process will see an update only after some time has passed since the
update took place.
• Example 3: the World Wide Web.
In virtually all cases, Web pages are updated by a single authority, such as a
webmaster or the actual owner of the page.
There are normally no write-write conflicts to resolve.
On the other hand, to improve efficiency, browsers and Web proxies are often
configured to keep a fetched page in a local cache and to return that page upon the
next request.
These examples can be viewed as cases of (large-scale) distributed and replicated
databases that tolerate a relatively high degree of inconsistency.
Eventual Consistency
• These examples have in common that if no updates take place for a long time, all replicas
will gradually become consistent.
• This form of consistency is called eventual consistency.
• Eventual consistency essentially requires only that updates are guaranteed to
eventually propagate to all replicas.
• Write-write conflicts are often relatively easy to solve when assuming that only a
small group of processes can perform updates.
• Eventual consistency is therefore often cheap to implement.
• However, problems arise when different replicas are accessed over a short period
of time.
• This is best illustrated by considering a mobile user accessing a distributed
database, as shown in figure below.
• Eventual consistency for replicated data is fine if clients always access the same
replica.
• Client-centric consistency provides consistency guarantees for a single client
with respect to that client's accesses to the data store.
Figure: A mobile user accessing different replicas of a distributed database has problems with
eventual consistency.
Client Centric Consistency Models
Clients generally access a distributed data store through a local copy.
Updates are eventually propagated to other copies.
• Monotonic read:
If a process reads the value of a data item x, any successive read operation on x by
that process will always return that same value or a more recent value.
In other words, monotonic-read consistency guarantees that if a process has seen a
value of x at time t, it will never see an older version of x at a later time.
Example: consider a distributed mail database, in which each user's mailbox may
be distributed and replicated across multiple machines. Mail can be inserted into a
mailbox at any location. However, updates are propagated in a lazy (i.e., on-
demand) fashion. Only when a copy needs certain data for consistency are those
data propagated to that copy.
• Suppose a user reads his mail in San Francisco.
• Assume that only reading mail does not affect the mailbox, that is, messages are
not removed, stored in subdirectories, or even tagged as having already been read,
and so on.
• When the user later flies to New York and opens his mailbox again, monotonic-
read consistency guarantees that the messages that were in the mailbox in San
Francisco will also be in the mailbox when it is opened in New York.
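A minimal, self-contained sketch of how a client session might enforce monotonic reads follows; all interfaces are hypothetical. The session remembers the highest version of each item it has read and rejects reads from replicas that are still older.

```python
# A minimal sketch of monotonic reads: the client tracks versions it has seen.
class Replica:
    def __init__(self):
        self.store = {}                      # item -> (version, value)

    def get(self, item):
        return self.store.get(item, (0, None))

    def put(self, item, version, value):
        self.store[item] = (version, value)

class Session:
    def __init__(self):
        self.last_seen = {}                  # item -> highest version read

    def read(self, replica, item):
        version, value = replica.get(item)
        if version < self.last_seen.get(item, 0):
            raise RuntimeError("stale replica would violate monotonic reads")
        self.last_seen[item] = version
        return value

sf, ny = Replica(), Replica()
sf.put("mailbox", 1, ["msg-1"])              # update applied in San Francisco
s = Session()
s.read(sf, "mailbox")                        # ok: sees version 1
try:
    s.read(ny, "mailbox")                    # New York has not seen version 1
except RuntimeError as e:
    print(e)                                 # the session refuses the older copy
```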
• Two different local copies of the data store are shown, L1 and L2. We are interested
in the operations carried out by a single process P. These operations are
shown in boldface and are connected by a dashed line representing the order in which
they are carried out by P.
Monotonic writes
• A write operation by a process on a data item x is completed before any
successive write operation on x by that same process.
• To ensure monotonic-write consistency, it is necessary that the previous write
operation at L1 has already been propagated to L2. This explains operation W(x1)
at L2, and why it takes place before W(x2).
• In the non-monotonic case, by contrast, no guarantees can be given that the copy of x
on which the second write is performed has the same or a more recent value than the
copy had when W(x1) completed at L1.
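In the same style as the previous sketch, here is a minimal, self-contained sketch of monotonic writes; all interfaces are hypothetical. The session replays the client's earlier writes at a replica before performing a new write there, so W(x1) is always in place before W(x2).

```python
# A minimal sketch of monotonic writes: earlier writes are propagated first.
class Replica:
    def __init__(self):
        self.applied = []                    # ordered list of write ops

    def apply_if_missing(self, op):
        if op not in self.applied:
            self.applied.append(op)

class WriteSession:
    def __init__(self):
        self.write_log = []                  # this client's writes, in order

    def write(self, replica, item, value):
        for op in self.write_log:            # bring the replica up to date
            replica.apply_if_missing(op)     # with our own earlier writes
        op = (item, value)
        replica.apply_if_missing(op)
        self.write_log.append(op)

l1, l2 = Replica(), Replica()
s = WriteSession()
s.write(l1, "x", 1)                          # W(x1) at L1
s.write(l2, "x", 2)                          # W(x1) is replayed at L2 first
print(l2.applied)                            # [('x', 1), ('x', 2)]
```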
Read your writes
• In the figure below, process P performed a write operation W(x1) and later a read
operation at a different local copy.
• Read-your-writes consistency guarantees that the effects of the write operation can
be seen by the succeeding read operation.
• This is expressed by WS(x1;x2), which states that W(x1) is part of WS(x2).
• In contrast, in Fig. (b), W(x1) has been left out of WS(x2), meaning that the
effects of the previous write operation by process P have not been propagated to
L2.
• Figure:
(a) A data store that provides read-your-writes consistency.
(b) A data store that does not.
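A minimal, self-contained sketch of read-your-writes follows; all interfaces are hypothetical. Before reading at a replica, the client's write set WS is propagated there, so the read is guaranteed to see the client's own earlier writes.

```python
# A minimal sketch of read-your-writes: push the client's write set first.
class Replica:
    def __init__(self):
        self.writes = []                     # ordered write set at this copy

    def merge(self, write_set):
        for op in write_set:
            if op not in self.writes:
                self.writes.append(op)

def read_your_writes(write_set, replica, item):
    replica.merge(write_set)                 # WS(x1) must reach the replica
    vals = [v for k, v in replica.writes if k == item]
    return vals[-1] if vals else None

l1, l2 = Replica(), Replica()
ws = [("x", 1)]                              # W(x1) performed at L1
l1.merge(ws)
print(read_your_writes(ws, l2, "x"))         # 1: P's write is visible at L2 too
```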
Writes follow reads
• A write operation by a process on a data item x, following a previous read operation
on x by the same process, is guaranteed to take place on the same or a more recent
value of x than the one that was read. In other words, any successive write operation by a
process on a data item x will be performed on a copy of x that is up to date with the
value most recently read by that process.
• Example:
Writes-follow-reads consistency can be used to guarantee that users of a network
newsgroup see a posting of a reaction to an article only after they have seen the
original article.
To understand the problem, assume that a user first reads an article A.
Then, he reacts by posting a response B.
By requiring writes-follow-reads consistency, B will be written to any copy of the
newsgroup only after A has been written as well.
Note that users who only read articles need not require any specific client-centric
consistency model.
Writes-follow-reads consistency assures that a reaction to an article is stored at a
local copy only if the original is stored there as well.
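A minimal, self-contained sketch of the newsgroup example follows; all names are hypothetical. A reply carries the set of articles its author had read, and a replica stores the reply only after storing those articles.

```python
# A minimal sketch of writes-follow-reads: B is never stored without A.
class NewsReplica:
    def __init__(self):
        self.articles = []

    def post(self, article, depends_on):
        for a in depends_on:                 # bring in everything the author
            if a not in self.articles:       # had read before replying
                self.articles.append(a)
        self.articles.append(article)

site1, site2 = NewsReplica(), NewsReplica()
site1.post("A", depends_on=[])               # original article at site 1
read_set = ["A"]                             # the user read A before replying
site2.post("B: re A", depends_on=read_set)   # reply posted at site 2
print(site2.articles)                        # ['A', 'B: re A']: A precedes B
```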
• In Fig. (a), a process reads x at local copy L1.
• The write operations that led to the value just read also appear in the write set at
L2, where the same process later performs a write operation. (Note that other
processes at L2 see those write operations as well.)
• In contrast, no guarantees are given that the operation performed at L2, as shown
in Fig. (b), is performed on a copy that is consistent with the one just read at L1.
• Figure:
(a) A writes-follow-reads consistent data store.
(b) A data store that does not provide writes-follow-reads consistency.
Replica Placement
• A key issue for any distributed system that supports replication is to decide where,
when, and by whom replicas should be placed, and subsequently which
mechanisms to use for keeping the replicas consistent.
• The placement problem itself should be split into two sub-problems: that of
placing replica servers, and that of placing content.
1. Replica-server placement is concerned with finding the best locations to place a
server that can host (part of) a data store.
2. Content placement deals with finding the best servers for placing content.
• Obviously, before content placement can take place, replica servers will have to
be placed first.
• Where, when, by whom copies of data are to be placed?
Figure : The logical organization of different kinds of copies of a data store into three
concentric rings.
• Permanent Replica:
Permanent replicas can be considered as the initial set of replicas that constitute a
distributed data store.
In many cases, the number of permanent replicas is small.
Consider, for example, a Web site.
Distribution of a Web site generally comes in one of two forms.
The first kind of distribution is one in which the files that constitute a site are
replicated across a limited number of servers at a single location. Whenever a
request comes in, it is forwarded to one of the servers, for instance, using a
round-robin strategy.
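The round-robin forwarding mentioned above can be as simple as the following sketch, with hypothetical server names:

```python
# A minimal sketch of round-robin request forwarding among replicas.
from itertools import cycle

servers = cycle(["server-1", "server-2", "server-3"])

def forward(request):
    return next(servers)        # each request goes to the next server in turn

print([forward(f"req-{i}") for i in range(4)])
# ['server-1', 'server-2', 'server-3', 'server-1']
```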
• Server-Initiated Replicas:
In contrast to permanent replicas, server-initiated replicas are copies of a data store
that exist to enhance performance and which are created at the initiative of the (owner
of the) data store.
Consider, for example, a Web server placed in New York.
Normally, this server can handle incoming requests quite easily, but it may happen
that over a couple of days a sudden burst of requests comes in from an unexpected
location far from the server.
In that case, it may be worthwhile to install a number of temporary replicas in regions
where requests are coming from.
To provide optimal facilities, such hosting services can dynamically replicate files to
servers where those files are needed to enhance performance, that is, close to
demanding (groups of) clients.
The algorithm for dynamic replication takes two issues into account. First,
replication can take place to reduce the load on a server. Second, specific files on
a server can be migrated or replicated to servers placed in the proximity of
clients that issue many requests for those files.
Fig: Counting access requests from different clients
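A minimal sketch of this counting scheme follows; the thresholds and names are illustrative, not taken from a real system. Each server counts requests per file and per client region, and uses two thresholds to decide when to replicate a file toward a region or consider deleting a local copy.

```python
# A minimal sketch of access counting for server-initiated replication.
from collections import defaultdict

REP_THRESHOLD, DEL_THRESHOLD = 100, 5        # illustrative values
counts = defaultdict(int)                    # (file, region) -> request count

def record(file, region):
    counts[(file, region)] += 1

def decide(file, region):
    n = counts[(file, region)]
    if n > REP_THRESHOLD:                    # heavy regional demand
        return f"replicate {file} to a server near {region}"
    total = sum(c for (f, _), c in counts.items() if f == file)
    if total < DEL_THRESHOLD:                # hardly any demand at all
        return f"consider deleting the local copy of {file}"
    return "keep as is"

for _ in range(120):
    record("video.mp4", "asia")
print(decide("video.mp4", "asia"))           # replication is triggered
```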
• Client-initiated Replicas
An important kind of replica is the one initiated by a client.
Client-initiated replicas are more commonly known as (client) caches.
In essence, a cache is a local storage facility that is used by a client to
temporarily store a copy of the data it has just requested.
In principle, managing the cache is left entirely to the client.
The data store from where the data had been fetched has nothing to do with
keeping cached data consistent.
However, there are many occasions in which the client can rely on participation
from the data store to inform it when cached data has become stale.
Client caches are used only to improve access times to data.
2. Content Distribution
• State versus Operations
Replica management also deals with propagation of (updated) content to the
relevant replica servers. There are various trade-offs to make.
Possibilities for what is to be propagated:
1. Propagate only a notification of an update.
2. Transfer data from one copy to another.
3. Propagate the update operation to other copies
1. Propagate only a notification of an update:
Propagating a notification is what invalidation protocols do.
In an invalidation protocol, other copies are informed that an update has taken
place and that the data they contain are no longer valid.
The invalidation may specify which part of the data store has been updated, so
that only part of a copy is actually invalidated.
The important issue is that no more than a notification is propagated. Whenever
an operation on an invalidated copy is requested, that copy generally needs to be
updated first.
The main advantage of invalidation protocols is that they use little network
bandwidth.
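A minimal sketch of an invalidation protocol follows; the names are hypothetical. A write at the primary sends only a notification to the other copies, and a copy fetches the new value lazily on its next access.

```python
# A minimal sketch of an invalidation protocol: notify, don't ship data.
class Primary:
    def __init__(self, value):
        self.value = value
        self.copies = []

    def write(self, value):
        self.value = value
        for c in self.copies:                # propagate only the notification
            c.invalidate()

class Copy:
    def __init__(self, primary):
        self.primary = primary
        self.value = None
        self.valid = False

    def invalidate(self):
        self.valid = False                   # no data is transferred here

    def read(self):
        if not self.valid:                   # refresh only when needed
            self.value = self.primary.value
            self.valid = True
        return self.value

p = Primary(0)
c = Copy(p); p.copies.append(c)
print(c.read())    # 0: fetched on first access
p.write(42)        # the copy is merely invalidated
print(c.read())    # 42: fetched lazily on the next access
```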
2. Transfer data from one copy to another
Transferring the modified data among replicas is the second alternative, and is
useful when the read-to-write ratio is relatively high.
In that case, the probability that an update will be effective in the sense that the
modified data will be read before the next update takes place is high.
Instead of propagating modified data, it is also possible to log the changes and
transfer only those logs to save bandwidth.
In addition, transfers are often aggregated in the sense that multiple modifications
are packed into a single message, thus saving communication overhead.
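A minimal sketch of this log-based transfer follows; the names are hypothetical. Changes are recorded locally, and several modifications are packed into a single message before being shipped to a replica.

```python
# A minimal sketch of aggregated log shipping between copies.
class Primary:
    def __init__(self):
        self.state = {}
        self.log = []                        # pending (key, value) changes

    def write(self, key, value):
        self.state[key] = value
        self.log.append((key, value))        # log the change, don't send yet

    def flush(self, replica):
        replica.apply(self.log)              # many modifications, one message
        self.log = []

class Replica:
    def __init__(self):
        self.state = {}

    def apply(self, batch):
        for key, value in batch:
            self.state[key] = value

p, r = Primary(), Replica()
p.write("x", 1); p.write("x", 2); p.write("y", 3)
p.flush(r)                                   # one aggregated transfer
print(r.state)                               # {'x': 2, 'y': 3}
```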
3. Propagate the update operation to other copies
The third approach is not to transfer any data modifications at all, but to tell
each replica which update operation it should perform (and sending only the
parameter values that those operations need).
This approach is also referred to as active replication.
The main benefit of active replication is that updates can often be propagated at
minimal bandwidth cost, provided the size of the parameters associated with an
operation is relatively small.
More processing power may be required by each replica, especially in those
cases when operations are relatively complex.
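A minimal sketch of active replication follows; the names are hypothetical. Only the operation name and its parameters cross the network, and each replica re-executes the operation itself, which is why deterministic operations and some extra processing per replica are assumed.

```python
# A minimal sketch of active replication: ship the operation, not the data.
class Account:
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount

replicas = [Account() for _ in range(3)]

def propagate(op_name, *args):
    for r in replicas:                       # each replica performs the
        getattr(r, op_name)(*args)           # operation itself; only
                                             # ('deposit', 100) is transmitted
propagate("deposit", 100)
print([r.balance for r in replicas])         # [100, 100, 100]
```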
Pull versus Push Protocols
Another design issue is whether updates are pulled or pushed.
• In a push-based approach, also referred to as server-based protocols, updates
are propagated to other replicas without those replicas even asking for the updates.
Push-based approaches are often used between permanent and server-initiated
replicas, but can also be used to push updates to client caches.
Server-based protocols are applied when replicas need to be kept identical.
• In a pull-based approach, a server or client requests another server to send it any
updates it has at that moment.
Pull-based protocols, also called client-based protocols, are often used by client
caches.
A pull-based approach is efficient when the read-to-update ratio is relatively low.
This is often the case with (nonshared) client caches, which have only one client.
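A minimal sketch contrasting the two protocols follows; the names are hypothetical. In the push-based variant the server propagates each update to registered caches immediately; in the pull-based variant a cache checks with the server only when a client reads.

```python
# A minimal sketch of push-based versus pull-based update propagation.
class Server:
    def __init__(self):
        self.value, self.version = None, 0
        self.push_targets = []

    def update(self, value):
        self.value, self.version = value, self.version + 1
        for cache in self.push_targets:          # push: server-initiated
            cache.store(self.value, self.version)

class Cache:
    def __init__(self, server):
        self.server = server
        self.value, self.version = None, -1

    def store(self, value, version):
        self.value, self.version = value, version

    def read_pull(self):                         # pull: client-initiated
        if self.version < self.server.version:
            self.store(self.server.value, self.server.version)
        return self.value

srv = Server()
push_cache = Cache(srv); srv.push_targets.append(push_cache)
pull_cache = Cache(srv)
srv.update("v1")
print(push_cache.value)        # 'v1': arrived without being asked for
print(pull_cache.read_pull())  # 'v1': fetched on demand
```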
Unicasting versus Multicasting