Distributed Database Systems
M. Tamer Özsu
University of Waterloo
Department of Computer Science
Waterloo, Ontario Canada N2L 3G1
{[email protected]}
Outline
In this article, we discuss the fundamentals of distributed DBMS technology. We
address the data distribution and architectural design issues as well as the algorithms that
need to be implemented to provide the basic DBMS functions such as query processing, concurrency control, reliability, and replication.
Glossary
Atomicity: The property of transaction processing whereby either all the operations of a transaction are reflected in the database or none of them are.
Client/server architecture: A distributed DBMS architecture where a number of client machines with limited functionality access a set of servers which manage data.
Distributed database management system: The software system that manages a database that is distributed across the nodes of a computer network and makes the distribution transparent to the users.
Deadlock: An occurrence where each transaction in a set of transactions circularly waits for locks that are held by other transactions in the set.
Isolation: The property of transaction execution which states that the effects of one transaction on the database are isolated from other transactions until the first one completes its execution.
Locking: A method of concurrency control where locks are placed on database units (e.g., pages) on behalf of transactions that attempt to access them.
Logging protocol: The protocol which records, in a separate location, the changes that a transaction makes to the database before the change is actually made.
One copy equivalence: Replica control policy which asserts that the values of all copies
of a logical data item should be identical when the transaction that updates that item
terminates.
Query optimization: The process by which the "best" execution strategy for a given query is found from among a set of alternatives.
Query processing: The process by which a declarative query is translated into low-level data manipulation operations.
Quorum-based voting algorithm: A replica control protocol where transactions collect votes to read and write copies of data items. They are permitted to read or write a data item only if they obtain a sufficient number of votes (a quorum).
Read-Once/Write-All protocol: The replica control protocol which maps each logical read operation to a read on one of the physical copies and maps a logical write operation to writes on all of the physical copies.
Serializability: The concurrency control correctness criterion which requires that the concurrent execution of a set of transactions be equivalent to the effect of some serial execution of those transactions.
Termination protocol: A protocol by which individual sites can decide how to terminate
a particular transaction when they cannot communicate with other sites where the
transaction executes.
Two-phase commit: An atomic commitment protocol which ensures that a transaction is terminated the same way at every site where it executes. The name comes from the fact that two rounds of messages are exchanged during this process.
Two-phase locking: A locking algorithm where transactions are not allowed to request new locks once they release a previously held lock.
1. Introduction
The maturation of database management system (DBMS) technology has coincided with significant developments in distributed computing and parallel processing technologies. The end result is the emergence of distributed database management systems. These
systems have started to become the dominant data management tools for highly data-
intensive applications. Many DBMS vendors have incorporated some degree of distribution into their products. A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network. A distributed database management system (distributed DBMS) is the software system that permits the management of the distrib-
uted database and makes the distribution transparent to the users. The term “distributed
database system” (DDBS) is typically used to refer to the combination of DDB and the
distributed DBMS. These definitions point to two identifying architectural principles. The
first is that the system consists of a (possibly empty) set of query sites and a non-empty
set of data sites. The data sites have data storage capability while the query sites do not.
The latter only run the user interface routines in order to facilitate the data access at data
sites. The second is that each site (query or data) is assumed to logically consist of a sin-
gle, independent computer. Therefore, each site has its own primary and secondary stor-
age, runs its own operating system (which may be the same or different at different sites),
and has the capability to execute applications on its own. A computer network, rather
than a multiprocessor configuration, interconnects the sites. The important point here is
the emphasis on loose interconnection between processors that have their own operating systems and operate independently.
2. Data Distribution Alternatives
A distributed database is physically distributed across the data sites by fragmenting
and replicating the data. Given a relational database schema, fragmentation subdivides each relation into horizontal or vertical partitions. Horizontal fragmentation of a relation is accomplished by a selection operation that places each tuple of the relation in a different partition based on a fragmentation predicate (e.g., an Employee relation may be fragmented according to employee location). Vertical fragmentation divides a relation into a number of fragments by projecting over its attributes (e.g., the Employee relation may be fragmented such that the Emp_number, Emp_name and Address information is in one fragment, and the remaining attributes, together with Emp_number, are in another). Fragmentation is desirable because it enables the placement of data in close proximity to its place of use, thus potentially reducing transmission cost, and it
reduces the size of relations that are involved in user queries. Based on the user access
patterns, each of the fragments may also be replicated. This is preferable when the same
data are accessed from applications that run at a number of sites. In this case, it may be
more cost-effective to duplicate the data at a number of sites rather than continuously
moving it between them. Figure 1 depicts a data distribution where Employee, Project and Assignment relations are fragmented, replicated and distributed across multiple sites.
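To make these operations concrete, the following Python sketch (not part of the original article) computes horizontal fragments by selection and vertical fragments by projection over a toy Employee relation; the attribute names, values and the fragmentation predicate are assumptions made purely for illustration.

    # A minimal sketch of fragmentation over a hypothetical Employee relation,
    # represented as a list of dictionaries (one dictionary per tuple).
    employee = [
        {"emp_number": 1, "emp_name": "Ada",  "address": "Waterloo", "salary": 70000, "site": "A"},
        {"emp_number": 2, "emp_name": "Bert", "address": "Toronto",  "salary": 65000, "site": "B"},
    ]

    # Horizontal fragmentation: a selection places each tuple in a fragment
    # according to an assumed fragmentation predicate (here, the 'site' attribute).
    emp_site_a = [t for t in employee if t["site"] == "A"]
    emp_site_b = [t for t in employee if t["site"] == "B"]

    # Vertical fragmentation: a projection keeps a subset of the attributes; the
    # key (emp_number) is retained in every fragment so the relation can be rebuilt.
    emp_personal = [{k: t[k] for k in ("emp_number", "emp_name", "address")} for t in employee]
    emp_pay = [{k: t[k] for k in ("emp_number", "salary")} for t in employee]

    # Reconstruction: union of the horizontal fragments, join of the vertical ones.
    rejoined = [{**p, **q} for p in emp_personal for q in emp_pay
                if p["emp_number"] == q["emp_number"]]
    assert sorted(emp_site_a + emp_site_b, key=lambda t: t["emp_number"]) == employee

Reconstructing the global relation by union (for horizontal fragments) or join (for vertical fragments) is exactly the inverse of the fragmentation rules, a fact that the data localization step of query processing relies on later in the article.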
3. Architectural Alternatives
There are many possible alternatives for architecting a distributed DBMS. The sim-
plest is the client/server architecture, where a number of client machines access a single
database server. The simplest client/server systems involve a single server that is accessed by multiple clients (a multiple-client/single-server architecture). In this case, the database management problems are considerably simplified since the database is
stored on a single server. The pertinent issues relate to the management of client buffers
and the caching of data and (possibly) locks. The data management is done centrally at
the single server. A more distributed, and more flexible, architecture is the multiple-client/multiple-server architecture, in which the database is distributed across a number of servers that have to communicate with each other in responding to user queries and in exe-
cuting transactions. Each client machine has a “home” server to which it directs user re-
quests. The communication of the servers among themselves is transparent to the users.
Most current database management systems implement one or the other type of the cli-
ent-server architectures. A truly distributed DBMS does not distinguish between client
and server machines. Ideally, each site can perform the functionality of a client and a server. Such architectures require sophisticated protocols to manage the data that is distributed across multiple sites. The complexity of the required software has delayed the offering of truly distributed DBMS products.
If the distributed database systems at various sites are autonomous and (possibly)
exhibit some form of heterogeneity, they are usually referred to as multidatabase systems
or federated database systems. If the data and DBMS functionality distribution is accomplished on a multiprocessor computer, the resulting system is referred to as a parallel database system.
These are different than a distributed database system where the logical integration
among distributed data is tighter than is the case with multidatabase systems or federated
database systems, but the physical control is looser than that in parallel DBMSs. In this article we restrict ourselves to distributed DBMSs.
4. Overview of Technical Issues
Distributed DBMSs provide the same functionality that their centralized counterparts provide, such as support for declarative user queries and their optimization, transactional access to the database involving concurrency control and reliability, enforcement of integrity constraints and others. In the remaining sections we discuss some of these functions in the distributed setting.
Query processing deals with designing algorithms that analyze queries and convert
them into a series of data manipulation operations. Besides the methodological issues, an
important aspect of query processing is query optimization. The problem is how to decide
on a strategy for executing each query over the network in the most cost-effective way,
however cost is defined. The factors to be considered are the distribution of data, com-
munication costs, and lack of sufficient locally available information. The objective is to devise an execution strategy in which the inherent parallelism of the distributed system is used to improve the performance of executing the query.
User accesses to shared databases are formulated as transactions, which are units of
execution that satisfy four properties: atomicity, consistency, isolation, and durability –
jointly known as the ACID properties. Atomicity means that a transaction is an atomic
unit and either the effects of all of its actions are reflected in the database, or none of
them are. Consistency generally refers to the correctness of the individual transactions;
i.e., that a transaction does not violate any of the integrity constraints that have been de-
fined over the database. Isolation addresses the concurrent execution of transactions and
specifies that actions of concurrent transactions do not impact each other. Finally, dura-
bility concerns the persistence of database changes in the face of failures. ACID proper-
ties are enforced by means of concurrency control algorithms and reliability protocols.
Concurrency control involves the synchronization of concurrent accesses to the distributed database, such that the integrity of the database is maintained. The concurrency control problem in a distributed context is somewhat different than in centralized systems. One not only has to worry about the integrity of a single database, but also about the con-
sistency of multiple copies of the database. The condition that requires all the values of
multiple copies of every data item to converge to the same value is called mutual consis-
tency.
The reliability protocols of a distributed DBMS ensure consistent behavior in the face of failures. In addition to the typical failure types (i.e., transaction failures and system failures), distributed DBMSs have to account for communication failures as well. Reliability requires that, when a failure occurs and various sites become either inoperable or inaccessible, the databases at
the operational sites remain consistent and up to date. This complicates the picture, as the
actions of these sites have to be eventually reconciled with those of failed ones. There-
fore, recovery protocols coordinate the termination of transactions so that they terminate
uniformly (i.e., they either abort or they commit) at all the sites where they execute. Fur-
thermore, when the computer system or network recovers from the failure, the distributed
DBMS should be able to recover and bring the databases at the failed sites up-to-date.
This may be especially difficult in the case of network partitioning, where the sites are divided into groups that cannot communicate with each other.
Distributed databases are typically replicated; that is, a number of the data items re-
side at more than one site. Replication improves performance (since data access can be
localized) and availability (since the failure of a site does not make a data item inaccessi-
ble). However, management of replicated data requires that the values of multiple copies
of a data item are the same. This is called the one copy equivalence property. Distributed
DBMSs that allow replicated data implement replication protocols to enforce one copy
equivalence.
5. Distributed Query Processing
Query processing is the process by which a declarative query is translated into low-level data manipulation operations. SQL is the standard query language that is supported
in current DBMSs. Query optimization refers to the process by which the “best” execu-
tion strategy for a given query is found from among a set of alternatives.
In distributed DBMSs, the process typically involves four steps (Figure 2): query
decomposition, data localization, global optimization, and local optimization. Query de-
composition takes an SQL query and translates it into one expressed in relational algebra.
In the process, the query is analyzed semantically so that incorrect queries are detected
and rejected as early as possible, and correct queries are simplified. Simplification in-
volves the elimination of redundant predicates that may be introduced as a result of query
modification to deal with views, security enforcement and semantic integrity control.
The initial algebraic query generated by the query decomposition step is input to the
second step: data localization. The initial algebraic query is specified on global relations, irrespective of their fragmentation or distribution. The main role of data localization is to localize the query’s data using data distribution information. In this step, the fragments that are involved in the query are determined and the query is transformed into one that operates on fragments rather than on global relations. Recall that fragmentation is defined through fragmentation rules that can be expressed as relational operations (horizontal fragmentation by selection, vertical fragmentation by projection), and that a global relation can be reconstructed by applying the inverse of the fragmentation rules. This is called a localization program; the localization program for a horizontally (vertically) fragmented query is the union (join) of the fragments. Thus, during the data localization
step each global relation is first replaced by its localization program, and then the result-
ing fragment query is simplified and restructured to produce another “good” query. Sim-
plification and restructuring may be done according to the same rules used in the decom-
position step. As in the decomposition step, the final fragment query is generally far
from optimal; the process has only eliminated “bad” algebraic queries.
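As a minimal illustration of this step, the Python sketch below (with a hypothetical, horizontally fragmented Employee relation) first expresses the global relation as its localization program, the union of its fragments, and then pushes the selection into each fragment to obtain a reduced fragment query.

    # Hypothetical horizontal fragments of a global Employee relation,
    # assumed to be stored at two different sites.
    emp_frag1 = [{"emp_number": 1, "salary": 70000}]
    emp_frag2 = [{"emp_number": 2, "salary": 65000}]

    def localization_program():
        # The global relation is reconstructed as the union of its fragments.
        return emp_frag1 + emp_frag2

    def localized_query(predicate):
        # The selection is pushed into each fragment, so each site
        # evaluates the predicate only over its own fragment.
        return [t for frag in (emp_frag1, emp_frag2) for t in frag if predicate(t)]

    # Both formulations give the same answer; the second is the "better" fragment
    # query in the sense described above, since it avoids shipping irrelevant tuples.
    high_paid = lambda t: t["salary"] > 66000
    assert [t for t in localization_program() if high_paid(t)] == localized_query(high_paid)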
For a given SQL query, there is usually more than one possible algebraic query, and some of these algebraic queries are “better” than others. The quality of an algebraic query is de-
fined in terms of expected performance. The process of query optimization involves tak-
ing the initial algebraic query and, using algebraic transformation rules, transforming it
into other algebraic queries until the “best” one is found. The “best” algebraic query is
determined according to a cost function that calculates the cost of executing the query according to that algebraic specification. In a distributed setting, the process is typically split into two steps: global optimization to handle operations that involve data from multiple sites (e.g., join)
followed by local optimization for further optimizing operations that will be performed at
a given site.
The input to the third step, global optimization, is a fragment query, that is, an alge-
braic query on fragments. The goal of query optimization is to find an execution strategy
for the query that is close to optimal. Remember that finding the optimal solution is computationally intractable. An execution strategy for a distributed query can be described with relational algebra operations and communication primitives (send/receive operations) for transferring data between sites. The previous layers have already optimized the query – for example, by eliminating redundant expressions. However, this optimization is independent of fragment characteristics such as cardinalities, and communication operations are not yet specified. By permuting the order of operations
within one fragment query, many equivalent query execution plans may be found. Query
optimization consists of finding the “best” one among candidate plans examined by the
optimizer.[1]
[1] The difference between an optimal plan and the best plan is that the optimizer does not, because of computational intractability, examine all of the possible plans.
The final step, local optimization, takes a part of the global query (called a
subquery) that will run at a particular site and optimizes it further. This step is very simi-
lar to query optimization in centralized DBMSs. Thus, it is at this stage that local infor-
mation about data storage, such as indexes, etc., is used to determine the best execution plan for the subquery. Query optimization, whether global or local, can be characterized by three elements: a search space, a cost model, and a search strategy. The search space is the set of alternative exe-
cution plans to represent the input query. These plans are equivalent, in the sense that
they yield the same result but they differ on the execution order of operations and the way
these operations are implemented. The cost model predicts the cost of a given execution
plan. To be accurate, the cost model must have accurate knowledge about the parallel
execution environment. The search strategy explores the search space and selects the
best plan. It defines which plans are examined and in which order.
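These three elements can be pictured with the toy optimizer below; the relation names, the cardinalities and the cost formula are stand-ins invented for the example, not values from the article.

    from itertools import permutations

    # Search space: all join orders over three hypothetical relations.
    cardinality = {"EMP": 10_000, "ASG": 50_000, "PROJ": 1_000}

    def plan_cost(join_order):
        # Stand-in cost model: accumulate a crude estimate of each intermediate
        # result size; a real optimizer would use fragment statistics,
        # selectivities, and I/O, CPU and communication cost weights.
        cost, intermediate = 0, cardinality[join_order[0]]
        for rel in join_order[1:]:
            intermediate = min(intermediate, cardinality[rel])  # assumes key joins
            cost += intermediate
        return cost

    # Search strategy: exhaustive enumeration of the search space,
    # keeping the cheapest plan examined.
    best = min(permutations(cardinality), key=plan_cost)
    print(best, plan_cost(best))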
In a distributed environment, the cost function, often defined in terms of time units,
refers to computing resources such as disk space, disk I/Os, buffer space, CPU cost,
communication cost, and so on. Generally, it is a weighted combination of I/O, CPU, and communication costs. A common simplifying assumption is to consider communication cost as the most significant factor. This is valid for wide area networks, where the limited bandwidth makes communication much more costly than local processing. To select the ordering of operations it is necessary to predict execution costs of alternative candidate orderings. Determining execution costs be-
fore query execution (i.e., static optimization) is based on fragment statistics and the for-
mulas for estimating the cardinalities of results of relational operations. Thus the optimi-
zation decisions depend on the available statistics on fragments. An important aspect of
query optimization is join ordering, since permutations of the joins within the query may
lead to improvements of several orders of magnitude. One basic technique for optimizing
a sequence of distributed join operations is through use of the semijoin operator. The
main value of the semijoin in a distributed system is to reduce the size of the join oper-
ands and thus the communication cost. However, more recent techniques, which consider
local processing costs as well as communication costs, do not use semijoins because they
might increase local processing costs. The output of the query optimization layer is an optimized algebraic query with communication operations included on the fragments.
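The size-reducing effect of the semijoin can be illustrated with the short sketch below, in which one site ships only the distinct join-attribute values rather than its whole fragment; the fragments and their contents are hypothetical.

    # Hypothetical fragments: Employee tuples at site 1, Assignment tuples at site 2.
    emp_site1 = [{"eno": 1, "name": "Ada"}, {"eno": 2, "name": "Bert"}, {"eno": 3, "name": "Cleo"}]
    asg_site2 = [{"eno": 2, "pno": 7}, {"eno": 2, "pno": 9}]

    # Step 1: site 2 sends only the distinct join-attribute values to site 1,
    # which is usually much smaller than shipping the whole Assignment fragment.
    join_values = {t["eno"] for t in asg_site2}

    # Step 2: site 1 computes the semijoin (Employee semijoin Assignment),
    # keeping only the Employee tuples that will participate in the join.
    emp_reduced = [t for t in emp_site1 if t["eno"] in join_values]

    # Step 3: only the reduced operand is shipped to site 2 for the final join.
    result = [{**e, **a} for e in emp_reduced for a in asg_site2 if e["eno"] == a["eno"]]
    print(result)  # only the tuples for employee 2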
6. Distributed Concurrency Control
User accesses to the shared database are encapsulated as transactions, which at the lowest level are a set of read and write operations to the database. Concurrency control
algorithms enforce the isolation property of transaction execution, which states that the
effects of one transaction on the database are isolated from other transactions until the first one completes its execution. The most popular concurrency control algorithms are locking-based. In such schemes, a lock, in either shared or exclusive mode, is placed on some unit of storage
(usually a page) whenever a transaction attempts to access it. These locks are placed ac-
cording to lock compatibility rules such that read-write, write-read, and write-write con-
flicts are avoided. It is a well known theorem that if lock actions on behalf of concurrent
transactions obey a simple rule, then it is possible to ensure the serializability of these
transactions: “No lock on behalf of a transaction should be set once a lock previously
held by the transaction is released.” This is known as two-phase locking, since transac-
tions go through a growing phase when they obtain locks and a shrinking phase when
they release locks. In general, releasing of locks prior to the end of a transaction is prob-
lematic. Thus, most of the locking-based concurrency control algorithms are strict in that they hold on to their locks until the end (commit or abort) of the transaction. In distributed DBMSs, the concurrency control problem is complicated by the fact that in these systems, the operations of a given transaction may execute at multiple sites where
they access data. In such a case, the serializability argument is more difficult to specify
and enforce. The complication is due to the fact that the serialization order of the same
set of transactions may be different at different sites. Therefore, the execution of a set of distributed transactions is serializable if and only if
1. the execution of the set of transactions at each site is serializable, and
2. the serialization orders of these transactions at all these sites are identical.
This is known as global serializability. Three locking-based approaches can be used to enforce global serializability: centralized locking, primary copy locking, and distributed locking algorithms.
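The lock compatibility rules and the strictness discussed above can be condensed into the single-site lock manager sketched below; the class interface, and the choice to reject rather than queue conflicting requests, are simplifications made for the example. Under centralized, primary copy and distributed locking, such a manager would run at one site, at the sites holding primary copies, or at every site, respectively.

    # A minimal sketch of a strict two-phase locking lock manager.
    class LockManager:
        def __init__(self):
            self.table = {}          # data item -> (mode, set of holder transaction ids)

        def request(self, txn, item, mode):
            """Try to grant a 'S' (shared) or 'X' (exclusive) lock; return False on conflict."""
            held = self.table.get(item)
            if held is None:
                self.table[item] = (mode, {txn})
                return True
            held_mode, holders = held
            if mode == "S" and held_mode == "S":
                holders.add(txn)     # shared locks are compatible with each other
                return True
            if holders == {txn}:     # re-request or upgrade by the sole holder
                self.table[item] = ("X" if mode == "X" else held_mode, holders)
                return True
            return False             # read-write or write-write conflict: caller waits or aborts

        def release_all(self, txn):
            """Strict 2PL: all of a transaction's locks are released together at commit/abort."""
            for item in list(self.table):
                mode, holders = self.table[item]
                holders.discard(txn)
                if not holders:
                    del self.table[item]

    lm = LockManager()
    assert lm.request("T1", "x", "S") and lm.request("T2", "x", "S")  # compatible reads
    assert not lm.request("T2", "x", "X")                             # write-read conflict
    lm.release_all("T1"); lm.release_all("T2")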
In centralized locking, there is a single lock table for the entire distributed database.
This lock table is placed, at one of the sites, under the control of a single lock manager.
The lock manager is responsible for setting and releasing locks on behalf of transactions.
Since all locks are managed at one site, this is similar to centralized concurrency control
and it is straightforward to enforce the global serializability rule. These algorithms are
simple to implement, but suffer from two problems. The central site may become a bot-
tleneck, both because of the amount of work it is expected to perform and because of the
traffic that is generated around it; and the system may be less reliable since the failure or
inaccessibility of the central site would cause system unavailability. Primary copy lock-
ing is a concurrency control algorithm that is useful in replicated databases where there
may be multiple copies of a data item stored at different sites. One of the copies is desig-
nated as a primary copy and it is this copy that has to be locked in order to access that
item. All the sites know the set of primary copies for each data item in the distributed
system, and the lock requests on behalf of transactions are directed to the appropriate
primary copy. If the distributed database is not replicated, primary copy locking degenerates into a distributed locking algorithm.
In distributed (or decentralized) locking, the lock management duty is shared by all
the sites in the system. The execution of a transaction involves the participation and co-
ordination of lock managers at more than one site. Locks are obtained at each site where
the transaction accesses a data item. Distributed locking algorithms do not have the over-
head of centralized locking ones. However, both the communication overhead to obtain
all the locks, and the complexity of the algorithm are greater.
One side effect of all locking-based concurrency control algorithms is that they can cause deadlocks; the detection and resolution of deadlocks in a distributed system is difficult. Nevertheless, the relative simplicity and better performance of locking algorithms make them more popular than alternatives such as timestamp-based algorithms or optimistic algorithms.
7. Distributed Reliability Protocols
Two transaction properties are maintained by the reliability protocols of a distributed DBMS: atomicity and durability. Atomicity requires that either all the operations of a transaction are executed
or none of them are (all-or-nothing property). Thus, the set of operations contained in a
transaction is treated as one atomic unit. Atomicity is maintained in the face of failures.
Durability requires that the effects of successfully completed (i.e., committed) transactions survive failures and persist in the database.
The underlying issue addressed by reliability protocols is how the DBMS can con-
tinue to function properly in the face of various types of failures. In a distributed DBMS,
four types of failures are possible: transaction failures, site (system) failures, media (disk)
failures and communication failures. Transactions can fail for a number of reasons: due
to an error in the transaction caused by input data or by an error in the transaction code,
or the detection of a present or potential deadlock. The usual approach to take in cases of
transaction failure is to abort the transaction, resetting the database to its state prior to the start of that transaction.
Site (or system) failures are due to a hardware failure (e.g., processor, main memory,
power supply) or a software failure (bugs in system code). The effect of system failures
is the loss of main memory contents. Therefore, any updates to the parts of the database
that are in the main memory buffers (also called volatile database) are lost as a result of
system failures. However, the database that is stored in secondary storage (also called
stable database) is safe and correct. To achieve this, DBMSs typically employ logging
protocols, such as Write-Ahead Logging, which record changes to the database in system
logs and move these log records and the volatile database pages to stable storage at ap-
propriate times. From the perspective of distributed transaction execution, site failures
are important since the failed sites cannot participate in the execution of any transaction.
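The write-ahead logging idea mentioned above can be sketched as follows; the file names, the record format and the single-page granularity are assumptions made for the illustration and do not describe any particular DBMS.

    import json, os

    LOG_FILE = "wal.log"          # stands in for the stable log
    DB_FILE = "stable_db.json"    # stands in for the stable database

    def write_ahead(txn_id, page_id, before, after, volatile_db):
        # 1. Append a log record describing the change and force it to stable
        #    storage *before* the database page itself is modified.
        with open(LOG_FILE, "a") as log:
            log.write(json.dumps({"txn": txn_id, "page": page_id,
                                  "before": before, "after": after}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only now may the volatile page be changed and later flushed to the
        #    stable database; after a crash, the log suffices to redo or undo it.
        volatile_db[page_id] = after
        with open(DB_FILE, "w") as f:
            json.dump(volatile_db, f)

    db = {}
    write_ahead("T1", "p42", None, "new value", db)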
Media failures refer to the failure of secondary storage devices that store the stable database. These failures are typically addressed by duplexing storage devices and maintaining archival copies of the database. Media failures are frequently treated as problems local to one site and therefore are not specifically addressed in the reliability mechanisms of distributed DBMSs.
The three types of failures described above are common to both centralized and
distributed DBMSs. Communication failures, on the other hand, are unique to distributed
systems. There are a number of types of communication failures. The most common
ones are errors in the messages, improperly ordered messages, lost (or undelivered) mes-
sages, and line failures. Generally, the first two of these are considered to be the respon-
sibility of the computer network protocols and are not addressed by the distributed
DBMS. The last two, on the other hand, have an impact on the distributed DBMS proto-
cols and, therefore, need to be considered in the design of these protocols. If one site is
expecting a message from another site and this message never arrives, this may be be-
cause (a) the message is lost, (b) the line(s) connecting the two sites may be broken, or
(c) the site that is supposed to send the message may have failed. Thus, it is not always
possible to distinguish between site failures and communication failures. The waiting
site simply times out and has to assume that the other site is incommunicado. Distributed
DBMS protocols have to deal with this uncertainty. One drastic result of line failures
may be network partitioning in which the sites form groups where communication within
each group is possible but communication across groups is not. This is difficult to deal
with in the sense that it may not be possible to make the database available for access in every partition while at the same time guaranteeing its consistency. Distributed reliability protocols consist of atomic commitment protocols and distributed recovery protocols. The most popular atomic
commitment protocol is two-phase commit. The recovery protocols are built on top of the local recovery protocols, which are dependent upon the supported mode of interaction of the DBMS with the operating system.
Two-phase commit (2PC) is a very simple and elegant protocol that ensures the atomic commitment of distributed transactions. It extends the effects of local atomic commit actions to distributed transactions by insisting that all sites involved in the exe-
cution of a distributed transaction agree to commit the transaction before its effects are
made permanent (i.e., all sites terminate the transaction in the same manner). If all the
sites agree to commit a transaction then all the actions of the distributed transaction take
effect; if one of the sites declines to commit the operations at that site, then all of the
other sites are required to abort the transaction. Thus, the fundamental 2PC rule states:
1. If even one site declines to commit (which means it votes to abort) the transac-
tion, the distributed transaction has to be aborted at each site where it executes;
and
2. If all the sites vote to commit the transaction, the distributed transaction is committed at every site where it executes.
The simple execution of the 2PC protocol is as follows (Figure 3). There is a coor-
dinator process at the site where the distributed transaction originates, and participant
processes at all the other sites where the transaction executes. Initially, the coordinator
sends a “prepare” message to all the participants each of which independently determines
whether or not it can commit the transaction at that site. Those that can commit send back
a “vote-commit” message while those who are not able to commit send back a “vote-
abort” message. Once a participant registers its vote, it cannot change it. The coordinator
collects these messages and determines the fate of the transaction according to the 2PC
rule. If the decision is to commit, the coordinator sends a “global-commit” message to all
the participants; if the decision is to abort, it sends a “global-abort” message to those par-
ticipants who had earlier voted to commit the transaction. No message needs to be sent to
those participants who had originally voted to abort since they can assume, according to
the 2PC rule, that the transaction is going to be eventually globally aborted. This is known as the unilateral abort option.
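The message flow just described can be condensed into the small coordinator sketch below; the participants are modeled as local functions that return their votes, which is a simplification, since a real implementation exchanges network messages and writes log records at each step for recovery.

    # A minimal sketch of the coordinator side of two-phase commit.
    def two_phase_commit(participants):
        # Phase 1: send "prepare" and collect the votes.
        votes = [p("prepare") for p in participants]
        decision = "global-commit" if all(v == "vote-commit" for v in votes) else "global-abort"
        # Phase 2: send the decision. Participants that already voted to abort
        # need no message; by the 2PC rule they may abort unilaterally.
        for p, v in zip(participants, votes):
            if decision == "global-commit" or v == "vote-commit":
                p(decision)
        return decision

    # Two hypothetical participants: one that can commit, one that cannot.
    ready = lambda msg: "vote-commit" if msg == "prepare" else msg
    faulty = lambda msg: "vote-abort" if msg == "prepare" else msg
    assert two_phase_commit([ready, ready]) == "global-commit"
    assert two_phase_commit([ready, faulty]) == "global-abort"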
There are two rounds of message exchanges between the coordinator and the par-
ticipants; hence the name 2PC protocol. There are a number of variations of 2PC, such as
the linear 2PC and distributed 2PC, that have not found much favor among distributed
DBMS vendors. Two important variants of 2PC are the presumed abort 2PC and pre-
sumed commit 2PC. These are important because they reduce the message and I/O over-
head of the protocols. The presumed abort protocol is included in the X/Open XA standard
and has been adopted as part of the ISO standard for Open Distributed Processing.
One important characteristic of 2PC protocol is its blocking nature. Failures can oc-
cur during the commit process. As discussed above, the only way to detect these failures
is by means of a time-out of the process waiting for a message. When this happens, the sites that are still operational follow a termination protocol to determine what to do with the transaction that was in the middle of the commit process. A
non-blocking commit protocol is one whose termination protocol can determine what to
do with a transaction in case of failures under any circumstance. In the case of 2PC, if a
site failure occurs at the coordinator site and one participant site while the coordinator is
collecting votes from the participants, the remaining participants cannot determine the
fate of the transaction among themselves, and they have to remain blocked until the coor-
dinator or the failed participant recovers. During this period, the locks that are held by the
transaction cannot be released, which reduces the availability of the database. There have
been attempts to devise non-blocking commit protocols (e.g., three-phase commit), but the added overhead of these protocols has precluded their adoption in practice.
The inverse of termination is recovery. When a failed site recovers from the failure,
what actions does it have to take to recover the database at that site to a consistent state?
This is the domain of distributed recovery protocols. If each site can look at its own log
and decide what to do with the transaction, then the recovery protocol is said to be inde-
pendent. For example, if the coordinator fails after it sends the “prepare” command and
while waiting for the responses from the participants, upon recovery, it can determine
from its log where it was in the process and can restart the commit process for the trans-
action from the beginning by sending the “prepare” message one more time. If the par-
ticipants had already terminated the transaction, they can inform the coordinator. If they
were blocked, they can now resend their earlier votes and resume the commit process.
However, this is not always possible and the failed site has to ask others for the fate of
the transaction.
8. Replication Protocols
In replicated distributed databases, each logical data item has a number of physical
instances. For example, the salary of an employee (logical data item) may be stored at
three sites (physical copies). The issue in this type of a database system is to maintain
some notion of consistency among the copies. The most discussed consistency criterion is
one copy equivalence, which asserts that the values of all copies of a logical data item should be identical when the transaction that updates that item terminates.
If replication transparency is maintained, transactions will issue read and write op-
erations on a logical data item x. The replica control protocol is responsible for mapping these operations to operations on the physical copies. The simplest replica control protocol is the Read-Once/Write-All (ROWA) protocol. ROWA maps each read on x [Read(x)] to a read on one of the physical copies xi
[Read(xi)]. The copy that is read is insignificant from the perspective of the replica con-
trol protocol and may be determined by performance considerations. On the other hand,
each write on logical data item x is mapped to a set of writes on all copies of x.
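A minimal sketch of this mapping is given below, with a hypothetical dictionary standing in for the physical copies of x stored at different sites.

    import random

    copies = {"site1": 100, "site2": 100, "site3": 100}  # physical copies of x (assumed)

    def read_x():
        # A logical Read(x) is mapped to a read of any one physical copy;
        # which copy is chosen is irrelevant to correctness.
        return copies[random.choice(list(copies))]

    def write_x(value):
        # A logical Write(x) is mapped to writes on all physical copies; under
        # ROWA the transaction cannot terminate if any copy is unavailable.
        for site in copies:
            copies[site] = value

    write_x(200)
    assert read_x() == 200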
The ROWA protocol is simple and straightforward, but it requires that all copies of all
logical data items that are updated by a transaction be accessible for the transaction to
terminate. Failure of one site may block a transaction, reducing database availability.
A number of alternative algorithms have been proposed which reduce the require-
ment that all copies of a logical data item be updated before the transaction can terminate.
They relax ROWA by mapping each write to only a subset of the physical copies. The majority consensus approach, for example, allows a transaction that updates a logical data item to terminate successfully as long as a majority of the copies can be updated. Thus, all the copies of a logical data item
may not be updated to the new value when the transaction terminates.
This idea of possibly updating only a subset of the copies, but nevertheless success-
fully terminating the transaction, has formed the basis of quorum-based voting for replica
control protocols. The majority consensus algorithm can be viewed from a slightly differ-
ent perspective: It assigns equal votes to each copy and a transaction that updates that
logical data item can successfully complete as long as it has a majority of the votes.
Based on this idea, an early quorum-based voting algorithm assigns a (possibly unequal)
vote to each copy of a replicated data item. Each operation then has to obtain a read quo-
rum (Vr) or a write quorum (Vw) to read or write a data item, respectively. If a given data
item has a total of V votes, the quorums have to obey the following rules:
1. Vr + Vw > V (a data item is not read and written by two transactions concurrently); and
2. Vw > V/2 (two write operations from two transactions cannot occur concurrently on the same data item).
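The two rules can be checked with the small sketch below, which uses a hypothetical vote assignment; the read and write quorums are chosen so that any read quorum intersects any write quorum, and any two write quorums intersect.

    # Hypothetical vote assignment: total V = 4 votes over three copies.
    votes = {"copy1": 2, "copy2": 1, "copy3": 1}
    V = sum(votes.values())
    Vr, Vw = 2, 3          # satisfies Vr + Vw > V and Vw > V / 2

    def has_quorum(reachable_copies, needed):
        # An operation may proceed only if the reachable copies hold enough votes.
        return sum(votes[c] for c in reachable_copies) >= needed

    assert has_quorum({"copy1"}, Vr)               # a read may use copy1 alone
    assert not has_quorum({"copy2", "copy3"}, Vw)  # a write quorum must include copy1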
The difficulty with this approach is that transactions are required to obtain a quo-
rum even to read data. This significantly and unnecessarily slows down read access to the
database. An alternative quorum-based voting protocol that overcomes this serious per-
formance drawback has also been proposed. However, this protocol makes unrealistic
assumptions about the underlying communication system. It requires that all sites detect
failures that change the network’s topology instantaneously, and that each site has a view
of the network consisting of all the sites with which it can communicate. In general,
communication networks cannot guarantee to meet these requirements. The single copy equivalence protocols described above are generally considered to be restrictive in terms of the availability they provide. Voting-based protocols, on the other hand, are considered
too complicated with high overheads. Therefore, these techniques are not used in current
distributed DBMS products. More flexible replication schemes have been investigated
where the type of consistency between copies is under user control. A number of replica-
tion servers have been developed or are being developed with this principle. Unfortu-
nately, there is no clear theory that can be used to reason about the consistency of a repli-
cated database when the more relaxed replication policies are used.