Distributed Database Systems: Objectives

This document provides an overview of distributed database systems. It discusses client/server databases and the two-tier model, in which clients make requests to servers that store and process the database. It also describes three-tier architectures and distributed database systems, comparing strategies for performing updates in distributed systems such as two-phase commit protocols and asynchronous replication.

Uploaded by herokul. © Attribution Non-Commercial (BY-NC).

Chapter 17.

Distributed Database
Systems

Table of Contents
Objectives ................................................................................................................. 1
Introduction ............................................................................................................... 1
Context ..................................................................................................................... 2
Client/Server Databases ................................................................................................ 2
The 2-tier model ................................................................................................................. 2
Advantages of the client/server approach ................................................................................. 4
Disadvantages of the client/server approach ............................................................................. 4
Variants of the 2-tier model ................................................................................................... 5
The three-tier architecture ..................................................................................................... 6
Distributed Database Systems ......................................................................................... 7
Background to distributed systems ......................................................................................... 7
Motivation for distributed database systems ............................................................................. 8
Update Strategies for Replicated and Non-Replicated Data ........................................................ 11
The Two Phase Commit (2PC) protocol ................................................................................ 11
Read-One / Write-All Protocol ........................................................................................... 12
Lazy or asynchronous Replication ........................................................................................ 12
SYBASE and ORACLE Distributed Database Technology ....................................................... 13
Application replication methodologies: ................................................................................. 15
The Effect of Distributed Database Technologies on End Users ................................................. 15
Summary of replication strategies ......................................................................................... 15
Discussion Topics ..................................................................................................... 16

Objectives
At the end of this unit you should be able to:

• Understand what is meant by Client/Server database systems, and describe variations of the Client/
Server approach

• Describe the essential characteristics of Distributed database systems

• Distinguish between Client/Server databases and Distributed databases

• Describe mechanisms to support distributed transaction processing

• Compare strategies for performing updates in distributed database systems

Introduction
Distributed databases have become an integral part of business computing in the last 10 years. The abil-
ity to maintain the integrity of data and provide accurate and timely processing of database queries and
updates across multiple sites has been an important factor in enabling businesses to utilise data in a


range of different locations, sometimes on a global scale. Standardisation of query languages, and of the
Relational and Object models has assisted the integration of different database systems to form networks
of integrated data services. The difficulties of ensuring the integrity of data, that updates are timely, and
that users receive a uniform rate of response no matter where on the network they are situated remain, in
many circumstances, major challenges to database vendors and users. In this unit we shall introduce the
topic of distributed database systems. We shall examine a range of approaches to distributing data across
networks, and examine a range of strategies for ensuring the integrity and timeliness of the data con-
cerned. We shall look at mechanisms for enabling transactions to be performed across different ma-
chines, and the various update strategies that can be applied when data is distributed across different
sites.

Context
Many of the issues considered in other units of this module require a degree of further consideration
when translated into a distributed context. When it becomes a requirement to distribute data across a net-
work, the processes of transaction processing, concurrency control, recovery, security and integrity con-
trol and update propagation become significantly more involved. In this unit we shall introduce a num-
ber of extensions to mechanisms which we have previously considered for non-distributed systems.

Client/Server Databases
For years, serious business databases were monolithic systems running only on one large machine, ac-
cessed by dumb terminals. In the late 1980s, databases evolved so that an application, called a ‘client’,
on one machine could run against a database, called a ‘server’, on another machine. At first, this client/server database architecture was used only on mini and mainframe computers. Between 1987 and 1988, vendors such as Oracle Corp. and Gupta moved first the client function and then the database server down to microcomputers and Local Area Networks (LANs).

Today, the client/server concept has evolved to cover a range of approaches to distributing the pro-
cessing of an application in a variety of ways between different machines.

An example of the client/server approach is the SQL Server system from Microsoft. SQL Server runs on a server machine, which is usually a fairly powerful PC. A client program runs, usually on a separate machine, and makes requests for data to SQL Server via a local area network (LAN). The application program would typically be written in a language such as Visual Basic or Java. This approach allows multiple client machines on the network to request the same records from the database on the server; SQL Server will ensure that only one user at a time modifies any specific record.

The 2-tier model


On any LAN, there will always be at least two entities: the client machine, which is used to make requests of the database, and the server machine, which stores the database and processes received requests. In general, the client/server architecture involves multiple computers connected in a network. Some of the computers (clients) process application programs, and some (servers) perform database processing.

This approach is known as the 2-tier model of client/server computing, as it is made up of the two types
of component, clients and servers. It is also possible that a machine that acts as a server to some clients,
may itself act as a client to another server. This arrangement also falls under the 2-tier model, as it still
only comprises the two types of machine within the network.

The Client
This is the front-end of the client/server system: it handles all aspects of the user interface, presenting the system to the user. It can also provide PC-based application development tools used to enter, display, query and manipulate data on the central server, and to build applications. The client operating system is usually Windows, macOS, Linux or Unix.

The Server
Servers perform functions such as database integrity checking, data dictionary maintenance and concurrent access control. They also perform recovery and optimise query processing.
Server controls access to the data by enforcing locking rules to ensure data integrity during transactions.
The Server can be a PC, mini or mainframe computer, and usually employs a multi-tasking operating
system such as OS/2, Unix or MVS.

Query processing in 2-tier client/server systems


Typically in a client/server environment, the user will interact with the client machine through a series
of menus, forms and other interface components. Suppose the user completes a form to issue a query against a customer database. This query may be transformed into an SQL SELECT statement by code
running on the client machine. The client will then transmit the SQL query over the network to the serv-
er. The server receives the command, verifies the syntax, checks the existence and availability of the ref-
erenced objects, verifies that the user has SELECT privileges, and finally executes the query. The result-
ing data is formatted and sent to the application, along with return codes (used to identify whether the
query was successful, or if not, which error occurred). On receipt of the data, the client might carry out
further formatting, for example creating a graph of the data, before displaying it to the user.
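The round trip described above can be sketched in code. The following is an illustrative sketch only, using Python's built-in sqlite3 module with an in-memory database standing in for the remote database server; the table, function names and return-code convention are invented for this example, not taken from any vendor's API.

```python
import sqlite3

def server_execute(conn, sql, params):
    """The 'server' side: parse, verify and execute the query, returning
    a return code plus any result rows (or an error message)."""
    try:
        rows = conn.execute(sql, params).fetchall()
        return 0, rows            # 0 = success
    except sqlite3.Error as exc:
        return 1, str(exc)        # non-zero = failure, with the error

# --- set up the stand-in "server" database ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Ashby", "Manchester"), (2, "Bright", "Birmingham")])

# --- the "client" side: a field from the user's form becomes a SELECT ---
city_entered_on_form = "Manchester"
code, result = server_execute(
    conn,
    "SELECT id, name FROM customers WHERE city = ?",
    (city_entered_on_form,))

print(code, result)   # the client would now format the rows for display
```

In a real 2-tier system the call to `server_execute` would cross the LAN, and the server, not the client library, would perform the syntax check, privilege check and execution.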

The following diagram illustrates how a user interacts with a client system, which transmits queries to
the database server. The server processes the queries and returns the results of the query or a code indic-
ating failure of the query for some reason.


Advantages of the client/server approach


• Users do not have to retain copies of corporate data on their own PCs, which would quickly become out of date. They can be assured that they are always working with the current data, which is stored on the server machine.

• Processing can be carried out on the machine most appropriate to the task. Data intensive processes
can be carried out on the Server, whereas data entry validation and presentation logic can be ex-
ecuted on the Client machine. This reduces unnecessary network traffic and improves overall per-
formance. It also gives the possibility of optimising the hardware on each machine to the particular
tasks required of that machine.

• Scalability: if the number of users of an application grows, extra client machines can be added (up to a limit determined by the capacity of the network or server) without significant changes to the server.

Disadvantages of the client/server approach


• Operating database systems over a LAN or wide area network (WAN) brings extra complexities of
developing and maintaining the network. The interfaces between the programs running on the Client
and Server machines must be well understood. This usually becomes increasingly complex when the
applications and/or DBMS software come from different vendors.


• Security is a major consideration: preventative measures must be in place to protect against data theft or corruption on the client and server machines, and during transmission over the network.

Review Questions

• Explain what is meant by the 2-tier model of Client/Server computing.

• Why is it sometimes said that Client/Server computing improves the scalability of applications?

• What additional security issues are involved in the use of a Client/Server system, compared with a
traditional mainframe database accessed via dumb terminals?

Variants of the 2-tier model


Business (application) logic
The relative workload on the client and server machines is greatly affected by the way in which the ap-
plication code is distributed. The more application logic that is placed on client machines, the more of
the work these machines will have to do. Furthermore, more data will need to be transmitted from serv-
ers to client machines because the servers do not contain the application logic that might have been used
to eliminate some of the data prior to transmission. We can avoid this by transferring some of the applic-
ation logic to the server. This helps to reduce the load on both the clients and the network. This is known
as the split logic model of client/server computing. We can take this a stage further and leave the client with only the logic needed to handle the presentation of data to users, placing all of the functional logic on servers; this is known as the remote presentation model.

Business logic implemented as Stored Procedures


The main mechanism for placing business logic on a server is known as ‘stored procedures’ — stored
procedures are collections of code that usually include SQL for accessing the database. Stored proced-
ures are invoked from client programs, and use parameters to pass data between the procedure and in-
voking program. If we choose to place all the business logic in stored procedures on the server, we re-
duce network traffic, as intermediate SQL results do not have to be returned to client machines. Stored
procedures are compiled, in contrast with uncompiled SQL sent from the client. A further performance
gain is that stored procedures are usually cached, therefore subsequent calls to them do not require addi-
tional disk access.
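Stored procedures themselves are written in vendor-specific languages (PL/SQL in Oracle, Transact-SQL in SQL Server), but the call pattern, and the saving in network round trips, can be sketched. The sketch below uses Python's sqlite3 module; SQLite has no stored procedures, so the 'procedure' is simply a Python function standing in for server-side logic, and every table, column and routine name here is invented for illustration.

```python
import sqlite3

def sp_total_owed(conn, customer_id):
    """Hypothetical stored procedure: several SQL steps run next to the
    data, and only the small final result crosses back to the client."""
    cur = conn.cursor()
    cur.execute("SELECT COALESCE(SUM(amount), 0) FROM invoices "
                "WHERE customer_id = ? AND paid = 0", (customer_id,))
    unpaid = cur.fetchone()[0]
    cur.execute("SELECT COALESCE(credit_limit, 0) FROM customers WHERE id = ?",
                (customer_id,))
    limit = cur.fetchone()[0]
    return {"owed": unpaid, "over_limit": unpaid > limit}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, credit_limit REAL)")
conn.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL, paid INTEGER)")
conn.execute("INSERT INTO customers VALUES (1, 500.0)")
conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)",
                 [(1, 300.0, 0), (1, 400.0, 0), (1, 250.0, 1)])

print(sp_total_owed(conn, 1))   # one call, one small result
```

Had the logic lived on the client, both the invoice rows and the customer row would have been shipped over the network before the comparison could be made; the stored-procedure pattern ships only the final answer.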

2 tier client/server architectures in which a significant amount of application logic resides on client ma-
chines suffer from the following further disadvantages:

• Upgrades and bug fixes must be made on every client machine. This problem is compounded by the
fact that the client machines may vary in type and configuration.

• The procedural languages used commonly to implement stored procedures are not portable between
machines. Therefore, if we have servers of different types, for example Oracle and Sybase, the
stored procedures will have to be coded differently on the different server machines. In addition, the
programming environments for these languages do not provide comprehensive language support as
is found in normal programming languages such as C++ or Java, and the testing/debugging facilities
are limited.

Though the use of stored procedures improves performance (by reducing the load on the clients and network), taking this too far will limit the number of users that can be accommodated by the server: a point will be reached at which the server becomes a bottleneck because it is overloaded with the processing of application transactions.

The three-tier architecture


The 3-tier architecture introduces an applications server: a computer that sits between the clients and the database server. With this configuration, the client machines are freed up to focus on the
validation of data entered by users, and the formatting and presentation of results. The server is also
freed to concentrate on data-intensive operations, i.e. the retrieval and update of data, query optimisa-
tion, processing of declarative integrity constraints and database triggers etc. The new middle tier is ded-
icated to the efficient execution of the application logic.

The application server can perform transaction management, and if required ensure distributed database
integrity. It centralises the application logic for easier administration and change.

The 3-tier model of client/server computing is best suited to larger installations, because smaller installations simply do not have the volume of transactions to warrant an intermediate machine dedicated to application logic.

The following diagram illustrates how a user interacts with a client system, which deals with the valida-
tion of user input and the presentation of query results. The client system communicates with a middle
tier system, the applications server, which formulates queries from the validated input data and sends
queries to the database server. The database server processes the queries and sends to the applications
server the results of the query or a code indicating failure of the query for some reason. The application
server then processes these results (or code) and sends data to the client system to be formatted and
presented to the user.


Activity 1 – Investigating stored procedures


Read the documentation of the DBMS of your choice and investigate stored procedures as implemented in that environment. A visit to the software vendor's website may also be useful. Identify the types of situation in which stored procedures are used and, by looking at a number of examples, develop an overall understanding of the structure of stored procedures and how they are called from a PL/SQL program.

Distributed Database Systems


Background to distributed systems
A distributed database system consists of several machines, the database itself being stored on these ma-
chines which communicate with one another usually via high-speed networks or telephone lines. It is not
uncommon for these different machines to vary in both technical specification and function, depending
upon their importance and position in the system as a whole. Note that the generally understood descrip-
tion of a distributed database system given here is rather different from the client/server systems we ex-
amined earlier in the unit. In client/server systems, the data is not itself distributed; it is stored on a serv-
er machine, and accessed remotely from client machines. In a distributed database system on the other
hand, the data is itself distributed among a number of different machines. Decisions therefore need to be
made about the way in which the data is to be distributed, and also about how updates are to be propagated over multiple sites.

The distribution of data by an organisation throughout its various sites and departments allows data to be stored where it was generated, or where it is most used, while still being easily accessible from elsewhere. The general structure of a distributed system is shown below:

In effect, each separate site in the example is really a database system site in its own right, each location
having its own databases, as well as its own DBMS and transaction management software. It is commonly assumed in discussions of distributed database systems that the many databases involved are widely separated geographically. In fact, geographical distance has little or no effect on the overall operation of the distributed system: the same technical problems arise whether the databases are merely logically separated or physically separated by a great distance.

Motivation for distributed database systems


So why have distributed databases become so desirable? There are a number of reasons that promote the use of a distributed database system; these include:

• the sharing of data

• local autonomy

• data availability.

Furthermore, it is likely that the organisation that has chosen to implement the system will itself be distributed: there are almost always several departments and divisions within the company structure. An illustrative example is useful here in clarifying the benefits that can be gained by the use of distributed database systems:

Scenario banking system



Imagine a banking system that operates over a number of separate sites; for the sake of this example let
us consider two offices, one in Manchester and another in Birmingham. Account data for Manchester
accounts is stored in Manchester, while Birmingham’s account data is stored in Birmingham. Two major benefits are afforded by this arrangement: processing efficiency is increased because the data is stored where it is most frequently accessed, and the accessibility of account data is also improved.

The use of distributed database systems is not without its drawbacks, however. The main disadvantage is the added complexity involved in ensuring proper co-ordination between the various sites. This increase in complexity can take a variety of forms:

• Greater potential for bugs: with a number of databases operating concurrently, ensuring that the algorithms for the operation of the system are correct becomes an area of great difficulty. There is potential here for extremely subtle bugs.


• Increased processing overhead: the additional computation required to achieve inter-site co-ordination is a considerable overhead not present in centralised systems.

Date (1999) gives the ‘fundamental principle’ behind a truly distributed database:

‘To the user, a distributed system should look exactly like a non-distributed system.’

In order to accomplish this fundamental principle, twelve subsidiary rules have been established. These twelve objectives are listed below:

1. Local autonomy

2. No reliance on a central site

3. Continuous operation

4. Location independence

5. Fragmentation independence

6. Replication independence

7. Distributed query processing

8. Distributed transaction management

9. Hardware independence

10. Operating system independence

11. Network independence

12. DBMS independence

These twelve objectives are not all independent of one another. Furthermore, they are not necessarily ex-
haustive and moreover, they are not all equally significant.


Probably the major issue to be handled in distributed database systems is the way in which updates are propagated throughout the system. Two key concepts play a major role in this process:

• Data fragmentation: the splitting up of parts of the overall database across different sites

• Data replication: the maintenance of multiple copies of the same data at different sites

Fragmentation independence
A system can support data fragmentation if a given stored relation can be divided up into pieces or
‘fragments’ for physical storage purposes. Fragmentation is desirable for performance reasons: data can
be stored at the location where it is most frequently used, so that most operations are purely local and
network traffic is reduced.

A fragment can be any arbitrary sub-relation that is derivable from the original relation via restriction
(horizontal fragmentation) and projection (vertical fragmentation) operations. Fragmentation independ-
ence (also known as fragmentation transparency), therefore, allows users to behave, at least from a lo-
gical standpoint, as if the data were not fragmented at all. This implies that users will be presented with
a view of the data in which the fragments are logically combined together by suitable joins and unions.
It is the responsibility of the system optimiser to determine which fragment needs to be physically ac-
cessed in order to satisfy any given user request.
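The idea of fragments being logically recombined by unions can be sketched with SQLite standing in for the two sites. This is an illustrative sketch only: the two fragment tables model horizontal fragmentation by branch, a view plays the role of the fragmentation-transparent logical relation, and all table, view and column names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One fragment table per "site"; each holds the rows for its own branch.
for site in ("manchester", "birmingham"):
    conn.execute(f"CREATE TABLE accounts_{site} "
                 "(acc_no INTEGER PRIMARY KEY, branch TEXT, balance REAL)")

conn.execute("INSERT INTO accounts_manchester VALUES (101, 'Manchester', 250.0)")
conn.execute("INSERT INTO accounts_birmingham VALUES (202, 'Birmingham', 900.0)")

# The logical, unfragmented relation presented to users:
conn.execute("""CREATE VIEW accounts AS
                SELECT * FROM accounts_manchester
                UNION ALL
                SELECT * FROM accounts_birmingham""")

# Users query the logical relation; deciding which fragments are actually
# touched is the optimiser's job (here, simply SQLite's view expansion).
rows = conn.execute("SELECT acc_no FROM accounts WHERE balance > 500").fetchall()
print(rows)
```

In a real distributed DBMS the optimiser would additionally notice that a query restricted to `branch = 'Manchester'` needs to access only the Manchester fragment, eliminating network traffic to the other site.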

Replication Independence
A system supports data replication if a given stored relation - or, more generally, a given fragment - can
be represented by many distinct copies or replicas, stored at many distinct sites.

Replication is desirable for at least two reasons: First it can mean better performance (applications can
operate on local copies instead of having to communicate with remote sites); second, it can also mean
better availability (a given replicated object - fragment or whole relation - remains available for pro-
cessing so long as at least one copy remains available, at least for retrieval purposes).

What problems are associated with data replication and fragmentation?
Both data replication and fragmentation have their related problems in implementation. However, dis-
tributed non-replicated data only has problems when the relations are fragmented.

The problem of supporting operations, such as updating, on fragmented relations has certain points in
common with the problem of supporting operations on join and union views. It follows too that updating
a given tuple might cause that tuple to migrate from one fragment to another, if the updated tuple no
longer satisfies the relation predicate for the fragment it previously belonged to. (Date, 1999).
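Tuple migration can be sketched concretely. In the sketch below (names invented, SQLite standing in for two sites), each fragment holds the rows satisfying its predicate (branch = site); changing an account's branch therefore means deleting the tuple from one fragment and inserting it into the other, inside a single transaction so the tuple never appears in both or neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts_manchester (acc_no INTEGER, branch TEXT)")
conn.execute("CREATE TABLE accounts_birmingham (acc_no INTEGER, branch TEXT)")
conn.execute("INSERT INTO accounts_manchester VALUES (101, 'Manchester')")

def move_account(conn, acc_no, new_branch):
    """Re-home a tuple whose update no longer satisfies the predicate
    of the fragment it previously belonged to."""
    src, dst = "accounts_manchester", "accounts_birmingham"
    if new_branch == "Manchester":
        src, dst = dst, src
    with conn:  # one transaction: delete and insert succeed or fail together
        conn.execute(f"DELETE FROM {src} WHERE acc_no = ?", (acc_no,))
        conn.execute(f"INSERT INTO {dst} VALUES (?, ?)", (acc_no, new_branch))

move_account(conn, 101, "Birmingham")
print(conn.execute("SELECT COUNT(*) FROM accounts_manchester").fetchone()[0])  # 0
print(conn.execute("SELECT * FROM accounts_birmingham").fetchall())
```

A real system performs this migration transparently: the user issues a plain UPDATE against the logical relation, and the system turns it into the cross-fragment delete and insert.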

Replication also has its associated problems. The major disadvantage of replication is that when a given
replicated object is updated, all copies of that object must be updated — the update propagation prob-
lem. Therefore, in addition to transaction, system and media failures that can occur in a centralised
DBMS, a distributed database system (DDBMS) must also deal with communication failures. Commu-
nications failures can result in a site holding a copy of the object being unavailable at the time of the up-
date.

Furthermore, the existence of both system and communication failures poses complications because it is
not always possible to differentiate between the two (Ozsu and Valduriez, 1996).

Update Strategies for Replicated and Non-Replicated Data
There are many update strategies for replicated and fragmented data. This section will explore these
strategies and will illustrate them with examples from two of the major vendors.

Eager (synchronous) Replication


Gray et al (1996) state that Eager Replication keeps all replicas exactly synchronised at all nodes (sites) by updating all the replicas as part of one atomic transaction. Eager Replication gives serialisable execution, so there are no concurrency anomalies. However, it reduces update performance and increases transaction times, because extra updates and messages are added to the transaction. Eager Replication typically uses a locking scheme to detect and regulate concurrent execution.

With Eager Replication, reads at connected nodes give current data. Reads at disconnected nodes may
give stale (out-of-date) data. Simple eager replication systems prohibit updates if any node is disconnec-
ted. For high availability, Eager Replication systems allow updates among members of the quorum or
cluster. When a node joins the quorum, the quorum sends the node all replica updates since the node was
disconnected.
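The "update every replica inside one transaction" idea can be sketched as follows. This is a sketch only: two in-memory SQLite databases stand in for two sites, and plain exception handling stands in for a real commit protocol; all names are invented.

```python
import sqlite3

# Two in-memory databases stand in for replicas at two sites.
sites = [sqlite3.connect(":memory:") for _ in range(2)]
for db in sites:
    db.execute("CREATE TABLE balance (acc INTEGER PRIMARY KEY, amount REAL)")
    db.execute("INSERT INTO balance VALUES (1, 100.0)")

def eager_update(sites, acc, delta):
    """Apply the update at every site; roll all sites back if any fails."""
    try:
        for db in sites:
            db.execute("UPDATE balance SET amount = amount + ? WHERE acc = ?",
                       (delta, acc))
        for db in sites:
            db.commit()
    except sqlite3.Error:
        for db in sites:
            db.rollback()
        raise

eager_update(sites, 1, 50.0)
print([db.execute("SELECT amount FROM balance WHERE acc = 1").fetchone()[0]
       for db in sites])   # both replicas read 150.0
```

Note that the commit loop here is not genuinely atomic across sites: if the second commit failed after the first had succeeded, the replicas would diverge. Closing exactly that gap is the job of the distributed reliability protocols described next.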

Eager Replication and Distributed Reliability Protocols


Distributed Reliability Protocols (DRPs) are implementation examples of Eager Replication. DRPs are
synchronous in nature (ORACLE, 1993), and often use remote procedure calls (RPCs).

DRPs enforce atomicity (the all-or-nothing property) of transactions by implementing atomic commitment protocols such as the two-phase commit (2PC) protocol (Gray, 1979). Although 2PC is required in any environment in which a single transaction can interact with several autonomous resource managers, it is particularly important in a distributed system (Ozsu and Valduriez, 1996). 2PC extends the effects of local atomic commit actions to distributed transactions by insisting that all sites involved in the execution of a distributed transaction agree to commit the transaction before its effects are made permanent. In this way, 2PC supports one copy equivalence, which asserts that the values of all physical copies of a logical data item should be identical when the transaction that updates it terminates.

The inverse of termination is recovery. Distributed recovery protocols deal with the problem of recover-
ing the database at a failed site to a consistent state when that site recovers from failure. (Ozsu and Val-
duriez, 1996). The 2PC protocol also incorporates recovery into its remit.

Exercise 1
What is meant by the terms “atomic commitment protocol” and “one copy equivalence”?

The Two Phase Commit (2PC) protocol


The 2PC protocol works in the following way (adapted from Date, 1999): COMMIT or ROLLBACK is
handled by a system component called the Co-ordinator, whose task it is to guarantee that all resource
managers commit or roll-back the updates they are responsible for in unison - and furthermore, to
provide that guarantee even if the system fails in the middle of the process.

Assume that the transaction has completed its database processing successfully, so that the system-wide
operation it issues is COMMIT, not ROLLBACK. On receiving the COMMIT request, the Co-ordinator
goes through the following two-phase process:

1. First, the Co-ordinator instructs all resource managers to get ready either to commit or roll back the current transaction. In practice, this means that each participant in the process must force-write all log entries for local resources used by the transaction out to its own physical log. Assuming the force-write is successful, the resource manager replies ‘OK’ to the Co-ordinator; otherwise it replies ‘Not OK’.

2. When the Co-ordinator has received replies from all participants, it force-writes an entry to its own physical log, recording its decision regarding the transaction. If all replies were ‘OK’, that decision is COMMIT; if any reply was ‘Not OK’, the decision is ROLLBACK. Either way, the Co-ordinator then informs each participant of its decision, and each participant must then COMMIT or ROLLBACK the transaction locally, as instructed.

If the system or network fails at some point during the overall process, the restart procedure will look for
the decision record in the Co-ordinator’s log. If it finds it, then the 2PC process can pick up where it left
off. If it does not find it, then it assumes that the decision was ROLLBACK, and again the process can
complete appropriately. However, in a distributed system, a failure on the part of the Co-ordinator might
keep some participants waiting for the Co-ordinator’s decision. Therefore, as long as the participant is
waiting, any updates made by the transaction via that participant are kept locked.
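The control flow of the two phases can be simulated in a few lines. This is a sketch of the voting logic only, not a fault-tolerant implementation: participants and the Co-ordinator are plain Python objects, and 'force-writing a log entry' is modelled as appending to a list. All class and method names are invented.

```python
class Participant:
    def __init__(self, name, can_prepare=True):
        self.name, self.can_prepare, self.log = name, can_prepare, []

    def prepare(self):
        # Phase 1: force-write local log entries, then vote.
        if self.can_prepare:
            self.log.append("prepared")
            return "OK"
        return "Not OK"

    def finish(self, decision):
        # Phase 2: obey the Co-ordinator's decision.
        self.log.append(decision)

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]
    decision = "COMMIT" if all(v == "OK" for v in votes) else "ROLLBACK"
    # The Co-ordinator would force-write this decision record to its own
    # log before informing anyone, so restart can recover the outcome.
    for p in participants:
        p.finish(decision)
    return decision

ok = [Participant("A"), Participant("B")]
print(two_phase_commit(ok))        # COMMIT

mixed = [Participant("A"), Participant("B", can_prepare=False)]
print(two_phase_commit(mixed))     # ROLLBACK
```

The blocking problem described above is visible in this structure: if the Co-ordinator failed between collecting the votes and calling `finish`, a participant that had already voted ‘OK’ would have to keep its locks and wait.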

Review Questions

• Explain the role of an Application Server in a 3-tier Client/Server network.

• Distinguish between the terms “fragmentation” and “replication” in a distributed database environment.

• Describe the main advantage and disadvantage of Eager Replication

• During the processing of the 2-phase commit protocol, what does the co-ordinator process do if it is
informed by a local resource manager process that it was unable to force-write all log entries for the
local resources used by the transaction out to its own physical log?

Read-One / Write-All Protocol


A further replica-control protocol that enforces one-copy serialisability is known as the Read-One / Write-All (ROWA) protocol. The ROWA protocol is simple, but it requires that all copies of a logical data item be updated before the transaction can terminate.

Failure of one site may block a transaction, reducing database availability.

A number of alternative algorithms have been proposed that reduce the requirement that all copies of a
logical data item be updated before the transaction can terminate. They relax ROWA by mapping each
write to only a subset of the physical copies. One well-known approach is quorum-based voting, where
copies are assigned votes and read and write operations have to collect votes and achieve a quorum to
read/write data.
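
The voting idea can be illustrated with a small sketch. The quorum constraints used here (Vr + Vw > V, so every read quorum overlaps every write quorum, and 2Vw > V, so no two write quorums are disjoint) are the standard ones from the quorum-consensus literature; the vote assignments themselves are illustrative assumptions.

```python
# Hypothetical sketch of quorum-based voting over replicated copies.

def quorum_ok(total_votes, read_quorum, write_quorum):
    """Check that a (Vr, Vw) assignment preserves one-copy serialisability:
    reads intersect writes, and writes intersect each other."""
    return (read_quorum + write_quorum > total_votes
            and 2 * write_quorum > total_votes)

def can_write(collected_votes, write_quorum):
    """A write proceeds only once enough copies have granted their votes."""
    return sum(collected_votes) >= write_quorum

# Five copies with one vote each: Vr = 3, Vw = 3 is a valid assignment,
# whereas Vr = 2, Vw = 2 would let two writes succeed on disjoint copies.
assert quorum_ok(5, 3, 3)
assert not quorum_ok(5, 2, 2)
```

Unlike ROWA, a write here need only reach a quorum of copies, so the failure of a minority of sites no longer blocks the transaction.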

Three-phase commit is a non-blocking protocol which prevents the 2PC blocking problem from occurring by removing the uncertainty for participants after their votes have been placed. This is done through the inclusion of a pre-commit phase that relays information to participants, advising them that a commit will occur in the near future.

Lazy or asynchronous Replication


Eager Replication update strategies, as identified above, are synchronous, in the sense that they require
the atomic updating of some number of copies. Lazy Group Replication and Lazy Master Replication
both operate asynchronously.

If the users of distributed database systems are willing to pay the price of some inconsistency in exchange for the freedom to do asynchronous updates, they will insist that:

1. the degree of inconsistency be bounded precisely, and that

2. the system guarantees convergence to standard notions of ‘correctness’.

Without such properties, the system in effect becomes partitioned as the replicas diverge more and more
from one another (Davidson et al, 1985).

Lazy Group Replication


Lazy Group Replication allows any node to update any local data. When the root transaction commits, a further transaction is sent to every other node to apply the root transaction’s updates to the replicas at the destination node. It is possible for two nodes to update the same object and race each other to install their updates at other nodes. The replication mechanism must detect this and reconcile the two transactions so that their updates are not lost (Gray et al, 1996).

Timestamps are commonly used to detect and reconcile lazy-group transactional updates. Each object
carries the timestamp of its most recent update. Each replica update carries the new value and is tagged
with the old object timestamp. Each node detects incoming replica updates that would overwrite earlier
committed updates. The node tests if the local replica’s timestamp and the update’s old timestamp are
equal. If so, the update is safe. The local replica’s timestamp advances to the new transaction’s
timestamp and the object value is updated. If the current timestamp of the local replica does not match
the old timestamp seen by the root transaction, then the update may be ‘dangerous’. In such cases, the
node rejects the incoming transaction and submits it for reconciliation. The reconciliation process is then
responsible for applying all waiting update transactions in their correct time sequence.
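
The timestamp test described in this paragraph can be sketched as follows; the record and update structures are illustrative assumptions.

```python
# Sketch of lazy-group timestamp reconciliation: a replica update carries
# the old timestamp it was based on. If the local replica has already moved
# past that timestamp, the update is 'dangerous' and goes to reconciliation.

def apply_replica_update(local, update):
    """local: {'ts': ..., 'value': ...};
    update: {'old_ts': ..., 'new_ts': ..., 'value': ...}."""
    if local["ts"] == update["old_ts"]:   # safe: no intervening committed update
        local["ts"] = update["new_ts"]
        local["value"] = update["value"]
        return "applied"
    return "reconcile"                    # dangerous: timestamps have diverged

replica = {"ts": 10, "value": "x"}
apply_replica_update(replica, {"old_ts": 10, "new_ts": 12, "value": "y"})
# A second update also based on timestamp 10 now fails the equality test
# (the replica's timestamp is 12) and is submitted for reconciliation.
```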

Transactions that would wait in an Eager Replication system face reconciliation in a Lazy Group Replication system. Waits are much more frequent than deadlocks, because it takes two waits to make a deadlock.

Lazy Master Replication


Another alternative to Eager Replication is Lazy Master Replication. Gray et al (1996) state that this replication method assigns an owner to each object. The owner stores the object’s correct value. Updates are first done by the owner and then propagated to the other replicas. When a transaction wants to update an object, it sends a Remote Procedure Call (RPC) to the node owning the object. To achieve serialisability, a read action should send read-lock RPCs to the masters of any objects it reads. Once the master transaction commits, the node originating the transaction broadcasts the replica updates to all the slave replicas, sending one slave transaction to each slave node. Slave updates are time-stamped to ensure that all the replicas converge to the same final state. If the record timestamp is newer than a replica update timestamp, the update is ‘stale’ and can be ignored. Alternatively, each master node sends replica updates to slaves in sequential commit order.
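
The stale-update rule for slave replicas can be sketched in the same style; the record structure is an illustrative assumption.

```python
# Sketch of the lazy-master stale check: slave updates carry the master's
# commit timestamp, and a slave ignores any update older than (or equal to)
# what it already holds, so all replicas converge to the master's state.

def apply_slave_update(record, update_ts, update_value):
    if update_ts <= record["ts"]:
        return "stale"                 # record already reflects a newer commit
    record["ts"], record["value"] = update_ts, update_value
    return "applied"

rec = {"ts": 5, "value": "a"}
apply_slave_update(rec, 7, "b")        # applied: rec now holds the newer value
apply_slave_update(rec, 6, "c")        # stale: out-of-order delivery is ignored
```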

Review Question
When an asynchronous update strategy is being used, if two copies of a data item are stored at different sites, what mechanism can be used to combine the effects of two separate updates applied to these different copies?

SYBASE and ORACLE Distributed Database Technology


The following section details the strategies that are employed by the ORACLE and Sybase corporations with reference to replicated data in the distributed environment. Where applicable, they will be compared with the strategies mentioned for coping with data replication in a distributed database environment.

Strategies used by SYBASE in supporting update propagation


At the SYBASE application level the programmer has a choice of three models for update propagation:

1. Local First Update

The method which is most robust in the face of component and network failure is to update the local copy first. Changes to the local copy propagate to the primary copy, where they will be applied and then replicated to the other copies. This is a loose consistency protocol which allows for the creation of inconsistencies. It is possible that, by the time the transaction is applied at the primary, it has become invalid. In such a case, an inconsistency report will be transmitted back to the source of the update. At the local site, rules can be put in place to handle typical inconsistencies introduced by this transaction mechanism. The benefit of this method is that transactions can commit even when the primary site is unavailable.

Local First Update is an example of an asynchronous, symmetric Lazy Group Replication strategy.

2. Primary First Update

This method allows the primary copy to be updated first. Once the update succeeds at the primary, it propagates to all replicas, including the site of the transaction’s origin. This transaction model reduces the risk of inconsistency, but at the cost of lower availability.

Primary First Update is an example of an asynchronous, symmetric Lazy Master Replication strategy.

In order to prevent lost updates, a remote site can ask to replicate the ‘version number’ of each row. This control number is incremented each time the row is modified. The remote site can include the version number as part of the qualifying clause of its UPDATE statement. Thus, if a row has been changed since it was last copied to the replica, the update will fail and an error message will be returned to the remote site.

3. Two-Phase Commit

An application will typically replicate all the necessary data to the local site and then perform a
Local First Update transaction, the pieces of which migrate to the various sites.

As previously stated, 2PC is an example of synchronous Eager Replication.
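
The version-number technique described under Primary First Update can be illustrated with a small sketch using an in-memory SQLite table. The table and column names are assumptions for illustration, not SYBASE syntax; the point is that an UPDATE qualified on the previously seen version affects zero rows if the row has changed underneath, signalling a potential lost update.

```python
# Hypothetical illustration of optimistic, version-qualified updates.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100, 1)")

def update_if_unchanged(conn, acct_id, new_balance, seen_version):
    """Update only if the row still carries the version we replicated."""
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, acct_id, seen_version))
    return cur.rowcount == 1   # False: the row changed since it was copied

update_if_unchanged(conn, 1, 150, 1)   # succeeds; version becomes 2
update_if_unchanged(conn, 1, 175, 1)   # fails: version 1 no longer current
```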

Strategies used by ORACLE 7 in supporting update propagation


Although ORACLE stored procedure calls are a synchronous RPC 2PC mechanism, ORACLE provides
two capabilities through asynchronous distributed technology:

ORACLE propagation of asynchronous RPCs is controlled by the application. The application may choose event-based propagation at frequent intervals. Alternatively, propagation may be initiated at points in time when connectivity is available or communications costs are lowest. Multiple queues can be propagated at different frequencies according to priority. If a remote system is unavailable, the asynchronous RPCs targeted for that system remain in their local queue for later propagation.

Changes to local copies of data are stored locally, and then propagated and applied to remote copies in a store-and-forward manner with separate transactions. If a remote system is unavailable, the changes remain in the local system and are propagated when the remote system becomes available. At a given point in time some replicated copies may contain older data than others, but over time the copies converge to the same values.
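
The store-and-forward behaviour can be sketched as a set of per-site queues. The class interface and site names are illustrative assumptions, not ORACLE's API.

```python
# Sketch of store-and-forward propagation: each change is queued for every
# remote site, and queues drain only when the site is reachable.
from collections import deque

class StoreAndForward:
    def __init__(self, sites):
        self.queues = {s: deque() for s in sites}
        self.up = {s: False for s in sites}     # connectivity per site

    def record_change(self, change):
        for q in self.queues.values():          # enqueue for every remote copy
            q.append(change)

    def propagate(self):
        delivered = {}
        for site, q in self.queues.items():
            if self.up[site]:                   # drain only reachable sites
                delivered[site] = list(q)
                q.clear()
        return delivered

net = StoreAndForward(["B", "C"])
net.record_change("set x = 1")
net.up["B"] = True
net.propagate()   # the change reaches B; it stays queued for unavailable C
```

Because queued changes are delivered in order as connectivity returns, the copies converge over time even though they may briefly disagree.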


Application replication methodologies


There are two overall approaches to replication:

1. Primary Site, as identified in the SYBASE Primary First Update example and

2. Dynamic Ownership, as identified in the SYBASE Local First Update example.

ORACLE also supports Symmetric Replication of both full tables and fragments of tables. This is achieved through updateable snapshots and N-way Replication. These can be read about in the referenced Oracle white paper.

We have seen that current DDBMSs provide support for a range of replication strategies. It is up to the application developer which of the available technologies is incorporated into their distributed database applications, a choice based on a cost-benefit analysis of the relevant protocols available to users.

The Effect of Distributed Database Technologies on End Users

The way in which an application is coded, and therefore which distributed technologies it uses, should be decided by the business case.

For example, SYBASE’s asynchronous Lazy Group Replication method, Local First Update, or ORACLE’s Dynamic Ownership method would be chosen for low-value transactions, and for transactions where the cost of an inconsistency error is very much less than the cost of failing to perform the transaction. It is also suitable where the chance of inconsistency is very low. The benefit of this method is that transactions can commit even when the primary (master) site is unavailable.

Conversely, for transactions with a higher probability of failure, or for transactions where the cost of errors exceeds the cost of being unable to perform the transaction, it would be advisable to use a Lazy Master Replication strategy such as SYBASE’s Primary First Update or ORACLE’s Primary Site asynchronous symmetric data replication strategy.

However, for most applications, Eager Replication strategies - such as two-phase commit - will represent an unnecessary overhead, due to the considerable additional communications required for message and data transmission over the network.

Nevertheless, there are some high-value transactions which require a tightly consistent, simultaneous update of all primary sites. In this case, two-phase commit is the appropriate protocol.

A further consideration is mobile applications. Gray et al (1996) state that Eager Replication is not an option for mobile applications, where most nodes are normally disconnected. Mobile applications require Lazy Replication algorithms that asynchronously propagate replica updates to other nodes after the updating transaction commits. Some continuously connected systems use Lazy Master Replication to improve response time. Lazy Replication also has its shortcomings, the most serious being stale data versions: when transactions read and write data concurrently, one transaction’s updates should be serialised after the other’s, to avoid concurrency anomalies.

As a final point it is important to note that from the point of view of performance, synchronous methods
decrease system availability and throughput as the size of the system increases. Moreover, from the
point of view of autonomy, federated databases may not wish to support this kind of tight coupling (Pu
and Leff, 1991).

Summary of replication strategies



It can be seen that both asynchronous and synchronous update strategies have their role to play in distributed database systems. Both SYBASE and ORACLE have options to incorporate either approach into distributed applications. However, which strategy is employed relies on the application developer’s understanding of the business case. Moreover, when choosing which strategy to adopt, one must consider the value of transactions, the size of data transfers over a Wide Area Network (WAN), the robustness of the system, and the performance of the system in relation to end-user query, insertion, update, and delete requirements.

A final consideration would be the status of local databases. As mobile computing becomes increasingly popular, it is likely that the problems associated with data replication in distributed database systems will continue to draw considerable attention.

Review Question
List the characteristics of applications that can benefit most from

• Synchronous replication

• Asynchronous replication

Discussion Topics
1. We have covered the client/server and true distributed database approaches in this unit. Client/server systems distribute the processing, whereas distributed systems distribute both the processing and the data. Discuss the proposition that most commercial applications are adequately supported by a client/server approach, and do not require the additional features of a truly distributed database system.

2. Discuss the proposition that, in those situations where a distributed database solution is required, most applications are adequately provided for by a lazy or asynchronous replication strategy, and do not require the sophistication of an eager or synchronous replication system. Discuss the implications for end users of synchronous and asynchronous updating.
