Unit-4 Ch-12


Unit Title: Distributed Databases

The Evolution of DDBMS


A distributed database management system (DDBMS) governs the storage and
processing of logically related data over interconnected computer systems in which
both data and processing are distributed among several sites.
During the 1970s, corporations implemented centralized database management
systems to meet their structured information needs. The use of a centralized database
required that corporate data be stored in a single central site, usually a mainframe
computer. Data access was provided through dumb terminals. The centralized
approach worked well to meet the structured information needs of corporations, but it fell short when fast-moving events required faster response times and equally
quick access to information. The slow progression from information request to
approval to specialist to user simply did not serve decision makers well in a dynamic
environment.

however, other factors such as performance and fault tolerance often require the use of data replication techniques similar to those in distributed databases.
• The increased focus on mobile business intelligence. More and more companies are incorporating mobile technologies into their business plans. As companies use social
networks to get closer to customers, the need for on-the-spot decision making
increases. Although a data warehouse is not usually a distributed database, it does
rely on techniques such as data replication and distributed queries that facilitate data
extraction and integration.
At this point, the long-term impact of the Internet and the mobile revolution on
distributed database design and management is just starting to be felt. Perhaps the
success of the Internet and mobile technologies will foster the use of distributed
databases as bandwidth becomes a less troublesome bottleneck.
In any case, distributed database concepts and components are likely to find a place
in future database development, particularly for specialized mobile and location-
aware applications.
The distributed database is especially desirable because centralized database
management is subject to problems such as:
• Performance degradation because of a growing number of remote locations over
greater distances.
• High costs associated with maintaining and operating large central (mainframe)
database systems and physical infrastructure.
• Reliability problems created by dependence on a central site (single point of failure
syndrome) and the need for data replication.
• Scalability problems associated with the physical limits imposed by a single
location, such as physical space, temperature conditioning, and power consumption.
• Organizational rigidity imposed by the database, which means it might not support the flexibility and agility required by modern global organizations.

Advantages and Disadvantages of DDBMS

Distributed Processing and Distributed Databases


In distributed processing, a database’s logical processing is shared among two
or more physically independent sites that are connected through a network. For
example, the data input/output (I/O), data selection, and data validation might be
performed on one computer, and a report based on that data might be created on
another computer.
Although the database resides at only one site (Miami, in this example), each site can access the data and update the database. The database is located on Computer A, a network computer known as the database server.

A distributed database stores a logically related database over two or more physically independent sites. The sites are connected via a computer network. In contrast, the distributed processing system uses only a single-site database but shares the processing chores among several sites. In a distributed database system, a database is composed of several parts known as database fragments. The database fragments are located at different sites and can be replicated among various sites. Each database fragment is, in turn, managed by its local database process.

Keep the following points in mind:


• Distributed processing does not require a distributed database, but a distributed
database requires distributed processing. (Each database fragment is managed by
its own local database process.)
• Distributed processing may be based on a single database located on a single
computer. For the management of distributed data to occur, copies or parts of the
database processing functions must be distributed to all data storage sites.
• Both distributed processing and distributed databases require a network of interconnected components (a configuration sketch contrasting the two follows this list).
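To make the contrast concrete, the following is a minimal sketch, assuming hypothetical site and fragment names, of how the two configurations differ: distributed processing keeps the data at a single site while spreading the processing, whereas a distributed database spreads (and may replicate) the data fragments themselves.

```python
# Toy configuration sketch; all site and fragment names are hypothetical.

# Distributed processing: one single-site database, several processing sites.
distributed_processing = {
    "database_sites": ["Miami"],                      # the only data site
    "processing_sites": ["Miami", "New York", "Atlanta"],
}

# Distributed database: the database is split into fragments stored (and
# possibly replicated) at different sites, each managed by a local database
# process; processing is necessarily distributed as well.
distributed_database = {
    "fragments": {
        "CUSTOMER_E1": ["Miami"],                     # fragment at one site
        "CUSTOMER_E2": ["New York", "Atlanta"],       # replicated fragment
    },
    "processing_sites": ["Miami", "New York", "Atlanta"],
}
```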

Levels of Process and Data Distribution


Single-site processing, single-site data:
In the single-site processing, single-site data (SPSD) scenario, all processing is done
on a single host computer and all data are stored on the host computer’s local disk
system. Processing cannot be done on the end user’s side of the system. Such a
scenario is typical of most mainframe and midrange UNIX/Linux server DBMSs. The
DBMS is on the host computer, which is accessed by terminals connected to it. This
scenario is also typical of the first generation of single-user microcomputer databases.

Multiple-site processing, single-site data:


Under the multiple-site processing, single-site data (MPSD) scenario, multiple
processes run on different computers that share a single data repository. Typically,
the MPSD scenario requires a network file server running conventional applications
that are accessed through a network. Many multiuser accounting applications running
under a personal computer network fit such a description.

Multiple-site processing, multiple-site data:


The multiple-site processing, multiple-site data (MPMD) scenario describes a fully
distributed DBMS with support for multiple data processors and transaction
processors at multiple sites. Depending on the level of support for various types of
databases, DDBMSs are classified as either homogeneous or heterogeneous.
Homogeneous DDBMSs integrate multiple instances of the same DBMS over a
network.
Heterogeneous DDBMSs integrate different types of DBMSs over a network, but all
support the same data model.
A fully heterogeneous DDBMS will support different DBMSs, each one supporting a
different data model, running under different computer systems.

Distributed Database Transparency Features


A distributed database system should provide some desirable transparency features that hide the system’s complexities from the end user. In other words, the end user should have the sense of working with a centralized DBMS. For this reason, the minimum desirable DDBMS transparency features are:
• Distribution transparency, which allows a distributed database to be treated as a
single logical database. If a DDBMS displays distribution transparency, the user does
not need to know:
-- That the data are partitioned—meaning the table’s rows and columns are split
vertically or horizontally and stored among multiple sites.
-- That the data are geographically dispersed among multiple sites.
-- That the data are replicated among multiple sites.
• Transaction transparency, which allows a transaction to update data at more than
one network site. Transaction transparency ensures that the transaction will be either
entirely completed or aborted, thus maintaining database integrity.
• Failure transparency, which ensures that the system will continue to operate in the
event of a node or network failure. Functions that were lost because of the failure will
be picked up by another network node. This is a very important feature, particularly in
organizations that depend on web presence as the backbone for maintaining trust in
their business.
• Performance transparency, which allows the system to perform as if it were a
centralized DBMS. The system will not suffer any performance degradation due to its
use on a network or because of the network’s platform differences. Performance
transparency also ensures that the system will find the most cost-effective path to
access remote data.

• Heterogeneity transparency, which allows the integration of several different local DBMSs (relational, network, and hierarchical) under a common, or global, schema.
The DDBMS is responsible for translating the data requests from the global schema
to the local DBMS schema.
Distribution Transparency
Distribution transparency allows a physically dispersed database to be managed as though it were a centralized database. The level of transparency supported by the DDBMS varies from system to system. Three levels of distribution transparency are recognized (a query sketch follows the list below):
• Fragmentation transparency is the highest level of transparency. The end user or
programmer does not need to know that a database is partitioned. Therefore, neither
fragment names nor fragment locations are specified prior to data access.
• Location transparency exists when the end user or programmer must specify the
database fragment names but does not need to specify where those fragments are
located.
• Local mapping transparency exists when the end user or programmer must
specify both the fragment names and their locations.
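The sketch below shows how the same query might be written at each transparency level, using a hypothetical CUSTOMER table split into fragments E1 and E2 at two sites. The fragment names, site names, and the NODE keyword are illustrative only and do not reflect the syntax of any particular DDBMS.

```python
# Illustrative queries at the three levels of distribution transparency.

# Fragmentation transparency: the user names neither fragments nor sites.
query_fragmentation = "SELECT * FROM CUSTOMER WHERE CUS_STATE = 'FL'"

# Location transparency: fragment names are required, but not their sites.
query_location = (
    "SELECT * FROM E1 WHERE CUS_STATE = 'FL' "
    "UNION "
    "SELECT * FROM E2 WHERE CUS_STATE = 'FL'"
)

# Local mapping transparency: both fragment names and sites are required.
query_local_mapping = (
    "SELECT * FROM E1 NODE MIAMI WHERE CUS_STATE = 'FL' "
    "UNION "
    "SELECT * FROM E2 NODE ATLANTA WHERE CUS_STATE = 'FL'"
)

for level, sql in [("fragmentation", query_fragmentation),
                   ("location", query_location),
                   ("local mapping", query_local_mapping)]:
    print(f"{level} transparency:\n  {sql}\n")
```

Each lower level of transparency pushes more of the distribution detail onto the end user or programmer.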
Distribution transparency is supported by a distributed data dictionary (DDD) or a distributed data catalog (DDC). The DDC contains the description of the entire database as seen by the database administrator. The database description, known as the distributed global schema, is the common database schema used by local transaction processors (TPs) to translate user requests into subqueries (remote requests) that will be processed by different data processors (DPs). The DDC is itself distributed, and it is replicated at the network nodes. Therefore, the DDC must maintain consistency through updating at all sites.

Transaction Transparency
Transaction transparency is a DDBMS property that ensures database
transactions will maintain the distributed database’s integrity and consistency.
Remember that a DDBMS database transaction can update data stored in many
different computers connected in a network. Transaction transparency ensures
that the transaction will be completed only when all database sites involved in
the transaction complete their part of the transaction.
Distributed database systems require complex mechanisms to manage transactions
and ensure the database’s consistency and integrity.
Distributed Requests and Distributed Transactions
Whether or not a transaction is distributed, it is formed by one or more database
requests. The basic difference between a nondistributed transaction and a
distributed transaction is that the latter can update or request data from several
different remote sites on a network.
A distributed transaction can reference several different local or remote DP sites.
Although each single request can reference only one local or remote DP site, the
transaction as a whole can reference multiple DP sites because each request can
reference a different site.
A distributed request lets a single SQL statement reference data located at several
different local or remote DP sites. Because each request (SQL statement) can access
data from more than one local or remote DP site, a transaction can access several
sites. The ability to execute a distributed request provides fully distributed database
processing because you can:
• Partition a database table into several fragments.
• Reference one or more of those fragments with only one request. In other words, there is fragmentation transparency (see the sketch after this list).
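The sketch below contrasts a single distributed request with the site-by-site statements a distributed transaction without distributed requests would need. The table names, fragment names, and the assumed site layout (CUSTOMER fragmented over two sites, INVOICE at a third) are hypothetical.

```python
# A distributed request: one SQL statement that references data stored at
# several sites; with fragmentation transparency the user simply writes it
# against the logical table names.
distributed_request = """
    SELECT C.CUS_NUM, C.CUS_NAME, I.INV_TOTAL
    FROM CUSTOMER AS C JOIN INVOICE AS I ON C.CUS_NUM = I.CUS_NUM
    WHERE C.CUS_STATE = 'FL'
"""

# Without distributed requests, each statement may touch only one site, so
# the transaction needs one request per fragment or table.
per_site_requests = [
    "SELECT * FROM CUSTOMER_E1 WHERE CUS_STATE = 'FL'",  # fragment at site 1
    "SELECT * FROM CUSTOMER_E2 WHERE CUS_STATE = 'FL'",  # fragment at site 2
    "SELECT * FROM INVOICE WHERE INV_TOTAL > 100",       # table at site 3
]
```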

Distributed Concurrency Control


Concurrency control becomes especially important in distributed databases because
multisite, multiple-process operations are more likely to create data inconsistencies
and deadlocked transactions than single-site systems. For example, the TP
component of a DDBMS must ensure that all parts of the transaction are completed
at all sites before a final COMMIT is issued to record the transaction.
Suppose that a transaction updates data at three DP sites. The first two DP sites
complete the transaction and commit the data at each local DP; however, the
third DP site cannot commit the transaction. Such a scenario would produce an
inconsistent database, with its inevitable integrity problems, because committed data
cannot be uncommitted!
Two-Phase Commit Protocol
Centralized databases require only one DP. All database operations take place at only one site, and the consequences of database operations are immediately known to the DBMS. In contrast, distributed databases make it possible for a transaction
to access data at several sites. A final COMMIT must not be issued until all sites
have committed their parts of the transaction. The two-phase commit protocol
(2PC) guarantees that if a portion of a transaction operation cannot be
committed, all changes made at the other sites participating in the transaction
will be undone to maintain a consistent database state.
Each DP maintains its own transaction log. The two-phase commit protocol requires
that the transaction log entry for each DP be written before the database fragment is
actually updated (see Chapter 10). Therefore, the two-phase commit protocol
requires a DO-UNDO-REDO protocol and a write-ahead protocol.
The DO-UNDO-REDO protocol is used by the DP to roll transactions back and forward
with the help of the system’s transaction log entries.

The DO-UNDO-REDO protocol defines three types of operations:


• DO performs the operation and records the “before” and “after” values in the
transaction log.
• UNDO reverses an operation, using the log entries written by the DO portion of the
sequence.
• REDO redoes an operation, using the log entries written by the DO portion of the
sequence.
To ensure that the DO, UNDO, and REDO operations can survive a system crash while they are being executed, a write-ahead protocol is used. The write-ahead
protocol forces the log entry to be written to permanent storage before the actual
operation takes place.
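A minimal sketch of these two protocols, assuming a simple key-value database fragment and an in-memory log, is shown below; the class and method names are illustrative, not a real DBMS recovery API. The log entry is appended before the fragment is touched (write-ahead), and UNDO/REDO replay the logged before and after values.

```python
from dataclasses import dataclass

@dataclass
class LogEntry:
    key: str
    before: object   # value before the operation (None if the key was new)
    after: object    # value after the operation

class DataProcessor:
    def __init__(self):
        self.fragment = {}   # the local database fragment (key -> value)
        self.log = []        # transaction log entries for this fragment

    def do(self, key, new_value):
        # Write-ahead rule: record the before/after images in the log
        # *before* the fragment itself is updated.
        self.log.append(LogEntry(key, self.fragment.get(key), new_value))
        self.fragment[key] = new_value

    def undo(self):
        # Roll the transaction back using the "before" values in the log.
        for entry in reversed(self.log):
            if entry.before is None:
                self.fragment.pop(entry.key, None)
            else:
                self.fragment[entry.key] = entry.before

    def redo(self):
        # Roll the transaction forward using the "after" values in the log.
        for entry in self.log:
            self.fragment[entry.key] = entry.after

dp = DataProcessor()
dp.do("balance:1001", 500)   # DO: log first, then update the fragment
dp.undo()                    # UNDO: the change is reversed from the log
dp.redo()                    # REDO: the change is reapplied from the log
```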
The two-phase commit protocol defines the operations between two types of
nodes: the coordinator and one or more subordinates, or cohorts. The
participating nodes agree on a coordinator. Generally, the coordinator role is assigned
to the node that initiates the transaction. The protocol is implemented in two phases,
as illustrated in the following sections.
Phase 1: Preparation
1. The coordinator sends a PREPARE TO COMMIT message to all subordinates.
2. The subordinates receive the message, write the transaction log using the write-ahead protocol, and send an acknowledgment message (YES/PREPARED TO COMMIT or NO/NOT PREPARED) to the coordinator.
3. The coordinator makes sure that all nodes are ready to commit, or it aborts the action.
If all nodes are PREPARED TO COMMIT, the transaction goes to Phase 2. If one or
more nodes reply NO or NOT PREPARED, the coordinator broadcasts an ABORT
message to all subordinates.

Phase 2: The Final COMMIT


1. The coordinator broadcasts a COMMIT message to all subordinates and waits for
the replies.
2. Each subordinate receives the COMMIT message and then updates the database
using the DO protocol.
3. The subordinates reply with a COMMITTED or NOT COMMITTED message to the
coordinator.
If one or more subordinates do not commit, the coordinator sends an ABORT
message, thereby forcing them to UNDO all changes.
The objective of the two-phase commit is to ensure that each node commits its part
of the transaction; otherwise, the transaction is aborted. If one of the nodes fails to
commit, the information necessary to recover the database is in the transaction log,
and the database can be recovered with the DO-UNDO-REDO protocol.
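The sketch below walks through this flow from the coordinator’s side, assuming hypothetical Subordinate objects with prepare, commit, and abort operations and simple message strings. It is a toy illustration of the control flow only; it omits timeouts, persistent logging, and recovery after a coordinator failure.

```python
class Subordinate:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit

    def prepare(self):
        # Phase 1: write the transaction log (write-ahead) and vote.
        return "PREPARED TO COMMIT" if self.can_commit else "NOT PREPARED"

    def commit(self):
        # Phase 2: apply the update using the DO protocol and report back.
        return "COMMITTED"

    def abort(self):
        # UNDO any local changes using the transaction log.
        return "ABORTED"

def two_phase_commit(coordinator_log, subordinates):
    # Phase 1: Preparation -- collect a vote from every subordinate.
    votes = [s.prepare() for s in subordinates]
    if any(v != "PREPARED TO COMMIT" for v in votes):
        for s in subordinates:
            s.abort()                    # broadcast ABORT to all sites
        coordinator_log.append("ABORT")
        return "ABORTED"

    # Phase 2: the final COMMIT -- broadcast COMMIT and gather replies.
    replies = [s.commit() for s in subordinates]
    if any(r != "COMMITTED" for r in replies):
        for s in subordinates:
            s.abort()                    # force every site to UNDO its changes
        coordinator_log.append("ABORT")
        return "ABORTED"

    coordinator_log.append("COMMIT")
    return "COMMITTED"

# Example: the third DP site cannot commit, so the whole transaction aborts.
log = []
sites = [Subordinate("DP1"), Subordinate("DP2"), Subordinate("DP3", can_commit=False)]
print(two_phase_commit(log, sites))      # -> ABORTED
```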
Performance and Failure Transparency
Performance transparency allows a DDBMS to perform as if it were a centralized
database. In other words, no performance degradation should be incurred due to data
distribution.
Failure transparency ensures that the system will continue to operate in the case of a
node or network failure. Although these are two separate issues, they are interrelated
in that a failing node or congested network path could cause performance problems.
The objective of query optimization is to minimize the total cost associated with the
execution of a request. The costs associated with a request are a function of the
following:
• Access time (I/O) cost involved in accessing the data from multiple remote sites.
• Communication cost associated with data transmission among nodes in distributed
database systems.

• CPU time cost associated with the processing overhead of managing distributed
transactions.
Although costs are often classified either as communication or processing costs, it is
difficult to separate the two. Not all query optimization algorithms use the same
parameters, and not all algorithms assign the same weight to each parameter. For
example, some algorithms minimize total time, others minimize the communication
time, and still others do not factor in the CPU time, considering its cost insignificant
relative to other costs.
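As a small illustration of how such weighting might work, the sketch below combines the three cost components into a single figure for comparing candidate plans. The weights and cost numbers are invented for this example; real optimizers rely on detailed statistics and vendor-specific cost models.

```python
def plan_cost(access_io, communication, cpu,
              w_io=1.0, w_comm=1.0, w_cpu=0.1):
    """Total cost as a weighted sum of I/O, communication, and CPU costs."""
    return w_io * access_io + w_comm * communication + w_cpu * cpu

# Two hypothetical execution plans for the same distributed request.
plan_a = plan_cost(access_io=120, communication=300, cpu=40)
plan_b = plan_cost(access_io=200, communication=80, cpu=60)

# The optimizer would choose the cheaper plan; an algorithm that considers
# CPU cost insignificant could simply set w_cpu=0.
print("Plan A:", plan_a, "Plan B:", plan_b)
```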
A centralized database evaluates every data request to find the most efficient way to access the data. This is a reasonable requirement, considering that all data are locally stored and all active transactions working on the data are known to the central DBMS.
In contrast, in a DDBMS, transactions are distributed among multiple nodes;
therefore, determining what data are being used becomes more complex. Hence,
resolving data requests in a distributed data environment must take the
following points into consideration:
• Data distribution. In a DDBMS, query translation is more complicated because the
DDBMS must decide which fragment to access. (Distribution transparency was
explained earlier in this chapter.) In this case, a TP executing a query must choose
what fragments to access, create multiple data requests to the chosen remote DPs,
combine the DP responses, and present the data to the application.
• Data replication. In addition, the data may also be replicated at several different sites. Data replication makes the access problem even more complex because the
database must ensure that all copies of the data are consistent. Therefore, an
important characteristic of query optimization in distributed database systems is that
it must provide replica transparency. Replica transparency refers to the DDBMS’s
ability to hide multiple copies of data from the user. This ability is particularly important with data update operations. If a read-only request is being processed, it can be satisfied by accessing any available remote DP.
• Network and node availability. The response time associated with remote sites
cannot be easily predetermined because some nodes finish their part of the query in less time than others, and network path performance varies because of bandwidth and
traffic loads. Hence, to achieve performance transparency, the DDBMS should
consider issues such as network latency, the delay imposed by the amount of time
required for a data packet to make a round trip from point A to point B; or network
partitioning, the delay imposed when nodes become suddenly unavailable due to a
network failure.
