Distributed DBMS (Good)
Chapter 10
Distributed Database
Management Systems
In this chapter, you will
learn:
What a distributed database management
system (DDBMS) is and what its components
are
How database implementation is affected by
different levels of data and process distribution
How transactions are managed in a distributed
database environment
How database design is affected by the
distributed database environment
The Evolution of
Distributed Database
Management Systems
Distributed database management
system (DDBMS)
Dynamic business environment and
centralized database’s shortcomings
spawned a demand for applications
based on data access from different
sources at multiple locations
Centralized Database
Management System
DDBMS Advantages
Data are located near “greatest demand” site
Faster data access
Faster data processing
Growth facilitation
Improved communications
Reduced operating costs
User-friendly interface
Less danger of a single-point failure
Processor independence
DDBMS Disadvantages
Complexity of management and control
Security
Lack of standards
Increased storage requirements
Greater difficulty in managing the data
environment
Increased training cost
Distributed Processing
Environment
Distributed Database
Environment
Characteristics of Distributed
Management Systems
Application interface
Validation
Transformation
Query optimization
Mapping
I/O interface
Formatting
Security
Backup and recovery
DB administration
Concurrency control
Transaction management
Characteristics of
Distributed Management
Systems (continued)
Must perform all the functions of a
centralized DBMS
Must handle all necessary functions
imposed by the distribution of data
and processing
Must perform these additional
functions transparently to the end
user
A Fully Distributed
Database Management
System
DDBMS Components
Must include (at least) the following components:
Computer workstations
Network hardware and software
Communications media
Transaction processor (TP), also called application
processor or transaction manager
Software component found in each computer that
requests data
Data processor (DP), also called data manager
Software component residing on each computer that
stores and retrieves data located at the site
May be a centralized DBMS
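As a rough illustration of how the two software components divide the work, the sketch below models a TP that forwards each request to the DP at the site holding the data. The class names, site names, and sample rows are hypothetical and exist only for this example.

```python
class DataProcessor:
    """Hypothetical DP: stores and retrieves data located at its own site."""

    def __init__(self, site, tables):
        self.site = site
        self.tables = tables  # local data, keyed by table name

    def retrieve(self, table):
        # Return a copy of the local rows (empty if the table is not stored here).
        return list(self.tables.get(table, []))


class TransactionProcessor:
    """Hypothetical TP: receives data requests and forwards them to a DP."""

    def __init__(self, data_processors):
        self.dps = {dp.site: dp for dp in data_processors}

    def request(self, site, table):
        # The TP stores no data itself; it only routes the request.
        return self.dps[site].retrieve(table)


dp_a = DataProcessor("site_A", {"CUSTOMER": [{"CUS_NUM": 10, "CUS_NAME": "Sinex"}]})
dp_b = DataProcessor("site_B", {"PRODUCT": [{"PROD_NUM": 1, "PROD_NAME": "Widget"}]})
tp = TransactionProcessor([dp_a, dp_b])
customers = tp.request("site_A", "CUSTOMER")
print(customers)
```

In this sketch the DP could just as well be a wrapper around a centralized DBMS, which matches the last point on the slide.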
Distributed Database
System Components
Database Systems: Levels
of Data and Process
Distribution
Single-Site Processing,
Single-Site Data (SPSD)
All processing is done on single CPU or host
computer (mainframe, midrange, or PC)
All data are stored on host computer’s local disk
Processing cannot be done on end user’s side of
the system
Typical of most mainframe and midrange computer
DBMSs
DBMS is located on the host computer, which is
accessed by dumb terminals connected to it
Also typical of the first generation of single-user
microcomputer databases
Single-Site Processing,
Single-Site Data
(Centralized)
Multiple-Site Processing,
Single-Site Data (MPSD)
Multiple processes run on different
computers sharing a single data
repository
MPSD scenario requires a network file
server running conventional applications
that are accessed through a LAN
Many multi-user accounting applications,
running under a personal computer
network, fit such a description
Multiple-Site Processing,
Single-Site Data
Multiple-Site Processing,
Multiple-Site Data (MPMD)
Fully distributed database management system
with support for multiple data processors and
transaction processors at multiple sites
Classified as either homogeneous or
heterogeneous
Homogeneous DDBMSs
Integrate only one type of centralized DBMS
over a network
Multiple-Site Processing,
Multiple-Site Data (MPMD) (continued)
Heterogeneous DDBMSs
Integrate different types of centralized DBMSs
over a network
Fully heterogeneous DDBMS
Support different DBMSs that may even support
different data models (relational, hierarchical, or
network) running under different computer
systems, such as mainframes and
microcomputers
Heterogeneous
Distributed
Database Scenario
Distributed Database
Transparency Features
Allow end user to feel like database’s only
user
Features include:
Distribution transparency
Transaction transparency
Failure transparency
Performance transparency
Heterogeneity transparency
Distribution Transparency
Allows management of a physically dispersed
database as though it were a centralized
database
Three levels of distribution transparency are
recognized:
Fragmentation transparency
Location transparency
Local mapping transparency
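The three levels differ in how much the user must know about fragments and their sites. The sketch below is one way to picture the difference. The table, fragment, and node names, and the sample rows, are assumptions made for illustration.

```python
# Hypothetical fragments of one logical table, each stored at a different site.
SITES = {
    "NODE_1": {"E1": [{"EMP_NAME": "Ann"}]},
    "NODE_2": {"E2": [{"EMP_NAME": "Bob"}]},
}
# Catalog-style lookups: table -> fragments, fragment -> site.
FRAGMENTS_OF = {"EMPLOYEE": ["E1", "E2"]}
SITE_OF = {"E1": "NODE_1", "E2": "NODE_2"}


def local_mapping_query(fragment, site):
    """Local mapping transparency: caller names both the fragment and its site."""
    return SITES[site][fragment]


def location_query(fragment):
    """Location transparency: caller names the fragment; the site is looked up."""
    return local_mapping_query(fragment, SITE_OF[fragment])


def fragmentation_query(table):
    """Fragmentation transparency: caller names only the logical table."""
    rows = []
    for fragment in FRAGMENTS_OF[table]:
        rows.extend(location_query(fragment))
    return rows


names = [row["EMP_NAME"] for row in fragmentation_query("EMPLOYEE")]
print(names)
```

Each function asks less of the caller than the one before it, which is the sense in which fragmentation transparency is the highest of the three levels.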
A Summary of
Transparency Features
Fragment Locations
Transaction Transparency
Ensures database transactions will
maintain distributed database’s
integrity and consistency
Distributed Requests and
Distributed Transactions
Distributed transaction
Can update or request data from several
different remote sites on a network
Remote request
Lets a single SQL statement access data to be
processed by a single remote database
processor
Remote transaction
Accesses data at a single remote site
Distributed Requests and
Distributed Transactions
(continued)
Distributed transaction
Allows a transaction to reference
several different (local or remote) DP
sites
Distributed request
Lets a single SQL statement
reference data located at several
different local or remote DP sites
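The four terms can be told apart by two questions: how many SQL statements are involved, and how many DP sites each statement references. The following simplified classifier encodes that reading. The function name and the set-based input format are hypothetical, and the rule is a study aid rather than a formal definition.

```python
def classify(requests):
    """Classify a unit of work.

    requests: a non-empty list of sets; each set holds the DP sites that
    one SQL statement references.
    """
    all_sites = set().union(*requests)
    if any(len(sites) > 1 for sites in requests):
        # A single statement references several sites.
        return "distributed request"
    if len(all_sites) > 1:
        # Each statement uses one site, but the statements use different sites.
        return "distributed transaction"
    if len(requests) > 1:
        # Several statements, all against the same single site.
        return "remote transaction"
    return "remote request"


print(classify([{"B"}]))
print(classify([{"B"}, {"B"}]))
print(classify([{"B"}, {"C"}]))
print(classify([{"B", "C"}]))
```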
A Remote Request
A Remote Transaction
A Distributed Transaction
A Distributed Request
Another Distributed
Request
Distributed Concurrency
Control
Multisite, multiple-process
operations are much more likely to
create data inconsistencies and
deadlocked transactions than are
single-site systems
The Effect of a Premature
COMMIT
Two-Phase Commit
Protocol
Distributed databases make it possible for a
transaction to access data at several sites
Final COMMIT must not be issued until all
sites have committed their parts of the
transaction
Two-phase commit protocol requires each
individual DP’s transaction log entry be written
before the database fragment is actually
updated
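A minimal in-memory sketch of the idea follows. It assumes a single coordinator and reliable participants, and it ignores timeouts and recovery. All class, method, and log-entry names are hypothetical.

```python
class Participant:
    """Hypothetical DP taking part in a two-phase commit."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.log = []    # transaction log, written before any data change
        self.data = {}   # the local database fragment
        self.pending = None

    def prepare(self, update):
        # Phase 1: record the intended update in the log, then vote.
        if not self.can_commit:
            self.log.append(("NOT PREPARED", update))
            return False
        self.log.append(("PREPARED", update))
        self.pending = update
        return True

    def commit(self):
        # Phase 2: the log entry is written first, then the fragment is updated.
        self.log.append(("COMMIT", self.pending))
        self.data.update(self.pending)
        self.pending = None

    def abort(self):
        self.log.append(("ABORT", self.pending))
        self.pending = None


def two_phase_commit(participants, updates):
    """Coordinator: commit everywhere only if every participant votes yes."""
    votes = [p.prepare(updates[p.name]) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return "COMMITTED"
    for p in participants:
        p.abort()
    return "ABORTED"


a, b = Participant("A"), Participant("B")
result_ok = two_phase_commit([a, b], {"A": {"qty": 5}, "B": {"qty": 7}})

c, d = Participant("C"), Participant("D", can_commit=False)
result_fail = two_phase_commit([c, d], {"C": {"qty": 1}, "D": {"qty": 2}})
print(result_ok, result_fail)
```

In the second run site D cannot prepare, so site C's fragment is left unchanged even though C itself was ready to commit.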
Performance
Transparency
and Query Optimization
Objective of query optimization routine
is to minimize total cost associated
with the execution of a request
Costs associated with a request are a
function of the:
Access time (I/O) cost
Communication cost
CPU time cost
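As a worked illustration, the sketch below compares two candidate execution plans by total cost. It assumes the three cost components simply add up and uses made-up figures in arbitrary units. The plan names are hypothetical.

```python
def total_cost(plan):
    """Assumed model: access time (I/O) + communication + CPU time cost."""
    return plan["io"] + plan["communication"] + plan["cpu"]


# Illustrative numbers only, in arbitrary cost units.
plans = {
    "ship_whole_table": {"io": 40, "communication": 300, "cpu": 10},
    "filter_at_remote_site": {"io": 55, "communication": 20, "cpu": 25},
}

best = min(plans, key=lambda name: total_cost(plans[name]))
print(best, total_cost(plans[best]))
```

With these figures the second plan wins because its lower communication cost outweighs its higher I/O and CPU cost.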
Performance Transparency
and Query Optimization (continued)
Must provide distribution transparency as well as
replica transparency
Replica transparency:
DDBMS’s ability to hide the existence of multiple
copies of data from the user
Query optimization techniques:
Manual or automatic
Static or dynamic
Statistically based or rule-based algorithms
Distributed Database
Design
Data fragmentation:
How to partition the database into fragments
Data replication:
Which fragments to replicate
Data allocation:
Where to locate those fragments and replicas
Data Fragmentation
Breaks single object into two or more
segments or fragments
Each fragment can be stored at any site over
a computer network
Information about data fragmentation is
stored in the distributed data catalog (DDC),
from which it is accessed by the TP to
process user requests
Data Fragmentation
Strategies
Horizontal fragmentation:
Division of a relation into subsets (fragments)
of tuples (rows)
Vertical fragmentation:
Division of a relation into attribute (column)
subsets
Mixed fragmentation:
Combination of horizontal and vertical
strategies
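The three strategies can be sketched on a small in-memory table. The rows and attribute names below are invented for illustration and are not the sample CUSTOMER table from the slides.

```python
# Hypothetical sample rows.
CUSTOMER = [
    {"CUS_NUM": 10, "CUS_NAME": "Sinex", "CUS_STATE": "TN", "CUS_BALANCE": 1090.0},
    {"CUS_NUM": 11, "CUS_NAME": "Martin", "CUS_STATE": "FL", "CUS_BALANCE": 0.0},
    {"CUS_NUM": 12, "CUS_NAME": "Mynatt", "CUS_STATE": "GA", "CUS_BALANCE": 5047.5},
    {"CUS_NUM": 13, "CUS_NAME": "Smith", "CUS_STATE": "TN", "CUS_BALANCE": 672.0},
]


def horizontal_fragment(rows, predicate):
    """Keep whole rows (tuples) that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def vertical_fragment(rows, key, attributes):
    """Keep selected columns; the key is repeated so rows can be rejoined."""
    return [{name: row[name] for name in [key, *attributes]} for row in rows]


# Horizontal: rows for one state.
tn_rows = horizontal_fragment(CUSTOMER, lambda row: row["CUS_STATE"] == "TN")
# Vertical: the name column (plus key) for every customer.
name_columns = vertical_fragment(CUSTOMER, "CUS_NUM", ["CUS_NAME"])
# Mixed: first split by state, then by column.
tn_balances = vertical_fragment(tn_rows, "CUS_NUM", ["CUS_BALANCE"])
print(tn_balances)
```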
A Sample CUSTOMER
Table
Horizontal Fragmentation
of the CUSTOMER Table
by State
Table Fragments in Three
Locations
Vertically Fragmented
Table Contents
Mixed Fragmentation of
the
CUSTOMER Table
Data Replication
Storage of data copies at multiple sites served
by a computer network
Fragment copies can be stored at several sites
to serve specific information requirements
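One simple way to picture replication is a fragment whose copies are all updated on every write and read from whichever site is closest. This "write all, read one" policy is an assumption chosen for the sketch, not something the slides prescribe. The class and site names are hypothetical.

```python
class ReplicatedFragment:
    """Hypothetical fragment with one copy per site."""

    def __init__(self, sites):
        self.copies = {site: {} for site in sites}

    def write(self, key, value):
        # Propagate the update to every copy so that all replicas agree.
        for copy in self.copies.values():
            copy[key] = value

    def read(self, key, preferred_site):
        # Any copy can serve the read; use the one nearest the requester.
        return self.copies[preferred_site][key]


fragment = ReplicatedFragment(["site_A", "site_B"])
fragment.write("CUS_10", "Sinex")
value_at_b = fragment.read("CUS_10", "site_B")
print(value_at_b)
```

The sketch also shows the trade-off behind the replication decision: reads get cheaper as copies are added, while every write has to reach more sites.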
Client/Server Architecture
Features a user of resources, or a client, and
a provider of resources, or a server
Can be used to implement a DBMS in which
the client is the TP and the server is the DP
Client/Server Advantages
Less expensive than alternate minicomputer or
mainframe solutions
Allow end user to use microcomputer’s GUI, thereby
improving functionality and simplicity
More people with PC skills than with mainframe
skills in the job market
PC is well established in the workplace
Numerous data analysis and query tools exist to
facilitate interaction with DBMSs available in the PC
market
Considerable cost advantage to offloading
applications development from the mainframe to
powerful PCs
Client/Server Disadvantages
Creates a more complex environment, in which
different platforms (LANs, operating systems,
and so on) are often difficult to manage
An increase in the number of users and
processing sites often paves the way for security
problems
Possible to spread data access to a much wider
circle of users
Increases demand for people with broad
knowledge of computers and software
Increases burden of training and cost of
maintaining the environment
C. J. Date’s Twelve
Commandments for
Distributed Databases
1. Local site independence
2. Central site independence
3. Failure independence
4. Location transparency
5. Fragmentation transparency
6. Replication transparency
7. Distributed query processing
8. Distributed transaction processing
9. Hardware independence
10. Operating system independence
11. Network independence
12. Database independence
Summary
Distributed database stores logically related
data in two or more physically independent
sites connected via a computer network
Database is divided into fragments
Distributed databases require distributed
processing
Main components of a DDBMS are the
transaction processor and the data processor
Summary (continued)
Current database systems can be classified by
extent to which they support processing and data
distribution
DDBMS characteristics are best described as a
set of transparencies
A transaction is formed by one or more database
requests
A database can be replicated over several
different sites on a computer network
Client/server architecture refers to the way in
which two computers interact over a computer
network to form a system