Distributed DBMS (Good)

This document discusses distributed database management systems (DDBMS). It begins by defining a DDBMS and its components. It then covers different levels of data and process distribution, including single-site processing/single-site data, multiple-site processing/single-site data, and multiple-site processing/multiple-site data systems. The document also discusses transaction transparency, distributed transactions, data fragmentation, replication, and allocation strategies for distributed database design.

Uploaded by

Lakhveer Kaur
Copyright
© Attribution Non-Commercial (BY-NC)

Chapter 10: Distributed Database Management Systems

In this chapter, you will learn:

What a distributed database management
system (DDBMS) is and what its components
are

How database implementation is affected by
different levels of data and process distribution

How transactions are managed in a distributed
database environment

How database design is affected by the
distributed database environment
The Evolution of Distributed Database Management Systems

Distributed database management system (DDBMS)
  - Governs storage and processing of logically related data over
    interconnected computer systems in which both data and processing
    functions are distributed among several sites
The Evolution of Distributed Database Management Systems (continued)

Centralized database required that corporate data be stored in a single
central site

Dynamic business environment and centralized database’s shortcomings
spawned a demand for applications based on data access from different
sources at multiple locations
[Figure: Centralized Database Management System]
DDBMS Advantages

Data are located near “greatest demand” site

Faster data access

Faster data processing

Growth facilitation

Improved communications

Reduced operating costs

User-friendly interface

Less danger of a single-point failure

Processor independence
DDBMS Disadvantages

Complexity of management and control

Security

Lack of standards

Increased storage requirements

Greater difficulty in managing the data
environment

Increased training cost
[Figure: Distributed Processing Environment]
[Figure: Distributed Database Environment]
Characteristics of Distributed Database Management Systems

Application interface

Validation

Transformation

Query optimization

Mapping

I/O interface

Formatting

Security

Backup and recovery

DB administration

Concurrency control

Transaction management
Characteristics of Distributed Database Management Systems (continued)

Must perform all the functions of a centralized DBMS

Must handle all necessary functions imposed by the distribution of data
and processing

Must perform these additional functions transparently to the end user
[Figure: A Fully Distributed Database Management System]
DDBMS Components

Must include (at least) the following components:
  - Computer workstations
  - Network hardware and software
  - Communications media
  - Transaction processor (TP), also called application processor or
    transaction manager
      - Software component found in each computer that requests data
  - Data processor (DP), also called data manager
      - Software component residing on each computer that stores and
        retrieves data located at the site
      - May be a centralized DBMS
[Figure: Distributed Database System Components]
[Table: Database Systems: Levels of Data and Process Distribution]
Single-Site Processing, Single-Site Data (SPSD)

All processing is done on single CPU or host
computer (mainframe, midrange, or PC)

All data are stored on host computer’s local disk

Processing cannot be done on end user’s side of
the system

Typical of most mainframe and midrange computer
DBMSs

DBMS is located on the host computer, which is
accessed by dumb terminals connected to it

Also typical of the first generation of single-user
microcomputer databases
[Figure: Single-Site Processing, Single-Site Data (Centralized)]
Multiple-Site Processing, Single-Site Data (MPSD)

Multiple processes run on different
computers sharing a single data
repository

MPSD scenario requires a network file
server running conventional applications
that are accessed through a LAN

Many multi-user accounting applications,
running under a personal computer
network, fit such a description
[Figure: Multiple-Site Processing, Single-Site Data]
Multiple-Site Processing, Multiple-Site Data (MPMD)

Fully distributed database management system
with support for multiple data processors and
transaction processors at multiple sites

Classified as either homogeneous or
heterogeneous

Homogeneous DDBMSs
  - Integrate only one type of centralized DBMS over a network
Multiple-Site Processing, Multiple-Site Data (MPMD) (continued)

Heterogeneous DDBMSs
  - Integrate different types of centralized DBMSs over a network

Fully heterogeneous DDBMS
  - Supports different DBMSs that may even support different data
    models (relational, hierarchical, or network) running under
    different computer systems, such as mainframes and microcomputers
[Figure: Heterogeneous Distributed Database Scenario]
Distributed Database Transparency Features

Allow end user to feel like database’s only
user

Features include:
  - Distribution transparency
  - Transaction transparency
  - Failure transparency
  - Performance transparency
  - Heterogeneity transparency
Distribution Transparency

Allows management of a physically dispersed
database as though it were a centralized
database

Three levels of distribution transparency are
recognized:
  - Fragmentation transparency
  - Location transparency
  - Local mapping transparency
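
The three levels differ in how much the requester has to know about where data lives. The sketch below illustrates that difference under made-up assumptions: the EMPLOYEE relation, the fragment names E1 and E2, the site names, and all three helper functions are hypothetical and are not taken from any particular DDBMS.

```python
# Hypothetical layout: an EMPLOYEE relation split into two fragments,
# each stored at a different (made-up) site.
SITES = {
    "site_a": {"E1": [{"emp_id": 1, "name": "Ann"}]},
    "site_b": {"E2": [{"emp_id": 2, "name": "Bob"}]},
}
CATALOG = {"EMPLOYEE": ["E1", "E2"]}          # relation -> fragment names
LOCATIONS = {"E1": "site_a", "E2": "site_b"}  # fragment -> site


def read_local_mapping(fragment, site):
    """Local mapping transparency: caller names both fragment and site."""
    return list(SITES[site][fragment])


def read_location_transparent(fragment):
    """Location transparency: caller names the fragment; the site is looked up."""
    return read_local_mapping(fragment, LOCATIONS[fragment])


def read_fragmentation_transparent(relation):
    """Fragmentation transparency: caller names only the relation."""
    rows = []
    for fragment in CATALOG[relation]:
        rows.extend(read_location_transparent(fragment))
    return rows
```

Each function needs one fewer piece of location knowledge than the one before it, which is the sense in which fragmentation transparency is the highest of the three levels.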
[Table: A Summary of Transparency Features]
[Figure: Fragment Locations]
Transaction Transparency

Ensures database transactions will
maintain distributed database’s
integrity and consistency
Distributed Requests and Distributed Transactions

Distributed transaction
  - Can update or request data from several different remote sites on a
    network

Remote request
  - Lets a single SQL statement access data to be processed by a single
    remote database processor

Remote transaction
  - Composed of several requests, all of which access data at a single
    remote site
Distributed Requests and Distributed Transactions (continued)

Distributed transaction
  - Allows a transaction to reference several different (local or
    remote) DP sites

Distributed request
  - Lets a single SQL statement reference data located at several
    different local or remote DP sites
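
The four terms can be told apart by counting requests and the DP sites each request touches. The following is a minimal sketch of that rule, assuming a hypothetical representation in which a transaction is a list of requests and each request is the set of site names it references; the function name `classify` is made up for illustration, and every site is treated as remote for simplicity.

```python
def classify(requests):
    """Classify a transaction from the DP sites each of its requests touches.

    `requests` is a list of sets of site names (a hypothetical encoding).
    """
    if not requests or any(not sites for sites in requests):
        raise ValueError("each request must reference at least one site")
    all_sites = set().union(*requests)
    # A single statement spanning several sites is a distributed request.
    if any(len(sites) > 1 for sites in requests):
        return "distributed request"
    # Several single-site requests that together span several sites.
    if len(all_sites) > 1:
        return "distributed transaction"
    # Everything below touches exactly one site.
    if len(requests) == 1:
        return "remote request"
    return "remote transaction"
```

For example, `classify([{"B"}, {"C"}])` describes two statements that each go to one site, so the transaction as a whole is distributed even though no single statement is.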
[Figure: A Remote Request]
[Figure: A Remote Transaction]
[Figure: A Distributed Transaction]
[Figure: A Distributed Request]
[Figure: Another Distributed Request]
Distributed Concurrency Control

Multisite, multiple-process
operations are much more likely to
create data inconsistencies and
deadlocked transactions than are
single-site systems
[Figure: The Effect of a Premature COMMIT]
Two-Phase Commit Protocol

Distributed databases make it possible for a
transaction to access data at several sites

Final COMMIT must not be issued until all
sites have committed their parts of the
transaction

Two-phase commit protocol requires each
individual DP’s transaction log entry be written
before the database fragment is actually
updated
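
The ordering described above (log first, update the fragment only after every site has agreed) can be sketched roughly as follows. This is a simplified single-process illustration, not a real network protocol: the `DataProcessor` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names, and timeouts, recovery, and messaging are left out.

```python
class DataProcessor:
    """Hypothetical DP holding one fragment plus a write-ahead log."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates a site that must refuse
        self.log = []                 # transaction log entries
        self.data = {}                # the local database fragment
        self.pending = None

    def prepare(self, update):
        # Phase 1: write the log entry BEFORE touching the fragment.
        if not self.can_commit:
            self.log.append(("ABORT-VOTE", update))
            return False
        self.log.append(("PREPARED", update))
        self.pending = update
        return True

    def commit(self):
        # Phase 2: only now is the fragment actually updated.
        self.data.update(self.pending)
        self.log.append(("COMMIT", self.pending))
        self.pending = None

    def abort(self):
        self.log.append(("ABORT", self.pending))
        self.pending = None


def two_phase_commit(updates):
    """`updates` maps each DataProcessor to the changes it should apply."""
    prepared = []
    for dp, update in updates.items():
        if dp.prepare(update):
            prepared.append(dp)
        else:
            # One refusal aborts the transaction at every prepared site.
            for p in prepared:
                p.abort()
            return False
    for dp in prepared:
        dp.commit()
    return True
```

Because no fragment is modified during the first phase, a refusal from any one site leaves every fragment unchanged.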
Performance Transparency and Query Optimization

Objective of query optimization routine
is to minimize total cost associated
with the execution of a request

Costs associated with a request are a
function of the:
  - Access time (I/O) cost
  - Communication cost
  - CPU time cost
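
As a rough illustration of the cost model, the sketch below sums the three components for each candidate plan and picks the cheapest. The `Plan` class, the plan names, and all cost figures are invented for the example; a real optimizer would estimate these values from statistics or rules rather than take them as given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A candidate execution plan with hypothetical cost estimates."""
    name: str
    io_cost: float    # access time (I/O) cost
    comm_cost: float  # communication cost
    cpu_cost: float   # CPU time cost

    @property
    def total(self):
        return self.io_cost + self.comm_cost + self.cpu_cost


def cheapest(plans):
    """Return the plan with the lowest total cost."""
    return min(plans, key=lambda p: p.total)


# Made-up numbers: shipping a whole table is communication-heavy,
# while filtering at the remote site first sends far less data.
plans = [
    Plan("ship_whole_table", io_cost=10, comm_cost=50, cpu_cost=5),
    Plan("filter_at_remote_site", io_cost=12, comm_cost=8, cpu_cost=7),
]
best = cheapest(plans)
```

In a distributed setting the communication term is often the one that separates plans, which is why it is listed alongside the I/O and CPU costs familiar from centralized systems.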
Performance Transparency and Query Optimization (continued)

Must provide distribution transparency as well as
replica transparency

Replica transparency:
  - DDBMS’s ability to hide the existence of multiple copies of data
    from the user

Query optimization techniques:
  - Manual or automatic
  - Static or dynamic
  - Statistically based or rule-based algorithms
Distributed Database Design

Data fragmentation:
  - How to partition the database into fragments

Data replication:
  - Which fragments to replicate

Data allocation:
  - Where to locate those fragments and replicas
Data Fragmentation

Breaks single object into two or more
segments or fragments

Each fragment can be stored at any site over
a computer network

Information about data fragmentation is
stored in the distributed data catalog (DDC),
from which it is accessed by the TP to
process user requests
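
A minimal sketch of how a TP might consult the DDC is shown below. The catalog layout, the fragment names, the site names, and the `fragments_for` helper are all assumptions made for illustration; the slides do not specify how the catalog is structured.

```python
# Hypothetical distributed data catalog: for each relation, its fragments,
# the site holding each one, and the STATE values each fragment covers.
DDC = {
    "CUSTOMER": [
        {"fragment": "CUST_H1", "site": "site_1", "states": {"TN"}},
        {"fragment": "CUST_H2", "site": "site_2", "states": {"GA"}},
        {"fragment": "CUST_H3", "site": "site_3", "states": {"FL"}},
    ]
}


def fragments_for(relation, state=None):
    """Return the (fragment, site) pairs a TP would need to contact.

    With no `state` filter every fragment is needed; with a filter, only
    the fragments whose rows can match are returned.
    """
    entries = DDC[relation]
    if state is not None:
        entries = [e for e in entries if state in e["states"]]
    return [(e["fragment"], e["site"]) for e in entries]
```

Here `fragments_for("CUSTOMER", state="GA")` narrows the request to a single site, while `fragments_for("CUSTOMER")` lists all three.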
Data Fragmentation Strategies

Horizontal fragmentation:
  - Division of a relation into subsets (fragments) of tuples (rows)

Vertical fragmentation:
  - Division of a relation into attribute (column) subsets

Mixed fragmentation:
  - Combination of horizontal and vertical strategies
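
The three strategies can be sketched on a small in-memory table. The rows, column names, and helper functions below are invented for illustration and do not reproduce the CUSTOMER data used in the slides. One assumption worth stating: each vertical fragment keeps the key column so that the original rows can be rebuilt by joining the fragments.

```python
# Made-up sample rows standing in for a CUSTOMER relation.
CUSTOMER = [
    {"cus_num": 10, "name": "Adams", "state": "TN", "balance": 100.0},
    {"cus_num": 11, "name": "Baker", "state": "FL", "balance": 250.0},
    {"cus_num": 12, "name": "Clark", "state": "TN", "balance": 0.0},
]


def horizontal(rows, predicate):
    """Horizontal fragment: the subset of rows satisfying a predicate."""
    return [dict(r) for r in rows if predicate(r)]


def vertical(rows, key, columns):
    """Vertical fragment: the key plus a subset of columns, for every row."""
    return [{c: r[c] for c in (key, *columns)} for r in rows]


def mixed(rows, predicate, key, columns):
    """Mixed fragment: select rows first, then project columns."""
    return vertical(horizontal(rows, predicate), key, columns)


tn_rows = horizontal(CUSTOMER, lambda r: r["state"] == "TN")
names = vertical(CUSTOMER, "cus_num", ["name"])
tn_balances = mixed(CUSTOMER, lambda r: r["state"] == "TN",
                    "cus_num", ["balance"])
```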
[Figure: A Sample CUSTOMER Table]
[Figure: Horizontal Fragmentation of the CUSTOMER Table by State]
[Figure: Table Fragments in Three Locations]
[Figure: Vertically Fragmented Table Contents]
[Figure: Mixed Fragmentation of the CUSTOMER Table]
Data Replication

Storage of data copies at multiple sites served by a computer network

Fragment copies can be stored at several sites to serve specific
information requirements
  - Can enhance data availability and response time
  - Can help to reduce communication and total query costs
[Figure: Table Contents After the Mixed Fragmentation Process]
[Figure: Data Replication]
Replication Scenarios

Fully replicated database:
  - Stores multiple copies of each database fragment at multiple sites
  - Can be impractical due to amount of overhead

Partially replicated database:
  - Stores multiple copies of some database fragments at multiple sites
  - Most DDBMSs are able to handle the partially replicated database
    well

Unreplicated database:
  - Stores each database fragment at a single site
  - No duplicate database fragments
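
Given a record of which sites hold a copy of each fragment, the three scenarios can be distinguished by counting copies. The sketch below assumes a hypothetical `placement` mapping from fragment name to the set of sites storing it; the function name and the sample fragment and site names are made up.

```python
def replication_scenario(placement):
    """Name the replication scenario for a fragment -> sites mapping."""
    counts = [len(sites) for sites in placement.values()]
    if not counts or min(counts) == 0:
        raise ValueError("every fragment must be stored at one site or more")
    if all(c == 1 for c in counts):
        return "unreplicated"        # no duplicate fragments anywhere
    if all(c > 1 for c in counts):
        return "fully replicated"    # every fragment has multiple copies
    return "partially replicated"    # only some fragments are duplicated
```

For instance, `replication_scenario({"F1": {"A", "B"}, "F2": {"C"}})` reports a partially replicated database, because only F1 has more than one copy.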
Data Allocation

Deciding where to locate data

Allocation strategies:
  - Centralized data allocation
      - Entire database is stored at one site
  - Partitioned data allocation
      - Database is divided into several disjoint parts (fragments) and
        stored at several sites
  - Replicated data allocation
      - Copies of one or more database fragments are stored at several
        sites

Data distribution over a computer network is
achieved through data partition, data
replication, or a combination of both
Client/Server vs. DDBMS

Client/server architecture refers to the way in which computers interact
to form a system

Features a user of resources, or a client, and a provider of resources,
or a server

Can be used to implement a DBMS in which the client is the TP and the
server is the DP
Client/Server Advantages

Less expensive than alternative minicomputer or
mainframe solutions

Allow end user to use microcomputer’s GUI, thereby
improving functionality and simplicity

More people with PC skills than with mainframe
skills in the job market

PC is well established in the workplace

Numerous data analysis and query tools exist to
facilitate interaction with DBMSs available in the PC
market

Considerable cost advantage to offloading
applications development from the mainframe to
powerful PCs
Client/Server Disadvantages

Creates a more complex environment, in which
different platforms (LANs, operating systems,
and so on) are often difficult to manage

An increase in the number of users and
processing sites often paves the way for security
problems

Spreading data access to a much wider circle of users increases the
demand for people with broad knowledge of computers and software,
which in turn increases the burden of training and the cost of
maintaining the environment
C. J. Date’s Twelve Commandments for Distributed Databases

1. Local site independence
2. Central site independence
3. Failure independence
4. Location transparency
5. Fragmentation transparency
6. Replication transparency
7. Distributed query processing
8. Distributed transaction processing
9. Hardware independence
10. Operating system independence
11. Network independence
12. Database independence
Summary

Distributed database stores logically related
data in two or more physically independent
sites connected via a computer network

Database is divided into fragments

Distributed databases require distributed
processing

Main components of a DDBMS are the
transaction processor and the data processor
Summary (continued)

Current database systems can be classified by
extent to which they support processing and data
distribution

DDBMS characteristics are best described as a
set of transparencies

A transaction is formed by one or more database
requests

A database can be replicated over several
different sites on a computer network

Client/server architecture refers to the way in
which two computers interact over a computer
network to form a system
