Topic 7 DDBMS


DISTRIBUTED DATABASE MANAGEMENT SYSTEMS

MISS NOORFADZILAH ARIFIN

FACULTY OF COMPUTER AND MATHEMATICAL SCIENCES
UiTM KELANTAN
Objectives
At the end of this lesson, you should be able to:
▪ understand the difference between centralized and distributed databases
▪ explain the advantages and disadvantages of a distributed database system
▪ understand the difference between distributed processing and a distributed database
▪ explain the characteristics of a DDBMS
▪ understand fragmentation, location and local mapping transparency
▪ describe horizontal, vertical and mixed fragmentation in database design
▪ explain fully replicated, partially replicated and unreplicated database designs
▪ explain data allocation in database design
Introduction
❖ Originally, a single database located at a single site stored all the data. This
is known as a centralized database.

[Figures: site diagrams (Site 1 to Site 4) for "Centralized database and single processing" and "Centralized database and multiple processing"]


Problems with Centralized Database
❖ Performance degradation as the number of remote sites grows
❖ High cost of maintaining a large central (mainframe) database
system and its physical infrastructure
❖ Reliability problems, since there is only one central database, which creates
the need for data replication
❖ Bottlenecks can occur because every request accesses a single database
❖ Data availability may be poor
❖ Data may become completely inaccessible if the central site is unavailable
Distributed Database (DB)
Distributed database: spreads the data out among various sites on the
distributed network, each of which has its own computer and data storage
facilities.

Distributed processing: the database's logical processing is shared among two
or more physically independent sites via a network.

Distributed database management system (DDBMS): software that manages the
distributed database (DB) and provides an access mechanism that makes the
distribution transparent to users.

[Figure: Distributed database and distributed processing (Site 1 to Site 4)]
Why is DDBMS becoming more popular?
Acceptance of Internet as a platform for business

Mobile wireless revolution

Usage of application as a service

Focus on mobile business intelligence


Characteristics of DDBMS
A DBMS must have at least the following functions to be classified as distributed:
❖ Application interface to interact with the end user, application programs, and other DBMSs
within the distributed database.
❖ Validation to analyse data requests for syntax correctness.
❖ Transformation to decompose complex requests into atomic data request components.
❖ Query optimization to find the best access strategy. (Which database fragments must be
accessed by the query, and how must data updates, if any, be synchronized?)
❖ Mapping to determine the data location of local and remote fragments.
❖ I/O interface to read or write data from or to permanent local storage.
❖ Formatting to prepare the data for presentation to the end user or to an application
program.
❖ Security to provide data privacy at both local and remote databases.
❖ Backup and recovery to ensure the availability and recoverability of the database in case
of a failure.
❖ DB administration features for the database administrator.
❖ Concurrency control to manage simultaneous data access and to ensure data consistency
across database fragments in the DDBMS.
❖ Transaction management to ensure that the data moves from one consistent state to
another.
❖ Must perform all the functions of a centralized DBMS
❖ Must handle all necessary functions imposed by the distribution of data and processing
❖ Must perform any additional functions transparently to the end user
DDBMS Components
❖ Computer workstations or remote devices (sites or nodes) that form the network
system.
❖ Network hardware and software components that reside in each workstation or
device. The network components allow all sites to interact and exchange data.
❖ Communications media that carry the data from one node to another. The DDBMS
must be communications media-independent; that is, it must be able to support
several types of communications media.
❖ The transaction processor (TP), which is the software component found in each
computer or device that requests data. The transaction processor receives and
processes the application’s data requests (remote and local). The TP is also known as
the application processor (AP) or the transaction manager (TM).
❖ The data processor (DP), which is the software component residing on each
computer or device that stores and retrieves data located at the site. The DP is also
known as the data manager (DM). A data processor may even be a centralized DBMS.
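The split between the TP and the DP can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, site names and rows are hypothetical, and the TP here simply asks every DP, whereas a real TP would first consult the catalog to find the relevant fragments.

```python
# Hypothetical sketch: class names, site names and rows are illustrative only.

class DataProcessor:
    """DP: stores and retrieves the data located at one site."""

    def __init__(self, site, rows):
        self.site = site
        self.rows = list(rows)

    def read(self, predicate):
        # Return only the local rows that satisfy the request.
        return [row for row in self.rows if predicate(row)]


class TransactionProcessor:
    """TP: receives an application's data request and forwards it to the DPs."""

    def __init__(self, data_processors):
        self.data_processors = data_processors

    def query(self, predicate):
        results = []
        # Simplification: local and remote DPs are all asked in turn.
        for dp in self.data_processors:
            results.extend(dp.read(predicate))
        return results


dps = [
    DataProcessor("Site 1", [{"id": 1, "dept": "A"}]),
    DataProcessor("Site 2", [{"id": 2, "dept": "B"}, {"id": 3, "dept": "A"}]),
]
tp = TransactionProcessor(dps)
matches = tp.query(lambda row: row["dept"] == "A")
```

The application only talks to `tp`; it never needs to know how many DPs exist or which site holds which row.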
Functions of DDBMS
❖ Receives the request of an application
❖ Validates, analyzes, and decomposes the request
❖ Maps the request
❖ Decomposes request into several I/O operations
❖ Searches and validates data
❖ Ensures consistency, security, and integrity
❖ Validates data for specific conditions
❖ Presents data in required format
Restrictions of DDBMS
❖ Remote access is provided on a read-only basis
❖ Restrictions on the number of remote tables that may be accessed in a single
transaction
❖ Restrictions on the number of distinct databases that may be accessed
❖ Restrictions on the database model that may be accessed
Advantages and Disadvantages of DDBMS

Advantages
• Data are located near the greatest demand site
• Faster data access and processing
• Growth facilitation
• Improved communications
• Reduced operating costs
• User-friendly interface
• Less danger of a single-point failure
• Processor independence

Disadvantages
• Complexity of management and control
• Technological difficulty
• Security
• Lack of standards
• Increased storage and infrastructure requirements
• Increased training cost
• Costs incurred due to the requirement of duplicated infrastructure
Distributed Database Transparency
❖ Distribution transparency: the user does not need to know where data is located
or whether it is replicated or partitioned
❖ Transaction transparency: a transaction can update data at several network sites
while preserving data integrity; the transaction is either entirely completed or entirely aborted
❖ Failure transparency: the system continues to operate in the event of a node
failure (other nodes pick up the lost functionality)
❖ Performance transparency: the system performs as if it were a centralized
DBMS, with no performance degradation due to use of a network or platform differences,
and finds the most cost-effective path to access remote data
❖ Heterogeneity transparency: allows the integration of several different local
DBMSs under a common schema
Distribution Transparency
❖ Allows management of a physically dispersed database as though it were a
centralized database
❖ Supported by a distributed data dictionary (DDD), which contains the description
of the entire database as seen by the DBA. The DDD is itself distributed and
replicated at the network nodes
❖ There are 3 levels of distribution transparency: fragmentation transparency,
location transparency and local mapping transparency
1. Fragmentation transparency: the user does not need to know that the database is partitioned;
neither fragment names nor fragment locations are needed
2. Location transparency: the fragment name is required but the location is not
3. Local mapping transparency: the user must specify both the fragment name and its location
As in the example shown on the previous slide, suppose the EMPLOYEE table is divided into
fragments E1, E2 and E3, stored at nodes NY, ATL and MIA respectively, and we want to find
all employees with a birthdate prior to Jan 1, 1940.

Fragmentation transparency:
SELECT * FROM EMPLOYEE WHERE EMP_DOB < '01-JAN-1940';

Location transparency:
SELECT * FROM E1 WHERE EMP_DOB < '01-JAN-1940'
UNION
SELECT * FROM E2 WHERE EMP_DOB < '01-JAN-1940'
UNION
SELECT * FROM E3 WHERE EMP_DOB < '01-JAN-1940';

Local mapping transparency:
SELECT * FROM E1 NODE NY WHERE EMP_DOB < '01-JAN-1940'
UNION
SELECT * FROM E2 NODE ATL WHERE EMP_DOB < '01-JAN-1940'
UNION
SELECT * FROM E3 NODE MIA WHERE EMP_DOB < '01-JAN-1940';
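The three levels differ only in how much the caller has to spell out. The Python sketch below mimics the queries above with in-memory data. The fragment names (E1 to E3) and nodes (NY, ATL, MIA) follow the queries; the employee rows, the dictionary layout and the function names are assumptions made for illustration.

```python
from datetime import date

# Hypothetical rows; only the fragment and node names come from the queries above.
NODES = {
    "NY":  {"E1": [{"EMP_NAME": "A", "EMP_DOB": date(1935, 5, 1)}]},
    "ATL": {"E2": [{"EMP_NAME": "B", "EMP_DOB": date(1950, 3, 9)}]},
    "MIA": {"E3": [{"EMP_NAME": "C", "EMP_DOB": date(1939, 12, 31)}]},
}
# These two maps play the role of the distributed data dictionary (DDD).
FRAGMENT_LOCATION = {"E1": "NY", "E2": "ATL", "E3": "MIA"}
TABLE_FRAGMENTS = {"EMPLOYEE": ["E1", "E2", "E3"]}
CUTOFF = date(1940, 1, 1)


def local_mapping(fragment, node):
    """Local mapping transparency: caller names the fragment AND the node."""
    return [r for r in NODES[node][fragment] if r["EMP_DOB"] < CUTOFF]


def location(fragment):
    """Location transparency: caller names the fragment; the node is looked up."""
    return local_mapping(fragment, FRAGMENT_LOCATION[fragment])


def fragmentation(table):
    """Fragmentation transparency: caller names only the table."""
    rows = []
    for fragment in TABLE_FRAGMENTS[table]:
        rows.extend(location(fragment))
    return rows


names = [r["EMP_NAME"] for r in fragmentation("EMPLOYEE")]
```

Each function delegates to the one below it, which mirrors how a higher level of transparency hides one more detail from the user.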
Distributed Database Design
Data fragmentation

• How to partition database into fragments

Data replication

• Which fragments to replicate

Data allocation

• Where to locate those fragments and replicas


Data Fragmentation
❖ Breaks single object into two or more segments or fragments

❖ Each fragment can be stored at any site over a computer network

❖ Information about data fragmentation is stored in the distributed data catalog
(DDC), from which it is accessed by the TP to process user requests
Data Fragmentation Strategies / Types
Horizontal fragmentation:
◦ Division of a relation into subsets (fragments) of tuples (rows)
◦ All fragments have the same schema

Vertical fragmentation:
◦ Division of a relation into attribute (column) subsets
◦ Each fragment must contain the primary key

Mixed fragmentation:
◦ Combination of horizontal and vertical strategies
Example:
[Table: Horizontal Fragmentation of the CUSTOMER Table by State]

[Table: Vertically Fragmented Table Contents]
Two separate areas in the company, the SERVICE dept and the COLLECTIONS dept, use different fields of the table in their daily activities.

[Table: Mixed Fragmentation of the CUSTOMER Table]
The table is divided horizontally by the three states, and within each state there is a vertical fragmentation by department.

[Table: Table Content After the Mixed Fragmentation Process]
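The three strategies can be tried out on a small in-memory table. The sketch below is illustrative only: the column names, the rows and the assignment of columns to the SERVICE and COLLECTIONS departments are assumptions, not the contents of the slide tables.

```python
# Hypothetical CUSTOMER rows; column names and values are assumptions.
CUSTOMER = [
    {"CUS_NUM": 10, "CUS_NAME": "Ann", "CUS_STATE": "TN", "CUS_BAL": 120.0},
    {"CUS_NUM": 11, "CUS_NAME": "Bob", "CUS_STATE": "FL", "CUS_BAL": 0.0},
    {"CUS_NUM": 12, "CUS_NAME": "Cid", "CUS_STATE": "TN", "CUS_BAL": 35.5},
]
KEY = "CUS_NUM"
# Assumed split of the non-key columns between the two departments.
GROUPS = {"SERVICE": ["CUS_NAME", "CUS_STATE"], "COLLECTIONS": ["CUS_BAL"]}


def horizontal(rows, column):
    """Split rows into fragments by the value of one column (same schema)."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[column], []).append(row)
    return fragments


def vertical(rows, column_groups):
    """Split columns into fragments; every fragment keeps the primary key."""
    return {
        name: [{c: row[c] for c in [KEY] + cols} for row in rows]
        for name, cols in column_groups.items()
    }


by_state = horizontal(CUSTOMER, "CUS_STATE")
by_dept = vertical(CUSTOMER, GROUPS)
# Mixed: horizontal split first, then a vertical split inside each state.
mixed = {state: vertical(rows, GROUPS) for state, rows in by_state.items()}
```

Note that `vertical` always prepends the key column, which is what later allows the fragments to be joined back together.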
Correctness Rules
Completeness
◦ Any data item must be covered by at least one fragment
Reconstruction
◦ There exist relational operators (select, project, join…) that reconstruct the original table from
its fragments
◦ Horizontal fragmentation - Union
◦ Vertical fragmentation - Join
◦ Hybrid fragmentation - combination of relational operators
Disjointness
◦ Remove duplicate data
◦ Horizontal fragmentation - a row cannot appear in more than one fragment.
◦ Vertical fragmentation – non-key attributes cannot appear in more than one fragment.
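The rules above can be checked mechanically on a toy example. The following Python sketch is one possible way to do it; the relation, the fragments and the helper names are made up for illustration. Completeness is checked indirectly: if the reconstruction equals the original table, no data item was left uncovered.

```python
# Hypothetical relation and fragments, made up for illustration.
TABLE = [
    {"ID": 1, "STATE": "TN", "BAL": 10},
    {"ID": 2, "STATE": "FL", "BAL": 20},
]
H_FRAGMENTS = [[TABLE[0]], [TABLE[1]]]           # horizontal: split by STATE
V_FRAGMENTS = [                                   # vertical: both keep the key ID
    [{"ID": r["ID"], "STATE": r["STATE"]} for r in TABLE],
    [{"ID": r["ID"], "BAL": r["BAL"]} for r in TABLE],
]


def union(fragments):
    """Reconstruct a horizontally fragmented table with a union."""
    return [row for fragment in fragments for row in fragment]


def join_on_key(fragments, key):
    """Reconstruct a vertically fragmented table with a join on the key."""
    merged = {}
    for fragment in fragments:
        for row in fragment:
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


def same_rows(a, b, key):
    """Compare two tables while ignoring row order."""
    return sorted(a, key=lambda r: r[key]) == sorted(b, key=lambda r: r[key])


def horizontally_disjoint(fragments, key):
    """A row (identified by its key) may appear in only one fragment."""
    keys = [row[key] for fragment in fragments for row in fragment]
    return len(keys) == len(set(keys))


def vertically_disjoint(fragments, key):
    """A non-key attribute may appear in only one fragment."""
    seen = set()
    for fragment in fragments:
        columns = set().union(*(row.keys() for row in fragment)) - {key}
        if seen & columns:
            return False
        seen |= columns
    return True
```

For example, `same_rows(union(H_FRAGMENTS), TABLE, "ID")` and `same_rows(join_on_key(V_FRAGMENTS, "ID"), TABLE, "ID")` both hold for the fragments defined above.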
Data Replication
❖ Storage of data copies at multiple sites served by
a computer network
❖ Fragment copies can be stored at several sites to
serve specific information requirements
❖ Can enhance data availability and response time
❖ Can help to reduce communication and total
query costs
❖ Imposes additional processing overhead:
◦ The DDBMS must decide which copy to read when a query is submitted
◦ All copies must be updated when a write occurs
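The read/write trade-off of replication can be made concrete with a small sketch. It assumes a simple "read one copy, write all copies" policy, which is only one possible choice; the class name, site names and the message counter are hypothetical.

```python
# Hypothetical sketch: site names and the read-one/write-all policy are assumptions.

class ReplicatedFragment:
    def __init__(self, sites, rows):
        # One private copy of the fragment per site.
        self.copies = {site: dict(rows) for site in sites}
        self.messages = 0  # counts per-site operations as a rough cost measure

    def read(self, key, preferred_site):
        """Read from one copy only; here, the site the caller prefers."""
        self.messages += 1
        return self.copies[preferred_site][key]

    def write(self, key, value):
        """Apply the write to every copy so that all replicas stay identical."""
        for copy in self.copies.values():
            copy[key] = value
            self.messages += 1


fragment = ReplicatedFragment(["Site 1", "Site 2", "Site 3"], {"C10": 120.0})
fragment.write("C10", 95.0)
balance = fragment.read("C10", preferred_site="Site 2")
```

With three copies, one write costs three site operations while a read costs one, which is why replication tends to favour read-heavy data.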
Data Replication Scenarios

❖ Fully replicated database:
◦ Stores multiple copies of each database fragment at multiple sites
◦ Can be impractical due to amount of overhead
❖ Partially replicated database:
◦ Stores multiple copies of some database fragments at multiple sites
◦ Most DDBMSs are able to handle the partially replicated database well
❖ Unreplicated database:
◦ Stores each database fragment at a single site
◦ No duplicate database fragments

Database size, usage frequency and costs (performance, overhead, management)
influence the decision to replicate
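Given a placement of fragments on sites, the scenario can be determined by counting copies. The function below is a minimal sketch that follows the definitions above; the function name and the example placements are hypothetical.

```python
def replication_scenario(placement):
    """Classify a placement that maps each fragment name to the sites holding a copy."""
    copies = [len(set(sites)) for sites in placement.values()]
    if any(n == 0 for n in copies):
        raise ValueError("every fragment must be stored at one site or more")
    if all(n == 1 for n in copies):
        return "unreplicated"          # each fragment at a single site
    if all(n > 1 for n in copies):
        return "fully replicated"      # multiple copies of each fragment
    return "partially replicated"      # multiple copies of some fragments only


# Hypothetical placements of fragments E1 and E2 on sites NY, ATL and MIA.
full = replication_scenario({"E1": ["NY", "ATL"], "E2": ["ATL", "MIA"]})
partial = replication_scenario({"E1": ["NY", "ATL"], "E2": ["MIA"]})
none = replication_scenario({"E1": ["NY"], "E2": ["MIA"]})
```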
Data Allocation
Deciding where to locate data: Data distribution over a computer network is achieved
through data partition, data replication, or a combination of both

Allocation strategies:
❖ Centralized data allocation
◦ Entire database is stored at one site
❖ Partitioned data allocation
◦ Database is divided into several disjoint parts (fragments) and stored at several sites
❖ Replicated data allocation
◦ Copies of one or more database fragments are stored at several sites
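Each strategy can be expressed as a map from fragment to the sites that store it. The sketch below builds such maps; the round-robin assignment is an assumption chosen for simplicity (a real design would weigh usage and cost), and the fragment and site names are hypothetical.

```python
def centralized(fragments, site):
    """Entire database at one site: every fragment maps to the same site."""
    return {f: [site] for f in fragments}


def partitioned(fragments, sites):
    """Each fragment stored at exactly one site, assigned round-robin (assumed policy)."""
    return {f: [sites[i % len(sites)]] for i, f in enumerate(fragments)}


def replicated(fragments, sites, copies):
    """Each fragment stored at `copies` consecutive sites (assumed policy)."""
    if copies > len(sites):
        raise ValueError("cannot store more copies than there are sites")
    return {
        f: [sites[(i + k) % len(sites)] for k in range(copies)]
        for i, f in enumerate(fragments)
    }


central_map = centralized(["E1", "E2"], "NY")
partition_map = partitioned(["E1", "E2", "E3"], ["NY", "ATL"])
replica_map = replicated(["E1", "E2"], ["NY", "ATL", "MIA"], copies=2)
```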
