Distributed Databases
Chapter 12
Distributed Database
Management Systems
Database Systems:
Design, Implementation, and Management,
Seventh Edition, Rob and Coronel
16/10/2022
3/12/2024
Distributed Processing
and Distributed Databases
• Distributed processing
– Database’s logical processing is shared among two or
more physically independent sites
– Connected through a network
– For example, the data input/output (I/O), data selection, and data validation might
be performed on one computer, and a report based on that data might be
created on another computer
• Distributed database
– Stores a logically related database over two or more
physically independent sites
– The database is composed of database fragments
DDBMS Advantages
• Advantages include:
– Data are located near the site of greatest demand
– Faster data access
– Faster data processing
– Growth facilitation: New sites can be added to the network without affecting
the operations of other sites
– Improved communications: Because local sites are smaller and located
closer to customers, they foster better communication among departments and
between customers and company staff
– Reduced operating costs: It is more cost-effective to add workstations to a
network than to upgrade a mainframe
– User-friendly interface
– Less danger of a single-point failure
– Processor independence: The end user is able to access any available copy
of the data, and an end user’s request is processed by any processor at the data
location
DDBMS Disadvantages
• Disadvantages include:
– Complexity of management and control
– Security
– Lack of standards
– Increased storage requirements: Multiple copies of data are
required at different sites
Characteristics of Distributed
Database Management Systems
• Application interface: to interact with the end user, application programs, and other DBMSs
• Validation: to analyze data requests for syntax correctness
• Transformation: to decompose complex requests into atomic data request components
• Query optimization: to find the best access strategy
• Mapping: to determine the data location of local and remote fragments
• I/O interface: to read or write data from or to permanent local storage
• Formatting: to prepare the data for presentation to the end user or to an application program
• Security: to provide data privacy at both local and remote databases
• Backup and recovery: to ensure the availability and recoverability of the database in case of a failure
• Database administration: features for the database administrator
• Concurrency control: to manage simultaneous data access and to ensure data consistency
• Transaction management: to ensure that the data moves from one consistent state to another
Single-Site Processing,
Single-Site Data (SPSD)
• All processing is done on a single CPU or host computer
(mainframe, midrange, or PC)
• All data are stored on the host computer’s local disk
• Processing cannot be done on the end user’s side of the
system. Several processes may run concurrently on the host
computer, all accessing a single data processor (DP)
• Typical of most mainframe and midrange computer
DBMSs
• The DBMS is located on the host computer, which is accessed
by dumb terminals connected to it
Multiple-Site Processing,
Single-Site Data (MPSD)
• Multiple processes run on different computers sharing
a single data repository
• The MPSD scenario requires a network file server running
conventional applications that are accessed through a LAN
• Many multiuser accounting applications, running under
a personal computer network, fit this description
SELECT *
FROM CUSTOMER
WHERE CUS_BALANCE > 1000;
All 10,000 CUSTOMER rows must travel through the network to be evaluated at site A, even if
only 50 of them have balances greater than $1,000
Client/server architecture is similar to that of the network file server except that all database
processing is done at the server site, thus reducing network traffic.
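The difference in network traffic can be made concrete with a small simulation. This is only a sketch under stated assumptions: the table contents, the balance values, and the helper names `file_server_query` and `client_server_query` are invented for illustration; only the 10,000-row CUSTOMER table, the 50 qualifying rows, and the `CUS_BALANCE > 1000` condition come from the example above.

```python
# Hypothetical CUSTOMER table: 10,000 rows, exactly 50 with a balance above 1000.
CUSTOMER = [
    {"CUS_NUM": n, "CUS_BALANCE": 5000 if n < 50 else 100}
    for n in range(10_000)
]

def file_server_query(table, predicate):
    """File server model: every row is shipped to the client, which then filters."""
    shipped = list(table)                       # all rows cross the network
    result = [row for row in shipped if predicate(row)]
    return result, len(shipped)

def client_server_query(table, predicate):
    """Client/server model: the server filters and ships only the matches."""
    result = [row for row in table if predicate(row)]
    return result, len(result)                  # only matching rows cross the network

def balance_over_1000(row):
    return row["CUS_BALANCE"] > 1000

fs_rows, fs_traffic = file_server_query(CUSTOMER, balance_over_1000)
cs_rows, cs_traffic = client_server_query(CUSTOMER, balance_over_1000)
print("rows shipped (file server):", fs_traffic)
print("rows shipped (client/server):", cs_traffic)
```

Both functions return the same result set; what differs is how many rows would have to cross the network to produce it.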
Multiple-Site Processing,
Multiple-Site Data (MPMD)
• Fully distributed database management system with
support for multiple data processors and transaction
processors at multiple sites
• Classified as either homogeneous or heterogeneous
• Homogeneous DDBMSs
– Integrate only one type of centralized DBMS over a
network
• Heterogeneous DDBMSs
– Integrate different types of centralized DBMSs over a
network
• Fully heterogeneous DDBMS
– Support different DBMSs that may even support
different data models (relational, hierarchical, or
network) running under different computer systems,
such as mainframes and microcomputers
Distributed Database
Transparency Features
• Allow the end user to feel like the database’s only user
• Features include:
– Distribution transparency
– Transaction transparency
– Failure transparency
– Performance transparency
– Heterogeneity transparency
Distribution Transparency
Transaction Transparency
Performance Transparency
• Objective of the query optimization routine is to minimize the total cost
associated with the execution of a request
• Costs associated with a request are a function of:
– Access time (I/O) cost
– Communication cost
– CPU time cost
• Must provide:
– Distribution transparency: Allows management of a physically
dispersed database as though it were a centralized
database
– Replica transparency: DDBMS’s ability to hide the existence of
multiple copies of data from the user
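As a rough illustration of cost-based plan selection, the sketch below compares candidate execution plans by the three cost components listed above. The slide only says that total cost is a function of these components; the plain unweighted sum, the `Plan` class, and the numbers are assumptions made for the example, not how any particular DDBMS computes cost.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    """Hypothetical execution plan with the three cost components."""
    name: str
    io_cost: float
    comm_cost: float
    cpu_cost: float

    def total_cost(self) -> float:
        # Assumption: a plain sum; a real optimizer may weight the terms.
        return self.io_cost + self.comm_cost + self.cpu_cost

def cheapest(plans):
    """Return the plan with the lowest total cost."""
    return min(plans, key=Plan.total_cost)

# Invented cost figures for two ways of answering the same request.
plans = [
    Plan("ship whole table to requesting site", io_cost=10.0, comm_cost=200.0, cpu_cost=5.0),
    Plan("filter at remote site, ship matches", io_cost=10.0, comm_cost=1.0, cpu_cost=5.0),
]

best = cheapest(plans)
print(best.name, best.total_cost())
```

With these figures the communication term dominates, so the plan that filters at the remote site is chosen.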
Data Fragmentation
• Strategies
– Horizontal fragmentation
• Division of a relation into subsets (fragments) of tuples (rows)
• Each fragment represents the equivalent of a SELECT statement,
with the WHERE clause on a single attribute.
– Vertical fragmentation
• Division of a relation into attribute (column) subsets
• This is the equivalent of the relational PROJECT operation (in SQL, a SELECT that lists only some of the columns).
– Mixed fragmentation
• Combination of horizontal and vertical strategies
• A table may be divided into several horizontal subsets (rows), each
one having a subset of the attributes (columns).
Each horizontal fragment may have a different number of rows, but each
fragment must have the same attributes.
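A minimal sketch of horizontal fragmentation follows, assuming a small invented CUSTOMER table split on a hypothetical `CUS_STATE` attribute. The helper name `horizontal_fragments` and all data values are illustrative only.

```python
# Invented sample rows; attribute names and values are hypothetical.
CUSTOMER = [
    {"CUS_NUM": 10, "CUS_NAME": "Alpha", "CUS_STATE": "TN", "CUS_BALANCE": 1200},
    {"CUS_NUM": 11, "CUS_NAME": "Beta",  "CUS_STATE": "FL", "CUS_BALANCE": 340},
    {"CUS_NUM": 12, "CUS_NAME": "Gamma", "CUS_STATE": "GA", "CUS_BALANCE": 0},
    {"CUS_NUM": 13, "CUS_NAME": "Delta", "CUS_STATE": "TN", "CUS_BALANCE": 560},
]

def horizontal_fragments(rows, attribute):
    """Group whole rows by one attribute's value (like WHERE attribute = value)."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[attribute], []).append(row)
    return fragments

frags = horizontal_fragments(CUSTOMER, "CUS_STATE")
for state, rows in frags.items():
    print(state, len(rows), "row(s)")

# The original table is recovered as the union of the fragments.
rebuilt = sorted(
    (row for fragment in frags.values() for row in fragment),
    key=lambda row: row["CUS_NUM"],
)
```

The fragments hold different numbers of rows (two for TN, one each for FL and GA), but every row keeps all four attributes.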
Each vertical fragment must have the same number of rows and must repeat the key column;
the other attributes it includes differ from fragment to fragment
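Vertical fragmentation can be sketched the same way. The example assumes an invented CUSTOMER table keyed on `CUS_NUM`, with each fragment keeping the key column so that rows can be matched up again. The helper names `vertical_fragment` and `rejoin`, the fragment names, and the data are hypothetical.

```python
# Invented sample rows; attribute names and values are hypothetical.
CUSTOMER = [
    {"CUS_NUM": 10, "CUS_NAME": "Alpha", "CUS_STATE": "TN", "CUS_BALANCE": 1200},
    {"CUS_NUM": 11, "CUS_NAME": "Beta",  "CUS_STATE": "FL", "CUS_BALANCE": 340},
    {"CUS_NUM": 12, "CUS_NAME": "Gamma", "CUS_STATE": "GA", "CUS_BALANCE": 0},
]

def vertical_fragment(rows, key, attributes):
    """Keep the key column plus the chosen attributes for every row."""
    return [{name: row[name] for name in (key, *attributes)} for row in rows]

def rejoin(left, right, key):
    """Rebuild full rows by matching two fragments on the shared key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]

service = vertical_fragment(CUSTOMER, "CUS_NUM", ["CUS_NAME", "CUS_STATE"])
collection = vertical_fragment(CUSTOMER, "CUS_NUM", ["CUS_BALANCE"])

print(len(service), len(collection))   # same number of rows in each fragment
rebuilt = rejoin(service, collection, "CUS_NUM")
```

Because both fragments carry `CUS_NUM`, joining them on that key reproduces the original rows.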
Data Replication
• Replication scenarios
– Fully replicated database
• Stores multiple copies of each database fragment at multiple sites
• Can be impractical due to amount of overhead
– Partially replicated database
• Stores multiple copies of some database fragments at multiple
sites
• Most DDBMSs are able to handle the partially replicated database
well
– Unreplicated database
• Stores each database fragment at single site
• No duplicate database fragments
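The three scenarios above can be told apart by counting the copies of each fragment. The sketch below applies the definitions as stated on this slide. The function name `replication_scenario`, the fragment names, and the site names are hypothetical.

```python
def replication_scenario(placement):
    """Classify a map of fragment name -> set of sites holding a copy."""
    if not placement:
        raise ValueError("placement must list at least one fragment")
    copies = [len(sites) for sites in placement.values()]
    if all(n == 1 for n in copies):
        return "unreplicated"            # every fragment lives at a single site
    if all(n > 1 for n in copies):
        return "fully replicated"        # every fragment has multiple copies
    return "partially replicated"        # only some fragments have multiple copies

# Invented placements of two fragments over sites A and B.
full = {"E1": {"A", "B"}, "E2": {"A", "B"}}
partial = {"E1": {"A", "B"}, "E2": {"B"}}
single = {"E1": {"A"}, "E2": {"B"}}

for name, placement in [("full", full), ("partial", partial), ("single", single)]:
    print(name, "->", replication_scenario(placement))
```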
Data Allocation
• Process of deciding where to locate data
• Allocation strategies
– Centralized data allocation
• Entire database is stored at one site
– Partitioned data allocation
• Database is divided into several disjoint parts (fragments) that are stored
at several sites
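The two strategies can be contrasted with a short sketch. Placing each fragment at the site that accesses it most is an assumption made for this example, borrowed from the "greatest demand" advantage listed earlier; the slides do not prescribe an allocation rule. The function names, fragment names, site names, and access counts are all invented.

```python
def centralized_allocation(fragments, site):
    """Centralized: every fragment is stored at the same single site."""
    return {fragment: site for fragment in fragments}

def partitioned_allocation(demand):
    """Partitioned: each fragment goes to exactly one site.

    demand maps fragment -> {site: number of accesses}; as an assumption,
    the site with the most accesses is chosen for each fragment.
    """
    return {
        fragment: max(by_site, key=by_site.get)
        for fragment, by_site in demand.items()
    }

# Invented access counts per fragment and site.
demand = {
    "CUST_TN": {"Nashville": 900, "Miami": 40},
    "CUST_FL": {"Nashville": 25, "Miami": 700},
}

central = centralized_allocation(demand, "Nashville")
partitioned = partitioned_allocation(demand)
print(central)
print(partitioned)
```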