Chapter 4
Distributed
Database Systems
Chapter 4 - Objectives
Basic concepts of distributed database systems.
Advantages and disadvantages of distributed
databases.
Functions and architecture for a DDBMS.
Distributed Database Design issues.
Levels of DDBMS Transparency.
Rules for DDBMSs.
2 Distributed Database Systems 04/13/2024
Basic Concepts of Distributed DBs
Distributed Database
A logically interrelated collection of shared data (and a
description of data), physically distributed over a
computer network.
Distributed DBMS
Software system that permits the management of the
distributed database and makes the distribution
transparent to users.
Distributed DBMSs should help resolve the islands of
information problem in organizations.
Distributed Databases & Distributed Computing
Distributed databases bring the advantages of
distributed computing to the database domain.
A distributed computing system consists of a
number of processing sites or nodes that are
interconnected by a computer network and that
cooperate in performing certain assigned tasks
Distributed computing systems partition a big,
unmanageable problem into smaller pieces to solve it
efficiently in a coordinated manner.
Hence, more computing power is harnessed to solve a
complex task, with the cooperation between the
independent nodes.
Basic Concepts of Distributed DBs cont’d…
Collection of logically-related shared data.
Data split into fragments.
Fragments may be replicated.
Fragments/replicas allocated to sites.
Sites linked by a communications network.
Data at each site is under control of a local DBMS.
DBMSs handle local applications autonomously.
Each DBMS participates in at least one global
application.
Distributed DBMS Architecture
Data distribution and replication among
distributed databases – an example
Distributed Processing
A centralized database that can be accessed over a
computer network. This is not a distributed database.
Parallel DBMS
A DBMS running across multiple processors and
disks designed to execute operations in parallel,
whenever possible, to improve performance.
Based on premise that single processor systems can
no longer meet requirements for cost-effective
scalability, reliability, and performance.
Parallel DBMSs link multiple, smaller machines to
achieve same throughput as single, larger machine,
with greater scalability and reliability.
Parallel DBMS
Parallel technology is typically used for
very large databases, possibly of the order
of terabytes (10^12 bytes), or for systems that
have to process thousands of transactions
per second.
Also note that most DBMS vendors have
a parallel DBMS version of their
products.
A parallel DBMS is also not a distributed
database system.
Parallel DBMS
Main architectures for parallel DBMSs are:
Shared memory,
This architecture provides high-speed data access for a limited
number of processors, but it is not scalable beyond about 64
processors, at which point the interconnection network becomes
a bottleneck
Shared disk,
Architecture optimized for applications that are inherently
centralized and require high availability and performance
Shared nothing.
Often known as massively parallel processing (MPP), this is a
multiple-processor architecture in which each processor is part
of a complete system, with its own memory and disk storage.
Parallel DBMS
(a) shared memory
(b) shared disk
(c) shared nothing
Multidatabase System (MDBS)
MDBS: a distributed DBMS in which each site maintains
complete autonomy.
Simply speaking, an MDBS is a DBMS that resides
transparently on top of existing database and file systems,
and presents a single database to its users.
An MDBS attempts to logically integrate a number of
independent DDBMSs while allowing the local DBMSs to
maintain complete control of their operations.
If there is no provision for the local sites to function as a
standalone DBMS, then the system has no local
autonomy.
For a centralized database, there is complete autonomy
but a total lack of distribution and heterogeneity.
Classification of DDBMS
There are unfederated MDBSs (where there are no local users)
and federated MDBSs (where there are local users).
A federated system is a cross between a distributed
DBMS and a centralized DBMS; it is a distributed
system for global users and a centralized system for
local users.
In general, classification of DDBMSs is based on three
important factors:
Level of data distribution
Degree of local site autonomy
Extent of site Heterogeneity
Advantages of DDBMSs
Reflects organizational structure
Improved shareability and local autonomy
Improved availability
Improved reliability
Improved performance
Disadvantages of DDBMSs
Complexity
Cost
Security
Integrity control more difficult
Lack of standards
Lack of experience
Database design more complex
Types of DDBMS (based on site heterogeneity)
Homogeneous DDBMS
Heterogeneous DDBMS
Homogeneous DDBMS
All sites use same DBMS product.
Much easier to design and manage.
Approach provides incremental growth and allows
increased performance.
Usually the result of a new system being designed.
Heterogeneous DDBMS
Sites may run different DBMS products, with possibly
different underlying data models.
Occurs when sites have implemented their own
databases and integration is considered later.
Translations required to allow for:
Different hardware.
Different DBMS products.
Different hardware and different DBMS products.
Typical solution is to use gateways.
Gateways: convert the language and model of each
different DBMS into the language and model of the
relational system.
Overview of Networking
Network - Interconnected collection of autonomous
computers, capable of exchanging information.
Local Area Network (LAN) intended for connecting
computers at same site.
Wide Area Network (WAN) used when computers
or LANs need to be connected over long distances.
WANs are relatively slow and less reliable than LANs.
A DDBMS using a LAN provides much faster response
time than one using a WAN.
LANs can be extended over large geographic areas
using Virtual Private Networks (VPNs).
Overview of Networking- Summary of WAN
and LAN
Reference Architecture for DDBMS
Due to diversity, there is no accepted DDBMS
architecture equivalent to the ANSI/SPARC 3-level
architecture.
A reference architecture for DDBMSs consists of:
Set of global external schemas.
Global conceptual schema (GCS).
Fragmentation schema and allocation schema.
Set of schemas for each local DBMS conforming to 3-
level ANSI/SPARC architecture.
Some levels may be missing, depending on the levels/types
of transparency supported.
Reference Architecture for DDBMS
Functions of a DDBMS
Expect a DDBMS to have at least the functionality of a
centralized DBMS, and also the following functionality:
Extended communication services.
Extended Data Dictionary.
Distributed query processing.
Extended concurrency control.
Extended recovery services.
Components of a DDBMS
Distributed Database Design Issues
Three key issues need to be considered:
Fragmentation,
Allocation,
Replication.
Distributed Database Design
Fragmentation
Relation may be divided into a number of sub-relations,
which are then distributed.
Allocation
Each fragment is stored at a site with “optimal”
distribution.
Replication
Copy of fragment may be maintained at several sites.
Why Fragment?
Usage
Applications work with views rather than entire
relations.
Efficiency
Data is stored close to where it is most frequently used.
Data that is not needed by local applications is not
stored.
Why Fragment?
Parallelism
With fragments as the unit of distribution, a transaction can
be divided into several subqueries (subtransactions)
that operate on the fragments.
Security
Data not required by local applications is not stored
and so not available to unauthorized users.
Why Fragment?
Disadvantages
Performance: The performance of global applications
that require data from several fragments located at
different sites may be slower
Integrity: Integrity control may be more difficult if data
and functional dependencies are fragmented and
located at different sites.
Fragmentation
Definition and allocation of fragments is carried out
strategically to achieve the following :
Locality of Reference.
Improved Reliability and Availability.
Improved Performance.
Balanced Storage Capacities and Costs.
Minimal Communication Costs.
Fragmentation involves analyzing most important
applications, based on quantitative/qualitative
information.
Data Allocation
Four alternative strategies regarding placement of
data:
Centralized (Distributed Processing),
Partitioned (or Fragmented),
Complete Replication,
Selective Replication.
Data Allocation
Centralized: Consists of a single database and DBMS
stored at one site with users distributed across the
network. (Not really a distributed database.)
Partitioned: Database partitioned into disjoint
fragments, each fragment assigned to one site.
Complete Replication: Consists of maintaining
complete copy of database at each site.
Selective Replication: Combination of partitioning,
replication, and centralization, based on the nature of
data
This is the most commonly used strategy because of
its flexibility.
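As a rough illustration of the four strategies, the sketch below classifies an allocation map that records which sites hold a copy of each fragment. The function name `classify_allocation`, the fragment names, and the site names are hypothetical, and the rules are a simplification of the definitions above rather than how any particular DDBMS records its allocation schema.

```python
def classify_allocation(allocation, sites):
    """Name the placement strategy for a fragment -> set-of-sites map.

    Illustrative only: 'allocation' and 'sites' are hypothetical inputs.
    """
    all_sites = set(sites)
    used_sites = set().union(*allocation.values())
    if len(used_sites) == 1:
        return "centralized"            # everything lives at one site
    if all(len(copies) == 1 for copies in allocation.values()):
        return "partitioned"            # one copy per fragment, several sites
    if all(set(copies) == all_sites for copies in allocation.values()):
        return "complete replication"   # every fragment at every site
    return "selective replication"      # a mix of the above


sites = ["London", "Glasgow", "Aberdeen"]
print(classify_allocation({"F1": {"London"}, "F2": {"Glasgow"}}, sites))
```

Selective replication is the fall-through case here, which matches its description as a combination of the other three.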
Comparison of Strategies for Data Distribution
Correctness of Fragmentation
Three correctness rules:
Completeness,
Reconstruction,
Disjointness.
Correctness of Fragmentation
Completeness
If relation R is decomposed into fragments R1, R2, ... Rn,
each data item that can be found in R must appear in at
least one fragment.
Reconstruction
It must be possible to define a relational operation
that will reconstruct R from the fragments.
For horizontal fragmentation the reconstruction operation
is Union; for vertical fragmentation it is Natural Join.
Correctness of Fragmentation
Disjointness
If data item di appears in fragment Ri, then it should
not appear in any other fragment.
Exception: vertical fragmentation, where primary
key attributes must be repeated to allow
reconstruction.
For horizontal fragmentation, data item is a tuple.
For vertical fragmentation, data item is an
attribute.
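The three rules can be checked mechanically for a horizontal fragmentation. The following is a minimal sketch, assuming tuples are the data items and reconstruction is a plain union; the function name `check_horizontal_fragmentation` and the sample relation are hypothetical.

```python
def check_horizontal_fragmentation(relation, fragments):
    """Return (complete, reconstructible, disjoint) for horizontal fragments.

    Tuples are the data items; reconstruction is assumed to be a plain union.
    """
    original = set(relation)
    union = set().union(*fragments)
    complete = original <= union          # every tuple appears somewhere
    reconstructible = union == original   # the union gives back exactly R
    # No tuple is stored twice if the fragment sizes add up to the union size.
    disjoint = sum(len(set(f)) for f in fragments) == len(union)
    return complete, reconstructible, disjoint


# Hypothetical relation R of (propertyNo, type) tuples.
R = [("PA14", "House"), ("PL94", "Flat"), ("PG4", "Flat")]
R1 = [t for t in R if t[1] == "House"]
R2 = [t for t in R if t[1] == "Flat"]
print(check_horizontal_fragmentation(R, [R1, R2]))
```

A vertical fragmentation would need a different check, since attributes are the data items and the primary key is deliberately repeated.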
Fragmentation Options
Fragmenting a relation should be done with
caution. The following are the fragmentation
possibilities for relations in a database
Horizontal
Vertical
Mixed
Derived
No Fragmentation
Horizontal and Vertical Fragmentation
Mixed Fragmentation
Horizontal Fragmentation
Consists of a subset of the tuples of a relation.
Defined using Selection operation of relational
algebra:
σ_p(R)
For example:
P1 = σ_{type=‘House’}(PropertyForRent)
P2 = σ_{type=‘Flat’}(PropertyForRent)
Reconstruction expression ?
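A small sketch of horizontal fragmentation by selection, with the union as the reconstruction. The sample rows and the helper `select` are hypothetical; only two attributes of PropertyForRent are shown.

```python
def select(rows, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


# Hypothetical sample rows; only two attributes are shown.
property_for_rent = [
    {"propertyNo": "PA14", "type": "House"},
    {"propertyNo": "PL94", "type": "Flat"},
    {"propertyNo": "PG4", "type": "Flat"},
]

p1 = select(property_for_rent, lambda r: r["type"] == "House")
p2 = select(property_for_rent, lambda r: r["type"] == "Flat")

# Reconstruction: PropertyForRent = P1 ∪ P2. The fragments are disjoint,
# so concatenating the two lists acts as the union here.
reconstructed = p1 + p2
```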
Vertical Fragmentation
Consists of a subset of attributes of a relation.
Defined using Projection operation of relational algebra:
Π_{a1, ..., an}(R)
For example:
S1 = Π_{staffNo, position, sex, DOB, salary}(Staff)
S2 = Π_{staffNo, fName, lName, branchNo}(Staff)
Determined by establishing affinity of one attribute to
another.
Reconstruction expression ?
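A small sketch of vertical fragmentation by projection, with a natural join on the repeated primary key as the reconstruction. The sample rows and the helpers `project` and `natural_join` are hypothetical; the join assumes staffNo is unique and is the only attribute the two fragments share.

```python
def project(rows, attributes):
    """Relational projection onto the named attributes."""
    return [{a: row[a] for a in attributes} for row in rows]


def natural_join(left, right, key):
    """Join two fragments whose only shared attribute is a unique key."""
    by_key = {row[key]: row for row in right}
    return [{**l, **by_key[l[key]]} for l in left if l[key] in by_key]


# Hypothetical sample rows for the Staff relation.
staff = [
    {"staffNo": "SG37", "fName": "Ann", "lName": "Beech",
     "position": "Assistant", "sex": "F", "DOB": "1960-11-10",
     "salary": 12000, "branchNo": "B003"},
    {"staffNo": "SL21", "fName": "John", "lName": "White",
     "position": "Manager", "sex": "M", "DOB": "1945-10-01",
     "salary": 30000, "branchNo": "B005"},
]

s1 = project(staff, ["staffNo", "position", "sex", "DOB", "salary"])
s2 = project(staff, ["staffNo", "fName", "lName", "branchNo"])

# Reconstruction: Staff = S1 ⋈ S2, joining on the repeated key staffNo.
reconstructed = natural_join(s1, s2, "staffNo")
```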
Mixed Fragmentation
Consists of a horizontal fragment that is vertically
fragmented, or a vertical fragment that is horizontally
fragmented.
Defined using Selection and Projection operations of
relational algebra:
σ_p(Π_{a1, ..., an}(R)) or
Π_{a1, ..., an}(σ_p(R))
Example - Mixed Fragmentation
S1 = Π_{staffNo, position, sex, DOB, salary}(Staff)
S2 = Π_{staffNo, fName, lName, branchNo}(Staff)
S21 = σ_{branchNo=‘B003’}(S2)
S22 = σ_{branchNo=‘B005’}(S2)
S23 = σ_{branchNo=‘B007’}(S2)
Reconstruction expression ?
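For the mixed case, one way to reconstruct Staff is to union the horizontal fragments of S2 first and then join the result with S1. The sketch below assumes every staff member belongs to one of the three branches; the sample rows and helper functions are hypothetical.

```python
def project(rows, attributes):
    """Relational projection onto the named attributes."""
    return [{a: row[a] for a in attributes} for row in rows]


def select(rows, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def natural_join(left, right, key):
    """Join two fragments whose only shared attribute is a unique key."""
    by_key = {row[key]: row for row in right}
    return [{**l, **by_key[l[key]]} for l in left if l[key] in by_key]


# Hypothetical sample rows, one per branch.
staff = [
    {"staffNo": "SG37", "fName": "Ann", "lName": "Beech", "position": "Assistant",
     "sex": "F", "DOB": "1960-11-10", "salary": 12000, "branchNo": "B003"},
    {"staffNo": "SL21", "fName": "John", "lName": "White", "position": "Manager",
     "sex": "M", "DOB": "1945-10-01", "salary": 30000, "branchNo": "B005"},
    {"staffNo": "SA9", "fName": "Mary", "lName": "Howe", "position": "Assistant",
     "sex": "F", "DOB": "1970-02-19", "salary": 9000, "branchNo": "B007"},
]

s1 = project(staff, ["staffNo", "position", "sex", "DOB", "salary"])
s2 = project(staff, ["staffNo", "fName", "lName", "branchNo"])
s21 = select(s2, lambda r: r["branchNo"] == "B003")
s22 = select(s2, lambda r: r["branchNo"] == "B005")
s23 = select(s2, lambda r: r["branchNo"] == "B007")

# Reconstruction: Staff = S1 ⋈ (S21 ∪ S22 ∪ S23).
reconstructed = natural_join(s1, s21 + s22 + s23, "staffNo")
```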
Derived Horizontal Fragmentation
A horizontal fragment of a child relation that is based
on horizontal fragmentation of a parent
relation (the primary key table).
Some applications may involve a join of two or more
relations.
If the relations are stored at different locations, there
may be a significant overhead in processing the join.
Derived Horizontal Fragmentation
In such cases, it may be more appropriate to ensure
that the relations, or fragments of relations, are at
the same location
Ensures that fragments that are frequently joined
together are at the same site.
Defined using Semijoin operation of relational
algebra:
R_i = R ⋉_F S_i, 1 ≤ i ≤ w
where w is the number of horizontal fragments defined
on S (the parent table) and F is the join attribute.
Example - Derived Horizontal Fragmentation
If we have staff fragments below,
S3 = σ_{branchNo=‘B003’}(Staff)
S4 = σ_{branchNo=‘B005’}(Staff)
S5 = σ_{branchNo=‘B007’}(Staff)
We could use derived fragmentation for PropertyForRent:
P_i = PropertyForRent ⋉_{branchNo} S_i, 3 ≤ i ≤ 5
Hence, we have three fragments named P3, P4, and P5.
Reconstruction expression ?
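The semijoin keeps the rows of the child relation that match some row of a parent fragment. A minimal sketch follows, with hypothetical sample rows and a hypothetical helper `semijoin`; it assumes every property belongs to one of the three branches, so the union of the derived fragments rebuilds PropertyForRent.

```python
def semijoin(child, parent, attribute):
    """Rows of 'child' that match some row of 'parent' on 'attribute'."""
    parent_values = {row[attribute] for row in parent}
    return [row for row in child if row[attribute] in parent_values]


# Hypothetical sample rows; only the attributes needed here are shown.
staff = [
    {"staffNo": "SG37", "branchNo": "B003"},
    {"staffNo": "SL21", "branchNo": "B005"},
    {"staffNo": "SA9", "branchNo": "B007"},
]
property_for_rent = [
    {"propertyNo": "PG4", "branchNo": "B003"},
    {"propertyNo": "PG36", "branchNo": "B003"},
    {"propertyNo": "PL94", "branchNo": "B005"},
    {"propertyNo": "PA14", "branchNo": "B007"},
]

# Parent fragments S3, S4, S5, keyed by their index.
branches = {3: "B003", 4: "B005", 5: "B007"}
s = {i: [row for row in staff if row["branchNo"] == b] for i, b in branches.items()}

# Derived fragments: P_i = PropertyForRent semijoin S_i on branchNo.
p = {i: semijoin(property_for_rent, s[i], "branchNo") for i in s}

# Reconstruction: PropertyForRent = P3 ∪ P4 ∪ P5.
reconstructed = p[3] + p[4] + p[5]
```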
Derived Horizontal Fragmentation
If a child relation contains more than one foreign
key, need to select one of the parent tables.
Choice can be based on fragmentation used most
frequently or fragmentation with better join
characteristics.
No fragmentation
A final strategy is not to fragment a relation.
For example, the Branch relation contains only
a small number of tuples and is not updated
very frequently.
Rather than trying to horizontally fragment the
relation on branch number for example, it
would be more sensible to leave the relation
whole and simply replicate the Branch relation
at each site
Distributed Database Design Methodology
1. Use normal methodology to produce a design for the
global relations.
2. Examine topology of system to determine where
databases will be located.
3. Analyse most important transactions and identify
appropriateness of horizontal/vertical fragmentation.
4. Decide which relations are not to be fragmented.
5. Examine relations on the one side of relationships (parent
relations) and determine a suitable fragmentation
schema. Relations on the many side (child relations) may
be suitable for derived horizontal fragmentation.
Levels of Transparencies in a DDBMS
The objective of transparency is to make the
distributed system appear like a centralized system
for the user.
This is sometimes referred to as the
fundamental principle of distributed DBMSs
Four main Transparencies
Distribution transparency
Transaction transparency
Performance transparency
DBMS transparency
Transparencies in a DDBMS – sub
transparencies
Distribution Transparency
Fragmentation Transparency
Location Transparency
Replication Transparency
Transaction Transparency
Concurrency Transparency
Failure Transparency
Performance Transparency
DBMS Transparency
Distribution Transparency
Distribution transparency allows user to perceive database
as single, logical entity.
If DDBMS exhibits distribution transparency, the user has
freedom not to know the operational details of the network
and the placement of the data in the distributed system:
Fragmentation transparency gives the user the freedom to be
unaware of the fact that data is fragmented.
This is the highest level of distribution transparency.
Local mapping transparency: the user needs to specify both
fragment names and the location of data items, including
replication sites if any.
This is the lowest level of distribution transparency.
Distribution Transparency
Location Transparency is the middle level of
distribution transparency.
With location transparency, the user must know
how the data has been fragmented but still does
not have to know the location of the data
With replication transparency, user is unaware
of replication of fragments – the existence of
floating copies of same data item at different
sites.
Closely related to location transparency
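The three levels of distribution transparency differ in how much the caller has to name. The sketch below layers three hypothetical read functions over a toy catalog; the site names, the relation name `StaffNames`, and the stored values are all made up for illustration, and replication is left out.

```python
# Hypothetical catalog: what each site stores, where each fragment lives,
# and which fragments make up each global relation.
SITES = {
    "site3": {"S21": ["Ann Beech", "David Ford"]},
    "site5": {"S22": ["John White", "Julie Lee"]},
    "site7": {"S23": ["Mary Howe"]},
}
LOCATION = {"S21": "site3", "S22": "site5", "S23": "site7"}
FRAGMENTS = {"StaffNames": ["S21", "S22", "S23"]}


def read_local_mapping(fragment, site):
    """Lowest level: the caller names both the fragment and its site."""
    return SITES[site][fragment]


def read_fragment(fragment):
    """Location transparency: the caller names the fragment only."""
    return read_local_mapping(fragment, LOCATION[fragment])


def read_relation(relation):
    """Fragmentation transparency: the caller names the global relation only."""
    rows = []
    for fragment in FRAGMENTS[relation]:
        rows.extend(read_fragment(fragment))
    return rows
```

Each higher level is built on the one below it, which is why a system that offers fragmentation transparency also hides locations.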
Transaction Transparency
Ensures that all distributed transactions maintain
distributed database’s integrity and consistency.
Distributed transaction accesses data stored at
more than one location.
Each transaction is divided into number of sub
transactions, one for each site that has to be
accessed.
DDBMS must ensure the indivisibility (atomicity)
of both the global transaction and each of the sub
transactions.
Example - Distributed Transaction
T prints out names of all staff, using schema defined
in the fragmentation example as S1, S2, S21, S22, and
S23.
DDBMS defines three sub-transactions TS3, TS5, and
TS7 to represent agents at sites 3, 5, and 7.
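The example can be sketched as a global transaction that runs one subtransaction per site and merges the results. The fragment contents and function names below are hypothetical, and the subtransactions run one after another here, whereas a real DDBMS could run them in parallel.

```python
# Hypothetical contents of fragments S21, S22, S23 held at sites 3, 5 and 7.
site_fragments = {
    3: [{"staffNo": "SG37", "fName": "Ann", "lName": "Beech"}],
    5: [{"staffNo": "SL21", "fName": "John", "lName": "White"}],
    7: [{"staffNo": "SA9", "fName": "Mary", "lName": "Howe"}],
}


def subtransaction(site):
    """TS<site>: read the local fragment and return the staff names."""
    return [f"{r['fName']} {r['lName']}" for r in site_fragments[site]]


def global_transaction():
    """T: run one subtransaction per site and merge the results."""
    names = []
    for site in (3, 5, 7):
        names.extend(subtransaction(site))
    return names


for name in global_transaction():
    print(name)
```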
Concurrency Transparency
All transactions must execute independently and be
logically consistent with results obtained if
transactions executed one at a time, in some
arbitrary serial order.
Same fundamental principles as for centralized
DBMS hold.
DDBMS must ensure both global and local
transactions do not interfere with each other.
Similarly, DDBMS must ensure consistency of all sub
transactions of a global transaction.
Classification of Transactions
In IBM’s Distributed Relational Database
Architecture (DRDA), four types of distributed
transactions:
Remote request (Remote Query)
Remote unit of work (Remote Transaction)
Distributed unit of work (Distributed Transaction)
Distributed request (Distributed Query)
In this context “Request” is a SQL Select Statement
(query) and “Unit of work” is a transaction that
manipulates the content of a database.
Classification of Transactions
Concurrency Transparency
Replication makes concurrency more complex.
If a copy of a replicated data item is updated, update
must be propagated to all copies.
Could propagate changes as part of original
transaction, making it an atomic operation.
However, if one site holding copy is not reachable,
then transaction is delayed until site is reachable.
Concurrency Transparency
Could limit update propagation to only those sites
currently available. Remaining sites are updated when
they become available again (update propagation
should be the first thing such sites do).
Could allow updates to copies to happen
asynchronously, sometime after the original update.
Delay in regaining consistency may range from a few
seconds to several hours.
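The second option, updating only the reachable copies and catching up the rest later, can be sketched as follows. The `Replica` class and `propagate` function are hypothetical and ignore concurrency, ordering between conflicting updates, and failures during recovery.

```python
class Replica:
    """One copy of a replicated fragment at a site (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.available = True
        self.data = {}
        self.pending = []  # updates missed while the site was unavailable

    def recover(self):
        """On coming back, apply the missed updates before anything else."""
        self.available = True
        for key, value in self.pending:
            self.data[key] = value
        self.pending.clear()


def propagate(replicas, key, value):
    """Update the reachable copies now; queue the update for the rest."""
    for replica in replicas:
        if replica.available:
            replica.data[key] = value
        else:
            replica.pending.append((key, value))


london, glasgow = Replica("London"), Replica("Glasgow")
glasgow.available = False
propagate([london, glasgow], "salary:SL21", 31000)
stale = dict(glasgow.data)   # Glasgow has not seen the update yet
glasgow.recover()            # now the copies agree again
```

Between the update and the recovery the two copies disagree, which is the delay in regaining consistency mentioned above.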
Failure Transparency
DDBMS must ensure atomicity and durability of global
transaction.
Which means, ensuring that sub transactions of global
transaction either all commit or all abort.
Thus, DDBMS must synchronize global transaction to
ensure that all sub transactions have completed
successfully before recording a final COMMIT for global
transaction.
Must do this in presence of site and network failures.
DDBMS Commit Protocols ( reading assignment)
Two phase commit(2PC)
Three Phase Commit(3PC)
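The all-commit-or-all-abort rule is the core of the coordinator's decision in two-phase commit. The sketch below shows only that decision; it is a simplification that omits logging, timeouts, sending the decision back to the sites, and recovery, and the function and site names are hypothetical.

```python
def two_phase_commit(participants):
    """Coordinator side of a simplified 2PC round.

    'participants' maps a site name to a callable that returns True (ready
    to commit), False (abort), or None (no reply, treated as a failure).
    """
    # Phase 1 (voting): ask every site whether it can commit.
    votes = {site: prepare() for site, prepare in participants.items()}
    # Phase 2 (decision): commit only if every site voted to commit.
    decision = "COMMIT" if all(v is True for v in votes.values()) else "ABORT"
    return decision, votes


decision, votes = two_phase_commit({
    "site3": lambda: True,
    "site5": lambda: True,
    "site7": lambda: None,   # site 7 never answers
})
print(decision)
```

A single missing or negative vote is enough to abort the global transaction.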
Performance Transparency
DDBMS must perform as if it were a centralized
DBMS.
DDBMS should not suffer any performance
degradation due to the “distributed
architecture”.
DDBMS should determine most cost-effective strategy
to execute a request.
Performance Transparency
Distributed Query Processor (DQP) maps data
request into ordered sequence of operations on local
databases.
Must consider fragmentation, replication, and
allocation schemas.
DQP has to decide:
which fragment to access;
which copy of a fragment to use;
which location to use.
Performance Transparency
DQP produces execution strategy optimized with
respect to some cost function.
Typically, costs associated with a distributed request
include:
I/O cost;
CPU cost;
Communication cost.
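Choosing an execution strategy then amounts to minimizing a cost function over the candidates. The sketch below assumes a plain unweighted sum of the three components; the strategy names and all cost figures are made up, and a real optimizer would estimate and weight them from catalog statistics.

```python
def total_cost(strategy):
    """Sum of the three cost components (unweighted, as an assumption)."""
    return strategy["io"] + strategy["cpu"] + strategy["communication"]


# Hypothetical strategies with made-up cost estimates in arbitrary units.
strategies = [
    {"name": "ship fragment to query site",
     "io": 40, "cpu": 10, "communication": 200},
    {"name": "run subquery at remote site",
     "io": 40, "cpu": 15, "communication": 30},
]

best = min(strategies, key=total_cost)
print(best["name"], total_cost(best))
```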
DBMS transparency
With DBMS transparency, it should be possible to
have different DBMSs in the system without
the user having to know about it.
DBMS transparency hides the knowledge that the local
DBMSs may be different and is therefore applicable
only to heterogeneous DDBMSs
Date’s 12 Rules for a DDBMS
Fundamental Principle
To the user, a distributed system should look exactly like a
non-distributed system.
1. Local Autonomy
2. No Reliance on a Central Site
3. Continuous Operation
4. Location Independence
5. Fragmentation Independence
6. Replication Independence
Date’s 12 Rules for a DDBMS
7. Distributed Query Processing
8. Distributed Transaction Processing
9. Hardware Independence
10. Operating System Independence
11. Network Independence
12. Database Independence
Last four rules are ideals.
Quiz #2
1. If only one of the subtransactions fails to return either commit or
rollback, then the global transaction must commit. (True/False)
2. Given the following fragments of relation R, write the reconstruction
relational algebra expression.
R1 = σ_{p1}(R), R2 = σ_{p2}(R), R3 = Π_{L1}(R2), R4 = Π_{L2}(R2)
R = ?
3. A transaction that issues a select statement to a single remote site is called:
A. Remote Request B. Remote Transaction C. Distributed Request D.
Distributed Transaction
4. What is a data item for derived horizontal fragmentation?
5. A.What kind of fragmentation is done in
the following figure?
B. What relational algebra operation is used to
produce the fragments?