1
DISTRIBUTED DBMS – CONCEPTS
AND DESIGN
Fiaz Majeed (University of Gujrat)
2
Objectives
• Concepts.
• Advantages and disadvantages of distributed
databases.
• Functions and architecture for a DDBMS.
• Distributed database design.
3
Concepts
Distributed Database
A logically interrelated collection of shared data (and a
description of this data), physically distributed over a
computer network.
Distributed DBMS
Software system that permits the management of the
distributed database and makes the distribution transparent
to users.
4
Concepts
• Collection of logically-related shared data.
• Data split into fragments.
• Fragments may be replicated.
• Fragments/replicas allocated to sites.
• Sites linked by a communications network.
• Data at each site is under control of a DBMS.
• DBMSs handle local applications autonomously.
• Each DBMS participates in at least one global
application.
5
Distributed DBMS
6
Distributed Processing
A centralized database that can be accessed over a
computer network.
7
Types of DDBMS
• Homogeneous DDBMS
• Heterogeneous DDBMS
8
Homogeneous DDBMS
• All sites use same DBMS product.
• Much easier to design and manage.
• Approach provides incremental growth and allows
increased performance.
9
Heterogeneous DDBMS
• Sites may run different DBMS products, with
possibly different underlying data models.
• Occurs when sites have implemented their own
databases and integration is considered later.
• Translations required to allow for:
• Different hardware.
• Different DBMS products.
• Typical solution is to use gateways.
10
Open Database Access and Interoperability
• Open Group formed a Working Group to provide
specifications that will create a database
infrastructure environment where there is:
• Common SQL API that allows client applications
to be written that do not need to know vendor of
DBMS they are accessing.
• Common database protocol that enables DBMS
from one vendor to communicate directly with
DBMS from another vendor without the need for a
gateway.
• A common network protocol that allows
communications between different DBMSs.
11
Open Database Access and Interoperability
• Most ambitious goal is to find a way to enable
transaction to span DBMSs from different vendors
without use of a gateway.
12
Multidatabase System (MDBS)
DDBMS in which each site maintains complete
autonomy.
• DBMS that resides transparently on top of existing
database and file systems and presents a single
database to its users.
• Allows users to access and share data without
requiring physical database integration.
13
Functions of a DDBMS
• Expect DDBMS to have at least the functionality of a
DBMS.
• Also to have following functionality:
• Extended communication services.
• Extended Data Dictionary.
• Distributed query processing.
• Extended concurrency control.
• Extended recovery services.
14
Reference Architecture for DDBMS
• Due to diversity, no accepted architecture equivalent
to ANSI/SPARC 3-level architecture.
• A reference architecture consists of:
• Set of global external schemas.
• Global conceptual schema (GCS).
• Fragmentation schema and allocation schema.
• Set of schemas for each local DBMS conforming to
3-level ANSI/SPARC.
15
Reference Architecture for DDBMS
16
Reference Architecture for MDBS
• In DDBMS, GCS is union of all local conceptual
schemas.
• GCS of tightly coupled system involves integration
of either parts of LCSs or local external schemas.
17
Components of a DDBMS
18
Distributed Database Design
• Three key issues:
• Fragmentation,
• Allocation,
• Replication.
19
Distributed Database Design
Fragmentation
Relation may be divided into a number of sub-relations,
which are then distributed.
Allocation
Each fragment is stored at site with “optimal” distribution.
Replication
Copy of fragment may be maintained at several sites.
20
Fragmentation
• Definition and allocation of fragments carried out
strategically to achieve:
• Locality of Reference.
• Improved Reliability and Availability.
• Improved Performance.
• Balanced Storage Capacities and Costs.
• Minimal Communication Costs.
21
Data Allocation
• Four alternative strategies regarding placement of
data:
• Centralized,
• Partitioned (or Fragmented),
• Complete Replication,
• Selective Replication.
22
Data Allocation
Centralized: Consists of single database and DBMS
stored at one site with users distributed across the
network.
Partitioned: Database partitioned into disjoint
fragments, each fragment assigned to one site.
Complete Replication: Consists of maintaining
complete copy of database at each site.
Selective Replication: Combination of partitioning,
replication, and centralization.
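The four strategies can be pictured as different mappings from fragments to the sites that hold them. The following is a minimal Python sketch, not a definitive implementation: the fragment names, site names, and the round-robin assignment are assumptions made purely for illustration.

```python
# Hypothetical fragment and site names, used only for illustration.
FRAGMENTS = ["F1", "F2", "F3"]
SITES = ["site_A", "site_B", "site_C"]


def centralized(fragments, sites):
    """Whole database at one site (assumed here to be the first site)."""
    return {f: {sites[0]} for f in fragments}


def partitioned(fragments, sites):
    """Disjoint fragments, each assigned to exactly one site.

    Round-robin placement is an arbitrary choice for this sketch.
    """
    return {f: {sites[i % len(sites)]} for i, f in enumerate(fragments)}


def complete_replication(fragments, sites):
    """A full copy of every fragment at every site."""
    return {f: set(sites) for f in fragments}


def selective_replication(fragments, sites, extra_copies):
    """Partitioned layout plus extra copies of chosen fragments."""
    allocation = partitioned(fragments, sites)
    for fragment, extra_sites in extra_copies.items():
        allocation[fragment] |= set(extra_sites)
    return allocation


# Example: replicate only F1 to a second site.
selective = selective_replication(FRAGMENTS, SITES, {"F1": ["site_B"]})
```

In this sketch, selective replication is the only strategy that needs an extra argument, which reflects that it is a design decision per fragment rather than a single global rule.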
23
Comparison of Strategies for Data
Distribution
24
Why Fragment?
• Usage
• Applications work with views rather than entire relations.
• Efficiency
• Data is stored close to where it is most frequently used.
• Data that is not needed by local applications is not stored.
25
Why Fragment?
• Parallelism
• With fragments as unit of distribution, transaction can be
divided into several subqueries that operate on fragments.
• Security
• Data not required by local applications is not stored and so
not available to unauthorized users.
26
Why Fragment?
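• Disadvantages
• Performance: applications that need data from several
fragments at different sites may run more slowly.
• Integrity: integrity control is harder when related data
is fragmented across sites.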
27
Types of Fragmentation
• Four types of fragmentation:
• Horizontal,
• Vertical,
• Mixed,
• Derived.
• Other possibility is no fragmentation:
• If relation is small and not updated frequently, may be better
not to fragment relation.
28
Horizontal and Vertical Fragmentation
29
Mixed Fragmentation
30
Horizontal Fragmentation
• Consists of a subset of the tuples of a relation.
• Defined using Selection operation of relational
algebra:
σ_p(R)
• For example:
P1 = σ_{type=‘House’}(PropertyForRent)
P2 = σ_{type=‘Flat’}(PropertyForRent)
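Horizontal fragmentation of PropertyForRent by type can be sketched in Python by treating a relation as a list of tuples and the selection as a row filter. The sample rows and the `select` helper are assumptions invented for this illustration.

```python
# Invented sample rows; only the attribute name "type" comes from the slides.
property_for_rent = [
    {"propertyNo": "PR1", "type": "House"},
    {"propertyNo": "PR2", "type": "Flat"},
    {"propertyNo": "PR3", "type": "Flat"},
    {"propertyNo": "PR4", "type": "House"},
]


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


# Each horizontal fragment is a subset of the tuples.
P1 = select(property_for_rent, lambda t: t["type"] == "House")
P2 = select(property_for_rent, lambda t: t["type"] == "Flat")

# The original relation is reconstructed by the union of the fragments.
reconstructed = sorted(P1 + P2, key=lambda t: t["propertyNo"])
```

Because the two predicates are mutually exclusive and cover every value of type in the sample data, the fragments are disjoint and their union gives back the whole relation.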
31
Vertical Fragmentation
• Consists of a subset of attributes of a relation.
• Defined using Projection operation of relational
algebra:
Π_{a1, ..., an}(R)
• For example:
S1 = Π_{staffNo, position, sex, DOB, salary}(Staff)
S2 = Π_{staffNo, fName, lName, branchNo}(Staff)
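Vertical fragmentation of Staff can be sketched in Python as two projections that both keep the primary key staffNo, so that the relation can be rebuilt with a join. The sample rows and the helper names `project` and `join_on` are assumptions made for this illustration.

```python
# Invented sample rows; only the attribute names come from the slides.
staff = [
    {"staffNo": "S1", "fName": "Ann", "lName": "Lee", "position": "Manager",
     "sex": "F", "DOB": "1970-01-01", "salary": 30000, "branchNo": "B003"},
    {"staffNo": "S2", "fName": "Bob", "lName": "Ray", "position": "Assistant",
     "sex": "M", "DOB": "1980-02-02", "salary": 12000, "branchNo": "B005"},
]


def project(relation, attrs):
    """Relational projection: keep the named attributes, drop duplicate rows."""
    seen, result = set(), []
    for t in relation:
        row = tuple((a, t[a]) for a in attrs)
        if row not in seen:
            seen.add(row)
            result.append(dict(row))
    return result


def join_on(r, s, key):
    """Join two relations on a shared key (assumed unique in s)."""
    index = {t[key]: t for t in s}
    return [{**t, **index[t[key]]} for t in r if t[key] in index]


S1 = project(staff, ["staffNo", "position", "sex", "DOB", "salary"])
S2 = project(staff, ["staffNo", "fName", "lName", "branchNo"])

# Both fragments carry staffNo, so joining on it rebuilds Staff.
rebuilt = join_on(S1, S2, "staffNo")
```

Repeating the key in every vertical fragment is what makes the reconstruction lossless in this sketch.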
32
Mixed Fragmentation
• Consists of a horizontal fragment that is vertically
fragmented, or a vertical fragment that is horizontally
fragmented.
• Defined using Selection and Projection operations of
relational algebra:
σ_p(Π_{a1, ..., an}(R)) or
Π_{a1, ..., an}(σ_p(R))
33
Example - Mixed Fragmentation
S1 = Π_{staffNo, position, sex, DOB, salary}(Staff)
S2 = Π_{staffNo, fName, lName, branchNo}(Staff)
S21 = σ_{branchNo=‘B003’}(S2)
S22 = σ_{branchNo=‘B005’}(S2)
S23 = σ_{branchNo=‘B007’}(S2)
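The mixed scheme in this example, a vertical split of Staff followed by a horizontal split of S2 by branch, can be sketched in Python as follows. The sample rows are invented for illustration; only the attribute names and branch numbers come from the example.

```python
# Invented sample rows for the Staff relation.
staff = [
    {"staffNo": "S1", "fName": "Ann", "lName": "Lee", "position": "Manager",
     "sex": "F", "DOB": "1970-01-01", "salary": 30000, "branchNo": "B003"},
    {"staffNo": "S2", "fName": "Bob", "lName": "Ray", "position": "Assistant",
     "sex": "M", "DOB": "1980-02-02", "salary": 12000, "branchNo": "B005"},
    {"staffNo": "S3", "fName": "Cat", "lName": "Orr", "position": "Assistant",
     "sex": "F", "DOB": "1985-03-03", "salary": 9000, "branchNo": "B007"},
    {"staffNo": "S4", "fName": "Dan", "lName": "Fox", "position": "Supervisor",
     "sex": "M", "DOB": "1975-04-04", "salary": 18000, "branchNo": "B003"},
]

# Step 1: vertical fragmentation (projection). staffNo is unique,
# so no duplicate rows can arise and a plain comprehension suffices.
S1 = [{a: t[a] for a in ("staffNo", "position", "sex", "DOB", "salary")}
      for t in staff]
S2 = [{a: t[a] for a in ("staffNo", "fName", "lName", "branchNo")}
      for t in staff]


def by_branch(relation, branch_no):
    """Horizontal fragment: tuples belonging to one branch."""
    return [t for t in relation if t["branchNo"] == branch_no]


# Step 2: horizontal fragmentation (selection) of the vertical fragment S2.
S21 = by_branch(S2, "B003")
S22 = by_branch(S2, "B005")
S23 = by_branch(S2, "B007")
```

S1 is left unfragmented horizontally here, matching the example, because it no longer contains branchNo to select on.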
34
Derived Horizontal Fragmentation
• A horizontal fragment that is based on horizontal
fragmentation of a parent relation.
• Ensures that fragments that are frequently joined
together are at same site.
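• Defined using Semijoin operation of relational
algebra:
Ri = R ⋉_F Si, 1 ≤ i ≤ w
where w is the number of horizontal fragments of the
parent relation S and F is the join attribute.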
35
Example - Derived Horizontal Fragmentation
S3 = σ_{branchNo=‘B003’}(Staff)
S4 = σ_{branchNo=‘B005’}(Staff)
S5 = σ_{branchNo=‘B007’}(Staff)
Could use derived fragmentation for Property:
Pi = PropertyForRent ⋉_{branchNo} Si, 3 ≤ i ≤ 5
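Derived fragmentation of PropertyForRent from the Staff fragments can be sketched in Python with a small semijoin helper. The sample rows and the `semijoin` function are assumptions for illustration; the join attribute branchNo and the fragment numbering 3 to 5 follow the example.

```python
# Invented sample rows; branch numbers follow the example.
staff = [
    {"staffNo": "S1", "branchNo": "B003"},
    {"staffNo": "S2", "branchNo": "B005"},
    {"staffNo": "S3", "branchNo": "B007"},
]
property_for_rent = [
    {"propertyNo": "PR1", "branchNo": "B003"},
    {"propertyNo": "PR2", "branchNo": "B005"},
    {"propertyNo": "PR3", "branchNo": "B003"},
    {"propertyNo": "PR4", "branchNo": "B007"},
]


def semijoin(r, s, attr):
    """Tuples of r that have at least one match in s on attr."""
    keys = {t[attr] for t in s}
    return [t for t in r if t[attr] in keys]


# Parent (Staff) fragments S3, S4, S5, keyed by fragment number.
S = {
    3: [t for t in staff if t["branchNo"] == "B003"],
    4: [t for t in staff if t["branchNo"] == "B005"],
    5: [t for t in staff if t["branchNo"] == "B007"],
}

# Derived fragments P3, P4, P5: each follows its parent fragment.
P = {i: semijoin(property_for_rent, S[i], "branchNo") for i in S}
```

Each property fragment contains exactly the tuples that join with the matching staff fragment, so placing P[i] at the same site as S[i] keeps that join local.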
36
Derived Horizontal Fragmentation
• If relation contains more than one foreign key, need
to select one as parent.
• Choice can be based on fragmentation used most
frequently or fragmentation with better join
characteristics.
37
Distributed Database Design Methodology
1. Use normal methodology to produce a design for
the global relations.
2. Examine topology of system to determine where
databases will be located.
3. Analyze most important transactions and identify
appropriateness of horizontal/vertical
fragmentation.
4. Decide which relations are not to be fragmented.
5. Examine relations on the one side of relationships and
determine a suitable fragmentation schema.
Relations on the many side may be suitable for
derived fragmentation.