Distributed Database Design: Basics
Introduction
Distributed Database definition:
A logically interrelated collection of shared
data, physically distributed over a computer
network
Distributed DBMS definition:
The software system that permits the
management of the distributed database and
makes the distribution transparent to users
Distributed Systems
Data spread over multiple machines
(also referred to as sites or nodes).
Network interconnects the machines
Data shared by users on multiple
machines
Advantages
Organisational structure - many organisations
cover several sites
Shareability - users at different sites can share
data
Local autonomy - each site is able to retain a
degree of control over data stored locally
Improved availability - failure of one node will not
make the whole system inoperable
Improved reliability - replicated data remains
accessible when a site fails
Improved performance - data is located near the
site that uses it most
Modular Growth - easier expansion
Disadvantages
Complexity - more complex than a centralised DBMS
Cost - added network and maintenance costs
Security - the network must be made secure
Integrity control - more difficult to ensure
proper coordination among sites
Lack of standards and experience - no established
tools or methodologies
Complex design - database design is more
complex
Classification of DDBMS
Homogeneous DDBMS
Same software/schema on all sites, data may be
partitioned among sites
Goal: provide a view of a single database, hiding
details of distribution
Characteristics
All sites use the same DBMS product.
Example applications
Homogeneous: integrated banking systems
Heterogeneous: inter-divisional and inter-banking
information systems
Design Issues with DDBMS
Designing a distributed database raises the same issues
as designing a centralised database, plus:
Fragmentation
A relation may be divided into a number of sub-relations
(fragments), which are then distributed.
Allocation
Each fragment is stored at the site that gives an "optimal" distribution.
Replication
A copy of a fragment may be maintained at several sites.
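The three design issues above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the ACCOUNTS relation, the site names and the helper functions are hypothetical, and rows are plain dictionaries rather than anything a real DDBMS would use.

```python
from collections import defaultdict

# Hypothetical global relation; each dict is one tuple.
ACCOUNTS = [
    {"acc_no": 1, "branch": "London", "balance": 500},
    {"acc_no": 2, "branch": "Glasgow", "balance": 1200},
    {"acc_no": 3, "branch": "London", "balance": 75},
]


def horizontal_fragments(relation, attribute):
    """Fragmentation by rows: group whole tuples by the value of one attribute."""
    fragments = defaultdict(list)
    for row in relation:
        fragments[row[attribute]].append(row)
    return dict(fragments)


def vertical_fragment(relation, key, attributes):
    """Fragmentation by columns: keep the key plus the chosen attributes."""
    return [{name: row[name] for name in (key, *attributes)} for row in relation]


def reconstruct(fragments):
    """The union of all horizontal fragments gives back the original relation."""
    rows = [row for fragment in fragments.values() for row in fragment]
    return sorted(rows, key=lambda row: row["acc_no"])


# Allocation: which site(s) store each fragment (site names are made up).
# A fragment listed at more than one site is replicated.
ALLOCATION = {
    "London": ["site_london", "site_hq"],
    "Glasgow": ["site_glasgow"],
}


def replicated_fragments(allocation):
    """Names of the fragments that are stored at more than one site."""
    return [name for name, sites in allocation.items() if len(sites) > 1]
```

Calling horizontal_fragments(ACCOUNTS, "branch") yields one fragment per branch, and reconstruct() on the result returns the original relation, which is the property a correct horizontal fragmentation is expected to have.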
Functions of DDBMS
Functions of a Centralised DBMS plus:
extended communication to allow the transfer of
queries and data among sites
extended system catalog to store data distribution
details
distributed query processing, including query
optimisation
extended concurrency control to maintain consistency
of replicated data
extended recovery services to take account of failures
of individual sites and communication links
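How an extended system catalog supports distributed query processing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CATALOG and SITE_DATA structures, the site and fragment names, and the scan() function are all hypothetical, and site failure is simulated by passing a set of sites that are down.

```python
# Hypothetical data held at each site: site -> fragment name -> rows.
SITE_DATA = {
    "site_a": {"staff_london": [("S1", "London"), ("S2", "London")]},
    "site_b": {
        "staff_glasgow": [("S3", "Glasgow")],
        "staff_london": [("S1", "London"), ("S2", "London")],  # replica
    },
}

# Extended system catalog: for each global relation, its fragments
# and the sites holding a copy of each fragment.
CATALOG = {
    "staff": {
        "staff_london": ["site_a", "site_b"],
        "staff_glasgow": ["site_b"],
    }
}


def read_fragment(fragment, sites, down=frozenset()):
    """Read a fragment from the first site that holds it and is not down."""
    for site in sites:
        if site not in down:
            return SITE_DATA[site][fragment]
    raise RuntimeError(f"no available site holds {fragment}")


def scan(relation, down=frozenset()):
    """Answer a full scan of a global relation by unioning its fragments."""
    rows = []
    for fragment, sites in CATALOG[relation].items():
        rows.extend(read_fragment(fragment, sites, down))
    return sorted(rows)
```

With site_a marked as down, scan("staff", down={"site_a"}) still succeeds because staff_london is replicated at site_b. With site_b down the scan raises an error, since staff_glasgow has no other copy.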
Component Architecture
Local DBMS (LDBMS) - responsible for local data
Includes the Transaction, Buffer and Recovery Managers and the Scheduler
Figure: Components of a DDBMS
Reference Architecture for DDBMS
Because DDBMSs are so diverse, there is no accepted
architecture equivalent to the ANSI/SPARC three-level
architecture.
different DBMSs.
The most ambitious goal is to find a way to enable a transaction to span DBMSs
from different vendors without the use of a gateway.
Other Categories (contd.)
Multi Database System (MDBS)
DDBMS in which each site maintains complete
autonomy.
DBMS that resides transparently on top of existing
database and file systems and presents a single
database to its users.
Allows users to access and share data without
requiring physical database integration.
Two Categories: Unfederated MDBS (no local
users) and federated MDBS.
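The idea of presenting a single database on top of existing, autonomous databases can be sketched with Python's standard sqlite3 module. This is a toy illustration and not a real MDBS: the two in-memory databases, their table and column names, and the global_customers() function are all hypothetical. Each local database keeps its own schema, and no data is physically merged.

```python
import sqlite3


def make_db(ddl, insert_sql, rows):
    """Create an independent in-memory database with its own local schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn


# Two autonomous local databases with different schemas (hypothetical).
sales_db = make_db(
    "CREATE TABLE customer (id INTEGER, full_name TEXT)",
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "Ann Lee"), (2, "Bob Roy")],
)
support_db = make_db(
    "CREATE TABLE client (client_no INTEGER, name TEXT)",
    "INSERT INTO client VALUES (?, ?)",
    [(7, "Cy Dunn")],
)

# For each source, a local query that maps its schema onto the
# global schema (source, id, name).
EXPORTS = [
    ("sales", sales_db, "SELECT id, full_name FROM customer"),
    ("support", support_db, "SELECT client_no, name FROM client"),
]


def global_customers():
    """Present one logical customer relation; the data stays where it is."""
    rows = []
    for source, conn, query in EXPORTS:
        rows.extend((source, ident, name) for ident, name in conn.execute(query))
    return sorted(rows)
```

Users of global_customers() see one relation, while each underlying database can still be queried and updated locally through its own connection, which is the kind of site autonomy the MDBS definition describes.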