Distributed Database Design: Basics

Distributed database systems allow data to be shared across a network of connected machines. They provide advantages such as improved availability, reliability, and performance, but also introduce complexity. There are two main types: homogeneous systems, which use the same software across all sites, and heterogeneous systems, which integrate different existing databases. Key design challenges include fragmentation, allocation, replication, and ensuring consistency and atomicity of transactions that span multiple sites.
Copyright
© Attribution Non-Commercial (BY-NC)

Distributed Database Design

BASICS
Introduction
• Distributed database: a logically interrelated collection of shared data, physically distributed over a computer network.
• Distributed DBMS: the software system that permits the management of the distributed database and makes the distribution transparent to users.
Distributed Systems
• Data is spread over multiple machines (also referred to as sites or nodes).
• A network interconnects the machines.
• Data is shared by users on multiple machines.
Advantages
• Organisational structure - many organisations cover several sites
• Shareability - users at different sites can share data
• Local autonomy - each site is able to retain a degree of control over data stored locally
• Improved availability - a node failure will not make the whole system inoperable
• Improved reliability - replicated data remains accessible if one copy becomes unavailable
• Improved performance - data can be located near the site that uses it
• Modular growth - easier expansion
Disadvantages
• Complexity - more complex than a centralised system
• Cost - added network and maintenance costs
• Security - the network must be made secure
• Integrity control - more difficult to ensure proper coordination among sites
• Lack of standards and experience - no established tools or methodologies
• Complex design - database design is more complex
Classification of DDBMS
Homogeneous DDBMS
• Same software/schema on all sites; data may be partitioned among sites
• Goal: provide a view of a single database, hiding details of distribution

Characteristics
• All sites use the same DBMS product.
• Much easier to design and manage.
• Approach provides incremental growth and allows increased performance.
Heterogeneous DDBMS
• Different software/schema on different sites
• Goal: integrate existing databases to provide useful functionality

Characteristics
• Sites may run different DBMS products, with possibly different underlying data models.
• Occurs when sites have implemented their own databases and integration is considered later.
• Translations are required to allow for:
  • Different hardware.
  • Different DBMS products.
  • Different hardware and different DBMS products.
• Typical solution is to use gateways.
Classification of DDBMS (Contd.)
Examples of typical applications:

Type of DBMS  | LAN network                                 | WAN network
Homogeneous   | Data management and financial applications  | Travel management and financial applications
Heterogeneous | Inter-divisional information systems        | Integrated banking and inter-banking systems
Design Issues with DDBMS
In designing a distributed database, the same issues are faced as for a centralised database, plus:

• Fragmentation: a relation may be divided into a number of sub-relations, which are then distributed.
• Allocation: each fragment is stored at the site with "optimal" distribution.
• Replication: a copy of a fragment may be maintained at several sites.
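The three design steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `EMPLOYEES` relation, the `branch` attribute, the site names, and the rule that a fragment's home site is the branch it is named after are all hypothetical and do not come from the slides. A real design would choose fragments and sites from access patterns and costs.

```python
from collections import defaultdict

# Hypothetical relation: each row is a dict; names and data are illustrative only.
EMPLOYEES = [
    {"id": 1, "name": "Ann", "branch": "London"},
    {"id": 2, "name": "Bob", "branch": "Glasgow"},
    {"id": 3, "name": "Cat", "branch": "London"},
]


def fragment_horizontally(rows, key):
    """Fragmentation: split a relation into sub-relations, one per value of `key`."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[row[key]].append(row)
    return dict(fragments)


def allocate(fragments, replicas):
    """Allocation and replication: map each fragment to the sites that store it.

    Assumption: the fragment's home site is the site named after the fragment.
    `replicas` maps a fragment name to extra sites that keep a copy.
    """
    allocation = {}
    for name in fragments:
        allocation[name] = [name] + list(replicas.get(name, []))
    return allocation


fragments = fragment_horizontally(EMPLOYEES, "branch")
allocation = allocate(fragments, replicas={"London": ["Glasgow"]})
print(allocation)  # {'London': ['London', 'Glasgow'], 'Glasgow': ['Glasgow']}
```

Here the London fragment is stored at its home site and replicated at Glasgow, while the Glasgow fragment has a single copy.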
Functions of DDBMS
Functions of a centralised DBMS plus:
• Extended communication services to allow the transfer of queries and data among sites
• Extended system catalog to store data distribution details
• Distributed query processing, including query optimisation
• Extended concurrency control to maintain consistency of replicated data
• Extended recovery services to take account of failures of individual sites and communication links
Component Architecture
• Local DBMS (LDBMS) - responsible for local data
  • Transaction, buffer and recovery managers and scheduler
• Data Communications (DC) component - allows all sites to communicate with each other
• Global system catalog (GSC) - holds catalog information about the fragmentation and allocation schemas
• Distributed DBMS (DDBMS) component - controlling unit of the entire system

[Figure: Components of a DDBMS]
Reference Architecture for DDBMS
Due to the diversity of distributed DBMSs, there is no accepted architecture equivalent to the ANSI/SPARC 3-level architecture.

A reference architecture consists of:
• Set of global external schemas (GES)
• Global conceptual schema (GCS)
• Fragmentation schema
• Allocation schema
• Set of schemas for each local DBMS, conforming to the 3-level ANSI/SPARC architecture

Some levels may be missing, depending on the levels of transparency supported.
Local and Global Transactions
• A local transaction accesses data only at the single site at which the transaction was initiated.
• A global transaction either accesses data at a site different from the one at which the transaction was initiated, or accesses data at several different sites.
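The distinction can be stated as a one-line test. The sketch below is illustrative only; the function name `classify_transaction` and the site names are hypothetical.

```python
def classify_transaction(origin_site, accessed_sites):
    """Return "local" if every accessed site is the originating site, else "global"."""
    return "local" if set(accessed_sites) <= {origin_site} else "global"


print(classify_transaction("S1", ["S1"]))        # local
print(classify_transaction("S1", ["S2"]))        # global: a single remote site
print(classify_transaction("S1", ["S1", "S2"]))  # global: several sites
```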
Implementation Issues for Distributed Databases
• Atomicity is needed even for transactions that update data at multiple sites
• The two-phase commit protocol (2PC) is used to ensure atomicity
  • Basic idea: each site executes the transaction until just before commit, and then leaves the final decision to a coordinator
  • Each site must follow the decision of the coordinator, even if there is a failure while waiting for the coordinator's decision
• Distributed concurrency control (and deadlock detection) is required
• Data items may be replicated to improve data availability
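The basic idea of 2PC described above can be sketched as follows. This is a simplified, in-memory illustration, not a full protocol implementation: the `Participant` class, the `Vote` enum and the `can_commit` flag are hypothetical names, and logging, timeouts, message loss and recovery after failure are deliberately left out.

```python
from enum import Enum


class Vote(Enum):
    READY = "ready"
    ABORT = "abort"


class Participant:
    """A site that runs its part of the transaction up to the commit point."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # assumption: stands in for local success/failure
        self.state = "active"

    def prepare(self):
        # Phase 1: stop just before commit and vote, rather than committing locally.
        self.state = "prepared" if self.can_commit else "aborted"
        return Vote.READY if self.can_commit else Vote.ABORT

    def finish(self, decision):
        # Phase 2: follow the coordinator's decision.
        self.state = decision


def two_phase_commit(participants):
    """Coordinator: commit only if every participant voted READY."""
    votes = [p.prepare() for p in participants]
    decision = "committed" if all(v is Vote.READY for v in votes) else "aborted"
    for p in participants:
        p.finish(decision)
    return decision


sites = [Participant("A"), Participant("B", can_commit=False)]
print(two_phase_commit(sites))  # aborted
```

A single ABORT vote is enough to abort the transaction at every site, which is what gives the all-or-nothing behaviour.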
DDBMS Network Types
• Local-area networks (LANs) - composed of processors that are distributed over small geographical areas, such as a single building or a few adjacent buildings.
• Wide-area networks (WANs) - composed of processors distributed over a large geographical area.
Network Types (Cont.)
• WANs with continuous connection (e.g. the Internet) are needed for implementing distributed database systems
• Groupware applications such as Lotus Notes can work on WANs with discontinuous connection:
  • Data is replicated.
  • Updates are propagated to replicas periodically.
  • Copies of data may be updated independently.
  • Non-serializable executions can thus result. Resolution is application dependent.
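The behaviour listed above (independent updates, periodic propagation, application-dependent resolution) can be sketched as follows. This is a hypothetical illustration, not how Lotus Notes or any particular product works: the `Replica` class, the `dirty` flag and the `resolve` callback are assumptions made for the example.

```python
class Replica:
    """A copy of one data item held at a site that is only sometimes connected."""

    def __init__(self, value, version=0):
        self.value = value
        self.version = version  # version agreed at the last synchronisation
        self.dirty = False      # True if updated locally since the last sync

    def update(self, value):
        self.value = value
        self.dirty = True


def synchronise(a, b, resolve):
    """Propagate updates between two replicas; return True if a conflict was resolved."""
    conflict = a.dirty and b.dirty
    if conflict:
        # Both copies changed independently, so the application must decide.
        value = resolve(a.value, b.value)
    elif a.dirty:
        value = a.value
    elif b.dirty:
        value = b.value
    else:
        return False  # nothing to propagate
    new_version = max(a.version, b.version) + 1
    for replica in (a, b):
        replica.value, replica.version, replica.dirty = value, new_version, False
    return conflict


a, b = Replica(10), Replica(10)
a.update(15)
b.update(30)
synchronise(a, b, resolve=max)  # the resolution rule is chosen by the application
print(a.value, b.value)  # 30 30
```

Passing `max` as the resolution rule is only one possible policy; another application might merge the two updates or ask a user to choose.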


Other Categories
Open Database Access and Interoperability
• The Open Group has formed a Working Group to provide specifications that will create a database infrastructure environment where there is:
  • A common SQL API that allows client applications to be written that do not need to know the vendor of the DBMS they are accessing.
  • A common database protocol that enables a DBMS from one vendor to communicate directly with a DBMS from another vendor without the need for a gateway.
  • A common network protocol that allows communications between different DBMSs.
• The most ambitious goal is to find a way to enable a transaction to span DBMSs from different vendors without the use of a gateway.
Other Categories (Contd.)
Multi Database System (MDBS)
• A DDBMS in which each site maintains complete autonomy.
• A DBMS that resides transparently on top of existing database and file systems and presents a single database to its users.
• Allows users to access and share data without requiring physical database integration.
• Two categories: unfederated MDBS (no local users) and federated MDBS.
