
Lecture 8 - Distributed Databases

The document discusses distributed databases, which are collections of shared data distributed across a network, and the software systems (DBMS) that manage them. It covers types of distributed databases (homogeneous and heterogeneous), their architectures, and key design issues such as fragmentation, allocation, and replication. Additionally, it highlights the importance of transparency for users regarding data location, fragmentation, and replication.

10/13/2014

Distributed Databases

Ref: Connolly and Begg

• Distributed Database
– A logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network.
• Distributed DBMS
– Software system that permits the management of the distributed database and makes the distribution transparent to users.

Distribution

• A distributed database is a logical collection of shared data that is physically distributed over a computer network.
• A distributed DBMS is a software system that enables the management of a distributed database and that makes such distribution transparent to its users.
• Distributed processing involves physically distributing aspects of an application’s processing over a computer network.

An Example of a Distributed Database System

[Figure: Sites 1 to 4 connected by a computer network]

An Example of Distributed Processing

[Figure: Sites 1 to 4 connected by a computer network]

Types of Distributed Databases

• Homogeneous: Every site runs same type of DBMS.
• Heterogeneous: Different sites run different DBMSs (different RDBMSs or even non-relational DBMSs).

Homogeneous DDBMS

• All sites use same DBMS product.
• Much easier to design and manage.
• Approach provides incremental growth and allows increased performance.

Heterogeneous DDBMS

• Sites may run different DBMS products, with possibly different underlying data models.
• Occurs when sites have implemented their own databases and integration is considered later.
• Translations required to allow for:
– Different hardware.
– Different DBMS products.
– Different hardware and different DBMS products.
• Typical solution is to use gateways.

Distributed DBMS Architectures

• Client-Server
– Client ships query to single site. All query processing at server.
• Collaborating-Server
– Query can span multiple sites.

[Figure: clients and servers in the client-server and collaborating-server arrangements]

Functions of a DDBMS

• Expect DDBMS to have at least the functionality of a DBMS.
• Also to have following functionality:
– Extended communication services.
– Extended Data Dictionary.
– Distributed query processing.
– Extended concurrency control.
– Extended recovery services.

Reference Architecture for DDBMS

• Due to diversity, no accepted architecture equivalent to ANSI/SPARC 3-level architecture.
• A reference architecture consists of:
– Set of global external schemas.
– Global conceptual schema (GCS).
– Fragmentation schema and allocation schema.
– Set of schemas for each local DBMS conforming to 3-level ANSI/SPARC.
• Some levels may be missing, depending on levels of transparency supported.

Distributed Database Design

• Three key issues:
– Fragmentation,
– Allocation,
– Replication.


Distributed Database Design

• Fragmentation
– Horizontal: Usually disjoint.
– Vertical: Lossless-join.
• Replication
– Gives increased availability.
– Faster query evaluation.
– Synchronous vs. Asynchronous: vary in how current copies are.

Distributed Database Design

• Fragmentation
– Relation may be divided into a number of sub-relations, which are then distributed.
• Allocation
– Each fragment is stored at site with “optimal” distribution.
• Replication
– Copy of fragment may be maintained at several sites.
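
The two fragmentation styles can be illustrated with a small in-memory sketch. It assumes a toy `staff` relation held as a list of dicts; the relation, its attribute names and the helper functions are all invented for illustration and are not taken from the lecture. Horizontal fragments are rebuilt by union, vertical fragments by joining on the key, which is what makes the vertical split lossless-join.

```python
# Toy relation: a list of rows (dicts). All names and values are illustrative.
staff = [
    {"staff_no": 1, "name": "Ann", "branch": "B003", "salary": 30000},
    {"staff_no": 2, "name": "Bob", "branch": "B005", "salary": 12000},
    {"staff_no": 3, "name": "Cat", "branch": "B003", "salary": 18000},
]

def horizontal_fragment(rows, predicate):
    """Keep the rows (tuples) that satisfy a predicate."""
    return [row for row in rows if predicate(row)]

def vertical_fragment(rows, key, attributes):
    """Project onto a subset of attributes, always keeping the key."""
    return [{a: row[a] for a in (key, *attributes)} for row in rows]

def join_on_key(left, right, key):
    """Join two vertical fragments on the key they share."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]

# Horizontal: disjoint fragments, reconstructed by union.
h1 = horizontal_fragment(staff, lambda r: r["branch"] == "B003")
h2 = horizontal_fragment(staff, lambda r: r["branch"] != "B003")
rebuilt_h = sorted(h1 + h2, key=lambda r: r["staff_no"])

# Vertical: each fragment carries the key, reconstructed by a join.
v1 = vertical_fragment(staff, "staff_no", ["name", "branch"])
v2 = vertical_fragment(staff, "staff_no", ["salary"])
rebuilt_v = join_on_key(v1, v2, "staff_no")
```

Repeating the key in every vertical fragment is the design choice that allows the join to recover the original rows.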

Fragmentation

• Definition and allocation of fragments carried out strategically to achieve:
– Locality of Reference.
– Improved Reliability and Availability.
– Improved Performance.
– Balanced Storage Capacities and Costs.
– Minimal Communication Costs.
• Involves analyzing most important applications, based on quantitative/qualitative information.

Fragmentation…

• Quantitative information may include:
– frequency with which an application is run;
– site from which an application is run;
– performance criteria for transactions and applications.
• Qualitative information may include transactions that are executed by application, type of access (read or write), and predicates of read operations.
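
As a rough illustration of how the quantitative information (frequency and site of access) might feed an allocation decision, the sketch below places each fragment at the site that accesses it most often. This is only a naive locality-of-reference heuristic under assumed inputs: the fragment names, site names and counts are hypothetical, and a realistic design would also weigh storage, update cost and reliability.

```python
from collections import defaultdict

# Hypothetical measurements: (fragment, site, accesses per day).
access_frequency = [
    ("staff_b003", "site1", 120),
    ("staff_b003", "site2", 15),
    ("staff_b005", "site2", 80),
    ("staff_b005", "site3", 95),
]

def allocate_by_frequency(frequencies):
    """Assign each fragment to the site that accesses it most often."""
    totals = defaultdict(lambda: defaultdict(int))
    for fragment, site, count in frequencies:
        totals[fragment][site] += count
    return {
        fragment: max(sites, key=sites.get)
        for fragment, sites in totals.items()
    }

allocation = allocate_by_frequency(access_frequency)
```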

Data Allocation

• Four alternative strategies regarding placement of data:
– Centralized,
– Partitioned (or Fragmented),
– Complete Replication,
– Selective Replication.

Data Allocation…

• Centralized
– Consists of single database and DBMS stored at one site with users distributed across the network.
• Partitioned
– Database partitioned into disjoint fragments, each fragment assigned to one site.


Data Allocation…

• Complete Replication
– Consists of maintaining complete copy of database at each site.
• Selective Replication
– Combination of partitioning, replication, and centralization.

Why Fragment?

• Usage
– Applications work with views rather than entire relations.
• Efficiency
– Data is stored close to where it is most frequently used.
– Data that is not needed by local applications is not stored.
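
Replicated copies can be kept up to date synchronously or asynchronously, and the difference is how current the copies are at any moment. The following sketch contrasts the two under simplifying assumptions: a single primary copy, in-memory dicts standing in for sites, and no failures. The class and method names are hypothetical.

```python
class ReplicatedFragment:
    """One fragment with a primary copy and several secondary copies."""

    def __init__(self, replica_count):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # updates not yet applied to the replicas

    def write_sync(self, key, value):
        """Synchronous: every copy is updated before the write returns."""
        self.primary[key] = value
        for replica in self.replicas:
            replica[key] = value

    def write_async(self, key, value):
        """Asynchronous: only the primary is updated; replicas catch up later."""
        self.primary[key] = value
        self.pending.append((key, value))

    def propagate(self):
        """Apply queued updates to every replica, in the order they were made."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()

fragment = ReplicatedFragment(replica_count=2)
fragment.write_sync("s1", 30000)
fragment.write_async("s1", 32000)
stale = [r["s1"] for r in fragment.replicas]    # replicas still hold the old value
fragment.propagate()
current = [r["s1"] for r in fragment.replicas]  # replicas now match the primary
```

Between `write_async` and `propagate`, a read from a replica returns the older value, which is the staleness that asynchronous replication accepts in exchange for faster writes.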

Why Fragment?

• Parallelism
– With fragments as unit of distribution, transactions can be divided into several subqueries that operate on fragments.
• Security
– Data not required by local applications is not stored and so not available to unauthorized users.

Why Fragment?

• Disadvantages
– Performance,
– Integrity.
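
The parallelism point can be sketched as one global query split into per-fragment subqueries whose results are then combined. In this assumed setup, threads stand in for sites and the fragments are plain lists; the site names, rows and function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical horizontal fragments of one relation, keyed by site.
fragments = {
    "site1": [{"staff_no": 1, "salary": 30000}, {"staff_no": 3, "salary": 18000}],
    "site2": [{"staff_no": 2, "salary": 12000}],
}

def subquery(rows, min_salary):
    """Subquery run against one fragment: select rows at or above a threshold."""
    return [row for row in rows if row["salary"] >= min_salary]

def global_query(fragments, min_salary):
    """Run the subquery on every fragment concurrently and union the results."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda rows: subquery(rows, min_salary), fragments.values())
        merged = [row for part in partials for row in part]
    return sorted(merged, key=lambda r: r["staff_no"])

result = global_query(fragments, 15000)
```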

Rationale - Distribution
• Location Transparency - Users do not need to know at
which sites data is located. This simplifies user programs
and interface activities. Data can migrate from site to site
without invalidating any of those programs or activities.
Data may be migrated around the network in response to
changing usage or performance requirements.
• Fragmentation Transparency - Users do not need to be
aware of how data is fragmented.
• Replication Transparency - Users do not need to be
aware of how data is replicated. Replication is desirable
because performance is better if applications can operate
on local copies, and availability is better so long as at least
one copy remains available for retrieval purposes.
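
One way to picture the three kinds of transparency is a global catalog that maps a relation name to its fragments and each fragment to the sites holding a copy. The sketch below is a simplified assumption of how such a lookup could work; the catalog layout, site names and `read_relation` function are hypothetical. The caller names only the relation and never mentions a fragment, a copy or a site.

```python
# Hypothetical global catalog: relation -> fragments -> sites holding a copy.
catalog = {
    "staff": {
        "staff_b003": ["site1", "site2"],
        "staff_b005": ["site3"],
    }
}

# Hypothetical per-site storage.
sites = {
    "site1": {"staff_b003": [{"staff_no": 1}, {"staff_no": 3}]},
    "site2": {"staff_b003": [{"staff_no": 1}, {"staff_no": 3}]},
    "site3": {"staff_b005": [{"staff_no": 2}]},
}

def read_relation(name, available_sites):
    """Read a whole relation by name; the catalog hides fragments, copies and sites."""
    rows = []
    for fragment, replicas in catalog[name].items():
        # Use the first listed copy whose site is currently reachable.
        site = next((s for s in replicas if s in available_sites), None)
        if site is None:
            raise RuntimeError(f"no available copy of fragment {fragment}")
        rows.extend(sites[site][fragment])
    return rows

all_up = read_relation("staff", {"site1", "site2", "site3"})
site1_down = read_relation("staff", {"site2", "site3"})
```

With `site1` unavailable the read still succeeds from the copy at `site2`, which is the availability benefit of replication: retrieval works as long as at least one copy of each fragment remains reachable.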