Lecture 8 - Distributed Database Management Systems

Here is the horizontal fragmentation based on projects with a budget less than $200,000: P1 = σ budget < $200,000 (Projects) This selects all tuples from the Projects relation where the budget attribute is less than $200,000.

Uploaded by

fatini

Distributed Database Management Systems (DDBMS)

Muhammad Hamiz Mohd Radzi


Objectives
✘ Describe the Distributed Database (DDB), the DDBMS, distributed processing, and the shared memory, shared disk and shared nothing architectures of a parallel DBMS
✘ Explain the advantages and disadvantages of a DDBMS
✘ Describe the homogeneous and heterogeneous types of DDBMS and the Multi Database System (MDBS)
✘ Explain the functions and reference architecture of a DDBMS and an MDBS, and the components of the DDBMS architecture
✘ Explain the concept of allocation (centralized, fragmented, complete replication and partial replication) in Distributed Relational Database Design (DDD)
✘ Explain horizontal, vertical, mixed and derived fragmentation together with their correctness rules
✘ Describe the distribution, transaction, performance and DBMS transparencies
✘ Describe fragmentation, location, local mapping and naming transparency within distribution transparency
DBMS Approach
✘ DB is located at the server
✘ Processing is split between server and client
✘ Less data traffic on the network
[Figure: application programs 1, 2 and 3 (each with data semantics) access the database through the DBMS, which handles data description, manipulation and control]
Centralized Database (Distributed Processing)
✘ A database system which resides at one of the nodes of a network of computers.
[Figure: Sites 1 to 5 connected by a communication network]
Problems with Centralized DB
✗ Performance degradation as the number of remote sites grows
✗ High cost to maintain large centralized DBs
✗ Reliability problems with one, central site
✗ The site with the database can become a bottleneck.
✗ Data availability is not efficient
✗ Possible availability problem: if the site with the database
goes down, there can be no data access.
Concept of DDBMS
✘ The DDBMS was introduced to overcome these problems of the centralized DBMS.
✘ Distributed Database: A logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network.
✘ Distributed DBMS (DDBMS): Software system that permits the management of the distributed database and makes the distribution transparent to users.
✘ Collection of logically-related shared data.
✘ Data split into fragments.
✘ Fragments may be replicated.
✘ Fragments/replicas allocated to sites.
✘ Sites linked by a communications network.
✘ Data at each site is under control of a DBMS.
✘ DBMSs handle local applications autonomously.
✘ Each DBMS participates in at least one global application.
Parallel DBMS
✘ A DBMS running across multiple processors and disks
designed to execute operations in parallel, whenever
possible, to improve performance.

✘ The main architectures are:
✗ Shared memory
✗ Shared disk
✗ Shared nothing
[Figure: parallel database architectures: (a) shared memory, (b) shared disk, (c) shared nothing]


Advantages & Disadvantages of DDBMS
Types of DDBMS
✘ Homogeneous:
✗ All sites use the same DBMS product.
✗ Much easier to design and manage.
✗ Approach provides incremental growth and allows increased performance.
✘ Heterogeneous:
✗ Sites may run different DBMS products, with possibly different underlying data models.
✗ Occurs when sites have implemented their own databases and integration is considered later.
✗ Translations are required to allow for:
✗ Different hardware.
✗ Different DBMS products.
✗ Different hardware and different DBMS products.


Multi Database Systems (MDBS)
✘ DDBMS in which each site maintains complete autonomy.
✘ DBMS that resides transparently on top of existing database and file systems and presents a single database to its users.
✘ Allows users to access and share data without requiring physical database integration.
✘ Two kinds: unfederated MDBS (no local users) and federated MDBS (FMDBS).


Functions & Architecture of DDBMS
✘ Functions: Expect DDBMS to have at least the functionality of a DBMS.
✘ Also to have the following functionality:
✗ Extended communication services.
✗ Extended Data Dictionary.
✗ Distributed query processing.
✗ Extended concurrency control.
✗ Extended recovery services.
✘ Global Conceptual Schema (GCS): Logical description of the whole database which contains definitions of entities, relationships, constraints, security, and integrity information.
✘ Fragmentation schema: A description of how the data is to be logically partitioned.
✘ Allocation schema: A description of where the data is to be located, taking account of any replication.
✘ Local schemas: Each local DBMS has its own set of schemas.
Reference Architecture for DDBMS
✘ Due to diversity, there is no accepted architecture equivalent to the ANSI/SPARC 3-level architecture.
✘ A reference architecture consists of:
✗ Set of global external schemas.
✗ Global conceptual schema (GCS).
✗ Fragmentation schema and allocation schema.
✗ Set of schemas for each local DBMS conforming to 3-level ANSI/SPARC.
[Figure: Reference Architecture for DDBMS]
Reference Architecture for FMDBS
✘ In DDBMS, GCS is union of all local conceptual schemas.
✘ In FMDBS, GCS is subset of local conceptual schemas (LCS), consisting of data that each local system agrees to share.
✘ GCS of tightly coupled system involves integration of either parts of LCSs or local external schemas.
✘ FMDBS with no GCS is called loosely coupled.
[Figure: Reference Architecture for Tightly-Coupled FMDBS]
Components of DDBMS Architecture
✘ Global System Catalog (GSC): Holds information such as the fragmentation, replication, and allocation schemas.
✘ Local DBMS (LDBMS): Controls the local data at each site that has a database.
✘ Data Communications (DC): Software that enables all sites to communicate with each other.
Distributed Relational Database Design
✘ Data fragmentation:

✗ How to partition the database into fragments

✘ Data replication:

✗ Which fragments to replicate

✘ Data allocation:

✗ Where to locate those fragments and replicas


Fragmentation
✘ Definition and allocation of fragments carried out strategically to
achieve:

✗ Locality of Reference.
✗ Improved Reliability and Availability.
✗ Improved Performance.
✗ Balanced Storage Capacities and Costs.
✗ Minimal Communication Costs.

✘ Involves analyzing the most important applications, based on quantitative/qualitative information.
Data Allocation
✘ Centralized: Consists of a single database and DBMS stored at one site with users distributed across the network.
✘ Partitioned: Database partitioned into disjoint fragments, each fragment assigned to one site.
✘ Complete Replication: Consists of maintaining a complete copy of the database at each site.
✘ Selective Replication: Combination of partitioning, replication, and centralization.
Reasons for Fragmentation
✘ Usage: Applications work with views rather than entire relations.
✘ Efficiency: Data is stored close to where it is most frequently used.
✘ Parallelism: With fragments as the unit of distribution, a transaction can be divided into several subqueries that operate on fragments.
✘ Security: Data not required by local applications is not stored and so not available to unauthorized users.
Types of Fragmentation
✘ Four types of fragmentation:

✗ Horizontal,
✗ Vertical,
✗ Mixed,
✗ Derived.

✘ The other possibility is no fragmentation: if a relation is small and not updated frequently, it may be better not to fragment it.
[Figures: horizontal and vertical fragmentation; mixed fragmentation]
Horizontal Fragmentation
✘ Consists of a subset of the tuples of a relation.

✘ Defined using Selection operation of relational algebra:
σp(R)
✘ Assuming that there are only two property types, Flat and
House, the horizontal fragmentation of PropertyForRent by
property type can be obtained as follows:
P1 = σ type=‘House’(PropertyForRent)
P2 = σ type=‘Flat’(PropertyForRent)
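The selection above can be tried out in a few lines of Python. This is only a minimal sketch: the rows, the `propertyNo` and `rent` values, and the helper names `select` and `by_key` are made up for illustration; only the relation name and the `type` predicate come from the slide.

```python
# Hypothetical sample rows: only the "type" attribute follows the slide.
property_for_rent = [
    {"propertyNo": "P1", "type": "House", "rent": 650},
    {"propertyNo": "P2", "type": "Flat", "rent": 400},
    {"propertyNo": "P3", "type": "Flat", "rent": 350},
]


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def by_key(row):
    """Sort helper so that relations can be compared regardless of tuple order."""
    return row["propertyNo"]


# P1 and P2 from the slide, one fragment per property type.
p1 = select(property_for_rent, lambda row: row["type"] == "House")
p2 = select(property_for_rent, lambda row: row["type"] == "Flat")

# The union of the fragments gives back the original relation.
reconstructed = sorted(p1 + p2, key=by_key)
```

Because every tuple has exactly one of the two property types, each tuple lands in exactly one fragment and the union loses nothing.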
Vertical Fragmentation
✘ Consists of a subset of attributes of a relation.

✘ Defined using Projection operation of relational algebra:
∏a1, ... ,an(R)
✘ For example:
S1 = ∏staffNo, position, sex, DOB, salary(Staff)
S2 = ∏staffNo, fName, lName, branchNo(Staff)
✘ Determined by establishing affinity of one attribute to another.
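As a rough illustration of the S1/S2 example, the sketch below projects a small Staff relation into the two fragments and joins them back on the primary key. The row values and the helper names `project` and `join_on` are assumptions made for this example; only the attribute lists come from the slide.

```python
# Hypothetical Staff rows: attribute names follow the slide, values are made up.
staff = [
    {"staffNo": "S1", "fName": "Ann", "lName": "Lee", "position": "Manager",
     "sex": "F", "DOB": "1970-03-01", "salary": 30000, "branchNo": "B1"},
    {"staffNo": "S2", "fName": "Bob", "lName": "Tan", "position": "Assistant",
     "sex": "M", "DOB": "1985-07-15", "salary": 12000, "branchNo": "B2"},
]


def project(relation, attributes):
    """Relational projection: keep the named attributes and drop duplicate tuples."""
    seen, result = set(), []
    for row in relation:
        values = tuple(row[a] for a in attributes)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result


def join_on(left, right, key):
    """Equijoin on one attribute; assumes `key` is unique in `right`."""
    index = {row[key]: row for row in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


s1 = project(staff, ["staffNo", "position", "sex", "DOB", "salary"])
s2 = project(staff, ["staffNo", "fName", "lName", "branchNo"])

# Both fragments repeat the primary key staffNo, so a join rebuilds Staff.
rebuilt = join_on(s1, s2, "staffNo")
```

Repeating `staffNo` in both fragments is what makes the join possible.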
Mixed Fragmentation
✘ Consists of a horizontal fragment that is vertically fragmented, or a vertical fragment that is horizontally fragmented.
✘ Defined using Selection and Projection operations of relational algebra:
σp(∏a1, ... ,an(R))
or
∏a1, ... ,an(σp(R))
Derived Horizontal Fragmentation
✘ A horizontal fragment that is based on horizontal fragmentation of a parent relation.
✘ Ensures that fragments that are frequently joined together are at same site.
✘ Defined using Semijoin operation of relational algebra:
Ri = R ⋉F Si, 1 ≤ i ≤ w
Case study
✘ Suppose that we have these tables in our database.
Question 1
✘ Do a horizontal (row) fragmentation based on:
✗ PROJ1: projects with budget less than $200,000
✗ PROJ2: projects with budget greater than or equal to $200,000
By using RA:
PROJ1 = σ BUDGET<200K (PROJ)
PROJ2 = σ BUDGET>=200K (PROJ)
Reconstruction:
PROJ = PROJ1 ⋃ PROJ2
Question 2
✘ Do a vertical (column) fragmentation based on:
✗ PROJ3: information about project budgets.
✗ PROJ4: information about project names and their locations.
For RA:
PROJ3 = ∏PNO, BUDGET (PROJ)
PROJ4 = ∏PNO, NAME, LOC (PROJ)
Reconstruction:
PROJ = PROJ3 ⨝PNO PROJ4


Question 3
✘ Do a mixed fragmentation based on:
✗ PROJ1&3: information about project budgets, for projects with a budget less than $200,000
✗ PROJ1&4: information about project names and their locations, for projects with a budget less than $200,000
✗ PROJ2&3: information about project budgets, for projects with a budget greater than or equal to $200,000
✗ PROJ2&4: information about project names and their locations, for projects with a budget greater than or equal to $200,000
For RA:
PROJ1&3 = ∏PNO, BUDGET (σ BUDGET<200K (PROJ))
PROJ1&4 = ∏PNO, NAME, LOC (σ BUDGET<200K (PROJ))
PROJ2&3 = ∏PNO, BUDGET (σ BUDGET>=200K (PROJ))
PROJ2&4 = ∏PNO, NAME, LOC (σ BUDGET>=200K (PROJ))
Reconstruction:
PROJ1 = PROJ1&3 ⨝PNO PROJ1&4
PROJ2 = PROJ2&3 ⨝PNO PROJ2&4
PROJ = PROJ1 ⋃ PROJ2
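The mixed fragmentation of Question 3 can be checked with a short Python sketch. The case-study tables are not reproduced in this text, so the PROJ rows below are hypothetical; the helper names `select`, `project` and `join_on` are also assumptions. The sketch builds the four fragments, joins each pair on PNO, and unions the two results to recover PROJ.

```python
# Hypothetical PROJ rows: attribute names follow the slides, values are made up.
proj = [
    {"PNO": "P1", "NAME": "Alpha", "BUDGET": 150_000, "LOC": "Site A"},
    {"PNO": "P2", "NAME": "Beta", "BUDGET": 200_000, "LOC": "Site B"},
    {"PNO": "P3", "NAME": "Gamma", "BUDGET": 310_000, "LOC": "Site A"},
]


def select(relation, predicate):
    """Relational selection."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Relational projection; PNO is kept, so no duplicate tuples can arise."""
    return [{a: row[a] for a in attributes} for row in relation]


def join_on(left, right, key):
    """Equijoin on one attribute; assumes `key` is unique in `right`."""
    index = {row[key]: row for row in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


def low(row):
    return row["BUDGET"] < 200_000


def high(row):
    return row["BUDGET"] >= 200_000


proj1_3 = project(select(proj, low), ["PNO", "BUDGET"])
proj1_4 = project(select(proj, low), ["PNO", "NAME", "LOC"])
proj2_3 = project(select(proj, high), ["PNO", "BUDGET"])
proj2_4 = project(select(proj, high), ["PNO", "NAME", "LOC"])

# Join each vertical pair on PNO, then union the two horizontal halves.
rebuilt = join_on(proj1_3, proj1_4, "PNO") + join_on(proj2_3, proj2_4, "PNO")
```

The row with a budget of exactly 200,000 falls into the second pair of fragments, which is why the predicates use `<` and `>=` rather than `<` and `>`.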
Question 4
• Do a horizontal fragmentation based on:
– PAY1: salary less than $30,000
– PAY2: salary greater than or equal to $30,000
By using RA:
PAY1 = σ SALARY<30K (PAY)
PAY2 = σ SALARY>=30K (PAY)
Reconstruction:
PAY = PAY1 ⋃ PAY2
Question 5
• Identify which table is a CHILD table
to PAY table.
– EMPLOYEE
Question 6
• Do a derived fragmentation of the EMPLOYEE (EMP) table.
– EMP1: employees with salary less than $30,000
– EMP2: employees with salary greater than or equal to $30,000
By RA:
EMP1 = EMP ⋉TITLE PAY1
EMP2 = EMP ⋉TITLE PAY2
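Derived fragmentation is defined with a semijoin, so the EMP fragments follow the PAY fragments through the shared TITLE attribute. The sketch below shows this under stated assumptions: the EMP and PAY rows are hypothetical (the case-study tables are not reproduced in this text), and `semijoin` is an illustrative helper name.

```python
# Hypothetical rows: attribute names follow the slides, values are made up.
pay = [
    {"TITLE": "Engineer", "SALARY": 40_000},
    {"TITLE": "Analyst", "SALARY": 34_000},
    {"TITLE": "Programmer", "SALARY": 24_000},
]
emp = [
    {"ENO": "E1", "ENAME": "Ann", "TITLE": "Engineer"},
    {"ENO": "E2", "ENAME": "Bob", "TITLE": "Programmer"},
    {"ENO": "E3", "ENAME": "Cal", "TITLE": "Analyst"},
    {"ENO": "E4", "ENAME": "Dee", "TITLE": "Engineer"},
]


def semijoin(left, right, attribute):
    """Keep the tuples of `left` that match at least one tuple of `right`."""
    values = {row[attribute] for row in right}
    return [row for row in left if row[attribute] in values]


# Horizontal fragments of the parent relation PAY.
pay1 = [row for row in pay if row["SALARY"] < 30_000]
pay2 = [row for row in pay if row["SALARY"] >= 30_000]

# Derived fragments of the child relation EMP.
emp1 = semijoin(emp, pay1, "TITLE")
emp2 = semijoin(emp, pay2, "TITLE")
```

Placing `emp1` with `pay1` and `emp2` with `pay2` keeps each employee at the same site as the matching pay row, so the join of EMP and PAY can be evaluated locally.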


No Fragmentation
✘ A final strategy is not to fragment a relation.
✘ For example, the Branch relation contains only a small number of tuples and is not updated very frequently.
✘ Hence, it is better to leave the relation whole, as fragmenting it would bring no benefit.
Correctness of Fragmentation
Completeness
✘ If relation R is decomposed into fragments R1, R2, ... Rn, each data item that
can be found in R must appear in at least one fragment.

Reconstruction
✘ Must be possible to define a relational operation that will reconstruct R from
the fragments.
✘ Reconstruction uses the Union operation for horizontal fragmentation and the Join operation for vertical fragmentation.
Disjointness
✘ If data item di appears in fragment Ri, then it should
not appear in any other fragment.
✘ Exception: vertical fragmentation, where primary key
attributes must be repeated to allow reconstruction.
✘ For horizontal fragmentation, data item is a tuple.
✘ For vertical fragmentation, data item is an attribute.
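The three correctness rules can be written as simple checks for horizontal fragments, where the data item is a tuple. This is a minimal sketch: the function names and the PAY rows are assumptions made for illustration. It also shows how easily a boundary value breaks a rule. The predicate pair `< 30000` and `> 30000` loses the tuple with a salary of exactly 30,000, and the pair `<= 30000` and `>= 30000` stores it twice.

```python
# Hypothetical PAY rows; one salary sits exactly on the fragment boundary.
pay = [
    {"TITLE": "Engineer", "SALARY": 40_000},
    {"TITLE": "Analyst", "SALARY": 30_000},
    {"TITLE": "Programmer", "SALARY": 24_000},
]


def is_complete(relation, fragments):
    """Completeness: every tuple of the relation appears in some fragment."""
    return all(any(row in f for f in fragments) for row in relation)


def is_disjoint(fragments):
    """Disjointness: no tuple appears in more than one fragment."""
    seen = []
    for fragment in fragments:
        for row in fragment:
            if row in seen:
                return False
            seen.append(row)
    return True


def reconstructs(relation, fragments, key):
    """Reconstruction: the union of the fragments equals the relation."""
    union = []
    for fragment in fragments:
        for row in fragment:
            if row not in union:
                union.append(row)
    return sorted(union, key=key) == sorted(relation, key=key)


def split(relation, first, second):
    """Build two horizontal fragments from two predicates."""
    return [[r for r in relation if first(r)], [r for r in relation if second(r)]]


good = split(pay, lambda r: r["SALARY"] < 30_000, lambda r: r["SALARY"] >= 30_000)
gap = split(pay, lambda r: r["SALARY"] < 30_000, lambda r: r["SALARY"] > 30_000)
overlap = split(pay, lambda r: r["SALARY"] <= 30_000, lambda r: r["SALARY"] >= 30_000)
```

Only `good` passes all three checks; `gap` fails completeness and `overlap` fails disjointness.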
Transparencies in a DDBMS
✘ Distribution Transparency
✗ Fragmentation Transparency
✗ Location Transparency
✗ Replication Transparency
✗ Local Mapping Transparency
✗ Naming Transparency
✘ Transaction Transparency
✘ Performance Transparency
✘ DBMS Transparency
Distribution Transparency
✘ Allows management of a physically dispersed database as though it were
a centralized database
✘ Supported by a distributed data dictionary (DDD) which contains the
description of the entire database as seen by the DBA
✗ The DDD is itself distributed and replicated at the network nodes

✘ Three levels of distribution transparency are recognized:


✗ Fragmentation transparency – user does not need to know if a
database is partitioned; fragment names and/or fragment locations
are not needed
✗ Location transparency – fragment name, but not location, is
required
✗ Local mapping transparency – user must specify fragment name
and location
A Summary of Transparency Features

Distribution Transparency
✘ The EMPLOYEE table is divided among three locations (no replication)
✘ Suppose an employee wants to find all employees with a birthdate prior to Jan 1, 1940
✗ Fragmentation transparency:
SELECT * FROM EMPLOYEE WHERE EMP_DOB < '01-JAN-1940';
✗ Location transparency:
SELECT * FROM E1 WHERE EMP_DOB < '01-JAN-1940' UNION
SELECT * FROM E2 … UNION SELECT * FROM E3 …;
✗ Local mapping transparency:
SELECT * FROM E1 NODE NY WHERE EMP_DOB < '01-JAN-1940'
UNION SELECT * FROM E2 NODE ATL … UNION SELECT * FROM E3 NODE MIA …;
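One way to picture the three levels is as three layers of lookup that the system either performs for the user or leaves to the user. The sketch below is a toy model, not how any particular DDBMS is implemented: the dictionaries `sites`, `allocation` and `fragments_of`, the function names, and the employee rows are all hypothetical; only the fragment names E1 to E3 and the node names NY, ATL and MIA come from the queries above.

```python
from datetime import date

# Hypothetical fragment contents stored at each node.
sites = {
    "NY": {"E1": [{"EMP_NAME": "Ann", "EMP_DOB": date(1938, 5, 2)}]},
    "ATL": {"E2": [{"EMP_NAME": "Bob", "EMP_DOB": date(1951, 1, 9)}]},
    "MIA": {"E3": [{"EMP_NAME": "Cal", "EMP_DOB": date(1939, 12, 31)}]},
}
# Allocation schema: which node stores each fragment.
allocation = {"E1": "NY", "E2": "ATL", "E3": "MIA"}
# Fragmentation schema: which fragments make up each relation.
fragments_of = {"EMPLOYEE": ["E1", "E2", "E3"]}


def read_local(node, fragment, predicate):
    """Local mapping transparency: the caller names the fragment and the node."""
    return [row for row in sites[node][fragment] if predicate(row)]


def read_fragment(fragment, predicate):
    """Location transparency: the caller names the fragment, the system finds the node."""
    return read_local(allocation[fragment], fragment, predicate)


def read_relation(relation, predicate):
    """Fragmentation transparency: the caller names only the relation."""
    rows = []
    for fragment in fragments_of[relation]:
        rows.extend(read_fragment(fragment, predicate))
    return rows


def born_before_1940(row):
    return row["EMP_DOB"] < date(1940, 1, 1)


result = read_relation("EMPLOYEE", born_before_1940)
```

Each function needs less knowledge from the caller than the one before it, which mirrors the shrinking SQL in the three queries.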
Naming Transparency
✘ Each item in a DDB must have a unique name.
✘ DDBMS must ensure that no two sites create a database object with the same name.
✘ One solution is to create a central name server. However, this results in:
✗ loss of some local autonomy;
✗ a possible bottleneck at the central site;
✗ low availability: if the central site fails, remaining sites cannot create any new objects.
Replication Transparency
✘ With replication transparency, the user is unaware of the replication of fragments.
Transaction Transparency
✘ Ensures database transactions will maintain the distributed database’s integrity and consistency
✘ A DDBMS transaction can update data stored in many different computers connected in a network
✗ Transaction transparency ensures that the transaction will be completed only if all database sites involved in the transaction complete their part of the transaction
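The all-or-nothing rule can be sketched as a coordinator that commits only when every participating site reports that it can complete its part. This resembles the voting step of a two-phase commit protocol, which the slides do not name, so treat it as an assumption. The `Site` class and `run_distributed_transaction` function are hypothetical, and the sketch ignores site crashes, timeouts and logging.

```python
class Site:
    """A toy participant that either can or cannot complete its part."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "pending"

    def prepare(self):
        """Report whether this site is able to complete its part."""
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"


def run_distributed_transaction(sites):
    """Commit at every site only if all sites can commit; otherwise undo everywhere."""
    if all(site.prepare() for site in sites):
        for site in sites:
            site.commit()
        return True
    for site in sites:
        site.rollback()
    return False


healthy = [Site("A"), Site("B"), Site("C")]
one_failure = [Site("A"), Site("B", can_commit=False), Site("C")]

committed = run_distributed_transaction(healthy)
aborted = run_distributed_transaction(one_failure)
```

With one failing site, no site ends in the committed state, so the database never holds a partially applied transaction.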
Performance Transparency
• Performance transparency allows the system to perform as if it were a centralized DBMS.
• No performance degradation due to use of a network or platform differences
DBMS Transparency
✘ DBMS transparency hides the knowledge that the local DBMSs may be different, and is therefore only applicable to heterogeneous DDBMSs.
✘ It is one of the most difficult transparencies to provide as a generalization.
References
✘ Thomas Connolly and Carolyn Begg, Database Systems: A Practical Approach to Design, Implementation, and Management, 6th Edition, Pearson, 2015, ISBN: 978-01329432
