DISTRIBUTED
DBMS
SUSHIL
KULKARNI
DDBMS Concepts
Applications
Characteristics, Properties of DDBMS
Distributed Processing
Advantages & Disadvantages DDBMS
Types & Functions of DDBMS
Main Issues of DDBMS
Component Architecture for DDBMS
Data Allocation & Fragmentation
Transparencies
CONCEPTS
CONCEPTS
• So far, we assume a centralized database
Data are stored in one location (e.g. a single
hard disk)
A centralized database management system to
handle transaction
To handle multiple requests, a client-server
system is used
- Client send requests for data to server
- Server handle query, transaction management etc.
SUSHIL KULKARNI
CONCEPTS
• This is not the only possibility
• In many cases, it may be advantageous
for data to be distributed
– Branches of a bank
– Different part of the government storing
different kind of data about a person
– Different organizations sharing part of their
data
• Thus, distributed databases
SUSHIL KULKARNI
CONCEPTS
• Data spread over multiple machines (also referred
to as sites or nodes.
• Network interconnects the machines
• Data shared by users on multiple machines
SUSHIL KULKARNI
CONCEPTS
Distributed database
Logical interrelated collection of shared data,
along with description of data, physically
distributed over a computer network.
SUSHIL KULKARNI
CONCEPTS
Distributed DBMS
The software system that permits the
management of the distributed database and
makes the distribution transparent to users
SUSHIL KULKARNI
CONCEPTS
Applications
• User access distributed database via
applications
SUSHIL KULKARNI
CONCEPTS
TWO types of Applications
• Local application : Application that do not
required data from other sites.
• Global application : Application that required
data from other sites.
SUSHIL KULKARNI
TYPES OF DDBMS
• In a homogeneous distributed database:
– All sites have identical software.
– Are aware of each other and agree to
cooperate in processing user requests.
– Each site surrenders part of its autonomy
in terms of right to change schemas or
software.
– Appears to user as a single system.
SUSHIL KULKARNI
TYPES OF DDBMS
• In a heterogeneous distributed database:
– Different sites may use different schemas and
software.
• Difference in schema is a major problem for
query processing.
• Difference in software is a major problem for
transaction processing.
– Sites may not be aware of each other and may
provide only limited facilities for cooperation in
transaction processing.
SUSHIL KULKARNI
TYPE: HOMOGENEOUS DBMS
Identical DBMSs
SUSHIL KULKARNI
TYPE: HETROGENEOUS DBMS
Non-identical DBMSs
SUSHIL KULKARNI
OBJECTIVES : DISTRIBUTED
ARCHITECTURE
• Location Transparency
– User does not have to know the location of the data.
– Data requests automatically forwarded to appropriate
sites
• Local Autonomy
– Local site can operate with its database when
network connections fail
– Each site controls its own data, security,
logging, recovery
SUSHIL KULKARNI
SIGNIFICANT TRADE -OFF
Synchronous • Asynchronous
Distributed Database Distributed Database
• All copies of the same • Some data inconsistency
data are always is tolerated
identical • Data update propagation
is delayed
• Data updates are
immediately applied to • Lower data integrity
all copies throughout • Less overhead faster
network response time
• Good for data integrity
• High overhead slow
response times
NOTE: all this assumes replicated data (to be discussed later)
Advantages & Disadvantages
Advantages Disadvantages
• Increased reliability • Software cost &
& availability complexity
• Local control • Processing overhead
• Modular growth • Data integrity
• Lower • Slow response
communication
costs
• Faster response
DISTRIBUTED PROCESSING
A centralized database that can be
accessed over a computer network.
SUSHIL KULKARNI
DISTRIBUTED PROCESSING
T T T T T T
COM 1 COM 2
Communication
Network DB
T T T
SUSHIL KULKARNI COM 3
FUNCTIONS OF DDBMS
Functions of a centralized DBMS plus:
extended communication to allow the transfer of
queries and data among sites
extended system catalog to store data distribution
details
distributed query processing , including query
optimization
SUSHIL KULKARNI
FUNCTIONS OF DDBMS
extended concurrency control to maintain
consistency of replicated data.
extended recovery services to take account
of failures of individual sites and common
links
SUSHIL KULKARNI
TWO MAIN ISSUES IN DDBMS
Making query from one site to the same or
remote site.
Logical database is partitioned in to
different
data streams and located at different sites.
SUSHIL KULKARNI
COMPONENT ARCHITECTURE FOR
DDBMS
• Local DBMS
• Data Communication Component
• Global System Catalog
• Distributed DBMS component
SUSHIL KULKARNI
DATA
ALLOCATION
DATA ALLOCATION
• Centralized
• Fragmented
• Complete replication
• Selective replication
SUSHIL KULKARNI
Distributed Data Storage
• Assume relational data model.
• Replication:
– System maintains multiple copies of data, stored in
different sites, for faster retrieval and fault tolerance.
• Fragmentation:
– Relation is partitioned into several fragments stored in
distinct sites
• Replication and fragmentation can be combined:
– Relation is partitioned into several fragments: System
maintains several identical replicas of each such
fragment.
Data Replication
• A relation or fragment of a relation is
replicated if it is stored redundantly in
two or more sites.
• Full replication of a relation is the case
where the relation is stored at all sites.
• Fully redundant databases are those in
which every site contains a copy of the
entire database.
SUSHIL KULKARNI
Data Replication (Cont.)
Data Replication
Data Replication
• Advantages of Replication:
– Availability: failure of site containing relation r
does not result in unavailability of r is replicas
exist.
– Parallelism: queries on r may be processed by
several nodes in parallel.
– Reduced data transfer: relation r is available
locally at each site containing a replica of r.
SUSHIL KULKARNI
Data Replication (Cont.)
Data Replication
Data Replication
• Disadvantages of Replication
– Increased cost of updates: each replica of
relation r must be updated.
– Increased complexity of concurrency control:
concurrent updates to distinct replicas may
lead to inconsistent data unless special
concurrency control mechanisms are
implemented.
• One solution: choose one copy as primary copy
and apply concurrency control operations on
primary copy.
Data Fragmentation
• Division of relation r into fragments r1, r2, …, rn which
contain sufficient information to reconstruct relation r.
• Horizontal fragmentation: each tuple of r is assigned to
one or more fragments.
• Vertical fragmentation: the schema for relation r is split
into several smaller schemas.
– All schemas must contain a common candidate key (or
superkey) to ensure lossless join property.
– A special attribute, the tuple-id attribute may be added to each
schema to serve as a candidate key.
• Example : relation account with following schema.
• Account-schema = (branch-name, account-number,
balance).
HORIZONTAL FRAGMENTATION
Original relation A1 A2 ………. An
T1
A1 A2 ………. An T2
T1 T3
1
T2 .
1
T3 .T60
1
.
2 Site 1
.T60
2
T61
A1 A2 ………. An
3
. T61
3
. .
3
Tn .
Tn
Site 2
-Fragments contain subsets of complete tuples (all attributes at all
sites)
How to reconstruct R= Rs1 Rs2 ……. Rsn
VERTICAL FRAGMENTATION
A1 A2 A3 A4
Original t1 How to Reconstruct:
Relation (R)
t2 R=Rs1 Rs2 Rsn
TID –Tuple ID
Hidden Attribute to
ensure account tn
and simple join
reconstruction
A1 A2 TID TIDA3 A4 RS2
t1 1 1 t1
RS1 2 2
t2 t2
RS1.TID=RS2.TID
n n
tn tn Join condition
SITE1 SITE2
VERTICAL FRAGMENTATION
A1 A2 A3 A4
Original t1 How to Reconstruct:
Relation (R)
t2 R=Rs1 Rs2 Rsn
TID –Tuple ID
Hidden Attribute to
ensure account tn
and simple join
reconstruction
A1 A2 TID TIDA3 A4 RS2
t1 1 1 t1
RS1 2 2
t2 t2
RS1.TID=RS2.TID
n n
tn tn Join condition
SITE1 SITE2
MIXED FRAGMENTATION
Rs1 A1 A2 A3 A4 A5
Rs3
u
R s
a
A1 A2 A3 A4 A5
Rs2
A1 A2 A3 E
A4 A5 u
(Salary (Benefit r
Attributes) Attributes)
o
p
Rs4 e
MIXED FRAGMENTATION
A1 A2 A3 A4
Original t1 How to Reconstruct:
Relation (R)
t2 R=Rs1 Rs2 Rsn
TID –Tuple ID
Hidden Attribute to
ensure account tn
and simple join
reconstruction
A1 A2 TID TIDA3 A4 RS2
t1 1 1 t1
RS1 2 2
t2 t2
RS1.TID=RS2.TID
n n
tn tn Join condition
SITE1 SITE2
Horizontal Fragmentation of account
Relation
branchname accountnumber balance
Hillside A305 500
Hillside A226 336
Hillside A155 62
account1=σ branchname=“Hillside”(account)
branchname accountnumber balance
Valleyview A177 205
Valleyview A402 10000
Valleyview A408 1123
Valleyview A639 750
account2=σ branchname=“Valleyview”(account)
SUSHIL KULKARNI
Vertical Fragmentation of employee-info
Relation
branchname customername tupleid
Hillside Lowman 1
Hillside Camp 2
Valleyview Camp 3
Valleyview Kahn 4
Hillside Kahn 5
Valleyview Kahn 6
Valleyview Green 7
deposit1=Π branchname, customername, tupleid(employeeinfo)
account number balance tupleid
A305 500 1
A226 336 2
A177 205 3
A402 10000 4
A155 62 5
A408 1123 6
A639 750 7
deposit2=Π accountnumber, balance, tupleid(employeeinfo)
Advantages of Fragmentation
• Horizontal:
– allows parallel processing on fragments of a relation
– allows a relation to be split so that tuples are located where
they are most frequently accessed
• Vertical:
– allows tuples to be split so that each part of the tuple is stored
where it is most frequently accessed
– tuple-id attribute allows efficient joining of vertical fragments
– allows parallel processing on a relation
• Vertical and horizontal fragmentation can be mixed.
– Fragments may be successively fragmented to an arbitrary
depth.
SUSHIL KULKARNI
REPLICATION and FRAGMENTATION
Partition of Attributes/tuples need not be disjoint
A1 A2 A3 A4 A5
A1 A2 A3 A4 A2 A3 A4 A5
Overlap
(replication of attributes)
TRANSPARENCIES
TRANSPARENCIES IN DDBMS
• Transparencies hide implementation
details from the user
• Example in Centralized databases : Data
independence
• Main types of transparencies in DDBMS:
o Distributed Transparency
o Transaction Transparency
SUSHIL KULKARNI
DISTRIBUTED TRANSPARENCY
Allows the user to see the database as a
single, logical entity.
If this transparency is exhibited then the
user does not need to know that
1. The data are partitioned.
2. Data can be replicated at several
sites.
3. Data location.
SUSHIL KULKARNI
EXAMPLE
Staff (staffNo, position, sex, dob, salary,
fName, lName, branchNo)
Vertical fragmentation:
S = Π staffNo, position, sex , dob, salary (Staff)
1
S = Π staffNo, fName, lName , dbranchNo (Staff)
2
SUSHIL KULKARNI
EXAMPLE
Fragment S 2 according to branch number.
Assume that there are only three branches.
Horizontal fragmentation:
S = σ (Staff)
21 branchNo = ' B 003 '
S =σ (Staff)
22 branchNo = ' B 005 '
S =σ (Staff)
23 branchNo = ' B007 '
SUSHIL KULKARNI
EXAMPLE
Assume that :
S 1 and S 2 are at site 5,
S 21 at site 3
S 22 at site 5
S 23 at site 7
SUSHIL KULKARNI
FRAGMENTATION TRANSPARENCY
If it is provided then the user does not need
to know the data is fragmented.
Example:
SELECT fName, lName
FROM Staff
WHERE position = ‘ Manager ’
SUSHIL KULKARNI
LOCATION TRANSPARENCY
If it is provided then the user must know
how the data has been fragmented but still
does not have know the location of the data.
SUSHIL KULKARNI
LOCATION TRANSPARENCY
Example:
SELECT fName, lName
FROM S21
WHERE staffNo IN (SELECT staffNO FROM S1 where
position = ‘ Manager ’)
UNION
SELECT fName, lName
FROM S22
WHERE staffNo IN (SELECT staffNO FROM S1 where
position = ‘ Manager ’)
SUSHIL KULKARNI
LOCATION TRANSPARENCY
Example:
UNION
SELECT fName, lName
FROM S23
WHERE staffNo IN (SELECT staffNO FROM S1 where
position = ‘ Manager ’ )
SUSHIL KULKARNI
LOCAL MAPPING TRANSPARENCY
If it is provided then the user must know
how the data has been fragmented as well
as the location of the data.
SUSHIL KULKARNI
LOCATION TRANSPARENCY
Example:
SELECT fName, lName
FROM S21 AT SITE 3
WHERE staffNo IN (SELECT staffNO FROM S1 AT SITE 5
where position = ‘ Manager ’)
UNION
SELECT fName, lName
FROM S22 AT SITE 5
WHERE staffNo IN (SELECT staffNO FROM S1 AT SITE 3
where position = ‘ Manager ’)
SUSHIL KULKARNI
LOCATION TRANSPARENCY
Example:
UNION
SELECT fName, lName
FROM S23 AT SITE 7
WHERE staffNo IN (SELECT staffNO FROM S1 AT SITE 3
where position = ‘ Manager ’ )
SUSHIL KULKARNI
TRANSACTION TRANSPARENCY
It maintains distributed database’s integrity
and consistency.
SUSHIL KULKARNI
QUERY PROCESSING IN DDMS
Issues 1:
Horizontal
Parallel Processing across Fragments fragmentations
=Emp1 U Emp2
Π LName(σ salary>40,000(Employee))
2 Fragments
⇒ Π LName( σ salary>40,000(Emp1)) U Π LName( σ salary>40,000(Emp2))
Site 1 Site 2
Execution in Parallel on fragments
and union results together
QUERY PROCESSING IN DDMS
Site1 Site2 Site3 Joins- symmetric and
associative
(A B) C
Parallel Processing
(σ xx(A)) (B C)
A (B C)
QUERY PROCESSING IN DDMS
Join Strategies
R=Π Fnames, Cnames, Dnames (Employee Department)
Site 3 Site 1 Mg rssn Site 2
100 records, 2000 bytes 10,000 records, to ssn 100 records, 3000
1,000,000 bytes bytes
Strategies:
1,003,000
1)Ship both relations to the result site and join there bytes
transfered
1,002,000
2)Ship employee to 2, join at 2, results to 3 bytes
transfered
3)Ship Department to 1, join at 1, results to 3 5,000 bytes
transfered
⇒ minimize total communication cost of data transfer
THANKS !