UNIT-1-Distributed Database

A distributed database system consists of multiple interconnected sites where users can access data stored at any location, with each site managing its own local database and users. The system aims to achieve twelve objectives, including local autonomy, continuous operation, and replication independence, while addressing challenges such as query processing and update propagation. Key features include location independence, fragmentation independence, and the ability to operate across various hardware and software platforms.

Uploaded by

ompatel4624

Distributed Database

Introduction

Distributed Database System
▪ A system involving multiple sites connected together via a communication
network.
▪ A user at any site can access data stored at any site.
▪ Each site is a database system in its own right: its own local database,
local users, local DBMS, local DC (data communications) manager.
[Figure: each site comprises users, a communication manager, a DBMS, and a database; the sites are linked through a communication network.]

Fig 16.1: A typical distributed database system


Wei-Pang Yang, Information Management, NDHU 16-3
The Twelve Objectives

1. Local Autonomy
• All operations at a given site are controlled by that site and should not
depend on other sites.
• Local data is locally owned and managed.
• Not wholly achievable => sites should be autonomous to the
maximum extent possible.
2. No Reliance on a Central Site
• all sites must be treated as equals.
• A central site may become a bottleneck.
3. Continuous Operation
• Reliability
• Availability
• Never require the system to be shut down to perform some function,
e.g. to add a new site.
The Twelve Objectives (cont.)
4. Location Independence ( Location Transparency )
• user should not need to know at which site the data is stored, but should be
able to behave as if the entire database were stored at their own local site.
• a request for some remote data => system should find the data
automatically.
• Advantages
<1> Simplify user programs and activities
<e.g.> SELECT S#
FROM S
AT SITE A
WHERE SNAME = 'John'

<2> Allow data to be moved from one site to another at any time without
invalidating any programs or activities.



The Twelve Objectives (cont.)
5. Fragmentation Independence ( Fragmentation Transparency )
• Data Fragmentation
• A given logical object can be divided up into pieces (fragments) for
physical storage purposes.
<e.g.> User perception:
EMP   EMP#  DEPT#  SALARY
      E1    DX     45K
      E2    DY     40K
      E3    DZ     50K
      E4    DY     63K
      E5    DZ     40K

Physical storage:
New York fragment          London fragment
EMP#  DEPT#  SALARY        EMP#  DEPT#  SALARY
E1    DX     45K           E2    DY     40K
E3    DZ     50K           E4    DY     63K
                           E5    DZ     40K

Fig. 16.2: An example of fragmentation.
The Twelve Objectives (cont.)
• Data Fragmentation
• A fragment can be any subrelation derivable via restriction and
projection (retaining the primary key).
• Advantage: data can be stored at the location where it is most frequently
used.
• Fragmentation independence
• user should be able to behave as if the relations were not fragmented
at all.
• One reason why relational technology is suitable for a distributed DBMS.
• user should be presented with a view of data.
=> system must support updates against join and union views.
• Advantages
(1) simplify user program and activity.
(2) allow data to be re-fragmented at any time.
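
The idea of fragmentation independence can be sketched in a few lines of Python. This is only an illustration, not how any particular DDBMS implements it: the rows are copied from Fig. 16.2, while the `FRAGMENTS` catalog and the `relation` and `select` helpers are hypothetical names introduced here. The logical relation is rebuilt as the union of its horizontal fragments, so a query never names a site.

```python
# Rows are (EMP#, DEPT#, SALARY in thousands), copied from Fig. 16.2.
NEW_YORK = [("E1", "DX", 45), ("E3", "DZ", 50)]
LONDON = [("E2", "DY", 40), ("E4", "DY", 63), ("E5", "DZ", 40)]

# Hypothetical catalog: logical relation name -> fragments, one per site.
FRAGMENTS = {"EMP": {"New York": NEW_YORK, "London": LONDON}}


def relation(name):
    """Rebuild a logical relation as the union of its horizontal fragments."""
    rows = set()
    for fragment in FRAGMENTS[name].values():
        rows |= set(fragment)
    return sorted(rows)


def select(name, predicate):
    """Restriction against the logical relation; the caller never names a site."""
    return [row for row in relation(name) if predicate(row)]


emp = relation("EMP")
well_paid = select("EMP", lambda row: row[2] > 40)
print(emp)
print(well_paid)
```

Because callers only see `relation("EMP")`, the rows could be moved between the two fragment lists (re-fragmented) without changing any query written against the logical relation.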



The Twelve Objectives (cont.)
6. Replication Independence ( Replication Transparency )
• Data Replication
User perception:
EMP   EMP#  DEPT#  SALARY
      E1    DX     45K
      E2    DY     40K
      E3    DZ     50K
      E4    DY     63K
      E5    DZ     40K

Physical storage:
New York site                    London site
New York fragment                London fragment
EMP#  DEPT#  SALARY              EMP#  DEPT#  SALARY
E1    DX     45K                 E2    DY     40K
E3    DZ     50K                 E4    DY     63K
                                 E5    DZ     40K
Replica of London fragment       Replica of New York fragment
EMP#  DEPT#  SALARY              EMP#  DEPT#  SALARY
E2    DY     40K                 E1    DX     45K
E4    DY     63K                 E3    DZ     50K
E5    DZ     40K

Fig. 16.3: An example of replication.



The Twelve Objectives (cont.)
• Data Replication
• A given fragment of relation can be represented at the physical level
by many distinct copies of the same object at many distinct sites.
• Unit of replication: a fragment (not necessarily a complete relation)
• Advantage: better performance and availability
• Disadvantage: update propagation problem.
• Replication Independence
• Users should be able to behave as if the data were not replicated at all.
• Advantages
(1) simplify user programs and activities.
(2) allow replicas to be created and destroyed dynamically.
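
The availability benefit of replication can be illustrated with a small read-routing sketch. It assumes the placement of Fig. 16.3; the `REPLICAS` table, the fragment names, and the `choose_site` helper are hypothetical and exist only for this illustration. A read is served by the local copy when there is one, and otherwise by any copy at a site that is currently up.

```python
# Hypothetical placement, following Fig. 16.3: each site stores its own
# fragment plus a replica of the other site's fragment.
REPLICAS = {
    "EMP_NY": ["New York", "London"],
    "EMP_LONDON": ["London", "New York"],
}


def choose_site(fragment, local_site, up_sites):
    """Pick a site to read a fragment from: local if possible, else any live copy."""
    candidates = [site for site in REPLICAS[fragment] if site in up_sites]
    if not candidates:
        raise RuntimeError(f"no available copy of {fragment}")
    return local_site if local_site in candidates else candidates[0]


# New York is down, yet a London user can still read the New York fragment.
print(choose_site("EMP_NY", "London", {"London"}))
```

The user's request names only the fragment (or, with fragmentation independence, only the relation); which copy answers is decided by the system, so replicas can be created or dropped without changing user programs.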



The Twelve Objectives (cont.)
7. Distributed Query Processing
• A single query may need data stored at several sites, and a naive
execution cannot satisfy the request transparently with acceptable
performance. Query optimization is therefore crucial and is performed
transparently by the DDBMS.
8. Distributed Transaction Management
• A transaction can update data at several sites transparently;
recovery and concurrency control are achieved by using
agents (the processes that act on behalf of the transaction at each site).
9. Hardware Independence: It should be possible for the DDBMS
to run on different hardware platforms, such as IBM, DEC, HP,
PC, ...



The Twelve Objectives (cont.)
10. Operating System Independence: It should be possible for
the DDBMS to run on different operating system platforms, such as
VMS, UNIX, ...
11. Network Independence: The DDBMS should be able to run
on different network platforms, such as BITNET, INTERNET,
ARPANET, ...
12. DBMS Independence: Relational, hierarchical, network, ...
• The system should support database products from different vendors.
• The distributed system may be heterogeneous.



Problems of Distributed Database Systems
Basic point: networks are slow!

Overriding objective: minimize the number and volume of messages.

This gives rise to problems in the following areas:

• Query Processing
• Update Propagation
• Concurrency
• Recovery
• Catalog Management
• ...
Query Processing: Example
• Query Optimization is more important in a distributed
system.
• Example (Date, Vol.2 p.303)
• Database:
S ( S#, CITY )     10,000 tuples, stored at site A.
P ( P#, COLOR )    100,000 tuples, stored at site B.
SP ( S#, P# )      1,000,000 tuples, stored at site A.

Assume each tuple is 100 bits long.

Site A: S, SP      Site B: P



Query Processing: Example (cont.)
• Query: "Select S# for London suppliers of Red Parts"
SELECT S.S#
FROM S, P, SP
WHERE S.CITY = 'London'
AND S.S# = SP.S#
AND SP.P# = P.P#
AND P.COLOR = 'Red'
• Estimates
# of Red Parts = 10
# of Shipments by London Supplier = 100,000
• Communication Assumption :
Data Rate = 10,000 bits per second
Access Delay = 1 second
• T[i] = total communication time for strategy i
= total access delay + total data volume / data rate
= (# of messages * 1 sec) + (total # of bits / 10,000 ) sec.



Query Processing: Example (cont.)
• Strategy 1
1. Join S and SP at site A.
2. Select tuples from ( S JOIN SP ) for which city is 'London'
( 100,000 tuples ).
3. For each of those tuples, check site B to see if the part is
red. (2 messages: 1 query, 1 response)
T[1] = ( 100,000 * 2 ) * 1 = 200,000 sec = 2.3 days
• Strategy 2
Move relations S and SP to site B and process the query at B.
T[2] = 2 + (10,000+1,000,000)*100/10,000 = 10,102 sec = 2.8 hours
• Strategy 3
Move relation P to site A and process the query at A.
T[3] = 1 + (100,000*100)/10,000 = 1,001 sec = 16.7 min



Query Processing: Example (cont.)
• Strategy 4
1. Select tuples from P where color is red. (10 tuples)
2. For each of those tuples, check site A to see if there exists a shipment
relating the part to a London supplier. ( 2*10 messages )
T[4] = 2*10*1 = 20 sec
• Strategy 5
1. Select tuples from P where color is red. (10 tuples)
2. Move the result to site A and complete the processing at A.
T[5] = 1 + ( 10*100 ) / 10,000 = 1.1 sec

• Note: Each of the five strategies represents a plausible
solution, but the variation in communication time is
enormous.
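
The arithmetic for the five strategies can be checked with a short Python sketch. It applies the formula T[i] = (# of messages * access delay) + (total bits / data rate) with the stated assumptions of 10,000 bits per second, 1 second per message, and 100-bit tuples; the `comm_time` helper and the constant names are introduced here for illustration only.

```python
DATA_RATE = 10_000    # bits per second, as assumed in the example
ACCESS_DELAY = 1      # seconds per message, as assumed in the example
TUPLE_BITS = 100      # every tuple is assumed to be 100 bits long


def comm_time(messages, bits):
    """Total communication time in seconds: access delay plus transfer time."""
    return messages * ACCESS_DELAY + bits / DATA_RATE


T = {
    1: comm_time(2 * 100_000, 0),                         # one query/response pair per tuple
    2: comm_time(2, (10_000 + 1_000_000) * TUPLE_BITS),   # ship S and SP to site B
    3: comm_time(1, 100_000 * TUPLE_BITS),                # ship P to site A
    4: comm_time(2 * 10, 0),                              # one query/response pair per red part
    5: comm_time(1, 10 * TUPLE_BITS),                     # ship the 10 red parts to site A
}

for i, seconds in T.items():
    print(f"T[{i}] = {seconds:,.1f} sec")
```

Strategies 1 and 4 are dominated by the per-message access delay, while strategies 2 and 3 are dominated by data volume; strategy 5 keeps both small.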
Query Processing: Semijoin
• Semijoin: A ⋉ B (used in SDD-1) Ref. p.529 [18.15], p.626 [21.26]

<e.g.>
• Database :
S: 1,000 tuples, at site A
SP: 2,000 tuples, at site B
# of tuples in S where S.S#=SP.S#: 100
length of an S tuple: 100 bits
length of an SP tuple: 100 bits
length of the S# field: 10 bits

• Regular Join:
<1> Ship S to site B ( 1000 * 100 bits )
<2> Join S and SP at site B
communication time = 1 + 1000*100/10000 = 11 sec
Query Processing: Semijoin (cont.)
• Semijoin
<1> site B: step 1. Project SP on S# (get SP')
            step 2. Ship SP' to site A
<2> site A: step 3. Join SP' with S on S#
            step 4. Ship the result, S', to site B
<3> site B: step 5. Join S' with SP
communication time = 1+10*2000/10000+1+100*100/10000
= 1+2+1+1 = 5 sec

[Figure: SP' (at most 2,000 S# values of 10 bits each) is shipped from site B to site A; S' (100 tuples of 100 bits each) is shipped from site A to site B. S has 1,000 tuples and SP has 2,000 tuples.]
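
The five semijoin steps and the two cost figures can be reproduced with a small Python sketch. The cost part uses the numbers from the example (10,000 bits per second, 1 second per message); the toy relations `S` and `SP` and the `comm_time` helper are made up for illustration and are not part of the original example.

```python
DATA_RATE = 10_000    # bits per second
ACCESS_DELAY = 1      # seconds per message


def comm_time(messages, bits):
    """Communication time in seconds for a number of messages and a bit volume."""
    return messages * ACCESS_DELAY + bits / DATA_RATE


# Cost comparison with the sizes given in the example.
regular = comm_time(1, 1_000 * 100)                            # ship all of S to site B
semi = comm_time(1, 2_000 * 10) + comm_time(1, 100 * 100)      # ship SP', then ship S'

# The same steps on toy data (hypothetical rows).
S = [("S1", "London"), ("S2", "Paris"), ("S3", "Rome")]        # held at site A
SP = [("S1", "P1"), ("S1", "P2"), ("S3", "P1")]                # held at site B

sp_proj = {s_no for s_no, _ in SP}                   # step 1: project SP on S# (SP')
# step 2: SP' is shipped to site A
s_reduced = [row for row in S if row[0] in sp_proj]  # step 3: join SP' with S (S')
# step 4: S' is shipped to site B
joined = [(s_no, city, p_no)                         # step 5: join S' with SP
          for s_no, city in s_reduced
          for sp_s_no, p_no in SP
          if sp_s_no == s_no]

print(regular, semi)
print(joined)
```

The saving comes from shipping only the narrow S# column one way and only the matching S tuples the other way, instead of the whole of S.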



Update Propagation
▪ Basic problem with data replication
• An update to any given logical data object must be propagated to all
stored copies of that object.
• Some sites may be unavailable (because of site or network failure) at the
time of the update.
=> Data is less available!
▪ A possible solution: Primary Copy (used in distributed INGRES)
• One copy of each object is designated as the primary copy.
• Primary copies of different objects will generally be at different sites.
• Update Operation
1. The update is deemed complete as soon as the primary copy has been updated.
2. Control is returned and the transaction can continue executing.
3. The site holding the primary copy then broadcasts the update to all other sites.
• Further problem: violation of the local autonomy objective (a site may be
unable to update its own local copy when the remote primary copy is unavailable).
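
The primary copy scheme can be sketched as a small Python class. This is a simplified illustration under stated assumptions, not the distributed INGRES implementation: the `PrimaryCopyObject` class, its `pending` queues, and the `deliver` method are hypothetical, and site failures are modelled simply as a set of sites that are currently up.

```python
class PrimaryCopyObject:
    """One logical object with a primary copy and replicas at other sites."""

    def __init__(self, primary_site, other_sites, value=None):
        self.primary_site = primary_site
        self.copies = {site: value for site in [primary_site, *other_sites]}
        # Updates broadcast by the primary site but not yet applied at a replica.
        self.pending = {site: [] for site in other_sites}

    def update(self, value, up_sites):
        """Steps 1-3: update the primary copy, return, then propagate."""
        if self.primary_site not in up_sites:
            # The update cannot proceed without the primary copy.
            raise RuntimeError("primary copy unavailable; update rejected")
        self.copies[self.primary_site] = value    # step 1: update is now complete
        for site in self.pending:                 # step 3: broadcast to replicas
            self.pending[site].append(value)
        self.deliver(up_sites)

    def deliver(self, up_sites):
        """Apply queued updates at every replica site that is currently up."""
        for site, queue in self.pending.items():
            if site in up_sites and queue:
                self.copies[site] = queue[-1]     # the latest queued value wins
                queue.clear()


salary = PrimaryCopyObject("London", ["New York"], value=45)
salary.update(50, up_sites={"London"})            # New York is down
print(salary.copies)
salary.deliver(up_sites={"London", "New York"})   # New York comes back
print(salary.copies)
```

The update succeeds while a replica site is down, which is the availability gain; the rejected update when the primary site is down shows why local autonomy suffers.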