3. Distribution Design
• Introduction
• Background
• Distributed Database Design
• Fragmentation
• Data distribution
• Database Integration
• Semantic Data Control
• Distributed Query Processing
• Multidatabase Query Processing
• Distributed Transaction Management
• Data Replication
• Parallel Database Systems
• Distributed Object DBMS
• Peer-to-Peer Data Management
• Web Data Management
• Current Issues
[Figure: design dimensions. Axis "Level of sharing": data, data + program. Axis "Level of knowledge": complete information.]
• Top-down
• mostly in designing systems from scratch
• Bottom-up
• when the databases already exist at a number of sites
[Figure: top-down design process. Requirements Analysis → Objectives; Conceptual Design and View Design (with View Integration and User Input) → GCS, Access Information, ES's; Distribution Design (with User Input) → LCS's; Physical Design → LIS's.]
Department of Computer Science &Engineering, National Institute of Technology Karnataka, Surathkal 5 of 65
9-Sep-19
Distribution Design Issues
How to fragment?
How to allocate?
Information requirements?
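[Figure: fragmentation alternatives. Horizontal: PROJ1 and PROJ2 hold subsets of the tuples. Vertical: PROJ1(PNO, BUDGET) and PROJ2(PNO, PNAME, LOC).]
[Figure: degree of fragmentation, ranging from tuples or attributes up to whole relations.]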
• Completeness
• Decomposition of relation R into fragments R1, R2, ..., Rn is
complete if and only if each data item in R can also be found
in some Ri
• Reconstruction
• If relation R is decomposed into fragments R1, R2, ..., Rn, then
there should exist some relational operator ∇ such that
$R = \nabla_{1 \le i \le n} R_i$
• Disjointness
• If relation R is decomposed into fragments R1, R2, ..., Rn, and
data item di is in Rj, then di should not be in any other
fragment Rk (k ≠ j ).
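The three correctness rules can be checked mechanically. The sketch below is a minimal illustration for horizontal fragments only, assuming relations are modelled as Python sets of tuples and that the reconstruction operator ∇ is set union; the function names and the sample tuples are hypothetical.

```python
# Relations are modelled as sets of tuples, fragments as a list of such sets.
# All names and sample data here are illustrative assumptions.

def is_complete(relation, fragments):
    """Completeness: every tuple of R appears in some fragment Ri."""
    return all(any(t in f for f in fragments) for t in relation)

def reconstructs(relation, fragments):
    """Reconstruction: for horizontal fragments the operator is set union."""
    return set().union(*fragments) == relation

def is_disjoint(fragments):
    """Disjointness: no tuple appears in more than one fragment."""
    seen = set()
    for f in fragments:
        if seen & f:
            return False
        seen |= f
    return True

R = {("P1", 150000), ("P2", 135000), ("P3", 250000)}
good = [{("P1", 150000), ("P2", 135000)}, {("P3", 250000)}]
overlapping = [{("P1", 150000), ("P2", 135000)},
               {("P2", 135000), ("P3", 250000)}]

print(is_complete(R, good), reconstructs(R, good), is_disjoint(good))
print(is_disjoint(overlapping))
```

For vertical fragments the same rules apply, but the reconstruction operator would be a join on the key rather than a union.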
• Non-replicated
• partitioned: each fragment resides at only one site
• Replicated
• fully replicated: each fragment at each site
• partially replicated: each fragment at some of the sites
• Rule of thumb:
If (read-only queries) / (update queries) ≥ 1, replication is advantageous;
otherwise replication may cause problems.
Comparison of replication alternatives:

                      Full replication   Partial replication   Partitioning
CONCURRENCY CONTROL   Moderate           Difficult             Easy
REALITY               Possible           Realistic             Possible
                      application                              application
Information Requirements
•Four categories:
• Database information
• Application information
• Communication network information
• Computer system information
[Figure: database relations and links. SKILL(TITLE, SAL) is connected by link L1 to EMP(ENO, ENAME, TITLE); the other relations are PROJ(PNO, PNAME, BUDGET, LOC) and ASG(ENO, PNO, RESP, DUR).]
Example
m1: PNAME = "Maintenance" ∧ BUDGET ≤ 200000
Definition:
$R_j = \sigma_{F_j}(R), \quad 1 \le j \le w$
where Fj is a selection formula, which is (preferably) a minterm
predicate.
Therefore,
A horizontal fragment Ri of relation R consists of all the tuples of R
which satisfy a minterm predicate mi.
Given a set of minterm predicates M, there are as many horizontal
fragments of relation R as there are minterm predicates.
The set of horizontal fragments is also referred to as the set of minterm fragments.
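As a rough illustration of how minterm fragments arise, the sketch below enumerates every minterm over a small set of simple predicates (each predicate taken either as is or negated) and collects the tuples satisfying it. The tuples P3 and P4, the choice of two predicates, and all function names are assumptions made for the example.

```python
from itertools import product

# PROJ tuples (PNO, PNAME, BUDGET, LOC). P3 and P4 are made up for illustration.
PROJ = [
    ("P1", "Instrumentation", 150000, "Montreal"),
    ("P2", "Database Develop.", 135000, "New York"),
    ("P3", "CAD/CAM", 250000, "New York"),
    ("P4", "Maintenance", 310000, "Paris"),
]

# Simple predicates as (label, test) pairs; a deliberately small Pr.
simple_predicates = [
    ('LOC="Montreal"', lambda t: t[3] == "Montreal"),
    ("BUDGET<=200000", lambda t: t[2] <= 200000),
]

def minterm_fragments(relation, predicates):
    """Map each of the 2^n minterm labels to the tuples that satisfy it."""
    fragments = {}
    for signs in product([True, False], repeat=len(predicates)):
        label = " AND ".join(
            name if keep else f"NOT({name})"
            for (name, _), keep in zip(predicates, signs)
        )
        fragments[label] = [
            t for t in relation
            if all(test(t) == keep
                   for (_, test), keep in zip(predicates, signs))
        ]
    return fragments

fragments = minterm_fragments(PROJ, simple_predicates)
for label, rows in fragments.items():
    print(label, [t[0] for t in rows])
```

Some minterms select no tuples at all; in practice such empty or contradictory minterms would be dropped before fragments are defined.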
Preliminaries :
• Pr should be complete
• Pr should be minimal
• Example :
• Assume PROJ[PNO,PNAME,BUDGET,LOC] has two
applications defined on it.
• Find the budgets of projects at each location. (1)
• Find projects with budgets less than $200000. (2)
According to (1),
Pr={LOC=“Montreal”,LOC=“New York”,LOC=“Paris”}
which is not complete with respect to (2).
Modify
Pr ={LOC=“Montreal”,LOC=“New York”,LOC=“Paris”,
BUDGET≤200000,BUDGET>200000}
which is complete.
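A simple predicate is relevant if the two fragments fi and fj that it produces are accessed differently, i.e.
$\frac{acc(m_i)}{card(f_i)} \ne \frac{acc(m_j)}{card(f_j)}$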
Example :
Pr ={LOC=“Montreal”,LOC=“New York”, LOC=“Paris”,
BUDGET≤200000,BUDGET>200000}
Initialization:
• find a pi ∈ Pr such that pi partitions R according to Rule 1
• set Pr' = pi ; Pr ← Pr – {pi} ; F ← {fi}
Iteratively add predicates to Pr' until it is complete
• find a pj ∈ Pr such that pj partitions some fk defined
according to a minterm predicate over Pr' according to Rule 1
• set Pr' = Pr' ∪ {pj}; Pr ← Pr – {pj}; F ← F ∪ {fj}
• if ∃ pk ∈ Pr' which is nonrelevant then
Pr' ← Pr' – {pk}
F ← F – {fk}
[Figure: horizontal fragments PROJ1, PROJ2, PROJ4 and PROJ6 of PROJ. PROJ1 holds (P1, Instrumentation, 150000, Montreal); PROJ2 holds (P2, Database Develop., 135000, New York).]
• Completeness
• Since Pr' is complete and minimal, the selection predicates are
complete
• Reconstruction
• If relation R is fragmented into FR = {R1, R2, …, Rr}
$R = \bigcup_{\forall R_i \in F_R} R_i$
• Disjointness
• Minterm predicates that form the basis of fragmentation
should be mutually exclusive.
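[Figure: relations and links. L1 connects SKILL(TITLE, SAL) to EMP; L2 connects EMP to ASG; L3 connects PROJ to ASG.]
[Figure: fragments EMP1(ENO, ENAME, TITLE) and EMP2(ENO, ENAME, TITLE).]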
• Overlapping fragments
• grouping
• Non-overlapping fragments
• splitting
We do not consider the replicated key attributes to be
overlapping.
Advantage:
Easier to enforce functional dependencies
(for integrity checking etc.)
A1 A2 A3 A4
q1 1 0 1 0
q2 0 1 1 0
q3 0 1 0 1
q4 0 0 1 1
$\text{query access} = \sum_{\text{all sites}} (\text{access frequency of a query}) \times \frac{\text{access}}{\text{execution}}$
Access frequencies of q4 at the sites: 3, 0, 0
Then
aff(A1, A3) = 15*1 + 20*1 + 10*1 = 45
and the attribute affinity matrix AA is

      A1   A2   A3   A4
A1    45    0   45    0
A2     0   80    5   75
A3    45    5   53    3
A4     0   75    3   78
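The affinity matrix can be recomputed from the attribute usage matrix and the per-query access totals. This is a minimal sketch, assuming one access per execution; the total for q1 (15 + 20 + 10) and for q4 (3 + 0 + 0) come from the slides, while the totals used for q2 and q3 are assumptions inferred from the AA entries 5 and 75. The function name is hypothetical.

```python
# use[q][j] == 1 if query q(q+1) uses attribute A(j+1), as in the usage matrix.
use = [
    [1, 0, 1, 0],  # q1
    [0, 1, 1, 0],  # q2
    [0, 1, 0, 1],  # q3
    [0, 0, 1, 1],  # q4
]

# Total access frequency of each query summed over all sites.
# q2 = 5 and q3 = 75 are ASSUMED, inferred from the AA matrix above.
acc = [45, 5, 75, 3]

def affinity_matrix(use, acc):
    """aff(Ai, Aj) = sum of acc(q) over all queries q that use both Ai and Aj."""
    n = len(use[0])
    aa = [[0] * n for _ in range(n)]
    for row, freq in zip(use, acc):
        for i in range(n):
            for j in range(n):
                if row[i] and row[j]:
                    aa[i][j] += freq
    return aa

AA = affinity_matrix(use, acc)
for row in AA:
    print(row)
```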
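$cont(A_i, A_k, A_j) = 2\,bond(A_i, A_k) + 2\,bond(A_k, A_j) - 2\,bond(A_i, A_j)$
where
$bond(A_x, A_y) = \sum_{z=1}^{n} aff(A_z, A_x)\, aff(A_z, A_y)$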
Ordering (0-3-1) :
cont(A0,A3,A1) = 2bond(A0 , A3)+2bond(A3 , A1)–2bond(A0 , A1)
= 2* 0 + 2* 4410 – 2*0 = 8820
Ordering (1-3-2) :
cont(A1,A3,A2) = 2bond(A1 , A3)+2bond(A3 , A2)–2bond(A1,A2)
= 2* 4410 + 2* 890 – 2*225 = 10150
Ordering (2-3-4) :
cont (A2,A3,A4) = 1780
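The three contributions above can be reproduced directly from the AA matrix. In this sketch the boundary positions (A0 on the left, and the still-empty column to the right of A2) are represented by `None` and given a bond of 0, which is how the ordering (2-3-4) yields 1780; the function names are hypothetical.

```python
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]

def bond(aa, x, y):
    """bond(Ax, Ay) = sum over z of aff(Az, Ax) * aff(Az, Ay).

    None stands for an empty boundary column, whose bond is 0.
    """
    if x is None or y is None:
        return 0
    return sum(row[x] * row[y] for row in aa)

def cont(aa, left, mid, right):
    """Net contribution of placing column `mid` between `left` and `right`."""
    return (2 * bond(aa, left, mid)
            + 2 * bond(aa, mid, right)
            - 2 * bond(aa, left, right))

A1, A2, A3 = 0, 1, 2            # zero-based column indices
placed = [A1, A2]               # columns already in the CA matrix
slots = [None] + placed + [None]

# Try A3 in every gap: (0-3-1), (1-3-2), (2-3-4).
contributions = [cont(AA, slots[i], A3, slots[i + 1])
                 for i in range(len(slots) - 1)]
print(contributions)  # [8820, 10150, 1780]
```

The largest value belongs to the ordering (1-3-2), so A3 is placed between A1 and A2.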
BEA – Example
• Therefore, the CA matrix has the form

      A1   A3   A2
A1    45   45    0
A2     0    5   80
A3    45   53    5
A4     0    3   75
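[Figure: clustered affinity matrix over attributes A1, A2, A3, …, Ai, Ai+1, …, Am, split along the diagonal into a top block TA (A1 … Ai) and a bottom block BA (Ai+1 … Am).]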
Define
TQ = set of applications that access only TA
BQ = set of applications that access only BA
OQ = set of applications that access both TA and BA
and
CTQ = total number of accesses to attributes by applications
that access only TA
CBQ = total number of accesses to attributes by applications
that access only BA
COQ = total number of accesses to attributes by applications
that access both TA and BA
Then find the point along the diagonal that maximizes
$CTQ \cdot CBQ - COQ^2$
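A simple way to see the objective at work is to try every split point along a given attribute ordering. The sketch below assumes the final clustered order A1, A3, A2, A4 (the slides show the CA matrix only up to A2, so placing A4 last is an assumption) and reuses the usage matrix with assumed per-query totals 45, 5, 75, 3. It only tries contiguous splits and does not rotate the ordering; the function name is hypothetical.

```python
use = [
    [1, 0, 1, 0],  # q1
    [0, 1, 1, 0],  # q2
    [0, 1, 0, 1],  # q3
    [0, 0, 1, 1],  # q4
]
acc = [45, 5, 75, 3]   # totals for q2 and q3 are assumed
order = [0, 2, 1, 3]   # assumed clustered order: A1, A3, A2, A4

def best_split(order, use, acc):
    """Return (z, TA, BA) for the split that maximises CTQ*CBQ - COQ^2."""
    best = None
    for cut in range(1, len(order)):
        ta, ba = set(order[:cut]), set(order[cut:])
        ctq = cbq = coq = 0
        for row, freq in zip(use, acc):
            used = {j for j, u in enumerate(row) if u}
            if used <= ta:        # query touches only TA
                ctq += freq
            elif used <= ba:      # query touches only BA
                cbq += freq
            else:                 # query touches both TA and BA
                coq += freq
        z = ctq * cbq - coq ** 2
        if best is None or z > best[0]:
            best = (z, order[:cut], order[cut:])
    return best

z, TA, BA = best_split(order, use, acc)
print(z, TA, BA)  # 3311 [0, 2] [1, 3]
```

Under these assumptions the best split groups {A1, A3} and {A2, A4}; the key attribute would then be added to each vertical fragment.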
• Disjointness
• TID's are not considered to be overlapping since they are
maintained by the system
• Duplicated keys are not considered to be overlapping
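[Figure: hybrid fragmentation tree. Horizontal fragmentation (HF) produces R1 and R2, which are then vertically fragmented (VF).]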
Decision Variable
• Total Cost
• Access cost
min over all sites (cost of retrieval command …)
• Constraints
• Response Time
execution time of query ≤ max. allowable response time for that
query
• Storage Constraint (for a site)
• Solution Methods
• FAP is NP-complete
• DAP is also NP-complete
• Heuristics based on
• single commodity warehouse location (for FAP)
• knapsack problem
• branch and bound techniques
• network flow