Chapter 2 - 9-15DDB Architecture
Architecture
Design of DDBMS 1
Network Transparency
• The user should be protected from the
operational details of the network.
• It is desirable to hide even the existence
of the network, if possible.
Location transparency: The command used is
independent of the system on which the data is
stored.
Naming transparency: a unique name is
provided for each object in the database.
Replication & Fragmentation Transparency
• The user is unaware of the replication of fragments.
• Queries are specified on the relations (rather than
the fragments).
[Figure: relation R is split into fragments R1, R2, R3 and R4; Site A holds copy 1 of R1 and copy 1 of R2, Site B holds copy 2 of R1, and Site C holds copy 2 of R2.]
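To make replication and fragmentation transparency concrete, here is a minimal Python sketch of a hypothetical global catalog that maps a relation name to its fragments and to the sites holding each copy. A query written against R is resolved into one read per fragment, so the user never names a fragment or a copy. The placement of R1 and R2 follows the figure. The sites given for R3 and R4, the name `plan_read`, and the "prefer a local copy" rule are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical global catalog: relation -> fragment -> sites holding a copy.
# R1 and R2 follow the figure; the sites of R3 and R4 are assumed.
CATALOG: Dict[str, Dict[str, List[str]]] = {
    "R": {
        "R1": ["Site A", "Site B"],
        "R2": ["Site A", "Site C"],
        "R3": ["Site B"],  # assumption: not shown in the figure
        "R4": ["Site C"],  # assumption: not shown in the figure
    }
}

def plan_read(relation: str, local_site: str) -> Dict[str, str]:
    """Resolve a query on `relation` into one copy of each fragment.

    The user names only the relation; fragments and replicas are looked up
    in the catalog. A copy at the local site is used when it exists,
    otherwise the first listed copy is chosen.
    """
    plan = {}
    for fragment, sites in CATALOG[relation].items():
        plan[fragment] = local_site if local_site in sites else sites[0]
    return plan

print(plan_read("R", "Site C"))
# {'R1': 'Site A', 'R2': 'Site C', 'R3': 'Site B', 'R4': 'Site C'}
```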
ANSI/SPARC Architecture
[Figure: the conceptual view is defined by the conceptual schema, and the internal view is defined by the internal schema.]
Internal view: deals with the physical definition and organization of data.
Conceptual view: abstract definition of the database. It is the “real
world” view of the enterprise being modeled in the database.
External view: individual user’s view of the database.
A Taxonomy of Distributed Data Systems
A distributed database can be defined as
• a logically integrated collection of shared data
which is
• physically distributed across the nodes of a computer network.
Distributed data systems
• Homogeneous
• Heterogeneous (Multidatabase)
  – Federated
  – Unfederated (no local users)
Fragmentation Schema & Allocation Schema
Homogeneous vs. Heterogeneous
[Figure: global and local users, with a multidatabase management system over four DBMSs that manage Database 1 to Database 4.]
• Homogeneous DDBMS
  – No local users
  – Most systems do not have local schemas
• Heterogeneous DDBMS
  – There are both local and global users
  – Multidatabase systems are split into:
    • Tightly Coupled Systems: have a global schema
    • Loosely Coupled Systems: do not have a global schema
Schema Architecture of a Tightly-Coupled System
An individual node’s participation in the MDB is defined by means of a participation schema.
[Figure: global user views 1 to n are defined over the Global Conceptual Schema. Below it are local conceptual schemas 1 to n, each with its own local internal schema, and local users work through local user views.]
Distribution Design
Level of sharing
Access Pattern
1. Static: Access patterns do not change.
2. Dynamic: Access patterns change over
time.
Level of Knowledge
1. No information
2. Partial information: Access patterns may
deviate from the predictions.
3. Complete information: Access patterns
can reasonably be predicted.
Fragmentation Alternatives
Relation J:
JNO  JNAME          BUDGET   LOC
J1   Instrumental   150,000  Montreal
J2   Database Dev.  135,000  New York
J3   CAD/CAM        250,000  New York
J4   Maintenance    350,000  Paris
Reasons:
• Interquery concurrency
• Intraquery concurrency
Disadvantages:
• Vertical fragmentation may incur overhead, because a query that needs attributes from several vertical fragments must join them.
• Attributes participating in a dependency may be allocated to different sites, so integrity checking is more costly.
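The two basic alternatives can be sketched in Python on relation J. Horizontal fragmentation is a selection on BUDGET, and vertical fragmentation is a projection that keeps the key JNO. The 200,000 cut-off and the choice of vertical fragments are illustrative assumptions, not part of this slide. The final join shows where the vertical-fragmentation overhead comes from.

```python
# Relation J as in the table above: (JNO, JNAME, BUDGET, LOC)
ATTRS = ("JNO", "JNAME", "BUDGET", "LOC")
J = [
    ("J1", "Instrumental", 150_000, "Montreal"),
    ("J2", "Database Dev.", 135_000, "New York"),
    ("J3", "CAD/CAM", 250_000, "New York"),
    ("J4", "Maintenance", 350_000, "Paris"),
]

def horizontal(rel, predicate):
    """Horizontal fragment: the whole tuples that satisfy a predicate (a selection)."""
    return [t for t in rel if predicate(t)]

def vertical(rel, attrs):
    """Vertical fragment: a projection that always keeps the key JNO,
    so that the fragments can be joined back together."""
    cols = ["JNO"] + [a for a in attrs if a != "JNO"]
    idx = [ATTRS.index(a) for a in cols]
    return [tuple(t[i] for i in idx) for t in rel]

# Horizontal fragmentation on BUDGET (assumed cut-off)
J_low = horizontal(J, lambda t: t[2] <= 200_000)
J_high = horizontal(J, lambda t: t[2] > 200_000)

# Vertical fragmentation into (JNO, BUDGET) and (JNO, JNAME, LOC)
J_budget = vertical(J, ["BUDGET"])
J_rest = vertical(J, ["JNAME", "LOC"])

# A query that needs JNAME and BUDGET together has to join the two
# vertical fragments on JNO. This join is the overhead.
budget_by_jno = dict(J_budget)
names_and_budgets = [(jname, budget_by_jno[jno]) for jno, jname, _loc in J_rest]
print(names_and_budgets)
```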
Degree of Fragmentation
• Application views are usually subsets of relations. Hence, it is only natural to consider subsets of relations as distribution units.
Correctness Rules
• Vertical Partitioning
  – Lossless decomposition
  – Dependency preservation
• Horizontal Partitioning
  – Disjoint fragments
Allocation Alternatives
• Partitioning: No replication
• Partial Replication: Some fragments are replicated
• Full Replication: Database exists in its entirety at each site
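The horizontal rule, together with the requirement that no tuple is lost or invented, can be checked mechanically. This is a minimal Python sketch that assumes each relation is a duplicate-free list of tuples. The helper name `check_horizontal` is hypothetical, and the (JNO, BUDGET) pairs come from relation J.

```python
from itertools import combinations

def check_horizontal(relation, fragments):
    """Check a horizontal fragmentation of `relation` (a list of tuples).

    Returns (reconstructible, disjoint):
      reconstructible: the union of the fragments equals the relation,
                       so no tuple is lost and none is added;
      disjoint:        no tuple belongs to two fragments.
    """
    union = set().union(*(set(f) for f in fragments))
    reconstructible = union == set(relation)
    disjoint = all(not set(a) & set(b) for a, b in combinations(fragments, 2))
    return reconstructible, disjoint

# (JNO, BUDGET) pairs from relation J
J = [("J1", 150_000), ("J2", 135_000), ("J3", 250_000), ("J4", 350_000)]
low = [t for t in J if t[1] <= 200_000]
high = [t for t in J if t[1] > 200_000]
mid_up = [t for t in J if t[1] >= 150_000]

print(check_horizontal(J, [low, high]))    # (True, True): a correct partition
print(check_horizontal(J, [low, mid_up]))  # (True, False): J1 is in both fragments
```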
Notations
[Table: relation S(TITLE, SAL) with rows L1, L2, L3.]
Example:
P4: TITLE = “Programmer”
P5: SAL ≤ 35,000
P6: SAL > 35,000
A minterm predicate (a conjunction of simple predicates, each taken in its natural or negated form) defines a data fragment.
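Each minterm combines every simple predicate in either its natural or its negated form, and each minterm that selects some tuples yields one fragment. The Python sketch below enumerates the minterms over P4, P5 and P6 and keeps the non-empty fragments. The (TITLE, SAL) tuples are hypothetical, because the slide's table values are not recoverable. Since P5 and P6 are complements, any minterm that asserts both of them, or negates both, is contradictory and selects nothing.

```python
from itertools import product

# Simple predicates P4 to P6 from the slide, over tuples (TITLE, SAL)
SIMPLE = {
    "P4": lambda t: t[0] == "Programmer",
    "P5": lambda t: t[1] <= 35_000,
    "P6": lambda t: t[1] > 35_000,
}

def minterm_fragments(relation, simple):
    """Enumerate every minterm (each simple predicate as-is or negated)
    and return {minterm label: fragment} for minterms that select tuples."""
    names = list(simple)
    fragments = {}
    for signs in product([True, False], repeat=len(names)):
        label = " ∧ ".join(n if s else "¬" + n for n, s in zip(names, signs))
        frag = [t for t in relation
                if all(simple[n](t) == s for n, s in zip(names, signs))]
        if frag:  # minterms selecting no tuple (e.g. P5 ∧ P6) are dropped
            fragments[label] = frag
    return fragments

# Hypothetical tuples (TITLE, SAL)
S = [("Programmer", 30_000), ("Programmer", 40_000),
     ("Analyst", 50_000), ("Analyst", 34_000)]

for label, frag in minterm_fragments(S, SIMPLE).items():
    print(label, frag)
```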
Primary Horizontal Fragmentation
[Figure: the set of simple predicates is incomplete.]
Completeness (2)
[Figure: simple predicates (A1 ≥ k1, A2 = k2, A3 ≤ k3, A4 = k4) are mapped to minterm fragments F1, F2 and F3 and to the applications A1 to A4 that access them. An additional simple predicate A5 > k5 splits F3 into F31 and F32.]
Completeness (4)
A set of simple predicates Pr is said to be complete if and only
if there is an equal probability of access by every application
to any two tuples belonging to any minterm fragment that is
defined according to Pr.
$J_1 = \sigma_{LOC = \text{“Montreal”}}(J)$
$J_2 = \sigma_{LOC = \text{“New York”}}(J)$
Case 1: The only application that accesses J wants to access the tuples according to the location.
[Figure: J is fragmented by LOC (e.g. LOC = “Montreal”, LOC = “Orlando”), and each location fragment is split further by BUDGET > 200,000 and BUDGET <= 200,000 into fragments such as J11, J12, J22, J31 and J32.]
Redundant Fragmentation
[Figure: Fragment 1 and Fragment 2, each a logically uniform and statistically homogeneous fragment.]
$m_i = p_1 \wedge p_2 \wedge p_3$ defines fragment $f_i$
$m_j = p_1 \wedge \neg p_2 \wedge p_3$ defines fragment $f_j$
$p_2$ is relevant if and only if
$$\frac{acc(m_i)}{card(f_i)} \neq \frac{acc(m_j)}{card(f_j)}$$
where $acc$ is the access frequency of a minterm and $card$ is the cardinality of a fragment.
That is, there should be at least one application that accesses $f_i$ and $f_j$ differently. In other words, each simple predicate $p_i$ should be relevant in determining a fragmentation.
• Minimal: If all the predicates of a set Pr are relevant, Pr is minimal.
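The relevance test compares the access rate per tuple of the two fragments that differ only in p. This is a minimal Python sketch with hypothetical access counts and cardinalities. The names `is_relevant` and `is_minimal` are illustrative, both fragments are assumed to be non-empty, and floating-point rates are compared with `math.isclose`.

```python
import math

def is_relevant(acc_i, card_i, acc_j, card_j):
    """p is relevant when m_i (containing p) and m_j (containing ¬p) are
    accessed at different rates per tuple:
        acc(m_i) / card(f_i) != acc(m_j) / card(f_j)
    Both fragments are assumed non-empty (card > 0)."""
    return not math.isclose(acc_i / card_i, acc_j / card_j)

def is_minimal(relevance):
    """Pr is minimal if every one of its simple predicates is relevant."""
    return all(relevance.values())

# Hypothetical statistics for two fragments f_i and f_j
print(is_relevant(acc_i=40, card_i=20, acc_j=10, card_j=5))  # 2 vs 2 per tuple: False
print(is_relevant(acc_i=40, card_i=20, acc_j=30, card_j=5))  # 2 vs 6 per tuple: True
print(is_minimal({"p1": True, "p2": True, "p3": False}))     # p3 irrelevant: False
```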
A Complete and Minimal Example
Two applications:
1. One application accesses the tuples according
to location.
2. Another application accesses only those project
tuples where the budget is less than $200,000.
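[Figure: J is split by LOC (LOC = “Montreal”, LOC = “New York”, ...) and then by BUDGET > 200,000 / BUDGET <= 200,000, giving fragments J11, J12, ..., J32; these predicates are marked relevant. Splitting J12 further into J121 and J122 by JNAME ≠ “Instrument” is marked irrelevant.]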
Application Information
• Qualification Information
– The fundamental qualification information consists of the
predicates used in user queries (i.e., “where” clauses in SQL).
– 80/20 rule: 20% of user queries account for 80% of the total
data access.
One should investigate the more important queries.
• Quantitative Information
– Minterm Selectivity sel(mi): number of tuples that would be
accessed by a query specified according to a given minterm
predicate.
– Access Frequency acc(qi): the access frequency of query qi in
a given period.
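Both kinds of quantitative information are simple to compute once the predicates and the query statistics are known. In this minimal Python sketch, `selectivity` counts the tuples of J that a minterm selects. `important_queries` is one possible way to act on the 80/20 observation: it keeps the most frequent queries until they cover 80% of all accesses. That heuristic is an assumption, not a rule stated on the slide, and the access frequencies are hypothetical.

```python
# Relation J: (JNO, JNAME, BUDGET, LOC)
J = [
    ("J1", "Instrumental", 150_000, "Montreal"),
    ("J2", "Database Dev.", 135_000, "New York"),
    ("J3", "CAD/CAM", 250_000, "New York"),
    ("J4", "Maintenance", 350_000, "Paris"),
]

def selectivity(relation, minterm):
    """sel(m): number of tuples accessed by a query specified by minterm m."""
    return sum(1 for t in relation if minterm(t))

def important_queries(acc, share=0.8):
    """Take queries in decreasing order of acc(q) until they cover
    `share` of all accesses in the period (assumed heuristic)."""
    total = sum(acc.values())
    chosen, covered = [], 0
    for q, a in sorted(acc.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= share * total:
            break
        chosen.append(q)
        covered += a
    return chosen

# Minterm LOC = "New York" ∧ BUDGET > 200,000 selects only J3
print(selectivity(J, lambda t: t[3] == "New York" and t[2] > 200_000))  # 1

# Hypothetical access frequencies acc(q_i) for one period
print(important_queries({"q1": 45, "q2": 5, "q3": 75, "q4": 3}))  # ['q3', 'q1']
```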
Benefits of Derived Fragmentation
PAY (TITLE, SAL), with primary fragmentation:
$PAY_1 = \sigma_{SAL \le 30,000}(PAY)$
$PAY_2 = \sigma_{\neg(SAL \le 30,000)}(PAY)$
EMP (ENO, ENAME, TITLE)
Not using derived fragmentation: one can divide EMP into EMP1 and EMP2 based on TITLE and divide PAY into PAY1, PAY2, PAY3 based on SAL. To join EMP and PAY, we have the following scenarios.
...
• for $2 \le k \le n$ in that order.
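With derived fragmentation, each fragment of the member relation is the semijoin of that relation with one fragment of the owner relation, here EMP_i = EMP ⋉ PAY_i on TITLE. This minimal Python sketch uses hypothetical EMP and PAY tuples and checks the benefit: joining EMP with PAY gives the same result as the union of the local joins EMP_i ⋈ PAY_i. Each of those joins can therefore run at the site that holds both fragments.

```python
# Hypothetical sample data
PAY = [("Elect. Eng.", 40_000), ("Syst. Anal.", 34_000),
       ("Mech. Eng.", 27_000), ("Programmer", 24_000)]        # (TITLE, SAL)
EMP = [("E1", "J. Doe", "Elect. Eng."), ("E2", "M. Smith", "Syst. Anal."),
       ("E3", "A. Lee", "Mech. Eng."), ("E4", "J. Miller", "Programmer"),
       ("E5", "B. Casey", "Syst. Anal.")]                      # (ENO, ENAME, TITLE)

# Primary fragmentation of the owner relation PAY on SAL
PAY1 = [p for p in PAY if p[1] <= 30_000]
PAY2 = [p for p in PAY if p[1] > 30_000]

def semijoin(emp, pay):
    """EMP ⋉ PAY_i: the employees whose TITLE appears in the PAY fragment."""
    titles = {title for title, _sal in pay}
    return [e for e in emp if e[2] in titles]

def join(emp, pay):
    """EMP ⋈ PAY on TITLE."""
    return [e + (sal,) for e in emp for title, sal in pay if e[2] == title]

# Derived fragmentation of the member relation EMP
EMP1 = semijoin(EMP, PAY1)
EMP2 = semijoin(EMP, PAY2)

# The global join equals the union of the two local joins, so no EMP
# fragment has to be joined with a PAY fragment stored elsewhere.
local = join(EMP1, PAY1) + join(EMP2, PAY2)
print(sorted(local) == sorted(join(EMP, PAY)))  # True
```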
Derived Fragmentation
Correctness:
Each attribute of R belongs to at least one
fragment.
Each fragment includes either a key of R or a “tuple identifier”.
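The two rules above can be checked on attribute sets alone. This is a minimal Python sketch. `check_vertical` is a hypothetical helper, and the PROJ attributes are borrowed from the attribute usage example later in the chapter.

```python
def check_vertical(attributes, key, fragments):
    """Check the two rules for a vertical fragmentation of R.

    attributes: all attributes of R
    key:        the key (or tuple identifier) of R
    fragments:  list of attribute sets, one per fragment
    Returns (covered, keyed):
      covered: each attribute of R belongs to at least one fragment;
      keyed:   each fragment includes the key.
    """
    covered = set(attributes) <= set().union(*fragments)
    keyed = all(key in f for f in fragments)
    return covered, keyed

PROJ = ["PNO", "PNAME", "BUDGET", "LOC"]
print(check_vertical(PROJ, "PNO", [{"PNO", "BUDGET"}, {"PNO", "PNAME", "LOC"}]))  # (True, True)
print(check_vertical(PROJ, "PNO", [{"BUDGET"}, {"PNO", "PNAME", "LOC"}]))         # (True, False)
print(check_vertical(PROJ, "PNO", [{"PNO", "BUDGET"}, {"PNO", "PNAME"}]))         # (False, True)
```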
Vertical Clustering - Replication
In evaluating the convenience of vertical
clustering, it is important that overlapping
attributes are not heavily updated.
Example: EMP(ENUM,NAME,SAL,TAX,MGRNUM,DNUM)
Attribute Usage Matrix
PROJ (PNO, PNAME, BUDGET, LOC), with A1 = PNO, A2 = PNAME, A3 = BUDGET, A4 = LOC.
use(qi, Aj) = 1 if Aj is referenced by qi, and 0 otherwise.
q1: SELECT BUDGET
    FROM PROJ
    WHERE PNO=Value;
q2: SELECT PNAME, BUDGET
    FROM PROJ;
q3: SELECT PNAME
    FROM PROJ
    WHERE LOC=Value;
q4: SELECT SUM(BUDGET)
    FROM PROJ
    WHERE LOC=Value;
Attribute Usage Matrix:
     A1  A2  A3  A4
q1    1   0   1   0
q2    0   1   1   0
q3    0   1   0   1
q4    0   0   1   1
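The usage matrix follows directly from the attributes each query references. This is a minimal Python sketch. The attribute set for each query was read off the SQL above by hand, and no SQL parsing is attempted.

```python
ATTRS = ["PNO", "PNAME", "BUDGET", "LOC"]  # A1..A4 of PROJ

# Attributes referenced by each query (SELECT list and WHERE clause)
QUERIES = {
    "q1": {"BUDGET", "PNO"},
    "q2": {"PNAME", "BUDGET"},
    "q3": {"PNAME", "LOC"},
    "q4": {"BUDGET", "LOC"},
}

def usage_matrix(queries, attrs):
    """use(q_i, A_j) = 1 if A_j is referenced by q_i, 0 otherwise."""
    return {q: [1 if a in used else 0 for a in attrs] for q, used in queries.items()}

for q, row in usage_matrix(QUERIES, ATTRS).items():
    print(q, row)
```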
Attribute Affinity Measure
$$aff(A_i, A_j) = \sum_{k \,\mid\, use(q_k, A_i) = 1 \,\wedge\, use(q_k, A_j) = 1} \; \sum_{l} ref_l(q_k)\, acc_l(q_k)$$
• The outer sum runs over each query $q_k$ that uses both $A_i$ and $A_j$. The inner sum over all sites $l$ gives the popularity of using $A_i$ and $A_j$ together at all sites.
• $ref_l(q_k)$: number of accesses to attributes $(A_i, A_j)$ for each execution of $q_k$ at site $l$.
• $acc_l(q_k)$: application access frequency of $q_k$ at site $l$.
[Figure: relation R accessed by queries $q_i$ and $q_k$ issued at sites m and n.]
Attribute Affinity Matrix
The attribute affinity matrix AA holds $aff(A_i, A_j)$ for every pair of attributes, computed with the attribute affinity measure, where $acc_l(q_k)$ is the application access frequency of $q_k$ at site $l$.
Attribute Affinity Matrix Example
Attribute Usage Matrix:
     A1  A2  A3  A4
q1    1   0   1   0
q2    0   1   1   0
q3    0   1   0   1
q4    0   0   1   1
Attribute Affinity Matrix (AA):
     A1  A2  A3  A4
A1   45   0  45   0
A2    0  80   5  75
A3   45   5  53   3
A4    0  75   3  78
Clustered Affinity Matrix (CA), initialized with columns A1 and A2 of AA:
     A1  A2
A1   45   0
A2    0  80
A3   45   5
A4    0  75
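The AA matrix can be reproduced from the usage matrix and the access frequencies. This is a minimal Python sketch. The per-site frequencies at three sites S1, S2 and S3 are assumed values, chosen so that the per-query totals (45, 5, 75 and 3) match what the AA matrix above implies when $ref_l(q_k) = 1$ everywhere. A single constant `ref` stands in for $ref_l(q_k)$.

```python
# Attribute usage matrix from the example: rows q1..q4, columns A1..A4
USE = {
    "q1": [1, 0, 1, 0],
    "q2": [0, 1, 1, 0],
    "q3": [0, 1, 0, 1],
    "q4": [0, 0, 1, 1],
}

# acc_l(q_k) at sites S1, S2, S3 (assumed values; totals 45, 5, 75, 3)
ACC = {
    "q1": [15, 20, 10],
    "q2": [5, 0, 0],
    "q3": [25, 25, 25],
    "q4": [3, 0, 0],
}

def affinity_matrix(use, acc, ref=1):
    """aff(Ai, Aj) = sum over queries q_k using both Ai and Aj of
    sum over sites l of ref_l(q_k) * acc_l(q_k).
    The constant `ref` replaces ref_l(q_k) (assumption)."""
    n = len(next(iter(use.values())))
    aa = [[0] * n for _ in range(n)]
    for q, row in use.items():
        weight = sum(ref * a for a in acc[q])
        for i in range(n):
            for j in range(n):
                if row[i] and row[j]:
                    aa[i][j] += weight
    return aa

for row in affinity_matrix(USE, ACC):
    print(row)
```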
Clustered Affinity Matrix
Step 2: Determine Location for A3
3 possibilities: A3 can be placed to the left of A1, between A1 and A2, or to the right of A2 in the CA matrix, which so far holds columns A1 and A2 of the AA matrix.
Clustered Affinity Matrix
Step 2: Determine the order for A3
$$bond(A_x, A_y) = \sum_{z=1}^{n} aff(A_z, A_x)\, aff(A_z, A_y)$$
$$cont(A_i, A_k, A_j) = 2\,bond(A_i, A_k) + 2\,bond(A_k, A_j) - 2\,bond(A_i, A_j)$$
Note: $aff(A_0, A_i) = aff(A_i, A_0) = aff(A_5, A_i) = aff(A_i, A_5) = 0$ by definition.
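As a worked instance of these formulas with the AA matrix of the example: CA holds A1 and A2, so A3 can go left of A1, between A1 and A2, or right of A2. Positions outside the placed columns (A0 on the left, and the slot to the right of A2, written A4 here because A4 is not yet placed) contribute zero bonds.

```latex
\begin{aligned}
bond(A_1, A_3) &= 45\cdot 45 + 0\cdot 5 + 45\cdot 53 + 0\cdot 3 = 4410\\
bond(A_3, A_2) &= 45\cdot 0 + 5\cdot 80 + 53\cdot 5 + 3\cdot 75 = 890\\
bond(A_1, A_2) &= 45\cdot 0 + 0\cdot 80 + 45\cdot 5 + 0\cdot 75 = 225\\[4pt]
cont(A_0, A_3, A_1) &= 2\cdot 0 + 2\cdot 4410 - 2\cdot 0 = 8820\\
cont(A_1, A_3, A_2) &= 2\cdot 4410 + 2\cdot 890 - 2\cdot 225 = 10150\\
cont(A_2, A_3, A_4) &= 2\cdot 890 + 2\cdot 0 - 2\cdot 0 = 1780
\end{aligned}
```

The largest contribution is 10150, so A3 is placed between A1 and A2, which gives the column order A1, A3, A2.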
Clustered Affinity Matrix
Step 2: Determine the order for A4
Clustered Affinity Matrix (CA) after placing A3 and A4 (rows still in the original order):
     A1  A3  A2  A4
A1   45  45   0   0
A2    0   5  80  75
A3   45  53   5   3
A4    0   3  75  78
Clustered Affinity Matrix
Step 3: Re-order the Rows
Clustered Affinity Matrix (CA), columns reordered:
     A1  A3  A2  A4
A1   45  45   0   0
A2    0   5  80  75
A3   45  53   5   3
A4    0   3  75  78
Clustered Affinity Matrix (CA), rows reordered to match the columns:
     A1  A3  A2  A4
A1   45  45   0   0
A3   45  53   5   3
A2    0   5  80  75
A4    0   3  75  78
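The three steps can be sketched in Python as a small implementation of the Bond Energy Algorithm on the AA matrix of the example. Columns are indexed from 0, and `None` marks the empty boundary columns whose affinities are 0. When two positions tie on cont, the sketch keeps the leftmost one. That tie rule is an assumption, since the slides do not say how ties are broken.

```python
# Attribute affinity matrix AA from the example (rows/columns A1..A4)
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]

def bond(aa, x, y):
    """bond(Ax, Ay) = sum over z of aff(Az, Ax) * aff(Az, Ay).
    None stands for the empty boundary columns (A0 and A(n+1)),
    whose affinities are 0 by definition."""
    if x is None or y is None:
        return 0
    return sum(aa[z][x] * aa[z][y] for z in range(len(aa)))

def cont(aa, i, k, j):
    """Net contribution of placing column Ak between Ai and Aj."""
    return 2 * bond(aa, i, k) + 2 * bond(aa, k, j) - 2 * bond(aa, i, j)

def bea_order(aa):
    """Column order of the clustered affinity matrix CA."""
    order = [0, 1]                          # Step 1: copy the first two columns
    for k in range(2, len(aa)):             # Step 2: place each remaining column
        best_pos, best_val = 0, None
        for pos in range(len(order) + 1):   # every gap, including both ends
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            val = cont(aa, left, k, right)
            if best_val is None or val > best_val:
                best_pos, best_val = pos, val
        order.insert(best_pos, k)
    return order

def clustered_matrix(aa, order):
    """Step 3: reorder the rows to follow the same order as the columns."""
    return [[aa[r][c] for c in order] for r in order]

order = bea_order(AA)
print(["A%d" % (i + 1) for i in order])     # ['A1', 'A3', 'A2', 'A4']
for row in clustered_matrix(AA, order):
    print(row)
```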
Partitioning
Find the sets of attributes that are accessed, for the most part, by distinct sets of applications.
We look for dividing points along the diagonal such that
• total accesses to only one fragment are maximized, while
• total accesses to more than one fragment are minimized.
Clustered Affinity Matrix (CA):
     A1  A3  A2  A4
A1   45  45   0   0
A3   45  53   5   3
A2    0   5  80  75
A4    0   3  75  78
Cluster 1: A1 & A3
Cluster 2: A2 & A4
Two vertical fragments: PROJ1(A1, A3) and PROJ2(A2, A4)
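The slide states the partitioning goal informally. The Python sketch below scores each dividing point with the objective z = CTQ · CBQ − COQ², taken from Özsu and Valduriez's treatment of vertical fragmentation rather than from this slide. CTQ and CBQ are the total accesses of queries that use only the upper or only the lower attribute set, and COQ is the total accesses of queries that need both. The per-query totals are the ones implied by the AA matrix with ref = 1.

```python
# Attributes used by each query (from the attribute usage matrix)
USE = {
    "q1": {"A1", "A3"},
    "q2": {"A2", "A3"},
    "q3": {"A2", "A4"},
    "q4": {"A3", "A4"},
}
# Total access frequency of each query over all sites (implied by AA, ref = 1)
ACC = {"q1": 45, "q2": 5, "q3": 75, "q4": 3}
ORDER = ["A1", "A3", "A2", "A4"]  # attribute order of the CA matrix

def best_split(order, use, acc):
    """Try every dividing point on the diagonal of CA and keep the one
    that maximises z = CTQ * CBQ - COQ**2 (assumed objective)."""
    best = None
    for cut in range(1, len(order)):
        top, bottom = set(order[:cut]), set(order[cut:])
        ctq = cbq = coq = 0
        for q, attrs in use.items():
            if attrs <= top:
                ctq += acc[q]        # query touches only the upper fragment
            elif attrs <= bottom:
                cbq += acc[q]        # query touches only the lower fragment
            else:
                coq += acc[q]        # query needs both fragments
        z = ctq * cbq - coq ** 2
        if best is None or z > best[0]:
            best = (z, order[:cut], order[cut:])
    return best

print(best_split(ORDER, USE, ACC))  # (3311, ['A1', 'A3'], ['A2', 'A4'])
```

The winning split reproduces the two clusters on the slide, PROJ1(A1, A3) and PROJ2(A2, A4).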
MIXED FRAGMENTATION
•Apply horizontal fragmentation to vertical fragments.
•Apply vertical fragmentation to horizontal fragments.
[Figure: mixed fragmentation of a relation, combining vertical fragmentation with horizontal fragmentation by location (Jacksonville, Orlando, Miami).]
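This is a minimal Python sketch of the first option, where horizontal fragmentation is applied to a vertical fragment. The relation, its tuples and the choice of fragments are hypothetical; only the city names come from the figure.

```python
# Hypothetical relation PROJ(PNO, PNAME, BUDGET, LOC)
PROJ = [
    ("P1", "Billing", 120_000, "Jacksonville"),
    ("P2", "Payroll", 90_000, "Orlando"),
    ("P3", "Website", 60_000, "Miami"),
    ("P4", "Archive", 75_000, "Orlando"),
]

# Step 1: vertical fragmentation (the key PNO is kept in both fragments)
V1 = [(pno, budget) for pno, _name, budget, _loc in PROJ]       # (PNO, BUDGET)
V2 = [(pno, name, loc) for pno, name, _budget, loc in PROJ]     # (PNO, PNAME, LOC)

# Step 2: horizontal fragmentation of V2 by location
mixed = {city: [t for t in V2 if t[2] == city]
         for city in ("Jacksonville", "Orlando", "Miami")}

for city, frag in mixed.items():
    print(city, frag)
```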
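ALLOCATION: Notations
i: fragment index
j: site index
k: application index
$f_{kj}$: the frequency of application k at site j
$r_{ki}$: the number of retrieval references of application k to fragment i
$u_{ki}$: the number of update references of application k to fragment i
$n_{ki} = r_{ki} + u_{ki}$: the number of accesses of application k to fragment i
[Figure: application k, with frequency $f_{kj}$ at site j, sends $r_{ki}$ retrievals and $u_{ki}$ updates to fragment i.]
Benefit to site j (local work) of holding fragment i, summed over all applications k at site j:
$$B_{ij} = \sum_k f_{kj}\, n_{ki}$$
The benefit of introducing a new copy of $R_i$ at site j:
$$B_{ij} = \sum_k f_{kj}\, r_{ki} \;-\; c \sum_k \sum_{j' \neq j} f_{kj'}\, u_{ki} \;+\; \beta(d_i)$$
The first term is the savings due to retrieval references. The second is the cost of update references from other sites, where c is the ratio between the cost of an update and the cost of a retrieval. These two terms are the same as in the All Beneficial Sites approach. The third term also takes into account the benefit of replication:
$$\beta(d_i) = \left(1 - 2^{\,1 - d_i}\right) F_i, \qquad \beta(1) = 0,\quad \beta(2) = \frac{F_i}{2},\quad \beta(3) = \frac{3F_i}{4}$$
where $d_i$ is the degree of redundancy of $R_i$ and $F_i$ is the benefit of having $R_i$ fully replicated at each site.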
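The benefit formulas can be evaluated directly. This is a minimal Python sketch with hypothetical statistics for 2 applications, 3 sites and 1 fragment. Several choices here are assumptions, not statements from the slide: `best_fit_site` places a non-replicated fragment at the site with the largest $\sum_k f_{kj} n_{ki}$; a copy is placed wherever $B_{ij} > 0$; and the replication case uses $d_i = 2$ and $F_i = 30$.

```python
# Hypothetical statistics: 2 applications (k), 3 sites (j), 1 fragment (i)
f = [[10, 0, 5],   # f[k][j]: frequency of application k at site j
     [0, 8, 0]]
r = [[4], [1]]     # r[k][i]: retrieval references of application k to fragment i
u = [[1], [0]]     # u[k][i]: update references of application k to fragment i
n = [[r[k][i] + u[k][i] for i in range(len(r[0]))] for k in range(len(r))]

def best_fit_site(i, f, n):
    """Non-replicated allocation: site j maximising sum_k f_kj * n_ki."""
    sites = range(len(f[0]))
    return max(sites, key=lambda j: sum(f[k][j] * n[k][i] for k in range(len(f))))

def beta(d, F):
    """beta(d_i) = (1 - 2**(1 - d_i)) * F_i."""
    return (1 - 2 ** (1 - d)) * F

def replica_benefit(i, j, f, r, u, c, d=None, F=0):
    """B_ij = sum_k f_kj r_ki - c * sum_k sum_{j' != j} f_kj' u_ki,
    plus beta(d_i) when a degree of redundancy d is given."""
    apps, sites = range(len(f)), range(len(f[0]))
    savings = sum(f[k][j] * r[k][i] for k in apps)
    update_cost = c * sum(f[k][jp] * u[k][i] for k in apps for jp in sites if jp != j)
    return savings - update_cost + (beta(d, F) if d is not None else 0)

print(best_fit_site(0, f, n))                                   # 0
print([replica_benefit(0, j, f, r, u, c=2) for j in range(3)])  # [30, -22, 0]
# Sites where a copy is beneficial, without and with the replication term
print([j for j in range(3) if replica_benefit(0, j, f, r, u, c=2) > 0])              # [0]
print([j for j in range(3) if replica_benefit(0, j, f, r, u, c=2, d=2, F=30) > 0])   # [0, 2]
```

With these numbers, the β term is what makes a second copy at site 2 worthwhile.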
Allocation of Vertical Fragments
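Fragment $R_i$, stored at site $PS_r$, is split vertically into $R_s$ and $R_t$: allocate $R_s$ to site $PS_s$ and $R_t$ to site $PS_t$.
[Figure: application types A1, A2 and A3 at site $PS_r$ (for example, an application type at $PS_r$ that accesses only $R_s$), application types $A_s$ and $A_t$, and A4 ... An at the other sites; the attributes of $R_i$ are split between $R_s$ and $R_t$.]
Terms of the benefit formula (partly recoverable): $\sum_{k \in A_2} \ldots$, $\sum_{k \in A_3} \ldots$, $\sum_{4 \le l \le n} \sum_{k \in A_l} \ldots$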