Chapter 2 - 9-15DDB Architecture

Distributed DBMS

Architecture

Design of DDBMS 1
Network Transparency
• The user should be protected from the operational details of the network.
• It is desirable to hide even the existence of the network, if possible.
  – Location transparency: the command used is independent of the system on which the data is stored.
  – Naming transparency: a unique name is provided for each object in the database.

Replication & Fragmentation Transparency
• The user is unaware of the replication of fragments.
• Queries are specified on the relations (rather than the fragments).
(Figure: relation R is divided into fragments R1, R2, R3 and R4; two copies each of R1 and R2 are stored across Sites A, B and C.)

ANSI/SPARC Architecture
(Figure: three levels. Several external views are defined by the external schema, one conceptual view by the conceptual schema, and one internal view by the internal schema.)
Internal view: deals with the physical definition and organization of data.
Conceptual view: abstract definition of the database. It is the “real world” view of the enterprise being modeled in the database.
External view: an individual user’s view of the database.
A Taxonomy of Distributed Data Systems
A distributed database can be defined as
• a logically integrated collection of shared data which is
• physically distributed across the nodes of a computer network.

Distributed data systems
  Homogeneous
  Heterogeneous (Multidatabase)
    Federated
      Loosely coupled (interoperable DB systems using export schemas)
      Tightly coupled (with a global schema)
    Unfederated (no local users)
Architecture of a Homogeneous DDBMS
(Figure: global user views 1 to n are defined over the global schema, which is mapped through the fragmentation schema and the allocation schema onto the sites; each site i has a local conceptual schema i, a local internal schema i and a local DB i.)
A homogeneous DDBMS resembles a centralized DB, but instead of storing all the data at one site, the data is distributed across a number of sites in a network.
Fragmentation Schema & Allocation Schema
Fragmentation Schema: describes how the global relations are divided into fragments.
Allocation Schema: specifies at which sites each fragment is stored.
Example: fragmentation of global relation R into fragments A, B, C, D and E (figure). To materialize R, the following operations are required:
$R = (A \bowtie B) \cup (C \bowtie D) \cup E$
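The reconstruction rule can be sketched in Python as a minimal illustration. The attributes ID, X and Y and all fragment contents below are hypothetical (the figure only names the fragments A to E); the sketch assumes that A/B and C/D are pairs of vertical fragments sharing the key ID, while E is a horizontal fragment that already carries every attribute.

```python
# Hypothetical global relation R(ID, X, Y); each tuple is a dict.
A = [{"ID": 1, "X": "a"}, {"ID": 2, "X": "b"}]   # vertical fragment (ID, X), part 1
B = [{"ID": 1, "Y": 10}, {"ID": 2, "Y": 20}]     # vertical fragment (ID, Y), part 1
C = [{"ID": 3, "X": "c"}]                         # vertical fragment (ID, X), part 2
D = [{"ID": 3, "Y": 30}]                          # vertical fragment (ID, Y), part 2
E = [{"ID": 4, "X": "d", "Y": 40}]                # horizontal fragment, all attributes


def natural_join(r, s, key="ID"):
    """Join two vertical fragments on their shared key attribute."""
    index = {t[key]: t for t in s}
    return [{**t, **index[t[key]]} for t in r if t[key] in index]


def union(*parts):
    """Union of horizontal pieces, dropping duplicate tuples."""
    seen, out = set(), []
    for part in parts:
        for t in part:
            k = tuple(sorted(t.items()))
            if k not in seen:
                seen.add(k)
                out.append(t)
    return out


# R = (A join B) union (C join D) union E
R = union(natural_join(A, B), natural_join(C, D), E)
```

Vertical fragments are glued back with a join on the key, and horizontal pieces with a union; this is why every vertical fragment must keep the key.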
Homogeneous vs. Heterogeneous
(Figure: a global user works through the multidatabase management system, which spans the DBMSs of Databases 1 to 4; local users work with individual DBMSs.)
• Homogeneous DDBMS
  – No local users
  – Most systems do not have local schemas
• Heterogeneous DDBMS
  – There are both local and global users
  – Multidatabase systems are split into:
    • Tightly coupled systems: have a global schema
    • Loosely coupled systems: do not have a global schema
Schema Architecture of a Tightly-Coupled System
(Figure: global user views 1 to n are defined over the global conceptual schema. Each local node contributes a local participation schema and an auxiliary schema, above its local conceptual schema, local internal schema and local DB; local users keep their own local user views over the local conceptual schema.)
An individual node’s participation in the MDB is defined by means of a participation schema.
Auxiliary Schema (1)
The auxiliary schema describes the rules that govern the mappings between the local and global levels.
• Rules for unit conversion: may be required when one site expresses distance in kilometers and another in miles, …
• Rules for handling null values: may be necessary where one site stores additional information which is not stored at another site.
  – Example: one site stores the name, home address and telephone number of its employees, whereas another just stores names and addresses.
Auxiliary Schema (2)
• Rules for naming conflicts: naming conflicts occur when
  – semantically identical data items are named differently:
    • DNAME → Department name (at Site 1)
    • DEPTNAME → Department name (at Site 2)
  – semantically different data items are named identically:
    • NAME → Department name (at Site 1)
    • NAME → Manager name (at Site 2)
• Rules for handling data representation conflicts: such conflicts occur when semantically identical data items are represented differently in different data sources.
  – Example: data represented as a character string in one database may be represented as a real number in the other database.
Auxiliary Schema (3)
• Rules for handling data scaling conflicts: such conflicts occur when semantically identical data items are stored in different databases using different units of measure.
  – Example: “Large”, “New”, “Good”, etc.
These problems are called domain mismatch problems.
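To make the auxiliary-schema rules concrete, here is a minimal Python sketch of one site's local-to-global mapping. Everything in it is hypothetical: the attribute names (DISTANCE_MILES, DISTANCE_KM, TELEPHONE), the rename table and the choice of Site 2 are illustrative assumptions, not part of any real system. It applies a naming-conflict rule, a unit-conversion rule that also resolves a string-versus-real representation conflict, and a null-value rule.

```python
# Hypothetical auxiliary-schema rules for one local site (Site 2).
MILES_TO_KM = 1.609344

RENAME = {"DEPTNAME": "DNAME"}   # naming conflict: same meaning, different name
GLOBAL_ATTRS = ["ENAME", "ADDRESS", "TELEPHONE", "DNAME", "DISTANCE_KM"]


def to_global(local_row: dict) -> dict:
    """Map one local record onto the (hypothetical) global schema."""
    row = {RENAME.get(k, k): v for k, v in local_row.items()}
    # Unit conversion: this site stores distance in miles, as a string
    # (a representation conflict as well: string vs. real number).
    if "DISTANCE_MILES" in row:
        row["DISTANCE_KM"] = float(row.pop("DISTANCE_MILES")) * MILES_TO_KM
    # Null handling: global attributes this site does not store become None.
    return {a: row.get(a) for a in GLOBAL_ATTRS}


r = to_global({"ENAME": "Smith", "ADDRESS": "Main St",
               "DEPTNAME": "Sales", "DISTANCE_MILES": "10"})
```

A real multidatabase system would keep such rules declaratively in the auxiliary schema; the function form is only for illustration.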
Loosely-Coupled Systems
(Interoperable Database Systems)
(Figure: global user views 1, 2 and 3 are defined directly over the local conceptual schemas 1 to n; each site keeps its own local user views, local internal schema and local DB.)
Loosely-Coupled Systems
(Figure: global user views 1 to m are defined over export schemas 1 to n, each exported from a local conceptual schema that sits above a local internal schema and local DB; local users keep their own local user views.)
Integration of Heterogeneous Data Models
• Provide bidirectional translators between all
pairs of models
– Advantage: no need to learn another data model and
language
– Disadvantage: requires n(n-1) translators, where n
is the number of different models.
• Adopt a single model (called canonical model) at
the global level and map all the local models
onto this model
– Advantage: requires only 2n translators
– Disadvantage: translations must go through the
global model.
(The 2nd approach is more widely used)
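The translator counts of the two integration approaches can be compared with a short Python sketch; the function names are illustrative only.

```python
def pairwise_translators(n: int) -> int:
    """One translator in each direction for every ordered pair of distinct models."""
    return n * (n - 1)


def canonical_translators(n: int) -> int:
    """One translator into and one out of the canonical model per local model."""
    return 2 * n


for n in (2, 3, 4, 10):
    print(n, pairwise_translators(n), canonical_translators(n))
```

The two approaches break even at n = 3 (6 translators each); from n = 4 on the canonical model needs fewer translators (for example 20 instead of 90 when n = 10), which is one reason it is more widely used.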
Distributed Database Design
• Top-down approach: the database system is being designed from scratch.
  – Issues: fragmentation & allocation.
• Bottom-up approach: existing databases are integrated into one database.
  – Issues: design of the export and global schemas.
TOP-DOWN DESIGN PROCESS
1. Requirements Analysis produces the System Requirements (Objectives).
2. Conceptual Design (entity analysis + functional analysis) and View Design (defining the interfaces for end users), linked by view integration:
   – Conceptual design produces the Global Conceptual Schema.
   – View design produces the External Schema Definitions and access information.
3. Distribution Design (fragmentation & allocation) produces the Local Conceptual Schemas.
4. Physical Design maps the local conceptual schemas to physical storage devices and produces the Physical Schema.
Design Consideration (1)
The organization of distributed systems can be investigated along three dimensions:
Level of sharing
1. No sharing: each application and its data execute at one site.
2. Data sharing: programs are replicated at all sites, but data files are not.
3. Data + program sharing: both data and programs may be shared.
Design Consideration (2)

Access Pattern
1. Static: Access patterns do not change.
2. Dynamic: Access patterns change over
time.
Level of Knowledge
1. No information
2. Partial information: Access patterns may
deviate from the predictions.
3. Complete information: Access patterns
can reasonably be predicted.
Fragmentation Alternatives
J
JNO  JNAME            BUDGET   LOC
J1   Instrumentation  150,000  Montreal
J2   Database Dev.    135,000  New York
J3   CAD/CAM          250,000  New York
J4   Maintenance      350,000  Paris

Horizontal Partitioning
J1
JNO  JNAME            BUDGET   LOC
J1   Instrumentation  150,000  Montreal
J2   Database Dev.    135,000  New York

J2
JNO  JNAME            BUDGET   LOC
J3   CAD/CAM          250,000  New York
J4   Maintenance      350,000  Paris

Vertical Partitioning
JNO  BUDGET
J1   150,000
J2   135,000
J3   250,000
J4   350,000

JNO  JNAME            LOC
J1   Instrumentation  Montreal
J2   Database Dev.    New York
J3   CAD/CAM          New York
J4   Maintenance      Paris
Why fragment at all?
Reasons:
• Interquery concurrency
• Intraquery concurrency
Disadvantages:
• Vertical fragmentation may incur overhead.
• Attributes participating in a dependency may be allocated to different sites, which makes integrity checking more costly.
Degree of Fragmentation
• Application views are usually subsets of relations. Hence, it is only natural to consider subsets of relations as distribution units.
• The appropriate degree of fragmentation is dependent on the applications.
Correctness Rules
• Vertical partitioning
  – Lossless decomposition
  – Dependency preservation
• Horizontal partitioning
  – Disjoint fragments

Allocation Alternatives
• Partitioning: no replication
• Partial replication: some fragments are replicated
• Full replication: the database exists in its entirety at each site
Notations
S (TITLE, SAL)
E (ENO, ENAME, TITLE)
J (JNO, JNAME, BUDGET, LOC)
G (ENO, JNO, RESP, DUR)
Links: L1 from S to E, L2 from E to G, L3 from J to G.
L1: 1-to-many relationship
S: Owner(L1), source relation
E: Member(L1), target relation
Simple Predicates
Given a relation R(A1, A2, …, An), where Ai has domain Di, a simple predicate pj defined on R has the form
$p_j : A_i \;\theta\; \mathit{Value}$
where $\theta \in \{=, <, \neq, \leq, >, \geq\}$ and $\mathit{Value} \in D_i$.

Example:
J
JNO  JNAME            BUDGET   LOC
J1   Instrumentation  150,000  Montreal
J2   Database Dev.    135,000  New York
J3   CAD/CAM          250,000  New York
J4   Maintenance      350,000  Orlando

Simple predicates:
p1: JNAME = “Maintenance”
p2: BUDGET < 200,000
Note: a simple predicate defines a data fragment.
MINTERM PREDICATE
Given a set of simple predicates for relation R,
P = {p1, p2, …, pm},
the set of minterm predicates M = {m1, m2, …, mn} is defined as
$M = \{\, m_i \mid m_i = \bigwedge_{p_j \in P} p_j^{*} \,\}$, where $p_j^{*} = p_j$ or $p_j^{*} = \neg p_j$.

Example relation:
TITLE         SAL
Elect. Eng.   40,000
Syst. Analy.  54,000
Mech. Eng.    32,000
Programmer    42,000

Possible simple predicates:
P1: TITLE = “Elect. Eng.”
P2: TITLE = “Syst. Analy.”
P3: TITLE = “Mech. Eng.”
P4: TITLE = “Programmer”
P5: SAL ≤ 35,000
P6: SAL > 35,000

Some corresponding minterm predicates:
$m_1: P_1 \wedge P_5$, i.e. TITLE = “Elect. Eng.” ∧ SAL ≤ 35,000
$m_2: \neg P_1 \wedge P_5$, i.e. TITLE ≠ “Elect. Eng.” ∧ SAL ≤ 35,000
A minterm predicate defines a data fragment.
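The definition of M can be sketched in Python by enumerating every combination of each simple predicate or its negation. The sketch uses only P1 and P5 from the list above and the four-row example relation; with n predicates it yields 2^n minterms, and some of the resulting fragments may be empty.

```python
from itertools import product

# Example relation (TITLE, SAL) from the slide.
PAY = [("Elect. Eng.", 40_000), ("Syst. Analy.", 54_000),
       ("Mech. Eng.", 32_000), ("Programmer", 42_000)]

# Two of the simple predicates listed above.
SIMPLE = {
    "P1": lambda t: t[0] == "Elect. Eng.",
    "P5": lambda t: t[1] <= 35_000,
}


def minterms(preds):
    """Yield (label, test) for every conjunction of p_j or NOT p_j."""
    names = list(preds)
    for signs in product([True, False], repeat=len(names)):
        label = " AND ".join(n if s else f"NOT {n}" for n, s in zip(names, signs))

        def m(t, signs=signs):
            return all(preds[n](t) == s for n, s in zip(names, signs))

        yield label, m


# Each minterm predicate defines one (possibly empty) horizontal fragment.
fragments = {label: [t for t in PAY if m(t)] for label, m in minterms(SIMPLE)}
```

Here P1 AND P5 selects no tuple, P1 AND NOT P5 selects the Elect. Eng. row, NOT P1 AND P5 selects Mech. Eng., and the remaining two rows fall into NOT P1 AND NOT P5.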
Primary Horizontal Fragmentation
A primary horizontal fragmentation is defined by a selection operation on the owner relations of a database schema.
(Schema graph: E(ENO, ENAME, TITLE) and J(JNO, JNAME, BUDGET, LOC) are linked to G(ENO, JNO, RESP, DUR) by L2 and L3; Owner(L3) = J.)
A possible fragmentation of J is defined as follows:
$J_1 = \sigma_{BUDGET \le 200{,}000}(J)$
$J_2 = \sigma_{BUDGET > 200{,}000}(J)$
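A minimal Python sketch of this primary horizontal fragmentation, using the J tuples from the Simple Predicates example, with selection written as a list filter. It also checks the two correctness properties of a horizontal fragmentation: every tuple lands in some fragment, and no tuple lands in two.

```python
J = [
    {"JNO": "J1", "JNAME": "Instrumentation", "BUDGET": 150_000, "LOC": "Montreal"},
    {"JNO": "J2", "JNAME": "Database Dev.",   "BUDGET": 135_000, "LOC": "New York"},
    {"JNO": "J3", "JNAME": "CAD/CAM",         "BUDGET": 250_000, "LOC": "New York"},
    {"JNO": "J4", "JNAME": "Maintenance",     "BUDGET": 350_000, "LOC": "Orlando"},
]


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


J1 = select(J, lambda t: t["BUDGET"] <= 200_000)
J2 = select(J, lambda t: t["BUDGET"] > 200_000)

# Correctness checks: the union gives back J, and the fragments are disjoint.
complete = sorted(t["JNO"] for t in J1 + J2) == sorted(t["JNO"] for t in J)
disjoint = not ({t["JNO"] for t in J1} & {t["JNO"] for t in J2})
```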
Horizontal Fragments
Thus, a horizontal fragment Ri of relation R consists of all the tuples of R that satisfy a minterm predicate mi.
There are as many horizontal fragments (also called minterm fragments) as there are minterm predicates.
Completeness (1)
A set of simple predicates Pr is said to be complete if and only if there is an equal probability of access by every application to any two tuples belonging to any minterm fragment that is defined according to Pr.
(Figure: simple predicates A1 ≥ k1, A2 = k2, A3 ≤ k3 and A4 = k4 define minterm fragments F1, F2 and F3, which are accessed by applications A1 to A4.)
The fragments look homogeneous.
Completeness (2)
(Figure: the same simple predicates, minterm fragments and applications, but the tuples of fragment F3 are no longer accessed uniformly by the applications.)
The set of simple predicates is incomplete.
Completeness (3)
(Figure: adding the simple predicate A5 > k5 splits F3 into F31 and F32, so that each resulting fragment is again accessed uniformly.)
Completeness (4)
$J_1 = \sigma_{LOC = \text{“Montreal”}}(J)$
$J_2 = \sigma_{LOC = \text{“New York”}}(J)$
$J_3 = \sigma_{LOC = \text{“Orlando”}}(J)$
Case 1: The only application that accesses J wants to access the tuples according to the location.
The set of simple predicates
Pr = {LOC = “Montreal”, LOC = “New York”, LOC = “Orlando”}
is complete because each tuple of each fragment has the same probability of being accessed.
(Figure: J is split by LOC = “Montreal”, LOC = “New York” and LOC = “Orlando” into J1, J2 and J3.)
Completeness (5)
Example:
J1
JNO  JNAME            BUDGET   LOC
001  Instrumentation  150,000  Montreal

J2
JNO  JNAME            BUDGET   LOC
004  GUI              135,000  New York
007  CAD/CAM          250,000  New York

J3
JNO  JNAME            BUDGET   LOC
003  Database Dev.    310,000  Orlando

Note: Completeness is a desirable property because a complete set defines fragments that are not only logically uniform, in that they all satisfy the minterm predicate, but also statistically homogeneous.

Case 2: There is a second application which accesses only those project tuples where the budget is less than $200,000.
• Since tuple “004” is accessed more frequently than tuple “007”, Pr is not complete.
• To make the set complete, we need to add (BUDGET < 200,000) to Pr.
Completeness (6)
J
  LOC = “Montreal” → J1
    BUDGET ≤ 200,000 → J11
    BUDGET > 200,000 → J12
  LOC = “New York” → J2
    BUDGET ≤ 200,000 → J21
    BUDGET > 200,000 → J22
  LOC = “Orlando” → J3
    BUDGET ≤ 200,000 → J31
    BUDGET > 200,000 → J32
Small-budget applications access the BUDGET ≤ 200,000 fragments (J11, J21, J31).
Redundant Fragmentation
(Figure: a logically uniform and statistically homogeneous fragment is split into Fragment 1 and Fragment 2.)
• Fragments 1 and 2 have the same characteristics.
• The fragmentation is unnecessary.
Minimality
Relevant:
Let mi and mj be two almost identical minterm predicates:
mi = p1 ∧ p2 ∧ p3, defining fragment fi
mj = p1 ∧ ¬p2 ∧ p3, defining fragment fj
p2 is relevant if and only if
$\frac{acc(m_i)}{card(f_i)} \neq \frac{acc(m_j)}{card(f_j)}$
where acc is the access frequency and card is the cardinality.
That is, there should be at least one application that accesses fi and fj differently; i.e., a simple predicate should be relevant in determining a fragmentation.
• Minimal: if all the predicates of a set Pr are relevant, Pr is minimal.
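The relevance test can be sketched in Python. The tuples are those of the Completeness (5) example; the applications and their access frequencies are hypothetical assumptions (the location application is assumed to touch every tuple equally, the small-budget application only tuples with BUDGET < 200,000). Exact fractions avoid floating-point ties when comparing acc/card.

```python
from fractions import Fraction

# Tuples of J from the Completeness (5) example.
J = [
    {"JNO": "001", "JNAME": "Instrumentation", "BUDGET": 150_000, "LOC": "Montreal"},
    {"JNO": "004", "JNAME": "GUI",             "BUDGET": 135_000, "LOC": "New York"},
    {"JNO": "007", "JNAME": "CAD/CAM",         "BUDGET": 250_000, "LOC": "New York"},
    {"JNO": "003", "JNAME": "Database Dev.",   "BUDGET": 310_000, "LOC": "Orlando"},
]

# Hypothetical applications: (tuples the application touches, accesses per tuple per period).
APPS = [
    (lambda t: True, 10),                  # location application: every tuple equally
    (lambda t: t["BUDGET"] < 200_000, 4),  # small-budget application
]


def acc(fragment):
    """Total accesses to a fragment over all applications."""
    return sum(freq for pred, freq in APPS for t in fragment if pred(t))


def is_relevant(p, fragment):
    """p is relevant if acc(m_i)/card(f_i) != acc(m_j)/card(f_j) for its two halves."""
    fi = [t for t in fragment if p(t)]
    fj = [t for t in fragment if not p(t)]
    if not fi or not fj:  # p does not actually split this fragment
        return False
    return Fraction(acc(fi), len(fi)) != Fraction(acc(fj), len(fj))


small_budget = [t for t in J if t["BUDGET"] < 200_000]
```

With these frequencies the budget predicate is relevant on J (14 versus 10 accesses per tuple), whereas splitting the small-budget fragment by JNAME = “Instrumentation” leaves both halves at 14 accesses per tuple, so that predicate is not relevant.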
A Complete and Minimal Example
Two applications:
1. One application accesses the tuples according to location.
2. Another application accesses only those project tuples where the budget is less than $200,000.

Case 1: Pr = {LOC = “Montreal”, LOC = “New York”, LOC = “Orlando”, BUDGET ≤ 200,000, BUDGET > 200,000} is complete and minimal.

Case 2: If, however, we were to add the predicate JNAME = “Instrumentation” to Pr, the resulting set would not be minimal, since the new predicate is not relevant with respect to the applications.

(Figure: J is split by location into J1 (Montreal), J2 (New York) and J3 (Orlando); each of these is split by BUDGET ≤ 200,000 and BUDGET > 200,000 into J11/J12, J21/J22 and J31/J32. These splits are relevant. Splitting J12 further by JNAME = “Instrumentation” into J121 and J122 (JNAME ≠ “Instrumentation”) is irrelevant.)
$\frac{acc(m_{121})}{card(f_{121})} = \frac{acc(m_{122})}{card(f_{122})} = \frac{acc(m_{12})}{card(f_{12})}$
Hence [JNAME = “Instrumentation”] is not relevant.
Application Information
• Qualitative Information
  – The fundamental qualitative information consists of the predicates used in user queries (i.e., “where” clauses in SQL).
  – 80/20 rule: 20% of user queries account for 80% of the total data access.
    • One should investigate the more important queries.
• Quantitative Information
  – Minterm selectivity sel(mi): number of tuples that would be accessed by a query specified according to a given minterm predicate.
  – Access frequency acc(qi): the access frequency of query qi in a given period.
Qualitative information guides the fragmentation activity.
Determine the set of meaningful minterm predicates
Applications:
• Take the salary and determine a raise accordingly.
• The employee records are managed in two places, one handling the records of those with salary less than or equal to $30,000 and the other handling the records of those who earn more than $30,000.

Pr = {p1: SAL ≤ 30,000, p2: SAL > 30,000} is complete and minimal.

The minterm predicates:
$m_1: (SAL \le 30{,}000) \wedge (SAL > 30{,}000)$
$m_2: (SAL \le 30{,}000) \wedge \neg(SAL > 30{,}000)$
$m_3: \neg(SAL \le 30{,}000) \wedge (SAL > 30{,}000)$
$m_4: \neg(SAL \le 30{,}000) \wedge \neg(SAL > 30{,}000)$

Implications:
$i_1: (SAL \le 30{,}000) \Rightarrow \neg(SAL > 30{,}000)$
$i_2: \neg(SAL \le 30{,}000) \Rightarrow (SAL > 30{,}000)$
$i_3: (SAL > 30{,}000) \Rightarrow \neg(SAL \le 30{,}000)$
$i_4: \neg(SAL > 30{,}000) \Rightarrow (SAL \le 30{,}000)$

By i1, m1 is contradictory; by i2, m4 is contradictory. Therefore, we are left with M = {m2, m3}.
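The final step of this example (drop every minterm that contradicts an implication) can be sketched in Python. Encoding each implication as a pair of literals (predicate name, truth value) is an assumption made for the sketch; it reproduces M = {m2, m3} for the SAL predicates.

```python
from itertools import product

P = ["p1", "p2"]  # p1: SAL <= 30,000   p2: SAL > 30,000

# Implications i1..i4 as (antecedent literal, consequent literal);
# a literal is (predicate name, truth value).
I = [
    (("p1", True), ("p2", False)),   # i1: p1 => not p2
    (("p1", False), ("p2", True)),   # i2: not p1 => p2
    (("p2", True), ("p1", False)),   # i3: p2 => not p1
    (("p2", False), ("p1", True)),   # i4: not p2 => p1
]


def minterms(preds):
    """All 2^n conjunctions of each predicate or its negation, as dicts."""
    return [dict(zip(preds, signs)) for signs in product([True, False], repeat=len(preds))]


def contradictory(m, implications):
    """m is contradictory if it holds an antecedent but violates the consequent."""
    for (a, a_val), (c, c_val) in implications:
        if m[a] == a_val and m[c] != c_val:
            return True
    return False


M = minterms(P)  # m1, m2, m3, m4 in that order
kept = [i + 1 for i, m in enumerate(M) if not contradictory(m, I)]
```

Here kept evaluates to [2, 3]: m1 violates i1 and m4 violates i2.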
Invalid Implications
J
JNO  JNAME            BUDGET   LOC
J1   Instrumentation  150,000  Montreal
J2   Database Dev.    135,000  New York
J3   CAD/CAM          250,000  New York
J4   Maintenance      350,000  Orlando

Simple predicates:
p1: LOC = “Montreal”
p2: LOC = “New York”
p3: LOC = “Orlando”
p4: BUDGET ≤ 200,000
p5: BUDGET > 200,000

VALID implications:
$i_1: p_1 \Rightarrow \neg p_2 \wedge \neg p_3$
$i_2: p_2 \Rightarrow \neg p_1 \wedge \neg p_3$
$i_3: p_3 \Rightarrow \neg p_1 \wedge \neg p_2$
$i_4: p_4 \Rightarrow \neg p_5$
$i_5: p_5 \Rightarrow \neg p_4$
$i_6: \neg p_4 \Rightarrow p_5$
$i_7: \neg p_5 \Rightarrow p_4$

INVALID implications:
$i_8: LOC = \text{“Montreal”} \Rightarrow \neg(BUDGET > 200{,}000)$
$i_9: LOC = \text{“Orlando”} \Rightarrow \neg(BUDGET \le 200{,}000)$

Implications should be defined according to the semantics of the database, not according to the current values. (i8 and i9 happen to hold for the current tuples, but they do not follow from the meaning of the attributes.)
Compute Complete & Minimal Set
Rule: a relation or fragment is partitioned “into at least two parts which are accessed differently by at least one application.”
Relevant: a simple predicate that satisfies the above rule is relevant.
• Repeat until the predicate set is complete:
  – Find a simple predicate pi that is relevant.
  – Determine minterm fragments fi and fj according to pi.
  – Accept pi, fi and fj.
  – Remove any pk and fk from the acceptance list if pk becomes nonrelevant. /* the list is minimal */
• Determine the set of minterm predicates M (using the acceptance list).
• Determine the set of implications I (among the acceptance list).
• For each mi in M, remove mi if it is contradictory according to I.
Derived Horizontal Fragmentation
Derived fragmentation is used to facilitate the join between fragments.
In some cases, the horizontal fragmentation of a relation cannot be based on a property of its own attributes, but is derived from the horizontal fragmentation of another relation.
Benefits of Derived Fragmentation
PAY (TITLE, SAL)
EMP (ENO, ENAME, TITLE)
Primary fragmentation:
$PAY_1 = \sigma_{SAL \le 30{,}000}(PAY)$
$PAY_2 = \sigma_{\neg(SAL \le 30{,}000)}(PAY)$
Using derived fragmentation:
EMP1 = EMP SJ PAY1
EMP2 = EMP SJ PAY2
EMPi and PAYi can be allocated to the same site.
Not using derived fragmentation: one can divide EMP into EMP1 and EMP2 based on TITLE and divide PAY into PAY1, PAY2 and PAY3 based on SAL. To join EMP and PAY, EMP1 and EMP2 may each have to be joined with PAY1, PAY2 and PAY3, which causes more communication overhead!
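Derived fragmentation with a semijoin can be sketched in Python. The PAY and EMP instances (titles, salaries, employee numbers and names) are hypothetical and not taken from the slides; the sketch keeps each EMP tuple whose TITLE occurs in the corresponding PAY fragment, so that EMPi and PAYi can be co-located.

```python
# Hypothetical PAY and EMP instances (not taken from the slides).
PAY = [
    {"TITLE": "Elect. Eng.", "SAL": 40_000},
    {"TITLE": "Mech. Eng.",  "SAL": 27_000},
    {"TITLE": "Programmer",  "SAL": 24_000},
]
EMP = [
    {"ENO": "E1", "ENAME": "Doe",    "TITLE": "Elect. Eng."},
    {"ENO": "E2", "ENAME": "Smith",  "TITLE": "Programmer"},
    {"ENO": "E3", "ENAME": "Lee",    "TITLE": "Mech. Eng."},
    {"ENO": "E4", "ENAME": "Miller", "TITLE": "Programmer"},
]


def semijoin(r, s, attr):
    """r SJ s: tuples of r that have a matching value of attr in s."""
    keys = {t[attr] for t in s}
    return [t for t in r if t[attr] in keys]


# Primary fragmentation of the owner relation PAY.
PAY1 = [t for t in PAY if t["SAL"] <= 30_000]
PAY2 = [t for t in PAY if not t["SAL"] <= 30_000]

# Derived fragmentation of the member relation EMP.
EMP1 = semijoin(EMP, PAY1, "TITLE")
EMP2 = semijoin(EMP, PAY2, "TITLE")
```

Because every EMP1 tuple finds its matching PAY tuple in PAY1 (and likewise for EMP2 and PAY2), the join EMP ⋈ PAY can be computed fragment by fragment at each site without shipping data.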
Chain Relationships
R1 (R1PK, …)
R2 (R2PK, R1FK, …)
R3 (R3PK, R2FK, …)
...
• Design the primary fragmentation for R1.
• Derive the derived fragmentation for Rk as follows:
  – Rk = Rk SJ R(k-1), with the semijoin condition RkFK = R(k-1)PK,
  – for 2 ≤ k ≤ n, in that order.
Derived Fragmentation

EMP (ENO, ENAME, TITLE) PROJ (PNO, PNAME, BUDGET)

EMP_PROJ (ENO, PNO, RESP, DUR)

• How do we fragment EMP_PROJ?
  – Semijoin with EMP, or
  – Semijoin with PROJ
• Criterion: support the more frequent join operation.
VERTICAL FRAGMENTATION
Purpose: identify fragments Ri such that many applications can be executed using just one fragment.
Advantage: when many applications which use R1 and many applications which use R2 are issued at different sites, fragmenting R avoids communication overhead.
(Figure: relation R split vertically into fragments R1 and R2.)
Vertical partitioning is more complicated than horizontal partitioning:
• Vertical partitioning: the number of possible fragmentations is the number of ways to partition the m nonprimary key attributes (the Bell number B(m)), which is bounded above by m^m and grows faster than any exponential.
• Horizontal partitioning: 2^n possible minterm predicates can be defined, where n is the number of simple predicates in the complete and minimal set Pr.
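The growth of the vertical search space can be checked numerically with a short Python sketch that computes Bell numbers with the Bell triangle, alongside the m^m bound quoted on the slide.

```python
def bell_numbers(limit):
    """Return [B(0), ..., B(limit)] using the Bell triangle."""
    bells = [1]
    row = [1]
    for _ in range(limit):
        new_row = [row[-1]]          # each row starts with the last entry of the previous row
        for x in row:
            new_row.append(new_row[-1] + x)
        row = new_row
        bells.append(row[0])
    return bells


B = bell_numbers(10)
for m in range(1, 11):
    print(m, B[m], m ** m)           # partitions of m attributes vs. the m^m bound
```

Ten nonprimary key attributes already admit 115,975 ways to group them into fragments, compared with 2^10 = 1,024 minterm predicates for ten simple predicates.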
Vertical Fragmentation Approaches
Greedy heuristic approaches:
• Split approach: global relations are progressively split into fragments.
• Grouping approach: attributes are progressively aggregated to constitute fragments.
Correctness:
• Each attribute of R belongs to at least one fragment.
• Each fragment includes either a key of R or a “tuple identifier”.
Vertical Clustering - Replication
In evaluating the convenience of vertical clustering, it is important that overlapping attributes are not heavily updated.
Example: EMP(ENUM, NAME, SAL, TAX, MGRNUM, DNUM)
(Figure: administrative applications at Site 1 and applications at all sites.)
Bad fragmentation: NAME not available in EMP2
1. EMP1(ENUM, NAME, TAX, SAL)
2. EMP2(ENUM, MGRNUM, DNUM)
Good fragmentation: NAME is relatively stable, so it can be replicated.
1. EMP1(ENUM, NAME, TAX, SAL)
2. EMP2(ENUM, NAME, MGRNUM, DNUM)
Split Approach
• Splitting is considered only for attributes that do not participate in the primary key.
• The split approach involves three steps:
  1. Obtain the attribute affinity matrix.
  2. Use a clustering algorithm to group some attributes together based on the attribute affinity matrix. This algorithm produces a clustered affinity matrix.
  3. Use a partitioning algorithm to partition attributes such that sets of attributes are accessed solely, or for the most part, by distinct sets of applications.
Attribute Usage Matrix
PROJ(PNO, PNAME, BUDGET, LOC), with A1 = PNO, A2 = PNAME, A3 = BUDGET, A4 = LOC

use(qi, Aj) = 1 if Aj is referenced by qi, 0 otherwise

q1: SELECT BUDGET FROM PROJ WHERE PNO = Value;
q2: SELECT PNAME, BUDGET FROM PROJ;
q3: SELECT PNAME FROM PROJ WHERE LOC = Value;
q4: SELECT SUM(BUDGET) FROM PROJ WHERE LOC = Value;

Attribute Usage Matrix
     A1  A2  A3  A4
q1   1   0   1   0
q2   0   1   1   0
q3   0   1   0   1
q4   0   0   1   1
Attribute Affinity Measure
$aff(A_i, A_j) = \sum_{k \,\mid\, use(q_k, A_i) = 1 \,\wedge\, use(q_k, A_j) = 1} \; \sum_{l} ref_l(q_k)\, acc_l(q_k)$
The outer sum runs over each query qk that uses both Ai and Aj (the popularity of using Ai and Aj together); the inner sum adds up the popularity of such an Ai-Aj pair at all sites.
(Figure: query qk, which uses attributes Ai and Aj of relation R, runs at sites l, m and n.)
refl(qk): number of accesses to attributes (Ai, Aj) for each execution of qk at site l.
accl(qk): application access frequency of qk at site l.
Attribute Affinity Matrix
$aff(A_i, A_j) = \sum_{k \,\mid\, use(q_k, A_i) = 1 \,\wedge\, use(q_k, A_j) = 1} \; \sum_{l} ref_l(q_k)\, acc_l(q_k)$
refl(qk): number of accesses to attributes (Ai, Aj) for each execution of qk at site l.
accl(qk): application access frequency of qk at site l.
The attribute affinity matrix has one row and one column for each attribute A1, A2, A3, A4; for example, the entry in row A2, column A3 is aff(A2, A3).
Attribute Affinity Matrix Example
Attribute Usage Matrix
     A1  A2  A3  A4
q1   1   0   1   0
q2   0   1   1   0
q3   0   1   0   1
q4   0   0   1   1

Attribute Affinity Matrix (AA)
     A1  A2  A3  A4
A1   45   0  45   0
A2    0  80   5  75
A3   45   5  53   3
A4    0  75   3  78

Next step: determine the clustered affinity (CA) matrix.
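The affinity matrix can be recomputed from the usage matrix with a short Python sketch. The slide does not list access frequencies; the per-query totals 45, 5, 75 and 3 are inferred so that the result matches the AA matrix shown, the split of those totals over three sites is hypothetical, and ref_l(q_k) = 1 is assumed throughout.

```python
# Attribute usage matrix use(q_k, A_j) for q1..q4 and A1..A4 (from the slide).
USE = [
    [1, 0, 1, 0],  # q1
    [0, 1, 1, 0],  # q2
    [0, 1, 0, 1],  # q3
    [0, 0, 1, 1],  # q4
]
# acc_l(q_k) at three sites. Only the per-query totals (45, 5, 75, 3) are
# implied by the AA matrix; the per-site split is hypothetical.
ACC = [
    [15, 20, 10],  # q1
    [5, 0, 0],     # q2
    [25, 25, 25],  # q3
    [3, 0, 0],     # q4
]
REF = 1  # assumption: each execution accesses the attribute pair once


def affinity(use, acc, ref=REF):
    n = len(use[0])
    aa = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k, row in enumerate(use):
                if row[i] == 1 and row[j] == 1:       # q_k uses both A_i and A_j
                    aa[i][j] += sum(ref * a for a in acc[k])  # sum over all sites
    return aa


AA = affinity(USE, ACC)
```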
Clustered Affinity Matrix
Step 1: Initialize CA by copying the first two columns of AA.

Attribute Affinity Matrix (AA)
     A1  A2  A3  A4
A1   45   0  45   0
A2    0  80   5  75
A3   45   5  53   3
A4    0  75   3  78

Clustered Affinity Matrix (CA)
     A1  A2
A1   45   0
A2    0  80
A3   45   5
A4    0  75
Clustered Affinity Matrix
Step 2: Determine the location for A3.
There are 3 possibilities: to the left of A1, between A1 and A2, or to the right of A2 (AA and CA are unchanged from Step 1).
Clustered Affinity Matrix
Step 2: Determine the order for A3.
$bond(A_x, A_y) = \sum_{z=1}^{n} aff(A_z, A_x)\, aff(A_z, A_y)$
$cont(A_i, A_k, A_j) = 2\,bond(A_i, A_k) + 2\,bond(A_k, A_j) - 2\,bond(A_i, A_j)$
Contribution of each possible position for A3:
Cont(A0, A3, A1) = 8820    Cont(A1, A3, A2) = 10150    Cont(A2, A3, A4) = 1780
(A4 has not been placed yet, so in Cont(A2, A3, A4) it acts as the empty right boundary and its bonds count as 0.)
Since Cont(A1, A3, A2) is the greatest, [A1, A3, A2] is the best order.

Clustered Affinity Matrix (CA)
     A1  A3  A2
A1   45  45   0
A2    0   5  80
A3   45  53   5
A4    0   3  75

Note: aff(A0, Ai) = aff(Ai, A0) = aff(A5, Ai) = aff(Ai, A5) = 0 by definition.
Clustered Affinity Matrix
Step 2: Determine the order for A4.
The largest contribution is obtained by placing A4 to the right of A2, i.e. Cont(A2, A4, A5) = 23730 with A5 the empty right boundary (ahead of Cont(A3, A4, A2) = 23486). So [A3, A2, A4] is the best order, and CA has the column order A1, A3, A2, A4.

Clustered Affinity Matrix (CA)
     A1  A3  A2  A4
A1   45  45   0   0
A2    0   5  80  75
A3   45  53   5   3
A4    0   3  75  78
Clustered Affinity Matrix
Step 3: Re-order the rows.
The rows are organized in the same order as the columns.

Clustered Affinity Matrix (CA)
     A1  A3  A2  A4
A1   45  45   0   0
A3   45  53   5   3
A2    0   5  80  75
A4    0   3  75  78
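Steps 1 to 3 (the clustering procedure commonly known as the bond energy algorithm) can be sketched in Python. The boundary columns A0 and A(n+1) are represented by None with zero bonds, as in the note on aff(A0, Ai) and aff(Ai, A5). Ties between positions are broken towards the rightmost position, an arbitrary choice that does not arise in this example.

```python
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]
NAMES = ["A1", "A2", "A3", "A4"]


def bond(aa, x, y):
    """bond(Ax, Ay) = sum_z aff(Az, Ax) * aff(Az, Ay); None is an empty boundary."""
    if x is None or y is None:
        return 0
    return sum(aa[z][x] * aa[z][y] for z in range(len(aa)))


def cont(aa, i, k, j):
    """Net contribution of placing Ak between Ai and Aj."""
    return 2 * bond(aa, i, k) + 2 * bond(aa, k, j) - 2 * bond(aa, i, j)


def bea(aa):
    order = [0, 1]                       # Step 1: copy the first two columns
    for k in range(2, len(aa)):          # Step 2: place each remaining column
        candidates = []
        for pos in range(len(order) + 1):
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            candidates.append((cont(aa, left, k, right), pos))
        _, best_pos = max(candidates)
        order.insert(best_pos, k)
    return order


order = bea(AA)
# Step 3: reorder rows and columns the same way to obtain CA.
CA = [[AA[r][c] for c in order] for r in order]
print([NAMES[i] for i in order])
```

Running it reproduces the contributions 8820, 10150 and 1780 for A3 and the final order A1, A3, A2, A4.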
Partitioning
Find the sets of attributes that are accessed, for the most part, by distinct sets of applications.
We look for dividing points along the diagonal such that
• total accesses to only one fragment are maximized, while
• total accesses to more than one fragment are minimized.

Clustered Affinity Matrix (CA)
     A1  A3  A2  A4
A1   45  45   0   0
A3   45  53   5   3
A2    0   5  80  75
A4    0   3  75  78

Cluster 1: A1 & A3
Cluster 2: A2 & A4
Two vertical fragments: PROJ1(A1, A3) and PROJ2(A2, A4). Because each fragment must include a key of R, the key A1 (PNO) is also replicated in PROJ2, giving PROJ2(A1, A2, A4).
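The search for a dividing point can be sketched in Python. The slide states the goal only in words; the sketch assumes the objective z = CTQ·CBQ − COQ² used in the textbook presentation of this algorithm (CTQ: accesses by queries using only the top attributes, CBQ: only the bottom attributes, COQ: both). It also assumes the per-query totals 45, 5, 75 and 3 inferred from the AA matrix, and it tries only single split points without the matrix shifting of the full algorithm.

```python
# Attributes used by each query and its total access frequency over all sites.
USE = {"q1": {"A1", "A3"}, "q2": {"A2", "A3"}, "q3": {"A2", "A4"}, "q4": {"A3", "A4"}}
ACC = {"q1": 45, "q2": 5, "q3": 75, "q4": 3}
CA_ORDER = ["A1", "A3", "A2", "A4"]  # column order of the clustered matrix


def split_score(top, bottom):
    """z = CTQ * CBQ - COQ**2 for one candidate dividing point."""
    ctq = sum(ACC[q] for q, attrs in USE.items() if attrs <= top)
    cbq = sum(ACC[q] for q, attrs in USE.items() if attrs <= bottom)
    coq = sum(ACC[q] for q, attrs in USE.items() if attrs & top and attrs & bottom)
    return ctq * cbq - coq ** 2


def best_split(order):
    scores = {}
    for cut in range(1, len(order)):
        top, bottom = set(order[:cut]), set(order[cut:])
        scores[cut] = split_score(top, bottom)
    cut = max(scores, key=scores.get)
    return order[:cut], order[cut:], scores


top, bottom, scores = best_split(CA_ORDER)
```

The best dividing point falls between A3 and A2 (z = 3311, against −2025 and −6084 for the other cuts), giving the clusters {A1, A3} and {A2, A4}.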
MIXED FRAGMENTATION
• Apply horizontal fragmentation to vertical fragments.
• Apply vertical fragmentation to horizontal fragments.
Example: applications about work at each department reference tuples of employees in the departments located around the site with 80% probability.
EMP(ENUM, NAME, SAL, TAX, MGRNUM, DNUM)
(Figure: vertical fragmentation splits EMP into (ENUM, NAME, TAX, SAL) and (ENUM, NAME, MGRNUM, DNUM); the second fragment is then fragmented horizontally by site (Jacksonville, Orlando, Miami) to support local work.)
ALLOCATION: Notations
i: fragment index
j: site index
k: application index
fkj: the frequency of application k at site j
rki: the number of retrieval references of application k to fragment i
uki: the number of update references of application k to fragment i
nki = rki + uki
(Figure: application k, issued at site j with frequency fkj, makes rki retrieval references and uki update references to fragment i.)
Allocation of Horizontal Fragments (1)
No replication: Best Fit Strategy
• The number of local references of Ri at site j (the benefit to site j) is
$B_{ij} = \sum_k f_{kj}\, n_{ki}$
summed over all applications k, where fkj is the frequency of application k at site j and nki is the number of accesses by application k to Ri.
• Ri is allocated at site j* such that Bij* is maximum.
Advantage: a fragment is allocated to the site that needs it most.
Disadvantage: it disregards the “mutual” effect of placing a fragment at a given site if a related fragment is also at that site.
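The best fit strategy reduces to one weighted sum per site followed by an argmax. A minimal Python sketch with hypothetical frequencies for one fragment Ri, three applications and three sites:

```python
# Hypothetical inputs for one fragment R_i: 3 applications, 3 sites.
F = [             # F[k][j]: frequency of application k at site j
    [10, 0, 2],
    [0, 5, 5],
    [4, 4, 0],
]
N = [3, 2, 1]     # N[k] = n_ki = r_ki + u_ki for this fragment


def best_fit(f, n):
    """Return the site j* maximizing B_ij = sum_k f_kj * n_ki, plus all B_ij."""
    sites = range(len(f[0]))
    benefit = [sum(f[k][j] * n[k] for k in range(len(n))) for j in sites]
    return max(sites, key=lambda j: benefit[j]), benefit


site, benefits = best_fit(F, N)
```

With these numbers the benefits are 34, 14 and 16, so the single copy of Ri goes to site 0.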
Allocation of Horizontal Fragments (2)
All beneficial sites approach (replication)
$B_{ij} = \sum_k f_{kj}\, r_{ki} - c \sum_k \sum_{j' \neq j} f_{kj'}\, u_{ki}$
The first term is the savings due to retrieval references to fragment i at site j; the second is the cost of update references from other sites, weighted by the constant c.
Ri is allocated at all sites j* such that Bij* > 0.
When all Bij’s are negative, a single copy of Ri is placed at the site such that Bij* is maximum.
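A minimal Python sketch of the all beneficial sites rule. The frequencies, reference counts and the update-cost weight C are hypothetical; the fallback branch keeps a single copy at the best site when no site has a positive benefit.

```python
# Hypothetical inputs for one fragment R_i (3 applications, 3 sites).
F = [[10, 0, 2], [0, 5, 5], [4, 4, 0]]  # F[k][j]: frequency of application k at site j
R = [3, 2, 0]                            # r_ki: retrieval references
U = [0, 0, 1]                            # u_ki: update references
C = 2                                    # hypothetical update-cost weight c


def benefit(j):
    """B_ij = retrieval savings at j minus c * updates propagated from other sites."""
    saving = sum(F[k][j] * R[k] for k in range(len(R)))
    cost = C * sum(F[k][jp] * U[k]
                   for k in range(len(U))
                   for jp in range(len(F[0])) if jp != j)
    return saving - cost


def allocate():
    b = [benefit(j) for j in range(len(F[0]))]
    chosen = [j for j, v in enumerate(b) if v > 0]
    if not chosen:  # no beneficial site: keep a single copy at the best one
        chosen = [max(range(len(b)), key=b.__getitem__)]
    return chosen, b


sites, benefits = allocate()
```

Here the benefits are 22, 2 and 0, so Ri is replicated at sites 0 and 1; site 2 breaks even and, under the strict Bij > 0 rule, gets no copy.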
Allocation of Horizontal Fragments (3)
Another replication approach:
di: the degree of redundancy of Ri
Fi: the reliability and availability benefit of having Ri fully replicated
β(di): the reliability and availability benefit when the fragment has di copies:
$\beta(d_i) = (1 - 2^{1 - d_i})\, F_i$
$\beta(1) = 0,\quad \beta(2) = \tfrac{F_i}{2},\quad \beta(3) = \tfrac{3F_i}{4},\quad \ldots$ so β(di) approaches Fi as di grows.
The benefit of introducing a new copy of Ri at site j:
$B_{ij} = \sum_k f_{kj}\, r_{ki} - c \sum_k \sum_{j' \neq j} f_{kj'}\, u_{ki} + \beta(d_i)$
The first two terms are the same as in the all beneficial sites approach; the last term also takes into account the benefit of replication.
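The diminishing returns of extra copies are easy to see numerically. A minimal Python sketch of β(d) with exact fractions; the value of F_i is hypothetical.

```python
from fractions import Fraction


def beta(d: int, F) -> Fraction:
    """beta(d) = (1 - 2**(1 - d)) * F, computed with exact fractions."""
    if d < 1:
        raise ValueError("a fragment needs at least one copy")
    return (1 - Fraction(1, 2 ** (d - 1))) * F


F_i = 100  # hypothetical benefit of full replication
for d in range(1, 6):
    print(d, beta(d, F_i))
```

Each additional copy adds half as much benefit as the previous one, so β(d) approaches F_i without reaching it.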
Allocation of Vertical Fragments
Fragment Ri, stored at site PSr, is split into Rs and Rt; allocate Rs to site PSs and Rt to site PSt.
(Figure: application types As, At and A1, …, An run at sites PSs, PSt, PSr and PS4, …, PSn; A1 is the application type at PSr that accesses only Rs.)
$B_{ist} = \sum_{k \in A_s} f_{ks} n_{ki} + \sum_{k \in A_t} f_{kt} n_{ki} - \sum_{k \in A_1} f_{kr} n_{ki} - \sum_{k \in A_2} f_{kr} n_{ki} - 2 \sum_{k \in A_3} f_{kr} n_{ki} - \sum_{4 \le l \le n} \sum_{k \in A_l} f_{kl} n_{ki}$
This formula can be used within an exhaustive “splitting” algorithm by trying all possible combinations of sites s and t.
SUMMARY
Design of a distributed DB consists of four phases:
– Phase 1: Global schema design (same as in centralized DB design)
– Phase 2: Fragmentation
  • Horizontal fragmentation
    – Primary: determine a complete and minimal set of predicates
    – Derived: use semijoin
  • Vertical fragmentation: identify fragments such that many applications can be executed using just one fragment.
– Phase 3: Allocation
  The primary goal is to minimize the number of remote accesses.
– Phase 4: Physical schema design (same as in centralized DB design).