Chapter 2 - 9-15DDB Architecture

Uploaded by

Tranquil Prashanta

Distributed DBMS

Architecture

Design of DDBMS 1
Network Transparency
• The user should be protected from the operational details of the network.
• It is desirable to hide even the existence of the network, if possible.
  – Location transparency: the command used is independent of the system on which the data is stored.
  – Naming transparency: a unique name is provided for each object in the database.

Replication & Fragmentation Transparency
• The user is unaware of the replication of fragments.
• Queries are specified on the relations (rather than on the fragments).
[Figure: relation R is divided into fragments R1, R2, R3 and R4. Site A stores copy 1 of R1 and copy 1 of R2, Site B stores copy 2 of R1, and Site C stores copy 2 of R2.]

ANSI/SPARC Architecture
[Figure: three levels. External schema (several external views), conceptual schema (conceptual view), internal schema (internal view).]
Internal view: deals with the physical definition and organization of data.
Conceptual view: abstract definition of the database. It is the "real world" view of the enterprise being modeled in the database.
External view: individual user's view of the database.

A Taxonomy of Distributed Data Systems
A distributed database can be defined as
• a logically integrated collection of shared data which is
• physically distributed across the nodes of a computer network.

Distributed data systems
• Homogeneous
• Heterogeneous (Multidatabase)
  – Unfederated (no local users)
  – Federated
    • Loosely coupled (interoperable DB systems using export schema)
    • Tightly coupled (with global schema)

Architecture of a Homogeneous DDBMS
A homogeneous DDBMS resembles a centralized DB, but instead of storing all the data at one site, the data is distributed across a number of sites in a network.
[Figure: schema layers from top to bottom. Global user views 1 to n; global schema; fragmentation schema; allocation schema; local conceptual schemas 1 to n; local internal schemas 1 to n; local DBs 1 to n.]

Fragmentation Schema & Allocation Schema
Fragmentation Schema: describes how the global relations are divided into fragments.
Allocation Schema: specifies at which sites each fragment is stored.
Example: Fragmentation of global relation R.
[Figure: R is drawn as three rows of fragments: A and B side by side, C and D side by side, and E on its own.]
To materialize R, the following operations are required:
$R = (A \bowtie B) \cup (C \bowtie D) \cup E$

Homogeneous vs. Heterogeneous
• Homogeneous DDBMS
  – No local users
  – Most systems do not have local schemas
• Heterogeneous DDBMS
  – There are both local and global users
  – Multidatabase systems are split into:
    • Tightly Coupled Systems: have a global schema
    • Loosely Coupled Systems: do not have a global schema
[Figure: a global user and local users; a multidatabase management system sits above four DBMSs, which manage Database 1 to Database 4.]

Schema Architecture of a Tightly-Coupled System
An individual node's participation in the MDB is defined by means of a participation schema.
[Figure: global user views 1 to n are defined over the global conceptual schema. For each node, an auxiliary schema and a local participation schema connect the global conceptual schema to the local conceptual schema, which maps to the local internal schema and the local DB. Local user views 1 and 2 are defined over each local conceptual schema.]

Auxiliary Schema (1)
The auxiliary schema describes the rules which govern the mappings between the local and global levels.
• Rules for unit conversion: may be required when one site expresses distance in kilometers and another in miles, …
• Rules for handling null values: may be necessary where one site stores additional information which is not stored at another site.
  – Example: One site stores the name, home address and telephone number of its employees, whereas another just stores names and addresses.

Auxiliary Schema (2)
• Rules for naming conflicts: naming conflicts occur when
  – semantically identical data items are named differently:
    • DNAME → Department name (at Site 1)
    • DEPTNAME → Department name (at Site 2)
  – semantically different data items are named identically:
    • NAME → Department name (at Site 1)
    • NAME → Manager name (at Site 2)
• Rules for handling data representation conflicts: such conflicts occur when semantically identical data items are represented differently in different data sources.
  – Example: Data represented as a character string in one database may be represented as a real number in the other database.

Auxiliary Schema (3)
• Rules for handling data scaling conflicts: such conflicts occur when semantically identical data items are stored in different databases using different units of measure.
  – Example: "Large", "New", "Good", etc.
These problems are called domain mismatch problems.

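The mapping rules listed on the Auxiliary Schema slides can be pictured as a small translation table applied to each local tuple. The sketch below is only an illustration: the rule layout, the `DISTANCE` and `PHONE` attributes, and the function name `to_global` are assumptions made for the example, not part of any particular multidatabase system.

```python
# Hypothetical auxiliary-schema rules for one local site (all names are illustrative).
MILES_TO_KM = 1.609344

SITE2_RULES = {
    # Naming conflict: the local DEPTNAME is the global DNAME.
    "rename": {"DEPTNAME": "DNAME"},
    # Unit conversion: this site stores distances in miles, the global level uses km.
    "convert": {"DISTANCE": lambda miles: miles * MILES_TO_KM},
    # Null-value rule: attributes the site does not store at all.
    "missing": ["PHONE"],
}


def to_global(local_row, rules):
    """Translate one local tuple into the global representation."""
    row = {}
    for name, value in local_row.items():
        global_name = rules["rename"].get(name, name)
        convert = rules["convert"].get(global_name)
        row[global_name] = convert(value) if convert else value
    for name in rules["missing"]:
        row.setdefault(name, None)  # fill attributes unknown at this site with null
    return row


site2_row = {"DEPTNAME": "Research", "DISTANCE": 10.0}
global_row = to_global(site2_row, SITE2_RULES)
print(global_row)
```

Keeping the rules as data rather than code mirrors the idea of a schema: adding a site means adding another rule set, not another translator function.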
Loosely-Coupled Systems (Interoperable Database Systems)
[Figure: global user views 1 to 3 are defined over local conceptual schemas 1 to n; there is no global conceptual schema. Each local conceptual schema maps to a local internal schema and a local DB. Local user views 1 and 2 are defined over local conceptual schema 1.]

Loosely-Coupled Systems
[Figure: global user views 1 to m are defined over export schemas 1 to n. The export schemas are defined over local conceptual schemas 1 to n, each of which maps to a local internal schema and a local DB. Local user views 1 and 2 are defined over local conceptual schema 1.]

Integration of Heterogeneous Data Models
• Provide bidirectional translators between all
pairs of models
– Advantage: no need to learn another data model and
language
– Disadvantage: requires n(n-1) translators, where n
is the number of different models.
• Adopt a single model (called canonical model) at
the global level and map all the local models
onto this model
– Advantage: requires only 2n translators
– Disadvantage: translations must go through the
global model.
(The second approach is more widely used.)

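As a quick check of the two translator counts, the following sketch tabulates n(n-1) against 2n for a few values of n. The function names are chosen for this illustration only.

```python
def pairwise_translators(n):
    """Bidirectional translation between all pairs of n models: n(n-1) translators."""
    return n * (n - 1)


def canonical_translators(n):
    """One translator to and one from the canonical model per local model: 2n."""
    return 2 * n


for n in (2, 3, 5, 10):
    print(n, pairwise_translators(n), canonical_translators(n))
```

The two counts are equal at n = 3; the canonical model only needs fewer translators once more than three data models are involved.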
Distributed Database Design
• Top-Down Approach: the database system is being designed from scratch.
  – Issues: fragmentation & allocation
• Bottom-Up Approach: integrating existing databases into one database.
  – Issues: design of the export and global schemas

TOP-DOWN DESIGN PROCESS
1. Requirements Analysis → System Requirements (Objectives)
2. Conceptual Design (entity analysis + functional analysis) and View Design (defining the interfaces for end users), connected by view integration → Global Conceptual Schema, Access Information, External Schema Definitions
3. Distribution Design (fragmentation & allocation) → Local Conceptual Schemas
4. Physical Design (maps the local conceptual schemas to physical storage devices) → Physical Schema

Design Consideration (1)
The organization of distributed systems can be investigated along three dimensions: level of sharing, access pattern, and level of knowledge.
Level of sharing
1. No sharing: Each application and its data execute at one site.
2. Data sharing: Programs are replicated at all sites, but data files are not.
3. Data + Program Sharing: Both data and programs may be shared.

Design Consideration (2)

Access Pattern
1. Static: Access patterns do not change.
2. Dynamic: Access patterns change over
time.
Level of Knowledge
1. No information
2. Partial information: Access patterns may
deviate from the predictions.
3. Complete information: Access patterns can reasonably be predicted.

Fragmentation Alternatives
Relation J
JNO  JNAME          BUDGET   LOC
J1   Instrumental   150,000  Montreal
J2   Database Dev.  135,000  New York
J3   CAD/CAM        250,000  New York
J4   Maintenance    350,000  Paris

Horizontal Partitioning
Fragment J1
JNO  JNAME          BUDGET   LOC
J1   Instrumental   150,000  Montreal
J2   Database Dev.  135,000  New York
Fragment J2
JNO  JNAME          BUDGET   LOC
J3   CAD/CAM        250,000  New York
J4   Maintenance    350,000  Paris

Vertical Partitioning
Fragment J1
JNO  BUDGET
J1   150,000
J2   135,000
J3   250,000
J4   350,000
Fragment J2
JNO  JNAME          LOC
J1   Instrumental   Montreal
J2   Database Dev.  New York
J3   CAD/CAM        New York
J4   Maintenance    Paris

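The two alternatives can be tried out on the J relation with plain Python lists. This is a minimal sketch, assuming the horizontal split is made on BUDGET below 200,000 (a threshold that happens to yield the two horizontal fragments of the slide); `select` and `project` are helper names introduced here.

```python
J = [
    {"JNO": "J1", "JNAME": "Instrumental", "BUDGET": 150_000, "LOC": "Montreal"},
    {"JNO": "J2", "JNAME": "Database Dev.", "BUDGET": 135_000, "LOC": "New York"},
    {"JNO": "J3", "JNAME": "CAD/CAM", "BUDGET": 250_000, "LOC": "New York"},
    {"JNO": "J4", "JNAME": "Maintenance", "BUDGET": 350_000, "LOC": "Paris"},
]


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Relational projection: keep only the listed attributes."""
    return [{a: row[a] for a in attributes} for row in relation]


# Horizontal partitioning: each fragment holds whole tuples.
h1 = select(J, lambda r: r["BUDGET"] < 200_000)
h2 = select(J, lambda r: r["BUDGET"] >= 200_000)

# Vertical partitioning: each fragment repeats the key JNO.
v1 = project(J, ["JNO", "BUDGET"])
v2 = project(J, ["JNO", "JNAME", "LOC"])

# Reconstruction: union for horizontal fragments, join on the key for vertical ones.
union = sorted(h1 + h2, key=lambda r: r["JNO"])
by_key = {row["JNO"]: row for row in v2}
joined = [{**by_key[row["JNO"]], **row} for row in v1]

print([r["JNO"] for r in h1], [r["JNO"] for r in h2])
print(union == J, joined == J)
```

Repeating the key in every vertical fragment is what makes the join-based reconstruction possible.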
Why fragment at all?

Reasons:
• Interquery concurrency
• Intraquery concurrency

Disadvantages:
• Vertical fragmentation may incur overhead.
• Attributes participating in a dependency may be allocated to different sites.
  – Integrity checking is more costly.

Degree of Fragmentation
• Application views are usually subsets of relations. Hence, it is only natural to consider subsets of relations as distribution units.
• The appropriate degree of fragmentation is dependent on the applications.

Correctness Rules
• Vertical Partitioning
  – Lossless decomposition
  – Dependency preservation
• Horizontal Partitioning
  – Disjoint fragments

Allocation Alternatives
• Partitioning: No replication
• Partial Replication: Some fragments are replicated
• Full Replication: Database exists in its entirety at each site

Notations
S(TITLE, SAL)
E(ENO, ENAME, TITLE)
J(JNO, JNAME, BUDGET, LOC)
G(ENO, JNO, RESP, DUR)
[Figure: link L1 runs from S to E; links L2 and L3 run from E and J to G.]
L1: 1-to-many relationship
S: Owner(L1), source relation
E: Member(L1), target relation

Simple Predicates
Given a relation R(A1, A2, …, An) where Ai has domain Di, a simple predicate pj defined on R has the form
$p_j : A_i \;\theta\; Value$
where $\theta \in \{=, <, \neq, \leq, >, \geq\}$ and $Value \in D_i$.

Example: relation J
JNO  JNAME          BUDGET   LOC
J1   Instrumental   150,000  Montreal
J2   Database Dev.  135,000  New York
J3   CAD/CAM        250,000  New York
J4   Maintenance    350,000  Orlando

Simple predicates:
p1: JNAME = "Maintenance"
p2: BUDGET < 200,000

Note: A simple predicate defines a data fragment.

MINTERM PREDICATE
Given a set of simple predicates for relation R,
P = {p1, p2, …, pm},
the set of minterm predicates
M = {m1, m2, …, mn}
is defined as
$M = \{\, m_i \mid m_i = \bigwedge_{p_j \in P} p_j^{*} \,\}$, where $p_j^{*} = p_j$ or $p_j^{*} = \neg p_j$.

Example relation:
TITLE         SAL
Elect. Eng.   40,000
Syst. Analy.  54,000
Mech. Eng.    32,000
Programmer    42,000

Possible simple predicates:
P1: TITLE = "Elect. Eng."
P2: TITLE = "Syst. Analy."
P3: TITLE = "Mech. Eng."
P4: TITLE = "Programmer"
P5: SAL ≤ 30,000
P6: SAL > 30,000

Some corresponding minterm predicates:
$m_1$: TITLE = "Elect. Eng." $\wedge$ SAL $\leq$ 30,000
$m_2$: TITLE = "Elect. Eng." $\wedge$ SAL $>$ 30,000

A minterm predicate defines a data fragment.

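To see how simple predicates combine into minterm predicates, the sketch below enumerates all sign combinations of the two simple predicates given for relation J on the Simple Predicates slide (p1: JNAME = "Maintenance", p2: BUDGET < 200,000) and lists the tuples each minterm selects. The function name `minterm_fragments` and the text labels are choices made for this illustration.

```python
from itertools import product

J = [
    {"JNO": "J1", "JNAME": "Instrumental", "BUDGET": 150_000, "LOC": "Montreal"},
    {"JNO": "J2", "JNAME": "Database Dev.", "BUDGET": 135_000, "LOC": "New York"},
    {"JNO": "J3", "JNAME": "CAD/CAM", "BUDGET": 250_000, "LOC": "New York"},
    {"JNO": "J4", "JNAME": "Maintenance", "BUDGET": 350_000, "LOC": "Orlando"},
]

simple_predicates = {
    "p1": lambda r: r["JNAME"] == "Maintenance",
    "p2": lambda r: r["BUDGET"] < 200_000,
}


def minterm_fragments(relation, predicates):
    """Map every minterm (each predicate taken positively or negated) to its tuples."""
    names = list(predicates)
    fragments = {}
    for signs in product((True, False), repeat=len(names)):
        label = " AND ".join(n if s else f"NOT {n}" for n, s in zip(names, signs))
        fragments[label] = [
            row["JNO"]
            for row in relation
            if all(predicates[n](row) == s for n, s in zip(names, signs))
        ]
    return fragments


fragments = minterm_fragments(J, simple_predicates)
for label, keys in fragments.items():
    print(label, keys)
```

With m simple predicates there are 2^m candidate minterms; here one of the four selects no tuple in the current data, which is different from being contradictory by the semantics of the database.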
Primary Horizontal Fragmentation
A primary horizontal fragmentation is defined by a selection operation on the owner relations of a database schema.
E(ENO, ENAME, TITLE)
J(JNO, JNAME, BUDGET, LOC)
G(ENO, JNO, RESP, DUR)
[Figure: links L2 and L3 run from E and J to G.] Owner(L3) = J
A possible fragmentation of J is defined as follows:
$J_1 = \sigma_{BUDGET \leq 200{,}000}(J)$
$J_2 = \sigma_{BUDGET > 200{,}000}(J)$

Horizontal Fragments
Thus, a horizontal fragment Ri of relation R consists of all the tuples of R that satisfy a minterm predicate mi.
There are as many horizontal fragments (also called minterm fragments) as there are minterm predicates.

Completeness (1)
A set of simple predicates Pr is said to be complete if and only if there is an equal probability of access by every application to any two tuples belonging to any minterm fragment that is defined according to Pr.
[Figure: simple predicates A1 ≥ k1, A2 = k2, A3 ≤ k3 and A4 = k4 define minterm fragments F1, F2 and F3, which are accessed by applications A1 to A4; the access arrows are labeled p1 and p3.]
The fragments look homogeneous.

Completeness (2)
[Figure: the same simple predicates, minterm fragments and applications, but the accesses to F3 now carry two different labels, p4 and p5.]
The set of simple predicates is incomplete.

Completeness (3)
[Figure: an additional simple predicate, A5 > k5, splits F3 into F31 (accessed with p4) and F32 (accessed with p5).]

Completeness (4)
A set of simple predicates Pr is said to be complete if and only if there is an equal probability of access by every application to any two tuples belonging to any minterm fragment that is defined according to Pr.
$J_1 = \sigma_{LOC = \text{"Montreal"}}(J)$
$J_2 = \sigma_{LOC = \text{"New York"}}(J)$
$J_3 = \sigma_{LOC = \text{"Orlando"}}(J)$
Case 1: The only application that accesses J wants to access the tuples according to the location.
The set of simple predicates
Pr = {LOC = "Montreal", LOC = "New York", LOC = "Orlando"}
is complete because each tuple of each fragment has the same probability of being accessed.

Completeness (5)
Example:
J1
JNO  JNAME          BUDGET   LOC
001  Instrumental   150,000  Montreal
J2
JNO  JNAME          BUDGET   LOC
004  GUI            135,000  New York
007  CAD/CAM        250,000  New York
J3
JNO  JNAME          BUDGET   LOC
003  Database Dev.  310,000  Orlando

Note: Completeness is a desirable property because a complete set defines fragments that are not only logically uniform, in that they all satisfy the minterm predicate, but also statistically homogeneous.

Case 2: There is a second application which accesses only those project tuples where the budget is less than $200,000.
• Since tuple "004" is accessed more frequently than tuple "007", Pr is not complete.
• To make the set complete, we need to add (BUDGET < 200,000) to Pr.

Completeness (6)
J
• LOC = "Montreal" → J1
  – BUDGET <= 200,000 → J11
  – BUDGET > 200,000 → J12
• LOC = "New York" → J2
  – BUDGET <= 200,000 → J21
  – BUDGET > 200,000 → J22
• LOC = "Orlando" → J3
  – BUDGET <= 200,000 → J31
  – BUDGET > 200,000 → J32
Small-budget applications access the BUDGET <= 200,000 fragments.

Redundant Fragmentation
[Figure: a logically uniform and statistically homogeneous fragment is split into Fragment 1 and Fragment 2.]
• Fragments 1 and 2 have the same characteristics.
• The fragmentation is unnecessary.

Minimality
Relevant:
Let mi and mj be two almost identical minterm predicates:
mi = p1 ∧ p2 ∧ p3, defining fragment fi
mj = p1 ∧ ¬p2 ∧ p3, defining fragment fj
p2 is relevant if and only if
$\frac{acc(m_i)}{card(f_i)} \neq \frac{acc(m_j)}{card(f_j)}$
where acc is the access frequency and card is the cardinality.
That is, there should be at least one application that accesses fi and fj differently, i.e., the simple predicate pi should be relevant in determining a fragmentation.
Minimal:
If all the predicates of a set Pr are relevant, Pr is minimal.

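The relevance test compares access frequency per tuple in the two fragments that a predicate separates. A minimal sketch follows; the numbers are made up purely for illustration, and `is_relevant` is a name introduced here.

```python
from fractions import Fraction


def is_relevant(acc_i, card_i, acc_j, card_j):
    """A predicate is relevant iff acc(mi)/card(fi) differs from acc(mj)/card(fj)."""
    return Fraction(acc_i, card_i) != Fraction(acc_j, card_j)


# Hypothetical figures: 30 accesses on 10 tuples versus 5 accesses on 10 tuples.
print(is_relevant(30, 10, 5, 10))

# Hypothetical figures: 12 accesses on 4 tuples versus 30 accesses on 10 tuples.
print(is_relevant(12, 4, 30, 10))
```

Exact fractions avoid treating two equal ratios as different because of floating-point rounding.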
A Complete and Minimal Example
Two applications:
1. One application accesses the tuples according to location.
2. Another application accesses only those project tuples where the budget is less than $200,000.

Case 1: Pr = {LOC = "Montreal", LOC = "New York", LOC = "Orlando", BUDGET <= 200,000, BUDGET > 200,000} is complete and minimal.

Case 2: If, however, we were to add the predicate JNAME = "Instrumentation" to Pr, the resulting set would not be minimal since the new predicate is not relevant with respect to the applications.

J
• LOC = "Montreal" → J1
  – BUDGET <= 200,000 → J11
  – BUDGET > 200,000 → J12
    • JNAME = "Instrument" → J121
    • JNAME != "Instrument" → J122
• LOC = "New York" → J2
  – BUDGET <= 200,000 → J21
  – BUDGET > 200,000 → J22
• LOC = "Orlando" → J3
  – BUDGET <= 200,000 → J31
  – BUDGET > 200,000 → J32
The LOC and BUDGET predicates are relevant. The predicate [JNAME = "Instrument"] is not relevant (irrelevant), because
$\frac{acc(m_{121})}{card(f_{121})} = \frac{acc(m_{122})}{card(f_{122})} = \frac{acc(m_{12})}{card(f_{12})}$

Application Information
• Qualification Information
  – The fundamental qualification information consists of the predicates used in user queries (i.e., "where" clauses in SQL).
  – 80/20 rule: 20% of user queries account for 80% of the total data access.
    • One should investigate the more important queries.
• Quantitative Information
  – Minterm selectivity sel(mi): number of tuples that would be accessed by a query specified according to a given minterm predicate.
  – Access frequency acc(qi): the access frequency of queries in a given period.
Qualitative information guides the fragmentation activity.

Determine the set of meaningful minterm predicates
Applications:
• Take the salary and determine a raise accordingly.
• The employee records are managed in two places, one handling the records of those with salary less than or equal to $30,000 and the other handling the records of those who earn more than $30,000.

Pr = {p1: SAL <= 30,000, p2: SAL > 30,000} is complete and minimal.

The minterm predicates:
$m_1: (SAL \leq 30{,}000) \wedge (SAL > 30{,}000)$
$m_2: (SAL \leq 30{,}000) \wedge \neg(SAL > 30{,}000)$
$m_3: \neg(SAL \leq 30{,}000) \wedge (SAL > 30{,}000)$
$m_4: \neg(SAL \leq 30{,}000) \wedge \neg(SAL > 30{,}000)$

Implications:
$i_1: (SAL \leq 30{,}000) \Rightarrow \neg(SAL > 30{,}000)$
$i_2: \neg(SAL \leq 30{,}000) \Rightarrow (SAL > 30{,}000)$
$i_3: (SAL > 30{,}000) \Rightarrow \neg(SAL \leq 30{,}000)$
$i_4: \neg(SAL > 30{,}000) \Rightarrow (SAL \leq 30{,}000)$

$i_1 \Rightarrow m_1$ is contradictory.
$i_2 \Rightarrow m_4$ is contradictory.
Therefore, we are left with M = {m2, m3}.

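The elimination of contradictory minterms can be imitated by checking which sign combinations of p1 and p2 any salary can satisfy. The sketch below uses a brute-force check over three sample salaries around the threshold, which is an assumption that suffices for these two threshold predicates but is not a general implication checker.

```python
from itertools import product

predicates = {
    "p1": lambda sal: sal <= 30_000,
    "p2": lambda sal: sal > 30_000,
}

# Assumed test points: just below, exactly at, and just above the threshold.
sample_salaries = [29_999, 30_000, 30_001]


def satisfiable_minterms(predicates, samples):
    """Keep the minterms that at least one sample value satisfies."""
    names = list(predicates)
    kept = []
    for signs in product((True, False), repeat=len(names)):
        if any(
            all(predicates[n](value) == s for n, s in zip(names, signs))
            for value in samples
        ):
            kept.append(tuple(n if s else "NOT " + n for n, s in zip(names, signs)))
    return kept


M = satisfiable_minterms(predicates, sample_salaries)
print(M)
```

The two surviving combinations correspond to m2 and m3; the combinations for m1 and m4 can never hold, whatever the stored data.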
Invalid Implications
Relation J
JNO  JNAME          BUDGET   LOC
J1   Instrumental   150,000  Montreal
J2   Database Dev.  135,000  New York
J3   CAD/CAM        250,000  New York
J4   Maintenance    350,000  Orlando

Simple predicates:
p1: LOC = "Montreal"
p2: LOC = "New York"
p3: LOC = "Orlando"
p4: BUDGET ≤ 200,000
p5: BUDGET > 200,000

VALID implications:
$i_1: p_1 \Rightarrow \neg p_2 \wedge \neg p_3$
$i_2: p_2 \Rightarrow \neg p_1 \wedge \neg p_3$
$i_3: p_3 \Rightarrow \neg p_1 \wedge \neg p_2$
$i_4: p_4 \Rightarrow \neg p_5$
$i_5: p_5 \Rightarrow \neg p_4$
$i_6: \neg p_4 \Rightarrow p_5$
$i_7: \neg p_5 \Rightarrow p_4$

INVALID implications:
$i_8$: LOC = "Montreal" $\Rightarrow \neg$(BUDGET > 200,000)
$i_9$: LOC = "Orlando" $\Rightarrow \neg$(BUDGET ≤ 200,000)

Implications should be defined according to the semantics of the database, not according to the current values.

Compute Complete & Minimal Set
Rule: a relation or fragment is partitioned "into at least two parts which are accessed differently by at least one application."
Relevant: a simple predicate which satisfies the above rule is relevant.

• Repeat until the predicate set is complete:
  – Find a simple predicate pi that is relevant
  – Determine minterm fragments fi and fj according to pi
  – Accept pi, fi, and fj
  – Remove any pk and fk from the acceptance list if pk becomes nonrelevant /* the list is minimal */
• Determine the set of minterm predicates M (using the acceptance list)
• Determine the set of implications I (among the acceptance list)
• For each mi in M, remove mi if it is contradictory according to I

Derived Horizontal Fragmentation
Derived fragmentation is used to facilitate the join between fragments.
In some cases, the horizontal fragmentation of a relation cannot be based on a property of its own attributes, but is derived from the horizontal fragmentation of another relation.

Benefits of Derived Fragmentation
PAY(TITLE, SAL)
EMP(ENO, ENAME, TITLE)

Primary fragmentation:
$PAY_1 = \sigma_{SAL \leq 30{,}000}(PAY)$
$PAY_2 = \sigma_{SAL > 30{,}000}(PAY)$

Using derived fragmentation (SJ denotes the semijoin):
EMP1 = EMP SJ PAY1
EMP2 = EMP SJ PAY2
[Figure: EMP1 joins only with PAY1, and EMP2 joins only with PAY2.]
EMPi and PAYi can be allocated to the same site.

Not using derived fragmentation: one can divide EMP into EMP1 and EMP2 based on TITLE and divide PAY into PAY1, PAY2, PAY3 based on SAL. To join EMP and PAY, each of EMP1 and EMP2 may have to be joined with each of PAY1, PAY2 and PAY3.
[Figure: every EMP fragment is linked to every PAY fragment.]
More communication overhead!

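The semijoin-based derivation can be sketched with small in-memory tables. The tuples below are hypothetical sample data chosen so that both PAY fragments are non-empty, and `semijoin_on_title` is a helper name introduced for this example.

```python
# Hypothetical sample data: PAY(TITLE, SAL) and EMP(ENO, ENAME, TITLE).
PAY = [
    ("Elect. Eng.", 40_000),
    ("Syst. Analy.", 54_000),
    ("Mech. Eng.", 27_000),
    ("Programmer", 24_000),
]
EMP = [
    ("E1", "A. Adams", "Elect. Eng."),
    ("E2", "B. Brown", "Syst. Analy."),
    ("E3", "C. Clark", "Mech. Eng."),
    ("E4", "D. Davis", "Programmer"),
]


def semijoin_on_title(emp, pay_fragment):
    """EMP SJ PAY_i: the EMP tuples whose TITLE appears in the PAY fragment."""
    titles = {title for title, _ in pay_fragment}
    return [e for e in emp if e[2] in titles]


# Primary fragmentation of the owner relation PAY.
PAY1 = [p for p in PAY if p[1] <= 30_000]
PAY2 = [p for p in PAY if p[1] > 30_000]

# Derived fragmentation of the member relation EMP.
EMP1 = semijoin_on_title(EMP, PAY1)
EMP2 = semijoin_on_title(EMP, PAY2)

print([e[0] for e in EMP1], [e[0] for e in EMP2])
```

Because every EMP tuple matches exactly one PAY fragment here, joining EMP with PAY only needs the pairs (EMP1, PAY1) and (EMP2, PAY2), which can be co-located.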
Chain Relationships
R1(R1PK, …)
R2(R2PK, R1FK, …)
R3(R3PK, R2FK, …)
...
• Design the primary fragmentation for R1.
• Derive the derived fragmentation for Rk as follows:
  – Rk = Rk SJ R(k-1), with join condition RkFK = R(k-1)PK,
  – for 2 ≤ k ≤ n, in that order.

Derived Fragmentation
EMP(ENO, ENAME, TITLE)
PROJ(PNO, PNAME, BUDGET)
EMP_PROJ(ENO, PNO, RESP, DUR)
• How do we fragment EMP_PROJ?
  – Semi-join with EMP, or
  – Semi-join with PROJ
• Criterion: Support the more frequent join operation.

VERTICAL FRAGMENTATION
Purpose: Identify fragments Ri such that many applications can be executed using just one fragment.
Advantage: When many applications which use R1 and many applications which use R2 are issued at different sites, fragmenting R avoids communication overhead.
[Figure: relation R with attributes A1 to A7 is split into vertical fragments R1 and R2.]

Vertical partitioning is more complicated than horizontal partitioning:
• Vertical Partitioning: The number of possible fragments grows roughly as $m^m$, where m is the number of nonprimary key attributes.
• Horizontal Partitioning: $2^n$ possible minterm predicates can be defined, where n is the number of simple predicates in the complete and minimal set Pr.

Vertical Fragmentation Approaches
Greedy Heuristic Approaches:
• Split Approach: Global relations are progressively split into fragments.
• Grouping Approach: Attributes are progressively aggregated to constitute fragments.

Correctness:
• Each attribute of R belongs to at least one fragment.
• Each fragment includes either a key of R or a "tuple identifier".

Vertical Clustering - Replication
In evaluating the convenience of vertical clustering, it is important that overlapping attributes are not heavily updated.
Example: EMP(ENUM, NAME, SAL, TAX, MGRNUM, DNUM)
[Figure: administrative applications run at Site 1; other applications run at all sites.]

Bad Fragmentation: NAME is not available in EMP2.
1. EMP1(ENUM, NAME, TAX, SAL)
2. EMP2(ENUM, MGRNUM, DNUM)

Good Fragmentation: NAME is relatively stable.
1. EMP1(ENUM, NAME, TAX, SAL)
2. EMP2(ENUM, NAME, MGRNUM, DNUM)

Split Approach
• Splitting is considered only for attributes that do not participate in the primary key.
• The split approach involves three steps:
  1. Obtain the attribute affinity matrix.
  2. Use a clustering algorithm to group some attributes together based on the attribute affinity matrix. This algorithm produces a clustered affinity matrix.
  3. Use a partitioning algorithm to partition the attributes such that each set of attributes is accessed solely, or for the most part, by a distinct set of applications.

Attribute Usage Matrix
PROJ(PNO, PNAME, BUDGET, LOC), with A1 = PNO, A2 = PNAME, A3 = BUDGET, A4 = LOC

use(qi, Aj) = 1 if Aj is referenced by qi, 0 otherwise

q1: SELECT BUDGET FROM PROJ WHERE PNO=Value;
q2: SELECT PNAME, BUDGET FROM PROJ;
q3: SELECT PNAME FROM PROJ WHERE LOC=Value;
q4: SELECT SUM(BUDGET) FROM PROJ WHERE LOC=Value;

Attribute Usage Matrix
      A1  A2  A3  A4
q1     1   0   1   0
q2     0   1   1   0
q3     0   1   0   1
q4     0   0   1   1

Attribute Affinity Measure
$aff(A_i, A_j) = \sum_{k \,\mid\, use(q_k, A_i) = 1 \,\wedge\, use(q_k, A_j) = 1} \;\; \sum_{\forall l} ref_l(q_k) \cdot acc_l(q_k)$

• aff(Ai, Aj): popularity of using Ai and Aj together.
• Outer sum: taken over each query qk that uses both Ai and Aj.
• Inner sum: popularity of such an Ai-Aj pair at all sites l.
• ref_l(qk): number of accesses to attributes (Ai, Aj) for each execution of qk at site l.
• acc_l(qk): application access frequency of qk at site l.
[Figure: queries issued at sites m and n access attributes Ai and Ak of relation R.]

The values aff(Ai, Aj) form the Attribute Affinity Matrix, with one row and one column per attribute A1 to A4; for example, the entry in row A2, column A3 is aff(A2, A3).

Attribute Affinity Matrix Example
Attribute Usage Matrix
      A1  A2  A3  A4
q1     1   0   1   0
q2     0   1   1   0
q3     0   1   0   1
q4     0   0   1   1

Attribute Affinity Matrix (AA)
      A1  A2  A3  A4
A1    45   0  45   0
A2     0  80   5  75
A3    45   5  53   3
A4     0  75   3  78

Next Step - Determine clustered affinity (CA) matrix

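The affinity matrix of this example can be recomputed from the usage matrix. The sketch below assumes that the per-site products ref_l(qk) · acc_l(qk) have already been summed into one total per query; the totals 45, 5, 75 and 3 are inferred from the off-diagonal entries of AA (each of those entries is produced by a single query) and are not stated in the slides.

```python
ATTRIBUTES = ["A1", "A2", "A3", "A4"]

# Attribute usage matrix: USE[q][j] is 1 if query q references attribute j.
USE = {
    "q1": [1, 0, 1, 0],
    "q2": [0, 1, 1, 0],
    "q3": [0, 1, 0, 1],
    "q4": [0, 0, 1, 1],
}

# Assumed totals of ref * acc over all sites, inferred from the AA matrix.
TOTAL_ACCESS = {"q1": 45, "q2": 5, "q3": 75, "q4": 3}


def affinity_matrix(use, total_access, n):
    """aff(Ai, Aj) = sum of the access totals of every query that uses both Ai and Aj."""
    aa = [[0] * n for _ in range(n)]
    for query, row in use.items():
        for i in range(n):
            for j in range(n):
                if row[i] and row[j]:
                    aa[i][j] += total_access[query]
    return aa


AA = affinity_matrix(USE, TOTAL_ACCESS, len(ATTRIBUTES))
for name, row in zip(ATTRIBUTES, AA):
    print(name, row)
```

The diagonal entries come out as the total access to each attribute, for example 80 for A2, which is used by q2 and q3.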
Clustered Affinity Matrix
Step 1: Initialize CA by copying the first 2 columns of AA.

Attribute Affinity Matrix (AA)
      A1  A2  A3  A4
A1    45   0  45   0
A2     0  80   5  75
A3    45   5  53   3
A4     0  75   3  78

Clustered Affinity Matrix (CA)
      A1  A2
A1    45   0
A2     0  80
A3    45   5
A4     0  75

Clustered Affinity Matrix
Step 2: Determine the location for A3. There are 3 possibilities: to the left of A1, between A1 and A2, or to the right of A2.

Clustered Affinity Matrix (CA) before placing A3 (AA is unchanged)
      A1  A2
A1    45   0
A2     0  80
A3    45   5
A4     0  75

Clustered Affinity Matrix
Step 2: Determine the order for A3
$bond(A_x, A_y) = \sum_{z=1}^{n} aff(A_z, A_x) \cdot aff(A_z, A_y)$
$cont(A_i, A_k, A_j) = 2\,bond(A_i, A_k) + 2\,bond(A_k, A_j) - 2\,bond(A_i, A_j)$
where cont is the contribution of placing Ak between Ai and Aj.

Cont(A0, A3, A1) = 8820
Cont(A1, A3, A2) = 10150
Cont(A2, A3, A4) = 1780 (A4 is not yet placed, so its bonds are taken as 0)

Since Cont(A1, A3, A2) is the greatest, [A1, A3, A2] is the best order.

Clustered Affinity Matrix (CA) after placing A3 (AA is unchanged)
      A1  A3  A2
A1    45  45   0
A2     0   5  80
A3    45  53   5
A4     0   3  75

Note: aff(A0, Ai) = aff(Ai, A0) = aff(A5, Ai) = aff(Ai, A5) = 0 by definition.

Clustered Affinity Matrix
Step 2: Determine the order for A4

Since the contribution of placing A4 to the right of A2, Cont(A2, A4, A5), is the biggest, [A3, A2, A4] is the best order.

Clustered Affinity Matrix (CA) after placing A4 (AA is unchanged)
      A1  A3  A2  A4
A1    45  45   0   0
A2     0   5  80  75
A3    45  53   5   3
A4     0   3  75  78

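The column placements of Step 2 can be checked numerically. The sketch below implements the bond and cont measures on the AA matrix of the example and then places A3 and A4 greedily at the position with the largest contribution. Using `None` for the boundary columns (A0 and A5, or a position with no neighbour yet) is a convention of this sketch, as is the function name `bond_energy_order`.

```python
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]
A1, A2, A3, A4 = range(4)


def bond(aa, x, y):
    """bond(Ax, Ay) = sum over z of aff(Az, Ax) * aff(Az, Ay); 0 at a boundary."""
    if x is None or y is None:
        return 0
    return sum(aa[z][x] * aa[z][y] for z in range(len(aa)))


def cont(aa, i, k, j):
    """Contribution of placing column k between columns i and j."""
    return 2 * bond(aa, i, k) + 2 * bond(aa, k, j) - 2 * bond(aa, i, j)


def bond_energy_order(aa):
    """Copy the first two columns, then insert each remaining column where cont is largest."""
    order = [0, 1]
    for k in range(2, len(aa)):
        best_pos, best_value = 0, None
        for pos in range(len(order) + 1):
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            value = cont(aa, left, k, right)
            if best_value is None or value > best_value:
                best_pos, best_value = pos, value
        order.insert(best_pos, k)
    return order


print(cont(AA, None, A3, A1), cont(AA, A1, A3, A2), cont(AA, A2, A3, None))
print(["A%d" % (c + 1) for c in bond_energy_order(AA)])
```

The three printed contributions match the values given for A3, and the final column order is A1, A3, A2, A4.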
Clustered Affinity Matrix
Step 3: Re-order the Rows
The rows are organized in the same order as the columns.

Clustered Affinity Matrix (CA) before re-ordering the rows
      A1  A3  A2  A4
A1    45  45   0   0
A2     0   5  80  75
A3    45  53   5   3
A4     0   3  75  78

Clustered Affinity Matrix (CA) after re-ordering the rows
      A1  A3  A2  A4
A1    45  45   0   0
A3    45  53   5   3
A2     0   5  80  75
A4     0   3  75  78

Partitioning
Find the sets of attributes that are accessed, for the most part, by distinct sets of applications.
We look for dividing points along the diagonal such that
• total accesses to only one fragment are maximized, while
• total accesses to more than one fragment are minimized.

Clustered Affinity Matrix (CA)
      A1  A3  A2  A4
A1    45  45   0   0
A3    45  53   5   3
A2     0   5  80  75
A4     0   3  75  78

Cluster 1: A1 & A3
Cluster 2: A2 & A4
Two vertical fragments: PROJ1(A1, A3) and PROJ2(A2, A4)

MIXED FRAGMENTATION
• Apply horizontal fragmentation to vertical fragments.
• Apply vertical fragmentation to horizontal fragments.

Example: Applications about work at each department reference tuples of employees in the departments located around the site with 80% probability.
EMP(ENUM, NAME, SAL, TAX, MGRNUM, DNUM)
[Figure: vertical fragmentation splits EMP into (ENUM, NAME, TAX, SAL) and (ENUM, NAME, MGRNUM, DNUM); horizontal fragmentation by site (Jacksonville, Orlando, Miami) supports local work.]

ALLOCATION – Notations
i: fragment index
j: site index
k: application index
fkj: the frequency of application k at site j
rki: the number of retrieval references of application k to fragment i
uki: the number of update references of application k to fragment i
nki = rki + uki
[Figure: application k, issued at site j with frequency fkj, makes rki retrieval references and uki update references to fragment i.]

Allocation of Horizontal Fragments (1)
No replication: Best Fit Strategy
• The number of local references of Ri at site j is
  $B_{ij} = \sum_{k} f_{kj}\, n_{ki}$
  where $B_{ij}$ is the benefit to site j, the sum runs over all applications k, $f_{kj}$ is the frequency of application k at site j, and $n_{ki}$ is the number of accesses by k.
• Ri is allocated at site j* such that Bij* is maximum.

Advantage: A fragment is allocated to the site that needs it most.
Disadvantage: It disregards the "mutual" effect of placing a fragment at a given site if a related fragment is also at that site.

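The best fit rule is a single weighted sum per site. The sketch below evaluates it on made-up numbers for two applications, three sites and two fragments; the array layout (`f[k][j]`, `n[k][i]`) and the function name are assumptions of this example.

```python
def best_fit_site(f, n, fragment):
    """Return (best site, benefits) where benefits[j] = sum over k of f[k][j] * n[k][fragment]."""
    sites = range(len(f[0]))
    benefit = [sum(f[k][j] * n[k][fragment] for k in range(len(f))) for j in sites]
    best = max(sites, key=lambda j: benefit[j])
    return best, benefit


# Hypothetical workload: f[k][j] is the frequency of application k at site j.
f = [
    [10, 0, 2],
    [0, 5, 5],
]
# n[k][i] is the number of references (retrievals + updates) of application k to fragment i.
n = [
    [3, 1],
    [1, 4],
]

site_0, benefit_0 = best_fit_site(f, n, fragment=0)
site_1, benefit_1 = best_fit_site(f, n, fragment=1)
print(site_0, benefit_0)
print(site_1, benefit_1)
```

Each fragment is placed independently, which is exactly the limitation noted above: the choice for one fragment ignores where related fragments end up.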
Allocation of Horizontal Fragments (2)
All beneficial sites approach (replication)
$B_{ij} = \sum_{k} f_{kj}\, r_{ki} \;-\; c \sum_{k} \sum_{j' \neq j} f_{kj'}\, u_{ki}$
The first term is the savings due to retrieval references to fragment i at site j; the second term is the cost of update references from other sites.

Ri is allocated at all sites j* such that Bij* > 0.
When all Bij's are negative, a single copy of Ri is placed at the site such that Bij* is maximum.

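The all beneficial sites rule can be sketched in the same style. The workload numbers are made up, and c is treated here as a constant that weights update cost against retrieval savings, which is an interpretation assumed for the example since the slide does not define it.

```python
def replication_benefit(f, r, u, c, fragment, site):
    """B_ij: local retrieval savings minus c times the update references from other sites."""
    apps = range(len(f))
    sites = range(len(f[0]))
    retrieval = sum(f[k][site] * r[k][fragment] for k in apps)
    update = sum(f[k][j] * u[k][fragment] for k in apps for j in sites if j != site)
    return retrieval - c * update


def all_beneficial_sites(f, r, u, c, fragment):
    """Place a copy at every site with positive benefit, or one copy at the best site."""
    sites = range(len(f[0]))
    benefit = [replication_benefit(f, r, u, c, fragment, j) for j in sites]
    chosen = [j for j in sites if benefit[j] > 0]
    if not chosen:
        chosen = [max(sites, key=lambda j: benefit[j])]
    return chosen, benefit


# Hypothetical workload for one fragment (index 0).
f = [[10, 0, 2], [0, 5, 5]]   # f[k][j]: frequency of application k at site j
r = [[3], [1]]                # r[k][i]: retrieval references of application k
u = [[1], [0]]                # u[k][i]: update references of application k

chosen, benefit = all_beneficial_sites(f, r, u, c=1, fragment=0)
print(chosen, benefit)

# With a heavier update weight no site is beneficial, so a single copy is kept.
chosen_heavy, benefit_heavy = all_beneficial_sites(f, r, u, c=20, fragment=0)
print(chosen_heavy, benefit_heavy)
```

The second call shows the fallback: when every benefit is negative, the fragment is still stored once, at the least costly site.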
Allocation of Horizontal Fragments (3)
Another Replication Approach:
$d_i$: the degree of redundancy of Ri
$F_i$: the reliability and availability benefit of having Ri fully replicated
$\beta(d_i)$: the reliability and availability benefit when the fragment has $d_i$ copies

$\beta(d_i) = (1 - 2^{1 - d_i})\, F_i$
$\beta(1) = 0,\quad \beta(2) = \frac{F_i}{2},\quad \beta(3) = \frac{3 F_i}{4},\quad \ldots,\quad \beta(\infty) = F_i$

The benefit of introducing a new copy of Ri at site j:
$B_{ij} = \sum_{k} f_{kj}\, r_{ki} \;-\; c \sum_{k} \sum_{j' \neq j} f_{kj'}\, u_{ki} \;+\; \beta(d_i)$
The first two terms are the same as in the All Beneficial Sites approach; the last term also takes into account the benefit of replication.

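A short numeric check of the replication benefit function shows how quickly additional copies stop paying off. The value F = 8 is an arbitrary choice for the illustration.

```python
from fractions import Fraction


def beta(d, F):
    """Reliability and availability benefit of d copies: (1 - 2^(1 - d)) * F."""
    return (1 - Fraction(2) ** (1 - d)) * F


F = 8  # hypothetical benefit of full replication
for d in range(1, 6):
    print(d, beta(d, F))
```

Each extra copy adds half as much benefit as the previous one, so beta approaches F without exceeding it.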
Allocation of Vertical Fragments
Fragment Ri, stored at site PSr, is split into Rs and Rt. Allocate Rs to site PSs, and Rt to site PSt.
[Figure: before the split, Ri is at PSr; after the split, Rs is at PSs and Rt is at PSt. Application types A1, A2 and A3 are issued at PSr (for example, A1 is the application type at site PSr that accesses only Rs); As and At are issued at PSs and PSt; A4, ..., An are issued at PS4, ..., PSn.]

$B_{ist} = \sum_{k \in A_s} f_{ks}\, n_{ki} + \sum_{k \in A_t} f_{kt}\, n_{ki} - \sum_{k \in A_1} f_{kr}\, n_{ki} - \sum_{k \in A_2} f_{kr}\, n_{ki} - 2 \sum_{k \in A_3} f_{kr}\, n_{ki} - \sum_{4 \leq l \leq n} \sum_{k \in A_l} f_{kl}\, n_{ki}$

This formula can be used within an exhaustive "splitting" algorithm by trying all possible combinations of sites s and t.

SUMMARY
Design of a distributed DB consists of four phases:
– Phase 1: Global schema design (same as in centralized DB design)
– Phase 2: Fragmentation
  • Horizontal Fragmentation
    – Primary: Determine a complete and minimal set of predicates
    – Derived: Use semijoin
  • Vertical Fragmentation
    Identify fragments such that many applications can be executed using just one fragment.
– Phase 3: Allocation
  The primary goal is to minimize the number of remote accesses.
– Phase 4: Physical schema design (same as in centralized DB design)

14 pages
Sample Exam Questions
No ratings yet
Sample Exam Questions
10 pages
Developing Java Applications - Db2aje90
No ratings yet
Developing Java Applications - Db2aje90
401 pages
CSB353: Compiler Design Lab: Project Report
No ratings yet
CSB353: Compiler Design Lab: Project Report
15 pages
Free Preparation Oracle 1Z0-591 Exam Questions and Answers - IT Exam Leak
100% (1)
Free Preparation Oracle 1Z0-591 Exam Questions and Answers - IT Exam Leak
62 pages
LAMP Quickstart For Red Hat Enterprise Linux 4
100% (1)
LAMP Quickstart For Red Hat Enterprise Linux 4
8 pages
Mrp-Material Requirement Planning
No ratings yet
Mrp-Material Requirement Planning
88 pages
Sgi Xfs Guide
No ratings yet
Sgi Xfs Guide
113 pages
SAP System Copy Procedure
No ratings yet
SAP System Copy Procedure
6 pages
JFrog CLICheat Sheet
No ratings yet
JFrog CLICheat Sheet
1 page
AWS Cheat Sheet - AWS Identity and Access Management (IAM) - Tutorials Dojo
No ratings yet
AWS Cheat Sheet - AWS Identity and Access Management (IAM) - Tutorials Dojo
14 pages
Casper Suite 9.82 Installation Guide For Linux
No ratings yet
Casper Suite 9.82 Installation Guide For Linux
71 pages
Business Metrics v2 Prerequisites Guide
No ratings yet
Business Metrics v2 Prerequisites Guide
62 pages
1.2 CSRF-Slides
No ratings yet
1.2 CSRF-Slides
40 pages
Subcontracting Process in SAP MM After GST
No ratings yet
Subcontracting Process in SAP MM After GST
3 pages
Processing Asset Acquisitions in Purchasing (FI-AA and MM)
No ratings yet
Processing Asset Acquisitions in Purchasing (FI-AA and MM)
25 pages
Robotic Process Automation: A Case Study in Quality Management at Mercedes-Benz AG
No ratings yet
Robotic Process Automation: A Case Study in Quality Management at Mercedes-Benz AG
7 pages
Forensic
No ratings yet
Forensic
8 pages
Software Requirements:: Goal Oriented RE
No ratings yet
Software Requirements:: Goal Oriented RE
20 pages
JIGAA
No ratings yet
JIGAA
14 pages
AWS Questions
No ratings yet
AWS Questions
5 pages
POOA SOHO Integration Technical Design Document
No ratings yet
POOA SOHO Integration Technical Design Document
9 pages
Inventory Control Using ABC and Min-Max Analysis o
No ratings yet
Inventory Control Using ABC and Min-Max Analysis o
11 pages
Cloud Infrastructure Security at Different Laevels
No ratings yet
Cloud Infrastructure Security at Different Laevels
7 pages
Resume Prashanth Vadla
No ratings yet
Resume Prashanth Vadla
2 pages
Assistmyteam Email To PDF: For Outlook
No ratings yet
Assistmyteam Email To PDF: For Outlook
4 pages
Database And Computer Management: SERIES 1, #3
From Everand
Database And Computer Management: SERIES 1, #3
Elias Mutegi
No ratings yet
DBMS MASTER: Become Pro in Database Management System
From Everand
DBMS MASTER: Become Pro in Database Management System
Ummed Singh
No ratings yet
DragonFly BSD System Design and Administration: Definitive Reference for Developers and Engineers
From Everand
DragonFly BSD System Design and Administration: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Debian System Essentials: Definitive Reference for Developers and Engineers
From Everand
Debian System Essentials: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
DB2 Administration and Optimization Guide: Definitive Reference for Developers and Engineers
From Everand
DB2 Administration and Optimization Guide: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
