2 Distribution Design

Uploaded by

Justin William

Principles of Distributed Database Systems
Presenter: Mr. Thomas Tesha

Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 1
Outline
 Introduction
 Distributed and Parallel Database Design
 Distributed Data Control
 Distributed Query Processing
 Distributed Transaction Processing
 Data Replication
 Database Integration – Multidatabase Systems
 Parallel Database Systems

Outline
 Distributed and Parallel Database Design
 Fragmentation
 Data distribution

Distribution Design

Fragmentation

 Why Fragmentation?
 First, application views are usually subsets of relations.
 Therefore, the locality of accesses of applications is defined not on
entire relations but on their subsets. Hence it is only natural to
consider subsets of relations as distribution units.
 Second, if the applications that have views defined on a given
relation reside at different sites, two alternatives can be followed,
with the entire relation being the unit of distribution.
 Either the relation is not replicated and is stored at only one site, or
it is replicated at all or some of the sites where the applications
reside. The former results in an unnecessarily high volume of
remote data accesses. The latter causes unnecessary replication,
which complicates the execution of updates (to be discussed later)
and may not be desirable if storage is limited.
 Finally, fragmentation increases concurrency: each fragment is a
separate unit, so transactions that access different fragments can
execute in parallel.
Fragmentation alternatives

 What is a reasonable way to fragment a relation for distribution?
 Relation instances are essentially tables.
 So the question is: what are the alternative ways of dividing a
table into smaller ones?
 There are two alternatives: dividing it horizontally or dividing
it vertically.
Fragmentation Alternatives – Horizontal

PROJ1 : projects with budgets less than or equal to $200,000
PROJ2 : projects with budgets greater than $200,000

$PROJ_1 = \sigma_{BUDGET \le 200000}(PROJ)$
$PROJ_2 = \sigma_{BUDGET > 200000}(PROJ)$
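The two selections above can be tried out on a small table. The following is a minimal Python sketch, not part of the original slides: the four PROJ tuples and the helper name `select` are illustrative assumptions, and only the two predicates (BUDGET ≤ 200000 and BUDGET > 200000) come from the slide.

```python
# Illustrative PROJ tuples (assumed sample data, not given on the slide).
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Develop.", "BUDGET": 135000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "CAD/CAM", "BUDGET": 250000, "LOC": "New York"},
    {"PNO": "P4", "PNAME": "Maintenance", "BUDGET": 310000, "LOC": "Paris"},
]


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


# Horizontal fragments defined by the two selection predicates.
PROJ1 = select(PROJ, lambda t: t["BUDGET"] <= 200000)
PROJ2 = select(PROJ, lambda t: t["BUDGET"] > 200000)

print([t["PNO"] for t in PROJ1])  # ['P1', 'P2']
print([t["PNO"] for t in PROJ2])  # ['P3', 'P4']
```

Each fragment keeps all attributes of PROJ and only a subset of its tuples, which is what distinguishes horizontal from vertical fragmentation.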
Fragmentation Alternatives – Vertical
PROJ1: information about project budgets
PROJ2: information about project names and locations
Correctness of Fragmentation

 Completeness
 Decomposition of relation R into fragments R1, R2, ..., Rn is
complete if and only if each data item in R can also be found in
some Ri
 Reconstruction
 If relation R is decomposed into fragments R1, R2, ..., Rn, then
there should exist some relational operator ∇ such that
$R = \nabla_{1 \le i \le n} R_i$
 Disjointness
 If relation R is decomposed into fragments R1, R2, ..., Rn, and
data item di is in Rj, then di should not be in any other fragment
Rk (k ≠ j ).
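The three rules can be checked mechanically for a horizontal fragmentation, where the reconstruction operator is set union. This is a hedged sketch: the relation `R` (pairs of project number and budget) and the function names are assumptions made for illustration only.

```python
# Assumed sample relation: (PNO, BUDGET) pairs, stored as a set of tuples.
R = {("P1", 150000), ("P2", 135000), ("P3", 250000), ("P4", 310000)}

R1 = {t for t in R if t[1] <= 200000}
R2 = {t for t in R if t[1] > 200000}


def is_complete(relation, fragments):
    """Completeness: every data item of R appears in some fragment."""
    return all(any(t in f for f in fragments) for t in relation)


def reconstructs(relation, fragments):
    """Reconstruction: for horizontal fragments the operator is union."""
    return set().union(*fragments) == relation


def is_disjoint(fragments):
    """Disjointness: no data item appears in more than one fragment."""
    return sum(len(f) for f in fragments) == len(set().union(*fragments))


print(is_complete(R, [R1, R2]), reconstructs(R, [R1, R2]), is_disjoint([R1, R2]))

# A fragmentation with overlapping predicates is complete but not disjoint.
R3 = {t for t in R if t[1] >= 140000}
print(is_complete(R, [R1, R3]), is_disjoint([R1, R3]))
```

For vertical fragments the same three questions are asked about attributes rather than tuples, and the reconstruction operator becomes a join on the key.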

Allocation Alternatives

 Non-replicated
 partitioned : each fragment resides at only one site
 Replicated
 fully replicated : each fragment at each site
 partially replicated : each fragment at some of the sites
 Rule of thumb:

Comparison of Replication Alternatives

Fragmentation

 Horizontal Fragmentation (HF)

 Primary Horizontal Fragmentation (PHF)
 Derived Horizontal Fragmentation (DHF)
 Vertical Fragmentation (VF)
 Hybrid Fragmentation (HF)

PHF – Information Requirements

 Database Information
 relationship

 cardinality of each relation: card(R)

PHF - Information Requirements
 Application Information
 simple predicates : Given R[A1, A2, …, An], a simple predicate pj
is
$p_j : A_i \;\theta\; Value$
where $\theta \in \{=, <, \le, >, \ge, \ne\}$, $Value \in D_i$ and $D_i$ is the domain of $A_i$.
For relation R we define $Pr = \{p_1, p_2, \ldots, p_m\}$
Example :
PNAME = "Maintenance"
BUDGET ≤ 200000
 minterm predicates : Given R and $Pr = \{p_1, p_2, \ldots, p_m\}$,
define $M = \{m_1, m_2, \ldots, m_z\}$ as
$M = \{\, m_i \mid m_i = \bigwedge_{p_j \in Pr} p_j^* \,\}, \quad 1 \le j \le m, \; 1 \le i \le z$
where $p_j^* = p_j$ or $p_j^* = \neg(p_j)$.
PHF – Information Requirements

Example
m1: PNAME="Maintenance" ∧ BUDGET≤200000

m2: NOT(PNAME="Maintenance") ∧ BUDGET≤200000

m3: PNAME="Maintenance" ∧ NOT(BUDGET≤200000)

m4: NOT(PNAME="Maintenance") ∧ NOT(BUDGET≤200000)
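The four minterms above are every combination of each simple predicate in its natural or negated form. A small Python sketch of that enumeration follows; the dictionary `simple_predicates`, the function `minterms` and the sample tuple are hypothetical names and data introduced for illustration.

```python
from itertools import product

# The two simple predicates of the example, as label -> test function.
simple_predicates = {
    'PNAME = "Maintenance"': lambda t: t["PNAME"] == "Maintenance",
    "BUDGET <= 200000": lambda t: t["BUDGET"] <= 200000,
}


def minterms(predicates):
    """Return (label, test) pairs, one per conjunction of natural/negated predicates."""
    names = list(predicates)
    result = []
    for signs in product([True, False], repeat=len(names)):
        label = " AND ".join(
            name if sign else f"NOT({name})" for name, sign in zip(names, signs)
        )

        def test(t, signs=signs):
            # A tuple satisfies the minterm when every predicate has the required truth value.
            return all(predicates[n](t) == s for n, s in zip(names, signs))

        result.append((label, test))
    return result


for label, _ in minterms(simple_predicates):
    print(label)

# Any tuple satisfies exactly one minterm (assumed sample tuple).
sample = {"PNAME": "Maintenance", "BUDGET": 310000}
print([label for label, test in minterms(simple_predicates) if test(sample)])
```

With m simple predicates the enumeration produces 2^m minterms, which is why contradictory minterms are eliminated later in the design.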
PHF – Information Requirements

 Application Information
 minterm selectivities: sel(mi)
 The number of tuples of the relation that would be accessed by a
user query which is specified according to a given minterm
predicate mi.
 access frequencies: acc(qi)
 The frequency with which a user application qi accesses data.
 Access frequency for a minterm predicate can also be defined.

Primary Horizontal Fragmentation

 Definition : A primary horizontal fragmentation is defined
by a selection operation on the owner relations of a
database schema. Therefore, given relation R, its
horizontal fragments are given by
$R_j = \sigma_{F_j}(R), \quad 1 \le j \le w$
where Fj is a selection formula, which is (preferably) a minterm
predicate.
 Therefore, a horizontal fragment Ri of relation R consists
of all the tuples of R which satisfy a minterm predicate
mi.
 Given a set of minterm predicates M, there are as many
horizontal fragments of relation R as there are minterm
predicates. The set of horizontal fragments is also referred to as
the set of minterm fragments.
PHF – Algorithm

Given: A relation R, the set of simple predicates Pr

Output: The set of fragments of R = {R1, R2,…,Rw} which
obey the fragmentation rules.

Preliminaries :
 Pr should be complete
 Pr should be minimal

Completeness of Simple Predicates

 Completeness of simple predicates concerns whether the
predicates capture every distinction that applications make
when they access a relation, so that each resulting fragment is
accessed uniformly.
 A simple predicate applies a single comparison to one attribute
of a relation (table).
 If a distinction made by some application is missing from the
predicate set, tuples that are accessed differently end up in the
same fragment and the fragment no longer reflects access locality.
 A set of simple predicates Pr is said to be complete if
and only if any two tuples of the same minterm fragment
defined on Pr have the same probability of being accessed
by any application.
Completeness of Simple Predicates

 Example :
 Assume PROJ[PNO,PNAME,BUDGET,LOC] has two
applications defined on it.
 Find the budgets of projects at each location. (1)
 Find projects with budgets less than $200000. (2)
 According to (1),
Pr={LOC=“Montreal”,LOC=“New York”,LOC=“Paris”}
which is not complete with respect to (2).
 Modify
Pr ={LOC=“Montreal”, LOC=“New York”, LOC=“Paris”,
BUDGET≤200000, BUDGET>200000}
which is complete.
Minimality of Simple Predicates

 In the context of distributed database fragmentation,
minimality of simple predicates means that the set contains
only predicates that actually affect how the data is
fragmented: each predicate must split some fragment into
parts that at least one application accesses differently.
 Keeping the predicate set minimal avoids creating
fragments that no application distinguishes, which would
add fragments to manage without improving access locality.
Minimality of Simple Predicates

 If a predicate influences how fragmentation is performed
(i.e., causes a fragment f to be further fragmented into,
say, fi and fj) then there should be at least one
application that accesses fi and fj differently.
 In other words, the simple predicate should be relevant
in determining a fragmentation.
 If all the predicates of a set Pr are relevant, then Pr is
minimal.
Minimality of Simple Predicates

 Definition: Let mi and mj be two minterm predicates that
are identical in their definition, except that mi contains
the simple predicate pi in its natural form while mj
contains ¬pi . Also, let fi and fj be two fragments defined
according to mi and mj, respectively. Then pi is relevant
if and only if
$\dfrac{acc(m_i)}{card(f_i)} \ne \dfrac{acc(m_j)}{card(f_j)}$
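The relevance test compares the access frequency per tuple of the two fragments. The sketch below is only an illustration: the function name `is_relevant` and all numbers are hypothetical, and both fragments are assumed to be non-empty so that the ratios are defined.

```python
from fractions import Fraction


def is_relevant(acc_i, card_i, acc_j, card_j):
    """pi is relevant iff acc(mi)/card(fi) differs from acc(mj)/card(fj).

    Assumes card_i > 0 and card_j > 0; Fraction keeps the comparison exact.
    """
    return Fraction(acc_i, card_i) != Fraction(acc_j, card_j)


# Hypothetical figures: fi is accessed 30 times over 10 tuples, fj 5 times over 10 tuples.
print(is_relevant(30, 10, 5, 10))   # the per-tuple access rates differ

# Hypothetical figures: both fragments are accessed twice per tuple.
print(is_relevant(20, 10, 40, 20))  # the per-tuple access rates are equal
```

When the ratios are equal, splitting on pi separates tuples that applications treat identically, so the predicate adds a fragment without adding information.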
Minimality of Simple Predicates

Example :
Pr ={LOC=“Montreal”,LOC=“New York”, LOC=“Paris”,
BUDGET≤200000,BUDGET>200000}
is minimal (in addition to being complete). However, if we add
PNAME = “Instrumentation”
then Pr is not minimal, because neither application accesses the
resulting fragments differently.
COM_MIN Algorithm

Given: a relation R and a set of simple predicates Pr

Output: a complete and minimal set of simple predicates
Pr' for Pr

Rule 1: a relation or fragment is partitioned into at least
two parts which are accessed differently by at
least one application.
COM_MIN Algorithm

 Initialization :
 find a pi ∈ Pr such that pi partitions R according to Rule 1
 set Pr' ← {pi} ; Pr ← Pr – {pi} ; F ← {fi}
 Iteratively add predicates to Pr' until it is complete
 find a pj ∈ Pr such that pj partitions some fk defined according to
a minterm predicate over Pr' according to Rule 1
 set Pr' ← Pr' ∪ {pj}; Pr ← Pr – {pj}; F ← F ∪ {fj}
 if ∃ pk ∈ Pr' which is nonrelevant then
Pr' ← Pr' – {pk}
F ← F – {fk}
PHORIZONTAL Algorithm

Makes use of COM_MIN to perform fragmentation.

Input: a relation R and a set of simple predicates Pr
Output: a set of minterm predicates M according to which
relation R is to be fragmented

 Pr' ← COM_MIN (R,Pr)
 determine the set M of minterm predicates
 determine the set I of implications among pi ∈ Pr'
 eliminate the contradictory minterms from M
PHF – Example

 Two candidate relations : PAY and PROJ.

 Fragmentation of relation PAY
 Application: Check the salary info and determine raise.
 Employee records kept at two sites ⇒ application run at two sites
 Simple predicates
p1 : SAL ≤ 30000
p2 : SAL > 30000
Pr = {p1,p2}, which is complete and minimal, so Pr' = Pr
 Minterm predicates
m1 : (SAL ≤ 30000)
m2 : NOT(SAL ≤ 30000) = (SAL > 30000)

PHF – Example
 Fragmentation of relation PROJ
 Applications:
 Find the name and budget of projects given their number. (1)
 Issued at three sites
 Access project information according to budget. (2)
 One site accesses ≤200000, the other accesses >200000
 Simple predicates
 For application (1)
p1 : LOC = “Montreal”
p2 : LOC = “New York”
p3 : LOC = “Paris”
 For application (2)
p4 : BUDGET ≤ 200000
p5 : BUDGET > 200000
 Pr = Pr' = {p1,p2,p3,p4,p5}
PHF – Example

 Fragmentation of relation PROJ continued
 Minterm fragments left after elimination
m1 : (LOC = “Montreal”) ∧ (BUDGET ≤ 200000)
m2 : (LOC = “Montreal”) ∧ (BUDGET > 200000)
m3 : (LOC = “New York”) ∧ (BUDGET ≤ 200000)
m4 : (LOC = “New York”) ∧ (BUDGET > 200000)
m5 : (LOC = “Paris”) ∧ (BUDGET ≤ 200000)
m6 : (LOC = “Paris”) ∧ (BUDGET > 200000)
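Five simple predicates give 2^5 = 32 candidate minterms, of which only six survive. The sketch below reproduces that count by brute force instead of reasoning over an explicit implication set. It rests on two labeled assumptions: the domain of LOC is exactly the three cities, and the two budget values 200000 and 200001 stand in for the two sides of the threshold. All identifiers are hypothetical.

```python
from itertools import product

# The five simple predicates p1..p5 as (label, test) pairs over (loc, budget).
PREDICATES = [
    ("LOC = Montreal", lambda loc, budget: loc == "Montreal"),
    ("LOC = New York", lambda loc, budget: loc == "New York"),
    ("LOC = Paris", lambda loc, budget: loc == "Paris"),
    ("BUDGET <= 200000", lambda loc, budget: budget <= 200000),
    ("BUDGET > 200000", lambda loc, budget: budget > 200000),
]

# Assumed domains: three cities, and one representative budget per side of the threshold.
LOCS = ["Montreal", "New York", "Paris"]
BUDGETS = [200000, 200001]


def satisfiable(signs):
    """A minterm is kept if some (loc, budget) gives every predicate its required truth value."""
    return any(
        all(test(loc, budget) == sign for (_, test), sign in zip(PREDICATES, signs))
        for loc in LOCS
        for budget in BUDGETS
    )


def positive_part(signs):
    """Show only the non-negated predicates; the negated ones are implied by them."""
    return " AND ".join(label for (label, _), sign in zip(PREDICATES, signs) if sign)


all_minterms = list(product([True, False], repeat=len(PREDICATES)))
surviving = [signs for signs in all_minterms if satisfiable(signs)]

print(len(all_minterms), len(surviving))  # 32 6
for signs in surviving:
    print(positive_part(signs))
```

The 26 discarded minterms are the contradictory ones, for example any minterm asserting two different locations or both budget ranges at once.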
PHF – Correctness

 Completeness
 Since Pr' is complete and minimal, the selection predicates are
complete
 Reconstruction
 If relation R is fragmented into FR = {R1,R2,…,Rr}
$R = \bigcup_{R_i \in F_R} R_i$
 Disjointness
 Minterm predicates that form the basis of fragmentation should
be mutually exclusive.
Derived Horizontal Fragmentation

 Defined on a member relation of a link according to a
selection operation specified on its owner.
 Each link is an equijoin.
 Equijoin can be implemented by means of semijoins.
DHF – Definition

Given a link L where owner(L)=S and member(L)=R, the
derived horizontal fragments of R are defined as
$R_i = R \ltimes_F S_i, \quad 1 \le i \le w$
where w is the maximum number of fragments that will be
defined on R and
$S_i = \sigma_{F_i}(S)$
where Fi is the formula according to which the primary
horizontal fragment Si is defined.
DHF – Example

Given link L1 where owner(L1)=SKILL and member(L1)=EMP

$EMP_1 = EMP \ltimes SKILL_1$
$EMP_2 = EMP \ltimes SKILL_2$
where
$SKILL_1 = \sigma_{SAL \le 30000}(SKILL)$
$SKILL_2 = \sigma_{SAL > 30000}(SKILL)$
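The derivation of EMP1 and EMP2 can be sketched with a hand-written semijoin. Everything specific here is an assumption for illustration: the slide does not list the attributes, so the join attribute is taken to be TITLE, and the SKILL and EMP tuples are invented sample data.

```python
# Assumed schemas: SKILL(TITLE, SAL) is the owner, EMP(ENO, TITLE) is the member.
SKILL = [
    {"TITLE": "Elect. Eng.", "SAL": 40000},
    {"TITLE": "Syst. Anal.", "SAL": 34000},
    {"TITLE": "Mech. Eng.", "SAL": 27000},
    {"TITLE": "Programmer", "SAL": 24000},
]
EMP = [
    {"ENO": "E1", "TITLE": "Elect. Eng."},
    {"ENO": "E2", "TITLE": "Syst. Anal."},
    {"ENO": "E3", "TITLE": "Mech. Eng."},
    {"ENO": "E4", "TITLE": "Programmer"},
    {"ENO": "E5", "TITLE": "Syst. Anal."},
]


def semijoin(member, owner, attr):
    """Keep the member tuples whose join attribute matches some owner tuple."""
    owner_values = {t[attr] for t in owner}
    return [t for t in member if t[attr] in owner_values]


# Primary horizontal fragments of the owner.
SKILL1 = [t for t in SKILL if t["SAL"] <= 30000]
SKILL2 = [t for t in SKILL if t["SAL"] > 30000]

# Derived horizontal fragments of the member.
EMP1 = semijoin(EMP, SKILL1, "TITLE")
EMP2 = semijoin(EMP, SKILL2, "TITLE")

print([t["ENO"] for t in EMP1])  # ['E3', 'E4']
print([t["ENO"] for t in EMP2])  # ['E1', 'E2', 'E5']
```

EMP carries no salary attribute in this sketch, so it cannot be fragmented by the salary predicate directly; the semijoin transfers the owner's fragmentation to it.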
DHF – Correctness
 Completeness
 Referential integrity
 Let R be the member relation of a link whose owner is relation S
which is fragmented as FS = {S1, S2, ..., Sn}. Furthermore, let A
be the join attribute between R and S. Then, for each tuple t of
R, there should be a tuple t' of S such that
t[A] = t' [A]
 Reconstruction
 Same as primary horizontal fragmentation.
 Disjointness
 Simple join graphs between the owner and the member
fragments.

Vertical Fragmentation

 Has been studied within the centralized context
 Design methodology: its motivation within the centralized
context is as a design tool, which allows user queries to deal
with smaller relations, thus causing a smaller number of page
accesses
 Physical clustering: it has also been suggested that the most
“active” sub-relations can be identified and placed in a faster
memory subsystem in those cases where memory hierarchies
are supported
Vertical Fragmentation

 More difficult than horizontal, because more alternatives
exist.
Two approaches :
 Grouping (attributes to fragments): starts by assigning each
attribute to one fragment and, at each step, joins some of the
fragments until some criterion is satisfied. It was first suggested
for centralized databases and was used later for distributed
databases
 Splitting: starts with a relation and decides on beneficial
partitioning based on the access behavior of applications to the
attributes. The technique was also first discussed for centralized
database design, then later for distributed databases
Vertical Fragmentation

 Overlapping fragments vs non-overlapping fragments
 Splitting generates non-overlapping fragments, whereas
grouping typically results in overlapping fragments
We do not consider the replicated key attributes to be
overlapping.
Advantage:
Easier to enforce functional dependencies
(for integrity checking etc.)
VF – Information Requirements

 Application Information
 Attribute affinities
 a measure that indicates how closely related the attributes are
 This is obtained from more primitive usage data
 Attribute usage values
 Given a set of queries Q = {q1, q2,…, qq} that will run on the relation
R[A1, A2,…, An],
$use(q_i, A_j) = \begin{cases} 1 & \text{if attribute } A_j \text{ is referenced by query } q_i \\ 0 & \text{otherwise} \end{cases}$
use(qi,•) can be defined accordingly
VF – Examples
The following four example queries are defined on the relation PROJ.
The attribute usage matrix records which attributes each query
references.

q1: Find the budget of a project, given its identification number.
SELECT BUDGET FROM PROJ WHERE PNO=Value

q2: Find the names and budgets of all projects.
SELECT PNAME,BUDGET FROM PROJ

q3: Find the names of projects located at a given city.
SELECT PNAME FROM PROJ WHERE LOC=Value

q4: Find the total project budgets for each city.
SELECT SUM(BUDGET) FROM PROJ WHERE LOC=Value

Attribute Usage Matrix
      PNO   PNAME   BUDGET   LOC
q1     1      0       1       0
q2     0      1       1       0
q3     0      1       0       1
q4     0      0       1       1
VF – Affinity Measure aff(Ai,Aj)

The attribute affinity measure between two attributes Ai and Aj
of a relation R[A1, A2, …, An] with respect to the set of
applications Q = {q1, q2, …, qq} is defined as follows :

$aff(A_i, A_j) = \sum_{\text{all queries that access } A_i \text{ and } A_j} (\text{query access})$

$\text{query access} = \sum_{\text{all sites}} \left(\text{access frequency of a query}\right) \times \dfrac{\text{access}}{\text{execution}}$
VF – Calculation of aff(Ai, Aj)

Assume each query in the previous example accesses the attributes once
during each execution.
Also assume the access frequencies

      S1   S2   S3
q1    15   20   10
q2     5    0    0
q3    25   25   25
q4     3    0    0

Attribute access frequencies

Then the cost matrix is as follows, where for example
aff(PNO, BUDGET) = 15*1 + 20*1 + 10*1 = 45

PNO, BUDGET     = 45
PNAME, BUDGET   = 5
PNAME, LOC      = 75
BUDGET, LOC     = 3

Attribute usage cost matrix
VF – Calculation of aff(Ai, Aj)

Using the cost matrix, we can construct the attribute affinity matrix as follows

PNO, BUDGET     = 45
PNAME, BUDGET   = 5
PNAME, LOC      = 75
BUDGET, LOC     = 3

Attribute usage cost matrix

         PNO   PNAME   BUDGET   LOC
PNO       45      0      45      0
PNAME      0     80       5     75
BUDGET    45      5      53      3
LOC        0     75       3     78

Attribute affinity matrix
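The affinity matrix can be recomputed from the attribute usage matrix and the per-site access frequencies. The sketch below uses the numbers from the slides and the stated assumption of one access per execution; the names `USE`, `ACC` and `affinity_matrix` are illustrative.

```python
ATTRS = ["PNO", "PNAME", "BUDGET", "LOC"]

# Attribute usage matrix: USE[q][j] is 1 if query q references ATTRS[j].
USE = {
    "q1": [1, 0, 1, 0],
    "q2": [0, 1, 1, 0],
    "q3": [0, 1, 0, 1],
    "q4": [0, 0, 1, 1],
}

# Access frequencies of each query at sites S1, S2, S3.
ACC = {
    "q1": [15, 20, 10],
    "q2": [5, 0, 0],
    "q3": [25, 25, 25],
    "q4": [3, 0, 0],
}


def affinity_matrix(use, acc):
    """aff(Ai, Aj) = sum over queries using both Ai and Aj of their total frequency."""
    n = len(ATTRS)
    aa = [[0] * n for _ in range(n)]
    for query, row in use.items():
        total = sum(acc[query])  # one access per execution, summed over all sites
        for i in range(n):
            for j in range(n):
                if row[i] and row[j]:
                    aa[i][j] += total
    return aa


AA = affinity_matrix(USE, ACC)
for name, row in zip(ATTRS, AA):
    print(f"{name:<7}", row)
```

The diagonal entries fall out of the same rule: BUDGET is used by q1, q2 and q4, so aff(BUDGET, BUDGET) = 45 + 5 + 3 = 53.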
VF – Clustering Algorithm

 The fundamental task in designing a vertical fragmentation
algorithm is to find some means of grouping the attributes of
a relation based on the attribute affinity values in AA.
 The bond energy algorithm (BEA) is considered appropriate for the
following reasons:
1. It is designed specifically to determine groups of similar items as
opposed to, say, a linear ordering of the items (i.e., it clusters the
attributes with larger affinity values together, and the ones with
smaller values together).
2. The final groupings are insensitive to the order in which items are
presented to the algorithm.
3. The computation time of the algorithm is reasonable: $O(n^2)$, where
n is the number of attributes.
4. Secondary interrelationships between clustered attribute groups
are identifiable.
VF – Clustering Algorithm

 In short, the Bond Energy Algorithm (BEA) has been used
for clustering of entities. BEA finds an ordering of entities (in
our case attributes) such that the global affinity measure is
maximized.

$AM = \sum_i \sum_j (\text{affinity of } A_i \text{ and } A_j \text{ with their neighbors})$
Bond Energy Algorithm

Input: The AA matrix

Output: The clustered affinity matrix CA which is a
perturbation of AA
 Initialization: Place and fix one of the columns of AA in
CA.
 Iteration: Place the remaining n-i columns in the
remaining i+1 positions in the CA matrix. For each
column, choose the placement that makes the most
contribution to the global affinity measure.
 Row order: Order the rows according to the column
ordering.

Bond Energy Algorithm

“Best” placement? Define contribution of a placement:

$cont(A_i, A_k, A_j) = 2\,bond(A_i, A_k) + 2\,bond(A_k, A_j) - 2\,bond(A_i, A_j)$
where
$bond(A_x, A_y) = \sum_{z=1}^{n} aff(A_z, A_x)\, aff(A_z, A_y)$

For instance, with the attributes of PROJ, the bond itself is
computed as
bond(PNO,BUDGET) = aff(PNO,PNO)*aff(PNO,BUDGET) +
aff(PNAME,PNO)*aff(PNAME,BUDGET) +
aff(BUDGET,PNO)*aff(BUDGET,BUDGET) +
aff(LOC,PNO)*aff(LOC,BUDGET)

bond(PNO,BUDGET) = 45*45 + 0*5 + 45*53 + 0*3 = 4410
BEA – Complete Example
Consider the following AA matrix and the corresponding CA matrix where
PNO and PNAME have been placed. Place BUDGET:

AA       PNO   PNAME   BUDGET   LOC
PNO       45      0      45      0
PNAME      0     80       5     75
BUDGET    45      5      53      3
LOC        0     75       3     78

CA       PNO   PNAME
PNO       45      0
PNAME      0     80
BUDGET    45      5
LOC        0     75

Ordering (0-3-1) :
cont(A0,BUDGET,PNO) = 2bond(A0, BUDGET)+2bond(BUDGET, PNO)
–2bond(A0 , PNO)
= 8820
Ordering (1-3-2) :
cont(PNO,BUDGET,PNAME) = 10150
Ordering (2-3-4) :
cont (PNAME,BUDGET,LOC) = 1780
Here A0 is the empty position to the left of the first column, and
LOC has not been placed yet, so every bond involving A0 or LOC is
taken as 0 at this step.
BEA – Example

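 Ordering (1-3-2) gives the largest contribution, so BUDGET is
placed between PNO and PNAME. Therefore, the CA matrix has the
column order PNO, BUDGET, PNAME.
 When LOC is placed, the final form of the CA
matrix (after row organization) is

         PNO   BUDGET   PNAME   LOC
PNO       45     45        0      0
BUDGET    45     53        5      3
PNAME      0      5       80     75
LOC        0      3       75     78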
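The placement steps of the example can be replayed in a few lines. This is a simplified sketch of the bond energy idea rather than a full implementation: it fixes PNO and PNAME first (as in the example), treats an empty boundary position as contributing a bond of 0, and breaks ties in favor of the leftmost position. The function names are illustrative.

```python
ATTRS = ["PNO", "PNAME", "BUDGET", "LOC"]
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]


def bond(x, y):
    """bond(Ax, Ay) = sum over z of aff(Az, Ax) * aff(Az, Ay); 0 at an empty boundary."""
    if x is None or y is None:
        return 0
    return sum(AA[z][x] * AA[z][y] for z in range(len(AA)))


def cont(left, middle, right):
    """Contribution of placing `middle` between `left` and `right`."""
    return 2 * bond(left, middle) + 2 * bond(middle, right) - 2 * bond(left, right)


def bea_order():
    """Greedy column ordering: insert each remaining column where cont is largest."""
    order = [0, 1]  # PNO and PNAME are placed first, as in the example
    for k in range(2, len(AA)):
        best_pos, best_val = 0, None
        for pos in range(len(order) + 1):
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            value = cont(left, k, right)
            if best_val is None or value > best_val:
                best_pos, best_val = pos, value
        order.insert(best_pos, k)
    return order


print(cont(None, 2, 0), cont(0, 2, 1), cont(1, 2, None))  # 8820 10150 1780
print([ATTRS[i] for i in bea_order()])  # ['PNO', 'BUDGET', 'PNAME', 'LOC']
```

The rows of CA are then reordered to match the column order, which yields the clustered matrix used to look for a splitting point.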
VF – Algorithm: Partitioning Algorithm

How can you divide a set of clustered attributes {A1, A2,
…, An} into two (or more) sets {A1, A2, …, Ai} and {Ai+1, …,
An} such that there are no (or minimal) applications that
access both (or more than one) of the sets?

The objective of the splitting activity is to find sets of
attributes that are accessed solely, or for the most part, by
distinct sets of applications.

[Figure: Locating a Splitting Point]
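A splitting point along the diagonal of the clustered matrix can be scored and compared. The slides do not state the objective function, so this sketch assumes the one used in the Özsu and Valduriez textbook, z = CTQ * CBQ - COQ^2, where CTQ, CBQ and COQ are the total access frequencies of the queries that touch only the top set, only the bottom set, or both. The attribute order, query attribute sets and frequencies come from the running example; the function names are illustrative.

```python
# Clustered attribute order from the BEA example.
ORDER = ["PNO", "BUDGET", "PNAME", "LOC"]

# Each query: (set of attributes it references, total access frequency over all sites).
QUERIES = {
    "q1": ({"PNO", "BUDGET"}, 45),
    "q2": ({"PNAME", "BUDGET"}, 5),
    "q3": ({"PNAME", "LOC"}, 75),
    "q4": ({"BUDGET", "LOC"}, 3),
}


def split_score(top, bottom):
    """z = CTQ * CBQ - COQ^2 (assumed objective, see lead-in)."""
    ctq = sum(freq for attrs, freq in QUERIES.values() if attrs <= top)
    cbq = sum(freq for attrs, freq in QUERIES.values() if attrs <= bottom)
    coq = sum(freq for attrs, freq in QUERIES.values()
              if not attrs <= top and not attrs <= bottom)
    return ctq * cbq - coq ** 2


def best_split(order):
    """Try every split point along the diagonal and keep the highest score."""
    scores = {
        i: split_score(set(order[:i]), set(order[i:]))
        for i in range(1, len(order))
    }
    best = max(scores, key=scores.get)
    return order[:best], order[best:], scores[best]


print(best_split(ORDER))  # (['PNO', 'BUDGET'], ['PNAME', 'LOC'], 3311)
```

In an actual design the key attribute (PNO here) would also be copied into the second fragment so that the relation can be reconstructed by a join.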
VF – Algorithm

Two problems :
 Cluster forming in the middle of the CA matrix
 Shift a row up and a column left and apply the algorithm to find
the “best” partitioning point
 Do this for all possible shifts
 Cost $O(m^2)$
 More than two clusters
 m-way partitioning
 try 1, 2, …, m–1 split points along diagonal and try to find the
best point for each of these
 Cost $O(2^m)$
VF – Correctness

A relation R, defined over attribute set A and key K,
generates the vertical partitioning FR = {R1, R2, …, Rr}.
 Completeness
 The following should be true for A:
$A = \bigcup_{R_i \in F_R} A_{R_i}$
 Reconstruction
 Reconstruction can be achieved by
$R = \bowtie_K R_i, \quad \forall R_i \in F_R$
 Disjointness
 TID's are not considered to be overlapping since they are
maintained by the system
 Duplicated keys are not considered to be overlapping
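The three rules for vertical fragments can be checked on a small example. This is a sketch under stated assumptions: the PROJ tuples are invented sample data, the fragments follow the earlier vertical example (budgets in one fragment, names and locations in the other), and the helper names are hypothetical.

```python
# Assumed sample relation; PNO is the key K.
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Develop.", "BUDGET": 135000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "CAD/CAM", "BUDGET": 250000, "LOC": "New York"},
]
KEY = "PNO"


def project(relation, attrs):
    """Relational projection onto the given attributes."""
    return [{a: t[a] for a in attrs} for t in relation]


def join_on_key(left, right, key):
    """Natural join of two fragments that share only the key attribute."""
    index = {t[key]: t for t in right}
    return [{**t, **index[t[key]]} for t in left if t[key] in index]


PROJ1 = project(PROJ, ["PNO", "BUDGET"])
PROJ2 = project(PROJ, ["PNO", "PNAME", "LOC"])

# Completeness: the union of the fragments' attribute sets is A.
complete = set(PROJ1[0]) | set(PROJ2[0]) == set(PROJ[0])

# Reconstruction: joining the fragments on the key gives back PROJ.
rebuilt = join_on_key(PROJ1, PROJ2, KEY)

# Disjointness: apart from the duplicated key, no attribute is shared.
overlap = (set(PROJ1[0]) & set(PROJ2[0])) - {KEY}

print(complete, rebuilt == PROJ, overlap)
```

Without the key in both fragments the join would have nothing to match on, which is why duplicated keys are not counted as overlap.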

Other approaches to VF apart from attribute affinity and
bond energy are:
 Association Rule Mining:
 Association rule mining is a data mining technique used to
discover interesting patterns, associations, or relationships
among variables in large datasets.
 In the context of database query analysis, association rule
mining can be applied to identify frequent itemsets, which are
sets of columns that often appear together in queries.
 Algorithms such as Apriori and FP-Growth can be used to mine
association rules from query logs or query execution histories.

 Query Log Analysis:
 Analyzing query logs or query execution histories provides
valuable insights into the usage patterns of database columns.
 By examining the frequency of column references in queries,
you can identify which columns are commonly accessed
together and which queries are frequently executed.
 Techniques such as frequency analysis, sequence analysis, and
clustering can be applied to query logs to uncover patterns of
column usage.

 Correlation analysis:
 Correlation analysis measures the statistical relationship
between pairs of columns in a dataset.
 By calculating correlation coefficients such as Pearson
correlation or Spearman correlation, you can identify columns
that are positively or negatively correlated with each other.
 High correlation between columns may indicate that they are
related and therefore likely to be accessed together in queries;
this should be confirmed against the actual query workload.

 Dimensionality Reduction:
 Dimensionality reduction techniques such as principal
component analysis (PCA) or singular value decomposition
(SVD) can be applied to query matrices representing the usage
of columns in queries.
 These techniques help identify latent factors or patterns in the
data that explain the variability in query patterns and column
usage.

 Graph-based Analysis:
 Representing queries and column relationships as a graph
enables the application of graph-based analysis techniques.
 Algorithms such as community detection, centrality analysis, and
graph clustering can be applied to identify groups of columns
that are tightly connected or frequently accessed together in
queries.

 Statistical Hypothesis Testing:
 Statistical hypothesis testing techniques can be used to assess
the significance of relationships between columns based on their
usage in queries.
 Methods such as chi-square tests, t-tests, or ANOVA tests can
help determine whether the co-occurrence of columns in queries
is statistically significant.

Hybrid Fragmentation
In most cases a simple horizontal or vertical fragmentation
of a database schema will not be sufficient to satisfy the
requirements of user applications. In this case a vertical
fragmentation may be followed by a horizontal one, or vice
versa, producing a tree-structured partitioning.
Fragment Allocation
 Problem Statement
Given
F = {F1, F2, …, Fn} fragments
S ={S1, S2, …, Sm} network sites
Q = {q1, q2,…, qq} applications
Find the "optimal" distribution of F to S.
 Optimality
 Minimal cost
 Communication + storage + processing (read & update)
 Cost in terms of time (usually)
 Performance
Response time and/or throughput
 Constraints
 Per site constraints (storage & processing)

Information Requirements
 Database information
 selectivity of fragments
 size of a fragment
 Application information
 access types and numbers
 access localities
 Communication network information
 bandwidth
 latency
 communication overhead
 Computer system information
 unit cost of storing data at a site
 unit cost of processing at a site

File Allocation (FAP) vs Database Allocation (DAP):

 Fragments are not individual files
 relationships have to be maintained
 Access to databases is more complicated
 remote file access model not applicable
 relationship between allocation and query processing
 Cost of integrity enforcement should be considered
 Cost of concurrency control should be considered

General Form
min(Total Cost)
subject to
response time constraint
storage constraint
processing constraint

Decision Variable

$x_{ij} = \begin{cases} 1 & \text{if fragment } F_i \text{ is stored at site } S_j \\ 0 & \text{otherwise} \end{cases}$

 Total Cost

$\sum_{\text{all queries}} \text{query processing cost} \; + \sum_{\text{all sites}} \sum_{\text{all fragments}} \text{cost of storing a fragment at a site}$

 Storage Cost (of fragment Fj at Sk)

$(\text{unit storage cost at } S_k) \times (\text{size of } F_j) \times x_{jk}$
 Query Processing Cost (for one query)
processing component + transmission component

 Query Processing Cost

Processing component
access cost + integrity enforcement cost + concurrency control cost
 Access cost

$\sum_{\text{all sites}} \sum_{\text{all fragments}} (\text{no. of update accesses} + \text{no. of read accesses}) \times x_{ij} \times \text{local processing cost at a site}$
 Integrity enforcement and concurrency control costs
 Can be similarly calculated

 Query Processing Cost

Transmission component
cost of processing updates + cost of processing retrievals
 Cost of updates

$\sum_{\text{all sites}} \sum_{\text{all fragments}} \text{update message cost} \; + \sum_{\text{all sites}} \sum_{\text{all fragments}} \text{acknowledgment cost}$
 Retrieval Cost

$\sum_{\text{all fragments}} \min_{\text{all sites}} (\text{cost of retrieval command} + \text{cost of sending back the result})$

 Constraints
 Response Time
execution time of query ≤ max. allowable response time for that query

 Storage Constraint (for a site)

$\sum_{\text{all fragments}} \text{storage requirement of a fragment at that site} \le \text{storage capacity at that site}$
 Processing constraint (for a site)

$\sum_{\text{all queries}} \text{processing load of a query at that site} \le \text{processing capacity of that site}$
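For a very small instance the allocation model can be explored by exhaustive search. The sketch below is a deliberate simplification, not the full model: it ignores replication, lumps reads and updates into one access count, and enforces only the storage constraint. All sizes, capacities, costs and access counts are hypothetical.

```python
from itertools import product

# Hypothetical inputs.
SIZES = {"F1": 40, "F2": 30, "F3": 50}             # size of each fragment
CAPACITY = {"S1": 80, "S2": 80}                    # storage capacity per site
UNIT_STORAGE = {"S1": 1.0, "S2": 2.0}              # unit storage cost per site
ACCESSES = {                                       # accesses issued at a site to a fragment
    "S1": {"F1": 100, "F2": 10, "F3": 0},
    "S2": {"F1": 5, "F2": 60, "F3": 80},
}
LOCAL_COST = 1.0    # cost of one access served locally
REMOTE_COST = 3.0   # cost of one access served by another site


def total_cost(assign):
    """Storage cost plus access cost for a fragment -> site assignment."""
    storage = sum(UNIT_STORAGE[site] * SIZES[frag] for frag, site in assign.items())
    access = sum(
        count * (LOCAL_COST if assign[frag] == site else REMOTE_COST)
        for site, per_fragment in ACCESSES.items()
        for frag, count in per_fragment.items()
    )
    return storage + access


def feasible(assign):
    """Storage constraint: fragments placed at a site must fit its capacity."""
    return all(
        sum(SIZES[f] for f, s in assign.items() if s == site) <= cap
        for site, cap in CAPACITY.items()
    )


def best_allocation():
    """Enumerate every non-replicated allocation and keep the cheapest feasible one."""
    frags, sites = list(SIZES), list(CAPACITY)
    candidates = (dict(zip(frags, choice)) for choice in product(sites, repeat=len(frags)))
    return min((a for a in candidates if feasible(a)), key=total_cost)


best = best_allocation()
print(best, total_cost(best))
```

The search space grows as (number of sites) to the power of (number of fragments), and faster still once replication is allowed, which is why heuristics are used for realistic instances.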

 Solution Methods
 FAP is NP-complete
 DAP also NP-complete
 Heuristics based on
 single commodity warehouse location (for FAP)
 knapsack problem
 branch and bound techniques
 network flow

 Attempts to reduce the solution space

 assume all candidate partitionings known; select the “best”
partitioning
 ignore replication at first
 sliding window on fragments

Question 1

a. Define vertical fragmentation and explain its significance
in distributed database management.

b. Discuss the advantages and challenges associated with
vertical fragmentation compared to other fragmentation
techniques, such as horizontal fragmentation.

c. Explore real-world scenarios or application domains
where vertical fragmentation can provide significant
benefits in terms of data management, query
optimization, and scalability.
Individual exercise

Question 2: Consider fragment design

a. Select a specific scenario or application domain (e.g., e-commerce, healthcare, finance) for which vertical fragmentation is suitable.

b. Identify a relation schema relevant to the chosen scenario and propose a vertical fragmentation strategy based on the attributes of the relation.

c. Justify your fragmentation strategy by analyzing the specific requirements, access patterns, and scalability considerations of the chosen scenario.