0% found this document useful (0 votes)
5 views42 pages

M2 L2 Fragmentation

The document discusses fragmentation in distributed databases. It describes how relations can be fragmented horizontally and vertically. It also discusses primary horizontal fragmentation, including determining a complete and minimal set of predicates and generating minterm predicates to fragment relations. An example demonstrates horizontally fragmenting relations based on location and budget attributes.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views42 pages

M2 L2 Fragmentation

The document discusses fragmentation in distributed databases. It describes how relations can be fragmented horizontally and vertically. It also discusses primary horizontal fragmentation, including determining a complete and minimal set of predicates and generating minterm predicates to fragment relations. An example demonstrates horizontally fragmenting relations based on location and budget attributes.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 42

Systems Infrastructure for Data

Science
Web Science Group
Uni Freiburg
WS 2012/13
Lecture VIII: Fragmentation
Fragmentation
• Fragments should be subsets of database
relations due to two main reasons:
– Access locality: Application views are subsets of
relations. Also, multiple views that access a relation
may reside at different sites.
– Query concurrency and system throughput: Sub-
queries can operate on fragments in parallel.
• Main issues:
– Views that cannot be defined on a single fragment will
require extra processing and communication cost.
– Semantic data control (e.g., integrity checking) of
dependent fragments residing at different sites is
more complicated and costly.

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 3


Fragmentation Alternatives
• Horizontal fragmentation
– Primary horizontal fragmentation
– Derived horizontal fragmentation
• Vertical fragmentation
• Hybrid fragmentation

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 4


Example Database

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 5


Horizontal Fragmentation Example

Projects with BUDGET < $200,000 Projects with BUDGET ≥ $200,000

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 6


Vertical Fragmentation Example

Project budgets Project names and locations

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 7


Hybrid Fragmentation Example

Projects with BUDGET < $200,000 Projects with BUDGET ≥ $200,000


Horizontal

Project budgets Project names and locations


Vertical

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 8


Correctness of Fragmentation
• Completeness
– Decomposition of relation R into fragments R1, R2, .., Rn is complete iff each
data item in R can also be found in one or more of Ri’s.

• Reconstruction
– If a relation R is decomposed into fragments R1, R2, .., Rn, then there should
exist a relational operator θ such that R = θ1≤i≤nRi.

• Disjointness
– If a relation R is horizontally (vertically) decomposed into fragments R1, R2,
.., Rn, and data item di (non-primary key attribute di) is in Rj, then di should
not be in any other fragment Rk (k ≠ j).

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 9


Horizontal Fragmentation Algorithms
What is given?
• Relationships among database relations

Li: one-to-many relationship from an “owner” to a “member”

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 10


Horizontal Fragmentation Algorithms
What is given?
• Cardinality of each database relation
• Mostly used predicates in user queries
• Predicate selectivities
• Access frequencies for data

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 11


Horizontal Fragmentation Algorithms
Predicates
• Simple predicate
– Given R(A1, A2, .., An), a simple predicate pj is defined as “pj: Ai θ value”,
where θ є {=, <, ≤, >, ≥, ≠} and value є Di, where Di is the domain of Ai.
– Examples:
PNAME = “Maintenance”
BUDGET ≤ 200000

• Minterm predicate
– A conjunction of simple and negated simple predicates
– Examples:
PNAME = “Maintenance” AND BUDGET ≤ 200000
NOT(PNAME = “Maintenance”) AND BUDGET ≤ 200000

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 12


Primary Horizontal Fragmentation
Definition
• Given an owner relation R, its horizontal fragments are
given by
Ri = σFi(R), 1 ≤ i ≤ w
where Fi is a minterm predicate.
• First step: Determine a set of simple predicates that will
form the minterm predicates. This set of simple
predicates must have two key properties:
– completeness
– minimality

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 13


Completeness of Simple Predicates
Definition
• A set of simple predicates P is complete iff the
accesses to the tuples of the minterm fragments
defined on P requires that two tuples of the same
minterm fragment have the same probability of
being accessed by any application.

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 14


Completeness of Simple Predicates
Example

Set of simple predicates:

P = {LOC=“Montreal”,
LOC=“New York”,
LOC=“Paris”}

App 1: Find the budgets of projects at each location.


App 2: Find projects with budgets less than $200000.
P = {LOC=“Montreal”, LOC=“New York”, LOC=“Paris”,
BUDGET ≤ 200000, BUDGET > 200000} complete
Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 15
Minimality of Simple Predicates
Definition
• A set of simple predicates P is complete iff for each
predicate p є P:
– if p influences how fragmentation is performed (i.e.,
causes a fragment f to be further fragmented into fi anf fj),
then there should be at least one application that accesses
fi and fj differently.

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 16


Minimality of Simple Predicates
Example
App 1: Find the budgets of projects at each location.
App 2: Find projects with budgets less than $200000.

P = {LOC=“Montreal”, LOC=“New York”, LOC=“Paris”,


BUDGET ≤ 200000, BUDGET > 200000}
complete & minimal
+ PNAME=“Instrumentation”

P = {LOC=“Montreal”, LOC=“New York”, LOC=“Paris”,


BUDGET ≤ 200000, BUDGET > 200000,
PNAME=“Instrumentation”} complete & NOT minimal

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 17


Primary Horizontal Fragmentation
COM_MIN Algorithm Sketch
• Input: a relation R and a set of simple predicates Pr
• Output: a complete and minimal set of simple predicates
Pr’ for Pr

• Rule 1: A relation or fragment is partitioned into at least


two parts which are accessed differently by at least one
application.

• Find a pi є Pr such that pi partitions R according to Rule 1.


Initialize Pr’ = pi.
• Iteratively add predicates to Pr’ until it is complete.

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 18


Primary Horizontal Fragmentation
PHORIZONTAL Algorithm Sketch
• Input: a relation R and a set of simple predicates Pr
• Output: a set of minterm predicates M according to
which relation R is to be fragmented

• Pr’ ← COM_MIN(R, Pr)


• Determine the set M of minterm predicates
• Determine the set I of implications among pi є Pr’
• Eliminate the minterms from M that contradict with I

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 19


Primary Horizontal Fragmentation
Example
• PAY(title, sal) and PROJ(pno, pname, budget, loc)
• Fragmentation of relation PAY
– Application: Check the salary info and determine raise.
(employee records kept at two sites → application run at
two sites)
– Simple predicates
• p1 : sal ≤ 30000
• p2 : sal > 30000
• Pr = {p1, p2} which is complete and minimal Pr‘ = Pr
– Minterm predicates
• m1 : (sal ≤ 30000)
• m2 : NOT(sal ≤ 30000) = (sal > 30000)

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 20


Primary Horizontal Fragmentation
Example

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 21


Primary Horizontal Fragmentation
Example
• Fragmentation of relation PROJ
– App1: Find the name and budget of projects given
their location. (issued at 3 sites)
– App2: Access project information according to budget
(one site accesses ≤ 200000, other accesses > 200000)
– Simple predicates
• For App1:
p1 : LOC = “Montreal”
p2 : LOC = “New York”
p3 : LOC = “Paris”
• For App2:
p4 : BUDGET ≤ 200000
p5 : BUDGET > 200000
• Pr = Pr' = {p1, p2, p3, p4, p5}
Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 22
Primary Horizontal Fragmentation
Example
• Fragmentation of relation PROJ
– Minterm fragments left after elimination
m1 : (LOC = “Montreal”) AND (BUDGET ≤ 200000)
m2 : (LOC = “Montreal”) AND (BUDGET > 200000)
m3 : (LOC = “New York”) AND (BUDGET ≤ 200000)
m4 : (LOC = “New York”) AND (BUDGET > 200000)
m5 : (LOC = “Paris”) AND (BUDGET ≤ 200000)
m6 : (LOC = “Paris”) AND (BUDGET > 200000)

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 23


Primary Horizontal Fragmentation
Example

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 24


Primary Horizontal Fragmentation
Correctness
• Completeness
– Since Pr' is complete and minimal, the selection
predicates are complete.
• Reconstruction
– If relation R is fragmented into FR = {R1, R2, .., Rr}
R = URi є FR Ri
• Disjointness
– Minterm predicates that form the basis of
fragmentation should be mutually exclusive.

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 25


Derived Horizontal Fragmentation
• Defined on a member relation of a link
according to a selection operation specified on
its owner.
• Two important points:
– Each link is an equi-join.
– Equi-join can be implemented using semi-joins.

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 26


Semi-join
• Given R(A) and S(B), semi-join of R with S is defined as follows:

• Example:

Semi-join reduces the amount of data that needs to be transmitted btw sites.
Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 27
Derived Horizontal Fragmentation
• Given relations R and S: S owner

L
R member

• The derived horizontal fragments of R are defined as


Ri = R Si, 1 ≤ i ≤ w
where w is the maximum number of fragments that
will be defined on R, and
Si = σFi(S)
where Fi is the formula according to which the
primary horizontal fragment Si is defined.

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 28


Derived Horizontal Fragmentation
Example
Primary horizontal fragments of PAY:
PAY1 = σ sal ≤ 30000 (PAY)
PAY2 = σ sal > 30000 (PAY)

Derived horizontal fragments of EMP:


EMP1 = EMP PAY1
EMP2 = EMP PAY2

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 29


Derived Horizontal Fragmentation
Correctness
• Completeness
– Referential integrity
– Let R be the member relation of a link whose owner is relation S
which is fragmented as FS = {S1, S2, .., Sn}. Furthermore, let A be
the join attribute between R and S. Then, for each tuple t of R,
there should be a tuple t' of S such that
t[A]=t'[A]
• Reconstruction
– If relation R is fragmented into FR = {R1, R2, .., Rr}
R = U R i є F R Ri
• Disjointness
– Simple join graphs between the owner and the member
fragments.

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 30


Vertical Fragmentation
• Divide a relation R into fragments R1, R2, .., Rr,
each of which contains a subset of R’s attributes
as well as the primary key of R.
• Goal: to minimize the execution time of user
applications that run on these fragments.
• Too many alternatives => Use heuristic solutions
based on:
– Grouping: merge attributes to fragments
– Splitting: divide a relation into fragments
• Better for disjointness and easier dependency enforcement

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 31


Vertical Fragmentation Algorithms
What is given?
• Attribute usage matrix of the application queries
• Example: PROJ(PNO, PNAME, BUDGET, LOC)
Q1: SELECT BUDGET Q2: SELECT PNAME, BUDGET
FROM PROJ FROM PROJ
WHERE PNO=110
Q3: SELECT PNAME Q4: SELECT SUM(BUDGET)
FROM PROJ FROM PROJ
WHERE LOC=“New York” WHERE LOC=“New York”
A1 A2 A3 A4
Q1 1 0 1 0
Q2 0 1 1 0
Q3 0 1 0 1

Uni Freiburg, WS2012/13


Q4 1 for Data
0Systems 0Infrastructure 1 Science 32
Vertical Fragmentation Algorithms
What is given?
• Attribute affinity matrix
• Togetherness measure for attribute pairs
• Given a relation R(A1, A2, .., An), the affinity between Ai and Aj
w.r.t. a set of application queries Q = {Q1, Q2, .., Qq} is defined
as follows:

aff (Ai, Aj) = ∑all queries that access A and A (query access)
i j

comes from the attribute usage matrix

access
query access = ∑ all sites
access frequency of a query *
execution

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 33


Vertical Fragmentation
Algorithm Sketch
• Cluster step: Permute rows and columns of
the attribute affinity matrix to generate a
clustered affinity matrix where attributes in
each cluster are in high affinity to each other.
A A A A A A A A
1 2 3 4 1 3 2 4
A 45 0 45 0 A 45 45 0 0
1 1
A 0 80 5 75 A
2 3 45 53 5 3
A A
3 45 5 53 3 2 0 5 80 75
A A
4 0 75 3 78 4 0 3 75 78

[ Bond Energy Algorithm ]


Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 34
Vertical Fragmentation
Algorithm Sketch
• Partition step: Divide the clustered attributes into
non-overlapping partitions such that the number of
application queries that access to more than one
partition is as small as possible.

Given:
TQ = set of applications that access only TA
BQ = set of applications that access only BA
TA OQ = set of applications that access both TA and BA
CTQ = total number of accesses to attributes by TQ
CBQ = total number of accesses to attributes by BQ
COQ = total number of accesses to attributes by OQ
Find:
BA The point along the diagonal that maximizes
CTQ*CBQ-COQ2

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 35


Vertical Fragmentation
Correctness
• A relation R, defined over attribute set A and key K, generates
the vertical partitioning FR = {R1, R2, …, Rr}.
• Completeness
– The following should be true for A:
A =∪ ARi
• Reconstruction
– Reconstruction can be achieved by
R= Ri ,∀Ri ∈ FR
K
• Disjointness
– Duplicated keys are not considered to be overlapping

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 36


Hybrid Fragmentation
• Obtained by applying horizontal and vertical
fragmentation one after the other.
• In practice, nesting level does not exceed 2.
• Correctness properties are guaranteed if constituent
fragmentations are correct.
• Bottom-up reconstruction:

H H

V V V V V

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 37


Fragment Allocation
• Problem definition:
– Given a set of fragments F, a set of network sites S, and a
set of application queries Q, find the optimal distribution
of F to S.
• Optimality measures:
– Minimal cost = communication + storage + processing
– Optimal performance = response time and/or throughput
• Complex problem, heuristic solutions

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 38


Fragment Allocation High-Level Model
• Minimize(total cost)
• Subject to
– Response time constraint
– Storage constraint
– Processing constraint
• Decide on variable xij

1 if fragment Fi is stored at site Sj


xij = 
0 otherwise

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 39


Fragment Allocation Algorithms
What is given?
• Size of a fragment in bytes
• Selectivity of a fragment w.r.t. a query
• Number of read and update accesses of a query on a
fragment
• Access localities
• Max. response time for each application
• Costs and capacities of sites

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 40


Fragment Allocation Alternatives
• Non-replicated
– Partitioned: each fragment at only one site
• Replicated
– Fully replicated: each fragment at each site
– Partially replicated: each fragment at some of the
sites
• Rule of thumb:
– If read-only queries ≥ 1 , then replication pays off.
update queries

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 41


Fragment Allocation Alternatives

Uni Freiburg, WS2012/13 Systems Infrastructure for Data Science 42

You might also like