10-08-2024
Query decomposition and data
localization
Dario Della Monica
These slides are a modified version of the slides provided with the book
Özsu and Valduriez, Principles of Distributed Database Systems (3rd Ed.), 2011
The original version of the slides is available at: extras.springer.com
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/1
Outline (distributed DB)
• Introduction (Ch. 1) ⋆
• Distributed Database Design (Ch. 3) ⋆
• Distributed Query Processing (Ch. 6-8) ⋆
➡ Overview (Ch. 6) ⋆
➡ Query decomposition and data localization (Ch. 7) ⋆
➡ Distributed query optimization (Ch. 8) ⋆
• Distributed Transaction Management (Ch. 10-12) ⋆
⋆ Özsu and Valduriez, Principles of Distributed Database Systems (3rd Ed.), 2011
Distributed DBMS © M. T. Özsu & P. Valduriez
Ch.7/2
1
10-08-2024
Outline (today)
• Query decomposition and data localization (Ch. 7) ⋆
➡ The problem of distributed data localization
➡ A naïve algorithm
➡ Optimization steps (reductions)
✦ PHF (selection, join)
✦ VF (projection)
✦ DHF (join)
✦ Hybrid Fragmentation (selection/join + projection)
⋆ Özsu and Valduriez, Principles of Distributed Database Systems (3rd Ed.), 2011
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/3
Data Localization
Input: Relational algebra expression on global, distributed relations (distributed
query)
Output: Relational algebra expression on fragments (localized query)
• Localization uses global information about distribution of fragments (no use of
quantitative information, e.g., catalog statistics)
• Recall that fragmentation is obtained by several application of rules expressed
by relational algebra …
➡ primary horizontal fragmentation: selection σ
➡ derived horizontal fragmentation: semijoin ⋉
➡ vertical fragmentation: projection
• … and that reconstruction (reverse fragmentation) rules are also expressed in
relational algebra
➡ horizontal fragmentation: union ∪
➡ vertical fragmentation: join ⋈
Distributed DBMS © M. T. Özsu & P. Valduriez
Ch.7/4
2
10-08-2024
A naïve algorithm to localize
distribute queries
• Localization program: relational algebra expression that reconstructs a
global relation from its fragments, by reverting the rules employed for
fragmentation
• A localized query is obtained from distributed, global query by replacing
leaves (global relations) with (the tree of) its corresponding localization
program
➡ Leaves of localized queries are fragments
• This approach to obtain a localized query from a distributed one is
inefficient and the result can be improved
➡ During data localization there is a first optimization phase
✦ we call it reduction
✦ different from the “proper” global optimization phase (“proper” in the sense of
the centralized case, i.e., finding the “best” strategy for executing the query)
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/5
Example
Assume
EMP ⋈ ASG
• EMP is fragmented as follows:
=
(EMP1 ∪ EMP2 ∪ EMP3) ⋈ (ASG1 ∪ ASG2 )
➡ EMP1= ENO≤“E3”(EMP)
➡ EMP2= “E3”<ENO≤“E6”(EMP)
➡ EMP3= ENO≥“E6”(EMP)
• ASG is fragmented as follows: ⋈ENO
➡ ASG1= ENO≤“E3”(ASG)
➡ ASG2= ENO>“E3”(ASG)
Replace EMP by (EMP1 EMP2 EMP3) EMP1 EMP3 ASG1 ASG2
and ASG by (ASG1 ASG2) in any query EMP2
Distributed DBMS © M. T. Özsu & P. Valduriez
Ch.7/6
3
10-08-2024
Provides Parallellism
( EMP1 ∪ EMP2 ∪ EMP3) ⋈ (ASG1 ∪ ASG2 )
EMP1= ENO≤“E3”(EMP)
=
EMP2= “E3”<ENO≤“E6”(EMP) ( EMP1 ⋈ ASG1) ∪ ( EMP1 ⋈ ASG2) ∪
EMP3= ENO≥“E6”(EMP) ( EMP2 ⋈ ASG1) ∪ ( EMP2 ⋈ ASG2) ∪
( EMP3 ⋈ ASG1) ∪ ( EMP3 ⋈ ASG2)
ASG1= ENO≤“E3”(ASG)
ASG2= ENO>“E3”(ASG)
⋈ENO
⋈ENO ⋈ENO ⋈ENO ⋈ENO ⋈ ⋈ENO
EMP 1 EMP3 ASG1 ASG2 ENO
EMP2
EMP1 ASG1 EMP1 ASG2 EMP2 ASG1 EMP2 ASG2 EMP3 ASG1 EMP3 ASG2
Distributed DBMS © M. T. Özsu & P. Valduriez
Ch.7/7
Eliminates Unnecessary Work
⋈ENO ⋈ENO ⋈ENO
EMP1 ASG1 EMP2 ASG2 EMP3 ASG2
Identify (pairs of) fragments that can be ignored because they produce
empty relations (e.g., when a selection or a join is applied to them)
Distributed DBMS © M. T. Özsu & P. Valduriez
Ch.7/8
4
10-08-2024
Reduction for PHF – Selection
• Reduction of a selection over a relation fragmented with PHF: ignore a fragment
if selection predicate and fragment predicate are contradictory
➡ Consider p(R)
➡ Horizontal fragmentation on R: FR={R1, R2, …, Rw}, where Rj=p (R)
j
➡ p(Rj)= if x in R: ¬(p(x) pj(x)) i.e., p and pj are contradictory
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/9
Reduction for PHF – Selection
(Example)
• Reduction of a selection over a relation fragmented with PHF: ignore a fragment
if selection predicate and fragment predicate are contradictory
➡ Example SELECT * EMP1= ENO≤“E3”(EMP)
FROM EMP EMP2= “E3”<ENO≤“E6”(EMP)
WHERE ENO="E5" EMP3= ENO≥“E6”(EMP)
ASG1= ENO≤“E3”(ASG)
ASG2= ENO>“E3”(ASG)
ENO=“E5” ENO=“E5” ENO=“E5”
EMP EMP1 EMP2 EMP3 EMP2
distributed query localized query reduced local query
Distributed DBMS © M. T. Özsu & P. Valduriez
Ch.7/10
10
5
10-08-2024
Reduction for PHF – Join
• Reduction of a join over relations fragmented with PHF: ignore the join of 2
fragments if their fragment predicates are contradictory over the join attributes
➡ Possible if fragmentation predicates (minterms) involve the join attribute
➡ Distribute join over union
R ⋈ S (R1 R2) ⋈ (S1 S2)
(R1 ⋈ S1) (R1 ⋈ S2) (R2 ⋈ S1) (R2 ⋈ S2)
➡ Then, join between 2 fragments can be simplified in some cases
✦ Given Ri =pi(R) and Sj = pj(S) [pi and pj defined over join attributes]
Ri ⋈Sj = if x in R ⋈ S : ¬(pi(x) pj(x)) [there is a mistake in the textbook]
i.e., pi and pj are contradictory
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/11
11
Reduction for PHF – Join (Example)
EMP1= ENO≤“E3”(EMP) ⋈ENO
EMP2= “E3”<ENO≤“E6”(EMP)
EMP3= ENO≥“E6”(EMP)
ASG1= ENO≤“E3”(ASG)
ASG2= ENO>“E3”(ASG)
EMP1 EMP2 EMP3 ASG1 ASG2
• Consider the query
SELECT *
FROM EMP,ASG
WHERE EMP.ENO=ASG.ENO
• Distribute join over unions ⋈ENO ⋈ENO ⋈ENO
• Apply the reduction rule
Not always useful EMP1 ASG1 EMP2 ASG2 EMP3 ASG2
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/12
12
6
10-08-2024
Reduction for VF
• Reduction of a projection over a relation fragmented with VF: ignore the fragment for
which the set of fragmentation attributes intersected with the set of projection
attributes is contained in the primary key
• Recall that the localization program consists in joins over key attributes
• Let R1 be a fragment of R obtained as R1 = A’ (R) where A’ attr(R) :
➡ Reduction of a projection A’’ over R1 is possible when A’’ ∩ A’ key(R)
Ex.: EMP1 = ENO,ENAME (EMP)
ENAME ENAME
EMP2 = ENO,TITLE (EMP)
⋈ENO
SELECT ENAME
FROM EMP
EMP1 EMP2 EMP1
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/13
13
Reduction for DHF
• Similar to the case PHF
• DHF: 2 relations S (owner) and R (member) in association one-to-many
➡ S participates with cardinality N , R participates with cardinality 1
➡ Fragmentation propagate from S to R
➡ Localization program: union
➡ Compatible fragments (i.e., fragments that agree on the values of join
attributes) are placed at the same site
• Reduction of a join over relations fragmented with DHF: only join
“corresponding” fragments
➡ Distribute joins over unions
➡ Apply the join reduction for horizontal fragmentation
Distributed DBMS © M. T. Özsu & P. Valduriez
Ch.7/14
14
7
10-08-2024
Reduction for DHF – Example
• Example [EMP is owner , ASG is member]
EMP1: TITLE=“Programmer” (EMP) Always convenient
EMP2: TITLE ≠“Programmer” (EMP) - the number of joins is always equal to the number
ASG1: ASG ⋉ENO EMP1 of fragments
ASG2: ASG ⋉ENO EMP2 - all joins can be performed in parallel (are disjoint)
• Query SELECT *
FROM EMP, ASG
WHERE ASG.ENO = EMP.ENO
⋈ENO
⋈ENO ⋈ENO
ASG1 ASG2 EMP1 EMP2 ASG1 EMP1 ASG2 EMP2
Distributed DBMS © M. T. Özsu & P. Valduriez
Ch.7/15
15
Complex reduction for PHF and DHF
1. Generic query ⋈ENO SELECT *
FROM EMP, ASG
WHERE ASG.ENO = EMP.ENO
TITLE=“Mech. Eng.” AND EMP.TITLE = “Mech. Eng”
ASG1 ASG2 EMP1 EMP2
2. Reduction of selection over a 3. Reduction of join over a relation
relation fragmented with HF fragmented with DHF
⋈ENO ⋈ENO
TITLE=“Mech. Eng.” TITLE=“Mech. Eng.”
ASG1 ASG2 EMP2 ASG2 EMP2
Distributed DBMS © M. T. Özsu & P. Valduriez
Ch.7/16
16
8
10-08-2024
Reduction for Hybrid Fragmentation
• Combine the rules already specified
➡ Remove empty relations generated by contradicting predicates (inside
selections or joins) on horizontal fragments
➡ Remove useless relations generated by projections on vertical fragments
➡ Distribute joins/selections/projections over unions in order to isolate and
remove useless operands
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/17
17
Reduction for Hybrid Fragmentation
Example
ENAME
Consider the following hybrid fragmentation:
ENAME
EMP1= ENO≤"E4" (ENO,ENAME (EMP))
EMP2= ENO>"E4" (ENO,ENAME (EMP)) ENO=“E5”
EMP3= ENO,TITLE (EMP)
⋈ENO
ENO=“E5”
Thus, the localization program for EMP is:
EMP = (EMP1 ∪ EMP2) ⋈ EMP3
Consider also the query:
EMP2
SELECT ENAME
FROM EMP EMP1 EMP2 EMP3
WHERE ENO="E5"
Distributed DBMS © M. T. Özsu & P. Valduriez
Ch.7/18
18