Outline
• Introduction
• Background
• Distributed Database Design
• Database Integration
• Semantic Data Control
• Distributed Query Processing
➡ Overview
➡ Query decomposition and localization
➡ Distributed query optimization
• Multidatabase query processing
• Distributed Transaction Management
• Data Replication
• Parallel Database Systems
• Distributed Object DBMS
• Peer-to-Peer Data Management
• Web Data Management
• Current Issues
Distributed DBMS © M. T. Özsu & P. Valduriez
Ch.7/1
Step 1 – Query Decomposition
Input: Calculus query on global relations
• Normalization
➡ manipulate query quantifiers and qualification
• Analysis
➡ detect and reject “incorrect” queries
➡ possible for only a subset of relational calculus
• Simplification
➡ eliminate redundant predicates
• Restructuring
➡ calculus query → algebraic query
➡ more than one translation is possible
➡ use transformation rules
Normalization
• Lexical and syntactic analysis
➡ check validity (similar to compilers)
➡ check for attributes and relations
➡ type checking on the qualification
• Put into normal form
➡ Conjunctive normal form
(p11 ∨ p12 ∨ … ∨ p1n) ∧ … ∧ (pm1 ∨ pm2 ∨ … ∨ pmn)
➡ Disjunctive normal form
(p11 ∧ p12 ∧ … ∧ p1n) ∨ … ∨ (pm1 ∧ pm2 ∧ … ∧ pmn)
➡ OR's mapped into union
➡ AND's mapped into join or selection
Analysis
• Refute incorrect queries
• Type incorrect
➡ If any of its attribute or relation names are not defined in the global schema
➡ If operations are applied to attributes of the wrong type
• Semantically incorrect
➡ Components do not contribute in any way to the generation of the result
➡ Only a subset of relational calculus queries can be tested for correctness
➡ Those that do not contain disjunction and negation
➡ To detect
✦ connection graph (query graph)
✦ join graph
Analysis – Example
SELECT ENAME,RESP
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = PROJ.PNO
AND PNAME = "CAD/CAM"
AND DUR ≥ 36
AND TITLE = "Programmer"
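[Figure: query graph and join graph for this query]
Query graph: nodes EMP, ASG, PROJ and RESULT. EMP and ASG are linked by EMP.ENO=ASG.ENO; ASG and PROJ are linked by ASG.PNO=PROJ.PNO; EMP is linked to RESULT by ENAME and ASG is linked to RESULT by RESP. The selection predicates label the nodes: TITLE="Programmer" on EMP, DUR≥36 on ASG, PNAME="CAD/CAM" on PROJ.
Join graph: nodes EMP, ASG and PROJ, with the edges EMP.ENO=ASG.ENO and ASG.PNO=PROJ.PNO only.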
Analysis
If the query graph is not connected, the query is either semantically
incorrect or requires a Cartesian product
SELECT ENAME,RESP
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND PNAME = "CAD/CAM"
AND DUR > 36
AND TITLE = "Programmer"
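[Figure: query graph for this query]
EMP and ASG are linked by EMP.ENO=ASG.ENO and both are linked to RESULT (ENAME and RESP). PROJ, labeled with PNAME="CAD/CAM", has no edge to the rest of the graph because the predicate ASG.PNO = PROJ.PNO is missing.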
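The connectivity test can be sketched with a simple graph traversal. The sketch below is a simplification: it checks only the join graph (it omits the RESULT node), and the function name `is_connected` and the pair encoding of join predicates are assumptions made for this example.

```python
def is_connected(relations, join_predicates):
    """Return True if every relation is reachable from the first one
    through the join predicates (the edges of the join graph)."""
    neighbours = {r: set() for r in relations}
    for left, right in join_predicates:
        neighbours[left].add(right)
        neighbours[right].add(left)
    seen, stack = set(), [relations[0]]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(neighbours[node] - seen)
    return seen == set(relations)

relations = ["EMP", "ASG", "PROJ"]
# First example: EMP.ENO = ASG.ENO and ASG.PNO = PROJ.PNO
connected_query = [("EMP", "ASG"), ("ASG", "PROJ")]
# Second example: the predicate ASG.PNO = PROJ.PNO is missing
disconnected_query = [("EMP", "ASG")]

print(is_connected(relations, connected_query))     # True
print(is_connected(relations, disconnected_query))  # False
```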
Simplification
• Why simplify?
➡ Remember the example
• How? Use transformation rules
➡ Elimination of redundancy
✦ idempotency rules
p1 ∧ ¬(p1) ⇔ false
p1 ∧ (p1 ∨ p2) ⇔ p1
p1 ∨ false ⇔ p1
…
➡ Application of transitivity
➡ Use of integrity rules
Simplification – Example
SELECT TITLE
FROM EMP
WHERE EMP.ENAME = "J. Doe"
OR (NOT(EMP.TITLE = "Programmer")
AND (EMP.TITLE = "Programmer"
OR EMP.TITLE = "Elect. Eng.")
AND NOT(EMP.TITLE = "Elect. Eng."))
simplifies to
SELECT TITLE
FROM EMP
WHERE EMP.ENAME = "J. Doe"
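The equivalence of the two qualifications can be checked by brute force over all truth assignments. In this sketch the names p1, p2 and p3 are assumptions standing for the three atomic predicates; they are treated as independent Booleans and SQL NULL semantics are ignored.

```python
from itertools import product

def original(p1, p2, p3):
    # p1: ENAME = "J. Doe", p2: TITLE = "Programmer", p3: TITLE = "Elect. Eng."
    return p1 or (not p2 and (p2 or p3) and not p3)

def simplified(p1, p2, p3):
    return p1

# Compare both qualifications on all 8 combinations of truth values.
equivalent = all(
    original(*values) == simplified(*values)
    for values in product([False, True], repeat=3)
)
print(equivalent)  # True
```

The inner conjunction ¬p2 ∧ (p2 ∨ p3) ∧ ¬p3 is false for every assignment, which is why only p1 survives.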
Restructuring
• Convert relational calculus to relational algebra
• Make use of query trees
• Example
Find the names of employees other than J. Doe who worked on the CAD/CAM project for either 1 or 2 years.
SELECT ENAME
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = PROJ.PNO
AND ENAME ≠ "J. Doe"
AND PNAME = "CAD/CAM"
AND (DUR = 12 OR DUR = 24)
[Figure: query tree, from root to leaves]
ΠENAME (Project)
σDUR=12 OR DUR=24 (Select)
σPNAME="CAD/CAM" (Select)
σENAME≠"J. Doe" (Select)
⋈PNO (Join) of PROJ with the result of ⋈ENO (Join) of ASG and EMP
Restructuring – Transformation Rules
• Commutativity of binary operations
➡ R × S ⇔ S × R
➡ R ⋈ S ⇔ S ⋈ R
➡ R ∪ S ⇔ S ∪ R
• Associativity of binary operations
➡ (R × S) × T ⇔ R × (S × T)
➡ (R ⋈ S) ⋈ T ⇔ R ⋈ (S ⋈ T)
• Idempotence of unary operations
➡ ΠA'(ΠA''(R)) ⇔ ΠA'(R)
➡ σp1(A1)(σp2(A2)(R)) ⇔ σp1(A1) ∧ p2(A2)(R)
where R[A] and A' ⊆ A, A'' ⊆ A and A' ⊆ A''
• Commuting selection with projection
Restructuring – Transformation Rules
• Commuting selection with binary operations
➡ σp(Ai)(R × S) ⇔ (σp(Ai)(R)) × S
➡ σp(Ai)(R ⋈(Aj,Bk) S) ⇔ (σp(Ai)(R)) ⋈(Aj,Bk) S
➡ σp(Ai)(R ∪ T) ⇔ σp(Ai)(R) ∪ σp(Ai)(T)
where Ai belongs to R and T
• Commuting projection with binary operations
➡ ΠC(R × S) ⇔ ΠA'(R) × ΠB'(S)
➡ ΠC(R ⋈(Aj,Bk) S) ⇔ ΠA'(R) ⋈(Aj,Bk) ΠB'(S)
➡ ΠC(R ∪ S) ⇔ ΠC(R) ∪ ΠC(S)
where R[A] and S[B]; C = A' ∪ B' where A' ⊆ A, B' ⊆ B
Example
Recall the previous example:
Find the names of employees other than J. Doe who worked on the CAD/CAM project for either one or two years.
SELECT ENAME
FROM PROJ, ASG, EMP
WHERE ASG.ENO=EMP.ENO
AND ASG.PNO=PROJ.PNO
AND ENAME ≠ "J. Doe"
AND PROJ.PNAME="CAD/CAM"
AND (DUR=12 OR DUR=24)
[Figure: query tree, from root to leaves]
ΠENAME (Project)
σDUR=12 ∨ DUR=24 (Select)
σPNAME="CAD/CAM" (Select)
σENAME≠"J. Doe" (Select)
⋈PNO (Join) of PROJ with the result of ⋈ENO (Join) of ASG and EMP
Equivalent Query
[Figure: equivalent query tree]
ΠENAME(σPNAME="CAD/CAM" ∧ (DUR=12 ∨ DUR=24) ∧ ENAME≠"J. Doe"((EMP × PROJ) ⋈PNO,ENO ASG))
Restructuring
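[Figure: restructured query tree, with selections and projections pushed toward the leaves]
ΠENAME(
  ΠPNO(σPNAME="CAD/CAM"(PROJ))
  ⋈PNO
  ΠPNO,ENAME(
    ΠPNO,ENO(σDUR=12 ∨ DUR=24(ASG))
    ⋈ENO
    ΠENO,ENAME(σENAME≠"J. Doe"(EMP))))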
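That the restructured tree returns the same answer as the initial tree can be checked on a small instance. The sketch below is only an illustration: the operator helpers `select`, `project` and `join` and the sample tuples are made up for this example, and the projection above EMP is taken to keep ENO and ENAME so that the join on ENO remains possible.

```python
def select(rel, pred):
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    # Set semantics: duplicates are removed, first occurrence is kept.
    out = []
    for t in rel:
        row = {a: t[a] for a in attrs}
        if row not in out:
            out.append(row)
    return out

def join(left, right, attr):
    # Equi-join on a single common attribute.
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]

# Toy data, invented for illustration only.
EMP = [
    {"ENO": "E1", "ENAME": "J. Doe", "TITLE": "Elect. Eng."},
    {"ENO": "E2", "ENAME": "M. Smith", "TITLE": "Syst. Anal."},
    {"ENO": "E3", "ENAME": "A. Lee", "TITLE": "Mech. Eng."},
]
ASG = [
    {"ENO": "E1", "PNO": "P3", "RESP": "Manager", "DUR": 12},
    {"ENO": "E2", "PNO": "P3", "RESP": "Analyst", "DUR": 24},
    {"ENO": "E3", "PNO": "P3", "RESP": "Engineer", "DUR": 36},
    {"ENO": "E2", "PNO": "P1", "RESP": "Analyst", "DUR": 12},
]
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation"},
    {"PNO": "P3", "PNAME": "CAD/CAM"},
]

def dur_ok(t):
    return t["DUR"] in (12, 24)

def is_cadcam(t):
    return t["PNAME"] == "CAD/CAM"

def not_doe(t):
    return t["ENAME"] != "J. Doe"

# Initial tree: joins first, then the three selections, then the projection.
joined = join(PROJ, join(ASG, EMP, "ENO"), "PNO")
initial = project(
    select(select(select(joined, not_doe), is_cadcam), dur_ok), ["ENAME"]
)

# Restructured tree: selections and projections applied before the joins.
asg_emp = project(
    join(
        project(select(ASG, dur_ok), ["PNO", "ENO"]),
        project(select(EMP, not_doe), ["ENO", "ENAME"]),
        "ENO",
    ),
    ["PNO", "ENAME"],
)
restructured = project(
    join(project(select(PROJ, is_cadcam), ["PNO"]), asg_emp, "PNO"), ["ENAME"]
)

print(initial == restructured)  # True
```

On this instance both trees return the single tuple for M. Smith; the restructured tree joins smaller intermediate relations because the selections and projections are applied first.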
Step 2 – Data Localization
Input: Algebraic query on distributed relations
• Determine which fragments are involved
• Localization program
➡ substitute each global relation in the query with its materialization program (the union or join of its fragments)
➡ optimize (reduce) the resulting query
Example
Assume
➡ EMP is fragmented into EMP1, EMP2, EMP3 as follows:
✦ EMP1 = σENO≤"E3"(EMP)
✦ EMP2 = σ"E3"<ENO≤"E6"(EMP)
✦ EMP3 = σENO>"E6"(EMP)
➡ ASG is fragmented into ASG1 and ASG2 as follows:
✦ ASG1 = σENO≤"E3"(ASG)
✦ ASG2 = σENO>"E3"(ASG)
Replace EMP by (EMP1 ∪ EMP2 ∪ EMP3) and ASG by (ASG1 ∪ ASG2) in any query
[Figure: localized query tree]
ΠENAME(σDUR=12 ∨ DUR=24(σPNAME="CAD/CAM"(σENAME≠"J. Doe"(PROJ ⋈PNO ((EMP1 ∪ EMP2 ∪ EMP3) ⋈ENO (ASG1 ∪ ASG2))))))
Provides Parallelism
[Figure: the joins between fragments can be evaluated in parallel]
(EMP1 ⋈ENO ASG1) ∪ (EMP2 ⋈ENO ASG2) ∪ (EMP3 ⋈ENO ASG1) ∪ (EMP3 ⋈ENO ASG2)
Eliminates Unnecessary Work
[Figure: joins that remain after the empty ones are removed]
(EMP1 ⋈ENO ASG1) ∪ (EMP2 ⋈ENO ASG2) ∪ (EMP3 ⋈ENO ASG2)
Reduction for PHF (Primary Horizontal Fragmentation)
• Reduction with selection
➡ Relation R and FR={R1, R2, …, Rw} where Rj=σpj(R)
σpi(Rj)=∅ if ∀x in R: ¬(pi(x) ∧ pj(x))
➡ Example
SELECT *
FROM EMP
WHERE ENO="E5"
σENO="E5"(EMP1 ∪ EMP2 ∪ EMP3) ⇒ σENO="E5"(EMP2)
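For an equality selection on the fragmentation attribute, the rule amounts to keeping only the fragments whose predicate accepts the selected value. The sketch below assumes the fragmentation of EMP used in the earlier example, with EMP3 taken as ENO > "E6" so that the fragments are disjoint; the names `FRAGMENTS` and `relevant_fragments` are hypothetical.

```python
# Fragment predicates over ENO (assumed fragmentation of EMP).
FRAGMENTS = {
    "EMP1": lambda eno: eno <= "E3",
    "EMP2": lambda eno: "E3" < eno <= "E6",
    "EMP3": lambda eno: eno > "E6",
}

def relevant_fragments(fragments, eno_value):
    """Keep the fragments whose predicate does not contradict ENO = eno_value."""
    return [name for name, pred in fragments.items() if pred(eno_value)]

print(relevant_fragments(FRAGMENTS, "E5"))  # ['EMP2']
```

The string comparison is adequate only for single-digit employee numbers such as "E1" to "E9"; a real system would compare the predicates symbolically rather than evaluate them on a value.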
Reduction for PHF
• Reduction with join
➡ Possible if fragmentation is done on join attribute
➡ Distribute join over union
(R1 ∪ R2) ⋈ S ⇔ (R1 ⋈ S) ∪ (R2 ⋈ S)
➡ Given Ri = σpi(R) and Rj = σpj(R)
Ri ⋈ Rj = ∅ if ∀x in Ri, ∀y in Rj: ¬(pi(x) ∧ pj(y))
Reduction for PHF
• Assume EMP is fragmented as before and
➡ ASG1: σENO≤"E3"(ASG)
➡ ASG2: σENO>"E3"(ASG)
• Consider the query
SELECT *
FROM EMP,ASG
WHERE EMP.ENO=ASG.ENO
Generic query: (EMP1 ∪ EMP2 ∪ EMP3) ⋈ENO (ASG1 ∪ ASG2)
• Distribute join over unions
• Apply the reduction rule
Reduced query: (EMP1 ⋈ENO ASG1) ∪ (EMP2 ⋈ENO ASG2) ∪ (EMP3 ⋈ENO ASG2)
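The two steps (distribute the join over the unions, then drop the joins whose fragment predicates contradict each other) can be sketched by enumerating fragment pairs. The range encoding, the helper `overlaps`, and the reading of EMP3 as ENO > "E6" are assumptions made for this illustration.

```python
from itertools import product

# ENO ranges encoded as (low, high], with None meaning unbounded.
EMP_FRAGMENTS = {"EMP1": (None, "E3"), "EMP2": ("E3", "E6"), "EMP3": ("E6", None)}
ASG_FRAGMENTS = {"ASG1": (None, "E3"), "ASG2": ("E3", None)}

def overlaps(r1, r2):
    """True if the two (low, high] ranges share at least one value."""
    lows = [low for low, _ in (r1, r2) if low is not None]
    highs = [high for _, high in (r1, r2) if high is not None]
    if not lows or not highs:
        return True
    return max(lows) < min(highs)

# Distributing the join over the unions yields every (EMPi, ASGj) pair;
# the reduction rule keeps only pairs whose ENO ranges can overlap.
surviving = [
    (emp, asg)
    for (emp, emp_range), (asg, asg_range)
    in product(EMP_FRAGMENTS.items(), ASG_FRAGMENTS.items())
    if overlaps(emp_range, asg_range)
]
print(surviving)
# [('EMP1', 'ASG1'), ('EMP2', 'ASG2'), ('EMP3', 'ASG2')]
```

Three of the six candidate joins survive, and each of them can be executed at a different site in parallel.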
Reduction for VF (Vertical Fragmentation)
• Find useless (not empty) intermediate relations
Relation R defined over attributes A = {A1, ..., An} vertically fragmented
as Ri = ΠA'(R) where A' ⊆ A:
ΠD,K(Ri) is useless if the set of projection attributes D is not in A'
Example: EMP1 = ΠENO,ENAME(EMP); EMP2 = ΠENO,TITLE(EMP)
SELECT ENAME
FROM EMP
ΠENAME(EMP1 ⋈ENO EMP2) ⇒ ΠENAME(EMP1)
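The test for useless vertical fragments can be sketched as a set intersection between the projection list and each fragment's non-key attributes. The names `KEY`, `FRAGMENTS` and `useful_fragments` are hypothetical, and the handling of a key-only projection is an assumption of this sketch.

```python
KEY = {"ENO"}
FRAGMENTS = {"EMP1": {"ENO", "ENAME"}, "EMP2": {"ENO", "TITLE"}}

def useful_fragments(fragments, key, projected):
    """Return the fragments that contribute a non-key projected attribute.
    A fragment sharing only the key with the projection list is useless."""
    wanted = set(projected) - key
    if not wanted:
        # Only key attributes are requested: any single fragment suffices.
        return list(fragments)[:1]
    return [name for name, attrs in fragments.items() if attrs & wanted]

print(useful_fragments(FRAGMENTS, KEY, ["ENAME"]))  # ['EMP1']
```

For SELECT ENAME FROM EMP only EMP1 is kept, so the join with EMP2 on ENO disappears from the localized query.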
Reduction for DHF (Derived Horizontal Fragmentation)
• Rule :
➡ Distribute joins over unions
➡ Apply the join reduction for horizontal fragmentation
• Example
ASG1: ASG ⋉ENO EMP1
ASG2: ASG ⋉ENO EMP2
EMP1: σTITLE="Programmer"(EMP)
EMP2: σTITLE≠"Programmer"(EMP)
• Query
SELECT *
FROM EMP, ASG
WHERE ASG.ENO = EMP.ENO
AND EMP.TITLE = "Mech. Eng."
Reduction for DHF
Generic query
(ASG1 ∪ ASG2) ⋈ENO σTITLE="Mech. Eng."(EMP1 ∪ EMP2)
Selections first
(ASG1 ∪ ASG2) ⋈ENO σTITLE="Mech. Eng."(EMP2)
Reduction for DHF
Joins over unions
(ASG1 ⋈ENO σTITLE="Mech. Eng."(EMP2)) ∪ (ASG2 ⋈ENO σTITLE="Mech. Eng."(EMP2))
Elimination of the empty intermediate relations (left sub-tree)
ASG2 ⋈ENO σTITLE="Mech. Eng."(EMP2)
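The reduction can be checked on a small instance. The sketch below assumes that EMP1 holds the programmers and EMP2 all other employees, with ASG1 and ASG2 derived from them by semijoin; the helper functions and the sample tuples are made up for illustration.

```python
def semijoin(left, right, attr):
    # Tuples of left that have a matching tuple in right on attr.
    keys = {r[attr] for r in right}
    return [l for l in left if l[attr] in keys]

def join(left, right, attr):
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]

# Toy data, invented for illustration only.
EMP = [
    {"ENO": "E1", "TITLE": "Programmer"},
    {"ENO": "E2", "TITLE": "Mech. Eng."},
    {"ENO": "E3", "TITLE": "Elect. Eng."},
]
ASG = [
    {"ENO": "E1", "PNO": "P1"},
    {"ENO": "E2", "PNO": "P1"},
    {"ENO": "E2", "PNO": "P2"},
    {"ENO": "E3", "PNO": "P2"},
]

# Primary fragmentation of EMP and derived fragmentation of ASG.
EMP1 = [e for e in EMP if e["TITLE"] == "Programmer"]
EMP2 = [e for e in EMP if e["TITLE"] != "Programmer"]
ASG1 = semijoin(ASG, EMP1, "ENO")
ASG2 = semijoin(ASG, EMP2, "ENO")

def mech_eng(rel):
    return [e for e in rel if e["TITLE"] == "Mech. Eng."]

full = join(ASG, mech_eng(EMP), "ENO")            # query on the global relations
left_subtree = join(ASG1, mech_eng(EMP2), "ENO")  # always empty
reduced = join(ASG2, mech_eng(EMP2), "ENO")       # the reduced query

print(left_subtree)     # []
print(reduced == full)  # True
```

ASG1 contains only assignments of programmers, so it can never match a tuple of EMP2; this is why the left sub-tree is removed without being evaluated.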
Reduction for Hybrid Fragmentation
• Combine the rules already specified:
➡ Remove empty relations generated by contradicting selections on horizontal fragments;
➡ Remove useless relations generated by projections on vertical fragments;
➡ Distribute joins over unions in order to isolate and remove useless joins.
Reduction for Hybrid Fragmentation
Example
Consider the following hybrid fragmentation:
EMP1 = σENO≤"E4"(ΠENO,ENAME(EMP))
EMP2 = σENO>"E4"(ΠENO,ENAME(EMP))
EMP3 = ΠENO,TITLE(EMP)
and the query
SELECT ENAME
FROM EMP
WHERE ENO="E5"
Generic query: ΠENAME(σENO="E5"((EMP1 ∪ EMP2) ⋈ENO EMP3))
Reduced query: ΠENAME(σENO="E5"(EMP2))