DDBS Lecture5
Lecture 5
● Horizontal Fragmentation
● Derived Horizontal Fragmentation
● Vertical Fragmentation
Figure: a query processor over fragments stored at four sites, with the result at a fifth site
Site 1: ASG1 = σ DUR>37 (ASG)
Site 2: ASG2 = σ DUR<=37 (ASG)
Site 3: EMP1 = σ ENO<=E3 (EMP)
Site 4: EMP2 = σ ENO>E3 (EMP)
Site 5: Result
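The fragment definitions above can be tried out with a minimal Python sketch. The ASG tuples below (ENO, PNO, DUR) are made up for illustration, and `select` is a hypothetical helper standing in for the σ operator; only the two predicates come from the slide.

```python
# Hypothetical ASG tuples: (ENO, PNO, DUR). The values are illustrative only.
ASG = [
    ("E1", "P1", 12),
    ("E2", "P1", 24),
    ("E2", "P2", 6),
    ("E3", "P3", 10),
    ("E3", "P4", 48),
    ("E4", "P2", 18),
    ("E5", "P2", 24),
    ("E6", "P4", 48),
    ("E7", "P3", 36),
    ("E8", "P3", 40),
]

def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

# Fragments as defined on the slide.
ASG1 = select(ASG, lambda t: t[2] > 37)    # stored at Site 1
ASG2 = select(ASG, lambda t: t[2] <= 37)   # stored at Site 2

# The two predicates are complementary, so the fragments are disjoint
# and their union reconstructs ASG.
assert sorted(ASG1 + ASG2) == sorted(ASG)
assert not set(ASG1) & set(ASG2)

print(len(ASG1), len(ASG2))  # 3 7
```

Because DUR>37 and DUR<=37 cover every tuple exactly once, no tuple is lost or duplicated when ASG is split this way.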
* Distributed Databases - CSC451 7
Problem in DDBS?
Figure (top to bottom): Site 5; EMP1, EMP2; Site 3, Site 4; ASG1, ASG2; Site 1, Site 2
● Static
  ● The strategy of transmissions and local processing activities is fully determined at compile time, before execution begins
  ● The sizes of intermediate results are difficult to estimate
● Dynamic
  ● Optimization happens at run time
  ● Each step is decided only after seeing the results of the previous steps
  ● Must re-optimize each time the query is executed
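A toy sketch of the difference, assuming a single decision: which of two relations to ship to the other site. All sizes and the `cheaper_to_ship` helper are hypothetical. The point is only that a static optimizer commits to a choice using an estimate, while a dynamic one chooses after the intermediate result exists.

```python
def cheaper_to_ship(size_a, size_b):
    """Return 'A' or 'B': ship whichever relation looks smaller."""
    return "A" if size_a <= size_b else "B"

# Hypothetical sizes in tuples. A is an intermediate result, B a stored relation.
estimated_a = 100    # compile-time estimate of the intermediate result
actual_a = 5000      # size actually observed at run time
size_b = 1000        # known exactly from the catalog

actual_size = {"A": actual_a, "B": size_b}

# Static: the decision is fixed before execution, from the estimate.
static_choice = cheaper_to_ship(estimated_a, size_b)
# Dynamic: the decision is made after A has been computed.
dynamic_choice = cheaper_to_ship(actual_a, size_b)

# Either way, the tuples that really cross the network are the actual ones.
print(static_choice, actual_size[static_choice])    # A 5000
print(dynamic_choice, actual_size[dynamic_choice])  # B 1000
```

Here the bad estimate makes the static plan ship five times more data than the dynamic one. The dynamic plan pays for that accuracy by deciding again on every execution.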
● Transport
  ● Send the one relation produced at each participating site (the result of the reduction phase) to the querying site
● Completion
  ● Finish processing using those relations to get the final answer (e.g., final projections, selections, and joins)
At site 1, R1:
A1 A2 A3 A4 A5 A6 A7 A8 A9
a  A  A  B  C  C  E  A  F
a  C  D  D  E  A  A  B  B
b  A  B  C  D  B  A  B  A
c  D  D  B  B  A  C  A  C
e  E  B  A  A  C  C  D  D

At site 2, R2:
A1 A2
d  1
e  2
g  3
At site 1, R1:
A1 A2 A3 A4 A5 A6 A7 A8 A9
d  A  A  B  C  C  E  A  F
d  C  D  D  E  A  A  B  B
e  A  B  C  D  B  A  B  A
g  D  D  B  B  A  C  A  C
e  E  B  A  A  C  C  D  D

At site 2, R2:
A1 A2
d  1
e  2
g  3
● Final Join
A1 A2 A3 A4 A5 A6 A7 A8 A9 R2.A2
d  A  A  B  C  C  E  A  F  1
d  C  D  D  E  A  A  B  B  1
e  A  B  C  D  B  A  B  A  2
g  D  D  B  B  A  C  A  C  3
e  E  B  A  A  C  C  D  D  2
Response time 84
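The reduction, transport, and completion steps can be sketched on the two example versions of R1 and on R2. This is a rough sketch under stated assumptions: the join is on A1 only, R2's second attribute is appended to each matching R1 tuple, and "AA" in the extracted rows is read as two values. The helper names `parse`, `reduce_r1`, and `complete` are hypothetical. The sketch counts tuples only and does not attempt to reproduce the response time figure.

```python
def parse(rows):
    """Split whitespace-separated rows into tuples of attribute values."""
    return [tuple(row.split()) for row in rows]

# R1 as in the first example.
R1_FIRST = parse([
    "a A A B C C E A F",
    "a C D D E A A B B",
    "b A B C D B A B A",
    "c D D B B A C A C",
    "e E B A A C C D D",
])
# R1 as in the second example.
R1_SECOND = parse([
    "d A A B C C E A F",
    "d C D D E A A B B",
    "e A B C D B A B A",
    "g D D B B A C A C",
    "e E B A A C C D D",
])
R2 = [("d", 1), ("e", 2), ("g", 3)]

def reduce_r1(r1, r2):
    """Reduction: site 2 sends only its A1 values; site 1 keeps matching tuples."""
    keys = {a1 for a1, _ in r2}
    return [t for t in r1 if t[0] in keys]

def complete(r1_reduced, r2):
    """Completion: join on A1 and append R2's second attribute."""
    lookup = dict(r2)  # assumes A1 is unique in R2, as it is here
    return [t + (lookup[t[0]],) for t in r1_reduced]

for name, r1 in (("first", R1_FIRST), ("second", R1_SECOND)):
    reduced = reduce_r1(r1, R2)      # reduction at site 1
    result = complete(reduced, R2)   # completion after transport
    print(f"{name}: {len(r1)} tuples at site 1, "
          f"{len(reduced)} shipped, {len(result)} in result")
```

With the first R1 only the tuple with A1 = e survives reduction, so one tuple is shipped instead of five. With the second R1 every tuple matches, so the reduction step removes nothing and sending R2's A1 values is extra traffic.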
Based on
● Relation
  ● Cardinality
  ● Size of tuple
  ● Fraction of tuples participating in a join with other relations
● Attribute
  ● Actual number of distinct values
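These statistics can be computed directly for a small relation. The sketch below uses the A1 column of the first R1 example and of R2, and assumes a join on A1. The one-byte-per-value tuple size is a hypothetical figure chosen only to make the arithmetic concrete.

```python
# A1 column of the first R1 example (site 1) and of R2 (site 2).
R1_A1 = ["a", "a", "b", "c", "e"]
R2_A1 = ["d", "e", "g"]

ATTRS_PER_R1_TUPLE = 9   # R1 has attributes A1..A9
BYTES_PER_VALUE = 1      # hypothetical storage size per attribute value

# Relation statistics
cardinality = len(R1_A1)                           # number of tuples in R1
tuple_size = ATTRS_PER_R1_TUPLE * BYTES_PER_VALUE  # bytes per R1 tuple
r2_keys = set(R2_A1)
join_fraction = sum(1 for v in R1_A1 if v in r2_keys) / cardinality

# Attribute statistic
distinct_a1 = len(set(R1_A1))                      # distinct values of R1.A1

print(cardinality, tuple_size, join_fraction, distinct_a1)  # 5 9 0.2 4
```

A join fraction of 0.2 means only one R1 tuple in five has a partner in R2, which is the kind of figure an optimizer uses to estimate the size of an intermediate result.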
Based on
● Centralized
  ● Single site determines the best schedule
  ● Simple
  ● Needs knowledge about the entire database
● Distributed
  ● Cooperation among sites to determine the schedule
  ● Needs only local information
  ● Cost of cooperation
● Hybrid
  ● One site determines the global schedule
  ● Each site optimizes the local subqueries
Based on
● WAN
  ● Global schedule to minimize communication cost
● LAN
  ● Broadcasting can be exploited