
Distributed Databases

CS347
Lecture 14
May 30, 2001

Topics for the Day
• Query processing in distributed databases
– Localization
– Distributed query operators
– Cost-based optimization

Query Processing Steps
• Decomposition
– Given SQL query, generate one or more algebraic
query trees
• Localization
– Rewrite query trees, replacing relations by fragments
• Optimization
– Given cost model + one or more localized query
trees
– Produce minimum cost query execution plan

Decomposition
• Same as in a centralized DBMS
• Normalization (usually into relational algebra)
Select A,C
From R Natural Join S
Where (R.B = 1 and S.D = 2) or (R.C > 3 and S.D = 2)

πA,C
  |
σ(R.B = 1 ∨ R.C > 3) ∧ (S.D = 2)     ← conjunctive normal form
  |
R ⋈ S
Decomposition
• Redundancy elimination
(S.A = 1) ∧ (S.A > 5) ≡ False
(S.A < 10) ∧ (S.A < 5) ≡ S.A < 5

• Algebraic Rewriting
  – Example: pushing conditions down
    σcond1 ∧ cond2 ∧ cond3 (S ⋈ T)  →  σcond3 (σcond1(S) ⋈ σcond2(T))
Localization Steps
1. Start with query tree
2. Replace relations by fragments

3. Push ∪ up and σ, π down (CS245 rules)
4. Simplify – eliminate unnecessary operations

Notation for fragments in query trees:

  [R: cond]

where R is the relation the fragment belongs to and cond is the
condition its tuples satisfy.


Example 1

σE=3 (R), where R is fragmented as [R: E<10] ∪ [R: E ≥ 10]

σE=3 (R)
→ σE=3 ([R: E<10] ∪ [R: E ≥ 10])
→ σE=3 [R: E<10] ∪ σE=3 [R: E ≥ 10]
→ σE=3 [R: E<10]        (the branch over [R: E ≥ 10] is empty)
Example 2

R ⋈A S, with fragments
  R1 = [R: A<5]   R2 = [R: 5 ≤ A ≤ 10]   R3 = [R: A>10]
  S1 = [S: A<5]   S2 = [S: A ≥ 5]

R ⋈A S  →  (R1 ∪ R2 ∪ R3) ⋈A (S1 ∪ S2)

Pushing ⋈A down through ∪ gives all six pairings:

(R1 ⋈A S1) ∪ (R1 ⋈A S2) ∪ (R2 ⋈A S1) ∪ (R2 ⋈A S2) ∪ (R3 ⋈A S1) ∪ (R3 ⋈A S2)

Simplification leaves three non-empty joins:

([R: A<5] ⋈A [S: A<5]) ∪ ([R: 5≤A≤10] ⋈A [S: A≥5]) ∪ ([R: A>10] ⋈A [S: A≥5])
Rules for Horiz. Fragmentation

• σC1 [R: C2] ≡ [R: C1 ∧ C2]
• [R: False] ≡ Ø
• [R: C1] ⋈A [S: C2] ≡ [R ⋈A S: C1 ∧ C2 ∧ R.A = S.A]

• In Example 1:
  σE=3 [R2: E ≥ 10] ≡ [R2: E=3 ∧ E ≥ 10] ≡ [R2: False] ≡ Ø
• In Example 2:
  [R: A<5] ⋈A [S: A ≥ 5]
  ≡ [R ⋈A S: R.A < 5 ∧ S.A ≥ 5 ∧ R.A = S.A]
  ≡ [R ⋈A S: False] ≡ Ø
Example 3 – Derived Fragmentation

S's fragmentation is derived from that of R (join attribute K).

R ⋈K S, with fragments
  R1 = [R: A<10]                R2 = [R: A ≥ 10]
  S1 = [S: K=R.K ∧ R.A<10]      S2 = [S: K=R.K ∧ R.A ≥ 10]

(R1 ∪ R2) ⋈K (S1 ∪ S2)
→ (R1 ⋈K S1) ∪ (R1 ⋈K S2) ∪ (R2 ⋈K S1) ∪ (R2 ⋈K S2)
→ ([R: A<10] ⋈K [S: K=R.K ∧ R.A<10]) ∪ ([R: A ≥ 10] ⋈K [S: K=R.K ∧ R.A ≥ 10])
Example 4 – Vertical Fragmentation

πA (R), with R split vertically into R1(K,A,B) and R2(K,C,D)

πA (R)
→ πA (R1(K,A,B) ⋈K R2(K,C,D))
→ πA (πK,A (R1)) = πA (R1)

The join with R2 can be eliminated: R1 already contains attribute A.
Rule for Vertical Fragmentation

• Given a vertical fragmentation of R(A):
  Ri = πAi (R), where Ai ⊆ A
• For any B ⊆ A:
  πB (R) = πB ( ⋈i { Ri | B ∩ Ai ≠ Ø } )
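The rule can be exercised in a few lines of code; below is a minimal
sketch, assuming each fragment is a list of dicts that all carry a
shared key K (function and variable names are illustrative, not from
the lecture).

def project(rows, attrs):
    # pi_attrs(rows)
    return [{a: r[a] for a in attrs} for r in rows]

def reconstruct_projection(fragments, B):
    """fragments: list of (attribute_set, rows); returns pi_B(R).

    Only fragments with B ∩ Ai != Ø participate in the join."""
    needed = [(attrs, rows) for attrs, rows in fragments if attrs & B]
    by_key = {}                       # join the needed fragments on K
    for attrs, rows in needed:
        for r in rows:
            by_key.setdefault(r["K"], {}).update(r)
    return project(list(by_key.values()), B)

R1 = ({"K", "A", "B"}, [{"K": 1, "A": 10, "B": "x"}])
R2 = ({"K", "C", "D"}, [{"K": 1, "C": 7, "D": "y"}])
print(reconstruct_projection([R1, R2], {"A"}))   # uses only R1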
Parallel/Distributed Query Operations
• Sort
– Basic sort
– Range-partitioning sort
– Parallel external sort-merge
• Join
– Partitioned join
– Asymmetric fragment and replicate join
– General fragment and replicate join
– Semi-join programs
• Aggregation and duplicate removal
Parallel/distributed sort

• Input: relation R on
  – a single site/disk, or
  – fragmented/partitioned by the sort attribute, or
  – fragmented/partitioned by some other attribute
• Output: sorted relation R on
  – a single site/disk, or
  – as individual sorted fragments/partitions
Basic sort

• Given R(A,…) range-partitioned on attribute A, sort R on A
• Each fragment is sorted independently
• Results are shipped elsewhere if necessary

Example: fragments {7, 3}, {11, 10, 17, 14}, {27, 20, 22}
sort locally to {3, 7}, {10, 11, 14, 17}, {20, 22, 27}.
Range partitioning sort

• Given R(A,…) located at one or more sites, not fragmented on A,
  sort R on A
• Algorithm: range partition on A and then do a basic sort

  Ra, Rb --range partition (vector a0, a1)--> R1, R2, R3
  R1, R2, R3 --local sort--> R1s, R2s, R3s --> result
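A minimal sketch of the algorithm, assuming the partition vector is
already chosen; the fragment names and use of bisect are illustrative.

import bisect

def range_partition_sort(fragments, vector, key=lambda t: t):
    partitions = [[] for _ in range(len(vector) + 1)]
    for frag in fragments:            # e.g. Ra, Rb at different sites
        for t in frag:
            partitions[bisect.bisect_right(vector, key(t))].append(t)
    return [sorted(p, key=key) for p in partitions]   # local sorts

Ra, Rb = [11, 7, 27, 3], [10, 17, 20, 22, 14]
print(range_partition_sort([Ra, Rb], vector=[10, 20]))
# [[3, 7], [10, 11, 14, 17], [20, 22, 27]]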
Selecting a partitioning vector
• Possible centralized approach using a “coordinator”
– Each site sends statistics about its fragment to coordinator
– Coordinator decides # of sites to use for local sort
– Coordinator computes and distributes partitioning vector

• For example,
– Statistics could be (min sort key, max sort key, # of tuples)
– Coordinator tries to choose vector that equally partitions
relation

Example

• Coordinator receives:
  – From site 1: Min 5, Max 9, 10 tuples
  – From site 2: Min 7, Max 16, 10 tuples
• Assume sort keys are distributed uniformly within [min, max] in
  each fragment
• Partition R into two equal fragments: what is the split point k0?
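The slide leaves k0 open; under the stated uniformity assumption it
can be found numerically. A sketch (the helper names and the binary
search are our own, not from the lecture):

def tuples_below(k, stats):
    # Each site contributes n * (k - lo) / (hi - lo) tuples below k.
    total = 0.0
    for lo, hi, n in stats:
        frac = (k - lo) / (hi - lo)
        total += n * min(1.0, max(0.0, frac))
    return total

stats = [(5, 9, 10), (7, 16, 10)]      # (min, max, #tuples) per site
lo, hi = 5.0, 16.0
for _ in range(50):                    # binary search for the median key
    k0 = (lo + hi) / 2
    if tuples_below(k0, stats) < 10:   # want 10 of the 20 tuples below k0
        lo = k0
    else:
        hi = k0
print(round(k0, 2))                    # ≈ 8.38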
Variations

• Different kinds of statistics
  – Local partitioning vector
  – Histogram, e.g. at site 1: counts 3 | 4 | 3 over the local
    vector 5, 6, 8, 10
• Multiple rounds between coordinator and sites
  – Sites send statistics
  – Coordinator computes and distributes initial vector V
  – Sites tell coordinator the number of tuples that fall in each
    range of V
  – Coordinator computes final partitioning vector Vf
Parallel external sort-merge

• Local sort
• Compute partition vector
• Merge sorted streams at the final sites

  Ra --local sort--> Ras
  Rb --local sort--> Rbs
  Ras, Rbs --range partition (a0, a1)--> merged, in order, at the
  destination sites into R1, R2, R3
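The merge step can lean on a standard k-way merge; a minimal sketch,
assuming each destination site receives already-sorted partition
streams (the stream contents are illustrative).

import heapq

ras_part = [11, 14, 17]     # site a's sorted tuples for range [10, 20)
rbs_part = [10, 13, 19]     # site b's sorted tuples, same range
r2 = list(heapq.merge(ras_part, rbs_part))   # lazy k-way merge
print(r2)                   # [10, 11, 13, 14, 17, 19]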
Parallel/distributed join

• Input: relations R and S, which may or may not be partitioned
• Output: R ⋈ S, with the result at one or more sites
Partitioned Join

• Partition R and S on the join attribute A with the same partition
  function f(A): Ra, Rb --f(A)--> R1, R2, R3 and Sa, Sb --f(A)--> S1, S2, S3
• Join Ri with Si locally at site i; the result is the union of the
  local joins

Note: works only for equi-joins.
Partitioned Join
• Same partition function (f) for both relations
• f can be range or hash partitioning
• Any type of local join (nested-loop, hash, merge, etc.)
can be used
• Several possible scheduling options. Example:
– partition R; partition S; join
– partition R; build local hash table for R; partition S and join
• Good partition function important
– Distribute join load evenly among sites

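A minimal sketch of the first scheduling option above (partition R;
partition S; join), using hash partitioning and a local hash join;
the relation contents and site count are illustrative.

from collections import defaultdict

def partitioned_join(R, S, key_r, key_s, n_sites=3):
    f = lambda a: hash(a) % n_sites          # same f for both relations
    r_parts, s_parts = defaultdict(list), defaultdict(list)
    for t in R: r_parts[f(key_r(t))].append(t)
    for t in S: s_parts[f(key_s(t))].append(t)
    out = []
    for i in range(n_sites):                 # local hash join at site i
        table = defaultdict(list)
        for r in r_parts[i]: table[key_r(r)].append(r)
        for s in s_parts[i]:
            for r in table[key_s(s)]:
                out.append((r, s))
    return out

R = [(1, "r1"), (2, "r2")]; S = [(2, "s1"), (3, "s2")]
print(partitioned_join(R, S, key_r=lambda t: t[0], key_s=lambda t: t[0]))
# [((2, 'r2'), (2, 's1'))]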
Asymmetric fragment + replicate join

• Partition R with some function f: Ra, Rb --f--> R1, R2, R3
• Replicate S in full to every site; site i computes Ri ⋈ S locally,
  and the result is the union of the local joins
• Any partition function f can be used (even round-robin)
• Can be used for any kind of join, not just equi-joins
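A minimal sketch, using round-robin partitioning and an inequality
join predicate to show that non-equi-joins work; all names and data
are illustrative.

def frag_replicate_join(R, S, pred, n_sites=3):
    r_parts = [R[i::n_sites] for i in range(n_sites)]  # round-robin
    result = []
    for Ri in r_parts:            # each site joins its Ri with all of S
        result += [(r, s) for r in Ri for s in S if pred(r, s)]
    return result

R, S = [1, 5, 9, 12], [4, 10]
print(frag_replicate_join(R, S, pred=lambda r, s: r < s))
# all pairs (r, s) with r < s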
General fragment + replicate join

• Partition R into n fragments R1, …, Rn and replicate each fragment
  m times
• Partition S into m fragments S1, …, Sm and replicate each fragment
  n times
• Join all n × m pairings of R and S fragments:
    R1 ⋈ S1 … R1 ⋈ Sm
    R2 ⋈ S1 … R2 ⋈ Sm
    …
    Rn ⋈ S1 … Rn ⋈ Sm
  The result is the union of the n × m local joins.

• Asymmetric F+R join is a special case of general F+R.
• Asymmetric F+R is useful when S is small.
Semi-join programs

• Used to reduce communication traffic during join processing
• R ⋈ S = (R ⋉ S) ⋈ S
        = R ⋈ (S ⋉ R)
        = (R ⋉ S) ⋈ (S ⋉ R)
Example

  S(A, B): (2, a), (10, b), (25, c), (30, d)
  R(A, C): (3, x), (10, y), (15, z), (25, w), (32, x)

• Compute πA(S) = [2, 10, 25, 30] and ship it to R's site
• Compute R ⋉ S = {(10, y), (25, w)} and ship it to S's site
• R ⋈ S = S ⋈ (R ⋉ S) = {(10, b, y), (25, c, w)}

• Using the semi-join, communication cost = 4·A + 2·(A + C) + result
• Directly joining R and S, communication cost = 4·(A + B) + result
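The example as a short sketch of R ⋈ S = S ⋈ (R ⋉ S); only a_of_S
and r_semi would cross the network.

S = [(2, "a"), (10, "b"), (25, "c"), (30, "d")]              # S(A, B)
R = [(3, "x"), (10, "y"), (15, "z"), (25, "w"), (32, "x")]   # R(A, C)

a_of_S = {a for a, _ in S}                 # pi_A(S), shipped to R's site
r_semi = [(a, c) for a, c in R if a in a_of_S]   # R semi-join S, shipped back
result = [(a, b, c) for a, b in S for a2, c in r_semi if a == a2]
print(result)   # [(10, 'b', 'y'), (25, 'c', 'w')]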
Comparing communication costs

• Say R is the smaller of the two relations R and S
• (R ⋉ S) ⋈ S is cheaper than R ⋈ S if
  size(πA(S)) + size(R ⋉ S) < size(R)
• Similar comparisons hold for the other types of semi-joins
• Common implementation trick:
  – Encode πA(S) (or πA(R)) as a bit vector, with one bit per value
    in the domain of attribute A:
    001101000010100
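A minimal sketch of the bit-vector encoding, assuming a small integer
domain for A; real systems typically hash values into a fixed-size
vector instead, accepting occasional false positives.

DOMAIN = 64                                   # assumed |domain(A)|

def encode(values):
    bits = 0
    for v in values:
        bits |= 1 << v                        # set the bit for each A value
    return bits

S_keys = [2, 10, 25, 30]
bitvec = encode(S_keys)                       # shipped: DOMAIN bits total
R = [(3, "x"), (10, "y"), (15, "z"), (25, "w"), (32, "x")]
r_semi = [(a, c) for a, c in R if bitvec >> a & 1]
print(r_semi)                                 # [(10, 'y'), (25, 'w')]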
n-way joins

• To compute R ⋈ S ⋈ T
  – Semi-join program 1: R' ⋈ S' ⋈ T, where R' = R ⋉ S and S' = S ⋉ T
  – Semi-join program 2: R'' ⋈ S' ⋈ T, where R'' = R ⋉ S' and S' = S ⋉ T
  – Several other options
• In general, the number of options is exponential in the number of
  relations
Other operations
• Duplicate elimination
– Sort first (in parallel), then eliminate duplicates in
the result
– Partition tuples (range or hash) and eliminate
duplicates locally
• Aggregates
– Partition by grouping attributes; compute
aggregates locally at each site

Example

Query: sum(sal) group by dept

Ra:  #  dept   sal        Rb:  #  dept   sal
     1  toy    10              4  sales   5
     2  toy    20              5  toy    20
     3  sales  15              6  mgmt   15
                               7  sales  10
                               8  mgmt   30

Partition by dept, then aggregate locally:

Site 1 (toy and mgmt tuples):         Site 2 (sales tuples):
  1 toy 10, 2 toy 20, 5 toy 20,         3 sales 15, 4 sales 5, 7 sales 10
  6 mgmt 15, 8 mgmt 30
  → dept sum: toy 50, mgmt 45           → dept sum: sales 30
Example (aggregate during partitioning)

Each site first aggregates its own fragment, then ships the partial
sums to the site responsible for each group:

  Ra locally: toy 30, sales 15
  Rb locally: toy 20, mgmt 45, sales 15

  Site 1 combines the toy/mgmt partials: toy 50, mgmt 45
  Site 2 combines the sales partials:    sales 30

Aggregating during partitioning reduces communication cost.
Does this work for all kinds of aggregates?
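On the closing question: pre-aggregation works directly for
distributive aggregates such as sum, count, min, and max; avg instead
needs (sum, count) pairs as partials. A minimal sketch for sum (the
names are illustrative).

from collections import Counter

def local_sums(fragment):                 # pre-aggregate at the source site
    sums = Counter()
    for _, dept, sal in fragment:
        sums[dept] += sal
    return sums

Ra = [(1, "toy", 10), (2, "toy", 20), (3, "sales", 15)]
Rb = [(4, "sales", 5), (5, "toy", 20), (6, "mgmt", 15),
      (7, "sales", 10), (8, "mgmt", 30)]

total = local_sums(Ra) + local_sums(Rb)   # combine the shipped partials
print(dict(total))   # {'toy': 50, 'sales': 30, 'mgmt': 45}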
Query Optimization
• Generate query execution plans (QEPs)
• Estimate cost of each QEP ($,time,…)
• Choose minimum cost QEP

• What’s different for distributed DB?


– New strategies for some operations (semi-join,
range-partitioning sort,…)
– Many ways to assign and schedule processors
– Some factors besides number of IO’s in the cost
model
36
Cost estimation

• In centralized systems: estimate sizes of intermediate relations
• For distributed systems, also:
  – Transmission cost/time may dominate
  – Account for parallelism: e.g. two plans that each do work at two
    sites in sequence, Plan A as 100 IOs then 20 IOs, Plan B as 50 IOs
    then 70 IOs; both do 120 IOs in total, but the more balanced plan
    can finish sooner when the stages overlap
  – Data distribution and result re-assembly cost/time
Optimization in distributed DBs
• Two levels of optimization
• Global optimization
– Given localized query and cost function
– Output optimized (min. cost) QEP that includes
relational and communication operations on
fragments
• Local optimization
– At each site involved in query execution
– Portion of the QEP at a given site optimized using
techniques from centralized DB systems
Search strategies
1. Exhaustive (with pruning)
2. Hill climbing (greedy)
3. Query separation

Exhaustive with Pruning
• A fixed set of techniques for each relational
operator
• Search space = “all” possible QEPs with this set
of techniques
• Prune search space using heuristics
• Choose minimum cost QEP from rest of search
space

Example

|R| > |S| > |T|;  compute R ⋈A S ⋈B T

Candidate first joins: R⋈S, R⋈T, S⋈R, S⋈T, T⋈S, T⋈R
  (1) Prune R⋈T and T⋈R: the cross-product is not necessary
  (2) Prune R⋈S and S⋈T: larger relation first

Surviving plans: (S ⋈ R) ⋈ T and (T ⋈ S) ⋈ R, each with further
choices such as shipping S to R's site vs. using a semi-join, and
shipping T to S's site vs. using a semi-join.
Hill Climbing

• Begin with an initial feasible QEP
• At each step, generate a set S of new QEPs by applying
  "transformations" to the current QEP
• Evaluate the cost of each QEP in S
• Stop if no improvement is possible
• Otherwise, replace the current QEP by the minimum cost QEP from S
  and iterate
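A minimal sketch of the loop, with cost() and transformations() as
stand-ins for a real optimizer's cost model and plan-rewriting rules.

def hill_climb(initial_plan, transformations, cost):
    current = initial_plan
    while True:
        neighbors = transformations(current)          # the set S
        if not neighbors:
            return current
        best = min(neighbors, key=cost)
        if cost(best) >= cost(current):               # no improvement
            return current
        current = best                                # iterate

# Toy usage: "plans" are integers, lower cost is better.
print(hill_climb(10, lambda p: [p - 1, p + 1], cost=abs))   # -> 0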
Example

• Goal: minimize communication cost
• Compute R ⋈A S ⋈B T ⋈C V

    Rel.   Site   # of tuples
    R      1      10
    S      2      20
    T      3      30
    V      4      40

• Initial plan: send all relations to one site
    To site 1: cost = 20 + 30 + 40 = 90
    To site 2: cost = 10 + 30 + 40 = 80
    To site 3: cost = 10 + 20 + 40 = 70
    To site 4: cost = 10 + 20 + 30 = 60
• Transformation: send a relation to its neighbor
Local search

• Initial feasible plan
  P0: R (1 → 4); S (2 → 4); T (3 → 4)
  Compute the join at site 4
• Assume the following intermediate result sizes:
  R ⋈ S = 20,  S ⋈ T = 5,  T ⋈ V = 1
• Current shipping of R and S: R (1 → 4), S (2 → 4): cost = 10 + 20 = 30
• Send R to site 2, join there, ship R ⋈ S (size 20) to site 4:
  cost = 10 + 20 = 30 → no change
• Send S to site 1, join there, ship R ⋈ S (size 20) to site 4:
  cost = 20 + 20 = 40 → worse
Improvement

• Current shipping of S and T: S (2 → 4), T (3 → 4): cost = 20 + 30 = 50
• Send T to site 2, join there, ship S ⋈ T (size 5) to site 4:
  cost = 30 + 5 = 35 → improvement
• Send S to site 3, join there, ship S ⋈ T (size 5) to site 4:
  cost = 20 + 5 = 25 → better still
Next iteration

• P1: S (2 → 3); R (1 → 4); J (3 → 4), where J = S ⋈ T
  Compute the answer at site 4
• Now apply the same transformation to R and J:
  – ship R (1 → 4) and J (3 → 4), join at site 4
  – ship J to site 1, join with R there, ship R ⋈ J (1 → 4)
  – ship R to site 3, join with J there, ship R ⋈ J (3 → 4)
Resources
• Ozsu and Valduriez. “Principles of Distributed
Database Systems” – Chapters 7, 8, and 9.
