Lecture 06

The document discusses distributed query processing, focusing on query optimization, transformation rules, and evaluation strategies. It highlights the importance of cost estimation in query evaluation and presents various techniques such as memoization and semijoin strategies for efficient query execution in distributed systems. Additionally, it introduces the concept of Eddies for adaptive query processing, emphasizing the need for continuous adaptation in unpredictable large-scale systems.

Uploaded by

SANJAY P

Distributed Query Processing

Agenda
• Recap of query optimization
• Transformation rules for parallel and distributed (P&D) systems
• Memoization
• Query evaluation strategies
• Eddies
Introduction
• Alternative ways of evaluating a given query
– Equivalent expressions
– Different algorithms for each operation (Chapter 13)
• The cost difference between a good and a bad way of evaluating a query can be enormous
– Example: computing the Cartesian product r × s and then applying the selection r.A = s.B is much slower than performing a join on the same condition
• Need to estimate the cost of operations
– Depends critically on statistical information about relations, which the database must maintain
– Need to estimate statistics for intermediate results to compute the cost of complex expressions
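To make the cost gap concrete, the following Python sketch compares the two evaluation orders on small in-memory relations. The relations r and s, their sizes, and the simple work counters are illustrative assumptions, not figures from the lecture.

```python
from itertools import product

def product_then_select(r, s):
    """Evaluate the selection r.A = s.B over the Cartesian product r x s."""
    pairs = list(product(r, s))                  # materializes |r| * |s| pairs
    result = [(x, y) for x, y in pairs if x["A"] == y["B"]]
    return result, len(pairs)

def hash_join(r, s):
    """Evaluate the join on r.A = s.B using a hash table built on s.B."""
    index = {}
    for y in s:
        index.setdefault(y["B"], []).append(y)
    result = [(x, y) for x in r for y in index.get(x["A"], [])]
    return result, len(r) + len(s)               # one pass over each input

# Hypothetical inputs: 100 tuples each, overlapping on the values 50..99.
r = [{"A": i} for i in range(100)]
s = [{"B": i} for i in range(50, 150)]

slow, slow_work = product_then_select(r, s)
fast, fast_work = hash_join(r, s)
```

Both functions return the same 50 matching pairs, but the first one builds 10,000 intermediate pairs while the second touches each input tuple once (200 tuples in total).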
Introduction (Cont.)
Relations generated by two equivalent expressions have the same
set of attributes and contain the same set of tuples, although their
attributes may be ordered differently.
Introduction (Cont.)
• Generation of query-evaluation plans for an expression involves several steps:
1. Generating logically equivalent expressions
• Use equivalence rules to transform an expression into an equivalent one.
2. Annotating the resultant expressions to get alternative query plans
3. Choosing the cheapest plan based on estimated cost
• The overall process is called cost-based optimization.
Equivalence Rules
1. Conjunctive selection operations can be deconstructed into a sequence of individual selections.
$\sigma_{\theta_1 \wedge \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E))$
2. Selection operations are commutative.
$\sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E))$
3. Only the last in a sequence of projection operations is needed; the others can be omitted.
$\Pi_{t_1}(\Pi_{t_2}(\ldots(\Pi_{t_n}(E))\ldots)) = \Pi_{t_1}(E)$
4. Selections can be combined with Cartesian products and theta joins.
a. $\sigma_{\theta}(E_1 \times E_2) = E_1 \bowtie_{\theta} E_2$
b. $\sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) = E_1 \bowtie_{\theta_1 \wedge \theta_2} E_2$
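Equivalence Rules (Cont.)
5. Theta-join operations (and natural joins) are commutative.
$E_1 \bowtie_{\theta} E_2 = E_2 \bowtie_{\theta} E_1$
6. (a) Natural join operations are associative:
$(E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3)$
(b) Theta joins are associative in the following manner:
$(E_1 \bowtie_{\theta_1} E_2) \bowtie_{\theta_2 \wedge \theta_3} E_3 = E_1 \bowtie_{\theta_1 \wedge \theta_3} (E_2 \bowtie_{\theta_2} E_3)$
where $\theta_2$ involves attributes from only $E_2$ and $E_3$.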

Pictorial Depiction of Equivalence Rules
Equivalence Rules (Cont.)
7. The selection operation distributes over the theta-join operation under the following two conditions:
(a) When all the attributes in $\theta_0$ involve only the attributes of one of the expressions ($E_1$) being joined.
$\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2$
(b) When $\theta_1$ involves only the attributes of $E_1$ and $\theta_2$ involves only the attributes of $E_2$.
$\sigma_{\theta_1 \wedge \theta_2}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_1}(E_1)) \bowtie_{\theta} (\sigma_{\theta_2}(E_2))$
Equivalence Rules (Cont.)
8. The projection operation distributes over the theta-join operation as follows:
(a) If $\theta$ involves only attributes from $L_1 \cup L_2$:
$\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = (\Pi_{L_1}(E_1)) \bowtie_{\theta} (\Pi_{L_2}(E_2))$
(b) Consider a join $E_1 \bowtie_{\theta} E_2$.
– Let $L_1$ and $L_2$ be sets of attributes from $E_1$ and $E_2$, respectively.
– Let $L_3$ be attributes of $E_1$ that are involved in join condition $\theta$, but are not in $L_1 \cup L_2$, and
– let $L_4$ be attributes of $E_2$ that are involved in join condition $\theta$, but are not in $L_1 \cup L_2$.
$\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = \Pi_{L_1 \cup L_2}((\Pi_{L_1 \cup L_3}(E_1)) \bowtie_{\theta} (\Pi_{L_2 \cup L_4}(E_2)))$
Equivalence Rules (Cont.)
9. The set operations union and intersection are commutative (set difference is not commutative).
$E_1 \cup E_2 = E_2 \cup E_1$
$E_1 \cap E_2 = E_2 \cap E_1$
10. Set union and intersection are associative.
$(E_1 \cup E_2) \cup E_3 = E_1 \cup (E_2 \cup E_3)$
$(E_1 \cap E_2) \cap E_3 = E_1 \cap (E_2 \cap E_3)$
11. The selection operation distributes over $\cup$, $\cap$ and $-$.
$\sigma_{\theta}(E_1 - E_2) = \sigma_{\theta}(E_1) - \sigma_{\theta}(E_2)$
and similarly for $\cup$ and $\cap$ in place of $-$.
Also: $\sigma_{\theta}(E_1 - E_2) = \sigma_{\theta}(E_1) - E_2$
and similarly for $\cap$ in place of $-$, but not for $\cup$.
12. The projection operation distributes over union.
$\Pi_L(E_1 \cup E_2) = (\Pi_L(E_1)) \cup (\Pi_L(E_2))$
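The rule that selection distributes over the set operations, including the one-sided variant that fails for union, can be checked on a small example. The sketch below models relations as Python sets of single-attribute tuples (plain integers); the sample data and the predicate are assumptions chosen only for illustration.

```python
def select(pred, rel):
    """Selection: keep the tuples of rel that satisfy pred."""
    return {t for t in rel if pred(t)}

# Hypothetical relations over one integer attribute.
e1 = {1, 2, 3, 4, 5, 6}
e2 = {4, 5, 6, 7, 8}
even = lambda t: t % 2 == 0

# Selection applied to both operands of a set difference.
diff_both = select(even, e1 - e2) == select(even, e1) - select(even, e2)

# One-sided variant: select only on the left operand.
diff_left = select(even, e1 - e2) == select(even, e1) - e2
inter_left = select(even, e1 & e2) == select(even, e1) & e2

# The one-sided variant does not hold for union: unfiltered
# tuples of e2 (here 5 and 7) leak into the right-hand side.
union_left = select(even, e1 | e2) == select(even, e1) | e2
```

Here diff_both, diff_left and inter_left are all True, while union_left is False.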
Multiple Transformations (Cont.)
Optimizer strategies
• Heuristic
– Apply the transformation rules in a specific order such that the cost converges to a minimum
• Cost based
– Simulated annealing
– Randomized generation of candidate query evaluation plans (QEPs)
– Problem: how to guarantee randomness
Memoization Techniques
• How to generate alternative Query Evaluation Plans (QEPs)?
– Early-generation systems centred around a tree representation of the plan
– Hardwired tree-rewriting rules are deployed to enumerate part of the space of possible QEPs
– For each alternative the total cost is determined
– The best alternatives are retained for execution
– Problems: very large space to explore, duplicate plans, local optima, expensive query cost evaluation.
– The SQL Server optimizer contains about 300 rules to be deployed.
Memoization Techniques
• How to generate alternative Query Evaluation Plans?
– Keep a memo of partial QEPs and their cost.
– Use the heuristic rules to generate alternatives to build more complex QEPs
[Figure: memo for the join of r1, r2, r3 and r4. Level 1 plans hold two-relation joins (r2 r1, r1 r2, r2 r3, r3 r4, r1 r4), level 2 plans extend them with a further relation such as r3, and so on up to the level n plans.]
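The memo idea can be sketched as a small dynamic program that caches the best plan for every subset of relations, so that each partial QEP is costed only once. The cardinalities, the uniform selectivity, and the cost function (sum of estimated intermediate result sizes) below are assumptions made for this sketch, and Python's functools.lru_cache stands in for the memo table.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical statistics: base cardinalities and one uniform join selectivity.
CARD = {"r1": 1000, "r2": 100, "r3": 10, "r4": 500}
SELECTIVITY = 0.01

def size(rels):
    """Estimated result size of joining all relations in rels."""
    n = 1.0
    for r in rels:
        n *= CARD[r]
    return n * SELECTIVITY ** (len(rels) - 1)

@lru_cache(maxsize=None)          # the memo: one entry per subset of relations
def best_plan(rels):
    """Return (cost, plan) for a frozenset of relation names."""
    if len(rels) == 1:
        (r,) = rels
        return 0.0, r             # scanning a base relation is free in this model
    best = None
    members = sorted(rels)
    for k in range(1, len(members)):
        for left in combinations(members, k):
            left = frozenset(left)
            right = rels - left
            lcost, lplan = best_plan(left)      # looked up in the memo if known
            rcost, rplan = best_plan(right)
            cost = lcost + rcost + size(rels)   # add this join's output size
            if best is None or cost < best[0]:
                best = (cost, (lplan, rplan))
    return best

cost, plan = best_plan(frozenset(CARD))
```

After the call, best_plan.cache_info().currsize reports 15 memo entries, one for each non-empty subset of the four relations, although far more (left, right) combinations were examined.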
Distributed Query Processing
• For centralized systems, the primary criterion
for measuring the cost of a particular strategy
is the number of disk accesses.
• In a distributed system, other issues must be
taken into account:
– The cost of a data transmission over the network.
– The potential gain in performance from having
several sites process parts of the query in parallel.
Transformation rules for distributed systems
• Primary horizontally fragmented table:
– Rule 9: The union is commutative
$E_1 \cup E_2 = E_2 \cup E_1$
– Rule 10: Set union is associative.
$(E_1 \cup E_2) \cup E_3 = E_1 \cup (E_2 \cup E_3)$
– Rule 12: The projection operation distributes over union
$\Pi_L(E_1 \cup E_2) = (\Pi_L(E_1)) \cup (\Pi_L(E_2))$
• Derived horizontally fragmented table:
– The join through foreign-key dependency is already reflected in the fragmentation criteria
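For a horizontally fragmented table, Rule 12 means a projection can be evaluated at each site before the fragments are unioned, so only the projected columns cross the network. A minimal sketch, assuming two hypothetical fragments of an account table and set semantics for projection:

```python
def project(attrs, rel):
    """Projection with set semantics: duplicates are removed."""
    return {tuple(t[a] for a in attrs) for t in rel}

# Hypothetical horizontal fragments, one per site.
frag_s1 = [{"acct": 1, "branch": "A", "balance": 100},
           {"acct": 2, "branch": "A", "balance": 250}]
frag_s2 = [{"acct": 3, "branch": "B", "balance": 100}]

# Centralized: ship whole fragments, union them, then project.
central = project(["balance"], frag_s1 + frag_s2)

# Distributed: project at each site, ship the projections, then union.
local = project(["balance"], frag_s1) | project(["balance"], frag_s2)
```

Both central and local evaluate to {(100,), (250,)}; the distributed form ships one column per fragment instead of three.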
Transformation rules for distributed systems
• Vertically fragmented tables:
– Rules: hint, look at the projection rules
Optimization in Parallel & Distributed Systems
• The cost model changes!
– Network transport is a dominant cost factor
• The facilities for query processing are not homogeneously distributed
– Light-resource systems form a bottleneck
– Need for dynamic load scheduling
Simple Distributed Join Processing
• Consider the following relational algebra expression in which the three relations are neither replicated nor fragmented
account $\bowtie$ depositor $\bowtie$ branch
• account is stored at site S1
• depositor at S2
• branch at S3
• For a query issued at site SI, the system needs to produce the result at site SI
Possible Query Processing Strategies
• Ship copies of all three relations to site SI and choose a strategy for processing the entire query locally at site SI.
• Ship a copy of the account relation to site S2 and compute temp1 = account $\bowtie$ depositor at S2. Ship temp1 from S2 to S3, and compute temp2 = temp1 $\bowtie$ branch at S3. Ship the result temp2 to SI.
• Devise similar strategies, exchanging the roles of S1, S2, S3
• Must consider the following factors:
– amount of data being shipped
– cost of transmitting a data block between sites
– relative processing speed at each site
Semijoin Strategy
• Let r1 be a relation with schema R1 stored at site S1.
Let r2 be a relation with schema R2 stored at site S2.
• Evaluate the expression $r_1 \bowtie r_2$ and obtain the result at S1.
1. Compute $temp_1 \leftarrow \Pi_{R_1 \cap R_2}(r_1)$ at S1.
2. Ship temp1 from S1 to S2.
3. Compute $temp_2 \leftarrow r_2 \bowtie temp_1$ at S2.
4. Ship temp2 from S2 to S1.
5. Compute $r_1 \bowtie temp_2$ at S1. This is the same as $r_1 \bowtie r_2$.
Formal Definition
• The semijoin of r1 with r2 is denoted by:
$r_1 \ltimes r_2$
• It is defined by:
$\Pi_{R_1}(r_1 \bowtie r_2)$
• Thus, $r_1 \ltimes r_2$ selects those tuples of r1 that contribute to $r_1 \bowtie r_2$.
• In step 3 above, $temp_2 = r_2 \ltimes r_1$.
• For joins of several relations, the above strategy can be extended to a series of semijoin steps.
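The five steps of the semijoin strategy can be traced on in-memory data. In the sketch below, the two relations, their attribute names, and the single shared attribute id are assumptions; "shipping" is only marked in comments, since both relations live in one process here.

```python
def project(attrs, rel):
    """Projection with set semantics."""
    return {tuple(t[a] for a in attrs) for t in rel}

def natural_join(r, s, common):
    """Natural join of two lists of dicts on the attributes in common."""
    return [{**x, **y} for x in r for y in s
            if all(x[a] == y[a] for a in common)]

# Hypothetical relations: r1 lives at S1, r2 at S2; they share "id".
r1 = [{"id": i, "name": f"n{i}"} for i in range(5)]
r2 = [{"id": i, "city": f"c{i}"} for i in range(3, 1000)]
common = ["id"]

temp1 = project(common, r1)                    # step 1, at S1
# step 2: ship temp1 (5 small tuples) from S1 to S2
temp2 = [t for t in r2                         # step 3, at S2: r2 semijoin r1
         if tuple(t[a] for a in common) in temp1]
# step 4: ship temp2 from S2 to S1
result = natural_join(r1, temp2, common)       # step 5, at S1
```

Only 2 of the 997 tuples of r2 survive step 3 and have to be shipped back, and result is identical to natural_join(r1, r2, common).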
Join Strategies that Exploit Parallelism
• Consider $r_1 \bowtie r_2 \bowtie r_3 \bowtie r_4$ where relation ri is stored at site Si. The result must be presented at site S1.
• r1 is shipped to S2 and $r_1 \bowtie r_2$ is computed at S2; simultaneously r3 is shipped to S4 and $r_3 \bowtie r_4$ is computed at S4.
• S2 ships tuples of $(r_1 \bowtie r_2)$ to S1 as they are produced; S4 ships tuples of $(r_3 \bowtie r_4)$ to S1.
• Once tuples of $(r_1 \bowtie r_2)$ and $(r_3 \bowtie r_4)$ arrive at S1, $(r_1 \bowtie r_2) \bowtie (r_3 \bowtie r_4)$ is computed in parallel with the computation of $(r_1 \bowtie r_2)$ at S2 and the computation of $(r_3 \bowtie r_4)$ at S4.
Query plan generation
• Apers-Aho-Hopcroft
– Hill-climber: repeatedly split the multi-join query into fragments and optimize its subqueries independently
• Apply centralized algorithms and rely on the cost model to avoid expensive query execution plans.
Query evaluators
Query evaluation strategy
• Pipelined query evaluation strategy
– Called the Volcano query processing model
– Standard in commercial systems and MySQL
• Basic algorithm:
– Demand-driven evaluation of the query tree.
– Operators exchange data in units such as records
– Each operator supports the following interface: open, next, close
• open() at the top of the tree results in a cascade of opens down the tree.
• An operator getting a next() call may recursively make next() calls from within to produce its next answer.
• close() at the top of the tree results in a cascade of closes down the tree.
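The open/next/close interface can be illustrated with three tiny operators. This is a sketch of the iterator model only: the class names, the use of None as the end-of-stream marker, and the sample data are assumptions, not the interface of any particular system.

```python
class Scan:
    """Leaf operator: iterates over an in-memory list of rows."""
    def __init__(self, rows):
        self.rows = rows
    def open(self):
        self.pos = 0
    def next(self):
        if self.pos >= len(self.rows):
            return None                     # end of stream
        row = self.rows[self.pos]
        self.pos += 1
        return row
    def close(self):
        self.pos = None

class Select:
    """Passes on only the child rows that satisfy pred."""
    def __init__(self, pred, child):
        self.pred, self.child = pred, child
    def open(self):
        self.child.open()                   # opens cascade down the tree
    def next(self):
        # Pull from the child until a row qualifies or the child is exhausted.
        while (row := self.child.next()) is not None:
            if self.pred(row):
                return row
        return None
    def close(self):
        self.child.close()                  # closes cascade down the tree

class Project:
    """Keeps only the listed attributes of each child row."""
    def __init__(self, attrs, child):
        self.attrs, self.child = attrs, child
    def open(self):
        self.child.open()
    def next(self):
        row = self.child.next()
        return None if row is None else {a: row[a] for a in self.attrs}
    def close(self):
        self.child.close()

def run(root):
    """Drive the plan from the top: one next() call per result row."""
    root.open()
    out = []
    while (row := root.next()) is not None:
        out.append(row)
    root.close()
    return out

people = [{"name": "ann", "age": 41}, {"name": "bob", "age": 25},
          {"name": "cy", "age": 33}]
plan = Project(["name"], Select(lambda r: r["age"] > 30, Scan(people)))
rows = run(plan)
```

Each call to next() on the root pulls exactly as many rows through the tree as are needed to produce one answer, so no intermediate result is materialized; rows ends up as [{"name": "ann"}, {"name": "cy"}].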
Query evaluation strategy
• Pipelined query evaluation strategy
– Evaluation:
• Oriented towards OLTP applications
– Granule size of data interchange
• Items produced one at a time
• No temporary files
– Choice of intermediate buffer size allocations
• Query executed as one process
• Generic interface: adding the iterator primitives is sufficient to support new containers.
• CPU intensive
• Amenable to parallelization
Query evaluation strategy
• Materialized evaluation strategy
– Used in MonetDB
– Basic algorithm:
• For each relational operator, produce the complete intermediate result using materialized operands
– Evaluation:
• Oriented towards decision support queries
• Limited internal administration and dependencies
• Basis for multi-query optimization strategy
• Memory intensive
• Amenable to distributed/parallel processing
Eddies: Continuously Adaptive Query Processing
R. Avnur, J.M. Hellerstein
UCB
ACM SIGMOD 2000
Problem Statement
• Context: large federated and shared-nothing databases
• Problem: assumptions made at query optimization rarely hold during execution
• Hypothesis: do away with traditional optimizers, solve it through adaptation
• Focus: scheduling in a tuple-based pipeline query execution model
Problem Statement Refinement
• Large-scale systems are unpredictable, because of
– Hardware and workload complexity
• Bursty servers and networks, heterogeneity, hardware characteristics
– Data complexity
• Federated databases often come without proper statistical summaries
– User interface complexity
• Online aggregation may involve user ‘control’
The Idea
• Relational algebra operators consume a stream from multiple sources to produce a new stream
• A priori you don’t know how selective the operators are or how fast tuples are consumed and produced
• You have to adapt continuously and learn this information on the fly
• Adapt the order of processing based on these lessons
The Idea
[Figure: a tree of JOIN operators; each JOIN pulls tuples from its inputs through next() calls.]
The Idea
• Standard method: derive a spanning tree over the query graph
• Pre-optimize a query plan to determine operator pairs and their algorithm, e.g. to exploit access paths
• Re-optimizing a query pipeline on the fly requires careful state management, coupled with
– Synchronization barriers
• Operators have widely differing arrival rates for their operands
– This limits concurrency, e.g. merge-join algorithm
– Moments of symmetry
• Algorithm provides the option to exchange the roles of the operands without too many complications
– E.g. switching the roles of R and S in a nested-loop join
Nested-loop
[Figure: nested-loop join of R and S.]
Join and sorting
• Index joins are asymmetric; you cannot easily change the roles of their operands
– Combine index join + operands as a unit in the process
• Sorting requires look-ahead
– Merge joins are combined into a unit
• Ripple joins
– Break the space into smaller pieces and solve the join operation for each piece individually
– The piece crossings are moments of symmetry
The Idea
[Figure: an eddy with a tuple buffer sits between the JOIN operators and their inputs; operators and inputs exchange tuples with the eddy through next() calls.]
Rivers and Eddies
Eddies are tuple routers that distribute arriving tuples to interested operators
– What are efficient scheduling policies?
• Fixed strategy? Random? Learning?

Static Eddies
• Delivery of tuples to operators can be hardwired in the eddy to reflect a traditional query execution plan

Naïve Eddy
• Operators are delivered tuples based on a priority queue
• Intermediate results get the highest priority to avoid buffer congestion
Observations for selections
• Extended priority queue for the operators
– Receiving a tuple leads to a credit increment
– Returning a tuple leads to a credit decrement
– Priority is determined by “weighted lottery”
• Naïve Eddies exhibit back pressure in the tuple flow; production is limited by the rate of consumption at the output
• Lottery Eddies approach the cost of optimal ordering, without the need to determine the order a priori
• Lottery Eddies outperform heuristics
– Hash-use first, Index-use first, or Naive
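The credit scheme can be sketched for a pipeline of selection operators. In this simplified model (an assumption of the sketch, not the paper's implementation), every operator starts with one ticket, gains a ticket when it receives a tuple, loses one when it returns the tuple to the eddy, and the next operator for a tuple is drawn by a lottery weighted by ticket counts. The function name lottery_eddy and the sample predicates are hypothetical.

```python
import random

def lottery_eddy(tuples, operators, seed=0):
    """Route each tuple through all selection operators in lottery order."""
    rng = random.Random(seed)
    tickets = [1] * len(operators)   # one initial ticket so every operator can win
    calls = [0] * len(operators)     # how often each operator was invoked
    output = []
    for t in tuples:
        pending = list(range(len(operators)))   # operators not yet applied to t
        alive = True
        while pending and alive:
            weights = [tickets[i] for i in pending]
            i = rng.choices(pending, weights=weights)[0]   # weighted lottery
            pending.remove(i)
            calls[i] += 1
            tickets[i] += 1          # credit: operator received a tuple
            if operators[i](t):
                tickets[i] -= 1      # debit: operator returned the tuple
            else:
                alive = False        # tuple dropped, credit is kept
        if alive:
            output.append(t)
    return output, calls, tickets

# Hypothetical filters: the second one drops far more tuples than the first.
ops = [lambda t: t % 2 == 0, lambda t: t % 10 == 0]
data = list(range(1000))
output, calls, tickets = lottery_eddy(data, ops)
```

An operator's ticket count grows only when it drops tuples, so a more selective filter tends to win the lottery more often and see tuples first, which is the ordering a static optimizer would aim for. The result set itself does not depend on the routing order.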
Observations
• The dynamics during a run can be controlled by a learning scheme
– Split the processing into steps (‘windows’) to re-adjust the weights during tuple delivery
• Initial delays cannot be handled efficiently
• Research challenges:
– Better learning algorithms to adjust flow
– Aggressive adjustments
– Remove pre-optimization
– Balance ‘hostile’ parallel environment
– Deploy eddies to control degree of partitioning (and replication)
