Queryoptimization Examples

The document outlines distributed query optimization, detailing its significance in distributed databases and the challenges involved, such as NP-hard problems and communication costs. It discusses various strategies, including join ordering, semijoins, and optimization algorithms, emphasizing the importance of selecting execution sites and transfer methods. The content is derived from the book 'Principles of Distributed Database Systems' by Özsu and Valduriez.

Uploaded by

Ganga Bhattacharjee

DISTRIBUTED QUERY OPTIMIZATION

OUTLINE (DISTRIBUTED DB)

• Introduction (Ch. 1) ⋆

• Distributed Database Design (Ch. 3) ⋆

• Distributed Query Processing (Ch. 6-8) ⋆

➡ Overview (Ch. 6) ⋆

➡ Query decomposition and data localization (Ch. 7) ⋆

➡ Distributed query optimization (Ch. 8) ⋆

• Distributed Transaction Management (Ch. 10-12) ⋆

⋆ Özsu and Valduriez, Principles of Distributed Database Systems (3rd Ed.), 2011
OUTLINE (TODAY)

• Distributed query optimization (Ch. 8) ⋆

➡ Overview
➡ Join Ordering in Localized Queries
➡ Semijoin-based Algorithm
➡ Distributed query optimization strategies
➡ Hybrid approaches

DISTRIBUTED QUERY OPTIMIZATION

• In the previous chapter (Ch. 7) ⋆:

➡ A distributed query is mapped into a query over fragments (decomposition and data localization)
➡ Reduction (“optimization”) is independent of relation (fragment) statistics (e.g., cardinality)

• In this chapter (Ch. 8) ⋆:

➡ Optimization based on DB statistics (order of operations and operands, algorithm to perform simple operations) to produce a query execution plan (QEP)
✦ In the distributed case, a QEP is further extended with communication operations to support the execution of queries over fragment sites
➡ Once again: the problem is NP-hard, so we look for a good solution rather than the optimal one
➡ Statement of the problem
✦ Input: fragment query
✦ Output: the “best” global strategy
➡ Additional problems specific to the distributed setting
✦ Where to execute (partial) queries? Which relation to ship where?
✦ Choose between data transfer methods: ship-whole vs. fetch-as-needed
✦ Decide on the use of semijoins (semijoins save on communication at the expense of more local processing)

ELEMENTS OF THE OPTIMIZER

• The elements of the optimization process are similar in the distributed and centralized cases
➡ Search space (aka solution space)
✦ The set of equivalent QEPs: algebra expressions enriched with implementation details and communication choices
➡ Cost model
✦ Cost function (in terms of time)
✓ I/O cost + CPU cost + communication cost
✓ Early approaches considered only communication costs; with fast communication technology, communication and I/O costs have become comparable
✓ These might have different weights in different distributed environments (LAN vs. WAN)
➡ Search algorithm (aka search strategy)
✦ How do we move inside the solution space?
✓ Exhaustive search, heuristic algorithms
✦ The goal is to search the solution space for a good strategy according to the cost model
• Difference between centralized and distributed settings: search space and cost model (the search strategy remains the same)
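The weighted cost function on this slide can be illustrated with a minimal sketch. The class `CostWeights`, the function `total_cost`, the weight values, and the two plans are all hypothetical and chosen only to show how the same pair of plans can rank differently on a WAN and on a LAN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostWeights:
    """Relative weight of one unit of each cost component (hypothetical values)."""
    io: float
    cpu: float
    comm: float


def total_cost(io_units: float, cpu_units: float, comm_units: float,
               w: CostWeights) -> float:
    """Weighted sum: I/O cost + CPU cost + communication cost."""
    return w.io * io_units + w.cpu * cpu_units + w.comm * comm_units


# Assumed weights: communication is expensive on a WAN, comparable to I/O on a LAN.
WAN = CostWeights(io=1.0, cpu=0.5, comm=10.0)
LAN = CostWeights(io=1.0, cpu=0.5, comm=1.0)

# Two made-up plans: A does more local work, B ships more data.
plan_a = {"io_units": 100, "cpu_units": 200, "comm_units": 5}
plan_b = {"io_units": 20, "cpu_units": 100, "comm_units": 40}

for name, w in (("WAN", WAN), ("LAN", LAN)):
    cost_a = total_cost(**plan_a, w=w)
    cost_b = total_cost(**plan_b, w=w)
    print(name, "plan A:", cost_a, "plan B:", cost_b)
```

With these assumed numbers, plan A is cheaper under the WAN weights and plan B under the LAN weights, which is why the weights matter when comparing strategies.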
SEARCH SPACE

• Search space is large

➡ For N relations there are (2(N-1))!/(N-1)! ⋆ equivalent join trees (by join commutativity and associativity)
➡ In the distributed case the search space is even larger because there are more options

• QEPs are decorated with more information (on data exchange)

• Focus on join and semijoin order
• Different candidate solutions in the search space
➡ A good heuristic in the centralized context: left-deep trees
➡ In the distributed context: non-left-deep trees allow for parallelization
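To get a feel for how quickly the search space grows, the following small sketch evaluates the formula (2(N-1))!/(N-1)! for a few values of N. The function name `count_join_trees` is an illustrative choice, not part of the source material.

```python
from math import factorial


def count_join_trees(n: int) -> int:
    """Number of equivalent join trees for n relations: (2(n-1))! / (n-1)!."""
    if n < 1:
        raise ValueError("need at least one relation")
    return factorial(2 * (n - 1)) // factorial(n - 1)


for n in range(2, 8):
    print(n, "relations:", count_join_trees(n), "join trees")
```

Already for 5 relations there are 1680 join trees, before any site or transfer-method decoration is added, which is why exhaustive search quickly becomes impractical.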

CENTRALIZED VS. DISTRIBUTED QUERY OPTIMIZATION

• Relation between centralized and distributed query optimization

➡ Distributed query optimization (DQO) employs techniques and solutions from the centralized context
✦ A distributed query is translated into local ones (localized queries), which are handled with centralized query optimization (CQO) techniques
✦ Distributed query optimization is a more general (and thus more difficult) problem
✓ Most solutions to DQO extend solutions to CQO
➡ We focus on communication costs (local CPU and I/O costs are ignored)
✦ Clearly, the cost of localized queries (handled with CQO techniques) is computed as in the centralized case (mainly I/O costs)

JOIN ORDERING IN THE DISTRIBUTED CONTEXT
• Join ordering is important in centralized query optimization

• It is even more important in distributed query optimization, where it affects communication costs

• Semijoins can be used to reduce relation sizes (and thus communication costs) before performing join operations
JOIN ORDERING – 2 RELATIONS

• We assume the query is already localized (i.e., expressed on fragments)

➡ Fragments are relations entirely stored at a single site
✦ We often use “fragments” and “relations” interchangeably (no technical reason to distinguish them)
• We first focus on ordering issues without using semijoins
➡ Consider a 2-relation join: R ⋈ S (where R and S are stored at different sites)
✦ Move the smaller relation to the site of the larger one: R → site of S if size(R) < size(S), S → site of R if size(R) > size(S)
✦ If size(R) and size(S) are (more or less) the same (and no other factor comes into play), then moving the outer relation R has benefits:
✓ No need to store R in nested-loop or block nested-loop join algorithms
✓ The indexed nested-loop join algorithm remains available, because the index on the inner relation S is preserved (the index is lost when transferring S)
JOIN ORDERING – MULTIPLE RELATIONS
• Multiple relations case: more difficult because there are too many alternatives
• The goal is still to transmit small operands (relations)
➡ Compute the cost of all alternatives and select the best one
✦ This requires computing the size of intermediate relations, which is difficult
✓ In the distributed context it is even more difficult because the necessary information may not be available at the site

JOIN ORDERING – EXAMPLE

Consider PROJ ⋈_PNO ASG ⋈_ENO EMP

Join graph of distributed query: EMP (Site 1) is linked to ASG (Site 2) on ENO; ASG (Site 2) is linked to PROJ (Site 3) on PNO

Execution alternatives:
1. EMP → Site 2
   Site 2 computes EMP' = EMP ⋈ ASG
   EMP' → Site 3
   Site 3 computes EMP' ⋈ PROJ
2. ASG → Site 1
   Site 1 computes EMP' = EMP ⋈ ASG
   EMP' → Site 3
   Site 3 computes EMP' ⋈ PROJ
3. ASG → Site 3
   Site 3 computes ASG' = ASG ⋈ PROJ
   ASG' → Site 1
   Site 1 computes ASG' ⋈ EMP
4. PROJ → Site 2
   Site 2 computes PROJ' = PROJ ⋈ ASG
   PROJ' → Site 1
   Site 1 computes PROJ' ⋈ EMP
5. EMP → Site 2
   PROJ → Site 2
   Site 2 computes EMP ⋈ PROJ ⋈ ASG
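The five alternatives can be compared by summing the sizes of what each one ships between sites. The sketch below does this under loudly stated assumptions: only transfer volume is counted, the function `transfer_costs` is hypothetical, and all sizes (including those of the intermediate results EMP ⋈ ASG and ASG ⋈ PROJ) are invented numbers that a real optimizer would have to estimate.

```python
def transfer_costs(size: dict) -> dict:
    """Data shipped by each alternative, in the same unit as the input sizes.

    Expected keys: EMP, ASG, PROJ, EMP_ASG (EMP join ASG), ASG_PROJ (ASG join PROJ).
    """
    return {
        1: size["EMP"] + size["EMP_ASG"],    # EMP -> Site 2, EMP' -> Site 3
        2: size["ASG"] + size["EMP_ASG"],    # ASG -> Site 1, EMP' -> Site 3
        3: size["ASG"] + size["ASG_PROJ"],   # ASG -> Site 3, ASG' -> Site 1
        4: size["PROJ"] + size["ASG_PROJ"],  # PROJ -> Site 2, PROJ' -> Site 1
        5: size["EMP"] + size["PROJ"],       # EMP and PROJ -> Site 2
    }


# Made-up sizes, for illustration only.
sizes = {"EMP": 400, "ASG": 1000, "PROJ": 200, "EMP_ASG": 1200, "ASG_PROJ": 1100}
costs = transfer_costs(sizes)
best = min(costs, key=costs.get)
print(costs, "-> cheapest alternative:", best)
```

With these sizes alternative 5 ships the least data, but the ranking depends entirely on the intermediate result sizes, which is exactly the part that is hard to know in advance.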
SEMIJOIN ALGORITHMS

• Semijoins can be used to reduce the sizes of the operands to transfer (similar to what selections do)
➡ Reduced communication costs

• Consider the join of two relations:

➡ R (at site 1)
➡ S (at site 2)
• Alternatives:
1. Do the join R ⋈_A S
2. Perform one of the semijoin-based equivalent options
   R ⋈_A S ⟺ (R ⋉_A S) ⋈_A S
   R ⋈_A S ⟺ R ⋈_A (S ⋉_A R)
   R ⋈_A S ⟺ (R ⋉_A S) ⋈_A (S ⋉_A R)
• Tradeoff between
a) the cost to compute and send the semijoin to the other site (and then perform the join there)
b) the cost to send the whole relation to the other site (and then perform the join there)
SEMIJOIN ALGORITHMS – EXAMPLE

• Perform the join

➡ Send R to Site 2
➡ Site 2 computes R ⋈_A S
• Consider the semijoin-based option (R ⋉_A S) ⋈_A S
➡ S' = Π_A(S)
➡ S' → Site 1
➡ Site 1 computes R' = R ⋉_A S'
➡ R' → Site 2
➡ Site 2 computes R' ⋈_A S
• The semijoin is better if
size(Π_A(S)) + size(R ⋉_A S) < size(R)
➡ Only communication costs are considered (time to transfer relations)
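The condition above is easy to check once the three sizes are known or estimated. The following sketch encodes it directly; the function name `prefer_semijoin` and the example sizes are hypothetical, and sizes are assumed to be expressed in the same unit (for example bytes).

```python
def prefer_semijoin(size_proj_a_s: float, size_r_semijoin_s: float,
                    size_r: float) -> bool:
    """True if shipping the projection of S on A plus the reduced R is cheaper
    than shipping all of R (communication cost only)."""
    return size_proj_a_s + size_r_semijoin_s < size_r


# Selective semijoin: only a small part of R survives the reduction.
print(prefer_semijoin(size_proj_a_s=500, size_r_semijoin_s=2_000, size_r=10_000))

# Unselective semijoin: almost all of R survives, so the extra transfer is wasted.
print(prefer_semijoin(size_proj_a_s=500, size_r_semijoin_s=9_800, size_r=10_000))
```

The first call returns True and the second False: the extra transfer of the join-attribute projection only pays off when it removes enough of R.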

SEMIJOIN ALGORITHMS – SUM UP

• Using a semijoin is convenient if R ⋉_A S is highly selective (it selects few tuples) and/or the size of R is large
• It is bad otherwise, due to the additional transfer of Π_A(S)
• The cost of transferring Π_A(S) can be reduced by using bit arrays
• A disadvantage of using semijoins is the loss of indices

Bit arrays
• Let h be a hash function that distributes the possible values for A into n buckets:

h : Dom(A) → { 0, …, n-1 }

• Bit array BA[0 .. n-1] over relation S is defined as:

BA[i] = 1 iff ∃ value v for attribute A in S s.t. h(v) = i
• Transfer BA (n bits) rather than Π_A(S)
• A tuple of R with value v for attribute A belongs to R’ iff BA[h(v)] = 1
• R’ is an (over-)approximation of R ⋉_A S
BIT ARRAYS FOR SEMIJOINS

• Recall:
o BA[i] = 1 iff ∃ value v for attribute A in S s.t. h(v) = i
o a tuple of R with value v for A belongs to R’ iff BA[h(v)] = 1

R
idR  A
1    1
2    2
3    2
4    5
5    4
6    5
7    4
8    5

S
idS  A
1    5
2    5
3    3
4    5
5    3

• h(x) = x mod 4
• n = 4 (4 buckets)
• h(1) = h(5) = 1
• BA[0] = 0 (no value v occurs in S.A s.t. h(v) = 0)
• BA[1] = 1 (due to occurrence of 5 for attribute A in S)
• BA[2] = 0 (no value v occurs in S.A s.t. h(v) = 2)
• BA[3] = 1 (due to occurrence of 3 for attribute A in S)

R’ (R ⋉_A S computed with the bit array)
idR  A
1    1
4    5
6    5
8    5

R ⋉_A S
idR  A
4    5
6    5
8    5

R’ ⊋ R ⋉_A S: R’ contains tuple <1,1>, which does not belong to R ⋉_A S
However, R’ is a good approximation because h has only one conflict (h(1) = h(5)) among the values for attribute A in R and S
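The example on this slide can be replayed with a short sketch. The relation contents, the hash function h(x) = x mod 4, and n = 4 are taken from the slide; the function names and the representation of tuples as (id, A) pairs are illustrative assumptions.

```python
# Relations from the slide, as (id, A) pairs.
R = [(1, 1), (2, 2), (3, 2), (4, 5), (5, 4), (6, 5), (7, 4), (8, 5)]
S = [(1, 5), (2, 5), (3, 3), (4, 5), (5, 3)]
N_BUCKETS = 4


def h(v: int) -> int:
    """Hash function of the example: h(x) = x mod 4."""
    return v % N_BUCKETS


def build_bit_array(s_tuples):
    """BA[i] = 1 iff some value v of attribute A in S has h(v) = i."""
    ba = [0] * N_BUCKETS
    for _id, a in s_tuples:
        ba[h(a)] = 1
    return ba


def reduce_with_bit_array(r_tuples, ba):
    """R': tuples of R whose A value hashes to a bucket set in BA."""
    return [t for t in r_tuples if ba[h(t[1])] == 1]


def exact_semijoin(r_tuples, s_tuples):
    """Exact semijoin of R with S on A, for comparison with R'."""
    s_values = {a for _id, a in s_tuples}
    return [t for t in r_tuples if t[1] in s_values]


ba = build_bit_array(S)
r_prime = reduce_with_bit_array(R, ba)
exact = exact_semijoin(R, S)
print("BA =", ba)
print("R' =", r_prime)
print("exact semijoin =", exact)
```

Only the 4 bits of BA travel from the site of S to the site of R. The price is the false positive <1,1>, which is later discarded by the actual join.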
SEMIJOINS FOR JOINS AMONG MULTIPLE RELATIONS
• Semijoins can optimize joins among more than 2 operands
EMP ⋈ ASG ⋈ PROJ = EMP’ ⋈ ASG’ ⋈ PROJ
where EMP’ = EMP ⋉ ASG
and ASG’ = ASG ⋉ PROJ
• Each operand can be further reduced using more than one semijoin in cascade (a semijoin program)
EMP’’ = EMP ⋉ (ASG ⋉ PROJ)
We have size(ASG ⋉ PROJ) <= size(ASG)
Therefore size(EMP’’) <= size(EMP’)

• The full reducer for a relation is the semijoin program that reduces the relation the most
• Finding the full reducer for a relation with an exhaustive (brute-force) approach is problematic
➡ For cyclic queries a full reducer cannot be found
✦ Solution: break the cycle
➡ With other queries: inefficient (NP-hard)
✦ Solution: only use semijoins when the problem is simple
✓ e.g., for chained queries, where relations are in sequence and each one joins with the next one
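A semijoin program for the chained query EMP, ASG, PROJ can be sketched as follows. The helper `semijoin` and the tiny relation instances are hypothetical and only serve to show that the cascaded reduction EMP’’ is never larger than EMP’.

```python
def semijoin(left, right, attr):
    """Rows of `left` that have at least one match in `right` on `attr`."""
    keys = {row[attr] for row in right}
    return [row for row in left if row[attr] in keys]


# Made-up instances: E3 is assigned to a project that does not exist in PROJ,
# and E4 has no assignment at all.
EMP = [{"ENO": "E1"}, {"ENO": "E2"}, {"ENO": "E3"}, {"ENO": "E4"}]
ASG = [
    {"ENO": "E1", "PNO": "P1"},
    {"ENO": "E2", "PNO": "P2"},
    {"ENO": "E3", "PNO": "P9"},
]
PROJ = [{"PNO": "P1"}, {"PNO": "P2"}]

emp1 = semijoin(EMP, ASG, "ENO")    # EMP'  = EMP reduced by ASG
asg1 = semijoin(ASG, PROJ, "PNO")   # ASG'  = ASG reduced by PROJ
emp2 = semijoin(EMP, asg1, "ENO")   # EMP'' = EMP reduced by ASG'

print(len(EMP), len(emp1), len(emp2))
```

Here EMP shrinks from 4 tuples to 3 with a single semijoin and to 2 with the cascade, because E3 only matches an ASG tuple that PROJ eliminates.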
DISTRIBUTED QUERY OPTIMIZATION

• We focus on the optimization of joins

• The algorithm for optimizing a join is adapted from the one for the centralized case
• In the distributed context
➡ There is a coordinator (master site) where the query is initiated

➡ The coordinator chooses
1. the execution site and
2. the transfer method
➡ Apprentice sites (where fragments are stored and queries are executed)
✦ Apprentices behave as in centralized query optimization when optimizing the localized queries (over fragments) assigned to them
✓ They choose the best join ordering, join algorithm, and access method for relations

CHOICES OF THE MASTER SITE
1. Choice of the execution site
➡ E.g., R ⋈ S can be executed:
✦ at the site where R is stored
✦ at the site where S is stored
✦ at a third site (e.g., where a 3rd relation waits to be joined; this allows for parallel transfer)
2. Transfer method
➡ ship-whole: the relation is transferred to the join execution site entirely
✦ In some cases (e.g., for outer relations or in the case of merge join) there is no need to store the relation: join as it arrives, in pipelined mode
➡ fetch-as-needed: only the needed tuples are transferred, i.e., the tuples selected by the join
✦ equivalent to performing a semijoin of one relation with the tuples of the other one (to reduce the size of the former) before executing the join
✦ e.g., semijoin of the inner relation w.r.t. the outer one (only the needed tuples of the inner relation are transferred)
✓ tuples of the outer relation are sent (only the join attribute) to the site of the inner relation
✓ matching tuples of the inner relation are sent to the site of the outer relation to execute the join

The choices of the master produce 4 strategies (not all combinations are worth considering)
STRATEGY 1 – SHIP-WHOLE/INNER SITE

1. ship-whole/site of inner relation: move the outer relation (R) to the site of the inner relation (S)
(a) Retrieve outer tuples
(b) Send them to the inner relation site
(c) Join them as they arrive

Total cost = LT ( retrieve card(R) tuples from R )
+ CT ( size(R) )
+ LT ( retrieve s tuples from S ) * card(R)

The join is done as R arrives because R is the outer relation

• CT(x): communication time to transfer x bytes
• LT(x): local processing time to perform op. x
• s = card(S ⋉_A R)/card(R): average number of tuples of S that match a tuple of R
STRATEGY 2 – SHIP-WHOLE/OUTER SITE

2. ship-whole/site of outer relation: move the inner relation (S) to the site of the outer relation (R)
The join cannot be done as S arrives; S needs to be stored

Total cost = LT ( retrieve card(S) tuples from S )
+ CT ( size(S) )
+ LT ( store card(S) tuples in temporary relation T )
+ LT ( retrieve card(R) tuples from R )
+ LT ( retrieve s tuples from T ) * card(R)

• CT(x): communication time to transfer x bytes
• LT(x): local processing time to perform op. x
• s = card(S ⋉_A R)/card(R): average number of tuples of S that match a tuple of R
STRATEGY 3 – FETCH-AS-NEEDED/OUTER SITE

3. fetch-as-needed/site of outer relation

(a) Retrieve tuples at outer relation (R) site
(b) For each tuple of R, send join attribute values to inner relation (S) site
(c) Retrieve matching inner tuples at inner relation site
(d) Send the matching inner tuples to outer relation site
(e) Join as they arrive

Total cost = LT ( retrieve card(R) tuples from R )
+ CT ( length(A) ) * card(R)
+ LT ( retrieve s tuples from S ) * card(R)
+ CT ( s * length(S) ) * card(R)

• CT(x): communication time to transfer x bytes
• LT(x): local processing time to perform op. x
• s = card(S ⋉_A R)/card(R): average number of tuples of S that match a tuple of R
STRATEGY 4 – MOVE BOTH RELATIONS TO A THIRD SITE

4. move both the inner (S) and the outer (R) relation to another site

Total cost = LT ( retrieve card(S) tuples from S )
+ CT ( size(S) )
+ LT ( store card(S) tuples in temporary relation T )
+ LT ( retrieve card(R) tuples from R )
+ CT ( size(R) )
+ LT ( retrieve s tuples from T ) * card(R)

Moving the inner relation S first is better, so that we can then join as the outer relation R arrives

• CT(x): communication time to transfer x bytes
• LT(x): local processing time to perform op. x
• s = card(S ⋉_A R)/card(R): average number of tuples of S that match a tuple of R
STRATEGY COMPARISON

PROJ ⋈_PNO ASG

• PROJ (outer rel.) and ASG (inner rel.) are stored at different sites
• Index on PNO for relation ASG

Communication cost of each strategy:
1. Ship whole PROJ to the site of ASG: CT ( size(PROJ) )
2. Ship whole ASG to the site of PROJ: CT ( size(ASG) )
3. Fetch tuples of ASG as needed at the site of PROJ: CT ( length(A) ) * card(PROJ) + CT ( s * length(ASG) ) * card(PROJ), with A = PNO
4. Move both ASG and PROJ to a third site: CT ( size(ASG) ) + CT ( size(PROJ) )

• If there is no upper-level operation, then 4 is a bad choice
• If size(PROJ) >> size(ASG), then 2 is a good choice, provided its local processing time is not too bad compared with 1 and 3 (1 and 3 can exploit the index on ASG in their local processing)
• If PROJ is large and only a few tuples of ASG match, then 3 is better than 1
• Otherwise, 1 is better than 3
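The communication costs of the four strategies can be compared numerically with a small sketch. Several things here are assumptions that do not come from the slides: CT is modeled as a linear function of the number of bytes, size(X) is taken to be card(X) times the tuple length, and all cardinalities and lengths are invented.

```python
def ct(n_bytes: float, startup: float = 0.0, per_byte: float = 1.0) -> float:
    """Assumed linear communication time: fixed startup plus a per-byte cost."""
    return startup + per_byte * n_bytes


def comm_costs(card_proj, len_proj, card_asg, len_asg, len_a, s):
    """Communication cost of strategies 1-4 for PROJ (outer) join ASG (inner)."""
    size_proj = card_proj * len_proj   # assumption: size = cardinality * tuple length
    size_asg = card_asg * len_asg
    return {
        1: ct(size_proj),                                        # ship PROJ
        2: ct(size_asg),                                         # ship ASG
        3: ct(len_a) * card_proj + ct(s * len_asg) * card_proj,  # fetch as needed
        4: ct(size_asg) + ct(size_proj),                         # ship both
    }


# Few ASG tuples match each PROJ tuple: fetch-as-needed ships the least.
few = comm_costs(card_proj=1000, len_proj=100, card_asg=50_000, len_asg=20,
                 len_a=4, s=0.5)
# Many ASG tuples match each PROJ tuple: shipping PROJ whole is cheaper.
many = comm_costs(card_proj=1000, len_proj=100, card_asg=50_000, len_asg=20,
                  len_a=4, s=50)

print("few matches :", few, "-> best:", min(few, key=few.get))
print("many matches:", many, "-> best:", min(many, key=many.get))
```

Setting a non-zero `startup` in `ct` penalizes strategy 3 further, since it exchanges messages once per outer tuple rather than once per relation.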
HYBRID APPROACH
• So far, the focus has been on static approaches, i.e., strategies (QEPs, expressed as decorated trees) are evaluated and compared at compile time
• Advantage: query optimization is done once and reused for several query executions
• Disadvantage: cost evaluation is not that accurate
➡ it is not always based on exact values but on estimates derived from statistics
✦ e.g., size of intermediate results
➡ some parameters of a query might be known only at runtime
• The problems of static query optimization are much more severe in the distributed context: more information varies at runtime
➡ Sites may become unavailable or overloaded
➡ Selection of site and fragment copy should be done at runtime to increase availability and load balancing
• A hybrid solution (some decisions are taken at runtime) is implemented by means of the CP (choose-plan) operator, which is resolved at runtime, when an exact plan comparison can be done
THE CP (CHOOSE-PLAN) OPERATOR

SELECT *
FROM EMP, PAY
WHERE SALARY > $a
where $a is a variable whose value is specified by the user at runtime

CP chooses at runtime between two equivalent plans:
Plan 1 (selection pushed below the join): σ_{SALARY > $a}(PAY) ⋈ EMP
Plan 2 (selection applied after the join): σ_{SALARY > $a}(PAY ⋈ EMP)

Normally, pushing σ inside ⋈ is a good heuristic, but it can be bad if the selection rate of ⋈ is higher than that of σ
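The idea of deferring the choice until $a is known can be sketched as a function that evaluates the cost estimate of each candidate plan with the runtime value and picks the cheaper one. Everything in this sketch is hypothetical: the cost formulas, the constants, and the assumption that salaries are uniformly distributed between 0 and MAX_SALARY are invented for illustration and are not how any particular system implements choose-plan.

```python
from typing import Callable, List, Tuple

# A candidate plan: a name plus a cost estimator that needs the runtime value of $a.
Plan = Tuple[str, Callable[[float], float]]

# Made-up statistics and per-tuple costs.
CARD_PAY = 1000          # tuples in PAY
CARD_JOIN_RESULT = 100   # tuples produced by PAY join EMP
MAX_SALARY = 100_000.0   # salaries assumed uniform in [0, MAX_SALARY]
JOIN_COST = 1.0          # cost of feeding one tuple into the join
SELECT_COST = 0.5        # cost of evaluating SALARY > $a on one tuple


def selectivity(a: float) -> float:
    """Fraction of PAY tuples with SALARY > a under the uniform assumption."""
    return min(1.0, max(0.0, (MAX_SALARY - a) / MAX_SALARY))


def cost_push_selection(a: float) -> float:
    """Plan 1: filter PAY first, then join only the surviving tuples."""
    return SELECT_COST * CARD_PAY + JOIN_COST * selectivity(a) * CARD_PAY


def cost_join_first(a: float) -> float:
    """Plan 2: join first, then filter the (small) join result."""
    return JOIN_COST * CARD_PAY + SELECT_COST * CARD_JOIN_RESULT


def choose_plan(plans: List[Plan], a: float) -> str:
    """Resolve the choice at runtime, once the value of $a is known."""
    return min(plans, key=lambda p: p[1](a))[0]


plans = [("push selection", cost_push_selection), ("join first", cost_join_first)]
print(choose_plan(plans, 80_000))  # selective predicate
print(choose_plan(plans, 10_000))  # unselective predicate
```

Under these assumed numbers the selective predicate favors pushing the selection, while the unselective one favors joining first, which is the situation the note on this slide warns about.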
2-STEP OPTIMIZATION
• 2-step optimization: a simpler approach (more efficient, less exhaustive) than the one based on the CP operator; it reduces the workload at runtime (no CP operator)
➡ At runtime, only labels about site and fragment copy selection are added

1. At compile time, generate a static plan with operation ordering and access methods only
2. At startup time, carry out site and copy selection and allocate operations to sites

• Site (and copy) selection is done in a greedy fashion

➡ best load balancing,
➡ best benefit (# of queries already executed at the site, possible saving of communication costs as the site might already have the data available)
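The greedy site and copy selection of the second step can be sketched as follows. The way the two criteria are combined here (lowest current load first, highest benefit as a tie-breaker) is an assumption made for this sketch, and the function `select_sites`, the fragment names, the site names, and the load and benefit values are all hypothetical.

```python
def select_sites(fragments, copies, load, benefit, work=1.0):
    """Greedily assign each fragment's subquery to one of the sites holding a copy.

    copies:  fragment -> list of sites storing a copy of it
    load:    site -> current load (not modified; a local copy is updated)
    benefit: site -> benefit score (e.g., number of queries already run there)
    """
    load = dict(load)
    assignment = {}
    for frag in fragments:
        # Prefer the least loaded site; break ties in favor of the higher benefit.
        site = min(copies[frag], key=lambda s: (load[s], -benefit.get(s, 0)))
        assignment[frag] = site
        load[site] += work  # the chosen site now carries this subquery
    return assignment


copies = {"EMP1": ["s1", "s2"], "ASG1": ["s1", "s3"], "PROJ1": ["s2", "s3"]}
load = {"s1": 0.0, "s2": 0.0, "s3": 1.0}
benefit = {"s2": 2}

assignment = select_sites(["EMP1", "ASG1", "PROJ1"], copies, load, benefit)
print(assignment)
```

Because the static plan from step 1 already fixes operation ordering and access methods, this startup-time step only has to make the cheap, load-dependent decisions.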
