Query Optimization in Distributed Database Systems

The document discusses query optimization in distributed database systems. It covers: 1. The framework for query optimization which involves determining data distribution, join order selection, and operation execution methods. 2. Transmission cost modeling which partitions optimization into distribution and local strategies considering data transmission. 3. Database profiling which provides statistics like cardinality, size, and distinct values to estimate partial results for operations like selection, projection, join, etc. 4. The architecture of query processing which includes parsing, rewriting, optimization, planning and execution stages.

Uploaded by

Minh Huy Nguyễn


Query optimization in

distributed database systems

Framework for query optimization
• The selection of a query processing strategy involves:
– determining the physical copies of the fragments upon which to execute the query
– selecting the order of execution of operations; in particular, this involves determining a "good" sequence of joins
– selecting the method for executing each operation

2
Transmission cost
• Transmission requirements are neutral with respect to systems; they are typically a function of the amount of data transmitted among sites
• The optimization of a distributed query can be partitioned into two independent problems: the distribution of the access strategy among sites, which is done considering transmission only, and the determination of local access strategies at each site, which uses the traditional methods of centralized databases
• Transmission cost:
TC(x) = C0 + C1 * x
where x is the amount of data transmitted, C0 is the fixed cost of initiating a transmission between two sites, and C1 is the cost per unit of data transmitted
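
The transmission cost formula can be tried out with a small sketch. The constants (C0 = 10, C1 = 1) and the workload (1000 tuples of 25 bytes) are illustrative assumptions, not values from the slides; the point is only that the fixed cost C0 is paid once per message.

```python
def transmission_cost(x, c0=10, c1=1):
    """TC(x) = C0 + C1 * x for one message carrying x units of data.

    c0 and c1 are hypothetical constants chosen for illustration.
    """
    return c0 + c1 * x


TUPLES = 1000      # hypothetical number of tuples to ship
TUPLE_SIZE = 25    # hypothetical tuple size in bytes

# One message carrying all tuples: C0 is paid once.
one_message = transmission_cost(TUPLES * TUPLE_SIZE)

# One message per tuple: C0 is paid for every tuple.
per_tuple = TUPLES * transmission_cost(TUPLE_SIZE)

print(one_message, per_tuple)
```

Under these assumed constants, the single message is cheaper because the start-up cost is not repeated.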

3
Database Profile
Database profile:
• The number of tuples in each relation Ri (card(Ri))
• The size of each attribute A (size(A) )
• The size of Ri (size(Ri)) is sum of the sizes of its attributes
• For each attribute A in each relation Ri: the number of
distinct values appearing in Ri (val(A[Ri])), max and min

[Figure: two local database systems, LDBS1 and LDBS2, holding the fragments Supply1, Supply2, Dept1 and Dept2]

4
Database Profile

Supply: card(Supply) = 50 000

        SNUM   PNUM   DEPTNUM   QUAN
size    6      7      2         10
val     3000   1000   30        500

Dept: card(Dept) = 30

        DEPTNUM   NAME   AREA   MGRNUM
size    2         15     1      7
val     30        30     6      30

5
Database Profile

Supply1: card(Supply1) = 30 000, site(Supply1) = 1

        SNUM   PNUM   DEPTNUM   QUAN
size    6      7      2         10
val     1800   1000   20        500

Dept1: card(Dept1) = 10, site(Dept1) = 2

        DEPTNUM   NAME   AREA   MGRNUM
size    2         15     1      7
val     10        10     2      10

6
Profile of partial results of algebraic
operations - SELECTION
Let S denote the result of performing a unary operation over a relation R
• Cardinality - to each selection we associate a selectivity factor ρ, which is the fraction of tuples satisfying it
For a simple selection attribute = value (A = v), ρ can be estimated as follows:
ρ = 1/val(A[R])
under the assumption that values are uniformly distributed. Thus
card(S) = ρ * card(R)
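
As a worked example of the selection estimate, the sketch below applies the selectivity 1/val(A[R]) to the Supply profile given earlier (card = 50 000, val(DEPTNUM) = 30, val(SNUM) = 3000). The function name is a hypothetical helper, not part of any system.

```python
def selection_cardinality(card_r, val_a):
    """Estimate card(S) for the selection A = v under the uniformity assumption.

    Returns the pair (selectivity, estimated cardinality).
    """
    rho = 1 / val_a
    return rho, rho * card_r


# Supply profile from the slides: card = 50 000
rho_dept, card_dept = selection_cardinality(50_000, 30)     # DEPTNUM = v
rho_snum, card_snum = selection_cardinality(50_000, 3000)   # SNUM = v

print(round(card_dept), round(card_snum))
```

A selection on DEPTNUM is expected to keep about 1667 tuples, while a selection on SNUM keeps about 17.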

7
Profile of partial results of algebraic
operations - SELECTION
• Size: selection does not affect the size of relations
size(S) = size(R)
• Distinct values: depends on the selection criterion
Consider an attribute B which is not used in the selection formula. val(B[S]) can be estimated by restating the problem as follows: given n = card(R) objects uniformly distributed over m = val(B[R]) colors, how many different colors c = val(B[S]) are selected if we take just r objects?

8
Profile of partial results of algebraic
operations - SELECTION
• Yao approximation:

c(n, m, r) = r            for r < m/2
c(n, m, r) = (r + m)/3    for m/2 ≤ r < 2m
c(n, m, r) = m            for r ≥ 2m
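
The piecewise approximation is easy to express as a function. The sketch below is a minimal illustration that assigns the boundary cases r = m/2 and r = 2m to the middle and last branch respectively (an assumption where the slide leaves them open). The sample call uses the Supply profile with r = 1667 selected tuples, a value taken from the selection estimate on DEPTNUM.

```python
def yao_distinct(n, m, r):
    """Approximate the number of distinct colors among r objects.

    n = card(R), m = val(B[R]), r = number of selected tuples.
    n is kept only to mirror the signature c(n, m, r); the approximation
    itself does not use it.
    """
    if r < m / 2:
        return r
    if r < 2 * m:
        return (r + m) / 3
    return m


# Supply has 50 000 tuples; assume a selection keeps r = 1667 of them.
quan_values = yao_distinct(50_000, 500, 1667)    # r >= 2m, so all 500 values survive
pnum_values = yao_distinct(50_000, 1000, 1667)   # middle branch: (1667 + 1000) / 3
few_tuples = yao_distinct(50_000, 3000, 100)     # r < m/2, so one value per tuple

print(quan_values, pnum_values, few_tuples)
```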

9
Profile of partial results of algebraic
operations - PROJECTION
Let S denote the result of performing a unary operation over a relation R
• Cardinality – projection affects the cardinality of operands since duplicates are eliminated from the result. This effect is difficult to evaluate; the following three rules can be applied
– If the projection involves a single attribute A, set
card(S) = val(A[R])
– If the product ∏ Ai∈Attr(S) val(Ai[R]) is less than card(R), where Attr(S) are the attributes in the result of the projection, set
card(S) = ∏ Ai∈Attr(S) val(Ai[R])

10
Profile of partial results of algebraic
operations - PROJECTION
– If the projection includes a key of R, set
card(S) = card(R)
• Note that if the system does not eliminate duplicates, the
cardinality of the result is the same as the cardinality of the
operand relation
• Size: the size of the result of a projection is reduced to the
sum of the sizes of attributes in its specification
• Distinct values : the distinct values of projected attributes
are the same as in the operand relation
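
The three projection rules can be sketched as one function. The order in which the rules are tried and the fallback when none of them applies (returning card(R) as an upper bound) are assumptions made for this sketch; the slides do not specify them. The examples use the Supply and Dept profiles given earlier.

```python
from math import prod


def projection_cardinality(card_r, vals, includes_key=False):
    """Estimate card(S) for a duplicate-eliminating projection.

    vals holds val(Ai[R]) for each projected attribute Ai.
    """
    if len(vals) == 1:            # rule 1: single attribute
        return vals[0]
    product = prod(vals)
    if product < card_r:          # rule 2: product of distinct-value counts
        return product
    if includes_key:              # rule 3: a key is projected, nothing collapses
        return card_r
    return card_r                 # assumed fallback: card(R) is an upper bound


single = projection_cardinality(50_000, [30])            # PJ DEPTNUM (Supply)
pair = projection_cardinality(50_000, [30, 500])         # PJ DEPTNUM, QUAN (Supply)
keyed = projection_cardinality(30, [30, 30], True)       # PJ DEPTNUM, NAME (Dept)

print(single, pair, keyed)
```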

11
Profile of partial results of algebraic
operations – GROUP BY
Let G denote the attributes on which the grouping is performed, and AF the aggregate functions to be evaluated
• Cardinality – we give an upper bound on the cardinality of S:
card(S) ≤ ∏ Ai∈G val(Ai[R])
• Size: for all attributes A appearing in G
size(R.A) = size(S.A)
• Distinct values: for all attributes A appearing in G
val(A[S]) = val(A[R])

12
Profile of partial results of algebraic
operations – UNION

• Cardinality – we have:
card(T) ≤ card(R) + card(S)
Equality holds when duplicates are not eliminated
• Size: we have
size(T) = size(R) = size(S)
• Distinct values: an upper bound is
val(A[T]) ≤ val(A[R]) + val(A[S])

13
Profile of partial results of algebraic
operations – DIFFERENCE
• Cardinality – we have:
max(0, card(R) - card(S)) ≤ card(T) ≤ card(R)
• Size: we have
size(T) = size(R) = size(S)
• Distinct values: an upper bound is
val(A[T]) ≤ val(A[R])

14
Profile of partial results of algebraic
operations – CARTESIAN PRODUCT

• Cardinality – we have:
card(T) = card(R) x card(S)
• Size: we have
size(T) = size(R) + size(S)
• Distinct values: the distinct values of attributes are the same as in the operand relations

15
Profile of partial results of algebraic
operations – JOIN
• Cardinality – estimating the cardinality of T = R JN A=B S precisely is very complex; we can give an upper bound because card(T) ≤ card(R) x card(S), but this value is usually much higher than the actual cardinality. Assuming that all the values of A in R also appear as values of B in S and vice versa, and that the two attributes are both uniformly distributed over the tuples of R and S, we have
card(T) = (card(R) x card(S))/val(A[R])
If one of the two attributes, say A, is a key of R, then
card(T) = card(S)
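
The join estimate can be checked against the profiles given earlier. In the sketch below, Dept plays the role of R (DEPTNUM is assumed to be its key, since card(Dept) = val(DEPTNUM[Dept]) = 30) and Supply plays the role of S; the function name is a hypothetical helper.

```python
def join_cardinality(card_r, card_s, val_a_r):
    """Estimate card(R JN A=B S) under the uniformity assumptions above."""
    return card_r * card_s / val_a_r


# Dept JN DEPTNUM=DEPTNUM Supply, with the profiles from the slides
estimate = join_cardinality(card_r=30, card_s=50_000, val_a_r=30)
upper_bound = 30 * 50_000   # cardinality of the Cartesian product

print(estimate, upper_bound)
```

Because the join attribute is a key of Dept, the general formula collapses to card(Supply), in agreement with the special case stated above, and the result is far below the Cartesian-product bound.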

16
Profile of partial results of algebraic
operations – JOIN
• Size: we have
size(T) = size(R) + size(S)
In the case of a natural join, the size of the join attribute must be subtracted from the size of the result
• Distinct values: if A is a join attribute, an upper bound is
val(A[T]) ≤ min(val(A[R]), val(B[S]))
if A is not a join attribute, an upper bound is
val(A[T]) ≤ val(A[R]) + val(B[S])

17
Profile of partial results of algebraic
operations – SEMIJOIN
Consider the semijoin T = R SJ A=B S
• Cardinality – the estimation of the cardinality of T is similar to that of a selection operation; we denote with ρ the selectivity of the semijoin operation, which measures the fraction of the tuples of R which belong to the result. The estimation is the following:
ρ = val(B[S]) / val(dom(A))
Given ρ,
card(T) = ρ * card(R)

18
Profile of partial results of algebraic
operations – SEMIJOIN
• Size: The size of the result of a semijoin is the same as the
size of its first operand
size(T) = size(R)
• Distinct values : the number of distinct values of attributes
which do not belong to the semijoin specification can be
estimated using Yao’s formula with n= card(R),
m=val(A[R]), and r =card(T). If A is the only attribute
appearing in the semijoin specification, then
val(A[T]) = ρ * val(A[R])
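
A small sketch of the semijoin profile follows. It assumes the selectivity is the number of distinct join values in S divided by the size of the domain of the join attribute, and it assumes a domain of 30 department numbers (taken from val(DEPTNUM[Dept]) = 30); both are assumptions of this example. R is Supply and S is the fragment Dept1.

```python
def semijoin_profile(card_r, val_a_r, val_b_s, dom_a):
    """Profile of T = R SJ A=B S.

    Returns (selectivity, card(T), val(A[T])).
    """
    rho = val_b_s / dom_a
    return rho, rho * card_r, rho * val_a_r


# Supply SJ DEPTNUM=DEPTNUM Dept1: Dept1 holds 10 distinct department numbers
rho, card_t, val_a_t = semijoin_profile(
    card_r=50_000,   # card(Supply)
    val_a_r=30,      # val(DEPTNUM[Supply])
    val_b_s=10,      # val(DEPTNUM[Dept1])
    dom_a=30,        # assumed size of the DEPTNUM domain
)

print(round(rho, 3), round(card_t), round(val_a_t))
```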

19
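Architecture of a Query Processor

[Figure: query processing pipeline. A query enters the Parser; its internal representation passes through Query Rewrite to the Query Optimizer; the resulting plan goes through Plan Refinement to the Query Execution Engine, which returns the result. The figure also shows the Catalog and the Base data.]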

20
Architecture of a Query Processor
• Parser: the query is parsed and translated into an internal representation (flex and bison can be used to build an SQL parser)
• Query Rewrite: query rewrite transforms a query in order to carry out optimizations that are good regardless of the physical state of the system (elimination of redundant predicates, unnesting of subqueries, simplification of expressions). Query rewrite is carried out by a rule engine
• Query Optimizer: this component carries out optimizations that depend on the physical state of the system. The query optimizer (QO) decides which indexes to use, which method to use for each operation, and in which order to execute the operations of a query.

21
Architecture of a Query Processor
• Query optimizer: in a distributed system, the QO must also decide at which site each operation is to be executed. The QO enumerates alternative plans and chooses the best plan using a cost estimation model
• Plan: specifies precisely how the query is to be executed. The nodes are operators, and every operator carries out one particular operation. The edges represent consumer-producer relationships between operators.
• Plan Refinement: this component transforms the plan into an executable plan. In DB2 this transformation involves the generation of assembler-like code to evaluate expressions and predicates efficiently

22
Query evaluation plan
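[Figure: query evaluation plan drawn as an operator tree. Site 0 runs the root PJ A1 over NLJ A2=B2; the join inputs arrive through receive operators, one of them by way of a scan of a temp relation. The remote subplans end in send operators and contain PJ A3, PJ B3, SL C=cos, Inxscan(A) and Scan(B).]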
23
Query evaluation plan
• Fragment reducers: a set of unary operations which apply
to the same fragment are collected into programs
• Binary operations: joins and unions
• Optimization graph: nodes represent reduced fragments,
and joins (unions) are represented by edges (hypernodes)

[Figure: optimization graph with nodes A and B connected by an edge labelled A2=B2]

24
Query Optimization (1)
• Plan enumeration with Dynamic Programming
Input: SPJ query q on relations R1, ..., Rn
Output: A query plan for q
for i = 1 to n do {
    optPlan({Ri}) = accessPlans(Ri)
    prunePlans(optPlan({Ri}))
}
for i = 2 to n do {
    for all S ⊆ {R1, ..., Rn} such that |S| = i do {
        optPlan(S) = ∅
        for all O ⊂ S do {
            optPlan(S) = optPlan(S) ∪ joinPlans(optPlan(O), optPlan(S - O))
            prunePlans(optPlan(S))
        }
    }
}
return optPlan({R1, ..., Rn})

25
Query Optimization (2)

Problem: alternative plans cannot be immediately pruned
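
The enumeration scheme can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm as a real optimizer implements it: it keeps only one best plan per subset (so pruning is immediate, which is exactly what the distributed case cannot always afford), ignores sites and access paths, and uses the sum of estimated intermediate result cardinalities as a stand-in cost. The relation names, cardinalities and selectivities are hypothetical.

```python
from itertools import combinations


def best_join_order(cards, selectivity):
    """Dynamic programming over subsets of relations.

    cards: relation name -> cardinality.
    selectivity: frozenset({a, b}) -> join selectivity (1.0 if absent).
    Returns (cost, estimated cardinality, plan) for the full set.
    """
    names = sorted(cards)
    # Plans for single relations: no join cost yet.
    opt = {frozenset([r]): (0.0, float(cards[r]), r) for r in names}
    for size in range(2, len(names) + 1):
        for combo in combinations(names, size):
            s = frozenset(combo)
            best = None
            # Try every split of s into a non-empty part o and the rest.
            for k in range(1, size):
                for left in combinations(combo, k):
                    o = frozenset(left)
                    rest = s - o
                    l_cost, l_card, l_plan = opt[o]
                    r_cost, r_card, r_plan = opt[rest]
                    sel = 1.0
                    for a in o:
                        for b in rest:
                            sel *= selectivity.get(frozenset((a, b)), 1.0)
                    card = l_card * r_card * sel
                    cost = l_cost + r_cost + card
                    if best is None or cost < best[0]:
                        best = (cost, card, (l_plan, r_plan))
            opt[s] = best   # prune: keep only the cheapest plan for s
    return opt[frozenset(names)]


cards = {"A": 1000, "B": 100, "C": 10}
selectivity = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.1}
cost, card, plan = best_join_order(cards, selectivity)
print(cost, card, plan)
```

With these assumed statistics the cheapest plan joins B and C first and then joins the result with A, because that keeps the intermediate result small.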

26
Query Optimization (3)

• Optimization criteria:
– Classic cost model (total time, total resource
consumption) – estimate the cost of every individual
operator of the plan and then sum up these costs – this
model is useful to estimate the overall throughput of a
system
– Mean response time model – estimate the lowest
response time of a query

27
Query Execution Techniques
• Row blocking – the implementation of send and receive operators is based on TCP/IP or UDP protocols;
idea: ship tuples in a blockwise fashion rather than one at a time
• Optimization of Multicasts: send data sequentially instead of sending data twice (NY → Berlin → Poznan)
• Joins with Horizontally Partitioned Data –
(A1 ∪ A2) JN B or (A1 JN B) ∪ (A2 JN B)
If A and B are both partitioned then we have more plans
• Semijoin and Bloomjoin programs
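
The two alternative plans for a join over horizontally partitioned data produce the same result, which the following sketch checks on toy data. The fragments A1, A2 and the relation B, as well as the choice of the first column as join attribute, are made up for the illustration.

```python
def join(left, right):
    """Equi-join of two sets of tuples on their first column."""
    return {(l, r) for l in left for r in right if l[0] == r[0]}


# Hypothetical horizontal fragments of A and an unfragmented relation B
A1 = {(1, "a"), (2, "b")}
A2 = {(3, "c"), (4, "d")}
B = {(1, "x"), (3, "y"), (5, "z")}

union_first = join(A1 | A2, B)            # (A1 ∪ A2) JN B
join_first = join(A1, B) | join(A2, B)    # (A1 JN B) ∪ (A2 JN B)

print(union_first == join_first, len(union_first))
```

Which plan is cheaper depends on where the fragments reside and how much data each plan ships; the sketch only confirms that the optimizer is free to choose between them.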

28
Semijoin Programs
• A semijoin program for the join of R and S over the attributes A and B is based on the following equivalence:
(R SJ A=B S) JN A=B S is equal to R JN A=B S

1. Send PJ B (S) to the site of R at a cost
C0 + C1 * size(B) * val(B[S])
2. Compute the semijoin at the site of R at null cost; let R' = R SJ A=B S
3. Send R' to the site of S at a cost
C0 + C1 * size(R) * card(R')
4. Compute the join at the site of S at null cost
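
The four steps above can be turned into a cost comparison between a semijoin program and shipping R unreduced. All numbers below (C0 = 10, C1 = 1, a 2-byte join attribute with 10 distinct values in S, 25-byte tuples of R, and 20 000 of the 50 000 tuples of R surviving the semijoin) are assumptions for illustration; local processing is taken to be free, as on the slide.

```python
def direct_cost(c0, c1, size_r, card_r):
    """Cost of sending all of R to the site of S in one transmission."""
    return c0 + c1 * size_r * card_r


def semijoin_program_cost(c0, c1, size_b, val_b_s, size_r, card_r_reduced):
    """Cost of steps 1 and 3; steps 2 and 4 are local and cost nothing here."""
    send_projection = c0 + c1 * size_b * val_b_s       # step 1: PJ B (S) to R's site
    send_reduced_r = c0 + c1 * size_r * card_r_reduced  # step 3: R' to S's site
    return send_projection + send_reduced_r


C0, C1 = 10, 1   # hypothetical cost constants
with_semijoin = semijoin_program_cost(C0, C1, size_b=2, val_b_s=10,
                                      size_r=25, card_r_reduced=20_000)
without_semijoin = direct_cost(C0, C1, size_r=25, card_r=50_000)

print(with_semijoin, without_semijoin)
```

The semijoin program pays an extra start-up cost and the cost of the projection, so it only wins when the semijoin removes enough tuples from R.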

29
Reducers
• Semijoin programs can be regarded as reducers, i.e., operations that can be applied to reduce the cardinality of their operands
• Let RED(Q, R) denote the set of reducer programs that can be built for a given relation R in a given query Q
• There is one reducer program, an element of RED(Q, R), which reduces R more than all other programs: the full reducer
• The problem: find the full reducers for all the relations of a query (a difficult task)
• Acyclic (tree) queries versus cyclic queries

30
Reducers
• Is it possible to bound the length of the full reducer?
• Tree queries – YES
The length of the full reducer is bounded by n-1, where n is the number of nodes of the tree
• Cyclic queries – NO
The length of the 'best' reducer grows linearly with the number of tuples of some relations of the query
• Best reducer does not mean full reducer

31
Example (1)
R        S        T
A  B     B  C     C  A
1  a     a  x     x  2
2  b     b  y     y  3
3  c     c  z     z  4

Cyclic query
[Figure: query graph with nodes R, S, T and edges R–S (B=B), S–T (C=C), R–T (A=A)]

The final result is the empty relation; the length of the reducers is 3*(m-1), where m is the number of tuples
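
The behaviour of semijoin reduction on this cyclic example can be traced with a short simulation. The round-robin order of the semijoins (R by T, then S by R, then T by S) is an assumption of this sketch; other orders are possible. With this order every effective semijoin removes exactly one tuple: after 3*(m-1) = 6 steps each relation holds a single tuple, and three further steps empty all relations.

```python
def semijoin(left, right, l_col, r_col):
    """Keep the tuples of left whose l_col value occurs in column r_col of right."""
    keep = {t[r_col] for t in right}
    return [t for t in left if t[l_col] in keep]


rels = {
    "R": [(1, "a"), (2, "b"), (3, "c")],        # R(A, B)
    "S": [("a", "x"), ("b", "y"), ("c", "z")],  # S(B, C)
    "T": [("x", 2), ("y", 3), ("z", 4)],        # T(C, A)
}

# (relation to reduce, reducing relation, column in target, column in source)
program = [("R", "T", 0, 1),   # R.A = T.A
           ("S", "R", 0, 1),   # S.B = R.B
           ("T", "S", 0, 1)]   # T.C = S.C

trace = []   # one entry per semijoin that removed something
changed = True
while changed:
    changed = False
    for target, source, t_col, s_col in program:
        reduced = semijoin(rels[target], rels[source], t_col, s_col)
        removed = len(rels[target]) - len(reduced)
        if removed:
            changed = True
            trace.append((target, removed))
        rels[target] = reduced

print(len(trace), rels)
```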
32
Example (2)
R        S        T
A  B     B  C     C  D
1  a     a  x     x  10
2  b     b  y     p  20
3  e     c  z     q  30

Acyclic query
[Figure: query graph with nodes R, S, T and edges R–S (B=B), S–T (C=C)]

The final result: one tuple, (a, x), remains in S
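
For the acyclic example, a reducer of length n-1 = 2 is enough to fully reduce S. The sketch below applies the two semijoins in one possible order (first with R, then with T); the helper function and the column positions are choices made for this illustration.

```python
def semijoin(left, right, l_col, r_col):
    """Keep the tuples of left whose l_col value occurs in column r_col of right."""
    keep = {t[r_col] for t in right}
    return [t for t in left if t[l_col] in keep]


R = [(1, "a"), (2, "b"), (3, "e")]          # R(A, B)
S = [("a", "x"), ("b", "y"), ("c", "z")]    # S(B, C)
T = [("x", 10), ("p", 20), ("q", 30)]       # T(C, D)

step1 = semijoin(S, R, 0, 1)        # S SJ B=B R removes (c, z)
step2 = semijoin(step1, T, 1, 0)    # then SJ C=C T removes (b, y)

print(step1, step2)
```

After these two semijoins, S contains only the tuple that takes part in the join of the three relations.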

33
Testing the graph for cycles
• There are two cases in which cycles can be broken without changing the meaning of the query
1. In the cycle (R.A=S.B), (S.B=T.C), (T.C=R.A), in which R, S, T are relation names and A, B, C are attributes, any one of the edges can be dropped, as any edge can be obtained from the remaining ones by transitivity.
2. In the cycle (R.A=S.B), (S.B=T.C), (T.C=R.D), we can substitute (R.A=R.D) for (T.C=R.D) because, by transitivity, T.C must equal R.A; an interrelation clause is thus substituted by an intrarelation clause, and the remaining graph contains two edges, (R,S) and (S,T), and is acyclic
34
