DDB Lec5


Uploaded by

Akram Taha

Chapter 8

Optimization of Distributed Queries

Part 1

Step 3: Global Query Optimization

• The query resulting from decomposition and localization can be executed in many ways by choosing different data transfer paths.
• We need an optimizer to choose a strategy close to the optimal one.

Problem of Global Query Optimization

Input: Fragment query

Find the best (not necessarily optimal) global schedule
• Minimize a cost function
• Distributed join processing
  – Bushy vs. linear trees
  – Which relation to ship where?
  – Ship-whole vs. ship-as-needed
• Decide on the use of semijoins
  – A semijoin saves on communication at the expense of more local processing
• Join methods
  – Nested loop vs. ordered joins (merge join or hash join)

Cost-based Optimization
• Solution space
  – The set of equivalent algebra expressions (query trees)
• Cost function (in terms of time)
  – I/O cost + CPU cost + communication cost
  – These might have different weights in different distributed environments (LAN vs. WAN)
  – Can also maximize throughput
• Search algorithm
  – How do we move inside the solution space?
  – Exhaustive search, heuristic algorithms (iterative improvement, simulated annealing, genetic, …)

Query Optimization Process

[Figure: the input query enters search space generation, which applies transformation rules to produce equivalent query execution plans; the search strategy, guided by the cost model, then selects the best query execution plan.]

Search Space

• Search space characterized by alternative execution plans
• Focus on join trees
• For N relations, there are O(N!) equivalent join trees that can be obtained by applying commutativity and associativity rules.

Three Join Tree Examples
SELECT ENAME, RESP
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO AND ASG.PNO=PROJ.PNO

(a) $(EMP \bowtie_{ENO} ASG) \bowtie_{PNO} PROJ$
(b) $(PROJ \bowtie_{PNO} ASG) \bowtie_{ENO} EMP$
(c) $(PROJ \times EMP) \bowtie ASG$

Restricting the Size of Search Space

• A large search space can make optimization time much greater than the actual execution time
• Restricting by means of heuristics
  – Perform unary operations (selection, projection) when accessing base relations
  – Avoid Cartesian products that are not required by the query
    – E.g., the previous query plan (c), $(PROJ \times EMP) \bowtie ASG$, is removed from the search space

Restricting the Size of Search Space (cont.)

• Restricting the shape of the join tree
  – Consider only linear trees, ignore bushy ones
    – Linear tree: at least one operand of each operator node is a base relation
    – Bushy tree: more general and may have operators with no base relations as operands (i.e., both operands are intermediate relations)

[Figure: a linear join tree, e.g. $((R1 \bowtie R2) \bowtie R3) \bowtie R4$, next to a bushy join tree, e.g. $(R1 \bowtie R2) \bowtie (R3 \bowtie R4)$]

Search Strategy
• How to move in the search space?
  – Deterministic and randomized
• Deterministic
  – Starting from base relations, joining one more relation at each step until complete plans are obtained
  – Dynamic programming builds all possible plans first, breadth-first, before it chooses the “best” plan
    – the most popular search strategy
  – Greedy algorithm builds only one plan, depth-first

[Figure: a plan built step by step: $R1 \bowtie R2$, then $(R1 \bowtie R2) \bowtie R3$, then $((R1 \bowtie R2) \bowtie R3) \bowtie R4$]

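The two deterministic strategies can be contrasted with a minimal sketch. It assumes linear join trees only, made-up cardinalities and selectivities, and a toy cost equal to the sum of estimated intermediate result sizes. The names `CARD`, `SEL`, `join_size`, `dp_linear` and `greedy_linear` are hypothetical and do not come from the lecture.

```python
from itertools import combinations

# Hypothetical statistics: base cardinalities and one selectivity per joinable pair.
CARD = {"R1": 1000, "R2": 100, "R3": 10, "R4": 500}
SEL = {
    frozenset(("R1", "R2")): 0.01,
    frozenset(("R2", "R3")): 0.1,
    frozenset(("R3", "R4")): 0.02,
}


def join_size(left_rels, left_size, rel):
    """Estimated size of joining an intermediate result with base relation rel."""
    sel = 1.0
    for other in left_rels:
        # A missing pair means no join predicate, i.e. a Cartesian product.
        sel *= SEL.get(frozenset((other, rel)), 1.0)
    return left_size * CARD[rel] * sel


def dp_linear(relations):
    """Dynamic programming: keep the cheapest linear plan for every subset,
    smaller subsets first (breadth-first), then read off the full plan."""
    best = {frozenset([r]): (0.0, CARD[r], (r,)) for r in relations}
    for k in range(2, len(relations) + 1):
        for subset in map(frozenset, combinations(relations, k)):
            for rel in subset:
                cost, size, order = best[subset - {rel}]
                new_size = join_size(subset - {rel}, size, rel)
                cand = (cost + new_size, new_size, order + (rel,))
                if subset not in best or cand[0] < best[subset][0]:
                    best[subset] = cand
    return best[frozenset(relations)]


def greedy_linear(relations):
    """Greedy: build a single plan, always adding the cheapest next join."""
    start = min(relations, key=CARD.get)
    order, size, cost = (start,), CARD[start], 0.0
    remaining = set(relations) - {start}
    while remaining:
        rel = min(remaining, key=lambda r: join_size(order, size, r))
        size = join_size(order, size, rel)
        cost += size
        order += (rel,)
        remaining.remove(rel)
    return cost, size, order
```

Both functions return a `(cost, result size, join order)` triple. Under this toy cost model the dynamic programming plan is never more expensive than the greedy one, but it examines every subset of relations, which is what makes it costly for many relations.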
Search Strategy (cont.)

• Randomized
  – Trade optimization time for execution time
  – Better when > 5-6 relations
  – Do not guarantee the best solution is obtained, but avoid the high cost of optimization in terms of memory and time
  – Search for optimalities around a particular starting point
  – By iterative improvement and simulated annealing

[Figure: a plan $(R1 \bowtie R2) \bowtie R3$ and its neighbor $(R1 \bowtie R3) \bowtie R2$, obtained by exchanging R2 and R3]

Search Strategy (cont.)

• First, one or more start plans are built by a greedy strategy
• Then, the algorithm tries to improve the start plan by visiting its neighbors. A neighbor is obtained by applying a random transformation to a plan.
  – E.g., exchanging two randomly chosen operand relations of the plan.

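The neighbor-visiting idea can be sketched as a simple iterative improvement loop. This is an illustration under assumptions, not the algorithm of any particular system: the statistics are made up, the cost of a linear join order is taken as the sum of estimated intermediate result sizes, and the names `plan_cost`, `random_neighbor` and `iterative_improvement` are hypothetical.

```python
import random

# Hypothetical statistics, for illustration only.
CARD = {"R1": 1000, "R2": 100, "R3": 10, "R4": 500}
SEL = {
    frozenset(("R1", "R2")): 0.01,
    frozenset(("R2", "R3")): 0.1,
    frozenset(("R3", "R4")): 0.02,
}


def plan_cost(order):
    """Cost of a linear join order: sum of estimated intermediate result sizes."""
    size, cost = CARD[order[0]], 0.0
    for i, rel in enumerate(order[1:], start=1):
        sel = 1.0
        for other in order[:i]:
            sel *= SEL.get(frozenset((other, rel)), 1.0)
        size = size * CARD[rel] * sel
        cost += size
    return cost


def random_neighbor(order, rng):
    """Exchange two randomly chosen operand relations of the plan."""
    i, j = rng.sample(range(len(order)), 2)
    neighbor = list(order)
    neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
    return tuple(neighbor)


def iterative_improvement(start, steps=200, seed=0):
    """Move to a neighbor only when it is cheaper; stop after a fixed number of tries."""
    rng = random.Random(seed)
    best, best_cost = tuple(start), plan_cost(start)
    for _ in range(steps):
        cand = random_neighbor(best, rng)
        cand_cost = plan_cost(cand)
        if cand_cost < best_cost:
            best, best_cost = cand, cand_cost
    return best, best_cost
```

The result is a local optimum around the start plan, which matches the slide's remark that randomized strategies do not guarantee the best solution. Simulated annealing would differ by occasionally accepting a more expensive neighbor.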
Cost Functions

• Total time
  – The sum of all time (also referred to as cost) components
• Response time
  – The elapsed time from the initiation to the completion of the query

Total Cost

• Summation of all cost factors

Total cost = CPU cost + I/O cost + communication cost
CPU cost = unit instruction cost * no. of instructions
I/O cost = unit disk I/O cost * no. of I/Os
communication cost = message initiation + transmission

Total Cost Factors

• Wide area network
  – Message initiation and transmission costs are high
  – Local processing cost is low (fast mainframes or minicomputers)
• Local area network
  – Communication and local processing costs are more or less equal.
  – Ratio = 1:1.6

Response Time

• Elapsed time between the initiation and the completion of a query

Response time = CPU time + I/O time + communication time
CPU time = unit instruction time * no. of sequential instructions
I/O time = unit I/O time * no. of I/Os
communication time = unit message initiation time * no. of sequential messages + unit transmission time * no. of sequential bytes

Example

• Assume that only the communication cost is considered
• Site 1 sends x units of data and site 2 sends y units of data to site 3

Total time = 2 * message initialization time + unit transmission time * (x + y)
Response time = max {time to send x from 1 to 3, time to send y from 2 to 3}
time to send x from 1 to 3 = message initialization time + unit transmission time * x
time to send y from 2 to 3 = message initialization time + unit transmission time * y

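The difference between the two measures in this example can be checked numerically. The sketch below assumes only communication cost, as on the slide. The function names and the numeric values used in the usage note are hypothetical.

```python
def send_time(init_time, unit_time, amount):
    """Time for one transfer: message initialization plus transmission of `amount` units."""
    return init_time + unit_time * amount


def total_time(init_time, unit_time, x, y):
    """Sum of both transfers: 2 * initialization + unit transmission time * (x + y)."""
    return send_time(init_time, unit_time, x) + send_time(init_time, unit_time, y)


def response_time(init_time, unit_time, x, y):
    """The transfers from sites 1 and 2 proceed in parallel, so the slower one dominates."""
    return max(send_time(init_time, unit_time, x), send_time(init_time, unit_time, y))
```

With an assumed initialization time of 1.0, a unit transmission time of 0.5, x = 10 and y = 4, the total time is 2 * 1.0 + 0.5 * 14 = 9.0, while the response time is max(6.0, 3.0) = 6.0.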
Optimization Statistics
• Primary cost factor: size of intermediate relations
  – The size of the intermediate relations produced during the execution facilitates the selection of the execution strategy
  – This is useful in selecting an execution strategy that reduces data transfer
• The sizes of intermediate relations need to be estimated based on cardinalities of relations and lengths of attributes
  – More precise statistics are more costly to maintain

Optimization Statistics (cont.)

R[A1, A2, ..., An] fragmented as R1, R2, …, Rr

• The statistical data collected typically are
  – len(Ai), length of attribute Ai in bytes
  – min(Ai) and max(Ai), minimum and maximum values for ordered domains
  – card(dom(Ai)), number of unique values in dom(Ai)
  – card(Rj), number of tuples in each fragment Rj
  – $card(\Pi_{A_i}(R_j))$, the number of distinct values of Ai in fragment Rj
  – size(R) = card(R) * length(R)

Optimization Statistics (cont.)

• Selectivity factor of each operation for relations
  – The join selectivity factor for R and S
    – a real value between 0 and 1

$$SF_{\bowtie}(R, S) = \frac{card(R \bowtie S)}{card(R) * card(S)}$$

Intermediate Relation Size
• Selection

$$card(\sigma_F(R)) = SF_\sigma(F) * card(R)$$

where

$$SF_\sigma(A = value) = \frac{1}{card(\Pi_A(R))}$$
$$SF_\sigma(A > value) = \frac{max(A) - value}{max(A) - min(A)}$$
$$SF_\sigma(A < value) = \frac{value - min(A)}{max(A) - min(A)}$$
$$SF_\sigma(P(A_i) \wedge P(A_j)) = SF_\sigma(P(A_i)) * SF_\sigma(P(A_j))$$
$$SF_\sigma(P(A_i) \vee P(A_j)) = SF_\sigma(P(A_i)) + SF_\sigma(P(A_j)) - SF_\sigma(P(A_i)) * SF_\sigma(P(A_j))$$
$$SF_\sigma(A \in \{values\}) = SF_\sigma(A = value) * card(\{values\})$$

Intermediate Relation Size (cont.)

• Projection
  – $card(\Pi_A(R))$ = the number of distinct values of A if A is a single attribute, or card(R) if A contains the key of R.
  – Otherwise, it is difficult to estimate.

Intermediate Relation Size (cont.)
• Cartesian product
$$card(R \times S) = card(R) * card(S)$$
• Union
  – Upper bound: $card(R \cup S) = card(R) + card(S)$
  – Lower bound: $card(R \cup S) = max\{card(R), card(S)\}$
• Set difference
  – Upper bound: $card(R - S) = card(R)$
  – Lower bound: 0

Intermediate Relation Size (cont.)

• Join
  – No general way for its calculation. Some systems use the upper bound $card(R \times S)$ instead. Some estimations can be used for simple cases.
  – Special case: A is a key of R and B is a foreign key of S
$$card(R \bowtie_{A=B} S) = card(S)$$
  – More general:
$$card(R \bowtie_{A=B} S) = SF_{\bowtie}(R, S) * card(R) * card(S)$$

Intermediate Relation Size (cont.)

• Semijoin

$$card(R \ltimes_A S) = SF_{\ltimes}(S.A) * card(R)$$

where

$$SF_{\ltimes}(R \ltimes_A S) = SF_{\ltimes}(S.A) = \frac{card(\Pi_A(S))}{card(dom[A])}$$

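The join and semijoin estimates can be written as one-line functions. This is a sketch under the assumptions stated in the formulas. The function and parameter names are hypothetical, and the numbers in the usage note are made up.

```python
def join_selectivity(card_result, card_r, card_s):
    """SF_join(R, S) = card(R join S) / (card(R) * card(S))."""
    return card_result / (card_r * card_s)


def card_join(sf_join, card_r, card_s):
    """General estimate: SF_join(R, S) * card(R) * card(S)."""
    return sf_join * card_r * card_s


def card_join_fk(card_s):
    """Special case: A is a key of R and B is a foreign key of S,
    so every tuple of S matches exactly one tuple of R."""
    return card_s


def card_semijoin(distinct_a_in_s, card_dom_a, card_r):
    """card(R semijoin_A S) = card(Pi_A(S)) / card(dom[A]) * card(R)."""
    return distinct_a_in_s / card_dom_a * card_r
```

For instance, if S holds 20 distinct values of A out of a domain of 100 values and card(R) = 400, the semijoin is estimated to keep 0.2 * 400 = 80 tuples of R.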
Centralized Query Optimization

• Two examples showing the techniques
  – INGRES: dynamic optimization, interpretive
  – System R: static optimization based on exhaustive search

INGRES Language: QUEL

• QUEL: a tuple calculus language

Example:
range of e is EMP
range of g is ASG
range of j is PROJ
retrieve e.ENAME
where e.ENO=g.ENO and j.PNO=g.PNO
and j.PNAME="CAD/CAM"

Note: e, g, and j are called variables

INGRES Language: QUEL (cont.)

• One-variable query
  – Queries containing a single variable.
• Multivariable query
  – Queries containing more than one variable.
• QUEL can be translated into equivalent SQL, so we just use SQL for convenience.

INGRES – General Strategy
• Decompose a multivariable query into a sequence of mono-variable queries with a common variable
• Process each by a one-variable query processor
  – Choose an initial execution plan (heuristics)
  – Order the rest by considering intermediate relation sizes
• No statistical information is maintained.

INGRES - Decomposition
• Replace an n-variable query q by a series of queries q1 → q2 → ... → qn, where qi uses the result of qi-1.
• Detachment
  – Query q is decomposed into q' → q'', where q' and q'' have a common variable which is the result of q'
• Tuple substitution
  – Replace one variable by the actual values of its tuples and simplify the query
$$q(V_1, V_2, ..., V_n) \rightarrow (q'(t_1, V_2, ..., V_n), t_1 \in R)$$

INGRES – Detachment
q:
SELECT V2.A2, V3.A3, …, Vn.An
FROM R1 V1, R2 V2, …, Rn Vn
WHERE P1(V1.A1) AND
P2(V1.A1, V2.A2, …, Vn.An)

Note: P1(V1.A1) is a one-variable predicate, indicating a chance for optimization, i.e., it can be executed first, as expressed in the following query.

INGRES – Detachment (cont.)
q:
SELECT V2.A2, V3.A3, …, Vn.An
FROM R1 V1, R2 V2, …, Rn Vn
WHERE P1(V1.A1) AND P2(V1.A1, V2.A2, …, Vn.An)

q': one-variable query generated by the single-variable predicate P1:
SELECT V1.A1 INTO R1'
FROM R1 V1
WHERE P1(V1.A1)

q'': in q, use R1' to replace R1 and eliminate P1:
SELECT V2.A2, V3.A3, …, Vn.An
FROM R1' V1, R2 V2, …, Rn Vn
WHERE P2(V1.A1, …, Vn.An)

INGRES – Detachment (cont.)

Note
• Query q is decomposed into q' → q''
• It is an optimized sequence of query execution

INGRES – Detachment Example

Original query q1

SELECT E.ENAME
FROM EMP E, ASG G, PROJ J
WHERE E.ENO=G.ENO AND
J.PNO=G.PNO AND
J.PNAME='CAD/CAM'

q1 can be decomposed into q11 → q12 → q13

INGRES – Detachment Example (cont.)
• First use the one-variable predicate to get q11 and q' such that q1 = q11 → q'

q11:
SELECT J.PNO INTO JVAR
FROM PROJ J
WHERE J.PNAME='CAD/CAM'

q':
SELECT E.ENAME
FROM EMP E, ASG G, JVAR
WHERE E.ENO=G.ENO
AND G.PNO=JVAR.PNO

INGRES – Detachment Example (cont.)
• Then q' is further decomposed into q12 → q13

q12:
SELECT G.ENO INTO GVAR
FROM ASG G, JVAR
WHERE G.PNO=JVAR.PNO

q13:
SELECT E.ENAME
FROM EMP E, GVAR
WHERE E.ENO=GVAR.ENO

• q11 is a mono-variable query
• q12 and q13 are subject to tuple substitution

Tuple Substitution

• Assume GVAR has two tuples only: <E1> and <E2>, then q13 becomes:

q131:
SELECT EMP.ENAME
FROM EMP
WHERE EMP.ENO = 'E1'

q132:
SELECT EMP.ENAME
FROM EMP
WHERE EMP.ENO = 'E2'

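Tuple substitution amounts to generating one one-variable query per tuple of the substituted relation. The sketch below only builds the query text for illustration. The function name `tuple_substitution` is hypothetical, and string formatting of SQL like this should not be used with untrusted input.

```python
def tuple_substitution(gvar_tuples):
    """Generate one one-variable query on EMP for each tuple of GVAR."""
    template = "SELECT EMP.ENAME FROM EMP WHERE EMP.ENO = '{eno}'"
    return [template.format(eno=eno) for (eno,) in gvar_tuples]


# GVAR with the two tuples <E1> and <E2> yields q131 and q132.
queries = tuple_substitution([("E1",), ("E2",)])
```

The number of generated queries equals card(GVAR), which is why INGRES prefers to substitute the relation with the fewest tuples.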
System R

• Static query optimization based on exhaustive search of the solution space
• Simple (i.e., mono-relation) queries are executed according to the best access path
• Execute joins
  – Determine the possible orderings of joins
  – Determine the cost of each ordering
  – Choose the join ordering with minimal cost

System R Algorithm

• For joins, two join methods are considered:
  – Nested loops
      for each tuple of external relation (cardinality n1)
          for each tuple of internal relation (cardinality n2)
              join two tuples if the join predicate is true
          end
      end
    – Complexity: n1*n2
  – Merge join
    – Sort relations
    – Merge relations
    – Complexity: n1+n2 if relations are previously sorted and equijoin

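Both join methods can be sketched on lists of dictionaries. This is a minimal illustration, not System R's implementation: the function names are hypothetical, the join is an equijoin on a single attribute `key`, and `merge_join` assumes both inputs are already sorted on that attribute.

```python
def nested_loop_join(outer, inner, key):
    """Compare every outer tuple with every inner tuple: n1 * n2 comparisons."""
    return [(o, i) for o in outer for i in inner if o[key] == i[key]]


def merge_join(left, right, key):
    """Equijoin of two inputs that are already sorted on `key`."""
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][key] < right[j][key]:
            i += 1
        elif left[i][key] > right[j][key]:
            j += 1
        else:
            # Emit all pairs for the current run of equal keys on the right side.
            k = j
            while k < len(right) and right[k][key] == left[i][key]:
                result.append((left[i], right[k]))
                k += 1
            i += 1
    return result
```

When join keys are unique on at least one side, the merge loop advances through each input once, which gives the n1+n2 behavior stated above. With duplicates on both sides the matching runs are rescanned, and the output itself can be larger than n1+n2.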
System R Algorithm (cont.)
• Hash join
  – Assume hc is the complexity of the hash table creation, and hm is the complexity of the hash match function.
  – The complexity of the hash join is O(N*hc + M*hm + J), where N is the size of the smaller data set, M is the size of the larger data set, and J is a complexity addition for the dynamic calculation and creation of the hash function.

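A hash join can be sketched in the same style. The build phase corresponds to the N*hc term and the probe phase to the M*hm term. The function name `hash_join` is hypothetical and Python's built-in dictionary stands in for the hash table.

```python
from collections import defaultdict


def hash_join(small, large, key):
    """Build a hash table on the smaller input, then probe it with the larger one."""
    table = defaultdict(list)
    for s in small:  # build phase: one insertion per tuple of `small`
        table[s[key]].append(s)
    result = []
    for l in large:  # probe phase: one lookup per tuple of `large`
        for s in table.get(l[key], []):
            result.append((s, l))
    return result
```

Building on the smaller input keeps the hash table small, which matters when it has to fit in memory.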
System R Algorithm - Example
Find names of employees working on the CAD/CAM project.
• Assume
  – EMP has an index on ENO
  – ASG has an index on PNO
  – PROJ has an index on PNO and an index on PNAME

[Figure: join graph in which ASG is connected to EMP by ENO and to PROJ by PNO]

System R Example (cont.)
• Choose the best access paths to each relation
  – EMP: sequential scan (no selection on EMP)
  – ASG: sequential scan (no selection on ASG)
  – PROJ: index on PNAME (there is a selection on PROJ based on PNAME)
• Determine the best join ordering
  – $EMP \bowtie ASG \bowtie PROJ$
  – $ASG \bowtie PROJ \bowtie EMP$
  – $PROJ \bowtie ASG \bowtie EMP$
  – $ASG \bowtie EMP \bowtie PROJ$
  – $EMP \times PROJ \bowtie ASG$
  – $PROJ \times EMP \bowtie ASG$
• Select the best ordering based on the join costs evaluated according to the two join methods

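The six orderings, and the two that require a Cartesian product, can be enumerated mechanically. The sketch below assumes left-deep orders and the join graph of the example, where ASG joins EMP on ENO and PROJ on PNO. The names `JOIN_EDGES` and `needs_cartesian_product` are hypothetical.

```python
from itertools import permutations

# Join graph of the example: ASG joins EMP on ENO and PROJ on PNO.
JOIN_EDGES = {frozenset(("EMP", "ASG")), frozenset(("ASG", "PROJ"))}


def needs_cartesian_product(order):
    """True if some relation has no join predicate with any relation placed before it."""
    for i in range(1, len(order)):
        if not any(frozenset((order[i], prev)) in JOIN_EDGES for prev in order[:i]):
            return True
    return False


orders = list(permutations(["EMP", "ASG", "PROJ"]))
kept = [o for o in orders if not needs_cartesian_product(o)]
```

Of the 3! = 6 orderings, the two that start by combining EMP and PROJ are dropped, leaving four candidates whose costs still have to be compared.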
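System R Example (cont.)

[Figure: tree of alternative joins. First level: EMP, ASG, PROJ. Second level: $EMP \bowtie ASG$, $EMP \times PROJ$, $ASG \bowtie EMP$, $ASG \bowtie PROJ$, $PROJ \bowtie ASG$, $PROJ \times EMP$. Third level: $(ASG \bowtie EMP) \bowtie PROJ$ and $(PROJ \bowtie ASG) \bowtie EMP$.]

• Best total join order is one of $(ASG \bowtie EMP) \bowtie PROJ$ and $(PROJ \bowtie ASG) \bowtie EMP$
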
System R Example (cont.)

• $(PROJ \bowtie ASG) \bowtie EMP$ has a useful index on the select attribute and direct access to the join attributes of ASG and EMP.
• Final plan:
  – select PROJ using index on PNAME
  – then join with ASG using index on PNO
  – then join with EMP using index on ENO