Chapter 1 Query Processing and Optimization
Query Processing
The activities involved in retrieving data from the database.
• Query processing takes several steps to fetch data from the
database.
• Aims of QP:
• transform a query written in a high-level language (e.g.
SQL) into a correct and efficient execution strategy
expressed in a low-level language (implementing relational algebra, RA);
• execute the strategy to retrieve the required data.
Query Processing
[Figure: query processor]
Query Optimization
Query Optimization Algorithm
• Compute alternative plans
• Compute estimated cost of each plan
• Compute number of I/Os
• Compute CPU cost
Query Optimization
• Conducted by the query optimizer of a DBMS
• Goal: select the best available strategy for executing the
query
• Based on the information available
• Most RDBMSs use a tree as the internal
representation of a query
Phases of Query Processing
• QP for centralized database has four main
phases:
• decomposition (consisting of parsing and
validation);
• optimization;
• code generation;
• execution.
[Figure: phases of query processing for a centralized database; recoverable labels: Query Optimizer, Execution Plan, Result of Query]
Query Processing
A query expressed in a high-level query language such as SQL must first be scanned, parsed,
and validated.
Scanner: The scanner identifies the language tokens, such as SQL keywords, attribute
names, and relation names, in the text of the query.
Parser: The parser checks the query syntax to determine whether it is formulated
according to the syntax rules (grammar) of the query language. A syntactically correct query is
converted into an internal form, a parse tree, which is then translated into relational algebra.
Validation: The query must be validated by checking that all attribute and relation
names are valid and semantically meaningful names in the schema of the particular
database being queried.
Query Processing
Parsing: First the query is parsed, which also checks whether its syntax is correct.
The query is then converted into a parse tree. For example, the query
SELECT first_name, last_name FROM ninjas WHERE question_solved > 50;
gives a tree like this:
SELECT
|
+-- first_name
|
+-- last_name
FROM
|
+-- ninjas
WHERE
|
+-- question_solved
|
+-- >
|
+-- 50
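Before a tree like this can be built, the scanner has to split the query text into tokens. The following is a minimal sketch of that step, not the scanner of any particular DBMS; the token categories, the tiny keyword set, and the function name `scan` are assumptions made for illustration.

```python
import re

# Assumed, deliberately tiny keyword set; a real SQL scanner knows many more.
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR", "NOT"}

# One named group per (assumed) token category.
TOKEN_RE = re.compile(r"""
    (?P<NUMBER>\d+(?:\.\d+)?)
  | (?P<STRING>'[^']*')
  | (?P<NAME>[A-Za-z_][A-Za-z_0-9]*)
  | (?P<OP><=|>=|<>|=|<|>)
  | (?P<PUNCT>[,;.*()])
  | (?P<SPACE>\s+)
""", re.VERBOSE)

def scan(query):
    """Return a list of (token_type, text) pairs for the query string."""
    tokens = []
    pos = 0
    while pos < len(query):
        m = TOKEN_RE.match(query, pos)
        if m is None:
            raise ValueError(f"unexpected character {query[pos]!r} at position {pos}")
        pos = m.end()
        kind, text = m.lastgroup, m.group()
        if kind == "SPACE":
            continue  # whitespace only separates tokens
        if kind == "NAME" and text.upper() in KEYWORDS:
            kind = "KEYWORD"  # reserved words are names the language already owns
        tokens.append((kind, text))
    return tokens

tokens = scan("SELECT first_name, last_name FROM ninjas WHERE question_solved > 50;")
for kind, text in tokens:
    print(kind, text)
```

The parser would then consume this token list and check it against the grammar; whether `ninjas` and `question_solved` actually exist is left to validation.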
Example
• In SQL, a user wants to fetch the names of the employees
whose salary is greater than 10000. The following query
does this:
• select emp_name from Employee where salary>10000;
• For the system to process the query, it must first
be translated into relational algebra. The query can be written
in relational algebra form as:
• πemp_name (σsalary>10000 (Employee))
• πemp_name (σsalary>10000 (πemp_name, salary (Employee)))
• After translating the query, each relational
algebra operation can be executed using different algorithms. This is
how query processing begins.
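As a rough illustration of why several relational algebra forms can stand for one SQL query, the sketch below evaluates two plans for `select emp_name from Employee where salary>10000`: one selects first and then projects `emp_name`, the other first trims each tuple to (`emp_name`, `salary`). The sample rows and the helper names `select` and `project` are hypothetical.

```python
# Hypothetical sample data; the slides do not give the contents of Employee.
employee = [
    {"emp_name": "Abebe", "salary": 12000, "dept": "Sales"},
    {"emp_name": "Sara", "salary": 9000, "dept": "IT"},
    {"emp_name": "Lily", "salary": 15000, "dept": "IT"},
]

def select(pred, rel):
    """sigma: keep the tuples that satisfy the predicate."""
    return [t for t in rel if pred(t)]

def project(attrs, rel):
    """pi: keep only the listed attributes and drop duplicate tuples."""
    out = []
    for t in rel:
        row = {a: t[a] for a in attrs}
        if row not in out:
            out.append(row)
    return out

high_paid = lambda t: t["salary"] > 10000

# Plan A: select on the full relation, then project the name.
plan_a = project(["emp_name"], select(high_paid, employee))

# Plan B: drop unneeded attributes first, then select, then project the name.
plan_b = project(["emp_name"],
                 select(high_paid, project(["emp_name", "salary"], employee)))

print(plan_a)
print(plan_b)
```

Both plans return the same tuples; an optimizer would choose between them on estimated cost, not on the result.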
Query Processing
• 1. Use WHERE instead of HAVING where possible: WHERE filters rows before
grouping, whereas HAVING filters groups after aggregation, so conditions that do not
involve aggregates usually run faster in WHERE.
• 2. Avoid queries inside a loop: running a query on every iteration of a loop
slows execution considerably. Where possible, replace the per-iteration queries with a
single query issued outside the loop.
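The second tip can be sketched with Python's standard `sqlite3` module. The table `employee`, its columns, and the sample rows are assumptions for illustration; the point is only the contrast between one query per iteration and one query for the whole batch.

```python
import sqlite3

# Hypothetical in-memory table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, emp_name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Abebe", 12000), (2, "Sara", 9000), (3, "Lily", 15000)],
)

wanted_ids = [1, 3]

# Pattern to avoid: one round trip to the database per loop iteration.
names_loop = []
for emp_id in wanted_ids:
    row = conn.execute("SELECT emp_name FROM employee WHERE id = ?", (emp_id,)).fetchone()
    names_loop.append(row[0])

# Preferred: a single query outside the loop fetches all the rows at once.
placeholders = ", ".join("?" for _ in wanted_ids)
rows = conn.execute(
    f"SELECT emp_name FROM employee WHERE id IN ({placeholders}) ORDER BY id",
    wanted_ids,
).fetchall()
names_batch = [r[0] for r in rows]

print(names_loop, names_batch)
```

On an in-memory database the difference is negligible; the saving comes from avoided round trips and repeated parsing when the database sits behind a network connection.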
Query Processing
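Relational algebra operations include:
MINUS (−)
CARTESIAN PRODUCT (×)
DIVISION
Examples of the SELECT operation:
Select the employee tuples whose department number is 4:
σDNO = 4 (EMPLOYEE)
Select the employee tuples whose salary is greater than $30,000:
σSALARY > 30,000 (EMPLOYEE)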
Translating SQL Queries into Relational Algebra
In general, the select operation is denoted by
σ<selection condition>(R) where
the symbol σ (sigma) is used to denote the select
operator
the selection condition is a Boolean (conditional)
expression specified on the attributes of relation R
tuples that make the condition true are selected and
appear in the result of the operation
tuples that make the condition false are filtered out and
discarded from the result of the operation
Translating SQL Queries into Relational Algebra
Five operations for demonstration:
– (OP1): σSSN='123456789' (EMPLOYEE)
– (OP2): σDNUMBER>5 (DEPARTMENT)
– (OP3): σDNO=5 (EMPLOYEE)
– (OP4): σDNO=5 AND SALARY>30000 AND SEX='F' (EMPLOYEE)
– (OP5): σESSN='123456789' AND PNO=10 (WORKS_ON)
Translating SQL Queries into Relational Algebra
SELECT Operation Properties
The SELECT operation σ<selection condition>(R) produces a relation
S that has the same schema (same attributes) as R
SELECT is commutative:
σ<condition1>(σ<condition2>(R)) = σ<condition2>(σ<condition1>(R))
Because of the commutativity property, a cascade (sequence) of
SELECT operations may be applied in any order:
σ<cond1>(σ<cond2>(σ<cond3>(R))) = σ<cond2>(σ<cond3>(σ<cond1>(R)))
A cascade of SELECT operations may be replaced by a
single selection with a conjunction of all the conditions:
σ<cond1>(σ<cond2>(σ<cond3>(R))) = σ<cond1> AND <cond2> AND <cond3>(R)
The number of tuples in the result of a SELECT is less than
(or equal to) the number of tuples in the input relation R
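These properties can be checked on a small example. The sketch below is illustrative only: the EMPLOYEE tuples are made up, and `select` is a hypothetical helper standing in for the select operator.

```python
# Hypothetical EMPLOYEE tuples; only the attributes used by the conditions are shown.
EMPLOYEE = [
    {"SSN": "111", "DNO": 5, "SALARY": 40000, "SEX": "F"},
    {"SSN": "222", "DNO": 5, "SALARY": 25000, "SEX": "M"},
    {"SSN": "333", "DNO": 4, "SALARY": 43000, "SEX": "F"},
]

def select(cond, rel):
    """Select operator: keep the tuples for which cond is true."""
    return [t for t in rel if cond(t)]

c1 = lambda t: t["DNO"] == 5
c2 = lambda t: t["SALARY"] > 30000
c3 = lambda t: t["SEX"] == "F"

# Cascade of selections applied in two different orders.
cascade = select(c1, select(c2, select(c3, EMPLOYEE)))
reordered = select(c2, select(c3, select(c1, EMPLOYEE)))

# Single selection with the conjunction of all conditions.
conjunction = select(lambda t: c1(t) and c2(t) and c3(t), EMPLOYEE)

print(cascade == reordered == conjunction)   # same result in all three cases
print(len(conjunction) <= len(EMPLOYEE))     # a selection never adds tuples
```

All three forms yield the same relation, which is what lets an optimizer split, merge, and reorder selections freely.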
Translating SQL Queries into Relational Algebra
The PROJECT operation is denoted by π (pi)
This operation keeps certain columns (attributes)
from a relation and discards the other columns.
PROJECT creates a vertical partitioning:
The list of specified columns (attributes) is kept in
each tuple
The other attributes in each tuple are discarded
Example: To list each employee’s first and last
name and salary, the following is used:
πLNAME, FNAME, SALARY(EMPLOYEE)
Translating SQL Queries into Relational Algebra
The general form of the project operation is:
π<attribute list>(R)
π (pi) is the symbol used to represent the project
operation
<attribute list> is the desired list of attributes from
relation R.
The project operation removes any duplicate
tuples
This is because the result of the project operation
must be a set of tuples, and
mathematical sets do not allow duplicate elements.
Translating SQL Queries into Relational Algebra
PROJECT Operation Properties
The number of tuples in the result of projection
π<list>(R) is always less than or equal to the number of
tuples in R
If the list of attributes includes a key of R, then the
number of tuples in the result of PROJECT is equal
to the number of tuples in R
PROJECT is not commutative; instead,
π<list1>(π<list2>(R)) = π<list1>(R) as long as <list2>
contains the attributes in <list1>
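A small sketch of these PROJECT properties follows. The sample tuples and the helper `project` are assumptions for illustration; SSN is treated as the key of EMPLOYEE.

```python
def project(attrs, rel):
    """Project operator: keep the listed attributes and remove duplicate tuples."""
    seen = []
    for t in rel:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.append(row)
    return [dict(zip(attrs, row)) for row in seen]

# Hypothetical EMPLOYEE tuples; SSN is assumed to be the key.
EMPLOYEE = [
    {"SSN": "111", "LNAME": "Smith", "DNO": 5},
    {"SSN": "222", "LNAME": "Wong", "DNO": 5},
    {"SSN": "333", "LNAME": "Smith", "DNO": 4},
]

by_dno = project(["DNO"], EMPLOYEE)             # duplicates removed: fewer tuples
with_key = project(["SSN", "LNAME"], EMPLOYEE)  # key included: same number of tuples

# A projection on <list1> absorbs an inner projection on a superset <list2>.
nested = project(["LNAME"], project(["LNAME", "DNO"], EMPLOYEE))
direct = project(["LNAME"], EMPLOYEE)

print(len(by_dno), len(with_key), nested == direct)
```

Swapping the two projections in `nested` would fail, because the inner result would no longer contain DNO; that is the sense in which PROJECT is not commutative.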
Translating SQL Queries into Relational Algebra
Query block:
The basic unit that can be translated into the
algebraic operators and optimized.
A query block contains a single SELECT-FROM-
WHERE expression, as well as GROUP BY and
HAVING clause if these are part of the block.
Nested queries within a query are identified as
separate query blocks.
Aggregate operators in SQL must be included in
the extended algebra.
Translating SQL Queries into Relational Algebra
SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > ( SELECT MAX (SALARY)
FROM EMPLOYEE
WHERE DNO = 5);
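• Inner block: SELECT MAX (SALARY) FROM EMPLOYEE WHERE DNO = 5
• Outer block: SELECT LNAME, FNAME FROM EMPLOYEE WHERE SALARY > c,
where c stands for the result returned by the inner block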
SQL Query
Example : consider the following subset of the engineering
database schema
EMP(ENO, ENAME, TITLE)
ASG(ENO, PNO, RESP, DUR)
“Find the names of employees who are managing a project”
SELECT ENAME
FROM EMP,ASG
WHERE EMP.ENO = ASG.ENO
AND RESP = 'Manager'
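Translating SQL Queries to RA
The query above can be expressed in relational algebra in more than one way, for example:
ΠENAME(σRESP='Manager' ∧ EMP.ENO=ASG.ENO (EMP × ASG))
ΠENAME(EMP ⋈ENO (σRESP='Manager' (ASG)))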
JOIN Operation
– Implementing the JOIN operation on two tables
• Two operations for demonstration:
- EMPLOYEE ⋈DNO=DNUMBER DEPARTMENT
- DEPARTMENT ⋈MGRSSN=SSN EMPLOYEE
Complexity of Relational Operations
The simplest way of defining complexity is in terms of relation
cardinalities, independent of physical implementation details
such as fragmentation and storage structures.
Operation                                          Complexity
Select, Project (without duplicate elimination)    O(n)
Project (with duplicate elimination), Group        O(n log n)
Join, Semi-join, Division, Set operations          O(n log n)
Cartesian product                                  O(n²)
SQL Query Optimization
Heuristics Query Optimization
Heuristics: make a sequence of choices for the query based on
heuristic rules.
The result is not guaranteed to be optimal (at best near-optimal). Typical rules:
regroup common sub-expressions
perform selection and projection first
replace a join by a series of semi-joins
reorder operations to reduce intermediate relation size
optimize individual operations
Only one plan is generated.
However, this approach does not always work well:
it is not cost based, and it can pick expensive plans.
Selection and projection are performed first by moving the select and project operations down the query tree, which
reduces the number of tuples available for the join.
Perform the most restrictive select/project operations first, before the other
operations.
Avoid cross-product operations, since they produce very large intermediate tables.
Heuristic Optimization of Query Trees
The query parser will typically generate a standard initial query
tree to correspond to an SQL query, without doing any
optimization.
The heuristic query optimizer transforms this initial (inefficient) query tree
into a final query tree that is efficient to execute.
Steps in converting a query tree during heuristic optimization:
Step 1: Initial (canonical) query tree for SQL query Q.
Step 2: Moving SELECT operations down the query tree.
Step 3: Applying the more restrictive SELECT operation first.
Step 4: Replacing CARTESIAN PRODUCT and SELECT with
JOIN operations.
Step 5: Moving PROJECT operations down the query tree.
[Figures: successive query trees during heuristic optimization, e.g. initial query tree ⇒ query tree after pushing down the selection operations]
Cost Components for Query Execution
Estimating Cost
• What needs to be considered:
• Disk I/Os
• sequential
• random
• CPU time
• Network communication
• What are we going to consider:
• Disk I/Os
• page reads/writes
• Ignoring cost of writing final output
Estimating Cost
In a distributed database system, the total cost to be
minimized includes
I/O cost + CPU cost + communication cost
These might have different weights in different distributed
environments
Wide area networks
communication cost will dominate
low bandwidth
low speed
high protocol overhead
Local area networks
communication cost not that dominant
total cost function should be considered
Using Selectivity and Cost Estimates in Query
Optimization
SELECT *
FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND
(s.position = 'Manager' AND b.city = 'London');
Different Strategies
• Three equivalent RA queries are:
(1) σ(position='Manager') ∧ (city='London') ∧ (Staff.branchNo=Branch.branchNo) (Staff × Branch)
(2) σ(position='Manager') ∧ (city='London') (Staff ⋈Staff.branchNo=Branch.branchNo Branch)
(3) (σposition='Manager'(Staff)) ⋈Staff.branchNo=Branch.branchNo (σcity='London'(Branch))
Different Strategies
• Assume:
• 1000 tuples in Staff; 50 tuples in Branch;
• 50 Managers; 5 London branches;
• no indexes or sort keys;
• results of any intermediate operations stored on disk;
• cost of the final write is ignored;
• tuples are accessed one at a time.
Cost Comparison
• Cost (in disk accesses) are:
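The costs can be worked out from the assumptions listed above. The sketch below counts one disk access per tuple read or written, stores every intermediate result on disk, and ignores the final write. It adds one assumption not stated on the slide: every Staff tuple joins with exactly one Branch tuple, so the join in strategy (2) produces 1000 tuples. Variable names are illustrative.

```python
n_staff = 1000     # tuples in Staff
n_branch = 50      # tuples in Branch
n_managers = 50    # Staff tuples with position = 'Manager'
n_london = 5       # Branch tuples with city = 'London'

# Strategy 1: read both relations, write the Cartesian product,
# then read the product back to apply the selection.
cost1 = (n_staff + n_branch) + 2 * (n_staff * n_branch)

# Strategy 2: read both relations for the join, write the join result
# (assumed n_staff tuples), then read it back to apply the selection.
cost2 = (n_staff + n_branch) + 2 * n_staff

# Strategy 3: read Staff and write the managers, read Branch and write the
# London branches, then read both small intermediate relations for the join.
cost3 = (n_staff + n_managers) + (n_branch + n_london) + (n_managers + n_london)

for name, cost in [("(1)", cost1), ("(2)", cost2), ("(3)", cost3)]:
    print(name, cost)
```

Under these assumptions the strategies cost 101 050, 3 050, and 1 160 disk accesses respectively, so applying the selections before the join is cheaper by roughly a factor of 87 compared with the Cartesian product.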
Exhaustive Search Optimization
Input language: relational calculus or relational algebra
In these techniques, all possible query plans for a query
are generated first, and then the best plan is selected.
Exhaustive search is cost based, which is good, and it finds the optimal plan.
However, it is too expensive: the search space is very large, with
combinatorial complexity in the number of relations.
Exhaustive search techniques are therefore suitable only for queries with
a few relations.
Semantic Query Optimization
Dynamic versus Static Optimization
• QP can be carried out:
• dynamically every time query is run;
• statically when query is first submitted.
• Advantages of dynamic QO arise from fact that
information is up to date.
• Disadvantages are that performance of query is
affected, time may limit finding optimum strategy.
© Pearson Education Limited 1995, 2005
Dynamic versus Static Optimization
• Advantages of static QO are removal of runtime
overhead, and more time to find optimum
strategy.
• Disadvantages arise from fact that chosen
execution strategy may no longer be optimal
when query is run.
• Could use a hybrid approach to overcome this.
Optimization Timing
Static
  compilation: optimize prior to execution
  difficult to estimate the size of the intermediate results, so errors propagate
  can amortize the optimization cost over many executions
  E.g. R*
Dynamic
  run-time optimization
  exact information on the intermediate relation sizes
  have to reoptimize for multiple executions
  E.g. Distributed INGRES
Hybrid
  compile using a static algorithm
  if the error in estimated sizes > threshold, reoptimize at run time
  E.g. MERMAID
Database Statistics
• Success of estimation depends on amount and
currency of statistical information DBMS holds.
• Keeping statistics current can be problematic.
• If statistics updated every time tuple is changed,
this would impact performance.
• DBMS could update statistics on a periodic basis,
for example nightly, or whenever the system is
idle.
Database Statistics for Optimization
Relation
cardinality
size of a tuple
fraction of tuples participating in a join with another relation
Attribute
cardinality of domain
actual number of distinct values
Common assumptions
independence between different attribute values
uniform distribution of attribute values within their domain
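The two common assumptions translate directly into simple estimation formulas. The sketch below is illustrative: the function names and the statistics (1000 tuples, 20 distinct positions, 50 distinct branch numbers) are hypothetical, and real optimizers refine these estimates with histograms and other statistics.

```python
from fractions import Fraction

def equality_selectivity(n_distinct):
    """Uniformity assumption: each of the n_distinct values is equally likely,
    so the predicate A = value keeps 1 / n_distinct of the tuples."""
    return Fraction(1, n_distinct)

def conjunction_selectivity(*selectivities):
    """Independence assumption: selectivities of AND-ed predicates multiply."""
    result = Fraction(1)
    for s in selectivities:
        result *= s
    return result

def estimated_cardinality(n_tuples, selectivity):
    """Expected number of tuples that satisfy the predicate."""
    return n_tuples * selectivity

# Hypothetical statistics for a relation with 1000 tuples.
n_tuples = 1000
sel_position = equality_selectivity(20)   # 20 distinct positions
sel_branch = equality_selectivity(50)     # 50 distinct branch numbers

one_predicate = estimated_cardinality(n_tuples, sel_position)
both_predicates = estimated_cardinality(
    n_tuples, conjunction_selectivity(sel_position, sel_branch)
)
print(one_predicate, both_predicates)
```

When attribute values are skewed or correlated, these estimates can be far off, which is one reason keeping statistics detailed and current matters.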
Typical Statistics for Relation R
Typical Statistics for Attribute A of Relation R
Layers of Distributed Database Query Processing
Step 1 Query Decomposition
Decomposes a calculus query on global relations into an algebraic query on global relations.
Input: calculus query on global relations.
Output: relational algebra query on global relations.
Query decomposition can be viewed as four successive steps: normalization, analysis,
simplification, and restructuring.
Normalization
Lexical and syntactic analysis
check validity (similar to compilers)
check for attributes and relations
type checking on the qualification
Put into normal form
There are two possible normal forms for the predicate, one giving
precedence to the AND (∧) and the other to the OR (∨).
Conjunctive normal form (CNF): a series of ORs
connected by ANDs. For example, (A OR B) AND (C OR D) is in CNF.
(p11 ∨ p12 ∨ … ∨ p1n) ∧ … ∧ (pm1 ∨ pm2 ∨ … ∨ pmn)
Disjunctive normal form (DNF): a series of ANDs connected by ORs.
(p11 ∧ p12 ∧ … ∧ p1n) ∨ … ∨ (pm1 ∧ pm2 ∧ … ∧ pmn)
ORs are mapped into union
ANDs are mapped into join or selection
Analysis
• Analyze the query lexically and syntactically
using compiler techniques.
• Verify relations and attributes exist.
• Verify operations are appropriate for object type.
Semantic Analysis
• For these queries, could construct:
• A relation connection graph.
• Normalized attribute connection graph.
Semantic Analysis
• Rejects normalized queries that are incorrectly
formulated or contradictory.
• Query is incorrectly formulated if components
do not contribute to generation of result.
• Query is contradictory if its predicate cannot be
satisfied by any tuple.
• Algorithms to determine correctness exist only
for queries that do not contain disjunction and
negation.
Analysis
Remove incorrect queries
Type incorrect
If any of its attribute or relation names are not defined in the global schema
If operations are applied to attributes of the wrong type
Semantically (meaningfully) incorrect
If its components do not contribute in any way to the generation of the
result
Semantic correctness cannot be tested for general queries; only a subset of relational calculus
queries can be tested.
However, the test is possible for a large class of relational queries,
those which do not contain disjunction and negation
Techniques to detect semantically incorrect queries
connection graph (query graph), which represents the semantics of the query
join graph (the subgraph of the query graph that considers only the joins)
Analysis
• Finally, query transformed into some internal
representation more suitable for processing.
• Some kind of query tree is typically chosen,
constructed as follows:
• Leaf node created for each base relation.
• Non-leaf node created for each intermediate relation
produced by RA operation.
• Root of tree represents query result.
• Sequence is directed from leaves to root.
Analysis – Example
“Find the names
and responsibilities of programmers who have been working on the CAD/CAM
project for more than 3 years.” The query graph of this query is connected, so the query is semantically correct.
SELECT ENAME, RESP
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = PROJ.PNO
AND PNAME = "CAD/CAM"
AND DUR ≥ 36
AND TITLE = "Programmer"
Analysis
If the query graph is not connected, the query is
semantically wrong.
SELECT ENAME,RESP, PNAME
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND PNAME = "CAD/CAM"
AND DUR ≥ 36
AND TITLE = "Programmer"
Analysis
There are basically three solutions to the problem:
1) reject the query
2) assume that there is an implicit Cartesian product
between relations ASG and PROJ
3) infer from the schema the missing join
predicate ASG.PNO = PROJ.PNO, which transforms the
query into that of the previous example
In all cases, the relation connection graph is not fully connected, so the query
as written is not correctly formulated semantically.
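The connectivity test behind this analysis can be sketched as a graph traversal. The version below is a simplification that only looks at the join graph: relations are nodes and each join predicate is an edge. The function name `is_connected` and the edge-list representation are assumptions for illustration.

```python
def is_connected(relations, join_predicates):
    """Return True if every relation is reachable from the first one
    through the join predicates (edges given as pairs of relation names)."""
    relations = list(relations)
    if not relations:
        return True
    adjacency = {r: set() for r in relations}
    for a, b in join_predicates:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = set()
    stack = [relations[0]]
    while stack:  # depth-first traversal from the first relation
        r = stack.pop()
        if r in seen:
            continue
        seen.add(r)
        stack.extend(adjacency[r] - seen)
    return len(seen) == len(relations)

relations = ["EMP", "ASG", "PROJ"]

# EMP.ENO = ASG.ENO AND ASG.PNO = PROJ.PNO
correct_query = is_connected(relations, [("EMP", "ASG"), ("ASG", "PROJ")])

# Only EMP.ENO = ASG.ENO: PROJ is not joined to anything.
faulty_query = is_connected(relations, [("EMP", "ASG")])

print(correct_query, faulty_query)
```

A disconnected graph only signals the problem; which of the three solutions to apply remains a design decision of the system.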
Simplification
• Detects redundant qualifications,
• eliminates common sub-expressions,
• transforms query to semantically equivalent but
more easily and efficiently computed form.
Simplification
Elimination of Redundancy
Such redundancy, and thus redundant work, may be
eliminated by simplifying the qualification with the
well-known idempotency rules and
equivalence rules for the logical operators (AND, OR and
negation)
Simplification – Example
SELECT TITLE
FROM EMP
WHERE EMP.ENAME = "J. Doe"
OR (NOT(EMP.TITLE = "Programmer")
AND (EMP.TITLE = "Programmer"
OR EMP.TITLE = "Elect. Eng.")
AND NOT(EMP.TITLE = "Elect. Eng."))
Let p1 be EMP.TITLE = "Programmer"
Let p2 be EMP.TITLE = "Elect. Eng."
Let p3 be EMP.ENAME = "J. Doe"
The qualification is then p3 ∨ (¬p1 ∧ (p1 ∨ p2) ∧ ¬p2).
Simplification – Example
SELECT TITLE
FROM EMP
WHERE EMP.ENAME = "J. Doe"
OR (NOT(EMP.TITLE = "Programmer")
AND (EMP.TITLE = "Programmer"
OR EMP.TITLE = "Elect. Eng.")
AND NOT(EMP.TITLE = "Elect. Eng."))
The simplified query:
SELECT TITLE
FROM EMP
WHERE EMP.ENAME = "J. Doe"
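The simplification can be double-checked by brute force. In the sketch below, p1, p2 and p3 stand for the predicates TITLE = "Programmer", TITLE = "Elect. Eng." and ENAME = "J. Doe"; the function names are illustrative. The check enumerates every truth assignment and compares the original qualification with the simplified one.

```python
from itertools import product

def original(p1, p2, p3):
    # p3 OR (NOT p1 AND (p1 OR p2) AND NOT p2)
    return p3 or ((not p1) and (p1 or p2) and (not p2))

def simplified(p1, p2, p3):
    # After simplification only the ENAME predicate remains.
    return p3

# Compare the two qualifications on all 8 truth assignments.
equivalent = all(
    original(*values) == simplified(*values)
    for values in product([False, True], repeat=3)
)
print(equivalent)
```

The bracketed part is false for every assignment, because ¬p1 ∧ ¬p2 contradicts p1 ∨ p2, so the whole qualification reduces to p3.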
Restructuring (Rewriting)
The last step of query decomposition rewrites the query in
relational algebra.
For clarity, the relational
algebra query is usually represented graphically by an operator tree.
An operator tree is a tree in which a leaf node is a relation
stored in the database, and a non-leaf node is an
intermediate relation produced by a relational algebra
operator.
The sequence of operations is directed from the leaves to
the root, which represents the answer to the query.
Restructuring (Rewriting)
How to draw a query tree
First, a leaf is created for each relation in the query. In SQL, the leaves are immediately available in the FROM
clause.
Second, the root node is created as a project operation
involving the result attributes. These are found in
the SELECT clause in SQL.
Third, the qualification (SQL WHERE clause) is translated into
the appropriate sequence of relational operations (select, join,
union, etc.) going from the leaves to the root.
The sequence can be given directly by the order of appearance
of the predicates and operators.
Restructuring (Rewriting)
Convert relational calculus to
relational algebra
Make use of query trees
Example
Find the names of employees other than
J. Doe who worked on the CAD/CAM
project for either 12 or 24 months.
SELECT ENAME
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = PROJ.PNO
AND ENAME ≠ "J. Doe"
AND PNAME = "CAD/CAM"
AND (DUR = 12 OR DUR = 24)
Restructuring (Rewriting)
By applying transformation rules, many different trees
may be found equivalent to the one produced by the
method described above.
The six most useful equivalence rules
concern the basic relational algebra operators.
Restructuring –Transformation Rules
Commutativity of binary operations
R × S ⇔ S × R
R join S ⇔ S join R
R ∪ S ⇔ S ∪ R
Associativity of binary operations
(R × S) × T ⇔ R × (S × T)
(R join S) join T ⇔ R join (S join T)
Idempotence of unary operations
ΠA'(ΠA''(R)) ⇔ ΠA'(R)
σp1(A1)(σp2(A2)(R)) ⇔ σp1(A1) ∧ p2(A2)(R)
where R[A] and A' ⊆ A, A'' ⊆ A and A' ⊆ A''
Commuting selection with projection
Restructuring –Transformation Rules
Commuting selection with binary operations
σp(Ai)(R × S) ⇔ (σp(Ai)(R)) × S
σp(Ai)(R join(Aj,Bk) S) ⇔ (σp(Ai)(R)) join(Aj,Bk) S
σp(Ai)(R ∪ T) ⇔ σp(Ai)(R) ∪ σp(Ai)(T)
where Ai belongs to R (and, for the union, to both R and T)
Commuting projection with binary operations
ΠC(R × S) ⇔ ΠA'(R) × ΠB'(S)
ΠC(R join(Aj,Bk) S) ⇔ ΠA'(R) join(Aj,Bk) ΠB'(S)
ΠC(R ∪ S) ⇔ ΠC(R) ∪ ΠC(S)
where R[A] and S[B]; C = A' ∪ B' where A' ⊆ A, B' ⊆ B
[Figures: query tree for the example query above and an equivalent query tree obtained by restructuring; recoverable node label: σDUR=12 ∨ DUR=24]
Step 2 – Data Localization
The localization layer translates an algebraic query on
global relations into an algebraic query expressed on
physical fragments.
Input: algebraic query on the global conceptual schema
Goal: localize the query's data using information
stored in the fragment schema
Global relations are fragmented and stored at different sites
Fragmentation is defined through fragmentation rules,
which can be expressed as relational queries
Step 2 – Data Localization …
The objective is to localize the query’s data using data
distribution information in the fragment schema
It identifies which fragments are involved in the query and
transforms the distributed query into a fragment query
It can be done in two steps
Assume
EMP is fragmented into EMP1, EMP2,
EMP3 as follows:
EMP1 = σENO≤"E3"(EMP)
EMP2 = σ"E3"<ENO≤"E6"(EMP)
EMP3 = σENO>"E6"(EMP)
ASG is fragmented into ASG1 and ASG2 as
follows:
ASG1 = σENO≤"E3"(ASG)
ASG2 = σENO>"E3"(ASG)
The localization program for a horizontally fragmented relation is
the union of the fragments, e.g. EMP = EMP1 ∪ EMP2 ∪ EMP3.
Reduction for Primary Horizontal
Fragmentation
Reduction for PHF
Reduction with selection
Relation R and FR = {R1, R2, …, Rw} where Rj = σpj(R)
σpi(Rj) = ∅ if ∀x in R: ¬(pi(x) ∧ pj(x))
Example
EMP1 = σENO≤"E3"(EMP)
EMP2 = σ"E3"<ENO≤"E6"(EMP)
EMP3 = σENO>"E6"(EMP)
SELECT *
FROM EMP
WHERE ENO = "E5"
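The reduction rule for this example can be sketched as follows. As a simplifying assumption, each fragment predicate is represented by a half-open interval (low, high] over the numeric part of ENO, so that "E5" becomes 5; the names `EMP_FRAGMENTS` and `relevant_fragments` are hypothetical.

```python
import math

# Fragment predicates as intervals (low, high] over the numeric part of ENO:
# EMP1: ENO <= E3, EMP2: E3 < ENO <= E6, EMP3: ENO > E6.
EMP_FRAGMENTS = {
    "EMP1": (-math.inf, 3),
    "EMP2": (3, 6),
    "EMP3": (6, math.inf),
}

def relevant_fragments(fragments, eno):
    """Keep only the fragments whose predicate does not contradict ENO = eno;
    selecting on any other fragment yields the empty relation."""
    return [name for name, (low, high) in fragments.items() if low < eno <= high]

# Query: SELECT * FROM EMP WHERE ENO = "E5"
relevant = relevant_fragments(EMP_FRAGMENTS, 5)
print(relevant)
```

Only EMP2 survives, so the selection needs to be sent to a single fragment instead of the union of all three.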
Reduction for PHF
Reduction with join
Possible if fragmentation is done on join attribute
Distribute join over union
(R1 ∪ R2) join S ⇔ (R1 join S) ∪ (R2 join S)
Given Ri = σpi(R) and Rj = σpj(R)
Ri join Rj = ∅ if ∀x in Ri, ∀y in Rj: ¬(pi(x) ∧ pj(y))
Reduction for PHF
Reduction with join - Example
Assume EMP is fragmented into three fragments and ASG into two:
EMP1 = σENO≤"E3"(EMP)
EMP2 = σ"E3"<ENO≤"E6"(EMP)
EMP3 = σENO>"E6"(EMP)
ASG1 = σENO≤"E3"(ASG)
ASG2 = σENO>"E3"(ASG)
Consider the query
SELECT * FROM EMP, ASG
WHERE EMP.ENO = ASG.ENO
Reduction for PHF
Reduction with join
The query reduced by distributing joins over unions and
applying rule 2 (reduction with join) can be implemented as a union of three
partial joins that can be done in parallel
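Which partial joins remain can be derived mechanically. The sketch below again assumes that fragment predicates are half-open intervals (low, high] over the numeric part of ENO; a join between two fragments is kept only if their intervals overlap. The dictionaries and the helper `overlaps` are illustrative.

```python
import math

# Fragment predicates on the join attribute ENO, as intervals (low, high].
EMP = {"EMP1": (-math.inf, 3), "EMP2": (3, 6), "EMP3": (6, math.inf)}
ASG = {"ASG1": (-math.inf, 3), "ASG2": (3, math.inf)}

def overlaps(a, b):
    """Two intervals (low, high] share a value iff the larger low
    is below the smaller high."""
    return max(a[0], b[0]) < min(a[1], b[1])

# Distribute the join over the unions, then drop the joins whose
# fragment predicates contradict each other (they would be empty).
partial_joins = [
    (emp_name, asg_name)
    for emp_name, emp_range in EMP.items()
    for asg_name, asg_range in ASG.items()
    if overlaps(emp_range, asg_range)
]
print(partial_joins)
```

Of the six fragment pairs only three remain: EMP1 with ASG1, EMP2 with ASG2, and EMP3 with ASG2.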
Reduction for VF
Find useless (not empty) intermediate relations
Relation R defined over attributes A = {A1, ..., An} vertically
fragmented as Ri = ΠA'(R) where A' ⊆ A:
ΠD,K(Ri) is useless if the set of projection attributes D is not in A'
Example: EMP1 = ΠENO,ENAME(EMP);
EMP2 = ΠENO,TITLE(EMP)
SELECT ENAME
FROM EMP
– By commuting the projection with the join (i.e., projecting
on ENO, ENAME), we can see that the projection on EMP2
is useless because ENAME is not in EMP2.
Reduction for DHF
Rule:
Distribute joins over unions
Apply the join reduction for horizontal fragmentation
Example
ASG1: ASG ⋉ENO EMP1
ASG2: ASG ⋉ENO EMP2
EMP1: σTITLE="Programmer"(EMP)
EMP2: σTITLE<>"Programmer"(EMP)
Query
SELECT *
FROM EMP, ASG
WHERE ASG.ENO = EMP.ENO
AND EMP.TITLE = "Mech. Eng."
[Figures: reduction for DHF, distributing joins over unions]
Reduction for Hybrid Fragmentation
Combine the rules already specified:
Remove empty relations generated by contradicting selections
on horizontal fragments
Remove useless relations generated by projections on vertical
fragments
Distribute joins over unions in order to isolate and remove
useless joins
Reduction for Hybrid Fragmentation
Example
Consider the following hybrid
fragmentation:
EMP1 = σENO≤"E4"(ΠENO,ENAME(EMP))
EMP2 = σENO>"E4"(ΠENO,ENAME(EMP))
EMP3 = ΠENO,TITLE(EMP)
and the query
SELECT ENAME
FROM EMP
WHERE ENO = "E5"
Step 3. Global Query Optimization
Input: algebraic fragment query
Goal: determine an execution strategy close to the
optimal solution, using a cost function.
Find the best (not necessarily optimal) global schedule
Minimize a cost function
Distributed join processing
Which relation to ship where?
Ship-whole vs ship-as-needed
Decide on the use of semijoins
A semijoin saves on communication at the expense of more local
processing.
Join methods
nested loop vs ordered joins (merge join or hash join)
Cost-Based Optimization
Solution space
The set of equivalent algebra expressions (query trees).
Cost function (in terms of time)
I/O cost + CPU cost + communication cost
These might have different weights in different distributed
environments (LAN vs WAN).
Can also maximize throughput
Search algorithm
How do we move inside the solution space?
Exhaustive search, heuristic algorithms (iterative improvement,
simulated annealing, genetic,…)
Step 4. Local optimization
• Input: optimized fragment algebraic query.
• Output: optimized local algebraic query.
• Each subquery executing at one site is called a
local query.
• It is optimized using the local schema and then
executed.
The end
Thank you
Question?