Query Processing 1
Query Optimization
Activity of choosing an efficient execution strategy for processing query.
Ø As there are many equivalent transformations of same high-level query, aim of QO is to
choose one that minimizes resource usage.
Ø Generally, reduce total execution time of query.
Ø May also reduce response time of query.
Ø Problem computationally intractable with large number of relations, so strategy adopted is
reduced to finding near optimum solution.
Dynamic versus Static Optimization
Ø Two choices for when first three phases of QP can be carried out:
–Dynamically: every time query is run.
–Statically: query is parsed, validated, and optimized once, when first submitted.
Ø Advantages of dynamic QO arise from fact that information required to select an optimum strategy is
up-to-date.
Ø Disadvantages are that query performance is affected by optimization overhead on every run, and that limited time may prevent finding optimum strategy.
Ø Advantages of static QO are removal of runtime overhead, and more time to find optimum strategy.
Ø Disadvantages arise from fact that chosen execution strategy may no longer be optimal when query
is run.
Ø Could use a hybrid approach to overcome this.
Example 18.1 - Different Strategies
Assumptions:
–1000 tuples in Staff; 50 tuples in Branch;
–50 Managers; 5 London branches;
–No indexes or sort keys;
–Results of any intermediate operations stored on disk;
–Cost of the final write is ignored;
–Tuples are accessed one at a time.
Costs (in disk accesses) are:
(1) (1000 + 50) + 2*(1000 * 50) = 101 050
(2) 2*1000 + (1000 + 50) = 3 050
(3) 1000 + 2*50 + 5 + (50 + 5) = 1 160
Cartesian product and join operations are much more expensive than selection,
and third option significantly reduces size of relations being joined together.
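The three totals above can be reproduced with a short calculation. The sketch below is in Python; the variable names are illustrative, and the breakdown of each term into reads and writes (given in the comments) is an interpretation of the listed assumptions rather than something the slide spells out.

```python
# Cardinalities taken from the assumptions of Example 18.1.
n_staff = 1000      # tuples in Staff
n_branch = 50       # tuples in Branch
n_managers = 50     # Staff tuples with position = 'Manager'
n_london = 5        # Branch tuples with city = 'London'

# Strategy 1: read both relations, then write the Cartesian product
# to disk and read it back for the selection.
cost_1 = (n_staff + n_branch) + 2 * (n_staff * n_branch)

# Strategy 2: read both relations for the join, then write the join result
# (one tuple per member of staff) and read it back for the selection.
cost_2 = 2 * n_staff + (n_staff + n_branch)

# Strategy 3: read Staff and write the managers, read Branch and write the
# London branches, then read both reduced relations for the join.
cost_3 = (n_staff + n_managers) + (n_branch + n_london) + (n_managers + n_london)

costs = {1: cost_1, 2: cost_2, 3: cost_3}
for strategy, cost in costs.items():
    print(f"strategy {strategy}: {cost} disk accesses")

# Pick the strategy with the smallest estimated cost.
best = min(costs, key=costs.get)
print(f"cheapest: strategy {best}")
```

The dominant term in strategy 1 is the 50 000-tuple Cartesian product, which is written once and read once; strategy 3 avoids it by shrinking both inputs before the join.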
Relational algebra trees for the three strategies, written as expressions:
(1) Π Sno, Fname, Position, Bno (σ (position='Manager') ∧ (city='London') ∧ (staff.bno=branch.bno) (Staff X Branch))
(2) Π Sno, Fname, Position, Bno (σ (position='Manager') ∧ (city='London') (Staff ⋈ s.bno=b.bno Branch))
(3) (σ (position='Manager') (Staff)) ⋈ s.bno=b.bno (σ (city='London') (Branch))
Phases of Query Processing
Query Decomposition
Aims are to transform high-level
query into RA query and check
that query is syntactically and
semantically correct.
Ø Typical stages are:
–analysis,
–normalization,
–semantic analysis,
–simplification,
–query restructuring.
Analysis
Ø Analyze query lexically and syntactically using compiler techniques.
Ø Verify relations and attributes exist.
Ø Verify operations are appropriate for object type.
Example:
SELECT staff_no
FROM staff
WHERE position > 10;
Ø This query would be rejected on two grounds:
– Staff_No is not defined for Staff relation (should be Sno).
– Comparison '>10' is incompatible with type Position, which is variable character string.
Ø Finally, query transformed into some internal representation more suitable for processing.
Ø Some kind of query tree is typically chosen, constructed as follows:
– Leaf node created for each base relation.
– Non-leaf node created for each intermediate relation produced by RA operation.
– Root of tree represents query result.
– Sequence is directed from leaves to root.
Example 18.1 - R.A.T. (relational algebra tree)
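The two rejection grounds in the analysis example (an undefined attribute and a comparison that does not fit the attribute's type) can be illustrated with a small check against a catalog. This is a minimal sketch in Python: the `CATALOG` dictionary, its attribute list and types, and the `analyze` helper are hypothetical stand-ins for a system catalog, not part of any real DBMS.

```python
# Hypothetical catalog: relation name -> {attribute name: Python type}.
# Only attributes mentioned in this section are listed.
CATALOG = {
    "staff": {"sno": str, "fname": str, "position": str, "bno": str},
}

def analyze(relation, select_attrs, comparisons):
    """Return a list of error messages; an empty list means the query passes.

    comparisons is a list of (attribute, operator, literal) triples.
    """
    if relation not in CATALOG:
        return [f"relation {relation!r} does not exist"]
    attrs = CATALOG[relation]
    errors = []
    # Check that every selected attribute is defined for the relation.
    for name in select_attrs:
        if name not in attrs:
            errors.append(f"attribute {name!r} is not defined for {relation!r}")
    # Check that every comparison uses a literal of the attribute's type.
    for name, op, literal in comparisons:
        if name not in attrs:
            errors.append(f"attribute {name!r} is not defined for {relation!r}")
        elif not isinstance(literal, attrs[name]):
            errors.append(
                f"comparison {op}{literal!r} is incompatible with type "
                f"{attrs[name].__name__} of {name!r}"
            )
    return errors

# SELECT staff_no FROM staff WHERE position > 10;
errors = analyze("staff", ["staff_no"], [("position", ">", 10)])
for message in errors:
    print(message)
```

A corrected query, such as `analyze("staff", ["sno"], [("position", "=", "Manager")])`, returns an empty list under the same catalog.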
Normalization
Ø Converts query into a normalized form for easier manipulation.
Ø Predicate can be converted into one of two forms:
–Conjunctive normal form:
(position = 'Manager' ∨ salary > 20000) ∧ (bno = 'B3')
–Disjunctive normal form:
(position = 'Manager' ∧ bno = 'B3' ) ∨ (salary > 20000 ∧ bno = 'B3')
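The two forms above describe the same predicate. One quick way to confirm this is to compare them over every truth assignment of the three atomic conditions. The Python sketch below does so, with `m`, `s`, and `b` as illustrative names for `position = 'Manager'`, `salary > 20000`, and `bno = 'B3'`.

```python
from itertools import product

def cnf(m, s, b):
    # (position = 'Manager' OR salary > 20000) AND (bno = 'B3')
    return (m or s) and b

def dnf(m, s, b):
    # (position = 'Manager' AND bno = 'B3') OR (salary > 20000 AND bno = 'B3')
    return (m and b) or (s and b)

# Enumerate all 2**3 truth assignments of the atomic conditions.
assignments = list(product([False, True], repeat=3))
equivalent = all(cnf(*a) == dnf(*a) for a in assignments)
print("equivalent:", equivalent)
```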
Semantic Analysis
Ø Rejects normalized queries that are incorrectly formulated or contradictory.
Ø Query is incorrectly formulated if components do not contribute to generation of result.
Ø Query is contradictory if its predicate cannot be satisfied by any tuple.
Ø Algorithms to determine correctness exist only for queries that do not contain disjunction and
negation.
Ø For these queries, could construct:
–A relation connection graph.
–A normalized attribute connection graph.
Example 18.2 - Checking Semantic Correctness
SELECT p.pno, p.street
FROM renter r, viewing v, property_for_rent p
WHERE r.rno = v.rno AND
r.max_rent >= 500 AND
r.pref_type = 'Flat' AND p.ono = 'CO93';
Ø Relation connection graph not fully connected, so query is not correctly formulated.
Ø Have omitted the join condition (v.pno = p.pno).
(Figure: relation connection graph)
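The connectivity test for Example 18.2 can be sketched as follows. This is a simplified version that assumes one node per relation and one edge per join condition, leaving out any node for the query result; the helper name `is_connected` is illustrative.

```python
def is_connected(nodes, edges):
    """Return True if every node is reachable from the first one."""
    nodes = list(nodes)
    if not nodes:
        return True
    # Build an undirected adjacency map from the join conditions.
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Depth-first traversal starting from the first relation.
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        current = stack.pop()
        for nxt in neighbours[current] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return len(seen) == len(nodes)

relations = ["r", "v", "p"]            # renter, viewing, property_for_rent

# Join conditions in the query as written: only r.rno = v.rno.
as_written = [("r", "v")]
# With the omitted join condition v.pno = p.pno added.
corrected = as_written + [("v", "p")]

print(is_connected(relations, as_written))
print(is_connected(relations, corrected))
```

With the query as written, `p` has no edge to the other relations, so the graph is not connected; adding the missing join condition links all three.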
Simplification
–Detects redundant qualifications,
–Eliminates common sub-expressions,
–Transforms query to semantically equivalent but more easily and efficiently
computed form.
Example:
SELECT *
FROM staff
WHERE (position = 'Manager' AND salary < 15000);
Transformation Rules for RA Operations
Ø Conjunctive selection operations can cascade into individual selection operations (and vice
versa). Sometimes referred to as cascade of selection.
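Ø Commutativity of selection.
σp(σq(R)) = σq(σp(R))
Example: σbno='B3'(σsalary>15000(Staff)) = σsalary>15000(σbno='B3'(Staff))
Ø In a sequence of projection operations, only the last in the sequence is required.
Π L(Π M(… Π N(R))) = Π L(R)
Example: Π lname(Π bno, lname(Staff)) = Π lname(Staff)
Ø Commutativity of theta-join (and Cartesian product).
R ⋈p S = S ⋈p R
R X S = S X R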
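The selection and projection rules above can be checked on a small sample relation. The sketch below is in Python and models a relation as a list of dictionaries; the `select` and `project` helpers and the sample Staff tuples are made up for illustration.

```python
def select(pred, rel):
    """Selection: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def project(attrs, rel):
    """Projection: keep only attrs and remove duplicate tuples."""
    seen, out = set(), []
    for t in rel:
        row = tuple((a, t[a]) for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(row))
    return out

# Made-up sample data.
staff = [
    {"sno": "S1", "lname": "White", "bno": "B3", "salary": 30000},
    {"sno": "S2", "lname": "Beech", "bno": "B3", "salary": 12000},
    {"sno": "S3", "lname": "Ford", "bno": "B5", "salary": 18000},
]

in_b3 = lambda t: t["bno"] == "B3"
well_paid = lambda t: t["salary"] > 15000

# Commutativity of selection: the order of the two selections does not matter.
lhs = select(in_b3, select(well_paid, staff))
rhs = select(well_paid, select(in_b3, staff))
print(lhs == rhs)

# Cascade of selection: one conjunctive selection equals the nested pair.
both = select(lambda t: in_b3(t) and well_paid(t), staff)
print(both == lhs)

# Sequence of projections: only the outermost projection is needed.
nested = project(["lname"], project(["bno", "lname"], staff))
direct = project(["lname"], staff)
print(nested == direct)
```

A check on one sample relation does not prove the rules, but it makes concrete what each equivalence claims.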
o If selection predicate is conjunctive predicate having form (p ∧ q), where p only involves attributes of R, and q only attributes of S, selection and theta-join operations commute as:
σp ∧ q(R ⋈r S) = (σp(R)) ⋈r (σq(S))
σp ∧ q(R X S) = (σp(R)) X (σq(S))
Example (commutativity of projection and theta-join):
Π position, city, bno(Staff ⋈ staff.bno=branch.bno Branch) =
(Π position, bno(Staff)) ⋈ staff.bno=branch.bno (Π city, bno(Branch))
Ø Commutativity of selection and set operations (union, intersection, and set difference).
σp(R ∪ S) = σp(R) ∪ σp(S)
σp(R ∩ S) = σp(R) ∩ σp(S)
σp(R - S) = σp(R) - σp(S)
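Selection distributes over all three set operations, but for set difference the operand order must be preserved, because difference is not commutative. A minimal Python sketch with made-up integer sets, where `sigma` is an illustrative helper:

```python
def sigma(pred, rel):
    """Selection over a relation modelled as a Python set."""
    return {t for t in rel if pred(t)}

R = {1, 2, 3, 4, 5, 6}
S = {4, 5, 6, 7, 8}
p = lambda x: x % 2 == 0   # illustrative predicate: keep even values

union_ok = sigma(p, R | S) == sigma(p, R) | sigma(p, S)
inter_ok = sigma(p, R & S) == sigma(p, R) & sigma(p, S)
diff_ok = sigma(p, R - S) == sigma(p, R) - sigma(p, S)
# Swapping the operands of the difference gives a different result.
diff_swapped = sigma(p, R - S) == sigma(p, S) - sigma(p, R)

print(union_ok, inter_ok, diff_ok, diff_swapped)
```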
Ø Commutativity of projection and union.
Π L(R ∪ S) = Π L(S) ∪ Π L(R)
Ø Associativity of natural join (and Cartesian product).
(R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)
(R X S) X T = R X (S X T)
o If join condition q involves attributes only from S and T, then theta-join is associative as follows:
(R ⋈p S) ⋈q ∧ r T = R ⋈p ∧ r (S ⋈q T)
Example:
(Staff ⋈ staff.sno=property_for_rent.sno Property_for_Rent) ⋈ ono=owner.ono ∧ staff.lname=owner.lname Owner =
Staff ⋈ staff.sno=property_for_rent.sno ∧ staff.lname=owner.lname (Property_for_Rent ⋈ ono=owner.ono Owner)
Ø Use associativity of binary operations to rearrange leaf nodes so leaf nodes with most
restrictive selection operations executed first.
Ø Perform projection as early as possible.
o Keep projection attributes on same relation together.
Ø Compute common expressions once.
o If common expression appears more than once, and result not too large, store result
and reuse it when required.
o Useful when querying views, as same expression is used to construct view each time.
Example 18.3 Use of Transformation Rules
Cost Estimation for RA Operations
Ø Many different ways of implementing RA operations.
Ø Aim of QO is to choose most efficient one.
Ø Use formulae that estimate costs for a number of options, and select one with lowest cost.
Ø Consider only cost of disk access, which is usually dominant cost in QP.
Ø Many estimates are based on cardinality of the relation, so need to be able to estimate this.
Database Statistics
Ø Success of estimation depends on amount and currency of statistical information DBMS
holds.
Ø Keeping statistics current can be problematic.
Ø If statistics updated every time tuple is changed, this would impact performance.
Ø DBMS could update statistics on a periodic basis, for example nightly, or whenever the
system is idle.
nblocks(R) = [ntuples(R) / bfactor(R)]
SC_A(R) = 1, if A is a key attribute of R
SC_A(R) = [ntuples(R) / ndistinct_A(R)], otherwise
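As a worked example of the two formulas, the sketch below computes them in Python. It reads the square brackets as rounding up to the next integer and the SC quantity as the selection cardinality of attribute A (the expected number of tuples matching an equality condition on A). Both readings, the function names, and the sample figures are assumptions, since the slide does not define the symbols.

```python
import math

def nblocks(ntuples, bfactor):
    """Blocks needed to store ntuples at bfactor tuples per block."""
    return math.ceil(ntuples / bfactor)

def selection_cardinality(ntuples, ndistinct, is_key):
    """Expected number of tuples matching an equality condition on A."""
    if is_key:
        # A key value identifies at most one tuple.
        return 1
    # Assumes values of A are spread evenly over the tuples.
    return math.ceil(ntuples / ndistinct)

# Sample figures (assumed): 1000 Staff tuples, 30 tuples per block,
# 50 distinct bno values.
print(nblocks(1000, 30))
print(selection_cardinality(1000, 50, is_key=False))
print(selection_cardinality(1000, 1000, is_key=True))
```

Under these assumed figures, Staff occupies 34 blocks and an equality condition on bno is expected to match 20 tuples.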