0% found this document useful (0 votes)
25 views

Query Processing 1

Uploaded by

asmeabeba12
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
25 views

Query Processing 1

Uploaded by

asmeabeba12
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 13

Query Processing

Connolly & Begg Chapter 19


Introduction
Ø In network and hierarchical DBMSs, low-level procedural query language is generally
embedded in high-level programming language.
Ø Programmer’s responsibility to select most appropriate execution strategy.
Ø With declarative languages such as SQL, user specifies what data is required rather than how
it is to be retrieved.
Ø Relieves user of knowing what constitutes good execution strategy.
Ø Also gives DBMS more control over system performance.
Ø Two main techniques for query optimization:
–heuristic rules that order operations in a query.
–comparing different strategies based on relative costs, and selecting one that minimizes
resource usage.
Ø Disk access tends to be dominant cost in query processing for centralized DBMS.
Query Processing
Ø Activities involved in retrieving data from the database. Aims of QP:
– transform query written in high-level language (e.g. SQL), into correct and efficient
execution strategy expressed in low-level language (implementing RA);
– execute the strategy to retrieve required data.
Ø Four main Phases: decomposition, optimization, code generation and execution.

Query Optimization
Activity of choosing an efficient execution strategy for processing query.
Ø As there are many equivalent transformations of same high-level query, aim of QO is to
choose one that minimizes resource usage.
Ø Generally, reduce total execution time of query.
Ø May also reduce response time of query.
Ø Problem computationally intractable with large number of relations, so strategy adopted is
reduced to finding near optimum solution.
Dynamic versus Static Optimization
Ø Two choices of decomposing and optimizing, when first three phases of QP can be carried out:
–dynamically every time query is run.
–Statically when query is parsed, validated and optimized once after submission.
Ø Advantages of dynamic QO arise from fact that information required to select an optimum strategy is
up-to-date.
Ø Disadvantages are that performance of query is affected, time may limit finding optimum strategy.
Ø Advantages of static QO are removal of runtime overhead, and more time to find optimum strategy.
Ø Disadvantages arise from fact that chosen execution strategy may no longer be optimal when query
is run.
Ø Could use a hybrid approach to overcome this.

1
Example 18.1 - Different Strategies

Find all Managers that work at a London branc h.

SELECT Sno, Fname, Position, Bno


FROM staff s, branch b
WHERE s.bno = b.bno AND
(s.position = ‘Manager’AND b.city = ‘London’);

Three equivalent RA queries are:

(1) σ (position='Manager') ∧ (city='London') ∧ (staff.bno=branch.bno) (Staff X Branch)

(2) σ (position='Manager') ∧ (city='London')(Staff staff.bno=branch.bno Branch)

(3) (σ position='Manager'(Staff)) staff.bno=branch.bno (σ city='London' (Branch))

Assumptions:
–1000 tuples in Staff; 50 tuples in Branch;
–50 Managers; 5 London branches;
–No indexes or sort keys;
–Results of any intermediate operations stored on disk;
–Cost of the final write is ignored;
–Tuples are accessed one at a time.
Cost (in disk accesses) are:
(1) (1000 + 50) + 2*(1000 * 50) = 101 050
(2) 2*1000 + (1000 + 50) = 3 050
(3) 1000 + 2*50 + 5 + (50 + 5) = 1 160

Cartesian product and join operations are much more expensive than selection,
and third option significantly reduces size of relations being joined together.

2
(1) (2)
Π Sno, Fname, Position, Bno Π Sno, Fname, Position, Bno

σ (position='Manager') ∧
(city='London') ∧ σ (position='Manager') ∧
(staff.bno=branch.bno) (city='London')

X
s.bno=b.bno
Staff Branch
Staff Branch
RELATIONAL
ALGEBRA TREE
(3)

Π Sno, Fname, Position, Bno

s.bno=b.bno

σ (position='Manager') σ (city='London')

Staff Branch

3
Phases of query Processing

QP has four main phases:


Ø decomposition (consisting
of parsing and validation)
Ø optimization
Ø code generation
Ø execution.

Query Decomposition
Aims are to transform high-level
query into RA query and check
that query is syntactically and
semantically correct.
Ø Typical stages are:
–analysis,
–normalization,
–semantic analysis,
–simplification,
–query restructuring.
Analysis Ø Finally, query transformed into some internal
Ø Analyze query lexically and representation more suitable for processing.
syntactically using compiler Ø Some kind of query tree is typically chosen,
techniques. constructed as follows:
Ø Verify relations and attributes exist. – Leaf node created for each base relation.
Ø Verify operations are appropriate for – Non-leaf node created for each intermediate
object type. relation produced by RA operation.
– Root of tree represents query result.
Example: – Sequence is directed from leaves to root.
SELECT staff_no
FROM staff Example 18.1 - R.A.T.
WHERE position > 10;
Ø This query would be rejected on two
grounds:
– Staff_No is not defined for Staff
relation (should be Sno).
– Comparison ‘>10’ is incompatible
with type Position, which is variable
character string.

4
Normalization
Ø Converts query into a normalized form for easier manipulation.
Ø Predicate can be converted into one of two forms:
–Conjunctive normal form:
(position = 'Manager' ∨ salary > 20000) ∧ (bno = 'B3')
–Disjunctive normal form:
(position = 'Manager' ∧ bno = 'B3' ) ∨ (salary > 20000 ∧ bno = 'B3')

Semantic Analysis
Ø Rejects normalized queries that are incorrectly formulated or contradictory.
Ø Query is incorrectly formulated if components do not contribute to generation of result.
Ø Query is contradictory if its predicate cannot be satisfied by any tuple.
Ø Algorithms to determine correctness exist only for queries that do not contain disjunction and
negation.
Ø For these queries, could construct:
–Construct a relation connection graph.
–Normalized attribute connection graph.

Relation connection graph


Create node for each relation and node for result. Create edges between two nodes that
represent a join, and edges between nodes that represent projection.
If not connected, query is incorrectly formulated.

Semantic Analysis - Normalized Attribute Connection Graph

Ø Create node for each reference to an attribute, or constant 0.


Ø Create directed edge between nodes that represent a join, and directed edge between
attribute node and 0 node that represents selection.
Ø Weight edges a → b with value c, if it represents inequality condition (a ≤b + c); weight edges
0 → a with -c, if it represents inequality condition (a ≥ c).
Ø If graph has cycle for which valuation sum is negative, query is contradictory.

5
Example 18.2 - Checking Semantic Correctness
SELECT p.pno, p.street
Ø Relation connection graph not fully
FROM renter r, viewing v, property_for_rent p
connected, so query is not correctly
WHERE r.rno = v.rno AND
formulated.
r.max_rent >= 500 AND
Ø Have omitted the join condition
r.pref_type = 'Flat' AND p.ono = 'CO93';
(v.pno = p.pno) .
Relation Connection graph

SELECT p.pno, p.street


FROM renter r, viewing v, property_for_rent p
WHERE r.max_rent > 500 AND
r.rno = v.rno AND v.pno = p.pno AND
r.pref_type = 'Flat' and r.max_rent < 200;

§ Create node for each reference to an attribute, or


constant 0.
§ Create directed edge between nodes that represent
a join, and directed edge between attribute node
and 0 node that represents selection.
§ Weight edges a → b with value c, if it represents
inequality condition (a ≤ b + c);
§ weight edges 0 → a with -c, if it represents
inequality condition (a ≥ c).
§ If graph has cycle for which valuation sum is
negative, query is contradictory.
Ø Normalized attribute connection graph has
cycle between nodes R.Max_Rent and 0
with negative valuation sum, so query is
contradictory.
Normalized attribute connection graph

Simplification
–Detects redundant qualifications,
–Eliminates common sub-expressions,
–Transforms query to semantically equivalent but more easily and efficiently
computed form.

Ø Typically, access restrictions, view definitions, and integrity constraints are


considered.
Ø Assuming user has appropriate access privileges, first apply well-known
idempotency rules of boolean algebra.
6
p∧ (p) ≡ p p∪ (p) ≡ p
p∧ false ≡ false p∪ false ≡ p
p∧ true ≡ p p∪ ≡ true
p∧ (~p) ≡ false p∪ (~p) ≡ true
p∧ (p) ≡ p p∪ (p) ≡ p
p∧ (p ∪ q) ≡ p p∪ (p ∧ q) ≡ p

Example:

CREATE VIEW staff3


AS SELECT sno, fname, lname, salary
FROM Staff SELECT *
WHERE bno = ‘B3’; FROM staff3
WHERE (bno = ‘B3’AND salary > 20000);

SELECT sno, fname, lname, salary


FROM Staff
WHERE (bno = ‘B3’AND salary > 20000) AND bno = ‘B3’;

WHERE (bno = ‘B3’AND salary > 20000);

Integrity Constraints to simplify queries:

CREATE ASSERTION only_manager_salary_high


CHECK ((position <> “Manager’AND salary <2000)
OR position = ‘Manager’AND salary > 20000));

SELECT *
FROM staff
WHERE (position = ‘Manager’AND salary < 15000);

Predicate in the WHERE clause is a contradiction of the constraint

There can be no tuples that satisfies this predicate.

7
Transformation Rules for RA Operations

Ø Conjunctive selection operations can cascade into individual selection operations (and vice
versa). Sometimes referred to as cascade of selection.

σ p ∧ q ∧ r(R) = σ p(σ q(σ r(R)))


Example: σ bno='B3' ∧ salary>15000(Staff) = σ bno='B3'(σ salary>15000(Staff))

Ø Commutativity of selection.

σp(σq(R)) = σq(σp(R))
Example: σbno='B3'(σsalary>15000(Staff)) = σsalary>15000(σbno='B3'(Staff))

Ø In a sequence of projection operations, only the last in the sequence is required.

Π LΠ M … Π N(R) = Π L (R)
Example:Π lnameΠ bno.name(Staff) = Π lname (Staff)

Ø Commutativity of selection and projection.


o If predicate p involves only attributes in projection list, selection and projection
operations commute:

Π Ai, … , Am(σp(R)) = σp(Π Ai, … , Am(R)) where p∈ {A1, A2, … , Am}


Example: Π fname,lname(σlname='Beech'(Staff)) = σlname='Beech'(Π fname,lname(Staff))

Ø Commutativity of theta-join (and Cartesian product).

R pS=S p R
RXS=SXR

o Rule also applies to equi-join and natural join.


Eexample: Staff staff.bno=branch.bno Branch = Branch staff.bno=branch.bno Staff

Ø Commutativity of selection and theta-join (or Cartesian product).


o If selection predicate involves only attributes of one of join relations, selection and join
(or Cartesian product) operations commute:
σp(R r S) = (σp(R)) rS
σp(R X S) = (σp(R)) X S where p∈ {A1, A2, … , An}

8
o If selection predicate is conjunctive predicate having form (p∧ q), where p only involves
attributes of R, and q only attributes of S, selection and theta-join operations commute
as:
σp ∧ q(R r S) = (σp(R)) r (σq(S))
σp ∧ q(R X S) = (σp(R)) X (σq(S))

Example: σposition='Manager' ∧ city='London'(Staff staff.bno=branch.bno Branch) =


(σposition='Manager'(Staff)) staff.bno=branch.bno (σcity='London' (Branch))

Ø Commutativity of projection and theta-join (or Cartesian product).


o If projection list is of form L = L1 ∪ L2, where L1 only involves attributes of R, and L2
only involves attributes of S, provided join condition only contains attributes of L,
projection and theta-join operations commute as:

Π L1∪ L2(R r S) = (Π L1(R)) r (Π L2(S))

o If join condition contains additional attributes not in L, say attributes M = M1 ∪ M2


where M1 only involves attributes of R, and M2 only involves attributes of S, a final
projection operation is required:

Π L1∪ L2(R r S) = Π L1∪ L2( (Π L1∪ M1(R)) r (Π L2∪ M2(S)))

Example:
Π position, city, bno(Staff staff.bno=branch.bno Branch) =
(Π position, bno(Staff)) staff.bno=branch.bno (Π city, bno (Branch))

and using the latter rule:


Π position, city(Staff staff.bno=branch.bno Branch) =
Π position, city ((Π position, bno(Staff)) staff.bno=branch.bno ( Π city, bno (Branch)))

Ø Commutativity of union and intersection (but not set difference).


R∪ S=S∪ R
R∩ S=S∩ R

Ø Commutativity of selection and set operations (union, intersection, and set difference).
σp(R ∪ S) = σp(S) ∪ σp(R)
σp(R ∩ S) = σp(S) ∩ σp(R)
σp(R - S) = σp(S) - σp(R)

9
Ø Commutativity of projection and union.
Π L(R ∪ S) = Π L(S) ∪ Π L(R)

Ø Associativity of union and intersection (but not set difference).


(R ∪ S) ∪ T = S ∪ (R ∪ T)
(R ∩ S) ∩ T = S ∩ (R ∩ T)

Ø Associativity of theta-join (and Cartesian product).


o Cartesian product and natural join are always associative:

(R S) T=R (S T)
(R X S) X T = R X (S X T)

o If join condition q involves attributes only from S and T, then theta-join is associative as
follows:

(R p S) q∧ r T=R p ∧ r (S q T)

Example:
(Staff staff.sno=property_for_rent.sno Property_for_Rent) ono=owner.ono ∧
staff.lname=owner.lname Owner =
Staff staff.sno=property_for_rent.sno ∧ staff.lname=lname (Property_for_Rent ono Owner)

Heuristical Processing Strategies


Ø Perform selection operations as early as possible.
o Keep predicates on same relation together.
Ø Combine Cartesian product with subsequent selection whose predicate represents join
condition into a join operation.
σR.a θ S.b (R X S) = R R.a θ S.b S)

Ø Use associativity of binary operations to rearrange leaf nodes so leaf nodes with most
restrictive selection operations executed first.
Ø Perform projection as early as possible.
o Keep projection attributes on same relation together.
Ø Compute common expressions once.
o If common expression appears more than once, and result not too large, store result
and reuse it when required.
o Useful when querying views, as same expression is used to construct view each time.

10
Example 18.3 Use of Transformation Rules

For prospective renters of flats, find properties that


match requirements and owned by CO93.

SELECT p.pno, p.street


FROM renter r, viewing v, property_for_rent p
WHERE r.pref_type = 'Flat' AND
r.rno = v.rno AND v.pno = p.pno AND
r.max_rent >= p.rent AND
r.pref_type = p.type AND p.ono = 'CO93';

11
12
Cost Estimation for RA Operations
Ø Many different ways of implementing RA operations.
Ø Aim of QO is to choose most efficient one.
Ø Use formulae that estimate costs for a number of options, and select one with lowest cost.
Ø Consider only cost of disk access, which is usually dominant cost in QP.
Ø Many estimates are based on cardinality of the relation, so need to be able to estimate this.

Database Statistics
Ø Success of estimation depends on amount and currency of statistical information DBMS
holds.
Ø Keeping statistics current can be problematic.
Ø If statistics updated every time tuple is changed, this would impact performance.
Ø DBMS could update statistics on a periodic basis, for example nightly, or whenever the
system is idle.

Typical Statistics for Relation R


ntuples(R) - number of tuples in R (cardinality).
bfactor(R) - blocking factor of R. (number of tuples of R that fit in one block).
nblocks(R) - number of blocks required to store R:

nblocks(R) = [ntuples(R)/bfactor(R)]

Typical Statistics for Attribute A of Relation R


ndistinctA(R) - number of distinct values that appear for attribute A in R.
minA(R),maxA(R) - minimum and maximum possible values for attribute A in R.
SCA(R) - selection cardinality of attribute A in R.
Average number of tuples that satisfy an equality condition on attribute A.
If values of A are uniformly distributed in R and that there is at least one value that
satisfy the condition, then:

1 if A is a key attribute of R
SCA(R) =
[ntuples(R)/ ndistinctA(R)] otherwise

Statistics for Multilevel Index I on Attribute A


nlevelsA(I) - number of levels in I.
nlfblocksA(I) - number of leaf blocks in I.

13

You might also like