0% found this document useful (0 votes)
23 views20 pages

Adbms Unit2

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
23 views20 pages

Adbms Unit2

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 20

Unit:2 Query Processing

Definition:

• A procedure for transforming a high-level query(SQL) into a correct and efficient execution plan
in low level language.

• A query processor selects the appropriate execution plan to respond to the user request.

• Query processing goes through a series of query complication steps called Execution plan before
it begins execution.

Steps in Query Processing

1. Syntax Analyzer- Parses the query and checks if it obeys the syntax rules.

2. Query Decomposer-Translates the query into relational algebra query using


transformation rules.

3. Query Optimizer-Performing optimization using equivalent expressions.

4. Query Code Generator-Generating code for the queries.

5. Runtime Database Processor –Selecting the optimal plan and execution.

Syntax Analyser

• Takes up a query, parses it into tokens , analyses the tokens inorder to make sure it comply with
the rules of the language grammer.

• If an error is found, it is rejected and an error code together with explanation is returned.

• Ex: A simple grammar to implement a sql statement,

Query: Select clause+from clause+where clause

Select clause: ‘select’ +<column list>

From clause: ‘From’+ <table list>

Where clause: ‘where’+value1opvalue2

Value1:value/column name

Value2:value/column name

Op:+,-,*,/,=

The above grammar can be used to implement a query such as,


Select eno, ename from emp where eno=10;

2. Query Decomposition

The first phase of query processing whose aim is to transform a high level query into a
relational algebra query and to check if the query is syntactically and semantically correct.

This consists of the following 5 stages,

1. Query Analysis

2. Query Normalization

3. Semantic Analysis

4. Query Simplifier

5. Query Restructuring

Query Analysis

Equivalence Rules
Query Normalization

Data Dictionary
Semantic Analysis

Idempotency Rules
Query Simplifier

Transformation Rules
Query Restructuring

Relational Algebra Query

Query Analysis

• Query is lexically and syntactically analysed using the compilers(parsers) to find out syntax
errors.

• Syntactically Legal queries is then validated with the system catalogue, whether the relations
and attributes mentioned exists.

• The type specification of the query qualifiers are also checked at this stage.
• Ex: of Invalid query which is rejected..

Select ename, desig from emp where desig>100;

• The outcome of this phase is an internal representation called Query Tree Notation.

Query Tree Notation

It is a tree data structure that corresponds to a relational algebra expression.

Components of Query Tree

• Leaf nodes-base input relations of the query.

• Non-leaf nodes-represent the intermediate relation as a result of applying an operation

• Root represents the result of the query

• Sequence of operations is directed from leaves to root.

Initial Canonical Query Tree Notation

Ex:

• Project the projno,deptno,and the managers details for every project located in Mumbai.

RA expression is,

∏proj-no,dept-no,ename,addr,dob’(σproj-loc=‘mumbai’(PROJECT))’ X dept-no=dnum(DEPARTMENT)
X mgrid=empid(EMPLOYEE)

∏p.proj-no,p.dept-no,e.ename,e.addr,e.dob

Root

σproj-loc=‘mumbai’ and p.dept-no=d.dnum and .mgrid=e.empid

Intermediate Op

Leaf
E
X

P D Leaf
In the initial query tree notation, the Cartesian product is applied first, then the selection and join
conditions of where clause are applied, followed by the projection on the select attributes.

Query tree corresponding to the relational algebra expression.

∏p.proj_no,p.Deptno,e.emp_name,e.addr,e,dob

d.mgrid=e.empid

p.deptno=d.deptno

σProj_loc=’mumbai’
D

Leaf node
P

Query Graph Notation

A Query graph corresponds to a relational calculus expression.

• Relational nodes represented as single circle.

• Constant values from the query selection are represented as double circles.

• Selection and join are represented by edges.

• Attributes to be retrieved are put in [].

• Disadvantage: Does not indicate the order in which operation is performed.


P D E

‘Chennai’

2. Query Normalization

• Primary Goal: Avoid Redundancy.

• A set of equivalency rule is applied so that projection and selection operations are
simplified.

• Applying these rules the predicate is converted into 2 normal forms– CNF & DNF.

• Conjuctive Normal Form is a sequence of conjucts connected with ^(AND). Conjucts


can contain one or more terms connected with v (OR).

• Disjunctive Normal Form is a sequence of disjuncts connected with V (OR). Disjuncts


contain one or more terms connected with ^ (AND)

• DNF is often used as it allows the query to be broken into a series of independent sub-
queries linked by unions.

Example:

CNF-Conjuctive Normal Form

(emp-design=‘Programmer’ V emp-sal>40000)^ location=‘mumbai’

This can be expressed in DNF as

(emp-design=‘Programmer’ ^location=‘mumbai’)

(emp-sal>40000 ^ location=‘mumbai’)

The equivalency rules are,


• Rule:1 Commutativity of Unary operation:

Unary op1 unary op2 REL Unary op2 unary op1 REL

• Rule:2 Commutativity of Binary operation:

REL1 binop (REL2 binop REL3) (REL1 binop REL2) binop REL3

• Rule:3 Idempotency of Unary op

unaryop1 unaryop2 REL unaryop REL

• Rule: 4 Distributivity of unary w.r.t binary

unaryop (REL1 binop REL2) unaryop (REL1) binop unaryop(REL2)

• Rule:5 Factorisation

Unaryop (REL1) binop unaryop(REL2) unaryop(REL1 binop REL2)

2.3 Semantic Analyser

• Goal: Reduce the no. of Predicates that must be evaluated by refuting the incorrect or
contradictory queries.

• This rejects normalized query which are incorrectly formulated or which are contradictory.

• A query is Incorrectly formulated- if the query does not contribute to the result. Ex: missing joins

• A Query is Contradictory if the predicate cannot be satisfied by any tuple in the relation.

• Ex: emp_desig=‘programmer’ ^ emp_desig=‘analyst’ (Contradictory query)

• Semantic Analyser also examines the query to make sure that only data objects defined in the
catalogue is used.

Connection Graphs

• Connection graphs can be constructed to check the correctness and contradictions as follows.

1. Constructing a Query graph for relation

2. Constructing a Query graph for normalized attribute

Ex:1

Select (p.projno,p.projloc)

from project as p,viewing as v,depart as d


where d.deptid=v.deptid and d.maxbudget>=85000 and d.compl_year=‘2005’ and p.proj_
mgr=‘Mathew’;

• The Query graph representation is not fully connected which means the query is not correctly
formulated. In this graph the join condition v.proj_no=p.proj_no has been omitted.

D Result

V P

Incorrectly formulated Query Graph

• Example 2

Select (p.projno,p.loc)
from project as p,cost_of_project as c,depart as d
where d.max-budget>85000 and d.compl_year=‘2005’ and d.maxbudget<50000

This graph has a cycle between the nodes 0 and max_budget with a negative sum.

50000
0
2500

-2500

-85000 d.max_budg d.compl_


et year

2.4 Query simplifier

• Goal: Detect redundancy, eliminate common sub expressions and transform query to
semantically equivalent and easily computed forms.

• Integrity constraints, view definitions and access restrictions are considered here and the query
will be rejected if it is contradictory.

• The final form of simplification is obtained using the Idempotence rule of Boolean algebra
Idempotence Rules of Boolean Algebra

1. Pred AND Pred = Pred

2. Pred AND true = Pred

3. Pred AND False = False

4. Pred AND not(Pred) = False

5. Pred1 and (Pred1 or Pred2) = pred1

6. Pred OR Pred = Pred

7. Pred OR true = True

8. Pred OR False = Pred

9. Pred OR not(Pred) = True

10. Pred1 OR (Pred1 AND Pred2) = pred1

Example of Using Idempotence Rules.

Consider the following query.

Select d.dept_id,m.branch_mgr,m.branch_id,b,branch_id,b.nranch_loc,e.empname,e.sal

From department as d,manager as m,branch as b

Where d.dept-id=m.dept-id

And m.branch-id=b.branch-id

And b.branch_loc=’mumbai’

And not(b.branch_loc=’mumbai’and b.branch_loc=’delhi’)

And e.empsal>85000

And not(b.brach_loc=’Delhi’)

And d.dept_loc=’Bangalore’

Let us examine the following part of the query to greater detail

And b.branch_loc=’mumbai’

And not(b.branch_loc=’mumbai’and b.branch_loc=’delhi’)


Let us equate

b.branch_loc=’mumbai’=PRED1

b.branch_loc=’mumbai’=PRED2

b.branch_loc=’delhi’=PRED3

It can now be represented in the form of idempotence rule as

PRED1 and not(PRED2 and PRED3)

Applying the equivalence rule,

(PRED1 and not(PRED1))and not(PRED3)

By the rule 4 of idempotence law we get the following form

FALSE and not(PRED3)

This is equivalent to not(PRED3)

Now translating the predicate into sql we get

Not(b.branch_loc=’delhi’)

Thus the original query which contains many redundant predicate can be eliminated and the query is
simplified now.

2.5 Query Restructuring

This is the final stage of Query Decomposition.

The Query can be restructured to give a more efficient implementation using transformation
rules. The query can be considered as a relational algebra program.

3. Query Optimization

• Goal: Choose an efficient execution strategy for processing a query.

• Atempts to minimise the use of resources-I/Os and CPU time. By choosing the best access plans

• A query has many possible execution strategies, choosing the suitable one is called Query
Optimization
Block diagram of Query Optimizer Estimated Formula

Query Query Optimizer Join Manager


Simplifier

Statistical data

Execution Plan
Generator
Database
Cost Model Catalogue

Execution Plan

Four main input modules are

1. Relational algebra query from the query decomposer.

2. Estimated formulas used to determine the cardinality.

3. A cost model

4. Statistical data from database catalogue.

The basic issues in Query Optimization are

1. How to use available indexes.

2. How to use memory to accumulate information and perform sorting.

3. How to determine the order in which join is to be performed.

• Two techniques for implementing Query Optimization:

1. Heuristic Rule

2. Systematic estimation of the cost of different strategies.


Heuristic Query Optimization-Intro’

• Heuristics rules are applied to improve the performance of the query.

• One main heuristic rule is , Apply select before applying join or other operations. This reduces
the size of the file. The SELECT and PROJECT operation will reduce the size of the file and hence
should be applied before a JOIM or other binary operation.

• The heuristic query optimizer then transforms the initial query tree into a final query tree using
equivalence transformation rules. The final queries are efficient to execute.

• Cartesian product with selection can be replaced by a join.

• Projection operations must be included in the query tree.

Illustration of the steps involved in converting query tree during optimization

Consider the following relations on a company database,

Employee(emp-name,emp-id,dob,addr,sex,sal,dep_no)

Project(proj-name,proj-no,proj-loc,proj-deptno)

Works_on(eid,pno,hours)

Consider the query to find the names of employees born after 1970 who works on a project named
‘Growth’.

SELECT EMP-NAME

FROM EMPLOYEE,WORKS-ON,PROJECT

WHERE PROJ-NAME=’GROWTH’ AND PROJ-NO=PNO AND EID=EMP-ID AND DOB=’31-12-1970’;

∏emp_name

σproj-name=’growth’ and proj_no=p-no and eid=emp-id and dob>1970

project
X

Initial Query tree

employee Works_on
∏Emp_name

σproj_no=p-no

σeid=emp-id

σproj_name=’Growth’

σdob>1970 project

Works_on
employee

Improved Query tree by applying SELECT operation

∏emp_name

σe-id=emp-id

σproj_no=p-no σ dob>1970

X employee

σproject_name=’Growth’

Works_on
Project

Improved Query tree by applying restrictive SELECT operation


∏emp_name

eid=emp-id

∏Emp-name,emp-id

∏eid

σdob>1970

proj_no=p-no

employee

∏Proj_no

σproject_name=’Growth’ ∏ e-id,p-no

Works_on
project

Improved Query tree by moving PROJECT operations down with JOIN

Steps in Heuristic Optimization Algorithm

Step Action Description

1 Perform Selection at the earliest Use transformation rule to


break u any select
operations with
conjunctive condition into
a cascade of selec.

2. Perform commutativity of SELECT Use transformation rules


operation 2,4,6,9and move selection
operation as far down the
tree.

3. Combine the Cartesian product Use transformation rule


with subsequent select operation 13 to combine a Cartesian
and selection into a join
operation.

4. Use Commutativity and Use transformations rule


associativity of binary operations 5,11,12 to rearrange the
leaf nodes so that most
restrictive selections are
done first.

5. Perform Projection operation as Use transformation rules


early as possible 3,4,7,10,break down and
move the projection
attributes down the tree
as far as possible.

6. Computer common expressions Identify subtrees that can


once be executed by a single
algorithm

Heuristics Transformation Rules

1. Cascading of Selection: (σ)


σbranch_loc=‘chennai’ ^ σ sal>85000(EMP) =

σbranch_loc=‘chennai’(σsal>85000(EMP))

2. Commutativity of Selection (σ)

σbranch_loc=‘chennai’(σsal>85000(EMP)) =

σsal>85000 (σbranch_loc=‘chennai’(EMP))

3. Commutativity of Projection (∏)

∏ename ∏branch_loc,ename(EMP)=∏ename,branch_loc (EMP)

4. Commutativity of σ and ∏

∏ename,dob(σename=‘thomas’ (EMP) =
σename=‘thomas’(∏ename,dob(EMP))

5. Commutativity of Join and cartesian product.

R X S=SXR

R cS =S cR

6. Commutativity of selection and join or cartesian

σcR S=σc S

σc(RXS)=(σc(R)) X S

7. Commutativity of ∏, or X

∏l1U l2(R c S)=∏l1(R) c ∏l2(S)

8.Commutativity of Union & Intersection

R U S=S U R

R n S= S n R

9. Commutativity of selection and set operations.

σc(R U S)=σc(S) U σc (R)

σc(R n S)=σc(S) n σc (R)

σc(R - S)=σc(S) - σc (R)

10. Commutativity of ∏ and U

∏L(R U S)=∏L(R) U ∏L(S)

11. Associativity of Join & Cartesian

(R X S)X T=R X (SXT)

(R S) T=R (S T)

12. Associativity of Union & Intersection

(R U S)U T=S U (R U T)

13. Converting a selection & cartesian into Join σc(RXS)= R c S

COST ESTIMATION IN QUERY OPTIMIZATION


The main aim is to choose the most efficient way of implementing the relational algebra
operations at the lowest possible cost. The method of optimizing the query by choosing a strategy
with the minimum cost is called cost-based query optimization.

COST COMPONENTS OF QUERY EXECUTION

a. Access cost to secondary storage (Larger databases)

b. Storage Cost

c. Computation Cost (Smaller databases)

d. Memory Uses Cost

e. Communication Cost (Distributed databases)

To estimate the cost of the various execution strategies, the DBMS is expected to hold the following
types of information,

a. nTuples(R) - no of tuples in relation R

b. The average record size in R

c. The no. Of blocks required to store relation R, nBlocks(R)

d. The blocking factor of Relation R- bFactor(R)

e. Primary access method for each file

f. Primary access attributes of each file.

g. nLevels(I)-the no. of levels of each multi-level index I.

h. nBlocks(I)- the no. of first-level index blocks

i. nDistinct(R)- the no. of distinctive values that appear for attribute A

j. The selectivity of an attribute.

k. The selection cardinality of a attribute A in Relation R

ESTIMATED COST OF STRATEGIES FOR SELECTION OPERATION

STATEGIES Cost Estimates

Linear Search nBlocks(R)/2, if record is found

nBlocks(R), if no record satisfies

Binary Search Log2(nBlocks(R), if equity condition on key


attribute,

[log2(nBlocks(R)]+sc(R)/bFactor(R)-1,
otherwise

Using Primary key or 1, assuming no overflow


hash key to retrieve a
single record

Equity condition on nLevels(I)+1


primary key

Inequity condition on [nLevels(I)+1]+[nBlocks[R]/2]


primary key

Using inequality Nlevels(R)+[nlfBlocks(I)/2]+[nTuples(R)/2]


condition on a
secondary index (b+
tree)

Equity condition on [nLeveles(I)+1]+[SC(R)/bFactor(R)]


clustering index

Equity condition on [nLevels(I)+1]+[sc(R)]


non-clustering

Pipelining

• When a query is composed of several RA operators, the result of one operator is


pipelined to another operator without creating a temporary relation to hold the intermediate
result.

• This is called on-the-fly-processing.

• Pipelining is used to Improve the performance of the query.

• Adv: Saves the cost of creating temporary relations.

• Disadv: Input to the operations are not available all at once for processing.

Materialization
• If the output of an operator is saved in a temporary relation for processing by the next operator,
the tuples are said to be materialized.

• Process is called Materialization because the results of the intermediate operations are created
(or materialised) and then evaluated for next level operation.

• By repeating the process, the operation at the root of the tree is evaluated giving the final result
of the execution.

STRUCTURE OF QUERY EVALUATION PLAN

An evaluation plan is used to define exactly what algorithm should be used for each operation
and how the execution of the operations should be coordinated.

It may be classified into the following,

 Left-deep (join) tree query execution plan

 Right-deep (join) tree query execution plan

 Linear Tree query execution plan

 Bushy(non-linear) tree query execution plan

Left-deep tree starts from a relation and constructs the result by successively adding an operation
involving a single relation until the query is completed. That is only one input into a binary operation is
an intermediate result. It reduces the searching space and allows the query optimizer to be based on
dynamic programming technique. The disadvantage is that many alternative execution strategies are not
considered.

Right-deep tree execution has applications where there is large main memory.

Combination of left-deep and right-deep are called Linear tree. Here the relation on one side of the
operator is always a base relation. The inner relation must always be materialized.

Bushy (Non-Linear) trees are the most general type of trees. They allow both inputs into binary
operation to be the intermediate results.

This allows wide variety of plans to be considered, and the disadvantage is it increases the search space.
∏ (Sort )

Hash Join

Relation3

Pipelining Pipelining

σ (Use index 1) σ (use linear scan)

Relation1 Relation2

A possible evaluation plan

Left deep execution plan

R4

R3

R11 R2

Right-deep execution plan


R4

R3

R21 R1

Linear Tree Execution Plan

R4

R3

R21 R1
Bushy execution plan

R3 R4 R5

R21 R1

You might also like