Adbms Unit2
Definition:
• A procedure for transforming a high-level query (SQL) into a correct and efficient execution plan
in a low-level language.
• A query processor selects the appropriate execution plan to respond to the user request.
• Query processing goes through a series of query compilation steps, which produce the execution plan,
before execution begins.
1. Syntax Analyzer: parses the query and checks that it obeys the syntax rules.
Syntax Analyzer
• Takes a query, parses it into tokens, and analyses the tokens to make sure they comply with
the rules of the language grammar.
• If an error is found, the query is rejected and an error code is returned together with an explanation.
Value1: value or column name
Value2: value or column name
Op: +, -, *, /, =
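As an illustration of the check a syntax analyzer performs, here is a minimal Python sketch that splits a simple expression of the assumed form Value1 Op Value2 into tokens and rejects anything that does not fit that shape. The token patterns, the function names `tokenize` and `parse_simple_expression`, and the error messages are assumptions made for this example; a real SQL parser handles a far larger grammar.

```python
import re

# Token patterns (assumed for this sketch): numbers, quoted strings,
# column names, and the operators +, -, *, /, =.
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|'[^']*'|[A-Za-z_][A-Za-z_0-9.]*|[+\-*/=]")
OPERATORS = {"+", "-", "*", "/", "="}


def tokenize(text):
    """Split text into tokens, raising ValueError on an unknown character."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}: {text[pos]!r}")
        tokens.append(match.group(0))
        pos = match.end()
    return tokens


def parse_simple_expression(text):
    """Accept only 'Value1 Op Value2' and return the three tokens."""
    tokens = tokenize(text)
    if len(tokens) != 3:
        raise ValueError(f"expected 3 tokens, got {len(tokens)}")
    value1, op, value2 = tokens
    if op not in OPERATORS:
        raise ValueError(f"{op!r} is not a valid operator")
    if value1 in OPERATORS or value2 in OPERATORS:
        raise ValueError("operands must be values or column names")
    return value1, op, value2
```

A call such as `parse_simple_expression("sal = 40000")` returns the three tokens, while an incomplete expression such as `"sal = "` raises `ValueError`, mirroring the reject-with-explanation behaviour described above.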
2. Query Decomposition
The first phase of query processing, whose aim is to transform a high-level query into a
relational algebra query and to check that the query is syntactically and semantically correct.
1. Query Analysis
2. Query Normalization
3. Semantic Analysis
4. Query Simplifier
5. Query Restructuring
[Figure: stages of query decomposition (Query Analysis, Query Normalization, Semantic Analysis, Query Simplifier, Query Restructuring) with their supporting inputs (Equivalence Rules, Data Dictionary, Idempotency Rules, Transformation Rules).]
Query Analysis
• The query is lexically and syntactically analysed using compiler techniques (parsers) to find syntax
errors.
• A syntactically legal query is then validated against the system catalogue to check that the relations
and attributes mentioned exist.
• The type specifications of the query qualifiers are also checked at this stage.
• Ex: an invalid query, such as one that names a relation or attribute missing from the catalogue, is rejected.
• The outcome of this phase is an internal representation called query tree notation.
Ex:
• Project the project number, department number, and the manager's details for every project located in Mumbai.
The RA expression is,
∏ proj-no, dept-no, ename, addr, dob ((σ proj-loc=‘mumbai’ (PROJECT)) ⋈ dept-no=dnum (DEPARTMENT)
⋈ mgrid=empid (EMPLOYEE))
[Figure: query tree. Root: ∏ p.proj-no, p.dept-no, e.ename, e.addr, e.dob; intermediate operation nodes: X; leaf nodes: P, D, E.]
In the initial query tree notation, the Cartesian product is applied first, then the selection and join
conditions of the WHERE clause are applied, followed by the projection on the SELECT attributes.
[Figure: query tree. Root: ∏ p.proj_no, p.deptno, e.emp_name, e.addr, e.dob; join conditions d.mgrid=e.empid and p.deptno=d.deptno; selection σ proj_loc=‘mumbai’; leaf nodes include D and P.]
• Constant values from the query selection are represented as double circles, for example ‘Chennai’.
2. Query Normalization
• A set of equivalence rules is applied so that projection and selection operations are
simplified.
• Applying these rules, the predicate is converted into one of two normal forms: conjunctive normal
form (CNF) or disjunctive normal form (DNF).
• DNF is often used as it allows the query to be broken into a series of independent sub-queries
linked by unions.
Example (DNF):
(emp-design=‘Programmer’ ∧ location=‘mumbai’) ∨
(emp-sal>40000 ∧ location=‘mumbai’)
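To make the two normal forms concrete, the following Python sketch converts a predicate in CNF into DNF by distributing AND over OR. The list-of-lists representation and the function name `cnf_to_dnf` are assumptions for this example, and the atomic predicates are kept as plain strings, so the atoms themselves are not simplified.

```python
from itertools import product


def cnf_to_dnf(cnf):
    """Convert a CNF predicate to DNF.

    cnf is a list of clauses; each clause is a list of atoms joined by OR,
    and the clauses are joined by AND. The result is a list of conjunctions
    (each a tuple of atoms joined by AND) that are joined by OR.
    """
    dnf = []
    for choice in product(*cnf):
        # Drop repeated atoms inside one conjunction (A AND A = A),
        # keeping the original order.
        conjunction = tuple(dict.fromkeys(choice))
        if conjunction not in dnf:
            dnf.append(conjunction)
    return dnf


# (emp-design = 'Programmer' OR emp-sal > 40000) AND location = 'mumbai'
cnf = [
    ["emp-design = 'Programmer'", "emp-sal > 40000"],
    ["location = 'mumbai'"],
]
dnf = cnf_to_dnf(cnf)
```

Each tuple in `dnf` is one conjunction of the DNF form, which could be evaluated as an independent sub-query and combined with the others by union.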
Unary op1 Unary op2 REL ⇔ Unary op2 Unary op1 REL
REL1 binop (REL2 binop REL3) ⇔ (REL1 binop REL2) binop REL3
• Rule 5: Factorisation
3. Semantic Analysis
• Goal: reduce the number of predicates that must be evaluated by refuting incorrect or
contradictory queries.
• This phase rejects normalized queries that are incorrectly formulated or contradictory.
• A query is incorrectly formulated if components of the query do not contribute to the result, for
example when a join condition is missing.
• A query is contradictory if its predicate cannot be satisfied by any tuple in the relation.
• The semantic analyser also examines the query to make sure that only data objects defined in the
catalogue are used.
Connection Graphs
• Connection graphs can be constructed to check the correctness and contradictions as follows.
Ex:1
Select (p.projno,p.projloc)
• The Query graph representation is not fully connected which means the query is not correctly
formulated. In this graph the join condition v.proj_no=p.proj_no has been omitted.
[Figure: query graph with nodes D, V, P and Result.]
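The connectivity test on a query graph can be sketched in Python as a graph traversal. The node names follow the figure labels (P, V, D and Result); the edge between V and D is a hypothetical join used only for illustration, since the full query is not reproduced here, and the function name `is_connected` is likewise an assumption.

```python
from collections import deque


def is_connected(nodes, edges):
    """Return True if every node is reachable from the first node."""
    nodes = list(nodes)
    if not nodes:
        return True
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return len(seen) == len(nodes)


nodes = ["P", "V", "D", "Result"]
# P feeds the projection (Result); the V-D edge is an assumed join.
edges_without_join = [("P", "Result"), ("V", "D")]
# Adding the omitted join condition v.proj_no = p.proj_no links V and P.
edges_with_join = edges_without_join + [("V", "P")]
```

With the join omitted the graph falls into two components, so `is_connected` returns `False`; adding the V-P edge makes it connected.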
• Example 2
Select (p.projno,p.loc)
from project as p, cost_of_project as c, depart as d
where d.max_budget>85000 and d.compl_year=‘2005’ and d.max_budget<50000
This graph has a cycle between the nodes 0 and max_budget with a negative sum.
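[Figure: attribute connection graph for Example 2 with nodes 0 and max_budget; edge labels in the extracted figure: 50000, 0, 2500, -2500.]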
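The negative-cycle test can be sketched with Bellman-Ford relaxation. In this Python sketch each range predicate becomes a weighted edge between the attribute and a node 0: `attr <= c` becomes an edge from 0 to attr with weight c, and `attr >= c` becomes an edge from attr to 0 with weight -c. Treating the strict comparisons of Example 2 as non-strict is a simplification, and the function names are assumptions for this example.

```python
def build_edges(predicates):
    """Turn (attribute, op, constant) predicates into weighted edges."""
    edges = []
    for attr, op, const in predicates:
        if op == "<=":
            edges.append(("0", attr, const))    # attr - 0 <= const
        elif op == ">=":
            edges.append((attr, "0", -const))   # 0 - attr <= -const
        else:
            raise ValueError(f"unsupported operator: {op}")
    return edges


def is_contradictory(predicates):
    """Return True if the constraint graph contains a negative cycle."""
    edges = build_edges(predicates)
    nodes = {"0"} | {attr for attr, _, _ in predicates}
    # Starting every distance at 0 acts like a virtual source node.
    dist = {n: 0 for n in nodes}
    for _ in range(len(nodes) - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # If an edge can still be relaxed, a negative cycle exists.
    return any(dist[u] + w < dist[v] for u, v, w in edges)


# max_budget > 85000 and max_budget < 50000 (from Example 2)
example_2 = [("max_budget", ">=", 85000), ("max_budget", "<=", 50000)]
# A satisfiable range, for comparison.
satisfiable = [("max_budget", ">=", 50000), ("max_budget", "<=", 85000)]
```

For `example_2` the cycle between 0 and max_budget has weight 50000 - 85000 = -35000, so the predicate is reported as contradictory; the satisfiable range has a positive cycle and passes.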
4. Query Simplifier
• Goal: detect redundancy, eliminate common sub-expressions, and transform the query into a
semantically equivalent but more easily computed form.
• Integrity constraints, view definitions and access restrictions are considered here, and the query
is rejected if it is contradictory.
• The final form of simplification is obtained using the idempotence rules of Boolean algebra.
Idempotence Rules of Boolean Algebra
Select d.dept_id, m.branch_mgr, m.branch_id, b.branch_id, b.branch_loc, e.empname, e.sal
Where d.dept-id=m.dept-id
And m.branch-id=b.branch-id
And b.branch_loc=‘mumbai’
And e.empsal>85000
And not(b.branch_loc=‘Delhi’)
And d.dept_loc=‘Bangalore’
And b.branch_loc=‘mumbai’
PRED1: b.branch_loc=‘mumbai’
PRED2: b.branch_loc=‘mumbai’
PRED3: b.branch_loc=‘delhi’
NOT(PRED3): not(b.branch_loc=‘delhi’)
PRED1 and PRED2 are identical, so PRED1 AND PRED2 reduces to PRED1 by idempotence. Thus the redundant
predicates in the original query can be eliminated, and the query is now simplified.
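The idempotence rule P AND P = P can be sketched in Python as removing repeated conjuncts from the WHERE clause. Predicates are compared as strings with whitespace ignored, which is an assumption of this sketch; it does not detect that b.branch_loc = 'mumbai' already implies NOT(b.branch_loc = 'Delhi').

```python
def normalize(predicate):
    """Remove whitespace so spacing variants of a predicate compare equal."""
    return "".join(predicate.split())


def remove_duplicate_conjuncts(conjuncts):
    """Apply P AND P = P: keep the first occurrence of each predicate."""
    seen = set()
    result = []
    for predicate in conjuncts:
        key = normalize(predicate)
        if key not in seen:
            seen.add(key)
            result.append(predicate)
    return result


where_clause = [
    "d.dept-id = m.dept-id",
    "m.branch-id = b.branch-id",
    "b.branch_loc = 'mumbai'",
    "e.empsal > 85000",
    "not(b.branch_loc = 'Delhi')",
    "d.dept_loc = 'Bangalore'",
    "b.branch_loc='mumbai'",
]
simplified = remove_duplicate_conjuncts(where_clause)
```

The seven conjuncts shrink to six because the repeated test on b.branch_loc is kept only once.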
5. Query Restructuring
The query can be restructured to give a more efficient implementation using transformation
rules. The query can be considered as a relational algebra program.
3. Query Optimization
• Attempts to minimise the use of resources (I/O and CPU time) by choosing the best access plan.
• A query has many possible execution strategies; choosing a suitable one is called query
optimization.
[Figure: block diagram of the query optimizer, showing the Execution Plan Generator, the Cost Model with its estimated formulas, statistical data, the database catalogue, and the resulting execution plan.]
3. A cost model
1. Heuristic Rule
• One main heuristic rule is to apply SELECT before JOIN or other binary operations. The SELECT and
PROJECT operations reduce the size of a file and hence should be applied before a JOIN or other
binary operation.
• The heuristic query optimizer then transforms the initial query tree into a final query tree using
equivalence transformation rules. The final query tree is efficient to execute.
Employee(emp-name,emp-id,dob,addr,sex,sal,dep_no)
Project(proj-name,proj-no,proj-loc,proj-deptno)
Works_on(eid,pno,hours)
Consider the query to find the names of employees born after 1970 who work on a project named
‘Growth’.
SELECT EMP-NAME
FROM EMPLOYEE,WORKS-ON,PROJECT
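[Figure: query tree 1 (initial). Root ∏ emp_name over the Cartesian product (X) of employee, works_on and project.]
[Figure: query tree 2. Operators: ∏ emp_name, σ proj_no=p-no, σ eid=emp-id, σ proj_name=‘Growth’, σ dob>1970; leaves: project, works_on, employee.]
[Figure: query tree 3. Operators: ∏ emp_name, σ e-id=emp-id, σ proj_no=p-no, σ dob>1970, X, σ project_name=‘Growth’; leaves: employee, works_on, project.]
[Figure: query tree 4 (final). Operators: join on eid=emp-id, ∏ emp-name,emp-id, ∏ eid, σ dob>1970, join on proj_no=p-no, ∏ proj_no, σ project_name=‘Growth’, ∏ e-id,p-no; leaves: employee, works_on, project.]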
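The effect of applying selections before the join can be illustrated with a small Python sketch over the Employee, Works_on and Project relations. The sample tuples are invented for the example, dob is reduced to a year of birth, and `naive_plan` and `pushed_down_plan` are hypothetical names; the point is only that both plans return the same names while the second builds far fewer intermediate tuples.

```python
from itertools import product

employee = [
    {"emp_name": "Asha", "emp_id": 1, "dob": 1975},
    {"emp_name": "Ravi", "emp_id": 2, "dob": 1965},
    {"emp_name": "Meena", "emp_id": 3, "dob": 1982},
]
works_on = [
    {"eid": 1, "pno": 10},
    {"eid": 2, "pno": 10},
    {"eid": 3, "pno": 20},
]
project = [
    {"proj_name": "Growth", "proj_no": 10},
    {"proj_name": "Audit", "proj_no": 20},
]


def naive_plan():
    """Cartesian product first, then all selections, then projection."""
    intermediate = list(product(employee, works_on, project))
    names = [
        e["emp_name"]
        for e, w, p in intermediate
        if p["proj_name"] == "Growth"
        and p["proj_no"] == w["pno"]
        and w["eid"] == e["emp_id"]
        and e["dob"] > 1970
    ]
    return names, len(intermediate)


def pushed_down_plan():
    """Selections on single relations first, then the joins."""
    young = [e for e in employee if e["dob"] > 1970]
    growth = [p for p in project if p["proj_name"] == "Growth"]
    # Join works_on with the filtered project, then with the filtered employee.
    wp = [(w, p) for w in works_on for p in growth if p["proj_no"] == w["pno"]]
    ewp = [(e, w, p) for e in young for (w, p) in wp if w["eid"] == e["emp_id"]]
    names = [e["emp_name"] for e, _, _ in ewp]
    return names, len(wp) + len(ewp)
```

Each function returns the result together with the number of intermediate tuples it built, so the two plans can be compared directly on the same data.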
σ branch_loc=‘chennai’ ∧ sal>85000 (EMP) = σ branch_loc=‘chennai’ (σ sal>85000 (EMP))
σ branch_loc=‘chennai’ (σ sal>85000 (EMP)) =
σ sal>85000 (σ branch_loc=‘chennai’ (EMP))
4. Commutativity of σ and ∏
∏ ename,dob (σ ename=‘thomas’ (EMP)) =
σ ename=‘thomas’ (∏ ename,dob (EMP))
R X S = S X R
R ⋈c S = S ⋈c R
σc(R ⋈ S) = (σc(R)) ⋈ S
σc(R X S) = (σc(R)) X S
(the last two hold when c involves only attributes of R)
7. Commutativity of ∏ and ⋈ (or X)
R ∪ S = S ∪ R
R ∩ S = S ∩ R
(R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)
(R ∪ S) ∪ T = S ∪ (R ∪ T)
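Two of these rules can be checked on small relations with a Python sketch: commutativity of σ, and moving a selection below a Cartesian product when its condition uses only attributes of one operand. Relations are modelled as lists of dictionaries and compared as sets of tuples; the sample data, the second relation BRANCH, and the helper names are assumptions for this example.

```python
def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def cartesian(r, s):
    """Cartesian product: combine every row of r with every row of s."""
    return [{**a, **b} for a in r for b in s]


def as_set(relation):
    """Order-independent view of a relation, for equality checks."""
    return {tuple(sorted(row.items())) for row in relation}


EMP = [
    {"ename": "thomas", "sal": 90000, "branch_loc": "chennai"},
    {"ename": "anita", "sal": 70000, "branch_loc": "chennai"},
    {"ename": "kiran", "sal": 95000, "branch_loc": "mumbai"},
]
BRANCH = [{"bname": "north"}, {"bname": "south"}]


def in_chennai(row):
    return row["branch_loc"] == "chennai"


def high_salary(row):
    return row["sal"] > 85000


# Commutativity of selection: the order of two selections does not matter.
lhs = select(select(EMP, high_salary), in_chennai)
rhs = select(select(EMP, in_chennai), high_salary)

# Selection over a product equals the product of the selected operand,
# because high_salary uses only attributes of EMP.
late = select(cartesian(EMP, BRANCH), high_salary)
early = cartesian(select(EMP, high_salary), BRANCH)
```

Both pairs produce the same relation, but `early` filters three rows before building the product, whereas `late` filters six rows after it.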
b. Storage Cost
To estimate the cost of the various execution strategies, the DBMS is expected to hold the following
types of information,
[log2(nBlocks(R))] + [sc(R)/bFactor(R)] - 1, otherwise
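As a worked instance of the expression above, this Python sketch evaluates it for made-up statistics. The square brackets are read here as rounding up (ceiling), which is an assumption about the notation; sc(R) is taken to be the selection cardinality and bFactor(R) the blocking factor, and the sample values and the function name are also assumptions.

```python
import math


def search_cost_blocks(n_blocks, selection_cardinality, blocking_factor):
    """Evaluate [log2(nBlocks(R))] + [sc(R)/bFactor(R)] - 1 with ceilings."""
    locate_first = math.ceil(math.log2(n_blocks))
    matching_blocks = math.ceil(selection_cardinality / blocking_factor)
    return locate_first + matching_blocks - 1


# Assumed statistics: 100 blocks, 50 matching tuples, 30 tuples per block.
cost = search_cost_blocks(100, 50, 30)
```

With these values the first term is 7, the second is 2, and the estimate is 7 + 2 - 1 = 8 block accesses.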
Pipelining
• Disadvantage: the inputs to an operation are not all available at once for processing.
Materialization
• If the output of an operator is saved in a temporary relation for processing by the next operator,
the tuples are said to be materialized.
• The process is called materialization because the results of the intermediate operations are created
(materialized) and then used as input to the next-level operation.
• By repeating the process, the operation at the root of the tree is evaluated, giving the final result
of the execution.
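The difference between the two approaches can be sketched in Python: materialization stores each intermediate result in a list before the next operator runs, while pipelining passes tuples to the next operator one at a time through generators. The operators, sample data, and function names are assumptions for this example.

```python
rows = [
    {"ename": "thomas", "sal": 90000},
    {"ename": "anita", "sal": 70000},
    {"ename": "kiran", "sal": 95000},
]


def materialized_plan(relation):
    """Each operator writes a full temporary relation before the next runs."""
    temp = [row for row in relation if row["sal"] > 85000]   # selection
    result = [row["ename"] for row in temp]                  # projection
    return result


def select_stream(relation):
    """Selection as a generator: yields each qualifying tuple immediately."""
    for row in relation:
        if row["sal"] > 85000:
            yield row


def project_stream(stream):
    """Projection as a generator: consumes tuples as they arrive."""
    for row in stream:
        yield row["ename"]


def pipelined_plan(relation):
    """No temporary relation is built between the two operators."""
    return list(project_stream(select_stream(relation)))
```

Both plans return the same names; the pipelined version simply never holds the intermediate selection result in memory as a whole.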
An evaluation plan is used to define exactly what algorithm should be used for each operation
and how the execution of the operations should be coordinated.
A left-deep tree starts from a relation and constructs the result by successively adding an operation
involving a single relation until the query is completed. That is, only one input into a binary operation is
an intermediate result. This reduces the search space and allows the query optimizer to be based on
dynamic programming techniques. The disadvantage is that many alternative execution strategies are not
considered.
Right-deep tree execution has applications where there is a large main memory.
Left-deep and right-deep trees are together called linear trees. Here the relation on one side of each
operator is always a base relation. The inner relation must always be materialized.
Bushy (non-linear) trees are the most general type of tree. They allow both inputs into a binary
operation to be intermediate results.
This allows a wide variety of plans to be considered; the disadvantage is that it increases the search space.
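The trade-off in search space can be made concrete by counting plans. This Python sketch counts join trees for n relations, assuming every pair of relations can be joined and that swapping the two inputs of a join counts as a different plan: left-deep trees correspond to the n! orderings of the relations, while all binary tree shapes with ordered leaves (the bushy space, which includes the linear trees) number (2(n-1))!/(n-1)!. The function names are assumptions for this example.

```python
from math import factorial


def left_deep_count(n):
    """Number of left-deep join trees over n relations: n!."""
    return factorial(n)


def bushy_count(n):
    """Number of binary join trees of any shape over n relations."""
    return factorial(2 * (n - 1)) // factorial(n - 1)


# Map n -> (left-deep trees, trees of any shape) for n = 2..6.
counts = {n: (left_deep_count(n), bushy_count(n)) for n in range(2, 7)}
```

For five relations this gives 120 left-deep trees against 1680 trees of any shape, which illustrates why restricting the optimizer to left-deep trees shrinks the search space.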
[Figure: example evaluation plan with the components ∏ (sort), hash join, pipelining, Relation1, Relation2 and Relation3.]
[Figures: join-tree shapes over relations R1 to R5, ending with a bushy execution plan.]