Module - 4
Query Processing and
Query Optimization
Contents to be covered:
Steps in Query Processing,
Transforming SQL queries to Relational Algebra,
Heuristic Query Optimization
Query Optimization
• Query Optimization in DBMS is the process of
selecting the most efficient way to execute a
SQL statement, so that the correct rows are
returned.
• Because SQL is a nonprocedural language, the
optimizer can merge, restructure, and process
data in any sequence.
• Another important role of the optimizer is to produce an
“Execution plan” that is efficient.
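As a rough illustration of what an execution plan looks like in practice, the sketch below asks SQLite (chosen here only because it ships with Python; the slides do not name a DBMS) to describe how it would run a query. The `employee` table, its index, and the helper `execution_plan` are assumptions made for this example.

```python
import sqlite3

# In-memory database with a hypothetical table and index (not from the slides).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (ssn TEXT PRIMARY KEY, lname TEXT, bdate TEXT);
    CREATE INDEX idx_employee_bdate ON employee (bdate);
""")

def execution_plan(sql):
    """Return the human-readable 'detail' column of SQLite's plan rows."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]

plan = execution_plan("SELECT lname FROM employee WHERE bdate > '1957-12-31'")
for step in plan:
    print(step)
```

The exact wording of each plan step depends on the SQLite version, so the sketch prints whatever the optimizer reports rather than assuming a particular output.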
Query Processing
• It refers to the range of activities
involved in extracting data from a
database.
• Query Processing is a translation of
“high-level queries” into “low-
level expression”.
• It is a stepwise process.
• It can be used at the physical level of
the file system.
• It requires the basic concepts of
The basic steps involved in processing a
query are
[Figure: steps in processing a query. Surviving labels: "It is the form of query tree", "Low-level language query", "1. Best evaluation of query", "2. Consists of DBMS catalog".]
Steps In Processing
High-Level Query
S. No. | Category | Selection | Projection
1. | Other names | The selection operation is also known as horizontal partitioning. | The project operation is also known as vertical partitioning.
The relational algebra (R.A.) expression is converted into a query tree.
Two main Techniques for Query
Optimization
Heuristic Rules
Rules for ordering the operations in
query optimization.
Systematic estimation
It estimates the cost of different execution strategies and chooses the execution plan
with the lowest estimated cost.
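A minimal sketch of cost-based selection, assuming a toy cost model: the cost of a left-deep join order is the sum of the estimated sizes of its intermediate results. The row counts, selectivities, and the names `ROWS`, `SELECTIVITY`, `plan_cost`, and `cheapest_plan` are hypothetical; a real optimizer reads such statistics from the DBMS catalog and uses far richer cost formulas.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical catalog statistics: number of rows per relation.
ROWS = {"EMPLOYEE": 10_000, "WORKS_ON": 50_000, "PROJECT": 100}

# Hypothetical selectivity of each join condition; pairs without a
# join condition behave like a Cartesian product (selectivity 1).
SELECTIVITY = {
    frozenset({"EMPLOYEE", "WORKS_ON"}): Fraction(1, 10_000),
    frozenset({"WORKS_ON", "PROJECT"}): Fraction(1, 100),
}

def plan_cost(order):
    """Sum of estimated intermediate result sizes for a left-deep join order."""
    joined = [order[0]]
    size = Fraction(ROWS[order[0]])
    cost = Fraction(0)
    for rel in order[1:]:
        selectivity = Fraction(1)
        for prev in joined:
            selectivity *= SELECTIVITY.get(frozenset({prev, rel}), 1)
        size = size * ROWS[rel] * selectivity
        cost += size
        joined.append(rel)
    return cost

def cheapest_plan():
    """Enumerate every join order and keep the one with the lowest cost."""
    return min(permutations(ROWS), key=plan_cost)

best = cheapest_plan()
print(best, plan_cost(best))
```

Under these assumed numbers, any order that starts by pairing EMPLOYEE with PROJECT is penalised, because those two relations share no join condition and the first step degenerates into a Cartesian product.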
QUERY DATA STRUCTURE
• Before optimizing the query it is represented in an internal or
intermediate form.
General Transformation Rules for Relational
Algebra Operations:
1. Cascade of σ:
2. Commutativity of σ:
3. Cascade of ∏:
In a cascade (sequence) of ∏ operations, all but
the last one (the outermost) can be ignored:
∏_{List1}(∏_{List2}(...(∏_{Listn}(R))...)) = ∏_{List1}(R)
4. Commuting σ with ∏:
Rule 1: Cascade of σ
This rule states the deconstruction of the conjunctive selection operations into a sequence of
individual selections. Such a transformation is known as a cascade of σ.
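In symbols, the rule reads σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E)). The sketch below checks this on a small in-memory relation; the sample rows and the helper `select` are illustrative assumptions, with a relation modelled as a list of dictionaries.

```python
def select(predicate, relation):
    """σ_predicate(relation): keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

# Hypothetical sample rows for illustration.
EMPLOYEE = [
    {"LNAME": "Smith", "BDATE": "1965-01-09", "DNO": 5},
    {"LNAME": "Wong", "BDATE": "1955-12-08", "DNO": 5},
    {"LNAME": "Zelaya", "BDATE": "1968-01-19", "DNO": 4},
]

def born_after_1957(row):
    return row["BDATE"] > "1957-12-31"

def in_dept_5(row):
    return row["DNO"] == 5

# One selection with a conjunctive condition ...
conjunctive = select(lambda r: born_after_1957(r) and in_dept_5(r), EMPLOYEE)
# ... versus a cascade of two individual selections.
cascaded = select(born_after_1957, select(in_dept_5, EMPLOYEE))

print(conjunctive == cascaded)
```

Splitting a conjunctive condition this way gives the optimizer the freedom to move each individual selection down a different branch of the query tree.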
Commutativity of the theta join holds only when the order of attributes in the result is
disregarded. Natural join is a special case of theta join, so natural join is also commutative.
Rule 3: Cascade of ∏
This rule states that we only need the final operations in the sequence of the projection operations, and other
operations are omitted. Such a transformation is referred to as a cascade of ∏.
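In symbols, ∏_{L1}(∏_{L2}(E)) = ∏_{L1}(E) whenever L1 is contained in L2. The sketch below checks this on hypothetical rows; `project` is an illustrative helper that eliminates duplicates, as relational projection does.

```python
def project(attrs, relation):
    """∏_attrs(relation): keep only attrs and eliminate duplicate rows."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result

# Hypothetical sample rows for illustration.
EMPLOYEE = [
    {"LNAME": "Smith", "DNO": 5, "SALARY": 30000},
    {"LNAME": "Smith", "DNO": 4, "SALARY": 25000},
    {"LNAME": "Wong", "DNO": 5, "SALARY": 40000},
]

# Cascade: project onto (LNAME, DNO) first, then onto LNAME.
cascaded = project(["LNAME"], project(["LNAME", "DNO"], EMPLOYEE))
# Only the final projection is actually needed.
direct = project(["LNAME"], EMPLOYEE)

print(cascaded == direct)
```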
Rule 4: We can combine the selections with Cartesian products as well as theta joins
For theta-join associativity, θ2 must involve attributes from E2 and E3 only. Any of the conditions may be
empty, from which it follows that the Cartesian product is also associative.
Rule 6: Distribution of the Selection operation over the Theta join.
The selection operation distributes over the theta-join operation under the following two conditions:
a) When all attributes in the selection condition θ0 belong to only one of the expressions
being joined.
b) When the selection condition θ1 involves the attributes of E1 only, and θ2 involves the attributes of E2
only.
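Case (a) can be written as σ_{θ0}(E1 ⋈_θ E2) = (σ_{θ0}(E1)) ⋈_θ E2 when θ0 refers to attributes of E1 only. The sketch below checks this on hypothetical PROJECT and DEPARTMENT rows using a naive nested-loop join; the helpers `select` and `theta_join` are assumptions made for this example.

```python
def select(predicate, relation):
    """σ_predicate(relation)."""
    return [row for row in relation if predicate(row)]

def theta_join(left, right, theta):
    """Naive nested-loop theta join; a combined row is kept if theta holds."""
    return [{**l, **r} for l in left for r in right if theta({**l, **r})]

# Hypothetical sample rows for illustration.
PROJECT = [
    {"PNUMBER": 1, "PLOCATION": "BELLAIRE", "DNUM": 5},
    {"PNUMBER": 10, "PLOCATION": "STAFFORD", "DNUM": 4},
    {"PNUMBER": 30, "PLOCATION": "STAFFORD", "DNUM": 4},
]
DEPARTMENT = [
    {"DNUMBER": 5, "DNAME": "Research"},
    {"DNUMBER": 4, "DNAME": "Administration"},
]

def theta0(row):
    # Selection condition: uses attributes of PROJECT only.
    return row["PLOCATION"] == "STAFFORD"

def theta(row):
    # Join condition.
    return row["DNUM"] == row["DNUMBER"]

# Selection applied after the join ...
select_late = select(theta0, theta_join(PROJECT, DEPARTMENT, theta))
# ... versus selection pushed below the join.
select_early = theta_join(select(theta0, PROJECT), DEPARTMENT, theta)

# Row pairs the nested-loop join has to examine in each case.
pairs_late = len(PROJECT) * len(DEPARTMENT)
pairs_early = len(select(theta0, PROJECT)) * len(DEPARTMENT)

print(select_late == select_early, pairs_late, pairs_early)
```

Both forms return the same rows, but the pushed-down form feeds fewer rows into the join, which is why heuristic optimizers apply selections as early as possible.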
The projection operation distributes over the theta-join operation under the following two conditions:
a) Assume that the join condition θ involves only attributes in L1 ∪ L2, where L1 and L2 are attributes of
E1 and E2 respectively. Then, we get the following expression:
∏_{L1 ∪ L2}(E1 ⋈_θ E2) = (∏_{L1}(E1)) ⋈_θ (∏_{L2}(E2))
b) Assume a join E1 ⋈_θ E2, where L1 and L2 are sets of attributes of E1 and E2 respectively. Let L3 be
the attributes of E1 that are involved in the join condition θ but are not in L1 ∪ L2. Similarly, let L4 be
the attributes of E2 that are involved in the join condition θ but are not in L1 ∪ L2. Thus, we get the
following expression:
∏_{L1 ∪ L2}(E1 ⋈_θ E2) = ∏_{L1 ∪ L2}((∏_{L1 ∪ L3}(E1)) ⋈_θ (∏_{L2 ∪ L4}(E2)))
The set operations union and intersection are commutative:
E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1
Rule 10: Distribution of the selection operation over the intersection, union, and set difference operations.
The expression below shows the distribution over the set difference operation:
σ_P(E1 − E2) = σ_P(E1) − σ_P(E2)
The selection operation distributes over ∪ and ∩ in the same way, with − replaced by ∪ or ∩. Further, we get:
σ_P(E1 − E2) = σ_P(E1) − E2
Rule 11: Distribution of the projection operation over the union operation.
This rule states that we can distribute the projection operation on the union operation for the given
expressions.
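In symbols, ∏_L(E1 ∪ E2) = (∏_L(E1)) ∪ (∏_L(E2)). A small check under stated assumptions: rows are dictionaries, `project` is an illustrative helper that returns a set of tuples, and list concatenation stands in for E1 ∪ E2 because the projection removes duplicates anyway. The sample rows are hypothetical.

```python
def project(attrs, relation):
    """∏_attrs(relation) as a set of tuples, so duplicates disappear."""
    return {tuple(row[a] for a in attrs) for row in relation}

# Hypothetical sample rows for illustration; both relations share a schema.
E1 = [{"LNAME": "Smith", "DNO": 5}, {"LNAME": "Wong", "DNO": 5}]
E2 = [{"LNAME": "Wong", "DNO": 5}, {"LNAME": "Zelaya", "DNO": 4}]
L = ["LNAME"]

# Projection applied after the union ...
project_after_union = project(L, E1 + E2)
# ... versus projection distributed over the union.
union_of_projections = project(L, E1) | project(L, E2)

print(project_after_union == union_of_projections)
```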
Apart from these discussed equivalence rules, there are various other equivalence rules also.
Query Graph
• Nodes represent relations.
• Ovals represent constant nodes.
• Edges represent join and selection conditions.
• The attributes to be retrieved from each relation are shown
in square brackets.
• Drawback: it does not indicate the order in which
operations are performed.
There is only a single graph corresponding to each query.
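One possible in-memory representation of a query graph, sketched for query Q2 of this module (PROJECT, DEPARTMENT, and EMPLOYEE under the aliases P, D, and E). The class `QueryGraph`, its method names, and the helper `build` are hypothetical. Conditions are stored in unordered sets, which mirrors the drawback noted above: the graph records what must hold, not the order of operations.

```python
from dataclasses import dataclass, field

@dataclass
class QueryGraph:
    relations: dict = field(default_factory=dict)  # relation node -> attributes to retrieve
    constants: set = field(default_factory=set)    # constant nodes
    edges: set = field(default_factory=set)        # (pair of nodes, condition)

    def add_relation(self, name, retrieve=()):
        self.relations[name] = tuple(retrieve)

    def add_join(self, left, right, condition):
        self.edges.add((frozenset({left, right}), condition))

    def add_selection(self, relation, constant, condition):
        self.constants.add(constant)
        self.edges.add((frozenset({relation, constant}), condition))

def build(steps):
    """Apply (method name, arguments) pairs to a fresh graph."""
    graph = QueryGraph()
    for method, args in steps:
        getattr(graph, method)(*args)
    return graph

steps = [
    ("add_relation", ("P", ["PNUMBER", "DNUM"])),
    ("add_relation", ("D",)),
    ("add_relation", ("E", ["LNAME", "ADDRESS", "BDATE"])),
    ("add_selection", ("P", "'STAFFORD'", "P.PLOCATION='STAFFORD'")),
    ("add_join", ("P", "D", "P.DNUM=D.DNUMBER")),
    ("add_join", ("D", "E", "D.MGRSSN=E.SSN")),
]

graph = build(steps)
same_graph = build(reversed(steps))  # same conditions, added in reverse order

print(graph == same_graph)
```

Building the graph with the steps reversed yields an equal object, which illustrates why a single graph corresponds to each query.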
Process for heuristic optimization
SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME = 'AQUARIUS' AND
PNUMBER = PNO AND
ESSN = SSN AND
BDATE > '1957-12-31';
a. Initial (canonical) query tree for SQL query.
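The canonical tree applies the Cartesian products first, then one selection with the whole WHERE condition, then the projection. The sketch below evaluates that form and a heuristically improved form (selections first, products replaced by joins) on hypothetical sample rows, to show that both return the same answer while the improved form builds much smaller intermediate results. All helper names and data are assumptions made for this example.

```python
from itertools import product as row_pairs

def select(pred, rel):
    return [r for r in rel if pred(r)]

def project(attrs, rel):
    out = []
    for r in rel:
        t = {a: r[a] for a in attrs}
        if t not in out:  # eliminate duplicates
            out.append(t)
    return out

def cross(r, s):
    return [{**a, **b} for a, b in row_pairs(r, s)]

def join(r, s, pred):
    return [row for row in cross(r, s) if pred(row)]

# Hypothetical sample rows for illustration.
EMPLOYEE = [
    {"SSN": "1", "LNAME": "Smith", "BDATE": "1965-01-09"},
    {"SSN": "2", "LNAME": "Wong", "BDATE": "1955-12-08"},
    {"SSN": "3", "LNAME": "English", "BDATE": "1972-07-31"},
]
WORKS_ON = [{"ESSN": "1", "PNO": 10}, {"ESSN": "2", "PNO": 10}, {"ESSN": "3", "PNO": 20}]
PROJECT = [{"PNUMBER": 10, "PNAME": "AQUARIUS"}, {"PNUMBER": 20, "PNAME": "GEMINI"}]

# Canonical form: products, then one big selection, then the projection.
cartesian = cross(cross(EMPLOYEE, WORKS_ON), PROJECT)
canonical = project(["LNAME"], select(
    lambda r: r["PNAME"] == "AQUARIUS" and r["PNUMBER"] == r["PNO"]
    and r["ESSN"] == r["SSN"] and r["BDATE"] > "1957-12-31",
    cartesian))

# Improved form: selections pushed down, products turned into joins.
aquarius = select(lambda r: r["PNAME"] == "AQUARIUS", PROJECT)
born_later = select(lambda r: r["BDATE"] > "1957-12-31", EMPLOYEE)
on_project = join(aquarius, WORKS_ON, lambda r: r["PNUMBER"] == r["PNO"])
improved = project(["LNAME"], join(on_project, born_later,
                                   lambda r: r["ESSN"] == r["SSN"]))

print(canonical == improved, len(cartesian), len(on_project))
```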
Relational algebra:
∏_{PNUMBER, DNUM, LNAME, ADDRESS, BDATE}(((σ_{PLOCATION='STAFFORD'}(PROJECT))
⋈_{DNUM=DNUMBER} (DEPARTMENT)) ⋈_{MGRSSN=SSN} (EMPLOYEE))
SQL query:
Q2: SELECT P.PNUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE
FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E
WHERE P.DNUM=D.DNUMBER AND D.MGRSSN=E.SSN AND
P.PLOCATION='STAFFORD';
The same query could correspond to many different relational
algebra expressions, and hence many different query
trees.
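To make this concrete, the sketch below runs Q2 in SQLite in two forms: the comma-style FROM list with all conditions in WHERE, and a form that filters PROJECT first and then joins. The table definitions and sample rows are hypothetical (the slides do not give a schema), and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows, kept to the columns Q2 needs.
conn.executescript("""
    CREATE TABLE PROJECT (PNUMBER INTEGER, PNAME TEXT, PLOCATION TEXT, DNUM INTEGER);
    CREATE TABLE DEPARTMENT (DNUMBER INTEGER, DNAME TEXT, MGRSSN TEXT);
    CREATE TABLE EMPLOYEE (SSN TEXT, LNAME TEXT, ADDRESS TEXT, BDATE TEXT);
    INSERT INTO PROJECT VALUES (10, 'Computerization', 'STAFFORD', 4),
                               (30, 'Newbenefits', 'STAFFORD', 4),
                               (1, 'ProductX', 'BELLAIRE', 5);
    INSERT INTO DEPARTMENT VALUES (4, 'Administration', '987'),
                                  (5, 'Research', '333');
    INSERT INTO EMPLOYEE VALUES ('987', 'Wallace', '291 Berry', '1941-06-20'),
                                ('333', 'Wong', '638 Voss', '1955-12-08');
""")

# Form 1: comma-separated FROM list, all conditions in WHERE.
q_where = """
    SELECT P.PNUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE
    FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E
    WHERE P.DNUM = D.DNUMBER AND D.MGRSSN = E.SSN
      AND P.PLOCATION = 'STAFFORD'
    ORDER BY P.PNUMBER
"""

# Form 2: select on PROJECT first, then join (mirrors the algebra expression).
q_pushed = """
    SELECT P.PNUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE
    FROM (SELECT * FROM PROJECT WHERE PLOCATION = 'STAFFORD') AS P
    JOIN DEPARTMENT AS D ON P.DNUM = D.DNUMBER
    JOIN EMPLOYEE AS E ON D.MGRSSN = E.SSN
    ORDER BY P.PNUMBER
"""

rows_where = conn.execute(q_where).fetchall()
rows_pushed = conn.execute(q_pushed).fetchall()
print(rows_where == rows_pushed)
```

Both statements describe the same result; which internal tree the DBMS actually executes is left to its optimizer.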