DBMS Chapter 7

The document discusses query optimization in database systems. It describes the basic steps in query processing as parsing and translation, optimization, and evaluation. The optimization step chooses the most efficient execution plan from semantically equivalent options. Query cost estimation is used to select the lowest-cost plan based on database statistics. Relational algebra transformation rules are applied to generate equivalent expressions to optimize queries. Operator trees are used to represent relational algebra expressions graphically.

Uploaded by

Nabin Shrestha

Chapter 7

Query Optimization

Query Processing
Query processing refers to the range of activities involved in extracting data from a database.
The basic steps involved in processing a query are:
1. Parsing and translation
2. Optimization
3. Evaluation

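Fig: Steps in Query Processing
(Query -> Parser and Translator -> Relational Algebra -> Optimizer -> Query Execution Plan -> Evaluation Engine -> Data. The optimizer consults the database statistics in the data dictionary; the evaluation engine reads the data stored in the database.)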

1. Parsing and Translation


The first step in any query processing system is to translate a given query into its internal form. This
translation process is similar to the work performed by the parser of a compiler. In generating the internal
form of the query, the parser checks the syntax of the user's query and verifies that it is formulated
according to the syntax rules of the query language. The query is then translated into relational algebra.
2. Optimization
A relational algebra expression may have many equivalent expressions.
E.g. σ_{balance<2500}(∏_{balance}(account)) is equivalent to ∏_{balance}(σ_{balance<2500}(account))
Each relational algebra operation can also be executed by one of several different algorithms. The process
of choosing a suitable evaluation plan with the lowest cost is known as query optimization. Cost is estimated
using statistical information from the database catalog, such as the number of tuples in each relation and
the size of the tuples. So among all equivalent expressions, the optimizer chooses the one with the cheapest
possible evaluation plan (an evaluation plan is one of the possible ways of executing a query).
3. Evaluation
The query execution engine takes a query evaluation plan, executes that plan and returns the answer to the
query.
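
The three steps can be observed on a real engine. The sketch below uses Python's built-in sqlite3 module purely as a convenient stand-in for a database system; the account table and its rows are assumptions made up for this illustration. EXPLAIN QUERY PLAN asks SQLite to report the plan its optimizer chose, and executing the query corresponds to the evaluation step.

```python
import sqlite3

# Hypothetical schema and data, used only to have something to query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-101", 500), ("A-102", 3000), ("A-103", 1200)],
)

query = "SELECT balance FROM account WHERE balance < 2500"

# Parsing, translation and optimization: SQLite reports the plan it selected.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
for row in plan:
    print(row)

# Evaluation: the engine runs the chosen plan and returns the answer.
balances = sorted(r[0] for r in conn.execute(query))
print(balances)
```

The exact wording of the plan rows depends on the SQLite version, so it should be read as an indication of the chosen strategy rather than a fixed format.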

Query Cost Estimation


Compiled By: Mohan Bhandari

Each query can be translated into a number of semantically equivalent plans. Since there are several alternatives,
the question is which evaluation plan is the most efficient one to select for execution. To answer this,
the cost of every alternative must be estimated, and the plan with the lowest cost is selected. Since a database
resides on disk, the cost of reading from and writing to disk often dominates the cost of processing a query.
To choose a strategy based on reliable information, database systems store statistics (metadata) for
each relation R. These statistics include the number of tuples in the relation, the size of its tuples, etc. Cost is
generally measured as the total elapsed time for answering a query. Many factors contribute to this time cost,
among them disk accesses, CPU time and network communication.
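
As a rough illustration of the comparison described above, the sketch below estimates disk block transfers for two plans that join account with depositor and keep only some account tuples. The block size, the catalog statistics, the 10% selectivity and the block nested-loop cost formula are all assumptions chosen for the example, and the model ignores CPU and network cost entirely.

```python
import math
from dataclasses import dataclass

BLOCK_SIZE = 4096  # assumed disk block size in bytes


@dataclass
class RelationStats:
    """Catalog statistics for one relation (hypothetical values below)."""
    n_tuples: int    # number of tuples in the relation
    tuple_size: int  # size of one tuple in bytes


def blocks(stats: RelationStats) -> int:
    """Number of disk blocks needed to hold the relation."""
    tuples_per_block = BLOCK_SIZE // stats.tuple_size
    return math.ceil(stats.n_tuples / tuples_per_block)


def nested_loop_join_cost(outer: RelationStats, inner: RelationStats) -> int:
    """Block transfers for a block nested-loop join: read the outer relation
    once and scan the inner relation once per outer block."""
    return blocks(outer) + blocks(outer) * blocks(inner)


def select_stats(stats: RelationStats, selectivity: float) -> RelationStats:
    """Estimated statistics of the output of a selection."""
    return RelationStats(math.ceil(stats.n_tuples * selectivity), stats.tuple_size)


account = RelationStats(n_tuples=10_000, tuple_size=64)
depositor = RelationStats(n_tuples=10_000, tuple_size=32)

# Plan A: join first, apply the selection to the join result afterwards.
cost_a = nested_loop_join_cost(account, depositor)

# Plan B: scan account once to apply the selection, then join the
# (much smaller) filtered relation with depositor.
filtered = select_stats(account, selectivity=0.10)
cost_b = blocks(account) + nested_loop_join_cost(filtered, depositor)

print("plan A:", cost_a, "block transfers")
print("plan B:", cost_b, "block transfers")
print("cheaper plan:", "B" if cost_b < cost_a else "A")
```

Under these assumed numbers the plan that filters early is far cheaper, which is the kind of difference a cost-based optimizer relies on when ranking alternatives.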

Equivalence / Transformation Rules


Two algebraic expressions are said to be equivalent if they produce the same result on every database instance. By
applying equivalence rules to the basic relational algebra operators, we can generate many equivalent expressions for a
single query. If R, S and T are relations and C1, C2, ..., Cn are conditions, then the equivalence rules are:
1. Commutativity of binary operators
R ∪ S ≡ S ∪ R    R ∩ S ≡ S ∩ R    R ⋈ S ≡ S ⋈ R    R × S ≡ S × R
2. Associativity of binary operators
(R ∪ S) ∪ T ≡ R ∪ (S ∪ T)    (R ∩ S) ∩ T ≡ R ∩ (S ∩ T)    R ⋈ (S ⋈ T) ≡ (R ⋈ S) ⋈ T
3. Commuting projection with a binary operator
∏_{C}(R × S) ≡ ∏_{A}(R) × ∏_{B}(S), where C = A ∪ B such that the attributes in A belong to relation R and the attributes in B belong to relation S.
A similar rule holds for the join operator, provided the join attributes are kept in A and B.
4. Commuting selection with a binary operator
a. σ_{C}(R ⋈ S) ≡ σ_{C}(R) ⋈ S, if the attributes involved in the condition are from relation R
b. σ_{C}(R ⋈ S) ≡ R ⋈ σ_{C}(S), if the attributes involved in the condition are from relation S
c. σ_{C}(R ⋈ S) ≡ σ_{A}(R) ⋈ σ_{B}(S), where C = A ∧ B such that condition A uses attributes from R and condition B uses attributes from S.
5. Commuting selection and projection
∏_{X}(σ_{C}(R)) ≡ σ_{C}(∏_{X}(R)), provided condition C uses only attributes in X. The rule can be applied in either direction.
6. Idempotence of unary operators
a. Combining cascaded selections
σ_{C1}(σ_{C2}(R)) ≡ σ_{C1 ∧ C2}(R)
b. Combining cascaded projections
∏_{X}(∏_{Y}(R)) ≡ ∏_{X}(R) if X is a subset of Y.

Example:
Suppose we have the following relational algebra expression.
- ∏_{customer-name}(σ_{branch-city = 'ktm' ∧ balance > 1000}(branch ⋈ account ⋈ depositor))
Using rule 4a, we obtain the equivalent expression
- ∏_{customer-name}(σ_{branch-city = 'ktm' ∧ balance > 1000}(branch ⋈ account) ⋈ depositor)
Using rule 4c, we obtain another equivalent expression
- ∏_{customer-name}(σ_{branch-city = 'ktm'}(branch) ⋈ σ_{balance > 1000}(account) ⋈ depositor)
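
The equivalence of the three expressions in this example can be checked on a small instance. The following is a minimal sketch that models relations as lists of dictionaries; the helper functions, the attribute names (branch-name, account-number) and the sample tuples are assumptions invented for the check and do not come from the text. Agreement on one instance does not prove a rule, but a disagreement would disprove it.

```python
def select(rel, pred):
    """σ: keep the tuples that satisfy the predicate."""
    return [t for t in rel if pred(t)]


def project(rel, attrs):
    """∏: keep only the given attributes and drop duplicate tuples."""
    out = []
    for t in rel:
        row = {a: t[a] for a in attrs}
        if row not in out:
            out.append(row)
    return out


def natural_join(r, s):
    """⋈: combine tuples that agree on all attributes they share."""
    out = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                out.append({**a, **b})
    return out


def same(r, s):
    """Compare two relations as sets of tuples, ignoring order."""
    def canon(rel):
        return {tuple(sorted(t.items())) for t in rel}
    return canon(r) == canon(s)


# Hypothetical sample data.
branch = [
    {"branch-name": "Thamel", "branch-city": "ktm"},
    {"branch-name": "Lakeside", "branch-city": "pkr"},
]
account = [
    {"account-number": "A1", "branch-name": "Thamel", "balance": 1500},
    {"account-number": "A2", "branch-name": "Thamel", "balance": 800},
    {"account-number": "A3", "branch-name": "Lakeside", "balance": 2000},
]
depositor = [
    {"customer-name": "Sita", "account-number": "A1"},
    {"customer-name": "Hari", "account-number": "A2"},
    {"customer-name": "Gita", "account-number": "A3"},
]


def in_ktm(t):
    return t["branch-city"] == "ktm"


def rich(t):
    return t["balance"] > 1000


def both(t):
    return in_ktm(t) and rich(t)


# Original expression: select after joining all three relations.
original = project(
    select(natural_join(natural_join(branch, account), depositor), both),
    ["customer-name"],
)
# Rule 4a: the selection is applied to (branch ⋈ account) only.
rule_4a = project(
    natural_join(select(natural_join(branch, account), both), depositor),
    ["customer-name"],
)
# Rule 4c: the conjunction is split and each part is pushed to its relation.
rule_4c = project(
    natural_join(
        natural_join(select(branch, in_ktm), select(account, rich)), depositor
    ),
    ["customer-name"],
)

print(original)
print(same(original, rule_4a), same(original, rule_4c))
```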

Operator Tree
A relational algebra query can be represented graphically, for simplicity, by an operator tree. An operator tree
is a tree in which each leaf node is a relation stored in the database and each non-leaf node is an intermediate relation
produced by a relational algebra operator. The sequence of operations is directed from the leaves to the root, which
represents the answer to the query.
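Fig: Two operator trees over the relations E1 and E2: σ_{c} applied above a binary operator on E1 and E2 (left), and the equivalent tree in which σ_{c} has been pushed down onto E1 (right).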



Example:
Suppose we have the following relational algebra expression.
∏_{student-name}(σ_{course-name='DBMS'}(Student ⋈ Registration ⋈ Course))
The initial operator tree for the above relational algebra expression is shown below.

        ∏_{student-name}
               |
     σ_{course-name='DBMS'}
               |
               ⋈
             /   \
       Student     ⋈
                 /   \
      Registration   Course
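
An operator tree maps naturally onto a small recursive data structure. The sketch below is one possible representation, not a standard API: the Node class and the two helper functions are hypothetical names, and the grouping Student ⋈ (Registration ⋈ Course) is an assumption about how the joins are nested. A post-order walk visits the leaves first and the root last, which mirrors the direction of evaluation described above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of an operator tree: a stored relation (leaf) or an operator."""
    label: str
    children: list = field(default_factory=list)


def to_expression(node):
    """Render the tree back into a relational algebra expression."""
    if not node.children:                      # leaf: a stored relation
        return node.label
    if len(node.children) == 1:                # unary operator: σ or ∏
        return f"{node.label}({to_expression(node.children[0])})"
    left, right = node.children                # binary operator: ⋈
    return f"({to_expression(left)} {node.label} {to_expression(right)})"


def post_order(node):
    """Yield labels leaves first and root last, i.e. in evaluation order."""
    for child in node.children:
        yield from post_order(child)
    yield node.label


tree = Node("∏_{student-name}", [
    Node("σ_{course-name='DBMS'}", [
        Node("⋈", [
            Node("Student"),
            Node("⋈", [Node("Registration"), Node("Course")]),
        ]),
    ]),
])

print(to_expression(tree))
print(list(post_order(tree)))
```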

Query Optimization
Query optimization is the process of selecting the most efficient query execution plan from among the many strategies
possible for processing a query. The query optimizer is a very important component of a database system because the
efficiency of the system depends on the performance of the optimizer. The optimizer produces a query execution plan,
which represents an execution strategy for the query; the selected plan minimizes an objective cost function.
Steps of optimization
1. Create an initial operator (expression) tree.
2. Move select operations down the tree so that they are executed as early as possible.
3. Apply the more restrictive select operations first.
4. Replace Cartesian products by joins.
5. Create new projections wherever needed.
6. Adjust the rest of the tree accordingly.
Example 1:
The following query retrieves the names of customers at branches in the city of Pokhara whose balance is greater than
1000 (the join conditions between the relations are not shown).
select customer-name from Branch, Account, Depositor where city = 'Pkr' and balance > 1000
There are a number of evaluation plans by which the above query can be processed.
1. Join relations Branch and Account, join the result with Depositor, and then do the restrictions.
2. Join relations Branch and Account, do the restrictions, and then join the result with Depositor.
3. Do the restrictions, join relations Branch and Account, and then join the result with Depositor.
The query optimizer estimates the cost of each plan and chooses the best way to process the query.
Let us consider the following algebraic expression
∏_{customer-name}(σ_{city='Pkr' ∧ balance>1000}(Branch ⋈ Account ⋈ Depositor))



The initial operator tree is

      ∏_{customer-name}
              |
 σ_{city='Pkr' ∧ balance>1000}
              |
              ⋈
            /   \
          ⋈      Depositor
        /   \
   Branch   Account

The final tree after multiple transformations is

             ∏_{customer-name}
                     |
                     ⋈
                   /   \
                 ⋈      Depositor
              /     \
 σ_{city='Pkr'}   σ_{balance>1000}
        |                |
     Branch           Account
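
The move from the initial tree to the final tree can be sketched as a recursive push-down of the selection conditions. This is an illustrative heuristic rewrite under stated assumptions, not a description of how any particular optimizer is implemented: the class names, the attribute lists for Branch, Account and Depositor, and the decision to leave the projection out are all choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    name: str
    attrs: frozenset   # attribute names of the stored relation


@dataclass(frozen=True)
class Select:
    conds: tuple       # conjuncts, each a (text, attributes used) pair
    child: object


@dataclass(frozen=True)
class Join:
    left: object
    right: object


def attrs_of(node):
    """Attributes available in the output of a subtree."""
    if isinstance(node, Relation):
        return node.attrs
    if isinstance(node, Select):
        return attrs_of(node.child)
    return attrs_of(node.left) | attrs_of(node.right)


def push_down(node, conds=()):
    """Push each conjunct as far down as the attributes it uses allow."""
    if isinstance(node, Select):
        # Absorb this selection and carry its conjuncts further down.
        return push_down(node.child, tuple(conds) + node.conds)
    if isinstance(node, Join):
        left_attrs, right_attrs = attrs_of(node.left), attrs_of(node.right)
        to_left = tuple(c for c in conds if c[1] <= left_attrs)
        to_right = tuple(
            c for c in conds if c not in to_left and c[1] <= right_attrs
        )
        stay = tuple(c for c in conds if c not in to_left and c not in to_right)
        joined = Join(push_down(node.left, to_left), push_down(node.right, to_right))
        return Select(stay, joined) if stay else joined
    # Leaf relation: attach whatever conditions reached it.
    return Select(tuple(conds), node) if conds else node


def show(node):
    """Render a tree as a relational algebra expression."""
    if isinstance(node, Relation):
        return node.name
    if isinstance(node, Select):
        text = " ∧ ".join(c[0] for c in node.conds)
        return "σ_{" + text + "}(" + show(node.child) + ")"
    return "(" + show(node.left) + " ⋈ " + show(node.right) + ")"


# Hypothetical schemas; only city and balance matter for the push-down.
branch = Relation("Branch", frozenset({"branch-name", "city"}))
account = Relation("Account", frozenset({"account-number", "branch-name", "balance"}))
depositor = Relation("Depositor", frozenset({"customer-name", "account-number"}))

initial = Select(
    (("city='Pkr'", frozenset({"city"})), ("balance>1000", frozenset({"balance"}))),
    Join(Join(branch, account), depositor),
)
optimized = push_down(initial)

print(show(initial))
print(show(optimized))
```

A condition that uses attributes from both sides of a join cannot be pushed to either side, so the sketch keeps it as a selection directly above that join.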

Example 2:
Suppose we are given the following table definitions, with certain records in each table.
PROJ(PNO, PNAME, BUDGET)
EMP(ENO, ENAME, TITLE)
ASG(ENO, PNO, DUR)
Write the SQL statement and RA expression for: "Find the names of employees other than Ram Thapa who worked
on the CAD/CAM project for either 1 or 2 years". Construct the initial operator tree and the final efficient operator
tree after applying transformation rules.
SQL:
select ENAME from EMP, ASG, PROJ where EMP.ENO = ASG.ENO and ASG.PNO = PROJ.PNO and
ENAME != 'Ram Thapa' and PNAME = 'CAD/CAM' and (DUR = 1 or DUR = 2)
RA:
∏_{ENAME}(σ_{ENAME ≠ 'Ram Thapa' ∧ PNAME = 'CAD/CAM' ∧ (DUR = 1 ∨ DUR = 2)}(PROJ ⋈ (EMP ⋈ ASG)))
Initial operator tree

 ∏_{ENAME}   (Project)
      |
 σ_{ENAME ≠ 'Ram Thapa' ∧ PNAME = 'CAD/CAM' ∧ (DUR = 1 ∨ DUR = 2)}   (Select)
      |
      ⋈
    /   \
 PROJ     ⋈
        /   \
     EMP     ASG

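Final operator tree (a more efficient query evaluation tree, since the more selective operations are performed
first)
In the final tree each selection sits directly above its base relation:

 σ_{PNAME='CAD/CAM'}    σ_{DUR=1 ∨ DUR=2}    σ_{ENAME≠'Ram Thapa'}
         |                      |                      |
       PROJ                    ASG                    EMP

The filtered relations are then joined, and ∏_{ENAME} is applied at the root.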
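
The claim that the final tree is equivalent to the initial one can be checked by running both forms of the query. The sketch below uses Python's built-in sqlite3 module; the sample records are hypothetical, since the text does not list the table contents, and the second query is only one way of writing the pushed-down selections in SQL. A real optimizer, SQLite's included, may well rewrite the first query along similar lines on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PROJ (PNO TEXT, PNAME TEXT, BUDGET INTEGER);
CREATE TABLE EMP  (ENO TEXT, ENAME TEXT, TITLE TEXT);
CREATE TABLE ASG  (ENO TEXT, PNO TEXT, DUR INTEGER);
-- Hypothetical sample records.
INSERT INTO PROJ VALUES ('P1', 'CAD/CAM', 250000);
INSERT INTO PROJ VALUES ('P2', 'Maintenance', 100000);
INSERT INTO EMP  VALUES ('E1', 'Ram Thapa', 'Engineer');
INSERT INTO EMP  VALUES ('E2', 'Sita Rai', 'Analyst');
INSERT INTO EMP  VALUES ('E3', 'Hari Gurung', 'Engineer');
INSERT INTO ASG  VALUES ('E1', 'P1', 1);
INSERT INTO ASG  VALUES ('E2', 'P1', 2);
INSERT INTO ASG  VALUES ('E3', 'P1', 3);
INSERT INTO ASG  VALUES ('E3', 'P2', 1);
""")

# Initial form: join everything, then apply all conditions.
initial_sql = """
SELECT ENAME FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO AND ASG.PNO = PROJ.PNO
  AND ENAME != 'Ram Thapa' AND PNAME = 'CAD/CAM' AND (DUR = 1 OR DUR = 2)
"""

# Pushed-down form: each selection is applied to its own relation first.
pushed_sql = """
SELECT ENAME
FROM (SELECT * FROM PROJ WHERE PNAME = 'CAD/CAM') AS P
JOIN (SELECT * FROM ASG WHERE DUR = 1 OR DUR = 2) AS A ON A.PNO = P.PNO
JOIN (SELECT * FROM EMP WHERE ENAME != 'Ram Thapa') AS E ON E.ENO = A.ENO
"""

initial_result = sorted(r[0] for r in conn.execute(initial_sql))
pushed_result = sorted(r[0] for r in conn.execute(pushed_sql))

print(initial_result)
print(pushed_result)
```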
