CHAPTER_2_Query_Processing_&_Optimization_Handout_Material

Query processing involves converting high-level queries into efficient execution plans, with optimization techniques to enhance performance. The process includes syntax analysis, query decomposition, optimization, and execution plan generation, utilizing both heuristic and cost-based methods. Key considerations in optimization include data transfer reduction, index utilization, and the order of join operations.

Uploaded by

tsionnegash12

Query Processing and Optimization

Chapter 2
11.1 Introduction

Query processing requires that the DBMS identify and execute a strategy for retrieving the
results of the query. The query determines what data is to be found, but does not define
the method by which the data manager searches the database. Query optimization is therefore
necessary to choose the best available alternative for processing a query. There are two main
techniques for query optimization. The first approach uses a rule-based or heuristic
method for ordering the operations in a query execution strategy. The second approach
estimates the cost of different execution strategies and chooses the cheapest one. In general,
most commercial database systems use a combination of both techniques.

11.2 Basics of Query Processing

Query Processing : Query processing is the procedure of converting a query written in a
high-level language (e.g., SQL or QBE (Query by Example)) into a correct and efficient
execution plan expressed in a low-level language, which is used for data manipulation.
Query Processor : The query processor is responsible for generating the execution plan.
Execution Plan : Query processing is a stepwise process. Before retrieving or updating
data in the database, a query goes through a series of query compilation steps. These steps are
known as the execution plan.
The success of a query language also depends on its query processor, i.e., on how
efficient an execution plan it can create. A better execution plan leads to lower time and cost.
In query processing, the first phase is transformation, in which the parser checks the syntax
of the query and verifies that the relations and attributes used in the query are defined in
the database. After the syntax is checked and the relations are verified, the query is transformed into
an equivalent expression that is more efficient to execute. Transformation depends upon various
factors such as the existence of certain database structures, the presence of different indexes, whether
the file is sorted, the cost of transformation, the physical characteristics of the data, etc. After
transformation, the query is evaluated using a number of strategies known as access plans.
While generating access plans, factors like the physical properties of data and storage are taken
into account, and the optimal access plan is chosen. The next step is to validate the user's
privileges and ensure that the query does not violate the relevant integrity constraints.
Finally, the execution plan is executed to generate the result.

11.2.1 General Strategy for Query Processing

The general strategy for query processing is as follows:
(i) Representation of query : A query written by the user cannot be processed directly by
the system. The query processor first checks the syntax and the existence of the relations and their
attributes in the database. After validation, the query processor transforms the query into an equivalent
and more efficient expression; for example, the query is converted into a standard
internal format that the parser can manipulate. The parser also adds some additional predicates
to the query to enforce security. The internal form may be relational algebra, relational
calculus, any low-level language, operator graphs, etc.
(ii) Operator graphs : Operator graphs are used to represent a query. An operator graph gives the sequence
of operations that can be performed. A query represented by an operator graph is easy
to understand. Operator graphs are useful for detecting redundancy in query expressions, showing the result
of a transformation, simplifying views, etc.
(iii) Response time and data characteristics consideration : Data characteristics such as the length
of records, the expected sizes of both intermediate and final results, the size of relations, etc.,
are also considered when optimizing the query. In addition, the overall response time
is determined.

11.2.2 Steps in Query Processing

The steps in query processing are shown in Figure 11.1. Suppose that the user inputs a query
in a general query language, say QBE; it is first converted into a high-level query language,
say SQL. The other steps in query processing are discussed below in detail:
(i) Syntax Analysis : The query in the high-level language is parsed into tokens, and the tokens are
analyzed for any syntax error. The order of the tokens is also checked to make sure that all the
rules of the language grammar are followed. In case of any error, the query is rejected and an error
code with an explanation for the rejection is returned to the user. (Only syntax is checked in this
step.)
(ii) Query Decomposition : In this step, the query is decomposed into query blocks, which
are the low-level operations. Decomposition starts with the high-level query, transforms it into low-
level operations, and checks whether the query is syntactically and semantically correct. For
example, a SQL query is decomposed into blocks like the Select block, From block, Where block,
etc. The stages of query decomposition are shown in Figure 11.2.

FIGURE 11.1. Steps in query processing. The flowchart shows the following stages, in order:
1. User input: general query.
2. Transformation into a high-level query language (e.g., SQL).
3. Scanning, parsing, and validating by the parser: syntax checking and verification.
4. Query decomposer (uses the database catalog): checks the existence of relations and attributes, performs semantic analysis, and decomposes a complex query into smaller ones.
5. Query optimizer: reduces execution time and cost by substituting equivalent expressions for those in the query.
6. Execution plan: query modification.
7. Query code generator: generates code for queries.
8. Runtime database processor: works on the main database to perform the necessary operations.
9. Query result.

FIGURE 11.2. Steps in query decomposition: SQL query → query analysis → query normalization → semantic analysis → query simplifier → query restructuring → algebraic expressions.

(a) Query Analysis : In the query analysis stage, a programming language compiler checks
that the query is lexically and syntactically correct. A syntactically correct query is
analyzed using the system catalogues to verify the existence of the relations and attributes used
in the query. After analysis, a correct query is converted into an internal representation
that is more efficient for processing.
The type specification of the query qualifier and result is also checked at this stage.
The internal representation may be a query tree or a query graph.
Query tree notation : A typical internal representation of a query is the query tree, also
known as a relational algebra tree. A query tree is built using a tree data
structure that corresponds to the relational algebra expression. The main components of
a query tree are:
• Root of tree – represents the result of the query.
• Leaf nodes – represent the input relations of the query.
• Internal nodes – represent intermediate relations that are the output of applying
an operation in the algebra.
• The sequence of operations is directed from the leaves to the root node.
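For example,
π_{Employee_Name, Address, JOB, Department, Department-location} (E ⋈_{E.Department = D.Department} D)
Here the leaf nodes are E (Employee) and D (Department), the internal node is the join on
E.Department = D.Department, and the root is the projection on the listed attributes.

FIGURE 11.3. Query tree notation.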

Query graph notation : A graph data structure is also used for the internal representation
of a query. In graphs:
• Relation nodes – represent relations by a single circle.
• Constant nodes – represent constant values by a double circle.
• Edges – represent selection and join conditions.
• Square brackets – represent the attributes retrieved from each relation.
For example, the graph in Figure 11.4 has the constant node ‘Delhi’ and the relation nodes D and E.
The edge between ‘Delhi’ and D is labelled Department-Location = Delhi, and the edge between
D and E is labelled D.Department = E.Department. The attributes retrieved are
[E.Employee_Name, E.Address, E.Job, E.Department] from E and [D.Manager-ID] from D.

FIGURE 11.4. Query graph notation.
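
As a rough illustration of the query tree notation described above, the following Python sketch builds the tree of Figure 11.3 and lists its operations from the leaves to the root. The `Node` class and the `evaluation_order` function are hypothetical names introduced only for this sketch; they are not part of any DBMS.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a query tree (hypothetical helper for illustration)."""
    op: str                      # "project", "join", or "relation"
    arg: str = ""                # attribute list, join condition, or relation name
    children: List["Node"] = field(default_factory=list)


def evaluation_order(node: Node) -> List[str]:
    """Post-order walk: leaf relations first, the root (query result) last."""
    order: List[str] = []
    for child in node.children:
        order.extend(evaluation_order(child))
    order.append(f"{node.op}({node.arg})")
    return order


# The tree of Figure 11.3: a projection over a join of Employee and Department.
tree = Node(
    "project",
    "Employee_Name, Address, JOB, Department, Department-location",
    [
        Node(
            "join",
            "E.Department = D.Department",
            [Node("relation", "Employee E"), Node("relation", "Department D")],
        )
    ],
)

steps = evaluation_order(tree)
for step in steps:
    print(step)
```

The post-order traversal mirrors the statement that the sequence of operations is directed from the leaves to the root.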

(b) Query normalization : After query analysis, the query is normalized to remove any redundancy.
In this phase the query is converted into a normalized form that can be easily manipulated.
A set of equivalence rules is applied to the query to simplify the projection and selection
operations and avoid redundancy. The query can be converted into one of the following
two normal forms :
• Conjunctive normal form : It is a sequence of conjuncts that are connected with the
‘AND’ operator. A conjunct consists of one or more terms connected with the ‘OR’
operator. A conjunctive selection contains only those tuples that satisfy all conjuncts.
Example. (Emp_Job = ‘Analyst’ ∨ salary < 50000) ∧ (Hire_Date > 1–1–2000)
• Disjunctive normal form : It is a sequence of disjuncts that are connected with the ‘OR’
operator. A disjunct consists of one or more terms connected with the ‘AND’ operator.
A disjunctive selection contains those tuples that satisfy any one of the disjuncts.
Example. (Emp_Job = ‘Analyst’ ∧ salary < 5000) ∨ (Hire_Date > 1–1–2000)
Disjunctive normal form is more useful, as it allows the query to be broken into a series
of independent sub-queries linked by union.
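
The two normal forms can be sketched in Python as nested lists of predicates. The sample rows and the function names below are assumptions made up for illustration; only the two example conditions come from the text. The last function shows why disjunctive normal form is convenient: each disjunct is run as an independent sub-query and the results are combined by union.

```python
from datetime import date

# Hypothetical sample rows; only the attribute names follow the examples above.
employees = [
    {"name": "A", "job": "Analyst", "salary": 4000, "hire_date": date(1999, 5, 1)},
    {"name": "B", "job": "Clerk", "salary": 4500, "hire_date": date(2001, 3, 15)},
    {"name": "C", "job": "Analyst", "salary": 60000, "hire_date": date(2005, 7, 1)},
    {"name": "D", "job": "Clerk", "salary": 70000, "hire_date": date(1998, 1, 10)},
]


def is_analyst(r):
    return r["job"] == "Analyst"


def hired_after_2000(r):
    return r["hire_date"] > date(2000, 1, 1)


def select_cnf(rows, conjuncts):
    """Keep rows satisfying every conjunct; a conjunct is an OR of terms."""
    return [r for r in rows if all(any(t(r) for t in c) for c in conjuncts)]


def select_dnf(rows, disjuncts):
    """Keep rows satisfying at least one disjunct; a disjunct is an AND of terms."""
    return [r for r in rows if any(all(t(r) for t in d) for d in disjuncts)]


def select_dnf_as_union(rows, disjuncts):
    """Run each disjunct as an independent sub-query, then take the union."""
    names = set()
    for d in disjuncts:
        names |= {r["name"] for r in rows if all(t(r) for t in d)}
    return names


# (Emp_Job = 'Analyst' OR salary < 50000) AND (Hire_Date > 1-1-2000)
cnf = [[is_analyst, lambda r: r["salary"] < 50000], [hired_after_2000]]
# (Emp_Job = 'Analyst' AND salary < 5000) OR (Hire_Date > 1-1-2000)
dnf = [[is_analyst, lambda r: r["salary"] < 5000], [hired_after_2000]]

cnf_names = {r["name"] for r in select_cnf(employees, cnf)}
dnf_names = {r["name"] for r in select_dnf(employees, dnf)}
union_names = select_dnf_as_union(employees, dnf)
print(cnf_names, dnf_names, union_names)
```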
(c) Semantic analyzer : The semantic analyzer performs the following tasks :
• It helps in reducing the number of predicates.
• It rejects contradictory normalized forms.
• When a join specification is missing, some components of the query do not contribute to
the generation of results. It identifies such queries and rejects them.
• It makes sure that each object in the query is referenced correctly according to its data
type.
(d) Query simplifier : The major tasks of the query simplifier are as follows :
• It eliminates common sub-expressions.
• It eliminates redundant qualification.
• It introduces integrity constraints and view definitions into the query graph representation.
• It eliminates, without accessing the database, any query that violates an integrity constraint.
• It transforms sub-graphs into semantically equivalent and more efficient forms.
• It deals with user access rights.
The idempotence rules of Boolean algebra are applied to get the final simplified form.
(e) Query Restructuring : At the final stage of query decomposition, transformation rules
are applied to restructure the query to give a more efficient implementation.

(iii) Query Optimization : The aim of the query optimization step is to choose the best
possible query execution plan, i.e., the one that requires the minimum resources to execute. Query
optimization is discussed in detail in section 11.3.

(iv) Execution Plan : The execution plan is the basic algorithm used for each operation in
the query. Execution plans are classified into the following four types : (a) Left-deep tree query
execution plan, (b) Right-deep tree query execution plan, (c) Linear tree execution plan,
(d) Bushy execution plan.
(a) Left-deep tree query execution plan : In a left-deep tree query execution plan, development
of the plan starts with a single relation and successively adds an operation involving
a single relation until the query is completed. For example, only the left-hand side
of a join is allowed to be the result of a previous join, hence the name
left-deep tree. It is shown in Figure 11.5.

FIGURE 11.5. Left-deep execution plan (leaves R1, R2; root Result).

Advantages : The main advantages of the left-deep tree query execution plan are
• It reduces the search space.
• It allows the query optimiser to be based on dynamic programming techniques.
• It is convenient for pipelined evaluation, as only one input to each join is pipelined.
Disadvantage : The disadvantage of the left-deep tree query execution plan is
• The reduction in search space may cause some lower-cost execution strategies to be missed.
(b) Right-deep tree query execution plan : It is almost the same as the left-deep query execution
plan, with the only difference that only the right-hand side of a join is allowed to
be the result of a previous join, hence the name right-deep tree. It is
suitable for applications having a large main memory. It is shown in Figure 11.6.

FIGURE 11.6. Right-deep execution plan (leaves R2, R1; root Result).

(c) Linear tree execution plan : A combination of left-deep and right-deep execution
plans, with the restriction that the relation on one side of each operator is always a
base relation, is known as a linear tree. It is shown in Figure 11.7.

FIGURE 11.7. Linear tree execution plan (leaves R2, R1; root Result).

(d) Bushy execution plan : The bushy execution plan is the most general type of execution
plan. More than one relation can participate in intermediate results. It is shown in
Figure 11.8.

FIGURE 11.8. Bushy execution plan (leaves R1, R2, R3, R4, R5; root Result).

The main advantage of the bushy execution plan is the flexibility it provides in
choosing the best execution plan, but this flexibility considerably increases the
search space that the optimizer must explore.
(v) Query Code Generator : After the best possible execution plan is selected, the query is
converted into a low-level language so that it can be taken as input by the runtime database
processor.
(vi) Runtime Database Processor : It deals directly with the main database, performs the
necessary operations mentioned in the query, and returns the result to the user.
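
To give a feel for how much larger the bushy search space is than the left-deep one, the sketch below counts join orders for n relations. The counting formulas (n! left-deep orders and (2(n-1))!/(n-1)! join orders when bushy trees are allowed) are standard results that are not stated in the text above, and the function names are made up for this illustration.

```python
from math import factorial


def left_deep_plans(n: int) -> int:
    """Number of left-deep join orders for n relations: n!."""
    return factorial(n)


def bushy_plans(n: int) -> int:
    """Number of join orders when bushy trees are allowed: (2(n-1))! / (n-1)!."""
    return factorial(2 * (n - 1)) // factorial(n - 1)


for n in range(2, 7):
    print(n, left_deep_plans(n), bushy_plans(n))
```

For five relations, as in Figure 11.8, this gives 120 left-deep orders against 1680 orders when bushy trees are allowed.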

11.3 Query Optimization

The query performance of a database system depends not only on the database structure,
but also on the way in which the query is optimized. Query optimization means converting a
query into an equivalent form that is more efficient to execute. It is necessary for high-level
relational queries, and it gives the DBMS an opportunity to systematically evaluate alternative
query execution strategies and to choose an optimal strategy. A typical query optimization
process is shown in Figure 11.9.

FIGURE 11.9. Query optimization process. The query optimiser (cost model and execution plan
generator) takes as input the simplified relational algebra query tree, statistical data, and
estimation formulas (which determine the cardinality of intermediate result tables), and produces
an execution plan in the form of an optimized relational algebra query.

The main issues that need to be considered in query optimization are:

1. Reduction of data transfer with database.
2. Use of available indexes for fast searching.
3. Reduction of number of times the database is manipulated.
4. The order in which joins should be performed.
5. How to store intermediate results?
The following three relations are used in the examples:
Employee (Emp-ID, Emp-Name, Age, Salary, Dept-ID)
Department (Dept-ID, Proj-ID, Dept-Name)
Project (Proj-ID, Name, Location, Duration)
There are two main techniques used to implement query optimization. These are heuristic
query optimization and cost based query optimization.

11.3.1 Transformation Rules for Relational Algebra

The transformation rules are used to rewrite a relational algebra expression in different
ways, and the query optimizer chooses the most efficient equivalent expression to execute. Two
expressions are considered equivalent if they produce the same set of attributes, possibly in a
different order, representing the same information.
Let us consider relations R, S, and T. In the rules below, X, Y, Z, and A denote predicates,
X1, X2, ..., Xn denote individual attributes, and L, L1, L2, M, M1, M2, and N denote
sets of attributes.
Rule 1. Cascading of selection (σ)
σ_{X ∧ Y ∧ Z}(R) ≡ σ_X(σ_Y(σ_Z(R)))
It means that conjunctive selection operations can be transformed into individual selection
operations and vice versa.
Ex. σ_{Age = 35 ∧ Salary > 50000}(Employee) ≡ σ_{Age = 35}(σ_{Salary > 50000}(Employee))
Rule 2. Commutativity of selection (σ)
σ_X(σ_Y(R)) ≡ σ_Y(σ_X(R))
Ex. σ_{Age = 35}(σ_{Salary > 50000}(Employee)) ≡ σ_{Salary > 50000}(σ_{Age = 35}(Employee))
Rule 3. Cascading of projection (π)
π_L(π_M(...(π_N(R)))) ≡ π_L(R), where L ⊆ M ⊆ ... ⊆ N
Ex. π_{Emp-Name}(π_{Emp-Name, Age}(Employee)) ≡ π_{Emp-Name}(Employee)
Rule 4. Commutativity of selection (σ) and projection (π)
π_{X1, X2, ..., Xn}(σ_A(R)) ≡ σ_A(π_{X1, X2, ..., Xn}(R)), if the predicate A involves only the attributes X1, X2, ..., Xn
Ex. π_{Emp-Name, Age, Salary}(σ_{Salary > 50000}(Employee)) ≡ σ_{Salary > 50000}(π_{Emp-Name, Age, Salary}(Employee))
Rule 5. Commutativity of join (⋈) and Cartesian product (×)
R ⋈_Y S ≡ S ⋈_Y R
R × S ≡ S × R
Ex. Employee ⋈_{Employee.Dept-ID = Department.Dept-ID} Department ≡
Department ⋈_{Employee.Dept-ID = Department.Dept-ID} Employee

Rule 6. Commutativity of selection (σ) and join (⋈) or Cartesian product (×)
σ_X(R ⋈_Y S) ≡ (σ_X(R)) ⋈_Y S, if the predicate X involves only attributes of R
σ_X(R × S) ≡ (σ_X(R)) × S, if the predicate X involves only attributes of R
Ex. σ_{Employee.Age > 30 ∧ Dept-Name = ‘MARKETING’}(Employee ⋈_{Employee.Dept-ID = Department.Dept-ID} Department) ≡
(σ_{Employee.Age > 30}(Employee)) ⋈_{Employee.Dept-ID = Department.Dept-ID} (σ_{Dept-Name = ‘MARKETING’}(Department))

Rule 7. Commutativity of projection (π) and join (⋈) or Cartesian product (×)
π_{L1 ∪ L2}(R ⋈_Z S) ≡ (π_{L1}(R)) ⋈_Z (π_{L2}(S)), if the join condition Z involves only attributes in L1 ∪ L2
Ex. π_{Emp-Name, Dept-Name, Dept-ID}(Employee ⋈_{E.Dept-ID = D.Dept-ID} Department) ≡
(π_{Emp-Name, Dept-ID}(Employee)) ⋈_{E.Dept-ID = D.Dept-ID} (π_{Dept-Name, Dept-ID}(Department))
Rule 8. Commutativity of Union (∪) and Intersection (∩)
R ∪ S ≡ S ∪ R
R ∩ S ≡ S ∩ R
Rule 9. Commutativity of Selection (σ) and Union (∪) or Intersection (∩) or
Difference (–)
σ_X(R ∪ S) ≡ σ_X(R) ∪ σ_X(S)
σ_X(R ∩ S) ≡ σ_X(R) ∩ σ_X(S)
σ_X(R – S) ≡ σ_X(R) – σ_X(S)
Rule 10. Commutativity of projection (π) and Union (∪)
π_L(R ∪ S) ≡ π_L(R) ∪ π_L(S)
Rule 11. Associativity of Join (⋈) and Cartesian product (×)
(R ⋈ S) ⋈ T ≡ R ⋈ (S ⋈ T)
(R × S) × T ≡ R × (S × T)
Rule 12. Associativity of Union (∪) and Intersection (∩)
(R ∪ S) ∪ T ≡ R ∪ (S ∪ T)
(R ∩ S) ∩ T ≡ R ∩ (S ∩ T)
Rule 13. Converting a selection (σ) and Cartesian product (×) sequence into Join (⋈)
σ_X(R × S) ≡ R ⋈_X S

11.3.2 Heuristic Query Optimization

The heuristic query optimization technique modifies the internal representation of a
query by using heuristic rules and transformation rules. Heuristic rules are applied to the
query in the form of a query tree or query graph structure. The optimiser starts with the initial query tree and
transforms it into an equivalent and more efficient query tree using the transformation rules.
Heuristic Optimization Algorithm : A DBMS uses heuristic optimization algorithms to improve
the efficiency of a query by converting the initial query tree into an equivalent and optimized
query tree. Optimizers use the transformation rules to optimize the structure of the query tree.
The steps of the heuristic optimization algorithm are as follows.
Step 1. Perform selection operations as early as possible : Applying selection operations at
early stages reduces the number of unwanted records that must be transferred
from the database to primary memory.
The optimizer uses transformation rule 1 to divide selection operations with conjunctive
conditions into a cascade of selection operations.
Step 2. Perform commutativity of selection operations with other operations as early as
possible : The optimizer uses transformation rules 2, 4, 6, and 9 to move selection
operations as far down the tree as possible and to keep selection predicates on the
same relation together. Keeping selection operations low in the tree reduces
unwanted data transfer, and keeping selection predicates on the same
relation together reduces the number of times the database must be accessed to retrieve
records from the same table.
Step 3. Combine a Cartesian product with a subsequent selection operation whose predicate
represents a join condition into a JOIN operation : The optimizer uses transformation rule
13 to convert a selection and Cartesian product sequence into a join. This reduces
data transfer. It is always better to transfer only the required data from the database
than to transfer all the data and then refine it. (A Cartesian product combines
all data of all the tables mentioned in the query, while a join operation retrieves only
those records that satisfy the join condition.)
Step 4. Use commutativity and associativity of binary operations : The optimizer uses transformation
rules 5, 11, and 12 to rearrange the leaf nodes of the query tree so that the most
restrictive selection operations are executed first. Executing the most restrictive selection
operations first reduces the number of records fetched from the database, and
subsequent operations are then performed on fewer records.
Step 5. Perform projection operations as early as possible : After performing selection operations,
the optimizer uses transformation rules 3, 4, 7, and 10 to reduce the number of
columns of a relation by moving projection operations as far down the tree as
possible and keeping projection attributes on the same relation together.
Step 6. Compute common expressions only once : This step identifies sub-trees that represent
groups of operations that can be executed by a single algorithm.

Consider the query below in SQL and transformation of its initial query tree into an
optimal query tree.
Select Emp_Name
From Employee e, Department d, Project p
Where p.Name = ‘LUXMI PUB.’
AND d.Proj_ID = p.Proj_ID
AND e.Dept_ID= d.Dept_ID
AND e.Age > 35
This query displays the names of all employees who work for the project “LUXMI PUB.”
and are more than 35 years old.
Figure 11.10 shows the initial query tree for the given SQL query. If this tree is executed
directly, it computes the Cartesian product of the entire Employee, Department, and Project
tables, although in reality the query needs only one record from the relation Project and only the
employee records of those whose age is greater than 35 years.

FIGURE 11.10. Initial query tree:
π_{Emp_Name}(σ_{p.Name = ‘LUXMI PUB.’ ∧ d.Proj_ID = p.Proj_ID ∧ e.Dept_ID = d.Dept_ID ∧ e.Age > 35}((Employee × Department) × Project))

We can improve the performance by first applying the selection operations, which reduces the
number of records that appear in the Cartesian product. Figure 11.11 shows the improved query
tree.

FIGURE 11.11. Improved query tree by first applying selection operations:
π_{Emp_Name}(σ_{d.Proj_ID = p.Proj_ID}((σ_{e.Dept_ID = d.Dept_ID}(σ_{e.Age > 35}(Employee) × Department)) × σ_{p.Name = ‘LUXMI PUB.’}(Project)))

The query tree can be further improved by applying the more restrictive selection operation first.
The selection on Project returns a single record, whereas a single project may have more than one
employee, so the positions of the relations Project and Employee are switched. Figure 11.12 shows
the improved query tree.

FIGURE 11.12. Improved query tree by applying more restrictive selection operations:
π_{Emp_Name}(σ_{e.Dept_ID = d.Dept_ID}((σ_{d.Proj_ID = p.Proj_ID}(σ_{p.Name = ‘LUXMI PUB.’}(Project) × Department)) × σ_{e.Age > 35}(Employee)))

A further improvement can be made by replacing each Cartesian product followed by a selection
with a join operation that uses the selection predicate as its join condition, as shown in Figure 11.13.

FIGURE 11.13. Improved query tree by replacing Cartesian product and selection operations by join operations:
π_{Emp_Name}((σ_{p.Name = ‘LUXMI PUB.’}(Project) ⋈_{d.Proj_ID = p.Proj_ID} Department) ⋈_{e.Dept_ID = d.Dept_ID} σ_{e.Age > 35}(Employee))

The query tree can be improved further by keeping only the required attributes
(columns) of the relations, applying projection operations as early as possible in the query
tree. The optimizer keeps, in the intermediate relations, the attributes that are to be displayed and
the attributes needed by subsequent operations. The modified query tree is shown in
Figure 11.14.

FIGURE 11.14. Improved query tree by applying and moving projection operations down the query tree:
π_{Emp_Name}(π_{d.Dept_ID}(π_{p.Proj_ID}(σ_{p.Name = ‘LUXMI PUB.’}(Project)) ⋈_{d.Proj_ID = p.Proj_ID} π_{d.Dept_ID, d.Proj_ID}(Department)) ⋈_{e.Dept_ID = d.Dept_ID} π_{e.Emp_ID, e.Emp_Name, e.Dept_ID}(σ_{e.Age > 35}(Employee)))

The SELECT clause in SQL is equivalent to projection operation and WHERE clause in SQL is
equivalent to selection operation in relational algebra.
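
The effect of the tree transformations above can be sketched in Python by running the initial plan (Figure 11.10) and a plan in the spirit of Figure 11.14 on a few made-up rows. The sample data and the function names are assumptions for illustration; only the query itself comes from the text. Both plans return the same names, but the largest intermediate result differs sharply.

```python
from itertools import product

# Hypothetical sample rows; only the column names follow the text.
employee = [
    {"emp_name": "Asha", "age": 40, "dept_id": 10},
    {"emp_name": "Ravi", "age": 30, "dept_id": 10},
    {"emp_name": "Meena", "age": 50, "dept_id": 20},
    {"emp_name": "John", "age": 45, "dept_id": 30},
]
department = [
    {"dept_id": 10, "proj_id": 100},
    {"dept_id": 20, "proj_id": 200},
    {"dept_id": 30, "proj_id": 100},
]
project = [
    {"proj_id": 100, "name": "LUXMI PUB."},
    {"proj_id": 200, "name": "OTHER"},
]


def initial_plan():
    """Cartesian product of all three tables, then selection, then projection."""
    cartesian = list(product(employee, department, project))
    names = sorted(
        e["emp_name"]
        for e, d, p in cartesian
        if p["name"] == "LUXMI PUB."
        and d["proj_id"] == p["proj_id"]
        and e["dept_id"] == d["dept_id"]
        and e["age"] > 35
    )
    return names, len(cartesian)


def optimized_plan():
    """Selections and projections first, then the two joins."""
    proj_ids = [p["proj_id"] for p in project if p["name"] == "LUXMI PUB."]
    dept_ids = [
        d["dept_id"] for d in department for pid in proj_ids if d["proj_id"] == pid
    ]
    older = [(e["emp_name"], e["dept_id"]) for e in employee if e["age"] > 35]
    names = sorted(name for name, dept in older for did in dept_ids if dept == did)
    largest = max(len(proj_ids), len(dept_ids), len(older), len(names))
    return names, largest


print(initial_plan())    # names and size of the full Cartesian product
print(optimized_plan())  # same names, much smaller intermediate results
```

On this sample data the initial plan materializes 4 × 3 × 2 = 24 combined tuples, while no intermediate result of the optimized plan has more than 3 rows.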
11.3.3 Cost Based Query Optimization

In cost based query optimization, the optimizer estimates the cost of running each alternative
for a query and chooses the optimum alternative. The alternative that uses the minimum
resources has the minimum cost. The cost of a query operation depends mainly on
its selectivity, i.e., the proportion of the input relations that forms the output. The following are
the main components used to determine the cost of executing a query:
(a) Access cost of secondary storage : The access cost of secondary storage consists of the cost of
database manipulation operations, which include searching, writing, and reading the data
blocks stored in secondary memory. The cost of searching depends upon the type
of indexes (primary, secondary, hashed), the type of file structure, and the ordering of the relation,
in addition to the physical storage layout, e.g., whether file blocks are allocated contiguously on
the same disk or scattered over the disk.
(b) Storage cost : The storage cost consists of the cost of storing intermediate results (tables or
files) that are generated by the execution strategy for the query.
(c) Computation cost : The computation cost consists of the cost of performing in-memory operations
during query execution, such as sorting the records in a file, merging records,
performing computations on field values, and searching for records. These are mainly
performed on data buffers.
(d) Memory usage cost : It consists of the cost pertaining to the number of memory
buffers needed during query execution.
(e) Communication cost : It consists of the cost of transferring the query and its result between
the database location and the terminal where the query originated.
Of all the above components, the most important is the access cost of secondary storage,
because secondary storage is comparatively slower than other devices. The optimizer tries to
minimize the computation cost for small databases, as most of the data files are stored in main
memory. For large databases, it tries to minimize the access cost of secondary storage, and for
distributed databases, it tries to minimize the communication cost, because various sites are
involved in data transfer.
To estimate the cost of execution strategies, the optimizer accesses statistical data stored in
the DBMS catalog. The information stored in the DBMS catalog is given below:
(i) Number of records in relation X, given as R.
(ii) Number of blocks required to store relation X, given as B.
(iii) Blocking factor of relation X, given as BFR.
(iv) Primary access method for each file and attributes for each file.
(v) Number of levels of each multi-level index for an attribute A, given as IA.
(vi) Number of first-level index blocks for an attribute A, given as BAI1.
(vii) Selection cardinality of attribute A in relation X, given as SA, where SA = R × SLA
and SLA is the selectivity of the attribute.
Cost Function for Selection Operation : The selection operation works on a single relation in
relational algebra and retrieves the records that satisfy the given condition. Depending upon
the structure of the file, the available indexes, and the searching methods, the estimated costs of strategies
for the selection operation are as given below.

S.No.  Strategy                                                       Cost
1.     Linear search                                                  B/2, if the record is found; B, if the record is not found
2.     Binary search                                                  log2 B, if the equality condition is on a unique key attribute; log2 B + (SA/BFR) – 1, otherwise
3.     Using primary index to retrieve a single record                1, assuming no overflow
4.     Equality condition on primary key                              IA + 1
5.     Equality condition on hash key                                 1, assuming no overflow
6.     Inequality condition on primary index                          IA + B/2
7.     Inequality condition on any ordered index                      IA + B/2
8.     Equality condition on clustering index (secondary index)       [IA + 1] + [SA/BFR]
9.     Equality condition on non-clustering index (secondary index)   [IA + 1] + [SA]
10.    Inequality condition on B+-tree index (secondary index)        IA + [BAI1/2] + [R/2]

Now consider the example of the Employee table. Suppose there are 10,000 records stored
in 2000 blocks, and the following indexes are available.
• A secondary index on Emp-ID with 4 levels.
• A clustering index on Salary with 2 levels and an average selection cardinality of 30.
• A secondary index on the non-key attribute Age with 2 levels and 4 first-level index
blocks. There are 200 distinct values for Age.
Consider the queries:
(i) σ_{Emp-ID = 1A}(Employee).
(ii) σ_{Age > 20}(Employee).
(iii) σ_{Age = 20 AND Salary > 9000}(Employee).
Consider the following cost components.
• Number of records (R) = 10,000
• Number of blocks (B) = 2000
• Blocking factor (BFR) = 10000/2000 = 5
• I_Emp-ID = 4, S_Emp-ID = 10000/10000 = 1
• I_Age = 2, S_Age = 10000/200 = 50
• I_Salary = 2, S_Salary = 30
The costs of the above queries (supposing the records are in the table) are as follows:
(i) If linear search is used, the cost will be B/2 = 2000/2 = 1000.
(ii) If linear search is used, the cost will be B = 2000.
(iii) This query has a conjunctive selection condition. Its cost can be estimated by using
either of the two components of the selection condition to retrieve the records, or by
a linear search. The linear search costs 2000. Using the condition Salary > 9000 first gives a
cost estimate of I_Salary + (B/2) = 2 + 1000 = 1002, and the cost of the condition Age = 20 is
30 + 2 = 32. So, the total cost is 1034.
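
Some of the figures in this example can be reproduced with a short Python sketch. It covers only the blocking factor, the selection cardinality of Age, the two linear-search costs, and the ordered-index inequality cost, using the formulas from the selection cost table; the function names are made up for this illustration.

```python
# Statistics taken from the Employee example in the text.
R = 10_000            # number of records
B = 2_000             # number of blocks
DISTINCT_AGES = 200   # distinct values of Age
I_SALARY = 2          # index levels used in the text's calculation


def blocking_factor(records: int, blocks: int) -> int:
    """Records per block: BFR = R / B."""
    return records // blocks


def selection_cardinality(records: int, distinct_values: int) -> int:
    """Average number of records per distinct value of an attribute."""
    return records // distinct_values


def linear_search_cost(blocks: int, found: bool) -> int:
    """B/2 block reads on average if the record is found, B otherwise."""
    return blocks // 2 if found else blocks


def ordered_index_inequality_cost(levels: int, blocks: int) -> int:
    """IA + B/2 for an inequality condition on an ordered index."""
    return levels + blocks // 2


bfr = blocking_factor(R, B)
s_age = selection_cardinality(R, DISTINCT_AGES)
cost_i = linear_search_cost(B, found=True)
cost_ii = linear_search_cost(B, found=False)
cost_salary = ordered_index_inequality_cost(I_SALARY, B)
print(bfr, s_age, cost_i, cost_ii, cost_salary)
```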
Cost Function for Join Operation : The join operation is the most time-consuming operation
in databases, and an accurate cost function for join operations depends upon the estimate of
the size of the file (number of records) after the join operation. The estimated costs of strategies for
the join operation are given below:

S.No.  Strategy                  Cost
1.     Nested loop join (Block)  (a) B(R) + [B(R) * B(S)], if the buffer has only one block
                                 (b) B(R) + [B(S) * (B(R)/(Buffer – 2))], if (Buffer – 2) blocks are available for relation R
                                 (c) B(R) + B(S), if all blocks of R can be read into the database buffer
2.     Indexed nested-loop join  (a) B(R) + R * (IA + 1), if join attribute A in relation S is a primary key
                                 (b) B(R) + R * [IA + (SA(R)/BFR(R))], for clustering index I on attribute A
3.     Sort-merge join           (a) B(R) * [log2 B(R)] + B(S) * [log2 B(S)], for the sorts
                                 (b) B(R) + B(S), for the merge
4.     Hash join                 (a) (3 * B(R)) + B(S), if the hash index is held in memory
                                 (b) 2[B(R) + B(S)] * [log (B(S) – 1)] + B(R) + B(S), otherwise

(R) and (S) denote relations R and S respectively.

Example. Suppose we have 600 records in the Department table. The BFR for the Department table
is 60, so the number of blocks is 600/60 = 10.
For the join operation Employee ⋈_{Dept-ID} Department:

Type of Join              Cost
Block nested-loop join    22000 (if the buffer has only one block)
                          2010 (if all blocks of Employee can be read into the database buffer)
Hash join                 6010 (if the hash index is held in memory)
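
The three figures in this example follow directly from the join cost table, taking R as Employee (2000 blocks) and S as Department (10 blocks). A minimal Python sketch, with function names invented for this illustration:

```python
B_EMPLOYEE = 2000    # blocks of Employee, from the earlier example
B_DEPARTMENT = 10    # blocks of Department: 600 records / BFR of 60


def block_nested_loop_one_buffer(b_r: int, b_s: int) -> int:
    """B(R) + B(R) * B(S): the buffer holds only one block."""
    return b_r + b_r * b_s


def block_nested_loop_all_buffered(b_r: int, b_s: int) -> int:
    """B(R) + B(S): all blocks of R fit in the database buffer."""
    return b_r + b_s


def hash_join_in_memory(b_r: int, b_s: int) -> int:
    """3 * B(R) + B(S): the table's formula when the hash index is in memory."""
    return 3 * b_r + b_s


cost_one_buffer = block_nested_loop_one_buffer(B_EMPLOYEE, B_DEPARTMENT)
cost_all_buffered = block_nested_loop_all_buffered(B_EMPLOYEE, B_DEPARTMENT)
cost_hash = hash_join_in_memory(B_EMPLOYEE, B_DEPARTMENT)
print(cost_one_buffer, cost_all_buffered, cost_hash)
```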
Example. Given two unary relations (each containing only one attribute), R1 and R2:
R1: 7, 2, 9, 8, 3, 9, 1, 3, 6
R2: 8, 4, 2, 1, 3, 2, 7, 3
Show the result of joining R1 and R2 using each of the following algorithms. List the
results in the order in which they would be output by the join algorithm.
(a) Nested loop join. Use R1 for the outer loop and R2 for the inner loop.
(b) Sort-merge join.
Solution. The result relation contains only one attribute, which is the common attribute
of R1 and R2.
(a) Nested loop join. For every tuple in R1, find the matches in R2.
7, 2, 2, 8, 3, 3, 1, 3, 3
(b) Sort-merge join. The result appears in sorted order on the join attribute.
1, 2, 2, 3, 3, 3, 3, 7, 8
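
The two answers can be reproduced with a small Python sketch of both algorithms. This is a simplified in-memory version on lists of integers, not how a DBMS implements joins on disk blocks, and the function names are chosen only for this illustration.

```python
R1 = [7, 2, 9, 8, 3, 9, 1, 3, 6]
R2 = [8, 4, 2, 1, 3, 2, 7, 3]


def nested_loop_join(outer, inner):
    """For every value in the outer relation, scan the whole inner relation."""
    return [a for a in outer for b in inner if a == b]


def sort_merge_join(left, right):
    """Sort both inputs, then merge them, pairing up runs of equal values."""
    left, right = sorted(left), sorted(right)
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            i += 1
        elif left[i] > right[j]:
            j += 1
        else:
            value = left[i]
            # Find the end of the run of equal values on each side.
            i_end = i
            while i_end < len(left) and left[i_end] == value:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end] == value:
                j_end += 1
            # Each left occurrence matches each right occurrence.
            out.extend([value] * ((i_end - i) * (j_end - j)))
            i, j = i_end, j_end
    return out


print(nested_loop_join(R1, R2))
print(sort_merge_join(R1, R2))
```

The nested loop output follows the order of R1, while the sort-merge output is ordered on the join attribute, which matches the two answers given in the solution.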

