0% found this document useful (0 votes)
4 views

CHAPTER_2_Query_Processing_&_Optimization_Handout_Material

Query processing involves converting high-level queries into efficient execution plans, with optimization techniques to enhance performance. The process includes syntax analysis, query decomposition, optimization, and execution plan generation, utilizing both heuristic and cost-based methods. Key considerations in optimization include data transfer reduction, index utilization, and the order of join operations.

Uploaded by

tsionnegash12
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views

CHAPTER_2_Query_Processing_&_Optimization_Handout_Material

Query processing involves converting high-level queries into efficient execution plans, with optimization techniques to enhance performance. The process includes syntax analysis, query decomposition, optimization, and execution plan generation, utilizing both heuristic and cost-based methods. Key considerations in optimization include data transfer reduction, index utilization, and the order of join operations.

Uploaded by

tsionnegash12
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 17

Query Processing and Optimization

Chapter 2
11.1 Introduction

Query processing requires that the DBMS identify and execute a strategy for retrieving the
results of the query. The query determines what data is to be found, but does not define
the method by which the data manager searches the database. Therefore Query optimization
is necessary to determine the optimal alternative to process a query. There are two main
techniques for query optimization. The first approach is to use a rule based or heuristic
method for ordering the operations in a query execution strategy. The second approach
estimates the cost of different execution strategies and chooses the best solution. In general
most commercial database systems use a combination of both techniques.

11.2 Basics of Query Processing

Query Processing : Query Processing is a procedure of converting a query written in


high-level language (Ex. SQL, QBE (Query by Example)) into a correct and efficient
execution plan expressed in low-level language, which is used for data manipulation.
Query Processor : Query processor is responsible for generating execution plan.
Execution Plan : Query processing is a stepwise process. Before retrieving or updating
data in database, a query goes through a series of query compilation steps. These steps are
known as execution plan.
The success of a query language also depends upon its query processor i.e., how much
efficient execution plan it can create? The better execution plan leads to low time and cost.
In query processing, the first phase is transformation in which parser first checks the syntax
of query and also checks the relations and attributes used in the query that are defined in
the database. After checking the syntax and verifying the relations, query is transformed into
equivalent expression that are more efficient to execute. Transformation, depends upon various

419
420 Introduc tion to Database Management System
factors like existence of certain database structures, presence of different indexes, file is sorted
or not, cost of transformation, physical characteristics of data etc. After transformation of
query, transformed query is evaluated by using number of strategies known as access plans.
While generating access plans, factors like physical properties of data and storage are taken
into account and the optimal access plan is executed. The next step is to validate the user
privileges and ensure that the query does not disobey the relevant integrity constraints.
Finally, execution plan is executed to generate the result.

11.2.1 General Strategy for Query Processing


The general strategy for query processing is as follows:
(i) Representation of query : Query written by user cannot be processed directly by
system. Query processor first checks the syntax and existence of relations and their
attributes in database. After validations, query processor transform it into equivalent
and more efficient expression for example query will be converted into a standard
internal format that parser can manipulate. Parser also adds some additional predicates
to the query to enforce security. Internal form may be relational algebra, relational
calculus, any low-level language, operator graphs etc.
(ii) Operator graphs : Operator graphs are used to represent query. It gives the sequence
of operations that can be performed. It is easy to understand the query represented
by operator graphs. It is useful to determine redundancy in query expressions, result
of transformation, simplify the view etc.
(iii) Response time and Data characteristics consideration : Data characteristics like length
of records, expected sizes of both intermediate and final results, size of relations etc.,
are also considered for optimizing the query. In addition to this overall response time
is also determined.

11.2.2 Steps in Query Processing


Various steps in query processing are shown in Figure 11.1. Suppose that user inputs a query
in general query language say QBE, then it is first converted into high-level query language
say SQL etc. Other steps in query processing are discussed below in detail:
(i) Syntax Analysis : Query in high-level language is parsed into tokens and tokens are
analyzed for any syntax error. Order of tokens are also maintained to make sure that all the
rules of language grammars are followed. In case of any error, query is rejected and an error
code with explanation for rejection is returned to the user. (Only syntax is checked in this
step).
(ii) Query Decomposition : In this step, query is decomposed into query blocks which
are the low-level operations. It starts with the high-level query that is transformed into low-
level operations and checks whether that query is syntactically and semantically correct. For
example, a SQL query is decomposed into blocks like Select block, From block, Where block
etc. Various stages in query decomposition are shown in Figure 11.2.
421

User input General query

Transform into

High level query language


ex. SQL

Scanning, parsing, Syntax checking and verification


validating by parser

Check existence of relations and attributes,


Database Query decomposer semantic analysis, decomposing complex
catalog query into smaller ones

Optimization to reduce execution time and


Query optimizer cost by substituting equivalent expressions
for those in the query

Execution plan Query modification

Query code generator Generate code for queries

Main Runtime database processor Deals with database to do necessary


database operation

Query result

FIGURE 11.1. Steps in query processing.

=============================================================================================================

SQL query Query analysis

Query normalization

Semantic analysis

Query simplifier

Query restructuring

Algebraic expressions

FIGURE 11.2. Steps in query decomposition.


422 Introduc tion to Database Management System
(a) Query Analysis : In the query analysis stage, programming language compiler checks
that the query is lexically and syntactically correct. A syntactically correct query is
analyzed using system catalogues to verify the existence of relations and attributes used
in query. After analysis a correct query is converted into some internal representation,
which is more efficient for processing.
The type specification of the query qualifier and result is also checked at this stage.
The internal representation may be, query tree or query graph.
Query tree notation : A Typical internal representation of query is query tree. It is
also known as relational algebra tree. A query tree is constructed using tree data
structure that corresponds to the relational algebra expression. Main components of
query tree are:
 Root of tree – represents result of query.
 Leaf nodes – represent input relations of the query.
 Internal nodes – represent intermediate relation that is the output of applying
an operation in the algebra.
 The sequence of operations is directed from leaves to the root node.
For example,
Employee_Name, Address, JOB, Department, Department-location

 E.Department = D.Department

E D
Employee Department

FIGURE 11.3. Query treee notation.

Query graph notation : Graph data structure is also used for internal representation
of query. In graphs:
 Relation nodes – represent relations by single circle.
 Constant nodes – represent constant values by double circle.
 Edges – represent relation and join conditions.
 Square brackets – represent attributes retrieved from each
relation. [E.Employee_Name, E.Address,
[D.Manager-ID]
E.Job, E.Department]

‘Delhi’ D E
Department-Location = Delhi D.Department = E.Department

FIGURE 11.4. Query graph notation.


Query Processing and Optimiz ation 423
(b) Query normalization : After query analysis, it is normalized to remove any redundancy.
In this phase query is converted into normalized form that can be easily manipulated.
A set of equivalency rules are applied to query to simplify the projection and selection
operations to avoid redundancy. Query can be converted into one of the following
two normal forms :
 Conjunctive normal form : It is a sequence of conjuncts that are connected with
‘AND’ operator. A conjunct consists of one or more terms connected with ‘OR’
operator. A conjuctive selection consists only those tuples that satisfy all conjuncts.
Example. (Emp_Job = ‘Analyst’ ∨ salary < 50000) ∧ (Hire_Date > 1–1–2000)
 Disjunctive normal forms : It is a sequence of disjuncts that are connected with ‘OR’
operator. A disjunct consists of one or more terms connected with ‘AND’ operator.
A disjunctive selection contains those tuples that satisfy anyone of the disjunct.
Example. (Emp_Job= ‘Analyst’ ∧ salary < 5000) ∨ (Hire_Date > 1–1–2000)
Disjunctive normal form is more useful as it allows the query to break into a series
of independent sub-queries linked by union.
(c) Semantic analyzer : The semantic analyzer performs the following tasks :
 It helps in reducing the number of predicates.
 It rejects contradictory normalized forms.
 In case of missing join specification, components of query do not contribute to
generation of results. It identifies these queries and rejects them.
 It makes sure that each object in query is referenced correctly according to its data
type.
(d) Query simplifier : The major tasks of query simplifier are as follows :
 It eliminates common sub-expressions.
 It eliminates redundant qualification.
 It introduces integrity constraints, view definitions into the query graph representation.
 It eliminates query that voids any integrity constraint without accessing the database.
 It transforms sub-graphs into semantically equivalent and more efficient form.
 It deals with user access rights.
Idempotence rules of Boolean Algebra are applied to get final form of simplification.
(e) Query Restructuring : At the final stage of query decomposition, transformation rules
are applied to restructure the query to give a more efficient implementation.

(iii) Query Optimization : The aim of the query optimization step is to choose the best
possible query execution plan with minimum resources required to execute that plan. Query
optimization is discussed in detail in section 11.3.

(iv) Execution Plan : Execution plan is the basic algorithm used for each operation in
the query. Execution plans are classified into following Four types : (a) Left-deep tree query
execution plane, (b) Right-deep tree query execution plan, (c) Linear tree execution plan,
(d) Bushy execution plan.
424 Introduc tion to Database Management System
(a) Left-deep tree query execution plan : In left-deep tree query execution plan, development
of plan starts with a single relation and successively adding a operation involving
a single relation until the query is completed. For example, Only the left hand side
of a join is allowed to participate in result from a previous join and hence named
left-deep tree. It is shown in Figure 11.5.

Result

 R4

 R3

R1 R2

FIGURE 11.5. Left-deep execution plan.

Advantages : The main advantages of left-deep tree query execution plan are

 It reduces search space.
 Query optimiser is based on dynamic programming techniques.
 It is convenient for pipelined evaluation as only one input to each join is pipelined.
Disadvantage : The disadvantages of left-deep tree query execution plan are

 Reduction in search space leads to miss some lower cost execution strategies.
(b) Right-deep tree query execution plan : It is almost same as left-deep query execution
plan with the only difference that only the right hand side of a join is allowed to
participate in result from a previous join and hence named right-deep tree. It is
applicable on applications having a large main memory. It is shown in Figure 11.6.
Result

R4 

R3 

R2 R1

FIGURE 11.6. Right-deep execution plan.

(c) Linear tree execution plan : The combination of left-deep and right-deep execution
plans with a restriction that the relation on one side of each operator is always a
base relation is known as linear trees. It is shown in Figure 11.7.
Query Processing and Optimiz ation 425

Result

 R4

R3 

R2 R1

FIGURE 11.7. Linear tree execution plan.

(d) Bushy execution plan : Bushy execution plan is the most general type of execution
plan. More than one relation can participate in intermediate results. It is shown in
Figure 11.8.

Result

 

R3  R4 R5

R2 R1

FIGURE 11.8. Bushy execution plan.

The main advantage of bushy execution plan is the flexibility provided by it in


choosing the best execution plan by increasing search space but this flexibility may
leads to considerably increase the search space.
(v) Query Code Generator : After selecting the best possible execution plan query is
converted into low-level language so that it can be taken as input by runtime database
process.
(vi) Runtime Database Processor : It deals directly with main database and do the
necessary operation mentioned in query and returns the result to user.

11.3 Query Optimization

Query performance of a database systems is dependent not only on the database structure,
but also on the way in which the query is optimized. Query optimization means converting a
query into an equivalent form which is more efficient to execute. It is necessary for high-level
relation queries and it provides an opportunity to DBMS to systematically evaluate alterative
426 Introduc tion to Database Management System
query execution strategies and to choose an optimal strategy. A typical query optimization
process is shown in Figure 11.9.
Statistical data
Estimation formulas
Simplified relational
(determine cardinality
algebra query tree
of intermediate result tables)
Query optimiser

Cost model Execution plan generator

Execution plan in form of


optimized relational algebra query

FIGURE 11.9. Query optimization process.

The main issues that need to be considered in query optimization are:


1. Reduction of data transfer with database.
2. Use of available indexes for fast searching.
3. Reduction of number of times the database is manipulated.
4. The order in which joins should be performed.
5. How to store intermediate results?
Following are the three relations we used in each example:
Employee (Emp-ID, Emp-Name, Age, Salary, Dept-ID)
Department (Dept-ID, Proj-ID, Dept-Name)
Project (Proj-ID, Name, Location, Duration)
There are two main techniques used to implement query optimization. These are heuristic
query optimization and cost based query optimization.

11.3.1 Transformation Rules for Relational Algebra


The transformation rules are used to formulate a relational algebra expression into different
ways and query optimizer choose the most efficient equivalent expression to execute. Two
expressions are considered to be equivalent if they have same set of attributes in different
order but representing the same information.
Let us consider relations R, S, and T with set of attributes
X = {X1, X2, ..., Xn}, Y = {Y1, Y2, ..., Yn} and Z = {Z1, Z2, ..., Zn}
respectively, where X, Y, and Z represent predicates and L, L1, L2, M, M1, M2, and N denote
sets of attributes.
Rule 1. Cascading of selection (σ)
σX ∧ Y ∧ Z (R) ≡ σX (σY(σZ (R))).
Query Processing and Optimiz ation 427
It means that conjunctive selection operations can be transformed into individual selection
operations and vice versa.
σAge
Ex.        = 35∧ salary > 50000 (Employee) = σAge = 35 (σsalary> 50000 (Employee)).
Rule 2. Commutativity of selection (σ)
σX (σY (R)) ≡ σY (σX (R)).
Ex.       σAge = 35 (σSalary > 50000 (Employee)) ≡ σSalary > 5000 (σAge = 35 (Employee)).
Rule 3. Cascading of projection (π)
πL πM ... πN (R) ≡ πL (R)
Ex. πEmp-name πEmp-name, Age (Employee) ≡ πEmp-name (Employee)
Rule 4. Commutativity of selection (σ) and projection (π)
pX1, X2, ..., Xn (sA (R)) ≡ sA (pX1, X2, ..., Xn (R))
πEmp-name, Age (σSalary > 50000 (Employee)) ≡ σSalary > 5000 (πEmp-name, Age (Employee))
Rule 5. Commutativity of join () and Cartesian Product (X)
R Y S ≡ S Y R
R × S ≡ S × R
Ex. Employee Employee.Dept-ID = Department.Dept-ID Department ≡
             Department Employee.Dept-ID = Department.Dept-ID Employee

Rule 6. Commutavity of selection (σ) and join () or Cartesian product (X)
(σX R Y S ≡ (σX (R) Y S)
σX (R × S) ≡ (σX (R)) × S
Ex. σEmployee.Age > 30 ∧ Dept-Name = ‘MARKETING’ (Employee) Employee.Dept-ID = Department.Dept-ID
(Department) ≡ σEmployee.Age>30 (Employee) Employee.Dept-ID = Department.Dept-ID
(σDept-Name = ‘MARKETING’ (Department))

Rule 7. Commutavity of projection (π) and join () or Cartesian product (X).
pL1 ∪ L2 (R Z S) ≡ (pL1 (R)) Z (pL2 (S))
Ex. πEmp-Name, Dept-Name, Dept-ID (Employee E.Dept-ID = D.Dept-ID Department) ≡
(πEmp-name, Dept-ID (Employee)) E.Dept-ID = D.Dept-ID (πDept-Name, Dept-ID (Department))

Rule 8. Commutativity of Union (∪) and Intersection (∩)


R ∪ S = S ∪ R
R ∩ S = S ∩ R
Rule 9. Commutativity of Selection (σ) and Union (∪) or Intersection (∩) or
Differerence (–).
σX (R ∪ S ) = σX (S) ∪ σX (R)
σX (R ∩ S) = σX (S) ∩ σX (R)
σX (R – S) = σX (S) – σX (R)
428 Introduc tion to Database Management System
Rule 10. Comutativity of projection (π) and Union (∪)
πL (R ∪ S) = πL (S) ∪ πL (R)
Rule 11. Associativity of Join () and Cartesian product (X)
(R  S)  T ≡ R  (S  T)
(R X S) X T ≡ R X (S X T)
Rule 12. Associativity of Union (∪) and Intersection (∩)
(R ∪ S ) ∪ T ≡ S ∪ (R ∪ T)
(R ∩ S) ∩ T ≡ S ∩ (R ∩ T)
Rule 13. Converting a selection (σ) and Cartesian product (X) sequence into Join ()
σX (R X S ) ≡ (R X S)

11.3.2 Heuristic Query Optimization


Heuristic query optimization technique is used to modify the internal representation of a
query by using heuristic rules and transformation rules. Heuristic rules are used in the
form of a query tree or query graph structure. Optimiser starts with initial query tree and
transform it into an equivalent and efficient query tree using transformation rules.
Heuristic Optimization Algorithm : DBMS use heuristic optimization algorithms to improve
the efficiency of query by converting initial query tree into an equivalent and optimized
query tree. Optimizers utilize transformation rules to optimize the structure of query tree.
Following are the steps of heuristic optimization algorithm.
Step 1. Perform Selection operation as early as possible : By using selection operation at
early stages, you can reduce the unwanted number of record or data, to transfer
from database to primary memory.
Optimizer use transformation rule 1 to divide selection operations with conjunctive
conditions into a cascade of selection operations.
Step 2. Perform commutativity of selection operation with other operations as early as
possible : Optimizer use transformation rule 2, 4, 6, and 9 to move selection
operation as far down the tree as possible and keep selection predicates on the
same relation together. By keeping selection operation down at tree reduces the
unwanted data transfer and by keeping selection predicates together on same
relations reduces the number of times of database manipulation to retrieve
records from same database table.
Step 3. Combine the Cartesian Product with subsequent selection operation whose predicates
represents a join condition into a JOIN operation : Optimizer uses transformation rule
13 to convert a selection and cartesian product sequence into join. It reduces
data transfer. It is always better to transfer only required data from database
instead of transferring whole data and then refine it. (Cartesian product combines
all data of all the tables mention in query while join operation retrieves only
those records from database that satisfy the join condition).
Step 4. Use Commutativity and Associativity of Binary operations : Optimizer use transformation
rules 5, 11, and 12 to execute the most restrictive selection operations first.
Query Processing and Optimiz ation 429
It rearranges the leaf nodes of query tree. By using the most restrictive selection
operations, the number of records fetched from database reduces and also
subsequent operations can be performed on less number of records.
Step 5. Perform projection operations as early as possible : After performing selection operations,
optimizer use transformation rules 3, 4, 7 and 10 to reduce the number of
columns of a relation by moving projection operations as far down the tree as
possible and keeping projection predicates on the same relation together.
Step 6. Compute common expressions only once: It is used to identify sub-trees that represent
groups of operations that can be executed by a single algorithm.

Consider the query below in SQL and transformation of its initial query tree into an
optimal query tree.
Select Emp_Name
From Employee e, Department d, Project p
Where p.Name = ‘LUXMI PUB.’
AND d.Proj_ID = p.Proj_ID
AND e.Dept_ID= d.Dept_ID
AND e.Age > 35
This query needs to display names of all employees working for project “LUXMI PUB.”
and having age more than 35 years.
Figure 11.10 shows the initial query tree for the given SQL query. If the tree is executed
directly then it results in the Cartesian product of entire Employee, Department, and Project
table but in reality, the query needed only one record from relation Project and only the
employee records for those whose age is greater than 35 years.

Emp_Name

p.Name = ‘LUXMI PUB.’  d.Proj_ID = p.Proj_ID  e.Dept_ID = d.Dept_ID  e.Age > 35

X Project

Employee Department

FIGURE 11.10. Initial query tree.

— We can improve the performance by first applying selection operations to reduce the
number of records that appear in Casterian product. Figure 11.11 shows the improved query
tree.
430 Introduc tion to Database Management System

Emp_Name

d.Proj_ID = p.Proj_ID

e.Dept_ID = d.Dept_ID p.Name = ‘LUXMI PUB.’

X Project

e.Age > 35 Department

Employee

FIGURE 11.11. Improved query tree by first applying selection operations.

— The query tree can be further improved by applying more restrictive selection operation.
So, switch the positions of relations Project and Employee as you know that in a single
project it may be more than one employee. Figure 11.12 shows the improved query tree.
Emp_Name

e.Dept_ID = d.Dept_ID

d.Proj_ID = p.Proj_ID e.Age > 35

X Employee

p.Name = ‘LUXMI PUB.’ Department

Project

FIGURE 11.12. Improved query tree by applying more restrictive selection operations.

— A further improvement can be done by replacing Cartesian product operations by


Join operations with a join condition as shown in Figure 11.13.
Query Processing and Optimiz ation 431
Emp_Name

p.Dept_ID = d.Dept_ID

d.Proj_ID = p.Proj_ID e.Age > 35

Employee
p.Name = ‘LUXMI PUB.’ Department

Project

FIGURE 11.13. Improved query tree by replacing Cartesian product and


selection operations by join operations.

— Further improvement can be done in query tree by keeping only required attributes
(columns) of relations by applying projection operations as early as possible in the query
tree. Optimizer keep the attributes required to display, and the attributes needed by
the subsequent operation in the intermediate relations. Modified query tree is shown in
Figure 11.14.
Emp_Name

e.Dept_ID = d.Dept_ID

d.Dept_ID e.Emp_ID, e.Emp_Name, e.Dept_ID

d.Proj_ID = p.Proj_ID e.Age > 35

p.Proj_ID d.Dept_ID, d.Proj_ID Employee

p.Name = ‘LUXMI PUB.’ Department

Project

FIGURE 11.14. Improved query tree by applying and moving projection


operations down the query tree.

The SELECT clause in SQL is equivalent to projection operation and WHERE clause in SQL is
equivalent to selection operation in relational algebra.
432 Introduc tion to Database Management System
11.3.3 Cost Based Query Optimization
In cost based query optimization, optimizer estimates the cost of running of all alternatives
of a query and choose the optimum alternative. The alternative which uses the minimum
resources is having minimum cost. The cost of a query operation is mainly depend on
its selectivity i.e., the proportion of the input relations that forms the output. Following are
the main components used to determine the cost of execution of a query:
(a) Access cost of secondary storage : Access cost to secondary storage consists of cost of
database manipulation operations which includes searching, writing, reading of data
blocks stored in the secondary memory. The cost of searching depends upon the type
of indexes (primary, secondary, hashed), type of file structure, ordering of relation
in addition to physical storage location like file blocks are allocated contiguously on
the same disk or scattered on the disk.
(b) Storage cost : Storage cost consists of cost of storing intermediate results (tables or
files) that are generated by the execution strategy for the query.
(c) Computation cost : Computation cost consists of performing in-memory operations
during query execution such as sorting of records in a file, merging of records,
performing computations on field values, searching of records. These are mainly
performed on data buffers.
(d) Memory usage cost : It consists of cost of pertaining to the number of memory
buffers needed during query execution.
(e) Communication cost : It consists of the cost of transferring query and its result from
database location to the location of terminal where the query is originated.
From all the above components, the most important is access cost to secondary storage
because secondary storage is comparatively slower than other devices. Optimizer try to
minimize computation cost for small databases as most of the data files are stored in main
memory. For large database, it try to minimize the access cost to secondary storage and for
distributed databases, it trys to minimize the communication cost because various sites are
involved for data transfer.
To estimate the cost of execution strategies, optimizer access statistical data stored in
DBMS catalog. The information stored in DBMS catalog is given below:
(i) Number of records in relation X, given as R.
(ii) Number of blocks required to store relation X, given as B.
(iii) Blocking factor of relation X, given as BFR.
(iv) Primary access method for each file and attributes for each file.
(v) Number of levels for each multi-level index for a attribute A given as IA.
(vi) Number of first-level index blocks for a attribute A, given as BAI1.
(vii) Selection cardinality of attribute A in relation R, given as SA, where SA= R × SLA,
where SLA is the selectivity of the attributes.
Cost Function for Selection Operation : Selection operation works on a single relation in
relation algebra and retrieves the records that satisfy the given condition. Depending upon
the structure of file, available indexes, searching methods, the estimated cost of strategies
for selection operation is as given below.
Query Processing and Optimiz ation 433

S.No. Strategies Cost


1. Linear search B/2, if record is found
B, if record not found
2. Binary search log2 B, if equality condition is one a
unique key attribute
log2 B + (SA/BFR)-1, otherwise
3. Using primary index to retrive a single 1, assuming no overflow
record
4. Equality condition on primary key IA + 1
5. Equality condition on hash key 1, assuming no overflow
6. Inequality condition of primary index IA + B/2
7. Inequality condition of any ordered index IA + B/2
8. Equality condition on clustering index [IA + 1] + [SA/BFR]
(secondary index)
9. Equality condition on non-clustering [IA + 1] + [SA]
index (secondary index)
10. Inequality condition on B+-tree index IA + [BAI1/2] + [R/2]
(secondary index)
Now consider the example of Employee table. Consider, there are 10,000 records stored
in 2000 blocks. Also the following indexes are available.
 A secondary index on the Emp-ID with 4 levels.
 A clustering index on salary with 3 levels and average selection cardinality of 30.
 A secondary index on non-key attributes Age with 2 levels and 4 first level index
blocks. There are 200 distinct values for Age.
Consider the queries:
(i) σEmp-ID = 1A (Employee).
(ii) σAge > 20 (Employee).
(iii) σAge = 20 AND salary > 9000 (Employee).
Consider the following cost components.
 Number of records (R) = 10,000
 Number of blocks (B)= 2000
 Blocking Factor (BFR) = 10000/2000 = 5
 IEmp-ID = 4, SEmp-ID = 10000/10000 = 1
 IAge = 2, SAge = 10000/200 = 50
 ISalary = 2, SSalary = 30
Now cost of the above queries (suppose records are in table)
(i) If linear search is used then cost will be B/2 = 2000/2= 1000
(ii) If linear search is used then cost will be B = 2000.
434 Introduc tion to Database Management System
(iii) This query has a conjunctive selection condition. To estimate the cost of use using
anyone of the two components of selection condition, to retrieve the records plus
the linear search. The linear search costs 2000 and condition salary > 9000 first gives
cost estimate of Isalary + (B/2)= 2 + 1000 = 1002 and cost of condition Age = 20 is
30 + 2 = 32 So, total cost is 1034.
Cost Function for Join Operation : Join operation is the most time consuming operation
in databases and an accurate cost function for join operations depends upon the estimate of
size of file (number of records) after the join operation. The estimated cost of strategies for
join operation is given below:
S.No. Strategies Cost
1. Nested loop joins (Block) (a) B(R) + [B(R) * B(S)], if the buffer has only
one block
(b) B(R) + [B(S) * (B(R)/(Buffer-2))], if [Buffer-2]
block for relation R.
(c) B(R) + B(S), if all blocks of R can be read into
database buffer
2. Indexed nested-loop join (a) B(R) + R * (IA + 1), if join attribute A in
relation S is a primary key.
(b) B(R) + R * [IA + (SA(R)/BFR(R))], for clustering
index I on attribute A
3. Sort merged join (a) B(R) * [log2 (B(R) + B(S) * log2 (BCR)))], for
sorts
(b) B(R) + B(S) for merge
4. Hash join (a) (3 * B(R)) + B(S), if hash index is held in
memory
(b) 2[B(R) + B(S)] * [log (B(S) – 1)] + B(R) +
B(S), otherwise
(R) and (S) are relations R and S respectively.

Example. Consider we have 600 records in Department table. BFR for department table
is 60 and number of blocks are 600/60 = 10.
For the Join operation, Employee Dept-ID Department

Type of Join Cost


Block nested-loop joins 22000 (Buffer has only one block)
2010 (if all blocks of employee can be read into
database buffer)
Hash Join 6010 (is hash index is held in memory)
Example. Given two unary relation (contains only one attribute), R1 and R2.
R1 R2
7 8
2 4
Query Processing and Optimiz ation 435
9 2
8 1
3 3
9 2
1 7
3 3
6
Show the result of joining R1 and R2 using each of the following algorithms. List the
results in the order that would be output by the join algorithm.
(a) Nested Loop Join. Use R1 for outer Loop and R2 for the inner loop.
(b) Sort Merge Join.
Solution. The result relation contains only one attribute, which is the common attribute
between R and S.
(a) Nested loop join. For every tuple in R1, find matches in R2.
7, 2, 2, 8, 3, 3, 1, 3, 3
(b) Sort merge join. The result appears in sorted order on the join attribute.
1, 2, 2, 3, 3, 3, 3, 7, 8

You might also like