Query Processing and Optimization
Prof. Smita Joshi
● Overview
● Measures of query cost
● Selection operation, sorting, join operations, and other operations
● Evaluation of expressions
● Query optimization: translation of SQL queries into relational
algebra
● Heuristic approach
● Cost-based optimization
Overview
● Techniques used internally by a DBMS to process high-level queries. A query expressed in a
high-level query language such as SQL must first be scanned, parsed, and validated
● The scanner identifies the query tokens—such as SQL keywords, attribute names, and relation
names—that appear in the text of the query
● The parser checks the query syntax to determine whether it is formulated according to the syntax rules
(rules of grammar) of the query language
● The query must also be validated by checking that all attribute and relation names are valid and
semantically meaningful names in the schema of the particular database being queried
● An internal representation of the query is then created, usually as a tree data structure called a query
tree
● It is also possible to represent the query using a graph data structure called a query graph, which is
generally a directed acyclic graph (DAG)
● The DBMS must then devise an execution strategy or query plan for retrieving the results of the query
from the database files
● A query has many possible execution strategies, and the process of choosing a suitable one for
processing a query is known as query optimization
● First we focus on how queries are processed and what algorithms are used to perform individual
operations within the query
Steps of processing a query
The basic steps involved in processing a
query are
1. Parsing and translation.
2. Optimization.
3. Evaluation.
Parsing and translation
● Submitted queries undergo lexical, syntactic, and semantic analysis.
● Essentially, the query gets broken down into tokens, and white space is
removed along with the comments (Lexical Analysis).
● In the next step, the query is checked for correctness, both syntactically and
semantically.
● The query processor first checks whether the rules of SQL grammar have been
correctly followed (Syntactic Analysis).
● Finally, the query processor checks whether the meaning of the query is valid: are
the table(s) mentioned in the query present in the database? Are the
column(s) referred to from those table(s) actually present in them? (Semantic
Analysis)
● Once the above mentioned checks pass, the flow moves to convert all the tokens into
relational expressions, graphs, and trees.
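As an illustrative sketch (not a real DBMS component), lexical analysis of this kind can be modeled with a small regex-based tokenizer that drops white space and SQL line comments:

```python
import re

# Hypothetical tokenizer: splits a query string into tokens, discarding
# white space and "--" line comments. Illustrative only, not a full SQL lexer.
def tokenize(query):
    query = re.sub(r"--[^\n]*", " ", query)  # strip SQL line comments
    # match keywords/identifiers, integer literals, or single-character symbols
    return re.findall(r"[A-Za-z_][A-Za-z_0-9]*|\d+|[<>=*,;()]", query)

tokens = tokenize("SELECT salary FROM instructor WHERE salary < 75000;")
# tokens == ['SELECT', 'salary', 'FROM', 'instructor', 'WHERE', 'salary', '<', '75000', ';']
```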
Parsing and translation
As an example, consider the query:

SELECT salary
FROM instructor
WHERE salary < 75000;

The query would be divided into the following tokens:
SELECT, salary, FROM, instructor, WHERE, salary, <, 75000.

The tokens (and hence the query) get validated:
● The name of the queried table is looked up in the data
dictionary.
● The names of the columns mentioned (salary) in the
tokens are validated for existence.
● The types of the columns being compared have to be
the same (salary and the value 75000 should have
the same data type).
The next step is to translate the generated set of tokens into a relational algebra query.
These are easy to handle for the optimizer in further processes.
Parsing and translation
This query can be translated into either of the following
relational-algebra expressions:

σ salary<75000 (Π salary (instructor))
Π salary (σ salary<75000 (instructor))

Measures of Query Cost
○ tT – average time to transfer one block of data (assuming, for simplicity, that the write cost is
the same as the read cost; in reality, read cost and write cost differ)
○ tS – average block-access time (time for one seek = disk seek time + rotational latency)
For a linear file scan, the cost is
Cost = 1 * tS + br * tT
(one initial seek plus br block transfers, where br is the number of blocks in the relation)
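The linear-scan cost formula can be computed directly; the timing values below are made-up illustrative numbers, not measurements:

```python
# Linear-scan cost: one initial seek plus one transfer per block.
# t_S, t_T, and b_r are assumed illustrative values.
def linear_scan_cost(b_r, t_S, t_T):
    return 1 * t_S + b_r * t_T

# e.g. 4 ms average seek, 0.1 ms per block transfer, relation of 1000 blocks
cost = linear_scan_cost(1000, 4.0, 0.1)  # 4 + 100 = 104 ms
```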
Selection operation: A1(Linear search, Equality on key)
● If the selection is on a key attribute, the scan stops as soon as the record is found
Select *
from employee
where emp_id = 1004;
Worst case Cost = tS + br * tT
Average case Cost = tS + (br / 2) * tT (on average, half the blocks are scanned before the record is found)
Selection operation (primary B+-tree index, equality on non-key), where hi is the height of the index:
No. of seeks = hi + 1
No. of block transfers = hi + b, assuming b blocks contain the search key
Cost = (hi + 1) * tS + (hi + b) * tT
Cost = hi * (tT + tS) + tS + b * tT
● One seek for each level of the tree, plus one seek for the first data block.
● Here b is the number of blocks containing records with the specified
search key, all of which are read.
● These blocks are leaf blocks assumed to be stored sequentially (since it
is a primary index) and don’t require additional seeks.
Hence Cost = hi * (tT + tS) + tS + b * tT
Selection operation: (primary Index (dense/sparse), equality on non key)
Dense Index, Linear Search, equality on non key
Total Cost: cost of selection in index table + cost of selection in data file
Cost of selection in index table : tS+ br * tT
Cost of selection in data file : (n * tT + tS)
Cost = tS+ br * tT + n * tT + tS
Sparse Index, Binary Search, equality on non key
Total Cost: cost of selection in index table + cost of selection in data file
Cost of selection in index table : ⌈log2(br)⌉ * (tT + tS)
Cost of selection in data file : (n * tT + tS)
Cost = ⌈log2(br)⌉ * (tT + tS) + n * tT + tS
Selection operation: (Secondary Index, equality, more than one record)
Dense Index, Linear Search, equality, more than one record
Cost = tS+ br * tT + n * (tT + tS)
Sparse Index, Binary Search, equality, more than one record
Cost = ⌈log2(br)⌉ * (tT + tS) + n * (tT + tS)
Cost = (⌈log2(br)⌉ + n) * (tT + tS)
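To make the formulas concrete, the sketch below compares three of the access paths above under assumed statistics (tS = 4 ms, tT = 0.1 ms, br = 10,000 blocks, B+-tree height hi = 3, b = 5 matching blocks; all numbers are illustrative):

```python
import math

# Illustrative comparison of the selection cost formulas; all statistics
# are assumptions, not measurements from a real system.
t_S, t_T, b_r, h_i, b = 4.0, 0.1, 10_000, 3, 5

linear = t_S + b_r * t_T                          # linear scan
binary = math.ceil(math.log2(b_r)) * (t_T + t_S)  # binary search on sorted file
btree  = h_i * (t_T + t_S) + t_S + b * t_T        # primary B+-tree, non-key

print(f"linear={linear:.1f} ms, binary={binary:.1f} ms, btree={btree:.1f} ms")
```

The index-based paths cost milliseconds, while the full scan costs about a second, which is why the optimizer's choice of access path matters.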
Sorting in Query Processing
A sorting operation is required when
● SQL queries specify that the output be sorted
● For efficient query processing
○ E.g., for the merge-join operation we have to sort the relations first
● Logical sorting
○ Sort a relation by building an index on the sort key, and use that index to read the
relation in sorted order
○ Reading tuples in sorted order may lead to a disk access for each record,
which can be very expensive, since the number of records can be much larger
than the number of blocks
● Hence it is desirable to order the records physically.
Sorting in Query Processing
There are two different types of sorting
1. Internal sorting.
2. External sorting.
Internal sorting:
When all the tuples fit into the memory then we can use a standard in-memory sorting
algorithm. Some of the algorithms are Quick sort and Bubble sort.
External sorting:
Refers to sorting algorithms that are suitable for large files of records stored on disk that
do not fit entirely in main memory, such as most database files. If data does not fit in
memory, then we need to use a technique that is aware of the cost of writing data out to
disk.
External Merge Sort Algorithm
The algorithm works in two stages:
● In the first stage, the relation is divided into runs according to the main memory space
available; each run is sorted in memory and written back to the
disk
● In the second stage, all the sorted runs that have been generated are merged
○ Read one block of each of the N run files into a buffer block in memory
repeat
● Choose the first tuple in sort order among all the buffer blocks
● Write the tuple to the output and delete it from its buffer block
● If the buffer block of any run Ri becomes empty and end-of-file(Ri) has not been reached
○ Then read the next block of Ri into the buffer block
until all input buffer blocks are empty
External Merge Sort Algorithm
Let M denote the number of blocks that can fit in the memory available for
sorting.
● Sorted runs are produced.
○ M blocks are read at a time.
○ Sorted in memory.
○ M blocks are written back.
● Merge M-1 runs at a time ((M-1)-way merge), using M-1 input buffers and 1
output buffer for the sorted output
○ Read the first block of each of the M-1 runs into the input buffers.
○ Repeatedly choose the smallest record among the input buffers and move it to the output buffer, until the output buffer is full.
○ Write the full output buffer block to disk and empty it.
○ When a block of a run is exhausted, the next block of that run is read.
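The two stages can be sketched in a simplified in-memory form; this version assumes M = 3 single-tuple buffers and keeps the runs as Python lists, whereas a real implementation streams runs to and from disk:

```python
import heapq

# Simplified sketch of external merge sort: stage 1 produces sorted runs
# of at most M tuples; stage 2 merges up to M-1 runs at a time until one
# run remains. Real systems read/write runs on disk block by block.
def external_merge_sort(tuples, M=3):
    # Stage 1: create sorted runs of size at most M
    runs = [sorted(tuples[i:i + M]) for i in range(0, len(tuples), M)]
    # Stage 2: repeated (M-1)-way merge passes
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + M - 1]))
                for i in range(0, len(runs), M - 1)]
    return runs[0] if runs else []

print(external_merge_sort([24, 19, 31, 33, 14, 16, 21, 3, 7, 2, 10, 42]))
```

With M = 3, the 12-tuple input yields four runs of three tuples each, which are then merged two at a time over two passes.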
Assumption:
● Number of buffers that fit in
memory, M = 3
● Number of tuples that fit in one
buffer = 1
● Read M-1 input buffers and use 1 output
buffer to write sorted output
Example: 3 Buffer pages to sort 12 page file
Join Operation
There are several different algorithms that can be used to implement
joins (natural-join, equi-join, condition-join)
○ Nested-Loop Join
○ Block Nested-Loop Join
○ Index Nested-Loop Join
○ Sort-Merge Join
○ Hash-Join
● Choice of a particular algorithm is based on cost estimate
Nested-Loop Join Algorithm
● A nested loop join is a join that contains a pair of nested for
loops.
● To compute the theta join r ⋈θ s of two relations r and s,
we use an algorithm known as the nested-loop join algorithm.
● The computation takes place as:
r ⋈θ s
where r is known as the outer relation and s is the inner
relation of the join, because the for loop over r encloses the for
loop over s.
Nested-Loop Join Algorithm (r ⋈ θ s)
for each tuple tr in r do begin
    for each tuple ts in s do begin
        test pair (tr, ts) to see whether they satisfy the given join condition
        if the test is satisfied, add tr · ts to the result
    end inner loop
end outer loop

Here,
r - outer relation
s - inner relation
tr and ts are the tuples of relations r and s, respectively.
The notation tr · ts denotes a tuple constructed by concatenating the attribute values of
tuples tr and ts.
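The pseudocode above translates directly into a runnable sketch; here relations are lists of dicts and the join condition θ is passed as a function (the relation contents are illustrative):

```python
# Nested-loop join: for each tuple of the outer relation, scan the entire
# inner relation and emit concatenated pairs that satisfy theta.
def nested_loop_join(r, s, theta):
    result = []
    for tr in r:                          # outer relation
        for ts in s:                      # inner relation
            if theta(tr, ts):             # test the pair against the condition
                result.append({**tr, **ts})  # concatenate tr and ts
    return result

employee = [{"name": "Asha", "dept_id": 1}, {"name": "Ravi", "dept_id": 2}]
dept = [{"dept_id": 1, "dept_name": "Sales"}]
print(nested_loop_join(employee, dept,
                       lambda tr, ts: tr["dept_id"] == ts["dept_id"]))
# [{'name': 'Asha', 'dept_id': 1, 'dept_name': 'Sales'}]
```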
Nested-Loop Join Example
s⋈θr
Here,
● s is the outer relation
● r is the inner relation
Nested-Loop Join Algorithm Example
Here,
Employee is the outer relation
Dept is the inner relation
Nested-Loop Join Algorithm
○ The nested-loop join does not need any index; like a linear file scan, it simply reads
the data.
○ The nested-loop join does not depend on the form of the join condition; it works for
any join condition.
○ The nested-loop join algorithm is expensive, because it computes and
examines every pair of tuples in the given two relations.
Cost of Nested-Loop Join Algorithm
For analyzing the cost of the nested-loop join algorithm:
● The number of tuple pairs examined is nr * ns.
Here, nr is the number of tuples in relation r and ns is the number of
tuples in relation s.
● For each tuple in the outer relation r, a complete scan of the inner relation s is performed. Thus,
Worst Case
● The buffer can hold only one block of each relation
Total number of block transfers in worst case = nr * bs + br
Total number of seeks required in worst case = nr + br
where,
br - number of blocks containing tuples of relation r
bs - number of blocks containing tuples of relation s
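The worst-case formulas can be evaluated for assumed statistics (nr = 5000 tuples, br = 100 blocks, bs = 400 blocks; all values are made up for illustration):

```python
# Worst-case nested-loop join cost: one buffer block per relation, so the
# inner relation is re-read for every outer tuple. Statistics are assumed.
def nlj_worst_case(n_r, b_r, b_s, t_S, t_T):
    transfers = n_r * b_s + b_r   # n_r * b_s + b_r block transfers
    seeks = n_r + b_r             # n_r + b_r seeks
    return transfers * t_T + seeks * t_S

cost = nlj_worst_case(5000, 100, 400, t_S=4.0, t_T=0.1)  # in ms
```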
Cost of Nested-Loop Join Algorithm
Best Case
● Enough space for both relations to fit simultaneously in memory, so each block would
have to be read only once
● If one of the relations fits entirely in memory, it is beneficial to use that relation as the
inner relation, since the inner relation would then be read only once
Note that the projection of course onto (course_id, title) is required since course shares the
attribute dept_name with instructor; if we did not remove this attribute using the projection, the
expression using natural joins would return only courses from the Music department,
even if some Music-department instructors taught courses in other departments.
Query Optimization
The above expression constructs a large intermediate relation
Query Optimization
The transformed expression tree takes less time to produce the output, because it
directly applies the selection dept_name = ”Music” to instructor first.
An evaluation plan defines exactly what algorithm should be used for each operation, and how
the execution of the operations should be coordinated.
● Different operators may use different algorithms, such as hash join, merge join (sorted join), etc.
● The ID attribute is used to merge/sort the relations. Where edges are marked as pipelined,
the output of the producer is pipelined directly to the consumer, without being
written out to disk.
● Given a relational-algebra expression, it is the job of the query optimizer to come up
with a query-evaluation plan that computes the same result as the given expression,
and is the least-costly way of generating the result.
Query Optimization
● To find the least-costly query-evaluation plan, the optimizer needs to generate
alternative plans that produce the same result as the given expression, and to choose
the least-costly one.
Generation of query-evaluation plans involves three
steps: (1) generating expressions that are logically equivalent to the given expression,
(2) annotating the resulting expressions to produce alternative query-evaluation plans, and
(3) estimating the cost of each plan and choosing the cheapest.
Materialization: materialize (i.e., store into temporary relations on disk) intermediate
results from lower-level operations, and use them as inputs to upper-level operations.
Pipelining: pass the result tuples of one operation on to the next as they are produced.
It is much cheaper than materialization: there is no need to store a temporary relation on disk. For
pipelining to be effective, use evaluation algorithms that generate output tuples even as
tuples are received for the inputs to the operation.
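Python generators give a toy model of pipelining: each operator yields tuples to its consumer as they are produced, and no temporary relation is ever materialized (the operator names and sample data below are illustrative):

```python
# Each operator is a generator: tuples flow through the plan one at a
# time instead of being stored in intermediate relations.
def scan(relation):
    for t in relation:
        yield t

def select(pred, tuples):
    for t in tuples:
        if pred(t):
            yield t

def project(attrs, tuples):
    for t in tuples:
        yield {a: t[a] for a in attrs}

instructor = [{"name": "Kim", "salary": 60000}, {"name": "Wu", "salary": 90000}]
plan = project(["name"], select(lambda t: t["salary"] < 75000, scan(instructor)))
print(list(plan))  # [{'name': 'Kim'}]
```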
Query Optimization
❏ Transformation of Relational Expressions
Two relational-algebra expressions are said to be equivalent if, on every legal database
instance, they generate the same set of tuples.
The two expressions may generate the tuples in different orders, but they are
considered equivalent as long as the set of tuples is the same.
Query Optimization
Equivalence Rules:
An equivalence rule says that expressions of two forms are equivalent; we can replace an
expression of the first form by an expression of the second form, and vice versa.
The two expressions generate the same result on any valid database.
Example of Transformation:
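One classic transformation pushes a selection below a join. The sketch below demonstrates (for one sample instance, not as a proof) that applying the selection before or after the join yields the same set of tuples; the relations and attribute names are illustrative:

```python
# Toy data: instructor and teaches relations as lists of dicts.
instructor = [{"ID": 1, "dept_name": "Music"}, {"ID": 2, "dept_name": "Physics"}]
teaches = [{"ID": 1, "course_id": "MU-199"}, {"ID": 2, "course_id": "PHY-101"}]

def join(r, s):
    # equi-join on the shared ID attribute
    return [{**tr, **ts} for tr in r for ts in s if tr["ID"] == ts["ID"]]

def select_music(tuples):
    # selection: dept_name = "Music"
    return [t for t in tuples if t["dept_name"] == "Music"]

late = select_music(join(instructor, teaches))   # selection applied after the join
early = join(select_music(instructor), teaches)  # selection pushed below the join
assert late == early  # both plans produce the same result
```

The second plan is typically cheaper, since the join operates on a smaller relation; this is exactly the kind of rewrite a heuristic optimizer applies.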