
Query Processing and Optimization
Prof. Smita Joshi
Query Processing and Optimization
● Overview
● Measures of Query cost
● Selection operation, Sorting, Join Operations, and other Operations
● Evaluation of Expression
● Query Optimization: Translations of SQL Queries into relational algebra
● Heuristic approach
● Cost based optimization
Overview
● Techniques used internally by a DBMS to process high-level queries. A query expressed in a
high-level query language such as SQL must first be scanned, parsed, and validated
● The scanner identifies the query tokens—such as SQL keywords, attribute names, and relation
names—that appear in the text of the query
● The parser checks the query syntax to determine whether it is formulated according to the syntax rules
(rules of grammar) of the query language
● The query must also be validated by checking that all attribute and relation names are valid and
semantically meaningful names in the schema of the particular database being queried
● An internal representation of the query is then created, usually as a tree data structure called a query
tree
● It is also possible to represent the query using a graph data structure called a query graph, which is
generally a directed acyclic graph (DAG)
● The DBMS must then devise an execution strategy or query plan for retrieving the results of the query
from the database files
● A query has many possible execution strategies, and the process of choosing a suitable one for
processing a query is known as query optimization
● First we focus on how queries are processed and what algorithms are used to perform individual
operations within the query
Steps of processing a query
The basic steps involved in processing a query are
1. Parsing and translation.
2. Optimization.
3. Evaluation.
Parsing and translation
● The submitted query undergoes lexical, syntactic, and semantic analysis.
● First, the query is broken down into tokens, and white space and comments are removed (Lexical Analysis).
● Next, the query is checked for correctness, both syntactically and semantically.
● The query processor first checks whether the rules of SQL grammar have been followed (Syntactic Analysis).
● Finally, the query processor checks whether the query is meaningful: are the table(s) mentioned in the query present in the database? Are the column(s) referred to actually present in those table(s)? (Semantic Analysis)
● Once these checks pass, the tokens are converted into relational expressions, graphs, and trees.
Parsing and translation
As an example, consider the query:

SELECT salary
FROM instructor
WHERE salary < 75000;

The query would be divided into the following tokens:
SELECT, salary, FROM, instructor, WHERE, salary, <, 75000.

The tokens (and hence the query) are then validated:
● The name of the queried table is looked up in the data dictionary.
● The names of the columns mentioned in the tokens (salary) are validated for existence.
● The types of the column(s) being compared must match (salary and the value 75000 should have the same data type).

The next step is to translate the generated set of tokens into a relational-algebra query, which is easier for the optimizer to handle in further processing.
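To make the tokenization step concrete, here is a toy scanner in Python; the token categories, the regular expression, and the tokenize helper are illustrative assumptions rather than the lexer of any real SQL engine.

```python
import re

# Toy scanner: the categories and patterns below are illustrative assumptions,
# not the lexical grammar of any real SQL engine.
TOKEN_PATTERN = re.compile(r"""
      (?P<KEYWORD>\b(?:SELECT|FROM|WHERE)\b)   # SQL keywords
    | (?P<NUMBER>\d+)                          # numeric literals
    | (?P<IDENT>[A-Za-z_]\w*)                  # attribute / relation names
    | (?P<OP><=|>=|<>|[<>=])                   # comparison operators
    | (?P<END>;)                               # statement terminator
""", re.VERBOSE | re.IGNORECASE)

def tokenize(query):
    """Break the query text into tokens; whitespace and comments are skipped."""
    return [m.group() for m in TOKEN_PATTERN.finditer(query)]

print(tokenize("SELECT salary FROM instructor WHERE salary < 75000;"))
# ['SELECT', 'salary', 'FROM', 'instructor', 'WHERE', 'salary', '<', '75000', ';']
```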
Parsing and translation
This query can be translated into either of the following relational-algebra expressions:

1. σsalary<75000 (πsalary (instructor))   (Query Tree for RAE 1)

2. πsalary (σsalary<75000 (instructor))   (Query Tree for RAE 2)

Relational graphs and trees can also be generated for these expressions.


Query Evaluation
● As the query trees on the previous slides show, one way is to project first and then select (RAE 1); another is to select first and then project (RAE 2).
● The sample query is kept simple and straightforward to aid comprehension, but in the case of joins and views, many more such paths (evaluation plans) open up.
Query Evaluation
● Further, we can execute each relational-algebra operation by one of several different
algorithms.
○ For example, to implement the preceding selection, we can search every tuple in instructor to find tuples
with salary less than 75000. If a B+-tree index is available on the attribute salary, we can use the index
instead to locate the tuples.
● To specify fully how to evaluate a query, we need not only to provide the
relational-algebra expression, but also to annotate it with instructions specifying how to
evaluate each operation.
● Annotations may state the algorithm to be used for a specific operation, or the particular
index or indices to use.
● A relational-algebra operation annotated with instructions on how to evaluate it is called an evaluation primitive. Evaluation primitives define the sequence of operations to be performed for a given plan.
● A sequence of primitive operations that can be used to evaluate a query is a
query-execution plan or query-evaluation plan.
Query Optimization
● The different evaluation plans for a given query can have different costs.
● We do not expect users to write their queries in a way that suggests the
most efficient evaluation plan.
● Rather, it is the responsibility of the system to construct a query evaluation
plan that minimizes the cost of query evaluation; this task is called query
optimization.
● Once the query plan is chosen, the query-execution engine takes the query-evaluation plan, executes it, and returns the answers to the query.
Measures of Query Cost
● There are multiple possible evaluation plans for a query, and it is important to be able to compare the alternatives in terms of their (estimated) cost and choose the best plan.
● To do so, we must estimate the cost of individual operations, and combine
them to get the cost of a query evaluation plan.
● The cost of query evaluation can be measured in terms of a number of
different resources, including
○ disk access, CPU time to execute a query, and in a distributed or parallel
database system, the cost of communication
Assumptions: Measures of Query Cost
● For simplicity we just use two measures, the number of block transfers from disk and the number of seeks, as the cost measure
○ We ignore the difference in cost between sequential and random I/O for
simplicity
○ We also ignore CPU costs for simplicity
● Data must initially be read from disk into the buffer; one seek operation is required at that time
● Cost depends on the size of the buffer in main memory
○ Having more memory reduces the need for disk access
○ The amount of real memory available for the buffer depends on other concurrent OS processes and is hard to determine ahead of actual execution
○ We often use worst-case estimates, assuming only the minimum amount of memory needed for the operation is available
● Real systems take CPU cost into account, differentiate between sequential and random I/O, and take buffer size into account
● We do not include the cost of writing output to disk in our cost formulae
Measures of Query Cost
Disk access is the predominant cost, and is also relatively easy to estimate. How to calculate
the disk access cost?

● Number of seeks × average seek cost
○ A disk is divided into many circular tracks. Seek time is defined as the time required by the read/write head to move from one track to another.
○ Rotational latency is defined as the time required for the requested sector to rotate under the read/write head from the current position.
● Number of blocks read × average block-read cost
● Number of blocks written × average block-write cost
○ The cost to write a block is greater than the cost to read a block
■ data is read back after being written to ensure that the write was successful
Note: the number of block transfers covers both blocks read and blocks written.
Measures of Query Cost
● For simplicity we just use the number of block transfers (b) from disk and the number of seeks (S) as the cost measures
○ tT – average time to transfer one block of data (assuming for simplicity that write cost is the same as read cost; ideally read cost and write cost are different)
○ tS – average block-access time (time for one seek = disk seek time + rotational latency)

The formula to measure the cost is:

Time = b * tT + S * tS   (cost for b block transfers plus S seeks)

○ tT and tS depend on where the data is stored. E.g., for a 4 KB block size:
■ High-end magnetic disk: tS = 4 msec and tT = 0.1 msec
■ SSD: tS = 20-90 microsec and tT on the order of microseconds
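As a quick illustration of the formula, a minimal sketch in Python; the timing values are the 4 KB magnetic-disk figures above, and the block and seek counts are made-up inputs.

```python
def access_time(b, S, t_T, t_S):
    """Time = b * t_T + S * t_S : cost of b block transfers plus S seeks."""
    return b * t_T + S * t_S

t_S = 4e-3    # high-end magnetic disk: 4 msec per seek
t_T = 0.1e-3  # 0.1 msec to transfer one 4 KB block

# e.g. a plan that transfers 400 blocks and performs 10 seeks:
print(access_time(b=400, S=10, t_T=t_T, t_S=t_S))   # 0.08 seconds
```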
S T
Algorithms
1. Selection
   a. Linear Search Algorithm
   b. Binary Search Algorithm
2. Sorting Algorithm [Quick sort and External merge sort]
3. Join Operation
   a. Nested-Loop Join
   b. Block Nested-Loop Join
   c. Indexed Nested-Loop Join
   d. Merge Join
   e. Hash Join
   f. Complex Join
Selection Operation
● File scan – Lowest level operator to access data. Search algorithms that
locate and retrieve records that fulfill a selection condition.
● Algorithm A1 (linear search). Scan each file block and test all records to
see whether they satisfy the selection condition.
● Linear search can be applied regardless of
○ selection condition or
○ ordering of records in the file, or
○ availability of indices
● An initial seek is required to access the first block of the file. If blocks are not stored contiguously, extra seeks may be required
○ br denotes the number of blocks containing records of relation r; one initial seek operation is needed

Cost = 1 * tS + br * tT
Selection operation: A1(Linear search, Equality on key)
● If the selection is on a key attribute, the scan can stop as soon as the matching record is found

Select *
from employee
where emp_id = 1004;

Average case cost = tS + ⌈br / 2⌉ * tT
Worst case cost = tS + br * tT
Selection operation: A2(binary search, Equality on key)
● A2 (binary search). Applicable if the selection is an equality comparison on the (primary) key attribute and the file is sorted (ordered) on that key.
○ Assume that the blocks of the relation are stored contiguously
○ Cost estimate (number of disk blocks to be scanned):
■ Cost of binary search = ⌈log2(br)⌉ * (tT + tS)
● br denotes the number of blocks containing records of relation r
○ If the selection is on a non-key (ordering) attribute, multiple blocks may contain the required records, and the cost of scanning those additional blocks must be added to the cost estimate
Selection operation: A3(primary B+-tree Index, equality on key)
● For an equality comparison on a key attribute with a primary index, we can use the
index to retrieve a single record that satisfies the corresponding equality condition.
● Only one record will be returned, since key values are unique
● Assume a B+-tree index file
Select * from Student where Roll_no = 4;
Cost estimate:
Cost = (hi + 1) * (tT + tS)
First we traverse the index to the leaf node and then fetch the actual record.
Where
● hi denotes the height of the index.
● The index lookup traverses the height of the tree plus one I/O to fetch the record; each of these I/O operations requires a seek and a block transfer.
Selection operation: A4 (primary index: index table, equality on key)

Dense index, linear search, equality on key

Total cost = cost of selection in index table + cost of selection in data file

Cost of selection in index table: tS + br * tT
Cost of selection in data file: (tT + tS)

Cost = tS + br * tT + tT + tS
Cost = 2 tS + tT (br + 1)
Selection operation: A5 (primary index: index table, equality on key)

Sparse index, binary search, equality on key

Total cost = cost of selection in index table + cost of selection in data file

Cost of selection in index table: ⌈log2(br)⌉ * (tT + tS)
Cost of selection in data file: (tT + tS)

Cost = ⌈log2(br)⌉ * (tT + tS) + (tT + tS)
Cost = (⌈log2(br)⌉ + 1) * (tT + tS)
Selection operation: A6 (primary B+-tree index, equality on non-key)

Number of seeks = hi + 1
Number of block transfers = hi + b   (assuming b blocks contain the search key)
Cost = (hi + 1) * tS + (hi + b) * tT
Cost = hi * (tT + tS) + tS + b * tT
● One seek for each level of the tree, plus one seek for the first data block.
● Here b is the number of blocks containing records with the specified search key, all of which are read.
● These blocks are leaf blocks assumed to be stored sequentially (since it is a primary index) and do not require additional seeks.
Hence Cost = hi * (tT + tS) + tS + b * tT
Selection operation: (primary index (dense/sparse), equality on non-key)

Dense index, linear search, equality on non-key
Total cost = cost of selection in index table + cost of selection in data file
Cost of selection in index table: tS + br * tT
Cost of selection in data file: (n * tT + tS), where n is the number of data blocks containing matching records
Cost = tS + br * tT + n * tT + tS

Sparse index, binary search, equality on non-key
Total cost = cost of selection in index table + cost of selection in data file
Cost of selection in index table: ⌈log2(br)⌉ * (tT + tS)
Cost of selection in data file: (n * tT + tS)
Cost = ⌈log2(br)⌉ * (tT + tS) + n * tT + tS
Selection operation: (secondary index, equality, only one record)

B+-tree, equality, only one record
Cost = (hi + 1) * (tT + tS)

Dense index, linear search, equality, only one record
Cost = tS + br * tT + tT + tS

Sparse index, binary search, equality, only one record
Cost = ⌈log2(br)⌉ * (tT + tS) + (tT + tS)
Cost = (⌈log2(br)⌉ + 1) * (tT + tS)
Selection operation: (secondary index, equality, more than one record)

B+-tree, equality, more than one record
Cost = hi * (tT + tS) + n * (tT + tS) = (hi + n) * (tT + tS)

Dense index, linear search, equality, more than one record
Cost = tS + br * tT + n * (tT + tS)

Sparse index, binary search, equality, more than one record
Cost = ⌈log2(br)⌉ * (tT + tS) + n * (tT + tS)
Cost = (⌈log2(br)⌉ + n) * (tT + tS)
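The selection-cost formulas above can be collected into a small calculator. This is a sketch only: the symbols follow the slides (tT, tS, br, hi, n), and the numeric values below are invented for illustration.

```python
from math import ceil, log2

t_S, t_T = 4e-3, 0.1e-3   # illustrative disk timings (seconds)
b_r = 400                 # blocks of relation r (illustrative)
h_i = 3                   # height of the B+-tree index (illustrative)
n = 5                     # matching blocks/records for the non-key cases

# Each entry applies one of the cost formulas from the preceding slides.
costs = {
    "A1 linear scan (worst case)":      t_S + b_r * t_T,
    "A1 linear scan, key (average)":    t_S + ceil(b_r / 2) * t_T,
    "A2 binary search, key":            ceil(log2(b_r)) * (t_T + t_S),
    "A3 primary B+-tree, key":          (h_i + 1) * (t_T + t_S),
    "A6 primary B+-tree, non-key":      h_i * (t_T + t_S) + t_S + n * t_T,
    "secondary B+-tree, many matches":  (h_i + n) * (t_T + t_S),
}
for name, seconds in costs.items():
    print(f"{name:34s} {seconds * 1000:7.2f} ms")
```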
Sorting in Query Processing
Sorting is required when
● SQL queries specify that the output be sorted
● It enables efficient query processing
○ E.g., for the merge join operation we have to sort the relations first

● Logical sorting
○ Sort a relation by building an index on the sort key, and use that index to read the relation in sorted order
○ Reading the tuples in sorted order may lead to a disk access for each record, which may be very expensive, since the number of records can be much larger than the number of blocks
● Hence it is desirable to order the records physically.
Sorting in Query Processing
There are two different types of sorting

1. Internal sorting.
2. External sorting.

Internal sorting:

When all the tuples fit into memory, we can use a standard in-memory sorting algorithm, such as Quick sort or Bubble sort.

External sorting:

Refers to sorting algorithms that are suitable for large files of records stored on disk that
do not fit entirely in main memory, such as most database files. If data does not fit in
memory, then we need to use a technique that is aware of the cost of writing data out to
disk.
External Merge Sort Algorithm
The algorithm works in two stages:

● In the first stage, the relation is divided into runs according to the main-memory space available; each run is sorted in memory and written back to disk
● In the second stage, all the sorted runs that have been generated are merged
○ Read one block of each of the N run files into a buffer block in memory
  repeat
  ● Choose the first tuple (in sort order) among all the buffer blocks
  ● Write that tuple to the output and delete it from its buffer block
  ● If the buffer block of any run Ri becomes empty and the end of file Ri has not been reached, read the next block of Ri into the buffer block
  until all input buffer blocks are empty
External Merge Sort Algorithm
Let M denote the number of blocks that fit in the memory available for sorting.
● Sorted runs are produced.
○ M blocks are read at a time.
○ They are sorted in memory.
○ The M blocks are written back as one run.
● Merge M-1 runs at a time ((M-1)-way merge): M-1 input buffers plus 1 output buffer for the sorted output
○ Read the first block of each of the M-1 runs into the input buffers.
○ Repeatedly output the smallest record among the input buffers to the output buffer block.
○ When the output buffer block is full, write it to disk and continue.
○ When a block of a run is exhausted, the next block of that run is read.
Example: 3 buffer pages to sort a 12-page file

Assumptions:
● Number of buffers that fit in memory, M = 3
● Number of tuples that fit in one buffer = 1
● Read M-1 buffers and use 1 output buffer to write the sorted output
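A minimal in-memory sketch of the two stages, where each 'block' is a Python list and M (assumed at least 2) is the number of buffer blocks; heapq.merge plays the role of repeatedly picking the smallest tuple in the (M-1)-way merge. Real systems operate on disk pages, so this only mirrors the structure of the algorithm.

```python
import heapq

def external_merge_sort(blocks, M):
    """blocks: list of lists, each inner list standing in for one disk block of tuples.
    M: number of buffer blocks available (assumed >= 2)."""
    # Stage 1: read M blocks at a time, sort them in memory, write out one run each.
    runs = []
    for i in range(0, len(blocks), M):
        run = sorted(t for block in blocks[i:i + M] for t in block)
        runs.append(run)

    # Stage 2: repeatedly merge M-1 runs at a time until one run remains.
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), M - 1):
            merged.append(list(heapq.merge(*runs[i:i + M - 1])))
        runs = merged
    return runs[0] if runs else []

pages = [[24, 3], [18, 7], [2, 40], [9, 1], [30, 5], [21, 16]]
print(external_merge_sort(pages, M=3))
# [1, 2, 3, 5, 7, 9, 16, 18, 21, 24, 30, 40]
```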
Join Operation
There are several different algorithms that can be used to implement
joins (natural-join, equi-join, condition-join)
○ Nested-Loop Join
○ Block Nested-Loop Join
○ Index Nested-Loop Join
○ Sort-Merge Join
○ Hash-Join
● Choice of a particular algorithm is based on cost estimate
Nested-Loop Join Algorithm
● A nested-loop join is a join implemented with a pair of nested for loops.
● To perform the theta join r ⋈θ s on two relations r and s, we use an algorithm known as the nested-loop join algorithm.
● The computation takes place as:
  r ⋈θ s
  where r is known as the outer relation and s as the inner relation of the join, because the for loop over r encloses the for loop over s.
Nested-Loop Join Algorithm (r ⋈ θ s)
for each tuple tr in r do begin
    for each tuple ts in s do begin
        test pair (tr, ts) to see whether they satisfy the given join condition
        if the test is satisfied, add tr . ts to the result;
    end (inner loop)
end (outer loop)

Here,
r - outer relation
s - inner relation
tr and ts are tuples of relations r and s, respectively.
The notation tr . ts denotes the tuple constructed by concatenating the attribute values of tuples tr and ts.
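A direct Python rendering of the pseudocode above, with tuples represented as dicts and the theta condition passed in as a function; the sample employee/dept data is invented for illustration.

```python
def nested_loop_join(r, s, theta):
    """Tuple-at-a-time nested-loop join: r is the outer relation, s the inner."""
    result = []
    for tr in r:                          # outer loop over r
        for ts in s:                      # inner loop over s
            if theta(tr, ts):             # test the join condition
                result.append({**tr, **ts})   # tr . ts : concatenated tuple
    return result

employee = [{"emp_id": 1, "dept_id": 10}, {"emp_id": 2, "dept_id": 20}]
dept = [{"dept_id": 10, "dept_name": "Sales"}, {"dept_id": 30, "dept_name": "HR"}]

# Equi-join on dept_id:
print(nested_loop_join(employee, dept,
                       lambda tr, ts: tr["dept_id"] == ts["dept_id"]))
```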
Nested-Loop Join Example

s⋈θr
Here,

● s is outer relation
● r is inner relation
Nested-Loop Join Algorithm Example

Here,
Employee is the outer relation
Dept is the inner relation
Nested-Loop Join Algorithm
○ The nested-loop join does not require any index; like a linear file scan, it accesses the data directly.
○ The nested-loop join does not depend on the form of the join condition; it works for any join condition.
○ The nested-loop join algorithm is expensive, because it computes and examines every pair of tuples from the two relations.
Cost of Nested-Loop Join Algorithm
For analyzing the cost of the nested-loop join algorithm:
● The number of pairs of tuples examined is nr * ns.
  Here, nr is the number of tuples in relation r and ns is the number of tuples in relation s.
● For each record in the outer relation r, a complete scan of the inner relation s is performed.
Worst Case
● The buffer can hold only one block of each relation
Total number of block transfers in the worst case = nr * bs + br
Total number of seeks required in the worst case = nr + br

where,
br - number of blocks containing tuples of relation r
bs - number of blocks containing tuples of relation s
Cost of Nested-Loop Join Algorithm
Best Case

● There is enough space for both relations to fit simultaneously in memory, so each block has to be read only once

Total number of block transfers in the best case = bs + br
Total number of seeks required in the best case = 2

● If only one of the relations fits entirely in memory, it is beneficial to use that relation as the inner relation, since the inner relation is then read only once

Total number of block transfers = bs + br
Total number of seeks required = 2
Example: Cost of Nested-Loop Join Algorithm
● Number of records of student relation: nstudent = 5000
● Number of blocks of student relation: bstudent = 100
● Number of records of takes relation: ntakes = 10000
● Number of blocks of takes relation: btakes = 400
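Plugging these figures into the worst-case formulas above gives the following estimates (the arithmetic is shown in the comments).

```python
n_student, b_student = 5000, 100
n_takes,   b_takes   = 10000, 400

# Worst case (one buffer block per relation), student as the outer relation:
transfers_1 = n_student * b_takes + b_student   # 5000*400 + 100   = 2,000,100
seeks_1     = n_student + b_student             # 5000 + 100       = 5,100

# Worst case, takes as the outer relation:
transfers_2 = n_takes * b_student + b_takes     # 10000*100 + 400  = 1,000,400
seeks_2     = n_takes + b_takes                 # 10000 + 400      = 10,400

print(transfers_1, seeks_1)   # 2000100 5100
print(transfers_2, seeks_2)   # 1000400 10400
```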
Block Nested-Loop Join Algorithm
for each block br of r do begin
    for each block bs of s do begin
        for each tuple tr in br do begin
            for each tuple ts in bs do begin
                test pair (tr, ts) to determine whether they satisfy the given join condition
                if the test is satisfied, add tr . ts to the result;
            end
        end
    end
end

Here,
r is the outer relation
s is the inner relation
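A Python rendering of the pseudocode above, with each relation represented as a list of blocks and each block as a list of dict tuples. For reference, the usual worst-case estimate (one buffer block per relation) is br * bs + br block transfers and 2 * br seeks, and the best case is br + bs transfers with 2 seeks; those figures come from the standard textbook treatment, not from the example slides.

```python
def block_nested_loop_join(r_blocks, s_blocks, theta):
    """Block nested-loop join: the inner relation s is scanned once per
    block of the outer relation r, rather than once per tuple of r."""
    result = []
    for br_block in r_blocks:            # each block of the outer relation
        for bs_block in s_blocks:        # each block of the inner relation
            for tr in br_block:          # all tuple pairs within the block pair
                for ts in bs_block:
                    if theta(tr, ts):
                        result.append({**tr, **ts})
    return result
```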
Block Nested-Loop Join Algorithm Example
Index Nested-Loop Join Algorithm Example

The primary index is available in main memory, so lookups will be faster.
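As a sketch of the idea behind the index nested-loop join: for each tuple of the outer relation, the inner relation is probed through an index on the join attribute instead of being scanned; a Python dict stands in for the B+-tree index here. The usual textbook estimate is br * (tT + tS) + nr * c, where c is the cost of a single index lookup on the inner relation.

```python
from collections import defaultdict

def index_nested_loop_join(r, s, join_attr):
    """For each outer tuple, probe an index on the inner relation s
    instead of scanning s; the dict below stands in for a B+-tree index."""
    index = defaultdict(list)
    for ts in s:
        index[ts[join_attr]].append(ts)
    # Probe the index once per outer tuple and concatenate matching pairs.
    return [{**tr, **ts} for tr in r for ts in index[tr[join_attr]]]
```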
Hash Join Algorithm: Basic Idea

Hsi is the partition of s whose tuples hash to value i, called the build input.
Hri is the partition of r whose tuples hash to value i, called the probe input.
nh is the number of partitions that have been generated.
// Step 1: Partition s
for each tuple ts in s do begin
    i = h(ts[JoinAttrs]);
    Hsi = Hsi ∪ {ts};
end

// Step 1 (continued): Partition r
for each tuple tr in r do begin
    i = h(tr[JoinAttrs]);
    Hri = Hri ∪ {tr};
end

// Step 2: Perform the join operation on each partition
for i = 0 to nh do begin
    read Hsi and build an in-memory hash index on it;
    for each tuple tr in Hri do begin
        probe the hash index on Hsi to locate all tuples ts
        such that ts[JoinAttrs] = tr[JoinAttrs];
        for each matching tuple ts in Hsi do begin
            add tr ⋈ ts to the result;
        end
    end
end
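A Python sketch of the two steps above: a dict serves as the in-memory hash index, a simple modulus of Python's hash stands in for the partitioning function h, and the partitions that would live on disk are simulated with in-memory lists.

```python
def hash_join(r, s, join_attr, n_h=4):
    """Partition both relations with h, then join each partition pair by
    building an in-memory hash index on the build input Hsi."""
    h = lambda v: hash(v) % n_h

    # Step 1: partition s (build input) and r (probe input).
    H_s = [[] for _ in range(n_h)]
    H_r = [[] for _ in range(n_h)]
    for ts in s:
        H_s[h(ts[join_attr])].append(ts)
    for tr in r:
        H_r[h(tr[join_attr])].append(tr)

    # Step 2: join corresponding partitions.
    result = []
    for i in range(n_h):
        index = {}                                  # in-memory hash index on Hsi
        for ts in H_s[i]:
            index.setdefault(ts[join_attr], []).append(ts)
        for tr in H_r[i]:
            for ts in index.get(tr[join_attr], []):
                result.append({**tr, **ts})
    return result
```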
Hash Join Algorithm: Step 1
Partitions after applying the hash function on the Department relation
Partitions after applying the hash function on the Employee relation

Hash Join Algorithm: Step 2
Recursive Partitioning

nh is the number of partitions that have been generated.
Handling of overflow
Cost of Hash Join
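A small helper based on the usual textbook estimate for hash join without recursive partitioning: 3(br + bs) + 4*nh block transfers (each relation is read once while partitioning, the partitions are written out and read back, and the 4*nh term accounts for partially filled partition blocks), plus 2(⌈br/bb⌉ + ⌈bs/bb⌉) seeks, where bb is the number of buffer blocks used for partition input/output. The numeric inputs below are illustrative.

```python
from math import ceil

def hash_join_cost(b_r, b_s, n_h, b_b, t_T, t_S):
    """Estimated hash-join cost without recursive partitioning:
    3(b_r + b_s) + 4*n_h block transfers, 2(ceil(b_r/b_b) + ceil(b_s/b_b)) seeks."""
    transfers = 3 * (b_r + b_s) + 4 * n_h
    seeks = 2 * (ceil(b_r / b_b) + ceil(b_s / b_b))
    return transfers * t_T + seeks * t_S

# Illustrative figures (student: 100 blocks, takes: 400 blocks, as in earlier slides):
print(hash_join_cost(b_r=100, b_s=400, n_h=5, b_b=20, t_T=0.1e-3, t_S=4e-3))
```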
Merge Join Algorithm
Let us switch the relation sequence: take Dept as the outer relation and Employee as the inner relation.
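As a sketch of the standard sort-merge join, assuming both relations are already sorted on the join attribute and using dict tuples as before; groups of equal keys on the inner side are buffered so that duplicates on either side are paired correctly. Under those assumptions the usual estimate is br + bs block transfers plus ⌈br/bb⌉ + ⌈bs/bb⌉ seeks, where bb is the number of buffer blocks allocated to each relation.

```python
def merge_join(r, s, attr):
    """Sort-merge join of r and s, both assumed sorted on attr."""
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][attr] < s[j][attr]:
            i += 1
        elif r[i][attr] > s[j][attr]:
            j += 1
        else:
            # Collect the group of s-tuples sharing this key, then pair it with
            # every r-tuple that has the same key.
            key, group_start = r[i][attr], j
            while j < len(s) and s[j][attr] == key:
                j += 1
            while i < len(r) and r[i][attr] == key:
                result.extend({**r[i], **ts} for ts in s[group_start:j])
                i += 1
    return result
```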
Cost Analysis: Merge Join Algorithm
Query Optimization
● Query optimization is the process of selecting the most efficient query-evaluation plan.
● At the relational-algebra level, the system attempts to find an expression that is equivalent to the given expression but more efficient to execute.
● At the evaluation level, it chooses the algorithm to use for executing each operation, the specific indices to use, and so on.
● The difference in cost (in terms of evaluation time) between a good strategy and a bad strategy is often substantial, and may be several orders of magnitude.
Query Optimization
Topics covered in Query Optimization:
❏ Query Optimization Overview
❏ Transformation of Relational Expressions
❏ Estimating Statistics of Expression Results
❏ Choice of Evaluation Plan
Query Optimization
❏ Query Optimization Overview

Consider the following relational-algebra expression for the query "Find the names of all instructors in the Music department together with the course title of all the courses that the instructors teach."

Note that the projection of course onto (course_id, title) is required since course shares an attribute dept_name with instructor; if we did not remove this attribute using the projection, the above expression using natural joins would return only courses from the Music department, even if some Music department instructors taught courses in other departments.
Query Optimization
The above expression constructs a large intermediate relation
Query Optimization
The transformed expression tree takes less time to produce the output, because it applies the selection dept_name = "Music" directly on instructor.

Now the query is represented by a relational-algebra expression.

An evaluation plan defines exactly what algorithm should be used for each operation, and how the execution of the operations should be coordinated.

● Different operations may be annotated, such as hash join, merge join (sorted join), etc.
● The ID attribute helps to merge/sort the relations. Where edges are marked as pipelined, the output of the producer is pipelined directly to the consumer, without being written out to disk.
● Given a relational-algebra expression, it is the job of the query optimizer to come up with a query-evaluation plan that computes the same result as the given expression, and is the least-costly way of generating the result.
Query Optimization

To find the least-costly query-evaluation plan, the optimizer needs to generate alternative plans that produce the same result as the given expression, and to choose the least-costly one.

Generation of query-evaluation plans involves three steps:

1. Generating expressions that are logically equivalent to the given expression,
2. Annotating the resultant expressions in alternative ways to generate alternative query-evaluation plans, and
3. Estimating the cost of each evaluation plan, and choosing the one whose estimated cost is the least.
Query Optimization
A query-evaluation plan
Query Optimization
Alternatives for evaluating an entire expression tree:

Materialization: Materialize (i.e., store into temporary relations on disk) intermediate results from lower-level operations, and use them as inputs to upper-level operations.

Materialized evaluation: evaluate one operation at a time, starting at the lowest level. Use the intermediate results materialized into temporary relations to evaluate the next-level operations.

Pipelining: In the pipelining approach, several relational operations are combined into a pipeline of operations, in which the results of one operation are passed along to the next operation in the pipeline.

● Much cheaper than materialization: no need to store a temporary relation to disk.
● For pipelining to be effective, use evaluation algorithms that generate output tuples even as tuples are received on their inputs.
Query Optimization
❏ Transformation of Relational Expressions

A query can be expressed in several different ways, with different costs of evaluation.

Two relational-algebra expressions are said to be equivalent if they generate the same set of tuples.

The order of the tuples is irrelevant: the two expressions may generate the tuples in different orders, but they are considered equivalent as long as the set of tuples is the same.
Query Optimization
Equivalence Rules:

An equivalence rule says that expressions of two forms are equivalent. We can replace an expression of the first form by an expression of the second form, and vice versa; the two expressions generate the same result on any valid database.

We list a number of general equivalence rules on relational-algebra expressions, using:
θ, θ1, θ2, and so on to denote predicates,
L1, L2, L3, and so on to denote lists of attributes,
E, E1, E2, and so on to denote relational-algebra expressions.
Query Optimization
Equivalence Rules:

A relation name r is simply a special case of a relational-algebra expression, and can be used wherever E appears.

List of equivalence rules: see the textbook (Database System Concepts, Abraham Silberschatz, Henry F. Korth, S. Sudarshan, 6th Edition), pages 583-585.

Example of Transformation:

Solve the university example with the relation schemas:
Query Optimization
Multiple equivalence rules: example from (Database System Concepts, Abraham Silberschatz, Henry F. Korth, S. Sudarshan, 6th Edition), pages 586-588.
