Unit-6

Query Processing and Optimization
COMPILED BY: GHANASHYAM BK
Introduction to Query Processing
Query processing is the set of activities involved in extracting data from the database.
The query processor turns user queries and data-modification commands into a query plan: a
sequence of low-level operations (an algorithm) on the database that implements the high-level
query.
Decisions taken by the query processor
Which of the algebraically equivalent forms of a query will lead to the most efficient
algorithm?
For each algebraic operator what algorithm should we use to run the operator?
How should the operators pass data from one to the other (e.g., main-memory buffers or disk
buffers)?
Basic Steps in Query Processing
Fetching data from the database involves several steps.
The steps involved are:
Parsing and translation
Optimization
Evaluation
Parsing and Translation
Initially, the user's query is written in a high-level database language such as SQL.
It gets translated into expressions that can be used at the physical level of the file
system.
After this, a variety of query-optimizing transformations are applied and the actual
evaluation of the query takes place.
Thus, before processing a query, the system needs to translate it from its human-readable
form into an internal representation.
SQL (Structured Query Language) is well suited to humans, but it is not well suited to the
internal representation of a query within the system.
Relational algebra is well suited for the internal representation of a query.
Relational algebra is well suited for the internal representation of a query.
The translation process in query processing is similar to the work of a parser in a compiler.
When a user executes a query, the parser generates its internal form: it checks the syntax of
the query and verifies the names of the relations in the database, the tuples, and finally the
required attribute values.
The parser creates a tree of the query, known as a 'parse tree', and then translates it into
relational algebra.
In doing so, it also replaces every use of a view in the query with the expression that defines
the view.
Notation for Query Trees (Parse Tree)
Query Tree:
Standard technique for estimating the work involved in executing the query, the generation of
intermediate results, and the optimization of execution
Nodes stand for operations like selection, projection, join, renaming, ...
Leaf nodes represent base relations
A tree gives a good visual feel of the complexity of the query and the operations involved
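As a rough illustration of this notation, a query tree can be held in a small recursive data structure. The sketch below is only an assumption about one possible representation; the `Node` class, its field names, and the `render` helper are hypothetical and do not come from any particular DBMS.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a query tree: an operator or, at a leaf, a base relation."""
    op: str                      # e.g. "project", "select", "join", "relation"
    arg: str = ""                # operator argument or relation name
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def render(node: Node, depth: int = 0) -> List[str]:
    """Return the tree as indented text lines, root first."""
    lines = ["  " * depth + f"{node.op}({node.arg})"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Tree for: project emp_name over (select salary > 10000 over Employee)
employee = Node("relation", "Employee")
tree = Node("project", "emp_name", [Node("select", "salary > 10000", [employee])])
print("\n".join(render(tree)))
```

The leaf is the base relation and every inner node is an operation, so the depth of the printout gives a quick feel for how many operations sit between the stored data and the final result.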
Suppose a user wants to fetch the names of the employees whose salary is greater than 10000.
For this, the following query is used:
select emp_name from Employee where salary>10000;
To make the system understand the user query, it needs to be translated into relational
algebra.
We can write this query in relational algebra in either of two equivalent forms:
πemp_name (σsalary>10000 (Employee))
πemp_name (σsalary>10000 (πemp_name, salary (Employee)))
After translating the query, we can execute each relational algebra operation using one of
several different algorithms. This is how query processing begins.
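A minimal sketch of how selection and projection might be executed over an in-memory relation, assuming a relation is a list of dictionaries. The sample rows and the `select` and `project` helpers are hypothetical, and duplicate elimination (which a true relational projection performs) is left out for brevity.

```python
# Hypothetical sample data for the Employee relation.
employee = [
    {"emp_name": "Asha", "salary": 12000},
    {"emp_name": "Bikram", "salary": 9000},
    {"emp_name": "Chandra", "salary": 15000},
]


def select(rows, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, attrs):
    """Projection: keep only the named attributes of every row."""
    return [{a: row[a] for a in attrs} for row in rows]


def high_salary(row):
    return row["salary"] > 10000


# Plan A: select first, then project the name.
plan_a = project(select(employee, high_salary), ["emp_name"])

# Plan B: project the two needed attributes first, then select, then project.
narrowed = project(employee, ["emp_name", "salary"])
plan_b = project(select(narrowed, high_salary), ["emp_name"])

print(plan_a == plan_b)
```

Both plans return the same rows, which is what makes them algebraically equivalent; they differ only in how much work each intermediate step does.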
Optimization
The cost of query evaluation can vary for different types of queries.
Because the system is responsible for constructing the evaluation plan, the user need not
write the query efficiently.
Usually, a database system generates an efficient query evaluation plan, which minimizes its
cost.
This task, performed by the database system, is known as query optimization.
To optimize a query, the query optimizer needs an estimated cost for each operation.
This is because the overall cost depends on the memory allocated to the various operations,
their execution costs, and so on.
Evaluation
To evaluate a query, the relational algebra translation alone is not enough: the translated
expression must also be annotated with instructions that specify how to evaluate each
operation.
Thus, after translating the user query, the system executes a query evaluation plan.
Query Evaluation Plan:
In order to fully evaluate a query, the system needs to construct a query evaluation plan.
The annotations in the evaluation plan may state the algorithm to be used for a specific operation or
the particular index to use.
Relational algebra operations annotated in this way are referred to as evaluation primitives. The
evaluation primitives carry the instructions needed to evaluate the operation.
Thus, a query evaluation plan defines a sequence of primitive operations used for evaluating a query.
The query evaluation plan is also referred to as the query execution plan.
A query execution engine is responsible for generating the output of the given query. It takes the query
execution plan, executes it, and returns the output for the user query.
Query evaluation
Finally, after selecting an evaluation plan, the system evaluates the query and produces its
output.
There are two methods of evaluating the query.
Materialization: In this method, the query is broken into individual sub-queries, and the
result of each is stored in a temporary table and then used to compute the final result.
Pipelining: In this method, the DBMS does not store intermediate records in temporary tables.
Instead, it evaluates each sub-query and passes its result directly to the next sub-query for
processing, and so on.
There are two types of pipelining:
Demand Driven or Lazy evaluation
Producer Driven or Eager Pipelining
Materialization
To be more specific, suppose there is a requirement to find the students who are studying in
class 'DESIGN_01'.
SELECT * FROM STUDENT s, CLASS c
WHERE s.CLASS_ID = c.CLASS_ID AND c.CLASS_NAME = 'DESIGN_01';

Here we can observe two queries:
one is to select the CLASS_ID of 'DESIGN_01' and
another is to select the student details for the CLASS_ID retrieved by the first query.

The DBMS does the same: it breaks the query into the two queries mentioned above.
It then evaluates the first query and stores its result in a temporary table in memory.
The data in this temporary table is then used to evaluate the second query.
This is an example of a two-level query in the materialization method.
A query can have any number of levels and therefore any number of temporary tables.
Although this method looks simple, its cost is usually higher.
It takes time to evaluate each level and write its result into a temporary table, and then to
read that temporary table to evaluate the next level, and so on.
Hence the cost of evaluation in this method is:
Cost = cost of individual SELECTs + cost of writing into temporary tables
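The two-level evaluation can be sketched as follows, assuming tables are plain Python lists and the "temporary table" is simply another list. The sample rows, the `materialized_plan` function, and the write counter are hypothetical and only illustrate where the extra cost comes from.

```python
# Hypothetical sample data for the CLASS and STUDENT tables.
classes = [
    {"CLASS_ID": 1, "CLASS_NAME": "DESIGN_01"},
    {"CLASS_ID": 2, "CLASS_NAME": "TEST_01"},
]
students = [
    {"NAME": "Asha", "CLASS_ID": 1},
    {"NAME": "Bikram", "CLASS_ID": 2},
    {"NAME": "Chandra", "CLASS_ID": 1},
]


def materialized_plan(students, classes, class_name):
    """Evaluate level 1 fully, store it, then evaluate level 2 from the stored copy."""
    # Level 1: find the CLASS_ID values and write them to a temporary table.
    temp_class_ids = [c["CLASS_ID"] for c in classes if c["CLASS_NAME"] == class_name]
    temp_writes = len(temp_class_ids)  # rows written to the temporary table

    # Level 2: read the temporary table back and fetch the matching students.
    result = [s for s in students if s["CLASS_ID"] in temp_class_ids]
    return result, temp_writes


rows, temp_writes = materialized_plan(students, classes, "DESIGN_01")
print([r["NAME"] for r in rows], temp_writes)
```

The second level cannot start until the first has finished and been written out, which is the extra term in the cost formula.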
Pipelining
In pipelining, the sub-queries are processed one after the other, and each uses the result of
the previous one for its processing.
In the example above, the CLASS_ID of DESIGN_01 is passed to the query on the STUDENT table to
get the student details.
In this method there is no extra cost of writing into temporary tables.
It has only the cost of evaluating the individual queries; hence it usually performs better
than materialization.
There are two types of pipelining:
Demand Driven or Lazy evaluation
Producer Driven or Eager Pipelining
Demand Driven or Lazy evaluation
In this method, the results of lower-level queries are not passed to the higher level
automatically.
They are passed to the higher level only when the higher level requests them.
The lower level retains its result value and state, and transfers the result to the next level
only on request.
In our example above, the CLASS_ID for DESIGN_01 will be retrieved, but it will be passed to
the STUDENT query only when that query requests it.
Once the request arrives, the value is passed to the STUDENT query, and that query is processed.
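Python generators behave much like demand-driven operators: nothing is computed until the consumer asks for the next value. The sketch below assumes the same hypothetical STUDENT and CLASS rows as plain lists; the `class_ids` and `students_in` functions and the `log` list are illustrative names only.

```python
# Hypothetical sample data for the CLASS and STUDENT tables.
classes = [
    {"CLASS_ID": 1, "CLASS_NAME": "DESIGN_01"},
    {"CLASS_ID": 2, "CLASS_NAME": "TEST_01"},
]
students = [
    {"NAME": "Asha", "CLASS_ID": 1},
    {"NAME": "Bikram", "CLASS_ID": 2},
    {"NAME": "Chandra", "CLASS_ID": 1},
]

log = []  # records when the lower-level query actually does work


def class_ids(classes, class_name):
    """Lower-level query: yields one CLASS_ID each time it is asked."""
    for c in classes:
        if c["CLASS_NAME"] == class_name:
            log.append(f"produce {c['CLASS_ID']}")
            yield c["CLASS_ID"]


def students_in(students, ids):
    """Higher-level query: pulls CLASS_ID values from below only when needed."""
    for cid in ids:
        for s in students:
            if s["CLASS_ID"] == cid:
                yield s


pipeline = students_in(students, class_ids(classes, "DESIGN_01"))
log_before_request = list(log)   # still empty: building the pipeline does no work
first = next(pipeline)           # the first request drives the lower level
remaining = list(pipeline)
print(log_before_request, first["NAME"], [s["NAME"] for s in remaining])
```

Each operator keeps its own position between requests, which corresponds to the retained result value and state described above.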
Producer Driven or Eager Pipelining
In this method, the lower-level queries eagerly pass their results to the higher-level queries.
They do not wait for the higher-level queries to request the results.
The lower-level query creates a buffer to store its results, and the higher-level queries take
the results from that buffer.
If the buffer is full, the lower-level query waits for the higher-level query to empty it.
Hence producer-driven pipelining is also called PUSH pipelining, while demand-driven
pipelining is called PULL pipelining.
There are further methods of pipelining, such as linear and non-linear pipelining.
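A producer-driven pipeline can be sketched with a bounded buffer between two threads, assuming the standard library `queue.Queue` plays the role of the buffer. The sample rows, the `producer` and `consumer` functions, and the `DONE` sentinel are hypothetical names used only for this illustration.

```python
import queue
import threading

# Hypothetical sample data for the CLASS and STUDENT tables.
classes = [
    {"CLASS_ID": 1, "CLASS_NAME": "DESIGN_01"},
    {"CLASS_ID": 2, "CLASS_NAME": "TEST_01"},
]
students = [
    {"NAME": "Asha", "CLASS_ID": 1},
    {"NAME": "Bikram", "CLASS_ID": 2},
    {"NAME": "Chandra", "CLASS_ID": 1},
]

DONE = object()  # sentinel that marks the end of the producer's output


def producer(classes, class_name, out):
    """Lower-level query: pushes results without waiting to be asked."""
    for c in classes:
        if c["CLASS_NAME"] == class_name:
            out.put(c["CLASS_ID"])   # blocks while the buffer is full
    out.put(DONE)


def consumer(students, inp):
    """Higher-level query: takes results from the buffer as they arrive."""
    result = []
    while True:
        cid = inp.get()
        if cid is DONE:
            break
        result.extend(s for s in students if s["CLASS_ID"] == cid)
    return result


buffer = queue.Queue(maxsize=1)      # small buffer so the producer may have to wait
worker = threading.Thread(target=producer, args=(classes, "DESIGN_01", buffer))
worker.start()
matched = consumer(students, buffer)
worker.join()
print([s["NAME"] for s in matched])
```

With `maxsize=1` the call to `put` blocks whenever the consumer has not yet emptied the buffer, which mirrors the waiting behaviour described above.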
Equivalence of Expressions
The first step in selecting a query-processing strategy is to find a relational algebra expression
that is equivalent to the given query and is efficient to execute.
We'll use the following relations as examples:
Customer(cname, street, ccity)
Deposit(bname, account#, cname, balance)
Branch(bname, assets, bcity)
We will use instances customer, deposit and branch of these schemes.
Selection Operation
Consider the query to find the assets and branch names of all banks that have depositors living
in Port Chester. In relational algebra, this is
πbname, assets (σccity = "Port Chester" (customer ⋈ deposit ⋈ branch))
This expression constructs a huge relation,
customer ⋈ deposit ⋈ branch
of which we are only interested in a few tuples.
We are also only interested in two attributes of this relation.
We can see that we only want tuples for which ccity = "Port Chester".
Thus we can rewrite our query as:
πbname, assets ((σccity = "Port Chester" (customer)) ⋈ deposit ⋈ branch)
This should considerably reduce the size of the intermediate relation.
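The effect of applying the selection early can be checked on a toy instance. The sketch below assumes relations are lists of dictionaries and that the customer attribute in deposit is named cname; the sample rows and the `natural_join` helper are hypothetical.

```python
# Hypothetical instances of the customer, deposit and branch relations.
customer = [
    {"cname": "Hayes", "street": "Main", "ccity": "Harrison"},
    {"cname": "Jones", "street": "North", "ccity": "Port Chester"},
    {"cname": "Smith", "street": "Park", "ccity": "Rye"},
]
deposit = [
    {"bname": "Downtown", "account#": 101, "cname": "Hayes", "balance": 500},
    {"bname": "Perryridge", "account#": 102, "cname": "Jones", "balance": 700},
    {"bname": "Downtown", "account#": 103, "cname": "Smith", "balance": 900},
]
branch = [
    {"bname": "Downtown", "assets": 9000000, "bcity": "Brooklyn"},
    {"bname": "Perryridge", "assets": 1700000, "bcity": "Horseneck"},
]


def natural_join(r, s):
    """Join rows that agree on every attribute name the two relations share."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**a, **b} for a in r for b in s if all(a[k] == b[k] for k in common)]


def in_port_chester(row):
    return row["ccity"] == "Port Chester"


# Selection applied last: the full three-way join is built first.
late_intermediate = natural_join(natural_join(customer, deposit), branch)
late = [{"bname": t["bname"], "assets": t["assets"]}
        for t in late_intermediate if in_port_chester(t)]

# Selection applied first: only Port Chester customers enter the joins.
port_chester = [c for c in customer if in_port_chester(c)]
early_intermediate = natural_join(natural_join(port_chester, deposit), branch)
early = [{"bname": t["bname"], "assets": t["assets"]} for t in early_intermediate]

print(len(late_intermediate), len(early_intermediate), late == early)
```

Both forms return the same answer, but the intermediate relation shrinks from three tuples to one on this toy data; on realistic table sizes the difference would be far larger.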


Project operation
Like selection, projection reduces the size of relations. It is advantageous to apply projections
early.
Consider this form of our example query:
πbname, assets (((σccity = "Port Chester" (customer)) ⋈ deposit) ⋈ branch)
When we compute the subexpression
((σccity = "Port Chester" (customer)) ⋈ deposit)
we obtain a relation whose scheme is
(cname, street, ccity, bname, account#, balance)
We can eliminate several attributes from this scheme. The only ones we need to retain are
those that
appear in the result of the query or
are needed to process subsequent operations.

By eliminating unneeded attributes, we reduce the number of columns of the intermediate
result, and thus its size.
In our example, the only attribute we need is bname (to join with branch). So we can rewrite
our expression as:
πbname, assets ((πbname ((σccity = "Port Chester" (customer)) ⋈ deposit)) ⋈ branch)
Natural Join Operation
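Another way to reduce the size of temporary results is to choose an optimal ordering of the
join operations.
Natural join is associative:
(r1 ⋈ r2) ⋈ r3 = r1 ⋈ (r2 ⋈ r3)
Although these expressions are equivalent, the costs of computing them may differ. Look again
at our expression:
πbname, assets ((σccity = "Port Chester" (customer)) ⋈ deposit ⋈ branch)
We could compute deposit ⋈ branch first and then join the result with the other part. However,
deposit ⋈ branch is likely to be a large relation, since it contains one tuple for every account.
The other part,
σccity = "Port Chester" (customer)
is probably a small relation (comparatively).
So, if we compute
(σccity = "Port Chester" (customer)) ⋈ deposit
first, we get a reasonably small relation.
It has one tuple for each account held by a resident of Port Chester.
This temporary relation is much smaller than deposit ⋈ branch.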
Natural join is commutative:
r1 ⋈ r2 = r2 ⋈ r1
Thus we could rewrite our relational algebra expression as:


Other operations
Some other equivalences for union and set difference:
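Standard equivalences of this kind, given here as well-known relational algebra identities rather than as the exact list from the original slide, include the following, where P is a selection predicate and r1, r2, r3 are relations with compatible schemes:

```latex
\begin{align*}
\sigma_P(r_1 \cup r_2) &= \sigma_P(r_1) \cup \sigma_P(r_2) \\
\sigma_P(r_1 - r_2)    &= \sigma_P(r_1) - r_2 = \sigma_P(r_1) - \sigma_P(r_2) \\
(r_1 \cup r_2) \cup r_3 &= r_1 \cup (r_2 \cup r_3) \\
r_1 \cup r_2           &= r_2 \cup r_1
\end{align*}
```

The first two let a selection be pushed below a union or a set difference, in the same spirit as pushing it below a join.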
Query Cost Estimation
The cost of a query is the time it takes for the query to reach the database and return its result.
It covers the whole query-processing time, i.e., the time taken to parse and translate the query,
optimize it, evaluate it, and return the result to the user.
Though it is usually a fraction of a second, it includes multiple sub-tasks and the time taken by
each of them.
Executing the optimized query involves accessing primary and secondary memory, depending on
the file organization method.
Depending on the file organization and the indexes used, the time taken to retrieve the data may vary.
The cost of a query evaluation plan is estimated in terms of various resources, which
include:
Number of disk accesses
Execution time taken by the CPU to execute a query
Communication costs in distributed or parallel database systems.

Disk access time is the time taken to search for and find the record in secondary memory and
return the result.
This takes up most of the time in processing a query; the other costs are small enough in
comparison to be ignored.
While calculating the disk I/O time, usually only two factors are considered:
seek time and
transfer time.

The seek time is the time taken to locate a single record on disk, and is
represented by tS.
For example, to find the student ID of a student 'John', the system looks it up on disk
based on the index and the file organization method.
The time taken to reach the disk block and search for his ID is called the seek time.

The time taken by the disk to transfer one block of fetched data back to the processor / user is
called the transfer time and is represented by tT.
Suppose a query needs S seeks to fetch its records and B blocks need to be transferred to the
user.
Then the disk I/O cost is calculated as: (S * tS) + (B * tT)
For example, take tS = 4 ms and tT = 0.1 ms, which corresponds to a block size of 4 KB and a
transfer rate of 40 MB per second. With these values, we can easily calculate the estimated cost
of a given query evaluation plan.
Example:
Given:
tS = 4 ms (seek time)
tT = 0.1 ms (transfer time)
Block size = 4 KB
Transfer rate = 40 MB per second
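The formula can be turned into a small calculation. The sketch below uses the given values of tS and tT; the counts S = 1 seek and B = 100 blocks are assumptions chosen only for illustration, and the function name `disk_io_cost_ms` is hypothetical.

```python
def disk_io_cost_ms(seeks, blocks, t_seek_ms=4.0, t_transfer_ms=0.1):
    """Disk I/O cost in milliseconds: (S * tS) + (B * tT)."""
    return seeks * t_seek_ms + blocks * t_transfer_ms


# Check that tT follows from the block size and the transfer rate:
# 4 KB at 40 MB/s, taking 1 MB = 1000 KB, gives 0.1 ms per block.
block_kb = 4
rate_kb_per_s = 40 * 1000
t_transfer_ms = block_kb / rate_kb_per_s * 1000

# Assumed workload: 1 seek followed by 100 sequential block transfers.
cost = disk_io_cost_ms(seeks=1, blocks=100)
print(f"tT = {t_transfer_ms:.3f} ms, cost = {cost:.1f} ms")
```

Under these assumed counts the estimate is about 14 ms: 4 ms for the single seek plus 10 ms for the 100 block transfers.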
Query Optimization
The query optimizer (also known as the optimizer) is database software that identifies the most
efficient way (for example, the one taking the least time) for a SQL statement to access data.
The process of selecting an efficient execution plan for processing a query is known as query
optimization.
Query optimization is used to access and modify the database in the most efficient way
possible.
It is the art of obtaining necessary information in a predictable, reliable, and timely manner.
Query optimization is formally described as the process of transforming a query into an
equivalent form that may be evaluated more efficiently.
The goal of query optimization is to find an execution plan that reduces the time required to
process a query.
We must complete two major tasks to attain this optimization target.
The first is to determine the optimal plan to access the database, and
the second is to reduce the time required to execute the query plan.
Methods of query optimization
Cost based Optimization (Physical)
This is based on the cost of the query.
The query can use different paths based on indexes, constraints, sorting methods etc.
This method mainly uses statistics such as record size, number of records, number of records
per block, number of blocks, table size, whether the whole table fits in a block, organization of
tables, uniqueness of column values, size of columns, etc.
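A cost-based choice between access paths can be sketched by estimating the disk I/O cost of each candidate plan and keeping the cheapest. Everything specific below is an assumption for illustration: the two plan names, their seek and block counts, and the timings tS = 4 ms and tT = 0.1 ms. A real optimizer would derive the counts from the statistics listed above.

```python
T_SEEK_MS = 4.0       # assumed seek time tS
T_TRANSFER_MS = 0.1   # assumed per-block transfer time tT


def plan_cost_ms(seeks, blocks):
    """Estimated disk I/O cost of a plan: (S * tS) + (B * tT)."""
    return seeks * T_SEEK_MS + blocks * T_TRANSFER_MS


# Hypothetical candidate plans as (seeks, blocks transferred).
candidate_plans = {
    "full table scan": (1, 1000),   # one seek, then read every block
    "index lookup": (11, 11),       # a seek and a block read per access
}

costs = {name: plan_cost_ms(s, b) for name, (s, b) in candidate_plans.items()}
best = min(costs, key=costs.get)
print(best, costs)
```

With these assumed numbers the index lookup wins; with a less selective predicate the block and seek counts for the index path would grow and the full scan could become the cheaper plan.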
Heuristic Optimization (Logical)
This method is also known as rule-based optimization.
It is based on the equivalence rules for relational expressions; hence the number of
query combinations to consider is reduced.
Hence the cost of the query is reduced too.
This method creates a relational tree for the given query based on the equivalence rules.
By providing alternative ways of writing and evaluating the query, these equivalence rules
give a better path for evaluating it.
A rule need not give a better result in every case, so the result needs to be examined after
the rules are applied.
The most important rules followed in this method are listed below:
Perform all selection operations as early as possible in the query.
This should be the first and foremost set of actions on the tables in the query.
By performing the selections first, we reduce the number of records involved in the query,
rather than carrying the whole tables throughout the query.
Suppose we have a query to retrieve the students who are 18 years old and studying in class DESIGN_01.
We can get the student details from the STUDENT table and the class details from the CLASS table.

Reference: https://www.recw.ac.in/v1.8/wp-content/uploads/2021/03/DBMS-Unit-4.pdf
