
Unit-6

Query Processing and Optimization
Compiled by: Ghanashyam BK
Introduction to Query Processing
Query processing is the set of activities involved in extracting data from the database.
The query processor turns user queries and data modification commands into a query plan: a sequence of operations (an algorithm) on the database that takes the query from high-level statements down to low-level commands.
Decisions taken by the query processor:
Which of the algebraically equivalent forms of a query will lead to the most efficient algorithm?
For each algebraic operator, which algorithm should be used to run it?
How should the operators pass data from one to the other (e.g., main-memory buffers, disk buffers)?
Basic Steps in Query Processing
Query processing goes through several steps to fetch data from the database.
The steps involved are:
Parsing and translation
Optimization
Evaluation
Parsing and Translation
Initially, the user's query is written in a high-level database language such as SQL.
It is translated into expressions that can be used further at the physical level of the file system.
After this, a variety of query-optimizing transformations take place, followed by the actual evaluation of the query.
Thus, before processing a query, the system needs to translate it from a human-readable language into an internal form it can work with.
SQL (Structured Query Language) is a suitable choice for humans, but it is not well suited to the system's internal representation of a query.
Relational algebra is well suited to the internal representation of a query.
Parsing and Translation
The translation step in query processing is similar to the work done by the parser of a compiler.
When a user executes a query, the parser generates its internal form: it checks the syntax of the query and verifies that the relations and attributes named in the query exist in the database.
The parser creates a tree representation of the query, known as a 'parse tree', and then translates it into relational algebra.
In doing so, it also replaces every use of a view in the query with the expression that defines the view.
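The parse-and-translate step can be sketched in Python for a tiny subset of SQL. This is a hypothetical toy translator (the regular expression, `translate`, and the tuple-based tree shape are all assumptions, not how any real DBMS parser works): it performs a syntax check, checks the relation name against a catalog, and emits a relational algebra tree with π at the root, σ below it, and the base relation at the leaf. It does not handle joins or view replacement.

```python
import re

# Hypothetical toy translator. It accepts only the shape
#   SELECT col[, col ...] FROM table [WHERE condition][;]
# and returns a relational-algebra tree built from nested tuples.
QUERY = re.compile(
    r"^\s*select\s+(?P<cols>.+?)\s+from\s+(?P<table>\w+)"
    r"(?:\s+where\s+(?P<cond>.+?))?\s*;?\s*$",
    re.IGNORECASE,
)


def translate(sql: str, catalog: set) -> tuple:
    match = QUERY.match(sql)
    if match is None:  # syntax check
        raise SyntaxError(f"unsupported query: {sql!r}")
    table = match.group("table")
    if table not in catalog:  # check that the relation exists
        raise NameError(f"unknown relation: {table}")
    tree = ("relation", table)  # leaf: base relation
    if match.group("cond"):
        tree = ("select", match.group("cond"), tree)  # sigma
    columns = [c.strip() for c in match.group("cols").split(",")]
    return ("project", columns, tree)  # pi at the root


tree = translate("select emp_name from Employee where salary>10000;", {"Employee"})
print(tree)  # ('project', ['emp_name'], ('select', 'salary>10000', ('relation', 'Employee')))
```

A real parser builds a full grammar-based parse tree; the regular expression here only stands in for that step.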
Notation for Query Trees (Parse Tree)
Query Tree:
A standard technique for estimating the work involved in executing a query, the generation of intermediate results, and the optimization of execution.
Internal nodes stand for operations such as selection, projection, join, renaming, and so on.
Leaf nodes represent base relations.
A tree gives a good visual feel for the complexity of the query and the operations involved.
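A query tree is easy to represent as a small recursive data structure. The sketch below is a minimal illustration; the `Node` class, its `op`/`arg` fields, and the `render` and `base_relations` helpers are hypothetical names chosen for this example. Operator nodes have children, and leaves hold the names of base relations.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a query tree: an operator, or a base relation at a leaf."""
    op: str                 # "project", "select", "join", ... or "relation"
    arg: str = ""           # operator argument, or the relation name at a leaf
    children: list = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

    def render(self, depth: int = 0) -> str:
        """Indented text picture of the tree, root first."""
        label = self.arg if self.is_leaf() else f"{self.op}[{self.arg}]"
        lines = ["  " * depth + label]
        lines += [child.render(depth + 1) for child in self.children]
        return "\n".join(lines)


def base_relations(node: Node) -> list:
    """Collect the leaf (base relation) names from left to right."""
    if node.is_leaf():
        return [node.arg]
    return [name for child in node.children for name in base_relations(child)]


tree = Node("project", "emp_name",
            [Node("select", "salary>10000", [Node("relation", "Employee")])])
print(tree.render())

join_tree = Node("join", "cname",
                 [Node("select", "ccity='Port Chester'", [Node("relation", "customer")]),
                  Node("relation", "deposit")])
print(base_relations(join_tree))  # ['customer', 'deposit']
```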
Parsing and Translation
Suppose a user wants to fetch the names of the employees whose salary is greater than 10000.
For this, the following query is used:
select emp_name from Employee where salary>10000;
To make the system understand the user query, it needs to be translated into relational algebra.
This query can be written in relational algebra in either of the following equivalent forms:
$\pi_{emp\_name}(\sigma_{salary>10000}(Employee))$
$\pi_{emp\_name}(\sigma_{salary>10000}(\pi_{emp\_name,\,salary}(Employee)))$
After translating the query, each relational algebra operation can be executed using one of several different algorithms. In this way, query processing begins its work.
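To see that the order of σ and π can change while the answer stays the same, here is a minimal Python sketch over a hypothetical in-memory Employee relation (the rows and the `select`/`project` helpers are assumptions for illustration). One plan selects first and then projects; the other projects onto emp_name and salary first, then selects, then projects.

```python
# Hypothetical sample data for the Employee relation.
EMPLOYEE = [
    {"emp_name": "Asha", "salary": 12000},
    {"emp_name": "Bikash", "salary": 9000},
    {"emp_name": "Chandra", "salary": 10000},
    {"emp_name": "Dipa", "salary": 15000},
]


def select(predicate, relation):
    """sigma: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def project(attrs, relation):
    """pi: keep only the listed attributes and drop duplicate tuples."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attrs}
        if row not in result:
            result.append(row)
    return result


def high_salary(t):
    return t["salary"] > 10000


# Plan 1: select first, then project.
plan1 = project(["emp_name"], select(high_salary, EMPLOYEE))

# Plan 2: project onto the needed attributes, select, then project again.
plan2 = project(["emp_name"],
                select(high_salary, project(["emp_name", "salary"], EMPLOYEE)))

print(plan1 == plan2, plan1)
```

Both plans return Asha and Dipa; Chandra is excluded because the condition is strictly greater than 10000.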
Optimization
The cost of evaluating a query can vary greatly, depending both on the query and on the way it is evaluated.
Since the system is responsible for constructing the evaluation plan, the user does not need to write the query in its most efficient form.
Usually, a database system generates an efficient query evaluation plan, one that minimizes its cost.
This task, performed by the database system, is known as query optimization.
To optimize a query, the query optimizer needs an estimated cost of each operation,
because the overall cost depends on the memory allocated to the various operations, their execution costs, and so on.
Evaluation
For evaluation, in addition to the relational algebra translation, the translated relational algebra expression must be annotated with instructions specifying how each operation is to be evaluated.
Thus, after translating the user query, the system executes a query evaluation plan.
Query Evaluation Plan:
To fully evaluate a query, the system needs to construct a query evaluation plan.
The annotations in the evaluation plan may specify the algorithm to be used for a particular operation, or the particular index to use.
A relational algebra operation annotated with such instructions is referred to as an evaluation primitive. The evaluation primitives carry the instructions needed to evaluate the operations.
Thus, a query evaluation plan defines a sequence of primitive operations used to evaluate a query.
The query evaluation plan is also referred to as the query execution plan.
The query execution engine is responsible for generating the output of the given query: it takes the query execution plan, executes it, and produces the output of the user query.
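The idea of evaluation primitives and an execution engine can be sketched as follows. This is a minimal, hypothetical illustration: `PRIMITIVES` maps an (operation, annotated algorithm) pair to a function, a plan is a list of annotated steps, and `execute` plays the role of the query execution engine by running the steps in order. A real plan could also name an index to use; only a linear scan and a simple projection are modelled here.

```python
# Hypothetical evaluation primitives: (operation, algorithm annotation) -> function.
PRIMITIVES = {
    ("select", "linear scan"):
        lambda rows, predicate: [r for r in rows if predicate(r)],
    ("project", "no duplicate elimination"):
        lambda rows, attrs: [{a: r[a] for a in attrs} for r in rows],
}


def execute(plan, rows):
    """Toy query execution engine: run each annotated primitive in order,
    feeding the output of one step into the next."""
    for operation, algorithm, argument in plan:
        rows = PRIMITIVES[(operation, algorithm)](rows, argument)
    return rows


employee = [
    {"emp_name": "Dipa", "salary": 15000},
    {"emp_name": "Bikash", "salary": 9000},
    {"emp_name": "Asha", "salary": 12000},
]

# Evaluation plan for: select emp_name from Employee where salary>10000;
plan = [
    ("select", "linear scan", lambda r: r["salary"] > 10000),
    ("project", "no duplicate elimination", ["emp_name"]),
]
result = execute(plan, employee)
print(result)  # [{'emp_name': 'Dipa'}, {'emp_name': 'Asha'}]
```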
Query evaluation
Finally, after selecting an evaluation plan, the system evaluates the query and produces its output.
There are two methods of evaluating a query:
Materialization: the query is broken into individual sub-queries; the result of each is stored in a temporary table, and these results are used to obtain the final result.
Pipelining: the DBMS does not store intermediate records in temporary tables. Instead, it evaluates each sub-query and passes its result directly to the next one for processing, and so on.
There are two types of pipelining:
Demand Driven or Lazy evaluation
Producer Driven or Eager Pipelining
Materialization
To be more specific, suppose we need to find the students who are studying in class 'DESIGN_01':
SELECT * FROM STUDENT s, CLASS c
WHERE s.CLASS_ID = c.CLASS_ID AND c.CLASS_NAME = 'DESIGN_01';

Here we can observe two queries:

one selects the CLASS_ID of 'DESIGN_01', and
the other selects the details of the students with the CLASS_ID retrieved by the first query.

The DBMS does the same: it breaks the query into the two parts above.
It then evaluates the first query and stores its result in a temporary table in memory.
This temporary table is then used to evaluate the second query.
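Materialization can be imitated by hand with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the table contents are hypothetical, and the temporary table `TMP_CLASS` is created explicitly to mimic what the text describes; SQLite's own planner decides for itself how to run the original single-statement query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CLASS (CLASS_ID INTEGER PRIMARY KEY, CLASS_NAME TEXT);
    CREATE TABLE STUDENT (STUDENT_ID INTEGER PRIMARY KEY, NAME TEXT, CLASS_ID INTEGER);
    INSERT INTO CLASS VALUES (1, 'DESIGN_01'), (2, 'MATHS_01');
    INSERT INTO STUDENT VALUES (10, 'Ram', 1), (11, 'Sita', 2), (12, 'Hari', 1);
""")

# Level 1: evaluate the first query and materialize its result.
conn.execute(
    "CREATE TEMP TABLE TMP_CLASS AS "
    "SELECT CLASS_ID FROM CLASS WHERE CLASS_NAME = 'DESIGN_01'"
)

# Level 2: read the temporary table back to evaluate the second query.
rows = conn.execute(
    "SELECT s.STUDENT_ID, s.NAME FROM STUDENT s "
    "JOIN TMP_CLASS t ON s.CLASS_ID = t.CLASS_ID "
    "ORDER BY s.STUDENT_ID"
).fetchall()
print(rows)  # [(10, 'Ram'), (12, 'Hari')]
```

The extra cost the text describes is visible here: the intermediate result is written once into `TMP_CLASS` and read again by the second query.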
Materialization
This is an example of two-level queries in the materialization method.
There can be any number of levels, and as many temporary tables.
Although this method looks simple, its evaluation cost is higher:
it takes time to evaluate each sub-query and write its result into a temporary table, then to read that table back and query it to get the next level of result, and so on.
Hence the cost of evaluation in this method is:
Cost = cost of the individual SELECTs + cost of writing into the temporary tables
Pipelining
Pipelining processes the sub-queries one after the other, each using the result of the previous one.
In the example above, the CLASS_ID of DESIGN_01 is passed directly to the STUDENT table to get the student details.
This method has no extra cost of writing into temporary tables.
It has only the cost of evaluating the individual sub-queries, so it generally performs better than materialization.
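Pipelining maps naturally onto Python generators. In this minimal sketch (hypothetical in-memory CLASS and STUDENT data), `class_ids` is the lower-level operator and `students_in` the higher-level one; each CLASS_ID flows straight into the student lookup and nothing is written to a temporary table. Note that Python generators are pulled by their consumer, so this particular sketch behaves like demand-driven pipelining.

```python
# Hypothetical in-memory versions of the CLASS and STUDENT tables.
CLASS = [
    {"CLASS_ID": 1, "CLASS_NAME": "DESIGN_01"},
    {"CLASS_ID": 2, "CLASS_NAME": "MATHS_01"},
]
STUDENT = [
    {"NAME": "Ram", "CLASS_ID": 1},
    {"NAME": "Sita", "CLASS_ID": 2},
    {"NAME": "Hari", "CLASS_ID": 1},
]


def class_ids(name):
    """Lower-level operator: yields matching CLASS_IDs one at a time."""
    for c in CLASS:
        if c["CLASS_NAME"] == name:
            yield c["CLASS_ID"]


def students_in(ids):
    """Higher-level operator: consumes CLASS_IDs as they arrive and
    yields student names, without storing any intermediate table."""
    for class_id in ids:
        for s in STUDENT:
            if s["CLASS_ID"] == class_id:
                yield s["NAME"]


result = list(students_in(class_ids("DESIGN_01")))
print(result)  # ['Ram', 'Hari']
```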
There are two types of pipelining:
Demand Driven or Lazy evaluation
Producer Driven or Eager Pipelining
Demand Driven or Lazy Evaluation
In this method, the results of lower-level queries are not passed to the higher level automatically.
They are passed to the higher level only when the higher level requests them.
The lower level retains its result values and state, and transfers the next result to the next level only on request.
In our example above, the CLASS_ID for DESIGN_01 is retrieved, but it is passed to the STUDENT query only when that query requests it.
Once the request arrives, the CLASS_ID is passed to the STUDENT query, and that query is processed.
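Demand-driven evaluation is commonly implemented with operators that expose `open`, `next`, and `close`. The classes below are a hypothetical minimal sketch of that style: `Scan` keeps its position as state between requests and produces a row only when `next()` is called, and `Filter` pulls from it on demand. The log shows that rows after the first match are never produced.

```python
class Scan:
    """Lower-level operator over a list of (class_name, class_id) rows."""

    def __init__(self, rows, log):
        self.rows, self.log, self.pos = rows, log, 0

    def open(self):
        self.pos = 0  # state retained between requests

    def next(self):
        """Produce one row, but only when asked; None means exhausted."""
        if self.pos >= len(self.rows):
            return None
        row = self.rows[self.pos]
        self.pos += 1
        self.log.append(f"scan produced {row[0]}")
        return row

    def close(self):
        self.pos = len(self.rows)


class Filter:
    """Higher-level operator: pulls rows from its child on demand."""

    def __init__(self, child, predicate, log):
        self.child, self.predicate, self.log = child, predicate, log

    def open(self):
        self.child.open()

    def next(self):
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                self.log.append(f"filter returned {row[0]}")
                return row
        return None

    def close(self):
        self.child.close()


log = []
rows = [("MATHS_01", 2), ("DESIGN_01", 1), ("ART_01", 3)]
plan = Filter(Scan(rows, log), lambda r: r[0] == "DESIGN_01", log)
plan.open()
first = plan.next()  # request one result
plan.close()
print(first, log)
```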
Producer Driven or Eager Pipelining
In this method, the lower-level queries eagerly pass their results to the higher-level queries.
They do not wait for the higher-level queries to request the results.
The lower-level query places its results in a buffer, and the higher-level query takes the results from the buffer as it needs them.
If the buffer is full, the lower-level query waits until the higher-level query makes room in it.
Because results are pushed up from below, producer-driven pipelining is also called push pipelining, while demand-driven pipelining is called pull pipelining.
Pipelining can be further classified in other ways as well, such as linear and non-linear pipelining.
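Producer-driven pipelining can be sketched with a bounded buffer between two threads, using Python's standard `queue.Queue(maxsize=...)`. This is a minimal illustration with hypothetical integer rows: the producer pushes results without being asked, `put()` blocks while the buffer is full, and the consumer takes results as they become available. The `row * 10` step is only a stand-in for a real higher-level operation.

```python
import queue
import threading

SENTINEL = None  # marks the end of the producer's output


def producer(rows, buffer):
    """Lower-level operator: pushes results without waiting to be asked.
    put() blocks while the buffer is full, until the consumer makes room."""
    for row in rows:
        buffer.put(row)
    buffer.put(SENTINEL)


def consumer(buffer, out):
    """Higher-level operator: takes results from the buffer as they arrive."""
    while (row := buffer.get()) is not SENTINEL:
        out.append(row * 10)  # stand-in for the higher-level operation


buffer = queue.Queue(maxsize=2)  # small bounded buffer between the operators
results = []
t = threading.Thread(target=producer, args=(range(5), buffer))
t.start()
consumer(buffer, results)
t.join()
print(results)  # [0, 10, 20, 30, 40]
```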
Equivalence of Expressions
The first step in selecting a query-processing strategy is to find a relational algebra expression that is equivalent to the given query and is efficient to execute.
We'll use the following relations as examples:
Customer(cname, street, ccity)
Deposit(bname, account#, cname, balance)
Branch(bname, assets, bcity)
We will use instances customer, deposit and branch of these schemes.
Selection Operation
Consider the query to find the assets and branch names of all banks that have depositors living in Port Chester. In relational algebra, this is:
$\pi_{bname,\,assets}(\sigma_{ccity=\text{"Port Chester"}}(customer \bowtie deposit \bowtie branch))$
This expression constructs a huge relation,
$customer \bowtie deposit \bowtie branch$
of which we are only interested in a few tuples.

We are also only interested in two attributes of this relation.
We can see that we only want tuples for which ccity = "Port Chester".
Thus we can rewrite our query as:
$\pi_{bname,\,assets}(\sigma_{ccity=\text{"Port Chester"}}(customer) \bowtie deposit \bowtie branch)$
This should considerably reduce the size of the intermediate relation.
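A small experiment makes the size reduction concrete. This sketch uses hypothetical instances of customer, deposit and branch, assumes deposit's customer attribute is named cname so the natural join with customer works, and implements the natural join as a simple nested loop over shared attribute names. Both forms give the same answer, but selecting on customer first shrinks the intermediate join from 4 tuples to 1.

```python
# Hypothetical toy instances of the example relations.
customer = [
    {"cname": "Ana", "street": "Main", "ccity": "Port Chester"},
    {"cname": "Ben", "street": "Elm", "ccity": "Rye"},
    {"cname": "Cal", "street": "Oak", "ccity": "Harrison"},
]
deposit = [
    {"bname": "Downtown", "account#": 101, "cname": "Ana", "balance": 500},
    {"bname": "Uptown", "account#": 102, "cname": "Ben", "balance": 900},
    {"bname": "Uptown", "account#": 103, "cname": "Cal", "balance": 300},
    {"bname": "Downtown", "account#": 104, "cname": "Cal", "balance": 700},
]
branch = [
    {"bname": "Downtown", "assets": 9_000_000, "bcity": "Brooklyn"},
    {"bname": "Uptown", "assets": 4_000_000, "bcity": "Rye"},
]


def natural_join(r, s):
    """Nested-loop natural join on all attributes the two relations share."""
    result = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):
                result.append({**t, **u})
    return result


def select(predicate, r):
    return [t for t in r if predicate(t)]


def project(attrs, r):
    out = []
    for t in r:
        row = {a: t[a] for a in attrs}
        if row not in out:
            out.append(row)
    return out


def in_port_chester(t):
    return t["ccity"] == "Port Chester"


# Original form: build the whole join, then select.
big = natural_join(customer, deposit)
answer1 = project(["bname", "assets"], select(in_port_chester, natural_join(big, branch)))

# Rewritten form: select on customer before joining.
small = natural_join(select(in_port_chester, customer), deposit)
answer2 = project(["bname", "assets"], natural_join(small, branch))

print(len(big), len(small), answer1 == answer2)  # 4 1 True
```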

Project operation
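Like selection, projection reduces the size of relations, so it is advantageous to apply projections early.
Consider this form of our example query:
$\pi_{bname,\,assets}((\sigma_{ccity=\text{"Port Chester"}}(customer) \bowtie deposit) \bowtie branch)$
When we compute the subexpression
$\sigma_{ccity=\text{"Port Chester"}}(customer) \bowtie deposit$
we obtain a relation whose scheme is
(cname, street, ccity, bname, account#, balance)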
Project operation
We can eliminate several attributes from this scheme. The only ones we need to retain are those that
appear in the result of the query, or
are needed to process subsequent operations.
By eliminating unneeded attributes, we reduce the number of columns of the intermediate result, and thus its size.
In our example, the only attribute we need is bname (to join with branch). So we can rewrite our expression as:
$\pi_{bname,\,assets}(\pi_{bname}(\sigma_{ccity=\text{"Port Chester"}}(customer) \bowtie deposit) \bowtie branch)$
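The effect of projecting early can be checked on hypothetical data. In this sketch (same toy natural-join helper as a nested loop, deposit's customer attribute assumed to be cname), the intermediate result of joining the Port Chester customers with deposit has six columns; projecting it onto bname leaves one column, and duplicate elimination also collapses Ana's two Downtown accounts into a single tuple. The final answer is unchanged.

```python
# Hypothetical toy instances of the example relations.
customer = [
    {"cname": "Ana", "street": "Main", "ccity": "Port Chester"},
    {"cname": "Ben", "street": "Elm", "ccity": "Rye"},
]
deposit = [
    {"bname": "Downtown", "account#": 101, "cname": "Ana", "balance": 500},
    {"bname": "Downtown", "account#": 102, "cname": "Ana", "balance": 800},
    {"bname": "Uptown", "account#": 103, "cname": "Ben", "balance": 900},
]
branch = [
    {"bname": "Downtown", "assets": 9_000_000, "bcity": "Brooklyn"},
    {"bname": "Uptown", "assets": 4_000_000, "bcity": "Rye"},
]


def natural_join(r, s):
    """Nested-loop natural join on all attributes the two relations share."""
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in t.keys() & u.keys())]


def project(attrs, r):
    """pi with duplicate elimination."""
    out = []
    for t in r:
        row = {a: t[a] for a in attrs}
        if row not in out:
            out.append(row)
    return out


pc = [t for t in customer if t["ccity"] == "Port Chester"]
wide = natural_join(pc, deposit)   # 6 columns per tuple
narrow = project(["bname"], wide)  # only what the join with branch needs

answer_wide = project(["bname", "assets"], natural_join(wide, branch))
answer_narrow = project(["bname", "assets"], natural_join(narrow, branch))
print(len(wide[0]), len(wide), len(narrow[0]), len(narrow))  # 6 2 1 1
```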
Natural Join Operation
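Another way to reduce the size of temporary results is to choose an optimal ordering of the join operations.
Natural join is associative:
$(r_1 \bowtie r_2) \bowtie r_3 = r_1 \bowtie (r_2 \bowtie r_3)$
Although these expressions are equivalent, the costs of computing them may differ. Look again at our expression
$\pi_{bname,\,assets}(\sigma_{ccity=\text{"Port Chester"}}(customer) \bowtie deposit \bowtie branch)$
Natural Join Operation
We could compute $deposit \bowtie branch$ first and then join the result with the first part. However, $deposit \bowtie branch$ is likely to be a large relation, since it contains one tuple for every account.
The other part,
$\sigma_{ccity=\text{"Port Chester"}}(customer)$
is probably a small relation (comparatively).

So, if we compute
$\sigma_{ccity=\text{"Port Chester"}}(customer) \bowtie deposit$
first, we get a reasonably small relation.

It has one tuple for each account held by a resident of Port Chester.
This temporary relation is much smaller than $deposit \bowtie branch$.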
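The two join orders can be compared on hypothetical data. In this sketch (toy nested-loop natural join; deposit's customer attribute assumed to be cname), joining deposit with branch first produces one tuple per account, while joining the selected Port Chester customers with deposit first produces only the accounts of those customers. Both orders give the same final answer.

```python
# Hypothetical toy instances of the example relations.
customer = [
    {"cname": "Ana", "street": "Main", "ccity": "Port Chester"},
    {"cname": "Ben", "street": "Elm", "ccity": "Rye"},
    {"cname": "Cal", "street": "Oak", "ccity": "Harrison"},
]
deposit = [
    {"bname": "Downtown", "account#": 101, "cname": "Ana", "balance": 500},
    {"bname": "Uptown", "account#": 102, "cname": "Ben", "balance": 900},
    {"bname": "Uptown", "account#": 103, "cname": "Ben", "balance": 300},
    {"bname": "Downtown", "account#": 104, "cname": "Ben", "balance": 700},
    {"bname": "Uptown", "account#": 105, "cname": "Cal", "balance": 200},
    {"bname": "Downtown", "account#": 106, "cname": "Cal", "balance": 100},
]
branch = [
    {"bname": "Downtown", "assets": 9_000_000, "bcity": "Brooklyn"},
    {"bname": "Uptown", "assets": 4_000_000, "bcity": "Rye"},
]


def natural_join(r, s):
    """Nested-loop natural join on all attributes the two relations share."""
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in t.keys() & u.keys())]


def project(attrs, r):
    out = []
    for t in r:
        row = {a: t[a] for a in attrs}
        if row not in out:
            out.append(row)
    return out


pc_customers = [t for t in customer if t["ccity"] == "Port Chester"]

# Order 1: deposit ⋈ branch first (one tuple per account).
step1a = natural_join(deposit, branch)
order1 = natural_join(pc_customers, step1a)

# Order 2: selected customers ⋈ deposit first (only Port Chester accounts).
step1b = natural_join(pc_customers, deposit)
order2 = natural_join(step1b, branch)

print(len(step1a), len(step1b))  # 6 1
```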
Natural Join Operation
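Natural join is commutative:
$r_1 \bowtie r_2 = r_2 \bowtie r_1$
Thus we could rewrite our relational algebra expression as:
$\pi_{bname,\,assets}((\sigma_{ccity=\text{"Port Chester"}}(customer) \bowtie branch) \bowtie deposit)$
However, customer and branch have no attributes in common, so this first join is a Cartesian product and produces many tuples. The associativity and commutativity of natural join let the optimizer transform such an expression back into the cheaper ordering (join with deposit first, then with branch).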
Other operations
Some other equivalences for union and set difference:
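For reference, here are a few standard relational algebra equivalences involving union and set difference. They are well-known identities, not necessarily the exact list from the original slide; $P$ is a selection predicate, $L$ a list of attributes, and $r_1, r_2, r_3$ are union-compatible relations:

```latex
\begin{align*}
\sigma_P(r_1 \cup r_2) &= \sigma_P(r_1) \cup \sigma_P(r_2) \\
\sigma_P(r_1 - r_2) &= \sigma_P(r_1) - \sigma_P(r_2) = \sigma_P(r_1) - r_2 \\
\pi_L(r_1 \cup r_2) &= \pi_L(r_1) \cup \pi_L(r_2) \\
r_1 \cup r_2 &= r_2 \cup r_1 \\
(r_1 \cup r_2) \cup r_3 &= r_1 \cup (r_2 \cup r_3)
\end{align*}
```

Projection, unlike selection, does not in general distribute over set difference.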
Query Cost Estimation
The cost of a query is the time the query takes to access the database and return the result.
It includes the query processing time, i.e., the time taken to parse and translate the query, optimize it, evaluate and execute it, and return the result to the user.
Although this usually takes only a fraction of a second, it is made up of multiple subtasks, each taking its own time.
Executing the optimized query involves accessing primary and secondary memory, depending on the file organization method.
Depending on the file organization and the indexes used, the time taken to retrieve the data may vary.
Query Cost Estimation
The cost of a query evaluation plan is estimated in terms of various resources, including:
Number of disk accesses
CPU time taken to execute the query
Communication costs in distributed or parallel database systems

Disk access time is the time taken to locate the required records in secondary memory and return the result.
It usually accounts for the majority of the time spent processing a query, so simple cost models ignore the other costs in comparison with disk I/O time.
Query Cost Estimation
While calculating the disk I/O time, usually only two factors are considered:
seek time and
transfer time.

The seek time, represented by tS, is the time needed to locate a block on disk before it can be read.
For example, to find the student ID of a student 'John', the system looks for the disk block that holds his record, based on the index and the file organization method.
The time taken to reach that disk block and search it for his ID is called the seek time.

The time taken to transfer one block of data from the disk back to the processor / user is called the transfer time and is represented by tT.
Query Cost Estimation
Suppose a query needs S seeks to fetch its records and B blocks need to be transferred to the user.
Then the disk I/O cost is calculated as: (S × tS) + (B × tT)
For example, suppose tS = 4 ms and the block size is 4 KB with a transfer rate of 40 MB per second, so that tT = 4 KB ÷ 40 MB/s = 0.1 ms. With these values, we can estimate the cost of a given query evaluation plan.
Example:
Given:
tS = 4 ms (seek time)
tT = 0.1 ms (transfer time)
Block size = 4 KB
Transfer rate = 40 MB per second
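The cost formula (S × tS) + (B × tT) is straightforward to compute. This sketch uses tS = 4 ms, the 4 KB block size, and the 40 MB/s transfer rate given above; the plan with S = 2 seeks and B = 100 blocks is a hypothetical example, since the original worked example is not reproduced here. `disk_io_cost_ms` is an illustrative helper name.

```python
def disk_io_cost_ms(seeks: int, blocks: int, t_s_ms: float, t_t_ms: float) -> float:
    """Disk I/O cost = (S * tS) + (B * tT), in milliseconds."""
    return seeks * t_s_ms + blocks * t_t_ms


t_s_ms = 4.0        # seek time
block_kb = 4        # block size in KB
rate_mb_per_s = 40  # transfer rate in MB per second

# Transfer time of one block. With 1 MB = 1000 KB, KB / (MB/s) is
# exactly milliseconds, so 4 / 40 = 0.1 ms.
t_t_ms = block_kb / rate_mb_per_s

# Hypothetical plan: 2 seeks and 100 blocks transferred.
cost = disk_io_cost_ms(seeks=2, blocks=100, t_s_ms=t_s_ms, t_t_ms=t_t_ms)
print(f"tT = {t_t_ms} ms, estimated cost = {cost:.1f} ms")
```

Here the estimate is (2 × 4 ms) + (100 × 0.1 ms) = 18 ms. With binary units (1 MB = 1024 KB), tT comes out at about 0.098 ms, which still rounds to 0.1 ms.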
Query Optimization
The query optimizer (also known simply as the optimizer) is the database software component that identifies the most efficient way (for example, the fastest) for a SQL statement to access data.
The process of selecting an efficient execution plan for processing a query is known as query optimization.
Query optimization is used to access and modify the database in the most efficient way possible.
It is the art of obtaining the necessary information in a predictable, reliable, and timely manner.
Formally, query optimization is the process of transforming a query into an equivalent form that can be evaluated more efficiently.
The goal of query optimization is to find an execution plan that reduces the time required to process a query.
To attain this goal, two major tasks must be completed:
the first is to determine the optimal plan for accessing the database, and
the second is to reduce the time required to execute the query plan.
Methods of query optimization
Cost based Optimization (Physical)
This method is based on the cost of the query.
The query can use different access paths based on indexes, constraints, sorting methods, etc.
This method mainly uses statistics such as record size, number of records, number of records per block, number of blocks, table size, whether the whole table fits in a block, organization of the tables, uniqueness of column values, and size of columns.
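Cost-based optimization can be illustrated by estimating two access paths from table statistics and picking the cheaper one. This is a minimal sketch under assumptions: the statistics and the index height are hypothetical, tS = 4 ms and tT = 0.1 ms are reused from the disk I/O example, and the index-scan formula is a simplified worst-case model (one seek plus one block transfer per index level and per matching record), not any particular optimizer's cost model.

```python
from dataclasses import dataclass

T_S, T_T = 4.0, 0.1  # seek and transfer time in ms


@dataclass
class TableStats:
    n_records: int          # number of records in the table
    records_per_block: int  # blocking factor
    index_height: int       # levels in an index on the search key (hypothetical)


def blocks(stats: TableStats) -> int:
    return -(-stats.n_records // stats.records_per_block)  # ceiling division


def full_scan_cost(stats: TableStats) -> float:
    # One seek, then every block of the table is transferred sequentially.
    return T_S + blocks(stats) * T_T


def index_scan_cost(stats: TableStats, matching_records: int) -> float:
    # Worst case: one seek + one transfer per index level and per matching record.
    return (stats.index_height + matching_records) * (T_S + T_T)


def choose_plan(stats: TableStats, matching_records: int):
    costs = {
        "full table scan": full_scan_cost(stats),
        "index scan": index_scan_cost(stats, matching_records),
    }
    return min(costs, key=costs.get), costs


stats = TableStats(n_records=10_000, records_per_block=50, index_height=3)
print(choose_plan(stats, matching_records=2))    # few matches: index scan wins
print(choose_plan(stats, matching_records=100))  # many matches: full scan wins
```

The same query can therefore get different plans depending on the statistics, which is why the optimizer relies on them.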
Methods of query optimization
Heuristic Optimization (Logical)
This method is also known as rule-based optimization.
It is based on equivalence rules for relational expressions, which reduce the number of alternative forms of the query that have to be considered.
This in turn helps reduce the cost of the query.
This method builds a relational algebra tree for the given query based on the equivalence rules.
By providing alternative ways of writing and evaluating the query, these equivalence rules give a better path for evaluating it.
The rules do not lead to a better plan in every case, so the result needs to be examined after the rules are applied.
Methods of query optimization
Heuristic Optimization (Logical) (Contd…)
The most important rules followed in this method are listed below:
Perform all selection operations as early as possible in the query.
This should be the first and foremost set of actions on the tables in the query.
By performing the selection operations first, we reduce the number of records involved in the query, rather than carrying the whole tables through the query.
Suppose we have a query to retrieve the students aged 18 who are studying in class DESIGN_01.
We can get all the student details from the STUDENT table and the class details from the CLASS table.
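The "perform selections early" rule can be tried on the student example. This sketch uses hypothetical STUDENT and CLASS data and a toy nested-loop natural join on CLASS_ID. Joining first and filtering afterwards builds a 5-tuple intermediate result; filtering each table first (age 18, class DESIGN_01) and then joining builds only 2 tuples, with the same final answer.

```python
# Hypothetical sample data.
STUDENT = [
    {"NAME": "Ram", "AGE": 18, "CLASS_ID": 1},
    {"NAME": "Sita", "AGE": 19, "CLASS_ID": 1},
    {"NAME": "Hari", "AGE": 18, "CLASS_ID": 2},
    {"NAME": "Gita", "AGE": 18, "CLASS_ID": 1},
    {"NAME": "Mohan", "AGE": 20, "CLASS_ID": 2},
]
CLASS = [
    {"CLASS_ID": 1, "CLASS_NAME": "DESIGN_01"},
    {"CLASS_ID": 2, "CLASS_NAME": "MATHS_01"},
]


def natural_join(r, s):
    """Nested-loop natural join on all attributes the two relations share."""
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in t.keys() & u.keys())]


# Late selection: join the whole tables, then filter.
joined_late = natural_join(STUDENT, CLASS)
names_late = [r["NAME"] for r in joined_late
              if r["AGE"] == 18 and r["CLASS_NAME"] == "DESIGN_01"]

# Early selection: filter each table first, then join the smaller inputs.
students_18 = [s for s in STUDENT if s["AGE"] == 18]
design_class = [c for c in CLASS if c["CLASS_NAME"] == "DESIGN_01"]
joined_early = natural_join(students_18, design_class)
names_early = [r["NAME"] for r in joined_early]

print(len(joined_late), len(joined_early), names_early)  # 5 2 ['Ram', 'Gita']
```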

Reference: https://www.recw.ac.in/v1.8/wp-content/uploads/2021/03/DBMS-Unit-4.pdf
