Advanced Database System Chapter Two Query Processing and Optimization

Uploaded by

Fedasa Bote
Copyright
© © All Rights Reserved
Advanced Database System
Chapter Two: Query Processing and Optimization
Query Processing and Optimization
• The scanner identifies the query tokens, such as SQL keywords,
attribute names, and relation names, that appear in the text of the
query.
• Parsing checks the query syntax to determine whether it is formulated
according to the syntax rules (rules of grammar) of the query language.
• Validation checks that all attribute and relation names are valid and
semantically meaningful names in the schema of the particular database
being queried.

Query Processing
What is query processing?
• The steps required to transform a high-level SQL query into a
correct and "efficient" strategy for execution and retrieval.
• Processing can be divided into decomposition, optimization,
code generation, and execution.
1. Query Decomposition
• The process of transforming a high-level query into a relational
algebra query and checking that the query is syntactically and
semantically correct. It consists of parsing and validation.
Typical stages in query decomposition are:

i. Analysis: lexical and syntactical analysis of the query
(correctness), based on attributes, data types, and so on. A query
tree is built for the query, containing a leaf node for each base
relation, one or more non-leaf nodes for the relations produced by
relational algebra operations, and a root node for the result of the
query. Operations are executed from the leaves to the root.
(SELECT * FROM Catalog c, Author a WHERE a.authorid =
c.authorid AND c.price > 200 AND a.country = 'USA')
ii. Normalization: convert the query into a normalized form. The
WHERE predicate is converted to conjunctive normal form (a
conjunction, ∧, of disjunctive clauses) or disjunctive normal form
(a disjunction, ∨, of conjunctive clauses).
iii. Semantic Analysis: reject normalized queries that are incorrectly
formulated or contradictory. A query is incorrectly formulated if some
of its components do not contribute to generating the result. It is
contradictory if its predicate cannot be satisfied by any tuple, for
example (Catalog = "BS" ∧ Catalog = "CS"), since a given book can be
classified in only one category at a time.
iv. Simplification: detect redundant qualifications, eliminate common
sub-expressions, and transform the query into a semantically
equivalent form that is easier and more efficient to compute. Access
restrictions are also considered here: for example, if a user does not
have the necessary access to all of the objects of the query, the
query is rejected.
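The contradiction check from the semantic analysis stage can be sketched in a few lines. This is only an illustration under simplifying assumptions: the predicate is a conjunction of equality tests, each written as an (attribute, constant) pair, and the function name is_contradictory is hypothetical.

```python
def is_contradictory(conjuncts):
    """Return True if a conjunction of (attribute, constant) equality
    tests can never be satisfied by a single tuple."""
    required = {}  # attribute -> the constant it must equal
    for attribute, constant in conjuncts:
        if attribute in required and required[attribute] != constant:
            return True  # the same attribute is forced to two values
        required[attribute] = constant
    return False

# Catalog = 'BS' AND Catalog = 'CS' can never hold for one tuple.
same_attribute = is_contradictory([("Catalog", "BS"), ("Catalog", "CS")])

# Tests on different attributes do not conflict with each other.
different_attributes = is_contradictory([("Catalog", "BS"), ("Country", "USA")])
```

A real optimizer would also have to handle ranges, disjunctions, and negation; the sketch only covers the equality case used in the example above.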
2. Query Optimization
What is query optimization?
– The activity of choosing a single "efficient" execution strategy
(from hundreds), as determined by database catalog statistics.
– Which relational algebra expression, equivalent to the given query,
will lead to the most efficient solution plan?
– For each algebraic operator, what algorithm (of several available)
do we use to compute that operator?
– How do operations pass data (main memory buffer, disk buffer, …)?
• Everyone wants the performance of their database to be optimal. In
particular, there is often a requirement for a specific query, or an
object based on a query, to run faster.
• The problem of query optimization is to find the sequence of steps
that produces the answer to the user's request in the most efficient
manner, given the database structure.
• The performance of a query is affected by the tables or queries that
underlie it and by the complexity of the query.
• Given a request for data manipulation or retrieval, an optimizer
chooses a plan for evaluating the request from among the many
alternative strategies; that is, there are many ways (access paths) of
accessing the desired file or record.
• Hence, the DBMS is responsible for picking the best execution
strategy based on various considerations (for example, the least
amount of I/O and CPU resources).

• A query typically has many possible execution strategies, and the
process of choosing a suitable one for processing a query is known as
query optimization.
• The chosen strategy is not necessarily the optimal (absolute best)
strategy; it is just a reasonably efficient strategy for executing
the query.
• Two main techniques are employed during query optimization.
• The first technique is based on heuristic rules for ordering the
operations in a query execution strategy. A heuristic is a rule that
works well in most cases but is not guaranteed to work well in every
case. The rules typically reorder the operations in a query tree.
• The second technique involves systematically estimating the cost of
different execution strategies and choosing the execution plan with
the lowest cost estimate. These techniques are usually combined in a
query optimizer.
• Example: Consider relations r(A, B) and s(C, D). We require r × s.
• Method 1:
a. Load the next record of r into RAM.
b. Load all records of s, one at a time, and concatenate each with
the r record.
c. Have all records of r been concatenated?
   NO: go to a.
   YES: exit (the result is in RAM or on disk).
• Performance: too many accesses.

• Method 2: Improvement
a. Load as many blocks of r as possible, leaving room for one block
of s.
b. Run through the s file completely, one block at a time.
• Performance: reduces the number of times the s blocks are loaded by
a factor equal to the number of r records that can fit in main memory.
• Considerations during query optimization:
– Narrow down intermediate result sets quickly: SELECT and PROJECTION
before JOIN.
– Use access structures (indexes).
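The difference between the two methods can be made concrete by counting loads. The sketch below is a rough model, not a measurement: the function names and the file sizes are hypothetical, Method 1 is counted in record loads, and Method 2 is counted in block reads with one buffer block reserved for s.

```python
import math

def method1_loads(r_records, s_records):
    # One load per r record, plus a full record-by-record pass over s
    # for each of those r records.
    return r_records + r_records * s_records

def method2_block_reads(r_blocks, s_blocks, buffer_blocks):
    # buffer_blocks - 1 blocks hold a chunk of r; one block is kept for s.
    chunks = math.ceil(r_blocks / (buffer_blocks - 1))
    # Every r block is read once; s is scanned once per chunk of r.
    return r_blocks + chunks * s_blocks

# Assumed sizes: r has 1000 records in 100 blocks, s has 500 records in
# 50 blocks, and 11 buffer blocks are available in main memory.
loads_method1 = method1_loads(1000, 500)
reads_method2 = method2_block_reads(100, 50, 11)
```

With these assumed sizes, Method 1 needs 501000 record loads while Method 2 needs 600 block reads, because s is scanned once per chunk of r rather than once per record of r.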
Using Heuristics in Query Optimization

• In practice, SQL is the query language used in most commercial
RDBMSs. An SQL query is first translated into an equivalent extended
relational algebra expression (represented as a query tree data
structure) that is then optimized.
• Typically, SQL queries are decomposed into query blocks, which form
the basic units that can be translated into the algebraic operators
and optimized.
Transformation rules for relational algebra, with examples

1. Cascade of SELECTION
Rule: Multiple SELECTION operations can be combined into a single
SELECTION operation.
Explanation: Instead of first selecting employees with a salary
greater than 50,000 and then selecting those older than 30, you can
combine these conditions into a single SELECTION.

2. Commutativity of SELECTION
Rule: The order of SELECTION operations can be interchanged without
affecting the result.
Explanation: Whether you first select employees older than 30 or
those in the HR department, the final result will be the same.
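The two SELECTION rules can be checked on a small in-memory relation. The sketch below is illustrative only: the employee rows, the attribute names, and the select helper are assumptions made for the example, with relations modelled as lists of dictionaries.

```python
# Hypothetical sample data; attribute names follow the explanations above.
employees = [
    {"name": "Abebe", "age": 35, "salary": 60000, "dept": "HR"},
    {"name": "Sara", "age": 28, "salary": 70000, "dept": "HR"},
    {"name": "Kebede", "age": 45, "salary": 40000, "dept": "IT"},
    {"name": "Hana", "age": 32, "salary": 55000, "dept": "IT"},
]

def select(predicate, relation):
    """SELECTION: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

# Rule 1: a cascade of selections equals one selection on the conjunction.
cascaded = select(lambda t: t["age"] > 30,
                  select(lambda t: t["salary"] > 50000, employees))
combined = select(lambda t: t["salary"] > 50000 and t["age"] > 30, employees)

# Rule 2: the order of two selections does not change the result.
age_then_hr = select(lambda t: t["dept"] == "HR",
                     select(lambda t: t["age"] > 30, employees))
hr_then_age = select(lambda t: t["age"] > 30,
                     select(lambda t: t["dept"] == "HR", employees))
```

Both pairs produce identical results, which is what lets an optimizer split, merge, or reorder selections freely.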
3. Cascade of PROJECTION
Rule: In a sequence of PROJECTION operations, only the last one is
necessary.
Explanation: If you first project the attributes name, age, and
salary, and then project only name and age, you can directly project
name and age from the start.

4. Commutativity of SELECTION with PROJECTION
Rule: SELECTION and PROJECTION operations can be interchanged if the
SELECTION predicate involves only the attributes in the PROJECTION
list.
Explanation: Whether you first project the attributes name and age
and then select employees older than 30, or first select employees
older than 30 and then project name and age, the result is the same.
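The PROJECTION rules can be checked the same way. This is a sketch under stated assumptions: the sample rows and helper names are hypothetical, and relations are plain lists, so duplicate elimination (which a true relational PROJECTION performs) is not modelled.

```python
# Hypothetical sample data using the attributes named in the explanations.
employees = [
    {"name": "Abebe", "age": 35, "salary": 60000},
    {"name": "Sara", "age": 28, "salary": 70000},
    {"name": "Hana", "age": 32, "salary": 55000},
]

def select(predicate, relation):
    """SELECTION: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(attributes, relation):
    """PROJECTION: keep only the listed attributes of every tuple."""
    return [{a: t[a] for a in attributes} for t in relation]

# Rule 3: in a cascade of projections only the last one is needed.
two_steps = project(["name", "age"],
                    project(["name", "age", "salary"], employees))
one_step = project(["name", "age"], employees)

# Rule 4: the predicate uses only age, which is in the projection list,
# so SELECTION and PROJECTION can be swapped.
project_then_select = select(lambda t: t["age"] > 30,
                             project(["name", "age"], employees))
select_then_project = project(["name", "age"],
                              select(lambda t: t["age"] > 30, employees))
```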
5. Commutativity of THETA JOIN and Cartesian Product
Rule: The THETA JOIN (⨝) and Cartesian Product (×) operations are
commutative, meaning the order of the relations can be swapped without
affecting the result.

Example:
• Initial Query: R × S
• Equivalent Query: S × R
Explanation: Whether you join R with S or S with R, the result will be
the same set of tuples.
6. Commutativity of SELECTION with THETA JOIN
Rule: If the SELECTION predicate involves only attributes of one of
the relations being joined, the SELECTION and JOIN operations can be
interchanged.

Case a: SELECTION predicate involves only attributes of one relation
Example:
• Initial Query: σ_{c1}(R ⨝ S)
• Equivalent Query: (σ_{c1}(R)) ⨝ S
Explanation: If the predicate c1 involves only attributes of R, you
can apply the SELECTION to R before performing the JOIN.

Case b: SELECTION predicate involves attributes of both relations
Example:
• Initial Query: σ_{c1 ∧ c2}(R ⨝ S)
• Equivalent Query: (σ_{c1}(R)) ⨝ (σ_{c2}(S))
Explanation: If c1 involves only attributes of R and c2 involves only
attributes of S, you can first select the tuples from R that satisfy
c1 and the tuples from S that satisfy c2, and then join the results.
7. Commutativity of PROJECTION and THETA JOIN
Rule: If the projection list is of the form L1, L2, where L1 involves
only attributes of R and L2 involves only attributes of S, and the
join predicate θ involves only attributes in the projection list,
then:
π_{L1, L2}(R ⨝_θ S) ≡ (π_{L1}(R)) ⨝_θ (π_{L2}(S))

Explanation: Instead of projecting the attributes after the join, you
can project the relevant attributes from each relation before
performing the join.
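Pushing SELECTION and PROJECTION below a join (rules 6 and 7) can be illustrated with a nested-loop join over two tiny relations. Everything here is an assumption made for the example: the relation contents, the attribute names, and the helper functions.

```python
# Hypothetical relations R (employees) and S (departments).
employees = [
    {"emp_name": "Abebe", "dept_id": 1, "salary": 60000},
    {"emp_name": "Sara", "dept_id": 2, "salary": 45000},
]
departments = [
    {"dnumber": 1, "dname": "HR"},
    {"dnumber": 2, "dname": "IT"},
]

def select(predicate, relation):
    return [t for t in relation if predicate(t)]

def project(attributes, relation):
    return [{a: t[a] for a in attributes} for t in relation]

def theta_join(left, right, theta):
    """Nested-loop THETA JOIN: merge every pair that satisfies theta."""
    return [{**l, **r} for l in left for r in right if theta(l, r)]

def theta(e, d):
    return e["dept_id"] == d["dnumber"]

def well_paid(t):
    return t["salary"] > 50000

# Rule 6: the predicate uses only attributes of employees.
select_after_join = select(well_paid, theta_join(employees, departments, theta))
select_before_join = theta_join(select(well_paid, employees), departments, theta)

# Rule 7: the join attributes appear in the projection lists L1 and L2.
L1, L2 = ["emp_name", "dept_id"], ["dnumber", "dname"]
project_after_join = project(L1 + L2, theta_join(employees, departments, theta))
project_before_join = theta_join(project(L1, employees),
                                 project(L2, departments), theta)
```

In both cases the early variant gives the same answer while feeding fewer tuples or fewer columns into the join.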
8. Commutativity of the set operations: UNION and INTERSECTION, but
not SET DIFFERENCE
Rule: UNION and INTERSECTION operations are commutative, but SET
DIFFERENCE is not.
Explanation: The order of the operands does not affect the result of
a UNION or an INTERSECTION.

9. Associativity of THETA JOIN, CARTESIAN PRODUCT, UNION, and
INTERSECTION
Rule: These operations are associative.
Explanation: The order in which you perform the JOIN, CARTESIAN
PRODUCT, UNION, and INTERSECTION does not affect the final result.
10. Commuting SELECTION with set operations
Rule: SELECTION operations can commute with UNION and INTERSECTION.
Explanation: Instead of applying the SELECTION after the UNION, you
can apply the SELECTION to each relation before performing the UNION.
11. Commuting PROJECTION with UNION
Rule: PROJECTION operations can commute with UNION.
Explanation: Instead of projecting the attributes after the UNION, you
can project the relevant attributes from each relation before
performing the UNION.
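Rules 10 and 11 can be checked with Python sets, which already behave like relations without duplicates. The relation contents and helper names below are hypothetical; each tuple is assumed to be a (name, age) pair.

```python
# Two union-compatible relations with tuples of the form (name, age).
r = {("Abebe", 35), ("Sara", 28)}
s = {("Sara", 28), ("Hana", 32)}

def select(predicate, relation):
    """SELECTION over a set of tuples."""
    return {t for t in relation if predicate(t)}

def project_name(relation):
    """PROJECTION onto the name attribute; the set removes duplicates."""
    return {(name,) for name, _age in relation}

def older_than_30(t):
    return t[1] > 30

# Rule 10: SELECTION commutes with UNION and INTERSECTION.
select_after_union = select(older_than_30, r | s)
select_before_union = select(older_than_30, r) | select(older_than_30, s)
select_after_intersection = select(older_than_30, r & s)
select_before_intersection = select(older_than_30, r) & select(older_than_30, s)

# Rule 11: PROJECTION commutes with UNION.
project_after_union = project_name(r | s)
project_before_union = project_name(r) | project_name(s)
```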
Using Heuristics
Heuristic optimization in query processing uses rule-based techniques
to transform a query into a more efficient form. The process is as
follows.

Process for heuristic optimization

1. Initial internal representation:
• When a high-level query (such as SQL) is submitted, the parser
translates it into an initial internal representation, often in the
form of a relational algebra tree. This tree represents the logical
steps needed to execute the query.
2. Applying heuristic rules:
• Heuristic rules are applied to this internal representation to
optimize it. These rules are based on general principles that
typically lead to more efficient query execution. Some common
heuristic rules include:
– Selection pushdown: moving selection operations as close to the
base relations as possible to reduce the size of intermediate results.
– Projection pushdown: moving projection operations down the query
tree to eliminate unnecessary columns early.
3. Generating a query execution plan:
• After applying heuristic rules, the optimized internal
representation is used to generate a query execution plan. This plan
outlines the specific steps and methods the DBMS will use to execute
the query.
• The execution plan considers the access paths available, such as
indexes and sequential scans, to determine the most efficient way to
retrieve and process the data.
• The plan may include operations like index scans, nested loop
joins, hash …
• The main heuristic is to apply first the operations that reduce the
size of intermediate results.
– E.g., apply SELECT and PROJECT operations before applying the JOIN
or other binary operations.

Intermediate results, in the context of database query processing, are
the temporary data sets produced during the execution of a query
before arriving at the final result. They are not stored permanently
in the database; they exist only for the duration of the query
execution and are discarded once the final result is produced.
• The heuristic approach uses knowledge of the characteristics of the
relational algebra operations, and of the relationships between the
operators, to optimize the query.
• Thus the heuristic approach to optimization makes use of:
– Properties of individual operators
– Association between operators
– Query tree: a graphical representation of the operators, relations,
attributes, and predicates, and of the processing sequence during
query processing.
• It is composed of three main parts:
– Sequence of execution of operation in a query tree will
• Query block: the basic unit that can be translated into the
algebraic operators and optimized.
• A query block contains a single SELECT-FROM-WHERE expression, as
well as GROUP BY and HAVING clauses if these are part of the block.
• Nested queries within a query are identified as separate query
blocks.
• There are two types of nested queries: uncorrelated and correlated.
Uncorrelated Nested Queries

Uncorrelated nested queries can be evaluated separately, and their
results are then used in the outer query.

SELECT name
FROM employees
WHERE department_id IN (SELECT department_id
                        FROM departments
                        WHERE location = 'New York');

In this example, the inner query (SELECT department_id FROM
departments WHERE location = 'New York') is executed first, and its
result is used by the outer query to filter employees.
Correlated Nested Queries
• Correlated nested queries need information (a tuple variable) from
the outer query in their execution.

SELECT name
FROM employees e
WHERE salary > (SELECT AVG(salary)
                FROM employees
                WHERE department_id = e.department_id);

In this example, the inner query (SELECT AVG(salary) FROM employees
WHERE department_id = e.department_id) depends on the department_id of
each row in the outer query. Therefore, the inner query is executed
for each employee to compare their salary with the average salary of
their department.
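Both kinds of nested query can be tried against an in-memory SQLite database using Python's standard sqlite3 module. The table layouts and sample rows below are assumptions made for illustration; only the two SELECT statements come from the slides (with an ORDER BY added so the output order is fixed).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (department_id INTEGER PRIMARY KEY, location TEXT);
CREATE TABLE employees (name TEXT, salary REAL, department_id INTEGER);
INSERT INTO departments VALUES (1, 'New York'), (2, 'Boston');
INSERT INTO employees VALUES
    ('Ann', 900, 1), ('Bob', 500, 1), ('Cy', 700, 2), ('Dee', 300, 2);
""")

# Uncorrelated: the inner query does not refer to the outer query.
uncorrelated = [row[0] for row in conn.execute("""
    SELECT name FROM employees
    WHERE department_id IN (SELECT department_id FROM departments
                            WHERE location = 'New York')
    ORDER BY name
""")]

# Correlated: the inner query refers to e.department_id of the outer row.
correlated = [row[0] for row in conn.execute("""
    SELECT name FROM employees e
    WHERE salary > (SELECT AVG(salary) FROM employees
                    WHERE department_id = e.department_id)
    ORDER BY name
""")]
conn.close()
```

With this sample data the first query returns the New York employees (Ann and Bob), and the second returns the employees paid above their own department's average (Ann and Cy).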
• Query tree:
– A tree data structure that corresponds to a relational algebra
expression. It represents the input relations of the query as leaf
nodes of the tree, and the relational algebra operations as internal
nodes.
– Leaves: the base relations used for processing the query and
extracting the required information.
– Root: the final result (relation) produced as output by the
operations on the relations used for query processing.
– Nodes: intermediate results or relations produced before reaching
the final result.
• An execution of the query tree consists of executing an internal
node operation whenever its operands are available and then replacing
that internal node by the relation that results from executing the
operation.
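A query tree and its bottom-up execution can be sketched as a small recursive evaluator. This is a minimal illustration, not how a DBMS stores its trees: the Node class, the operation names, and the sample PROJECT and DEPARTMENT rows are all hypothetical.

```python
class Node:
    """One node of a query tree: a base relation (leaf) or an operation."""
    def __init__(self, op, children=(), arg=None):
        self.op = op                  # "relation", "select", "project", "join"
        self.children = list(children)
        self.arg = arg                # tuples, predicate, or attribute list

def execute(node):
    """Evaluate the tree bottom-up, from the leaves to the root."""
    if node.op == "relation":
        return node.arg
    inputs = [execute(child) for child in node.children]
    if node.op == "select":
        return [t for t in inputs[0] if node.arg(t)]
    if node.op == "project":
        return [{a: t[a] for a in node.arg} for t in inputs[0]]
    if node.op == "join":
        return [{**l, **r} for l in inputs[0] for r in inputs[1]
                if node.arg(l, r)]
    raise ValueError(f"unknown operation: {node.op}")

# Hypothetical base relations (the leaves of the tree).
project_rel = Node("relation", arg=[
    {"PNUMBER": 10, "PLOCATION": "Stafford", "DNUM": 4},
    {"PNUMBER": 1, "PLOCATION": "Bellaire", "DNUM": 5},
])
department_rel = Node("relation", arg=[
    {"DNUMBER": 4, "MGRSSN": "987"},
    {"DNUMBER": 5, "MGRSSN": "333"},
])

# Root: projection over a join whose left input is a selection on PROJECT.
tree = Node("project", arg=["PNUMBER", "MGRSSN"], children=[
    Node("join", arg=lambda p, d: p["DNUM"] == d["DNUMBER"], children=[
        Node("select", arg=lambda t: t["PLOCATION"] == "Stafford",
             children=[project_rel]),
        department_rel,
    ]),
])
result = execute(tree)
```

Each internal node is evaluated only after its children have produced their relations, which matches the leaf-to-root order described above.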
Query graph

• A query graph is a visual representation used in database theory to
illustrate a relational calculus expression. Its key points are:
– Graph data structure: the query graph is a type of graph that
visually represents the relationships and constraints of a query.
– Relational calculus expression: it corresponds to a relational
calculus expression, which is a non-procedural query language used to
specify what data to retrieve rather than how to retrieve it.
– No operation order: the graph does not specify the order in which
operations should be performed. It simply shows the relationships and
constraints.
• Example:
– For every project located in 'Stafford', retrieve the project
number, the controlling department number, and the department
manager's last name, address, and birthdate.
• Relational algebra:

π_{PNUMBER, DNUM, LNAME, ADDRESS, BDATE} (((σ_{PLOCATION='STAFFORD'} (PROJECT))
⨝_{DNUM=DNUMBER} (DEPARTMENT)) ⨝_{MGRSSN=SSN} (EMPLOYEE))

• SQL query:

SELECT P.PNUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE
FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E
WHERE P.DNUM = D.DNUMBER AND D.MGRSSN = E.SSN
AND P.PLOCATION = 'STAFFORD';
Step 1. Perform selection operations as early as possible: Applying
selection at an early stage reduces the number of unwanted records
that must be transferred from the database to primary memory. The
optimizer uses transformation rule 1 to divide selection operations
with conjunctive conditions into a cascade of selection operations.
Step 2. Commute selection with other operations as early as possible:
The optimizer uses transformation rules 2, 4, 6, and 9 to move
selection operations as far down the tree as possible and to keep
selection predicates on the same relation together. Keeping
selections low in the tree reduces unwanted data transfer, and keeping
selection predicates on the same relation together reduces the number
of times the same database table must be accessed to retrieve records.
Step 3. Combine a Cartesian product with a subsequent selection whose
predicate represents a join condition into a JOIN operation: The
optimizer uses transformation rule 13 to convert a selection and
Cartesian product sequence into a join. This reduces data transfer. It
is always better to transfer only the required data from the database
than to transfer all the data and then refine it. (A Cartesian product
combines all data of all the tables mentioned in the query, while a
join retrieves only those records that satisfy the join condition.)
Step 4. Use commutativity and associativity of binary operations: The
optimizer uses transformation rules 5, 11, and 12 to execute the most
restrictive selection operations first.
Step 5. Perform projection operations as early as possible: After
performing selection operations, the optimizer uses transformation
rules 3, 4, 7, and 10 to reduce the number of columns of a relation by
moving projection operations as far down the tree as possible and
keeping projection attributes on the same relation together.
Step 6. Compute common expressions only once: Identify sub-trees that
represent groups of operations that can be executed by a single
algorithm.
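Steps 1 to 3 can be sketched as a small routine that splits a conjunctive WHERE clause and decides where each condition belongs. This is a simplified illustration, not a full optimizer: the function name push_selections, the schemas, and the way conditions are represented (a label plus the set of attributes it mentions) are all assumptions.

```python
def push_selections(conjuncts, relations):
    """Assign each conjunct of a WHERE clause to a place in the tree.

    conjuncts: list of (label, attributes_used) pairs
    relations: dict mapping relation name to its set of attributes
    Returns (pushed, remaining): pushed maps each relation to the
    conditions that can be applied directly at its leaf; remaining holds
    conditions that span several relations (candidate join conditions).
    """
    pushed = {name: [] for name in relations}
    remaining = []
    for label, attributes in conjuncts:
        owners = [name for name, schema in relations.items()
                  if attributes <= schema]
        if len(owners) == 1:
            pushed[owners[0]].append(label)   # selection moves to the leaf
        else:
            remaining.append(label)           # stays above the product
    return pushed, remaining

# Hypothetical schemas, reduced to the attributes the conditions use.
relations = {
    "EMPLOYEE": {"SSN", "LNAME", "BDATE"},
    "WORKS_ON": {"ESSN", "PNO"},
    "PROJECT": {"PNUMBER", "PNAME"},
}
conjuncts = [
    ("PNAME = 'AQUARIUS'", {"PNAME"}),
    ("PNUMBER = PNO", {"PNUMBER", "PNO"}),
    ("ESSN = SSN", {"ESSN", "SSN"}),
]
pushed, remaining = push_selections(conjuncts, relations)
```

Here the condition on PNAME is pushed down to PROJECT, while the two conditions that mention attributes of two relations remain and can be merged with the Cartesian products into joins.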
• Heuristic optimization of query trees:
– The same query could correspond to many different relational
algebra expressions, and hence many different query trees.
– The task of heuristic optimization of query trees is to find a
final query tree that is efficient to execute.
• Example:
Q2: SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME = 'AQUARIUS' AND PNUMBER = PNO AND ESSN = SSN
AND BDATE …
(a) Initial (canonical) query tree for SQL query Q2.
Executing this tree directly first creates a very large file
containing the CARTESIAN PRODUCT of the entire EMPLOYEE, WORKS_ON, and
PROJECT files.

(b) Moving SELECT operations down the query tree.
An improved query tree first applies the SELECT operations to reduce
the number of tuples that appear in the CARTESIAN PRODUCT.

(c) Applying the more restrictive SELECT operation first.
A further improvement is achieved by switching the positions of the
EMPLOYEE and PROJECT relations in the tree, as shown in (c). This uses
the information that Pnumber is a key attribute of the PROJECT
relation, and hence the SELECT operation on the PROJECT relation will
retrieve a …

(d) Replacing CARTESIAN PRODUCT and SELECT with JOIN operations.
We can further improve the query tree by replacing any CARTESIAN
PRODUCT operation that is followed by a join condition with a JOIN
operation.

(e) Moving PROJECT operations down the query tree.
Another improvement is to keep only the attributes needed by
subsequent operations in the intermediate relations, by including
PROJECT (π) operations as early as possible in the query tree, as
shown in (e). This reduces the attributes (columns) of the …
Summary of Heuristics for Algebraic Optimization:

1. The main heuristic is to apply first the operations that reduce the
size of intermediate results.
2. Perform select operations as early as possible to reduce the number
of tuples, and perform project operations as early as possible to
reduce the number of attributes. (This is done by moving select and
project operations as far down the tree as possible.)
3. The select and join operations that are most restrictive should be
executed before other similar operations. (This is done by reordering
the leaf nodes of the tree among themselves and adjusting the rest of
the tree appropriately.)
B. Cost Estimation Approach to Query Optimization
• The main idea is to minimize the cost of processing a query. The
cost function comprises:
– I/O cost + CPU processing cost + communication cost + storage cost
• These components might have different weights in different
processing environments.
• The DBMS uses information stored in the system catalog to estimate
cost.
• The main target of query optimization is to minimize the size of
the intermediate relations. Their size affects the cost of:
– Disk access
– Data transportation
– Storage space in primary memory
– Writing to disk
• Cost-based query optimization:
– Estimate and compare the costs of executing a query using different
execution strategies, and choose the strategy with the lowest cost
estimate. (Compare with heuristic query optimization.)
• Issues
– Cost function
– Number of execution strategies to be considered
• Cost components for query execution
1. Access cost to secondary storage
2. Storage cost
3. Computation cost
4. Memory usage cost
5. Communication cost
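The idea of comparing strategies by a weighted cost estimate can be sketched as follows. All numbers, plan names, and weights are invented for illustration; a real optimizer derives its estimates from catalog statistics, and the function name estimated_cost is hypothetical.

```python
def estimated_cost(plan, weights):
    """Weighted sum of the cost components of one execution strategy."""
    return sum(weights[component] * plan[component] for component in weights)

# Assumed weights: how expensive one unit of each component is in this
# processing environment.
weights = {"access": 10, "storage": 5, "computation": 1,
           "memory": 2, "communication": 20}

# Assumed per-component estimates for two candidate strategies.
plans = {
    "product_then_select": {"access": 1000, "storage": 200,
                            "computation": 5000, "memory": 100,
                            "communication": 10},
    "select_then_join": {"access": 120, "storage": 20,
                         "computation": 800, "memory": 40,
                         "communication": 10},
}

costs = {name: estimated_cost(plan, weights) for name, plan in plans.items()}
best_plan = min(costs, key=costs.get)
```

With these assumed figures the second strategy has the lower estimate (2380 against 16400), so it would be chosen. Changing the weights models a different processing environment, for example a distributed one where communication dominates.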
1. Access Cost of Secondary Storage
• Data must be accessed from secondary storage, because a query needs
some part of the data stored in the database. The disk access cost can
be analyzed in terms of:
– Searching
– Reading, and
– Writing the data blocks used to store some portion of a relation.
• Remark: The disk access cost will vary depending on:
– The file organization used and the access method implemented for
that file organization.
– Whether the data is stored contiguously or in a scattered manner.
2. Storage Cost
• Because a query is composed of many database operations, processing
it may produce one or more intermediate results before reaching the
final output. These intermediate results must be stored in primary
memory for further processing. The bigger the intermediate relation,
the larger the memory requirement, which puts pressure on the limited
available space. This will be considered as a …
3. Query Execution Plans
– An execution plan for a relational algebra query consists of a
combination of the relational algebra query tree and information about
the access methods to be used for each relation, as well as the
methods to be used in computing the …
4. Computation Cost
• A query is composed of many operations. These can be database
operations, such as reading from and writing to disk, or mathematical
and other operations, such as:
– Searching
– Sorting
– Merging
– Computation on field values
5. Communication Cost
• In most database systems the database resides at one station and
queries originate from different terminals. This affects the
performance of the system and adds to the cost of query processing.
Thus, the cost of transporting data between the database site and the
terminal from which the query originates should be analyzed.
