0% found this document useful (0 votes)
4 views

Chapter 1 Query processing and Optimization

query processining and optimization

Uploaded by

mtaddis19
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views

Chapter 1 Query processing and Optimization

query processining and optimization

Uploaded by

mtaddis19
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 13

Advanced Database system 2019

Chapter 1
Query Processing and Optimization
Introduction
 In this chapter we shall discuss the techniques used by a DBMS to
process, Optimize and execute high level queries.
 The techniques used to split complex queries into multiple simple
operations and methods of implementing these low-level
operations.
 The query optimization techniques are used to choose an efficient
execution plan that will minimize the runtime as well as many
other types of resources such as number of disk I/O, CPU time and so
on.
What is Query Processing?
 The procedure of transforming high level SQL query into a correct
and efficient execution plan expressed in low-level language.
 When a database system receives a query for update or retrieval of
information, it goes through a series of compilation steps, called
execution plan.
 It goes through various phases.
1. First phase is called syntax checking phase: -the system
parses the query and checks that it follows the syntax rules or
not. It then matches the objects in the query syntax with the
view tables and columns listed in the system table. This phase is
divided into three: -Scanning, Parsing, Validating
A. Scanner: The scanner identifies the language tokens such as
SQL Keywords, attribute names, and relation names in the
text of the query.

AUWC, School of Technology and Informatics Page 1


Advanced Database system 2019

B. Parser: The parser checks the query syntax to determine


whether it is formulated according to the syntax rules of the
query language.
C. Validation: The query must be validated by checking that all
attributes and relation names are valid and semantically
meaningful names in the schema of the particular database
being queried.

2. In second phase the SQL query is translated in to an


algebraic expression using various rules. So that the
process of transforming a high level SQL query into a
relational algebraic form is called Query Decomposition.
The relational algebraic expression now passes to the query
optimizer.
3. In third phase Optimization is performed by substituting
equivalent expression depends on the factors such that the
existence of certain database structures, whether or not a given
file is stored, the presence of different indexes and so on. Query
optimization module work in cycle with the join manager module
to improve the order in which joins are performed.
 At this stage the cost model and several other
estimation formulas are used to rewrite the query.
 The modified query is written to utilize system resources so
as to bring the optimal performance.
 The query optimizer then generates an action plan also
called execution plan. This action plans are converted
into a query codes that are finally executed by a run time
database processor.
 Query Code Generator: It generates the code to execute
the plan.

AUWC, School of Technology and Informatics Page 2


Advanced Database system 2019

 The run time database processor estimate the cost of


each action plan and chose the optimal one for the
execution. It has the task of running the query code
whether in compiled or interpreted mode. If a runtime error
results an error message is generated by the runtime
database processor.

Figure 1: -Steps in Processing High-Level Query

What is the aim of query processing?


 To transform a query written in a high-level language, typically SQL,
into a correct and efficient execution strategy expressed in a low-level
language (implementing the relational algebra), and to execute the
strategy to retrieve the required data.
Query Analyzer

AUWC, School of Technology and Informatics Page 3


Advanced Database system 2019

 The syntax analyzer takes the query from the users, parses it
into tokens and analyses the tokens (symbols) and their order
to make sure they follow the rules of language grammar.
 If the error is found in the query submitted by the user, it is rejected
and an error code together with an explanation of why the query was
rejected is return to the user.
Query Decomposition
 In query decomposition the query processing aims are to transfer the
high-level query into a relational algebra query and to check
whether that query is syntactically or semantically correct.
Thus the query decomposition is start with a high level query and
transform into query graph of low-level operations, which satisfy the
query.
 The SQL query is decomposed into query blocks (low-level
operations), which form the basic unit. Hence nested queries within a
query are identified as separate query blocks.
 The query decomposer goes through five stages of processing for
decomposition into low-level operation and translation into algebraic
expressions.

AUWC, School of Technology and Informatics Page 4


Advanced Database system 2019

Query Analysis

 During the query analysis phase, the query is syntactically analyzed


using the programming language compiler (parser). A syntactically
legal query is then validated, using the system catalog, to ensure
that all data objects (relations and attributes) referred to by the
query are defined in the database.
 The type specification of the query qualifiers and result is also checked
at this stage.
 Example: -SELECT emp_nm FROM EMPLOYEE WHERE
emp_desg>100
 This query will be rejected because the comparison ">100" is
incompatible with the data type of emp_desg which is a
variable character string.
 At the end of query analysis phase, the high-level query (SQL) is
transformed into some internal representation that is more

AUWC, School of Technology and Informatics Page 5


Advanced Database system 2019

suitable for processing. This internal representation is typically a


kind of query tree.
 A Query Tree is a tree data structure that corresponds expression.
 A Query Tree is also called a relational algebra tree.
 Leaf node of the tree, representing the base input relations
of the query.
 Internal nodes result of applying an operation in the
algebra.
 Root of the tree representing a result of the query.
SELECT (P.proj_no, P.dept_no, E.name, E.add, E.dob)
FROM PROJECT P, DEPARTMENT D, EMPLOYEE E
WHERE P.dept_no = D.d_no AND D.mgr_id = E.emp_id AND
P.proj_loc = `Mumbai) ;

 The three relations PROJECT, DEPARTMENT, EMPLOYEE are


represent as a leaf nodes P, D and E, while the relational algebra
operations of the represented by internal tree nodes.
 Same SQL query can have many different relational algebra
expressions and hence many different query trees.
 The query parser typically generates a standard initial (canonical)
query tree.

AUWC, School of Technology and Informatics Page 6


Advanced Database system 2019

Query Normalization

 The primary phase of the normalization is to avoid redundancy. The


normalization phase converts the query into a normalized form
that can be more easily manipulated.
 In the normalization phase, a set of equivalency rules are applied so
that the projection and selection operations included on the query
are simplified to avoid redundancy.
 The projection operation corresponds to the SELECT clause of SQL
query and the selection operation correspond to the predicate
found in WHERE clause.
 The equivalency transformation rules are applied.

Semantic Analyzer

 The objective of this phase of query processing is to reduce the


number of predicates. The semantic analyzer rejects the
normalized queries that are incorrectly formulated.
 A query is incorrectly formulated if components do not contribute to
the generation of result. This happens in case of missing join
specification. A query is contradictory if its predicate cannot satisfy
by any tuple in the relation.
 The semantic analyzer examine the relational calculus query (SQL) to
make sure it contains only data objects that is table, columns, views,
indexes that are defined in the database catalog. It makes sure that
each object in the query is referenced correctly according to its data
type.
 In case of missing join specifications the components do not contribute
to the generation of the results, and thus, a query may be incorrect
formulated.

AUWC, School of Technology and Informatics Page 7


Advanced Database system 2019

Query Simplifier

 The objectives of this phase are: -


 To detect redundant qualification,
 To eliminate common sub-expressions and
 To transform sub-graph too semantically equivalent but
easier and efficiently computed form.
 Why to simplify?
 Commonly integrity constraints, view definitions and access
restrictions are introduced into the graph at this stage of analysis
so that the query must be simplified as much as possible.
 Integrity constraints defines constants which must holds for all
state of database, so any query that contradict an integrity
constraints must be avoid and can be rejected without accessing
the database.

Query Restructuring

 In the final stage of the query decomposition, the query can be


restructured to give a more efficient implementation. Transformation
rules are used to convert one relational algebra expression into an
equivalent form that is more efficient.
 The query can now be regarded as a relational algebra program,
consisting of a series of operations on relation.

Query Optimization

 The primary goal of query optimization is of choosing an efficient


execution strategy for processing a query. The query optimizer
attempts to minimize the use of certain resources (mainly the number
of I/O and CPU time) by selecting a best execution plan (access
plan).

AUWC, School of Technology and Informatics Page 8


Advanced Database system 2019

 A query optimization start during the validation phase by the system to


validate the user has appropriate privileges. Now an action plan is
generate to perform the query.

 Relational algebra query tree generated by the query simplifier module


of query decomposer.
 Estimation formulas used to determine the cardinality of the
intermediate result table.
 A cost Model.
 Statistical data from the database catalogue.
 The output of the query optimizer is the execution plan in form of
optimized relational algebra query.
 A query typically has many possible execution strategies, and the
process of choosing a suitable one for processing a query is known as
Query Optimization.

The basic issues in Query Optimization

 How to use available indexes?

AUWC, School of Technology and Informatics Page 9


Advanced Database system 2019

 How to use memory to accumulate information and perform immediate


steps such as sorting?
 How to determine the order in which joins should be performed?

Objective of query optimization

 The term query optimization does not mean giving always an optimal
(best) strategy as the execution plan. It is just a responsibly efficient
strategy for execution of the query.
 The decomposed query block of SQL is translating into an equivalent
extended relational algebra expression and then optimized.

Techniques for Query Optimization

1. The first technique is based on Heuristic Rules for ordering


the operations in a query execution strategy.
2. The second technique involves the systematic estimation
of the cost of the different execution strategies and
choosing the execution plan with the lowest cost.
3. The third technique is Semantic query optimization: - it is
used with the combination of the heuristic query
transformation rules. It uses constraints specified on the
database schema such as unique attributes and other more
complex constraints, in order to modify one query into
another query that is more efficient to execute.

Heuristic Rules

 The heuristic rules are used as an optimization technique to


modify the internal representation of query. Usually, heuristic
rules are used in the form of query tree of query graph data structure,
to improve its performance.

AUWC, School of Technology and Informatics Page 10


Advanced Database system 2019

 One of the main heuristic rules is to apply SELECT operation before


applying the JOIN or other BINARY operations. This is because the
size of the file resulting from a binary operation such as JOIN is usually
a multi-value function of the sizes of the input files.
 The main idea behind is to reduce intermediate results. This includes
performing
 SELECT operation to reduce the number of tuples &
 PROJECT operation to reduce number of attributes.
 The SELECT and PROJECT reduced the size of the file and hence,
should be applied before the JOIN or other binary operation. Heuristic
query optimizer transforms the initial (canonical) query tree into final
query tree using equivalence transformation rules. This final query
tree is efficient to execute.
Examples for query Optimization: Identify all managers who work in a
London branch
SQL:-
SELECT * FROM Staff s, Branch b WHERE s.branchNo = b.branchNo AND
s.position = ‘Manager’ AND b.city = ‘london’;
Results in these equivalent relational algebra statements
1. S(position =’Manager’) ^(city=’London’) ^(Staff.branchNo=Branch.branchNo) (Staff X Branch)
2. S(position =’Manager’) ^(city=’London’) (Staff Staff.branchNo = Branch.branchNo Branch)
3. [S(position =’Manager’)( Staff)] Staff.branchNo = Branch.branchNo [s(city=‘London’)
(Branch)]
Assume:
 1000 tuples in Staff.
 50 Managers
 50 tuples in Branch.
 5 London branches
 No indexes or sort keys
 All temporary results are written back to disk (memory is small)
 Tuples are accessed one at a time (not in blocks)
AUWC, School of Technology and Informatics Page 11
Advanced Database system 2019

Query 1 (Bad)

 Requires (1000+50) disk accesses to read from Staff and Branch


relations
 Creates temporary relation of Cartesian Product (1000*50) tuples
 Requires (1000*50) disk access to read in temporary relation and
test predicate
Total Work = (1000+50) + 2*(1000*50) = 101,050 I/O
operations

Query 2 (Better)

 Again requires (1000+50) disk accesses to read from Staff and


Branch
 Joins Staff and Branch on branchNo with 1000 tuples (1
employee : 1 branch )
 Requires (1000) disk access to read in joined relation and check
predicate
Total Work = (1000+50) + 2*(1000) = 3050 I/O operations
3300% Improvement over Query 1
Query 3 (Best)

 Read Staff relation to determine ‘Managers’ (1000 reads)


 Create 50 tuple relation(50 writes)
 Read Branch relation to determine ‘London’ branches (50 reads)
 Create 5 tuple relation(5 writes)
 Join reduced relations and check predicate (50 + 5 reads)

AUWC, School of Technology and Informatics Page 12


Advanced Database system 2019

Total Work = 1000 + 2*(50) + 5 + (50 + 5) = 1160 I/O


operations
8700% Improvement over Query 1
Summary of Heuristics for Algebraic Optimization:

1. The main heuristic is to apply first the operations that reduce the size
of intermediate results.
2. Perform select operations as early as possible to reduce the number of
tuples and perform project operations as early as possible to reduce
the number of attributes. (This is done by moving select and project
operations as far down the tree as possible.)
3. The select and join operations that are most restrictive should be
executed before other similar operations. (This is done by reordering
the leaf nodes of the tree among themselves and adjusting the rest of
the tree appropriately.)

AUWC, School of Technology and Informatics Page 13

You might also like