Query Processing and Optimization


QUERY PROCESSING & OPTIMIZATION
Query processing
• Query processing refers to the range of activities involved in extracting data from a database.
• The activities include:
  - translation of queries in high-level database languages into expressions that can be used at the physical level of the file system,
  - a variety of query-optimizing transformations, and
  - actual evaluation of queries.
Cont..
• The steps involved in processing a query appear in the figure below. The basic steps are:
  1. Parsing and translation
  2. Optimization
  3. Evaluation
[Figure: Steps in query processing]
Cont..
• Before query processing can begin, the system must translate the query into a usable form.
• A language such as SQL is suitable for human use, but is ill-suited to be the system’s internal representation of a query.
• A more useful internal representation is one based on the extended relational algebra.
• Thus, the first action the system must take in query processing is to translate a given query into its internal form.
Cont..
This translation process is similar to the work
performed by the parser of a compiler.
In generating the internal form of the query, the
parser checks the syntax of the user’s query, verifies
that the relation names appearing in the query are
names of the relations in the database, and so on.
The system constructs a parse-tree representation
of the query, which it then translates into a
relational-algebra expression.
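The parse, check, and translate steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CATALOG` contents, the single query shape accepted by `QUERY_PATTERN`, and the nested-tuple form of the algebra expression are all hypothetical and much simpler than what a real system does.

```python
import re

# Hypothetical catalog: relation name -> attribute names (assumed for illustration).
CATALOG = {"account": ["account_number", "branch_name", "balance"]}

# Accepts only: select <attr> from <relation> where <attr> <op> <number>
QUERY_PATTERN = re.compile(
    r"^\s*select\s+(\w+)\s+from\s+(\w+)\s+where\s+(\w+)\s*(<=|>=|<|>|=)\s*(\d+)\s*$",
    re.IGNORECASE,
)


def translate(sql):
    """Check syntax and names, then build a relational-algebra tree."""
    match = QUERY_PATTERN.match(sql)
    if match is None:
        raise SyntaxError(f"cannot parse query: {sql!r}")
    out_attr, relation, cond_attr, op, value = match.groups()
    # Verify that the relation and attributes exist in the database catalog.
    if relation not in CATALOG:
        raise NameError(f"unknown relation: {relation}")
    for attr in (out_attr, cond_attr):
        if attr not in CATALOG[relation]:
            raise NameError(f"unknown attribute: {attr}")
    # Nested tuples stand in for the expression project(select(relation)).
    selection = ("select", (cond_attr, op, int(value)), ("relation", relation))
    return ("project", [out_attr], selection)


tree = translate("select balance from account where balance < 2500")
```

Here `tree` holds a projection on `balance` over a selection on `account`; a query naming a relation that is not in the catalog raises `NameError` instead.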
Cont..
If the query was expressed in terms of a view, the
translation phase also replaces all uses of the view
by the relational-algebra expression that defines
the view.
Given a query, there are generally a variety of
methods for computing the answer.
For example, we have seen that, in SQL, a query
could be expressed in several different ways.
Each SQL query can itself be translated into a
relational-algebra expression in one of several
ways.
Cont..
• Furthermore, the relational-algebra representation of a query specifies only partially how to evaluate a query; there are usually several ways to evaluate relational-algebra expressions.
• As an illustration, consider a query that selects the tuples of account with balance less than 2500.
Cont..
• Further, we can execute each relational-algebra operation by one of several different algorithms.
• For example, to implement the preceding selection, we can search every tuple in account to find tuples with balance less than 2500. If a B+-tree index is available on the attribute balance, we can use the index instead to locate the tuples.
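The two ways of evaluating the selection can be sketched as follows. The sample tuples are hypothetical, and a sorted list searched with `bisect` is used here only as a simple stand-in for a B+-tree index on balance.

```python
from bisect import bisect_left

# Hypothetical account tuples: (account_number, balance).
account = [
    ("A-101", 500),
    ("A-215", 700),
    ("A-102", 4000),
    ("A-305", 3500),
    ("A-201", 900),
]


def select_by_scan(tuples, limit):
    """Linear scan: examine every tuple and keep those with balance < limit."""
    return [t for t in tuples if t[1] < limit]


def build_index(tuples):
    """Sorted (balance, position) pairs; a stand-in for a B+-tree on balance."""
    return sorted((t[1], pos) for pos, t in enumerate(tuples))


def select_by_index(tuples, index, limit):
    """Find the cut-off in the sorted index, then fetch only matching tuples."""
    cutoff = bisect_left(index, (limit,))
    return [tuples[pos] for _, pos in index[:cutoff]]


balance_index = build_index(account)
by_scan = select_by_scan(account, 2500)
by_index = select_by_index(account, balance_index, 2500)
```

Both functions return the same three tuples; the scan touches all five tuples, while the index version touches only the matching ones.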
• To specify fully how to evaluate a query, we need not only to provide the relational-algebra expression, but also to annotate it with instructions specifying how to evaluate each operation.
Cont..

• Annotations may state the algorithm to be used for a specific operation, or the particular index or indices to use. A relational-algebra operation annotated with instructions on how to evaluate it is called an evaluation primitive.
• A sequence of primitive operations that can be used to evaluate a query is a query-execution plan or query-evaluation plan. The figure above illustrates an evaluation plan for our example query, in which a particular index (denoted in the figure as “index 1”) is specified for the selection operation.
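One way to picture an evaluation plan is as a chain of annotated operations. The sketch below is an assumption made for illustration: the `PlanNode` class, the projection step, and the annotation strings other than "use index 1" are hypothetical and are not taken from the figure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlanNode:
    operation: str                      # relational-algebra operation
    annotation: str                     # how to evaluate it (algorithm or index)
    child: Optional["PlanNode"] = None  # input of this operation, if any


def describe(node):
    """List the annotated operations from the leaf upward, in execution order."""
    steps = []
    while node is not None:
        steps.append(f"{node.operation}; {node.annotation}")
        node = node.child
    return list(reversed(steps))


# Selection on account evaluated with "index 1", followed by an assumed projection.
plan = PlanNode(
    "project balance",
    "pipelined from input",
    PlanNode(
        "select balance < 2500",
        "use index 1",
        PlanNode("account", "stored relation"),
    ),
)
steps = describe(plan)
```

Each entry in `steps` pairs an operation with its annotation, which is the role an evaluation primitive plays in the plan.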
Cont..
The query-execution engine takes a
query-evaluation plan, executes that plan, and returns the
answers to the query.
The different evaluation plans for a given query can have
different costs.
We do not expect users to write their queries in a way that
suggests the most efficient evaluation plan.
Rather, it is the responsibility of the system to construct a
query-evaluation plan that minimizes the cost of query evaluation.
Once the query plan is chosen, the query is evaluated with that
plan, and the result of the query is output.
Cont..
• The sequence of steps already described for processing a query is representative; not all databases exactly follow those steps.
• For instance, instead of using the relational-algebra representation, several databases use an annotated parse-tree representation based on the structure of the given SQL query.
• However, the concepts that we describe here form the basis of query processing in databases.
Cont..
In order to optimize a query, a query optimizer
must know the cost of each operation.
Although the exact cost is hard to compute, since
it depends on many parameters such as actual
memory available to the operation, it is possible
to get a rough estimate of execution cost for
each operation.
Measures of Query Cost

• The cost of query evaluation can be measured in terms of a number of different resources, including disk accesses, CPU time to execute a query, and, in a distributed or parallel database system, the cost of communication.
• The response time for a query-evaluation plan (that is, the clock time required to execute the plan), assuming no other activity is going on in the computer, would account for all these costs, and could be used as a good measure of the cost of the plan.
Cont..
In large database systems, however, disk accesses (which
we measure as the number of transfers of blocks from
disk) are usually the most important cost, since disk
accesses are slow compared to in-memory operations.
Moreover, CPU speeds have been improving much faster
than have disk speeds. Thus, it is likely that the time spent
in disk activity will continue to dominate the total time to
execute a query.
Finally, estimating the CPU time is relatively hard,
compared to estimating the disk-access cost.
Therefore, most people consider the disk-access cost a
reasonable measure of the cost of a query-evaluation
plan.
Cont..
• We use the number of block transfers from disk as a measure of the actual cost.
• To simplify our computation of disk-access cost, we assume that all transfers of blocks have the same cost.
• This assumption ignores the variance arising from rotational latency (waiting for the desired data to spin under the read–write head) and seek time (the time that it takes to move the head over the desired track or cylinder).
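Under this simplified measure, the cost of a plan is just a count of block transfers. The sketch below works through one example; the relation size, tuples per block, index height, and the assumption that matching tuples sit in consecutive blocks are all hypothetical figures chosen for illustration.

```python
import math


def blocks_for_relation(num_tuples, tuples_per_block):
    """Number of disk blocks needed to hold the relation."""
    return math.ceil(num_tuples / tuples_per_block)


def linear_scan_cost(num_tuples, tuples_per_block):
    """Every block of the relation is transferred once."""
    return blocks_for_relation(num_tuples, tuples_per_block)


def clustered_index_cost(index_height, matching_tuples, tuples_per_block):
    """One transfer per index level, plus the consecutive blocks holding matches."""
    return index_height + math.ceil(matching_tuples / tuples_per_block)


# Assumed figures: 10,000 tuples, 20 tuples per block, index of height 3,
# and 200 matching tuples stored contiguously.
scan_cost = linear_scan_cost(10_000, 20)        # 500 block transfers
index_cost = clustered_index_cost(3, 200, 20)   # 3 + 10 = 13 block transfers
```

With these assumed numbers, the index-based plan needs 13 block transfers against 500 for the scan, which is the kind of gap the cost measure is meant to expose.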
Query optimization
Cont..
• Query optimization is the process of selecting the most efficient query-evaluation plan from among the many strategies usually possible for processing a given query, especially if the query is complex.
• We do not expect users to write their queries so that they can be processed efficiently.
• Rather, we expect the system to construct a query-evaluation plan that minimizes the cost of query evaluation.
• This is where query optimization comes into play.
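At its simplest, selecting a plan means comparing estimated costs and keeping the cheapest. The sketch below assumes the candidate plans and their estimated costs (in block transfers) are already known; the plan names and numbers are hypothetical.

```python
def choose_plan(estimated_costs):
    """Return the name of the plan with the smallest estimated cost."""
    if not estimated_costs:
        raise ValueError("no candidate plans to choose from")
    return min(estimated_costs, key=estimated_costs.get)


# Hypothetical candidates with estimated costs in block transfers.
candidate_plans = {
    "linear scan of account": 500,
    "selection via index 1": 13,
}
best = choose_plan(candidate_plans)
```

A real optimizer also has to generate the candidates and estimate their costs; this sketch covers only the final comparison.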
Cont..
• One aspect of optimization occurs at the relational-algebra level, where the system attempts to find an expression that is equivalent to the given expression, but more efficient to execute.
• Another aspect is selecting a detailed strategy for processing the query, such as choosing the algorithm to use for executing an operation, choosing the specific indices to use, and so on.
• The difference in cost (in terms of evaluation time) between a good strategy and a bad strategy is often substantial, and may be several orders of magnitude.
Cont..
• Hence, it is worthwhile for the system to spend a substantial amount of time on the selection of a good strategy for processing a query, even if the query is executed only once.
Cont..
• This expression constructs a large intermediate relation, branch ⋈ account ⋈ depositor.
• However, we are interested in only a few tuples of this relation (those pertaining to branches located in Brooklyn), and in only one of the six attributes of this relation.
• Since we are concerned with only those tuples in the branch relation that pertain to branches located in Brooklyn, we do not need to consider those tuples that do not have branch-city = “Brooklyn”. By reducing the number of tuples of the branch relation that we need to access, we reduce the size of the intermediate result.
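The effect of selecting before joining can be checked on a small example. The data below is hypothetical, and `natural_join` is a naive nested-loop join written only to make the intermediate sizes visible.

```python
def natural_join(left, right):
    """Join two lists of dicts on all attribute names they share."""
    result = []
    for lt in left:
        for rt in right:
            shared = lt.keys() & rt.keys()
            if all(lt[a] == rt[a] for a in shared):
                result.append({**lt, **rt})
    return result


# Hypothetical sample data for the three relations.
branch = [
    {"branch_name": "Downtown", "branch_city": "Brooklyn", "assets": 9000},
    {"branch_name": "Perryridge", "branch_city": "Horseneck", "assets": 1700},
    {"branch_name": "Mianus", "branch_city": "Horseneck", "assets": 400},
]
account = [
    {"account_number": "A-101", "branch_name": "Downtown", "balance": 500},
    {"account_number": "A-102", "branch_name": "Perryridge", "balance": 400},
    {"account_number": "A-215", "branch_name": "Mianus", "balance": 700},
]
depositor = [
    {"customer_name": "Johnson", "account_number": "A-101"},
    {"customer_name": "Hayes", "account_number": "A-102"},
    {"customer_name": "Smith", "account_number": "A-215"},
]

# Join everything first, then select the Brooklyn tuples.
full_join = natural_join(branch, natural_join(account, depositor))
late = [t for t in full_join if t["branch_city"] == "Brooklyn"]

# Select the Brooklyn branches first, then join.
brooklyn = [b for b in branch if b["branch_city"] == "Brooklyn"]
early_join = natural_join(brooklyn, natural_join(account, depositor))
```

With this sample data the full join holds three six-attribute tuples before the selection is applied, while selecting first leaves an intermediate result of one tuple; both orders produce the same final answer.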
Cont..
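• Our query is now represented by a relational-algebra expression that applies the selection to branch before the join: σ branch-city = “Brooklyn” (branch) ⋈ (account ⋈ depositor), projected onto the one attribute of interest.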
Cost-based optimization
Heuristic Optimization

• A drawback of cost-based optimization is the cost of optimization itself. Although the cost of query processing can be reduced by clever optimizations, cost-based optimization is still expensive.
• Hence, many systems use heuristics to reduce the number of choices that must be made in a cost-based fashion.
• Some systems even choose to use only heuristics, and do not use cost-based optimization at all.
• An example of a heuristic rule is the following rule for transforming relational-algebra queries:
  - Perform selection operations as early as possible.
• A heuristic optimizer would use this rule without finding out whether the cost is reduced by this transformation.
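The rule "perform selection operations as early as possible" can be sketched as a rewrite on an expression tree. The nested-tuple representation, the `SCHEMAS` dictionary, and the function names are assumptions made for illustration; note that the rewrite never consults a cost estimate, which is what makes it a heuristic.

```python
# Hypothetical schemas (attribute names per relation), assumed for illustration.
SCHEMAS = {
    "branch": {"branch_name", "branch_city", "assets"},
    "account": {"account_number", "branch_name", "balance"},
}


def attributes(expr):
    """Attributes produced by an expression tree."""
    kind = expr[0]
    if kind == "relation":
        return SCHEMAS[expr[1]]
    if kind == "select":
        return attributes(expr[3])
    if kind == "join":
        return attributes(expr[1]) | attributes(expr[2])
    raise ValueError(f"unknown node: {kind}")


def push_selection(expr):
    """Move a selection below a join when one input supplies its attribute."""
    if expr[0] == "select" and expr[3][0] == "join":
        _, attr, value, (_, left, right) = expr
        if attr in attributes(left):
            return ("join", push_selection(("select", attr, value, left)), right)
        if attr in attributes(right):
            return ("join", left, push_selection(("select", attr, value, right)))
    return expr


# select branch_city = "Brooklyn" applied on top of branch join account
query = (
    "select", "branch_city", "Brooklyn",
    ("join", ("relation", "branch"), ("relation", "account")),
)
rewritten = push_selection(query)
```

After the rewrite, the selection sits directly on `branch` and the join receives the already filtered input.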
Thank You!
