Chapter 13: Query Processing

The document discusses various aspects of query processing, including:
1. Query processing involves translating queries into expressions that can be evaluated at the physical level, including query optimization and evaluation.
2. Query optimization involves choosing the most efficient evaluation plan by estimating the cost of alternative plans using statistical information about relations.
3. Query evaluation executes the optimized query plan and returns the results.

Uploaded by

krishna
Chapter 13: Query Processing

Query Processing
• What is Query Processing
• Measures of Query Cost
• Selection Operation
• Sorting
• Join Operation
What is Query Processing?
• Query processing: Activities involved in
extracting data from a database.
– Translation of queries in high-level DB languages
into expressions that can be used at physical level
of file system.
– Includes query optimization and query evaluation.
• Three basic steps:
1. Parsing and Translation
2. Optimization
3. Evaluation
Parsing and translation
• Translate the query into its internal form.
– This is then translated into relational algebra.
• Parser checks syntax, verifies relations.
• A relational algebra expression may have many
equivalent expressions
– E.g., $\sigma_{\text{balance}<2500}(\Pi_{\text{balance}}(\text{account}))$ is
equivalent to
$\Pi_{\text{balance}}(\sigma_{\text{balance}<2500}(\text{account}))$
Parsing and translation (cont.)
• Each relational algebra operation can be
evaluated using one of several different
algorithms
• Correspondingly, a relational-algebra
expression can be evaluated in many ways.
• Evaluation-plan: Annotated expression
specifying detailed evaluation strategy.
– e.g., can use an index on balance to find
accounts with balance < 2500,
– or can perform complete relation scan and
discard accounts with balance ≥ 2500
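The two strategies in the evaluation-plan example can be sketched in Python. This is only an illustration under stated assumptions: the `accounts` data, the sorted list `balance_index` standing in for an index on balance, and both function names are hypothetical and not taken from the slides.

```python
from bisect import bisect_left

# Hypothetical account relation: (account_number, balance) tuples.
accounts = [("A-101", 500), ("A-102", 3000), ("A-103", 2499), ("A-104", 2500)]

# Assumed stand-in for an index on balance: (balance, position) pairs, sorted.
balance_index = sorted((bal, pos) for pos, (_, bal) in enumerate(accounts))

def scan_plan(limit):
    """Plan 1: read every tuple and discard those with balance >= limit."""
    return [acc for acc in accounts if acc[1] < limit]

def index_plan(limit):
    """Plan 2: use the index to visit only the qualifying entries."""
    # Every index entry before this position has balance < limit.
    end = bisect_left(balance_index, (limit, -1))
    return [accounts[pos] for _, pos in balance_index[:end]]
```

Both plans return the same tuples; they differ only in how much of the relation they have to touch, which is what the optimizer's cost estimate tries to capture.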
Query Optimization
• Alternative ways of evaluating a given query
– Equivalent expressions
– Different algorithms for each operation
Query Optimization
• An evaluation plan defines exactly what algorithm is used for each
operation, and how the execution of the operations is coordinated.
Query Optimization
• Amongst all equivalent evaluation plans
choose the one with lowest cost.
– Cost is estimated using statistical information
from the database catalog
• e.g. number of tuples in each relation, size of
tuples, etc.
• How to measure query costs
• How to optimize queries, that is, how to find an
evaluation plan with lowest estimated cost
Query Optimization
• Estimation of plan cost based on:
– Statistical information about relations.
Examples:
• number of tuples, number of distinct values for
an attribute
– Statistics estimation for intermediate results
• to compute cost of complex expressions
– Cost formulae for algorithms, computed
using statistics
Query Optimization
• Cost difference between evaluation
plans for a query can be enormous
– E.g. seconds vs. days in some cases
• Steps in cost-based query optimization
– Generate logically equivalent expressions
using equivalence rules
– Annotate resultant expressions to get
alternative query plans
– Choose the cheapest plan based on
estimated cost
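The last step, choosing the cheapest plan, can be sketched as follows. The cost formulas here are deliberately crude assumptions made for illustration (one block read per block for a scan, index height plus one block read per matching tuple for an index lookup); the statistics dictionary and all names are hypothetical.

```python
# Hypothetical catalog statistics for one relation.
stats = {"num_tuples": 10_000, "tuples_per_block": 20, "matching_tuples": 100}

def cost_full_scan(s):
    # Assumed cost model: read every block of the relation once.
    return s["num_tuples"] // s["tuples_per_block"]

def cost_index_lookup(s, index_height=3):
    # Assumed cost model: walk down the index, then one block read per match.
    return index_height + s["matching_tuples"]

def choose_plan(s):
    """Return the name of the cheapest candidate plan and all estimates."""
    candidates = {
        "full_scan": cost_full_scan(s),
        "index_lookup": cost_index_lookup(s),
    }
    best = min(candidates, key=candidates.get)
    return best, candidates
```

With these numbers the index lookup is estimated at 103 block reads against 500 for the scan. If the selection matched 5,000 tuples instead, the same model would prefer the full scan, which is why the estimate depends on the statistics.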
Evaluation
• The query-execution engine takes a query-
evaluation plan, executes that plan, and
returns the answers to the query.
• Parsed execution plans for previously
executed SQL statements are stored in the
shared pool (a portion of memory, or
buffer).
– If a new SQL statement (query) is exactly the same
string as one already in the shared pool, there is no need to call
the optimizer and recalculate the execution plan for that
statement.
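The exact-string matching described above behaves like a dictionary keyed on the SQL text. The sketch below is a simplified model, not the actual shared-pool implementation of any database; `PlanCache` and the `optimizer` callable are hypothetical names.

```python
class PlanCache:
    """Toy model of a plan cache keyed on the exact SQL string."""

    def __init__(self, optimizer):
        self.optimizer = optimizer   # callable: sql text -> execution plan
        self.plans = {}
        self.misses = 0              # how many times the optimizer was called

    def get_plan(self, sql):
        # Lookup uses the raw text, so any difference in the string
        # (case, spacing, literals) counts as a different statement.
        if sql not in self.plans:
            self.misses += 1
            self.plans[sql] = self.optimizer(sql)
        return self.plans[sql]
```

In this model `"SELECT * FROM account"` and `"select * from account"` are cached separately, because the strings are not identical.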
Transformation of Relational
Expressions
• Two relational algebra expressions are said
to be equivalent if the two expressions
generate the same set of tuples on every
legal database instance
– Note: order of tuples is irrelevant
• An equivalence rule says that expressions
of two forms are equivalent
– Can replace expression of first form by second,
or vice versa
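As a small illustration of what "equivalent" means, the sketch below evaluates a conjunctive selection, the same selection split into a cascade, and the cascade in the opposite order on one tiny instance. The relation `E`, the predicates, and the `select` helper are hypothetical. Agreement on a single instance does not prove equivalence, which must hold on every legal instance; it only shows the kind of rewriting involved.

```python
def select(pred, relation):
    """Relational selection over a set of tuples (hypothetical helper)."""
    return {t for t in relation if pred(t)}

# Hypothetical instance: (account_number, branch_name, balance).
E = {
    ("A-101", "Downtown", 500),
    ("A-102", "Perryridge", 400),
    ("A-201", "Brighton", 900),
}

def theta1(t):
    return t[2] > 450            # balance > 450

def theta2(t):
    return t[1] != "Brighton"    # branch_name is not Brighton

conjunctive = select(lambda t: theta1(t) and theta2(t), E)
cascade = select(theta1, select(theta2, E))
swapped = select(theta2, select(theta1, E))
```

Using sets for relations matches the note above that the order of tuples is irrelevant.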
Equivalence Rules
1. Conjunctive selection operations can
be deconstructed into a sequence of
individual selections.
$\sigma_{\theta_1 \wedge \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E))$

2. Selection operations are commutative.
$\sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E))$
Equivalence Rules (Cont.)
3. Only the last in a sequence of
projection operations is needed, the
others can be omitted.
$\Pi_{L_1}(\Pi_{L_2}(\ldots(\Pi_{L_n}(E))\ldots)) = \Pi_{L_1}(E)$

4. Selections can be combined with
Cartesian products and theta joins.
a. $\sigma_{\theta}(E_1 \times E_2) = E_1 \bowtie_{\theta} E_2$
b. $\sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) = E_1 \bowtie_{\theta_1 \wedge \theta_2} E_2$


Equivalence Rules (Cont.)
5. The selection operation distributes over the
theta join operation under the following two
conditions:
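(a) When all the attributes in $\theta_0$ involve only the attributes
of one of the expressions ($E_1$) being joined.
$\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2$

(b) When $\theta_1$ involves only the attributes of $E_1$ and $\theta_2$
involves only the attributes of $E_2$.
$\sigma_{\theta_1 \wedge \theta_2}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_1}(E_1)) \bowtie_{\theta} (\sigma_{\theta_2}(E_2))$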
Transformation Example:
Pushing Selections
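• Query: Find the names of all customers who
have an account at some branch located in
Brooklyn.
$\Pi_{\text{customer\_name}}(\sigma_{\text{branch\_city} = \text{"Brooklyn"}}(\text{branch} \bowtie (\text{account} \bowtie \text{depositor})))$
• Transformation using rule 5a
$\Pi_{\text{customer\_name}}((\sigma_{\text{branch\_city} = \text{"Brooklyn"}}(\text{branch})) \bowtie (\text{account} \bowtie \text{depositor}))$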
• Performing the selection as early as possible
reduces the size of the relation to be joined.
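The effect of pushing the selection down can be checked on a toy instance. The sample rows and the naive `natural_join` helper are assumptions made for illustration; a real engine would not join this way.

```python
def natural_join(r, s):
    """Naive natural join of two lists of dicts on their shared attributes."""
    out = []
    for a in r:
        for b in s:
            shared = a.keys() & b.keys()
            if all(a[k] == b[k] for k in shared):
                out.append({**a, **b})
    return out

# Hypothetical sample data.
branch = [{"branch_name": "Brighton", "branch_city": "Brooklyn"},
          {"branch_name": "Perryridge", "branch_city": "Horseneck"}]
account = [{"account_number": "A-201", "branch_name": "Brighton"},
           {"account_number": "A-102", "branch_name": "Perryridge"}]
depositor = [{"customer_name": "Johnson", "account_number": "A-201"},
             {"customer_name": "Hayes", "account_number": "A-102"}]

def in_brooklyn(t):
    return t["branch_city"] == "Brooklyn"

acc_dep = natural_join(account, depositor)

# Original expression: join everything, then select.
late = {t["customer_name"] for t in natural_join(branch, acc_dep) if in_brooklyn(t)}

# After rule 5a: select on branch first, then join the smaller input.
brooklyn_branches = [b for b in branch if in_brooklyn(b)]
early = {t["customer_name"] for t in natural_join(brooklyn_branches, acc_dep)}
```

Both versions produce the same customer names, but the second feeds only the Brooklyn branches into the join.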
Cost Estimation
• Cost of each operator must be computed
– Need statistics of input relations
• E.g. number of tuples, sizes of tuples
• Inputs can be results of sub-expressions
– Need to estimate statistics of expression
results
– To do so, we require additional statistics
• E.g. number of distinct values for an attribute
Measures of Query Cost
• Cost is generally measured as total
elapsed time for answering query
• Factors that contribute to time cost
– Disk accesses
• How do indexing and hashing affect the number of disk accesses?
– CPU
– Network communication
Measures of Query Cost
• Typically disk access is the predominant cost,
and is also relatively easy to estimate.
• Measured by taking into account
– Number of seeks × average-seek-cost
– Number of blocks read × average-block-read-cost
– Number of blocks written × average-block-write-cost
• Cost to write a block is greater than cost to read a block
– data is read back after being written to ensure that the write
was successful
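The three terms above add up to a simple cost estimate. In the sketch below the default per-operation costs (in microseconds) are made-up placeholder values, chosen only so that a write costs more than a read; the function name is hypothetical.

```python
def disk_cost(seeks, blocks_read, blocks_written,
              avg_seek_us=4000, avg_read_us=100, avg_write_us=200):
    """Estimated disk cost in microseconds.

    The default per-operation costs are illustrative assumptions,
    not measurements of any real device.
    """
    return (seeks * avg_seek_us
            + blocks_read * avg_read_us
            + blocks_written * avg_write_us)
```

For example, one seek followed by 100 sequential block reads is estimated at 4000 + 100 × 100 = 14000 microseconds under these assumed values.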
Selection Operation
• Let's start with a selection query
• File scan – search algorithms that locate
and retrieve records that fulfill a selection
condition.
• Two ways to accomplish this
– Algorithm A1 linear search
– Algorithm A2 binary search
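A minimal sketch of the two file-scan algorithms for an equality condition follows. It models the file as a Python list of dicts, which is an assumption for illustration. Binary search is only applicable when the file is ordered on the attribute being searched.

```python
def linear_search(records, key, value):
    """A1: examine every record; works regardless of file order."""
    return [r for r in records if r[key] == value]

def binary_search(sorted_records, key, value):
    """A2: assumes the records are sorted on `key`; equality condition."""
    lo, hi = 0, len(sorted_records)
    # Find the first position whose key is >= value.
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_records[mid][key] < value:
            lo = mid + 1
        else:
            hi = mid
    # Collect the run of consecutive matching records.
    out = []
    while lo < len(sorted_records) and sorted_records[lo][key] == value:
        out.append(sorted_records[lo])
        lo += 1
    return out
```

Linear search touches every record, while binary search touches roughly log2(n) records plus the matches, at the price of requiring a sorted file.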
Selections Using Indices
• Index scan – search algorithms that
use an index
– selection condition must be on search-key
of index.
• Algorithm A3 (primary index on
candidate key, equality). Retrieve a
single record that satisfies the
corresponding equality condition
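Algorithm A3 can be pictured as a direct lookup through the index instead of a scan. In this sketch a Python dict stands in for the primary index (a real system would typically use a B+-tree or similar structure); the sample records and names are hypothetical.

```python
# Hypothetical account file: (account_number, balance), one record per key.
records = [("A-101", 500), ("A-102", 400), ("A-201", 900)]

# Stand-in for a primary index on the candidate key account_number:
# maps each key value to the position of its record in the file.
primary_index = {acc_no: pos for pos, (acc_no, _) in enumerate(records)}

def index_lookup(acc_no):
    """Return the single record whose key equals acc_no, or None."""
    pos = primary_index.get(acc_no)
    return None if pos is None else records[pos]
```

Because the search key is a candidate key, at most one record can match, so the lookup returns a single record rather than a list.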
Sorting
• Sorting is useful not only to return sorted data
to users but also to facilitate join.
• We may build an index on the relation, and then
use the index to read the relation in sorted
order.
– May lead to one disk block access for each tuple.
• For relations that fit in memory, techniques like
quicksort can be used.
• For relations that don’t fit in memory, external
sort-merge is a good choice.
Example: External Sorting Using
Sort-Merge
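The idea of external sort-merge can be sketched in two phases: create sorted runs that each fit in memory, then merge the runs. The version below keeps the runs in Python lists rather than writing them to disk, and merges all runs in a single pass, so it is a simplified model of the technique; `memory_capacity` and the function name are assumptions.

```python
import heapq

def external_sort(records, memory_capacity):
    """Simplified sort-merge: returns (sorted_records, runs)."""
    # Phase 1: sort chunks of at most memory_capacity records into runs.
    runs = [sorted(records[i:i + memory_capacity])
            for i in range(0, len(records), memory_capacity)]
    # Phase 2: merge the sorted runs, always taking the smallest head element.
    merged = list(heapq.merge(*runs))
    return merged, runs
```

In a real external sort each run would be written to a temporary file, and when there are more runs than available buffer blocks the merge phase would take several passes.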
Join Operation
• Several different algorithms to implement joins
– Nested-loop join
– Block nested-loop join
– Indexed nested-loop join
– Merge-join
– Hash-join
• Choice based on cost estimate
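Two of the listed algorithms, nested-loop join and hash join, can be compared on an equi-join in a few lines. This is an in-memory sketch with hypothetical sample relations; it ignores blocks and buffering, which are what the real cost estimates are about.

```python
from collections import defaultdict

def nested_loop_join(r, s, r_key, s_key):
    """Compare every tuple of r with every tuple of s."""
    return [(a, b) for a in r for b in s if a[r_key] == b[s_key]]

def hash_join(r, s, r_key, s_key):
    """Build a hash table on s, then probe it once per tuple of r."""
    buckets = defaultdict(list)
    for b in s:
        buckets[b[s_key]].append(b)
    return [(a, b) for a in r for b in buckets.get(a[r_key], [])]

# Hypothetical relations: depositor(customer_name, account_number)
# and account(account_number, balance).
depositor = [("Johnson", "A-201"), ("Hayes", "A-102"), ("Smith", "A-999")]
account = [("A-201", 900), ("A-102", 400)]
```

Calling either function with `r_key=1, s_key=0` joins the two relations on account number. The nested loop performs len(r) × len(s) comparisons, while the hash join reads each input once.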
