DBMS_Unit5_Lecture1

Unit 5 of the Database Management System covers query processing and optimization, detailing the steps involved in extracting data from a database, including parsing, optimization, and evaluation. It emphasizes the importance of query cost estimation, which is based on factors such as disk accesses and CPU time, and discusses various algorithms for executing queries efficiently. The document also outlines different methods for selection operations using indices and the impact of these methods on query performance.

Uploaded by

mhjbinisha12
Copyright
© All Rights Reserved

Database Management System

Unit 5: Query Processing and Optimization

Lecture 1
Outline
• Query Processing
• Query Cost Estimation
• Query Operations
Query Processing
• Query processing refers to the range of activities involved in
extracting data from a database
• The activities include
• translation of queries in high-level database languages into expressions that
can be used at the physical level of the file system,
• a variety of query-optimizing transformations,
• and actual evaluation of queries
Query Processing …
• It is a stepwise process: the query is translated into a form that can be
used at the physical level of the file system, then optimized, and finally
executed to get the result
• It requires the basic concepts of relational algebra and file structure
• The actual updating and retrieval of data is performed through various
“low-level” operations.
• Examples of such operations for a relational DBMS can be relational algebra
operations such as project, join, select, Cartesian product, etc
Basic Steps in Query Processing [1]
1. Parsing and translation
2. Optimization
3. Evaluation
Basic Steps in Query Processing [2]
• Parsing and translation
• Translate the query into its internal form and then into relational algebra
• Parser checks syntax and verifies relations
• Optimization
• Amongst all equivalent evaluation plans choose the one with lowest cost
• Cost is estimated using statistical information from the database catalog, such as
the number of tuples in each relation, size of tuples, etc.
• Evaluation
• The query-execution engine takes a query-evaluation plan, executes that plan, and
returns the answers to the query
Evaluation Plans [1]
• A relational algebra expression may have many equivalent expressions
• Consider a query
select salary
from instructor
where salary < 75000
This query can be translated into either of the following relational-algebra
expressions:
• E.g., $\sigma_{salary<75000}(\Pi_{salary}(instructor))$ is equivalent to
$\Pi_{salary}(\sigma_{salary<75000}(instructor))$
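The equivalence of the two expressions can be checked on a small example. The sketch below is only an illustration: the instructor rows and the helper names select and project are made up here, and projection keeps duplicates (as the SQL query does) rather than removing them.

```python
# Toy instructor relation; the rows are made-up illustration data.
instructor = [
    {"name": "Ada", "dept": "CS", "salary": 62000},
    {"name": "Ben", "dept": "Physics", "salary": 91000},
    {"name": "Cai", "dept": "CS", "salary": 74000},
    {"name": "Dee", "dept": "History", "salary": 75000},
]

def select(rows, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def project(rows, attrs):
    """Projection onto attrs (duplicates kept, as in SQL)."""
    return [{a: row[a] for a in attrs} for row in rows]

def below_75000(row):
    return row["salary"] < 75000

# Project first, then select.
plan_a = select(project(instructor, ["salary"]), below_75000)
# Select first, then project.
plan_b = project(select(instructor, below_75000), ["salary"])

print(plan_a == plan_b)
print(plan_b)
```

Both orders give the same answer here because the selection condition only uses the attribute that the projection keeps.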
Evaluation Plans [2]
• Each relational algebra operation can be evaluated using one of several different
algorithms
• For example, to implement the preceding selection, every tuple in instructor
can be searched to find tuples with salary less than 75000
• If a B+ tree index is available on the attribute salary, the index can be used
instead to locate the tuples
• Correspondingly, a relational-algebra expression can be evaluated in many
ways
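The two ways of evaluating the selection can be sketched as follows. This is a simplified illustration, not a B+ tree: a sorted Python list searched with bisect stands in for an ordered index on salary, and the tuples are made-up data.

```python
import bisect

# Hypothetical (salary, name) pairs standing in for instructor tuples.
instructor = [(91000, "Ben"), (62000, "Ada"), (75000, "Dee"), (74000, "Cai")]

def select_by_scan(rows, limit):
    """Examine every tuple and keep those with salary < limit."""
    return sorted(row for row in rows if row[0] < limit)

# A sorted copy plays the role of an ordered index on salary.
index = sorted(instructor)
index_keys = [salary for salary, _ in index]

def select_by_index(limit):
    """Binary-search for the first salary >= limit; all earlier entries qualify."""
    cut = bisect.bisect_left(index_keys, limit)
    return index[:cut]

print(select_by_scan(instructor, 75000))
print(select_by_index(75000))
```

The scan touches every tuple, while the index version only needs a logarithmic search to find where the qualifying tuples end.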
Evaluation Plans [3]
• Fully specifying how to evaluate a query requires both
• specifying the relational algebra expression and
• annotating it with instructions specifying how to evaluate each operation
• Annotations may state the algorithm to be used for a specific
operation or the particular index or indices to use
• A relational-algebra operation annotated with instructions on how to
evaluate it is called an evaluation primitive
Evaluation Plans [4]
• A sequence of primitive operations that can be used
to evaluate a query is a query-execution plan or
query-evaluation plan
• Annotated expression specifying detailed evaluation
strategy
• E.g.:
• Use an index on salary to find instructors with
salary < 75000,
• Or perform complete relation scan and discard
instructors with salary ≥ 75000
Fig. A Query Evaluation plan
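One possible way to picture an evaluation plan as a data structure is a tree of operations, each carrying its annotation. The sketch below is an assumption made for illustration: the PlanNode class and the annotation strings are hypothetical and do not reproduce the figure.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PlanNode:
    """One evaluation primitive: an operation plus how to evaluate it."""
    operation: str
    annotation: str
    child: Optional["PlanNode"] = None

    def describe(self, depth=0):
        """Render the plan top-down, indenting each input below its consumer."""
        line = "  " * depth + f"{self.operation} [{self.annotation}]"
        if self.child is None:
            return line
        return line + "\n" + self.child.describe(depth + 1)

# Plan 1: use an index on salary for the selection.
index_plan = PlanNode(
    "project salary", "applied to each selected tuple",
    PlanNode("select salary < 75000", "use index on salary",
             PlanNode("instructor", "stored relation")),
)

# Plan 2: same expression, but the selection is a complete relation scan.
scan_plan = PlanNode(
    "project salary", "applied to each selected tuple",
    PlanNode("select salary < 75000", "complete relation scan",
             PlanNode("instructor", "stored relation")),
)

print(index_plan.describe())
print(scan_plan.describe())
```

The two plans share the same relational-algebra expression and differ only in the annotation on the selection, which is exactly what distinguishes one evaluation plan from another.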
Basic Steps: Optimization
• Query Optimization:
• Amongst all equivalent evaluation plans, choose the one with lowest cost
• Cost is estimated using statistical information from the database catalog
• e.g. number of tuples in each relation, size of tuples, etc
• To Learn
• To measure query costs
• Algorithms for evaluating relational algebra operations
• To combine algorithms for individual operations in order to evaluate a complete expression
• To optimize queries: how to find an evaluation plan with lowest estimated cost
Measures of Query Cost
• Cost is generally measured as total elapsed time for answering query
• Many factors contribute to time cost
• disk accesses, CPU, or even network communication
• Typically disk access is the predominant cost, and is also relatively easy to
estimate.
• Measured by taking into account
• Number of seeks * average-seek-cost
• Number of blocks read * average-block-read-cost
• Number of blocks written * average-block-write-cost
• Cost to write a block is greater than cost to read a block
• data is read back after being written to ensure that the write was successful
Measures of Query Cost (Cont.)
• For simplicity we just use the number of block transfers from disk and the
number of seeks as the cost measures
• tT – time to transfer one block
• tS – time for one seek
• Cost for b block transfers plus S seeks
b * tT + S * tS
• We ignore CPU costs for simplicity
• Real systems do take CPU cost into account
• Cost of writing output to disk is not included in the cost formula
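As a rough illustration of how this formula lets an optimizer compare plans, the sketch below computes b * tT + S * tS for two hypothetical plans and picks the cheaper one. The plan names, the block and seek counts, and the timing values are assumptions chosen for illustration.

```python
def io_cost(block_transfers, seeks, t_T, t_S):
    """Estimated I/O time: b * tT + S * tS (CPU cost ignored)."""
    return block_transfers * t_T + seeks * t_S

# Assumed timings in milliseconds (illustrative only).
T_T, T_S = 0.1, 4.0

# Hypothetical candidate plans: (block transfers b, seeks S).
plans = {
    "full relation scan": (400, 1),
    "index lookup": (4, 4),
}

costs = {name: io_cost(b, s, T_T, T_S) for name, (b, s) in plans.items()}
best_plan = min(costs, key=costs.get)

for name, cost in costs.items():
    print(f"{name}: {cost:.1f} ms")
print("cheapest:", best_plan)
```

With these numbers the index lookup pays for more seeks per block but transfers far fewer blocks, so it comes out cheaper.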
Measures of Query Cost (Cont.)
• tT – time to transfer one block
• tS – time for one seek
• tS and tT depend on where data is stored;
• with 4 KB blocks:
• High end magnetic disk: tS = 4 msec and tT =0.1 msec
• SSD: tS = 20-90 microsec and tT = 2-10 microsec for 4KB
• Costs of algorithms depend on the size of the buffer in main memory, as having
more memory reduces need for disk access
• Thus memory size should be a parameter while estimating cost; often use worst case
estimates
• The cost estimate of algorithm A is referred to as EA
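To get a feel for these numbers, the sketch below evaluates the same two access patterns on both kinds of storage. The timings come from the figures above, with the upper ends of the SSD ranges used as an assumption; the block and seek counts are hypothetical.

```python
# Times in milliseconds for 4 KB blocks.
# SSD values use the upper ends of the quoted ranges (an assumption).
DEVICES = {
    "magnetic disk": {"t_S": 4.0, "t_T": 0.1},
    "ssd": {"t_S": 0.090, "t_T": 0.010},
}

def io_cost_ms(device, block_transfers, seeks):
    """b * tT + S * tS for the named device."""
    timing = DEVICES[device]
    return block_transfers * timing["t_T"] + seeks * timing["t_S"]

for name in DEVICES:
    # 1000 consecutive blocks: a single seek.
    sequential = io_cost_ms(name, block_transfers=1000, seeks=1)
    # 1000 scattered blocks: one seek per block.
    scattered = io_cost_ms(name, block_transfers=1000, seeks=1000)
    print(f"{name}: sequential {sequential:.2f} ms, scattered {scattered:.2f} ms")
```

On the magnetic disk the seek term dominates as soon as the blocks are scattered, which is why the number of seeks is tracked separately from the number of block transfers.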
Catalog Information for Cost Estimation
• nr : number of tuples in relation r.
• br : number of blocks containing tuples of r.
• sr : size of a tuple of r in bytes.
• fr : blocking factor of r — i.e., the number of tuples of r that fit into one block.
• V(A, r): number of distinct values that appear in r for attribute A;
same as the size of $\Pi_A(r)$.
• SC(A, r): selection cardinality of attribute A of relation r; average number
of records that satisfy equality on A.
• If tuples of r are stored together physically in a file, then: br = ⌈nr / fr⌉ (rounded up, since a partly filled block still counts as a block)
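These catalog quantities can be derived from one another, as the sketch below shows. The block size, tuple size, and tuple counts are hypothetical, and the formula for SC(A, r) assumes values of A are uniformly distributed, which is a common simplification rather than something stated above.

```python
import math

def blocking_factor(block_size_bytes, tuple_size_bytes):
    """f_r: whole tuples per block, assuming no tuple spans two blocks."""
    return block_size_bytes // tuple_size_bytes

def blocks_needed(n_r, f_r):
    """b_r: blocks holding n_r tuples stored together, rounded up."""
    return math.ceil(n_r / f_r)

def selection_cardinality_uniform(n_r, distinct_values):
    """SC(A, r) estimated as n_r / V(A, r), assuming a uniform distribution."""
    return n_r / distinct_values

# Hypothetical relation: 10,000 tuples of 100 bytes in 4 KB blocks,
# with 50 distinct values of attribute A.
f_r = blocking_factor(4096, 100)
b_r = blocks_needed(10_000, f_r)
sc = selection_cardinality_uniform(10_000, 50)

print(f_r, b_r, sc)
```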
Selection Operation
• File scan
• search algorithms that locate and retrieve records that fulfill a selection condition.
• Algorithm A1 (linear search)
• Scan each file block and test all records to see whether they satisfy the selection condition
• Cost estimate = br block transfers + 1 seek
Cost = br * tT + tS
• If selection is on a key attribute, can stop on finding record
• Average case, cost = (br /2) block transfers + 1 seek
Cost = (br/2) * tT + tS
• Linear search can be applied regardless of selection condition or ordering of
records in the file, or availability of indices
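A minimal sketch of A1 follows, with the file modeled as a list of blocks and each block as a list of records. The data and the function names are hypothetical; the on_key flag models the early stop that is possible when the selection is on a key attribute.

```python
def linear_search(blocks, predicate, on_key=False):
    """A1: scan blocks in file order and test every record.

    Returns (matching records, number of blocks read). If on_key is True
    the condition is assumed to match at most one record, so the scan
    stops at the first hit.
    """
    matches, blocks_read = [], 0
    for block in blocks:
        blocks_read += 1
        for record in block:
            if predicate(record):
                matches.append(record)
                if on_key:
                    return matches, blocks_read
    return matches, blocks_read

def a1_cost(b_r, t_T, t_S):
    """Worst-case estimate: b_r block transfers plus one seek."""
    return b_r * t_T + t_S

# Hypothetical file of 4 blocks holding 2 integer keys each.
file_blocks = [[11, 12], [13, 14], [15, 16], [17, 18]]

full = linear_search(file_blocks, lambda k: k == 13)
early = linear_search(file_blocks, lambda k: k == 13, on_key=True)
print(full, early)
```

Without the key assumption every block must be read, because later blocks could still hold matching records.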
Selections Using Indices
• Index scan – search algorithms that use an index
• selection condition must be on search-key of index.
• A2 (primary index, equality on key). Retrieve a single record that satisfies the corresponding
equality condition
• Cost = (hi + 1) * (tT + tS)
• Where, hi denotes the height of the index. Index lookup traverses the height of the tree plus
one I/O to fetch the record
• Each of the I/O operations requires a seek and a block transfer
• A3 (primary index, equality on nonkey) Retrieve multiple records.
• Records will be on consecutive blocks
• Let b = number of blocks containing matching records
• Cost = hi * (tT + tS) + tS + tT * b
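The two formulas can be evaluated side by side. In the sketch below the index height, the number of matching blocks, and the timing values (in milliseconds) are assumed for illustration.

```python
def a2_cost(h_i, t_T, t_S):
    """A2: h_i index levels plus one record fetch, each a seek and a transfer."""
    return (h_i + 1) * (t_T + t_S)

def a3_cost(h_i, b, t_T, t_S):
    """A3: index traversal, then one seek and b consecutive block transfers."""
    return h_i * (t_T + t_S) + t_S + t_T * b

# Assumed values: index of height 3, tT = 0.1 ms, tS = 4 ms.
H_I, T_T, T_S = 3, 0.1, 4.0

print(f"A2: {a2_cost(H_I, T_T, T_S):.1f} ms")
print(f"A3 with b = 10: {a3_cost(H_I, 10, T_T, T_S):.1f} ms")
```

When all matching records fit in one block (b = 1), the A3 formula reduces to the A2 formula, which is a quick consistency check on both.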
Selections Using Indices ..
• A4 (secondary index, equality on key or nonkey).
• Retrieve a single record if the search-key is a candidate key
• Cost = (hi + 1) * (tT + tS)
• This case is similar to primary index
• Retrieve multiple records if search-key is not a candidate key
• each of n matching records may be on a different block
• Cost = (hi + n) * (tT + tS)
• Can be very expensive!
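How expensive this gets can be seen by comparing the nonkey formula with a plain linear scan (A1). The sketch below uses assumed values for the index height, relation size, and timings; it illustrates the formulas and is not a measurement.

```python
def a4_cost(h_i, n, t_T, t_S):
    """A4, nonkey case: each of the n matching records may need its own I/O."""
    return (h_i + n) * (t_T + t_S)

def linear_scan_cost(b_r, t_T, t_S):
    """A1: read all b_r blocks after a single seek."""
    return b_r * t_T + t_S

# Assumed values: tT = 0.1 ms, tS = 4 ms, index height 3, 1000-block relation.
T_T, T_S = 0.1, 4.0
H_I, B_R = 3, 1000

scan = linear_scan_cost(B_R, T_T, T_S)
for n in (1, 10, 100, 500):
    indexed = a4_cost(H_I, n, T_T, T_S)
    winner = "index" if indexed < scan else "linear scan"
    print(f"n = {n}: index {indexed:.1f} ms, scan {scan:.1f} ms, cheaper: {winner}")
```

With these assumptions the secondary index only wins while the number of matching records stays small; beyond that, reading the whole file sequentially is cheaper than paying a seek per record.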
Selections Involving Comparisons
• Can implement selections of the form $\sigma_{A \le V}(r)$ or $\sigma_{A \ge V}(r)$ by using
• a linear file scan,
• or by using indices in the following ways:
• A5 (primary index, comparison). (Relation is sorted on A)
• For $\sigma_{A \ge V}(r)$ use index to find first tuple ≥ v and scan relation sequentially from there
• For $\sigma_{A \le V}(r)$ just scan relation sequentially till first tuple > v; do not use index
• Identical to the case of A3, equality on nonkey
• A6 (secondary index, comparison).
• For $\sigma_{A \ge V}(r)$ use index to find first index entry ≥ v and scan index sequentially from
there, to find pointers to records.
• For $\sigma_{A \le V}(r)$ just scan leaf pages of index finding pointers to records, till first entry > v
• Identical to the case of A4, equality on nonkey
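The A5 strategy can be sketched on a relation that is already sorted on A. This is a simplified stand-in: a sorted Python list represents the sorted file, bisect plays the role of the index lookup, and the values are made up.

```python
import bisect
from itertools import takewhile

# Hypothetical values of attribute A, stored in sorted order.
sorted_a = [10, 20, 20, 30, 40]

def select_ge(values, v):
    """A >= v: locate the first value >= v, then read sequentially from there."""
    start = bisect.bisect_left(values, v)
    return values[start:]

def select_le(values, v):
    """A <= v: read from the start of the file until the first value > v."""
    return list(takewhile(lambda a: a <= v, values))

print(select_ge(sorted_a, 20))
print(select_le(sorted_a, 20))
```

The second function never consults the index: because the file is sorted, the qualifying tuples are simply a prefix of it.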
Implementation of Complex Selections
• Conjunction: $\sigma_{\theta_1 \wedge \theta_2 \wedge \dots \wedge \theta_n}(r)$
• A7 (conjunctive selection using one index).
• Select a combination of $\theta_i$ and algorithms A1 through A7 that results in the least cost for
$\sigma_{\theta_i}(r)$.
• Test other conditions on tuple after fetching it into memory buffer.
• A8 (conjunctive selection using composite index).
• Use appropriate composite (multiple-key) index if available.
• A9 (conjunctive selection by intersection of identifiers).
• Requires indices with record pointers.
• Use corresponding index for each condition, and take intersection of all the obtained sets of
record pointers.
• Then fetch records from file
• If some conditions do not have appropriate indices, apply test in memory.
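A9 can be sketched with Python sets standing in for the sets of record pointers. Everything below is hypothetical illustration data: record ids serve as pointers, dept and city have indices, and salary has none, so its condition is tested in memory after the fetch.

```python
# Hypothetical file: record id (the "pointer") -> record.
records = {
    1: {"dept": "CS", "city": "Pune", "salary": 62000},
    2: {"dept": "Physics", "city": "Pune", "salary": 91000},
    3: {"dept": "CS", "city": "Delhi", "salary": 74000},
    4: {"dept": "CS", "city": "Pune", "salary": 80000},
}

def build_index(rows, attr):
    """Toy secondary index: attribute value -> set of record ids."""
    index = {}
    for rid, row in rows.items():
        index.setdefault(row[attr], set()).add(rid)
    return index

dept_index = build_index(records, "dept")
city_index = build_index(records, "city")

# dept = 'CS' AND city = 'Pune': intersect the pointer sets from both indices.
candidates = dept_index.get("CS", set()) & city_index.get("Pune", set())

# salary > 70000 has no index: fetch the candidates and test in memory.
result = [rid for rid in sorted(candidates) if records[rid]["salary"] > 70000]

print(sorted(candidates), result)
```

Only the records that survive the intersection are fetched from the file, which is the point of intersecting pointers first.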
Algorithms for Complex Selections
• Disjunction: $\sigma_{\theta_1 \vee \theta_2 \vee \dots \vee \theta_n}(r)$.
• A10 (disjunctive selection by union of identifiers).
• Applicable if all conditions have available indices.
• Otherwise use linear scan.
• Use corresponding index for each condition, and take union of all the obtained sets of record
pointers.
• Then fetch records from file
• Negation: $\sigma_{\neg\theta}(r)$
• Use linear scan on file
• If very few records satisfy $\neg\theta$, and an index is applicable to $\theta$
• Find satisfying records using index and fetch from file
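The disjunctive case and the default treatment of negation can be sketched in the same style. The records and index helper below are hypothetical; sets of record ids stand in for record pointers, and the negation is evaluated with a plain linear scan.

```python
# Hypothetical file: record id (the "pointer") -> record.
records = {
    1: {"dept": "CS", "city": "Pune"},
    2: {"dept": "Physics", "city": "Pune"},
    3: {"dept": "CS", "city": "Delhi"},
    4: {"dept": "CS", "city": "Pune"},
}

def build_index(rows, attr):
    """Toy secondary index: attribute value -> set of record ids."""
    index = {}
    for rid, row in rows.items():
        index.setdefault(row[attr], set()).add(rid)
    return index

dept_index = build_index(records, "dept")
city_index = build_index(records, "city")

# A10: dept = 'Physics' OR city = 'Delhi'. Every condition has an index,
# so take the union of the pointer sets and then fetch the records.
union_ids = dept_index.get("Physics", set()) | city_index.get("Delhi", set())
disjunction = [records[rid] for rid in sorted(union_ids)]

# Negation: NOT (dept = 'CS'), evaluated by a linear scan of the file.
negation_ids = [rid for rid, row in records.items() if not row["dept"] == "CS"]

print(sorted(union_ids), negation_ids)
```

If even one of the disjuncts lacked an index, its matches could be anywhere in the file, so the union of pointers would no longer be enough and a linear scan would be needed.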