Lesson 05


ADVANCED DATABASE MANAGEMENT SYSTEMS
ICT3273

Query Processing Part I

Nuwan Laksiri
Department of ICT
Faculty of Technology
University of Ruhuna
Lecture 05
WHAT WE DISCUSS TODAY
• RECAP B+ TREE
• OVERVIEW
• MEASURES OF QUERY COST
• SELECTION OPERATION
• BASIC ALGORITHMS
• SELECTIONS USING INDICES
• SELECTIONS INVOLVING COMPARISONS
• IMPLEMENTATION OF COMPLEX SELECTIONS

NEXT WEEK
• QUERY PROCESSING PART II
• SORTING
• JOIN OPERATION
• OTHER OPERATIONS
• EVALUATION OF EXPRESSIONS
RECAP
• TREE
• M-WAY SEARCH TREE
• B TREE
• B TREE OPERATIONS
• B+ TREE
• INTRODUCTION
• BASICS
• STRUCTURE
• PROPERTIES
• OPERATIONS
• SEARCH
• INSERT
• DELETE
Query Processing
• Query processing includes the translation of high-level queries into
low-level expressions that can be used at the physical level of the file
system, query optimization, and the actual execution of the query to get
the result.
Basic Steps in Query Processing
• The basic steps involved in processing a query are:
• Parsing and translation
• Optimization
• Evaluation
Query Parsing and Translation (Query Compiler)
• Check the syntax (e.g., SQL for a relational DBMS)
• Verify that the relations mentioned in the query exist
• Transform the SQL query into a query plan
represented by a relational algebra expression
(for a relational DBMS)
• A single query can have several equivalent
relational algebra expressions
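As a small illustration of equivalent expressions, the sketch below evaluates one query in two orders: select then project, and project then select. The `instructor` rows, the attribute names, and the salary threshold are made-up sample data, not part of the lecture.

```python
# Hypothetical sample relation: instructor(name, salary).
instructor = [
    {"name": "A", "salary": 60000},
    {"name": "B", "salary": 90000},
    {"name": "C", "salary": 72000},
    {"name": "D", "salary": 60000},
]


def select_then_project(rows, limit=75000):
    """Project salary from the rows that satisfy salary < limit."""
    # A set is used because relational projection removes duplicates.
    return {row["salary"] for row in rows if row["salary"] < limit}


def project_then_select(rows, limit=75000):
    """Project salary first, then keep the values below limit."""
    salaries = {row["salary"] for row in rows}
    return {salary for salary in salaries if salary < limit}


# Both plans produce the same result; an optimizer would pick the cheaper one.
same_result = select_then_project(instructor) == project_then_select(instructor)
```

Both functions answer the same query, which is why the optimizer is free to choose between them on cost alone.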
Query Optimization (Query Optimizer)
• Transform the initial query plan into the best possible query plan
based on the given data set
• Specify how each operation of the query plan is executed
(evaluation primitives),
e.g., which algorithms and indices to use
• The query execution plan is defined by a sequence of
evaluation primitives.
Query Evaluation (Command Processor)
• Evaluate the query execution plan and return the
results
• To optimize a query, the query optimizer must know the cost of each
operation. How is that cost measured?
Measures Of Query Cost
• Cost is generally measured as the total elapsed time for
answering a query
• Many factors contribute to time cost
• Access cost to secondary storage (disk accesses)
• Disk storage cost
• Computation cost (CPU cost)
• Memory usage cost
• Communication cost
Measures Of Query Cost
• Typically, disk access is the predominant cost, and it is also
relatively easy to estimate. It is measured by taking into account:
• Number of seeks * average-seek-cost
• Number of blocks read * average-block-read-cost
• Number of blocks written * average-block-write-cost

• The cost to write a block is greater than the cost to read a block,
because data is read back after being written to ensure that the
write was successful
Measures Of Query Cost
• For simplicity we just use the number of block transfers
from disk and the number of seeks as the cost
measures
• tT – time to transfer one block
• tS – time for one seek

• Cost for b block transfers plus S seeks:
b * tT + S * tS
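The formula above can be evaluated directly. This is a minimal sketch: the function name `io_cost` and the timings of 0.1 ms per block transfer and 4 ms per seek are illustrative assumptions, not figures from the lecture.

```python
def io_cost(block_transfers: int, seeks: int, t_T: float, t_S: float) -> float:
    """Estimated disk time for b block transfers plus S seeks: b*tT + S*tS."""
    return block_transfers * t_T + seeks * t_S


# Hypothetical timings in milliseconds: tT = 0.1, tS = 4.0.
# 100 block transfers and 2 seeks give 100*0.1 + 2*4.0 milliseconds.
cost_ms = io_cost(block_transfers=100, seeks=2, t_T=0.1, t_S=4.0)
```

With seeks far more expensive than transfers, a plan that reads more blocks sequentially can still beat one that reads fewer blocks with many seeks.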
Selection Operation - Basic Algorithms
A1 (Linear Search)
• In a linear search, the system scans each file block
and tests all records to see whether they satisfy the
selection condition.
• For a selection on a key attribute, the system can
terminate the scan if the required record is found,
without looking at the other records of the relation.
• The cost of linear search, in terms of number of I/O
operations, is br in the worst case, where br denotes the number
of blocks containing records from relation r.
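A linear search can be sketched as follows. The file is modelled as a list of blocks, each block a list of records; this layout, the function name, and the `key_attribute` flag are assumptions made for illustration.

```python
def linear_search(blocks, predicate, key_attribute=False):
    """Scan blocks in order; return (matching records, blocks read).

    If key_attribute is True, at most one record can match, so the scan
    stops as soon as that record is found.
    """
    matches = []
    blocks_read = 0
    for block in blocks:
        blocks_read += 1  # one I/O per block fetched
        for record in block:
            if predicate(record):
                matches.append(record)
                if key_attribute:
                    return matches, blocks_read
    return matches, blocks_read


# Three blocks of two records each (records are plain integers here).
blocks = [[1, 2], [3, 4], [5, 6]]
found_early = linear_search(blocks, lambda r: r == 3, key_attribute=True)
found_full = linear_search(blocks, lambda r: r == 3)
```

In this example the key-attribute search reads two blocks and stops, while the general search reads all three.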
Selection Operation - Basic Algorithms
A2 (Binary Search)
• If the file is ordered on an attribute, and the selection
condition is an equality comparison on the attribute, we
can use a binary search to locate records that satisfy the
selection.
• The system performs the binary search on the blocks of the
file.
• The number of blocks that need to be examined to find a
block containing the required records is about ⌈log2(br)⌉, where br
denotes the number of blocks in the file.
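Binary search over blocks can be sketched as below. It assumes every block is a non-empty list of keys and that keys are sorted across the whole file; the function name and sample data are illustrative. The counter shows that the number of blocks examined grows with log2(br) and not with br.

```python
def binary_search_blocks(blocks, key):
    """Return (index of the block holding key, or None; blocks examined)."""
    lo, hi = 0, len(blocks) - 1
    examined = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]
        examined += 1  # one I/O to read the middle block
        if key < block[0]:
            hi = mid - 1          # key can only be in an earlier block
        elif key > block[-1]:
            lo = mid + 1          # key can only be in a later block
        else:
            # key falls inside this block's range; check membership
            return (mid if key in block else None), examined
    return None, examined


# Eight sorted blocks of two keys each.
blocks = [[1, 3], [5, 7], [9, 11], [13, 15],
          [17, 19], [21, 23], [25, 27], [29, 31]]
hit = binary_search_blocks(blocks, 13)
miss = binary_search_blocks(blocks, 4)
```

For a non-key attribute, further matching records may sit in neighbouring blocks, which this sketch does not fetch.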
Selection Operation - Selections using Indices
Index scan – search algorithms that use an index
• The selection condition must be on the search key of the index.

A3 (primary index, equality on key)


• For an equality comparison on a key attribute with a
primary index, we can use the index to retrieve a
single record that satisfies the corresponding equality
condition.
• If a B+ tree is used, the cost of the operation, in terms
of I/O operations, is equal to the height of the tree plus
one I/O to fetch the record.
Selection Operation - Selections using Indices
A4 (primary index, equality on non-key)
• The only difference from the previous case is that
multiple records may need to be fetched.
• However, the records would be stored consecutively in
the file since the file is sorted on the search key.
• The cost of the operation is proportional to the height
of the tree, plus the number of blocks containing
records with the specified search key.
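The two primary-index costs above can be written as simple I/O counts. This is a minimal sketch; the function names and sample numbers are illustrative assumptions, and each tree level is counted as one I/O.

```python
def primary_index_key_cost(tree_height: int) -> int:
    """A3: traverse the B+ tree, then one I/O to fetch the single record."""
    return tree_height + 1


def primary_index_nonkey_cost(tree_height: int, matching_blocks: int) -> int:
    """A4: traverse the B+ tree, then read the consecutive matching blocks."""
    return tree_height + matching_blocks


# Hypothetical index of height 3.
a3_cost = primary_index_key_cost(3)
a4_cost = primary_index_nonkey_cost(3, matching_blocks=5)
```

Because the file is sorted on the search key, the matching blocks in A4 are adjacent, so the extra cost depends on how many blocks match and not on how many records match.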
Selection Operation - Selections using Indices
A5 (secondary index, equality)
• Retrieve a single record if the equality condition is on a
candidate key.
• Only one record is retrieved, and the cost is equal to the
height of the tree plus one I/O operation to fetch the record.
• Retrieve multiple records if the equality condition is not on
a candidate key.
• Each record may be resident on a different block, which
may result in one I/O operation per retrieved record.
• The cost could become even worse than that of linear
search if a large number of records are retrieved.
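To see when a secondary index loses to a linear scan, the sketch below compares the two I/O counts in the worst case of one I/O per retrieved record. The function names and the sample sizes are assumptions chosen for illustration.

```python
def secondary_index_cost(tree_height: int, matching_records: int) -> int:
    """A5 worst case: traverse the tree, then one I/O per matching record."""
    return tree_height + matching_records


def linear_scan_cost(blocks_in_relation: int) -> int:
    """A1: read every block of the relation once."""
    return blocks_in_relation


# Hypothetical relation of 1000 blocks with an index of height 3.
few_matches = secondary_index_cost(3, matching_records=10)
many_matches = secondary_index_cost(3, matching_records=5000)
scan = linear_scan_cost(1000)
index_wins_for_few = few_matches < scan
index_loses_for_many = many_matches > scan
```

The crossover point depends on how many records fit in a block: once the number of matches exceeds the number of blocks in the relation, the scan is the safer choice.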
Selection Operation - Selections Involving
Comparisons
• Can implement selections of the form
σA≤v(r) or σA≥v(r)
by using
• a linear file scan,
• or indices in the following ways:
• primary index, comparison
• secondary index, comparison
Selection Operation - Selections Involving
Comparisons
A6 (primary index, comparison) (Relation is sorted on A)
• For σA≥v(r)
use index to find first tuple ≥ v, and scan relation
sequentially from there.
• For σA≤v(r)
just scan relation sequentially till first tuple > v;
do not use index.
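Both cases of A6 can be sketched on an in-memory sorted list. Here `bisect_left` from the Python standard library stands in for the index lookup, and the list of attribute values stands in for the sorted relation; both are simplifying assumptions.

```python
from bisect import bisect_left


def select_ge(sorted_values, v):
    """sigma A >= v: locate the first value >= v, then scan to the end."""
    start = bisect_left(sorted_values, v)  # plays the role of the index
    return sorted_values[start:]


def select_le(sorted_values, v):
    """sigma A <= v: scan from the start and stop at the first value > v."""
    result = []
    for value in sorted_values:
        if value > v:
            break  # the relation is sorted, so no later value can match
        result.append(value)
    return result


# Relation sorted on A (only the A values are shown).
relation = [10, 20, 20, 30, 40]
at_least_20 = select_ge(relation, 20)
at_most_20 = select_le(relation, 20)
```

The index is skipped for A ≤ v because the matching tuples begin at the start of the file, so there is nothing to locate.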
Selection Operation - Selections Involving
Comparisons
A7 (secondary index, comparison)
• For σA≥v(r)
use index to find first index entry ≥ v, and scan
index sequentially from there, to find pointers to
records.
• For σA≤v(r)
just scan leaf pages of index finding pointers to
records, till first entry > v.
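A7 can be sketched with a sorted list of (key, record id) pairs playing the role of the index leaf level, and a dictionary playing the role of the unsorted file. These structures and all names below are assumptions made for illustration.

```python
from bisect import bisect_left


def secondary_select_ge(index_entries, heap_file, v):
    """sigma A >= v: find the first index entry >= v, then follow pointers."""
    keys = [key for key, _ in index_entries]
    start = bisect_left(keys, v)
    # Each pointer may lead to a different block of the file.
    return [heap_file[rid] for _, rid in index_entries[start:]]


def secondary_select_le(index_entries, heap_file, v):
    """sigma A <= v: scan index entries from the start until a key > v."""
    result = []
    for key, rid in index_entries:
        if key > v:
            break
        result.append(heap_file[rid])
    return result


# The file is not sorted on A; the index entries are.
heap_file = {0: ("c", 30), 1: ("a", 10), 2: ("b", 20)}
index_entries = [(10, 1), (20, 2), (30, 0)]
ge_20 = secondary_select_ge(index_entries, heap_file, 20)
le_20 = secondary_select_le(index_entries, heap_file, 20)
```

Every pointer followed here may cost a separate I/O, so for comparisons that match many records a linear file scan can be cheaper.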
Selection Operation - Implementation of
Complex Selections
Conjunction
• A conjunctive selection is a selection of the form
σ θ1∧θ2∧··· ∧θn (r)
Disjunction
• A disjunctive selection is a selection of the form
σ θ1∨θ2∨··· ∨θn (r)
Negation
• The result of a selection σ¬θ(r) is the set of tuples of r
for which the condition θ evaluates to false
Selection Operation - Implementation of
Complex Selections
A8 (conjunctive selection using one index)
• Select a combination of θi and algorithms A1 through
A7 that results in the least cost for σθi(r).
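The choice made in A8 amounts to picking the cheapest (condition, algorithm) pair from a table of estimates. The sketch below assumes such estimates are already available; the conditions, algorithm labels, and cost figures are invented for illustration.

```python
def pick_cheapest(plans):
    """plans maps (condition, algorithm) to an estimated I/O cost."""
    return min(plans, key=plans.get)


# Hypothetical estimates for the selection
# dept = 'ICT' AND salary >= 50000.
plans = {
    ("dept = 'ICT'", "A4"): 12,
    ("salary >= 50000", "A7"): 340,
    ("dept = 'ICT'", "A1"): 1000,
}
best = pick_cheapest(plans)
```

After the chosen condition has been used to fetch records, the remaining conditions would typically be tested on each fetched record in memory.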
Selection Operation - Implementation of
Complex Selections
A9 (conjunctive selection using composite index)
• Use appropriate composite (multiple-key) index if
available.
Selection Operation - Implementation of
Complex Selections
A10 (conjunctive selection by intersection of identifiers)
• Use corresponding index for each condition, and take
intersection of all the obtained sets of record pointers.
• Then fetch records from file.
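The intersection step can be sketched with Python sets of record identifiers. The dictionary standing in for the file, the sample records, and the function name are illustrative assumptions; each set is taken to be the result of one index lookup.

```python
def conjunctive_select(records, index_lookups):
    """Intersect the record-id sets from each index, then fetch the records.

    records: dict mapping record id to record (stands in for the file).
    index_lookups: non-empty list of sets, one per condition.
    """
    rids = set.intersection(*index_lookups)
    # Fetching in sorted order keeps accesses to the file sequential.
    return [records[rid] for rid in sorted(rids)]


records = {
    1: {"dept": "ICT", "salary": 40000},
    2: {"dept": "ICT", "salary": 70000},
    3: {"dept": "ICT", "salary": 90000},
    4: {"dept": "ENG", "salary": 80000},
}
dept_is_ict = {1, 2, 3}        # rids returned by a hypothetical index on dept
salary_ge_60000 = {2, 3, 4}    # rids returned by a hypothetical index on salary
result = conjunctive_select(records, [dept_is_ict, salary_ge_60000])
```

Only the records that survive the intersection are fetched, so the file is touched for records 2 and 3 alone.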
Selection Operation - Implementation of
Complex Selections
A11 (disjunctive selection by union of identifiers)
• Applicable if all conditions have available indices.
• Otherwise use linear scan.
• Use corresponding index for each condition, and take
union of all the obtained sets of record pointers.
• Then fetch records from file.
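The union step, including the fallback to a linear scan, can be sketched as follows. A lookup of `None` marks a condition with no available index; that convention, the dictionary standing in for the file, and all names are assumptions made for illustration.

```python
def disjunctive_select(records, index_lookups, predicate):
    """Union the record-id sets if every condition has an index.

    records: dict mapping record id to record (stands in for the file).
    index_lookups: one entry per condition, a set of rids or None.
    predicate: the full disjunctive condition, used only by the fallback.
    """
    if any(rids is None for rids in index_lookups):
        # At least one condition has no index: scan the whole file.
        return [rec for rec in records.values() if predicate(rec)]
    rids = set().union(*index_lookups)
    return [records[rid] for rid in sorted(rids)]


records = {
    1: {"dept": "ICT", "salary": 40},
    2: {"dept": "ENG", "salary": 90},
    3: {"dept": "BIO", "salary": 50},
}


def condition(rec):
    return rec["dept"] == "ICT" or rec["salary"] > 80


with_indices = disjunctive_select(records, [{1}, {2}], condition)
with_fallback = disjunctive_select(records, [{1}, None], condition)
```

The fallback is needed because a condition without an index could match any record, so the whole file has to be read anyway.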
SUMMARY

• RECAP B+ TREE
• OVERVIEW
• MEASURES OF QUERY COST
• SELECTION OPERATION
• BASIC ALGORITHMS
• SELECTIONS USING INDICES
• SELECTIONS INVOLVING COMPARISONS
• IMPLEMENTATION OF COMPLEX SELECTIONS
THANK YOU
