Query Processing
Database System Concepts - 7th Edition 15.1 ©Silberschatz, Korth and Sudarshan
Basic Steps in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
We mainly focus on the optimization phase
Basic Steps in Query Processing (cont.)
Parser and translator
Translate the (SQL) query into relational algebra
Parser checks syntax (e.g., correct relation and operator names)
Evaluation engine
The query-execution engine takes a query-evaluation plan, executes
that plan, and returns the answers to the query
Basic Steps: Optimization
1st level of optimization: an SQL query has many equivalent relational
algebra expressions
$\sigma_{salary<75000}(\Pi_{salary}(instructor))$ and
$\Pi_{salary}(\sigma_{salary<75000}(instructor))$ are equivalent
They both correspond to:
SELECT salary
FROM instructor
WHERE salary < 75000
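To make the equivalence concrete, here is a minimal Python sketch that evaluates both expressions over a small in-memory relation. The sample tuples and the helper names `select`, `project` and `below_75000` are hypothetical, chosen only for illustration; `project` removes duplicate rows to mimic the set semantics of relational algebra.

```python
# Hypothetical sample of the instructor relation (only the attributes used here)
instructor = [
    {"name": "Srinivasan", "salary": 65000},
    {"name": "Wu", "salary": 90000},
    {"name": "Mozart", "salary": 40000},
    {"name": "Einstein", "salary": 95000},
]

def select(pred, relation):
    """sigma_pred(relation): keep the tuples that satisfy pred."""
    return [t for t in relation if pred(t)]

def project(attrs, relation):
    """Pi_attrs(relation): keep only attrs and drop duplicate rows (set semantics)."""
    rows = []
    for t in relation:
        row = {a: t[a] for a in attrs}
        if row not in rows:
            rows.append(row)
    return rows

def below_75000(t):
    """Selection predicate: salary < 75000."""
    return t["salary"] < 75000

# sigma_{salary<75000}(Pi_{salary}(instructor)): project first, then select
plan_a = select(below_75000, project(["salary"], instructor))
# Pi_{salary}(sigma_{salary<75000}(instructor)): select first, then project
plan_b = project(["salary"], select(below_75000, instructor))

print(plan_a)            # [{'salary': 65000}, {'salary': 40000}]
print(plan_a == plan_b)  # True
```

Both expressions return the same salaries, but the second one projects only the tuples that survived the selection; differences of this kind are what the optimizer weighs when choosing among equivalent expressions.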
Basic Steps: Optimization (Cont.)
Different query evaluation plans have different costs
User is not expected to specify least-cost plans
How to measure query costs
(cost model)
These slides are a modified version of the slides provided with the book:
Silberschatz, Korth, and Sudarshan, Database System Concepts, 7th ed.
(however, chapter numeration refers to 7th Ed.)
Thus
1. cost models (like ours) focus on resource consumption rather than response time
(i.e., optimizers minimize resource consumption, not response time)
2. different optimizers may make different assumptions (parameters): any theoretical
analysis must be recast with the actual parameters used by the concrete system
(optimizer) to which it is applied
Measures of Query Cost (Cont.)
Query cost (total elapsed time for answering a query) is measured in terms of
different resources
disk access (I/O operation on disk)
CPU usage
(network communication for distributed DBMS – later in this course)
Typically disk access is the predominant cost, and it is also relatively easy to
estimate. It is measured by taking into account the
Number of seeks (number of random I/O accesses)
Number of blocks read
Number of blocks written
The cost of writing a block is generally assumed to be twice the cost of reading it
(data is read back after being written to ensure the write was successful)
VERY IMPORTANT!!!
- “disk” refers to the permanent storage device for files: hard disk, secondary memory, permanent memory
- “memory” refers to volatile storage for data: RAM, main memory, buffer
Within each group, these terms are used as synonyms
This is the currently accepted choice for measuring query costs (cost model).
New technologies, such as faster drives (solid-state drives, SSD) and cheaper (thus bigger) RAM,
might lead to different cost models (e.g., models that also account for CPU usage or RAM I/O operations)
Measures of Query Cost (Cont.)
We ignore the difference between writing and reading: we just consider
tS – time for one seek
tT – time to transfer one block
Example: cost for b block transfers plus S seeks
b * tT + S * tS
Values of tT and tS must be calibrated for the specific disk system
Typical values (2018): tS = 4 ms, tT = 0.1 ms
Some DBMSs perform seeks and block transfers during installation to
estimate average values
We ignore CPU costs for simplicity
Real systems usually do take CPU cost into account
We do not include the cost of writing the output to disk in our cost formulae
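A quick way to get a feel for the formula is to plug in the typical values quoted above. The sketch below assumes the 2018 figures (tS = 4 ms, tT = 0.1 ms) as defaults; the function name `io_cost_ms` is hypothetical, and real values must be calibrated for the actual disk system.

```python
# Typical values (2018) quoted in the slides, in milliseconds; calibrate for a real disk
T_SEEK_MS = 4.0      # tS: time for one seek
T_TRANSFER_MS = 0.1  # tT: time to transfer one block

def io_cost_ms(b, s, t_t=T_TRANSFER_MS, t_s=T_SEEK_MS):
    """Cost of b block transfers plus s seeks: b * tT + S * tS.

    Reads and writes are treated alike, CPU cost is ignored.
    """
    return b * t_t + s * t_s

# 1000 contiguous blocks read with a single seek
print(io_cost_ms(1000, 1))     # 104.0
# the same 1000 blocks, each requiring its own seek
print(io_cost_ms(1000, 1000))  # 4100.0
```

With a single seek the block transfers dominate the estimate; with one seek per block the seek time dominates by far, which is why contiguous storage matters so much in this cost model.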
Algorithms for evaluating relational algebra operations
At the physical level, records are stored (on permanent disks) in files
(managed and organized by the filesystem)
We assume files are organized according to sequential file
organization
i.e., a file is stored in contiguous blocks, with records ordered according
to some attribute(s) – not necessarily ordered by primary key
Other file organization techniques exist (e.g., B+-tree file
organization), leading to different formulas for cost estimates
Selection Operation
File scan (relation scan without indices)
PROs: can be applied to any file, regardless of its ordering, availability of indices,
nature of selection operation, etc.
CONs: it is slow
Algorithm A1 (linear search). Retrieve and scan each file block and
test all records to see whether they satisfy the selection condition
br denotes number of blocks containing records from relation r
Cost estimate (selection on a generic, non-key attribute):
cost = br block transfers + 1 seek = tS + br * tT
We assume blocks are stored contiguously, so 1 seek is enough (the disk head
does not need to move to reach the next block)
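Algorithm A1 can be sketched in Python under the same assumptions as the slide: the file is a list of contiguous blocks (so a single seek), and every block is transferred because the selection is on a non-key attribute, so the scan cannot stop early. The record layout, the `linear_search` name, and the timing constants are illustrative assumptions, not part of the original material.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Block = List[Record]

# Assumed timing parameters (typical 2018 values), in milliseconds
T_SEEK_MS = 4.0      # tS
T_TRANSFER_MS = 0.1  # tT

def linear_search(file_blocks: List[Block],
                  pred: Callable[[Record], bool]) -> Tuple[List[Record], float]:
    """A1: scan every block and test every record; return matches and estimated cost."""
    matches: List[Record] = []
    transfers = 0
    for block in file_blocks:   # blocks are contiguous: a single initial seek suffices
        transfers += 1          # one block transfer per block read
        for rec in block:
            if pred(rec):
                matches.append(rec)
    # Non-key attribute: any block may hold a match, so all br blocks are read
    cost_ms = T_SEEK_MS + transfers * T_TRANSFER_MS   # tS + br * tT
    return matches, cost_ms

# Hypothetical relation stored in br = 3 blocks
instructor_file = [
    [{"ID": 1, "dept_name": "Physics"}, {"ID": 2, "dept_name": "Music"}],
    [{"ID": 3, "dept_name": "Physics"}, {"ID": 4, "dept_name": "History"}],
    [{"ID": 5, "dept_name": "Biology"}],
]
found, cost = linear_search(instructor_file, lambda r: r["dept_name"] == "Physics")
print([r["ID"] for r in found])  # [1, 3]
print(round(cost, 2))            # 4.3
```

Note that the matches sit in the first two blocks, yet the third block is still transferred: without an index or an ordering on dept_name, A1 has no way to know that no further match exists.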
Selections Using Indices
Selections Involving Comparisons
Summary of costs for selections
Complex Selections (cont’d)