Query Processing
Overview
Measures of Query Cost
Selection Operation
Sorting
Join Operation
Other Operations
Evaluation of Expressions
Database System Concepts - 7th Edition 15.2 ©Silberschatz, Korth and Sudarshan
Basic Steps in Query Processing
Basic Steps in Query Processing (Cont.)
Basic Steps in Query Processing: Optimization
A relational algebra expression may have many equivalent expressions
• E.g., $\sigma_{salary<75000}(\Pi_{salary}(instructor))$ is equivalent to
$\Pi_{salary}(\sigma_{salary<75000}(instructor))$
Each relational algebra operation can be evaluated using one of several
different algorithms
• Correspondingly, a relational-algebra expression can be evaluated in
many ways.
An annotated expression specifying a detailed evaluation strategy is called an
evaluation plan. E.g.:
• Use an index on salary to find instructors with salary < 75000,
• Or perform a complete relation scan and discard instructors with salary
≥ 75000
Basic Steps: Optimization (Cont.)
Measures of Query Cost
Measures of Query Cost
Measures of Query Cost (Cont.)
Selection Operation
File scan
Algorithm A1 (linear search). Scan each file block and test all records
to see whether they satisfy the selection condition.
• Cost estimate = $b_r$ block transfers + 1 seek
$b_r$ denotes the number of blocks containing records from relation r
• If selection is on a key attribute, can stop on finding the record
average cost = ($b_r$/2) block transfers + 1 seek
• Linear search can be applied regardless of
selection condition or
ordering of records in the file, or
availability of indices
Note: binary search generally does not make sense since data is not
stored consecutively
• except when there is an index available,
• and binary search requires more seeks than index search
Selections Using Indices
Selections Using Indices
Selections Involving Comparisons
Implementation of Complex Selections
Algorithms for Complex Selections
Disjunction: $\sigma_{\theta_1 \lor \theta_2 \lor \dots \lor \theta_n}(r)$.
A10 (disjunctive selection by union of identifiers).
• Applicable if all conditions have available indices.
Otherwise use linear scan.
• Use corresponding index for each condition, and take union of all the
obtained sets of record pointers.
• Then fetch records from file
Negation: $\sigma_{\neg\theta}(r)$
• Use linear scan on file
• If very few records satisfy $\neg\theta$, and an index is applicable to $\theta$
Find satisfying records using index and fetch from file
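
The disjunctive selection A10 described above can be sketched as follows. This is only an illustration under simplifying assumptions: records live in a dict keyed by a record identifier, each index is a dict from attribute value to a set of record identifiers, and every condition is an equality test. The names `build_index` and `disjunctive_select` are hypothetical and not from the slides.

```python
def build_index(records, attr):
    """Hypothetical in-memory index: attribute value -> set of record ids."""
    index = {}
    for rid, rec in records.items():
        index.setdefault(rec[attr], set()).add(rid)
    return index


def disjunctive_select(records, indices, conditions):
    """Select records satisfying ANY of the (attr, value) equality conditions.

    records:    dict rid -> record (a dict of attributes)
    indices:    dict attr -> index as produced by build_index
    conditions: list of (attr, value) pairs
    """
    # A10 is applicable only if every condition has an index;
    # otherwise fall back to a linear scan of the file.
    if not all(attr in indices for attr, _ in conditions):
        return [rec for rec in records.values()
                if any(rec[attr] == value for attr, value in conditions)]

    # Use the index for each condition and take the union of record pointers.
    rids = set()
    for attr, value in conditions:
        rids |= indices[attr].get(value, set())

    # Then fetch the records from the file (here, in rid order).
    return [records[rid] for rid in sorted(rids)]
```

Fetching in identifier order is a design choice of the sketch; in a real system it stands in for sorting record pointers so that each block is read only once.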
Bitmap Index Scan
Sorting
We may build an index on the relation, and then use the index to read
the relation in sorted order. May lead to one disk block access for each
tuple.
For relations that fit in memory, techniques like quicksort can be used.
• For relations that don’t fit in memory, external merge sort is a good
choice.
Example: External Sorting Using Merge Sort
External Merge Sort
External Merge Sort (Cont.)
2. Merge the runs (N-way merge). We assume (for now) that N < M.
1. Use N blocks of memory to buffer input runs, and 1 block to buffer
output. Read the first block of each run into its buffer page
2. repeat
1. Select the first record (in sort order) among all buffer pages
2. Write the record to the output buffer. If the output buffer is full
write it to disk.
3. Delete the record from its input buffer page.
If the buffer page becomes empty then
read the next block (if any) of the run into the buffer.
3. until all input buffer pages are empty
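
The merge loop above can be sketched in Python as follows. The sketch makes several assumptions that are not in the slides: each run is modeled as a list of non-empty blocks (lists of records) rather than as disk pages, the function name `merge_runs` is hypothetical, and a heap is used to pick the smallest record among the buffer pages instead of scanning them.

```python
import heapq


def merge_runs(runs, block_size):
    """N-way merge of sorted runs; returns the output as a list of blocks.

    runs:       list of runs, each a list of non-empty sorted blocks
    block_size: number of records per output block
    """
    # Read the first block of each run into its buffer page.
    buffers = [list(run[0]) if run else [] for run in runs]
    next_block = [1] * len(runs)   # index of the next block to read per run
    pos = [0] * len(runs)          # read position inside each buffer page

    heap = [(buf[0], i) for i, buf in enumerate(buffers) if buf]
    heapq.heapify(heap)

    output, out_buf = [], []
    while heap:
        # Select the first record (in sort order) among all buffer pages.
        rec, i = heapq.heappop(heap)
        out_buf.append(rec)
        if len(out_buf) == block_size:      # output buffer full: "write to disk"
            output.append(out_buf)
            out_buf = []

        # Remove the record from its input buffer page.
        pos[i] += 1
        if pos[i] == len(buffers[i]):
            # Buffer page is empty: read the next block of the run, if any.
            if next_block[i] < len(runs[i]):
                buffers[i] = list(runs[i][next_block[i]])
                next_block[i] += 1
                pos[i] = 0
            else:
                buffers[i] = []
        if buffers[i]:
            heapq.heappush(heap, (buffers[i][pos[i]], i))

    if out_buf:                             # flush the last partial block
        output.append(out_buf)
    return output
```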
External Merge Sort (Cont.)
External Merge Sort (Cont.)
Cost analysis:
• 1 block per run leads to too many seeks during merge
Instead use $b_b$ buffer blocks per run
read/write $b_b$ blocks at a time
Can merge $\lfloor M/b_b \rfloor - 1$ runs in one pass
• Total number of merge passes required: $\lceil \log_{\lfloor M/b_b \rfloor - 1}(b_r/M) \rceil$.
• Block transfers for initial run creation as well as in each pass is $2b_r$
for final pass, we don’t count write cost
• we ignore final write cost for all operations since the output of
an operation may be sent to the parent operation without
being written to disk
Thus total number of block transfers for external sorting:
$b_r (2 \lceil \log_{\lfloor M/b_b \rfloor - 1}(b_r/M) \rceil + 1)$
• Seeks: next slide
External Merge Sort (Cont.)
Cost of seeks
• During run generation: one seek to read each run and one seek
to write each run
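$2 \lceil b_r / M \rceil$
• During the merge phase
Need $2 \lceil b_r / b_b \rceil$ seeks for each merge pass
• except the final one which does not require a write
Total number of seeks:
$2 \lceil b_r / M \rceil + \lceil b_r / b_b \rceil (2 \lceil \log_{\lfloor M/b_b \rfloor - 1}(b_r/M) \rceil - 1)$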
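
The block-transfer and seek formulas for external merge sort can be checked with a small calculator. This is a sketch, not part of the slides: the function name `external_sort_cost` is hypothetical, and the number of merge passes is counted with an integer loop (equivalent to the ceiling of the logarithm) to avoid floating-point rounding. It assumes the relation is larger than memory, so at least one merge pass is needed.

```python
import math


def external_sort_cost(b_r, M, b_b=1):
    """Return (merge_passes, block_transfers, seeks) for external merge sort.

    b_r: number of blocks in the relation
    M:   number of memory blocks available
    b_b: buffer blocks per run during merging
    """
    if b_r <= M:
        raise ValueError("formulas assume the relation does not fit in memory")
    fan_in = M // b_b - 1                 # runs merged per pass
    if fan_in < 2:
        raise ValueError("need to merge at least 2 runs per pass")

    runs = math.ceil(b_r / M)             # initial sorted runs
    passes, remaining = 0, runs
    while remaining > 1:                  # same value as ceil(log_fan_in(b_r / M))
        remaining = math.ceil(remaining / fan_in)
        passes += 1

    # Final write is not counted, hence "+ 1" rather than "+ 2".
    transfers = b_r * (2 * passes + 1)
    # 2 seeks per run during run generation; 2*ceil(b_r/b_b) per merge pass,
    # except the final pass, which needs no write.
    seeks = 2 * runs + math.ceil(b_r / b_b) * (2 * passes - 1)
    return passes, transfers, seeks
```

For instance, with `b_r = 400`, `M = 3` and `b_b = 1`, there are 134 initial runs and 8 merge passes.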
Schema Diagram for University Database
Join Operation
Nested-Loop Join
Nested-Loop Join (Cont.)
In the worst case, if there is enough memory only to hold one block of
each relation, the estimated cost is
$n_r \times b_s + b_r$ block transfers, plus $n_r + b_r$ seeks
If the smaller relation fits entirely in memory, use that as the inner
relation.
• Reduces cost to $b_r + b_s$ block transfers and 2 seeks
Assuming worst case memory availability, the cost estimate is
• with student as the outer relation:
5000 × 400 + 100 = 2,000,100 block transfers,
5000 + 100 = 5100 seeks
• with takes as the outer relation:
10000 × 100 + 400 = 1,000,400 block transfers and 10,400 seeks
If the smaller relation (student) fits entirely in memory, the cost estimate will
be 500 block transfers and 2 seeks.
Block nested-loops algorithm (next slide) is preferable.
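
The arithmetic above can be reproduced with a short helper. The function name `nested_loop_join_cost` is hypothetical; the figures for student (5000 records, 100 blocks) and takes (10000 records, 400 blocks) are read off the worked numbers on this slide.

```python
def nested_loop_join_cost(n_outer, b_outer, b_inner, inner_fits_in_memory=False):
    """Return (block_transfers, seeks) for a tuple-at-a-time nested-loop join.

    n_outer: number of records in the outer relation
    b_outer: number of blocks in the outer relation
    b_inner: number of blocks in the inner relation
    """
    if inner_fits_in_memory:
        # Each relation is read exactly once.
        return b_outer + b_inner, 2
    # Worst case: the inner relation is scanned once per outer record.
    return n_outer * b_inner + b_outer, n_outer + b_outer


# student as outer, takes as inner
student_outer = nested_loop_join_cost(5000, 100, 400)
# takes as outer, student as inner
takes_outer = nested_loop_join_cost(10000, 400, 100)
# student held entirely in memory as the inner relation
in_memory = nested_loop_join_cost(10000, 400, 100, inner_fits_in_memory=True)
```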
Block Nested-Loop Join
Block Nested-Loop Join (Cont.)
Indexed Nested-Loop Join
Example of Nested-Loop Join Costs
Merge-Join
1. Sort both relations on their join attribute (if not already sorted on the join
attributes).
2. Merge the sorted relations to join them
1. Join step is similar to the merge stage of the sort-merge algorithm.
2. Main difference is handling of duplicate values in join attribute —
every pair with same value on join attribute must be matched
3. Detailed algorithm in book
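
A minimal in-memory sketch of the merge step, including the handling of duplicate join values, is shown below. It is not the book's detailed algorithm: relations are plain Python lists, the function name `merge_join` and the key-extractor parameters are hypothetical, and the group of matching s tuples is assumed to fit in memory.

```python
def merge_join(r, s, key_r, key_s):
    """Equi-join two relations by sorting on the join key and merging.

    r, s:         lists of tuples
    key_r, key_s: functions extracting the join attribute from a tuple
    Returns a list of (r_tuple, s_tuple) pairs.
    """
    # Step 1: sort both relations on the join attribute.
    r = sorted(r, key=key_r)
    s = sorted(s, key=key_s)

    # Step 2: merge the sorted relations.
    out = []
    i = j = 0
    while i < len(r) and j < len(s):
        kr, ks = key_r(r[i]), key_s(s[j])
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            # Find the whole group of s tuples sharing this join value.
            j_end = j
            while j_end < len(s) and key_s(s[j_end]) == kr:
                j_end += 1
            # Every r tuple with this value must be paired with every
            # s tuple in the group.
            while i < len(r) and key_r(r[i]) == kr:
                for t in s[j:j_end]:
                    out.append((r[i], t))
                i += 1
            j = j_end
    return out
```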
Merge-Join (Cont.)
Hash-Join
Hash-Join (Cont.)
Hash-Join (Cont.)
Hash-Join Algorithm
Relation s is called the build input and r is called the probe input.
Hash-Join algorithm (Cont.)
The value n and the hash function h are chosen such that each $s_i$
fits in memory.
• Typically n is chosen as $\lceil b_s/M \rceil \times f$ where f is a “fudge factor”,
typically around 1.2
• The probe relation partitions $r_i$ need not fit in memory
Recursive partitioning required if number of partitions n is greater than
number of pages M of memory.
• instead of partitioning n ways, use M – 1 partitions for s
• Further partition the M – 1 partitions using a different hash function
• Use same partitioning method on r
• Rarely required: e.g., with block size of 4 KB, recursive partitioning
not needed for relations of < 1GB with memory size of 2MB, or
relations of < 36 GB with memory of 12 MB
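
The two size limits quoted above follow from a back-of-the-envelope bound: with M memory blocks, roughly $b_s/M$ partitions are needed, and recursion is avoided while that count stays below M, i.e. while $b_s < M^2$. The sketch below checks the figures under that simplification; it ignores the fudge factor and the blocks reserved for input and output buffers, and the function name is hypothetical.

```python
def max_build_size_without_recursion(memory_bytes, block_bytes):
    """Approximate largest build relation (in bytes) that can be hash-joined
    without recursive partitioning, using the simplified bound b_s < M**2."""
    M = memory_bytes // block_bytes      # memory size in blocks
    return M * M * block_bytes           # M**2 blocks, converted to bytes


KB, MB, GB = 2**10, 2**20, 2**30
limit_2mb = max_build_size_without_recursion(2 * MB, 4 * KB)    # M = 512
limit_12mb = max_build_size_without_recursion(12 * MB, 4 * KB)  # M = 3072
```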
Handling of Overflows
Cost of Hash-Join
Example of Cost of Hash-Join
instructor ⨝ teaches
Hybrid Hash–Join
Useful when memory sizes are relatively large, and the build input is
bigger than memory.
Main feature of hybrid hash join:
Keep the first partition of the build relation in memory.
E.g. With memory size of 25 blocks, instructor can be partitioned into five
partitions, each of size 20 blocks.
• Division of memory:
The first partition occupies 20 blocks of memory
1 block is used for input, and 1 block each for buffering the other
4 partitions.
teaches is similarly partitioned into five partitions each of size 80
• the first is used right away for probing, instead of being written out
Cost of 3(80 + 320) + 20 + 80 = 1300 block transfers for
hybrid hash join, instead of 1500 with plain hash-join.
Hybrid hash-join is most useful if $M \gg \sqrt{b_s}$
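
The 1300 versus 1500 comparison can be reproduced with a simplified cost model. This sketch assumes equal-sized partitions, ignores partially filled blocks, and keeps exactly one partition of each relation out of the write/re-read cycle; the function names are hypothetical.

```python
def plain_hash_join_transfers(b_build, b_probe):
    """Each block is read, written to a partition, and read again."""
    return 3 * (b_build + b_probe)


def hybrid_hash_join_transfers(b_build, b_probe, n_partitions):
    """Partition 0 of both inputs is read once and never written out;
    the remaining partitions pay the usual read-write-read cost."""
    in_memory = (b_build + b_probe) // n_partitions
    spilled = (b_build + b_probe) - in_memory
    return 3 * spilled + in_memory


# instructor: 100 blocks (build), teaches: 400 blocks (probe), 5 partitions
plain = plain_hash_join_transfers(100, 400)
hybrid = hybrid_hash_join_transfers(100, 400, 5)
```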
Complex Joins
Joins over Spatial Data
Other Operations
Other Operations : Aggregation
Other Operations : Set Operations
Set operations (∪, ∩ and −): can either use a variant of merge-join after
sorting, or a variant of hash-join.
E.g., Set operations using hashing:
1. Partition both relations using the same hash function
2. Process each partition i as follows.
1. Using a different hashing function, build an in-memory hash
index on $r_i$.
2. Process $s_i$ as follows
• r ∪ s:
1. Add tuples in $s_i$ to the hash index if they are not already
in it.
2. At end of $s_i$ add the tuples in the hash index to the
result.
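
The hash-based union steps can be sketched as follows. This is an in-memory illustration only: partitions are Python lists instead of disk files, a `set` stands in for the in-memory hash index (so the "different hashing function" of the slides is not modeled), and the name `hash_union` is hypothetical.

```python
def hash_union(r, s, n_partitions=4):
    """Compute r ∪ s (set semantics) by partitioning and per-partition hashing.

    r, s: iterables of hashable tuples
    """
    # 1. Partition both relations using the same hash function.
    parts_r = [[] for _ in range(n_partitions)]
    parts_s = [[] for _ in range(n_partitions)]
    for t in r:
        parts_r[hash(t) % n_partitions].append(t)
    for t in s:
        parts_s[hash(t) % n_partitions].append(t)

    # 2. Process each partition i independently.
    result = []
    for r_i, s_i in zip(parts_r, parts_s):
        index = set(r_i)              # build an in-memory hash index on r_i
        for t in s_i:
            if t not in index:        # add tuples of s_i not already present
                index.add(t)
        result.extend(index)          # at end of s_i, emit the index contents
    return result
```

Because equal tuples always hash to the same partition, duplicates can only occur within a partition, which is why each partition can be processed on its own. The order of the result is not defined.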
Other Operations : Set Operations
Answering Keyword Queries
Other Operations : Outer Join
Other Operations : Outer Join
Evaluation of Expressions
Materialization
Materialization (Cont.)
Pipelining
Pipelining (Cont.)
Pipelining (Cont.)
Blocking Operations
Pipeline Stages
Pipeline stages:
• All operations in a stage run concurrently
• A stage can start only after preceding stages have completed
execution
Evaluation Algorithms for Pipelining
Some algorithms are not able to output results even as they get input
tuples
• E.g., merge join, or hash join
• intermediate results written to disk and then read back
Algorithm variants to generate (at least some) results on the fly, as
input tuples are read in
• E.g., hybrid hash join generates output tuples even as probe
relation tuples in the in-memory partition (partition 0) are read in
• Double-pipelined join technique: Hybrid hash join, modified to
buffer partition 0 tuples of both relations in-memory, reading them
as they become available, and output results of any matches
between partition 0 tuples
When a new r0 tuple is found, match it with existing s0 tuples,
output matches, and save it in r0
Symmetrically for s0 tuples
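
The partition-0 logic of the double-pipelined join can be sketched as a generator. This is a simplification: it treats every tuple as belonging to the in-memory partition 0 and does not model the partitions spilled to disk. The function name and the `("r", tuple)` / `("s", tuple)` stream encoding are assumptions made for the example.

```python
from collections import defaultdict


def double_pipelined_join(stream, key_r, key_s):
    """Yield (r_tuple, s_tuple) matches as tuples of either input arrive.

    stream: iterable of ("r", tuple) or ("s", tuple) items, in arrival order
    """
    r0 = defaultdict(list)   # in-memory hash table on r tuples seen so far
    s0 = defaultdict(list)   # in-memory hash table on s tuples seen so far
    for side, tup in stream:
        if side == "r":
            k = key_r(tup)
            # Match the new r tuple with existing s0 tuples, then save it.
            for s_tup in s0.get(k, ()):
                yield (tup, s_tup)
            r0[k].append(tup)
        else:
            k = key_s(tup)
            # Symmetrically for s tuples.
            for r_tup in r0.get(k, ()):
                yield (r_tup, tup)
            s0[k].append(tup)
```

Each matching pair is produced exactly once, at the moment the later of its two tuples arrives.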
Pipelining for Continuous-Stream Data
Data streams
• Data entering database in a continuous manner
• E.g., Sensor networks, user clicks, …
Continuous queries
• Results get updated as streaming data enters the database
• Aggregation on windows is often used
E.g., tumbling windows divide time into units, e.g., hours,
minutes
Need to use pipelined processing algorithms
• Punctuations used to infer when all data for a window has been
received
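
A tumbling-window aggregate driven by punctuations can be sketched as below. The encoding is an assumption made for illustration: data items are `("data", timestamp, value)`, and a punctuation `("punct", t)` promises that no later item will carry a timestamp below `t`. The function name is hypothetical.

```python
def tumbling_window_sums(stream, width):
    """Yield (window_start, sum) once a punctuation shows a window is complete.

    stream: iterable of ("data", timestamp, value) or ("punct", timestamp)
    width:  window length, in the same units as the timestamps
    """
    open_windows = {}                    # window start -> running sum
    for item in stream:
        if item[0] == "data":
            _, ts, value = item
            start = (ts // width) * width
            open_windows[start] = open_windows.get(start, 0) + value
        else:
            _, ts = item
            # Every window ending at or before the punctuation is complete.
            done = sorted(w for w in open_windows if w + width <= ts)
            for start in done:
                yield (start, open_windows.pop(start))
```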
Query Processing in Memory
Cache Conscious Algorithms
Goal: minimize cache misses, make best use of data fetched into the
cache as part of a cache line
For sorting:
• Use runs that are as large as L3 cache (a few megabytes) to avoid
cache misses during sorting of a run
• Then merge runs as usual in merge-sort
For hash-join
• First create partitions such that build+probe partitions fit in memory
• Then subpartition further s.t. build subpartition+index fits in L3
cache
Speeds up probe phase significantly by avoiding cache misses
Lay out attributes of tuples to maximize cache usage
• Attributes that are often accessed together should be stored
adjacent to each other
Use multiple threads for parallel query processing
• A cache miss stalls one thread, but the others can proceed
End of Chapter 15