Lecture 11: Query Processing

This document provides an overview of query processing in database management systems, detailing the steps involved such as parsing, optimization, and evaluation. It emphasizes the importance of query optimization for efficient execution and discusses various techniques and equivalence rules used to improve query performance. Additionally, it highlights the role of cost estimation and the need for statistics in determining the efficiency of different query plans.


Data Modeling Techniques

Query Processing
Last Lecture…
 Indexes

 Any questions?
This Lecture…
 Query Processing: Overview

 What?

 Why?

 How? (Basics)
Query Processing
 What happens to a query inside
the DBMS?

 Query processing considers this issue.
Steps in QP…
 The steps involved in query processing include…
 Parsing and translation
 Optimization
 Evaluation
Steps in QP… (contd.)
 The parsing & translation step verifies the
correctness of the query and converts
it into an internal form (usually
an extended relational algebra
expression)

 This internal form is either a tree (i.e. a
query tree) or a graph (i.e. a query graph)
Steps in QP… (contd.)
 Next, an efficient execution strategy for
retrieving the results of the query from
the database files is generated. This step
is called query optimization.

 This is the heart of query processing in a
relational database system

 The evaluation engine executes the query
according to the chosen plan
Parsing & Translating…
 The first step in query processing
is to convert the query into a
form that can be executed
 Consider the following schema
S(sno, sname, status, city)
P(pno, pname, colour, weight)
SP(sno, pno, qty)
Parsing & Translating…
(contd.)

SELECT s.sname
FROM S, SP
WHERE S.sno = SP.sno AND
SP.pno = ‘P2’

We can express this query in relational algebra
as follows (a direct, unoptimized translation of the SQL):

π sname (σ S.sno = SP.sno ∧ SP.pno = ‘P2’ (S × SP))
Parsing & Translating…
(contd.)
 Usually, a SELECT-FROM-WHERE-
GROUP BY unit, called a query block, is
converted to an extended relational
algebra expression

 There can be many query blocks
(e.g. with nested queries) in a
complex query
Why do we need Query
Optimization?
 Consider the following sizes for S and SP
S – 100 pages
SP – 10,000 pages
 Cost of computing the Cartesian Product:
- read 10,000 * 100 = 1,000,000 pages
- write 1,000,000 pages (intermediate result)
 Selection
- read 1,000,000 pages
- keep 50 tuples (assuming 50 tuples satisfy SP.pno = ‘P2’)
 Total ≈ 3,000,000 disk I/Os
Why do we need Query Optimization? (contd.)
 Consider the following relational algebra
expression, which produces the same result:

π sname (S ⋈ S.sno = SP.sno (σ SP.pno = ‘P2’ (SP)))

 Selection operator
 Read 10,000 pages
 Keep the result, 50 tuples, in memory
 Join with S
 Read 100 pages
Total cost: 10,100 disk I/Os
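
As a rough check of this arithmetic, the following Python sketch (not from the slides; the page counts and the 50-tuple selectivity are the assumptions stated above) computes the I/O cost of both plans:

S_PAGES = 100          # pages in S
SP_PAGES = 10_000      # pages in SP

# Plan 1: Cartesian product, then selection.
cartesian_reads  = SP_PAGES * S_PAGES   # 1,000,000 page reads to form S x SP
cartesian_writes = SP_PAGES * S_PAGES   # 1,000,000 pages of intermediate result written
selection_reads  = SP_PAGES * S_PAGES   # re-read the intermediate result to select
plan1_cost = cartesian_reads + cartesian_writes + selection_reads
print(plan1_cost)      # 3,000,000 disk I/Os

# Plan 2: push the selection on SP.pno down, keep the ~50 matching
# tuples in memory, then join with S.
plan2_cost = SP_PAGES + S_PAGES
print(plan2_cost)      # 10,100 disk I/Os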
Heuristic Optimization
 A query can have many equivalent
query trees

 The optimizer must find an efficient
query plan to execute

 Heuristic rules are used for
algebraic optimization
Equivalence Rules…

 To transform a relational algebra
expression into an equivalent, more
efficient expression, certain
equivalence rules are used.
Equivalence Rules…
(contd.)
 Conjunctive selection operations can be
deconstructed into a sequence of individual
selections. This transformation is referred to as a
cascade of σ.
σ C1 ∧ C2 (E) = σ C1 (σ C2 (E))

 Selection operations are commutative.
σ C1 (σ C2 (E)) = σ C2 (σ C1 (E))

 Cascade of π.
π L1 (π L2 (… (π Ln (E)) …)) = π L1 (E)
Equivalence Rules…
(contd.)
 Selections can be combined with
Cartesian products and theta joins.
a. σ θ (E1 × E2) = E1 ⋈ θ E2

 This expression is just the definition of
the theta join.

b. σ θ1 (E1 ⋈ θ2 E2) = E1 ⋈ θ1 ∧ θ2 E2

Etc.
Equivalence Rules (contd.)
 Equivalence rules state that two expressions are
equivalent
 They do not state which one is better

 A large number of plans are possible! Estimating the
cost of each is prohibitively expensive

 The optimizer uses heuristic rules to prune the plan
space (reduce the number of plans to be considered)
 E.g. push selections down, avoid Cartesian products,
etc.
Indexes and cost of query
plans…
 Using an index does not necessarily
mean an efficient query plan.

 Can you think of an instance where
using an index is inefficient?

 The query optimizer estimates costs to
compare different execution plans
Cost estimation
 An execution plan has
 A set of relational algebra operators (the query plan) to
obtain the result of the query
 The algorithm used to evaluate each relational algebra
operator

 Some relational algebra operators can be evaluated in
many possible ways (algorithms).

 Costs may differ significantly based on the
chosen algorithm

 We will study some of these algorithms


Schema for Examples
Sailors (sid: integer, sname: string, rating: integer, age: real)
Reserves (sid: integer, bid: integer, day: dates, rname: string)

 Reserves:
 Each tuple is 40 bytes long, 100 tuples per
page, 1000 pages.

 Sailors:
 Each tuple is 50 bytes long, 80 tuples per
page, 500 pages.
Simple Selections

SELECT *
FROM Reserves R
WHERE R.rname < ‘C%’

 Of the form σ R.attr op value (R)


 Size of result approximated as size of R *
reduction factor; we will consider how to
estimate reduction factors later.
 With no index, unsorted: Must essentially scan
the whole relation; cost is M (#pages in R).
 With an index on selection attribute: Use
index to find qualifying data entries, then
retrieve corresponding data records. (Hash
index useful only for equality selections.)
Using an Index for
Selections
 Cost depends on #qualifying tuples, and
clustering.
 Cost of finding qualifying data entries (typically
small) plus cost of retrieving records (could be
large w/o clustering).
 In example, assuming uniform distribution of
names, about 10% of tuples qualify (100 pages,
10000 tuples). With a clustered index, cost is
little more than 100 I/Os; if unclustered, up to
10,000 I/Os!
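
A small Python sketch of this estimate (my own restatement of the numbers above; the 10% selectivity is the slide's uniform-distribution assumption, and the cost of traversing the index itself is ignored):

RESERVES_PAGES  = 1000
TUPLES_PER_PAGE = 100
selectivity = 0.10   # assume ~10% of rnames fall below 'C%'

qualifying_tuples = int(RESERVES_PAGES * TUPLES_PER_PAGE * selectivity)  # 10,000
qualifying_pages  = int(RESERVES_PAGES * selectivity)                    # 100

clustered_cost   = qualifying_pages    # matching tuples are stored together: ~100 I/Os
unclustered_cost = qualifying_tuples   # worst case: one page fetch per matching tuple
print(clustered_cost, unclustered_cost)  # 100 10000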
Equality Joins With One Join Column

SELECT *
FROM Reserves R1, Sailors S1
WHERE R1.sid=S1.sid

 In algebra: R ⋈ S. Common! Must be carefully optimized.
 R × S is large; so, R × S followed by a selection is inefficient.

 Assume: M pages in R, pR tuples per page, N pages in S, pS
tuples per page.
In our examples, R is Reserves and S is Sailors.

 Cost metric: # of I/Os. We will ignore output costs.


Simple Nested Loops Join
foreach tuple r in R do
foreach tuple s in S do
if ri == sj then add <r, s> to result
 For each tuple in the outer relation R, we
scan the entire inner relation S.
 Cost: M + pR * M * N = 1000 +
100*1000*500 I/Os.
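
A minimal in-memory sketch of this algorithm in Python (an illustration only: the relations are lists of dicts and the join column name 'sid' is assumed; a real DBMS works page-at-a-time on disk):

def simple_nested_loops_join(outer, inner, col="sid"):
    result = []
    for r in outer:              # for each tuple in the outer relation R ...
        for s in inner:          # ... scan the entire inner relation S
            if r[col] == s[col]:
                result.append({**r, **s})
    return result

reserves = [{"sid": 28, "bid": 103}, {"sid": 31, "bid": 101}]
sailors  = [{"sid": 28, "sname": "yuppy"}, {"sid": 31, "sname": "lubber"}]
print(simple_nested_loops_join(reserves, sailors))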
Page-oriented Nested Loop
Join
 Page-oriented Nested Loops join:
For each page of R, get each page
of S, and write out matching pairs
of tuples <r, s>, where r is in R-page
and s is in S-page.
 Cost: M + M*N = 1000 + 1000*500
 If smaller relation (S) is outer, cost =
500 + 500*1000
Block Nested Loops Join
 Use one page as an input buffer for scanning the
inner S, one page as the output buffer, and use all
remaining pages to hold a "block" of outer R.
 For each matching tuple r in R-block, s in S-page,
add <r, s> to result. Then read next R-block, scan S
again, etc.

[Figure: buffer layout — a hash table for the block of R (k < B-1 pages), an input buffer for S, and an output buffer feeding the join result]
Examples of Block Nested
Loops
 Cost: Scan of outer + #outer blocks * scan of inner
 #outer blocks = ⌈ # of pages of outer / block size ⌉
 Block size = available buffer pages - 2
 With Reserves (R) as outer, and blocks of 100 pages of R:
 Cost of scanning R is 1000 I/Os; a total of 10 blocks.

 Per block of R, we scan Sailors (S); 10*500 I/Os.

 If space for just 90 pages of R, we would scan S 12 times.
 With 100-page block of Sailors as outer:
 Cost of scanning S is 500 I/Os; a total of 5 blocks.

 Per block of S, we scan Reserves; 5*1000 I/Os.
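
The cost formula above can be checked with a few lines of Python (a sketch; the buffer-pool sizes are chosen so the block sizes match the 100-page and 90-page examples):

import math

def bnl_cost(outer_pages, inner_pages, buffer_pages):
    block_size = buffer_pages - 2                  # one input page for inner + one output page
    outer_blocks = math.ceil(outer_pages / block_size)
    return outer_pages + outer_blocks * inner_pages

print(bnl_cost(1000, 500, 102))  # Reserves outer, 100-page blocks: 1000 + 10*500 = 6000
print(bnl_cost(1000, 500, 92))   # only 90 pages per block: 1000 + 12*500 = 7000
print(bnl_cost(500, 1000, 102))  # Sailors outer, 100-page blocks: 500 + 5*1000 = 5500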


Index Nested Loops Join
foreach tuple r in R do
foreach tuple s in S where ri == sj do
add <r, s> to result
 If there is an index on the join column of one relation (say
S), can make it the inner and exploit the index.
 Cost: M + (M*pR) * (cost of finding matching S tuples)

 For each R tuple, cost of probing S index is about 1.2 for a
hash index, 2-4 for a B+ tree. Cost of then finding S tuples
(assuming Alt. (2) or (3) for data entries) depends on
clustering.
 Clustered index: 1 I/O (typical), unclustered: up to 1 I/O
per matching S tuple.
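
Plugging the example numbers into this formula (a sketch under the assumptions that the index on Sailors.sid is a hash index, probe cost ≈ 1.2, and each Reserves tuple matches exactly one sailor fetched with 1 I/O):

M_PAGES, TUPLES_PER_PAGE = 1000, 100            # Reserves as the outer relation
probe_cost     = 1.2                            # hash index probe (from the slide)
retrieval_cost = 1.0                            # sid is the key of Sailors: one matching tuple
cost = M_PAGES + (M_PAGES * TUPLES_PER_PAGE) * (probe_cost + retrieval_cost)
print(cost)   # 221,000 I/Os -- much better than simple nested loops (~50,001,000)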


Sort-Merge Join (R ⋈ i=j S)
 Sort R and S on the join column, then scan them to
do a “merge’’ (on join col.), and output result tuples.
 Advance scan of R until current R-tuple >= current S tuple,
then advance scan of S until current S-tuple >= current R tuple;
do this until current R tuple = current S tuple.
 At this point, all R tuples with same value in Ri (current R group)
and all S tuples with same value in Sj (current S group) match;
output <r, s> for all pairs of such tuples.
 Then resume scanning R and S.
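
A compact Python sketch of the merge phase just described (an illustration only: both inputs are assumed to be already-sorted lists of dicts, 'sid' is the assumed join column, and the sorting phase and page-level I/O are omitted):

def merge_join(r_sorted, s_sorted, col="sid"):
    result, i, j = [], 0, 0
    while i < len(r_sorted) and j < len(s_sorted):
        if r_sorted[i][col] < s_sorted[j][col]:
            i += 1                                    # advance R until current R >= current S
        elif r_sorted[i][col] > s_sorted[j][col]:
            j += 1                                    # advance S until current S >= current R
        else:
            val = r_sorted[i][col]
            i_end = i                                 # find the current R group ...
            while i_end < len(r_sorted) and r_sorted[i_end][col] == val:
                i_end += 1
            j_end = j                                 # ... and the current S group
            while j_end < len(s_sorted) and s_sorted[j_end][col] == val:
                j_end += 1
            for r in r_sorted[i:i_end]:               # output <r, s> for all pairs in the groups
                for s in s_sorted[j:j_end]:
                    result.append({**r, **s})
            i, j = i_end, j_end                       # resume scanning R and S
    return result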
Example of Sort-Merge Join

Sailors:
sid sname rating age
22 dustin 7 45.0
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0

Reserves:
sid bid day rname
28 103 12/4/96 guppy
28 103 11/3/96 yuppy
31 101 10/10/96 dustin
31 102 10/12/96 lubber
31 101 10/11/96 lubber
58 103 11/12/96 dustin
 R is scanned once; each S group is scanned once per
matching R tuple. (Multiple scans of an S group are likely to
find needed pages in buffer.)
 Cost: O(M log M) + O(N log N) + (M+N)
 The cost of scanning, M+N, could be M*N (very unlikely!)
Cost-based Optimization
 Given that there are many algorithms for
relational operators, choosing a good query plan
using heuristics alone is not enough

 We can calculate the cost of each candidate
query tree with the possible algorithms for each
operator (& the difference can be significant)

 To compare such query plans, cost-based
optimization techniques are used
Cost Estimation

 For each plan considered, must estimate cost:
 Must estimate cost of each operation in plan tree.
 Depends on input cardinalities.
 We’ve already discussed how to estimate the
cost of operations (sequential scan, index scan,
joins, etc.)
 Must estimate size of result for each operation in
tree!
 Use information about the input relations.
 For selections and joins, assume independence of terms.
Statistics and Catalogs

 Need information about the relations
and indexes involved. Catalogs
typically contain at least:
 # tuples (NTuples) and # pages (NPages)
for each relation.
 # distinct key values (NKeys) and NPages
for each index.
 Index height, low/high key values
(Low/High) for each tree index.
Statistics and Catalogs
 Catalogs updated periodically.
 Updating whenever data changes is
too expensive; lots of approximation
anyway, so slight inconsistency ok.
 More detailed information (e.g.,
histograms of the values in some
field) is sometimes stored.
Size Estimation and
Reduction Factors
 Consider a query block:

SELECT attribute list
FROM relation list
WHERE term1 AND ... AND termk
 Maximum # tuples in result is the product of the
cardinalities of relations in the FROM clause.
 Reduction factor (RF) associated with each term reflects
the impact of the term in reducing result size. Result
cardinality = Max # tuples * product of all RF’s.

Implicit assumption that terms are independent!
 System R (see the sketch after this list):
 Term col=value has RF 1/NKeys(I), given index I on col
 Term col1=col2 has RF 1/MAX(NKeys(I1), NKeys(I2))
 Term col>value has RF (High(I)-value)/(High(I)-Low(I))
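
A small Python sketch of this System R-style estimate. The tuple counts come from the example relations (Sailors: 40,000 tuples, Reserves: 100,000 tuples); the NKeys values and the rating range are made-up catalog statistics, and the query (a join on sid plus rating > 5) is a hypothetical example:

ntuples = {"Sailors": 40_000, "Reserves": 100_000}
nkeys   = {"Sailors.sid": 40_000, "Reserves.sid": 40_000, "Sailors.rating": 10}
low_rating, high_rating = 1, 10        # assumed Low/High for an index on rating

# SELECT * FROM Sailors S, Reserves R WHERE S.sid = R.sid AND S.rating > 5
max_tuples = ntuples["Sailors"] * ntuples["Reserves"]              # product of FROM cardinalities
rf_join    = 1 / max(nkeys["Sailors.sid"], nkeys["Reserves.sid"])  # col1 = col2
rf_rating  = (high_rating - 5) / (high_rating - low_rating)        # col > value
estimate   = max_tuples * rf_join * rf_rating                      # terms assumed independent
print(round(estimate))   # about 55,556 result tuples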
Summary
 Query Processing consists of several steps

 Query optimization is an important task in a relational DBMS.

 Must understand optimization in order to understand the
performance impact of a given database design (relations,
indexes) on a workload (set of queries).

 Two parts to optimizing a query:
 Consider a set of alternative plans (heuristics are used to
reduce the number of possible plans to consider).
 Must estimate cost of each plan that is considered.
 Must estimate size of result and cost for each plan node.
 Key issues: Statistics, indexes, operator implementations.
