Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications


C. Faloutsos Query Optimization part 2

GENERAL OVERVIEW - REL. MODEL

Relational model - SQL
Functional Dependencies & Normalization
Physical Design
Indexing
Query optimization
Transaction processing

15-415 - C. Faloutsos

Q-OPT STEPS
bring query in internal form (eg., parse tree)
into canonical form (syntactic q-opt)
generate alt. plans
    selections (simple; complex predicates)
    sorting; projections
    joins
estimate cost; pick best

REMINDER STATISTICS:

for each relation r we keep
    nr: # tuples
    Sr: size of tuple in bytes
    V(A,r): number of distinct values of attr. A

[Figure: relation r as a table of tuples #1, #2, #3, ..., #nr, each Sr bytes wide]

DERIVABLE STATISTICS
fr: blocking factor = max# records/block (= B/Sr ; B: block size in bytes)
br: # blocks (= nr / fr )

DERIVABLE STATISTICS

SC(A,r) = selection cardinality = avg# of records with A=given (= nr / V(A,r) ) (assumes uniformity...)
eg: 10,000 students, 10 colleges: how many students in SCS?
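The derivable statistics can be computed directly from the stored ones. The following is a minimal sketch, not part of the original slides: the function names are made up for illustration, the rounding (floor for fr, ceiling for br) is an assumption about how fractional values are handled, and the 4 KB block and 100-byte tuple sizes are hypothetical. Only the students/colleges numbers come from the slide.

```python
import math

def blocking_factor(block_size: int, tuple_size: int) -> int:
    """fr: max number of whole records per block (B / Sr, rounded down)."""
    return block_size // tuple_size

def num_blocks(n_tuples: int, fr: int) -> int:
    """br: number of blocks (nr / fr, rounded up)."""
    return math.ceil(n_tuples / fr)

def selection_cardinality(n_tuples: int, distinct_values: int) -> float:
    """SC(A,r) = nr / V(A,r); assumes values are uniformly distributed."""
    return n_tuples / distinct_values

# Slide example: 10,000 students, 10 colleges, uniformity assumed.
students_in_scs = selection_cardinality(10_000, 10)

# Hypothetical sizes (not from the slides): 4 KB blocks, 100-byte tuples.
fr = blocking_factor(4096, 100)
br = num_blocks(10_000, fr)

print(students_in_scs, fr, br)
```

Under the uniformity assumption the estimate for SCS is simply one tenth of the students; a real college distribution would almost certainly be skewed.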

SELECTIONS
we saw simple predicates (A=constant; eg., name=Smith)
how about more complex predicates, like
    salary > 10K
    age = 30 and job-code=analyst
what is their selectivity?

SELECTIONS - COMPLEX PREDICATES

selectivity sel(P) of predicate P :
    == fraction of tuples that qualify
    sel(P) = SC(P) / nr

SELECTIONS - COMPLEX PREDICATES

eg., assume that V(grade, TAKES)=5 distinct values
simple predicate P: A=constant
    sel(A=constant) = 1/V(A,r)
    eg., sel(grade=B) = 1/5
(what if V(A,r) is unknown??)

[Figure: histogram of count vs. grade, from F to A]

SELECTIONS - COMPLEX PREDICATES

range query: sel( grade >= C)
    sel(A>a) = (Amax - a) / (Amax - Amin)

[Figure: histogram of count vs. grade]

SELECTIONS - COMPLEX PREDICATES

negation: sel( grade != C)
    sel(not P) = 1 - sel(P)
(Observation: selectivity =~ probability)

[Figure: histogram of count vs. grade, with P marked]

SELECTIONS - COMPLEX PREDICATES

conjunction:
    sel( grade = C and course = 415)
    sel(P1 and P2) = sel(P1) * sel(P2)
    INDEPENDENCE ASSUMPTION

[Figure: regions P1 and P2]

SELECTIONS - COMPLEX PREDICATES

disjunction:
    sel( grade = C or course = 415)
    sel(P1 or P2) = sel(P1) + sel(P2) - sel(P1 and P2)
                  = sel(P1) + sel(P2) - sel(P1)*sel(P2)
    INDEPENDENCE ASSUMPTION, again

[Figure: regions P1 and P2]

SELECTIONS - COMPLEX PREDICATES

disjunction: in general
    sel(P1 or P2 or ... Pn) = 1 - (1 - sel(P1)) * (1 - sel(P2)) * ... * (1 - sel(Pn))

SELECTIONS - SUMMARY
sel(A=constant) = 1/V(A,r)
sel( A>a) = (Amax - a) / (Amax - Amin)
sel(not P) = 1 - sel(P)
sel(P1 and P2) = sel(P1) * sel(P2)
sel(P1 or P2) = sel(P1) + sel(P2) - sel(P1)*sel(P2)
UNIFORMITY and INDEPENDENCE ASSUMPTIONS
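The summary formulas translate almost line by line into code. This is a minimal sketch under the same uniformity and independence assumptions; the function names are illustrative, and V(course, TAKES) = 10 is a hypothetical value (only V(grade, TAKES) = 5 appears on the slides).

```python
import math

def sel_eq(distinct_values: int) -> float:
    """sel(A = constant) = 1 / V(A,r), assuming uniformity."""
    return 1.0 / distinct_values

def sel_gt(a: float, a_min: float, a_max: float) -> float:
    """sel(A > a) = (Amax - a) / (Amax - Amin), assuming uniformity."""
    return (a_max - a) / (a_max - a_min)

def sel_not(p: float) -> float:
    """sel(not P) = 1 - sel(P)."""
    return 1.0 - p

def sel_and(*sels: float) -> float:
    """sel(P1 and ... and Pn) = product of sel(Pi), assuming independence."""
    return math.prod(sels)

def sel_or(*sels: float) -> float:
    """sel(P1 or ... or Pn) = 1 - product of (1 - sel(Pi)), assuming independence."""
    return 1.0 - math.prod(1.0 - s for s in sels)

grade_is_c = sel_eq(5)        # V(grade, TAKES) = 5, from the slides
course_is_415 = sel_eq(10)    # V(course, TAKES) = 10 is a made-up value

both = sel_and(grade_is_c, course_is_415)
either = sel_or(grade_is_c, course_is_415)
print(both, either)
```

For two predicates the general disjunction formula gives the same value as sel(P1) + sel(P2) - sel(P1)*sel(P2), as the summary slide states.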

Q-OPT STEPS
bring query in internal form (eg., parse tree)
into canonical form (syntactic q-opt)
generate alt. plans
    selections (simple; complex predicates)
    sorting; projections
    joins
estimate cost; pick best

SORTING
Assume br blocks of rel. r, and only M (<br) buffers in main memory
Q1: how to sort (external sorting)?
Q2: cost?

[Figure: M main-memory buffers (1 ... M) and relation r on disk (blocks 1, 2, ..., br)]

SORTING
Q1: how to sort (external sorting)?
A1:
    create sorted runs of size M
    merge

[Figure: M main-memory buffers (1 ... M) and relation r on disk (blocks 1, 2, ..., br)]

SORTING
create sorted runs of size M (how many?)
merge them (how?)

SORTING
create sorted runs of size M
merge first M-1 runs into a sorted run of (M-1) *M, ...

SORTING
How many steps do we need?
    i, where M*(M-1)^i > br
How many reads/writes per step?
    br+br

SORTING
In short, excluding the final write, we need
    ceil(log(br/M) / log(M-1)) * 2 * br + br
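The cost formula can be checked by counting merge passes explicitly. This is a minimal sketch, assuming M >= 3 buffers (so that each pass merges at least two runs) and counting block accesses only; the function names and the br = 1,000, M = 11 example are hypothetical, not from the slides.

```python
import math

def merge_passes(br: int, M: int) -> int:
    """Number of (M-1)-way merge passes after creating sorted runs of M blocks."""
    if M < 3:
        raise ValueError("need at least 3 buffers to merge 2 or more runs")
    runs = math.ceil(br / M)            # initial sorted runs
    passes = 0
    while runs > 1:
        runs = math.ceil(runs / (M - 1))  # each pass merges M-1 runs into one
        passes += 1
    return passes

def external_sort_cost(br: int, M: int) -> int:
    """Block accesses, excluding the final write: passes * 2 * br + br."""
    return merge_passes(br, M) * 2 * br + br

# Hypothetical numbers: 1,000 blocks, 11 buffers.
passes = merge_passes(1000, 11)
cost = external_sort_cost(1000, 11)
print(passes, cost)
```

Here the 91 initial runs shrink to 10 and then to 1, so two merge passes are needed, matching ceil(log(br/M) / log(M-1)) for these values.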

Q-OPT STEPS
bring query in internal form (eg., parse tree)
into canonical form (syntactic q-opt)
generate alt. plans
    selections (simple; complex predicates)
    sorting; projections, aggregations
    joins
estimate cost; pick best

PROJECTION - DUP. ELIMINATION

eg., select distinct c-id from TAKES
How? Pros and cons?

SET OPERATIONS
eg., select * from REGULAR-STUDENT
     union
     select * from SPECIAL-STUDENT
How? Pros and cons?

AGGREGATIONS
eg., select ssn, avg(grade) from TAKES group by ssn
How?

Q-OPT STEPS
bring query in internal form (eg., parse tree)
into canonical form (syntactic q-opt)
generate alt. plans
    selections; sorting; projections, aggregations
    joins
        2-way joins
        n-way joins
estimate cost; pick best

2-WAY JOINS
output size estimation: r JOIN s
nr, ns tuples each
case#1: cartesian product (R, S have no common attribute)
#of output tuples=??

2-WAY JOINS
output size estimation: r JOIN s
case#2: r(A,B), s(A,C,D), A is cand. key for r
#of output tuples=?? <=ns

[Figure: r(A, ...) with nr tuples; s(A, ......) with ns tuples]

2-WAY JOINS
output size estimation: r JOIN s
case#3: r(A,B), s(A,C,D), A is cand. key for neither (is it possible??)
#of output tuples=??

2-WAY JOINS

#of output tuples ~ nr * ns/V(A,s) or ns * nr/V(A,r) (whichever is less)
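The three cases of output size estimation can be put side by side. This is a minimal sketch with illustrative function names; the tuple counts reuse the arithmetic example later in the deck, while V(A,r) = 500 and V(A,s) = 100 are hypothetical values.

```python
def size_cartesian(nr: int, ns: int) -> int:
    """Case 1: no common attribute, so every pair of tuples is output."""
    return nr * ns

def size_key_join_upper_bound(ns: int) -> int:
    """Case 2: A is a candidate key of r, so each s tuple matches at most one r tuple."""
    return ns

def size_general(nr: int, ns: int, v_a_r: int, v_a_s: int) -> float:
    """Case 3: A is a key for neither; take the smaller of the two estimates."""
    return min(nr * ns / v_a_s, ns * nr / v_a_r)

nr, ns = 10_000, 1_000
print(size_cartesian(nr, ns))                 # case 1
print(size_key_join_upper_bound(ns))          # case 2 (an upper bound)
print(size_general(nr, ns, 500, 100))         # case 3, hypothetical V values
```

Taking the smaller estimate amounts to dividing by the larger of V(A,r) and V(A,s).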

Q-OPT STEPS
bring query in internal form (eg., parse tree)
into canonical form (syntactic q-opt)
generate alt. plans
    selections; sorting; projections, aggregations
    joins
        2-way joins - output size estimation; algorithms
        n-way joins
estimate cost; pick best

2-WAY JOINS
algorithm(s) for r JOIN s?
nr, ns tuples each

2-WAY JOINS

Algorithm #0: (naive) nested loop (SLOW!)
    for each tuple tr of r
        for each tuple ts of s
            print, if they match

2-WAY JOINS
Algorithm #0: why is it bad?
how many disk accesses (br and bs are the number of blocks for r and s)?
    br + nr*bs

2-WAY JOINS

Algorithm #1: Blocked nested-loop join
    read in a block of r
        read in a block of s
            print matching tuples
cost: br + br * bs

[Figure: r(A, ...) with nr records, br blocks; s(A, ......) with ns records, bs blocks]

2-WAY JOINS

Arithmetic example:
    nr = 10,000 tuples, br = 1,000 blocks
    ns = 1,000 tuples, bs = 200 blocks
alg#0: 2,001,000 d.a.
alg#1: 201,000 d.a.
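The two disk-access counts follow directly from the cost formulas of Algorithm #0 and Algorithm #1. This is a minimal sketch with illustrative function names, using the numbers of the arithmetic example.

```python
def naive_nested_loop_cost(b_outer: int, n_outer: int, b_inner: int) -> int:
    """Algorithm #0: read the outer once, scan the whole inner once per outer tuple."""
    return b_outer + n_outer * b_inner

def blocked_nested_loop_cost(b_outer: int, b_inner: int) -> int:
    """Algorithm #1: read the outer once, scan the whole inner once per outer block."""
    return b_outer + b_outer * b_inner

nr, br = 10_000, 1_000   # relation r
ns, bs = 1_000, 200      # relation s

alg0 = naive_nested_loop_cost(br, nr, bs)
alg1 = blocked_nested_loop_cost(br, bs)
alg1_reversed = blocked_nested_loop_cost(bs, br)   # s as the outer relation

print(alg0, alg1, alg1_reversed)
```

Swapping the roles changes only the first term (bs instead of br), so putting the relation with fewer blocks in the outer loop is slightly cheaper.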

2-WAY JOINS

Observation1: Algo#1: asymmetric:
    cost: br + br * bs
    reverse roles: cost = bs + bs*br
Best choice? smaller relation in outer loop

2-WAY JOINS
Observation2 [NOT IN BOOK]:
what if we have k buffers available?
    read in k-1 blocks of r
        read in a block of s
            print matching tuples

2-WAY JOINS
    read in k-1 blocks of r
        read in a block of s
            print matching tuples
Cost? br + br/(k-1) * bs
(what if br=k-1? what if we assign k-1 blocks to inner?)
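The effect of extra buffers is easy to see numerically. This is a minimal sketch, assuming the number of outer chunks is br/(k-1) rounded up; the function name and the choice k = 101 are hypothetical, while br and bs come from the earlier arithmetic example.

```python
import math

def nested_loop_cost_k_buffers(b_outer: int, b_inner: int, k: int) -> int:
    """Read the outer once; scan the inner once per chunk of k-1 outer blocks."""
    if k < 2:
        raise ValueError("need one buffer for the inner and at least one for the outer")
    chunks = math.ceil(b_outer / (k - 1))
    return b_outer + chunks * b_inner

br, bs = 1_000, 200

cost_r_outer = nested_loop_cost_k_buffers(br, bs, k=101)   # 10 chunks of r
cost_s_outer = nested_loop_cost_k_buffers(bs, br, k=101)   # 2 chunks of s
cost_fits = nested_loop_cost_k_buffers(br, bs, k=br + 1)   # br = k-1: one chunk

print(cost_r_outer, cost_s_outer, cost_fits)
```

When the outer relation fits entirely in the k-1 buffers (br = k-1), the inner is scanned only once and the cost drops to br + bs.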

2-WAY JOINS

Observation3: can we get rid of the br term?
    cost: br + br * bs
A: read the inner relation backwards half of the time!
Q: cons?

2-WAY JOINS
Other algorithm(s) for r JOIN s?
nr, ns tuples each

2-WAY JOINS - OTHER ALGOS

sort-merge
    sort r; sort s; merge sorted versions
    (good, if one or both are already sorted)

2-WAY JOINS - OTHER ALGOS

sort-merge - cost:
    ~ 2* br * log(br) + 2* bs * log(bs) + br + bs
    needs temporary space (for sorted versions)
    gives output in sorted order
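The sort-merge idea can be illustrated in memory. This is a minimal sketch, not the disk-based algorithm the cost formula describes: relations are plain Python lists of tuples, the join attribute is assumed to be the first field, and the sample tuples are made up.

```python
def sort_merge_join(r, s, key_r=0, key_s=0):
    """Sort both inputs on the join attribute, then merge matching groups."""
    r_sorted = sorted(r, key=lambda t: t[key_r])
    s_sorted = sorted(s, key=lambda t: t[key_s])
    out = []
    i = j = 0
    while i < len(r_sorted) and j < len(s_sorted):
        a, b = r_sorted[i][key_r], s_sorted[j][key_s]
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Find the end of the group of s tuples with this key value.
            j_end = j
            while j_end < len(s_sorted) and s_sorted[j_end][key_s] == a:
                j_end += 1
            # Pair every r tuple with this key against the whole s group.
            while i < len(r_sorted) and r_sorted[i][key_r] == a:
                for k in range(j, j_end):
                    out.append(r_sorted[i] + s_sorted[k])
                i += 1
            j = j_end
    return out

r = [(2, "b"), (1, "a"), (2, "c")]
s = [(2, "x"), (3, "y"), (2, "z")]
joined = sort_merge_join(r, s)
print(joined)
```

The output comes out ordered on the join attribute, which is the "gives output in sorted order" property noted on the slide.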

2-WAY JOINS - OTHER ALGOS

use an existing index, or even build one on the fly
cost: br + nr * c (c: look-up cost)

2-WAY JOINS - OTHER ALGOS

hash join:
    hash r into (0, 1, ..., max) buckets
    hash s into buckets (same hash function)
    join each pair of matching buckets

[Figure: r(A, ...) and s(A, ......), each hashed into buckets 0, 1, ..., max]

2-WAY JOINS - HASH JOIN DETAILS

how to join each pair of partitions Hr-i, Hs-i ?
A: build another hash table for Hs-i, and probe it with each tuple of Hr-i

[Figure: r(A, ...) and s(A, ......) partitioned into buckets 0, 1, ..., max; Hr-0 is paired with Hs-0]

2-WAY JOINS - HASH JOIN DETAILS

what if Hs-i is too large to fit in main-memory?
A: recursive partitioning
more details (overflows, hybrid hash joins): in book
cost of hash join? (under certain assumptions)
    3(br + bs) + 2* max
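The partition-then-probe structure of the hash join can be sketched in memory. This is a minimal illustration, assuming every partition fits in memory (no recursive partitioning or overflow handling); the function name, the number of buckets and the sample tuples are hypothetical, and Python's built-in hash stands in for the partitioning hash function.

```python
from collections import defaultdict

def hash_join(r, s, n_buckets=4, key_r=0, key_s=0):
    """Partition r and s with the same hash function, then join bucket pairs."""
    hr = defaultdict(list)
    hs = defaultdict(list)
    for t in r:
        hr[hash(t[key_r]) % n_buckets].append(t)
    for t in s:
        hs[hash(t[key_s]) % n_buckets].append(t)

    out = []
    for i in range(n_buckets):
        # Build a second, in-memory hash table on partition Hs-i ...
        table = defaultdict(list)
        for ts in hs[i]:
            table[ts[key_s]].append(ts)
        # ... and probe it with each tuple of Hr-i.
        for tr in hr[i]:
            for ts in table.get(tr[key_r], []):
                out.append(tr + ts)
    return out

r = [(2, "b"), (1, "a"), (2, "c")]
s = [(2, "x"), (3, "y"), (2, "z")]
joined = sorted(hash_join(r, s))
print(joined)
```

Tuples with equal join values always land in buckets with the same number, which is why only matching bucket pairs need to be joined.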

Q-OPT STEPS
bring query in internal form (eg., parse tree)
into canonical form (syntactic q-opt)
generate alt. plans
    selections; sorting; projections, aggregations
    joins
        2-way joins - output size estimation; algorithms
        n-way joins
estimate cost; pick best

N-WAY JOINS
r1 JOIN r2 JOIN ... JOIN rn
typically, break problem into 2-way joins

STRUCTURE OF QUERY OPTIMIZERS:

System R:
    break query in query blocks
    simple queries (ie., no joins): look at stats
    n-way joins: left-deep join trees; ie., only one intermediate result at a time
        pros: smaller search space; pipelining
        cons: may miss optimal
    2-way joins: NL and sort-merge

[Figure: left-deep join tree over r1, r2, r3, r4]
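Restricting the search to left-deep trees means a plan is just an ordering of the relations. The sketch below enumerates every ordering by brute force, which is a simplification chosen for illustration and not System R's actual search procedure. The cost model (sum of estimated intermediate result sizes), the relation sizes and the join selectivities are all hypothetical.

```python
from itertools import permutations

def best_left_deep_order(sizes, selectivity):
    """Try every left-deep order; cost = sum of estimated intermediate result sizes.

    sizes: relation name -> number of tuples.
    selectivity: frozenset({a, b}) -> join selectivity; missing pairs mean
    no join predicate (cross product, selectivity 1.0).
    """
    best_order, best_cost = None, float("inf")
    for order in permutations(sizes):
        joined = [order[0]]
        current = sizes[order[0]]
        cost = 0.0
        for rel in order[1:]:
            sel = 1.0
            for prev in joined:
                sel *= selectivity.get(frozenset((prev, rel)), 1.0)
            current = current * sizes[rel] * sel   # size of the new intermediate result
            cost += current
            joined.append(rel)
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost

# Hypothetical statistics: r1 joins r2, and r2 joins r3.
sizes = {"r1": 1000, "r2": 100, "r3": 10}
selectivity = {
    frozenset(("r1", "r2")): 0.01,
    frozenset(("r2", "r3")): 0.1,
}
order, cost = best_left_deep_order(sizes, selectivity)
print(order, cost)
```

With these numbers, joining the two small relations first and bringing in r1 last is cheapest, while any order that starts with the r1, r3 cross product is roughly ten times more expensive. Enumerating all n! orders is only feasible for a handful of relations.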

STRUCTURE OF QUERY OPTIMIZERS:

More heuristics by Oracle, Sybase and Starburst (-> DB2) : in book
In general: q-opt is very important for large databases.
(explain select <sql-statement> gives plan)

Q-OPT STEPS
bring query in internal form (eg., parse tree)
into canonical form (syntactic q-opt)
generate alt. plans
    selections (simple; complex predicates)
    sorting; projections
    joins
estimate cost; pick best
