Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications
15-415 - C. Faloutsos
Q-OPT STEPS
bring query in internal form (eg., parse tree)
into canonical form (syntactic q-opt)
generate alt. plans
    selections
REMINDER STATISTICS:
[Figure: relation r as a sequence of records #1, #2, #3, ..., #nr, each of size Sr bytes]
DERIVABLE STATISTICS
fr: blocking factor = max# records/block (= B/Sr ; B: block size in bytes)
br: # blocks (= nr / fr)
DERIVABLE STATISTICS
SC(A,r) = selection cardinality = avg# of records with A=given (= nr / V(A,r)) (assumes uniformity...)
eg: 10,000 students, 10 colleges: how many students in SCS?
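The derivable statistics can be sketched in a few lines of Python. This is only an illustration: the function names, the block size of 4096 bytes and the record size of 100 bytes are assumptions, not values from the slides. The sketch rounds fr down and br up, assuming records do not span blocks.

```python
import math

def blocking_factor(block_size_bytes: int, record_size_bytes: int) -> int:
    # fr = B / Sr, rounded down (assumes a record never spans two blocks)
    return block_size_bytes // record_size_bytes

def num_blocks(n_records: int, fr: int) -> int:
    # br = nr / fr, rounded up (a partially filled last block still counts)
    return math.ceil(n_records / fr)

def selection_cardinality(n_records: int, distinct_values: int) -> float:
    # SC(A, r) = nr / V(A, r), assuming values of A are uniformly distributed
    return n_records / distinct_values

fr = blocking_factor(4096, 100)         # hypothetical B and Sr
br = num_blocks(10_000, fr)
sc = selection_cardinality(10_000, 10)  # the students/colleges example
print(fr, br, sc)
```

Under the uniformity assumption, the students example works out to 10,000 / 10 = 1,000 students per college.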
SELECTIONS
we saw simple predicates (A=constant; eg., name=Smith)
how about more complex predicates, like ranges (A > a), negation (not P), conjunctions (P1 and P2) and disjunctions (P1 or P2)?
[Figures: three histograms of count vs. grade; the last one marks a predicate P]
P1 and P2, eg., (grade = C and course = 415):
sel(P1 and P2) = sel(P1) * sel(P2)
INDEPENDENCE ASSUMPTION
[Figure: Venn diagram of P1 and P2]
P1 or P2, eg., (grade = C or course = 415):
sel(P1 or P2) = sel(P1) + sel(P2) - sel(P1 and P2)
              = sel(P1) + sel(P2) - sel(P1)*sel(P2)
INDEPENDENCE ASSUMPTION, again
[Figure: Venn diagram of P1 and P2]
SELECTIONS SUMMARY
sel(A=constant) = 1/V(A,r)
sel(A>a) = (Amax - a) / (Amax - Amin)
sel(not P) = 1 - sel(P)
sel(P1 and P2) = sel(P1) * sel(P2)
sel(P1 or P2) = sel(P1) + sel(P2) - sel(P1)*sel(P2)
UNIFORMITY
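The selectivity rules can be written as small Python functions. This is a minimal sketch: the function names and the example numbers (5 distinct grades, 20 distinct courses) are hypothetical, and the formulas hold only under the uniformity and independence assumptions.

```python
def sel_eq(distinct_values: int) -> float:
    # sel(A = constant) = 1 / V(A, r), assuming uniformity
    return 1.0 / distinct_values

def sel_gt(a: float, a_min: float, a_max: float) -> float:
    # sel(A > a) = (Amax - a) / (Amax - Amin), clamped to [0, 1]
    frac = (a_max - a) / (a_max - a_min)
    return min(1.0, max(0.0, frac))

def sel_not(p: float) -> float:
    return 1.0 - p

def sel_and(p1: float, p2: float) -> float:
    # independence assumption
    return p1 * p2

def sel_or(p1: float, p2: float) -> float:
    # inclusion-exclusion, with independence for the overlap term
    return p1 + p2 - p1 * p2

p_grade = sel_eq(5)    # hypothetical: 5 distinct grades
p_course = sel_eq(20)  # hypothetical: 20 distinct courses
both = sel_and(p_grade, p_course)
either = sel_or(p_grade, p_course)
print(both, either, sel_gt(75, 0, 100))
```

Multiplying a selectivity by nr gives the estimated number of qualifying tuples.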
Q-OPT STEPS
bring query in internal form (eg., parse tree)
into canonical form (syntactic q-opt)
generate alt. plans
    selections
SORTING
Assume br blocks of rel. r, and only M (<br) buffers in main memory
Q1: how to sort (external sorting)?
Q2: cost?
[Figure: M main-memory buffers (1 ... M) and relation r on disk (blocks 1, 2, ..., br)]
SORTING
Q1: how to sort (external sorting)?
A1:
    create sorted runs of size M
    merge
SORTING
create sorted runs of size M
merge first M-1 runs into a sorted run of (M-1)*M blocks, ...
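The two phases (create runs, then merge M-1 runs at a time) can be simulated in memory. The sketch below is an illustration under stated assumptions: `external_sort` is a hypothetical name, a "block" is modelled as a Python list of records, and no real disk I/O happens. One buffer is reserved for output, so each merge step reads from at most M-1 runs.

```python
import heapq

def external_sort(blocks, M):
    """Sort a list of blocks using M buffers; return (records, merge passes)."""
    if M < 3:
        raise ValueError("need at least 3 buffers to merge 2 runs at a time")
    # Phase 1: read M blocks at a time, sort them in memory, write a run
    runs = []
    for i in range(0, len(blocks), M):
        chunk = [rec for blk in blocks[i:i + M] for rec in blk]
        runs.append(sorted(chunk))
    # Phase 2: repeatedly merge groups of M-1 runs into longer runs
    passes = 0
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + M - 1]))
                for i in range(0, len(runs), M - 1)]
        passes += 1
    return (runs[0] if runs else []), passes

values = list(range(24, 0, -1))                        # 24 records, descending
blocks = [values[i:i + 2] for i in range(0, 24, 2)]    # 12 blocks of 2 records
result, passes = external_sort(blocks, M=3)
print(result[:5], passes)
```

With br = 12 and M = 3, phase 1 produces 4 runs, and two 2-way merge passes reduce them to one.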
SORTING
How many merge passes? ceil(log(br/M) / log(M-1))
How many reads/writes per pass? 2 * br
In total (run creation included, final write excluded):
ceil(log(br/M) / log(M-1)) * 2 * br + br
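The cost formula can be checked numerically. In this sketch the function names and the example values (br = 1000, M = 11) are assumptions. The number of merge passes is counted with integer arithmetic rather than floating-point logarithms, which gives the same value as ceil(log(br/M) / log(M-1)) while avoiding rounding issues.

```python
import math

def merge_passes(br: int, M: int) -> int:
    # smallest i such that (M-1)^i >= number of initial runs
    if M < 3:
        raise ValueError("need at least 3 buffers")
    runs = math.ceil(br / M)
    passes = 0
    while runs > 1:
        runs = math.ceil(runs / (M - 1))
        passes += 1
    return passes

def external_sort_cost(br: int, M: int) -> int:
    # each merge pass reads and writes every block (2 * br);
    # run creation costs 2 * br, and the final write (br) is not counted
    return merge_passes(br, M) * 2 * br + br

print(merge_passes(1000, 11), external_sort_cost(1000, 11))
```

For br = 1000 and M = 11 there are 91 initial runs, so two 10-way merge passes suffice and the estimate is 5,000 disk accesses.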
Q-OPT STEPS
bring query in internal form (eg., parse tree)
into canonical form (syntactic q-opt)
generate alt. plans
    selections
SET OPERATIONS
eg.,
    select * from REGULAR-STUDENT
    union
    select * from SPECIAL-STUDENT
How? Pros and cons?
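The slide leaves "How?" open. Two common options are sorting and hashing, and the sketch below illustrates both on in-memory lists. The function names and the sample tuples are hypothetical, and this is not necessarily the answer given in class. SQL `union` eliminates duplicates, so both versions do too.

```python
import heapq

def sorted_union(r, s):
    # sort both inputs, merge them, and skip adjacent duplicates
    out = []
    for t in heapq.merge(sorted(r), sorted(s)):
        if not out or out[-1] != t:
            out.append(t)
    return out

def hashed_union(r, s):
    # keep a hash set of the tuples seen so far
    seen = set()
    out = []
    for t in list(r) + list(s):
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out

regular = [(3, "c"), (1, "a")]
special = [(2, "b"), (1, "a")]
print(sorted_union(regular, special))
print(hashed_union(regular, special))
```

The sort-based version yields sorted output, which is useful if a later operator needs that order. The hash-based version needs no sort but assumes the set of distinct tuples fits in memory.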
AGGREGATIONS
eg.,
    select ssn, avg(grade)
    from TAKES
    group by ssn
How?
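One possible answer to "How?" is hash-based grouping: keep a running (sum, count) per ssn and divide at the end. The sketch below assumes TAKES is available as a list of (ssn, grade) pairs with numeric grades. The function name and sample data are hypothetical. Sorting on ssn and scanning would be the other usual option.

```python
from collections import defaultdict

def avg_grade_by_ssn(takes):
    # one [sum, count] accumulator per group, found by hashing on ssn
    acc = defaultdict(lambda: [0.0, 0])
    for ssn, grade in takes:
        entry = acc[ssn]
        entry[0] += grade
        entry[1] += 1
    return {ssn: total / count for ssn, (total, count) in acc.items()}

takes = [(123, 4.0), (234, 3.0), (123, 2.0)]  # hypothetical (ssn, grade) rows
averages = avg_grade_by_ssn(takes)
print(averages)
```

Only one accumulator per distinct ssn is kept, so the input is read once and never has to be sorted.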
Q-OPT STEPS
bring query in internal form (eg., parse tree)
into canonical form (syntactic q-opt)
generate alt. plans
    selections; joins
2-WAY JOINS
output size estimation: r JOIN s
nr, ns tuples each
case#1: cartesian product (R, S have no common attribute)
#of output tuples=??
2-WAY JOINS
output size estimation: r JOIN s
case#2: r(A,B), s(A,C,D), A is cand. key for r
#of output tuples=??
2-WAY JOINS
output size estimation: r JOIN s
case#3: r(A,B), s(A,C,D), A is cand. key for neither (is it possible??)
#of output tuples=??
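The three cases can be sketched as estimators. The slides leave the answers as "??", so the formulas below are the usual textbook estimates and should be read as assumptions, as should the function names and example numbers. In case #2 each tuple of s matches at most one tuple of r, so ns is an upper bound. In case #3 uniformity gives two candidate estimates, and this sketch takes the smaller.

```python
def size_cartesian(nr: int, ns: int) -> int:
    # case #1: no common attribute, every pair of tuples is output
    return nr * ns

def size_key_join_upper_bound(ns: int) -> int:
    # case #2: A is a candidate key for r, so each s-tuple joins
    # with at most one r-tuple
    return ns

def size_general(nr: int, ns: int, v_a_r: int, v_a_s: int) -> float:
    # case #3: each r-tuple matches about ns / V(A, s) s-tuples, or
    # symmetrically each s-tuple matches about nr / V(A, r) r-tuples
    return min(nr * ns / v_a_s, nr * ns / v_a_r)

# hypothetical statistics
nr, ns = 10_000, 1_000
print(size_cartesian(nr, ns))
print(size_key_join_upper_bound(ns))
print(size_general(nr, ns, v_a_r=500, v_a_s=100))
```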
Q-OPT STEPS
bring query in internal form (eg., parse tree)
into canonical form (syntactic q-opt)
generate alt. plans
    selections; joins
2-WAY JOINS
algorithm(s) for r JOIN s?
nr, ns tuples each
2-WAY JOINS
Algorithm #0: (naive) nested loop (SLOW!)
for each tuple tr of r
    for each tuple ts of s
        print, if they match
2-WAY JOINS
Algorithm #0: why is it bad?
how many disk accesses (br and bs are the number of blocks for r and s)?
br + nr*bs
2-WAY JOINS
Algorithm #1:
read in a block of r
    read in a block of s
        print matching tuples
cost: br + br * bs
2-WAY JOINS
Arithmetic example:
r(A, ...): 10,000 records, 1,000 blocks
s(A, ......): 1,000 records, 200 blocks
alg#0: 2,001,000 d.a.
alg#1: 201,000 d.a.
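The arithmetic can be reproduced directly from the two cost formulas, br + nr*bs for alg#0 and br + br*bs for alg#1. The function names are hypothetical. The last two lines swap the roles of r and s as an extra check that is not on the slide.

```python
def cost_alg0(n_outer: int, b_outer: int, b_inner: int) -> int:
    # tuple-at-a-time nested loop: scan the inner relation once per outer tuple
    return b_outer + n_outer * b_inner

def cost_alg1(b_outer: int, b_inner: int) -> int:
    # block-at-a-time nested loop: scan the inner relation once per outer block
    return b_outer + b_outer * b_inner

nr, br = 10_000, 1_000   # r: 10,000 records, 1,000 blocks
ns, bs = 1_000, 200      # s: 1,000 records, 200 blocks

print(cost_alg0(nr, br, bs))   # r as outer relation
print(cost_alg1(br, bs))
print(cost_alg0(ns, bs, br))   # s as outer relation
print(cost_alg1(bs, br))
```

With s as the outer relation the same formulas give 1,000,200 and 200,200 disk accesses, so the choice of outer relation also matters.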
2-WAY JOINS
Observation2 [NOT IN BOOK]:
what if we have k buffers available?
read in k-1 blocks of r
    read in a block of s
        print matching tuples
[Figure: r(A, ...) with nr records, br blocks; s(A, ......) with ns records, bs blocks]
2-WAY JOINS
read in k-1 blocks of r
    read in a block of s
        print matching tuples
Cost? br + br/(k-1) * bs
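The k-buffer variant can be sketched as follows, with a counter for simulated disk accesses. Everything here is an assumption made for illustration: `block_nested_loop_join` is a hypothetical name, relations are in-memory lists of blocks, and each block is a list of tuples joined on their first field. When br is not a multiple of k-1, the number of outer chunks is rounded up.

```python
def block_nested_loop_join(r_blocks, s_blocks, k):
    """Join on the first field using k buffers; return (pairs, disk accesses)."""
    if k < 2:
        raise ValueError("need at least one buffer for r and one for s")
    out = []
    accesses = 0
    chunk = k - 1  # k-1 buffers hold blocks of r, 1 buffer holds a block of s
    for i in range(0, len(r_blocks), chunk):
        outer = r_blocks[i:i + chunk]
        accesses += len(outer)        # read in k-1 blocks of r
        for s_block in s_blocks:
            accesses += 1             # read in a block of s
            for r_block in outer:
                for tr in r_block:
                    for ts in s_block:
                        if tr[0] == ts[0]:
                            out.append((tr, ts))
    return out, accesses

r_blocks = [[(1, "a"), (2, "b")], [(3, "c"), (4, "d")],
            [(5, "e"), (6, "f")], [(7, "g"), (8, "h")]]
s_blocks = [[(2, "x"), (4, "y")], [(6, "z"), (9, "w")], [(1, "v"), (1, "u")]]

pairs, accesses = block_nested_loop_join(r_blocks, s_blocks, k=3)
print(len(pairs), accesses)
```

Here br = 4, bs = 3 and k = 3, so the count is 4 + (4/2) * 3 = 10 accesses. With k = 2 the same code gives br + br * bs = 16.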
2-WAY JOINS
cost: br + br * bs
A: read the inner relation backwards half of the times!
Q: cons?
2-WAY JOINS
Other algorithm(s) for r JOIN s?
nr, ns tuples each
sort-merge:
    sort r; sort s; merge sorted versions
    (good, if one or both are already sorted)
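A minimal in-memory sketch of sort-merge join, assuming tuples are joined on their first field. The function name and sample data are hypothetical. Groups of equal keys on both sides are paired with each other, which handles duplicate join values.

```python
def sort_merge_join(r, s, key=lambda t: t[0]):
    r = sorted(r, key=key)   # could be skipped if r is already sorted
    s = sorted(s, key=key)   # likewise for s
    out = []
    i = j = 0
    while i < len(r) and j < len(s):
        kr, ks = key(r[i]), key(s[j])
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            # find the group of tuples with this key on each side
            i_end = i
            while i_end < len(r) and key(r[i_end]) == kr:
                i_end += 1
            j_end = j
            while j_end < len(s) and key(s[j_end]) == kr:
                j_end += 1
            for tr in r[i:i_end]:
                for ts in s[j:j_end]:
                    out.append((tr, ts))
            i, j = i_end, j_end
    return out

r = [(2, "b"), (1, "a"), (2, "c")]
s = [(2, "x"), (3, "y"), (2, "z")]
joined = sort_merge_join(r, s)
print(joined)
```

After sorting, each input is scanned once, which is why the method is attractive when one or both relations are already sorted.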
hash join:
hash r into (0, 1, ..., max) buckets
hash s into buckets (same hash function)
join each pair of matching buckets
[Figure: r(A, ...) and s(A, ......), each partitioned into buckets 0, 1, ..., max]
how to join each pair of partitions Hr-i, Hs-i?
A: build another hash table for Hs-i, and probe it with each tuple of Hr-i
[Figure: r(A, ...) and s(A, ......) with partitions Hr-0, ... and Hs-0, ... for buckets 0, 1, ..., max]
what if Hs-i is too large to fit in main-memory?
A: recursive partitioning
more details (overflows, hybrid hash joins): in book
cost of hash join? (under certain assumptions)
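The partition-then-probe idea can be sketched in memory as below. The assumptions are: the function name and the bucket count of 4 are hypothetical, tuples are joined on their first field, and every partition Hs-i fits in memory, so recursive partitioning and overflow handling are not shown.

```python
from collections import defaultdict

def hash_join(r, s, n_buckets=4, key=lambda t: t[0]):
    # Partitioning: the same hash function for r and s, so matching tuples
    # always land in partitions with the same index
    hr = [[] for _ in range(n_buckets)]
    hs = [[] for _ in range(n_buckets)]
    for t in r:
        hr[hash(key(t)) % n_buckets].append(t)
    for t in s:
        hs[hash(key(t)) % n_buckets].append(t)

    out = []
    for hr_i, hs_i in zip(hr, hs):
        # Build: an in-memory hash table on partition Hs-i
        table = defaultdict(list)
        for ts in hs_i:
            table[key(ts)].append(ts)
        # Probe: look up each tuple of Hr-i
        for tr in hr_i:
            for ts in table.get(key(tr), []):
                out.append((tr, ts))
    return out

r = [(1, "a"), (2, "b"), (5, "e")]
s = [(2, "x"), (5, "y"), (5, "z"), (7, "w")]
joined = hash_join(r, s)
print(sorted(joined))
```

The output order depends on which bucket each key falls into, so the example sorts the result before printing.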
Q-OPT STEPS
bring query in internal form (eg., parse tree)
into canonical form (syntactic q-opt)
generate alt. plans
    selections; joins
N-WAY JOINS
r1 JOIN r2 JOIN ... JOIN rn
typically, break problem into 2-way joins
break query in query blocks
simple queries (ie., no joins): look at stats
n-way joins: left-deep join trees; ie., only one intermediate result at a time
    pros: smaller search space; pipelining
    cons: may miss optimal
2-way
[Figure: join tree over r1, r2, r3, r4]
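The "smaller search space" point can be made concrete by counting join trees. This sketch rests on assumptions of its own: the function names are hypothetical, cross products are allowed, and the left and right inputs of a join are distinguished. Under those assumptions there are n! left-deep orders and (2(n-1))! / (n-1)! binary join trees of any shape.

```python
from math import factorial

def left_deep_orders(n: int) -> int:
    # one left-deep tree per permutation of the n relations
    return factorial(n)

def all_join_trees(n: int) -> int:
    # n! leaf orderings times Catalan(n-1) tree shapes = (2(n-1))! / (n-1)!
    return factorial(2 * (n - 1)) // factorial(n - 1)

for n in (2, 4, 8):
    print(n, left_deep_orders(n), all_join_trees(n))
```

For four relations this gives 24 left-deep orders against 120 trees overall, and the gap widens quickly as n grows.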
Q-OPT STEPS
bring query in internal form (eg., parse tree)
into canonical form (syntactic q-opt)
generate alt. plans
    selections