3 - QueryProcessing - Ch15

The document discusses different file organizations and indexing strategies for database queries and updates. It covers heap files, sorted files, B-tree indexes, hash indexes, and their suitability for different query types like scans, searches, range selections, inserts and deletes. The key aspects of workload, index selection guidelines, trade-offs between query performance and update costs, and examples of clustered indexes are explained. External sorting techniques are also introduced to efficiently sort large datasets that do not fit in memory.

Uploaded by

modyxstar

Algorithms for Query Processing and Optimization
Chapter Outline

0. Introduction to Query Processing


1. Translating SQL Queries into Relational Algebra
2. External Sorting
3. Algorithms for SELECT and JOIN Operations
4. Algorithms for PROJECT and SET Operations
5. Implementing Aggregate Operations and Outer Joins
Different File Organizations
We need to understand the importance of appropriate file organization and indexing.
Running example: search key = <age, sal>

Consider following options:

• Heap files
– random order; insert at end-of-file
• Sorted files
– sorted on <age, sal>
• Clustered B+ tree file
– search key <age, sal>
• Heap file with unclustered B+-tree index
– on search key <age, sal>
• Heap file with unclustered hash index
– on search key <age, sal>
Possible Operations
• Scan
– Fetch all records from disk to buffer pool
• Equality search
– Find all employees with age = 23 and sal = 50
– Fetch page from disk, then locate qualifying record in page
• Range selection
– Find all employees with age > 35
• Insert a record
– identify the page, fetch that page from disk, insert the record, write back
to disk (possibly other pages as well)
• Delete a record
– similar to insert
Understanding the Workload
• A workload is a mix of queries and updates

• For each query in the workload:


– Which relations does it access?
– Which attributes are retrieved?
– Which attributes are involved in selection/join conditions? How
selective are these conditions likely to be?

• For each update in the workload:


– Which attributes are involved in selection/join conditions? How
selective are these conditions likely to be?
– The type of update (INSERT/DELETE/UPDATE), and the attributes that are
affected
Choice of Indexes

• What indexes should we create?


– Which relations should have indexes? What field(s)
should be the search key? Should we build several
indexes?

• For each index, what kind of an index should it be?


– Clustered? Hash/tree?
More on Choice of Indexes
• One approach:
– Consider the most important queries
– Consider the best plan using the current indexes
– See if a better plan is possible with an additional index.
– If so, create it.
– Obviously, this implies that we must understand how a DBMS
evaluates queries and creates query evaluation plans
– We will learn query execution and optimization later - For
now, we discuss simple 1-table queries.

• Before creating an index, must also consider the impact


on updates in the workload
Trade-offs for Indexes
• Indexes can make
– queries go faster
– updates slower

• Require disk space, too


Index Selection Guidelines
• Attributes in WHERE clause are candidates for index keys
– Exact match condition suggests hash index
– Range query suggests tree index
– Clustering is especially useful for range queries
• can also help on equality queries if there are many duplicates

• Try to choose indexes that benefit as many queries as possible


– Since only one index can be clustered per relation, choose it based on
important queries that would benefit the most from clustering

• Multi-attribute search keys should be considered when a WHERE clause


contains several conditions
– Order of attributes is important for range queries

• Note: clustered index should be used judiciously


– expensive updates, although cheaper than sorted files
Examples of Clustered Indexes

SELECT E.dno
FROM Emp E
WHERE E.age>40

What is a good indexing strategy? Which attribute(s)? Clustered/Unclustered? B+ tree/Hash?

• A B+ tree index on E.age can be used to get qualifying tuples
• How selective is the condition?
– if everyone is > 40, the index is not of much help; a scan is as good
– suppose 10% are > 40. Then?
• Depends on whether the index is clustered
– otherwise it can be more expensive than a linear scan
– if clustered, about 10% of the I/O (+ index pages)
Examples of Clustered Indexes

Group-By query:

SELECT E.dno, COUNT (*)
FROM Emp E
WHERE E.age>10
GROUP BY E.dno

What is a good indexing strategy? Which attribute(s)? Clustered/Unclustered? B+ tree/Hash?

• Use E.age as search key?
– bad if many tuples have E.age > 10 or if the index is not clustered…
– …using the E.age index and sorting the retrieved tuples by E.dno may be costly
• A clustered E.dno index may be better
– first group by E.dno, then count tuples with age > 10
– good when age > 10 is not too selective
• Note: the first option is good when the WHERE condition is highly selective (few tuples have age > 10); the second is good when it is not
Examples of Clustered Indexes

Equality queries and duplicates:

SELECT E.dno
FROM Emp E
WHERE E.hobby='Stamps'

What is a good indexing strategy? Which attribute(s)? Clustered/Unclustered? B+ tree/Hash?

• Clustering on E.hobby helps
– hobby is not a candidate key; several matching tuples are possible

SELECT E.dno
FROM Emp E
WHERE E.eid=50

• Does clustering help now?
– not much: eid is a key, so at most one tuple satisfies the condition
Indexes with Composite Search Keys
• Composite search keys: search on a combination of fields
• Equality query: every field value is equal to a constant. E.g., w.r.t. a <sal,age> index:
– age=20 and sal=75
• Range query: some field value is not a constant. E.g.:
– sal > 10: which combination(s) would help?
– <age, sal> does not help
– a B+ tree on <sal> or <sal, age> helps
– the queried fields must form a prefix of the search key

[Figure: examples of composite-key indexes using lexicographic order: data entries sorted by <age,sal> (11,80; 12,10; 12,20; 13,75), by <age>, by <sal,age> (10,12; 20,12; 75,13; 80,11), and by <sal>, over data records bob(12,10), cal(11,80), joe(12,20), sue(13,75) sorted by name]
Composite Search Keys
• To retrieve Emp records with age=30 AND sal=4000, an index on
<age,sal> would be better than an index on age or an index on sal
– first find age = 30, among them search sal = 4000

• If condition is: 20<age<30 AND 3000<sal<5000:


– Clustered tree index on <age,sal> or <sal,age> is best.

• If condition is: age=30 AND 3000<sal<5000:


– Clustered <age,sal> index much better than <sal,age> index
– more index entries are retrieved for the latter

• Composite indexes are larger, updated more often


Index-Only Plans
• A number of queries can be answered without retrieving any tuples from one or more of the relations involved, if a suitable index is available

SELECT E.dno, COUNT(*)
FROM Emp E
GROUP BY E.dno
(tree index on <E.dno>)

SELECT E.dno, MIN(E.sal)
FROM Emp E
GROUP BY E.dno
(tree index on <E.dno,E.sal>)

SELECT AVG(E.sal)
FROM Emp E
WHERE E.age=25 AND E.sal BETWEEN 3000 AND 5000
(tree index on <E.age,E.sal>)

• For index-only strategies, clustering is not important
External Sorting
Why Sort?
• A classic problem in computer science
• Data requested in sorted order
– e.g., find students in increasing gpa order
• Sorting is first step in bulk loading B+ tree index
• Sorting useful for eliminating duplicate copies in a
collection of records
• Sort-merge join algorithm involves sorting
• Problem: sort 1 GB of data with 1 MB of RAM
– need to minimize the cost of disk access
2-Way Sort: Requires 3 Buffers
• Suppose N = 2^k pages in the file
• Pass 0: read a page, sort it, write it
– repeat for all 2^k pages
– only one buffer page is used
• Pass 1:
– read two pages, merge them using one output page, write them to disk
– repeat 2^(k-1) times
– three buffer pages used
• Pass 2, 3, 4, …: continue

[Figure: two-way merge uses three main-memory buffers (INPUT 1, INPUT 2, OUTPUT) between reading from disk and writing to disk]
Two-Way External Merge Sort
• Each sorted sub-file is called a run
– each run can contain multiple pages
• Each pass we read + write each page in the file
• With N pages in the file, the number of passes = ⌈log2 N⌉ + 1
• So total cost is: 2N (⌈log2 N⌉ + 1)
• Not too practical, but useful to learn the basic concepts of external sorting

[Figure: two-way external merge sort of a 7-page input file (3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2): Pass 0 produces seven 1-page runs; Pass 1 four 2-page runs; Pass 2 two 4-page runs; Pass 3 the final sorted run 1,2 2,3 3,4 4,5 6,6 7,8 9]
General External Merge Sort
• Suppose we have more than 3 buffer pages.
• How can we utilize them?
• To sort a file with N pages using B buffer pages:
– Pass 0: use B buffer pages:
• Produce ⌈N/B⌉ sorted runs of B pages each.
– Pass 1, 2, …, etc.: merge B-1 runs at a time, using one output page
• keep writing to disk whenever the output page is full

[Figure: B main-memory buffers: B-1 input buffers (INPUT 1 … INPUT B-1) and one OUTPUT buffer, streaming runs from disk to disk]
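The B-buffer scheme above can be sketched in Python. This is a minimal in-memory simulation: pages are small lists of records, and `external_sort` and `PAGE_SIZE` are illustrative names, not the chapter's notation.

```python
import heapq

PAGE_SIZE = 4  # records per simulated page (an assumption for illustration)

def external_sort(pages, B):
    """Sort a file of pages using at most B buffer pages (B >= 3)."""
    # Pass 0: read B pages at a time, sort in memory -> ceil(N/B) runs.
    runs = []
    for i in range(0, len(pages), B):
        records = sorted(r for page in pages[i:i + B] for r in page)
        runs.append(records)
    # Passes 1, 2, ...: repeatedly merge B-1 runs at a time.
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + B - 1]))
                for i in range(0, len(runs), B - 1)]
    flat = runs[0] if runs else []
    # Repack the sorted records into pages.
    return [flat[i:i + PAGE_SIZE] for i in range(0, len(flat), PAGE_SIZE)]
```

With N = len(pages), pass 0 produces ⌈N/B⌉ runs and each later pass shrinks the run count by a factor of B-1, which is exactly what the cost formula on the next slide counts.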
Cost of External Merge Sort
• Number of passes: 1 + ⌈log_(B-1) ⌈N/B⌉⌉
• Cost = 2N * (# of passes) – why 2 times?
• E.g., with 5 buffer pages, to sort 108 page file:
• Pass 0: sorting 5 pages at a time
– ⌈108/5⌉ = 22 sorted runs of 5 pages each (last run is only 3
pages)
• Pass 1: 4-way merge
– ⌈22/4⌉ = 6 sorted runs of 20 pages each (last run is only 8 pages)
• Pass 2: 4-way merge
– (but 2-way for the last two runs)
– ⌈6/4⌉ = 2 sorted runs, 80 pages and 28 pages
• Pass 3: 2-way merge (only 2 runs remaining)
– Sorted file of 108 pages
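The pass count and I/O formula can be checked numerically; a small sketch (function names are illustrative):

```python
import math

def num_passes(N, B):
    """Passes of external merge sort: 1 (pass 0) + ceil(log base B-1 of run count)."""
    runs = math.ceil(N / B)  # sorted runs after pass 0
    if runs <= 1:
        return 1
    return 1 + math.ceil(math.log(runs, B - 1))

def sort_io_cost(N, B):
    # Each pass reads and writes every page once.
    return 2 * N * num_passes(N, B)
```

For the example above, num_passes(108, 5) gives 4 (pass 0 plus three merge passes), and the table of passes on the next slide can be reproduced the same way.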
Number of Passes of External Sort
High B is good, although CPU cost increases

N B=3 B=5 B=9 B=17 B=129 B=257


100 7 4 3 2 1 1
1,000 10 5 4 3 2 2
10,000 13 7 5 4 2 2
100,000 17 9 6 5 3 3
1,000,000 20 10 7 5 3 3
10,000,000 23 12 8 6 4 3
100,000,000 26 14 9 7 4 4
1,000,000,000 30 15 10 8 5 4
I/O for External Merge Sort
• If 10 buffer pages
– either merge 9 runs at a time with one output buffer
– or 8 runs with two output buffers
• If #page I/O is the metric
– goal is minimize the #passes
– each page is read and written in each pass
• If we decide to read a block of b pages sequentially
– suggests we should make each buffer (input/output) a block of pages
– but this will reduce fan-out during merge passes
• i.e., fewer runs can be merged at a time
– in practice, most files are still sorted in 2-3 passes
Double Buffering
• To reduce CPU wait time for I/O requests to complete, prefetch into a "shadow block"

[Figure: each of the k input buffers (INPUT 1 … INPUT k) and the OUTPUT buffer is paired with a shadow block (INPUT 1', …, INPUT k', OUTPUT') of b pages; B main-memory buffers, k-way merge]


Overview of Query Evaluation
Overview of Query Evaluation
• How queries are evaluated in a DBMS
– How DBMS describes data (tables and indexes)

• Relational Algebra Tree/Plan = Logical Query Plan

• Now Algorithms will be attached to each operator =


Physical Query Plan

• Plan = Tree of RA ops, with choice of algorithm for each op.


– Each operator typically implemented using a “pull” interface
– when an operator is “pulled” for the next output tuples, it
“pulls” on its inputs and computes them
Overview of Query Evaluation
• Two main issues in query optimization:

1. For a given query, what plans are considered?


– Algorithm to search plan space for cheapest
(estimated) plan
2. How is the cost of a plan estimated?

• Ideally: Want to find best plan


• Practically: Avoid worst plans!
Assumption: ignore final write

• i.e. assume that your final results can be left in


memory
– and need not be written back to disk
– unless mentioned otherwise

• Why such an assumption?


Algorithms for Joins
Equality Joins With One Join Column
SELECT *
FROM Reserves R, Sailors S
WHERE R.sid=S.sid

• In algebra: R ⨝ S
– Common! Must be carefully optimized
– R × S is large; so computing R × S and then applying a selection is inefficient

• Cost metric: # of I/Os


– Remember, we will ignore output costs (always)
= the cost to write the final result tuples back to the disk
Common Join Algorithms
1. Nested Loops Joins (NLJ)
– Simple nested loop join
– Block nested loop join

2. Sort Merge Join Very similar to external sort

3. Hash Join
Algorithms for Joins
1. NESTED LOOP JOINS

Simple Nested Loops Join

M = 1000 pages in R, pR = 100 tuples per page
N = 500 pages in S, pS = 80 tuples per page

R ⨝ S:
foreach tuple r in R do
foreach tuple s in S where ri == sj do
add <r, s> to result

• For each tuple in the outer relation R, we scan the entire inner relation S.
– Cost: M + (pR * M) * N = 1000 + 100*1000*500 I/Os.

• Page-oriented Nested Loops join:


– For each page of R, get each page of S
– and write out matching pairs of tuples <r, s>
– where r is in the R-page and s is in the S-page
– Cost: M + M*N = 1000 + 1000*500
How many buffer pages
do you need?
• If smaller relation (S) is outer
– Cost: N + M*N = 500 + 500*1000
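The I/O formulas on this slide can be written out directly (a small sketch; function names are illustrative):

```python
def simple_nlj_cost(M, pR, N):
    # Tuple-at-a-time: scan R once; scan all of S for each of the pR*M tuples of R.
    return M + (pR * M) * N

def page_nlj_cost(outer_pages, inner_pages):
    # Page-at-a-time: scan the outer once; scan the inner once per outer page.
    return outer_pages + outer_pages * inner_pages
```

With M=1000, pR=100, N=500 this reproduces the slide's numbers, and page_nlj_cost(500, 1000) < page_nlj_cost(1000, 500) shows why the smaller relation should be the outer.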
Block Nested Loops Join
• Simple nested loops does not properly utilize buffer pages (uses only 3 pages)
• Suppose we have enough memory to hold the smaller relation, plus at least two other pages
– e.g., in the example on the previous slide (S is smaller), we need 500 + 2 = 502 buffer pages
• Then use one page as an input buffer for scanning the other relation, and one page as the output buffer
– for each matching pair of tuples, add <r, s> to the result
• Total I/O = M + N
• What if the entire smaller relation does not fit?

[Figure: the entire smaller relation held in memory, with one input buffer for scanning the other relation and one output buffer; results written to the join result]
Block Nested Loops Join
• If R does not fit in memory,
– use one page as an input buffer for scanning the inner S
– one page as the output buffer
– and use all remaining pages to hold a "block" of the outer R
– for each matching tuple r in R-block, s in S-page, add <r, s> to result
– then read the next R-block, scan S, etc.

[Figure: a block of R (k <= B-2 pages) in memory, one input buffer for S, one output buffer]
Cost of Block Nested Loops (in class)

M = 1000 pages in R, pR = 100 tuples per page
N = 500 pages in S, pS = 80 tuples per page

foreach block of B-2 pages of R do
foreach page of S do
for all matching in-memory tuples r in R-block and s in S-page
add <r, s> to result

• R is outer
• B-2 = 100-page blocks
• How many blocks of R?
• Cost to scan R?
• Cost to scan S?
• Total cost?

[Figure: a block of R (k <= B-2 pages) in memory, one input buffer for S, one output buffer]
Cost of Block Nested Loops

M = 1000 pages in R, pR = 100 tuples per page
N = 500 pages in S, pS = 80 tuples per page

• R is outer, B-2 = 100-page blocks
• How many blocks of R? 10
• Cost to scan R? 1000
• Cost to scan S? 10 * 500
• Total cost? 1000 + 5000 = 6000
• Cost: scan of outer + #outer blocks * scan of inner
– #outer blocks = ⌈#pages of outer relation / blocksize⌉
• (check yourself) If there is space for just 90 pages of R, we would scan S 12 times; cost = 7000
• For blocked access, it might be good to equally divide buffer pages among R and S ("seek time" less)

[Figure: a block of R (k <= B-2 pages) in memory, one input buffer for S, one output buffer]
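The block nested loops idea can be sketched in Python; tuples stand in for pages, `block_size` plays the role of B-2, and the per-block dictionary is one way to find the "matching in-memory tuples" (the names and the dict lookup are illustrative assumptions, not the chapter's notation).

```python
def block_nested_loops_join(R, S, block_size, key_r, key_s):
    """R is the outer relation, read block_size items at a time;
    S, the inner, is rescanned once per outer block."""
    result = []
    for i in range(0, len(R), block_size):
        block = R[i:i + block_size]
        # Build an in-memory lookup for the current outer block.
        lookup = {}
        for r in block:
            lookup.setdefault(key_r(r), []).append(r)
        for s in S:  # one full scan of S per outer block
            for r in lookup.get(key_s(s), []):
                result.append((r, s))
    return result
```

The cost structure mirrors the slide: one pass over R plus ⌈len(R)/block_size⌉ passes over S.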
Algorithms for Joins
2. SORT-MERGE JOINS
Sort-Merge Join
• Sort R and S on the join column
• Then scan them to do a "merge" (on the join column)
• Output result tuples
Sort-Merge Join: 1/3
• Advance scan of R until current R-tuple >= current S-tuple
– then advance scan of S until current S-tuple >= current R-tuple
– do this until current R-tuple = current S-tuple

Sailors S (sorted on sid):
sid sname rating age
22 dustin 7 45.0
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0

Reserves R (sorted on sid):
sid bid day rname
28 103 12/4/96 guppy
28 103 11/3/96 yuppy
31 101 10/10/96 dustin
31 102 10/12/96 lubber
31 101 10/11/96 lubber
58 103 11/12/96 dustin
Sort-Merge Join: 2/3
• At this point, all R tuples with the same value in Ri (current R group) and all S tuples with the same value in Sj (current S group) match
– find all the equal tuples
– output <r, s> for all pairs of such tuples
(same S and R tables as above; the sid = 28 group)
WRITE TWO OUTPUT TUPLES
Sort-Merge Join: 3/3
• Then resume scanning R and S
(the sid = 31 group)
WRITE THREE OUTPUT TUPLES
Sort-Merge Join: 3/3
• … and proceed till the end
(sid = 44 has no matching tuple)
NO MATCH, CONTINUE SCANNING S
Sort-Merge Join: 3/3
• … and proceed till the end
(the sid = 58 group)
WRITE ONE OUTPUT TUPLE
Example of Sort-Merge Join
(same S and R tables as above)
• Typical cost: O(M log M) + O(N log N) + (M+N)
– ignoring B (as the base of the log)
– cost of sorting R + cost of sorting S + cost of merging R and S
– the cost of scanning in the merge phase, M+N, could be M*N!
• assume a single common value of the join attribute in both R and S
• but it is extremely unlikely
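The merge logic walked through in steps 1/3-3/3 can be sketched in Python; sorting is done in memory here for simplicity (the chapter sorts externally), and the function name and key arguments are illustrative.

```python
def sort_merge_join(R, S, key_r, key_s):
    """Sort both inputs on the join key, then merge, pairing up duplicate groups."""
    R = sorted(R, key=key_r)
    S = sorted(S, key=key_s)
    result, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        if key_r(R[i]) < key_s(S[j]):
            i += 1                       # advance R until >= current S
        elif key_r(R[i]) > key_s(S[j]):
            j += 1                       # advance S until >= current R
        else:
            # Current R group and current S group share the same key:
            # output all pairs from the two groups (the cross product).
            k = key_r(R[i])
            i2 = i
            while i2 < len(R) and key_r(R[i2]) == k:
                i2 += 1
            j2 = j
            while j2 < len(S) and key_s(S[j2]) == k:
                j2 += 1
            for r in R[i:i2]:
                for s in S[j:j2]:
                    result.append((r, s))
            i, j = i2, j2
    return result
```

On the Sailors/Reserves example above this yields 2 + 3 + 1 = 6 result tuples, matching the slide walkthrough.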
Cost of Sort-Merge Join

M = 1000 pages in R, pR = 100 tuples per page
N = 500 pages in S, pS = 80 tuples per page
100 buffer pages

• Sort R:
– (pass 0) 1000/100 = 10 sorted runs
– (pass 1) merge 10 runs
– read + write, 2 passes
– 4 * 1000 = 4000 I/O
• Similarly, sort S: 4 * 500 = 2000 I/O
• Merge phase of sort-merge join:
– another 1000 + 500 = 1500 I/O
– assume uniform ~2.5 matches per sid, so M+N is sufficient
• Total: 7500 I/O
• Check yourself: consider #buffer pages 35, 100, 300
– cost of sort-merge = 7500 in all three cases
– cost of block nested loops = 16500, 6500, 2500
Algorithms for Joins
3. HASH JOINS
Two Phases

1. Partition Phase
– partition R and S using the same hash function h
2. Probing Phase
– join tuples from the same partition (same h(..)
value) of R and S
– tuples in different partition of h will never join
– use a “different” hash function h2 for joining
these tuples
• (why different – see next slide first)
Hash-Join

• Partition both relations using hash function h
• R tuples in partition i will only match S tuples in partition i

[Figure: the original relation is read through an INPUT buffer and hashed with h into B-1 OUTPUT buffers, writing partitions 1 … B-1 of R and S back to disk; B main-memory buffers]
❖ Read in a partition of R, Ri (k < B-1 pages), and hash it using h2 (≠ h)
❖ Scan the matching partition of S, searching for matches

[Figure: the hash table for partition Ri is built in memory with hash function h2; partition Si streams through an input buffer and probes it; join results go to an output buffer; B main-memory buffers]
Visit in next lecture

Cost of Hash-Join
• In the partitioning phase
– read + write both relations: 2(M+N) I/Os
• In the matching phase
– read both relations: M+N I/Os
– remember, we are not counting the final write
• In our running example, this is a total of 4500 I/Os
– 3 * (1000 + 500)
– compare with the previous joins
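The two hash-join phases can be sketched in memory; here h is modular hashing over `num_partitions` (standing in for B-1), and a Python dict plays the role of the second hash function h2. The names are illustrative, not the chapter's.

```python
def hash_join(R, S, key_r, key_s, num_partitions=4):
    """Phase 1: partition both inputs with h. Phase 2: per partition,
    build a hash table on the R side (h2 = dict hashing) and probe with S."""
    def h(k):
        return hash(k) % num_partitions

    R_parts = [[] for _ in range(num_partitions)]
    S_parts = [[] for _ in range(num_partitions)]
    for r in R:
        R_parts[h(key_r(r))].append(r)
    for s in S:
        S_parts[h(key_s(s))].append(s)

    result = []
    for Ri, Si in zip(R_parts, S_parts):
        table = {}
        for r in Ri:                      # build with h2
            table.setdefault(key_r(r), []).append(r)
        for s in Si:                      # probe the matching partition
            for r in table.get(key_s(s), []):
                result.append((r, s))
    return result
```

Tuples in different partitions of h can never join, so each partition pair is processed independently, which is also what makes the algorithm easy to parallelize.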
Sort-Merge Join vs. Hash Join

• Both can have a cost of 3(M+N) I/Os


– if sort-merge gets enough buffer
• Hash join holds the smaller relation in the buffer: better if buffer space is limited
• Hash Join shown to be highly parallelizable
• Sort-Merge less sensitive to data skew
– also result is sorted
Other operator algorithms
SELECT *
Algorithms FROM Reserves R
WHERE R.rname = ‘Joe’
for Selection
• No index, unsorted data
– Scan entire relation
– may be expensive if there are not many 'Joe's
• No index, sorted data (on ‘rname’)
– locate the first tuple, scan all matching tuples
– first binary search, then scan depends on matches
• B+-tree index, Hash index
– Discussed earlier
– Cost of accessing data entries + matching data records
– Depends on clustered/unclustered
• More complex condition like day<8/9/94 AND bid=5 AND sid=3
– Either use one index, then filter
– Or use two indexes, then take intersection, then apply third condition
– etc.
SELECT DISTINCT
Algorithms R.sid, R.bid
FROM Reserves R
for Projection
• Two parts
– Remove fields: easy
– Remove duplicates (if distinct is specified): expensive
• Sorting-based
– Sort, then scan adjacent tuples to remove duplicates
– Can eliminate unwanted attributes in the first pass of merge sort
• Hash-based
– Exactly like hash join
– Partition only one relation in the first pass
– Remove duplicates in the second pass
• Sort vs Hash
– Sorting handles skew better, returns results sorted
– Hash table may not fit in memory – sorting is more standard
• Index-only scan may work too
– If all required attributes are part of index
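The sorting-based approach to DISTINCT can be sketched as follows (a minimal in-memory version; the real operator would use external sort and drop unwanted fields during pass 0 of the sort):

```python
def project_distinct(rows, fields):
    """Keep only `fields` of each row, then sort and drop adjacent duplicates."""
    projected = sorted(tuple(row[f] for f in fields) for row in rows)
    result = []
    for t in projected:
        if not result or result[-1] != t:  # duplicates are adjacent after sorting
            result.append(t)
    return result
```

As the slide notes, sorting also leaves the result in sorted order, which later operators may exploit.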
Algorithms for Set Operations

• Intersection, cross product are special cases of


joins
• Union, Except
– Sort-based
– Hash-based
– Very similar to joins and projection
Algorithms for Aggregate Operations
• SUM, AVG, MIN etc.
– again similar to previous approaches

• Without grouping:
– In general, requires scanning the relation.
– Given index whose search key includes all attributes in the SELECT
or WHERE clauses, can do index-only scan

• With grouping:
– Sort on group-by attributes
– or, hash on group-by attributes
– can combine sort/hash and aggregate
– can do index-only scan here as well
