Overview of Query Evaluation: R&G Chapter 12

The document provides an overview of query evaluation and optimization in a database management system. It discusses that a query optimizer must consider different possible query execution plans involving various join algorithms and access paths. The optimizer's goal is to minimize the number of disk I/Os by choosing the most efficient plan. The document reviews different algorithms for relational operators like selection, projection, joins, and how to estimate their costs using statistics stored in catalogs. It provides examples of using indexes and sorting to evaluate queries more efficiently.

Overview of Query Evaluation
R&G Chapter 12
Lecture 13

Administrivia
Exams graded
HW2 due in a week
No Office Hours Today

Review: Storage
A DBMS has layers

Query Optimization and Execution
Relational Operators
Files and Access Methods
Buffer Management
Disk Space Management

Now to Midterm 2

Review
We studied Relational Algebra
Many equivalent queries produce the same result
Which expression is most efficient?
We studied file organizations
Hash files, Sorted files, Clustered & Unclustered Indexes
Compared scans, sorting, searches, insert, delete
Today: costs to implement relational operations
Thurs, Tues: Sorting, Joins

Queries today, more on sorting next time
Remember: SQL is a declarative language
It describes the query result, but not how to get it
Relational Algebra describes how to get results
But many relational algebra queries are equivalent
How to choose the right one for an SQL query?
In a nutshell:
When the database executes a query, it must generate a variety of possible plans (relational algebra queries) and find the cheapest one to execute.

Review: Relational Algebra

First, remember Relational Algebra
Selection (σ): Selects a subset of rows from a relation (horizontal).
Projection (π): Retains only wanted columns from a relation (vertical).
Cross-product (×): Allows us to combine two relations.
Set-difference (−): Tuples in r1, but not in r2.
Union (∪): Tuples in r1 and/or in r2.
Intersection (∩)
Join (⋈)
Division (/)

Overview of Query Evaluation

Plan: Tree of R.A. ops, with choice of alg for each op.
Two main issues in query optimization:
For a given query, what plans are considered?
Algorithm to search plan space for cheapest (estimated) plan.
How is the cost of a plan estimated?
Ideally: Want to find best plan.
Practically: Avoid worst plans!
We will study the System R approach.

Overview (cont)
Query Evaluation involves:
Choosing an Access Path to get at each table
Evaluating different algorithms for each relational operator
Choosing the order in which to apply the relational operators
These choices are interrelated

Overview (cont)
Overall goal: minimize I/Os
Algorithms for evaluating relational operators use simple ideas extensively:
Indexing: Can use WHERE conditions to retrieve a small set of tuples (selections, joins)
Iteration: Sometimes it is faster to scan all tuples even if there is an index. (Sometimes we scan the data entries in an index instead of the table itself.)
Partitioning: By using sorting or hashing, we can partition the input tuples and replace an expensive operation by similar operations on smaller inputs.
* Watch for these techniques as we discuss query evaluation!

Intermission: a preview of sorting
Data can only be sorted when in memory
But tables are often *much* bigger than memory
One solution: merge sort
Everyone stand up
Go to the aisle by the windows
I will take 10 people at a time onto the stage
I will sort each group of 10 on last name from A to Z
Groups will then be merged

Two-Way External Merge Sort

Idea: Divide and conquer: sort subfiles and merge.
Each pass we read + write each page in the file.
N pages in the file => the number of passes is $\lceil \log_2 N \rceil + 1$
So total cost is: $2N(\lceil \log_2 N \rceil + 1)$

Figure: a 7-page input file (up to two values per page) sorted in four passes:
Input file:            3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2
PASS 0, 1-page runs:   3,4 | 2,6 | 4,9 | 7,8 | 5,6 | 1,3 | 2
PASS 1, 2-page runs:   2,3 4,6 | 4,7 8,9 | 1,3 5,6 | 2
PASS 2, 4-page runs:   2,3 4,4 6,7 8,9 | 1,2 3,5 6
PASS 3, 8-page runs:   1,2 2,3 3,4 4,5 6,6 7,8 9
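The passes in the figure can be reproduced with a small simulation. This is only a sketch: it keeps every value in memory and uses `heapq.merge` for each two-way merge, whereas a real external sort streams pages through three buffer pages (two input, one output). The helper names (`flatten`, `to_pages`, `two_way_external_sort`) are hypothetical.

```python
import heapq
import math
from typing import List

Page = List[int]
Run = List[Page]


def flatten(run: Run) -> List[int]:
    """Concatenate the pages of a sorted run into one sorted list of values."""
    return [v for page in run for v in page]


def to_pages(values: List[int], page_size: int) -> Run:
    """Split a sorted list of values back into pages of page_size values."""
    return [values[i:i + page_size] for i in range(0, len(values), page_size)]


def two_way_external_sort(pages: List[Page], page_size: int = 2) -> List[List[Run]]:
    """Simulate two-way external merge sort; return the runs after every pass."""
    # Pass 0: read each page, sort it in memory, write it out as a 1-page run.
    runs: List[Run] = [[sorted(page)] for page in pages]
    history = [runs]
    # Passes 1, 2, ...: merge pairs of runs until a single run remains.
    while len(runs) > 1:
        next_runs: List[Run] = []
        for i in range(0, len(runs), 2):
            if i + 1 < len(runs):
                # heapq.merge combines two sorted inputs into one sorted output.
                merged = list(heapq.merge(flatten(runs[i]), flatten(runs[i + 1])))
            else:
                merged = flatten(runs[i])  # an odd run is carried forward unchanged
            next_runs.append(to_pages(merged, page_size))
        runs = next_runs
        history.append(runs)
    return history


input_file = [[3, 4], [6, 2], [9, 4], [8, 7], [5, 6], [3, 1], [2]]
passes = two_way_external_sort(input_file)
n_pages = len(input_file)
io_cost = 2 * n_pages * len(passes)  # every pass reads and writes every page
print(len(passes), math.ceil(math.log2(n_pages)) + 1, io_cost)  # 4 4 56
print(passes[-1][0])
```

The simulated pass count matches $\lceil \log_2 7 \rceil + 1 = 4$, and the final run is the same 8-page run shown for PASS 3.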

Schema for Examples

Sailors (sid: integer, sname: string, rating: integer, age

Reserves (sid: integer, bid: integer, day: dates, rname:
Similar to old schema; rname added for variations.
Reserves:
Each tuple is 40 bytes long, 100 tuples per page, 1000 pages.
Sailors:
Each tuple is 50 bytes long, 80 tuples per page, 500 pages.

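Example 1
SELECT sname, bid
FROM Sailors S, Reserves R
WHERE S.sid = R.sid AND S.age > 99
Several possible rel. algebra queries:
$\pi_{sname, bid}(\sigma_{S.age>99}(S \bowtie_{S.sid=R.sid} R))$
$\pi_{sname, bid}((\sigma_{S.age>99} S) \bowtie_{S.sid=R.sid} R)$
The second one may be much cheaper if the right indexes exist.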

Statistics and Catalogs

Need information about relations and indexes involved. Catalogs typically contain at least:
# tuples (NTuples), # pages (NPages) for each relation.
# distinct key values (NKeys) and NPages for each index.
Index height, low/high key values (Low/High) for each tree index.
Catalogs updated periodically.
Updating whenever data changes is too expensive; lots of approximation anyway, so slight inconsistency ok.
More detailed information (e.g., histograms of the values in some field) is sometimes stored.

Access Paths: Getting Tuples from a Table

Access path: a method of retrieving tuples:
File scan, or index that matches a selection (in the query)
Is an index useful for a query? Yes, if it matches the predicate:
Tree index matches (a conjunction of) terms that involve only attributes in a prefix of the search key.
E.g., Tree index on <a, b, c> matches the selection a=5 AND b=3, and a=5 AND b>6, but not b=3.
Hash index matches (a conjunction of) terms that has a term attribute = value for every attribute in the search key.
E.g., Hash index on <a, b, c> matches a=5 AND b=3 AND c=5; but it does not match b=3, or a=5 AND b=3, or a>5
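The two matching rules can be written as small predicates. This is a sketch under stated assumptions: a term is represented as a hypothetical `(attribute, operator, value)` tuple, and the functions only test whether the whole conjunction matches; they do not pick out the sub-conjunction that a real optimizer could still use (e.g., `a=5` within `a=5 AND c=3`).

```python
from typing import List, Tuple

# Hypothetical term representation: (attribute, operator, value), e.g. ("a", "=", 5).
Term = Tuple[str, str, object]


def tree_index_matches(search_key: List[str], terms: List[Term]) -> bool:
    """True if the terms involve only attributes in a prefix of the search key."""
    attrs = {attr for attr, _, _ in terms}
    # The attributes used must be exactly the first len(attrs) key attributes.
    return bool(attrs) and attrs == set(search_key[:len(attrs)])


def hash_index_matches(search_key: List[str], terms: List[Term]) -> bool:
    """True if there is an attribute = value term for every search-key attribute."""
    eq_attrs = {attr for attr, op, _ in terms if op == "="}
    return set(search_key) <= eq_attrs


key = ["a", "b", "c"]
print(tree_index_matches(key, [("a", "=", 5), ("b", "=", 3)]))  # True
print(tree_index_matches(key, [("a", "=", 5), ("b", ">", 6)]))  # True
print(tree_index_matches(key, [("b", "=", 3)]))                 # False
print(hash_index_matches(key, [("a", "=", 5), ("b", "=", 3), ("c", "=", 5)]))  # True
print(hash_index_matches(key, [("a", "=", 5), ("b", "=", 3)]))  # False
print(hash_index_matches(key, [("a", ">", 5)]))                 # False
```

The asymmetry follows from the structures: a tree keeps keys in sorted order, so any prefix (including a range on its last attribute) can be located; a hash only supports lookups on the complete key value.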

A Note on Complex Selections

(day<8/9/94 AND rname='Paul') OR bid=5 OR sid=3

Selection conditions are first converted to conjunctive normal form (CNF):
(day<8/9/94 OR bid=5 OR sid=3) AND
(rname='Paul' OR bid=5 OR sid=3)
We only discuss the case with no ORs; see the text if you are curious about the general case.

One Approach to Selections

Find the most selective access path, retrieve tuples using it, and apply any remaining terms that don't match the index:
Most selective access path: An index or file scan that will require the fewest I/Os.
Terms that match this index reduce the number of tuples retrieved; other terms are used to discard some retrieved tuples, but do not affect the number of tuples/pages fetched.
Consider day<8/9/94 AND bid=5 AND sid=3. A B+ tree index on day can be used; then, bid=5 and sid=3 must be checked for each retrieved tuple. Similarly, a hash index on <bid, sid> could be used; day<8/9/94 must then be checked.
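The second option (hash index on <bid, sid>, then check day) can be sketched in memory. Assumptions: a Python dict stands in for the hash index, the `Reservation` rows are hypothetical sample data, and 8/9/94 is read as August 9, 1994 (US month/day/year).

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, NamedTuple, Tuple


class Reservation(NamedTuple):
    sid: int
    bid: int
    day: date
    rname: str


def build_hash_index(rows: List[Reservation]) -> Dict[Tuple[int, int], List[Reservation]]:
    """In-memory stand-in for a hash index on <bid, sid>."""
    index: Dict[Tuple[int, int], List[Reservation]] = defaultdict(list)
    for row in rows:
        index[(row.bid, row.sid)].append(row)
    return index


def select_with_index(index: Dict[Tuple[int, int], List[Reservation]],
                      bid: int, sid: int, cutoff: date) -> List[Reservation]:
    """Evaluate day < cutoff AND bid = bid AND sid = sid using the <bid, sid> index."""
    # The terms that match the index (bid, sid) decide which tuples are fetched.
    fetched = index.get((bid, sid), [])
    # The remaining term only discards fetched tuples; it does not reduce fetches.
    return [r for r in fetched if r.day < cutoff]


rows = [  # hypothetical sample data
    Reservation(3, 5, date(1994, 7, 1), "Paul"),
    Reservation(3, 5, date(1994, 9, 1), "Paul"),
    Reservation(3, 7, date(1994, 7, 1), "Ann"),
    Reservation(4, 5, date(1994, 6, 1), "Bob"),
]
result = select_with_index(build_hash_index(rows), bid=5, sid=3, cutoff=date(1994, 8, 9))
print(result)
```

Two tuples are fetched through the index, and the residual `day` test keeps only the one dated before the cutoff.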

Using an Index for Selections

Cost depends on #qualifying tuples, and clustering.
Cost of finding qualifying data entries (typically small) plus cost of retrieving records (could be large w/o clustering).
For example, assuming uniform distribution of names, about 5% of tuples qualify (50 pages, 5000 tuples). With a clustered index, cost is little more than 50 I/Os; if unclustered, up to 5000 I/Os (one per qualifying tuple)!
SELECT *
FROM Reserves R
WHERE R.rname < 'C%'
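A rough version of this estimate, assuming the worst case of one I/O per qualifying tuple for an unclustered index and treating the cost of finding the data entries as a separate, hypothetical `index_ios` parameter:

```python
import math


def index_selection_ios(matching_tuples: int, tuples_per_page: int,
                        clustered: bool, index_ios: int = 0) -> int:
    """Rough I/O estimate for retrieving the tuples that satisfy a selection.

    index_ios: cost of finding the qualifying data entries (typically small).
    """
    if clustered:
        # Qualifying tuples are stored together, so only their pages are read.
        data_ios = math.ceil(matching_tuples / tuples_per_page)
    else:
        # Worst case: each qualifying tuple sits on a different page.
        data_ios = matching_tuples
    return index_ios + data_ios


# Reserves: 100 tuples per page; 5% of 100,000 tuples = 5000 qualify.
print(index_selection_ios(5000, 100, clustered=True))   # 50
print(index_selection_ios(5000, 100, clustered=False))  # 5000
```

Compared with a 1000-page file scan of Reserves, the unclustered index is the worse choice for this predicate.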

Projection

SELECT DISTINCT R.sid, R.bid
FROM Reserves R
Expensive part is removing duplicates.
SQL systems don't remove duplicates unless the keyword DISTINCT is specified in a query.
Sorting Approach: Sort on <sid, bid> and remove duplicates. (Can optimize this by dropping unwanted information while sorting.)
Hashing Approach: Hash on <sid, bid> to create partitions. Load partitions into memory one at a time, build in-memory hash structure, and eliminate duplicates.
If there is an index with both R.sid and R.bid in the search key, it may be cheaper to sort data entries!
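Both duplicate-elimination approaches can be sketched in memory. Assumptions: the input is already projected to `(sid, bid)` pairs (taken from the Reserves rows in the sort-merge example), Python's `sorted` stands in for an external sort, and a `set` plays the role of the in-memory hash structure built per partition.

```python
from typing import Iterable, List, Set, Tuple

Row = Tuple[int, int]  # (sid, bid) after dropping unwanted fields


def distinct_by_sorting(rows: Iterable[Row]) -> List[Row]:
    """Sort on <sid, bid>; duplicates become adjacent and are dropped."""
    result: List[Row] = []
    for row in sorted(rows):
        if not result or result[-1] != row:
            result.append(row)
    return result


def distinct_by_hashing(rows: Iterable[Row], n_partitions: int = 4) -> List[Row]:
    """Hash on <sid, bid> into partitions, then de-duplicate each partition."""
    partitions: List[List[Row]] = [[] for _ in range(n_partitions)]
    for row in rows:
        # Equal rows hash to the same partition, so duplicates end up together.
        partitions[hash(row) % n_partitions].append(row)
    result: List[Row] = []
    for part in partitions:  # one partition "in memory" at a time
        seen: Set[Row] = set()
        for row in part:
            if row not in seen:
                seen.add(row)
                result.append(row)
    return result


pairs = [(28, 103), (28, 103), (31, 101), (31, 102), (31, 101), (58, 103)]
print(distinct_by_sorting(pairs))          # [(28, 103), (31, 101), (31, 102), (58, 103)]
print(sorted(distinct_by_hashing(pairs)))  # same rows; hashing gives no output order
```

The sort-based version also yields sorted output, which a later operator may exploit; the hash-based version makes no ordering promise.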

Join: Index Nested Loops

foreach tuple r in R do
    foreach tuple s in S where r_i == s_j do
        add <r, s> to result
(R has M pages with p_R tuples per page; S has N pages.)
No index: Cost M + M * N
If there is an index on the join column of one relation (say S), can make it the inner and exploit the index.
Cost: M + ((M*p_R) * cost of finding matching S tuples)
For each R tuple, cost of probing S index is about 1.2 for hash index, 2-4 for B+ tree. Cost of then finding S tuples (assuming Alt. (2) or (3) for data entries) depends on clustering.
Clustered index: 1 I/O (typical), unclustered: up to 1 I/O per matching S tuple.
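A runnable version of the pseudocode above, as an in-memory sketch: a dict of lists stands in for a hash index on the inner relation's join column, and the `sailors`/`reserves` rows are a small subset of the sort-merge example data. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List, Sequence, Tuple

Row = Tuple[Any, ...]  # a tuple of attribute values


def build_index(rows: Sequence[Row], col: int) -> Dict[Hashable, List[Row]]:
    """In-memory stand-in for a hash index on column `col` of the inner relation S."""
    index: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        index[row[col]].append(row)
    return index


def index_nested_loops(outer: Sequence[Row], i: int,
                       inner_index: Dict[Hashable, List[Row]]) -> List[Tuple[Row, Row]]:
    """foreach r in R: probe the index on S with r[i] instead of scanning all of S."""
    result: List[Tuple[Row, Row]] = []
    for r in outer:
        for s in inner_index.get(r[i], []):  # only S tuples with s[j] == r[i]
            result.append((r, s))
    return result


sailors = [(22, "dustin"), (28, "yuppy"), (31, "lubber")]
reserves = [(28, 103), (31, 101), (31, 102), (58, 103)]
print(index_nested_loops(reserves, 0, build_index(sailors, 0)))
```

Reserves is the outer and Sailors the indexed inner here; the reservation for sid 58 finds no match in this subset and produces no output.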

Examples of Index Nested Loops

Hash-index (Alt. 2) on sid of Sailors (as inner):
Scan Reserves: 1000 page I/Os, 100*1000 tuples.
For each Reserves tuple: 1.2 I/Os to get data entry in index, plus 1 I/O to get (the exactly one) matching Sailors tuple. Total: 220,000 I/Os.
Hash-index (Alt. 2) on sid of Reserves (as inner):
Scan Sailors: 500 page I/Os, 80*500 tuples.
For each Sailors tuple: 1.2 I/Os to find index page with data entries, plus cost of retrieving matching Reserves tuples. Assuming uniform distribution, 2.5 reservations per sailor (100,000 / 40,000). Cost of retrieving them is 1 or 2.5 I/Os depending on whether the index is clustered.
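The arithmetic for both examples, assuming the cost formula from the previous slide. Note that this function also charges the scan of the outer relation, so the first case prints 221,000: the 220,000 per-tuple I/Os above plus the 1000-page scan of Reserves. `Fraction` keeps the 1.2 and 2.5 I/O figures exact.

```python
from fractions import Fraction


def inl_cost(outer_pages: int, tuples_per_page: int,
             probe_ios: Fraction, fetch_ios: Fraction) -> Fraction:
    """Scan the outer once, then pay probe + fetch I/Os for every outer tuple."""
    outer_tuples = outer_pages * tuples_per_page
    return outer_pages + outer_tuples * (probe_ios + fetch_ios)


HASH_PROBE = Fraction("1.2")  # I/Os to reach the data entry in a hash index

# Reserves outer (1000 pages, 100 tuples/page); exactly one Sailors match each.
print(inl_cost(1000, 100, HASH_PROBE, Fraction(1)))    # 221000
# Sailors outer (500 pages, 80 tuples/page); 2.5 Reserves matches per sailor.
print(inl_cost(500, 80, HASH_PROBE, Fraction(1)))      # 88500 (clustered)
print(inl_cost(500, 80, HASH_PROBE, Fraction("2.5")))  # 148500 (unclustered)
```

Putting the smaller relation (Sailors) on the outside cuts the number of index probes from 100,000 to 40,000, which is why the second configuration is cheaper even when unclustered.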

Join: Sort-Merge ($R \bowtie_{i=j} S$)

Sort R and S on the join column, then scan them to do a "merge" (on join col.), and output result tuples.
Advance scan of R until current R-tuple >= current S tuple, then advance scan of S until current S-tuple >= current R tuple; do this until current R tuple = current S tuple.
At this point, all R tuples with same value in Ri (current R group) and all S tuples with same value in Sj (current S group) match; output <r, s> for all pairs of such tuples.
Then resume scanning R and S.
R is scanned once; each S group is scanned once per matching R tuple. (Multiple scans of an S group are likely to find needed pages in buffer.)

Example of Sort-Merge Join

Sailors:
sid  sname   rating  age
22   dustin  7       45.0
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

Reserves:
sid  bid  day       rname
28   103  12/4/96   guppy
28   103  11/3/96   yuppy
31   101  10/10/96  dustin
31   102  10/12/96  lubber
31   101  10/11/96  lubber
58   103  11/12/96  dustin

Cost: M log M + N log N + (M+N)
The cost of scanning, M+N, could be M*N (very unlikely!)
With 35, 100 or 300 buffer pages, both Reserves and Sailors can be sorted in 2 passes; total join cost: 7500.
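The merge procedure described above, run on the example tables. This is an in-memory sketch: Python's `sorted` stands in for the external sort, and `sort_merge_join`/`smj_cost` are hypothetical names. The cost helper assumes each input is sorted in the given number of passes (each pass reads and writes every page) followed by one merge scan of both inputs.

```python
from typing import Any, List, Sequence, Tuple

Row = Tuple[Any, ...]


def sort_merge_join(R: Sequence[Row], S: Sequence[Row], i: int, j: int) -> List[Tuple[Row, Row]]:
    """Join R and S on R[i] == S[j] by sorting both inputs and merging them."""
    R = sorted(R, key=lambda t: t[i])
    S = sorted(S, key=lambda t: t[j])
    out: List[Tuple[Row, Row]] = []
    r, s = 0, 0
    while r < len(R) and s < len(S):
        if R[r][i] < S[s][j]:
            r += 1  # advance R until R-tuple >= current S-tuple
        elif R[r][i] > S[s][j]:
            s += 1  # advance S until S-tuple >= current R-tuple
        else:
            key = R[r][i]
            s_end = s  # find the end of the current S group
            while s_end < len(S) and S[s_end][j] == key:
                s_end += 1
            # Each tuple of the current R group is paired with the whole S group.
            while r < len(R) and R[r][i] == key:
                for t in range(s, s_end):
                    out.append((R[r], S[t]))
                r += 1
            s = s_end  # resume scanning after both groups
    return out


def smj_cost(m: int, n: int, passes: int = 2) -> int:
    """Sort both inputs (2 * pages * passes each), then merge-scan them once."""
    return 2 * m * passes + 2 * n * passes + m + n


sailors = [(22, "dustin", 7, 45.0), (28, "yuppy", 9, 35.0), (31, "lubber", 8, 55.5),
           (44, "guppy", 5, 35.0), (58, "rusty", 10, 35.0)]
reserves = [(28, 103, "12/4/96", "guppy"), (28, 103, "11/3/96", "yuppy"),
            (31, 101, "10/10/96", "dustin"), (31, 102, "10/12/96", "lubber"),
            (31, 101, "10/11/96", "lubber"), (58, 103, "11/12/96", "dustin")]
pairs = sort_merge_join(sailors, reserves, 0, 0)
print([(sailor[1], res[1]) for sailor, res in pairs])
print(smj_cost(1000, 500))  # 7500
```

Sailors 22 and 44 have no reservations and drop out; the three sid-31 reservations all pair with the single lubber tuple, showing an S group being scanned for its matching R tuple.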

Highlights of System R Optimizer

Impact:
Most widely used currently; works well for < 10 joins.
Cost estimation: Approximate art at best.
Statistics, maintained in system catalogs, used to
estimate cost of operations and result sizes.
Considers combination of CPU and I/O costs.
Plan Space: Too large, must be pruned.
Only the space of left-deep plans is considered.
Left-deep plans allow output of each operator to be pipelined
into the next operator without storing it in a temporary relation.

Cartesian products avoided.
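The restriction to left-deep plans without Cartesian products can be illustrated by brute-force enumeration of join orders. This is only a sketch of the plan space: System R does not enumerate permutations this way (it builds plans bottom-up and prunes by cost), and the three-relation query with a `Boats` relation is a hypothetical example.

```python
from itertools import permutations
from typing import FrozenSet, List, Set, Tuple


def left_deep_orders(relations: List[str],
                     join_preds: Set[FrozenSet[str]]) -> List[Tuple[str, ...]]:
    """Enumerate left-deep join orders that never need a Cartesian product."""
    valid: List[Tuple[str, ...]] = []
    for order in permutations(relations):
        joined = {order[0]}
        ok = True
        for rel in order[1:]:
            # In a left-deep plan, rel joins the result built so far, so it needs
            # a join predicate with some relation that is already joined.
            if not any(frozenset((rel, prev)) in join_preds for prev in joined):
                ok = False
                break
            joined.add(rel)
        if ok:
            valid.append(order)
    return valid


# Hypothetical query with predicates Sailors-Reserves and Reserves-Boats.
preds = {frozenset(("Sailors", "Reserves")), frozenset(("Reserves", "Boats"))}
orders = left_deep_orders(["Sailors", "Reserves", "Boats"], preds)
for order in orders:
    print(" JOIN ".join(order))
```

Of the six possible orders, the two that start with Sailors and Boats together are rejected, because those two relations share no join predicate.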

Cost Estimation
For each plan considered, must estimate cost:
Must estimate cost of each operation in plan tree.
Depends on input cardinalities.
We've already discussed how to estimate the cost of operations (sequential scan, index scan, joins, etc.)
Must also estimate size of result for each operation in tree!
Use information about the input relations.
For selections and joins, assume independence of predicates.

Size Estimation and Reduction Factors

Consider a query block:
SELECT attribute list
FROM relation list
WHERE term1 AND ... AND termk
Maximum # tuples in result is the product of the cardinalities of relations in the FROM clause.
Reduction factor (RF) associated with each term reflects the impact of the term in reducing result size. Result cardinality = Max # tuples * product of all RFs.
Implicit assumption that terms are independent!
Term col=value has RF 1/NKeys(I), given index I on col
Term col1=col2 has RF 1/MAX(NKeys(I1), NKeys(I2))
Term col>value has RF (High(I)-value)/(High(I)-Low(I))
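The three reduction-factor rules and the cardinality estimate, applied to the Reserves/Sailors query used in the examples that follow. The catalog values are assumptions for illustration: NKeys = 40,000 for both sid columns, 100 distinct boats, and rating bounds Low = 1, High = 10. With those bounds the rating term gets RF 5/9, slightly more than the one-half assumed later in the plan costs.

```python
from fractions import Fraction


def rf_col_eq_value(nkeys: int) -> Fraction:
    """col = value: 1 / NKeys(I)."""
    return Fraction(1, nkeys)


def rf_col_eq_col(nkeys1: int, nkeys2: int) -> Fraction:
    """col1 = col2: 1 / MAX(NKeys(I1), NKeys(I2))."""
    return Fraction(1, max(nkeys1, nkeys2))


def rf_col_gt_value(value: int, low: int, high: int) -> Fraction:
    """col > value: (High(I) - value) / (High(I) - Low(I))."""
    return Fraction(high - value, high - low)


def estimate_result_size(cardinalities, rfs) -> Fraction:
    """Max # tuples (product of FROM-clause cardinalities) times all RFs."""
    size = Fraction(1)
    for card in cardinalities:
        size *= card
    for rf in rfs:
        size *= rf  # independence assumption: the RFs simply multiply
    return size


rfs = [
    rf_col_eq_col(40_000, 40_000),  # R.sid = S.sid (assumed NKeys)
    rf_col_eq_value(100),           # R.bid = 100 (assumed 100 boats)
    rf_col_gt_value(5, 1, 10),      # S.rating > 5 (assumed Low = 1, High = 10)
]
size = estimate_result_size([100_000, 40_000], rfs)
print(size, round(float(size)))  # 5000/9 556
```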

Motivating Example

SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND
R.bid=100 AND S.rating>5

RA Tree:
π sname
    σ bid=100 AND rating > 5
        ⋈ sid=sid
            Reserves, Sailors

Plan:
π sname (On-the-fly)
    σ bid=100 AND rating > 5 (On-the-fly)
        ⋈ sid=sid (Simple Nested Loops)
            Reserves, Sailors

Cost: 500+500*1000 I/Os
By no means the worst plan!
Misses several opportunities: selections could have been 'pushed' earlier, no use is made of any available indexes, etc.
Goal of optimization: To find more efficient plans that compute the same answer.

Alternative Plans 1 (No Indexes)

Plan:
π sname (On-the-fly)
    ⋈ sid=sid (Sort-Merge Join)
        σ bid=100 (Scan; write to temp T1)
            Reserves
        σ rating > 5 (Scan; write to temp T2)
            Sailors

Main difference: push selects.
With 5 buffers, cost of plan:
Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats, uniform distribution).
Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings).
Sort T1 (2*2*10), sort T2 (2*3*250), merge (10+250)
Total: 3560 page I/Os.
If we used BNL join, join cost = 10+4*250, total cost = 2770.
If we 'push' projections, T1 has only sid, T2 only sid and sname: T1 fits in 3 pages, cost of BNL drops to under 250 pages, total < 2000.
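A check of the two totals above. The sort pass counts (2 for T1, 3 for T2) are taken as stated on the slide rather than derived; the block nested loops helper assumes the usual layout of B - 2 buffer pages for the outer block, one for the inner page and one for output, which with 5 buffers gives the 4 scans of T2 in the slide's formula.

```python
import math


def sort_cost(pages: int, passes: int) -> int:
    """External sort: every pass reads and writes every page."""
    return 2 * pages * passes


def bnl_cost(outer_pages: int, inner_pages: int, buffers: int) -> int:
    """Block nested loops: scan outer once, scan inner once per outer block."""
    blocks = math.ceil(outer_pages / (buffers - 2))  # 2 buffers: inner page + output
    return outer_pages + blocks * inner_pages


select_cost = (1000 + 10) + (500 + 250)  # scan Reserves/Sailors, write T1/T2
smj_total = select_cost + sort_cost(10, 2) + sort_cost(250, 3) + (10 + 250)
bnl_total = select_cost + bnl_cost(10, 250, buffers=5)
print(smj_total, bnl_total)  # 3560 2770
```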

Alternative Plans 2 With Indexes

Plan:
π sname (On-the-fly)
    σ rating > 5 (On-the-fly)
        ⋈ sid=sid (Index Nested Loops, with pipelining)
            σ bid=100 (Use hash index; do not write result to temp)
                Reserves
            Sailors

With clustered index on bid of Reserves, we get 100,000/100 = 1000 tuples on 1000/100 = 10 pages.
INL with pipelining (outer is not materialized).
Projecting out unnecessary fields from outer doesn't help.
Join column sid is a key for Sailors.
At most one matching tuple, unclustered index on sid OK.
Decision not to push rating>5 before the join is based on availability of sid index on Sailors.
Cost: Selection of Reserves tuples (10 I/Os); for each, must get matching Sailors tuple (1000*1.2); total 1210 I/Os.
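The same arithmetic spelled out. Following the slide's accounting, each Sailors lookup is charged the 1.2 I/Os of a hash-index probe; the 100 distinct boats and uniform distribution are the assumptions carried over from Alternative Plans 1.

```python
from fractions import Fraction

reserves_tuples, reserves_pages, n_boats = 100_000, 1000, 100
sel_tuples = reserves_tuples // n_boats  # bid=100 keeps 1/100 of Reserves: 1000 tuples
sel_pages = reserves_pages // n_boats    # clustered on bid, so they sit on 10 pages
probe = Fraction("1.2")                  # I/Os per probe of the hash index on Sailors.sid
cost = sel_pages + sel_tuples * probe    # pipelined: the outer is never written to temp
print(sel_tuples, sel_pages, cost)       # 1000 10 1210
```

Because the outer is pipelined rather than materialized, there are no temp-file writes, which is the main reason this plan beats the 2770 I/Os of the no-index BNL plan.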

Summary
There are several alternative evaluation algorithms for
each relational operator.
A query is evaluated by converting it to a tree of operators
and evaluating the operators in the tree.
Must understand query optimization in order to fully
understand the performance impact of a given database
design (relations, indexes) on a workload (set of queries).
Two parts to optimizing a query:
Consider a set of alternative plans.
Must prune search space; typically, left-deep plans only.

Must estimate cost of each plan that is considered.

Must estimate size of result and cost for each plan node.
Key issues: Statistics, indexes, operator implementations.
