
Query Planning & Optimization: Intro To Database Systems Andy Pavlo

This document discusses query optimization in database systems. It begins by explaining that SQL is declarative and the database management system (DBMS) determines the best execution plan. It then discusses the IBM System R optimizer from the 1970s that introduced many concepts still used today like cost-based optimization. The document outlines the architecture of a query optimizer and the process of generating logical and physical query plans. It also notes that query optimization is an NP-hard problem.


14 Query Planning & Optimization
Part I
Intro to Database Systems, 15-445/15-645, Fall 2019
Andy Pavlo
Computer Science, Carnegie Mellon University

ADMINISTRIVIA

Mid-Term Exam is Wed Oct 16th @ 12:00pm


→ See mid-term exam guide for more info.

Project #2 is due Sun Oct 20th @ 11:59pm

CMU 15-445/645 (Fall 2019)


QUERY OPTIMIZATION

Remember that SQL is declarative.
→ The user tells the DBMS what answer they want, not how to get the answer.

There can be a big difference in performance depending on which plan is used:
→ See last week: 1.3 hours vs. 0.45 seconds


IBM SYSTEM R

First implementation of a query optimizer from the 1970s.
→ People argued that the DBMS could never choose a query plan better than what a human could write.

Many concepts and design decisions from the System R optimizer are still used today.


QUERY OPTIMIZATION

Heuristics / Rules
→ Rewrite the query to remove stupid / inefficient things.
→ These techniques may need to examine the catalog, but they do not need to examine data.

Cost-based Search
→ Use a model to estimate the cost of executing a plan.
→ Evaluate multiple equivalent plans for a query and pick
the one with the lowest cost.
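
As a rough illustration of cost-based search, the sketch below scores a few equivalent plans with a toy cost model and keeps the cheapest. The `Plan` fields, the plan names, the statistics, and the weights in `estimate_cost` are all hypothetical and chosen only for illustration; they are not taken from the lecture or from any real DBMS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A candidate plan plus the (made-up) statistics the toy model needs."""
    name: str
    pages_read: int        # estimated disk pages the plan touches
    tuples_processed: int  # estimated tuples the CPU must examine


def estimate_cost(plan: Plan, io_weight: float = 1.0, cpu_weight: float = 0.01) -> float:
    """Toy cost model (an assumption): weighted sum of I/O and CPU work."""
    return io_weight * plan.pages_read + cpu_weight * plan.tuples_processed


def pick_cheapest(plans: list[Plan]) -> Plan:
    """Evaluate every equivalent plan and return the lowest-cost one."""
    return min(plans, key=estimate_cost)


# Three hypothetical plans that all compute the same answer.
candidates = [
    Plan("nested-loop join, filter last", pages_read=50_000, tuples_processed=5_000_000),
    Plan("hash join, filter last", pages_read=1_500, tuples_processed=200_000),
    Plan("hash join, filter pushed down", pages_read=1_500, tuples_processed=40_000),
]
best = pick_cheapest(candidates)
print(best.name, estimate_cost(best))
```

A real optimizer cannot list every plan up front like this; the point here is only the "estimate, compare, pick the minimum" step.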


ARCHITECTURE OVERVIEW
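
[Figure: query processing pipeline. Numbered steps as labeled in the diagram:]
→ 1. SQL Query: from the Application to the SQL Rewriter (optional)
→ 2. SQL Query: from the SQL Rewriter to the Parser
→ 3. Abstract Syntax Tree: from the Parser to the Binder
→ 4. Logical Plan: from the Binder to the Tree Rewriter (optional)
→ 5. Logical Plan: from the Tree Rewriter to the Optimizer
→ 6. Physical Plan: produced by the Optimizer

[Supporting components:]
→ System Catalog: supplies the Name→Internal ID mapping to the Binder and Schema Info to the Tree Rewriter and the Optimizer
→ Cost Model: supplies Estimates to the Optimizer
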

LOGICAL VS. PHYSICAL PLANS

The optimizer generates a mapping of a logical algebra expression to the optimal equivalent physical algebra expression.

Physical operators define a specific execution strategy using an access path.
→ They can depend on the physical format of the data that they process (e.g., sorting, compression).
→ Not always a 1:1 mapping from logical to physical.


QUERY OPTIMIZATION IS NP-HARD

This is the hardest part of building a DBMS.
If you are good at this, you will get paid $$$.

People are starting to look at employing ML to improve the accuracy and efficacy of optimizers.

I am expanding the Advanced DB Systems class to cover this topic in greater detail.


TODAY'S AGENDA

Relational Algebra Equivalences


Static Rules


RELATIONAL ALGEBRA EQUIVALENCES

Two relational algebra expressions are equivalent if they generate the same set of tuples.

The DBMS can identify better query plans without a cost model.

This is often called query rewriting.


PREDICATE PUSHDOWN
SELECT s.name, e.cid
FROM student AS s, enrolled AS e
WHERE s.sid = e.sid
AND e.grade = 'A'

π_{name, cid}(σ_{grade='A'}(student ⋈ enrolled))


Before pushdown:
π s.name,e.cid
  σ grade='A'
    ⨝ s.sid=e.sid
      student
      enrolled

After pushdown:
π s.name,e.cid
  ⨝ s.sid=e.sid
    student
    σ grade='A'
      enrolled



π_{name, cid}(σ_{grade='A'}(student ⋈ enrolled))
=
π_{name, cid}(student ⋈ (σ_{grade='A'}(enrolled)))
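
The equivalence above can be checked on toy data. The sketch below is only an illustration: the helper functions `select`, `join`, and `project`, and the sample rows, are made up here, and the join is a naive nested loop rather than anything a real DBMS would run. It evaluates both expressions and also reports how many tuples each plan feeds through the join.

```python
def select(rows, pred):
    """sigma: keep the rows that satisfy pred."""
    return [r for r in rows if pred(r)]


def join(left, right, key):
    """Natural-join stand-in: naive nested-loop equi-join on one shared column."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def project(rows, cols):
    """pi: keep only the named columns (bag semantics, no duplicate removal)."""
    return [tuple(r[c] for c in cols) for r in rows]


# Hypothetical sample data for the student / enrolled tables.
student = [{"sid": 1, "name": "Ada"}, {"sid": 2, "name": "Bob"}, {"sid": 3, "name": "Cy"}]
enrolled = [
    {"sid": 1, "cid": "15-445", "grade": "A"},
    {"sid": 2, "cid": "15-445", "grade": "B"},
    {"sid": 3, "cid": "15-721", "grade": "A"},
    {"sid": 3, "cid": "15-445", "grade": "C"},
]


def is_a(row):
    return row["grade"] == "A"


# Filter after the join: every matching pair is materialized first.
joined_all = join(student, enrolled, "sid")
late = project(select(joined_all, is_a), ["name", "cid"])

# Filter pushed below the join: only grade-'A' rows reach the join.
filtered = select(enrolled, is_a)
joined_filtered = join(student, filtered, "sid")
early = project(joined_filtered, ["name", "cid"])

print("same answer:", sorted(late) == sorted(early))
print("join output sizes:", len(joined_all), "vs", len(joined_filtered))
```

Both plans return the same tuples, but the pushed-down plan produces a smaller join result, which is the reason for applying the filter early.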


RELATIONAL ALGEBRA EQUIVALENCES

Selections:
→ Perform filters as early as possible.
→ Reorder predicates so that the DBMS applies the most selective one first.
→ Break a complex predicate apart and push the pieces down:
   σ_{p1∧p2∧…∧pn}(R) = σ_{p1}(σ_{p2}(…σ_{pn}(R)))
→ Simplify a complex predicate:
   (X=Y AND Y=3) → (X=3 AND Y=3)
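
The cascade rule and the simplification example can be tried out on a small relation. This is a sketch under assumptions: the relation `rows`, the three conjuncts, and the `select` helper are invented for illustration. It applies the conjunction in one pass, then as a chain of single-predicate selections, then in the simplified form, and the three results can be compared.

```python
from functools import reduce


def select(rows, pred):
    """sigma: keep the rows that satisfy pred."""
    return [r for r in rows if pred(r)]


# Hypothetical relation R(x, y) holding every pair with 0 <= x, y < 5.
rows = [{"x": x, "y": y} for x in range(5) for y in range(5)]

# Conjuncts p1, p2, p3 of one complex predicate (illustrative only).
conjuncts = [
    lambda r: r["x"] == r["y"],
    lambda r: r["y"] == 3,
    lambda r: r["x"] + r["y"] > 0,
]

# sigma_{p1 AND p2 AND p3}(R): a single selection with the whole conjunction.
combined = select(rows, lambda r: all(p(r) for p in conjuncts))

# sigma_{p1}(sigma_{p2}(sigma_{p3}(R))): one selection per conjunct,
# innermost (p3) applied first.
cascaded = reduce(select, reversed(conjuncts), rows)

# Simplified form: X=Y AND Y=3 is rewritten to X=3 AND Y=3.
simplified = select(
    rows, lambda r: r["x"] == 3 and r["y"] == 3 and conjuncts[2](r)
)

print(combined == cascaded == simplified, combined)
```

Once the predicate is split into separate selections, each one can be pushed down or reordered independently, for example by putting the most selective conjunct innermost.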


RELATIONAL ALGEBRA EQUIVALENCES

Projections:
→ Perform them early to create smaller tuples and reduce intermediate results (if duplicates are eliminated).
→ Project out all attributes except the ones requested or required (e.g., joining keys).

This is not important for a column store, which reads only the columns a query references.


PROJECTION PUSHDOWN
SELECT s.name, e.cid
FROM student AS s, enrolled AS e
WHERE s.sid = e.sid
AND e.grade = 'A'

Before projection pushdown:
π s.name,e.cid
  ⨝ s.sid=e.sid
    student
    σ grade='A'
      enrolled

After projection pushdown:
π s.name,e.cid
  ⨝ s.sid=e.sid
    π sid,name
      student
    π sid,cid
      σ grade='A'
        enrolled
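
The two plans in this slide can be compared on toy data. In the sketch below the extra `login`, `age`, and `gpa` columns, the sample rows, and the helper functions are assumptions added to make the effect visible. It runs the query with the projection only at the top and with projections pushed below the join, then compares the results and the width of the tuples that flow through the join.

```python
def select(rows, pred):
    """sigma: keep the rows that satisfy pred."""
    return [r for r in rows if pred(r)]


def project(rows, cols):
    """pi: keep only the named columns of each row."""
    return [{c: r[c] for c in cols} for r in rows]


def join(left, right, key):
    """Naive nested-loop equi-join on one shared column."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def width(rows):
    """Number of attributes carried by the widest tuple."""
    return max(len(r) for r in rows)


# Hypothetical data: student has extra columns the query never asks for.
student = [
    {"sid": 1, "name": "Ada", "login": "ada@cs", "age": 20, "gpa": 3.9},
    {"sid": 2, "name": "Bob", "login": "bob@cs", "age": 21, "gpa": 3.1},
    {"sid": 3, "name": "Cy", "login": "cy@cs", "age": 19, "gpa": 3.5},
]
enrolled = [
    {"sid": 1, "cid": "15-445", "grade": "A"},
    {"sid": 2, "cid": "15-445", "grade": "B"},
    {"sid": 3, "cid": "15-721", "grade": "A"},
]


def is_a(row):
    return row["grade"] == "A"


# Plan 1: project only at the top, so wide tuples pass through the join.
wide = join(student, select(enrolled, is_a), "sid")
top_only = project(wide, ["name", "cid"])

# Plan 2: keep just the join key and the requested attribute on each side.
narrow = join(
    project(student, ["sid", "name"]),
    project(select(enrolled, is_a), ["sid", "cid"]),
    "sid",
)
pushed = project(narrow, ["name", "cid"])

print("same answer:", top_only == pushed)
print("attributes per joined tuple:", width(wide), "vs", width(narrow))
```

The join key `sid` has to survive the early projections even though the query does not return it, which is the "required" case mentioned in the projection rules.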
MORE EXAMPLES

CREATE TABLE A (
  id INT PRIMARY KEY,
  val INT NOT NULL );

Impossible / Unnecessary Predicates
SELECT * FROM A WHERE 1 = 0;
→ X (the predicate is never true, so the result is empty)
SELECT * FROM A WHERE 1 = 1;
→ SELECT * FROM A;

Join Elimination
SELECT A1.*
  FROM A AS A1 JOIN A AS A2
    ON A1.id = A2.id;
→ SELECT * FROM A;

Source: Lukas Eder

Ignoring Projections
SELECT * FROM A AS A1
 WHERE EXISTS(SELECT val FROM A AS A2
               WHERE A1.id = A2.id);
→ SELECT * FROM A;

Merging Predicates
SELECT * FROM A
 WHERE val BETWEEN 1 AND 100
    OR val BETWEEN 50 AND 150;
→ SELECT * FROM A
    WHERE val BETWEEN 1 AND 150;

Source: Lukas Eder
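
One way to sanity-check the rewrites in these examples is to run each original query and its rewritten form against the same table and compare the results. The sketch below does that with Python's built-in `sqlite3` module. The sample rows are made up, and matching results on one dataset are only a spot check: they do not prove the rewrites are equivalent in general, and they say nothing about whether SQLite itself applies these rewrites internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INT PRIMARY KEY, val INT NOT NULL)")
# Hypothetical sample rows: ids 1..20 with val = 10, 20, ..., 200.
conn.executemany("INSERT INTO A VALUES (?, ?)", [(i, i * 10) for i in range(1, 21)])


def rows(sql):
    """Run a query and return its rows in a canonical (sorted) order."""
    return sorted(conn.execute(sql).fetchall())


# (original query, rewritten query) pairs taken from the examples.
pairs = [
    ("SELECT * FROM A WHERE 1 = 1",
     "SELECT * FROM A"),
    ("SELECT A1.* FROM A AS A1 JOIN A AS A2 ON A1.id = A2.id",
     "SELECT * FROM A"),
    ("SELECT * FROM A AS A1 WHERE EXISTS("
     "SELECT val FROM A AS A2 WHERE A1.id = A2.id)",
     "SELECT * FROM A"),
    ("SELECT * FROM A WHERE val BETWEEN 1 AND 100 OR val BETWEEN 50 AND 150",
     "SELECT * FROM A WHERE val BETWEEN 1 AND 150"),
]

matches = [rows(original) == rows(rewritten) for original, rewritten in pairs]
empty = rows("SELECT * FROM A WHERE 1 = 0")

print("rewrites agree:", matches)
print("rows for the impossible predicate:", len(empty))
```

The join elimination and EXISTS rewrites rely on `id` being a primary key: each row matches exactly one row in the self-join, so dropping the join neither adds nor removes rows.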


RELATIONAL ALGEBRA EQUIVALENCES

Joins:
→ Commutative, associative
   R⋈S = S⋈R
   (R⋈S)⋈T = R⋈(S⋈T)

How many different orderings are there for an n-way join?


Catalan number ≈ 4^n
→ Exhaustive enumeration will be too slow.

We'll see in a second how an optimizer limits the search space...
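
To get a feel for the growth, the sketch below computes Catalan numbers with the closed form C_n = (2n choose n) / (n + 1) and prints them next to 4^n. The function names are made up for illustration. Two counting conventions are assumed here: the number of binary join-tree shapes over n relations is C_{n-1}, and multiplying by n! counts every way to place the relations at the leaves.

```python
from math import comb, factorial


def catalan(n: int) -> int:
    """n-th Catalan number, C_n = (2n choose n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def join_tree_shapes(n_relations: int) -> int:
    """Binary tree shapes with n_relations leaves: C_{n-1}."""
    return catalan(n_relations - 1)


def join_orderings(n_relations: int) -> int:
    """Tree shapes times the n! assignments of relations to leaves."""
    return factorial(n_relations) * join_tree_shapes(n_relations)


for n in (2, 4, 8, 16):
    print(f"n={n:2d}  C_n={catalan(n)}  4^n={4 ** n}  orderings={join_orderings(n)}")
```

C_n stays below 4^n (asymptotically it behaves like 4^n / (n^{3/2} √π)), but the growth is still exponential, and the n! factor makes the number of distinct orderings larger again.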


CONCLUSION

We can use static rules and heuristics to optimize a query plan without needing to understand the contents of the database.


NEXT CLASS

MID-TERM EXAM!
→ Seriously, this is not a joke.
