You're Smarter Than A Database: Overcoming The Optimizer's Bad Cardinality Estimates
Database
select one.a,two.b
from
one,two
where
one.c=10 and one.a=two.a;
Pre-SQL versus SQL
Pre-SQL code is very efficient: runs in megabytes (VSE mainframe COBOL)
Labor intensive
SQL can be inefficient: runs in gigabytes (if you are lucky!)
Much more productive: do in minutes what took hours before, e.g. create tables
What the database doesn't know
The optimizer has a limited set of statistics that describe the data
It can miscalculate the number of rows a query will return, its cardinality
A cardinality error can lead the optimizer to choose a slow way to run the SQL
Example plan/Cardinality
--------------------------------------------------
| Id | Operation         | Name  | Rows | Cost |
--------------------------------------------------
|  0 | SELECT STATEMENT  |       |   10 |    3 |
|* 1 | TABLE ACCESS FULL | TEST1 |   10 |    3 |
--------------------------------------------------
Plan = how Oracle will run your query
Rows = how many rows the optimizer thinks that step will return
Cost = estimate of the time the query will take, a function of the number of rows
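One way to act on the Rows column is to compare the optimizer's estimate for a step with the number of rows that step really returns. The sketch below is only an illustration: the helper name `estimate_error_factor` is hypothetical, and the 10-versus-10 case is a made-up baseline. The 500,000-versus-1 case mirrors the single-column example later in the slides.

```python
def estimate_error_factor(estimated_rows: int, actual_rows: int) -> float:
    """Return how many times apart the estimate and the actual count are.

    A factor of 1.0 means the estimate was exact; a large factor hints
    that the plan may have been chosen from a bad cardinality.
    (Hypothetical helper, not part of any Oracle tool.)
    """
    # Treat zero as one so an empty result does not divide by zero.
    est = max(estimated_rows, 1)
    act = max(actual_rows, 1)
    return max(est, act) / min(est, act)


# Made-up baseline: the plan says 10 rows and the step returns 10 rows.
print(estimate_error_factor(10, 10))
# Plan says 500,000 rows but the query returns a single row.
print(estimate_error_factor(500_000, 1))
```

A factor near 1 suggests the slow query has some other cause, while a factor in the thousands points at cardinality.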
How to fix cardinality problems
Find out if it really is a cardinality issue
Determine the reason it occurred
  Single column
  Multiple columns
Choose a strategy
  Give the optimizer more information
  Override optimizer decision
  Change the application
Four examples
Four examples of how the optimizer calculates cardinality
Full scripts and their outputs on portal, pieces on slides (edited for simplicity)
Step 1: Find out if it really is a cardinality issue
Example 1
Data
A COUNT(*)
---------- ----------
1 10
Query
[Figure: diagram of a join between tables]
Step 2: Understand the reason for the wrong cardinality
Unequal distribution of data:
  Within a single column, e.g. last name: “Smith” or “Jones”
  Among multiple columns, e.g. address: zip code and state
Example 2 - Unequal distribution of values in a single column
1,000,000 rows with value 1
1 row with value 2
A COUNT(*)
---------- ----------
1 1000000
2 1
SQL statement returns one row, but the plan estimates 500K rows
---------------------------------------------
| Operation | Name | Rows |
---------------------------------------------
| SELECT STATEMENT | | 500K|
| INDEX FAST FULL SCAN| TEST2INDEX | 500K|
---------------------------------------------
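The 500K in this plan is consistent with the optimizer spreading the table's rows evenly over the column's distinct values. The sketch below only reproduces that arithmetic. It assumes no histogram is in use and that the estimate is simply rows divided by distinct values. The function name is hypothetical, and Oracle's exact rounding may differ.

```python
def single_column_estimate(num_rows: int, num_distinct: int) -> int:
    """Estimated rows for `column = value` if every value is equally common.

    Assumption: uniform distribution, no histogram. Integer division is
    used here for simplicity; the real optimizer's rounding may differ.
    """
    return num_rows // num_distinct


# Example 2 data: 1,000,000 rows with value 1 and 1 row with value 2.
num_rows = 1_000_000 + 1
estimate = single_column_estimate(num_rows, num_distinct=2)
print(estimate)  # 500000, although only 1 row really has the value 2
```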
Column statistics – two distinct values
select sum(a+b)
from TEST3
where
a=1 and b=2;
Plan with wrong number of rows = 500,000
Inefficient full scan
----------------------------------------------
| Operation | Name | Rows |
----------------------------------------------
| SELECT STATEMENT | | 1 |
| SORT AGGREGATE | | 1 |
| INDEX FAST FULL SCAN| TEST3INDEX | 500K|
----------------------------------------------
Column statistics
C LOW HIGH NUM_DISTINCT
- ---------- ---------- ------------
A 1 2 2
B 1 2 2
NUM_ROWS
----------
2000001
Rows in plan = (rows in table) / (distinct values of A * distinct values of B)
500000 ≈ 2000001 / (2 * 2)
Optimizer assumes all four combinations (1,1), (1,2), (2,1), (2,2) are equally likely
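The formula above can be checked with a few lines of arithmetic. This is a minimal sketch of the calculation as stated on the slide, not of Oracle's internals. The function name is hypothetical and integer division stands in for whatever rounding the optimizer applies.

```python
from itertools import product


def two_column_estimate(num_rows: int, distinct_a: int, distinct_b: int) -> int:
    """Rows in plan = rows in table / (distinct values of A * distinct values of B)."""
    return num_rows // (distinct_a * distinct_b)


# The four (A, B) combinations the optimizer treats as equally likely.
combinations = list(product([1, 2], [1, 2]))
print(combinations)  # [(1, 1), (1, 2), (2, 1), (2, 2)]

# NUM_ROWS = 2000001, NUM_DISTINCT = 2 for both A and B.
print(two_column_estimate(2_000_001, 2, 2))  # 500000
```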
How to tell which assumption is in play?
Select count(*) grouped by each column
select a,count(*) from TEST3 group by a;
select b,count(*) from TEST3 group by b;
NUM_ROWS
----------
1000001
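The per-column counts can be read like this: if one column's counts are badly skewed, the single-column assumption is failing, and if each column looks even on its own, the problem is in the combination. The sketch below shows the second case on a hypothetical, scaled-down data set. The 21 rows are made up for illustration and are not the contents of TEST3.

```python
from collections import Counter

# Hypothetical stand-in for a two-column table (A, B):
# the pairs (1,1) and (2,2) are common, the pair (1,2) is rare.
rows = [(1, 1)] * 10 + [(2, 2)] * 10 + [(1, 2)]

# Equivalent of: select a, count(*) ... group by a  (and the same for b)
count_a = Counter(a for a, _ in rows)
count_b = Counter(b for _, b in rows)
# Counting the pairs exposes what the per-column counts hide.
count_ab = Counter(rows)

# Estimate for a=1 and b=2 if the columns were independent and uniform.
independence_estimate = len(rows) / (len(count_a) * len(count_b))
actual = count_ab[(1, 2)]

print(dict(count_a), dict(count_b))   # each column is close to evenly split
print(independence_estimate, actual)  # 5.25 estimated versus 1 actual
```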
Step 3: Choose the best strategy for fixing the cardinality problem
Elapsed: 00:00:01.00
Elapsed: 00:00:00.01
Giving the optimizer more information: using SQL Profiles
Works for unequal distribution among multiple columns
Includes information about the relationship between columns in the SQL: correlated columns or predicates
SQL Tuning Advisor gathers statistics on the columns
...DBMS_SQLTUNE.CREATE_TUNING_TASK(...
...DBMS_SQLTUNE.EXECUTE_TUNING_TASK(...
...DBMS_SQLTUNE.ACCEPT_SQL_PROFILE (...
Example 3 plan with correct number of rows = 1 using SQL profile
--------------------------------------------------
| Operation         | Name       | Rows | Bytes |
--------------------------------------------------
| SELECT STATEMENT  |            |    1 |     6 |
| SORT AGGREGATE    |            |    1 |     6 |
| INDEX RANGE SCAN  | TEST3INDEX |    1 |     6 |
--------------------------------------------------
Elapsed: 00:00:01.09
Elapsed: 00:00:00.01
NUM COUNT(*)
---------- ----------
1 1000000
2 1
SQL statement – returns one row
select B.NUM
from SMALL A,LARGE B
where
A.NUM=B.NUM and
A.NAME='FEW';
Plan with wrong number of rows = 125,000
----------------------------------------------
| Operation | Name | Rows |
----------------------------------------------
| SELECT STATEMENT | | 125K|
| HASH JOIN | | 125K|
| TABLE ACCESS FULL | SMALL | 1 |
| INDEX FAST FULL SCAN| LARGEINDEX | 1000K|
----------------------------------------------
Column statistics: two buckets on all columns, using histograms
LOW HIGH NUM_DISTINCT NUM_BUCKETS
---------- ---------- ------------ -----------
1 2 2 2
NUM_ROWS
----------
1000001
125000 ≈ 1000001 / 8
Optimizer appears to assume all eight combinations of the three columns' values are equally likely
Can't verify formula: references don't include formula with histograms
Even worse without histograms: cardinality is 500000
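The two estimates quoted here can be reproduced with the same equal-likelihood arithmetic. This is a sketch of the numbers only, since the slides say the formula with histograms could not be verified. Treating the no-histogram case as a division by two distinct values is an assumption that happens to match the quoted 500000, and the function name is hypothetical.

```python
from math import prod


def equal_combinations_estimate(num_rows: int, distinct_counts: list[int]) -> int:
    """Divide the row count evenly over every combination of column values.

    Assumption: all combinations are equally likely. Integer division is
    used for simplicity; the optimizer's own rounding is not documented here.
    """
    return num_rows // prod(distinct_counts)


num_rows = 1_000_001  # NUM_ROWS from the column statistics

# Three columns with two distinct values each: 2 * 2 * 2 = 8 combinations.
with_histograms = equal_combinations_estimate(num_rows, [2, 2, 2])
# Assumed form of the no-histogram case: one column with two distinct values.
without_histograms = equal_combinations_estimate(num_rows, [2])

print(with_histograms)     # 125000
print(without_histograms)  # 500000, versus the 1 row actually returned
```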
No SQL profile from SQL Tuning Advisor:
Elapsed: 00:00:01.03
Elapsed: 00:00:00.01
Elapsed: 00:00:01.03
Elapsed: 00:00:00.01
Conclusion