0% found this document useful (0 votes)
80 views53 pages

You're Smarter Than A Database: Overcoming The Optimizer's Bad Cardinality Estimates

The document discusses how database optimizers can sometimes provide incorrect cardinality estimates that result in inefficient query plans. It provides four examples of how cardinality errors can occur and strategies for addressing them, such as collecting better statistics through histograms and SQL profiles to provide the optimizer with more accurate information.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
80 views53 pages

You're Smarter Than A Database: Overcoming The Optimizer's Bad Cardinality Estimates

The document discusses how database optimizers can sometimes provide incorrect cardinality estimates that result in inefficient query plans. It provides four examples of how cardinality errors can occur and strategies for addressing them, such as collecting better statistics through histograms and SQL profiles to provide the optimizer with more accurate information.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 53

You’re Smarter than a

Database

Overcoming the optimizer’s bad


cardinality estimates
About me
 Bobby Durrett
 US Foodservice
[email protected]
 Scripts in
http://www.geocities.com/bobbyandma
rielle/sqltuning.zip
What you know
What the database knows
Before SQL
 Example - mainframe Datacom/DB COBOL
 List index names
 Write loops

read a from one index i1 where one.c=10


while more table one rows exist get next row
read b from two index i2 where two.a = one.a
while more table two rows exist get next row
print one.a,two.b
end while
end while
SQL
 Tell what you want, not how to get it

select one.a,two.b
from
one,two
where
one.c=10 and one.a=two.a;
Pre-SQL versus SQL
 Pre-SQL code very efficient – runs in
megabytes – VSE mainframe COBOL
 Labor intensive
 SQL can be inefficient – runs in
gigabytes (if you are lucky!)
 Much more productive – do in minutes
what took hours before – create tables
What the database doesn’t
know
 Optimizer has a limited set of statistics
that describe the data
 It can miscalculate the number of rows
a query will return, its cardinality
 A cardinality error can lead optimizer to
choose a slow way to run the SQL
Example plan/Cardinality
-------------------------------------------------
| Id | Operation | Name | Rows | Cost
-------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 3
|* 1 | TABLE ACCESS FULL| TEST1 | 10 | 3
-------------------------------------------------
 Plan = how Oracle will run your query
 Rows = how many rows optimizer thinks that
step will return
 Cost = estimate of time query will take, a
function of the number of rows
How to fix cardinality
problems
 Find out if it really is a cardinality issue
 Determine the reason it occurred
 Single column
 Multiple columns
 Choose a strategy
 Give the optimizer more information
 Override optimizer decision
 Change the application
Four examples
 Four examples of how the optimizer
calculates cardinality
 Full scripts and their outputs on portal,
pieces on slides – edited for simplicity
Step 1: Find out if it really is a
cardinality issue
 Example 1
 Data

select a,count(*) from test1 group by a;

A COUNT(*)
---------- ----------
1 10
 Query

select * from test1 where a=1;


Step 1: Find out if it really is a
cardinality issue
 Get estimated cardinality from plan
-------------------------------------------
| Id | Operation | Name | Rows |
-------------------------------------------
| 0 | SELECT STATEMENT | | 10 |
|* 1 | TABLE ACCESS FULL| TEST1 | 10 |
-------------------------------------------

 Do query for actual number of rows


select count(*) from test1 where a=1;
Step 1: Find out if it really is a
cardinality issue
 Plan is a tree – find cardinality and select
count(*) on part of query represented by that
part of plan. join

table join

table table
Step 2: Understand the reason
for the wrong cardinality
 Unequal distribution of data:
 Within a single column
 Last name
 “Smith” or “Jones”
 Among multiple columns –
 Address
 Zipcode and State
Step 2: Understand the reason
for the wrong cardinality
 Example 2 - Unequal distribution of values in a
single column
 1,000,000 rows with value 1
 1 row with value 2

select a,count(*) from TEST2 group by a;

A COUNT(*)
---------- ----------
1 1000000
2 1
Step 2: Understand the reason
for the wrong cardinality
 SQL statement – returns one row

select * from TEST2 where a=2;


Step 2: Understand the reason
for the wrong cardinality
 Plan with wrong number of rows = 500,000
 Full scan instead of range scan – 100 times slower

---------------------------------------------
| Operation | Name | Rows |
---------------------------------------------
| SELECT STATEMENT | | 500K|
| INDEX FAST FULL SCAN| TEST2INDEX | 500K|
---------------------------------------------
Step 2: Understand the reason
for the wrong cardinality
 Column statistics – two distinct values

LOW HIGH NUM_DISTINCT


---------- ---------- ------------
1 2 2

 Table statistic – total # of rows – 1,000,001


NUM_ROWS
----------
1000001
Step 2: Understand the reason
for the wrong cardinality
 Rows in plan = (rows in table)/
(distinct values of column)
 500000=1000001/2
 Optimizer knew that there were only
two values – assumed they had equal
number of rows
Step 2: Understand the reason
for the wrong cardinality
 Example 3 - Combinations of column values
not equally distributed
 1,000,000 rows with values 1,1
 1,000,000 rows with values 2,2
 1 row with value 1,2
 ~ Equal numbers of 1’s and 2’s in each column
A B COUNT(*)
---------- ---------- ----------
1 1 1000000
1 2 1
2 2 1000000
Step 2: Understand the reason
for the wrong cardinality
 SQL statement – retrieves one row

select sum(a+b)
from TEST3
where
a=1 and b=2;
Step 2: Understand the reason
for the wrong cardinality
 Plan with wrong number of rows = 500,000
 Inefficient full scan

----------------------------------------------
| Operation | Name | Rows |
----------------------------------------------
| SELECT STATEMENT | | 1 |
| SORT AGGREGATE | | 1 |
| INDEX FAST FULL SCAN| TEST3INDEX | 500K|
----------------------------------------------
Step 2: Understand the reason
for the wrong cardinality
 Column statistics
C LOW HIGH NUM_DISTINCT
- ---------- ---------- ------------
A 1 2 2
B 1 2 2

 Table statistic – total # of rows – 2,000,001

NUM_ROWS
----------
2000001
Step 2: Understand the reason
for the wrong cardinality
 Rows in plan = (rows in table)/
(distinct values A * distinct values B)
 500000=2000001/(2 * 2)
 Optimizer assumes all four
combinations (1,1),(1,2),(2,1),(2,2)
equally likely
Step 2: Understand the reason
for the wrong cardinality
 How to tell which assumption is in play?
 Select count(*) each column
select a,count(*) from TEST3 group by a;
select b,count(*) from TEST3 group by b;

 count(*) each column combination


select a,b,count(*) from TEST3
group by a,b;
Step 3: Choose the best strategy
for fixing the cardinality problem
 Giving the optimizer more information
 Histograms
 SQL Profiles
 Overriding optimizer decisions
 Hints
 Changing the application

Try to use optimizer as much as possible to


minimize development work
Step 3: Choose the best strategy
for fixing the cardinality problem
 Giving the optimizer more information –
using histograms
 Works for unequal distribution within a
single column
 A histogram records the distribution of
values within a column in up to 254
“buckets”
 Works best on columns with fewer than
255 distinct values
Step 3: Choose the best strategy
for fixing the cardinality problem
 Run gather_table_stats command to get
histograms on the column – 254 is max
number of buckets

method_opt=>'FOR ALL COLUMNS SIZE 254'


Step 3: Choose the best strategy
for fixing the cardinality problem
 Plan for Example 2 with correct number of rows
with histogram
 Uses range scan
-----------------------------------------
| Operation | Name | Rows |
-----------------------------------------
| SELECT STATEMENT | | 1 |
| INDEX RANGE SCAN| TEST2INDEX | 1 |
-----------------------------------------
Step 3: Choose the best strategy
for fixing the cardinality problem
 Column statistics – two buckets

LOW HIGH NUM_DISTINCT NUM_BUCKETS


---------- ---------- ------------ -----------
1 2 2 2

 Table statistic – unchanged

NUM_ROWS
----------
1000001
Step 3: Choose the best strategy
for fixing the cardinality problem

 Time without histograms (1 second):

Elapsed: 00:00:01.00

 Time with histograms(1/100th second):

Elapsed: 00:00:00.01
Step 3: Choose the best strategy
for fixing the cardinality problem
 Giving the optimizer more information – using
SQL Profiles
 Works for unequal distribution among multiple
columns
 Includes information about the relationship
between columns in the SQL – correlated columns
or predicates
Step 3: Choose the best strategy
for fixing the cardinality problem
 SQL Tuning Advisor gathers statistics on the
columns

...DBMS_SQLTUNE.CREATE_TUNING_TASK(...
...DBMS_SQLTUNE.EXECUTE_TUNING_TASK(...

 Accept the SQL Profile it creates to use the


new statistics

...DBMS_SQLTUNE.ACCEPT_SQL_PROFILE (...
Step 3: Choose the best strategy
for fixing the cardinality problem
 Example 3 plan with correct number of rows = 1
using SQL profile

--------------------------------------------------
| Operation | Name | Rows | Bytes |
--------------------------------------------------
| SELECT STATEMENT | | 1 | 6 |
| SORT AGGREGATE | | 1 | 6 |
| INDEX RANGE SCAN| TEST3INDEX | 1 | 6 |
-------------------------------------------------|
Step 3: Choose the best strategy
for fixing the cardinality problem

 Time without a profile (1 second):

Elapsed: 00:00:01.09

 Time with a profile(1/100th second):

Elapsed: 00:00:00.01
Step 3: Choose the best strategy
for fixing the cardinality problem

 Overriding optimizer decisions – using hints


 Example 4 has unequal distribution of column
values across two tables – histograms and SQL
Profiles don’t work
 Hint forces index range scan
 Small amount of additional code – not like
Cobol on mainframe
Step 3: Choose the best strategy
for fixing the cardinality problem
 Example 4 - SMALL table
 MANY relates to 1 – there are many rows with
value 1
 FEW relates to 2 – there are few with value 2

insert into SMALL values ('MANY',1);


insert into SMALL values ('FEW',2);
Step 3: Choose the best strategy
for fixing the cardinality problem
 Example 4 - LARGE table:
 1,000,000 rows with value 1
 1 row with value 2

NUM COUNT(*)
---------- ----------
1 1000000
2 1
Step 3: Choose the best strategy
for fixing the cardinality problem
 SQL statement – returns one row

select B.NUM
from SMALL A,LARGE B
where
A.NUM=B.NUM and
A.NAME='FEW';
Step 3: Choose the best strategy
for fixing the cardinality problem
 Plan with wrong number of rows = 125,000

----------------------------------------------
| Operation | Name | Rows |
----------------------------------------------
| SELECT STATEMENT | | 125K|
| HASH JOIN | | 125K|
| TABLE ACCESS FULL | SMALL | 1 |
| INDEX FAST FULL SCAN| LARGEINDEX | 1000K|
----------------------------------------------
Step 3: Choose the best strategy
for fixing the cardinality problem
 Column statistics – two buckets on all
columns – using histograms
LOW HIGH NUM_DISTINCT NUM_BUCKETS
---------- ---------- ------------ -----------
1 2 2 2

LOW HIGH NUM_DISTINCT NUM_BUCKETS


---- ---- ------------ -----------
FEW MANY 2 2
Step 3: Choose the best strategy
for fixing the cardinality problem
 Table statistics – SMALL has 2 rows,
LARGE 1000001
NUM_ROWS
----------
2

NUM_ROWS
----------
1000001
Step 3: Choose the best strategy
for fixing the cardinality problem
 125000=1000001/8
 Optimizer appears to assume all eight
combinations of the three columns’
values are equally likely
 Can’t verify formula – references don’t
include formula with histograms
 Even worse without histograms –
cardinality is 500000
Step 3: Choose the best strategy
for fixing the cardinality problem
 No SQL profile from SQL Tuning Advisor:

There are no recommendations to improve the


statement.

 Neither histograms nor SQL profiles help


example 4
Step 3: Choose the best strategy
for fixing the cardinality problem
 Statement with hints:
 Use index
 Don’t do full scan

select /*+ INDEX(B LARGEINDEX)


NO_INDEX_FFS(B LARGEINDEX) */
B.NUM
from SMALL A,LARGE B
where
a.NUM=B.NUM and
A.NAME='FEW';
Step 3: Choose the best strategy
for fixing the cardinality problem

 Time without a hint (1 second):

Elapsed: 00:00:01.03

 Time with a hint (1/100th second):

Elapsed: 00:00:00.01
Step 3: Choose the best strategy
for fixing the cardinality problem

 Changing the application


 Change your tables so that the optimizer gets your
SQL’s cardinality right
 Requires more work designing tables, but keeps
productivity benefits of SQL
Step 3: Choose the best strategy
for fixing the cardinality problem
 Example 4 – moved NAME column to LARGE table and
split table in two
 One million (‘MANY’,1) rows in LARGEA
 One (‘FEW’,2) row in LARGEB
 Query:
select NUM
from (select * from largea
union
select * from largeb)
where
NAME='FEW';
Step 3: Choose the best strategy
for fixing the cardinality problem
 Plan is just as efficient as with hint:
 Number of rows = 2 (reality is 1)
 Range Scan
--------------------------------------------------------------
| Id | Operation | Name | Rows |
--------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 |
| 1 | VIEW | | 2 |
| 2 | SORT UNIQUE | | 2 |
| 3 | UNION-ALL | | |
| 4 | TABLE ACCESS BY INDEX ROWID| LARGEA | 1 |
|* 5 | INDEX RANGE SCAN | LARGEAINDEX | 1 |
| 6 | TABLE ACCESS BY INDEX ROWID| LARGEB | 1 |
|* 7 | INDEX RANGE SCAN | LARGEBINDEX | 1 |
--------------------------------------------------------------
Step 3: Choose the best strategy
for fixing the cardinality problem

 Time without table change (1 second):

Elapsed: 00:00:01.03

 Time with table change (1/100th second):

Elapsed: 00:00:00.01
Conclusion

 SQL improves productivity, optimizer has limits


 Identify cases where cardinality is wrong
 Understand why the database got it wrong
 One column
 Multiple columns
 Choose best strategy to fix
 Give optimizer more info
 Override optimizer’s choices
 Redesign tables
References
 Cost Based Optimizer Fundamentals, Jonathan Lewis

 Metalink Note:212809.1, Limitations of the Oracle Cost Based


Optimizer

 Metalink Note:68992.1, Predicate Selectivity

 Histograms – Myths and Facts, Wolfgang Breitling


Select Journal, Volume 13, Number 3

You might also like