SQL Tuning Guide - pt6

The document discusses managing column group statistics in Oracle Database. It describes how to automatically and manually create and gather statistics on column groups using various DBMS_STATS procedures and functions. This allows the optimizer to better estimate cardinalities when queries reference multiple columns.

Uploaded by

Miriam Almeida

Chapter 14

Managing Column Group Statistics

The following query of the DBA_TAB_COL_STATISTICS view shows the statistics that have
been gathered on the cust_state_province and country_id columns of the
sh.customers table:

COL COLUMN_NAME FORMAT a20
COL NDV FORMAT 999

SELECT COLUMN_NAME, NUM_DISTINCT AS "NDV", HISTOGRAM
FROM   DBA_TAB_COL_STATISTICS
WHERE  OWNER = 'SH'
AND    TABLE_NAME = 'CUSTOMERS'
AND    COLUMN_NAME IN ('CUST_STATE_PROVINCE', 'COUNTRY_ID');

Sample output is as follows:

COLUMN_NAME                 NDV HISTOGRAM
-------------------- ---------- ---------------
CUST_STATE_PROVINCE         145 FREQUENCY
COUNTRY_ID                   19 FREQUENCY

As shown in the following query, 3341 customers reside in California:

SELECT COUNT(*)
FROM sh.customers
WHERE cust_state_province = 'CA';

COUNT(*)
----------
3341

Consider an explain plan for a query of customers in the state CA and in the country with ID
52790 (USA):

EXPLAIN PLAN FOR
  SELECT *
  FROM   sh.customers
  WHERE  cust_state_province = 'CA'
  AND    country_id = 52790;

Explained.

sys@PROD> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 1683234692

------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |  128 | 24192 |   442   (7)| 00:00:06 |
|*  1 |  TABLE ACCESS FULL| CUSTOMERS |  128 | 24192 |   442   (7)| 00:00:06 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790)

13 rows selected.

Based on the single-column statistics for the country_id and cust_state_province
columns, the optimizer estimates that the query of California customers in the USA will
return 128 rows. In fact, 3341 customers reside in California. The optimizer does not
know that the state of California is in the USA, so it treats the two predicates as
independent filters that each reduce the number of returned rows, and therefore greatly
underestimates the cardinality.

You can make the optimizer aware of the real-world relationship between values in
country_id and cust_state_province by gathering column group statistics. These
statistics enable the optimizer to give a more accurate cardinality estimate.
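
As a minimal sketch of what gathering column group statistics for this example could look like, the following statements create a column group on the two correlated columns and then regather table statistics. This assumes a session with privileges on the sh schema. The three-argument form of CREATE_EXTENDED_STATS is used here for illustration; the full procedure is covered in "Creating and Gathering Statistics on Column Groups Manually".

```sql
-- Sketch only: create a column group on the two correlated columns.
-- The function returns the system-generated name of the extension.
SELECT DBMS_STATS.CREATE_EXTENDED_STATS(
         'SH', 'CUSTOMERS', '(CUST_STATE_PROVINCE,COUNTRY_ID)') AS extension_name
FROM   DUAL;

-- Regather table statistics so that the new column group receives statistics.
EXEC DBMS_STATS.GATHER_TABLE_STATS('SH', 'CUSTOMERS');
```

Rerunning the EXPLAIN PLAN statement shown earlier and comparing the Rows column with the previous estimate of 128 is one way to check whether the column group changed the cardinality estimate.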

See Also:

• "Detecting Useful Column Groups for a Specific Workload"
• "Creating Column Groups Detected During Workload Monitoring"
• "Creating and Gathering Statistics on Column Groups Manually"

14.1.1.2 Automatic and Manual Column Group Statistics


Oracle Database can create column group statistics either automatically or manually.
The optimizer can use SQL plan directives to generate a better plan. If the
DBMS_STATS preference AUTO_STAT_EXTENSIONS is set to ON (the default is OFF), then a
SQL plan directive can automatically trigger the creation of column group statistics
based on the use of predicates in the workload. You can set AUTO_STAT_EXTENSIONS
with the SET_TABLE_PREFS, SET_SCHEMA_PREFS, or SET_GLOBAL_PREFS procedures.
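
The following sketch shows one way the preference might be set and checked for a single table. The choice of sh.customers is only an example, and the GET_PREFS call is an addition that does not appear in this section; it is included as a convenient way to confirm the effective value.

```sql
-- Enable automatic creation of column groups for one table only.
EXEC DBMS_STATS.SET_TABLE_PREFS('SH', 'CUSTOMERS', 'AUTO_STAT_EXTENSIONS', 'ON');

-- Confirm the value that is now in effect for that table.
SELECT DBMS_STATS.GET_PREFS('AUTO_STAT_EXTENSIONS', 'SH', 'CUSTOMERS') AS pref_value
FROM   DUAL;

-- Database-wide alternative (commented out; affects all tables):
-- EXEC DBMS_STATS.SET_GLOBAL_PREFS('AUTO_STAT_EXTENSIONS', 'ON');
```

Setting the preference at the table level keeps the change scoped to the objects where cardinality misestimates have actually been observed.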

To manage column group statistics manually, use DBMS_STATS as follows:
• Detect column groups
• Create previously detected column groups
• Create column groups manually and gather column group statistics

See Also:

• "Detecting Useful Column Groups for a Specific Workload"
• "Creating Column Groups Detected During Workload Monitoring"
• "Creating and Gathering Statistics on Column Groups Manually"
• Oracle Database PL/SQL Packages and Types Reference to learn about the
DBMS_STATS procedures for setting optimizer statistics

14.1.1.3 User Interface for Column Group Statistics


Several DBMS_STATS program units have preferences that are relevant for column groups.

Table 14-1 DBMS_STATS APIs Relevant for Column Groups

Program Unit or Preference      Description
------------------------------  ----------------------------------------------------------------
SEED_COL_USAGE Procedure        Iterates over the SQL statements in the specified workload,
                                compiles them, and then seeds column usage information for the
                                columns that appear in these statements.
                                To determine the appropriate column groups, the database must
                                observe a representative workload. You do not need to run the
                                queries themselves during the monitoring period. Instead, you
                                can run EXPLAIN PLAN for some longer-running queries in your
                                workload to ensure that the database is recording column group
                                information for these queries.

REPORT_COL_USAGE Function       Generates a report that lists the columns that were seen in
                                filter predicates, join predicates, and GROUP BY clauses in the
                                workload.
                                You can use this function to review column usage information
                                recorded for a specific table.

CREATE_EXTENDED_STATS Function  Creates extensions, which are either column groups or
                                expressions. The database gathers statistics for the extension
                                when either a user-generated or automatic statistics gathering
                                job gathers statistics for the table.

AUTO_STAT_EXTENSIONS            Controls the automatic creation of extensions, including column
Preference                      groups, when optimizer statistics are gathered. Set this
                                preference using SET_TABLE_PREFS, SET_SCHEMA_PREFS, or
                                SET_GLOBAL_PREFS.
                                When AUTO_STAT_EXTENSIONS is set to OFF (default), the
                                database does not create column group statistics automatically.
                                To create extensions, you must execute the
                                CREATE_EXTENDED_STATS function or specify extended statistics
                                explicitly in the METHOD_OPT parameter in the DBMS_STATS API.
                                When set to ON, a SQL plan directive can trigger the creation
                                of column group statistics automatically based on usage of
                                columns in the predicates in the workload.
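
To illustrate the METHOD_OPT alternative mentioned in the table, the following sketch names a column group directly in the FOR COLUMNS clause while gathering statistics. The table, the column list, and the SIZE SKEWONLY setting are example choices, not requirements. The follow-up query against DBA_STAT_EXTENSIONS is an addition for checking the result.

```sql
-- Sketch only: create and gather a column group in a single call by
-- listing the grouped columns in parentheses in METHOD_OPT.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => 'SH',
    tabname    => 'CUSTOMERS',
    method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY ' ||
                  'FOR COLUMNS (CUST_STATE_PROVINCE,COUNTRY_ID) SIZE SKEWONLY');
END;
/

-- List the extensions that now exist on the table.
SELECT extension_name, extension
FROM   dba_stat_extensions
WHERE  owner = 'SH'
AND    table_name = 'CUSTOMERS';
```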

See Also:

• "Setting Artificial Optimizer Statistics for a Table"
• Oracle Database PL/SQL Packages and Types Reference to learn about
the DBMS_STATS package

14.1.2 Detecting Useful Column Groups for a Specific Workload


You can use DBMS_STATS.SEED_COL_USAGE and REPORT_COL_USAGE to determine which
column groups are required for a table based on a specified workload.
This technique is useful when you do not know which extended statistics to create.
This technique does not work for expression statistics.

Assumptions
This tutorial assumes the following:
• Cardinality estimates have been incorrect for queries of the sh.customers_test
table (created from the customers table) that use predicates referencing the
columns country_id and cust_state_province.
• You want the database to monitor your workload for 5 minutes (300 seconds).
• You want the database to determine which column groups are needed
automatically.

To detect column groups:


1. Start SQL*Plus or SQL Developer, and log in to the database as user sh.
2. Create the customers_test table and gather statistics for it:

-- The DROP raises ORA-00942 if customers_test does not exist yet; that error can be ignored.
DROP TABLE customers_test;
CREATE TABLE customers_test AS SELECT * FROM customers;
EXEC DBMS_STATS.GATHER_TABLE_STATS(user, 'customers_test');

3. Enable workload monitoring.
In a different SQL*Plus session, connect as SYS and run the following PL/SQL
program to enable monitoring for 300 seconds:

BEGIN
  DBMS_STATS.SEED_COL_USAGE(null,null,300);
END;
/

4. As user sh, run explain plans for two queries in the workload.
The following examples show the explain plans for two queries on the
customers_test table:

EXPLAIN PLAN FOR
  SELECT *
  FROM   customers_test
  WHERE  cust_city = 'Los Angeles'
  AND    cust_state_province = 'CA'
  AND    country_id = 52790;

SELECT PLAN_TABLE_OUTPUT
FROM TABLE(DBMS_XPLAN.DISPLAY('plan_table', null,'basic rows'));

EXPLAIN PLAN FOR
  SELECT country_id, cust_state_province, count(cust_city)
  FROM   customers_test
  GROUP BY country_id, cust_state_province;

SELECT PLAN_TABLE_OUTPUT
FROM TABLE(DBMS_XPLAN.DISPLAY('plan_table', null,'basic rows'));

Sample output appears below:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------
Plan hash value: 4115398853

----------------------------------------------------
| Id  | Operation         | Name           | Rows  |
----------------------------------------------------
|   0 | SELECT STATEMENT  |                |     1 |
|   1 |  TABLE ACCESS FULL| CUSTOMERS_TEST |     1 |
----------------------------------------------------

8 rows selected.

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------
Plan hash value: 3050654408

-----------------------------------------------------
| Id  | Operation          | Name           | Rows  |
-----------------------------------------------------
|   0 | SELECT STATEMENT   |                |  1949 |
|   1 |  HASH GROUP BY     |                |  1949 |
|   2 |   TABLE ACCESS FULL| CUSTOMERS_TEST | 55500 |
-----------------------------------------------------

9 rows selected.

The first plan shows a cardinality of 1 row for a query that returns 932 rows. The second
plan shows a cardinality of 1949 rows for a query that returns 145 rows.
5. Optionally, review the column usage information recorded for the table.
Call the DBMS_STATS.REPORT_COL_USAGE function to generate a report:

SET LONG 100000
SET LINES 120
SET PAGES 0
SELECT DBMS_STATS.REPORT_COL_USAGE(user, 'customers_test')
FROM   DUAL;

The report appears below:

LEGEND:
.......

EQ         : Used in single table EQuality predicate
RANGE      : Used in single table RANGE predicate
LIKE       : Used in single table LIKE predicate
NULL       : Used in single table is (not) NULL predicate
EQ_JOIN    : Used in EQuality JOIN predicate
NONEQ_JOIN : Used in NON EQuality JOIN predicate
FILTER     : Used in single table FILTER predicate
JOIN       : Used in JOIN predicate
GROUP_BY   : Used in GROUP BY expression
........................................................................

########################################################################

COLUMN USAGE REPORT FOR SH.CUSTOMERS_TEST
.........................................

1. COUNTRY_ID                                : EQ
2. CUST_CITY                                 : EQ
3. CUST_STATE_PROVINCE                       : EQ
4. (CUST_CITY, CUST_STATE_PROVINCE,
    COUNTRY_ID)                              : FILTER
5. (CUST_STATE_PROVINCE, COUNTRY_ID)         : GROUP_BY
########################################################################

In the preceding report, the first three columns were used in equality predicates in
the first monitored query:

...
WHERE cust_city = 'Los Angeles'
AND cust_state_province = 'CA'
AND country_id = 52790;

All three columns appeared in the same WHERE clause, so the report also lists them
together as a FILTER group. In the second query, two columns appeared in the GROUP BY
clause, so the report labels that pair as GROUP_BY. The sets of columns in the FILTER
and GROUP_BY entries of the report are candidates for column groups.
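
As a hedged sketch of a possible follow-up, the candidate column groups recorded during monitoring could be turned into extensions by calling CREATE_EXTENDED_STATS with only the owner and table name, and then regathering statistics. This anticipates the procedure in "Creating Column Groups Detected During Workload Monitoring". The query against USER_STAT_EXTENSIONS is an addition for inspecting the result, and SET LONG is assumed to be large enough to display the returned report.

```sql
-- Sketch only: create the column groups suggested by the seeded column usage.
-- With no extension argument, the function returns a report of what it created.
SELECT DBMS_STATS.CREATE_EXTENDED_STATS(user, 'customers_test')
FROM   DUAL;

-- Regather statistics so that the new column groups receive statistics.
EXEC DBMS_STATS.GATHER_TABLE_STATS(user, 'customers_test');

-- List the extensions that now exist on the table.
SELECT extension_name, extension
FROM   user_stat_extensions
WHERE  table_name = 'CUSTOMERS_TEST';
```

Rerunning the two EXPLAIN PLAN statements from step 4 afterward makes it possible to compare the new Rows estimates with the actual counts of 932 and 145.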
