

Free Ebook

7/25/2003 1:16:11 PM

In a Few Words
from
Oracle Documentation
http://tahiti.oracle.com/
&
Tom Kyte's advice
http://asktom.oracle.com/




Written, Compiled and Ordered by Juan Carlos Reyes
[email protected]
http://www.geocities.com/juancarlosreyesp/index.html

DazaSoftware S. A.









All the examples are executed in Oracle 9.2

1 Table of contents
1 Table of contents
2 Introduction
3 RECOVERY SECTION
4 Using dbms_Flashback package
5 TUNING SECTION
6 Soft Parses & session_cached_cursors parameter
7 open_cursors parameter
8 cursor_space_for_time parameter
9 pre_page_sga parameter
10 binding, cursor_sharing parameter
11 Getting statistics
12 Tuning
13 PL/SQL SECTION
14 Table Functions: A function that works as a table
15 PL/SQL tricks
16 Working with NULL values
17 Getting more from a Query: Analytic functions

2 Introduction
Hi, maybe you are asking why I took the time to do this.
The reasons are simple:
I was going to do it anyway, because I always extract the most important points from
the documentation to work with an easy manual.
If I'm alive, and still have a job, it is thanks to Tom Kyte's advice.
I think the core of the Internet is the ability to give and receive free advice. I say:
today for me, tomorrow for you.
To test and demonstrate a new format for knowledge communication, and, if it
really works, to try to convince Oracle to write documentation like this instead of me : ).
Oracle documentation is really good; the point is the need for a new kind of
documentation, where you can find everything in one place, briefly, plus some real
experience.
If you like it, and see that this is necessary, try sending an email to
[email protected], asking for a new kind of documentation like this.

This symbol means that the article or point was concluded by me.
This symbol means it was revised and accepted by Tom Kyte.

Remember that these articles were developed principally for the Oracle 9.2 database, so
if you have a previous version, some examples may not work.

Thanks to Tom Kyte's patience.
Thanks to Jaime Daza for all the support I received.

Questions about this paper go to https://asktom.oracle.com
If you find real bugs in this paper, please email [email protected]

I hope you enjoy it.


3 RECOVERY SECTION
4 Using dbms_Flashback package
4.1 What it's for?
4.2 Syntax
4.3 Database Configuration
4.4 Performance
4.4.1 Most performant with queries that normally do a small amount of
logical IO
4.4.2 Keep statistics current on tables involved in flashback queries
4.5 Privileges
4.6 Examples and techniques
4.6.1 Inserting Data
4.6.2 Using SELECT AS OF
4.6.3 Using DBMS_FLASHBACK package
4.6.4 Use SCN to be more precise
4.6.5 Function to save SCN data
4.6.6 Create a view or a function to get old data
4.6.7 Exporting using flashback option
4.6.8 How to get the date time for a specific SCN or vice versa
4.7 Restrictions and Errors
4.7.1 Flashback is session-specific
4.7.2 Automatic Undo Management
4.7.3 Time-based flashback is granular within +/- 5 minutes
4.7.4 Use SCN for precision when gathering data
4.7.5 Undo Data Invalidation
4.7.6 Materialized Views
4.7.7 SYS User
4.7.8 Database links
4.7.9 Cannot nest flashback calls
4.8 Features by Release
4.8.1 9.0.1
4.8.2 9.2.0
4.9 Bugs by Release
4.10 Bibliography
4.1 What it's for?
Recovering deleted or incorrectly modified data.
Compare current data with old data.
See historic data.
Avoid the use of temporary data.
4.2 Syntax
DBMS_FLASHBACK.GET_SYSTEM_CHANGE_NUMBER RETURN NUMBER;
Get the SCN number
EXEC DBMS_FLASHBACK.ENABLE_AT_TIME( TIMESTAMP '2002-01-01 00:00:00');
Enable the session in flashback mode at approximately that time
EXEC DBMS_FLASHBACK.ENABLE_AT_SYSTEM_CHANGE_NUMBER(3254658);
Enable the session in flashback mode exactly at that SCN
EXEC DBMS_FLASHBACK.DISABLE;
Disable the flashback mode

SELECT * FROM A AS OF TIMESTAMP TO_TIMESTAMP('2002-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS');
Select the table in flashback mode at approximately that time
SELECT * FROM A AS OF SCN 3254658;
Select the table in flashback mode exactly at that SCN
4.3 Database Configuration
It is advisable to set automatic undo management.
For more information about setting automatic undo management, see the Oracle
documentation.
Set the retention you need:
UNDO_RETENTION = the time, in seconds, that undo information is retained.
If you are using rollback segments, they need to be large enough to contain the
undo required to reconstruct the view of the object at the time you establish for the
flashback query.
If you get an
ORA-01555 snapshot too old: rollback segment number string with name "string"
too small
it means that you must increase the setting of UNDO_RETENTION or otherwise use
larger rollback segments.
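For example, a minimal sketch of an automatic-undo configuration that would support flashing
back roughly three hours (the value 10800 and the tablespace name UNDOTBS1 are illustrative,
not recommendations):

ALTER SYSTEM SET UNDO_RETENTION = 10800;   -- about 3 hours, expressed in seconds
-- and in the parameter file (init.ora / spfile):
-- UNDO_MANAGEMENT = AUTO
-- UNDO_TABLESPACE = UNDOTBS1
-- UNDO_RETENTION  = 10800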
4.4 Performance
4.4.1 Most performant with queries that normally do a small amount of
logical IO
Select small sets of data using indexes, rather than queries that require full table
scans. If you must do a full table scan, consider adding a parallel hint to the query.
4.4.2 Keep statistics current on tables involved in flashback queries
Because flashback query uses the cost-based optimizer.
4.5 Privileges
FLASHBACK ANY TABLE to issue a flashback on any table, view or materialized
view in any schema. Or FLASHBACK object privilege to specific objects. ( Not
needed for DBMS_FLASBACK package).
Privilege to execute DBMS_FLASBACK package, if you use that package.
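For example, a sketch of the grants (the grantee scott and the table hr.employees are
placeholders; these are normally issued by a DBA or by SYS):

GRANT FLASHBACK ANY TABLE TO scott;
GRANT FLASHBACK ON hr.employees TO scott;
GRANT EXECUTE ON dbms_flashback TO scott;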

4.6 Examples and techniques
4.6.1 Inserting Data

We start by creating the table, inserting the values, and getting the current SCN after
every operation.
14:34:07 SQL> CREATE TABLE A ( B VARCHAR2(1))
14:34:23 2 ;
Table created.

14:34:25 SQL> SELECT DBMS_FLASHBACK.GET_SYSTEM_CHANGE_NUMBER FROM
DUAL;
GET_SYSTEM_CHANGE_NUMBER
------------------------
3254181

14:57:58 SQL> INSERT INTO A VALUES( '1');
GET_SYSTEM_CHANGE_NUMBER
------------------------
3254652

14:58:22 SQL> INSERT INTO A VALUES( '2');
GET_SYSTEM_CHANGE_NUMBER
------------------------
3254653

14:58:26 SQL> INSERT INTO A VALUES( '3');
GET_SYSTEM_CHANGE_NUMBER
------------------------
3254654

14:58:30 SQL> commit;
Commit complete.

15:08:32 SQL> DELETE FROM A;
3 rows deleted.

15:08:36 SQL> COMMIT;
Commit complete.
4.6.2 Using SELECT AS OF
After we delete the data, we can still see it.
If you would like to use the timestamp, you could use:

SELECT * FROM A AS OF TIMESTAMP TO_TIMESTAMP('2003-01-01 01:00','YYYY-
MM-DD HH:MI')

15:08:39 SQL> SELECT * FROM A AS OF SCN 3254658;
B
-
1
2
3

15:08:41 SQL> SELECT * FROM A;
no rows selected
4.6.3 Using DBMS_FLASHBACK package

If you would like to use the timestamp, you could use:
EXEC DBMS_FLASHBACK.ENABLE_AT_TIME(TO_TIMESTAMP('2003-01-01
01:00','YYYY-MM-DD HH:MI'));

But we use the SCN

15:14:27 SQL> EXEC
DBMS_FLASHBACK.ENABLE_AT_SYSTEM_CHANGE_NUMBER(3254658);
PL/SQL procedure successfully completed.

15:15:46 SQL> SELECT * FROM A;
B
-
1
2
3

15:15:52 SQL> EXEC DBMS_FLASHBACK.DISABLE;
PL/SQL procedure successfully completed.

15:15:55 SQL> SELECT * FROM A;
no rows selected

4.6.4 Use SCN to be more precise
You don't need to wait 5 minutes.

scott@ORA920> column scn new_val scn
scott@ORA920> select dbms_flashback.get_system_change_number scn from dual;

SCN
----------
25585211

scott@ORA920>
scott@ORA920> update emp set ename = lower(ename);
14 rows updated.

scott@ORA920>
scott@ORA920> select a.ename, b.ename
2 from emp a, emp as of scn &scn b
3 where a.empno = b.empno
4 /
old 2: from emp a, emp as of scn &scn b
new 2: from emp a, emp as of scn 25585211 b

ENAME ENAME
---------- ----------
smith SMITH
allen ALLEN
ward WARD
jones JONES
martin MARTIN
blake BLAKE
clark CLARK
scott SCOTT
king KING
turner TURNER
adams ADAMS
james JAMES
ford FORD
miller MILLER
14 rows selected.
4.6.5 Function to save SCN data
To get precision in your flashback queries, it is better to save the SCN number
periodically.
The system change number (SCN) is a version number for the database that is
incremented on every commit.
In Oracle 9i you can use the function
DBMS_FLASHBACK.GET_SYSTEM_CHANGE_NUMBER
In 8i you can use
USERENV('COMMITSCN')

create or replace procedure save_scn
as
    pragma autonomous_transaction; /* commit independently */
begin
    insert into scn
    values (
        SYSDATE,
        DBMS_FLASHBACK.GET_SYSTEM_CHANGE_NUMBER
    );
    commit;
end;
/
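The procedure assumes a small log table, here called scn, that you create yourself; a minimal
sketch of that table plus a DBMS_JOB call to run the procedure every 10 minutes (the table
layout and the interval are only illustrative):

create table scn
( ddate date,
  scn   number
);

declare
    l_job number;
begin
    -- run save_scn every 10 minutes
    dbms_job.submit( l_job, 'save_scn;', sysdate, 'sysdate + 10/24/60' );
    commit;
end;
/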
4.6.6 Create a view or a function to get old data
CREATE VIEW vew_example AS
SELECT * FROM example AS OF
TIMESTAMP (SYSTIMESTAMP - INTERVAL '120' MINUTE);
When using this technique, remember that daylight savings time and leap years can
cause anomalous results. For example, SYSDATE - 1 might refer to 23 or 25 hours
ago, shortly after a change in daylight saving time.
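A function can serve the same purpose; a minimal sketch (the table example and the
120-minute offset are just illustrative) that returns the old data through a ref cursor:

create or replace function get_old_example return sys_refcursor
as
    l_rc sys_refcursor;
begin
    -- open a cursor on the data as it was two hours ago
    open l_rc for
        select *
        from example as of timestamp ( systimestamp - interval '120' minute );
    return l_rc;
end;
/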
4.6.7 Exporting using flashback option
You can export using a flashback query, using the parameters flashback_scn and
flashback_time.
For example:
exp system/manager full=y file=flashbacktest.dmp flashback_scn=10002
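A sketch of the time-based variant (the timestamp value is illustrative, and the exact quoting
of FLASHBACK_TIME depends on your operating system shell):

exp system/manager full=y file=flashbacktest.dmp flashback_time="TO_TIMESTAMP('2003-07-25 14:00:00','YYYY-MM-DD HH24:MI:SS')"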
4.6.8 How to get the date time for a specific SCN or vice versa
It only tracks the last five days and only in five minute increments.

You can query the table SMON_SCN_TIME:
select * from SYS.SMON_SCN_TIME

THREAD TIME_MP TIME_DP SCN_WRP SCN_BAS
---------- ---------- -------------------- ---------- ----------
1 1041971155 7-Jan-2003 16:25:57 0 3385087
1 1041971462 7-Jan-2003 16:31:04 0 3385188
1 1041971770 7-Jan-2003 16:36:12 0 3385289
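For instance, a sketch of how you might find the SCN recorded closest to (but not after) a
given time using that table (the SCN is SCN_WRP * 2^32 + SCN_BAS; the date literal is
illustrative):

select time_dp, scn_wrp * power(2,32) + scn_bas scn
from   sys.smon_scn_time
where  time_dp = ( select max(time_dp)
                   from   sys.smon_scn_time
                   where  time_dp <= to_date('07-01-2003 16:30','DD-MM-YYYY HH24:MI') );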
4.7 Restrictions and Errors
4.7.1 Flashback is session-specific
You enable it only for your session.
4.7.2 Automatic Undo Management
The flashback query mechanism is most effective when you use automatic undo
management.
4.7.3 Time-based flashback is granular within +/- 5 minutes
When you use time-based flashback, you get the data at a point within roughly +/- 5 minutes
of the time you specify.
DBMS_FLASHBACK.ENABLE_AT_TIME and AS OF TIMESTAMP map the time to an
SCN value. Because the SCN-time mapping is recorded only every 5 minutes, the time you
specify is rounded down by up to 5 minutes.
This situation could produce ORA-01466: unable to read data - table definition has
changed.
Times are tracked only up to a maximum of 5 days of database up time.
4.7.4 Use SCN for precision when gathering data
Use SCN instead of time to be more precise.
4.7.5 Undo Data Invalidation
DDL statements such as DROP, MODIFY, MOVE, and TRUNCATE invalidate undo data.
This does not include changes to storage attributes of a table such as INITRANS or
MAXTRANS.
This situation produces ORA-01466: unable to read data - table definition has
changed.
Only the data is affected by a flashback query. Other information is taken from the data
dictionary, including the current character set.
4.7.6 Materialized Views
Flashback queries against materialized views do not take advantage of query rewrite.
4.7.7 SYS User
The SYS user cannot make calls to the DBMS_FLASHBACK package,
but can use the AS OF clause and perform flashback queries. This does not include
V$ view data.
4.7.8 Database links
Cannot perform flashback queries on remote tables through database links
4.7.9 Cannot nest flashback calls
You must disable it before enabling it to a different time.
4.8 Features by Release
4.8.1 9.0.1
The release in which this feature was introduced.
4.8.2 9.2.0
You can flash back without using the DBMS_FLASHBACK package:
SELECT ... AS OF SCN ...
CREATE TABLE ... AS SELECT ... AS OF SCN ...
SELECT ... AS OF TIMESTAMP ...
CREATE TABLE ... AS SELECT ... AS OF TIMESTAMP ...
4.9 Bugs by Release
4.9.1 When you want to flash back again, after you returned from a
previous flashback, you get an error
It seems to be a bug that I hit twice (I had not applied all the patches).
Once using AS OF and once with the package, I went to a point in the past, and after I
returned to the current date, I couldn't go back again.
That's why I suggest that, if you are in an emergency, you get and save all the data you need
before disabling the flashback (you need to use the package for this),
and, if it is really important, do a full backup first if possible.
5 TUNING SECTION
5.1 Parameters
There is no TUNED_DATABASE=TRUE parameter, but the truth is that under specific
circumstances there are specific parameters that can act as
TUNED_DATABASE=TRUE.
And in the same way, there are other parameters that can act as
TUNED_DATABASE=FALSE.
I know how hard it is to find advice about some parameters; that is why I included
some parameters in this section. At least for me, one is always wondering whether a
parameter could help to improve performance.
6 Soft Parses & session_cached_cursors parameter
6.1 What it's for?
6.1.1 SESSION_CACHED_CURSORS parameter
6.1.2 Important
6.2 Syntax
6.3 Evaluating the accuracy of the value
6.3.1 Stat: session cursor cache count
6.3.1.1 Query to evaluate
6.3.2 Parse vs. Execute in statistics
6.3.3 Note.-
6.4 Examples and techniques
6.4.1 Demonstrating the effect of changes in the parameter
session_cached_cursors
6.1 What it's for?
Two kinds of parse calls exist: hard and soft.
A "hard parse" occurs when the SQL or PL/SQL statement is not found in the shared
SQL area (shared pool), so a complete parse is required (data dictionary object
descriptions, the user's privileges, generating the execution plan, etc.). It is the most
expensive kind of parsing and should be minimized for repeated executions.
A "soft parse" is performed when the statement is already in the shared pool (the user
must be authenticated again, all name translations must be done once more, and syntax
and security checks are repeated), but the session lost the "link" to the shared portion
because the cursor was closed, so the private portion must be rebuilt and linked to its
shared portion again.
To eliminate soft parsing in COBOL, C, or other 3GL applications, the
precompiler option HOLD_CURSOR=YES should be used. Other options, such as
RELEASE_CURSOR and MAXOPENCURSORS, can be used in conjunction with
this to achieve optimal results.
For non-3GL programs (when you do not have the same degree of control over
cursors), such as Oracle Forms and other third-party tools, the cursors are
automatically closed when a new form is called, and switching from one form to
another closes all session cursors associated with the first form.
So if you subsequently return to the caller, at least a soft parse will be performed for
each cursor. In this case, you should enable this parameter, which will keep a copy of
the user's cursors even though they are closed.

6.1.1 SESSION_CACHED_CURSORS parameter
Lets you specify the number of session cursors to cache.
After the first soft parse, subsequent soft parse calls will find the cursor in the
cache and do not need to reopen the cursor. To get placed in the session cache the
same statement has to be parsed 3 times within the same cursor. Oracle uses a least
recently used algorithm to remove entries in the session cursor cache to make room
for new entries when needed.
Session cached cursors is a great help in reducing latching that takes place due
to excessive soft parsing (where a program parses, executes, closes a statement
over and over)
Steven Adams says,
http://www.ixora.com.au/scripts/library.htm
The session cursor cache is an important facility for reducing load on the library
cache. In our opinion, the session_cached_cursors parameter should always be set to
at least 2. However, a larger value is normally beneficial.
Tom's comment: (if Steve Adams said it, it is more than likely "true".
As they said -- a larger value is normally beneficial. I am partial (opinion, no true science
here) to 100.)
6.1.2 Important
Be aware that this is done at the expense of increased memory allocation for
every session: it will increase UGA memory, which is in the PGA in
dedicated server mode and in the SGA in shared server mode.
For an application to run optimally, it is necessary to analyze how parsing works.
6.2 Syntax
You can set this parameter with
ALTER SESSION SET SESSION_CACHED_CURSORS = value
ALTER SYSTEM SET SESSION_CACHED_CURSORS = value [DEFERRED]
In the parameter file:
SESSION_CACHED_CURSORS = number (default value 0)
6.3 Evaluating the accuracy of the value
Set the parameter SESSION_CACHED_CURSORS to 50 and evaluate whether this is
enough.
6.3.1 Stat: session cursor cache count
Total number of cursors cached. This statistic is incremented only if
SESSION_CACHED_CURSORS > 0. This statistic is the most useful in
V$SESSTAT. If the value for this statistic in V$SESSTAT is close to the setting of
the SESSION_CACHED_CURSORS parameter, the value of the parameter should
be increased.
6.3.1.1 Query to evaluate
To evaluate this parameter, you can save each user's statistics in a table every time
they log off; after that you can analyze them in different ways. Here is one
example:
CREATE TABLE Stat_Session_Historic
(
UUSER VARCHAR2(100),
DDATE DATE,
SstatisticName VARCHAR2 (200),
VVALUE NUMBER (6)
)
/
CREATE OR REPLACE TRIGGER TGR_LOGOFF_STATS
BEFORE LOGOFF ON DATABASE
BEGIN
  INSERT INTO Stat_Session_Historic
  SELECT USER, SYSDATE, 'session cursor cache count', VALUE
  FROM V$SESSTAT C
  WHERE C.statistic# = (SELECT STATISTIC# FROM V$STATNAME
                        WHERE name = 'session cursor cache count')
  AND C.SID = (SELECT SID FROM V$SESSION WHERE AUDSID = USERENV('SESSIONID'));
END;
/
-- And this SELECT shows the top 10 users by the average of their statistic
-- over a period of time
SELECT UUSER, AVGVAL
FROM
( SELECT UUSER, AVG( VVALUE ) AVGVAL
  FROM Stat_Session_Historic
  WHERE TRUNC(DDATE) = TRUNC(SYSDATE)   -- only for today
  GROUP BY UUSER
  ORDER BY 2 DESC
)
WHERE ROWNUM <= 10 -- first 10 cases

UUSER AVGVAL
--------------
ADM 5
SAF 1
Then you compare this value with the value you had set to the parameter, and then
decide to increase or decrease the parameter.
6.3.2 Parse vs. Execute in statistics
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 3 0.80 1.72 0 0 0 0
Execute 3 0.00 0.00 0 0 0 0
Fetch 3 0.02 0.05 2 665 0 14
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 9 0.82 1.77 2 665 0 14
Misses in library cache during parse: 0
In these statistics we can see there is too much soft parsing: parse = execute.
In a well tuned application:
parse = 1, execute = some number greater than or equal to 1
The best way to speed something up is to NOT do it. Hence, don't parse 3 times,
just parse 1 time.
If you find yourself parsing the same statement more than 3 times and you really
cannot fix the code, session cached cursors can be of some assistance. If you
do not, it will not help nor hurt.
6.3.3 Note.-
The V$SESSION_CURSOR_CACHE view is not a measure of the
effectiveness of the SESSION_CACHED_CURSORS initialization parameter.
The Soft Parse % ratio tells you if you have too many hard parses, but those cannot
be fixed with session cached cursors.
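As a quick sanity check, a sketch of the Soft Parse % computed instance-wide from v$sysstat
(so it reflects all activity since instance startup):

select round( 100 * ( 1 - hard.value / nullif(total.value,0) ), 2 ) "Soft Parse %"
from   v$sysstat total, v$sysstat hard
where  total.name = 'parse count (total)'
and    hard.name  = 'parse count (hard)';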
6.4 Examples and techniques
6.4.1 Demonstrating the effect of changes in the parameter
session_cached_cursors
[email protected]> alter session set session_cached_cursors =0;
Session altered.
no cached cursors......
[email protected]> select a.name, b.value
2 from v$statname a, v$mystat b
3 where a.statistic# = b.statistic#
4 and lower(a.name) like '%cursor ca%'
5 /
NAME VALUE
------------------------------ ----------
session cursor cache hits 5 thats from logging in
session cursor cache count 0
[email protected]> declare
2 l_cnt number;
3 begin
4 for i in 1 .. 100
5 loop
6 execute immediate 'select count(*) from dual d1' into l_cnt;
7 end loop;
8 end;
9 /
PL/SQL procedure successfully completed.
[email protected]> select a.name, b.value
2 from v$statname a, v$mystat b
3 where a.statistic# = b.statistic#
4 and lower(a.name) like '%cursor ca%'
5 /
NAME VALUE
------------------------------ ----------
session cursor cache hits 5 no change
session cursor cache count 0
now, let's cache up to 10 cursors
[email protected]> alter session set session_cached_cursors=10;
Session altered.
[email protected]> declare
2 l_cnt number;
3 begin
4 for i in 1 .. 100
5 loop
6 execute immediate 'select count(*) from dual d2' into l_cnt;
7 end loop;
8 end;
9 /
PL/SQL procedure successfully completed.
[email protected]> select a.name, b.value
2 from v$statname a, v$mystat b
3 where a.statistic# = b.statistic#
4 and lower(a.name) like '%cursor ca%'
5 /
NAME VALUE
------------------------------ ----------
session cursor cache hits 104 99 more hits!
session cursor cache count 4
[email protected]>

Our first query in that loop didn't get a hit (we hadn't cached it yet); the
subsequent 99 did. It has to go through the mechanics of pretending to do a
soft parse (making sure things haven't been invalidated and such) but the code path is much
smaller.
6.4.2 Another script to evaluate this parameter
From Steven Adams
http://www.ixora.com.au/scripts/sql/session_cursor_cache.sql
Tom's comment: the script looked reasonable to me.

7 open_cursors parameter
7.1 What it's for?
7.1.1 Precompiler Programs
7.1.2 Heterogeneous Services
7.1.3 Relation with session_cached_cursors
7.2 Syntax
7.3 Evaluating the accuracy of the value
7.3.1 V$OPEN_CURSOR
7.3.2 Stat: opened cursors current
7.4 Examples
7.4.1 Closing Cursors
7.1 What it's for?
Specifies the maximum number of open cursors (handles to private SQL areas) each
session can have at once. You can use this parameter to prevent a session from
opening an excessive number of cursors. This parameter also constrains the size of
the PL/SQL cursor cache which PL/SQL uses to avoid having to reparse as
statements are reexecuted by a user.
If the limit is exceeded, an ORA-01000 error is raised, and you will have to
increase this parameter's value.
This parameter can also be used to limit trigger cascading: when a statement in a trigger
body causes another trigger to be fired, the triggers are said to be cascading. Oracle
allows up to 32 triggers to cascade at any one time. However, you can effectively
limit the number of trigger cascades using the initialization parameter
OPEN_CURSORS, because a cursor must be opened for every execution of a
trigger.
If your program exceeds the limit imposed by OPEN_CURSORS, Oracle gives you
an error.
Assuming that a session does not open the number of cursors specified by
OPEN_CURSORS, there is no added overhead to setting this value higher than
actually needed.
Cursors are allocated 64 at a time up to OPEN_CURSORS, so having it set high is
OK. The value can be between 0 and 10,000; OPEN_CURSORS only allocates an
array in the session space (smallish). 200 would be fine for most applications; Reports,
Forms, etc. all use a large number of cached cursors, so 500-1000 (1000 recommended).
It should be noted that OPEN_CURSORS simply allocates a fixed number of slots
but does not allocate memory for these slots for a client (eg: it sets an array up to
have 1,000 cursors for example but does not allocate 1,000 cursors).
The management of private SQL areas is the responsibility of the user process. The
allocation and deallocation of private SQL areas depends largely on which
application tool you are using, although the number of private SQL areas that a user
process can allocate is always limited by the initialization parameter OPEN_CURSORS.
It is important to set the value of OPEN_CURSORS high enough to prevent your
application from running out of open cursors. The number will vary from one
application to another. Applications should close unneeded cursors to conserve system
memory. If a cursor cannot be opened due to a limit on the number of cursors, an
ORA-01000 error results.

To take advantage of the additional memory available for shared SQL areas, you
may also need to increase the number of cursors permitted per session. You can
increase this limit by increasing the value of the initialization parameter
OPEN_CURSORS.
Be careful where you place a recursive call. If you place it inside a cursor FOR loop
or between OPEN and CLOSE statements, another cursor is opened at each call. As
a result, your program might exceed the limit set by the Oracle initialization
parameter OPEN_CURSORS.
7.1.1 Precompiler Programs
When writing precompiler programs, increasing the number of cursors using
MAXOPENCURSORS can often reduce the frequency of parsing and improve
performance.
Oracle allocates an additional cache entry if it cannot find one to reuse. For example,
if MAXOPENCURSORS=8 and all eight entries are active, a ninth is created. If
necessary, Oracle keeps allocating additional cache entries until it runs out of
memory or reaches the limit set by OPEN_CURSORS. This dynamic allocation adds
to processing overhead.
MAXOPENCURSORS specifies the initial size of the cursor cache. If a new cursor
is needed and there are no free cache entries, the server tries to reuse an entry. Its
success depends on the values of HOLD_CURSOR and RELEASE_CURSOR and,
for explicit cursors, on the status of the cursor itself.
If the value of MAXOPENCURSORS is less than the number of cache entries
actually needed, the server uses the first cache entry marked as reusable. For
example, suppose an INSERT statement's cache entry E(1) is marked as reusable,
and the number of cache entries already equals MAXOPENCURSORS. If the
program executes a new statement, cache entry E(1) and its private SQL area might
be reassigned to the new statement. To reexecute the INSERT statement, the server
would have to reparse it and reassign another cache entry.
Thus, specifying a low value for MAXOPENCURSORS saves memory but causes
potentially expensive dynamic allocations and deallocations of new cache entries.
Specifying a high value for MAXOPENCURSORS assures speedy execution but
uses more memory.
A system-wide limit of cursors for each session is set by the initialization parameter
named OPEN_CURSORS found in the parameter file (such as INIT.ORA).
7.1.2 Heterogeneous Services
HS_OPEN_CURSORS FOR Heterogeneous Services, defines the maximum number
of cursors that can be open on one connection to a non-Oracle system instance.
7.1.3 Relation with session_cached_cursors
No relation.
session_cached_cursors -- how many cached CLOSED cursors you can have.
open_cursors -- how many concurrently open cursors you can have.
ops$tkyte@ORA920> show parameter _cursors
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
open_cursors integer 50
session_cached_cursors integer 100
that means, "you cannot have more than 50 open at the same time - but we
might cache 100 of them for you off to the side..."
ops$tkyte@ORA920> @mystat cursor
ops$tkyte@ORA920> select a.name, b.value
2 from v$statname a, v$mystat b
3 where a.statistic# = b.statistic#
4 and lower(a.name) like '%' || lower('&1')||'%'
5 /
old 4: and lower(a.name) like '%' || lower('&1')||'%'
new 4: and lower(a.name) like '%' || lower('cursor')||'%'
NAME VALUE
------------------------------ ----------
opened cursors cumulative 26
opened cursors current 9
session cursor cache hits 0
session cursor cache count 13
cursor authentications 1

ops$tkyte@ORA920> declare
2 type rc is ref cursor;
3
4 l_cursor rc;
5 begin
6 for i in 1 .. 100
7 loop
8 for j in 1 .. 5
9 loop
10 open l_cursor for 'select * from dual xx' || i;
11 close l_cursor;
12 end loop;
13 end loop;
14 end;
15 /
PL/SQL procedure successfully completed.

ops$tkyte@ORA920>
ops$tkyte@ORA920> @mystat cursor
ops$tkyte@ORA920> select a.name, b.value
2 from v$statname a, v$mystat b
3 where a.statistic# = b.statistic#
4 and lower(a.name) like '%' || lower('&1')||'%'
5 /
old 4: and lower(a.name) like '%' || lower('&1')||'%'
new 4: and lower(a.name) like '%' || lower('cursor')||'%'
NAME VALUE
------------------------------ ----------
opened cursors cumulative 529
opened cursors current 9
session cursor cache hits 400
session cursor cache count 100
cursor authentications 1
that shows I've 100 cursors in my "cache" ready to be opened faster than
normal -- but I never exceeded my 50 open cursors at a time threshold.
7.2 Syntax
You can set this parameter with
ALTER SYSTEM SET OPEN_CURSORS = value
In the parameter file:
OPEN_CURSORS = number (default value 50)
7.3 Evaluating the accuracy of the value
While executing an embedded PL/SQL block, one cursor, the parent cursor, is
associated with the entire block, and one cursor, the child cursor, is associated with
each SQL statement in the embedded PL/SQL block. Both parent and child cursors
count toward the OPEN_CURSORS limit.
The following calculation shows how to determine the maximum number of cursors
used. The sum of the cursors used must not exceed OPEN_CURSORS.
SQL statement cursors
PL/SQL parent cursors
PL/SQL child cursors
+ 6 cursors for overhead
--------------------------
Sum of cursors in use
The Oracle9i default of 50 or so is too small to accommodate Oracle Internet
Directory server cursor cache. Note that this value is not dependent on other Oracle
Internet Directory server parameters, such as # SERVERS and # WORKERS. The
value of 200 is sufficient for any size DIT.
7.3.1 V$OPEN_CURSOR
V$OPEN_CURSOR represents a set of cached cursors the server has for you.
7.3.2 Stat: opened cursors current
This is the total number of currently open cursors.
This statistic gives you the actual number of truly open cursors.
-- For the current session
select a.value, b.name
from v$mystat a, v$statname b
where a.statistic# = b.statistic#
and b.name = 'opened cursors current'
--For all sessions
select a.sid, a.value, b.name
from v$sesstat a, v$statname b
where a.statistic# = b.statistic#
and b.name = 'opened cursors current'
order by value desc
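To put those numbers in context, a sketch that compares the busiest session with the configured
limit:

select ( select max(a.value)
         from   v$sesstat a, v$statname b
         where  a.statistic# = b.statistic#
         and    b.name = 'opened cursors current' ) max_opened_cursors,
       ( select value
         from   v$parameter
         where  name = 'open_cursors' ) open_cursors_limit
from   dual;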
7.4 Examples
7.4.1 Closing Cursors
7.4.1.1 Closing ref cursor explicitly
How would I close a ref cursor after I fetch from it?
It depends on the language.
Pro*C: EXEC SQL CLOSE :ref_cursor_variable;
SQL*Plus: implicit
PL/SQL: close ref_cursor_variable;
Java: rset.close();
and so on.
7.4.1.2 Closing cursor in PLSQL
[email protected]> create or replace package types
2 as
3 type rc is ref cursor;
4 end;
5 /
Package created.

[email protected]>
[email protected]> create or replace function foo return
types.rc
2 as
3 l_cursor types.rc;
4 begin
5 open l_cursor for select * from dual;
6 return l_cursor;
7 end;
8 /
Function created.

[email protected]>
[email protected]> create or replace procedure bar
2 as
3 l_cursor types.rc;
4 l_rec dual%rowtype;
5 begin
6 l_cursor := foo;
7 loop
8 fetch l_cursor into l_rec;
9 exit when l_cursor%notfound;
10 dbms_output.put_line( l_rec.dummy );
11 end loop;
12 close l_cursor;
13 end;
14 /
Procedure created.

[email protected]>
[email protected]> exec bar
X

PL/SQL procedure successfully completed.
7.4.1.3 Closing cursor in Java
Cursors will remain there until you run out of slots in your OPEN_CURSORS array --
at which point they are flushed if not currently being used (PL/SQL lets them "go
away" if and when the server needs that slot).
They do not count against you, they are there for performance. It is an
EXCELLENT reason why most Java programs' entire suite of SQL should consist of
nothing more than begin .... end; -- never any actual DML of its own. More
manageable, more flexible.
You can test this out yourself by using this:
create or replace package demo_pkg
as
type refcur is ref cursor;

procedure get_cur( x in out refcur );
end;
/
create or replace package body demo_pkg
as
g_first_time boolean default true;
procedure get_cur( x in out refcur )
is
l_user varchar2(1000);
begin
open x for select USER from dual THIS_IS_A_JAVA_CURSOR;
if ( g_first_time )
then
select user
into l_user
from dual THIS_IS_PLSQL where rownum = 1;
select user
into l_user
from dual THIS_TOO_IS_PLSQL where rownum = 1;
g_first_time := false;
end if;
end;
end;
/

that plsql only needs the cursors for a bit -- we don't need them every time...

Now I modified the java to be:

public static void main (String args [])
throws SQLException, ClassNotFoundException
{
String query =
"begin demo_pkg.get_cur( :1 ); end;";

DriverManager.registerDriver
(new oracle.jdbc.driver.OracleDriver());

Connection conn=
DriverManager.getConnection
("jdbc:oracle:oci8:@ora817dev",
"scott", "tiger");

showOpenCnt( conn, "Before Anything" );

CallableStatement cstmt = conn.prepareCall(query);
cstmt.registerOutParameter(1,OracleTypes.CURSOR);

for( int j = 0; j < 100; j++ )
{
cstmt.execute();
showOpenCnt( conn, j + ") After prepare and execute" );

ResultSet rset = (ResultSet)cstmt.getObject(1);

for(int i = 0; rset.next(); i++ );
}

cstmt.close();
showOpenCnt( conn, "After CallableStatement closes" );
}

I don't close the result sets - we just let them leak all over the place. I
have open_cursors set to 50 and run:

> !java
java curvar
====================================
Before Anything
====================================
1 opened cursors current
-----------------------
Open Cursors Currently
SID***8 SELECT VALUE FROM NLS_INSTANCE_PARAMETERS WHERE
PARAMETER ='
SID***8 select a.value, b.name from v$mystat a, v$statname b where a
SID***8 ALTER SESSION SET NLS_TERRITORY = 'AMERICA'
SID***8 ALTER SESSION SET NLS_LANGUAGE = 'AMERICAN'
SID***8 select sid, sql_text from v$open_cursor where sid = (select
SID***8 ALTER SESSION SET NLS_LANGUAGE= 'AMERICAN'
NLS_TERRITORY= 'A
-----------------------
====================================
0) After prepare and execute
====================================
5 opened cursors current
-----------------------
Open Cursors Currently
SID***8 SELECT VALUE FROM NLS_INSTANCE_PARAMETERS WHERE
PARAMETER ='
SID***8 select a.value, b.name from v$mystat a, v$statname b where a
SID***8 SELECT USER FROM DUAL THIS_IS_PLSQL WHERE ROWNUM
= 1
SID***8 begin demo_pkg.get_cur( :1 ); end;
SID***8 ALTER SESSION SET NLS_TERRITORY = 'AMERICA'
SID***8 ALTER SESSION SET NLS_LANGUAGE = 'AMERICAN'
SID***8 select sid, sql_text from v$open_cursor where sid = (select
SID***8 ALTER SESSION SET NLS_LANGUAGE= 'AMERICAN'
NLS_TERRITORY= 'A
SID***8 SELECT USER FROM DUAL THIS_TOO_IS_PLSQL WHERE
ROWNUM = 1
SID***8 SELECT USER FROM DUAL THIS_IS_A_JAVA_CURSOR
-----------------------
====================================
1) After prepare and execute
====================================

6 opened cursors current
-----------------------
Open Cursors Currently
SID***8 SELECT VALUE FROM NLS_INSTANCE_PARAMETERS WHERE
PARAMETER ='
SID***8 select a.value, b.name from v$mystat a, v$statname b where a
SID***8 SELECT USER FROM DUAL THIS_IS_PLSQL WHERE ROWNUM
= 1
SID***8 begin demo_pkg.get_cur( :1 ); end;
SID***8 ALTER SESSION SET NLS_TERRITORY = 'AMERICA'
SID***8 ALTER SESSION SET NLS_LANGUAGE = 'AMERICAN'
SID***8 select sid, sql_text from v$open_cursor where sid = (select
SID***8 ALTER SESSION SET NLS_LANGUAGE= 'AMERICAN'
NLS_TERRITORY= 'A
SID***8 SELECT USER FROM DUAL THIS_TOO_IS_PLSQL WHERE
ROWNUM = 1
SID***8 SELECT USER FROM DUAL THIS_IS_A_JAVA_CURSOR
SID***8 SELECT USER FROM DUAL THIS_IS_A_JAVA_CURSOR
-----------------------

note that after each iteration I got more and more "this is a java cursor".
The plsql guys stayed in there.... UNTIL:

====================================
45) After prepare and execute
====================================
50 opened cursors current
-----------------------
Open Cursors Currently
SID***8 select a.value, b.name from v$mystat a, v$statname b where a
SID***8 SELECT USER FROM DUAL THIS_IS_PLSQL WHERE
ROWNUM = 1
SID***8 begin demo_pkg.get_cur( :1 ); end;
SID***8 select sid, sql_text from v$open_cursor where sid = (select
SID***8 ALTER SESSION SET NLS_LANGUAGE= 'AMERICAN'
NLS_TERRITORY= 'A
SID***8 SELECT USER FROM DUAL THIS_TOO_IS_PLSQL WHERE
ROWNUM = 1
SID***8 SELECT USER FROM DUAL THIS_IS_A_JAVA_CURSOR
<lots of those chopped out>
SID***8 SELECT USER FROM DUAL THIS_IS_A_JAVA_CURSOR
-----------------------
====================================
46) After prepare and execute
====================================
49 opened cursors current
-----------------------
Open Cursors Currently
SID***8 select a.value, b.name from v$mystat a, v$statname b where a
SID***8 begin demo_pkg.get_cur( :1 ); end;
SID***8 select sid, sql_text from v$open_cursor where sid = (select
SID***8 ALTER SESSION SET NLS_LANGUAGE= 'AMERICAN'
NLS_TERRITORY= 'A
SID***8 SELECT USER FROM DUAL THIS_IS_A_JAVA_CURSOR
<lots chopped NOTE: PLSQL cursors *gone*>
SID***8 SELECT USER FROM DUAL THIS_IS_A_JAVA_CURSOR
-----------------------
====================================
47) After prepare and execute
====================================
50 opened cursors current
-----------------------
Open Cursors Currently
SID***8 select a.value, b.name from v$mystat a, v$statname b where a
SID***8 begin demo_pkg.get_cur( :1 ); end;
SID***8 select sid, sql_text from v$open_cursor where sid = (select
SID***8 ALTER SESSION SET NLS_LANGUAGE= 'AMERICAN'
NLS_TERRITORY= 'A
SID***8 SELECT USER FROM DUAL THIS_IS_A_JAVA_CURSOR
...
SID***8 SELECT USER FROM DUAL THIS_IS_A_JAVA_CURSOR
-----------------------

java.sql.SQLException: ORA-01000: maximum open cursors exceeded

at java.lang.Throwable.<init>(Compiled Code)
at java.lang.Exception.<init>(Compiled Code)
at java.sql.SQLException.<init>(Compiled Code)
at oracle.jdbc.dbaccess.DBError.throwSqlException(Compiled Code)
at oracle.jdbc.oci8.OCIDBAccess.check_error(Compiled Code)
at oracle.jdbc.oci8.OCIDBAccess.parseExecuteDescribe(Compiled Code)
at oracle.jdbc.driver.OracleStatement.doExecuteQuery(Compiled Code)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(Compiled
Code)
at oracle.jdbc.driver.OracleStatement.executeQuery(Compiled Code)
at curvar.showOpenCnt(Compiled Code)
at curvar.main(Compiled Code)

8 cursor_space_for_time parameter
8.1 What it's for?
8.2 Syntax
8.1 What it's for?
If you have no library cache misses, then you might be able to accelerate execution
calls by setting the value of the initialization parameter CURSOR_SPACE_FOR_TIME to
TRUE. This parameter specifies whether a cursor can be deallocated from the library
cache to make room for a new SQL statement.
Lets you use more space for cursors in order to save time. It affects both the shared
SQL area and the client's private SQL area.
If set to TRUE, a cursor can be deallocated only when all application cursors associated with
its statement are closed. In this case, Oracle need not verify that a cursor is in the
cache, because it cannot be deallocated while an application cursor associated with it
is open.
Do not set to true if you have found library cache misses on execution calls. Such
library cache misses indicate that the shared pool is not large enough to hold the
shared SQL areas of all concurrently open cursors. If the value is true, and if the
shared pool has no space for a new SQL statement, then the statement cannot be
parsed, and Oracle returns an error saying that there is no more shared memory.
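A quick way to check for library cache misses (reloads) before considering this parameter is
the standard V$LIBRARYCACHE view; a sketch (a non-trivial reload percentage suggests the shared
pool is already under pressure):

select namespace, pins, reloads,
       round( 100 * reloads / nullif(pins,0), 2 ) reload_pct
from   v$librarycache;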
Do not set to true if the amount of memory available to each user for private SQL
areas is scarce. This value also prevents the deallocation of private SQL areas
associated with open cursors. If the private SQL areas for all concurrently open
cursors fills your available memory so that there is no space for a new SQL
statement, then the statement cannot be parsed. Oracle returns an error indicating that
there is not enough memory.
Setting the value of the parameter to true saves Oracle a small amount of time and
can slightly improve the performance of execution calls. This value also prevents the
deallocation of cursors until associated application cursors are closed.
If you do set this parameter to TRUE be aware that:
If the SHARED_POOL is too small for the workload then an ORA-4031 is much
more likely to be signalled.
If your application has any cursor leak then the leaked cursors can waste large
amounts of memory having an adverse effect on performance after a period of
operation.
Some DBAs, like Nathan Hughes in Special Edition Using Oracle8, say:
Do not change CURSOR_SPACE_FOR_TIME from its default value of FALSE if
any of the following apply to your situation:
- RELOADS in V$LIBRARYCACHE always shows a 0 value.
- You are using Oracle or SQL*Forms.
- You use any dynamic SQL.
But they don't explain why.

If FALSE (the default), then a cursor can be deallocated from the library cache
regardless of whether application cursors associated with its SQL statement are open.
In this case, Oracle must verify that the cursor containing the SQL statement is in the
library cache.
If there is no space for a new statement, then Oracle deallocates an existing cursor.
Although deallocating a cursor could result in a library cache miss later (only if the
cursor is reexecuted), it is preferable to an error halting your application because a
SQL statement cannot be parsed.

This parameter should not be set to true, unless there is an application for it (e.g.
a bar-code scanner application that continually scans and runs the same query
over and over again)
Gaja Krishna Vaidyanatha
http://www.quest.com/whitepapers/orcl_db_mgmt.pdf

Tom Kyte's suggestion: I would leave it at its default (I do leave it at its default!); use
session_cached_cursors instead.
8.2 Syntax
CURSOR_SPACE_FOR_TIME = {TRUE | FALSE}, default FALSE
9 pre_page_sga parameter
9.1 What it's for?
9.2 Syntax
9.1 What it's for?
Determines whether Oracle reads the entire SGA into memory at instance startup.
Operating system page table entries are then prebuilt for each page of the SGA.
It is likely to decrease the amount of time necessary for Oracle to reach its full
performance capacity after startup.
This setting can increase the amount of time necessary for instance startup, because
every process that starts must access every page in the SGA. The overhead can be
significant if your system frequently creates and destroys processes by, for example,
continually logging on and logging off.
You could try this parameter if you have enough memory, after verifying that there is
no significant overhead every time you log in to the database.

Note:
This setting does not prevent your operating system from paging or swapping the
SGA after it is initially read into memory.

Tom Kyte comment: i've never actually used it myself.
9.2 Syntax
PRE_PAGE_SGA = {TRUE | FALSE}, default FALSE

10 binding, cursor_sharing parameter
10.1 What it's for?
10.1.1 Shared Cursors
One of the first stages of parsing is to compare the text of the statement with existing
statements in the shared pool to see if the statement can be shared. If the statement
differs textually in any way, then Oracle does not share the statement and has to parse
it again; this uses several resources and hurts performance.
Reuse of shared SQL for multiple users running the same application avoids hard
parsing. Soft parses provide a significant reduction in the use of resources such as the
shared pool and library cache latches. To share cursors, do the following:
Use bind variables rather than literals in SQL statements whenever possible.
For example, the following two statements cannot use the same shared area
because they do not match character for character:
SELECT employee_id FROM employees WHERE department_id = 10;
SELECT employee_id FROM employees WHERE department_id = 20;
By replacing the literals with a bind variable, only one SQL statement would
result, which could be executed twice (see the PL/SQL sketch after this list):
SELECT employee_id FROM employees WHERE department_id = :dept_id;
Avoid application designs that result in large numbers of users issuing
dynamic, unshared SQL statements. Typically, the majority of data required by
most users can be satisfied using preset queries. Use dynamic SQL where such
functionality is required.
Be sure that users of the application do not change the optimization approach
and goal for their individual sessions.
Establish the following policies for application developers:
Standardize naming conventions for bind variables and spacing conventions for
SQL statements and PL/SQL blocks. Consider using stored procedures
whenever possible. Multiple users issuing the same stored procedure use the
same shared PL/SQL area automatically. Because stored procedures are stored
in a parsed form, their use reduces runtime parsing.
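As referenced above, a minimal PL/SQL sketch (assuming the employees table used in these
examples) showing a single statement text executed with two different bind values -- one shared
statement, two executions:

declare
    l_cnt number;
begin
    -- same text both times, so the statement is shared in the library cache
    execute immediate
        'select count(*) from employees where department_id = :dept_id'
        into l_cnt using 10;
    execute immediate
        'select count(*) from employees where department_id = :dept_id'
        into l_cnt using 20;
end;
/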
10.1.2 SQL Sharing Criteria
Oracle automatically determines whether a SQL statement or PL/SQL block being
issued is identical to another statement currently in the shared pool.
Oracle performs the following steps for the comparison:
1. The text of the statement issued is compared to existing statements in the
shared pool.
SELECT * FROM employees WHERE id = :cVariableID;
uses the same statement in the shared pool for different values of the bind variable
:cVariableID; this statement is parsed only one time: 1 hard parse.
But if instead of using a bind variable you use literal values
SELECT * FROM employees WHERE id = 1
SELECT * FROM employees WHERE id = 2
SELECT * FROM employees WHERE id = 3
each one uses a distinct statement in the shared pool, and each one is parsed:
3 hard parses.
2. The text of the statement is hashed. If there is no matching hash value, then the
SQL statement does not currently exist in the shared pool, and a hard parse is
performed.
If there is a matching hash value for an existing SQL statement in the shared
pool, then Oracle compares the text of the matched statement to the text of the
statement hashed to see if they are identical. The text of the SQL statements or
PL/SQL blocks must be identical, character for character, including spaces,
case, and comments.

For example, the following statements cannot use the same shared SQL area:
SELECT * FROM employees; SELECT * FROM Employees;
SELECT * FROM employees;
Usually, SQL statements that differ only in literals cannot use the same shared
SQL area. For example, the following SQL statements do not resolve to the
same SQL area:
SELECT count(1) FROM employees WHERE manager_id = 121;
SELECT count(1) FROM employees WHERE manager_id = 247;
The only exception to this rule is when the parameter CURSOR_SHARING
has been set to SIMILAR or FORCE. Similar statements can share SQL areas
when the CURSOR_SHARING parameter is set to SIMILAR or FORCE. The
costs and benefits involved in using CURSOR_SHARING are explained later
in this section.
4. The objects referenced in the issued statement are compared to the referenced
objects of all existing statements in the shared pool to ensure that they are
identical.
References to schema objects in the SQL statements or PL/SQL blocks must
resolve to the same object in the same schema.
For example, if two users each issue the following SQL statement:
SELECT * FROM employees;
and they each have their own employees table, then this statement is not
considered identical, because the statement references different tables for each
user.
5. Bind variables in the SQL statements must match in name, datatype, and
length.
For example, the following statements cannot use the same shared SQL area,
because the bind variable names differ:
SELECT * FROM employees WHERE department_id = :department_id;
SELECT * FROM employees WHERE department_id = :dept_id;
Many Oracle products (such as Oracle Forms and the precompilers) convert the
SQL before passing statements to the database. Characters are uniformly
changed to uppercase, white space is compressed, and bind variables are
renamed so that a consistent set of SQL statements is produced.
6. The session's environment must be identical. Items compared include the
following:
Optimization approach and goal. SQL statements must be optimized
using the same optimization approach and, in the case of the cost-based
approach, the same optimization goal.
Session-configurable parameters such as SORT_AREA_SIZE.
10.1.3 When to set CURSOR_SHARING to SIMILAR or FORCE
The optimal solution is to write sharable SQL
(CURSOR_SHARING=EXACT), rather than rely on the CURSOR_SHARING
parameter. This is because although CURSOR_SHARING does significantly reduce
the amount of resources used by eliminating hard parses, it requires some extra work
as a part of the soft parse to find a similar statement in the shared pool.
Consider setting CURSOR_SHARING to SIMILAR or FORCE if you can answer
'yes' to both of the following questions:
1. Are there statements in the shared pool that differ only in the values of
literals?
2. Is the response time low due to a very high number of library cache
misses?
Binding is not always best: to use histograms, you need to use literal values. For
example, to get the best execution plan on a table where a sex column is 10% men (index
access) and 90% women (full table scan access).
10.1.3.1 EXACT
Setting CURSOR_SHARING to EXACT allows SQL statements to share the SQL
area only when their texts match exactly. This is the default behavior. Using this
setting, similar statements cannot be shared; only textually exact statements can be
shared.
10.1.3.2 SIMILAR and FORCE
When CURSOR_SHARING is set to SIMILAR or FORCE, Oracle first checks
the shared pool to see if there is an identical statement in the shared pool. If an
identical statement is not found, then Oracle searches for a similar statement in the
shared pool. If the similar statement is there, then the parse checks continue to verify
the executable form of the cursor can be used. If the statement is not there, then a
hard parse is necessary to generate the executable form of the statement.
Using CURSOR_SHARING = SIMILAR (or FORCE) can significantly improve
cursor sharing on some applications that have many similar statements, resulting in
reduced memory usage, faster parses, and reduced latch contention.
Statements that are identical, except for the values of some literals, are called similar
statements. Similar statements pass the textual check in the parse phase when the
CURSOR_SHARING parameter is set to SIMILAR or FORCE. Textual similarity
does not guarantee sharing. The new form of the SQL statement still needs to go
through the remaining steps of the parse phase to ensure that the execution plan of
the preexisting statement is equally applicable to the new statement.
Setting CURSOR_SHARING to either SIMILAR or FORCE allows similar
statements to share SQL.
SIMILAR , Causes statements that may differ in some literals, but are otherwise
identical, to share a cursor, unless the literals affect either the meaning of the
statement or the degree to which the plan is optimized.
FORCE, Forces statements that may differ in some literals, but are otherwise
identical, to share a cursor, unless the literals affect the meaning of the statement.
The difference between SIMILAR and FORCE is that SIMILAR forces similar
statements to share the SQL area without deteriorating execution plans. Setting
CURSOR_SHARING to FORCE forces similar statements to share the executable
SQL area, potentially deteriorating execution plans. Hence, FORCE should be used
as a last resort, when the risk of suboptimal plans is outweighed by the
improvements in cursor sharing.
10.2 Syntax
CURSOR_SHARING = {SIMILAR | EXACT | FORCE}, default EXACT
10.3 Evaluating this parameter
We stated that the correct solution is to fix the code and use EXACT, but you can
test these values and see the effect on performance:

In               Try
OLTP             FORCE
Mixed workload   SIMILAR (where you need a different plan for some of the queries)
DSS/DW           EXACT
10.4 Hints
10.4.1 /*+ CURSOR_SHARING_EXACT */
Oracle can replace literals in SQL statements with bind variables, if it is safe to do
so. This is controlled with the CURSOR_SHARING startup parameter. The
CURSOR_SHARING_EXACT hint causes this behavior to be switched off. In other
words, Oracle executes the SQL statement without any attempt to replace literals by
bind variables.
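For example (a sketch; the table and literal are only illustrative), to keep the literal even
when CURSOR_SHARING is set to SIMILAR or FORCE:

SELECT /*+ CURSOR_SHARING_EXACT */ *
FROM   emp
WHERE  deptno = 99;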
10.5 Notes
Forcing cursor sharing among similar (but not identical) statements can have
unexpected results in some DSS applications, or applications that use stored outlines.
Oracle can force similar statements to share SQL by replacing literals with system-
generated bind variables. This works with plan stability if the outline was generated
using the CREATE_STORED_OUTLINES parameter, not the CREATE OUTLINE
statement. Also, the outline must have been created with the CURSOR_SHARING
parameter set to SIMILAR or FORCE, and the parameter must also set to SIMILAR
or FORCE when attempting to use the outline.
Oracle does not recommend setting CURSOR_SHARING to FORCE in a DSS
environment or if you are using complex queries. Also, star transformation is not
supported with CURSOR_SHARING set to either SIMILAR or FORCE.
Setting CURSOR_SHARING to SIMILAR or FORCE causes an increase in the
maximum lengths (as returned by DESCRIBE) of any selected expressions that
contain literals (in a SELECT statement). However, the actual length of the data
returned does not change.
Setting CURSOR_SHARING to FORCE or SIMILAR prevents any outlines
generated with literals from being used if they were generated with
CURSOR_SHARING set to EXACT.
To use stored outlines with CURSOR_SHARING=FORCE or SIMILAR, the
outlines must be generated with CURSOR_SHARING set to FORCE or SIMILAR
and with the CREATE_STORED_OUTLINES parameter.
Shared SQL may be less appropriate for data warehousing applications. Also, setting
CURSOR_SHARING to FORCE or SIMILAR may affect the execution plans of the
statements.
10.6 Examples and techniques
10.6.1 Similar
The best bet -- have the application use bind variables WHERE appropriate and
constants where NOT appropriate.
Nothing else will come even marginally close!!!
In the example below, the optimizer, under SIMILAR, should detect that binding
"deptno = constant" would be bad due to the skewed data and would leave the
statement "as is".
For example:
[email protected]> /*
DOC>
DOC>drop table emp;
DOC>create table emp as select * from scott.emp;
DOC>exec gen_data( 'EMP', 50000 )
DOC>update emp set deptno = 99;
DOC>update emp set deptno = 1 where rownum = 1;
DOC>create index dept_idx on emp(deptno);
DOC>*/
[email protected]>
[email protected]> analyze table emp compute statistics
2 for table
3 for all indexes
4 for all indexed columns
5 /
Table analyzed.

[email protected]>
[email protected]> alter system flush shared_pool;
System altered.

[email protected]> set autotrace traceonly explain
[email protected]> select * from emp where deptno = 99;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=41 Card=49999 Bytes=2749945)
1 0 TABLE ACCESS (FULL) OF 'EMP' (Cost=41 Card=49999 Bytes=2749945)

[email protected]> select * from emp where deptno = 1;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=55)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'EMP' (Cost=2 Card=1 Bytes=55)
2 1 INDEX (RANGE SCAN) OF 'DEPT_IDX' (NON-UNIQUE) (Cost=1 Card=1)
that shows, given constants -- the queries would do different things... So, we
test:

[email protected]> set autotrace off
[email protected]>
[email protected]> alter session set cursor_sharing=similar;
Session altered.

[email protected]> select * from emp where deptno = 1;
EMPNO ENAME JOB MGR HIREDATE SAL COMM
DEPTNO
---------- ---------- --------- ---------- --------- ---------- ----------
143 wFZVhMqDxB QjPZKgTZw 3520 15-SEP-03 17353.19 80462.15
1
[email protected]> alter session set cursor_sharing=exact;

Session altered.
[email protected]>
[email protected]> select sql_text from v$sql
2 where sql_text like 'select * from emp where deptno =%';
SQL_TEXT
------------------------------------------------------------
select * from emp where deptno = 1
select * from emp where deptno = 99

and we can see that it left it be. No binds. Now, we try again but take away
some information:
[email protected]> alter system flush shared_pool;

[email protected]> analyze table emp delete statistics;
Table analyzed.
[email protected]> analyze table emp compute statistics
2 for table
3 for all indexes
4 /

[email protected]> alter session set cursor_sharing=similar;

[email protected]> select * from emp where deptno = 1;
EMPNO ENAME JOB MGR HIREDATE SAL COMM
DEPTNO
---------- ---------- --------- ---------- --------- ---------- ----------
143 wFZVhMqDxB QjPZKgTZw 3520 15-SEP-03 17353.19 80462.15
1

[email protected]> alter session set cursor_sharing=exact;

[email protected]> select sql_text from v$sql
2 where sql_text like 'select * from emp where deptno =%';

SQL_TEXT
------------------------------------------------------------
select * from emp where deptno = :"SYS_B_0"
[email protected]>

and here we did bind -- because the plans would NOT change in this case. The
optimizer didn't have enough data to tell if the plans would change
But remember -- it is all just software. The right answer: you use binds when
you want to, you don't use them when you don't want to.

10.7 Views
10.7.1 GV$SQL_BIND_DATA and V$SQL_BIND_DATA
For each distinct bind variable in each cursor owned by the session querying this
view, this view describes:
Actual bind data, if the bind variable is user defined
The underlying literal, if the CURSOR_SHARING parameter is set to
FORCE and the bind variable is system generated. (System-generated
binds have a value of 256 in the SHARED_FLAG2 column.)
To see only the rows corresponding to internal binds, you can issue:
SELECT * FROM V$SQL_BIND_DATA
WHERE BITAND(SHARED_FLAG2,256) = 256
10.7.2 V$SQL
V$SQL lists statistics on shared SQL area without the GROUP BY clause and
contains one row for each child of the original SQL text entered.
10.7.3 GV$SQL_BIND_METADATA and V$SQL_BIND_METADATA
For each distinct bind variable in each cursor owned by the session querying this
view, this view describes:
Bind metadata provided by the client, if the bind variable is user defined
Metadata based on the underlying literal, if the CURSOR_SHARING
parameter is set to FORCE and the bind variable is system-generated.

11 Getting statistics
11.1 What it's for?
11.2 Syntax
11.3 Guidelines
11.3.1 Don't use statistic#, because it can change from release to release
1* select * from v$statname where statistic# = 207
ops$tkyte@ORA920> /

STATISTIC# NAME CLASS
---------- ------------------------------ ----------
207 cursor authentications 128
ops$tkyte@ORA817DEV> select * from v$statname where statistic# = 207
2 /
STATISTIC# NAME CLASS
---------- ------------------------------ ----------
207 PX remote messages recv'd 32
ops$tkyte@ORA9I> select * from v$statname where statistic# = 207
2 /
STATISTIC# NAME CLASS
---------- ------------------------------ ----------
207 PX remote messages sent 32
ops$tkyte@ORA815> select * from v$statname where statistic# = 207
2 /
STATISTIC# NAME CLASS
---------- ------------------------------ ----------
207 OS Swaps 16
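Instead, reference statistics by name. A minimal sketch using the current session's
statistics (the statistic name is just an example):
select n.name, s.value
  from v$statname n, v$mystat s
 where n.statistic# = s.statistic#
   and n.name = 'session cursor cache hits';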
11.4 Evaluating the accuracy of the value
11.5 Examples and techniques
12 Tuning
12.1 What it's for?
Tuning means:
Faster response
Fewer resources (CPU, disk reads, hardware, memory)
More concurrent users at the same time
Happy customers
* Before reading this article, it is assumed you have read at least once Concepts and
the Database Performance Tuning Guide and Reference.
http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96533/toc.htm,
12.2 Introduction
Tuning involves the network, the hardware, the database software, the client software,
DBAs, network administrators and users, and all the bugs and human errors that come
with each of them.
Once a problem appears it is not always easy to detect, and the most suspected party
is made responsible for finding it, or at least for demonstrating that he is not
responsible. That is not always easy, because some problems appear only when you
work with Oracle, while other applications like Microsoft Word, Excel, etc. do not
suffer from them. And the solution is required for yesterday; tuning is always
requested as a last resort.
You not only need to know about Oracle, you need to know about the operating
system (NT, Unix, etc.) of both the database server and the client. You need to know
how clients use the software, how backups are done, whether other software is
installed on the server where Oracle is installed, and so on.
Every serious tuning problem seems to be different from the previous one; there is no
magical solution.
I think there is always something more to tune, so your principal objective is to
reach the performance you need and to avoid the possibility that a critical process
suddenly loses performance.
You are not going to learn to tune here; you need to read the Oracle performance
tuning documentation more than once and buy some books (Tom Kyte's book is a
good one, even though its goal is not to be a tuning book, and there are several
like it).
What follows is a description of the tricks I have gathered and used.
12.3 Test database or at least make a Backup
As always, if possible you should make changes in a test database.
If not, you must be sure you have a backup before tuning, because several things can
go wrong.
Warn about the risks that always exist when working directly on the production
database, and that it is possible the database will have to be recovered. Try to make a
backup yourself even after they have made their own, because it is possible that they
could not recover the database from their backup.
12.4 Priority
The order of priority in which you can increase performance, based on my
experience and the experience of others:
Application design: moving a procedure to the database, or inserting directly
from a select instead of using loops; this way I got a process that took 3 hours
down to 1 minute (see the sketch after this list).
Improving queries: sometimes you can get an improvement from 30 seconds to
1 second.
Hardware: there is a big difference between an Intel Pentium IV and an
Intel Pentium II for several processes.
Finally, other kinds of tuning that can't be underestimated (general Oracle
configuration, server configuration, hardware configuration, network
configuration, etc.); for example, increasing the block cache, even when it means
paging memory, can temporarily solve serious performance problems.
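Below is a minimal sketch of the set-based insert mentioned in the first item; the
table and column names (detail, summary_archive, id, amount) are illustrative
assumptions, not from a real schema.
-- Row-by-row (slow):
-- begin
--   for r in ( select id, amount from detail ) loop
--     insert into summary_archive( id, amount ) values ( r.id, r.amount );
--   end loop;
-- end;
-- Set-based, one SQL statement (usually far faster):
insert into summary_archive( id, amount )
  select id, amount
    from detail;
commit;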
12.5 Database Configuration
12.5.1 Set these parameters that are not set by default

These parameters must be set.
Their objective is to help the CBO decide when to use an index and when not.
OPTIMIZER_INDEX_COST_ADJ: the default value is 100, which means that an
INDEX access costs 100% as much as a FULL TABLE SCAN access.
OPTIMIZER_INDEX_CACHING: a value of
100 implies that 100% of the index blocks are likely to be found in the buffer cache.
Try for OLTP
OPTIMIZER_INDEX_COST_ADJ = 10
OPTIMIZER_INDEX_CACHING = 90
For DSS
OPTIMIZER_INDEX_COST_ADJ = 50
OPTIMIZER_INDEX_CACHING = 90
For more information read this document:
http://www.evdbt.com/SearchIntelligenceCBO.doc
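A minimal sketch of how you might try these values at the session level before
making them permanent (the values are just the OLTP suggestions above; test against
your own workload):
alter session set optimizer_index_cost_adj = 10;
alter session set optimizer_index_caching = 90;
-- run your representative queries and compare plans/timings, then, if you use an spfile:
-- alter system set optimizer_index_cost_adj = 10 scope = spfile;
-- alter system set optimizer_index_caching = 90 scope = spfile;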
12.6 Statistics
HOW DO I KNOW THAT IT IS NOT USING STORED STATISTICS OR
STORED OUTLINES?
First update the statistics.
To update old statistics execute

For all statistics connect as SYS and execute
begin
DBMS_STATS.GATHER_DATABASE_STATS();
end;
For specific statistics execute
BEGIN
-- REVIEW
dbms_stats.GATHER_TABLE_STATS('MCDONAC','table','partition');
END;
You need to have updated statistics:
After adding distinct values to a column that previously had only one value, for
example YEAR.
After adding an important amount of data.
After creating or modifying indexes.
Periodically, or when performance decreases.
12.6.1 Don't Use Analyze
Oracle Corporation strongly recommends that you use the
DBMS_STATS package rather than ANALYZE to collect optimizer
statistics. That package lets you collect statistics in parallel, collect
global statistics for partitioned objects, and fine-tune your statistics
collection in other ways. Further, the cost-based optimizer will
eventually use only statistics that have been collected by
DBMS_STATS.
Note that dbms_stats gathers lots more stats than the analyze command does!
The analyze doesn't work 100% on partitions, for example -- analyze that table
and look at the LAST_ANALYZED in the USER_TAB_COLUMNS view -- it'll not
have changed (nor will the values there). Then, do it with dbms_stats and you'll
find that it does. See the example.

analyze table p1 partition (x1) compute statistics
*******************************************************
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.03 0.05 0 0 0 0
Execute 3 14.89 44.28 11089 1818 300 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 14.92 44.33 11089 1818 300 0
BEGIN dbms_stats.GATHER_TABLE_STATS('MCDONAC','P1','X1'); END;
*******************************************************
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 11 0.04 0.08 0 0 0 0
Execute 18 0.01 0.03 2 23 13 9
Fetch 15 50.47 198.18 48473 7511 1723 10
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 44 50.52 198.29 48475 7534 1736 19
so you do need to be a little wary before converting all of your analyze scripts
to use dbms_stats.
12.6.2 If you have a database that is OLTP in the day and DSS in the
night
Gather statistics during the day. Gathering ends after 720 minutes and is stored in
the mystats table:
BEGIN
DBMS_STATS.GATHER_SYSTEM_STATS(
gathering_mode => 'interval',
interval => 720,
stattab => 'mystats',
statid => 'OLTP');
END;
/
Gather statistics during the night. Gathering ends after 720 minutes and is stored in
the mystats table:
BEGIN
DBMS_STATS.GATHER_SYSTEM_STATS(
gathering_mode => 'interval',
interval => 720,
stattab => 'mystats',
statid => 'OLAP');
END;
/
If appropriate, you can switch between the statistics gathered. It is possible to
automate this process by submitting a job to update the dictionary with appropriate
statistics.
During the day, the following jobs import the OLTP statistics for the daytime run:
VARIABLE jobno number;
BEGIN
DBMS_JOB.SUBMIT(:jobno,
'DBMS_STATS.IMPORT_SYSTEM_STATS(''mystats'',''OLTP'');',
SYSDATE, 'SYSDATE + 1');
COMMIT;
END;
/
During the night, the following jobs import the OLAP statistics for the nighttime
run:
BEGIN
DBMS_JOB.SUBMIT(:jobno,
'DBMS_STATS.IMPORT_SYSTEM_STATS(''mystats'',''OLAP'');',
SYSDATE + 0.5, 'SYSDATE + 1');
COMMIT;
END;
/
12.6.3 Copy statistics from the production database to the development database
This lets you see in development the performance problems you would have in
production; a minimal sketch follows.
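A minimal sketch using DBMS_STATS; the statistics table name (STATTAB), the
schema (SCOTT) and the step of moving the table between databases with exp/imp
are assumptions for illustration:
-- On production:
begin
  dbms_stats.create_stat_table( ownname => 'SCOTT', stattab => 'STATTAB' );
  dbms_stats.export_schema_stats( ownname => 'SCOTT', stattab => 'STATTAB' );
end;
/
-- Move the STATTAB table to development (for example with exp/imp), then on development:
begin
  dbms_stats.import_schema_stats( ownname => 'SCOTT', stattab => 'STATTAB' );
end;
/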
12.7 Execution Plan
12.7.1 First, verify that you use an index when you need it (few rows from the
table) and a full scan when you are reading almost all the data from the
table
The concept is to verify that your queries are not doing a full scan when they should
use an index, and that they do not use an index when it is better to do a full scan.
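For example, reusing the autotrace technique and the emp/deptno data from the
cursor_sharing example earlier in this document (a quick sketch, not a full tuning
session):
set autotrace traceonly explain
select * from emp where deptno = 99;  -- almost every row matches: a full scan is correct here
select * from emp where deptno = 1;   -- one row matches: an index range scan is correct here
set autotrace off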
12.8 Indexes
12.8.1 See if it could be a good idea to add columns to indexes
To avoid the additional step of fetching the data from the table after Oracle finds the
rows in the index, you could analyze the possibility of adding to the index all the
columns you use.
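A minimal sketch, assuming a query that only needs deptno and ename from the emp
table used in earlier examples:
create index emp_deptno_ename_idx on emp( deptno, ename );
-- this query can now be answered from the index alone, with no table access:
select ename from emp where deptno = 10;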
12.9 Queries
12.9.1 Include only the fields you need in the query
Don't do a SELECT * when you are going to need only a few fields.
There is an extra advantage: if the fields you need are all in the index, you will save
an access to the table.
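A small illustrative sketch (column names assumed from the emp examples used
elsewhere in this document):
-- Instead of
select * from emp where deptno = 10;
-- ask only for what you need:
select empno, ename from emp where deptno = 10;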
12.9.2 Verify it uses the best indexes
There are situations where using index(a,b) is better than index(b,a); the point is not
only to verify that the query uses an index, but that it uses the best index. It can bring
great improvements (see the sketch below).
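A sketch of why column order matters (column names are illustrative): if most queries
filter on deptno alone and some also filter on deptno and job together, an index on
(deptno, job) serves both, while an index on (job, deptno) does not help the
deptno-only queries.
create index emp_deptno_job_idx on emp( deptno, job );
-- usable by: where deptno = 10
-- usable by: where deptno = 10 and job = 'CLERK'
-- an index on ( job, deptno ) could only help the second query efficiently.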
12.9.3 Views: include a WHERE clause when the view is a union of tables
If you don't have the partitioning feature (Oracle Standard Edition) and you use a
view to union two tables.
Example text hicartera
12.9.4 Bind variables
Bind variables instead of literals: executing select * from a where field1=2 creates
an execution plan for every distinct value you query for field1.
If you use a bind variable ( select * from a where field1 = :BB ), Oracle analyzes the
select once, creates an execution plan in which field1 can have any value, and
reuses this execution plan for every query identical to this one, whatever value :BB
holds (see the sketch below).
You can use CURSOR_SHARING = SIMILAR or FORCE while you fix the code,
but remember that while these settings solve some problems they can create others.
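A minimal PL/SQL sketch of the difference, using native dynamic SQL (the emp
table and the empno value are just illustrative):
declare
  l_sal number;
begin
  -- literal: a distinct statement (and hard parse) for every value
  -- execute immediate 'select sal from emp where empno = 7369' into l_sal;
  -- bind variable: one shared statement for every value
  execute immediate 'select sal from emp where empno = :x' into l_sal using 7369;
end;
/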
12.9.5 Conditions that require full scans and prevent the use of an index
If you use the following conditions, even if you specify an index, the index will not
be used:
!= ; you should change it to a > or an IN condition
IS NULL ; nulls are not stored in ordinary B-tree indexes
ABS(X), when there is no index based on ABS(X);
you should create a function-based index (see the sketch below)
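A sketch of such a function-based index (table and column names are illustrative; in
9i, function-based indexes also require the cost-based optimizer and, depending on
release and privileges, QUERY REWRITE settings):
create index t_abs_x_idx on t( abs(x) );
-- this predicate can now use the index:
select * from t where abs(x) = 5;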

12.9.6 Few and not uniformly distributed values in columns
If you have, for example, a country column with a very skewed distribution -- say
1,000,000 rows for US and only 100 for Australia -- you need a full scan for US and
an index scan for Australia.
Because in a normal index search Oracle assumes the values are uniformly
distributed, you will have to do two things:
1. Obtain histogram statistics.
EXECUTE DBMS_STATS.GATHER_TABLE_STATS
('scott','emp', METHOD_OPT => 'FOR COLUMNS SIZE 10 sal');
2. Do not use binding for this column; you should specify the literal value.
Instead of select * from table where field = :cVariable;
use select * from table where field = 123;
or use the /*+ CURSOR_SHARING_EXACT */ hint (see 10.4.1) to keep the literal
when CURSOR_SHARING is set to SIMILAR or FORCE.
12.10 Locks
12.10.1 Don't use pessimistic locks
Oracle has an intelligent and fast locking mechanism; usually you don't need to lock
a record or a table to guarantee consistency of data.
12.10.2 Lock a record instead of a table
If you want to prevent the same process from being run twice at the same time,
you can lock a record in a locks table instead of the whole table. The lock will be
released at commit (see the sketch below).
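A minimal sketch; the process_locks table and the 'NIGHTLY_BATCH' row are
assumptions created beforehand just for this purpose:
declare
  row_locked exception;
  pragma exception_init( row_locked, -54 );
  l_name varchar2(30);
begin
  select process_name into l_name
    from process_locks
   where process_name = 'NIGHTLY_BATCH'
     for update nowait;   -- raises ORA-00054 if another session already holds it
  -- ... do the work here; the commit releases the lock
  commit;
exception
  when row_locked then
    dbms_output.put_line( 'The process is already running in another session.' );
end;
/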
12.11 DB programming
12.11.1 Create database packages, functions and procedures when they gather
data from the database on the server
Always try to avoid creating local (client-side) packages, functions or procedures
when you really don't need them to be local.
12.12 Test, because there are exceptions
No matter how reasonable a solution is, there are always exceptions; always test your
solutions to see if you get what you want.
13 PL/SQL SECTION
14 Table Functions: A function that works as a table
14.1 What it's for?
14.2 Syntax
14.2.1 PIPELINED Clause
14.2.2 PIPE ROW
14.2.3 Fetching from the Results of Table Functions
14.2.4 Passing Data with Cursor Variables
14.2.5 Performing DML Operations Inside Table Functions
14.2.6 Handling Exceptions in Table Functions
14.2.7 How Table Functions Stream their Input Data
14.2.8 Generic datatypes
14.3 Performance
14.3.1 Returning Large Amounts of Data from a Function
14.3.2 Optimizing Multiple Calls to Table Functions
14.3.3 Parallelizing Table Functions
14.4 Examples and techniques
14.4.1 The easiest example is this
14.4.2 Using a Pipelined Table Function as an Aggregate Function
14.4.3 Using CLOB as input
14.4.4 Assign the result of a table function to a PL/SQL collection
variable
14.4.5 To "pipe" an entire collection
14.5 Restrictions and Errors
14.6 Features by Release
14.6.1 9.0.1

14.1 What it's for?
Table functions return a collection type instance representing rows in a table. They
can be queried like a table by calling the function in the FROM clause of a query,
enclosed by the TABLE keyword. They can be assigned to a PL/SQL collection
variable by calling the function in the SELECT list of a query.

A table function is defined as a function that can produce a set of rows as output.
Multiple rows to be returned from a function
Results of SQL subqueries (that select multiple rows) to be passed directly to
functions
Functions take cursors as input
Functions can be parallelized
Returning result sets incrementally for further processing as soon as they are
created. This is called incremental pipelining

Table functions can be defined in PL/SQL using a native PL/SQL interface, or in
Java or C using the Oracle Data Cartridge Interface (ODCI).

14.2 Syntax

CREATE FUNCTION f(p ref cursor type) RETURN rec_tab_type PIPELINED
PARALLEL_ENABLE(PARTITION p BY [{HASH | RANGE} (column list) |
ANY ]) IS
BEGIN ... END;

14.2.1 PIPELINED Clause
Data is said to be pipelined if it is consumed by a consumer (transformation) as soon
as the producer (transformation) produces it, without being staged in tables or a
cache before being input to the next transformation.

Pipelining enables a table function to return rows faster and can reduce the memory
required to cache a table function's results.

A pipelined table function can return the table function's result collection in subsets.
The returned collection behaves like a stream that can be fetched from on demand.
This makes it possible to use a table function like a virtual table.

Pipelined table functions can be implemented in two ways:
Native PL/SQL approach: The consumer and producers can run on separate
execution threads (either in the same or different process context) and
communicate through a pipe or queuing mechanism. This approach is similar to
co-routine execution.
Interface approach: The consumer and producers run on the same execution
thread. Producer explicitly returns the control back to the consumer after
producing a set of results. In addition, the producer caches the current state so
that it can resume where it left off when the consumer invokes it again. The
interface approach requires you to implement a set of well-defined interfaces in
a procedural language. For details on the interface approach, see the Data
Cartridges User's Guide.

Use PIPELINED to instruct Oracle to return the results of a table function iteratively.
A table function returns a collection type (a nested table or varray). You query table
functions by using the TABLE keyword before the function name in the FROM
clause of the query. For example:

SELECT * FROM TABLE(function_name(...))

Oracle then returns rows as they are produced by the function.
If you specify the keyword PIPELINED alone (PIPELINED IS ...), the PL/SQL
function body should use the PIPE keyword. This keyword instructs Oracle to return
single elements of the collection out of the function, instead of returning the whole
collection as a single value.

You can specify PIPELINED USING implementation_type clause if you want to
predefine an interface containing the start, fetch, and close operations. The
implementation type must implement the ODCITable interface, and must exist at the
time the table function is created. This clause is useful for table functions that will be
implemented in external languages such as C++ and Java.

If the return type of the function is SYS.AnyDataSet, then you must also define a
describe method (ODCITableDescribe) as part of the implementation type of the
function.

User-written table functions can appear in the statement's FROM list. These
functions act like source tables in that they output rows. Table functions are
initialized once during the statement at the start of each parallel execution process.
All variables are entirely private to the parallel execution process.
14.2.2 PIPE ROW
In PL/SQL, the PIPE ROW statement causes a table function to pipe a row and
continue processing. The statement enables a PL/SQL table function to return rows
as soon as they are produced. (For performance, the PL/SQL runtime system
provides the rows to the consumer in batches.) For example:

CREATE FUNCTION StockPivot(p refcur_pkg.refcur_t) RETURN TickerTypeSet
PIPELINED IS
out_rec TickerType := TickerType(NULL,NULL,NULL);
in_rec p%ROWTYPE;
BEGIN
LOOP
FETCH p INTO in_rec;
EXIT WHEN p%NOTFOUND;
-- first row
out_rec.ticker := in_rec.Ticker;
out_rec.PriceType := 'O';
out_rec.price := in_rec.OpenPrice;
PIPE ROW(out_rec);
-- second row
out_rec.PriceType := 'C';
out_rec.Price := in_rec.ClosePrice;
PIPE ROW(out_rec);
END LOOP;
CLOSE p;
RETURN;
END;
/

In the example, the PIPE ROW(out_rec) statement pipelines data out of the PL/SQL
table function. out_rec is a record, and its type matches the type of an element of the
output collection.

The PIPE ROW statement may be used only in the body of pipelined table functions;
an error is raised if it is used anywhere else. The PIPE ROW statement can be
omitted for a pipelined table function that returns no rows.

A pipelined table function must have a RETURN statement that does not return a
value. The RETURN statement transfers the control back to the consumer and
ensures that the next fetch gets a NO_DATA_FOUND exception.
14.2.3 Fetching from the Results of Table Functions

PL/SQL cursors and ref cursors can be defined for queries over table functions. For
example:

OPEN c FOR SELECT * FROM TABLE(f(...));

Cursors over table functions have the same fetch semantics as ordinary cursors. REF
CURSOR assignments based on table functions do not have any special semantics.

However, the SQL optimizer will not optimize across PL/SQL statements. For
example:

BEGIN
OPEN r FOR SELECT * FROM TABLE(f(CURSOR(SELECT * FROM tab)));
SELECT * BULK COLLECT INTO rec_tab FROM TABLE(g(r));
END;

does not execute as well as:

SELECT * FROM TABLE(g(CURSOR(SELECT * FROM
TABLE(f(CURSOR(SELECT * FROM tab))))));

This is so even ignoring the overhead associated with executing two SQL statements
and assuming that the results can be pipelined between the two statements.
14.2.4 Passing Data with Cursor Variables

You can pass a set of rows to a PL/SQL function in a REF CURSOR parameter. For
example, this function is declared to accept an argument of the predefined weakly
typed REF CURSOR type SYS_REFCURSOR:

FUNCTION f(p1 IN SYS_REFCURSOR) RETURN ... ;

Results of a subquery can be passed to a function directly:

SELECT * FROM TABLE(f(CURSOR(SELECT empno FROM tab)));

In the example above, the CURSOR keyword is required to indicate that the results
of a subquery should be passed as a REF CURSOR parameter.

A predefined weak REF CURSOR type SYS_REFCURSOR is also supported. With
SYS_REFCURSOR, you do not need to first create a REF CURSOR type in a
package before you can use it.

To use a strong REF CURSOR type, you still must create a PL/SQL package and
declare a strong REF CURSOR type in it. Also, if you are using a strong REF
CURSOR type as an argument to a table function, then the actual type of the REF
CURSOR argument must match the column type, or an error is generated. Weak
REF CURSOR arguments to table functions can only be partitioned using the
PARTITION BY ANY clause. You cannot use range or hash partitioning for weak
REF CURSOR arguments.
Example: Using Multiple REF CURSOR Input Variables

PL/SQL functions can accept multiple REF CURSOR input variables:

CREATE FUNCTION g(p1 pkg.refcur_t1, p2 pkg.refcur_t2) RETURN...
PIPELINED ... ;

Function g can be invoked as follows:

SELECT * FROM TABLE(g(CURSOR(SELECT empno FROM tab),
CURSOR(SELECT * FROM emp)));

You can pass table function return values to other table functions by creating a REF
CURSOR that iterates over the returned data:

SELECT * FROM TABLE(f(CURSOR(SELECT * FROM TABLE(g(...)))));
Example: Explicitly Opening a REF CURSOR for a Query

You can explicitly open a REF CURSOR for a query and pass it as a parameter to a
table function:

BEGIN
OPEN r FOR SELECT * FROM TABLE(f(...));
-- Must return a single row result set.
SELECT * INTO rec FROM TABLE(g(r));
END;

In this case, the table function closes the cursor when it completes, so your program
should not explicitly try to close the cursor.
14.2.5 Performing DML Operations Inside Table Functions

To execute DML statements, a table function must be declared with the
autonomous transaction pragma . This pragma causes the function to execute in an
autonomous transaction not shared by other processes.

Use the following syntax to declare a table function with the autonomous transaction
pragma:

CREATE FUNCTION f(p SYS_REFCURSOR) return CollType PIPELINED IS
PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN ... END;

During parallel execution, each instance of the table function creates an independent
transaction.
Performing DML Operations on Table Functions

Table functions cannot be the target table in UPDATE, INSERT, or DELETE
statements. For example, the following statements will raise an error:

UPDATE F(CURSOR(SELECT * FROM tab)) SET col = value;
INSERT INTO f(...) VALUES ('any', 'thing');

However, you can create a view over a table function and use INSTEAD OF triggers
to update it. For example:

CREATE VIEW BookTable AS
SELECT x.Name, x.Author
FROM TABLE(GetBooks('data.txt')) x;

The following INSTEAD OF trigger is fired when the user inserts a row into the
BookTable view:

CREATE TRIGGER BookTable_insert
INSTEAD OF INSERT ON BookTable
REFERENCING NEW AS n
FOR EACH ROW
BEGIN
...
END;
INSERT INTO BookTable VALUES (...);

INSTEAD OF triggers can be defined for all DML operations on a view built on a
table function.
14.2.6 Handling Exceptions in Table Functions

Exception handling in table functions works just as it does with ordinary user-
defined functions.

Some languages, such as C and Java, provide a mechanism for user-supplied
exception handling. If an exception raised within a table function is handled, the
table function executes the exception handler and continues processing. Exiting the
exception handler takes control to the enclosing scope. If the exception is cleared,
execution proceeds normally.

An unhandled exception in a table function causes the parent transaction to roll back.
14.2.7 How Table Functions Stream their Input Data

The way in which a table function orders or clusters rows that it fetches from cursor
arguments is called data streaming. A function can stream its input data in any of the
following ways:
Place no restriction on the ordering of the incoming rows
Order them on a particular key column or columns
Cluster them on a particular key

Clustering causes rows that have the same key values to appear together but does not
otherwise do any ordering of rows.

You control the behavior of the input stream using the ORDER BY or CLUSTER
BY clauses when defining the function.

Input streaming can be specified for either sequential or parallel execution of a
function.

If an ORDER BY or CLUSTER BY clause is not specified, rows are input in a
(random) order.
Note:

The semantics of ORDER BY are different for parallel execution from the semantics
of the ORDER BY clause in a SQL statement. In a SQL statement, the ORDER BY
clause globally orders the entire data set. In a table function, the ORDER BY clause
orders the respective rows local to each instance of the table function running on a
slave.

The following example illustrates the syntax for ordering the input stream. In the
example, function f takes in rows of the kind (Region, Sales) and returns rows of the
form (Region, AvgSales), showing average sales for each region.

CREATE FUNCTION f(p ref_cursor_type) RETURN tab_rec_type PIPELINED
CLUSTER p BY Region
PARALLEL_ENABLE(PARTITION p BY Region) IS
ret_rec rec_type;
cnt number;
sum number;
BEGIN
FOR rec IN p LOOP
IF (first rec in the group) THEN
cnt := 1;
sum := rec.Sales;
ELSIF (last rec in the group) THEN
IF (cnt <> 0) THEN
ret_rec.Region := rec.Region;
ret_rec.AvgSales := sum/cnt;
PIPE ROW(ret_rec);
END IF;
ELSE
cnt := cnt + 1;
sum := sum + rec.Sales;
END IF;
END LOOP;
RETURN;
END;
14.2.7.1 OLAP
SQL applications can use the database table functions to access and manipulate data
directly in the multidimensional OLAP data cache. Alternatively, relational views
can be created for multidimensional data, which provides access to standard SQL.

The OLAP_TABLE function extracts data from the LOBs in which workspace data
has been stored and presents the result set in the format of a relational table.
OLAP_TABLE is an implementation of the PL/SQL table functions.

The OLAP_TABLE function can be used in a SQL SELECT statement instead of, or
in addition to, the names of relational tables and views. It presents fully solved data
that is either stored or calculated in an analytic workspace. OLAP_TABLE accepts
parameters that are passed to the OLAP engine, which selects, manipulates, and
returns the data. The WHERE clause of a SELECT statement that includes a call to
OLAP_TABLE only needs to identify the result set; it does not need to perform any
calculations. If it does include calculations, they will be performed by the SQL
engine, not the OLAP engine.

SELECT statements that use OLAP_TABLE can be used during database
maintenance to create relational views, and they can be used interactively to fetch
data directly into an application.

Oracle OLAP runs within the Oracle database kernel. An Oracle OLAP session is
always connected to the database. You do not open a connection with the database as
a separate or optional step.

You can move data between analytic workspace objects (such as variables and
dimensions) and relational tables in the following ways:
The OLAP DML's SQL command fetches data into dimensions and variables for
further manipulation. A new SQL IMPORT command facilitates bulk data transfer
from relational tables into the analytic workspace, and a new SQL INSERT DIRECT
command facilitates data transfer from the analytic workspace into relational tables.
A PL/SQL package, CWM2_OLAP_AW_CREATE, provides procedures for
creating an analytic workspace from relational tables and OLAP Catalog metadata,
and for generating views of the workspace.
Using SQL table functions, it is now possible for a SQL-based application to
manipulate and extract data from an analytic workspace. Express Server did not
permit a data transfer to be initiated externally.

ODBC is not available, and thus access to third-party databases is not available
directly from Oracle OLAP.

Oracle Express Relational Access Administrator and Oracle Express Relational
Access Manager are not available.
14.2.8 Generic datatypes
Oracle has three special SQL datatypes that enable you to dynamically encapsulate
and access type descriptions, data instances, and sets of data instances of any other
SQL type, including object and collection types. You can also use these three special
types to create anonymous (that is, unnamed) types, including anonymous collection
types. The types are SYS.ANYTYPE, SYS.ANYDATA, and SYS.ANYDATASET.
The SYS.ANYDATA type can be useful in some situations as a return value from
table functions.
See Also:
Oracle9i Supplied PL/SQL Packages and Types Reference for information about the
interfaces to the ANYTYPE, ANYDATA, and ANYDATASET types and about the
DBMS_TYPES package for use with these types.
14.3 Performance
14.3.1 Returning Large Amounts of Data from a Function

In a data warehousing environment, you might use a PL/SQL function to transform
large amounts of data. Perhaps the data is passed through a series of transformations,
each performed by a different function. In the past, such transformations required
either significant memory overhead, or storing the data in tables between each stage
of the transformation.

A low-overhead way to perform such transformations is to use PL/SQL table
functions. These functions can accept and return multiple rows, can return rows as
they are ready rather than all at once, and can be parallelized.

In this technique:
The producer function uses the PIPELINED keyword in its declaration.
The producer function uses an OUT parameter that is a record, corresponding to a
row in the result set.
As each output record is completed, it is sent to the consumer function using the
PIPE ROW keyword, as explained previously in the PIPE ROW topic.
14.3.2 Optimizing Multiple Calls to Table Functions

Multiple invocations of a table function, either within the same query or in separate
queries result in multiple executions of the underlying implementation. By default,
there is no buffering or reuse of rows.

For example,

SELECT * FROM TABLE(f(...)) t1, TABLE(f(...)) t2
WHERE t1.id = t2.id;

SELECT * FROM TABLE(f());
SELECT * FROM TABLE(f());

However, if the output of a table function is determined solely by the values passed
into it as arguments, such that the function always produces exactly the same result
value for each respective combination of values passed in, you can declare the
function DETERMINISTIC, and Oracle will automatically buffer rows for it. Note,
though, that the database has no way of knowing whether a function marked
DETERMINISTIC really is DETERMINISTIC, and if one is not, results will be
unpredictable.
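A minimal sketch of the declaration, reusing the numset_t type defined in example
14.4.4 below; the function body is only illustrative:
create or replace function f_squares( n number ) return numset_t
  deterministic pipelined
is
begin
  for i in 1 .. n loop
    pipe row( i * i );
  end loop;
  return;
end;
/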
14.3.3 Parallelizing Table Functions

With parallel execution of a function that appears in the SELECT list, execution of
the function is pushed down to and conducted by multiple slave scan processes.
These each execute the function on a segment of the function's input data.

For a table function to be executed in parallel, it must have a partitioned input
parameter. Parallelism is turned on for a table function if, and only if, both the
following conditions are met:
The function has a PARALLEL_ENABLE clause in its declaration
Exactly one REF CURSOR argument is specified with a PARTITION BY
clause
If the PARTITION BY clause is not specified for any input REF CURSOR as part of
the PARALLEL_ENABLE clause, the SQL compiler cannot determine how to
partition the data correctly. Note that only strongly typed REF CURSOR arguments
can be specified in the PARTITION BY clause, unless you use PARTITION BY
ANY.

For example:

SELECT f(col1) FROM tab;

is parallelized if f is a pure function. The SQL executed by a slave scan process is
similar to:

SELECT f(col1) FROM tab WHERE ROWID BETWEEN :b1 AND :b2;

Each slave scan operates on a range of rowids and applies function f to each
contained row. Function f is then executed by the scan processes; it does not run
independently of them.

Unlike a function that appears in the SELECT list, a table function is called in the
FROM clause and returns a collection. This affects the way that table function input
data is partitioned among slave scans because the partitioning approach must be
appropriate for the operation that the table function performs. (For example, an
ORDER BY operation requires input to be range-partitioned, whereas a GROUP BY
operation requires input to be hash partitioned.)

A table function itself specifies in its declaration the partitioning approach that is
appropriate for it. (See "Input Data Partitioning".) The function is then executed in a
two-stage operation. First, one set of slave processes partitions the data as directed in
the function's declaration; then a second set of slave processes executes the table
function in parallel on the partitioned data.

For example, the table function in the following query has a REF CURSOR
parameter:

SELECT * FROM TABLE(f(CURSOR(SELECT * FROM tab)));

The scan is performed by one set of slave processes, which redistributes the rows
(based on the partitioning method specified in the function declaration) to a second
set of slave processes that actually executes function f in parallel.

The table function declaration can specify data partitioning for exactly one REF
CURSOR parameter. The syntax to do this is as follows:

The PARTITION...BY phrase in the PARALLEL_ENABLE clause specifies which
one of the input cursors to partition and what columns to use for partitioning.

When explicit column names are specified in the column list, the partitioning method
can be RANGE or HASH. The input rows will be hash- or range-partitioned on the
columns specified.

The ANY keyword indicates that the function behavior is independent of the
partitioning of the input data. When this keyword is used, the runtime system
randomly partitions the data among the slaves. This keyword is appropriate for use
with functions that take in one row, manipulate its columns, and generate output
row(s) based on the columns of this row only.

For example, the pivot-like function StockPivot shown below takes as input a row of
the type:

(Ticker varchar(4), OpenPrice number, ClosePrice number)

and generates rows of the type:

(Ticker varchar(4), PriceType varchar(1), Price number).

So the row ("ORCL", 41, 42) generates two rows ("ORCL", "O", 41) and ("ORCL",
"C", 42).

CREATE FUNCTION StockPivot(p refcur_pkg.refcur_t) RETURN rec_tab_type
PIPELINED
PARALLEL_ENABLE(PARTITION p BY ANY) IS
ret_rec rec_type;
BEGIN
FOR rec IN p LOOP
ret_rec.Ticker := rec.Ticker;
ret_rec.PriceType := 'O';
ret_rec.Price := rec.OpenPrice;
PIPE ROW(ret_rec);
ret_rec.Ticker := rec.Ticker; -- Redundant; not required
ret_rec.PriceType := 'C';
ret_rec.Price := rec.ClosePrice;
PIPE ROW(ret_rec);
END LOOP;
RETURN;
END;

The function StockPivot can be used to generate another table from the StockTable
table in the following manner:

INSERT INTO AlternateStockTable
SELECT * FROM
TABLE(StockPivot(CURSOR(SELECT * FROM StockTable)));

If the StockTable is scanned in parallel and partitioned on OpenPrice, then the
function StockPivot is combined with the data-flow operator doing the scan of
StockTable and thus sees the same partitioning.

If, on the other hand, the StockTable is not partitioned, and the scan on it does not
execute in parallel, the insert into AlternateStockTable also runs sequentially. Here is
a slightly more complex example:

INSERT INTO AlternateStockTable
SELECT *
FROM TABLE(f(CURSOR(SELECT * FROM Stocks))),
TABLE(g(CURSOR( ... )))
WHERE join_condition;

where g is defined to be:

CREATE FUNCTION g(p refcur_pkg.refcur_t) RETURN ... PIPELINED
PARALLEL_ENABLE (PARTITION p BY ANY)
BEGIN ... END;

If function g runs in parallel and is partitioned by ANY, then the parallel insert can
belong in the same data-flow operator as g.

Whenever the ANY keyword is specified, the data is partitioned randomly among the
slaves. This effectively means that the function is executed in the same slave set
which does the scan associated with the input parameter.

No redistribution or repartitioning of the data is required here. If the cursor p itself is
not parallelized, the incoming data is randomly partitioned on the columns in the
column list. The round-robin table queue is used for this partitioning.
Parallel Execution of Leaf-level Table Functions

To use parallel execution with a function that produces multiple rows, but does not
need to accept multiple rows as input and so does not require a REF CURSOR,
arrange things so as to create a need for a REF CURSOR. That way, the function will
have some way to partition the work.

For example, suppose that you want a function to read a set of external files in
parallel and return the records they contain. To provide work for a REF CURSOR,
you might first create a table and populate it with the filenames. A REF CURSOR
over this table can then be passed as a parameter to the table function (readfiles). The
following code shows how this might be done:

CREATE TABLE filetab(filename VARCHAR(20));
INSERT INTO filetab VALUES('file0');
INSERT INTO filetab VALUES('file1');
...
INSERT INTO filetab VALUES('fileN');
SELECT * FROM
TABLE(readfiles(CURSOR(SELECT filename FROM filetab)));
CREATE FUNCTION readfiles(p pkg.rc_t) RETURN coll_type PIPELINED
PARALLEL_ENABLE(PARTITION p BY ANY) IS
ret_rec rec_type;
done BOOLEAN;
BEGIN
FOR rec IN p LOOP
done := FALSE;
WHILE (done = FALSE) LOOP
done := readfilerecord(rec.filename, ret_rec); -- user-written helper
PIPE ROW(ret_rec);
END LOOP;
END LOOP;
RETURN;
END;
14.3.3.1 Choosing Between Partitioning and Clustering for Parallel
Execution

Partitioning and clustering are easily confused, but they do different things. For
example, sometimes partitioning can be sufficient without clustering in parallel
execution.

Consider a function SmallAggr that performs in-memory aggregation of salary for
each department_id, where department_id can be either 1, 2, or 3. The input rows to
the function can be partitioned by HASH on department_id such that all rows with
department_id equal to 1 go to one slave, all rows with department_id equal to 2 go
to another slave, and so on.

The input rows do not need to be clustered on department_id to perform the
aggregation in the function. Each slave could have a 1x3 array SmallSum[1..3] in
which the aggregate sum for each department_id is added in memory into
SmallSum[department_id]. On the other hand, if the number of unique values of
department_id were very large, you would want to use clustering to compute
department aggregates and write them to disk one department_id at a time.
14.4 Examples and techniques
14.4.1 The easiest example is this:

CREATE OR REPLACE TYPE TFunctionTable AS TABLE OF INTEGER;
/
CREATE OR REPLACE FUNCTION FunctionTable( nCount NUMBER)
RETURN TFunctionTable PIPELINED
IS
BEGIN
FOR nI IN 1.. nCount LOOP
PIPE ROW(nI);
END LOOP;
RETURN;
END;
/
SELECT * FROM TABLE(FunctionTable(3));
COLUMN_VALUE
--------------
1
2
3
14.4.2 Using a Pipelined Table Function as an Aggregate Function

A table function can compute aggregate results using the input ref cursor. The
following example computes a weighted average by iterating over a set of input
rows.

DROP TABLE gradereport;
CREATE TABLE gradereport (student VARCHAR2(30), subject
VARCHAR2(30), weight NUMBER, grade NUMBER);

INSERT INTO gradereport VALUES('Mark', 'Physics', 4, 4);
INSERT INTO gradereport VALUES('Mark','Chemistry', 4,3);
INSERT INTO gradereport VALUES('Mark','Maths', 3,3);
INSERT INTO gradereport VALUES('Mark','Economics', 3,4);
CREATE OR replace TYPE gpa AS TABLE OF NUMBER;
/
CREATE OR replace FUNCTION weighted_average(input_values
sys_refcursor)
RETURN gpa PIPELINED IS
grade NUMBER;
total NUMBER := 0;
total_weight NUMBER := 0;
weight NUMBER := 0;
BEGIN
-- The function accepts a ref cursor and loops through all the input
rows.
LOOP
FETCH input_values INTO weight, grade;
EXIT WHEN input_values%NOTFOUND;
-- Accumulate the weighted average.
total_weight := total_weight + weight;
total := total + grade*weight;
END LOOP;
PIPE ROW (total / total_weight);
-- The function returns a single result.
RETURN;
END;
/
show errors;
-- The result comes back as a nested table with a single row.
-- COLUMN_VALUE is a keyword that returns the contents of a nested
table.
select weighted_result.column_value from
table(weighted_average(cursor(select weight,grade from
gradereport))) weighted_result;
COLUMN_VALUE
------------
3.5
14.4.3 Using CLOB as input
The following example shows a table function GetBooks that takes a CLOB as input
and returns an instance of the collection type BookSet_t. The CLOB column stores a
catalog listing of books in some format (either proprietary or following a standard
such as XML). The table function returns all the catalogs and their corresponding
book listings.

The collection type BookSet_t is defined as:
CREATE TYPE Book_t AS OBJECT
( name VARCHAR2(100),
author VARCHAR2(30),
abstract VARCHAR2(1000));
CREATE TYPE BookSet_t AS TABLE OF Book_t;

The CLOBs are stored in a table Catalogs:

CREATE TABLE Catalogs
( name VARCHAR2(30),
cat CLOB);

Function GetBooks is defined as follows:

CREATE FUNCTION GetBooks(a CLOB) RETURN BookSet_t;

The query below returns all the catalogs and their corresponding book listings.

SELECT c.name, Book.name, Book.author, Book.abstract
FROM Catalogs c, TABLE(GetBooks(c.cat)) Book;

14.4.4 Assign the result of a table function to a PL/SQL collection
variable.
The table function is called from the SELECT list of the query, you do not need the
TABLE keyword.

create type numset_t as table of number;
/
create function f1(x number) return numset_t pipelined is
begin
for i in 1..x loop
pipe row(i);
end loop;
return;
end;
/
-- pipelined function in from clause
select * from table(f1(3));
COLUMN_VALUE
------------
1
2
3
3 rows selected.
-- pipelined function in select list
select f1(3) from dual;
F1(3)
---------------------------------
NUMSET_T(1, 2, 3)
-- Since the function returns a collection, we can assign
-- the result to a PL/SQL variable.
declare
func_result numset_t;
begin
select f1(3) into func_result from dual;
end;
/

14.4.5 To "pipe" an entire collection
To "pipe" an entire collection, you do not need to use a pipelined function at
all. Just select * from function_returning_a_collection.
create type myRecordType as object
( seq int,
a int,
b varchar2(10),
c date
)
/
create or replace type myTableType
as table of myRecordType
/
create or replace function my_func( p_inputs in number default 5 ) return
myTableType
is
l_data myTableType;
begin
l_data := myTableType();
for i in 1..p_inputs
loop
l_data.extend;
l_data(i) := myRecordType( i, mod(i,5), 'row ' || i, sysdate+i );
end loop;
return l_data;
end;
/
select *
from TABLE( cast( my_func as myTableType ) )
/
variable x number
exec :x := 5
select *
from TABLE( cast( my_func(:x) as myTableType ) )
/
14.4.6 Pipelined functions in a package
So, if it is your desire to have the types created inside the package -- that
never has been and most likely never will be "supported". We need to use SQL
types.

But -- just creating a pipelined in a package is easy:
[email protected]> create or replace type myScalarType as object
2 ( a int,
3 b date,
4 c varchar2(25)
5 )
6 /
Type created.
[email protected]>
[email protected]> create or replace type myTableType as table of
myScalarType
2 /
Type created.
[email protected]>
[email protected]>
[email protected]> create or replace package my_pkg
2 as
3 function f return myTableType PIPELINED;
4 end;
5 /
Package created.
[email protected]> create or replace package body my_pkg
2 as
3 function f return myTableType
4 PIPELINED
5 is
6 begin
7 for i in 1 .. 5
8 loop
9 pipe row ( myScalarType( i, sysdate+i, 'row ' || i
) );
10 end loop;
11 return;
12 end;
13 end;
14 /
Package body created.
[email protected]>
[email protected]> select * from table( my_pkg.f() );
A B C
---------- --------- -------------------------
1 29-JUN-02 row 1
2 30-JUN-02 row 2
3 01-JUL-02 row 3
4 02-JUL-02 row 4
5 03-JUL-02 row 5
[email protected]>
14.5 Restrictions and Errors
14.6 Features by Release
14.6.1 9.0.1
Prior to Oracle9i, PL/SQL functions:
Could not take cursors as input
Could not be parallelized or pipelined
Starting with Oracle9i, functions are not limited in these ways. Table functions
extend database functionality by allowing:
Multiple rows to be returned from a function
Results of SQL subqueries (that select multiple rows) to be passed directly to
functions
Functions take cursors as input
Functions can be parallelized
Returning result sets incrementally for further processing as soon as they are created.
This is called incremental pipelining


15 PL/SQL tricks
Hi there, here is only code; there is not too much explanation, because the goal
here is only to show examples.
15.1 How to get a table from a file
Example that exposes the alert.log as a table (sketch below).
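A minimal sketch of exposing the alert.log through an external table; the directory
path, the SID in the file name and the 400-character line width are assumptions you
must adapt:
create or replace directory bdump_dir as '/u01/app/oracle/admin/ORCL/bdump';

create table alert_log ( text varchar2(400) )
organization external
( type oracle_loader
  default directory bdump_dir
  access parameters
  ( records delimited by newline
    nobadfile nologfile nodiscardfile
    fields
    missing field values are null
    ( text position(1:400) char(400) )
  )
  location ('alertORCL.log')
)
reject limit unlimited;

select * from alert_log where text like '%ORA-%';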
15.2 Dynamic SQL
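A minimal sketch of native dynamic SQL with a bind variable (table and column
names are only illustrative):
declare
  l_cnt number;
begin
  execute immediate
    'select count(*) from emp where deptno = :x'
    into l_cnt
    using 10;
  dbms_output.put_line( 'rows: ' || l_cnt );
end;
/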
15.3 How to get an IF .. THEN in a SELECT, UPDATE, INSERT (DML)
statement
15.3.1 DECODE
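A small sketch of DECODE (assuming the classic emp/deptno values; adapt to your
data):
select ename,
       decode( deptno, 10, 'ACCOUNTING',
                       20, 'RESEARCH',
                       30, 'SALES',
                           'OTHER' ) as dept_name
  from emp;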
15.3.2 CASE
Oracle now supports simple and searched CASE statements. CASE statements are
similar in purpose to the Oracle DECODE statement, but they offer more flexibility
and logical power. They are also easier to read than traditional DECODE statements,
and offer better performance as well. They are commonly used when breaking
categories into buckets like age (for example, 20-29, 30-39, and so on). The syntax
for simple statements is:
expr WHEN comparison_expr THEN return_expr
[, WHEN comparison_expr THEN return_expr]...
The syntax for searched statements is:
WHEN condition THEN return_expr [, WHEN condition THEN
return_expr]...
You can specify only 255 arguments, and each WHEN ... THEN pair counts as two
arguments.
Using a CASE expression lets you avoid developing custom functions and can also
perform faster. The query using functions has performance implications because it
needs to invoke a function for each row. Writing custom functions can also add to
the development load.
CASE Example
Suppose you wanted to find the average salary of all employees in the company. If
an employee's salary is less than $2000, you want the query to use $2000 instead.
Without CASE expressions, you would have to write this query as follows,
SELECT AVG(foo(e.sal)) FROM emps e;
In this, foo is a function that returns its input if the input is greater than 2000, and
returns 2000 otherwise.
Using CASE expressions in the database without PL/SQL, this query can be
rewritten as:
SELECT AVG(CASE when e.sal > 2000 THEN e.sal ELSE 2000 end) FROM
emps e;

Creating Histograms With User-Defined Buckets
You can use the CASE statement when you want to obtain histograms with
user-defined buckets (both in number of buckets and width of each bucket). The
following are two examples of histograms created with CASE statements. In the first
example, the histogram totals are shown in multiple columns and a single row is
returned. In the second example, the histogram is shown with a label column and a
single column for totals, and multiple rows are returned.
SELECT
SUM(CASE WHEN cust_credit_limit BETWEEN 0 AND 3999 THEN 1 ELSE 0
END)
AS "0-3999",
SUM(CASE WHEN cust_credit_limit BETWEEN 4000 AND 7999 THEN 1 ELSE
0 END)
AS "4000-7999",
SUM(CASE WHEN cust_credit_limit BETWEEN 8000 AND 11999 THEN 1
ELSE 0 END)
AS "8000-11999",
SUM(CASE WHEN cust_credit_limit BETWEEN 12000 AND 16000 THEN 1
ELSE 0 END)
AS "12000-16000"
FROM customers WHERE cust_city='Marshal';
0-3999 4000-7999 8000-11999 12000-16000
--------- --------- ---------- -----------
6 6 4 1
SELECT
(CASE WHEN cust_credit_limit BETWEEN 0 AND 3999
THEN ' 0 - 3999'
WHEN cust_credit_limit BETWEEN 4000 AND 7999 THEN ' 4000 - 7999'
WHEN cust_credit_limit BETWEEN 8000 AND 11999 THEN ' 8000 -
11999'
WHEN cust_credit_limit BETWEEN 12000 AND 16000 THEN '12000 -
16000' END)
AS BUCKET,
COUNT(*) AS Count_in_Group
FROM customers WHERE cust_city = 'Marshal'
GROUP BY
(CASE WHEN cust_credit_limit BETWEEN 0 AND 3999
THEN ' 0 - 3999'
WHEN cust_credit_limit BETWEEN 4000 AND 7999 THEN ' 4000 - 7999'
WHEN cust_credit_limit BETWEEN 8000 AND 11999 THEN ' 8000 -
11999'
WHEN cust_credit_limit BETWEEN 12000 AND 16000 THEN '12000 -
16000'
END);
BUCKET COUNT_IN_GROUP
------------- --------------
0 - 3999 6
4000 - 7999 6
8000 - 11999 4
12000 - 16000 1


16 Working with NULL values
16.1 What is it for?
Null is another kind of value:
If your salary is 1000, you say 1000.
If your salary is 0, you say 0.
If you did not enter the salary, it is NULL (unknown).
If you ask for salary < 500, the null values will be ignored.
If you do a select count(salary) from table, it will not include the records with null
values in the column salary.
Nulls have several features you must know.
16.2 Nulls Indicate Absence of Value
A null is the absence of a value in a column of a row. Nulls indicate missing,
unknown, or inapplicable data. A null should not be used to imply any other value,
such as zero. A column allows nulls unless a NOT NULL or PRIMARY KEY
integrity constraint has been defined for the column, in which case no row can be
inserted without a value for that column.
Nulls are stored in the database if they fall between columns with data values. In
these cases they require 1 byte to store the length of the column (zero).
Trailing nulls in a row require no storage because a new row header signals that the
remaining columns in the previous row are null. For example, if the last three
columns of a table are null, no information is stored for those columns. In tables with
many columns, the columns more likely to contain nulls should be defined last to
conserve disk space.
Most comparisons between nulls and other values are by definition neither true nor
false, but unknown. To identify nulls in SQL, use the IS NULL predicate. Use the
SQL function NVL to convert nulls to non-null values.
All aggregate functions except COUNT(*) and GROUPING ignore null values.
16.3 Indexes and Nulls
Oracle does not index table rows in which all key columns are NULL, except in the
case of bitmap indexes or when the cluster key column value is NULL.
To make compressed bitmaps as small as possible, declare NOT NULL
constraints on all columns that cannot contain null values.
With normal indexes, the query must be guaranteed not to need any NULL values
from the indexed expression, because NULL values are not stored in indexes.
NULL values in indexes are considered to be distinct except when all the non-NULL
values in two or more rows of an index are identical, in which case the rows are
considered to be identical. Therefore, UNIQUE indexes prevent rows containing
NULL values from being treated as identical. This does not apply if there are no non-
NULL values--in other words, if the rows are entirely NULL.
16.4 Indexing null values
Because NULL values are not indexed,
if you search with WHERE FIELD IS NULL Oracle will do a FULL SCAN and will
not use the index: you get a read of the whole table.
But you can create a function-based index:
CREATE INDEX idxnull ON table ( NVL(FIELD,'~') ASC )
and with WHERE NVL(FIELD,'~') = '~' you get a read through the index.
16.5 Functions to work with null values
16.5.1 COALESCE(EXPR1,EXPR2,)
COALESCE returns the first non-null expr in the expression list. At least one expr
must not be the literal NULL. If all occurrences of expr evaluate to null, then the
function returns null.
The following example uses the sample oe.product_information table to
organize a "clearance sale" of products. It gives a 10% discount to all products with
a list price. If there is no list price, then the sale price is the minimum price. If there
is no minimum price, then the sale price is "5":
SELECT product_id, list_price, min_price,
COALESCE(0.9*list_price, min_price, 5) "Sale"
FROM product_information
WHERE supplier_id = 102050;
PID PRICE MIN_PRICE Sale
---------- ---------- ---------- ----------
2382 850 731 765
3355 5
1770 73 73
2378 305 247 274.5
1769 48 43.2
16.5.2 NULLIF( EXPR1, EXPR2)
NULLIF compares expr1 and expr2. If they are equal, then the function returns
null. If they are not equal, then the function returns expr1. You cannot specify the
literal NULL for expr1.
SELECT NULLIF( 1,1 ) -> null
SELECT NULLIF( 3,1 ) -> 3
16.5.3 NVL(EXPR1,EXPR2)
NVL lets you replace a null (blank) with a string in the results of a query. If expr1 is
null, then NVL returns expr2. If expr1 is not null, then NVL returns expr1. The
arguments expr1 and expr2 can have any datatype. If their datatypes are
different, then Oracle converts expr2 to the datatype of expr1 before comparing
them.
The datatype of the return value is always the same as the datatype of expr1,
unless expr1 is character data, in which case the return value's datatype is
VARCHAR2 and is in the character set of expr1.
SELECT NVL( null,1 ) -> 1
SELECT NVL( 3,1 ) -> 3
16.5.4 NVL2 (EXPR1,EXPR2,EXPR3)
NVL2 lets you determine the value returned by a query based on whether a
specified expression is null or not null. If expr1 is not null, then NVL2 returns
expr2. If expr1 is null, then NVL2 returns expr3. The argument expr1 can have
any datatype. The arguments expr2 and expr3 can have any datatypes except
LONG.
If the datatypes of expr2 and expr3 are different, then Oracle converts expr3 to
the datatype of expr2 before comparing them unless expr3 is a null constant. In
that case, a datatype conversion is not necessary.
The datatype of the return value is always the same as the datatype of expr2,
unless expr2 is character data, in which case the return value's datatype is
VARCHAR2.
SELECT NVL2( null,1,2 ) -> 2
SELECT NVL2( 3,1,2 ) -> 1
16.5.5 IS NULL & IS NOT NULL and other operators
IS NULL and IS NOT NULL are operators to identify null and non null values.
If A is: Condition Evaluates to:
10 a IS NULL FALSE
10 a IS NOT NULL TRUE
NULL a IS NULL TRUE
NULL a IS NOT NULL FALSE
10 a = NULL UNKNOWN
10 a != NULL UNKNOWN
NULL a = NULL UNKNOWN
NULL a != NULL UNKNOWN
NULL a = 10 UNKNOWN
NULL a != 10 UNKNOWN
x Y x AND y x OR y NOT x
TRUE TRUE TRUE TRUE FALSE
TRUE FALSE FALSE TRUE FALSE
TRUE NULL NULL TRUE FALSE
FALSE TRUE FALSE TRUE TRUE
FALSE FALSE FALSE FALSE TRUE
FALSE NULL FALSE NULL TRUE
NULL TRUE NULL TRUE NULL
NULL FALSE FALSE NULL NULL
NULL NULL NULL NULL NULL
16.6 Handling Null Values in Comparisons and Conditional
Statements
When working with nulls, you can avoid some common mistakes by keeping in mind
the following rules:
Comparisons involving nulls always yield NULL
Applying the logical operator NOT to a null yields NULL
In conditional control statements, if the condition yields NULL, its
associated sequence of statements is not executed
If the expression in a simple CASE statement or CASE expression yields
NULL, it cannot be matched by using WHEN NULL. In this case, you
would need to use the searched case syntax and test WHEN expression IS
NULL.
In the example below, you might expect the sequence of statements to
execute because x and y seem unequal. But, nulls are indeterminate.
Whether or not x is equal to y is unknown. Therefore, the IF condition
yields NULL and the sequence of statements is bypassed.
x := 5;
y := NULL;
IF x != y THEN -- yields NULL, not TRUE
sequence_of_statements; -- not executed
END IF;
In the next example, you might expect the sequence of statements to execute because
a and b seem equal. But, again, that is unknown, so the IF condition yields NULL
and the sequence of statements is bypassed.
a := NULL;
b := NULL;
...
IF a = b THEN -- yields NULL, not TRUE
sequence_of_statements; -- not executed
END IF;
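The same indeterminacy affects a simple CASE: the WHEN NULL branch can never match, because the implicit comparison yields NULL. A minimal sketch of the last rule of this section (the variable name and messages are only for illustration):

declare
  grade char(1) := NULL;
begin
  -- simple CASE: WHEN NULL never matches, because grade = NULL is unknown
  case grade
    when null then dbms_output.put_line('never reached');
    else dbms_output.put_line('simple CASE falls through to ELSE');
  end case;
  -- searched CASE: test for null explicitly
  case
    when grade is null then dbms_output.put_line('searched CASE handles the null');
    else dbms_output.put_line('not null');
  end case;
end;
/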
16.7 Table Constraints
16.7.1 Primary Key constraints
By definition, a primary key constraint is a unique identifier for the row, so it does
not accept nulls. You cannot identify a record when any of the fields that compose
the primary key holds an "I don't know" value.
16.7.2 Unique Constraints
A unique constraint allows an unlimited number of rows in which all of its columns
are null, because entirely-null keys are not indexed. If only some of the columns are
null, the row is still indexed, and (as described above) another row with the same
non-null values and nulls in the same columns is rejected as a duplicate.
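A minimal sketch of that behaviour (the table and constraint names are hypothetical):

create table uq_demo ( a number, b number, constraint uq_demo_uk unique (a, b) );
insert into uq_demo values ( null, null );  -- accepted
insert into uq_demo values ( null, null );  -- accepted again: entirely-null keys are not indexed
insert into uq_demo values ( 1, null );     -- accepted
insert into uq_demo values ( 1, null );     -- fails with ORA-00001: same non-null value, null in the same column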
16.8 LOBs
If a LOB column is NULL, no data blocks are used to store the information. The
NULL value is stored in the row just like any other NULL value. This is true even
when you specify DISABLE STORAGE IN ROW for the LOB.
If a LOB column is initialized with EMPTY_CLOB() or EMPTY_BLOB(), instead
of NULL, a LOB locator is stored in the row. No additional storage is used.
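A minimal sketch contrasting the two initializations (the table and column names are hypothetical):

create table doc_store ( id number, body clob );
insert into doc_store values ( 1, null );          -- plain NULL, stored like any other null value
insert into doc_store values ( 2, empty_clob() );  -- empty LOB, a LOB locator is stored in the row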
16.9 Examples
16.9.1 Including null values in a query or condition
If you want, for example, to query the records with salary < 100 and also those
where the salary is null, you must add a condition:
SELECT * FROM table_name WHERE salary < 100 OR salary IS NULL
16.9.2 Update Statements and null values
When updating a column with an update-statement, the value of some records
(records that don't need to be updated), are changed into the value NULL. I use
the next statement:
update table name B
set columnname =
( select value
from lookup O
where B.keyname = O.keyname
and O.Othercolumn = Other_value);
As a result all the necessary changes are made, but also the records that don't
need to be updated: they get the Null-value. Is there a way of avoiding this,
because we do need to update the records frequently, but not all records at the
same time.
Is there a kind of workaround we can use for updating the records that need to
be updated without changing the other records too with a Null value?
and we said...
There are at least 2 ways to perform this sort of co-related update correctly.
I'll show my preferred method (update a join) and then another method that'll
work if you cannot put a unique constraint on LOOKUP(keyname) (which is needed
for the join update).
Here are the test tables:
[email protected]> create table name
2 ( keyname int,
3 columnName varchar2(25)
4 )
5 /
Table created.
[email protected]> create table lookup
2 ( keyname int PRIMARY KEY,
3 value varchar2(25),
4 otherColumn int
5 )
6 /
Table created.

[email protected]> insert into name values ( 100, 'Original Data' );
1 row created.

[email protected]> insert into name values ( 200, 'Original Data' );
1 row created.

[email protected]> insert into lookup values ( 100, 'New Data', 1 );
1 row created.

[email protected]> commit;
Commit complete.


here is the "other_value" parameter you are using in the above update you
attempted...

[email protected]> variable other_value number
[email protected]> exec :other_value := 1
PL/SQL procedure successfully completed.

[email protected]> select * from name;

KEYNAME COLUMNNAME
---------- -------------------------
100 Original Data
200 Original Data


Here we update a join. We can only modify the columns in one of the tables
and the other tables we are *NOT* modifying must be "key preserved" -- that is,
we must be able to verify that at most one record will be returned when we join
NAME to this other table. In order to do that, keyname in LOOKUP must either be
a primary key or have a unique constraint applied to it...

[email protected]> update
2 ( select columnName, value
3 from name, lookup
4 where name.keyname = lookup.keyname
5 and lookup.otherColumn = :other_value )
6 set columnName = value
7 /

1 row updated.

[email protected]> select * from name;

KEYNAME COLUMNNAME
---------- -------------------------
100 New Data
200 Original Data

See, the other data is untouched and only the rows we wanted are updated..

[email protected]> rollback;
Rollback complete.

[email protected]> select * from name;

KEYNAME COLUMNNAME
---------- -------------------------
100 Original Data
200 Original Data

Now, this way will work with no constraints on anything -- you do not need the
primary key/unique constraint on lookup (but you better be sure the subquery
returns 0 or 1 records!).

It is very much like your update, just has a where clause so that only rows that
we find matches for are actually updated...

[email protected]> update name
2 set columnName = ( select value
3 from lookup
4 where lookup.keyname = name.keyname
5 and otherColumn = :other_value )
6 where exists ( select value
7 from lookup
8 where lookup.keyname = name.keyname
9 and otherColumn = :other_value )
10 /

1 row updated.

[email protected]> select * from name;

KEYNAME COLUMNNAME
---------- -------------------------
100 New Data
200 Original Data
17 Getting more from a Query: Analytic functions
1 What it's for?
2 Processing Order for analytic functions
3 Syntax
3.1 Analytic functions
4 Query partition clause
5 Order by clause
6 Windowing clause
7 More Examples
8 Restrictions and Errors
9 Features by Release
10 Bugs by Release
11 Bibliography
17.1 What it's for?
Analytic functions compute an aggregate value based on a group of rows but, unlike
aggregate functions, they return a value for every row rather than one row per group.
You can use them to:
Relate the current row to other rows within the same SELECT without a
join clause.
In the same SELECT clause, analyze the data with different orderings and
groupings.
They enable you to calculate:
Rankings and percentiles
Moving window calculations
Lag/lead analysis
First/last analysis
Linear regression statistics
17.2 Processing Order for analytic functions
1) All joins, WHERE, GROUP BY and HAVING
2) Analytic functions
3) ORDER BY
Note. Analytic functions can be parallelized.
17.3 Syntax
The general form is
analytic_function( arguments ) OVER ( analytic_clause )
where the analytic_clause is composed of
[ query_partition_clause ] [ order_by_clause [ windowing_clause ] ]
In one SELECT you can execute several analytic functions, each running over its own
specific group of records in its own specific order.
Note: the examples below are run over the table TEST
FA FB
---------- ----------
A 1
A 3
A 2
B 4
B 5
B (NULL)
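A minimal script to build that test table, assuming FA is a character column and FB a number (the names and types are inferred from the output above):

create table test ( fa varchar2(1), fb number );
insert into test values ( 'A', 1 );
insert into test values ( 'A', 3 );
insert into test values ( 'A', 2 );
insert into test values ( 'B', 4 );
insert into test values ( 'B', 5 );
insert into test values ( 'B', null );
commit;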
17.3.1 Analytic functions
You have the following functions
AVG * average value
CORR * coefficient of correlation of a set of number pairs.
COVAR_POP * population covariance of a set of number pairs.
COVAR_SAMP * sample covariance of a set of number pairs.

COUNT * number of rows

CUME_DIST cumulative distribution of a value in a group of
values.

DENSE_RANK rank of a row in an ordered group of rows.
FIRST first Row
FIRST_VALUE * first value
LAG provides access to previous rows from the current
row without a self join.
LAST Last Row
LAST_VALUE * Last value
LEAD provides access to following rows from the current
row without a self join.
MAX * maximum
MIN * minimum
NTILE divides an ordered dataset into the number of
buckets indicated by its argument
PERCENT_RANK is similar to the CUME_DIST
PERCENTILE_CONT an inverse distribution function that assumes a
continuous
distribution model.
PERCENTILE_DISC inverse distribution function that assumes a
discrete
distribution model.
RANK rank of a value in a group of values.
RATIO_TO_REPORT computes the ratio of a value in relation to a set of
values.
REGR_ linear regression functions
ROW_NUMBER assigns a unique number to each row to which it is
applied
STDDEV * sample standard deviation of expr, a set of
numbers.
STDDEV_POP * population standard deviation
STDDEV_SAMP * cumulative sample standard deviation
SUM * sum
VAR_POP * population variance
VAR_SAMP * sample variance
VARIANCE * returns variance
(*) allow the full syntax, including the windowing_clause.

You can specify OVER analytic_clause with user-defined
analytic functions as well as built-in analytic functions.
17.4 Query partition clause
The query partition clause lets you establish different partitions that are analyzed
independently.
For example to get the sum over the partition identified by column FA
select fa,fb, sum(fb) over( partition by fa ) sum
from test;
FA FB SUM
--------------------------------
A 1 6 = 1+3+2
A 3 6
A 2 6
B 4 9 = 4+5+NULL
B 5 9
B 9
NOTE: until you specify an ORDER BY you do not get a cumulative (running) sum; every row in the partition shows the same total.
17.5 Order by clause
ORDER BY specifies the order in which the data is analyzed. Once you include an
ORDER BY clause, the function is evaluated row by row as a running calculation (by
default the window goes from the start of the partition to the current row), and the
ordering you choose completely changes the results you get.
Within each function, you can specify multiple ordering expressions. Doing so is
especially useful when using functions that rank values.

select fa,fb, sum(fb) over( partition by fa order by fb desc ) sum
from test;
FA FB SUM
-------------------------------
A 3 3
A 2 5
A 1 6
B
B 5 5
B 4 9
select fa,fb, sum(fb) over( partition by fa order by fb ASC nulls first ) sum
from test;
FA FB SUM
-----------------------------
A 1 1
A 2 3
A 3 6
B
B 4 4
B 5 9

Now using multiple ordering clauses, one per analytic function:
select fa,fb,
sum(fb) over( partition by fa order by fb desc NULLS LAST ) sumA,
sum(fb) over( partition by fa order by fb ASC nulls first ) sumB
from test
ORDER BY FA,FB;

FA FB SUMA SUMB
A 1 6 1
A 2 5 3
A 3 3 6
B 4 9 4
B 5 5 9
B 9
SIBLINGS,
If you specify a hierarchical ( CONNECT BY ) query and also specify the ORDER
BY clause, then the ORDER BY clause takes precedence over any ordering specified
by the hierarchical query, unless you specify the SIBLINGS keyword in the ORDER
BY clause.
NULLS FIRST|LAST
you can choose to specify the position of the nulls.
17.6 Windowing clause
You can define a sliding window of data. This window determines the range of rows
used to perform the calculations for the current row. Window sizes can be based on
either a physical number of rows or a logical interval such as time. The window has a
starting row and an ending row. Depending on its definition, the window may move
at one or both ends.
ROWS. You specify the window as a physical number of rows.
RANGE. You specify a logical offset, which must be a number or a date/interval value,
because it is subtracted (PRECEDING) or added (FOLLOWING) to the value of the
ORDER BY expression.

BETWEEN allows you to specify both ends of the window:
UNBOUNDED PRECEDING specifies the first record of the partition
CURRENT ROW specifies the current record
n PRECEDING specifies the number of records (ROWS) or the value offset (RANGE) before the current row
n FOLLOWING specifies the number of records or the value offset after the current row
UNBOUNDED FOLLOWING specifies the last record of the partition

We filter the query to show only odd values of FB, to make the difference between ROWS
and RANGE visible.
In this example you can compare the distinct results you can get.
select fa,fb,
sum(fb) over( partition by fa order by fb desc ROWS BETWEEN CURRENT ROW
AND CURRENT ROW) sumROWC,
sum(fb) over( partition by fa order by fb desc RANGE BETWEEN CURRENT
ROW AND CURRENT ROW) sumRANC,

sum(fb) over( partition by fa order by fb desc ROWS BETWEEN UNBOUNDED
PRECEDING AND UNBOUNDED FOLLOWING) sumROW,
sum(fb) over( partition by fa order by fb desc RANGE BETWEEN UNBOUNDED
PRECEDING AND UNBOUNDED FOLLOWING) sumRAN,

sum(fb) over( partition by fa order by fb desc ROWS BETWEEN 1 PRECEDING
AND CURRENT ROW ) sum1ROWP,
sum(fb) over( partition by fa order by fb desc RANGE BETWEEN 1 PRECEDING
AND CURRENT ROW ) sum1RANP,

sum(fb) over( partition by fa order by fb desc ROWS BETWEEN CURRENT ROW
AND 1 FOLLOWING ) sum1ROWF,
sum(fb) over( partition by fa order by fb desc RANGE BETWEEN CURRENT
ROW AND 1 FOLLOWING ) sum1RANF

from test
where mod(fb,2) = 1
order by fa,fb;

FA FB ROWC RANC SUROW SURAN SU1ROP SU1RAP SU1ROF SURAF

17.7 Examples
17.7.1 Why use analytic functions
Analytic functions make possible what was either

o hard to code (relational theory is awesome but sometimes people have jobs to
do, reports to crank out)
o impractical to perform for performance reasons

In "theory" many things are "bad", in practice -- they are necessary. (i'm not
a big fan of "theory", I'm a big fan of getting things done, done well, done
fast)
Try to compute a 5 day moving average sometime using relational theory. Doable
but not practical.

You yourself violate relational theory above, you are using our pseudo column
rownum which assigns order to rows in a tuple. That's a big bad no-no. You are
relying on the ordering of rows in a set -- another Oracle non-pure extension
(order by in a subquery -- that's a huge no-no).... You've broken many rules.
Ok, after looking at your query, I deduce that it simply shows the sum of the
salary of

a) the row "in front of" or "preceding" the current row
b) the current row
c) the row "after" or "following" the current row

So, I loaded up 1,000 employees (i tried with 30,000 but I got really bored
waiting for the answer...) and ran the queries:

[email protected]> create table emp ( empno int NOT NULL, sal
int NOT NULL );
Table created.

[email protected]>
[email protected]> insert /*+ append */
2 into emp
3 select rownum, rownum from all_objects
4 where rownum <= 1000;
1000 rows created.

[email protected]> commit;
Commit complete.

[email protected]> select count(*) from emp;

COUNT(*)
----------
1000

[email protected]>
[email protected]> alter table emp add constraint emp_pk primary
key(empno);
Table altered.

[email protected]> create index emp_idx_one on emp(empno,sal);
Index created.

[email protected]> create index emp_idx_two on emp(sal,empno);
Index created.

[email protected]>
[email protected]> analyze table emp compute statistics
2 for table
3 for all indexes
4 for all indexed columns;
Table analyzed.

[email protected]>
[email protected]> alter session set sql_trace=true;


I overindexed just to give every possible index to your query it might want.
Upon running TKPROF to review the result of your query vs the SAME query using
analytic functions we find:

select empno,
(select sum(sal) from
(select rownum rn, sal from
(select sal from emp order by -sal)
)
where center-1<=rn and rn<=center+1 ) x
from (select rownum center, empno, sal from
(select empno, sal from emp order by -sal)
)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 68 14.22 14.19 0 4004 4004 1000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 70 14.23 14.20 0 4004 4004 1000

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 63

Rows Row Source Operation
------- ---------------------------------------------------
1000 VIEW
1000 COUNT
1000 VIEW
1000 SORT ORDER BY
1000 INDEX FAST FULL SCAN (object id 23779)

select empno,
sum(sal) over (order by sal desc
rows between 1 preceding and 1 following ) x
from emp

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 68 0.03 0.03 0 4 0 1000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 70 0.03 0.03 0 4 0 1000

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 63

Rows Row Source Operation
------- ---------------------------------------------------
1000 WINDOW SORT
1000 INDEX FULL SCAN (object id 23780)


So, which query would YOU want to use in the real world??

I find the second query using analytic functions not only faster but more
intuitive. No fancy "order and assign a row number, find the row numbers around
my rownumber and add them up". Just a simple "sum the salary of the preceding
and following row" period.

Much much faster, easier to read -- easier to comprehend and code. That's why I
think analytic functions RULE.

Larry Ellison is doing this "America's Cup" boat thing. When he bought the boat
it ran on "SQLServer" (the boat is fully instrumented to capture hundreds of
thousands of pieces of data from windspeed and direction to water related
things, the fullness of the sails, everything). The app they had was so complex
and used so many products there was one guy in New Zealand that knew how to set
it up and run it.

Our group moved it into Oracle, we use nothing but SQL and a simple HTML
interface and they get every bit of functionality they had plus tons more. No
need for a complex, external OLAP server -- good old SQL with analytic functions
does it all (and more). I myself find less moving parts = greater chance of
success. The more functionality we have in the database, the better off we are.

As for:
"Don't you think that adding new features into the language distract users from
being able to leverage existing ones?"

I'd say -- NO WAY (very loudly). Look at languages like Java with J2EE and J2SE
and the base java language and jdk 1.1, 1.2, 1.3, etc etc etc. Should that
language "stand still" or should it keep moving (i say move). Same with the
database - its when you stand still that you become obsolete. Analytic
functions allow us to do things that were not feasible before, required dumping
the data OUT into some other "server", learning some other environment,
programming some other environment.

Here at least all the developers have to learn is the database. They don't have
to learn the database, its apis, some OLAP tool, its apis and nuances, some
OTHER tool that does stuff differently and so on.

Just my 2cents worth. I appreciate the functionality the analytic functions
give us (so much so I did an entire chapter on them in my book, I felt they were
that important -- more benchmarks in there comparing performance)...
17.7.2 Excluding the minimum and maximum from an average
Well, with 8.1.6 and up we can do this with the analytic functions, for example:
[email protected]> select x,
2 first_value(x) over (order by x ) first,
3 first_value(x) over (order by x desc NULLS last ) last
4 from t
5 /
X FIRST LAST
---------- ---------- ----------
1000 1 1000
1000 1 1000
1000 1 1000
1000 1 1000
50 1 1000
1 1 1000
1 1 1000
1 1 1000
1 1 1000
9 rows selected.

[email protected]> select avg(x)
2 from ( select x,
3 first_value(x) over (order by x ) first,
4 first_value(x) over (order by x desc NULLS last ) last
5 from t
6 )
7 where x <> first
8 and x <> last
9 /
AVG(X)
----------
50
After throwing out 1 and 1000 we are left with an average of 50...

In 8.0 and before, we could:
[email protected]> select avg(x)
2 from t
3 where x <> ( select max(x) from t )
4 and x <> ( select min(x) from t )
5 /
AVG(X)
----------
50

as well (this will work in 816 and up as well).

Which one performs better will be a matter of the question being asked. In some
cases, the one with the subquery will be better. In order to demonstrate this I
set up a small test:

create table t
as
select all_objects.*, object_id x from all_objects;

alter table t add constraint t_pk primary key(x);

analyze table t compute statistics
for table
for all indexes
for all indexed columns
/

And then ran two sets of equivalent queries against it:

select avg(x)
from t
where x <> ( select min(x) from t )
and x <> ( select max(x) from t )

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.09 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.07 0.06 0 39 4 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.08 0.15 0 39 4 1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 35

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
16841 FILTER
16844 INDEX FAST FULL SCAN (object id 18322)
2 SORT AGGREGATE
1 INDEX FULL SCAN (MIN/MAX) (object id 18322)
2 SORT AGGREGATE
1 INDEX FULL SCAN (MIN/MAX) (object id 18322)
***************************************************************

select avg(x)
from ( select x,
first_value(x) over (order by x ) first,
first_value(x) over (order by x desc NULLS last ) last
from t
)
where x <> first and x <> last
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.03 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.97 0.97 0 35 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.97 1.00 0 35 0 1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 35

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
16841 VIEW
16843 WINDOW SORT
16843 WINDOW BUFFER
16843 INDEX FULL SCAN (object id 18322)

Here the subquery was much more efficient than the analytic function. This is
due to the fact that the index could be used to very quickly answer the
subqueries and then the index could be used efficiently to answer the entire
query. With the analytic function -- it was not as effective. Changing the
question just a little -- to find the average of x with some other where clause
involved:

select avg(x)
from t
where x <> ( select min(x) from t where object_name like 'A%' )
and x <> ( select max(x) from t where object_name like 'A%' )
and object_name like 'A%'

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.01 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.11 0.10 0 738 45 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.11 0.11 0 738 45 1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 35

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
275 FILTER
278 TABLE ACCESS FULL T
2 SORT AGGREGATE
277 TABLE ACCESS FULL T
2 SORT AGGREGATE
277 TABLE ACCESS FULL T
*******************************************************************

select avg(x)
from ( select x,
first_value(x) over (order by x ) first,
first_value(x) over (order by x desc NULLS last ) last
from t
where object_name like 'A%'
)
where x <> first
and x <> last

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.05 0.05 0 246 15 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.05 0.05 0 246 15 1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 35

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
275 VIEW
277 WINDOW SORT
277 WINDOW SORT
277 TABLE ACCESS FULL T

Now the analytic function starts to become more effective -- since the index
was not nearly as useful this time.
17.7.3 Updates
You wouldn't use analytic functions for that -- no. You cannot update an
"ordered" result set and analytic functions involve ordering.

[email protected]> update ( select ename,
2 first_value(sal) over ( partition by deptno order by sal ) new_sal,
3 sal
4 from emp )
5 set sal = new_sal
6 /
update ( select ename,
*
ERROR at line 1:
ORA-01732: data manipulation operation not legal on this view

for this though, I would use a correlated subquery:

update t
set trans_date = ( select max(trans_date)
from t t2
where flag = 1
and t2.trans_date < t.trans_date )
where flag is null
/
17.7.4 Exploring window ranges with FIRST_VALUE and LAST_VALUE
I find setting up simple examples and testing to see how they work to be useful
- maybe that would work for you? Eg: I did little tests like this:

[email protected]> select ename,
2 sal,
3 first_value(ename) over ( order by sal
4 range between 300 preceding and 300 following ) eprec,
5 first_value(sal) over ( order by sal
6 range between 300 preceding and 300 following ) sprec,
7 last_value(ename) over ( order by sal
8 range between 300 preceding and 300 following ) efoll,
9 last_value(sal) over ( order by sal
10 range between 300 preceding and 300 following ) sfoll
11 from emp
12 /

ENAME SAL EPREC SPREC EFOLL SFOLL
---------- ---------- ---------- ---------- ---------- ----------
SMITH 800 SMITH 800 ADAMS 1100

that shows when you take SMITH as the current row and add + or - 300 to their
salary (range = 500 .. 1100), the "lowest" salary (sprec) is 800 in that window
and the highest (sfoll) is 1100


JAMES 950 SMITH 800 WARD 1250

when you do the same to james, the window is 650..1250. In that range, the
lowest sal is SMITH's at 800 and the highest is WARD's at 1250

ADAMS 1100 SMITH 800 MILLER 1300

now adams with 1100, the range is 800..1400 -- do the same.... and so on

MARTIN 1250 JAMES 950 TURNER 1500
WARD 1250 JAMES 950 TURNER 1500
MILLER 1300 ADAMS 1100 ALLEN 1600
TURNER 1500 MARTIN 1250 ALLEN 1600
ALLEN 1600 MILLER 1300 ALLEN 1600
CLARK 2450 CLARK 2450 CLARK 2450
BLAKE 2850 BLAKE 2850 SCOTT 3000
JONES 2975 BLAKE 2850 SCOTT 3000
FORD 3000 BLAKE 2850 SCOTT 3000
SCOTT 3000 BLAKE 2850 SCOTT 3000
KING 5000 KING 5000 KING 5000
14 rows selected.
17.7.5 Top 3 salaries plus the sum of the rest
I want to show the top 3 sal and sum the rest, something like
select ename, sal
from (select ename, sal
from emp
order by sal desc)
where rownum <= 3
union
select 'REST', sum(sal)
from (select ename, sal, rownum numrow
from emp
order by sal desc)
where numrow > 3
order by 2
Followup:
[email protected]> select decode( rn, 1, ename, 2, ename, 3, ename,
'REST' ), sum(sal)
2 from ( select row_number() over ( order by sal desc ) rn, ename, sal
3 from emp
4 )
5 group by decode( rn, 1, ename, 2, ename, 3, ename, 'REST' )
6 order by 2
7 /
DECODE(RN, SUM(SAL)
---------- ----------
ford 3000
scott 3000
king 5000
REST 18025
17.7.6 Getting the row that goes with a MAX
Let's say you are working with the EMP table and want to get the DEPTNO and
MAX(sal) by deptno and would like to get the ENAME and HIREDATE that go with
that....
FIRST -- we have to assume that CUST_CODE,INV_AMT is unique or we will get >1
record back for a given cust_code (I mean, if cust_code = 100 has two entries
with inv_amt = 1000000 -- then there are 2 invoice_numbers and 2 invoice_dates
with the MAX!! so be prepared for that!!)
Ok, we can:

[email protected]> select deptno, sal, ename, hiredate
2 from emp
3 where sal = ( select max(sal) from emp e2 where e2.deptno = emp.deptno )
4 order by deptno
5 /

DEPTNO SAL ENAME HIREDATE
---------- ---------- ---------- ---------
10 5000 KING 17-NOV-81
20 3000 SCOTT 09-DEC-82
20 3000 FORD 03-DEC-81
30 2850 BLAKE 01-MAY-81

[email protected]>
[email protected]> select deptno,
2 to_number(substr( max_sal, 1, 10 )) sal,
3 substr( max_sal, 20 ) ename,
4 to_date( substr( max_sal, 11, 9 ) ) hiredate
5 from (
6 select deptno, max( to_char( sal,'fm0000000009') || hiredate || ename )
max_sal
7 from emp
8 group by deptno
9 )
10 /

DEPTNO SAL ENAME HIREDATE
---------- ---------- ----------- ---------
10 5000 KING 17-NOV-81
20 3000 SCOTT 09-DEC-82
30 2850 BLAKE 01-MAY-81

[email protected]>
[email protected]> select distinct deptno,
2 first_value(sal) over ( partition by deptno order by SAL desc
nulls last ) sal,
3 first_value(ename) over ( partition by deptno order by SAL desc
nulls last ) ename,
4 first_value(hiredate) over ( partition by deptno order by SAL
desc nulls last )
5 hiredate
6 from emp
7 order by deptno
8 /
DEPTNO SAL ENAME HIREDATE
---------- ---------- ---------- ---------
10 5000 KING 17-NOV-81
20 3000 FORD 03-DEC-81
30 2850 BLAKE 01-MAY-81
[email protected]>

the totally interesting thing here is that all three queries are correct
(given your problem statement) but they all return different answers given the
same data!!!

(this is one of my favorite problems in SQL, dealing with top-n's, max's and
such -- I use questions like this in interviews all of the time to see if people
can catch this subtle problem, most do not...)

So, which one is "best"? That's in the eye of the beholder. I would probably use
the analytic functions (first_value) as they are by far the cleanest approach
and offer immense functionality once you learn them.

the one that uses max( to_char(sal,'fm.......) is sort of a trick. You encode
the value you want to max in a fixed length string, glue the rest of the data
onto the end of it and then substr it back out. Very effective, can be very
performant (but the analytic functions are typically faster still).

The last one is the traditional set based, sql approach to the problem. Boring
but tried and true. It works, can perform well in the right circumstances
(proper indexes, right amount of data and such)....

There are other approaches as well -- this just gives you a flavor of what you
can do.

17.7.7 How the [ASC | DESC] option changes the ranking order
The following example shows how the [ASC | DESC] option changes the ranking
order.
SELECT channel_desc,
TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$,
RANK() OVER (ORDER BY SUM(amount_sold) ) AS default_rank,
RANK() OVER (ORDER BY SUM(amount_sold) DESC NULLS LAST) AS
custom_rank
FROM sales, products, customers, times, channels
WHERE sales.prod_id=products.prod_id AND
sales.cust_id=customers.cust_id AND
sales.time_id=times.time_id AND
sales.channel_id=channels.channel_id AND
times.calendar_month_desc IN ('2000-09', '2000-10')
AND country_id='US'
GROUP BY channel_desc;
CHANNEL_DESC SALES$ DEFAULT_RANK CUSTOM_RANK
-------------------- -------------- ------------ -----------
Direct Sales 5,744,263 5 1
Internet 3,625,993 4 2
Catalog 1,858,386 3 3
Partners 1,500,213 2 4
Tele Sales 604,656 1 5
While the data in this result is ordered on the measure SALES$, in general, it is not
guaranteed by the RANK function that the data will be sorted on the measures. If
you want the data to be sorted on SALES$ in your result, you must specify it
explicitly with an ORDER BY clause, at the end of the SELECT statement.
Ranking functions need to resolve ties between values in the set. If the first
expression cannot resolve ties, the second expression is used to resolve ties and so
on. For example, here is a query ranking four of the sales channels over two months
based on their dollar sales, breaking ties with the unit sales. (Note that the TRUNC
function is used here only to create tie values for this query.)
SELECT channel_desc, calendar_month_desc,
TO_CHAR(TRUNC(SUM(amount_sold),-6), '9,999,999,999') SALES$,
TO_CHAR(SUM(quantity_sold), '9,999,999,999') SALES_Count,
RANK() OVER (ORDER BY trunc(SUM(amount_sold), -6) DESC,
SUM(quantity_sold)
DESC) AS col_rank
FROM sales, products, customers, times, channels
WHERE sales.prod_id=products.prod_id AND
sales.cust_id=customers.cust_id AND
sales.time_id=times.time_id AND
sales.channel_id=channels.channel_id AND
times.calendar_month_desc IN ('2000-09', '2000-10') AND
channels.channel_desc<>'Tele Sales'
GROUP BY channel_desc, calendar_month_desc;
CHANNEL_DESC CALENDAR SALES$ SALES_COUNT COL_RANK
-------------------- -------- -------------- -------------- -----
Direct Sales 2000-10 10,000,000 192,551 1
Direct Sales 2000-09 9,000,000 176,950 2
Internet 2000-10 6,000,000 123,153 3
Internet 2000-09 6,000,000 113,006 4
Catalog 2000-10 3,000,000 59,782 5
Catalog 2000-09 3,000,000 54,857 6
Partners 2000-10 2,000,000 50,773 7
Partners 2000-09 2,000,000 46,220 8
The sales_count column breaks the ties for three pairs of values.
17.7.8 RANK and DENSE_RANK Difference
The difference between RANK and DENSE_RANK functions is illustrated as follows:
SELECT channel_desc, calendar_month_desc,
TO_CHAR(TRUNC(SUM(amount_sold),-6), '9,999,999,999') SALES$,
RANK() OVER (ORDER BY trunc(SUM(amount_sold),-6) DESC)
AS RANK,
DENSE_RANK() OVER (ORDER BY TRUNC(SUM(amount_sold),-6) DESC)
AS DENSE_RANK
FROM sales, products, customers, times, channels
WHERE sales.prod_id=products.prod_id AND
sales.cust_id=customers.cust_id AND
sales.time_id=times.time_id AND
sales.channel_id=channels.channel_id AND
times.calendar_month_desc IN ('2000-09', '2000-10') AND
channels.channel_desc<>'Tele Sales'
GROUP BY channel_desc, calendar_month_desc;
CHANNEL_DESC CALENDAR SALES$ RANK DENSE_RANK
-------------------- -------- -------------- ---------
Direct Sales 2000-10 10,000,000 1 1
Direct Sales 2000-09 9,000,000 2 2
Internet 2000-09 6,000,000 3 3
Internet 2000-10 6,000,000 3 3
Catalog 2000-09 3,000,000 5 4
Catalog 2000-10 3,000,000 5 4
Partners 2000-09 2,000,000 7 5
Partners 2000-10 2,000,000 7 5
Note that, in the case of DENSE_RANK, the largest rank value gives the number of
distinct values in the dataset.
17.7.9 Per Group Ranking
The RANK function can be made to operate within groups, that is, the rank gets reset
whenever the group changes. This is accomplished with the PARTITION BY clause.
The group expressions in the PARTITION BY subclause divide the dataset into
groups within which RANK operates. For example, to rank products within each
channel by their dollar sales, you say:
SELECT channel_desc, calendar_month_desc,
TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$,
RANK() OVER (PARTITION BY channel_desc
ORDER BY SUM(amount_sold) DESC) AS RANK_BY_CHANNEL
FROM sales, products, customers, times, channels
WHERE sales.prod_id=products.prod_id AND
sales.cust_id=customers.cust_id AND
sales.time_id=times.time_id AND
sales.channel_id=channels.channel_id AND
times.calendar_month_desc IN ('2000-08', '2000-09', '2000-10',
'2000-11') AND
channels.channel_desc IN ('Direct Sales', 'Internet')
GROUP BY channel_desc, calendar_month_desc;
A single query block can contain more than one ranking function, each partitioning
the data into different groups (that is, reset on different boundaries). The groups can
be mutually exclusive. The following query computes two rankings based on dollar
sales: one within each month (RANK_WITHIN_MONTH) and one within each channel
(RANK_WITHIN_CHANNEL).
SELECT channel_desc, calendar_month_desc,
TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$,
RANK() OVER (PARTITION BY calendar_month_desc
ORDER BY SUM(amount_sold) DESC) AS RANK_WITHIN_MONTH,
RANK() OVER (PARTITION BY channel_desc
ORDER BY SUM(amount_sold) DESC) AS RANK_WITHIN_CHANNEL
FROM sales, products, customers, times, channels
WHERE sales.prod_id=products.prod_id AND
sales.cust_id=customers.cust_id AND
sales.time_id=times.time_id AND
sales.channel_id=channels.channel_id AND
times.calendar_month_desc IN ('2000-08', '2000-09', '2000-10',
'2000-11')
AND
channels.channel_desc IN ('Direct Sales', 'Internet')
GROUP BY channel_desc, calendar_month_desc;
CHANNEL_DESC CALENDAR SALES$ RANK_WITHIN_MONTH
RANK_WITHIN_CHANNEL
-------------------- -------- -------------- ----------------- --
Direct Sales 2000-08 9,588,122 1 4
Internet 2000-08 6,084,390 2 4
Direct Sales 2000-09 9,652,037 1 3
Internet 2000-09 6,147,023 2 3
Direct Sales 2000-10 10,035,478 1 2
Internet 2000-10 6,417,697 2 2
Direct Sales 2000-11 12,217,068 1 1
Internet 2000-11 7,821,208 2 1
17.7.10 Per Cube and Rollup Group Ranking
Analytic functions, RANK for example, can be reset based on the groupings provided
by a CUBE, ROLLUP, or GROUPING SETS operator. It is useful to assign ranks to the
groups created by CUBE, ROLLUP, and GROUPING SETS queries.
A sample CUBE and ROLLUP query is the following:
SELECT channel_desc, country_id,
TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$,
RANK() OVER (PARTITION BY GROUPING_ID(channel_desc, country_id)
ORDER BY SUM(amount_sold) DESC) AS RANK_PER_GROUP
FROM sales, customers, times, channels
WHERE sales.time_id=times.time_id AND
sales.cust_id=customers.cust_id AND
sales.channel_id= channels.channel_id AND
channels.channel_desc IN ('Direct Sales', 'Internet') AND
times.calendar_month_desc='2000-09'
AND country_id IN ('UK', 'US', 'JP')
GROUP BY CUBE( channel_desc, country_id);
CHANNEL_DESC CO SALES$ RANK_PER_GROUP
-------------------- -- -------------- --------------
Direct Sales US 2,835,557 1
Internet US 1,732,240 2
Direct Sales UK 1,378,126 3
Internet UK 911,739 4
Direct Sales JP 91,124 5
Internet JP 57,232 6
Direct Sales 4,304,807 1
Internet 2,701,211 2
US 4,567,797 1
UK 2,289,865 2
JP 148,355 3
7,006,017 1
Treatment of NULLs
NULLs are treated like normal values. Also, for rank computation, a NULL value is
assumed to be equal to another NULL value. Depending on the ASC | DESC options
provided for measures and the NULLS FIRST | NULLS LAST clause, NULLs will
either sort low or high and hence, are given ranks appropriately. The following
example shows how NULLs are ranked in different cases:
SELECT calendar_year AS YEAR, calendar_quarter_number AS QTR,
calendar_month_number AS MO, SUM(amount_sold),
RANK() OVER (ORDER BY SUM(amount_sold) ASC NULLS FIRST) AS
NFIRST,
RANK() OVER (ORDER BY SUM(amount_sold) ASC NULLS LAST) AS NLASST,
RANK() OVER (ORDER BY SUM(amount_sold) DESC NULLS FIRST) AS
NFIRST_DESC,
RANK() OVER (ORDER BY SUM(amount_sold) DESC NULLS LAST) AS
NLAST_DESC
FROM (
SELECT sales.time_id, sales.amount_sold, products.*, customers.*
FROM sales, products, customers
WHERE
sales.prod_id=products.prod_id AND
sales.cust_id=customers.cust_id AND
prod_name IN ('Ruckpart Eclipse', 'Ukko Plain Gortex Boot')
AND country_id ='UK') v, times
WHERE v.time_id (+) =times.time_id AND
calendar_year=1999
GROUP BY calendar_year, calendar_quarter_number,
calendar_month_number;
YEAR QTR MO SUM(AMOUNT_SOLD) NFIRST NLASST NFIRST_DESC NLAST_DESC
---- --- -- ---------------- ------ ------ ----------- ----------
1999 1 3 51820 12 8 5 1
1999 2 6 45360 11 7 6 2
1999 3 9 43950 10 6 7 3
1999 3 8 41180 8 4 9 5
1999 2 5 27431 7 3 10 6
1999 2 4 20602 6 2 11 7
1999 3 7 15296 5 1 12 8
1999 1 1 1 9 1 9
1999 4 10 1 9 1 9
1999 4 11 1 9 1 9
1999 4 12 1 9 1 9
If the value for two rows is NULL, the next group expression is used to resolve the
tie. If they cannot be resolved even then, the next expression is used and so on till
the tie is resolved or else the two rows are given the same rank.
Top N Ranking
You can easily obtain top N ranks by enclosing the RANK function in a subquery and
then applying a filter condition outside the subquery. For example, to obtain the top
five countries in sales for a specific month, you can issue the following statement:
SELECT * FROM
(SELECT country_id,
TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$,
RANK() OVER (ORDER BY SUM(amount_sold) DESC ) AS COUNTRY_RANK
FROM sales, products, customers, times, channels
WHERE sales.prod_id=products.prod_id AND
sales.cust_id=customers.cust_id AND
sales.time_id=times.time_id AND
sales.channel_id=channels.channel_id AND
times.calendar_month_desc='2000-09'
GROUP BY country_id)
WHERE COUNTRY_RANK <= 5;
CO SALES$ COUNTRY_RANK
-- -------------- ------------
US 6,517,786 1
NL 3,447,121 2
UK 3,207,243 3
DE 3,194,765 4
FR 2,125,572 5
Bottom N Ranking
Bottom N is similar to top N except for the ordering sequence within the rank
expression. Using the previous example, you can order SUM(s_amount) ascending
instead of descending.
17.7.11 CUME_DIST
The CUME_DIST function (defined as the inverse of percentile in some statistical
books) computes the position of a specified value relative to a set of values. The
order can be ascending or descending. Ascending is the default. The range of values
for CUME_DIST is from greater than 0 to 1. To compute the CUME_DIST of a value x
in a set S of size N, you use the formula:
CUME_DIST(x) = (number of values in S coming before and including x
in the specified order) / N
Its syntax is:
CUME_DIST ( ) OVER ( [query_partition_clause] order_by_clause )
The semantics of various options in the CUME_DIST function are similar to those in
the RANK function. The default order is ascending, implying that the lowest value
gets the lowest CUME_DIST (as all other values come later than this value in the
order). NULLs are treated the same as they are in the RANK function. They are
counted toward both the numerator and the denominator as they are treated like
non-NULL values. The following example finds cumulative distribution of sales by
channel within each month:
SELECT calendar_month_desc AS MONTH, channel_desc,
TO_CHAR(SUM(amount_sold) , '9,999,999,999') SALES$ ,
CUME_DIST() OVER ( PARTITION BY calendar_month_desc ORDER BY
SUM(amount_sold) ) AS
CUME_DIST_BY_CHANNEL
FROM sales, products, customers, times, channels
WHERE sales.prod_id=products.prod_id AND
sales.cust_id=customers.cust_id AND
sales.time_id=times.time_id AND
sales.channel_id=channels.channel_id AND
times.calendar_month_desc IN ('2000-09', '2000-07','2000-08')
GROUP BY calendar_month_desc, channel_desc;
MONTH CHANNEL_DESC SALES$ CUME_DIST_BY_CHANNEL
-------- -------------------- -------------- --------------------
2000-07 Tele Sales 1,012,954 .2
2000-07 Partners 2,495,662 .4
2000-07 Catalog 2,946,709 .6
2000-07 Internet 6,045,609 .8
2000-07 Direct Sales 9,563,664 1
2000-08 Tele Sales 1,008,703 .2
2000-08 Partners 2,552,945 .4
2000-08 Catalog 3,061,381 .6
2000-08 Internet 6,084,390 .8
2000-08 Direct Sales 9,588,122 1
2000-09 Tele Sales 1,017,149 .2
2000-09 Partners 2,570,666 .4
2000-09 Catalog 3,025,309 .6
2000-09 Internet 6,147,023 .8
2000-09 Direct Sales 9,652,037 1
PERCENT_RANK
PERCENT_RANK is similar to CUME_DIST, but it uses rank values rather than row
counts in its numerator. Therefore, it returns the percent rank of a value relative to a
group of values. The function is available in many popular spreadsheets.
PERCENT_RANK of a row is calculated as:
(rank of row in its partition - 1) / (number of rows in the
partition - 1)
PERCENT_RANK returns values in the range zero to one. The row(s) with a rank of 1
will have a PERCENT_RANK of zero.
Its syntax is:
PERCENT_RANK ( ) OVER ( [query_partition_clause] order_by_clause
)
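No query is shown above for PERCENT_RANK; a minimal sketch against the EMP table used in earlier examples (it assumes the usual DEPTNO, ENAME and SAL columns):

SELECT deptno, ename, sal,
       PERCENT_RANK() OVER (PARTITION BY deptno ORDER BY sal DESC) AS pct_rank
FROM emp;

Within each department the top salary gets 0 and, barring ties, the lowest gets 1.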
NTILE
NTILE allows easy calculation of tertiles, quartiles, deciles and other common
summary statistics. This function divides an ordered partition into a specified
number of groups called buckets and assigns a bucket number to each row in the
partition. NTILE is a very useful calculation because it lets users divide a data set
into fourths, thirds, and other groupings.
The buckets are calculated so that each bucket has exactly the same number of rows
assigned to it or at most 1 row more than the others. For instance, if you have 100
rows in a partition and ask for an NTILE function with four buckets, 25 rows will be
assigned a value of 1, 25 rows will have value 2, and so on. These buckets are
referred to as equiheight buckets.
If the number of rows in the partition does not divide evenly (without a remainder)
into the number of buckets, then the number of rows assigned for each bucket will
differ by one at most. The extra rows will be distributed one for each bucket starting
from the lowest bucket number. For instance, if there are 103 rows in a partition
which has an NTILE(5) function, the first 21 rows will be in the first bucket, the
next 21 in the second bucket, the next 21 in the third bucket, the next 20 in the
fourth bucket and the final 20 in the fifth bucket.
The NTILE function has the following syntax:
NTILE ( expr ) OVER ( [query_partition_clause] order_by_clause )
In this, the N in NTILE(N) can be a constant (for example, 5) or an expression.
This function, like RANK and CUME_DIST, has a PARTITION BY clause for per group
computation, an ORDER BY clause for specifying the measures and their sort order,
and NULLS FIRST | NULLS LAST clause for the specific treatment of NULLs. For
example,
NTILE Example
The following is an example assigning each month's sales total into one of 4
buckets:
SELECT calendar_month_desc AS MONTH ,
TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$,
NTILE(4) OVER (ORDER BY SUM(amount_sold)) AS TILE4
FROM sales, products, customers, times, channels
WHERE sales.prod_id=products.prod_id AND
sales.cust_id=customers.cust_id AND
sales.time_id=times.time_id AND
sales.channel_id=channels.channel_id AND
times.calendar_year=1999 AND
prod_category= 'Men'
GROUP BY calendar_month_desc;
MONTH SALES$ TILE4
-------- -------------- ---------
1999-10 4,373,102 1
1999-01 4,754,622 1
1999-11 5,367,943 1
1999-12 6,082,226 2
1999-07 6,161,638 2
1999-02 6,518,877 2
1999-06 6,634,401 3
1999-04 6,772,673 3
1999-08 6,954,221 3
1999-03 6,968,928 4
1999-09 7,030,524 4
1999-05 8,018,174 4
NTILE ORDER BY specifications must be fully specified to yield reproducible results.
Equal values can get distributed across adjacent buckets, and (as described above)
some buckets can hold one row more than the others. To ensure deterministic
results, you must order on a unique key.
17.7.12 ROW_NUMBER
The ROW_NUMBER function assigns a unique number (sequentially, starting from 1,
as defined by ORDER BY) to each row within the partition. It has the following
syntax:
ROW_NUMBER ( ) OVER ( [query_partition_clause] order_by_clause )
ROW_NUMBER Example
SELECT channel_desc, calendar_month_desc,
TO_CHAR(TRUNC(SUM(amount_sold), -6), '9,999,999,999') SALES$,
ROW_NUMBER() OVER (ORDER BY TRUNC(SUM(amount_sold), -6) DESC)
AS ROW_NUMBER
FROM sales, products, customers, times, channels
WHERE sales.prod_id=products.prod_id AND
sales.cust_id=customers.cust_id AND
sales.time_id=times.time_id AND
sales.channel_id=channels.channel_id AND
times.calendar_month_desc IN ('2000-09', '2000-10')
GROUP BY channel_desc, calendar_month_desc;
CHANNEL_DESC CALENDAR SALES$ ROW_NUMBER
-------------------- -------- -------------- ----------
Direct Sales 2000-10 10,000,000 1
Direct Sales 2000-09 9,000,000 2
Internet 2000-09 6,000,000 3
Internet 2000-10 6,000,000 4
Catalog 2000-09 3,000,000 5
Catalog 2000-10 3,000,000 6
Partners 2000-09 2,000,000 7
Partners 2000-10 2,000,000 8
Tele Sales 2000-09 1,000,000 9
Tele Sales 2000-10 1,000,000 10
Note that there are three pairs of tie values in these results. Like NTILE,
ROW_NUMBER is a non-deterministic function, so each tied value could have its row
number switched. To ensure deterministic results, you must order on a unique key.
In most cases, that will require adding a new tie breaker column to the query and
using it in the ORDER BY specification, as sketched below.
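A minimal sketch of such a tie breaker: since the query groups by channel_desc and calendar_month_desc, adding those columns to the ORDER BY makes the numbering deterministic (only the analytic call changes, the rest of the query stays as above):

ROW_NUMBER() OVER (ORDER BY TRUNC(SUM(amount_sold), -6) DESC,
                   calendar_month_desc, channel_desc) AS ROW_NUMBER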
17.7.13 Windowing Aggregate Functions
Windowing functions can be used to compute cumulative, moving, and centered
aggregates. They return a value for each row in the table, which depends on other
rows in the corresponding window. These functions include moving sum, moving
average, moving min/max, cumulative sum, as well as statistical functions. They
can be used only in the SELECT and ORDER BY clauses of the query. Two other
functions are available: FIRST_VALUE, which returns the first value in the window;
and LAST_VALUE, which returns the last value in the window. These functions
provide access to more than one row of a table without a self-join. The syntax of the
windowing functions is:
{SUM | AVG | MAX | MIN | COUNT | STDDEV | VARIANCE | FIRST_VALUE | LAST_VALUE}
  ( {value expression1 | *} ) OVER
  ( [PARTITION BY value expression2 [,...]]
    ORDER BY value expression3 [collate clause]
      [ASC | DESC] [NULLS FIRST | NULLS LAST] [,...]
    { ROWS | RANGE }
    { BETWEEN
        { UNBOUNDED PRECEDING
        | CURRENT ROW
        | value_expr { PRECEDING | FOLLOWING }
        }
      AND
        { UNBOUNDED FOLLOWING
        | CURRENT ROW
        | value_expr { PRECEDING | FOLLOWING }
        }
    | { UNBOUNDED PRECEDING
      | CURRENT ROW
      | value_expr PRECEDING
      }
    }
  )
17.7.14 Treatment of NULLs as Input to Window Functions
Window functions' NULL semantics match the NULL semantics for SQL aggregate
functions. Other semantics can be obtained by user-defined functions, or by using
the DECODE or a CASE expression within the window function.
17.7.15 Windowing Functions with Logical Offset
A logical offset can be specified with constants such as RANGE 10 PRECEDING, or
an expression that evaluates to a constant, or by an interval specification like RANGE
INTERVAL N DAY/MONTH/YEAR PRECEDING or an expression that evaluates to an
interval. With logical offset, there can only be one expression in the ORDER BY
expression list in the function, with type compatible to NUMERIC if offset is numeric,
or DATE if an interval is specified.
The following is an example of cumulative amount_sold by customer ID by
quarter in 1999:
SELECT c.cust_id, t.calendar_quarter_desc,
TO_CHAR (SUM(amount_sold), '9,999,999,999') AS Q_SALES,
TO_CHAR(SUM(SUM(amount_sold)) OVER (PARTITION BY
c.cust_id ORDER BY c.cust_id, t.calendar_quarter_desc ROWS
UNBOUNDED
PRECEDING), '9,999,999,999') AS CUM_SALES
FROM sales s, times t, customers c
WHERE
s.time_id=t.time_id AND
s.cust_id=c.cust_id AND
t.calendar_year=1999 AND
c.cust_id IN (6380, 6510)
GROUP BY c.cust_id, t.calendar_quarter_desc
ORDER BY c.cust_id, t.calendar_quarter_desc;
CUST_ID CALENDA Q_SALES CUM_SALES
--------- ------- -------------- --------------
6380 1999-Q1 60,621 60,621
6380 1999-Q2 68,213 128,834
6380 1999-Q3 75,238 204,072
6380 1999-Q4 57,412 261,484
6510 1999-Q1 63,030 63,030
6510 1999-Q2 74,622 137,652
6510 1999-Q3 69,966 207,617
6510 1999-Q4 63,366 270,983
In this example, the analytic function SUM defines, for each row, a window that
starts at the beginning of the partition (UNBOUNDED PRECEDING) and ends, by
default, at the current row.
Nested SUMs are needed in this example since we are performing a SUM over a value
that is itself a SUM. Nested aggregations are used very often in analytic aggregate
functions.
17.7.16 Moving Aggregate Function Example
This example of a time-based window shows, for one customer, the moving average
of sales for the current month and preceding two months:
SELECT c.cust_id, t.calendar_month_desc,
TO_CHAR (SUM(amount_sold), '9,999,999,999') AS SALES ,
TO_CHAR(AVG(SUM(amount_sold))
OVER (ORDER BY c.cust_id, t.calendar_month_desc ROWS 2
PRECEDING),
'9,999,999,999') AS MOVING_3_MONTH_AVG
FROM sales s, times t, customers c
WHERE
s.time_id=t.time_id AND
s.cust_id=c.cust_id AND
t.calendar_year=1999 AND
c.cust_id IN (6380)
GROUP BY c.cust_id, t.calendar_month_desc
ORDER BY c.cust_id, t.calendar_month_desc;
CUST_ID CALENDAR SALES MOVING_3_MONTH
--------- -------- -------------- --------------
6380 1999-01 19,642 19,642
6380 1999-02 19,324 19,483
6380 1999-03 21,655 20,207
6380 1999-04 27,091 22,690
6380 1999-05 16,367 21,704
6380 1999-06 24,755 22,738
6380 1999-07 31,332 24,152
6380 1999-08 22,835 26,307
6380 1999-09 21,071 25,079
6380 1999-10 19,279 21,062
6380 1999-11 18,206 19,519
6380 1999-12 19,927 19,137
Note that the first two rows for the three month moving average calculation in the
output data are based on a smaller interval size than specified because the window
calculation cannot reach past the data retrieved by the query. You need to consider
the different window sizes found at the borders of result sets. In other words, you
may need to modify the query to include exactly what you want.
17.7.17 Centered Aggregate Function
Calculating windowing aggregate functions centered around the current row is
straightforward. This example computes for a customer a centered moving average
of the sales total for the one day preceding the current row and one day following
the current row including the current row as well.
SELECT cust_id, t.time_id,
TO_CHAR (SUM(amount_sold), '9,999,999,999') AS SALES,
TO_CHAR(AVG(SUM(amount_sold)) OVER
(PARTITION BY s.cust_id ORDER BY t.time_id
RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND INTERVAL '1' DAY
FOLLOWING),
'9,999,999,999') AS CENTERED_3_DAY_AVG
FROM sales s, times t
WHERE
s.time_id=t.time_id AND
t.calendar_week_number IN (51) AND
calendar_year=1999 AND cust_id IN (6380, 6510)
GROUP BY cust_id, t.time_id
ORDER BY cust_id, t.time_id;
CUST_ID TIME_ID SALES CENTERED_3_DAY
--------- --------- -------------- --------------
6380 20-DEC-99 2,240 1,136
6380 21-DEC-99 32 873
6380 22-DEC-99 348 148
6380 23-DEC-99 64 302
6380 24-DEC-99 493 212
6380 25-DEC-99 80 423
6380 26-DEC-99 696 388
6510 20-DEC-99 196 106
6510 21-DEC-99 16 155
6510 22-DEC-99 252 143
6510 23-DEC-99 160 305
6510 24-DEC-99 504 240
6510 25-DEC-99 56 415
6510 26-DEC-99 684 370
The starting and ending rows for each product's centered moving average
calculation in the output data are based on just two days, since the window
calculation cannot reach past the data retrieved by the query. Users need to consider
the different window sizes found at the borders of result sets: the query may need
to be adjusted.
17.7.18 Windowing Aggregate Functions in the Presence of Duplicates
The following example illustrates how window aggregate functions compute values
when there are duplicates, that is, when multiple rows are returned for a single
ordering value. The query retrieves the quantity sold in the US for two products
during a specified time range. The query defines a moving window that runs from
the date of the current row to 10 days earlier.
Note that the RANGE keyword is used to define the windowing clause of this
example. This means that the window can potentially hold many rows for each
value in the range. In this case, there are three rows with the duplicate ordering
value of '04-NOV-98'.
SELECT time_id, s.quantity_sold,
SUM(s.quantity_sold) OVER (ORDER BY time_id
RANGE BETWEEN INTERVAL '10' DAY PRECEDING AND CURRENT ROW)
AS current_group_sum
FROM customers c, products p, sales s
WHERE p.prod_id=s.prod_id AND c.cust_id=s.cust_id
AND c.country_id='US' AND p.prod_id IN (250, 500)
AND s.time_id BETWEEN '24-OCT-98' AND '14-NOV-98'
ORDER BY TIME_ID;
TIME_ID QUANTITY_SOLD CURRENT_GROUP_SUM /* Source #s for row */
--------- ------------- -----------------
24-OCT-98 19 19 /* 19 */
27-OCT-98 17 36 /* 19+17 */
04-NOV-98 2 24 /* 17+(2+3+2) */
04-NOV-98 3 24 /* 17+(2+3+2) */
04-NOV-98 2 24 /* 17+(2+3+2) */
14-NOV-98 12 19 /* (2+3+2)+12 */
6 rows selected.
In the output, values within parentheses are from the rows with the tied ordering
key value, 04-NOV-98.
Consider the row with the output of "04-NOV-98, 3, 24". In this case, all the
other rows with TIME_ID of 04-NOV-98 (ties) are considered to belong to one
group. Therefore, the CURRENT_GROUP_SUM should include this row (that is, 3) and
its ties (that is, 2 and 2) in the window. It also includes any rows with dates up to 10
days earlier. In this data, that includes the row with date 27-OCT-98. Hence the
result is 17+(2+3+2) = 24. The calculation of CURRENT_GROUP_SUM is identical for
each of the tied rows, so the output shows three rows with the value 24.
Note that this example applies only when you use the RANGE keyword rather than
the ROWS keyword. It is also important to remember that with RANGE, you can only
use 1 ORDER BY expression in the analytic functions ORDER BY clause. With the
ROWS keyword, you can use multiple order by expressions in the analytic functions
order by clause.
17.7.19 Varying Window Size for Each Row
There are situations where it is useful to vary the size of a window for each row,
based on a specified condition. For instance, you may want to make the window
larger for certain dates and smaller for others. Assume that you want to calculate
the moving average of stock price over three working days. If you have an equal
number of rows for each day for all working days and no non-working days are
stored, then you can use a physical window function. However, if the conditions
noted are not met, you can still calculate a moving average by using an expression
in the window size parameters.
Expressions in a window size specification can come from several different sources.
The expression could be a reference to a column in a table, such as a time table. It
could also be a function that returns the appropriate boundary for the window
based on values in the current row. The following statement for a hypothetical stock
price database uses a user-defined function in its RANGE clause to set window size:
SELECT t_timekey,
AVG(stock_price)
OVER (ORDER BY t_timekey RANGE fn(t_timekey) PRECEDING) av_price
FROM stock, time
WHERE st_timekey = t_timekey
ORDER BY t_timekey;
In this statement, t_timekey is a date field. Here, fn could be a PL/SQL function
with the following specification:
fn(t_timekey) returns
4 if t_timekey is Monday, Tuesday
2 otherwise
If any of the previous days are holidays, it adjusts the count appropriately.
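A minimal PL/SQL sketch of such a function, ignoring the holiday adjustment (the
date-language setting and the reach-back values are assumptions for illustration only):
CREATE OR REPLACE FUNCTION fn (t_timekey DATE) RETURN NUMBER IS
BEGIN
  -- Reach back 4 days on Monday and Tuesday so the window still spans three
  -- working days across the weekend; otherwise 2 days are enough.
  IF TO_CHAR(t_timekey, 'DY', 'NLS_DATE_LANGUAGE=AMERICAN') IN ('MON', 'TUE') THEN
    RETURN 4;
  ELSE
    RETURN 2;
  END IF;
END fn;
/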
Note that, when a window is specified using a number in a window function with
ORDER BY on a date column, it is interpreted as a number of days. You
could also have used the interval literal conversion function, as
NUMTODSINTERVAL(fn(t_timekey), 'DAY') instead of just fn(t_timekey)
to mean the same thing. You can also write a PL/SQL function that returns an
INTERVAL datatype value.
17.7.20 Windowing Aggregate Functions with Physical Offsets
For windows expressed in rows, the ordering expressions should be unique to
produce deterministic results. For example, the following query is not deterministic
because time_id is not unique in this result set.
Example 19-8 Windowing Aggregate Functions With Physical Offsets
SELECT t.time_id,
TO_CHAR(amount_sold, '9,999,999,999') AS INDIV_SALE,
TO_CHAR(SUM(amount_sold) OVER
(PARTITION BY t.time_id ORDER BY t.time_id
ROWS UNBOUNDED PRECEDING), '9,999,999,999') AS CUM_SALES
FROM sales s, times t, customers c
WHERE
s.time_id=t.time_id AND
s.cust_id=c.cust_id AND
t.time_id IN (TO_DATE('11-DEC-1999'), TO_DATE('12-DEC-1999') )
AND
c.cust_id BETWEEN 6500 AND 6600
ORDER BY t.time_id;
TIME_ID INDIV_SALE CUM_SALES
--------- -------------- --------------
11-DEC-99 1,036 1,036
11-DEC-99 1,932 2,968
11-DEC-99 588 3,556
12-DEC-99 504 504
12-DEC-99 429 933
12-DEC-99 1,160 2,093
The statement could also yield the following:
TIME_ID INDIV_SALE CUM_SALES
--------- -------------- --------------
11-DEC-99 1,932 2,968
11-DEC-99 588 3,556
11-DEC-99 1,036 1,036
12-DEC-99 504 504
12-DEC-99 1,160 2,093
12-DEC-99 429 933
One way to handle this problem would be to add the prod_id column to the result
set and order on both time_id and prod_id.
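A sketch of that fix follows; it is the same query with prod_id added to the select
list and to the window ordering, which makes the cumulative sum deterministic:
SELECT t.time_id, s.prod_id,
TO_CHAR(amount_sold, '9,999,999,999') AS INDIV_SALE,
TO_CHAR(SUM(amount_sold) OVER
(PARTITION BY t.time_id ORDER BY t.time_id, s.prod_id
ROWS UNBOUNDED PRECEDING), '9,999,999,999') AS CUM_SALES
FROM sales s, times t, customers c
WHERE
s.time_id=t.time_id AND
s.cust_id=c.cust_id AND
t.time_id IN (TO_DATE('11-DEC-1999'), TO_DATE('12-DEC-1999') )
AND
c.cust_id BETWEEN 6500 AND 6600
ORDER BY t.time_id, s.prod_id;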
17.7.21 FIRST_VALUE and LAST_VALUE
The FIRST_VALUE and LAST_VALUE functions allow you to select the first and last
rows from a window. These rows are especially valuable because they are often
used as the baselines in calculations. For instance, with a partition holding sales
data ordered by day, you might ask "How much was each day's sales compared to
the first sales day (FIRST_VALUE) of the period?" Or you might wish to know, for a
set of rows in increasing sales order, "What was the percentage size of each sale in
the region compared to the largest sale (LAST_VALUE) in the region?"
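For instance, a minimal sketch of the first question, using the sales table from the
earlier examples (the date range is chosen only for illustration), might look like this:
SELECT time_id, SUM(amount_sold) AS day_sales,
FIRST_VALUE(SUM(amount_sold)) OVER (ORDER BY time_id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS first_day_sales,
SUM(amount_sold) - FIRST_VALUE(SUM(amount_sold)) OVER (ORDER BY time_id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS diff_vs_first_day
FROM sales
WHERE time_id BETWEEN TO_DATE('10-OCT-2000') AND TO_DATE('14-OCT-2000')
GROUP BY time_id
ORDER BY time_id;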
Reporting Aggregate Functions
After a query has been processed, aggregate values like the number of resulting
rows or an average value in a column can be easily computed within a partition and
made available to other reporting functions. Reporting aggregate functions return
the same aggregate value for every row in a partition. Their behavior with respect
to NULLs is the same as the SQL aggregate functions. The syntax is:
{SUM | AVG | MAX | MIN | COUNT | STDDEV | VARIANCE}
([ALL | DISTINCT] {value expression1 | *})
OVER ([PARTITION BY value expression2[,...]])
In addition, the following conditions apply:
An asterisk (*) is only allowed in COUNT(*)
DISTINCT is supported only if corresponding aggregate functions allow it
value expression1 and value expression2 can be any valid expression
involving column references or aggregates.
The PARTITION BY clause defines the groups on which the windowing
functions would be computed. If the PARTITION BY clause is absent, then the
function is computed over the whole query result set.
Reporting functions can appear only in the SELECT clause or the ORDER BY clause.
The major benefit of reporting functions is their ability to do multiple passes of data
in a single query block and speed up query performance. Queries such as "Count
the number of salesmen with sales more than 10% of city sales" do not require joins
between separate query blocks.
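A hedged sketch of that salesmen query, against a hypothetical
salesman_sales(city, salesman_id, amount_sold) table, could look like this:
SELECT COUNT(*) AS big_salesmen FROM
(SELECT city, salesman_id, SUM(amount_sold) AS sales,
SUM(SUM(amount_sold)) OVER (PARTITION BY city) AS city_sales
FROM salesman_sales
GROUP BY city, salesman_id)
WHERE sales > 0.1 * city_sales;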
For example, consider the question "For each product category, find the region in
which it had maximum sales". The equivalent SQL query using the MAX reporting
aggregate function would be:
SELECT prod_category, country_region, sales FROM
(SELECT substr(p.prod_category,1,8) AS prod_category, co.country_region,
SUM(amount_sold)
AS sales,
MAX(SUM(amount_sold)) OVER (partition BY prod_category) AS
MAX_REG_SALES
FROM sales s, customers c, countries co, products p
WHERE s.cust_id=c.cust_id AND
c.country_id=co.country_id AND
s.prod_id=p.prod_id AND
s.time_id=to_DATE('11-OCT-2000')
GROUP BY prod_category, country_region)
WHERE sales=MAX_REG_SALES;
The inner query with the reporting aggregate function MAX(SUM(amount_sold))
returns:
SUBSTR(P COUNTRY_REGION SALES MAX_REG_SALES
-------- -------------------- --------- -------------
Boys Africa 594 41974
Boys Americas 20353 41974
Boys Asia 2258 41974
Boys Europe 41974 41974
Boys Oceania 1402 41974
Girls Americas 13869 52963
Girls Asia 1657 52963
Girls Europe 52963 52963
Girls Middle East 303 52963
Girls Oceania 380 52963
Men Africa 1705 123253
Men Americas 69304 123253
Men Asia 6153 123253
Men Europe 123253 123253
Men Oceania 2646 123253
Women Africa 4037 255109
Women Americas 145501 255109
Women Asia 20394 255109
Women Europe 255109 255109
Women Middle East 350 255109
Women Oceania 17408 255109
The full query results are:
PROD_CATEGORY COUNTRY_REGION SALES
------------- -------------- ------
Boys Europe 41974
Girls Europe 52963
Men Europe 123253
Women Europe 255109
17.7.22 Reporting Aggregate Example
Reporting aggregates combined with nested queries enable you to answer complex
queries efficiently. For instance, what if we want to know the best selling products
in our most significant product subcategories? We have 4 product categories which
contain a total of 37 product subcategories, and there are 10,000 unique products.
Here is a query which finds the 5 top-selling products for each product subcategory
that contributes more than 20% of the sales within its product category.
SELECT SUBSTR(prod_category,1,8) AS CATEG, prod_subcategory,
prod_id, SALES FROM
(SELECT p.prod_category, p.prod_subcategory, p.prod_id,
SUM(amount_sold) as SALES,
SUM(SUM(amount_sold)) OVER (PARTITION BY p.prod_category) AS
CAT_SALES,
SUM(SUM(amount_sold)) OVER
(PARTITION BY p.prod_subcategory) AS SUBCAT_SALES,
RANK() OVER (PARTITION BY p.prod_subcategory
ORDER BY SUM(amount_sold) DESC) AS RANK_IN_LINE
FROM sales s, customers c, countries co, products p
WHERE s.cust_id=c.cust_id AND
c.country_id=co.country_id AND s.prod_id=p.prod_id AND
s.time_id=to_DATE('11-OCT-2000')
GROUP BY p.prod_category, p.prod_subcategory, p.prod_id
ORDER BY prod_category, prod_subcategory)
WHERE SUBCAT_SALES>0.2*CAT_SALES AND RANK_IN_LINE<=5;
17.7.23 RATIO_TO_REPORT
The RATIO_TO_REPORT function computes the ratio of a value to the sum of a set
of values. If the value expression evaluates to NULL, RATIO_TO_
REPORT also evaluates to NULL, but it is treated as zero for computing the sum of
values for the denominator. Its syntax is:
RATIO_TO_REPORT ( expr ) OVER ( [query_partition_clause] )
In this, the following applies:
expr can be any valid expression involving column references or aggregates.
The PARTITION BY clause defines the groups on which the RATIO_TO_
REPORT function is to be computed. If the PARTITION BY clause is absent, then
the function is computed over the whole query result set.
Example 19-9 RATIO_TO_REPORT
To calculate RATIO_TO_REPORT of sales per channel, you might use the following
syntax:
SELECT ch.channel_desc,
TO_CHAR(SUM(amount_sold),'9,999,999') as SALES,
TO_CHAR(SUM(SUM(amount_sold)) OVER (), '9,999,999')
AS TOTAL_SALES,
TO_CHAR(RATIO_TO_REPORT(SUM(amount_sold)) OVER (), '9.999')
AS RATIO_TO_REPORT
FROM sales s, channels ch
WHERE s.channel_id=ch.channel_id AND
s.time_id=to_DATE('11-OCT-2000')
GROUP BY ch.channel_desc;
CHANNEL_DESC SALES TOTAL_SALE RATIO_
-------------------- ---------- ---------- ------
Catalog 111,103 781,613 .142
Direct Sales 335,409 781,613 .429
Internet 212,314 781,613 .272
Partners 91,352 781,613 .117
Tele Sales 31,435 781,613 .040
17.7.24 LAG/LEAD Functions
The LAG and LEAD functions are useful for comparing values when the relative
positions of rows can be known reliably. They work by specifying the count of rows
which separate the target row from the current row. Since the functions provide
access to more than one row of a table at the same time without a self-join, they can
enhance processing speed. The LAG function provides access to a row at a given
offset prior to the current position, and the LEAD function provides access to a row
at a given offset after the current position.
LAG/LEAD Syntax
These functions have the following syntax:
{LAG | LEAD} ( value_expr [, offset] [, default] )
OVER ( [query_partition_clause] order_by_clause )
offset is an optional parameter and defaults to 1. default is an optional
parameter and is the value returned if offset falls outside the bounds of the table
or partition.
Example 19-10 LAG/LEAD
SELECT time_id, TO_CHAR(SUM(amount_sold),'9,999,999') AS SALES,
TO_CHAR(LAG(SUM(amount_sold),1) OVER (ORDER BY
time_id),'9,999,999') AS LAG1,
TO_CHAR(LEAD(SUM(amount_sold),1) OVER (ORDER BY
time_id),'9,999,999') AS LEAD1
FROM sales
WHERE
time_id>=TO_DATE('10-OCT-2000') AND
time_id<=TO_DATE('14-OCT-2000')
GROUP BY time_id;
TIME_ID SALES LAG1 LEAD1
--------- ---------- ---------- ----------
10-OCT-00 773,921 781,613
11-OCT-00 781,613 773,921 744,351
12-OCT-00 744,351 781,613 757,356
13-OCT-00 757,356 744,351 791,960
14-OCT-00 791,960 757,356
17.7.25 FIRST/LAST Functions
The FIRST/LAST aggregate functions allow you to return the result of an aggregate
applied over a set of rows that rank as the first or last with respect to a given order
specification. FIRST/LAST lets you order on column A but return a result of an
aggregate applied on column B. This is valuable because it avoids the need for a
self-join or subquery, thus improving performance. These functions begin with a
tiebreaker function, which is a regular aggregate function (MIN, MAX, SUM, AVG,
COUNT, VARIANCE, STDDEV) that produces the return value. The tiebreaker function
is performed on the set of rows (one or more rows) that rank as first or last with
respect to the order specification to return a single value.
FIRST/LAST As Regular Aggregates
You can use the FIRST/LAST family of aggregates as regular aggregate functions.
Example 19-11 FIRST/LAST Example 1
The following query lets us compare minimum price and list price of our products.
For each product subcategory within the Mens clothing category, it returns the
following:
List price of the product with the lowest minimum price
Lowest minimum price
List price of the product with the highest minimum price
Highest minimum price
SELECT prod_subcategory, MIN(prod_list_price)
KEEP (DENSE_RANK FIRST ORDER BY (prod_min_price))
AS LP_OF_LO_MINP,
MIN(prod_min_price) AS LO_MINP,
MAX(prod_list_price) KEEP (DENSE_RANK LAST ORDER BY
(prod_min_price))
AS LP_OF_HI_MINP,
MAX(prod_min_price) AS HI_MINP
FROM products
WHERE prod_category='Men'
GROUP BY prod_subcategory;
PROD_SUBCATEGORY LP_OF_LO_MINP LO_MINP LP_OF_HI_MINP HI_MINP
---------------- ------------- ------- ------------- -------
Casual Shirts - Men 39.9 16.92 88 59.4
Dress Shirts - Men 42.5 17.34 59.9 41.51
Jeans - Men 38 17.33 69.9 62.28
Outerwear - Men 44.9 19.76 495 334.12
Shorts - Men 34.9 15.36 195 103.54
Sportcoats - Men 195 96.53 595 390.92
Sweaters - Men 29.9 14.59 140 97.02
Trousers - Men 38 15.5 135 120.29
Underwear And Socks - Men 10.9 4.45 39.5 27.02
A query like this can be useful for understanding the price structure of your product
lines. For instance, the result set here highlights the wide spread between the lowest
and highest minimum prices in the Outerwear and Sportcoats subcategories.
17.7.26 FIRST/LAST As Reporting Aggregates
You can also use the FIRST/LAST family of aggregates as reporting aggregate
functions. An example is calculating which months had the greatest and least
increase in head count throughout the year. The syntax for these functions is similar
to the syntax for any other reporting aggregate.
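A minimal sketch of that head-count question, against a hypothetical
monthly_headcount(month_id, headcount_increase) table, might look like this:
SELECT month_id, headcount_increase,
MAX(month_id) KEEP (DENSE_RANK LAST ORDER BY headcount_increase)
OVER () AS month_of_max_increase,
MIN(month_id) KEEP (DENSE_RANK FIRST ORDER BY headcount_increase)
OVER () AS month_of_min_increase
FROM monthly_headcount;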
Consider the example in Example 19-11 for FIRST/LAST. What if we wanted to
find the list prices of individual products and compare them to the list prices of the
products in their subcategory that had the highest and lowest minimum prices?
The following query lets us find that information for the Sportcoats - Men
subcategory by using FIRST/LAST as reporting aggregates. Because there are over
100 products in this subcategory, we show only the first few rows of results.
Example 19-12 FIRST/LAST Example 2
SELECT prod_id, prod_list_price,
MIN(prod_list_price) KEEP (DENSE_RANK FIRST ORDER BY
(prod_min_price))
OVER(PARTITION BY (prod_subcategory)) AS LP_OF_LO_MINP,
MAX(prod_list_price) KEEP (DENSE_RANK LAST ORDER BY
(prod_min_price))
OVER(PARTITION BY (prod_subcategory)) AS LP_OF_HI_MINP
FROM products
WHERE prod_subcategory='Sportcoats - Men';
PROD_ID PROD_LIST_PRICE LP_OF_LO_MINP LP_OF_HI_MINP
------- --------------- ------------- -------------
730 365 195 595
1165 365 195 595
1560 595 195 595
2655 195 195 595
2660 195 195 595
3840 275 195 595
3865 275 195 595
4035 319.9 195 595
4075 395 195 595
4245 195 195 595
4790 365 195 595
4800 365 195 595
5560 425 195 595
5575 425 195 595
5625 595 195 595
7915 275 195 595
....
Using the FIRST and LAST functions as reporting aggregates makes it easy to
include the results in calculations such as "Salary as a percent of the highest salary."
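For instance, a sketch of that salary calculation, assuming a classic emp(ename, sal)
table (an assumption; any table with a salary column would do), could be:
SELECT ename, sal,
ROUND(100 * sal / MAX(sal) KEEP (DENSE_RANK LAST ORDER BY sal) OVER (), 2)
AS pct_of_top_salary
FROM emp;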
Linear Regression Functions
The regression functions support the fitting of an ordinary-least-squares regression
line to a set of number pairs. You can use them both as aggregate functions and as
windowing or reporting functions.
The functions are:
REGR_COUNT
REGR_AVGX
REGR_AVGY
REGR_SLOPE
REGR_INTERCEPT
REGR_R2
REGR_SXX
REGR_SYY
REGR_SXY
Oracle applies the function to the set of (e1, e2) pairs after eliminating all pairs for
which either of e1 or e2 is null. e1 is interpreted as a value of the dependent
variable (a "y value"), and e2 is interpreted as a value of the independent variable
(an "x value"). Both expressions must be numbers.
The regression functions are all computed simultaneously during a single pass
through the data. They are frequently combined with the COVAR_POP, COVAR_
SAMP, and CORR functions.
17.7.27 REGR_COUNT
REGR_COUNT returns the number of non-null number pairs used to fit the
regression line. If applied to an empty set (or if there are no (e1, e2) pairs where
neither of e1 or e2 is null), the function returns 0.
REGR_AVGY and REGR_AVGX
REGR_AVGY and REGR_AVGX compute the averages of the dependent variable and
the independent variable of the regression line, respectively. REGR_AVGY computes
the average of its first argument (e1) after eliminating (e1, e2) pairs where either of
e1 or e2 is null. Similarly, REGR_AVGX computes the average of its second
argument (e2) after null elimination. Both functions return NULL if applied to an
empty set.
17.7.28 REGR_SLOPE and REGR_INTERCEPT
The REGR_SLOPE function computes the slope of the regression line fitted to
non-null (e1, e2) pairs.
The REGR_INTERCEPT function computes the y-intercept of the regression line.
REGR_INTERCEPT returns NULL whenever slope or the regression averages are
NULL.
REGR_R2
The REGR_R2 function computes the coefficient of determination (usually called
"R-squared" or "goodness of fit") for the regression line.
See Also: Oracle9i SQL Reference for further information regarding
syntax and semantics
REGR_R2 returns values between 0 and 1 when the regression line is defined (slope
of the line is not null), and it returns NULL otherwise. The closer the value is to 1,
the better the regression line fits the data.
REGR_SXX, REGR_SYY, and REGR_SXY
REGR_SXX, REGR_SYY and REGR_SXY functions are used in computing various
diagnostic statistics for regression analysis. After eliminating (e1, e2) pairs where
either of e1 or e2 is null, these functions make the following computations:
REGR_SXX: REGR_COUNT(e1,e2) * VAR_POP(e2)
REGR_SYY: REGR_COUNT(e1,e2) * VAR_POP(e1)
REGR_SXY: REGR_COUNT(e1,e2) * COVAR_POP(e1, e2)
Linear Regression Statistics Examples
Some common diagnostic statistics that accompany linear regression analysis are
given in Table 19-2, "Common Diagnostic Statistics and Their Expressions". Note
that Oracle's new functions allow you to calculate all of these.
Table 19-2 Common Diagnostic Statistics and Their Expressions
Type of Statistic            Expression
---------------------------  ------------------------------------------------------------
Adjusted R2                  1 - ((1 - REGR_R2) * ((REGR_COUNT - 1) / (REGR_COUNT - 2)))
Standard error               SQRT((REGR_SYY - (POWER(REGR_SXY,2) / REGR_SXX)) / (REGR_COUNT - 2))
Total sum of squares         REGR_SYY
Regression sum of squares    POWER(REGR_SXY,2) / REGR_SXX
Residual sum of squares      REGR_SYY - (POWER(REGR_SXY,2) / REGR_SXX)
t statistic for slope        REGR_SLOPE * SQRT(REGR_SXX) / (Standard error)
t statistic for y-intercept  REGR_INTERCEPT / ((Standard error) * SQRT((1/REGR_COUNT) + (POWER(REGR_AVGX,2) / REGR_SXX)))
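For instance, a sketch that folds the first two expressions into a query against the
sales and products tables used in the next section (grouping by channel is only for
illustration):
SELECT s.channel_id,
1 - ((1 - REGR_R2(s.quantity_sold, p.prod_list_price))
* ((REGR_COUNT(s.quantity_sold, p.prod_list_price) - 1)
/ (REGR_COUNT(s.quantity_sold, p.prod_list_price) - 2))) AS adj_r2,
SQRT((REGR_SYY(s.quantity_sold, p.prod_list_price)
- (POWER(REGR_SXY(s.quantity_sold, p.prod_list_price),2)
/ REGR_SXX(s.quantity_sold, p.prod_list_price)))
/ (REGR_COUNT(s.quantity_sold, p.prod_list_price) - 2)) AS std_error
FROM sales s, products p
WHERE s.prod_id=p.prod_id AND p.prod_category='Men'
AND s.time_id=to_DATE('10-OCT-2000')
GROUP BY s.channel_id;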
17.7.29 Sample Linear Regression Calculation
In this example, we compute an ordinary-least-squares regression line that
expresses the quantity sold of a product as a linear function of the product's list
price. The calculations are grouped by sales channel. The values SLOPE, INTCPT,
RSQR are slope, intercept, and coefficient of determination of the regression line,
respectively. The (integer) value COUNT is the number of products in each channel
for which both quantity sold and list price data are available.
SELECT s.channel_id,
REGR_SLOPE(s.quantity_sold, p.prod_list_price) SLOPE,
REGR_INTERCEPT(s.quantity_sold, p.prod_list_price) INTCPT,
REGR_R2(s.quantity_sold, p.prod_list_price) RSQR,
REGR_COUNT(s.quantity_sold, p.prod_list_price) COUNT,
REGR_AVGX(s.quantity_sold, p.prod_list_price) AVGLISTP,
REGR_AVGY(s.quantity_sold, p.prod_list_price) AVGQSOLD
FROM sales s, products p
WHERE s.prod_id=p.prod_id
AND p.prod_category='Men' AND s.time_id=to_DATE('10-OCT-2000')
GROUP BY s.channel_id;
C SLOPE INTCPT RSQR COUNT AVGLISTP AVGQSOLD
- --------- --------- --------- --------- --------- ---------
C -.0683687 16.627808 .05134258 20 65.495 12.15
I .0197103 14.811392 .00163149 46 51.480435 15.826087
P -.0124736 12.854546 .01703979 30 81.87 11.833333
S .00615589 13.991924 .00089844 83 69.813253 14.421687
T -.0041131 5.2271721 .00813224 27 82.244444 4.8888889
Inverse Percentile Functions
Using the CUME_DIST function, you can find the cumulative distribution
(percentile) of a set of values. However, the inverse operation (finding what value
computes to a certain percentile) is neither easy to do nor efficient to compute. To
overcome this difficulty, Oracle introduced the PERCENTILE_CONT and
PERCENTILE_DISC functions. These can be used both as window reporting
functions and as normal aggregate functions.
These functions need a sort specification and a parameter that takes a percentile
value between 0 and 1. The sort specification is handled by using an ORDER BY
clause with one expression. When used as a normal aggregate function, it returns a
single value for each ordered set.
The two functions are PERCENTILE_CONT, which is a continuous function computed
by interpolation, and PERCENTILE_DISC, which is a step function that assumes
discrete values. Like other aggregates, PERCENTILE_CONT and PERCENTILE_DISC
operate on a group of rows in a grouped query, but with the following differences:
They require a parameter between 0 and 1 (inclusive). A parameter specified
out of this range results in an error. This parameter should be specified as an
expression that evaluates to a constant.
They require a sort specification. This sort specification is an ORDER BY clause
with a single expression. Multiple expressions are not allowed.
Normal Aggregate Syntax
[PERCENTILE_CONT | PERCENTILE_DISC]( constant expression )
WITHIN GROUP ( ORDER BY single order by expression
[ASC|DESC] [NULLS FIRST| NULLS LAST])
Inverse Percentile Example Basis
We use the following query to return the 17 rows of data used in the examples of
this section:
SELECT cust_id, cust_credit_limit, CUME_DIST()
OVER (ORDER BY cust_credit_limit) AS CUME_DIST
FROM customers
WHERE cust_city='Marshal';
CUST_ID CUST_CREDIT_LIMIT CUME_DIST
--------- ----------------- ---------
171630 1500 .23529412
346070 1500 .23529412
420830 1500 .23529412
383450 1500 .23529412
165400 3000 .35294118
227700 3000 .35294118
28340 5000 .52941176
215240 5000 .52941176
364760 5000 .52941176
184090 7000 .70588235
370990 7000 .70588235
408370 7000 .70588235
121790 9000 .76470588
22110 11000 .94117647
246390 11000 .94117647
40800 11000 .94117647
464440 15000 1
PERCENTILE_DISC(x) is computed by scanning up the CUME_DIST values in each
group until you find the first one greater than or equal to x, where x is the specified
percentile value. In the example query, PERCENTILE_DISC(0.5) returns 5,000, as the
following illustrates:
SELECT PERCENTILE_DISC(0.5) WITHIN GROUP
(ORDER BY cust_credit_limit) AS perc_disc,
PERCENTILE_CONT(0.5) WITHIN GROUP
(ORDER BY cust_credit_limit) AS perc_cont
FROM customers WHERE cust_city='Marshal';
PERC_DISC PERC_CONT
--------- ---------
5000 5000
The result of PERCENTILE_CONT is computed by linear interpolation between rows
after ordering them. To compute PERCENTILE_CONT(x), we first compute the row
number RN = (1 + x*(n-1)), where n is the number of rows in the group and x is the
specified percentile value. The final result of the aggregate function is computed by
linear interpolation between the values from rows at row numbers CRN =
CEIL(RN) and FRN = FLOOR(RN).
The final result will be: PERCENTILE_CONT(X) = if (CRN = FRN = RN), then
(value of expression from row at RN) else (CRN - RN) * (value of expression for row
at FRN) + (RN -FRN) * (value of expression for row at CRN).
Consider the previous example query, where we compute PERCENTILE_
CONT(0.5). Here n is 17. The row number RN = (1 + 0.5*(n-1))= 9 for both groups.
Putting this into the formula, (FRN=CRN=9), we return the value from row 9 as the
result.
As another example, suppose you want to compute PERCENTILE_CONT(0.66). The
computed row number RN=(1 + 0.66*(n-1))= (1 + 0.66*16)= 11.67. PERCENTILE_
CONT(0.66) = (12-11.67)*(value of row 11)+(11.67-11)*(value of row 12). These results
are:
SELECT PERCENTILE_DISC(0.66) WITHIN GROUP
(ORDER BY cust_credit_limit) AS perc_disc,
PERCENTILE_CONT(0.66) WITHIN GROUP
(ORDER BY cust_credit_limit) AS perc_cont
FROM customers WHERE cust_city='Marshal';
PERC_DISC PERC_CONT
--------- ---------
7000 7000
Inverse percentile aggregate functions can appear in the HAVING clause of a query
like other existing aggregate functions.
As Reporting Aggregates
You can also use the aggregate functions PERCENTILE_CONT, PERCENTILE_DISC
as reporting aggregate functions. When used as reporting aggregate functions, the
syntax is similar to those of other reporting aggregates.
[PERCENTILE_CONT | PERCENTILE_DISC](constant expression)
WITHIN GROUP ( ORDER BY single order by expression
[ASC|DESC] [NULLS FIRST| NULLS LAST])
OVER ( [PARTITION BY value expression [,...]] )
This query computes the same thing (median credit limit for customers in this result
set), but reports the result for every row in the result set, as shown in the following
output:
SELECT cust_id, cust_credit_limit,
PERCENTILE_DISC(0.5) WITHIN GROUP
(ORDER BY cust_credit_limit) OVER () AS perc_disc,
PERCENTILE_CONT(0.5) WITHIN GROUP
(ORDER BY cust_credit_limit) OVER () AS perc_cont
FROM customers WHERE cust_city='Marshal';
CUST_ID CUST_CREDIT_LIMIT PERC_DISC PERC_CONT
--------- ----------------- --------- ---------
171630 1500 5000 5000
346070 1500 5000 5000
420830 1500 5000 5000
383450 1500 5000 5000
165400 3000 5000 5000
227700 3000 5000 5000
28340 5000 5000 5000
215240 5000 5000 5000
364760 5000 5000 5000
184090 7000 5000 5000
370990 7000 5000 5000
408370 7000 5000 5000
121790 9000 5000 5000
22110 11000 5000 5000
246390 11000 5000 5000
40800 11000 5000 5000
464440 15000 5000 5000
Inverse Percentile Restrictions
For PERCENTILE_DISC, the expression in the ORDER BY clause can be of any data
type that you can sort (numeric, string, date, and so on). However, for
PERCENTILE_CONT the expression in the ORDER BY clause must be a numeric or
datetime type (including intervals) because linear interpolation is used to evaluate
PERCENTILE_CONT. If the expression is of type DATE, the interpolated result is
rounded to the smallest unit for the type: for a DATE type, the interpolated value is
rounded to the nearest second; for interval types, to the nearest second (INTERVAL
DAY TO SECOND) or to the month (INTERVAL YEAR TO MONTH).
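For instance, a minimal sketch over a DATE column, using the sales table from the
earlier examples (the interpolated median date is rounded to the nearest second):
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY time_id) AS median_sale_date
FROM sales;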
Like other aggregates, the inverse percentile functions ignore NULLs in evaluating
the result. For example, when you want to find the median value in a set, Oracle
ignores the NULLs and finds the median among the non-null values. You can use
the NULLS FIRST/NULLS LAST option in the ORDER BY clause, but they will be
ignored as NULLs are ignored.
Hypothetical Rank and Distribution Functions
These functions provide functionality useful for what-if analysis. As an example,
what would be the rank of a row, if the row was hypothetically inserted into a set of
other rows?
This family of aggregates takes one or more arguments of a hypothetical row and an
ordered group of rows, returning the RANK, DENSE_RANK, PERCENT_RANK or
CUME_DIST of the row as if it was hypothetically inserted into the group.
Hypothetical Rank and Distribution Syntax
[RANK | DENSE_RANK | PERCENT_RANK | CUME_DIST]( constant
expression [, ...] )
WITHIN GROUP ( ORDER BY order by expression [ASC|DESC] [NULLS
FIRST|NULLS
LAST][, ...] )
Here, constant expression refers to an expression that evaluates to a constant,
and more than one such expression may be passed as an argument to
the function. The ORDER BY clause can contain one or more expressions that define
the sorting order on which the ranking will be based. The ASC, DESC, NULLS FIRST, and
NULLS LAST options are available for each expression in the ORDER BY.
Example 19-13 Hypothetical Rank and Distribution Example 1
Using the list price data from the products table used throughout this section, you
can calculate the RANK, PERCENT_RANK and CUME_DIST for a hypothetical sweater
with a price of $50 for how it fits within each of the sweater subcategories. The
query and results are:
SELECT prod_subcategory,
RANK(50) WITHIN GROUP (ORDER BY prod_list_price DESC) AS HRANK,
TO_CHAR(PERCENT_RANK(50) WITHIN GROUP
(ORDER BY prod_list_price),'9.999') AS HPERC_RANK,
TO_CHAR(CUME_DIST (50) WITHIN GROUP
(ORDER BY prod_list_price),'9.999') AS HCUME_DIST
FROM products
WHERE prod_subcategory LIKE 'Sweater%'
GROUP BY prod_subcategory;
PROD_SUBCATEGORY HRANK HPERC_RANK HCUME_DIST
---------------- ----- ---------- ----------
Sweaters - Boys 16 .911 .912
Sweaters - Girls 1 1.000 1.000
Sweaters - Men 240 .351 .352
Sweaters - Women 21 .783 .785
Unlike the inverse percentile aggregates, the ORDER BY clause in the sort
specification for hypothetical rank and distribution functions may take multiple
expressions. The number of arguments and the expressions in the ORDER BY clause
should be the same and the arguments must be constant expressions of the same or
compatible type to the corresponding ORDER BY expression. The following is an
example using two arguments in several hypothetical ranking functions.
Example 19-14 Hypothetical Rank and Distribution Example 2
SELECT prod_subcategory,
RANK(45,30) WITHIN GROUP (ORDER BY prod_list_price
DESC,prod_min_price) AS
HRANK,
TO_CHAR(PERCENT_RANK(45,30) WITHIN GROUP
(ORDER BY prod_list_price, prod_min_price),'9.999') AS
HPERC_RANK,
TO_CHAR(CUME_DIST (45,30) WITHIN GROUP
(ORDER BY prod_list_price, prod_min_price),'9.999') AS HCUME_DIST
FROM products
WHERE prod_subcategory
LIKE 'Sweater%'
GROUP BY prod_subcategory;
PROD_SUBCATEGORY HRANK HPERC_RANK HCUME_DIST
---------------- ----- ---------- ----------
Sweaters - Boys 21 .858 .859
Sweaters - Girls 1 1.000 1.000
Sweaters - Men 340 .079 .081
Sweaters - Women 72 .228 .237
These functions can appear in the HAVING clause of a query just like other
aggregate functions. They cannot be used as either reporting aggregate functions or
windowing aggregate functions.
17.7.30 WIDTH_BUCKET Function
For a given expression, the WIDTH_BUCKET function returns the bucket number
that the result of this expression will be assigned after it is evaluated. You can
generate equiwidth histograms with this function. Equiwidth histograms divide
data sets into buckets whose interval size (highest value to lowest value) is equal.
The number of rows held by each bucket will vary. A related function, NTILE,
creates equiheight buckets.
Equiwidth histograms can be generated only for numeric, date or datetime types.
So the first three parameters should be all numeric expressions or all date
expressions. Other types of expressions are not allowed. If the first parameter is
NULL, the result is NULL. If the second or the third parameter is NULL, an error
message is returned, as a NULL value cannot denote any end point (or any point) for
a range in a date or numeric value dimension. The last parameter (number of
buckets) should be a numeric expression that evaluates to a positive integer value;
0, NULL, or a negative value will result in an error.
Buckets are numbered from 0 to (n+1). Bucket 0 holds the count of values less than
the minimum. Bucket(n+1) holds the count of values greater than or equal to the
maximum specified value.
WIDTH_BUCKET Syntax
The WIDTH_BUCKET takes four expressions as parameters. The first parameter is the
expression that the equiwidth histogram is for. The second and third parameters are
expressions that denote the end points of the acceptable range for the first
parameter. The fourth parameter denotes the number of buckets.
WIDTH_BUCKET(expression, minval expression, maxval expression,
num buckets)
WIDTH_BUCKET Function
SQL for Analysis in Data Warehouses 19-41
Consider the following data from table customers, which shows the credit limits of
17 customers. This data is gathered in the query shown in Example 19-15.
CUST_ID CUST_CREDIT_LIMIT
-------- -----------------
22110 11000
28340 5000
40800 11000
121790 9000
165400 3000
171630 1500
184090 7000
215240 5000
227700 3000
246390 11000
346070 1500
364760 5000
370990 7000
383450 1500
408370 7000
420830 1500
464440 15000
In the table customers, the column cust_credit_limit contains values between
1500 and 15000, and we can assign the values to four equiwidth buckets, numbered
from 1 to 4, by using WIDTH_BUCKET (cust_credit_limit, 0, 20000, 4).
Ideally each bucket is a closed-open interval of the real number line, for example,
bucket number 2 is assigned to scores between 5000.0000 and 9999.9999...,
sometimes denoted [5000, 10000) to indicate that 5,000 is included in the interval
and 10,000 is excluded. To accommodate values outside the range [0, 20,000), values
less than 0 are assigned to a designated underflow bucket which is numbered 0, and
values greater than or equal to 20,000 are assigned to a designated overflow bucket
which is numbered 5 (num buckets + 1 in general). See Figure 19-3 in the Oracle9i
Data Warehousing Guide for a graphical illustration of how the buckets are assigned.
You can specify the bounds in the reverse order, for example, WIDTH_BUCKET
(cust_credit_limit, 20000, 0, 4). When the bounds are reversed, the buckets
will be open-closed intervals. In this example, bucket number 1 is (15000,20000],
bucket number 2 is (10000,15000], and bucket number 4 is (0,5000]. The
overflow bucket will be numbered 0 (20000, +infinity), and the underflow
bucket will be numbered 5 (-infinity, 0].
It is an error if the bucket count parameter is 0 or negative.
Example 19-15 WIDTH_BUCKET
The following query shows the bucket numbers for the credit limits in the
customers table for both cases where the boundaries are specified in regular or
reverse order. We use a range of 0 to 20,000.
SELECT cust_id, cust_credit_limit,
WIDTH_BUCKET(cust_credit_limit,0,20000,4) AS WIDTH_BUCKET_UP,
WIDTH_BUCKET(cust_credit_limit,20000, 0, 4) AS WIDTH_BUCKET_DOWN
FROM customers WHERE cust_city = 'Marshal';
CUST_ID CUST_CREDIT_LIMIT WIDTH_BUCKET_UP WIDTH_BUCKET_DOWN
------- ----------------- --------------- -----------------
22110 11000 3 2
28340 5000 2 4
40800 11000 3 2
121790 9000 2 3
165400 3000 1 4
171630 1500 1 4
184090 7000 2 3
215240 5000 2 4
227700 3000 1 4
246390 11000 3 2
346070 1500 1 4
364760 5000 2 4
370990 7000 2 3
383450 1500 1 4
408370 7000 2 3
420830 1500 1 4
464440 15000 4 2
User-Defined Aggregate Functions
Oracle offers a facility for creating your own functions, called user-defined
aggregate functions. These functions are written in programming languages such as
PL/SQL, Java, and C, and can be used as analytic functions or aggregates in
materialized views.
The advantages of these functions are:
Highly complex functions can be programmed using a fully procedural
language.
Higher scalability than other techniques when user-defined functions are
programmed for parallel processing.
Object datatypes can be processed.
As a simple example of a user-defined aggregate function, consider the skew
statistic. This calculation measures if a data set has a lopsided distribution about its
mean. It will tell you if one tail of the distribution is significantly larger than the
other. If you created a user-defined aggregate called USERDEF_SKEW and applied it
to the credit limit data in the prior example, the SQL statement and results might
look like this:
SELECT USERDEF_SKEW(cust_credit_limit)
FROM customers WHERE cust_city='Marshal';
USERDEF_SKEW
============
0.583891
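The skew calculation itself is beyond the scope of this note, but a minimal sketch of
the ODCIAggregate interface that such functions are built on, shown here for a much
simpler sum-of-squares aggregate (all names are illustrative, not Oracle-supplied):
CREATE OR REPLACE TYPE SumSquaresImpl AS OBJECT (
  running_total NUMBER,
  STATIC FUNCTION ODCIAggregateInitialize (sctx IN OUT SumSquaresImpl)
    RETURN NUMBER,
  MEMBER FUNCTION ODCIAggregateIterate (self IN OUT SumSquaresImpl,
    value IN NUMBER) RETURN NUMBER,
  MEMBER FUNCTION ODCIAggregateMerge (self IN OUT SumSquaresImpl,
    ctx2 IN SumSquaresImpl) RETURN NUMBER,
  MEMBER FUNCTION ODCIAggregateTerminate (self IN SumSquaresImpl,
    returnValue OUT NUMBER, flags IN NUMBER) RETURN NUMBER
);
/
CREATE OR REPLACE TYPE BODY SumSquaresImpl IS
  STATIC FUNCTION ODCIAggregateInitialize (sctx IN OUT SumSquaresImpl)
    RETURN NUMBER IS
  BEGIN
    sctx := SumSquaresImpl(0);                 -- start with an empty running total
    RETURN ODCIConst.Success;
  END;
  MEMBER FUNCTION ODCIAggregateIterate (self IN OUT SumSquaresImpl,
    value IN NUMBER) RETURN NUMBER IS
  BEGIN
    self.running_total := self.running_total + value * value;   -- add each input row
    RETURN ODCIConst.Success;
  END;
  MEMBER FUNCTION ODCIAggregateMerge (self IN OUT SumSquaresImpl,
    ctx2 IN SumSquaresImpl) RETURN NUMBER IS
  BEGIN
    self.running_total := self.running_total + ctx2.running_total;  -- parallel merge
    RETURN ODCIConst.Success;
  END;
  MEMBER FUNCTION ODCIAggregateTerminate (self IN SumSquaresImpl,
    returnValue OUT NUMBER, flags IN NUMBER) RETURN NUMBER IS
  BEGIN
    returnValue := self.running_total;         -- final aggregate value
    RETURN ODCIConst.Success;
  END;
END;
/
CREATE OR REPLACE FUNCTION sum_squares (input NUMBER) RETURN NUMBER
  PARALLEL_ENABLE AGGREGATE USING SumSquaresImpl;
/
-- Usage, for example: SELECT sum_squares(cust_credit_limit) FROM customers;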
17.8 Restrictions
You cannot specify any analytic function in any part of the analytic_clause.
That is, you cannot nest analytic functions. However, you can specify an analytic
function in a subquery and compute another analytic function over it.
17.9 Features by Release
Introduced in Oracle 8.1.6
17.10 Bugs by Release