Wait Events 2
INTRODUCTION
In early 2002 Roger Schrag published a paper entitled "Interpreting Wait Events to Boost
System Performance." That paper began with a definition of wait events and techniques for
collecting wait event information, and then went on to discuss four examples of how wait event
data had been used to solve real-life performance problems. The impetus for writing that paper
had been that, while many resources were available that listed the wait event v$ views, we
were not aware of any publication that used real-life examples to demonstrate how to use the
wait event interface to solve problems or boost system performance.
As we write this paper in early 2003, we still are not aware of any such publications. Now
Roger Schrag has teamed up with fellow Oracle DBA Terry Sutton to document more examples
of how Oracle's wait event interface can be used to solve real-world problems. In essence, this
paper is a sequel to last year's paper. If you haven't read the first one, you may view it at
http://www.dbspecialists.com/presentations.html. You can read the two papers in either order.
In this paper, we will once again define what wait events are and how to collect wait event
information. If you have read the previous paper or are already familiar with the wait event
interface, we still encourage you to read this section as we've added some enhancements and
new details regarding Oracle 9i. In the second part of this paper we'll move on to new
examples of how wait event information paved the way to solving real-world performance
problems. Our feeling is that the more examples of wait event usage you are exposed to, the
better position you'll be in to put the wait event interface to work for you when you most need
it.
The wait event interface appeared in the first production release of Oracle 7. For many years it
remained undocumented, and few people outside the kernel group at Oracle Corporation knew
of its existence. By Oracle 7.3 the word was out, and the documentation that comes with Oracle
8.0 and later gives a pretty good explanation of the wait event interface. In Oracle 9i, wait
event tracing has been made much easier with the inclusion of the DBMS_SUPPORT package
in the kernel, and with the new wait events reporting feature of TKPROF.
The basic idea behind the wait event interface hasn't changed since its inception, but Oracle
continues to add new wait events as they add new functionality to the Oracle kernel. Also,
newer enhancements make the wait event interface easier to use.
WHAT ARE WAIT EVENTS?
In this section we will explain what wait events are and why the data provided by the wait
event interface can be helpful.
At any given moment, every Oracle process is either busy servicing a request or waiting for
something specific to happen. By busy we mean that the process wishes to use the CPU. For
example, a dedicated server process might be searching the buffer cache to see if a certain
data block is in memory. This process would be said to be busy and not waiting. The ARC0
process, meanwhile, might be waiting for LGWR to signal that an online redo log needs
archiving. In this case, the ARC0 process is waiting.
The kernel developers at Oracle have defined a list of every possible event that an Oracle
process could be waiting for. In Oracle 8.0 there were 158 such wait events. In Oracle 9i
Release 2 there are 363. At any moment, every Oracle process that is not busy is waiting for
one of these events to occur. Suppose an application has submitted a COMMIT statement and
the server process is waiting for the LGWR process to signal that the redo log buffer has been
flushed to disk. This wait event is called log file sync. Another dedicated server process
might be waiting for a row-level lock on the INVOICES table to be freed so that a SELECT FOR
UPDATE statement can continue. That wait event is called enqueue.
It is very beneficial that Oracle is diligent about tracking wait event information and making it
available to DBAs. We call this the wait event interface. By querying v$ views, we can see
what events processes are waiting on to an amazing level of detail. For example, we might
learn that a dedicated server process has been waiting 30 milliseconds for the operating
system to read eight blocks from data file 42, starting at block 18042. We can also see
summary information of how much time each Oracle process has spent waiting on each type of
wait event for the duration of the process. In addition, we can direct an Oracle process to write
detailed wait event data to a trace file for later analysis using TKPROF.
Using the wait event interface, you can get insights into where time is being spent. If a report
takes four hours to complete, for example, the wait event interface will tell you how much of
that four hours was spent waiting for disk reads caused by full table scans, disk reads caused
by index lookups, latch contention, and so on.
The wait event interface provides much more information to work with than cache hit ratios do.
The wait event interface gives both breadth and depth in the information it provides. You get
data that can touch upon so many different areas of your database such as disk I/O, latches,
parallel processing, network traffic, checkpoints, and row-level locking. At the same time, you
get incredible detail such as the file number and block number of a block being read from disk,
or the name of a latch being waited on along with the number of retries.
-- 2 --
The wait event interface will not always give you all of the information you need in order to
diagnose and solve a problem, but it will certainly point you in the right direction. You might
think the buffer cache is too small because the cache hit ratio is only 70%, but in fact, the
application's slow response time could be caused by latch contention in the shared pool, a
bottleneck in the log writer, or any of a number of other things.
Although there are many different types of wait events, the majority of them come up very
infrequently or tend not to be significant. In practice, only a few dozen wait events tend to be of
interest to most DBAs. The rest are rather obscure, pertain to Oracle features not in use, or
occur so infrequently that you don't need to worry about them. You'll see different wait events
surfacing in different environments based on which Oracle features have been implemented
and which capabilities of the database are being taxed the most.
For example, the PX wait events won't appear if you aren't using parallel query, and the virtual
circuit status wait event won't appear if you are not using the multi-threaded server
architecture (or shared server architecture as it is more recently called). Along those lines, the
log file sync and enqueue wait events probably won't be prevalent in a primarily read-only
system.
Here are some of the most common wait events and what they mean:
db file sequential read
    The session has issued an I/O request to read one block from a data file
    into the buffer cache and is waiting for the operation to complete. This
    typically happens during an index lookup or a fetch from a table by
    ROWID when the required data block is not already in memory.

direct path read, direct path write
    The session has issued asynchronous I/O requests that bypass the buffer
    cache and is waiting for them to complete. These wait events often
    involve temporary segments.

enqueue
    The session is waiting on an enqueue (a lock you can see in v$lock).
    This commonly occurs when one user is trying to update a row in a table
    that is currently being updated by another user.

free buffer waits
    The session needs a free buffer so it can bring a data block into the
    buffer cache and is waiting for a buffer that is not dirty to become
    available. This can occur if DBWR is not writing dirty buffers to disk
    fast enough.

latch free
    The session is waiting for a latch held by another session. (This event
    does not apply to processes that are spinning while waiting for a latch;
    when a process is spinning, it is not waiting.)

library cache load lock
    The session is waiting for the opportunity to load an object or a piece
    of an object into the library cache. (Only one process can load an
    object or a piece of an object at a time.)

library cache pin
    The session wants to pin an object in memory in the library cache for
    examination, ensuring no other processes can update the object at the
    same time. This happens when you are compiling or parsing a PL/SQL
    object or a view.

log buffer space
    The session is waiting for space in the log buffer. (Space becomes
    available only after LGWR has written the current contents of the log
    buffer to disk.) This typically happens when applications generate redo
    faster than LGWR can write it to disk.

log file parallel write
    The session is waiting for blocks to be written to all online redo log
    members in one group. LGWR is typically the only process to see this
    wait event. It will wait until all blocks have been written to all
    members.

log file sequential read
    The session is waiting for blocks to be read from the online redo log
    into memory. This primarily occurs at instance startup and when the
    ARCH process archives filled online redo logs.

log file switch completion
    The session is waiting for a log file switch to complete, typically so
    more redo can be generated.

log file sync
    The session is waiting for LGWR to finish flushing the log buffer to
    disk. This occurs when a user commits a transaction. (A transaction is
    not considered committed until all of the redo to recover the
    transaction has been successfully written to disk.)

undo segment extension
    The session is waiting for an undo segment to be extended or shrunk.

write complete waits
    The session is waiting for a requested buffer to be written to disk; the
    buffer cannot be used while it is being written.
There are several wait events that we call idle events because each of these wait events
typically occurs when the Oracle process has nothing to do and is waiting for somebody to give
it a task. Idle events are usually not very interesting from a tuning standpoint, so we usually
overlook them when evaluating data extracted from the wait event interface. The common idle
events are as follows:
Idle Events

client message                        PX Deq: Execute Reply
dispatcher timer                      PX Deq: Execution Msg
gcs for action                        PX Deq: Signal ACK
gcs remote message                    PX Deq: Table Q Normal
ges remote message                    PX Deque wait
i/o slave wait                        PX Idle Wait
jobq slave wait                       queue messages
lock manager wait for remote message  rdbms ipc message
null event                            slave wait
parallel query dequeue                smon timer
pipe get                              SQL*Net message from client
PL/SQL lock timer                     SQL*Net message to client
pmon timer                            SQL*Net more data from client
PX Deq Credit: need buffer            virtual circuit status
PX Deq Credit: send blkd              wakeup time manager
If an Oracle process has work to do but must wait for something to happen before it can
continue, then the process will be waiting on a non-idle wait event. If a process has nothing to
do, it will be waiting on an idle wait event. So what happens if a process has work to do and is
busy doing it? When a process is busy, there will be no information in the wait event interface
since the process is not waiting.
When we look at the wait event information extracted from an Oracle instance, we see detailed
information about how many times and how much time was spent waiting for specific events to
occur. But we do not see anything about the time periods in which the process requested use of
the CPU. This means that the wait event interface is not able to provide information about the
following:
Time spent waiting for a CPU to become available
Time spent waiting for requested memory to be swapped back in to physical memory
This is important to keep in mind because there is an easy trap to fall into. You could be
troubleshooting a very slow SELECT statement and learn from the wait event interface that the
process does not have significant wait events when running the query. This could lead you to
think that the statement is optimal, and that it just takes a long time to run. In fact, however,
the query might be performing huge numbers of logical reads and the number of buffer gets
could be reduced by rewriting the query.
When Oracle needs to access a data block and the block is already in the buffer cache, a logical
read occurs with no physical read. The process is able to read the block without the occurrence
of any wait events. Large amounts of CPU time could be consumed on significant numbers of
logical reads, and the wait event interface will have nothing to show for the elapsed time.
Statement parsing and spinning while waiting for a latch to become available are two other
examples of activities not accounted for by the wait event interface. An Oracle process uses
CPU time while parsing a statement or spinning on a busy latch; since no waiting is involved,
the wait event interface does not have anything to report.
Oracle Corporation has done something interesting with Statspack recently. Beginning in
Oracle 9i, the portion of the Statspack report that lists wait event information also lists CPU
usage. This is very helpful information. It allows us to easily compare time spent waiting to
time spent processing, so we know where to focus our tuning efforts. However, it should be
pointed out that the CPU usage information in this section of the Statspack report does not
come from the wait event interface. Instead, Statspack merges data collected from the wait
event interface with CPU usage information collected from the v$sysstat dynamic performance
view.
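If you want to make a similar comparison yourself, a query along the following lines (a sketch; the statistic name is the standard v$sysstat entry, which is reported in centiseconds even though its name says "this session") retrieves the instance-wide CPU figure that Statspack uses:

-- A sketch: instance-wide CPU usage from v$sysstat, converted to seconds
SELECT name, value / 100 AS cpu_seconds
FROM   v$sysstat
WHERE  name = 'CPU used by this session';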
The Oracle kernel is capable of timing many activities including wait events. In Oracle 8i and
earlier, timed statistics can be tracked to a resolution of one centisecond (0.01 second) but are
disabled by default. Beginning in Oracle 9i Release 2 timed statistics are collected by default,
and some timings are collected in units of microseconds. The timed_statistics instance
parameter is used to enable and disable timed statistics collection. When timed_statistics is set
to FALSE, all times in the wait event interface will appear as zero. You may enable timed
statistics collection at the instance or the session level with the following commands:
ALTER SYSTEM SET timed_statistics = TRUE;
ALTER SESSION SET timed_statistics = TRUE;
You may enable timed statistics at the instance level on the next and all subsequent instance
restarts by adding the following line to the instance parameter file:
timed_statistics=true
In practice, the overhead of collecting timed statistics is extremely small. In most cases, the
benefit you'll get from having timing information at your disposal will outweigh the
performance overhead. For several years, many DBAs have been running their production
systems with timed statistics enabled at all times. With Oracle 9i Release 2, this is now the
default behavior.
The wait event interface consists of four dynamic performance views (also known as v$
views) and a tracing facility. Any user with the SELECT ANY TABLE system privilege or the
SELECT_CATALOG_ROLE role can query the v$ views. Only users who can connect to the
database as SYSDBA or execute the DBMS_SUPPORT or DBMS_SYSTEM packages can activate
wait event tracing in other database sessions. In a typical environment, DBAs have access to
the wait event interface but general users do not.
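For example, a DBA might provision a dedicated tuning account along these lines (a sketch; the user name is hypothetical, and the grants must be made by SYS or an equally privileged user):

-- Hypothetical tuning account: grant only what is needed
GRANT SELECT_CATALOG_ROLE TO perf_analyst;
GRANT EXECUTE ON SYS.DBMS_SUPPORT TO perf_analyst;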
The v$system_event and v$session_event views provide cumulative wait event information for
the instance as a whole and for each process, respectively. The v$session_wait view provides
detailed information about the active or most recent wait event for each process. The contents
of the v$event_name view, meanwhile, do not change. This view lists all wait events built into
the Oracle kernel and the parameters for each one.
In addition to the four dynamic performance views, Oracle provides a tracing facility where
extremely detailed wait event information is written to a trace file for later evaluation. In more
recent releases of Oracle, a PL/SQL package is provided for activating the wait event tracing
facility. In older releases of Oracle, the wait event tracing facility is activated by setting debug
event number 10046 (not to be confused with a wait event).
The v$system_event view shows one row for each wait event name, along with the total number
of times a process has waited for this event, the number of timeouts, the total amount of time
waited, and the average wait time. All of these figures are cumulative for all Oracle processes
since the instance started. Wait events that have not occurred at least once since instance
startup do not appear in this view.
The v$system_event view has the following columns:
Name                                Null?    Type
----------------------------------- -------- ------------
EVENT                                        VARCHAR2(64)
TOTAL_WAITS                                  NUMBER
TOTAL_TIMEOUTS                               NUMBER
TIME_WAITED                                  NUMBER
AVERAGE_WAIT                                 NUMBER
TIME_WAITED_MICRO                            NUMBER
EVENT is the name of the wait event. You can see a list of all wait event names known to your
Oracle kernel with the following query:
SELECT name FROM v$event_name;
TOTAL_WAITS is the total number of times a process has waited for this event since instance
startup. This includes daemon processes like PMON and SMON, in addition to dedicated server
and shared server processes. It also includes processes from database sessions that have
subsequently ended.
TOTAL_TIMEOUTS is the total number of times a process encountered a timeout while waiting
for an event. When a process begins to wait for an event, it specifies a timeout period after
which the operating system should wake it up if the event has not yet transpired. For example,
when an Oracle process issues an I/O request to read a block from a data file (the db file
sequential read wait event), the process sets a timeout of one second. Usually the I/O request
will complete in less than one second and no timeout will occur. But if the read should take
longer than one second for whatever reason, a timeout will occur and the process will wake up.
The process might do some minor housekeeping, but it will likely just begin another timeout
period of one second and continue waiting for the same event.
TIME_WAITED and AVERAGE_WAIT show the cumulative and average time spent by processes
waiting for this event, in centiseconds. Divide these figures by 100 in order to get the wait time
in seconds. These two columns will show as zero if timed statistics are not enabled.
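A quick way to survey an instance is to list the biggest waits first. The query below is a sketch; the exclusion list is deliberately abbreviated and would normally name all of the idle events listed earlier:

-- List the largest cumulative waits since instance startup
-- (idle-event exclusion list abbreviated for brevity)
SELECT   event, total_waits, time_waited
FROM     v$system_event
WHERE    event NOT IN ('SQL*Net message from client', 'smon timer',
                       'pmon timer', 'rdbms ipc message')
ORDER BY time_waited DESC;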
Consider the following example:

SQL> SELECT event, time_waited
  2  FROM   v$system_event
  3  WHERE  event IN ('SQL*Net message from client', 'smon timer',
  4                   'db file sequential read', 'log file parallel write');

EVENT                                TIME_WAITED
------------------------------------ -----------
log file parallel write                   159692
db file sequential read                    28657
smon timer                             130673837
SQL*Net message from client             16528989

SQL>
Since instance startup, processes on this system have waited a total of 286.57 seconds while
reading single data file blocks from disk, and over 1,596 seconds (26 minutes) while writing
redo to the online redo logs. A huge amount of time has been spent waiting on the smon timer
and SQL*Net message from client events, but these are both idle wait events so we don't worry
about them. (The SMON process spends a lot of time sleeping between consecutive checks of
the system, and many dedicated server processes spend a lot of their time waiting for the
application to submit a SQL statement for processing.)
The v$session_event view is a lot like the v$system_event view, except that it shows separate
rows of information for each Oracle process. As with v$system_event, event names do not
appear in this view if the process has not waited for them at least once. Also, when an Oracle
process terminates (as in the case of when a user logs off the database) all of the rows in
v$session_event for that process permanently disappear.
The v$session_event view has the following columns:

Name                                Null?    Type
----------------------------------- -------- ------------
SID                                          NUMBER
EVENT                                        VARCHAR2(64)
TOTAL_WAITS                                  NUMBER
TOTAL_TIMEOUTS                               NUMBER
TIME_WAITED                                  NUMBER
AVERAGE_WAIT                                 NUMBER
MAX_WAIT                                     NUMBER
TIME_WAITED_MICRO                            NUMBER
SID indicates the session ID of the process waiting for the event. You can query v$session in
order to determine the SID of the session whose wait events you want to investigate. The next
five columns in the v$session_event view are the same as in the v$system_event view, except
that now they pertain to the one specific process instead of all processes.
MAX_WAIT indicates the maximum amount of time the process had to wait for the event. Like
TIME_WAITED and AVERAGE_WAIT, the unit of measure is centiseconds and will display as
zero if timed statistics are not enabled.
Consider the following query which displays all wait event information for the current
SQL*Plus session:
SQL> SELECT event, total_waits, time_waited_micro, max_wait
  2  FROM   v$session_event
  3  WHERE  sid =
  4         (SELECT sid FROM v$session
  5          WHERE  audsid = USERENV('sessionid'));

EVENT                           TOTAL_WAITS TIME_WAITED_MICRO   MAX_WAIT
------------------------------- ----------- ----------------- ----------
log file sync                             1             19629          2
db file sequential read                   3             26128          1
db file scattered read                  678           1154632          8
SQL*Net message to client                76               352          0
SQL*Net more data to client               1                63          0
SQL*Net message from client              75        1118656279      26930
SQL*Net break/reset to client             4             22281          2

SQL>
You can see that the Oracle process serving this session has spent 1.180760 seconds waiting
for disk I/O. Although there have been 76 instances where the process waited for the
networking software to allow it to send a message to the client (the SQL*Plus program), each
of these waits was shorter than the 0.01 second resolution of the MAX_WAIT column in
v$session_event. (All 76 waits combined added up to only 0.000352 seconds.) Far and away, the
Oracle process has spent the vast majority of its time (over 18 minutes) waiting for a SQL
statement to be entered at the SQL*Plus prompt.
If you run the query against v$session_event shown above and get back no rows, it is possible
that you are encountering Oracle bug number 2429929. In some releases of Oracle 9i,
including 9.2.0.1, the SID column in v$session_event actually shows the SID minus one instead
of the SID.
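If you suspect you are hitting this bug, a workaround sketch is to offset the SID by one in the subquery (only on affected releases, and only after verifying the results make sense):

-- Workaround sketch for bug 2429929: v$session_event reports SID minus one
SELECT event, total_waits, time_waited_micro
FROM   v$session_event
WHERE  sid = (SELECT sid - 1
              FROM   v$session
              WHERE  audsid = USERENV('sessionid'));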
The v$event_name view has the following columns:

Name                                Null?    Type
----------------------------------- -------- ------------
EVENT#                                       NUMBER
NAME                                         VARCHAR2(64)
PARAMETER1                                   VARCHAR2(64)
PARAMETER2                                   VARCHAR2(64)
PARAMETER3                                   VARCHAR2(64)
EVENT# indicates a unique number assigned to each wait event name. These numbers
frequently change from one release of Oracle to the next, so it is better to reference wait
events by their names and not their numbers. Wait event names appear to stay consistent
across Oracle releases.
The v$system_event and v$session_event views show cumulative information about past waits
in summary form, leaving out parameter information from each individual wait. As we will see
in the next sections, wait event parameters come into play in the v$session_wait view and wait
event trace files.
The v$session_wait view shows one row for each Oracle process. The row indicates the name of
the wait event and additional parameters that provide further information about exactly what
the process is waiting for (or information about the most recent wait event if the process is not
currently waiting). While the v$system_event and v$session_event views show cumulative wait
event information, the v$session_wait view shows information as of the present moment only.
The v$session_wait view has the following columns:

Name                                Null?    Type
----------------------------------- -------- ------------
SID                                          NUMBER
SEQ#                                         NUMBER
EVENT                                        VARCHAR2(64)
P1TEXT                                       VARCHAR2(64)
P1                                           NUMBER
P1RAW                                        RAW(8)
P2TEXT                                       VARCHAR2(64)
P2                                           NUMBER
P2RAW                                        RAW(8)
P3TEXT                                       VARCHAR2(64)
P3                                           NUMBER
P3RAW                                        RAW(8)
WAIT_TIME                                    NUMBER
SECONDS_IN_WAIT                              NUMBER
STATE                                        VARCHAR2(19)
SID indicates the process. SEQ# is a sequentially increasing number that starts at one for each
process and increments each time the process begins a new wait.
The STATE column indicates how we should interpret the data in this row of the view. If the
value in the STATE column is WAITING, then the process is currently waiting for an event. In
this case, we can see information about the event and how long the process has been waiting
so far. Otherwise, the process is currently not waiting, but we can see information about the
last event that the process waited for.
EVENT is the name of a wait event. P1TEXT is the name of a parameter for the wait event, P1
is the value of the parameter, and P1RAW is the value in binary form. The P2 and P3 columns
provide additional parameter information.
When the value in the STATE column is WAITING, the value in the WAIT_TIME column will be
zero and SECONDS_IN_WAIT will show the number of seconds the process has been waiting
for the event thus far. Note that SECONDS_IN_WAIT shows the time in seconds, not
centiseconds or microseconds.
When the value in the STATE column is WAITED KNOWN TIME, WAIT_TIME will show the
length of the last wait (in centiseconds) and SECONDS_IN_WAIT will not be relevant. (It
appears to be the number of seconds since the last wait began, but this is not clear.) The STATE
could also be WAITED UNKNOWN TIME or WAITED SHORT TIME, the latter indicating that
the last wait was less than one centisecond in duration.
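Putting the STATE column to work, a query along the following lines (a sketch) lists only the sessions that are waiting right now, longest waits first:

-- Sessions currently in a wait, ordered by how long they have waited so far
SELECT   sid, event, seconds_in_wait, p1, p2, p3
FROM     v$session_wait
WHERE    state = 'WAITING'
ORDER BY seconds_in_wait DESC;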
The following query shows the parameters associated with the db file scattered read wait
event:
SQL> SELECT *
  2  FROM   v$event_name
  3  WHERE  name = 'db file scattered read';

    EVENT# NAME
---------- ----------------------------------------
PARAMETER1           PARAMETER2           PARAMETER3
-------------------- -------------------- ----------
       188 db file scattered read
file#                block#               blocks

SQL>
This tells us that when a process is waiting for a multi-block read from disk to complete (as in
the case of a full table scan where the data blocks were not already in the buffer cache), we can
see the file from which the blocks are being read as well as the starting block and number of
blocks.
The following query was run while a session was performing a full table scan:
SQL> SELECT * FROM v$session_wait WHERE sid = 16;

       SID       SEQ# EVENT
---------- ---------- ----------------------------------
P1TEXT             P1 P1RAW
P2TEXT             P2 P2RAW
P3TEXT             P3 P3RAW     WAIT_TIME SECONDS_IN_WAIT
STATE
-------------------
        16        303 db file scattered read
file#              17 00000011
block#           2721 00000AA1
blocks              8 00000008         -1               0
WAITED SHORT TIME

SQL>
You can see that the process was not waiting at the moment this query was run, but its last
wait had been for an I/O request to read eight blocks from file 17 starting at block 2721. That
I/O request had completed in less than 0.01 second. (Note the -1 in the WAIT_TIME column
when the STATE is WAITED SHORT TIME.) Why did Oracle choose to read eight blocks at a
time? Because the db_file_multiblock_read_count instance parameter was set to eight.
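You can confirm the setting on your own instance with a quick check (a sketch):

-- Check the multiblock read setting for the instance
SELECT name, value
FROM   v$parameter
WHERE  name = 'db_file_multiblock_read_count';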
DBAs should be familiar with the SQL trace facility built into Oracle. By using the commands
below, you can enable SQL trace for your session or another session:
ALTER SESSION SET sql_trace = TRUE;
EXECUTE SYS.dbms_system.set_sql_trace_in_session(sid, serial#, TRUE);
When SQL trace is enabled for a session, the Oracle process writes detailed trace information
(including timing data if timed statistics are enabled) to a trace file in a directory specified by
the user_dump_dest instance parameter. These trace files are plain text files and human
readable, but rather tedious and repetitive. You can optionally run trace files through a
processor such as TKPROF instead of looking at them directly.
Oracle has the ability to direct a process to include additional information in the trace file,
including wait event information. In earlier releases of Oracle, we activated the wait event
tracing facility by setting debug event 10046. Debug event 10046 still works in Oracle 9i, but
recent releases of Oracle include a PL/SQL package built-in that gives a more friendly means
for activating wait event tracing.
Setting debug events allows a DBA to instruct an Oracle instance to take on a special, atypical
behavior. Debug events can be used, for example, to cause Oracle to write a system level dump
file whenever an ORA-600 error occurs or skip over corrupted blocks in a table when
performing a full table scan. Most debug events should never be set unless you are directed to
do so by Oracle Support. Some debug events can put your database at risk.
Debug event 10046 affects the amount of information written to trace files. It is a very safe
debug event, and one of the few that you are allowed to set without special permission from
Oracle Support. Debug event 10046 can be set to the following values:
10046 trace name context forever, level 1     Enables ordinary SQL trace

10046 trace name context forever, level 4     Enables SQL trace with bind
                                              variable values included in
                                              trace file

10046 trace name context forever, level 8     Enables SQL trace with wait
                                              event information included in
                                              trace file

10046 trace name context forever, level 12    Equivalent of level 4 and
                                              level 8 together

10046 trace name context off                  Turns off tracing
You can set the 10046 debug event to trace your session and collect wait event information in
the trace file with either of the following commands:
EXECUTE SYS.DBMS_SUPPORT.START_TRACE
ALTER SESSION SET events '10046 trace name context forever, level 8';
You can set the 10046 debug event to trace another session on the database with any of the
following commands:
EXECUTE SYS.DBMS_SUPPORT.START_TRACE_IN_SESSION(sid, serial#)
oradebug setorapid [Oracle PID from v$process]
oradebug session_event 10046 trace name context forever, level 8
EXECUTE SYS.DBMS_SYSTEM.SET_EV(sid, serial#, 10046, 8, '')
Calling DBMS_SUPPORT as shown here is equivalent to activating debug event 10046 at level
8. You may include optional additional parameters in the procedure call to activate level 4 or
12. The DBMS_SUPPORT package is not installed in the database by default. You need to run
the dbmssupp.sql script found in the rdbms/admin directory as SYS before you can call
DBMS_SUPPORT. This script was left out of many releases of Oracle 8i by mistake, but it is
available separately as a patch.
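Installing the package is straightforward (a sketch; the @? shorthand expands to ORACLE_HOME in SQL*Plus):

-- Run once as SYS to install DBMS_SUPPORT
CONNECT / AS SYSDBA
@?/rdbms/admin/dbmssupp.sql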
It should also be pointed out that the SET_EV procedure in the DBMS_SYSTEM package is not
officially supported by Oracle Corporation; they prefer that you use DBMS_SUPPORT instead.
The reason is that DBMS_SYSTEM.SET_EV allows you to set any debug event in any session.
As mentioned above, some debug events can potentially be dangerous, and
DBMS_SYSTEM.SET_EV lets you set such events in any session.
Unlike timed statistics, tracing consumes a significant amount of system resources. Therefore
it is important to use the tracing facility sparingly. Trace only the sessions you need to trace,
and only for as long as you need. Turn off tracing as soon as it is no longer needed with any of
the following commands:
EXECUTE SYS.DBMS_SUPPORT.STOP_TRACE
ALTER SESSION SET events '10046 trace name context off';
ALTER SESSION SET sql_trace = FALSE;
EXECUTE SYS.DBMS_SUPPORT.STOP_TRACE_IN_SESSION(sid, serial#)
oradebug setorapid [Oracle PID from v$process]
oradebug session_event 10046 trace name context forever, level 0
EXECUTE SYS.DBMS_SYSTEM.SET_EV(sid, serial#, 10046, 0, '')
When you set debug event 10046 to a level of 8 or 12 (or call START_TRACE or
START_TRACE_IN_SESSION in DBMS_SUPPORT), the Oracle process will write a line into the
trace file every time it finishes waiting for an event. The line in the trace file will contain almost
the same information that would have appeared in the v$session_wait view, but perhaps in a
slightly less friendly format. You can also see in the trace file which cursor (and therefore
which SQL statement) the wait event was associated with.
Here is an excerpt from a trace file generated by setting debug event 10046 to level 12 on an
Oracle 8i database:
=====================
PARSING IN CURSOR #1 len=80 dep=0 uid=502 oct=3 lid=502
tim=2293771931 hv=2293373707 ad='511dca20'
SELECT /*+ FULL */ SUM(LENGTH(notes))
FROM   customer_calls
WHERE  status = :x
END OF STMT
PARSE #1:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=0,tim=2293771931
BINDS #1:
 bind 0: dty=2 mxl=22(22) mal=00 scl=00 pre=00 oacflg=03 oacfl2=0
         size=24 offset=0
   bfp=09717724 bln=22 avl=02 flg=05
   value=43
EXEC #1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=2293771931
WAIT #1: nam='SQL*Net message to client' ela= 0 p1=675562835 p2=1 p3=0
WAIT #1: nam='db file scattered read' ela= 3 p1=17 p2=923 p3=8
WAIT #1: nam='db file scattered read' ela= 1 p1=17 p2=931 p3=8
WAIT #1: nam='db file scattered read' ela= 2 p1=17 p2=939 p3=8
WAIT #1: nam='db file sequential read' ela= 0 p1=17 p2=947 p3=1
WAIT #1: nam='db file scattered read' ela= 3 p1=17 p2=1657 p3=8
WAIT #1: nam='db file scattered read' ela= 1 p1=17 p2=1665 p3=8
WAIT #1: nam='db file scattered read' ela= 2 p1=17 p2=1673 p3=8
WAIT #1: nam='db file scattered read' ela= 0 p1=17 p2=1681 p3=8
WAIT #1: nam='db file scattered read' ela= 3 p1=17 p2=1761 p3=8
WAIT #1: nam='db file scattered read' ela= 1 p1=17 p2=1769 p3=8
WAIT #1: nam='db file scattered read' ela= 1 p1=17 p2=1777 p3=8
WAIT #1: nam='db file scattered read' ela= 0 p1=17 p2=1785 p3=8
WAIT #1: nam='db file scattered read' ela= 2 p1=17 p2=1841 p3=8
WAIT #1: nam='db file scattered read' ela= 1 p1=17 p2=1849 p3=8
The ela= figures indicate the duration of the wait. In trace files generated by Oracle 8i and
earlier, the elapsed time is shown in centiseconds. Beginning in Oracle 9i, the elapsed time is
shown in microseconds.
You can see that when the session executed the query, there was a wait shorter than one
centisecond for a message to be sent to the client, followed by a bunch of waits for I/O requests
against file 17. Most of the I/O requests were multi-block reads, reading eight blocks at a time
while performing a full table scan of the CUSTOMER_CALLS table.
You can probably imagine how large and overwhelming trace files can get when anything but a
trivial application is traced. This is why Oracle provides the TKPROF utility. TKPROF takes a
trace file as input and generates a nicely formatted, easy to read report. Unfortunately, the
TKPROF that comes with Oracle 8i and earlier releases ignores wait event information written
in the trace file. To analyze the wait event information collected by a trace in an Oracle 8i or
earlier environment, you will need to either pore through the trace file manually or write your
own trace file parser and formatter. (We've heard of people writing Perl scripts to do this, but
we have never come across them.)
The TKPROF utility that comes with Oracle 9i is capable of reporting wait event information
summarized by distinct statement. By default Oracle 9i TKPROF ignores wait event information
in the trace file as earlier releases did. However, if the waits=yes command line argument is
provided, a separate table of wait event statistics will appear in the report for each distinct
statement. Here is a sample excerpt from a TKPROF report showing wait event information:
********************************************************************************
SELECT A.customer_id, A.customer_name, COUNT(*) purchases,
       MAX(B.transaction_date) last_purchase_date
FROM   customers A, purchase_history B
WHERE  B.customer_id = A.customer_id
GROUP BY A.customer_id, A.customer_name
ORDER BY A.customer_id

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.06       0.12          0         27          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch     1461     21.41      25.80       7801       5905          5       21893
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     1463     21.47      25.93       7801       5932          5       21893

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 17

Rows     Row Source Operation
-------  ---------------------------------------------------
  21893  SORT GROUP BY
1525494   HASH JOIN
  31212    TABLE ACCESS FULL CUSTOMERS
1525494    INDEX FAST FULL SCAN PURCHASE_HISTORY_PK (object id 7824)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                    1461        0.00          0.00
  db file sequential read                         4        0.00          0.01
  db file scattered read                        750        0.07          2.25
  direct path write                             271        0.83          2.79
  direct path read                              254        0.00          0.01
  SQL*Net message from client                  1461        1.28         22.40
********************************************************************************
You can see that the query was parsed and executed once, and there were 1461 fetches to read
21893 rows. The Oracle process used 21.47 CPU seconds and 25.93 seconds of elapsed time to
parse, execute, and fetch the query. The elapsed time was greater than the CPU time because
the Oracle process had to wait on several wait events. The listing at the bottom of the report
shows the process waited on network roundtrips to the client and reads and writes to disk.
All of these waits seem to make sense: There was one network roundtrip to the client for each
fetch call, and it took SQL*Plus a long time (over 22 seconds) to display the 21,893 rows of
data on my display. The db file sequential reads and db file scattered reads resulted from the
table access and index fast full scan. The direct path writes and reads resulted from the hash
join and sort operations. If you add the CPU time of 21.47 seconds and the wait times (5.06
seconds not including the SQL*Net message from client waits) you come up with an elapsed
time of 26.53 seconds. However, the report shows the total elapsed time as 25.93 seconds. This
is a good example of the round-off error that is not unusual when thousands of very short time
intervals are added together.
In some situations you may not be able to identify exactly which session to trace. This happens
frequently when your database is supporting a web application that has frequent short
connections, or an application server tier that maintains a pool of database connections all
logged on as the same Oracle user. You may have a specific query you want to collect wait
event information for, but you may not know which session will be executing the query. One
way to deal with this situation is to trace all of the sessions being started by your application
for a brief period. This can be done easily with the following SQL*Plus script:
SPOOL traceall.sql
SET HEADING OFF FEEDBACK OFF
SELECT 'EXECUTE SYS.dbms_system.set_ev(' || TO_CHAR(sid) ||
       ',' || TO_CHAR(serial#) || ',10046,8,'''')'
FROM   v$session
WHERE  username = 'WEB_USER';
SPOOL OFF
SET FEEDBACK ON
@traceall.sql
This script will start wait event tracing on all sessions connected to the database as the
WEB_USER user. To stop tracing simply change the 8 after the 10046 to a 0 and run the script
again. You should think about how many sessions this script will trace and how that will affect
server load before running on a production system.
After you've traced a number of sessions, you can scan the trace files for occurrences of a
specific query with a command like the following:

grep -n 1234567890 *.trc
Replace 1234567890 with the hash value or address of the statement you are interested in.
(You can get these values by querying v$sql.) The output of the above grep command will tell
you which trace files the query appears in and on which lines. You can then go to that point in
the trace files and get detailed wait event information.
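To find the hash value or address in the first place, a query like this sketch works (the LIKE pattern is hypothetical; substitute a fragment of the statement you are hunting for):

-- Locate the statement of interest in the shared SQL area
SELECT hash_value, address, sql_text
FROM   v$sql
WHERE  sql_text LIKE 'SELECT A.customer_id%';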
Trace files are easiest to work with when you use a dedicated server connection to the
database. If you connect to the database using Oracle's shared server architecture then
different SQL statements may be executed by different server processes. Each server process
writes to its own trace file. Thus the trace information for the session can be spread over
several trace files.
In the examples below, we will apply the techniques described above for collecting wait event
information, and we will evaluate the data in order to better understand how the system is
functioning and determine what we can do to boost system performance. Hopefully these
examples will give you insights into researching performance and other types of problems on
your systems.
EXAMPLE #1: BUFFER BUSY WAITS
A magazine publisher had developed a web site that offered web access to many of the articles
in the magazine. At times, the website would get completely bogged down and response time
would become very poor. Operating system statistics on the database server indicated that
during busy times, server idle time would be near zero. We used Statspack to gather database
statistics on an hourly basis. Excerpts from one Statspack report are as follows:
              Snap Id     Snap Time          Sessions
              -------  ------------------    --------
 Begin Snap:      611  11-Dec-02 13:00:52         145
   End Snap:      711  11-Dec-02 14:00:26         145
    Elapsed:               59.57 (mins)

Cache Sizes
~~~~~~~~~~~
           db_block_buffers:      38400          log_buffer:    1048576
              db_block_size:       8192    shared_pool_size:   80000000

Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                  Redo size:              6,390.41              5,731.32
              Logical reads:             33,962.40             30,459.63
              Block changes:                 44.72                 40.11
             Physical reads:                536.77                481.41
            Physical writes:                163.02                146.21
                 User calls:                397.75                356.73
                     Parses:                237.64                213.13
                Hard parses:                 23.84                 21.38
                      Sorts:                248.38                222.77
                     Logons:                  0.41                  0.36
                   Executes:                213.52                191.50
               Transactions:                  1.11

  % Blocks changed per Read:    0.13    Recursive Call %:   86.07
 Rollback per transaction %:    0.23       Rows per Sort:   30.73

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:   98.39       Redo NoWait %:   99.97
               Buffer Hit %:   98.42    In-memory Sort %:  100.00

Top 5 Wait Events
~~~~~~~~~~~~~~~~~                                          Wait    % Total
Event                                       Waits      Time (cs)   Wt Time
--------------------------------------- ------------ ------------ -------
buffer busy waits                           1,962,372    1,278,649   50.03
db file sequential read                     1,336,870    1,050,878   41.12
db file scattered read                         47,717       49,326    1.93
direct path write                               8,070       40,574    1.59
latch free                                     38,220       31,012    1.21

Buffer Pool Statistics for DB:
-> Pools  D: default pool,  K: keep pool,  R: recycle pool

                                                     Free    Write   Buffer
      Buffer  Consistent   Physical  Physical      Buffer Complete     Busy
P       Gets        Gets      Reads    Writes       Waits    Waits    Waits
- ---------- ----------- ---------- --------- ----------- -------- ---------
D     12,292  49,863,696        470     3,530           0        0        32
K  1,509,385  20,691,972  1,509,366       110           0        0 1,962,117
R         51           3          0         0           0        0         0
We noticed that the two dominant wait events were buffer busy waits and db file sequential
read. These two wait events accounted for over 23,000 seconds of wait time during the one
hour sample period. We decided to drill down to learn more about these wait events. Looking at
v$event_name, we see that the parameters for the buffer busy waits event are as follows:
SQL> SELECT * FROM v$event_name WHERE name = 'buffer busy waits';

    EVENT# NAME               PARAMETER1 PARAMETER2 PARAMETER3
---------- ------------------ ---------- ---------- ----------
        75 buffer busy waits  file#      block#     id

SQL>
For this wait event, the parameter information from v$event_name points you in the right
direction but does not necessarily give you the full story. The first and second parameters tell
you the file and block number of the data block in the database that the session was trying to
access when the wait event occurred. The third parameter is a numeric reason code indicating
why the buffer in the buffer cache was busy and the process had to wait. Metalink bulletin
34405.1 explains what the different reason codes mean.
The following query will list all sessions waiting for a given wait event at the split second the
query is begun:
SELECT   sid, event, state, seconds_in_wait, wait_time, p1, p2, p3
FROM     v$session_wait
WHERE    event = 'buffer busy waits'
ORDER BY sid;
Because only a couple (if any) sessions are likely to be encountering this wait at any one
moment in time, we usually need to run this query several times in quick succession. One of
the times, we got the following output:
SQL> /

 SID EVENT              STATE    SECONDS_IN_WAIT  WAIT_TIME   P1     P2   P3
---- ------------------ -------- --------------- ---------- ---- ------ ----
  12 buffer busy waits  WAITING                1          0   30  62157  130
  31 buffer busy waits  WAITING                1          0   30  23558  130

SQL>
We performed the following query for each session wait found, substituting P1 for the file
number and P2 for the block number:
SELECT owner, segment_name, segment_type
FROM   dba_extents
WHERE  file_id = &absolute_file_number
AND    &block_number BETWEEN block_id AND block_id + blocks - 1;
Using this query we found that session 12 above was waiting on a busy buffer in the SAMPLES
table in the PRODMGR schema:
SQL> /
Enter value for absolute_file_number: 30
Enter value for block_number: 62157

OWNER      SEGMENT_NAME    SEGMENT_TYPE
---------- --------------- ------------
PRODMGR    SAMPLES         TABLE

SQL>
In order to figure out what reason code 130 means, we look at a reference table listed in
Metalink bulletin 34405.1. For convenience this table appears below. We can see that the buffer
is busy because the block is being read by another session.
Reason Code (Id)   P3
>= 8.1.6           <= 8.0.6   Reason
-----------------  ---------  -------------------------------------------------
0                  0          A block is being read.

1003               100        We want to NEW the block but the block is
                              currently being read by another session (most
                              likely for undo).

1007               200        We want to NEW the block but someone else is
                              using the current copy so we have to wait for
                              them to finish.

1010               230        Trying to get a buffer in CR/CRX mode, but a
                              modification has started on the buffer that has
                              not yet been completed.

1012               -          A modification is happening on a SCUR or XCUR
                              buffer, but has not yet completed.

1012 (duplicate)   231        CR/CRX scan found the CURRENT block, but a
                              modification has started on the buffer that has
                              not yet been completed.

1013               130        Block is being read by another session and no
                              other suitable block image was found, so we wait
                              until the read is completed. This may also occur
                              after a buffer cache assumed deadlock. The kernel
                              can't get a buffer in a certain amount of time
                              and assumes a deadlock. Therefore it will read
                              the CR version of the block.

1014               110        We want the CURRENT block either shared or
                              exclusive but the block is being read into cache
                              by another session, so we have to wait until
                              their read() is completed.

1014 (duplicate)   120        We want to get the block in current mode but
                              someone else is currently reading it into the
                              cache. Wait for them to complete the read. This
                              occurs during buffer lookup.

1016               210        The session wants the block in SCUR or XCUR
                              mode. If this is a buffer exchange or the session
                              is in discrete TX mode, the session waits for the
                              first time and the second time escalates the
                              block as a deadlock and so does not show up as
                              waiting very long. In this case the statistic
                              "exchange deadlocks" is incremented and we yield
                              the CPU for the buffer deadlock wait event.

1016 (duplicate)   220        During buffer lookup for a CURRENT copy of a
                              buffer we have found the buffer but someone holds
                              it in an incompatible mode so we have to wait.
Next we looked for sessions waiting on the db file sequential read event, that is, sessions
waiting on index lookups or table accesses by ROWID:

SELECT   sid, event, state, seconds_in_wait, wait_time, p1, p2, p3
FROM     v$session_wait
WHERE    event = 'db file sequential read'
ORDER BY sid;
Following the same approach as used with the buffer busy waits, we then drilled down to find
which segments in the database sessions were accessing with the single block I/Os. In this
case it was again the SAMPLES table.
From this we concluded that the buffer busy waits appeared related to the single block disk
reads: buffer block reads were waiting for other buffer block reads to complete, and those in
turn were waiting for sequential reads from disk to complete. Put another way, session A was
waiting on a single block disk I/O to bring a block into the buffer cache. Meanwhile, session B
wanted to read the same block. Since the block was not yet in the buffer cache and another
session was already reading it from disk, session B waited on a buffer busy waits event while
session A waited on the db file sequential read event.
When we looked at the SAMPLES table further, we found that it contained a LONG column and
there was a fair amount of row chaining; some rows were larger than the database block size.
Many queries against multiple tables were doing nested loops joins with the SAMPLES table.
This was causing lots of single block reads from the SAMPLES table.
The values stored in the LONG column of the SAMPLES table were quite big. One of the many
weaknesses of the LONG data type is that data is stored inline in the table segment regardless
of size. This led to a very large SAMPLES table segment even though the table did not have a
large number of rows. While most queries against the SAMPLES table were not fetching the
LONG column, Oracle still had to bring the LONG data into the buffer cache because it was
stored in the same blocks as the other columns. This led to a lot more physical I/O and churn in
the buffer cache than should have been necessary.
One way to reduce buffer cache churn and physical I/O would have been to replace the LONG
column with a CLOB column. CLOB data can be stored in a separate segment from the rest of
the table data. Thus the SAMPLES table segment would be a lot smaller and more likely to fit
entirely in the buffer cache. This would reduce the activity rate in the buffer cache and the
amount of physical I/O.
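For reference, in Oracle 9i the conversion itself can be as simple as the following sketch. The column name notes_long is hypothetical, since the paper does not name the LONG column; the MOVE LOB step pushes the LOB data out of line so the table blocks stay small:

-- Hypothetical sketch of the LONG-to-CLOB conversion (Oracle 9i syntax)
ALTER TABLE prodmgr.samples MODIFY (notes_long CLOB);
ALTER TABLE prodmgr.samples MOVE LOB (notes_long)
    STORE AS (DISABLE STORAGE IN ROW);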
Unfortunately, changing the database schema and the application code to replace the LONG
data type with CLOB was not possible in a quick enough timeframe to address the immediate
performance problem. So we had to develop a quick fix that could be deployed temporarily
until the data type could be changed. First we enlarged the KEEP buffer pool, reducing
physical reads as all of the SAMPLES table could stay in the buffer cache. Next, we created an
index on all of the columns in the SAMPLES table except the LONG column. This allowed
Oracle to satisfy most queries against the SAMPLES table without having to read the
excessively large table segment, which greatly reduced the amount of data accessed during
each query.
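The two interim changes amounted to statements along these lines (a sketch; the index name and column list are hypothetical, and the KEEP pool itself is sized with the buffer_pool_keep instance parameter):

-- Keep the hot table cached, and cover the common queries with an index
-- (index columns are illustrative; the real list was every non-LONG column)
ALTER TABLE prodmgr.samples STORAGE (BUFFER_POOL KEEP);
CREATE INDEX prodmgr.samples_covering_ix
    ON prodmgr.samples (sample_id, status, sample_date);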
These two easy and quick changes virtually eliminated the buffer busy waits and reduced db
file sequential read wait events as total physical reads dropped 90%. Response time for the
most common queries involving the SAMPLES table dramatically improved, and overall web
site response times returned to an acceptable level. Excerpts from a Statspack report after the
two changes were made are as follows:
              Snap Id     Snap Time          Sessions
              -------  ------------------    --------
 Begin Snap:     1192  20-Dec-02 13:00:49         102
   End Snap:     1202  20-Dec-02 14:00:18         102
    Elapsed:               59.48 (mins)

Cache Sizes
~~~~~~~~~~~
           db_block_buffers:      38400          log_buffer:    1048576
              db_block_size:       8192    shared_pool_size:   80000000

Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                  Redo size:              2,496.24              3,453.13
              Logical reads:             30,270.86             41,874.69
              Block changes:                 17.08                 23.62
             Physical reads:                 36.90                 51.04
            Physical writes:                 79.62                110.14
                 User calls:                313.37                433.50
                     Parses:                173.91                240.57
                Hard parses:                 17.04                 23.57
                      Sorts:                169.33                234.25
                     Logons:                  0.36                  0.50
                   Executes:                149.11                206.27
               Transactions:                  0.72

  % Blocks changed per Read:    0.06    Recursive Call %:   84.55
 Rollback per transaction %:    0.31       Rows per Sort:   62.86

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:  100.00       Redo NoWait %:   99.98
               Buffer Hit %:   99.88    In-memory Sort %:   99.89

Top 5 Wait Events
~~~~~~~~~~~~~~~~~                                          Wait    % Total
Event                                       Waits      Time (cs)   Wt Time
--------------------------------------- ------------ ------------ -------
direct path write                               6,467       13,545   30.61
log file sync                                   4,914        7,493   16.93
library cache pin                               1,175        6,090   13.76
direct path read                                5,488        3,428    7.75
latch free                                     14,528        2,931    6.62

Buffer Pool Statistics for DB
-> Pools  D: default pool,  K: keep pool,  R: recycle pool

                                                     Free    Write   Buffer
      Buffer  Consistent   Physical  Physical      Buffer Complete     Busy
P       Gets        Gets      Reads    Writes       Waits    Waits    Waits
- ---------- ----------- ---------- --------- ----------- -------- ---------
D     10,178  39,561,072        330     1,122           0        0        10
K          1  13,075,944          1         4           0        0         0
R          0           0          0         0           0        0         0
As you can see, with a similar amount of querying activity (roughly 300-400 user calls and
30,000-33,000 logical reads per second) we have reduced total wait event time to about 442
seconds during the one hour sample period. (Direct path writes totaled 13545 centiseconds and
represented 30.61% of total waits; 135.45/0.3061 = 442 seconds of total wait time.) This is
quite a difference from the 23,000 seconds of wait time during the sample period we were
seeing before the KEEP pool was enlarged and the extra index was added to the SAMPLES
table.
Over time, application changes were made so that even fewer queries against the SAMPLES
table required retrieving the LONG column. Eventually, the LONG column was replaced with a
CLOB.
EXAMPLE #2: MORE BUFFER BUSY WAITS, PLUS LATCH CONTENTION FOR GOOD MEASURE
The magazine publisher's application demonstrated buffer busy waits where multiple Oracle
processes want to read the same data block from disk at the same time. The first Oracle
process would issue the I/O request and wait on a db file sequential read event, while
subsequent Oracle processes would wait on a buffer busy waits event until the data block had
made its way into the buffer cache. Now we will look at an example where multiple Oracle
processes try to write to different copies of the same data block in the buffer cache at the same
time. Since Oracle has row-level locking, multiple users can update different rows in the same
table at the same time without locking each other out. Multiple users can even update different
rows in the same data block concurrently. But, as we will see in this example, contention and
buffer busy waits can sometimes occur.
A genetic research company used an Oracle database to store research data. Periodically, they
would load large volumes of raw data into the database. Then they would run a processing
application on multiple workstations. Each workstation would fetch raw data from the
database, process it, and put the processed data back in the database. The company had
planned to add more workstations to speed up the processing, but found that adding more
workstations actually seemed to reduce the throughput. The Oracle database was being used
merely as a repository, we were told. All Oracle had to do was fetch the raw data and then store
the processed data. The actual processing was being performed on the workstations outside of
the realm of Oracle. So it would seem that the Oracle database should not be the bottleneck.
We found about 100 active database sessions, each initiated by a separate workstation. We
modified the application code running on workstation 30 to activate debug event 10046 for wait
event tracing. We did this by inserting the following statement into the code just after
connecting to the database:
ALTER SESSION SET events '10046 trace name context forever, level 8';
We also modified the application to exit after iterating through its main loop 500 times. We ran
the modified application on the workstation and retrieved the trace file that Oracle had created
in the directory specified by the user_dump_dest parameter. The application ran for less than
one minute before exiting. Since this was an Oracle 9i database, we were able to run TKPROF
to generate a report including wait event statistics with the following command:
tkprof prodgen_ora_16466.trc report_16466.prf waits=yes
One statement in the TKPROF report stood out:

UPDATE processing_stations
SET    status = 'ACTIVE',
       status_date = SYSDATE,
       data_set_id_being_processed = :b1
WHERE  station_id = 30

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0          0          0           0
Execute    500      0.23      10.14          0       3616       1010         500
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      501      0.24      10.14          0       3616       1010         500

Optimizer goal: CHOOSE
Parsing user id: 40

Rows     Row Source Operation
-------  ---------------------------------------------------
    500  UPDATE
    500   INDEX UNIQUE SCAN PROCESSING_STATIONS_PK (object id 9372)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  buffer busy waits                              26        0.71          7.87
  latch free                                     17        0.57          2.08
  log file switch completion                      3        0.09          0.20
The UPDATE statement here seemed quite trivial: changing three column values on one row,
given a primary key value. Oracle only needed to do about nine logical reads per execution, and
no physical reads were required at all. Yet it still took Oracle over 10 seconds to execute the
statement 500 times. If we had not specified the waits=yes TKPROF option, we might
conclude that the database server was CPU-starved. But a quick look at the wait event portion
of the TKPROF report shows that most of the 10 seconds of elapsed execution time was spent
waiting on busy buffers and latches.
Some of the buffer busy waits listed in the trace file were as follows:
WAIT #2: nam='buffer busy waits' ela= 527727 p1=18 p2=10 p3=220
WAIT #2: nam='buffer busy waits' ela= 498765 p1=18 p2=10 p3=220
WAIT #2: nam='buffer busy waits' ela= 137611 p1=18 p2=10 p3=220
WAIT #2: nam='buffer busy waits' ela= 124165 p1=18 p2=10 p3=220
WAIT #2: nam='buffer busy waits' ela= 5237 p1=18 p2=10 p3=220
WAIT #2: nam='buffer busy waits' ela= 264050 p1=18 p2=10 p3=220
WAIT #2: nam='buffer busy waits' ela= 270177 p1=18 p2=10 p3=220
WAIT #2: nam='buffer busy waits' ela= 330912 p1=18 p2=10 p3=220
WAIT #2: nam='buffer busy waits' ela= 156317 p1=18 p2=10 p3=220
WAIT #2: nam='buffer busy waits' ela= 710696 p1=18 p2=10 p3=220
Since this trace file was generated by an Oracle 9i database, the elapsed times are shown in
units of microseconds. The wait shown at the bottom of the list above had an elapsed time of
710,696 microseconds, or approximately 0.71 seconds. This was the longest buffer busy wait
for this statement, as indicated by the Max. Wait figure in the TKPROF report.
From the previous example, recall that the parameters for the buffer busy waits event indicate
file number, block number, and reason code. All of the buffer busy waits listed above involve
block 10 in data file 18, with a reason code of 220. By querying the data dictionary, we found
that block 10 in data file 18 was allocated to the PROCESSING_STATIONS table, the table
being updated by the statement that encountered the buffer busy waits. According to Metalink
bulletin 34405.1, reason code 220 indicates that the buffer is busy because another session is
using a copy of the buffer in an incompatible mode.
From the TKPROF report we can see that the application updated the PROCESSING_STATIONS
table 500 times during the period traced. Each update affected one row. While performing the
updates, there were 26 times that the process had to wait on a buffer busy waits event. From
looking at the trace file we know that all of the buffer busy waits occurred because data block
10 in data file 18 was already in the buffer cache, but in an incompatible mode. These 26 waits
totaled 7.87 seconds.
A query against the data dictionary revealed the following statistics for the
PROCESSING_STATIONS table:
SQL> SELECT SYSDATE - last_analyzed, blocks, avg_row_len, avg_space,
  2         chain_cnt, num_rows
  3  FROM   user_tables
  4  WHERE  table_name = 'PROCESSING_STATIONS';

SYSDATE-LAST_ANALYZED     BLOCKS AVG_ROW_LEN  AVG_SPACE  CHAIN_CNT   NUM_ROWS
--------------------- ---------- ----------- ---------- ---------- ----------
           2.13211806          1          62       1686          0        100

SQL>
We can see that optimizer statistics were gathered recently and the table contains 100 very
short rows, all stored in one data block. If the application session that we traced updated the
PROCESSING_STATIONS table 500 times in less than a minute, we might suspect that the
database sessions from the other 99 workstations are doing roughly the same thing. This adds
up to over 50,000 updates to the same data block per minute. Since each session is updating a
different row in the table, there is no waiting on row-level locks (which would show up in the
wait event interface as enqueue waits). However, only one process at a time may write to a
buffer in the buffer cache. Therefore, a lot of time is being wasted contending for this one very
hot buffer.
All of this raises the question: why is the application updating a row in the
PROCESSING_STATIONS table 500 times in less than one minute? It turns out that the
application uses this table to coordinate efforts between the workstations. There is one row in
the table for each workstation. The application running on each workstation updates its row in
PROCESSING_STATIONS to indicate which data set it is processing so that other workstations
will not attempt to process the same data set. Also, each workstation periodically updates the
status date so that if a workstation crashes, it will be easy to tell when it crashed and what data
it was processing during the crash.
Processing of one data set could take anywhere from a fraction of a second to twenty minutes.
The application code was written to not only update the PROCESSING_STATIONS table when a
new data set was processed, but also to frequently update the timestamp during the processing
of one data set. All stakeholders agreed that 500 updates per minute might be excessive. If the
application code could be modified to update the table only once per second, then writes to the
table would be reduced by roughly 90% and contention for the hot buffer in the buffer cache would
likely go away.
We rebuilt the PROCESSING_STATIONS table with the PCTFREE storage parameter set to 99
and PCTUSED set to 1. This caused Oracle to leave 99% of the available space in each data
block empty for future row expansion, spreading the 100 rows in the table over 100 data
blocks instead of packing them all neatly into one data block.
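One way to perform such a rebuild is sketched below; the ALTER TABLE ... MOVE statement
invalidates the table's indexes, so the primary key index from this example must be rebuilt and
statistics regathered afterward:

ALTER TABLE processing_stations MOVE PCTFREE 99 PCTUSED 1;
ALTER INDEX processing_stations_pk REBUILD;
ANALYZE TABLE processing_stations COMPUTE STATISTICS;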
A query against the data dictionary confirmed the new layout:
SQL> SELECT SYSDATE - last_analyzed, blocks, avg_row_len, avg_space,
  2         chain_cnt, num_rows
  3  FROM   user_tables
  4  WHERE  table_name = 'PROCESSING_STATIONS';

SYSDATE-LAST_ANALYZED     BLOCKS  AVG_ROW_LEN  AVG_SPACE  CHAIN_CNT  NUM_ROWS
           .130868056        100           62       8014          0       100

SQL>
Tracing another 500 iterations of the main loop in the application on workstation 30 after
rebuilding the PROCESSING_STATIONS table yielded the following TKPROF output:
UPDATE processing_stations
SET    status = 'ACTIVE',
       status_date = SYSDATE,
       data_set_id_being_processed = :b1
WHERE  station_id = 30

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute    500      0.20       2.22          0        500       1009         500
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      501      0.20       2.22          0        500       1009         500

Optimizer goal: CHOOSE
Parsing user id: 40

Rows     Row Source Operation
-------  ---------------------------------------------------
    500  UPDATE
    500   INDEX UNIQUE SCAN PROCESSING_STATIONS_PK (object id 9379)

Elapsed times include waiting on following events:
  Event waited on                            Times   Max. Wait  Total Waited
  ----------------------------------------  Waited  ----------  ------------
  latch free                                      2        0.35          0.61
The application is still updating the PROCESSING_STATIONS table very rapidly. But now each
row in the table is stored in a separate data block, so each workstation is working with its own
hot buffer in the buffer cache. The sessions are no longer competing with each other for the
opportunity to write to the same buffer. Waits on the buffer busy waits event have been
eliminated. As a side effect, logical reads have been reduced to one third of what they were
originally.
The original TKPROF reports also indicated waits on the latch free wait event, indicating latch
contention. A query against v$event_name shows that the parameters to the latch free wait
event are as follows:
SQL> SELECT * FROM v$event_name WHERE name = 'latch free';

    EVENT# NAME                 PARAMETER1      PARAMETER2      PARAMETER3
         3 latch free           address         number          tries

SQL>
Some of the latch free wait events in the trace file generated before the
PROCESSING_STATIONS table was rebuilt were as follows:
WAIT #2: nam='latch free' ela= 47004 p1=15113593728 p2=97 p3=0
WAIT #2: nam='latch free' ela= 14629 p1=15113593728 p2=97 p3=1
WAIT #2: nam='latch free' ela= 20652 p1=15113593728 p2=97 p3=2
WAIT #2: nam='latch free' ela= 37737 p1=15113593728 p2=97 p3=3
A query against v$latch or v$latchname will give us the name of latch number 97:
SQL> SELECT name
  2  FROM   v$latch
  3  WHERE  latch# = 97;

NAME
----------------------------------------------------------------
cache buffers chains

SQL>
So we can see that the four wait events listed from the trace file above amount to a wait of
approximately 0.12 seconds for the cache buffers chains latch. (Remember that in Oracle 9i
trace files, elapsed times are listed in microseconds while in trace files from Oracle 8i and
earlier the elapsed times are in centiseconds.) This is not unexpected, since we had 100
sessions contending for the right to modify one buffer in the buffer cache.
Comparing the TKPROF report before rebuilding the PROCESSING_STATIONS table with after,
we see that rebuilding the table reduced the latch contention but did not eliminate it. It is likely
that the long-term solution of modifying the application to reduce the update frequency will
eliminate or further reduce the latch contention.
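Had we wanted to confirm that the contention was concentrated on one child latch (each cache
buffers chains child latch protects a set of buffer chains), a query along these lines would
have shown where the sleeps were occurring; this is a sketch we did not need in this case:

SELECT addr, child#, gets, misses, sleeps
FROM   v$latch_children
WHERE  name = 'cache buffers chains'
ORDER BY sleeps DESC;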
An application performed occasional loads into a small data warehouse. Unfortunately, the
loads were taking a much longer time on the production system than they had in a test
environment. To find out what was happening on the production database during a data load,
we ran the following script that displays the wait event activity for the entire instance over a 30
second interval. The list of events in the NOT IN clause simply filters out the idle events,
making the output a little easier to read:
DROP TABLE previous_events;

CREATE TABLE previous_events
AS
SELECT SYSDATE timestamp, v$system_event.*
FROM   v$system_event;

EXECUTE dbms_lock.sleep(30);

SELECT   A.event,
         A.total_waits - NVL(B.total_waits, 0) total_waits,
         A.time_waited - NVL(B.time_waited, 0) time_waited
FROM     v$system_event A, previous_events B
WHERE    B.event (+) = A.event
AND      A.event NOT IN
         (
         'client message', 'dispatcher timer', 'gcs for action',
         'gcs remote message', 'ges remote message', 'i/o slave wait',
         'jobq slave wait', 'lock manager wait for remote message',
         'null event', 'parallel query dequeue', 'pipe get',
         'PL/SQL lock timer', 'pmon timer', 'PX Deq Credit: need buffer',
         'PX Deq Credit: send blkd', 'PX Deq: Execute Reply',
         'PX Deq: Execution Msg', 'PX Deq: Signal ACK',
         'PX Deq: Table Q Normal', 'PX Deque Wait', 'PX Idle Wait',
         'queue messages', 'rdbms ipc message', 'slave wait',
         'smon timer', 'SQL*Net message to client',
         'SQL*Net message from client', 'SQL*Net more data from client',
         'virtual circuit status', 'wakeup time manager'
         )
ORDER BY time_waited;
Running this script during a data load yielded the following output:
EVENT                                    TOTAL_WAITS TIME_WAITED
LGWR wait for redo copy                            0           0
SQL*Net break/reset to client                      2           0
SQL*Net more data to client                        0           0
buffer busy waits                                  0           0
db file scattered read                             6           0
direct path read                                  27           0
direct path write                                 34           0
direct path write (lob)                            0           0
file identify                                      6           0
file open                                         93           0
library cache pin                                  0           0
local write wait                                   0           0
process startup                                    0           0
rdbms ipc reply                                    0           0
refresh controlfile command                        0           0
single-task message                                0           0
switch logfile command                             0           0
control file sequential read                      61           1
latch free                                         2           1
db file sequential read                            6           7
control file parallel write                       41          31
log file single write                              6         164
db file parallel write                            13         220
enqueue                                            6         486
log buffer space                                  24        2007
log file sequential read                          30        2655
log file switch completion                        33        2883
log file parallel write                           19        3561
log file sync                                    113       10249
During the 30 second sample interval, Oracle processes spent roughly 215 seconds waiting on
events related to the online redo log. The likely culprit in this situation is some combination of
slow disks and contention on the disks holding the online redo log files. After a little research,
we found that the redo logs had been striped on a RAID array on the faster test system. On the
production system, however, the redo logs were all on a single disk that was shared with
another very active database. When the online redo logs were moved to separate disks, the
data loads ran more rapidly and the wait events pertaining to the redo log decreased
significantly.
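Relocating an online redo log group can be sketched as follows; the file name, size, and group
numbers here are hypothetical, and a group may be dropped only after a log switch has made it
inactive:

ALTER DATABASE ADD LOGFILE GROUP 4 ('/u05/oradata/prod/redo04.log') SIZE 100M;
ALTER SYSTEM SWITCH LOGFILE;
ALTER DATABASE DROP LOGFILE GROUP 1;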
A company experienced delays when querying summarized information about their orders.
Querying individual orders was fast, but queries involving totals or rollups seemed to take
unacceptably long. We ran our Database Rx monitoring tool at regular intervals to collect
performance information from the system while the users were performing their typical tasks.
(Database Rx is a proactive monitoring and alert notification tool that we developed for use
with our remote database administration clients. For a free demo, see our website at
dbrx.dbspecialists.com/guest.) A Performance Summary report produced the following output:
[The Database Rx Performance Summary report, including a Top 5 Wait Events section, appeared here.]
Looking at the Top 5 Wait Events section, we see that the leading wait events are direct path
write, db file scattered read, and direct path read. These three wait events combined accounted
for roughly 99% of the non-idle wait time on the system during the sample interval. From our
knowledge of the queries in the application, the db file scattered read (i.e., full table scan)
events didn't surprise us. The absence of db file sequential read waits also did not surprise us,
as the relevant blocks for index lookups and row fetches by ROWID were likely already present
in the buffer cache.
We figured we might be able to reduce the db file scattered read waits with a larger buffer
cache or judicious use of the KEEP pool in the short term and application code review in the
longer term. What seemed amiss, however, were the large numbers of direct path read and
write wait events. Querying v$session_wait while applications were running yielded the
following:
SQL> SELECT sid, event, state, seconds_in_wait, wait_time, p1, p2, p3
  2  FROM   v$session_wait
  3  WHERE  event = 'direct path write'
  4  ORDER BY sid;

 SID EVENT               STATE     SECONDS_IN_WAIT  WAIT_TIME   P1    P2  P3
  39 direct path write   WAITING                 0          0  201    65   7
  47 direct path write   WAITING                 0          0  201  2248   7

SQL>
Querying v$event_name shows the parameters for the direct path write event:
SQL> SELECT * FROM v$event_name WHERE name = 'direct path write';

    EVENT# NAME                 PARAMETER1      PARAMETER2      PARAMETER3
       115 direct path write    file number     first dba       block cnt

SQL>
We see that the first parameter is supposed to be the file number of the data file containing the
blocks being written, but we couldn't find data file 201 using the following query:
SQL> SELECT tablespace_name, file_id "AFN", relative_fno "RFN"
  2  FROM   dba_data_files
  3  WHERE  file_id = 201;

no rows selected

SQL>
This is because the database is using temp files in the temporary tablespace, so we used the
following query:
SQL> SELECT tablespace_name, file_id + value "AFN", relative_fno "RFN"
  2  FROM   dba_temp_files, v$parameter
  3  WHERE  name = 'db_files';

TABLESPACE_NAME        AFN        RFN
TEMP                   201          1

SQL>
This indicated that the TEMP tablespace was experiencing direct path write waits, which is not
surprising as direct path reads and writes usually occur in connection with temporary
segments. Direct path I/O operations were responsible for most of the waits on the system and
appeared to be a bottleneck. Upon further examination, we discovered that the sort_area_size
parameter was set to the default value of 65536. Since the database server had a fair amount of
RAM free and there generally weren't too many sessions connected simultaneously, we
increased the sort_area_size to 10 Mb, and the direct path waits disappeared as sorts no longer
had to be written to disk.
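A sketch of the change we made follows; 10 Mb is expressed in bytes, and the DEFERRED keyword
applies the new value to future sessions (sort_area_size cannot be changed system-wide for
sessions that are already connected):

ALTER SYSTEM SET sort_area_size = 10485760 DEFERRED;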
If increasing the sort_area_size had not been an option, we might have looked at the disk
storage where the TEMP tablespace temp files were located to see if there was I/O contention
with other systems. We might also have looked at application code to see if the amount of
sorting and hashing could be reduced.
This example also demonstrates something that might seem odd at first glance. During the
Database Rx sample period, there were more physical reads than logical reads. How can this
be: don't all physical reads also require logical reads? Well, this is where direct path reads get
their name. As Oracle says in reference note 50415.1 on Metalink, direct path reads are
generally used by Oracle when reading directly into PGA memory, as opposed to into the buffer
cache. Since direct path reads bypass the buffer cache, they aren't counted as logical reads.
This is one reason some people question the traditional definition of the buffer cache hit
ratio: it doesn't take direct path reads into account. We won't get into that debate, but be
aware that direct path reads can distort your cache hit ratio. In this example, the cache hit
ratio would be negative!
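For reference, the traditional hit ratio computation looks something like the following sketch;
when direct path activity pushes physical reads above logical reads, the result can indeed drop
below zero:

SELECT 1 - phys.value / (cur.value + cons.value) "Buffer cache hit ratio"
FROM   v$sysstat phys, v$sysstat cur, v$sysstat cons
WHERE  phys.name = 'physical reads'
AND    cur.name  = 'db block gets'
AND    cons.name = 'consistent gets';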
Several of the SQL*Net wait events are considered idle events. For example, the SQL*Net
message from client event indicates that the process is waiting for the application to submit a
SQL statement for processing. However, the SQL*Net wait events pertaining to database links
are not idle events; they come into play when queries use database links to access remote
databases. The SQL*Net message from dblink event, for example, indicates that the local
process has submitted a request to a remote database and is waiting for a response.
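A quick way to see which SQL*Net wait events exist in your release is a query like this:

SELECT name
FROM   v$event_name
WHERE  name LIKE 'SQL*Net%'
ORDER BY name;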
A company had deployed a data import application for receiving batches of customer
transactions from its partners. There were five Oracle databases, one per geographic region.
Customer transactions were sorted by geographic location, and loaded into the database
serving that region. Due to human error, sometimes the same customer transaction would get
imported into multiple Oracle databases. They had developed a reconciliation report to address
this. The report was simply a query that compared transactions between two databases and
listed records found on both. Unfortunately, the report took over 30 minutes to run and the
runtime was expected to get much worse as the customer base grew.
Looking at the report code, we found the main query, which appeared to be the one using the
lion's share of resources. We logged onto SQL*Plus as an application user in one window and
logged onto SQL*Plus again as a DBA in another window. From the DBA session we queried
v$session and found the SID corresponding to our application user SQL*Plus session. Then we
launched the query from the application user session and monitored v$session_event from the
other. Since the wait event interface does not account for CPU usage, we also monitored
v$sesstat.
After the report query completed 31 minutes later, we queried v$session_event one last time
and the results were as follows:
SQL> SELECT event, total_waits, time_waited, max_wait
  2  FROM   v$session_event
  3  WHERE  sid = 47
  4  ORDER BY event;

EVENT                              TOTAL_WAITS TIME_WAITED   MAX_WAIT
SQL*Net message from client                 32        4435       2432
SQL*Net message from dblink            1525516      104919         31
SQL*Net message to client                   33           0          0
SQL*Net message to dblink              1525516         466          9
db file sequential read                  27199        8025         28
latch free                                  40           5          4
log file sync                                1           2          2

7 rows selected.

SQL> SELECT A.name, B.value
  2  FROM   v$statname A, v$sesstat B
  3  WHERE  A.statistic# = 12
  4  AND    B.statistic# = A.statistic#
  5  AND    B.sid = 47;

NAME                                     VALUE
CPU used by this session                 67937

SQL>
We can ignore the SQL*Net message from client wait event because this just indicates time
Oracle spent waiting for us to submit the query in SQL*Plus. However, there were over 1.5
million network roundtrips to a database link totaling over 1053 seconds of wait time. (The
1053 seconds includes both network latency and waiting on the remote database server to
respond to requests.) There were over 27,000 waits on single block disk I/Os totaling about 80
seconds of wait. There were also latch free and log file sync wait events, but the wait times
were insignificant. The query from v$sesstat shows that about 679 seconds of CPU time were
used.
Knowing nothing about the report query or database size, the number of network roundtrips
seemed very high and the number of single block disk I/Os seemed like it might be high. Before
we could go any further, we really needed to look at the query and the tables involved. The
report query was as follows:
SELECT customer_id, batch_serial_number, batch_date, load_date,
       batch_comment, control_total, control_amount
FROM   customer_transfer_batches A
WHERE  EXISTS
       (
       SELECT 1
       FROM   customer_transfer_batches@prdwest B
       WHERE  B.customer_id = A.customer_id
       AND    B.batch_serial_number = A.batch_serial_number
       )
ORDER BY customer_id, batch_serial_number;

Execution Plan
   0  SELECT STATEMENT
   1    0  FILTER
   2    1    TABLE ACCESS (BY INDEX ROWID) OF 'CUSTOMER_TRANSFER_BATCHES'
   3    2      INDEX (FULL SCAN) OF 'CUSTOMER_TRANSFER_BATCHES_PK' (UNIQUE)
   4    1    REMOTE* PRDWEST

   4  SERIAL_FROM_REMOTE  SELECT "CUSTOMER_ID","BATCH_SERIAL_NUMBER"
                          FROM "CUSTOMER_TRANSFER_BATCHES" "B"
                          WHERE "BATCH_SERIAL_NUMBER"=:1
                          AND "CUSTOMER_ID"=:2
Performing 1.5 million network roundtrips and making the remote database execute 1.5 million
queries did not seem to be the most efficient way to go. It would probably be more efficient for
the local database to pull all of the required data from the remote database in one shot and
then join it to the local table. We probably could have added an optimizer hint to the query to
get Oracle to rewrite the subquery as a join, but instead we chose to rewrite the query
ourselves in a way that would make it easier for Oracle's query optimizer to do the right thing. The
rewritten query was as follows:
SELECT customer_id, batch_serial_number, batch_date, load_date,
       batch_comment, control_total, control_amount
FROM   customer_transfer_batches
WHERE  (customer_id, batch_serial_number) IN
       (
       SELECT customer_id, batch_serial_number
       FROM   customer_transfer_batches
       INTERSECT
       SELECT customer_id, batch_serial_number
       FROM   customer_transfer_batches@prdwest
       )
ORDER BY customer_id, batch_serial_number;
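For comparison, the hint-based approach we decided against might have looked something like the
following sketch; UNNEST and HASH_SJ are real hints in this era of Oracle, but we did not pursue
or test this form:

SELECT customer_id, batch_serial_number, batch_date, load_date,
       batch_comment, control_total, control_amount
FROM   customer_transfer_batches A
WHERE  EXISTS
       (
       SELECT /*+ UNNEST HASH_SJ */ 1
       FROM   customer_transfer_batches@prdwest B
       WHERE  B.customer_id = A.customer_id
       AND    B.batch_serial_number = A.batch_serial_number
       )
ORDER BY customer_id, batch_serial_number;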
We ran the revised query and monitored v$session_event and v$sesstat as before, with the
following results:
SQL> SELECT event, total_waits, time_waited, max_wait
  2  FROM   v$session_event
  3  WHERE  sid = 49
  4  ORDER BY event;

EVENT                              TOTAL_WAITS TIME_WAITED   MAX_WAIT
SQL*Net message from client                 46        3680       2481
SQL*Net message from dblink                 24          31         18
SQL*Net message to client                   47           0          0
SQL*Net message to dblink                   24           0          0
SQL*Net more data from dblink             5978        1337         13
db file scattered read                    3430         675          8
db file sequential read                    182          60          2
direct path read                           148         233         11
direct path write                          920        3572         33
log file sync                                1           2          2

10 rows selected.

SQL> SELECT A.name, B.value
  2  FROM   v$statname A, v$sesstat B
  3  WHERE  A.statistic# = 12
  4  AND    B.statistic# = A.statistic#
  5  AND    B.sid = 49;

NAME                                     VALUE
CPU used by this session                  3227

SQL>
This time the query completed in 92 seconds. Both elapsed time and CPU usage were reduced
by about 95%. The waits on the database link were reduced from about 1053 seconds to under
14 seconds, indicating less network overhead and a lot less demand being placed on the remote
database server. Waits on db file sequential read and db file scattered read combined were
significantly shorter than the waits on db file sequential read in the original query because
large numbers of single block I/Os were replaced with fewer multi-block I/Os.
However, there were about 38 seconds of direct path I/O waits. As we've seen, direct path I/O
usually occurs in connection with temporary segments. Oracle had to do quite a bit of sorting
to implement the INTERSECT operation added to the rewritten query, so the direct path I/O
waits were not unexpected.
The performance gains achieved by the query rewrite were significant, and we were tempted to
end the tuning exercise right here. However, the company indicated that this report was critical
to their operation and they wanted to make it run even faster if possible. So, our next step was
to speed up the sort operation and thereby reduce the direct path I/O waits.
Looking at v$parameter, we saw that the sort_area_size was set to 1 Mb. A larger sort area
would allow Oracle to perform more of the sort in memory and less on disk, thus reducing the
direct path I/O. To see the effect of a larger sort area, we repeated the timing exercise, first
setting sort_area_size to 100 Mb dynamically in the SQL*Plus session where the report query
was run. The query completed in 55 seconds this time, and statistics collected from
v$session_event and v$sesstat were as follows:
SQL> SELECT event, total_waits, time_waited, max_wait
  2  FROM   v$session_event
  3  WHERE  sid = 46
  4  ORDER BY event;

EVENT                              TOTAL_WAITS TIME_WAITED   MAX_WAIT
SQL*Net message from client                 47         442        287
SQL*Net message from dblink                 25          25         14
SQL*Net message to client                   48           0          0
SQL*Net message to dblink                   25           0          0
SQL*Net more data from dblink             6050        1378         26
db file scattered read                    3430         945          8
db file sequential read                    191          59          1
log file sync                                1           3          3

8 rows selected.

SQL> SELECT A.name, B.value
  2  FROM   v$statname A, v$sesstat B
  3  WHERE  A.statistic# = 12
  4  AND    B.statistic# = A.statistic#
  5  AND    B.sid = 46;

NAME                                     VALUE
CPU used by this session                  3296

SQL>
You can see that increasing the sort area size did not reduce the CPU time, database link wait
time, or db file scattered/sequential read times. However, the direct path I/O waits disappeared
completely. This indicates that all sorting was performed in memory. Of course, it may not be
feasible to keep the sort entirely in memory as the CUSTOMER_TRANSFER_BATCHES table
grows. However, a larger sort area will still speed up the sort and overall query speed even if
the entire sort cannot be performed in memory.
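For reference, the dynamic session-level change described above can be made with a statement
like this (100 Mb expressed in bytes):

ALTER SESSION SET sort_area_size = 104857600;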
This example demonstrated the practice of iterative tuning: curing one bottleneck often creates
a new one, or reveals a performance issue that was masked by the larger problem. Boosting the
performance of complex systems is usually an iterative process.
CONCLUSION
Because wait events touch all areas of Oracle server operation, the wait event interface can
provide insights into just about any kind of performance problem you might experience. Other
metrics, such as cache hit ratios, can give you valuable information, but it is the wait event
interface that ties everything together. In the examples discussed here, wait event data directed
us to a swift diagnosis of each problem.
The wait event interface, with its four dynamic performance views and tracing facility, enables
you to see how an Oracle process is spending its time when it is not using the CPU. While wait
event information may not give you everything you might need to cure every performance
problem, it does give you access to a wealth of information that can guide you on the path to
better system performance.
The basic concepts behind the wait event interface have not changed since Oracle 7. But each
release of Oracle brings a few more enhancements that make the wait event interface easier to
use or more informative. In Oracle 9i, TKPROF has been enhanced to parse wait event
information from trace files and format it in a useful report. Also, some timing information is
tracked in microseconds for more resolution, and the DBMS_SUPPORT package has been
included for easier trace activation. However, these are just bells and whistles. The wait event
interface has been a solid tool for Oracle performance management since Oracle 7.
Roger Schrag, OCP, has been an Oracle DBA and application architect for over twelve years,
starting out at Oracle Corporation on the Oracle Financials development team and moving on
to production DBA and database architect positions at various companies in the San Francisco
Bay Area. In 1995, Roger founded Database Specialists, Inc., a consulting firm specializing in
business solutions based on Oracle technology, including on-site and remote database
administration services. Roger is a frequent speaker at OracleWorld and the IOUG Live
conferences. He is also President of the Northern California Oracle Users Group. You may
contact Roger by email at [email protected].
Terry Sutton, OCP, has been an Oracle DBA for nine years, and has worked in the information
technology area for 16 years. Since 2000, Terry has been a Senior Staff Consultant at Database
Specialists, performing assignments ranging from production database administration to
emergency troubleshooting with a particular focus on Oracle database performance tuning. You
may contact Terry by email at [email protected].
Note from the authors: We are always looking for interesting wait event situations to learn
from. One can never be exposed to too many examples or techniques. If you are trying to tackle
a problem using the wait event interface, feel free to email us the specifics of your situation.
We'll do our best to look over what you send us and give you our thoughts if you wish. But
please understand there aren't enough hours in the day for any one person (or two of us,
actually) to solve the world's Oracle performance problems. We'll try to help you if we can.