
Identifying performance issues beyond the Oracle wait interface

Stefan Koehler

11.11.15 Page 1

About me
Stefan Koehler
•  Independent Oracle performance consultant and researcher
•  12+ years using Oracle RDBMS
•  Oracle performance and internals geek
•  Main interests: Cost-based optimizer and Oracle RDBMS internals

Focus & Services: “It is all about performance”
•  Oracle performance tuning (e.g. Application, CBO, Database, Design, SQL)
•  Oracle core internals research (e.g. DTrace, GDB, Perf, etc.)
•  Troubleshooting nontrivial Oracle RDBMS issues (e.g. heap dumps, system state dumps, etc.)
•  Services are mainly based on short-term contracting

www.soocs.de   [email protected]   @OracleSK

Agenda
•  Systematic troubleshooting - What are we talking about?
•  “System call trace” vs. “Stack trace”
•  Capturing and interpreting “Stack traces” with focus on Linux
•  Safety warning - Are “Stack traces” safe to use in production?
•  Combine Oracle wait interface and “Stack traces”
•  Real life root cause identified + fixed with help of “Stack traces”

Systematic troubleshooting - What are we talking about? (1)

Systematic troubleshooting - What are we talking about? (2)
1.  Identify the performance bottleneck based on response time
    •  Method R by Cary Millsap
    → Business process is affected by a single SQL running on CPU only
2.  Interpret the execution plan with the help of additional SQL execution statistics (or Real-Time SQL Monitoring) and the wait interface
    •  PL/SQL package DBMS_XPLAN or DBMS_SQLTUNE
    → No execution plan issue found
3.  Capture and interpret session statistics and performance counters
    •  Tools like Snapper by Tanel Poder
    → Still no obvious root cause for the high CPU load
4.  Capture and interpret system call or stack traces
    •  This is what this session is about. Disassembling Oracle code.

“System call trace” vs. “Stack trace”
•  System call trace
   •  A system call is the fundamental interface between an application and the (Linux) kernel and is generally not invoked directly, but rather via wrapper functions in glibc (or some other library). For example: truncate() → truncate() or truncate64()
   •  Example of Oracle using system calls: gettimeofday(), pread(), etc.
   •  Tools: Strace (Linux), Truss (AIX / Solaris), Tusc (HP-UX)
   •  Be aware of the vDSO / vSyscall64 feature when tracing system calls on Linux (calls such as gettimeofday() can be served in user space and then never appear in the strace output)
•  Stack trace / Stack backtrace
   •  A call stack is the list of names of methods called at run time from the beginning of a program until the execution of the current statement
   •  Tools: Oradebug (Oracle), GDB + wrappers or Perf or SystemTap (Linux), DTrace (Solaris), Procstack (AIX)
→ The stack trace includes the called methods / functions of an Oracle process, whereas the system call trace includes only the (function) requests to the OS kernel

Capturing “Stack traces” with focus on Linux (1)
•  Tool “Oradebug” (Oracle tool and platform independent)
SQL> oradebug SETMYPID            (own session), or
SQL> oradebug SETOSPID <PID>      (any other Oracle process)
SQL> oradebug SHORT_STACK
→ Code path of oradebug request - SIGUSR2 signal
→ +<NUM> = Offset in bytes from the beginning of the symbol (function) where the child function call happened
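The SHORT_STACK output is a single line in which the frames are chained with "<-", most recent call first, and each frame has the form function()+offset. A minimal sketch, assuming exactly that format: the hypothetical helper split_short_stack (not part of oradebug) prints one "function offset" pair per line, which makes it easier to compare several samples. The stack string in the usage line is illustrative only, not captured output.

```shell
# split_short_stack: hypothetical helper, not part of oradebug.
# Assumption: frames are separated by "<-" and look like "function()+offset".
split_short_stack() {
    printf '%s\n' "$1" | awk '{
        n = split($0, frames, "<-")
        for (i = 1; i <= n; i++) {
            name = frames[i]
            offset = 0
            p = index(name, "()+")
            if (p > 0) {
                offset = substr(name, p + 3)    # bytes from start of the symbol
                name = substr(name, 1, p - 1)
            } else {
                sub(/\(\)$/, "", name)          # frame printed without an offset
            }
            if (name != "") printf "%s %s\n", name, offset
        }
    }'
}

# Usage with an illustrative (made-up) stack string:
split_short_stack 'ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927'
```

Frames that carry no offset are reported with offset 0, so the output always has two columns.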

Capturing “Stack traces” with focus on Linux (2)
•  Tool “GDB” (GNU debugger) and its wrapper script pstack
shell> gdb
(gdb) attach <PID>
(gdb) backtrace
or
shell> /usr/bin/pstack <PID>
→ GDB is based on ptrace() system calls

Capturing “Stack traces” with focus on Linux (3)
•  Performance counters for Linux (Linux kernel-based subsystem)
   •  Framework for collecting and analyzing performance data, e.g. hardware events, including retired instructions, processor clock cycles and many more
   •  Based on sampling (default avg. 1000 Hz, i.e. 1000 samples/sec)
   •  Caution in virtualized environments when capturing cpu-cycles events (VMware KB #2030221)
   •  Tool “Perf” is based on the perf_events interface exported by the Linux kernel (>= 2.6.31)
shell> perf record -e cpu-cycles -o /tmp/perf.out -g -p <PID>
→ Hardware event (cpu-cycles) = Usage of kernel’s performance registers
→ Software event (cpu-clock) = Depends on timer interrupt
•  Poor man’s stack profiling
   •  When no other tool is available and you need a quick insight into sampled stacks
shell> export LC_ALL=C ; for i in {1..20} ; do pstack <PID> | ./os_explain -a ; done | sort -r | uniq -c
→ os_explain: Script by Tanel Poder to translate C function names into known functionality
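The loop above relies on os_explain for readable names. Where that script is not at hand, the same sampling idea can be sketched with plain awk: fold every pstack sample into one line and count identical stacks. This is a sketch under stated assumptions, not a replacement for os_explain: fold_one_sample and profile_pid are hypothetical helper names, and the parsing assumes the usual GDB frame lines ("#N  0xADDR in function () ...") of a single-threaded process.

```shell
# fold_one_sample: hypothetical helper; reads one pstack sample on stdin and
# prints its frames as a single line, innermost function first.
# Assumption: frame lines look like "#1  0x... in func () ..." or "#0  func () ...".
fold_one_sample() {
    awk '/^#[0-9]+/ {
             fn = ($3 == "in") ? $4 : $2
             line = (line == "") ? fn : line "<-" fn
         }
         END { if (line != "") print line }'
}

# profile_pid: hypothetical helper; takes N samples (default 20) of one
# process and ranks the distinct stacks by how often they were seen.
profile_pid() {
    pid=$1
    samples=${2:-20}
    i=0
    while [ "$i" -lt "$samples" ]; do
        pstack "$pid" | fold_one_sample
        i=$((i + 1))
    done | sort | uniq -c | sort -rn
}
```

Calling profile_pid with an OS process ID prints each distinct stack preceded by the number of samples in which it occurred. The names stay raw C function names, since no translation step is included.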
Capturing “Stack traces” with focus on Linux (4)
•  Listing other capturing tools for completeness
   •  OStackProf by Tanel Poder (needs to be run from a Windows SQL*Plus client, as it is based on oradebug short_stack and a VBS script for post-processing)
   •  DTrace on Solaris (e.g. DTrace toolkit script “hotuser” by Brendan Gregg or analysis with the PID provider)
   •  DTrace on Linux is lacking when it comes to userspace integration / probing
   •  SystemTap (with Linux kernel >= 3.5 for userspace probing), otherwise the “utrace patch” needs to be applied

Interpreting “Stack traces” with focus on Linux
•  Performance counters for Linux (Linux kernel-based subsystem)
   •  Tool “Perf”
shell> perf report -i /tmp/perf.out -g none -n --stdio
shell> perf report -i /tmp/perf.out -g graph -n --stdio
Problem: Depending on the stack trace content there may be too much data to interpret in this format. Main question: Where is the bulk of CPU time spent?
•  Tool “Flame Graph” by Brendan Gregg (works with DTrace & SystemTap too)
shell> perf script -i /tmp/perf.out | ./stackcollapse-perf.pl > out.perf.folded
shell> ./flamegraph.pl out.perf.folded > perf-out.svg
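The folded file written by stackcollapse-perf.pl can also answer the main question directly on the command line, without rendering an SVG. A minimal sketch, assuming the folded format of one stack per line with frames separated by ";" (root first) and the sample count as the last field. top_leaf_functions is a hypothetical helper name, and the sample counts in the usage lines are invented.

```shell
# top_leaf_functions: hypothetical helper; reads folded stacks on stdin and
# prints "samples percent function" for the function on top of each stack.
# Assumption: each line is "frame1;frame2;...;frameN count".
top_leaf_functions() {
    LC_ALL=C awk '{
        count = $NF
        stack = $0
        sub(/ +[0-9]+$/, "", stack)           # strip the trailing sample count
        n = split(stack, frames, ";")
        leaf[frames[n]] += count
        total += count
    }
    END {
        for (f in leaf)
            printf "%d %.1f%% %s\n", leaf[f], 100 * leaf[f] / total, f
    }' | sort -rn
}

# Usage with invented sample counts:
printf '%s\n' \
    'oracle;ktspscan_bmb;ktspfsrch 70' \
    'oracle;ktspscan_bmb 30' | top_leaf_functions
```

Only the function on top of each stack is counted, so the ranking shows where CPU time is spent directly, not inclusive of callees. Against a real capture the input would be out.perf.folded.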

Safety warning - Are “Stack traces” safe to use in production?
•  If your database is already in such a state … then don’t worry about the possible consequences and issues caused by capturing stack traces
•  Be aware that capturing stack traces can itself change behavior; take care if only some specific business processes are affected
   •  Tool “Oradebug” - “Unsafe” as it alters the code path / SIGUSR2 (e.g. bug #15677306)
   •  Tool “GDB” (and its wrappers) - “Unsafe” as it suspends the process (ptrace syscall) with possible impact on communication with the kernel or other processes
   •  Tool “Perf” based on Linux performance counters - Safe by design, but a fallback to the other tools is still needed if the process is not running on CPU and is stuck somewhere else
   •  DTrace (Solaris) - Safe by design



Combine Oracle wait interface and “Stack traces”
•  Fulltime.sh by Craig Shallahamer and Frits Hoogland
   •  Based on V$SESSION_EVENT and Linux performance counters
shell> fulltime.sh <PID> <SAMPLE_DURATION> <SAMPLE_COUNT>
•  Oracle 12c enhancement - Diagnostic event “wait_event[]” in the "new" kernel diagnostics & tracing infrastructure
SQL> oradebug doc event name wait_event
wait_event: event to control wait event post-wakeup actions
SQL> alter session set events 'wait_event["<wait event name>"] trace("%s\n", shortstack())';
→ Combine extended SQL trace & event wait_event[]
→ Function kslwtectx marks end of wait event



Real life root cause identified + fixed with help of “Stack traces” (1)
•  Environment and issue
   •  Large SAP system with Oracle 11.2.0.2 running on AIX 6.1
   •  Most of the SAP work processes are stuck in a simple INSERT statement and burning up all CPUs on the database server
   •  Index key compression and OLTP compression are enabled
   •  SQL statement:
SQL> INSERT INTO "BSIS" VALUES(:A0 , ... ,:A81);
•  Applying systematic troubleshooting
   •  Identify performance bottleneck based on response time with Method R
   → Performance bottleneck is clearly caused by the INSERT statement, as 100% of the end user response time is spent on it and all application processes are affected by it
   → No further response time analysis needed here

Real life root cause identified + fixed with help of “Stack traces” (2)
•  Applying systematic troubleshooting
   •  Interpret execution plan with help of additional SQL execution statistics (or Real-Time SQL Monitoring) and wait interface

Real life root cause identified + fixed with help of “Stack traces” (3)
•  Applying systematic troubleshooting
   •  Capture and interpret session statistics and performance counters

Real life root cause identified + fixed with help of “Stack traces” (4)
•  Applying systematic troubleshooting
   •  Capture and interpret session statistics and performance counters



Real life root cause identified + fixed with help of “Stack traces” (5)
•  Applying systematic troubleshooting
   •  Capture and interpret system call or stack traces
•  Process is stuck in main call stack “ktspscan_bmb” + on-top functions. The high CPU usage (“session logical reads”) is the consequence of it
•  Table “BSIS” is stored in an ASSM tablespace and the call stack “ktspfsrch <- ktspscan_bmb” is related to “first level bitmap block search”
•  MOS search results in bug #13641076 - “HIGH AMOUNT OF BUFFER GETS FOR INSERT STATEMENT - REJECTIONLIST DOES NOT FIRE”
•  Root cause found and can be fixed by applying the corresponding patch



Questions and answers

Download links and further information on all mentioned tools and procedures are in the reference section of the manuscript
