Understanding Processor Utilization With IBM PowerVM
Index
1. Introduction
2. Reference Documentation
3. Known AIX and Firmware Issues
4. What Information to Collect
4.1 Collecting Hardware and LPAR Configuration Information
4.1.1 prtconf command:
4.1.2 lparstat commands:
4.2 Collecting Performance Information
4.2.1 Sample Processor Utilization Data
4.2.3 Differences in CPU Consumption by Hypervisor at Low and High Utilization
5. Collecting Detailed Performance Data
5.1 curt Performance Report
5.2 tprof Performance Report
1. Introduction
A number of customers running Oracle RAC on IBM AIX have opened SRs after seeing higher
than expected total system utilization when running the Oracle 11.2 Clusterware and database
stack on an otherwise idle system, for example when the database instance(s) are ONLINE but
not being used (referred to here as an "idle RAC cluster"). Because virtualized environments are
so frequently used on modern processors, it is important to understand how to properly assess
utilization in these environments.
There are likely several factors leading to the number of related SRs being opened:
1. There appears to be higher utilization when running Oracle Clusterware on AIX than
on other platforms. Oracle and IBM are working together to better understand and, if
possible, correct this problem. This work is still in progress.
2. There is a known bug in the Oracle 11.2 Clusterware release where higher CPU
utilization is seen even when the database instances are idle. Oracle patch
13498267 needs to be applied for this problem.
3. There is a known firmware and AIX problem resulting in additional CPU utilization in
some cases when using shared LPARs compared to dedicated LPARs. When the proper
code levels are installed there is only a very slight performance difference between
dedicated and shared LPARs when looking at the overhead of an idle RAC cluster.
4. In some cases customers may not be correctly assessing the system utilization on their
virtualized systems.
The purpose of this paper is to address the third and fourth items in this list, by documenting the
recommended code levels and explaining how to properly evaluate the utilization on a
virtualized AIX system.
2. Reference Documentation
The following paper provides very detailed information useful in understanding processor
utilization in a virtualized environment.
The following paper provides detailed information on processor performance monitoring and
IBM's Energy Saving features, which modify the CPU frequency. It is not clear to what extent
this feature is being used by customers, but it is important to keep in mind when evaluating a
customer's processor utilization: if it is in use and the correct performance tools are not used, it
can skew the reported processor utilization.
A hypervisor problem was identified and fixed that caused the hypervisor to delay dispatching a
partition even though it was ready to run. This added latency adversely affected performance.
This problem can affect POWER7 systems running any level of Ax720 firmware prior to
Ax720_101. Regardless, it is recommended to update to the latest available firmware.
If required, AIX and Firmware fixes can be obtained from IBM Support Fix Central:
http://www-933.ibm.com/support/fixcentral/main/System+p/AIX
# prtconf|head -30
System Model: IBM,9117-MMB
Network Information
Host Name: rac82
IP Address: 9.47.89.163
Sub Netmask: 255.255.255.0
Gateway: 9.47.89.1
Name Server:
Domain Name:
Of primary interest are the System Model, Processor Type, Processor Clock Speed, and Platform
Firmware level.
# lparstat -i
Node Name : rac82
Partition Name : prd1117_vclient2_el9-89-163
Partition Number : 5
Type : Shared-SMT-4
Mode : Capped
Entitled Capacity : 2.00
Partition Group-ID : 32773
Shared Pool ID : 0
Online Virtual CPUs : 2
Maximum Virtual CPUs : 4
Minimum Virtual CPUs : 2
Online Memory : 32768 MB
Maximum Memory : 32768 MB
The key values from this output related to processor utilization are described below. Some
settings of these values can impact processor utilization. The following section, describing
which performance commands to run, shows how to assess the impact of some of these settings.
Type:
Indicates whether the LPAR is using dedicated or shared CPU resource and if SMT is
turned ON. The Type is displayed in the format [Shared | Dedicated] [ -SMT ] [ -# ]
Shared - Indicates that the LPAR is running in the Shared processor mode. In
shared mode virtual processors from other LPARs may be time-sliced on the same
physical processor.
Dedicated - Indicates that the LPAR is running in the dedicated processor mode.
In dedicated mode the LPAR is not time-sliced with virtual processors from other
LPARs on the physical processor. There is a special optional mode of dedicated,
called donating, in which if the LPAR is idle other virtual processors can
“borrow” the physical processor, but once the LPAR is no longer idle it resumes
on the physical processor without time-slicing.
SMT[-#] - Indicates that the LPAR has SMT mode turned ON and the number of
SMT threads. If only “SMT” is shown the number of threads per processor is 2. If
the number of threads is greater than 2, then the number of threads is also
displayed. Example: SMT or SMT-2 = SMT with 2 threads; SMT-4 = SMT with
4 threads.
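The [Shared | Dedicated][-SMT][-#] format described above can be parsed mechanically. The
following Python sketch is purely illustrative (the function name and return shape are not part of
any AIX tooling) and encodes the rules just listed, including the implied 2 threads when only
"SMT" is shown:

```python
def parse_lpar_type(type_str):
    """Parse the lparstat -i 'Type' field, e.g. 'Shared-SMT-4' or 'Dedicated'.

    Returns (processor_mode, smt_threads); smt_threads is 1 when SMT is off.
    """
    parts = type_str.split("-")
    mode = parts[0]                # 'Shared' or 'Dedicated'
    if len(parts) == 1:
        return mode, 1             # no '-SMT' suffix: SMT is off
    if len(parts) == 2:
        return mode, 2             # bare 'SMT' implies 2 threads per processor
    return mode, int(parts[2])     # explicit thread count, e.g. 'SMT-4'
```

For the sample output above, `parse_lpar_type("Shared-SMT-4")` yields shared mode with 4
SMT threads per processor.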
Mode:
Indicates whether the LPAR processor capacity is capped or uncapped; an uncapped LPAR
is allowed to consume idle cycles from the shared pool. A dedicated LPAR is reported as
capped or donating.
Entitled Capacity
The number of processing units this LPAR is entitled (guaranteed) to receive. If the
LPAR mode is capped, the LPAR cannot consume more than its entitled capacity.
If the LPAR is uncapped, it can consume additional resources when needed (up to the
number of virtual processors, i.e., each virtual processor can consume up to one full
physical processor) as long as there are available resources on the system not required
to meet the Entitled Capacity of other LPARs.
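The capped/uncapped rules above can be summarized in a small sketch. This hypothetical helper
(not an AIX API; the name and signature are invented for illustration) returns the upper bound
on physical processor consumption for a shared LPAR:

```python
def max_physical_consumption(entitled, virtual_cpus, capped):
    """Upper bound on physical processors a shared LPAR can consume.

    Capped: limited to the entitled capacity. Uncapped: each virtual
    processor can consume up to one full physical processor, provided
    spare capacity exists in the shared pool.
    """
    return entitled if capped else float(virtual_cpus)
```

For the sample LPAR above (Entitled Capacity 2.00, 2 online virtual CPUs, Capped), the bound
is 2.00 physical processors; were it uncapped, the same LPAR could consume up to 2.0 as well,
since the virtual CPU count is also 2.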
Sample output from these recommended commands is listed below, along with explanations of
the output. In some cases multiple commands for displaying similar information are included.
Note:
On AIX, whenever per-CPU utilization is shown it is calculated as a percentage of the
physical processor consumption of that individual processor. Because of this,
when looking at the per-CPU utilization the sum of the utilization percentages (USR,
SYS, WAIT and IDLE) will always be 100%, even if the physical processor utilization
on that processor is much lower than 100%. If this is not understood it can be misleading,
and lead to the conclusion that the system utilization is higher than it actually is.
In contrast to the per-CPU utilization, the System Level (ALL) CPU utilization
percentages are relative to the Entitled Capacity and, in Uncapped mode, relative to the
Physical Capacity being consumed (Physc) once it is greater than the Entitled Capacity.
This is more intuitive than the way the per-CPU utilization is displayed.
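To illustrate the note above: since each logical CPU's USR/SYS/WAIT/IDLE percentages sum
to 100% of that CPU's own consumption, comparing busy time across CPUs in physical units
requires weighting USR+SYS by that CPU's physical consumption (physc). A minimal Python
sketch, with hypothetical field names:

```python
def physical_busy(per_cpu):
    """Total physical processors spent in user+system across all logical CPUs.

    per_cpu: list of dicts with 'usr' and 'sys' (percent of that CPU's own
    consumption) and 'physc' (physical processors consumed by that CPU).
    """
    return sum(c["physc"] * (c["usr"] + c["sys"]) / 100.0 for c in per_cpu)
```

A CPU showing 75% busy but consuming only 0.40 of a physical processor contributes just
0.30 physical processors of busy time, far less than the raw percentage suggests.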
For the examples only one sample was collected to keep the output short; normally it would be
desirable to collect multiple samples over a longer time.
# vmstat 10 1
System configuration: lcpu=8 mem=32768MB ent=2.00
Note that the entitled capacity percentage can sometimes exceed 100%; this excess is noticeable
only with small sampling intervals.
# sar -P ALL 10 1
# lparstat -h 10 1
%user %sys %wait %idle physc %entc lbusy vcsw phint %hypv hcalls
----- ----- ------ ------ ----- ----- ------ ----- ----- ------ ------
7.4 6.8 0.0 85.8 0.43 21.7 8.9 2144 2 88.0 14188
Relevant columns:
lbusy
Indicates the percentage of logical processor(s) utilization that occurred while executing
at the user and system level.
vcsw
Indicates the number of virtual context switches that are virtual-processor hardware
preemptions.
phint
Indicates the number of phantom (targeted to another shared partition in this pool)
interruptions received.
%hypv
Indicates the percentage of physical processor consumption spent making hypervisor
calls.
hcalls
Indicates the average number of hypervisor calls that were started.
lparstat –h command sample output from not busy system
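The relationship between physc, the entitled capacity, and %entc can be expressed directly:
%entc is the physical consumption as a percentage of entitlement. A one-line Python sketch
(the function name is hypothetical):

```python
def entc_percent(physc, entitled):
    """Percent of entitled capacity consumed: %entc = 100 * physc / ent."""
    return 100.0 * physc / entitled
```

With the displayed physc of 0.43 and the entitlement of 2.00 from the sample above this gives
21.5; the small difference from the reported 21.7 comes from rounding of the displayed physc.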
# lparstat -E 10 1
--------Actual-------- ------Normalised------
user sys wait idle freq user sys wait idle
---- ---- ---- ---- --------- ---- ---- ---- ----
0.147 0.137 0.000 1.716 3.1GHz[100%] 0.147 0.137 0.000 1.716
With the -E option, lparstat reports Scaled Processor Utilization of Resources Register (SPURR)
based utilization metrics when run on a SPURR-capable processor. This output is useful when
Power Saving Mode is enabled.
The Normalised columns indicate what the utilization would be if the machine were running at
full processor speed, and are therefore a better indicator of the available capacity (since as the
processor becomes busier the frequency will be restored to full speed).
See the reference CPU frequency monitoring using lparstat for more details.
lparstat -E command sample output
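The normalization described above can be sketched as a simple frequency scaling: consumption
measured at a reduced clock is multiplied by the frequency ratio to express it in full-speed
equivalent units. This Python sketch mirrors the idea rather than the exact SPURR arithmetic,
and the names are illustrative:

```python
def normalize(actual_consumption, freq_ratio):
    """Scale actual processor consumption to full-frequency-equivalent units.

    freq_ratio: current frequency as a fraction of nominal (e.g. 0.7 when
    lparstat -E reports the frequency at 70%). At full speed, actual and
    normalized values are identical, as in the sample output above.
    """
    return actual_consumption * freq_ratio
```

For example, 2.0 processors of busy time measured while running at 70% frequency represents
only 1.4 processors of full-speed work, so the partition has more headroom than the actual
figures alone suggest.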
# lparstat -c 10 1
System is not running in AME memory mode, -c flag is not valid
If AME is not enabled (the target memory expansion factor is not displayed, or "-" is
displayed), no statistics are returned.
Below is sample output from another system, under load, with AME enabled.
System configuration: type=Shared mode=Capped mmode=Ded-E smt=4 lcpu=80
mem=40960MB tmem=32768MB psize=32 ent=20.00
%user %sys %wait %idle physc %entc lbusy vcsw phint %xcpu xphysc dxm
----- ----- ------ ------ ----- ----- ------ ----- ----- ------ ------ ------
18.3 4.1 4.0 73.6 7.06 35.3 10.3 2654 87 0.8 0.0576 0
21.3 4.0 3.1 71.6 7.79 39.0 11.5 3799 121 0.3 0.0240 0
21.3 4.1 3.2 71.4 8.02 40.1 11.8 8983 112 0.6 0.0502 0
24.8 3.4 2.6 69.2 8.89 44.5 13.4 4433 120 1.4 0.1286 0
17.5 3.4 2.4 76.7 6.57 32.9 11.2 6329 119 0.4 0.0295 0
60.1 5.6 3.4 30.8 16.25 81.2 41.9 14123 614 1.3 0.2085 0
25.7 20.7 8.3 45.3 11.99 60.0 31.4 12320 336 37.0 4.4403 0
13.3 10.6 4.5 71.6 7.44 37.2 15.9 8107 171 29.3 2.1798 0
17.1 3.5 5.7 73.7 6.53 32.7 11.4 4205 118 6.3 0.4125 0
23.9 5.2 6.7 64.2 8.89 44.4 15.3 9940 149 5.4 0.4772 0
25.5 3.1 4.8 66.5 8.91 44.6 13.9 4390 142 1.3 0.1120 0
16.7 3.8 4.1 75.4 6.81 34.1 11.6 2980 87 3.8 0.2603 0
27.9 8.6 3.7 59.8 10.99 54.9 17.1 9501 158 10.1 1.1112 0
18.6 7.4 3.6 70.4 8.27 41.3 13.2 4801 121 12.4 1.0275 0
21.0 4.5 3.5 71.0 8.12 40.6 12.6 3548 111 5.7 0.4610 0
20.2 4.9 2.7 72.3 7.93 39.6 13.5 4683 115 2.3 0.1842 0
26.7 4.2 2.4 66.7 9.37 46.9 15.6 3412 136 0.6 0.0537 0
43.0 16.7 2.7 37.6 16.00 80.0 29.0 9829 293 17.3 2.7726 0
15.2 8.3 5.0 71.5 7.19 36.0 15.7 17264 212 21.5 1.5424 0
23.9 6.2 7.4 62.5 9.29 46.4 16.0 11682 203 12.9 1.1936 0
Relevant columns:
%xcpu
Indicates the percentage of utilization, relative to the overall CPU consumption by the
logical partition (in other words, relative to the value in the physc column), for Active
Memory Expansion (AME) activity.
xphysc
Indicates the number of physical processors used for the Active Memory Expansion
activity.
dxm
Indicates the size of the expanded memory deficit for the LPAR in MB.
lparstat –c command sample output
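As described above, %xcpu is simply the AME consumption (xphysc) expressed as a percentage
of the partition's total consumption (physc), which can be checked against the sample rows. A
small Python sketch (hypothetical function name):

```python
def xcpu_percent(xphysc, physc):
    """AME share of partition consumption: %xcpu = 100 * xphysc / physc."""
    return 100.0 * xphysc / physc
```

For the heavily expanding row above (xphysc 4.4403, physc 11.99) this gives about 37.0,
matching the reported %xcpu; the first row (0.0576, 7.06) likewise gives about 0.8.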
# mpstat -s 10 1
Proc0 Proc4
43.13% 0.27%
cpu0 cpu1 cpu2 cpu3 cpu4 cpu5 cpu6 cpu7
24.79% 8.65% 4.86% 4.83% 0.06% 0.06% 0.06% 0.10%
Displays the simultaneous multithreading (SMT) thread utilization. The -s flag is available only
when mpstat runs in an SMT-enabled partition.
The percentages show the percentage of a physical processor consumed. This is comparable to
the physc output of the sar command, only broken out by virtual and logical processor.
mpstat –s command sample output
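Because each percentage is a share of a physical processor, summing the SMT thread values for
a virtual processor recovers that processor's total consumption shown in the header row. A
trivial Python sketch (hypothetical function name):

```python
def vp_consumption(thread_pcts):
    """Sum the per-SMT-thread physical-consumption percentages reported by
    mpstat -s to recover the virtual processor's total consumption."""
    return sum(thread_pcts)
```

Summing cpu0 through cpu3 above (24.79 + 8.65 + 4.86 + 4.83) gives 43.13, the value shown
for Proc0.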
# lparstat -h 10 1
%user %sys %wait %idle physc %entc lbusy vcsw phint %hypv hcalls
----- ----- ------ ------ ----- ----- ------ ----- ----- ------ ------
7.4 6.8 0.0 85.8 0.43 21.7 8.9 2144 2 88.0 14188
lparstat –h command sample output from not busy system
# lparstat -h 10 1
%user %sys %wait %idle physc %entc lbusy vcsw phint %hypv hcalls
----- ----- ------ ------ ----- ----- ------ ----- ----- ------ ------
96.1 3.9 0.0 0.0 2.00 100.0 100.0 4 62 0.4 24855
lparstat –h command sample output from busy system
The two preceding tables show `lparstat -h` command output on a not busy and on a busy
system. Note that on the not busy system the percentage of physical processor consumption
spent making hypervisor calls (%hypv) is very high, while in the busy case it is very small. The
important thing to note is that when the system is not busy (when the wait process is running) it
gives control back to the hypervisor; this results in additional overhead, but as the system
becomes busier this overhead diminishes.
Note: at this time there is a known problem impacting the accuracy of the reported %hypv value.
In some cases the percentage reported is too high. At this time there is no APAR to resolve
this.
The lparstat command with the -H option shows a detailed breakdown of the %hypv value. Note
in the following example that the majority of the %hypv time is spent in the cede hypervisor
call. This is the wait process giving control back to the hypervisor.
--------------------------------------------------------------------------------
Hypervisor Number of %Total Time %Hypervisor Avg Call Max Call
Call Calls Spent Time Spent Time(ns) Time(ns)
Please refer to the AIX documentation for details on the curt command and its options:
http://pic.dhe.ibm.com/infocenter/aix/v7r1/index.jsp?topic=%2Fcom.ibm.aix.cmds%2Fdoc%2Fa
ixcmds1%2Fcurt.htm
:
System Summary
--------------
processing percent percent
total time total time busy time
The curt report includes the First Level Interrupt Handler (FLIH) time. This is noted because it is
not accounted for in the tprof report discussed below.
The tprof command can run in different modes. In the following example it is run in offline
mode, which allows specifying a larger trace buffer and log file.
:
Process Freq Total Kernel User Shared Other Java
======= ==== ===== ====== ==== ====== ===== ====
wait 4 88.92 88.91 0.01 0.00 0.00 0.00
./ocssd.bin 18 1.76 1.17 0.00 0.58 0.000 0.00
./oraagent.bin 67 1.15 0.56 0.31 0.28 0.00 0.00
./orarootagent.bin 385 1.06 0.66 0.17 0.23 0.00 0.00
./ohasd.bin 20 0.91 0.62 0.01 0.29 0.000 0.00
./gipcd.bin 6 0.88 0.61 0.00 0.27 0.00 0.00
./crsd.bin 23 0.76 0.52 0.00 0.23 0.00 0.00
./evmd.bin 8 0.67 0.47 0.00 0.20 0.00 0.00
./octssd.bin 8 0.65 0.46 0.00 0.19 0.00 0.00
//bin/sh 2253 0.46 0.41 0.02 0.01 0.01 0.00
/ora112/grid/perl/bin/perl 223 0.14 0.07 0.05 0.02 0.01 0.00
ora_lms1_orcl_4 1 0.14 0.11 0.02 0.00 0.00 0.00
ora_lms0_gpdb_4 1 0.14 0.11 0.02 0.00 0.00 0.00
ora_lms1_gpdb_4 1 0.13 0.11 0.02 0.00 0.00 0.00
ora_lms0_orcl_4 1 0.13 0.11 0.02 0.00 0.00 0.00
asm_dia0_+ASM4 1 0.12 0.07 0.04 0.01 0.00 0.00
gil 4 0.12 0.11 0.00 0.00 0.00 0.00
ora_dia0_gpdb_4 1 0.11 0.06 0.04 0.01 0.00 0.00
ora_dia0_orcl_4 1 0.11 0.06 0.04 0.01 0.00 0.00
asm_lms0_+ASM4 1 0.09 0.07 0.02 0.00 0.00 0.00
/usr/sbin/lsattr 502 0.09 0.07 0.01 0.00 0.00 0.00
/usr/sbin/sshd 275 0.08 0.05 0.02 0.01 0.00 0.00
/sbin/acfsutil.bin 115 0.07 0.03 0.00 0.00 0.04 0.00
ora_lmon_gpdb_4 1 0.07 0.02 0.05 0.00 0.00 0.00
ora_lmon_orcl_4 1 0.06 0.02 0.04 0.00 0.00 0.00
/usr/bin/awk 386 0.06 0.05 0.00 0.00 0.00 0.00
/usr/bin/pwd 272 0.05 0.04 0.00 0.00 0.00 0.00
asm_lmon_+ASM4 1 0.05 0.02 0.02 0.00 0.00 0.00
/usr/bin/ps 170 0.03 0.03 0.00 0.00 0.00 0.00
/bin/sh 161 0.03 0.03 0.00 0.00 0.00 0.00
./cssdagent 11 0.03 0.02 0.00 0.01 0.00 0.00
ora_j000_orcl_4 27 0.03 0.01 0.01 0.00 0.01 0.00
./cssdmonitor 9 0.03 0.02 0.00 0.01 0.00 0.00
asm_lmd0_+ASM4 1 0.03 0.02 0.01 0.00 0.00 0.00
/usr/sbin/instfix 5 0.02 0.01 0.01 0.00 0.00 0.00
:
Sample section from tprof report