RHEL Kernel Performance Optimizations and Tuning
Characterization and Tuning
Larry Woodman
John Shakshober
Agenda
Section 1: System Overview
Section 2: Analyzing System Performance
Section 3: Tuning Red Hat Enterprise Linux
Section 4: Performance Analysis and Tuning Examples
References
Section 1: System Overview
Processors
NUMA
Memory Management
File System & Disk IO
Processors Supported/Tested
RHEL4 Limitations
x86: 16
x86_64: 8, 512 (LargeSMP)
ia64: 8, 64 (SGI)
RHEL5 Limitations
x86: 32
x86_64: 256
ia64: 1024
Processor types
Uni-Processor
Symmetric Multi-Processor
Multi-Core
Symmetric Multi-Thread
NUMA Support
RHEL3 NUMA Support
Basic multi-node support
Local memory allocation
RHEL4 NUMA Support
NUMA aware memory allocation policy
NUMA aware memory reclamation
Multi-core support
RHEL5 NUMA Support
NUMA aware scheduling
CPUsets
NUMA aware slab allocator
NUMA aware hugepages
AMD64 System NUMA Memory Layout
[Figure: four sockets (S1 to S4), each with two cores (C0, C1) and its own memory, showing memory placement for a process on S1C0. Interleaved (Non-NUMA): memory alternates across S1, S2, S3, S4. Non-Interleaved (NUMA): memory is laid out as one range per socket, S1 through S4.]
Memory Management
Physical Memory (RAM) Management
Virtual Address Space Maps
Kernel Wired Memory
Reclaimable User Memory
Page Reclaim Dynamics
Physical Memory Supported/Tested
RHEL3 Limitations
x86: 64GB
x86_64: 64GB
ia64: 128GB
RHEL4 Limitations
x86: 64GB
x86_64: 128GB
ia64: 1TB
RHEL5 Limitations
x86: 64GB
x86_64: 256GB
ia64: 2TB
Physical Memory (RAM) Management
Physical Memory Layout
NUMA versus Non-NUMA (UMA)
NUMA Nodes
Zones
mem_map array
Page lists
Free list
Active
Inactive
Memory Zones
32-bit: DMA Zone from 0 to 16MB, Normal Zone from 16MB to 896MB or 3968MB, Highmem Zone above that, up to 64GB (PAE).
64-bit: DMA Zone from 0 to 16MB, Normal Zone from 16MB to the end of RAM.
Memory Zone Utilization
DMA
24-bit I/O
Normal
Kernel Static
Kernel Dynamic
slabcache
bounce buffers
driver allocations
User Overflow
Highmem (x86)
User
Anonymous
Pagecache
Pagetables
Per-Zone Resources
RAM
mem_map
Page lists: free, active and inactive
Page allocation and reclamation
Page reclamation watermarks
mem_map
Kernel maintains a page struct for each 4KB (16KB on IA64 and 64KB for PPC64/RHEL5) page of RAM
mem_map is the global array of page structs
Page struct size:
RHEL3 32-bit = 60 bytes
RHEL3 64-bit = 112 bytes
RHEL4/RHEL5 32-bit = 32 bytes
RHEL4/RHEL5 64-bit = 56 bytes
16GB x86 running RHEL3: ~250MB mem_map array!!!
RHEL4 & 5 mem_map is only about 50% of the RHEL3 mem_map.
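The mem_map figures above follow from one multiplication: pages of RAM times page struct size. A minimal sketch of that arithmetic, assuming 4KB pages and the 32-bit page struct sizes quoted on this slide; the function name mem_map_bytes is illustrative and not part of any kernel interface.

```python
def mem_map_bytes(ram_bytes, page_size, page_struct_size):
    """Size of the mem_map array: one page struct per page of RAM."""
    pages = ram_bytes // page_size
    return pages * page_struct_size

GIB = 1024 ** 3
MIB = 1024 ** 2

# 16GB x86 system with 4KB pages.
rhel3 = mem_map_bytes(16 * GIB, 4096, 60)   # RHEL3 32-bit page struct: 60 bytes
rhel4 = mem_map_bytes(16 * GIB, 4096, 32)   # RHEL4/RHEL5 32-bit page struct: 32 bytes

print(rhel3 // MIB, "MiB of mem_map on RHEL3")        # 240 MiB, roughly 250 MB
print(rhel4 // MIB, "MiB of mem_map on RHEL4/RHEL5")  # 128 MiB, about half
```

On 32-bit x86 this array lives in the Normal zone, so the reduction matters more than its share of total RAM suggests.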
Per-zone page lists
Active List: most recently referenced
Anonymous: stack, heap, bss
Pagecache: filesystem data/metadata
Inactive List: least recently referenced
Dirty: modified
Laundry: writeback in progress
Clean: ready to free
Free
Coalesced buddy allocator
Per-zone Free list/buddy allocator lists
Kernel maintains per-zone free list
Buddy allocator coalesces free pages into larger physically contiguous pieces
DMA
1*4kB 4*8kB 6*16kB 4*32kB 3*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 2*4096kB = 11588kB)
Normal
217*4kB 207*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 3468kB)
HighMem
847*4kB 409*8kB 17*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 7924kB)
Memory allocation failures
Freelist exhaustion.
Freelist fragmentation.
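The buddy lists above can be read mechanically: each count*size term is a number of free blocks of that size. A small sketch, assuming the line format shown on this slide; the helper names are illustrative. It totals the free memory and reports the largest block still available, which is what distinguishes fragmentation from exhaustion.

```python
import re

def parse_buddy_line(line):
    """Parse '1*4kB 4*8kB ... = 11588kB)' into (count, block_kb) pairs."""
    return [(int(count), int(size_kb))
            for count, size_kb in re.findall(r"(\d+)\*(\d+)kB", line)]

def summarize(line):
    """Return (total free kB, largest block size in kB with a nonzero count)."""
    blocks = parse_buddy_line(line)
    total_kb = sum(count * size_kb for count, size_kb in blocks)
    largest_kb = max((size_kb for count, size_kb in blocks if count > 0), default=0)
    return total_kb, largest_kb

dma = ("1*4kB 4*8kB 6*16kB 4*32kB 3*64kB 1*128kB 1*256kB 1*512kB "
       "0*1024kB 1*2048kB 2*4096kB = 11588kB)")
normal = ("217*4kB 207*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 1*512kB "
          "0*1024kB 0*2048kB 0*4096kB = 3468kB)")

print(summarize(dma))      # (11588, 4096)
print(summarize(normal))   # (3468, 512)
```

In the Normal zone sample, more than 3MB is free, yet no contiguous block larger than 512kB exists, so a larger contiguous request would fail from fragmentation rather than exhaustion.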
Per-NUMA-Node Resources
Memory zones (DMA & Normal zones)
CPUs
IO/DMA capacity
Page reclamation daemon (kswapd#)
NUMA Nodes and Zones
64-bit: Node 0 holds the DMA Zone from 0 to 16MB (or 4GB) and a Normal Zone above it; Node 1 holds a Normal Zone that runs to the end of RAM.
Virtual Address Space Maps
32-bit
3G/1G address space
4G/4G address space (RHEL3/4)
64-bit
X86_64
IA64
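Linux 32-bit Address Spaces (SMP)
[Figure: 3G/1G Kernel (SMP). Virtual address space marked at 0GB, 3GB and 4GB; RAM divided into DMA, Normal and HighMem.]
Linux 32-bit Address Space (Hugemem)
[Figure: 4G/4G Kernel (Hugemem). Separate virtual address spaces for User(s) and Kernel; RAM divided into DMA and Normal from 0GB to 3968MB, with HighMem above 3968MB.]
Linux 64-bit Address Space
[Figure: x86_64. Virtual address space (VIRT) with User and Kernel regions, marked at 0 and 128TB (2^47), shown against RAM.]
[Figure: IA64. Virtual address space (VIRT) starting at 0, shown against RAM.]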
Memory Pressure
[Figure: 32-bit. DMA, Normal and Highmem zones, with Kernel Allocations and User Allocations.]
[Figure: 64-bit. DMA and Normal zones, shared by Kernel and User Allocations.]
Kernel Memory Pressure
Static: Boot-time (DMA and Normal zones)
Kernel text, data, BSS
Bootmem allocator, tables and hashes (mem_map)
Dynamic
Slabcache (Normal zone)
Kernel data structs
Inode cache, dentry cache and buffer header dynamics
Pagetables (Highmem/Normal zone)
32-bit versus 64-bit
HugeTLBfs (Highmem/Normal zone)
User Memory Pressure
Anonymous/pagecache split
[Figure: Pagecache Allocations grow the pagecache; Page Faults grow anonymous memory.]
PageCache/Anonymous memory split
Pagecache memory is global and grows when filesystem data is accessed until memory is exhausted.
Pagecache is freed:
Underlying files are deleted.
Unmount of the filesystem.
Kswapd reclaims pagecache pages when memory is exhausted.
Anonymous memory is private and grows on user demand:
Allocation followed by pagefault.
Swapin.
Anonymous memory is freed:
Process unmaps anonymous region or exits.
Kswapd reclaims anonymous pages (swapout) when memory is exhausted.
PageCache/Anonymous memory split (Cont)
Balance between pagecache and anonymous memory.
Dynamic.
Controlled via:
/proc/sys/vm/pagecache.
/proc/sys/vm/swappiness on RHEL4/RHEL5.
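32-bit Memory Reclamation
[Figure: Kernel Allocations and User Allocations across the DMA, Normal and Highmem zones.]
Kernel Reclamation (kswapd): slabcache reaping, inode cache pruning, bufferhead freeing, dentry cache pruning
User Reclamation (kswapd, bdflush/pdflush): page aging, pagecache shrinking, swapping
64-bit Memory Reclamation
[Figure: RAM shared by Kernel and User Allocations, with Kernel and User Reclamation.]
Anonymous/pagecache reclaiming
Pagecache (grown by Pagecache Allocations) is reclaimed by kswapd (bdflush/pdflush, kupdated): page reclaim, deletion of a file, unmount filesystem
Anonymous memory (grown by Page Faults) is reclaimed by kswapd: page reclaim (swapout), unmap, exit
Per Node/Zone Paging Dynamics
[Figure: User Allocations enter the ACTIVE list. Page aging moves pages from ACTIVE to INACTIVE (Dirty -> Clean); referenced pages are reactivated. Reclaiming moves pages from INACTIVE to FREE, with swapout and bdflush (RHEL3) or pdflush (RHEL4/5) cleaning dirty pages. User deletions also return pages to FREE.]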
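Memory reclaim Watermarks
Free List, from All of RAM down to 0:
Above Pages High: do nothing; kswapd sleeps above High
Between Pages High and Pages Low: kswapd reclaims memory
Pages Low: kswapd wakes up at Low
Between Pages Low and Pages Min: kswapd reclaims memory
Pages Min: all memory allocators reclaim at Min; user processes/kswapd reclaim memory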
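The watermark behaviour can be sketched as a small decision function. This is a simplified model of the slide, not kernel code: the function name, the return labels, and the exact boundary comparisons (<= versus <) are assumptions made for illustration.

```python
def reclaim_state(free, pages_min, pages_low, pages_high, kswapd_awake=False):
    """Classify a zone by its free pages relative to the min/low/high watermarks."""
    if free <= pages_min:
        return "direct"   # every allocator reclaims, alongside kswapd
    if free <= pages_low:
        return "kswapd"   # kswapd is woken up at the low watermark
    if free < pages_high and kswapd_awake:
        return "kswapd"   # once awake, kswapd continues until high is reached
    return "idle"         # above high, or never dropped to low: do nothing

# Illustrative per-zone values in pages: free, min, low, high.
print(reclaim_state(1941, 510, 2235, 3225))   # kswapd
print(reclaim_state(126, 255, 1950, 2925))    # direct
```

The kswapd_awake flag models the hysteresis between Low and High: a zone sitting between the two is left alone unless kswapd has already been woken.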
Buffered file system write
[Figure: a memory copy moves data from the User buffer into a Pagecache page in Kernel memory, which becomes dirty.]
Share of pagecache RAM that is dirty:
0% dirty: do nothing
10% dirty: wake up pdflushd; pdflushd writes dirty buffers in background
40% dirty: processes start synchronous writes; pdflushd and write()'ng processes write dirty buffers
100% of pagecache RAM dirty
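The thresholds on this slide amount to a two-level comparison. A minimal sketch, assuming the 10% and 40% values shown above and treating "pagecache RAM" as a single number supplied by the caller; the kernel's own accounting of which memory counts toward these ratios is more involved, and the function name and labels are illustrative.

```python
def writeback_state(dirty_kb, pagecache_ram_kb, background_ratio=10, dirty_ratio=40):
    """Return which writeback behaviour applies for a given amount of dirty memory."""
    # Integer comparison avoids floating-point rounding at the boundaries.
    if dirty_kb * 100 >= dirty_ratio * pagecache_ram_kb:
        return "synchronous"   # writing processes write dirty buffers themselves
    if dirty_kb * 100 >= background_ratio * pagecache_ram_kb:
        return "background"    # pdflush is woken and writes in the background
    return "idle"              # below the background threshold: do nothing

ram_kb = 2000000
for dirty_kb in (100000, 300000, 900000):
    print(dirty_kb, writeback_state(dirty_kb, ram_kb))
```

The two defaults correspond to the dirty_background_ratio and dirty_ratio tunables listed later for RHEL4 and RHEL5.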
Buffered file system read
[Figure: a memory copy moves data from the Pagecache page in Kernel memory into the User buffer (dirty).]
Section 2: Analyzing System Performance
Performance Monitoring Tools
What to run under certain loads
Analyzing System Performance
What to look for
Performance Monitoring Tools
Standard Unix OS tools
Monitoring: cpu, memory, process, disk
oprofile
Kernel Tools
/proc, info (cpu, mem, slab), dmesg, Alt-SysRq
Profiling: nmi_watchdog=1, profile=2
Tracing
strace, ltrace
dprobe, kprobe
3rd party profiling/capacity monitoring
Perfmon, Caliper, vtune
SARcheck, KDE, BEA Patrol, HP Openview
Red Hat Top Tools
CPU Tools
1 top
2 vmstat
3 ps aux
4 mpstat -P all
5 sar -u
6 iostat
7 oprofile
8 gnome-system-monitor
9 KDE-monitor
10 /proc
Memory Tools
1 top
2 vmstat -s
3 ps aur
4 ipcs
5 sar -r -B -W
6 free
7 oprofile
8 gnome-system-monitor
9 KDE-monitor
10 /proc
Process Tools
1 top
2 ps -o pmem
3 gprof
4 strace, ltrace
5 sar
Disk Tools
1 iostat -x
2 vmstat -D
3 sar -DEV #
4 nfsstat
5 NEED MORE!
top (press h: help, 1: show cpus, m: memory, t: threads, >: column sort)
top - 09:01:04 up 8 days, 15:22, 2 users, load average: 1.71, 0.39, 0.12
Tasks: 114 total, 1 running, 113 sleeping, 0 stopped, 0 zombie
Cpu0 : 5.3% us, 2.3% sy, 0.0% ni, 0.0% id, 92.0% wa, 0.0% hi, 0.3% si
Cpu1 : 0.3% us, 0.3% sy, 0.0% ni, 89.7% id, 9.7% wa, 0.0% hi, 0.0% si
Mem: 2053860k total, 2036840k used, 17020k free, 99556k buffers
Swap: 2031608k total, 160k used, 2031448k free, 417720k cached
  PID USER   PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
27830 oracle 16  0 1315m 1.2g 1.2g D  1.3 60.9 0:00.09 oracle
27802 oracle 16  0 1315m 1.2g 1.2g D  1.0 61.0 0:00.10 oracle
27811 oracle 16  0 1315m 1.2g 1.2g D  1.0 60.8 0:00.08 oracle
27827 oracle 16  0 1315m 1.2g 1.2g D  1.0 61.0 0:00.11 oracle
27805 oracle 17  0 1315m 1.2g 1.2g D  0.7 61.0 0:00.10 oracle
27828 oracle 15  0 27584 6648 4620 S  0.3  0.3 0:00.17 tpcc.exe
    1 root   16  0  4744  580  480 S  0.0  0.0 0:00.50 init
    2 root   RT  0     0    0    0 S  0.0  0.0 0:00.11 migration/0
    3 root   34 19     0    0    0 S  0.0  0.0 0:00.00 ksoftirqd/0
vmstat (paging vs swapping)
vmstat 10
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy wa id
200548352420052423457600546315251303096
020169784020052439314400057850482108539941221463
300784420052457841090059330589463243144307321842
vmstat 10
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy wa id
200548352420052423457600546315251303096
02016623402005242345760057850482108539941221463
3023567873842005242345761875423745193589463243144307321842
Vmstat: IOzone (8GB file with 6GB RAM)
#! deplete memory until pdflush turns on
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy wa id
200448352420052423457600546315251303096
020169784020052429314400057850482108539941221463
3001537884200524384109200193589463243144307321842
02052812020052462281720047888810177133921322246
01046140200524671373600179110719144718251303535
22050972200524670574400232119698131619710253144
....
#! now transition from write to reads
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy wa id
14051040200524670554400213351912658390265618
1103506420052467127240040118911136720210354223
01068264234372664702000767445420484032072073
01034468234372667801600773913416202834091872
01047320234372669035600810507717832916072073
10038756234372669834400761364420273705191972
01031472234372670653200767253316012807081973
iostat -x of same IOzone EXT3 file system
Iostat metrics
Rates per sec:
r|w rqm/s: requests merged/s
r|w sec/s: 512-byte sectors/s
r|w KB/s: Kilobytes/s
r|w /s: operations/s
Sizes and response time:
avgrq-sz: average request size
avgqu-sz: average queue size
await: average wait time (ms)
svctm: average service time (ms)
Linux 2.4.21-27.0.2.ELsmp (node1) 05/09/2005
avg-cpu: %user %nice %sys %iowait %idle
          0.40  0.00 2.63    0.91 96.06
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sdi 16164.60 0.00 523.40 0.00 133504.00 0.00 66752.00 0.00 255.07 1.00 1.91 1.88 98.40
sdi 17110.10 0.00 553.90 0.00 141312.00 0.00 70656.00 0.00 255.12 0.99 1.80 1.78 98.40
sdi 16153.50 0.00 522.50 0.00 133408.00 0.00 66704.00 0.00 255.33 0.98 1.88 1.86 97.00
sdi 17561.90 0.00 568.10 0.00 145040.00 0.00 72520.00 0.00 255.31 1.01 1.78 1.76 100.00
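Several iostat columns are derived from the others, which gives a quick consistency check on a sample. A short sketch using the first sdi row above; the function name is illustrative, and the 512-byte sector size is the one stated in the metrics list.

```python
def iostat_derived(r_per_s, w_per_s, rsec_per_s, wsec_per_s, sector_bytes=512):
    """Recompute avgrq-sz (sectors per request) and rkB/s from the raw rates."""
    avgrq_sz = (rsec_per_s + wsec_per_s) / (r_per_s + w_per_s)
    rkb_per_s = rsec_per_s * sector_bytes / 1024
    return avgrq_sz, rkb_per_s

# First sdi sample: r/s=523.40, w/s=0.00, rsec/s=133504.00, wsec/s=0.00
avgrq_sz, rkb_per_s = iostat_derived(523.40, 0.00, 133504.00, 0.00)
print(round(avgrq_sz, 2), rkb_per_s)   # 255.07 66752.0
```

An average request of about 255 sectors is close to 128KB per I/O, which fits a large sequential read whose requests are being merged heavily (rrqm/s is far higher than r/s).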
SAR
[root@localhost redhat]# sar -u 3 3
Linux 2.4.21-20.EL (localhost.localdomain) 05/16/2005
10:32:28 PM CPU %user %nice %system %idle
10:32:31 PM all  0.00  0.00    0.00 100.00
10:32:34 PM all  1.33  0.00    0.33  98.33
10:32:37 PM all  1.34  0.00    0.00  98.66
Average:    all  0.89  0.00    0.11  99.00
[root] sar -n DEV
Linux 2.4.21-20.EL (localhost.localdomain) 03/16/2005
01:10:01 PM IFACE rxpck/s txpck/s rxbyt/s txbyt/s rxcmp/s txcmp/s rxmcst/s
01:20:00 PM lo       3.49    3.49  306.16  306.16    0.00    0.00     0.00
01:20:00 PM eth0     3.89    3.53 2395.34  484.70    0.00    0.00     0.00
01:20:00 PM eth1     0.00    0.00    0.00    0.00    0.00    0.00     0.00
free/numastat: memory allocation
[root@localhost redhat]# free -l
             total       used       free     shared    buffers     cached
Mem:        511368     342336     169032          0      29712     167408
Low:        511368     342336     169032          0          0          0
High:            0          0          0          0          0          0
-/+ buffers/cache:     145216     366152
Swap:      1043240          0    1043240
numastat (on 2-cpu x86_64 based system)
                  node1      node0
numa_hit        9803332   10905630
numa_miss       2049018    1609361
numa_foreign    1609361    2049018
interleave_hit    58689      54749
local_node      9770927   10880901
other_node      2081423    1634090
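The numastat counters are easier to judge as a ratio than as raw counts. A sketch that parses a table laid out like the sample above and computes, for each node, the share of its allocations that were hits rather than misses. The helper names and the choice of hit/(hit+miss) as the summary metric are illustrative assumptions, not something numastat reports itself.

```python
SAMPLE = """
node1 node0
numa_hit 9803332 10905630
numa_miss 2049018 1609361
numa_foreign 1609361 2049018
interleave_hit 58689 54749
local_node 9770927 10880901
other_node 2081423 1634090
"""

def parse_numastat(text):
    """Turn numastat-style text into {node: {counter: value}}."""
    rows = [line.split() for line in text.strip().splitlines()]
    nodes = rows[0]
    stats = {node: {} for node in nodes}
    for fields in rows[1:]:
        name, values = fields[0], fields[1:]
        for node, value in zip(nodes, values):
            stats[node][name] = int(value)
    return stats

def hit_ratio(counters):
    """Fraction of pages allocated on a node that were intended for that node."""
    return counters["numa_hit"] / (counters["numa_hit"] + counters["numa_miss"])

stats = parse_numastat(SAMPLE)
for node in sorted(stats):
    print(node, "%.1f%% hits" % (100 * hit_ratio(stats[node])))
```

In this sample roughly one allocation in six or seven landed on a node other than the preferred one, which is the kind of figure that motivates the NUMA tuning discussed later.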
ps
[root@localhost root]# ps aux
[root@localhost root]# ps aux | more
USER PID %CPU %MEM  VSZ RSS TTY STAT START TIME COMMAND
root   1  0.1  0.1 1528 516 ?   S    23:18 0:04 init
root   2  0.0  0.0    0   0 ?   SW   23:18 0:00 [keventd]
root   3  0.0  0.0    0   0 ?   SW   23:18 0:00 [kapmd]
root   4  0.0  0.0    0   0 ?   SWN  23:18 0:00 [ksoftirqd/0]
root   7  0.0  0.0    0   0 ?   SW   23:18 0:00 [bdflush]
root   5  0.0  0.0    0   0 ?   SW   23:18 0:00 [kswapd]
root   6  0.0  0.0    0   0 ?   SW   23:18 0:00 [kscand]
pstree
init-+-/usr/bin/sealer
     |-acpid
     |-atd
     |-auditd-+-python
     |        `-{auditd}
     |-automount---6*[{automount}]
     |-avahi-daemon---avahi-daemon
     |-bonobo-activati---{bonobo-activati}
     |-bt-applet
     |-clock-applet
     |-crond
     |-cupsd---cups-polld
     |-3*[dbus-daemon---{dbus-daemon}]
     |-2*[dbus-launch]
     |-dhclient
mpstat
[root@localhost redhat]# mpstat 3 3
Linux 2.4.21-20.EL (localhost.localdomain) 05/16/2005
10:40:34 PM CPU %user %nice %system %idle intr/s
10:40:37 PM all  3.00  0.00    0.00 97.00 193.67
10:40:40 PM all  1.33  0.00    0.00 98.67 208.00
10:40:43 PM all  1.67  0.00    0.00 98.33 196.00
Average:    all  2.00  0.00    0.00 98.00 199.22
The /proc filesystem
/proc
meminfo
slabinfo
cpuinfo
<pid>/maps
vmstat (RHEL4 & RHEL5)
zoneinfo (RHEL5)
sysrq-trigger
/proc/meminfo (RHEL3, 4, 5)
RHEL3> cat /proc/meminfo
MemTotal: 509876 kB
MemFree: 17988 kB
MemShared: 0 kB
Buffers: 4728 kB
Cached: 157444 kB
SwapCached: 46576 kB
Active: 222784 kB
ActiveAnon: 118844 kB
ActiveCache: 103940 kB
Inact_dirty: 41088 kB
Inact_laundry: 7640 kB
Inact_clean: 6904 kB
Inact_target: 55680 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 509876 kB
LowFree: 17988 kB
SwapTotal: 1044184 kB
SwapFree: 945908 kB
CommitLimit: 1299120 kB
Committed_AS: 404920 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 2048 kB
RHEL4> cat /proc/meminfo
MemTotal: 32749568 kB
MemFree: 31313344 kB
Buffers: 29992 kB
Cached: 1250584 kB
SwapCached: 0 kB
Active: 235284 kB
Inactive: 1124168 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 32749568 kB
LowFree: 31313344 kB
SwapTotal: 4095992 kB
SwapFree: 4095992 kB
Dirty: 0 kB
Writeback: 0 kB
Mapped: 1124080 kB
Slab: 38460 kB
CommitLimit: 20470776 kB
Committed_AS: 1158556 kB
PageTables: 5096 kB
VmallocTotal: 536870911 kB
VmallocUsed: 2984 kB
VmallocChunk: 536867627 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 2048 kB
RHEL5> cat /proc/meminfo
MemTotal: 1025220 kB
MemFree: 11048 kB
Buffers: 141944 kB
Cached: 342664 kB
SwapCached: 4 kB
Active: 715304 kB
Inactive: 164780 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 1025220 kB
LowFree: 11048 kB
SwapTotal: 2031608 kB
SwapFree: 2031472 kB
Dirty: 84 kB
Writeback: 0 kB
AnonPages: 395572 kB
Mapped: 82860 kB
Slab: 92296 kB
PageTables: 23884 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 2544216 kB
Committed_AS: 804656 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 263472 kB
VmallocChunk: 34359474711 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
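The CommitLimit line in each listing can be reproduced from MemTotal and SwapTotal. A sketch of that calculation, assuming 4KB pages, no huge pages reserved, and an overcommit_ratio of 50 (the value that matches all three samples); the function name is illustrative. The arithmetic is done in pages because that rounding is what makes the RHEL5 figure match exactly.

```python
def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio=50, page_kb=4):
    """CommitLimit = RAM pages * overcommit_ratio / 100 + swap pages, in kB."""
    ram_pages = mem_total_kb // page_kb
    swap_pages = swap_total_kb // page_kb
    return (ram_pages * overcommit_ratio // 100 + swap_pages) * page_kb

# MemTotal and SwapTotal taken from the three /proc/meminfo samples.
print(commit_limit_kb(509876, 1044184))      # RHEL3 sample: 1299120
print(commit_limit_kb(32749568, 4095992))    # RHEL4 sample: 20470776
print(commit_limit_kb(1025220, 2031608))     # RHEL5 sample: 2544216
```

Comparing Committed_AS against this limit shows how close a system is to refusing allocations when strict overcommit accounting is enabled.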
/proc/slabinfo
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
nfsd4_delegations 0 0 656 6 1 : tunables 54 27 8 : slabdata 0 0 0
nfsd4_stateids 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
nfsd4_files 0 0 72 53 1 : tunables 120 60 8 : slabdata 0 0 0
nfsd4_stateowners 0 0 424 9 1 : tunables 54 27 8 : slabdata 0 0 0
nfs_direct_cache 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
nfs_write_data 36 36 832 9 2 : tunables 54 27 8 : slabdata 4 4 0
nfs_read_data 32 35 768 5 1 : tunables 54 27 8 : slabdata 7 7 0
nfs_inode_cache 1383 1389 1040 3 1 : tunables 24 12 8 : slabdata 463 463 0
nfs_page 0 0 128 30 1 : tunables 120 60 8 : slabdata 0 0 0
fscache_cookie_jar 3 53 72 53 1 : tunables 120 60 8 : slabdata 1 1 0
ip_conntrack_expect 0 0 136 28 1 : tunables 120 60 8 : slabdata 0 0 0
ip_conntrack 75 130 304 13 1 : tunables 54 27 8 : slabdata 10 10 0
bridge_fdb_cache 0 0 64 59 1 : tunables 120 60 8 : slabdata 0 0 0
rpc_buffers 8 8 2048 2 1 : tunables 24 12 8 : slabdata 4 4 0
rpc_tasks 30 30 384 10 1 : tunables 54 27 8 : slabdata 3 3 0
/proc/cpuinfo
[lwoodman]$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) [email protected]
stepping : 6
cpu MHz : 2394.070
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips : 4791.41
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
32-bit /proc/<pid>/maps
[root@dhcp8336 proc]# cat 5808/maps
0022e000-0023b000 r-xp 00000000 03:03 4137068 /lib/tls/libpthread-0.60.so
0023b000-0023c000 rw-p 0000c000 03:03 4137068 /lib/tls/libpthread-0.60.so
0023c000-0023e000 rw-p 00000000 00:00 0
0037f000-00391000 r-xp 00000000 03:03 523285 /lib/libnsl-2.3.2.so
00391000-00392000 rw-p 00011000 03:03 523285 /lib/libnsl-2.3.2.so
00392000-00394000 rw-p 00000000 00:00 0
00c45000-00c5a000 r-xp 00000000 03:03 523268 /lib/ld-2.3.2.so
00c5a000-00c5b000 rw-p 00015000 03:03 523268 /lib/ld-2.3.2.so
00e5c000-00f8e000 r-xp 00000000 03:03 4137064 /lib/tls/libc-2.3.2.so
00f8e000-00f91000 rw-p 00131000 03:03 4137064 /lib/tls/libc-2.3.2.so
00f91000-00f94000 rw-p 00000000 00:00 0
08048000-0804f000 r-xp 00000000 03:03 1046791 /sbin/ypbind
0804f000-08050000 rw-p 00007000 03:03 1046791 /sbin/ypbind
09794000-097b5000 rw-p 00000000 00:00 0
b5fdd000-b5fde000 ---p 00000000 00:00 0
b5fde000-b69de000 rw-p 00001000 00:00 0
b69de000-b69df000 ---p 00000000 00:00 0
b69df000-b73df000 rw-p 00001000 00:00 0
b73df000-b75df000 r--p 00000000 03:03 3270410 /usr/lib/locale/locale-archive
b75df000-b75e1000 rw-p 00000000 00:00 0
bfff6000-c0000000 rw-p ffff8000 00:00 0
64-bit /proc/<pid>/maps
# cat /proc/2345/maps
00400000-0100b000 r-xp 00000000 fd:00 1933328 /usr/sybase/ASE-12_5/bin/dataserver.esd3
0110b000-01433000 rw-p 00c0b000 fd:00 1933328 /usr/sybase/ASE-12_5/bin/dataserver.esd3
01433000-014eb000 rwxp 01433000 00:00 0
40000000-40001000 ---p 40000000 00:00 0
40001000-40a01000 rwxp 40001000 00:00 0
2a95f73000-2a96073000 ---p 0012b000 fd:00 819273 /lib64/tls/libc-2.3.4.so
2a96073000-2a96075000 r--p 0012b000 fd:00 819273 /lib64/tls/libc-2.3.4.so
2a96075000-2a96078000 rw-p 0012d000 fd:00 819273 /lib64/tls/libc-2.3.4.so
2a96078000-2a9607e000 rw-p 2a96078000 00:00 0
2a9607e000-2a98c3e000 rw-s 00000000 00:06 360450 /SYSV0100401e (deleted)
2a98c3e000-2a98c47000 rw-p 2a98c3e000 00:00 0
2a98c47000-2a98c51000 r-xp 00000000 fd:00 819227 /lib64/libnss_files-2.3.4.so
2a98c51000-2a98d51000 ---p 0000a000 fd:00 819227 /lib64/libnss_files-2.3.4.so
2a98d51000-2a98d53000 rw-p 0000a000 fd:00 819227 /lib64/libnss_files-2.3.4.so
2a98d53000-2a98d57000 r-xp 00000000 fd:00 819225 /lib64/libnss_dns-2.3.4.so
2a98d57000-2a98e56000 ---p 00004000 fd:00 819225 /lib64/libnss_dns-2.3.4.so
2a98e56000-2a98e58000 rw-p 00003000 fd:00 819225 /lib64/libnss_dns-2.3.4.so
2a98e58000-2a98e69000 r-xp 00000000 fd:00 819237 /lib64/libresolv-2.3.4.so
2a98e69000-2a98f69000 ---p 00011000 fd:00 819237 /lib64/libresolv-2.3.4.so
2a98f69000-2a98f6b000 rw-p 00011000 fd:00 819237 /lib64/libresolv-2.3.4.so
2a98f6b000-2a98f6d000 rw-p 2a98f6b000 00:00 0
35c7e00000-35c7e08000 r-xp 00000000 fd:00 819469 /lib64/libpam.so.0.77
35c7e08000-35c7f08000 ---p 00008000 fd:00 819469 /lib64/libpam.so.0.77
35c7f08000-35c7f09000 rw-p 00008000 fd:00 819469 /lib64/libpam.so.0.77
35c8000000-35c8011000 r-xp 00000000 fd:00 819468 /lib64/libaudit.so.0.0.0
35c8011000-35c8110000 ---p 00011000 fd:00 819468 /lib64/libaudit.so.0.0.0
35c8110000-35c8118000 rw-p 00010000 fd:00 819468 /lib64/libaudit.so.0.0.0
35c9000000-35c900b000 r-xp 00000000 fd:00 819457 /lib64/libgcc_s-3.4.4-20050721.so.1
35c900b000-35c910a000 ---p 0000b000 fd:00 819457 /lib64/libgcc_s-3.4.4-20050721.so.1
35c910a000-35c910b000 rw-p 0000a000 fd:00 819457 /lib64/libgcc_s-3.4.4-20050721.so.1
7fbfff1000-7fc0000000 rwxp 7fbfff1000 00:00 0
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0
/proc/vmstat (RHEL4/RHEL5)
cat /proc/vmstat
nr_anon_pages 98893
nr_mapped 20715
nr_file_pages 120855
nr_slab 23060
nr_page_table_pages 5971
nr_dirty 21
nr_writeback 0
nr_unstable 0
nr_bounce 0
numa_hit 996729666
numa_miss 0
numa_foreign 0
numa_interleave 87657
numa_local 996729666
numa_other 0
pgpgin 2577307
pgpgout 106131928
pswpin 0
pswpout 34
pgalloc_dma 198908
pgalloc_dma32 997707549
pgalloc_normal 0
pgalloc_high 0
pgfree 997909734
pgactivate 1313196
pgdeactivate 470908
pgfault 2971972147
pgmajfault 8047
pgrefill_dma 18338
pgrefill_dma32 1353451
pgrefill_normal 0
pgrefill_high 0
pgsteal_dma 0
pgsteal_dma32 0
pgsteal_normal 0
pgsteal_high 0
pgscan_kswapd_dma 7235
pgscan_kswapd_dma32 417984
pgscan_kswapd_normal 0
pgscan_kswapd_high 0
pgscan_direct_dma 12
pgscan_direct_dma32 1984
pgscan_direct_normal 0
pgscan_direct_high 0
pginodesteal 166
slabs_scanned 1072512
kswapd_steal 410973
kswapd_inodesteal 61305
pageoutrun 7752
allocstall 29
pgrotated 73
Alt-SysRq-M (RHEL3)
SysRq: Show Memory
Meminfo:
Zone:DMAfreepages:2929min:0low:0high:0
Zone:Normalfreepages:1941min:510low:2235high:3225
Zone:HighMemfreepages:0min:0low:0high:0
Freepages:4870(0HighMem)
(Active:72404/13523,inactive_laundry:2429,inactive_clean:1730,free:4870)
aa:0ac:0id:0il:0ic:0fr:2929
aa:46140ac:26264id:13523il:2429ic:1730fr:1941
aa:0ac:0id:0il:0ic:0fr:0
1*4kB4*8kB2*16kB2*32kB1*64kB2*128kB2*256kB1*512kB0*1024kB1*2048kB2*4096kB=11716kB)
1255*4kB89*8kB5*16kB1*32kB0*64kB1*128kB1*256kB1*512kB1*1024kB0*2048kB0*4096kB=7764kB)
Swapcache:add958119,delete918749,find4611302/5276354,race0+1
27234pagesofslabcache
244pagesofkernelstacks
1303lowmempagetables,0highmempagetables
0bouncebufferpages,0areontheemergencylist
Freeswap:598960kB
130933pagesofRAM
0pagesofHIGHMEM
3497reservedpages
34028pagesshared
39370pagesswapcached
Alt-SysRq-M (RHEL3/NUMA)
SysRq: Show Memory
Meminfo:
Zone:DMAfreepages:0min:0low:0high:0
Zone:Normalfreepages:369423min:1022low:6909high:9980
Zone:HighMemfreepages:0min:0low:0high:0
Zone:DMAfreepages:2557min:0low:0high:0
Zone:Normalfreepages:494164min:1278low:9149high:13212
Zone:HighMemfreepages:0min:0low:0high:0
Freepages:866144(0HighMem)
(Active:9690/714,inactive_laundry:764,inactive_clean:35,free:866144)
aa:0ac:0id:0il:0ic:0fr:0
aa:746ac:2811id:188il:220ic:0fr:369423
aa:0ac:0id:0il:0ic:0fr:0
aa:0ac:0id:0il:0ic:0fr:2557
aa:1719ac:4414id:526il:544ic:35fr:494164
aa:0ac:0id:0il:0ic:0fr:0
2497*4kB1575*8kB902*16kB515*32kB305*64kB166*128kB96*256kB56*512kB39*1024kB30*2048kB300*4096kB=1477692kB)
Swapcache:add288168,delete285993,find726/2075,race0+0
4059pagesofslabcache
146pagesofkernelstacks
388lowmempagetables,638highmempagetables
Freeswap:1947848kB
917496pagesofRAM
869386freepages
30921reservedpages
21927pagesshared
2175pagesswapcached
Buffermemory:9752kB
Cachememory:34192kB
CLEAN:696buffers,2772kbyte,51used(last=696),0locked,0dirty0delay
DIRTY:4buffers,16kbyte,4used(last=4),0locked,3dirty0delay
Alt-SysRq-M (RHEL4 & 5)
SysRq: Show Memory
Meminfo:
Freepages:20128kB(0kBHighMem)
Active:72109inactive:27657dirty:1writeback:0unstable:0free:5032slab:19306mapped:41755pagetables:945
DMAfree:12640kBmin:20kBlow:40kBhigh:60kBactive:0kBinactive:0kBpresent:16384kBpages_scanned:847
all_unreclaimable?yes
protections[]:000
Normalfree:7488kBmin:688kBlow:1376kBhigh:2064kBactive:288436kBinactive:110628kBpresent:507348kB
pages_scanned:0all_unreclaimable?no
protections[]:000
HighMemfree:0kBmin:128kBlow:256kBhigh:384kBactive:0kBinactive:0kBpresent:0kBpages_scanned:0
all_unreclaimable?no
protections[]:000
DMA:4*4kB4*8kB3*16kB4*32kB4*64kB1*128kB1*256kB1*512kB1*1024kB1*2048kB2*4096kB=12640kB
0*1024kB0*2048kB0*4096kB=7488kB
Normal:1052*4kB240*8kB39*16kB3*32kB0*64kB1*128kB0*256kB1*512kB
HighMem:empty
Swapcache:add52,delete52,find3/5,race0+0
Freeswap:1044056kB
130933pagesofRAM
0pagesofHIGHMEM
2499reservedpages
71122pagesshared
0pagesswapcached
Alt-SysRq-M (RHEL4 & 5/NUMA)
Freepages:16724kB(0kBHighMem)
Active:236461inactive:254776dirty:11writeback:0unstable:0free:4181slab:13679mapped:34073
pagetables:853
Node1DMAfree:0kBmin:0kBlow:0kBhigh:0kBactive:0kBinactive:0kBpresent:0kBpages_scanned:0
all_unreclaimable?no
protections[]:000
Node1Normalfree:2784kBmin:1016kBlow:2032kBhigh:3048kBactive:477596kBinactive:508444kB
present:1048548kBpages_scanned:0all_unreclaimable?no
protections[]:000
Node1HighMemfree:0kBmin:128kBlow:256kBhigh:384kBactive:0kBinactive:0kBpresent:0kBpages_scanned:0
all_unreclaimable?no
protections[]:000
Node0DMAfree:11956kBmin:12kBlow:24kBhigh:36kBactive:0kBinactive:0kBpresent:16384kB
pages_scanned:1050all_unreclaimable?yes
protections[]:000
Node0Normalfree:1984kBmin:1000kBlow:2000kBhigh:3000kBactive:468248kBinactive:510660kB
present:1032188kBpages_scanned:0all_unreclaimable?no
protections[]:000
Node0HighMemfree:0kBmin:128kBlow:256kBhigh:384kBactive:0kBinactive:0kBpresent:0kBpages_scanned:0
all_unreclaimable?no
protections[]:000
Node1DMA:empty
Node1Normal:0*4kB0*8kB30*16kB10*32kB1*64kB1*128kB1*256kB1*512kB1*1024kB0*2048kB0*4096kB=2784kB
Node1HighMem:empty
Node0DMA:5*4kB4*8kB4*16kB2*32kB2*64kB3*128kB2*256kB1*512kB0*1024kB1*2048kB2*4096kB=11956kB
Node0Normal:0*4kB0*8kB0*16kB0*32kB1*64kB1*128kB1*256kB1*512kB1*1024kB0*2048kB0*4096kB=1984kB
Node0HighMem:empty
Swapcache:add44,delete44,find0/0,race0+0
Freeswap:2031432kB
524280pagesofRAM
10951reservedpages
363446pagesshared
0pagesswapcached
Alt-SysRq-T
bashRcurrent016091606
(NOTLB)
CallTrace:[<c02a1897>]snprintf[kernel]0x27(0xdb3c5e90)
[<c01294b3>]call_console_drivers[kernel]0x63(0xdb3c5eb4)
[<c01297e3>]printk[kernel]0x153(0xdb3c5eec)
[<c01297e3>]printk[kernel]0x153(0xdb3c5f00)
[<c010c289>]show_trace[kernel]0xd9(0xdb3c5f0c)
[<c010c289>]show_trace[kernel]0xd9(0xdb3c5f14)
[<c0125992>]show_state[kernel]0x62(0xdb3c5f24)
[<c01cfb1a>]__handle_sysrq_nolock[kernel]0x7a(0xdb3c5f38)
[<c01cfa7d>]handle_sysrq[kernel]0x5d(0xdb3c5f58)
[<c0198f43>]write_sysrq_trigger[kernel]0x53(0xdb3c5f7c)
[<c01645b7>]sys_write[kernel]0x97(0xdb3c5f94)
* logged in /var/log/messages
Alt-SysRq-W and P
SysRq: Show CPUs
CPU0:
ffffffff8047ef480000000000000000ffffffff80437f10ffffffff8019378b
000000000000000000000000000000000000000000000000ffffffff801937ba
ffffffff8019378bffffffff80022b27ffffffff800551bf0000000000090000
CallTrace:
[<ffffffff80069572>]show_trace+0x34/0x47
[<ffffffff80069675>]_show_stack+0xd9/0xe8
[<ffffffff801937ba>]showacpu+0x2f/0x3b
[<ffffffff80022b27>]smp_call_function_interrupt+0x57/0x75
[<ffffffff8005bf16>]call_function_interrupt+0x66/0x6c
[<ffffffff8002fcc2>]unix_poll+0x0/0x96
[<ffffffff800551f5>]mwait_idle+0x36/0x4a
[<ffffffff80047205>]cpu_idle+0x95/0xb8
[<ffffffff8044181f>]start_kernel+0x225/0x22a
[<ffffffff8044125b>]_sinittext+0x25b/0x262
oprofile: built into RHEL4 & 5 (smp)
opcontrol: on/off data
--start: start collection
--stop: stop collection
--dump: output to disk
--event=:name:count
Example:
# opcontrol --start
# /bin/time test1 &
# sleep 60
# opcontrol --stop
# opcontrol --dump
opreport: analyze profile
-r: reverse order sort
-t [percentage]: threshold to view
-f /path/filename
-d: details
opannotate
-s /path/source
-a /path/assembly
oprofile: opcontrol and opreport, cpu_cycles
# CPU: Core 2, speed 2666.72 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
CPU_CLK_UNHALT...|
  samples|      %|
397435971 84.6702 vmlinux
 19703064  4.1976 zeus.web
 16914317  3.6034 e1000
 12208514  2.6009 ld-2.5.so
 11711746  2.4951 libc-2.5.so
  5164664  1.1003 sim.cgi
  2333427  0.4971 oprofiled
  1295161  0.2759 oprofile
  1099731  0.2343 zeus.cgi
   968623  0.2064 ext3
   270163  0.0576 jbd
Profiling Tools: SystemTap
Red Hat, Intel, IBM & Hitachi collaboration
Linux answer to Solaris Dtrace
Dynamic instrumentation
Tool to take a deep look into a running system:
Assists in identifying causes of performance problems
Simplifies building instrumentation
Current snapshots available from: http://sources.redhat.com/systemtap
Source for presentations/papers
Kernel space tracing today, user space tracing under development
Technology preview status until 5.1
[Figure: SystemTap processing steps. A probe script is parsed and elaborated against the probe-set library, then translated to C and compiled* into a probe kernel object. The module is loaded and the probe started; the output is then extracted and the module unloaded, giving the probe output.]
* Solaris Dtrace is interpretive
Profiling Tools: SystemTap (Cont)
Technology: Kprobes:
In current 2.6 kernels
Upstream 2.6.12, backported to RHEL4 kernel
Kernel instrumentation without recompile/reboot
Uses software int and trap handler for instrumentation
Debug information:
Provides map between executable and source code
Generated as part of RPM builds
Available at: ftp://ftp.redhat.com
Safety: Instrumentation scripting language:
No dynamic memory allocation or assembly/C code
Types and type conversions limited
Restrict access through pointers
Script compiler checks:
Infinite loops and recursion
Invalid variable access
Section 3: Tuning
How to tune Linux
Capacity tuning
Fix problems by adding resources
Performance Tuning
Methodology
1) Document config
2) Baseline results
3) While results non-optimal
a) Monitor/Instrument system/workload
b) Apply tuning 1 change at a time
c) Analyze results, exit or loop
4) Document final config
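The methodology is a loop with one rule: change one thing, measure, keep or revert. A rough sketch of that loop, in which every name (tune, measure, apply_change, revert_change, target) is a hypothetical placeholder for whatever benchmark and tuning actions a real study would use, and a higher measurement is assumed to be better.

```python
def tune(measure, changes, apply_change, revert_change, target):
    """Apply candidate changes one at a time, keeping only those that help."""
    log = [("baseline", measure())]          # step 2: baseline results
    best = log[0][1]
    for change in changes:                   # step 3: loop while non-optimal
        if best >= target:
            break
        apply_change(change)                 # 3b: exactly one change per run
        result = measure()                   # 3a: monitor the workload again
        log.append((change, result))
        if result > best:                    # 3c: analyze, keep or revert
            best = result
        else:
            revert_change(change)
    return best, log                         # step 4: the log documents the outcome

# Toy system: each setting adds or removes throughput from a base of 100.
gains = {"a": 5, "b": -3, "c": 10}
active = set()
best, log = tune(lambda: 100 + sum(gains[c] for c in active),
                 ["a", "b", "c"], active.add, active.discard, target=120)
print(best, sorted(active))   # 115 ['a', 'c']
```

Reverting a change that did not help keeps later measurements attributable to a single setting, which is the point of step 3b.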
Tuning: how to set kernel parameters
/proc
[root@foobar fs]# cat /proc/sys/kernel/sysrq (see "0")
[root@foobar fs]# echo 1 > /proc/sys/kernel/sysrq
[root@foobar fs]# cat /proc/sys/kernel/sysrq (see "1")
Sysctl command
[root@foobar fs]# sysctl kernel.sysrq
kernel.sysrq = 0
[root@foobar fs]# sysctl -w kernel.sysrq=1
kernel.sysrq = 1
[root@foobar fs]# sysctl kernel.sysrq
kernel.sysrq = 1
Edit the /etc/sysctl.conf file
# Kernel sysctl configuration file for Red Hat Linux
# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 1
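The three methods above address the same file: a sysctl name is the path under /proc/sys with dots in place of slashes. A small sketch of that mapping with read and write helpers; the helper names and the root parameter (useful for trying the code against a scratch directory) are illustrative. Writing under the real /proc/sys needs root, and a value set this way does not survive a reboot unless it is also added to /etc/sysctl.conf.

```python
def sysctl_path(name, root="/proc/sys"):
    """Map a sysctl name such as 'kernel.sysrq' to its file under /proc/sys."""
    return root + "/" + name.replace(".", "/")

def read_sysctl(name, root="/proc/sys"):
    """Return the current value as a string, like 'cat' on the /proc file."""
    with open(sysctl_path(name, root)) as f:
        return f.read().strip()

def write_sysctl(name, value, root="/proc/sys"):
    """Set a value, like 'echo value > file' or 'sysctl -w name=value'."""
    with open(sysctl_path(name, root), "w") as f:
        f.write(str(value) + "\n")

print(sysctl_path("kernel.sysrq"))    # /proc/sys/kernel/sysrq
print(sysctl_path("vm.swappiness"))   # /proc/sys/vm/swappiness
```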
Capacity Tuning
Memory
/proc/sys/vm/overcommit_memory
/proc/sys/vm/overcommit_ratio
/proc/sys/vm/max_map_count
/proc/sys/vm/nr_hugepages
Kernel
/proc/sys/kernel/msgmax
/proc/sys/kernel/msgmnb
/proc/sys/kernel/msgmni
/proc/sys/kernel/shmall
/proc/sys/kernel/shmmax
/proc/sys/kernel/shmmni
/proc/sys/kernel/threads-max
Filesystems
/proc/sys/fs/aio-max-nr
/proc/sys/fs/file-max
OOM kills
OOM kills: swap space exhaustion (RHEL3)
Meminfo:
Zone:DMAfreepages:975min:1039low:1071high:1103
Zone:Normalfreepages:126min:255low:1950high:2925
Zone:HighMemfreepages:0min:0low:0high:0
Freepages:1101(0HighMem)
(Active:118821/401,inactive_laundry:0,inactive_clean:0,free:1101)
aa:1938ac:18id:44il:0ic:0fr:974
aa:115717ac:1148id:357il:0ic:0fr:126
aa:0ac:0id:0il:0ic:0fr:0
6*4kB0*8kB0*16kB1*32kB0*64kB0*128kB1*256kB1*512kB1*1024kB1*2048kB0*4096kB=3896kB)
0*4kB1*8kB1*16kB1*32kB1*64kB1*128kB1*256kB0*512kB0*1024kB0*2048kB0*4096kB=504kB)
Swapcache:add620870,delete620870,find762437/910181,race0+200
2454pagesofslabcache
484pagesofkernelstacks
2008lowmempagetables,0highmempagetables
Freeswap:0kB
129008pagesofRAM
0pagesofHIGHMEM
3045reservedpages
4009pagesshared
0pagesswapcached
OOM kills: lowmem consumption (RHEL3/x86)
Meminfo:
zone:DMAfreepages:2029min:0low:0high:0
Zone:Normalfreepages:1249min:1279low:4544high:6304
Zone:HighMemfreepages:746min:255low:29184high:43776
Freepages:4024(746HighMem)
(Active:703448/665000,inactive_laundry:99878,inactive_clean:99730,free:4024)
aa:0ac:0id:0il:0ic:0fr:2029
aa:128ac:3346id:113il:240ic:0fr:1249
aa:545577ac:154397id:664813il:99713ic:99730fr:746
1*4kB0*8kB1*16kB1*32kB0*64kB1*128kB1*256kB1*512kB1*1024kB1*2048kB1*4096kB=8116kB)
543*4kB35*8kB77*16kB1*32kB0*64kB0*128kB1*256kB0*512kB1*1024kB0*2048kB0*4096kB=4996kB)
490*4kB2*8kB1*16kB1*32kB1*64kB1*128kB1*256kB1*512kB0*1024kB0*2048kB0*4096kB=2984kB)
Swapcache:add4327,delete4173,find190/1057,race0+0
178558pagesofslabcache
1078pagesofkernelstacks
0lowmempagetables,233961highmempagetables
Freeswap:8189016kB
2097152pagesofRAM
1801952pagesofHIGHMEM
103982reservedpages
115582774pagesshared
154pagesswapcached
Out of Memory: Killed process 27100 (oracle).
OOM kills: lowmem consumption (RHEL4 & 5/x86)
Freepages:9003696kB(8990400kBHighMem)
Active:323264inactive:346882dirty:327575writeback:3686unstable:0free:2250924slab:177094
mapped:15855pagetables:987
DMAfree:12640kBmin:16kBlow:32kBhigh:48kBactive:0kBinactive:0kBpresent:16384kB
pages_scanned:149all_unreclaimable?yes
protections[]:000
Normalfree:656kBmin:928kBlow:1856kBhigh:2784kBactive:6976kBinactive:9976kBpresent:901120kB
pages_scanned:28281all_unreclaimable?yes
protections[]:000
HighMemfree:8990400kBmin:512kBlow:1024kBhigh:1536kBactive:1286080kBinactive:1377552kB
present:12451840kBpages_scanned:0all_unreclaimable?no
protections[]:000
DMA:4*4kB4*8kB3*16kB4*32kB4*64kB1*128kB1*256kB1*512kB1*1024kB1*2048kB2*4096kB=12640kB
Normal:0*4kB2*8kB0*16kB0*32kB0*64kB1*128kB0*256kB1*512kB0*1024kB0*2048kB0*4096kB=656kB
HighMem:15994*4kB17663*8kB11584*16kB8561*32kB8193*64kB1543*128kB69*256kB2101*512kB
1328*1024kB765*2048kB875*4096kB=8990400kB
Swapcache:add0,delete0,find0/0,race0+0
Freeswap:8385912kB
3342336pagesofRAM
2916288pagesofHIGHMEM
224303reservedpages
666061pagesshared
0pagesswapcached
Out of Memory: Killed process 22248 (httpd).
oom-killer: gfp_mask=0xd0
OOM kills: IO system stall (RHEL4 & 5/x86)
Freepages:15096kB(1664kBHighMem)Active:34146inactive:1995536dirty:255
writeback:314829unstable:0free:3774slab:39266mapped:31803pagetables:820
DMAfree:12552kBmin:16kBlow:32kBhigh:48kBactive:0kBinactive:0kBpresent:16384kB
pages_scanned:2023all_unreclaimable?yes
protections[]:000
Normalfree:880kBmin:928kBlow:1856kBhigh:2784kBactive:744kBinactive:660296kB
present:901120kBpages_scanned:726099all_unreclaimable?yes
protections[]:000
HighMemfree:1664kBmin:512kBlow:1024kBhigh:1536kBactive:135840kBinactive:7321848kB
present:7995388kBpages_scanned:0all_unreclaimable?no
protections[]:000
DMA:2*4kB4*8kB2*16kB4*32kB3*64kB1*128kB1*256kB1*512kB1*1024kB1*2048kB2*4096kB=
12552kB
Normal:0*4kB18*8kB14*16kB0*32kB0*64kB0*128kB0*256kB1*512kB0*1024kB0*2048kB0*4096kB
=880kB
HighMem:6*4kB9*8kB66*16kB0*32kB0*64kB0*128kB0*256kB1*512kB0*1024kB0*2048kB0*4096kB
=1664kB
Swapcache:add856,delete599,find341/403,race0+0
0bouncebufferpages
Freeswap:4193264kB
2228223pagesofRAM
1867481pagesofHIGHMEM
150341reservedpages
343042pagesshared
257pagesswapcached
kernel: Out of Memory: Killed process 3450 (hpsmhd).
Eliminating OOM kills
RHEL3
/proc/sys/vm/oom-kill: number of processes that can be in an OOM kill state at any one time (default 1).
RHEL4
/proc/sys/vm/oom-kill: oom kill enable/disable flag (default 1).
RHEL5
/proc/<pid>/oom_adj: per-process OOM adjustment (-17 to +15)
Set to -17 to disable that process from being OOM killed
Decrease to decrease OOM kill likelihood.
Increase to increase OOM kill likelihood.
/proc/<pid>/oom_score: current OOM kill priority.
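The RHEL5 per-process adjustment is a single integer written to a /proc file. A minimal sketch with a range check, assuming the -17 to +15 range stated above; the helper and constant names are illustrative, and the proc_root parameter exists only so the code can be tried against a scratch directory. Writing to a real process's oom_adj requires suitable privileges.

```python
OOM_DISABLE = -17      # exempts the process from OOM kills
OOM_ADJUST_MAX = 15    # makes the process the most likely OOM victim

def oom_adj_path(pid, proc_root="/proc"):
    """Path of the per-process OOM adjustment file."""
    return "%s/%d/oom_adj" % (proc_root, pid)

def validate_oom_adj(value):
    """Reject values outside the documented -17..+15 range."""
    if not OOM_DISABLE <= value <= OOM_ADJUST_MAX:
        raise ValueError("oom_adj must be between -17 and +15, got %r" % value)
    return value

def set_oom_adj(pid, value, proc_root="/proc"):
    """Write a validated adjustment for the given pid."""
    validate_oom_adj(value)
    with open(oom_adj_path(pid, proc_root), "w") as f:
        f.write("%d\n" % value)

print(oom_adj_path(3450))   # /proc/3450/oom_adj
```

After a change, the effect can be observed by reading /proc/<pid>/oom_score for the same process.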
General Performance Tuning Considerations
Over-Committing RAM
Swap device location
Storage device and limits
Kernel selection
Performance Tuning (RHEL3)
/proc/sys/vm/bdflush
/proc/sys/vm/pagecache
/proc/sys/vm/numa_memory_allocator
RHEL3 /proc/sys/vm/bdflush
  int nfract;              /* Percentage of buffer cache dirty to activate bdflush */
  int ndirty;              /* Maximum number of dirty blocks to write out per wake-cycle */
  int dummy2;              /* old "nrefill" */
  int dummy3;              /* unused */
  int interval;            /* jiffies delay between kupdate flushes */
  int age_buffer;          /* Time for normal buffer to age before we flush it */
  int nfract_sync;         /* Percentage of buffer cache dirty to activate bdflush synchronously */
  int nfract_stop_bdflush; /* Percentage of buffer cache dirty to stop bdflush */
  int dummy5;              /* unused */
Example:
  Settings for a server with an ample IO configuration (RHEL3 defaults are geared for workstations)
  sysctl -w vm.bdflush="50 5000 0 0 200 5000 3000 60 20 0"
RHEL3 /proc/sys/vm/pagecache
  pagecache.minpercent
    Lower limit for pagecache page reclaiming.
    Kswapd will stop reclaiming pagecache pages below this percent of RAM.
  pagecache.borrowpercent
    Kswapd attempts to keep the pagecache at this percent of RAM.
  pagecache.maxpercent
    Upper limit for pagecache page reclaiming.
    RHEL2.1: hard limit, pagecache will not grow above this percent of RAM.
    RHEL3: kswapd only reclaims pagecache pages above this percent of RAM.
    Increasing maxpercent will increase swapping.
  Example: echo "1 10 50" > /proc/sys/vm/pagecache
RHEL3 /proc/sys/vm/numa_memory_allocator
> numa=on (default)
Zone:Normal freepages: 10539 min: 1279 low: 17406 high: 25597
Zone:Normal freepages: 10178 min: 1279 low: 17406 high: 25597
Zone:Normal freepages: 10445 min: 1279 low: 17406 high: 25597
Zone:Normal freepages: 856165 min: 1279 low: 17342 high: 25501
Swap cache: add 2633120, delete 2553093, find 1375365/1891330, race 0+0
> numa=off
Zone:Normal freepages: 861136 min: 1279 low: 30950 high: 63065
Swap cache: add 0, delete 0, find 0/0, race 0+0
> numa=on and /proc/sys/vm/numa_memory_allocator set to 1
Zone:Normal freepages: 17406 min: 1279 low: 17406 high: 25597
Zone:Normal freepages: 17406 min: 1279 low: 17406 high: 25597
Zone:Normal freepages: 17406 min: 1279 low: 17406 high: 25597
Zone:Normal freepages: 85739 min: 1279 low: 17342 high: 25501
Swap cache: add 0, delete 0, find 0/0, race 0+0
Performance Tuning (RHEL4 and RHEL5)
/proc/sys/vm/swappiness
/proc/sys/vm/min_free_kbytes
/proc/sys/vm/dirty_ratio
/proc/sys/vm/dirty_background_ratio
/proc/sys/vm/pagecache
RHEL4 /proc/sys/vm/swappiness
  Controls how aggressively the system reclaims mapped memory:
    Anonymous memory: swapping
    Mapped file pages: writing if dirty, and freeing
    System V shared memory: swapping
  Decreasing: more aggressive reclaiming of unmapped pagecache memory
  Increasing: more aggressive swapping of mapped memory

Sybase server with /proc/sys/vm/swappiness set to 60 (default)
procs         memory                 swap        io          system       cpu
 r  b   swpd   free  buff    cache   si   so    bi    bo    in    cs us sy id wa
 5  1 643644  26788  3544 32341788  880  120  4044  7496  1302 20846 25 34 25 16

Sybase server with /proc/sys/vm/swappiness set to 10
procs         memory                 swap        io          system       cpu
 r  b   swpd   free  buff    cache   si   so    bi    bo    in    cs us sy id wa
 8  3      0  24228  6724 32280696    0    0 23888 63776  1286 20020 24 38 13 26
RHEL4 & 5 /proc/sys/vm/min_free_kbytes
  Directly controls the page reclaim watermarks in KB
# echo 1024 > /proc/sys/vm/min_free_kbytes
Node 0 DMA free:4420kB min:8kB low:8kB high:12kB
Node 0 DMA32 free:14456kB min:1012kB low:1264kB high:1516kB
# echo 2048 > /proc/sys/vm/min_free_kbytes
Node 0 DMA free:4420kB min:20kB low:24kB high:28kB
Node 0 DMA32 free:14456kB min:2024kB low:2528kB high:3036kB
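The sample output is consistent with low = min + min/4 and high = min + min/2, computed in whole pages. The sketch below reproduces those numbers; it assumes 4 KB pages and that the per-zone min value is already known. The function name zone_watermarks is hypothetical, and the relationship is inferred from the figures shown rather than taken as the kernel's exact rule for every release (the OOM dump earlier in the deck shows low = 2 x min and high = 3 x min).

```shell
#!/bin/sh
# zone_watermarks MIN_KB
# Prints "min low high" in kB, assuming 4 KB pages and the
# low = min + min/4, high = min + min/2 relationship (integer pages).
zone_watermarks() {
    page_kb=4
    min_pages=$(( $1 / page_kb ))
    low_pages=$(( min_pages + min_pages / 4 ))
    high_pages=$(( min_pages + min_pages / 2 ))
    echo "$(( min_pages * page_kb )) $(( low_pages * page_kb )) $(( high_pages * page_kb ))"
}
```

Calling `zone_watermarks 1012` and `zone_watermarks 2024` gives the DMA32 low and high values listed for the two min_free_kbytes settings.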
Memory reclaim watermarks (min_free_kbytes)
  Free list, from all of RAM down to 0:
    All of RAM
      Do nothing
    Pages High: kswapd sleeps above High
      kswapd reclaims memory
    Pages Low: kswapd wakes up at Low
      kswapd reclaims memory
    Pages Min: all memory allocators reclaim at Min
      user processes/kswapd reclaim memory
    0
RHEL4 & 5 /proc/sys/vm/dirty_ratio
  Absolute limit to percentage of dirty pagecache memory
  Default is 40%
  Lower means less dirty pagecache and smaller IO streams
  Higher means more dirty pagecache and larger IO streams
RHEL4 & 5 /proc/sys/vm/dirty_background_ratio
  Controls when dirty pagecache memory starts getting written.
  Default is 10%
  Lower
    pdflush starts earlier
    less dirty pagecache and smaller IO streams
  Higher
    pdflush starts later
    more dirty pagecache and larger IO streams
dirty_ratio and dirty_background_ratio
  pagecache, from 100% dirty down to 0% dirty:
    100% of pagecache RAM dirty
      pdflushd and write()'ing processes write dirty buffers
    dirty_ratio (40% of RAM dirty): processes start synchronous writes
      pdflushd writes dirty buffers in background
    dirty_background_ratio (10% of RAM dirty): wake up pdflushd
      do nothing
    0% of pagecache RAM dirty
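The three regions of that picture can be sketched as a simple classifier. This is a simplification under stated assumptions: the dirty percentage is passed in by the caller, the thresholds default to the 10% and 40% values quoted on the slides, and age-based periodic writeback is ignored. The function name dirty_state and its output labels are hypothetical.

```shell
#!/bin/sh
# dirty_state DIRTY_PCT [BACKGROUND_RATIO] [DIRTY_RATIO]
# Reports which writeback regime a given dirty percentage falls into.
dirty_state() {
    dirty=$1
    bg=${2:-10}       # dirty_background_ratio default from the slide
    limit=${3:-40}    # dirty_ratio default from the slide
    if [ "$dirty" -ge "$limit" ]; then
        echo "synchronous"   # writing processes flush dirty pages themselves
    elif [ "$dirty" -ge "$bg" ]; then
        echo "background"    # pdflush writes dirty pages in the background
    else
        echo "idle"          # below dirty_background_ratio: do nothing
    fi
}
```

For instance, `dirty_state 25` reports background, while `dirty_state 25 5 20` reports synchronous once both thresholds have been lowered.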
RHEL4 & 5 /proc/sys/vm/pagecache
  Controls when pagecache memory is deactivated.
  Default is 100%
  Lower
    Prevents swapping out anonymous memory
  Higher
    Favors pagecache pages
    Disabled at 100%
Pagecache Tuning
[Figure: pagecache page flow. Labels: Filesystem/pagecache allocation; Accessed (pagecache under limit); ACTIVE; INACTIVE; Aging (new -> old); Accessed (pagecache over limit); reclaim; FREE]
(Hint) flushing the pagecache
[tmp]# echo 1 > /proc/sys/vm/drop_caches
procs        memory               swap     io     system      cpu
 r  b swpd    free   buff   cache  si  so  bi  bo    in   cs us sy  id wa
 0  0  224   57184 107808 3350196   0   0   0  56  1136  212  0  0  83 17
 0  0  224   57184 107808 3350196   0   0   0   0  1039  198  0  0 100  0
 0  0  224   57184 107808 3350196   0   0   0   0  1021  188  0  0 100  0
 0  0  224   57184 107808 3350196   0   0   0   0  1035  204  0  0 100  0
 0  0  224   57248 107808 3350196   0   0   0   0  1008  164  0  0 100  0
 3  0  224 2128160    176 1438636   0   0   0   0  1030  197  0 15  85  0
 0  0  224 3610656    204   34408   0   0  28  36  1027  177  0 32  67  2
 0  0  224 3610656    204   34408   0   0   0   0  1026  180  0  0 100  0
 0  0  224 3610720    212   34400   0   0   8   0  1010  183  0  0  99  1
(Hint) flushing the slabcache
[tmp]# echo 2 > /proc/sys/vm/drop_caches
[tmp]# cat /proc/meminfo
MemTotal: 3907444 kB
MemFree: 3604576 kB
Slab: 115420 kB
Hugepagesize: 2048 kB
RHEL3 kernel selection
x86
  Standard kernel (no PAE, 3G/1G)
    UP systems with <= 4GB RAM
  SMP kernel (PAE, 3G/1G)
    SMP systems with < ~12GB RAM
    Highmem/Lowmem ratio <= 10:1
    PAE costs ~5% in performance
  Hugemem kernel (PAE, 4G/4G)
    SMP systems > ~12GB RAM
    4G/4G costs ~5%
x86_64
  Standard kernel for UP systems
  SMP kernel for SMP systems
RHEL4 kernel selection
x86
  Standard kernel (no PAE, 3G/1G)
    UP systems with <= 4GB RAM
  SMP kernel (PAE, 3G/1G)
    SMP systems with < ~16GB RAM
    Highmem/Lowmem ratio <= 16:1
  Hugemem kernel (PAE, 4G/4G)
    SMP systems > ~16GB RAM
x86_64
  Standard kernel for UP systems
  SMP kernel for systems with up to 8 CPUs
  LargeSMP kernel for systems with up to 512 CPUs
RHEL5 kernel selection
x86
  Standard kernel (no PAE, 3G/1G)
    UP and SMP systems with <= 4GB RAM
  PAE kernel (PAE, 3G/1G)
    UP and SMP systems with > 4GB RAM
x86_64
  Standard kernel for all systems
IA64
  Standard kernel for all systems
Problem: 16GB x86 running SMP kernel
Zone:DMA freepages: 2207 min: 0 low: 0 high: 0
Zone:Normal freepages: 484 min: 1279 low: 4544 high: 6304
Zone:HighMem freepages: 266 min: 255 low: 61952 high: 92928
Free pages: 2957 (266 HighMem)
(Active: 245828/1297300, inactive_laundry: 194673, inactive_clean: 194668, free: 2957)
aa:0 ac:0 id:0 il:0 ic:0 fr:2207
aa:630 ac:1009 id:189 il:233 ic:0 fr:484
aa:195237 ac:48952 id:1297057 il:194493 ic:194668 fr:266
1*4kB 1*8kB 1*16kB 1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 2*4096kB = 8828kB)
48*4kB 8*8kB 97*16kB 4*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1936kB)
12*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1064kB)
Swap cache: add 3838024, delete 3808901, find 107105/1540587, race 0+2
138138 pages of slabcache
1100 pages of kernel stacks
0 lowmem pagetables, 37046 highmem pagetables
Free swap: 3986092kB
4194304 pages of RAM
3833824 pages of HIGHMEM
Tuning File Systems and Disk IO
  Kernel optimizations
    CPU scheduling: multi-threaded, multi-core
    NUMA: optimized w/ NUMActl
    Kernel disk I/O: I/O schedulers, Direct I/O, Async I/O
    File systems: EXT3, NFS, GFS, OCFS
  Database characteristics
    HugePages: Hugetlbfs, db's, java etc.
RHEL5 Performance Features
Linux at 16 cpus: quad-core and beyond
  Recognizes differences between logical and physical processors
    I.E. multi-core, hyperthreaded & chips/sockets
  Optimizes process scheduling to take advantage of shared on-chip cache and NUMA memory nodes
  Implements multilevel run queues for sockets and cores (as opposed to one run queue per processor or per system)
    Strong CPU affinity avoids task bouncing
  Requires system BIOS to report CPU topology correctly
[Figure: processes scheduled onto the threads of each core in Sockets 0, 1 and 2]
Asynchronous I/O to File Systems
  Allows application to continue processing while I/O is in progress
    Eliminates synchronous I/O stall
    Critical for I/O intensive server applications
  Red Hat Enterprise Linux since 2002
    Support for RAW devices only
  With Red Hat Enterprise Linux 4, significant improvement:
    Support for Ext3, NFS, GFS file system access
    Supports Direct I/O (e.g. database applications)
  Makes benchmark results more appropriate for real-world comparisons
[Figure: Synchronous I/O. The application issues an I/O request to the device driver and stalls for completion.]
[Figure: Asynchronous I/O. The application issues an I/O request to the device driver and continues; no stall for completion.]
Asynchronous I/O Characteristics
[Figure: R4 U4 FC AIO Read. Two charts of throughput in MB/sec (0 to 160) against the number of aios (16, 32, 64), with one series per record size: 4k, 8k, 16k, 32k, 64k.]
Performance Tuning: DISK, RHEL3
[root@dhcp8336 sysctl]# /sbin/elvtune /dev/hda
/dev/hda elevator ID 0
read_latency: 2048
write_latency: 8192
max_bomb_segments: 6
[root@dhcp8336 sysctl]# /sbin/elvtune -r 1024 -w 2048 /dev/hda
/dev/hda elevator ID 0
read_latency: 1024
write_latency: 2048
max_bomb_segments: 6
Disk IO tuning RHEL4/5
  RHEL4/5: 4 tunable I/O schedulers
    CFQ (elevator=cfq): Completely Fair Queuing. Default; balanced, fair for multiple luns, adaptors, smp servers
    NOOP (elevator=noop): No operation in kernel; simple, low cpu overhead; leaves optimization to ramdisk, raid controller etc.
    Deadline (elevator=deadline): Optimizes for run-time-like behavior, low latency per IO; balances issues with large IO luns/controllers (NOTE: current best for FC5)
    Anticipatory (elevator=as): Inserts delays to help the stack aggregate IO; best on systems w/ limited physical IO (SATA)
  RHEL4: set at boot time on the command line
  RHEL5: change on the fly
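On kernels that allow changing the scheduler on the fly, the per-device setting lives in /sys/block/<dev>/queue/scheduler, where the active scheduler is shown in square brackets. The sketch below assumes that layout; the function names and the optional alternate sysfs root are hypothetical conveniences for trying it out without touching a real device.

```shell
#!/bin/sh
# active_scheduler "noop anticipatory deadline [cfq]"
# Prints the bracketed (active) scheduler name from a scheduler line.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# set_scheduler DEV NAME [SYS_ROOT]
# Writes NAME to SYS_ROOT/block/DEV/queue/scheduler (SYS_ROOT defaults to /sys).
set_scheduler() {
    printf '%s\n' "$2" > "${3:-/sys}/block/$1/queue/scheduler"
}
```

As root, `set_scheduler sda deadline` followed by `active_scheduler "$(cat /sys/block/sda/queue/scheduler)"` would switch a device named sda and confirm the change; on RHEL4 the equivalent is the elevator= boot parameter.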
File Systems
  Separate swap and busy partitions etc.
  EXT2/EXT3: separate talk
    http://www.redhat.com/support/wpapers/redhat/ext3/*.html
    Tune2fs or mount options
      data=ordered: only metadata journaled
      data=journal: both metadata and data journaled
      data=writeback: use with care!
    Set up default block size at mkfs: -b XX
  RHEL4/5 EXT3 improves performance
    Scalability up to 5M files/system
    Sequential write by using block reservations
    Increase file system up to 8TB
  GFS: global file system, cluster file system
Optimizing File System Performance
  Use OLTP and DSS workloads
  Results with various database tuning options
    RAW vs EXT3/GFS/NFS w/ o_direct (i.e. direct IO in iozone)
  ASYNC IO options
    RHEL3: DIO+AIO not optimal (page cache still active)
    RHEL4
      EXT3 supports AIO+DIO out of the box
      GFS: U2 full support of AIO+DIO / Oracle cert
      NFS: U3 full support of both DIO+AIO
  HUGEMEM kernels on x86
  HugeTLBs: use larger page sizes (ipcs)
Section 4 Examples
  General guidelines
    Effect of NUMA and NUMACTL
    Effect of CPU speed: how to control
  Benchmarking
    McCalpin: know max memory BW
    IOzone: run your own
  Database Tuning
  JVM Tuning
McCalpin Streams Copy Bandwidth (1, 2, 4, 8)
[Figure: copy rate in MB/s (0 to 16000) against number of streams for Non-NUMA and NUMA, with % difference on a secondary axis (0 to 25).]
RHEL4 & 5 NUMAstat and NUMActl
NUMAstat to display system NUMA characteristics on a NUMA system
[root@perf5 ~]# numastat
                node3    node2    node1    node0
numa_hit        72684    82215   157244   325444
numa_miss           0        0        0        0
numa_foreign        0        0        0        0
interleave_hit   2668     2431     2763     2699
local_node      67306    77456   152115   324733
other_node       5378     4759     5129      711
NUMActl to control process and memory
  numactl [--interleave nodes] [--preferred node] [--membind nodes]
          [--cpubind nodes] [--localalloc] command {arguments ...}
TIP
  App < memory of a single NUMA zone
    Numactl: use --cpubind with cpus within the same socket
  App > memory of a single NUMA zone
    Numactl: --interleave XY and --cpubind XY
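One way to read the numastat counters is the share of each node's hits that were local allocations. The sketch below computes that from numastat-style text on standard input; it assumes the header line with node names comes first and that numa_hit and local_node rows are present. The function name numa_local_pct is hypothetical, and percentages are truncated to whole numbers.

```shell
#!/bin/sh
# numa_local_pct
# Reads numastat-style output on stdin and prints "<node> <percent>" per
# node, where percent = local_node * 100 / numa_hit (truncated).
numa_local_pct() {
    awk '
        NR == 1            { for (i = 1; i <= NF; i++) name[i + 1] = $i }
        $1 == "numa_hit"   { for (i = 2; i <= NF; i++) hit[i] = $i; n = NF }
        $1 == "local_node" { for (i = 2; i <= NF; i++) loc[i] = $i }
        END {
            for (i = 2; i <= n; i++)
                if (hit[i] > 0)
                    printf "%s %d\n", name[i], int(loc[i] * 100 / hit[i])
        }
    '
}
```

Typical use would be `numastat | numa_local_pct`; a node with a low percentage is serving a large share of its allocations to processes running elsewhere, which is a hint to look at binding with numactl.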
RHEL4 & 5 NUMAstat and NUMActl
EXAMPLES
  numactl --interleave=all bigdatabase arguments
    Run big database with its memory interleaved on all CPUs.
  numactl --cpubind=0 --membind=0,1 process
    Run process on node 0 with memory allocated on node 0 and 1.
  numactl --preferred=1 numactl --show
    Set preferred node 1 and show the resulting state.
  numactl --interleave=all --shmkeyfile /tmp/shmkey
    Interleave all of the sysv shared memory region specified by /tmp/shmkey over all nodes.
  numactl --offset=1G --length=1G --membind=1 --file /dev/shm/A --touch
    Bind the second gigabyte in the tmpfs file /dev/shm/A to node 1.
  numactl --localalloc /dev/shm/file
    Reset the policy for the shared memory file file to the default localalloc policy.
Linux NUMA Evolution
[Figure: RHEL3, 4 and 5 Linpack Multi-stream, AMD64, 8 cpu dual-core (1/2 cpus loaded). Performance in Kflops (0 to 3000000) with a secondary axis from 0 to 45, for RHEL3 U8, RHEL4 U5 and RHEL5 GOLD. Series: Default Scheduler, Taskset Affinity, Column E.]
Limitations:
  NUMA spill to different NUMA boundaries
  Process migrations: no way back
  Lack of page replication: text, read mostly
RHEL5.2 CPUspeed and performance:
  Enabled = governor set to ondemand
  Looks at cpu usage to regulate power
    Within 3-5% of performance for cpu loads
    IO loads can keep cpu stepped down 15-30%
    Supported in RHEL5.2 virtualization
  To turn off (otherwise cpus may be left in a reduced step):
    If it's not using performance, then:
    # echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    Then check to see if it stuck:
    # cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    Check /proc/cpuinfo to make sure you're seeing the expected CPU freq.
    Proceed to normal service disable:
    service cpuspeed stop
    chkconfig cpuspeed off
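The commands above touch only cpu0. A minimal sketch for applying and reading back a governor on every CPU follows; it assumes each CPU exposes cpufreq/scaling_governor under /sys/devices/system/cpu. The function name set_governor and the optional alternate root directory are hypothetical.

```shell
#!/bin/sh
# set_governor GOVERNOR [CPU_ROOT]
# Writes GOVERNOR to every cpuN/cpufreq/scaling_governor under CPU_ROOT
# (default /sys/devices/system/cpu) and prints the value read back.
set_governor() {
    gov=$1
    root=${2:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue          # glob matched nothing: no cpufreq support
        printf '%s\n' "$gov" > "$f"
        echo "$f: $(cat "$f")"           # read back to check that it stuck
    done
}
```

Run as root, `set_governor performance` covers all CPUs in one pass; stopping the cpuspeed service afterwards keeps it from switching the governor back.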
Effects of CPUspeed to peak performance:
[Figure: RHEL5.2 effect of CPUspeed on I/O workloads, Intel 4 cpu, 16 GB memory, FC disk. Performance relative to peak (cpuspeed disabled), 0 to 1.2, for Iozone Perf, Oracle OLTP and IBM DB2. Series: 50% vs Peak, 99% vs Peak.]
Effects of CPUspeed with RHEL5.2 Virtualization
[Figure: Oracle runs with CPUFreq Xen kernel. Relative performance (0 to 1.2) for RHEL51 Dom0 CPUfreq on, RHEL51 Dom0 CPUfreq off, RHEL51 PV CPUfreq on and RHEL51 PV CPUfreq off. Series: 80U Run1, 80U Run2.]
Using IOzone w/ o_direct: mimic database
  Problem:
    Filesystems use memory for file cache
    Databases use memory for database cache
    Users want filesystem for management outside database access (copy, backup etc.)
    You DON'T want BOTH to cache.
  Solution:
    Filesystems that support Direct IO
    Open files with o_direct option
    Databases which support Direct IO (ORACLE)
    NO DOUBLE CACHING!
EXT3, GFS, NFS Iozone w/ DirectIO
[Figure: RHEL5 Direct_IO IOzone on EXT3, GFS, NFS (Geom 1M-4GB, 1k-1m). Performance in MB/sec (0 to 80) per IOzone test (ALL I/O's, Initial Write, ReWrite, Read, Backward Read, RecReWrite, Stride Read and others). Series: EXT_DIO, GFS1_DIO, NFS_DIO.]
HugeTLBFS
  The Translation Lookaside Buffer (TLB) is a small CPU cache of recently used virtual to physical address mappings
  TLB misses are extremely expensive on today's very fast, pipelined CPUs
  Large memory applications can incur high TLB miss rates
  HugeTLBs permit memory to be managed in very large segments
    E.G. Itanium:
      Standard page: 16KB
      Default huge page: 256MB
      16000:1 difference
  File system mapping interface
  Ideal for databases
    E.G. TLB can fully map a 32GB Oracle SGA
[Figure: TLB (128 data, 128 instruction entries) mapping Virtual Address Space to Physical Memory]
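The 32GB figure follows from simple arithmetic: TLB reach is the number of entries times the page size. A small sketch, using the 128-entry TLB and the Itanium page sizes from the slide (the function name tlb_reach_mb is hypothetical):

```shell
#!/bin/sh
# tlb_reach_mb ENTRIES PAGE_KB
# Prints how much memory (in MB) a fully populated TLB can map.
tlb_reach_mb() {
    echo $(( $1 * $2 / 1024 ))
}
```

With 16KB pages, `tlb_reach_mb 128 16` gives 2 (MB); with 256MB huge pages, `tlb_reach_mb 128 262144` gives 32768 (MB), that is 32GB. The page-size ratio 262144 / 16 = 16384 is the roughly 16000:1 difference quoted above.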
Using HugeTLBfs w/ Databases
[Figure: RHEL4+5 effect of HugeTLBfs on Oracle 10G OLTP performance, Intel 4 cpu, 8GB memory, FC SAN. Transactions/min (k), 0 to 60, for RHEL4 U5 and RHEL5 GA, with % difference (0.0% to 16.0%) on a secondary axis. Series: Base (4k), HugeTLBfs (2MB), %Diff.]
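JVM Tuning
  Eliminate swapping
  Promote pagecache reclaiming
    Lower swappiness to 10% (or lower if necessary).
    Lower dirty_background_ratio to 10%
    Lower dirty_ratio if necessary
  Promote inode cache reclaiming
    Raise vfs_cache_pressure (values above the default of 100 make the kernel prefer reclaiming dentries and inodes)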
Tuning Network Apps Messages/sec
  Disable cpuspeed, selinux, auditd, irqbalance
  Manual binding of IRQs w/ multiple nics
    echo values > /proc/irq/XXX or use TUNA
    Intel ixgb IRQs: send/recv to cpu socket w/ shared cache
  Use taskset -c to start applications on
    1 cpu per socket: good for BW intensive apps
  Shield cpus for critical apps
    Move all existing processes off of the core(s) to cpu0
  Pairs of cpus on the same socket: shared 2nd level cache
  Keep user apps on cpus separate from network apps
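The values written for manual IRQ binding are hexadecimal CPU bitmasks, with bit N standing for cpu N. A minimal sketch for building such a mask from a list of CPU numbers follows; the function name cpu_mask is hypothetical, and it assumes fewer CPUs than the shell's integer width.

```shell
#!/bin/sh
# cpu_mask CPU [CPU...]
# Prints the hexadecimal affinity mask with one bit set per listed CPU.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}
```

For example, `cpu_mask 4 5` prints 30, a mask covering cpus 4 and 5; that value could be written to the smp_affinity file of the chosen IRQ, while `taskset -c 4,5 app` would start a hypothetical application named app on the same pair.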
RT Tuning Network Apps Messages/sec
10 Gbit Nics Stoakley 2.67 to Bensley 3.0 Ghz
Tuning enet gains +25% in Ave Latency,
RT kernel reduced peak latency but smoother how much?
Red Hat MRG Performance: AMQP Mess/s
[Figure: Intel 8 cpu/16 GB, 10 Gb enet. Messages/sec (32 byte size), 0 to 600000, over 54 samples (million messages/sample). Series: rhel52_base, rhel52_tuned, rhelrealtime_tune.]
RT Performance of Network Apps Messages/sec
[Figure: RH AMQP latency on Intel 8 cpu/10 Gbit enet, RHEL5.2 and RHEL RT. Millisecond/message (0.00 to 120.00) by message size for the rt and r52 kernels. Series: Ave, StdDev, Max.]
Numa Network Apps Messages/sec
[Figure: Messages/sec and average latency (ms) against message rate (0 to 50000). Series: Messages/sec Numa On, Messages/sec Numa Off, Average Latency (ms) Numa On, Average Latency (ms) Numa Off.]
General Performance Tuning Guidelines
  Use hugepages whenever possible.
  Minimize swapping.
  Maximize pagecache reclaiming.
  Place swap partition(s) on quiet device(s).
  Use Direct IO if possible.
  Beware of turning NUMA off.
Benchmark Tuning
  Use hugepages.
  Don't overcommit memory.
  If memory must be overcommitted:
    Eliminate all swapping.
    Maximize pagecache reclaiming.
    Place swap partition(s) on separate device(s).
  Use Direct IO.
  Don't turn NUMA off.
Linux Performance Tuning References
  Alikins, "System Tuning Info for Linux Servers", http://people.redhat.com/alikins/system_tuning.html
  Axboe, J., "Deadline IO Scheduler Tunables", SuSE, EDF R&D, 2003.
  Braswell, B., Ciliendo, E., "Tuning Red Hat Enterprise Linux on IBM eServer xSeries Servers", http://www.ibm.com/redbooks
  Corbet, J., "The Continuing Development of IO Scheduling", http://lwn.net/Articles/21274.
  Ezolt, P., Optimizing Linux Performance, www.hp.com/hpbooks, Mar 2005.
  Heger, D., Pratt, S., "Workload Dependent Performance Evaluation of the Linux 2.6 IO Schedulers", Linux Symposium, Ottawa, Canada, July 2004.
  Red Hat Enterprise Linux "Performance Tuning Guide", http://people.redhat.com/dshaks/rhel3_perf_tuning.pdf
  Network, NFS performance covered in separate talks: http://nfs.sourceforge.net/nfshowto/performance.html
Questions?