
Performance Scenario Sudden Slowdown On Rac

This document summarizes the steps taken to diagnose and resolve a sudden slow down issue on a two node Real Application Clusters (RAC) environment. The troubleshooting process involved systematically measuring performance at the operating system, database, and session levels. Key findings included an unbalanced workload on one node, long-running sessions with wait events, and network interconnect issues resolved by replacing a switch. Measuring performance using tools like AWR, ASH, and OS monitoring was essential to pinpointing and addressing the root cause of the slow down impacting business applications.


Performance Scenario:
Diagnosing and resolving sudden slowdown on two node RAC

Introduction
Karl Arao, OCP-DBA, RHCT
Senior Consultant at SQL*Wizard
RAC user for 3 years
1st environment on VMware
I heart performance
Don't like to guess when troubleshooting
Scenario
One Thursday a client called
There was a SUDDEN slowdown on ALL of the applications
a big impact to the Business
And it's running on RAC
no changes on the RAC nodes and on the applications
Some of 10g Performance Features
OEM Performance Page
ADDM
SQL Tuning Advisor
AWR (DBA_HIST_)
ASH
Time Model (total time for all db calls)
Wait Class (12 wait classes)
Metrics (v$ performance metric deltas)
Services
Setup
Server and Storage: Sun Fire X4200 (2 CPU, 12GB memory) with LUNs on EMC CX300
OS: RHEL 4.3 ES
Database and clusterware: Oracle 10.2.0.3
Database Files, Flash Recovery Area, OCR, and Voting disk are located on OCFS2 filesystems
Application: Forms and Reports (6i and also lower)
Troubleshooting Principle
Systematic / Layered approach..
Understand..
Then Fix..

Let's get it on!
1. Measured the OS stack
Monitored the following
cpu (vmstat, top, mpstat)
io (iostat)
memory (vmstat, meminfo)
network (netstat)
process info (top, ps)
Graphs (server1 and server2): CPU, Datafiles, OCR & voting disk, Archivelogs, Flash Recovery Area, Memory
2. Checked the DB environment
Compared my past & current RDA of the database
Query on some v$ views.. a query on v$session showed that server1 has more connections (89% of the total users)
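The 89% figure is just each instance's share of the connected sessions. A minimal Python sketch, assuming the per-instance counts were already pulled from a GROUP BY inst_id query on gv$session. The counts below are hypothetical and were picked only to reproduce an 89% share; only the 89% itself comes from the slides.

```python
def connection_share(sessions_per_instance):
    """Return each instance's percentage share of all connected sessions."""
    total = sum(sessions_per_instance.values())
    if total == 0:
        return {inst: 0.0 for inst in sessions_per_instance}
    return {
        inst: round(100.0 * count / total, 1)
        for inst, count in sessions_per_instance.items()
    }

# Hypothetical session counts per instance (not taken from the slides)
counts = {"server1": 178, "server2": 22}
shares = connection_share(counts)
print(shares)
```

A share this lopsided on a two node cluster is worth explaining before blaming anything else, which is what the next checks do.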
The imbalance toward server1 could be because of:
1) The clients having lower versions (< SQL*Plus 8.1 or OCI8, see Note 97926.1) that may not support TAF (FAILOVER_MODE) and Load Balancing (LOAD_BALANCE)
OR
2) They are using TNS entries explicitly connecting to server1
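To make cause 2 concrete, here is a small Python sketch that prints a TNS entry pinned to one node next to one that lists both nodes with LOAD_BALANCE and FAILOVER_MODE. The alias, host names (server1-vip, server2-vip) and service name are hypothetical, and the TYPE = SELECT / METHOD = BASIC values are just common TAF settings used for illustration, not the client's actual configuration.

```python
def tns_entry(alias, hosts, service, port=1521):
    """Build a tnsnames.ora-style entry listing every host in `hosts`.

    With more than one host, client-side load balancing and TAF are added.
    """
    addresses = "\n".join(
        f"      (ADDRESS = (PROTOCOL = TCP)(HOST = {h})(PORT = {port}))"
        for h in hosts
    )
    multi = len(hosts) > 1
    balance = "      (LOAD_BALANCE = yes)\n" if multi else ""
    taf = "\n      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC))" if multi else ""
    return (
        f"{alias} =\n"
        f"  (DESCRIPTION =\n"
        f"    (ADDRESS_LIST =\n"
        f"{balance}{addresses}\n"
        f"    )\n"
        f"    (CONNECT_DATA =\n"
        f"      (SERVICE_NAME = {service}){taf}\n"
        f"    )\n"
        f"  )"
    )

# Hypothetical names: an entry pinned to server1 vs. one spanning both nodes
pinned = tns_entry("RACDB1", ["server1-vip"], "racdb")
balanced = tns_entry("RACDB", ["server1-vip", "server2-vip"], "racdb")
print(pinned)
print(balanced)
```

Clients that all use an entry like the pinned one will land on server1 no matter how the cluster is configured.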
Users don't have FAILOVER capabilities
Checked the application module usage on server1
How 'bout I graph it in Excel? Will the data be more meaningful?
..YES, most of the users use the xxxlogin.fmx module
3. Checked instance-wide DB performance
Graphed the ASH data..
..suffering from "gc cr block lost" and "gc cr multi block request" from 7am to 4pm

Researched on Metalink for known issues..
Found Doc ID: 563566.1 "gc lost blocks diagnostics"
Was able to pinpoint the peak period from the graph. Then, generated ADDM and AWR report on that peak period..
ADDM

Elapsed Time: 60 min
DB Time: 61.83 min
AAS: 1.03
Max CPU: 2
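The AAS (Average Active Sessions) number on this slide follows directly from the two times above it: DB Time divided by elapsed time. A short Python sketch of that arithmetic, reading "Max CPU: 2" as the CPU count of the server (an assumption, though it matches the 2 CPU setup listed earlier):

```python
def average_active_sessions(db_time_min, elapsed_min):
    """AAS = DB Time / elapsed wall-clock time (both in the same unit)."""
    if elapsed_min <= 0:
        raise ValueError("elapsed time must be positive")
    return db_time_min / elapsed_min

aas = average_active_sessions(db_time_min=61.83, elapsed_min=60)
cpu_count = 2  # assumption: "Max CPU: 2" is the number of CPUs
print(f"AAS = {aas:.2f}, AAS per CPU = {aas / cpu_count:.2f}")
```

An AAS of about 1 on a 2 CPU box says the instance was, on average, not short of CPU during the peak hour, which is one more hint to look at waits rather than CPU.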
Should I follow the ADDM recommendations right away?
Nope, collect more facts, numbers, figures
AWR
Do we have a workload distribution problem?
Nope, even with distributed users..
We still have a performance problem..
4. Checked session-level DB performance
The database has too much activity, where do I start? Where to drill down?
gv$session_longops & gv$session_wait output: too many users, and they require repetitive monitoring
In the spirit of Method R
"WORK FIRST TO REDUCE THE BIGGEST RESPONSE TIME COMPONENT OF A BUSINESS' MOST IMPORTANT USER ACTION"

Went to the Accounting Department, checked on the desktop terminals
Users PC1069 (with SID 601) and PC918 (with SID 483) are on total hang
Checked on the
performance/wait counters
the current SQLs
v$session_wait (SID 601)
v$sesstat (SID 601)
v$sql, v$sql_plan, v$sql_plan_statistics (SID 601)

Running for 98 minutes
Just 12.14 seconds on CPU
v$sesstat (SID 483)
v$sql, v$sql_plan, v$sql_plan_statistics (SID 483)

Running for 3 hours
Just 2.68 seconds on CPU
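Putting the elapsed and CPU figures for the two hung sessions side by side shows how little of their time was spent working. A quick Python sketch of the ratio, using only the numbers from the slides. Labelling the remainder as "not on CPU" is a simplification, since the sketch does not break it down by wait event.

```python
def cpu_share(cpu_seconds, elapsed_seconds):
    """Fraction of a statement's elapsed time that was spent on CPU."""
    return cpu_seconds / elapsed_seconds

sessions = {
    "SID 601": (12.14, 98 * 60),     # 98 minutes elapsed
    "SID 483": (2.68, 3 * 60 * 60),  # 3 hours elapsed
}
for sid, (cpu_s, elapsed_s) in sessions.items():
    share = cpu_share(cpu_s, elapsed_s)
    print(f"{sid}: {share:.4%} on CPU, {1 - share:.4%} not on CPU")
```

Both sessions spent well under 1% of their elapsed time on CPU, so tuning the SQL for CPU would not have helped.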
Another graph of ASH
5. Drilled down on the network interconnect

Generated a cat & egrep command to look for problems in the interconnect from the OS Watcher netstat output
(from Metalink Doc ID: 563566.1 "gc lost blocks diagnostics")
$ cat server1_netstat.dat | egrep -i "udpInOverflows|packet receive errors|fragments dropped|reassembles failed|fragments dropped after timeout"
34096 fragments dropped after timeout
306030 packet reassembles failed
15 packet receive errors
34096 fragments dropped after timeout
306268 packet reassembles failed
15 packet receive errors
34096 fragments dropped after timeout
306574 packet reassembles failed
output snipped
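These netstat counters are cumulative, so what matters is whether they keep growing between samples. A minimal Python sketch that groups the egrep output by counter and prints the growth between consecutive samples. The sample text is the output shown above; the function names are made up for illustration.

```python
import re

COUNTERS = (
    "fragments dropped after timeout",
    "packet reassembles failed",
    "packet receive errors",
)

def parse_samples(text):
    """Collect the successive values seen for each counter, in file order."""
    series = {name: [] for name in COUNTERS}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s+(.+?)\s*$", line)
        if match and match.group(2) in series:
            series[match.group(2)].append(int(match.group(1)))
    return series

def deltas(values):
    """Growth between consecutive samples of a cumulative counter."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

sample = """\
    34096 fragments dropped after timeout
    306030 packet reassembles failed
    15 packet receive errors
    34096 fragments dropped after timeout
    306268 packet reassembles failed
    15 packet receive errors
    34096 fragments dropped after timeout
    306574 packet reassembles failed
"""
series = parse_samples(sample)
for name, values in series.items():
    print(name, values, deltas(values))
```

In this sample only "packet reassembles failed" is still climbing, which matches the lost-block waits seen in ASH.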
Restarted the switch

STILL THERE IS A PERFORMANCE PROBLEM
Replaced the switch

THEY GOT FAST
karao@karl:~/Desktop$ cat karlarao.dat | egrep -i "udpInOverflows|packet receive errors|fragments dropped|reassembles failed|fragments dropped after timeout"
0 packet receive errors
0 packet receive errors
0 packet receive errors
0 packet receive errors
0 packet receive errors
0 packet receive errors
0 packet receive errors
0 packet receive errors
0 packet receive errors
0 packet receive errors
0 packet receive errors
Another graph of ASH (Stacked graph)
Another graph of ASH (3D view)
Conclusion

You don't have to guess..

Even if it's a RAC environment..

It just takes facts, numbers, figures to solve a performance problem
References and Tools
http://karlarao.wordpress.com
http://blog.tanelpoder.com
http://www.tanelpoder.com/files/TPT_public.zip
http://www.tanelpoder.com/files/PerfSheet.zip
Neil Gunther & Tanel Poder, Multidimensional Visualization of Oracle Performance using Barry007, http://arxiv.org/pdf/0809.2532
http://ashmasters.com
http://www.perfvision.com
http://www.methodr.com

Metalink Doc ID 97926.1 Failover Issues and Limitations [Connect-time failover and TAF]
Metalink Doc ID 563566.1 gc lost blocks diagnostics
Metalink Doc ID 301137.1 OS Watcher User Guide
Join Oracle Users Philippines

Facebook
http://www.facebook.com/home.php#/pages/OracleUsersPhilippines/86773013086?ref=ts

LinkedIn
http://www.linkedin.com/groups?home=&gid=2028295&trk=anet_ug_hm
Contact me through:

[email protected]
09192673389
8896999
