VMware - DRS Best Practice


Performance Study

DRS Performance and Best Practices


VMware Infrastructure 3

VMware Infrastructure 3 provides a set of distributed infrastructure services that make the entire IT environment more serviceable, available, and efficient. Working with VMware ESX 3, VMware VirtualCenter 2, and VMware VMotion, VMware Distributed Resource Scheduler (DRS) dynamically allocates resources to enforce resource management policies while balancing resource usage across multiple ESX hosts.

This paper is intended for readers who understand the architecture and basic concepts of DRS. See the paper Resource Management with VMware DRS for these details. (See Resources on page 19 for a link.) This paper identifies various scenarios in which you can benefit from DRS and explains how to configure your environment to take best advantage of DRS.
The paper covers the following topics:

Overview of DRS Performance on page 1

Resource Allocation for Virtual Machines and Resource Pools on page 3

Scaling the Number of Hosts and Virtual Machines on page 9

DRS Aggressiveness on page 13

Interval Between DRS Invocations on page 14

Cost-Benefit Analysis on page 15

Heterogeneous Host Environments on page 15

Impact of Idle Virtual Machines on page 16

DRS Performance Best Practices on page 16

Monitoring DRS Cluster Health on page 17

Conclusions on page 19

Resources on page 19

Overview of DRS Performance


VMware Distributed Resource Scheduler improves resource allocation across all hosts and resource pools in a cluster. When you enable a cluster for DRS, VirtualCenter continuously monitors the distribution of CPU and memory resource usage for all hosts and virtual machines in that cluster. DRS compares these metrics to the ideal resource utilization, that is, the virtual machines' entitlements. These entitlements are determined based on the resource policies of the resource pools and virtual machines in the cluster and their current demands. VirtualCenter uses this analysis to perform initial placement of virtual machines, virtual machine migration for load balancing, enforcement of rules and policies, and distributed power management, if distributed power management is enabled.

This study focuses on understanding the effectiveness and scalability of the DRS algorithms.

Copyright 2008 VMware, Inc. All rights reserved.


Effectiveness of DRS Algorithms


Our goal in these tests was to see if DRS is effective in improving performance when there is resource contention and if DRS can allocate resources while balancing load in proportion to the resource policies allocated to virtual machines. We evaluated various configurations of the DRS algorithm and compared how effective they are in various use cases.

We examined whether the shares, reservations, and limits of a virtual machine or resource pool are effectively enforced by DRS and how the migrations recommended by DRS affect the performance of the affected cluster as a whole. We used virtual machine throughput (or the throughput of multiple virtual machines) as a measure of how effective DRS is in the presence of various workloads with various policy settings.

The loads should achieve throughput that is approximately proportional to their CPU and memory entitlements. These resource entitlements are a measure of how many CPU cycles and how much memory the virtual machine or resource pool should receive. They are calculated based on the shares, reservations, and limits (the resource policies set in VirtualCenter) and the estimated CPU and memory resources demanded by the virtual machine or resource pool. By placing or migrating virtual machines effectively based on these entitlements, DRS ensures that each virtual machine or resource pool gets what it deserves without violating the shares, reservations, and limits policies.
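The entitlement computation described above can be sketched as a progressive-filling loop: capacity is split in proportion to shares, each virtual machine is clamped between its reservation and the lesser of its limit and its demand, and freed capacity is redistributed. This is only an illustrative model with names of our choosing; the actual DRS algorithm also involves demand estimation and resource pool hierarchies.

```python
def entitlements(vms, capacity_mhz):
    """Split CPU capacity in proportion to shares, clamping each VM to
    [reservation, min(limit, demand)] and redistributing freed capacity."""
    ent = {vm["name"]: vm["reservation"] for vm in vms}
    active = {vm["name"]: vm for vm in vms}
    remaining = capacity_mhz - sum(ent.values())
    while active and remaining > 1e-9:
        total = sum(vm["shares"] for vm in active.values())
        # VMs whose share-proportional grant exceeds their remaining headroom
        clamped = [(n, max(0.0, min(vm["limit"], vm["demand"]) - ent[n]))
                   for n, vm in active.items()
                   if remaining * vm["shares"] / total
                   >= min(vm["limit"], vm["demand"]) - ent[n]]
        if not clamped:
            for n, vm in active.items():
                ent[n] += remaining * vm["shares"] / total
            remaining = 0.0
        else:
            for n, headroom in clamped:
                ent[n] += headroom
                remaining -= headroom
                del active[n]
    return ent

vms = [
    {"name": "web", "shares": 2000, "reservation": 0.0,
     "limit": float("inf"), "demand": 5000.0},
    {"name": "db", "shares": 1000, "reservation": 0.0,
     "limit": 500.0, "demand": 5000.0},
]
print(entitlements(vms, 3000.0))  # db is capped at its 500MHz limit
```

Note that once the limit clamps the database server at 500MHz, the Web server's entitlement grows to absorb the freed capacity, which mirrors the behavior the tests below measure.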

Scalability of DRS Algorithms


DRS supports a certain number of hosts and virtual machines in a DRS cluster. The goal in our study was to see how DRS scaled as we increased the number of hosts and virtual machines. As the number of virtual machines and hosts increases, the possibilities for balancing with virtual machine migrations also increase. On the other hand, each balancing migration has a resource overhead. This might effectively improve or worsen cluster throughput as we scaled the environment to include more hosts and virtual machines. Furthermore, throughput improvement depends heavily on the characteristics of the workload, for example, the placement of the virtual machines and the variability of the workloads. We explored a number of different workloads, at various scales, to see how DRS handled each of them.

Dimensions for DRS Performance Analysis


To better understand the performance of DRS, we characterized DRS along several dimensions. These dimensions include:

Resource allocation for virtual machines and resource pools

Number of hosts and virtual machines

Degree of aggressiveness

DRS periodic frequency

Heterogeneous host environments

Cost-benefit analysis of VMotion

Impact of idle virtual machines


Resource Allocation for Virtual Machines and Resource Pools


DRS ensures that resource policies are satisfied and balances load across a cluster when appropriate. In order to evaluate its performance impact, we used throughput of various workloads as a means of comparison in various cases with a realistic load. DRS algorithms should respond dynamically to resource allocation changes for virtual machines and resource pools. We set up several experiments to verify this behavior and evaluate DRS effectiveness, seeking to answer the following questions:

Can DRS automatically reduce the load on an overcommitted host when additional resources are added or available in the cluster, and consequently also improve the throughput for the workloads?

How does DRS respond to a highly imbalanced cluster? How does DRS aggressiveness affect the placement of virtual machines?

Can DRS effectively enforce resource allocation policies across hosts in a DRS cluster by setting different share values for individual virtual machines?

Test Methodology
This section describes the types of benchmark workloads used, the resources they demanded, and the metrics used to test the impact of resource allocation on DRS.

We set up a homogeneous cluster of hosts, each configured with two 2.8GHz Intel Xeon CPUs with hyperthreading enabled. Each host had 4GB of memory. We used a dedicated Gigabit Ethernet network for VMotion and configured DRS to run at the default migration threshold.

For this first set of tests, we used benchmarks to reflect realistic workloads. The stable period for these loads was an hour, during which we took measurements. Table 1 lists these workloads.
Table 1. Workloads for Resource Allocation Tests

Virtual Machine Workload    Benchmark      Resource Demand                      Metric
Database server             Swingbench     CPU: 850MHz; Memory: 700MB           Transactions/sec
Web server                  SPECweb2005    Network I/O: 30Mb/s; CPU: 1000MHz    Accesses/sec
File server                 Dbench         Disk I/O: 120Mb/s; CPU: 550MHz       MB/sec
Java server                 SPECjbb2005    Memory: 1.7GB; CPU: 550MHz           New orders/sec
Idle virtual machine        None           None                                 None

Test Results
To verify that DRS can automatically reduce the loads on an overcommitted host, we ran the five workloads shown in Table 1, each in its own virtual machine, on one host. All virtual machines had equal shares and no reservations or limits. Figure 1 shows this case. Each circle represents one virtual machine, with different letters representing the different workloads, and the boxes representing separate physical hosts. The size of each circle represents how many shares are given to that virtual machine, not the CPU or memory resource demands of the virtual machine. Details on the resource demands are listed in Table 1.

Tests for Load Balancing with Increase in Available Resources


We used a mix of virtual machines that demanded more CPU resources than were available on the host on which we placed them. We measured the throughput for each virtual machine as we added or powered on a new host in the cluster.


The virtual machine workloads in Figure 1 and subsequent diagrams are identified using the following key:

D: database server

F: file server

I: idle

J: Java application server

W: Web server

Figure 1. Initial Workload Distribution with DRS Disabled
(Diagram: all five workload virtual machines on one host; a second host sits idle.)

Figure 2. Workload Distribution with DRS Enabled

When we enabled DRS, it moved some of the load on the overcommitted host to the idle host, as shown in Figure 2. Using the default migration threshold (moderate), DRS improved overall system throughput (the geometric mean of the individual throughput improvements) by 27 percent. These results confirm that the simple balancing provided by DRS can automatically bring significant improvements in throughput without the need to place virtual machines manually.

CPU-intensive loads improved significantly because system CPU resources were overcommitted before we enabled DRS. Figure 3 shows the normalized throughput improvement for the individual virtual machines used in this test. The database server throughput improved by 25 percent because more CPU resources were available after we enabled DRS. Web server throughput also improved significantly: the Web server runs a network-intensive load, and it benefited from DRS because the network I/O causes increased CPU load. On the other hand, the Java application server throughput remained the same, because the Java server workload is memory intensive and the system did not experience any memory contention during the test. We did not use a file server workload for this test, because the file server demands a relatively low level of CPU resources. Instead, we used a second instance of the Web server workload, which requires more CPU resources than the file server workload does.


Figure 3. Throughput Normalized to Baseline Individual Workload Throughput
(Bar chart: normalized throughput, 0.0 to 1.2, for the DB server, Java server, Web server 1, and Web server 2, with DRS disabled and with DRS enabled.)

Tests for Robust Response to Highly Imbalanced Cluster


To further validate the robustness of the DRS algorithms, we then ran 10 workloads, two of each of the loads described in Table 1, on two separate ESX hosts. Once again, we gave all virtual machines equal shares and no reservations or limits. Figure 4 depicts the scenarios that we attempted to validate during this test.
Figure 4. Distribution of Workloads When DRS Aggressiveness Is Adjusted
(Diagram of four cluster states: Scenario 1, the baseline (optimal) placement; Scenario 2, the placement before DRS is enabled; Scenario 2a, the placement after DRS with the moderate threshold; and Scenario 2b, the placement after DRS with the aggressive threshold.)

Scenario 1 shows the baseline, or balanced-throughput, configuration. We chose this as the balanced arrangement because putting one of each virtual machine per host balances the CPU, memory, and I/O demands evenly while ignoring locality or caching effects. We used this as our baseline and expected DRS to achieve a throughput close to this.

Scenario 2 depicts the worst possible configuration. We started with this configuration to see how DRS balances the load of these virtual machines.

We performed the test using the default (moderate) and aggressive migration threshold settings to test whether the resource allocation that DRS generates can match the baseline configuration. The worst possible configuration could not be sustained on one host without failures in the benchmark metrics because of resource constraints, so we inserted a delay between the start of one virtual machine and the next to allow DRS to balance resources as we added each new load to the first host.


Our results showed that the virtual machine placement as well as the system throughput with DRS was very close to that of the baseline configuration. The overall system throughput was 95 percent of the baseline configuration for DRS with the moderate threshold and 99 percent of the baseline with the aggressive threshold setting. This includes the impact on throughput caused by the virtual machine migrations needed to get to the final state of the cluster. Because the loads were constant, the final state was achieved quickly after the loads reached a stable state, and no further migrations were made.

For the test with the default threshold setting (moderate), DRS performed four migrations; that is, it moved one Web server, one Java server, and the two database servers.

The test using the aggressive threshold yielded an almost baseline configuration. This test performed more migrations, a total of eight, to get to this state. The number of migrations reflects the fact that DRS load balancing occurs while the load is ramping up, and hence in the more aggressive case some migrations were reverted after all loads reached steady state. The idle virtual machine was not migrated because the cost of migrating it could not be justified, given that there was no load. Because the loads are relatively stable over time, we obtained better throughput with the aggressive threshold. However, the default setting in DRS is moderate in order to accommodate changing workloads.

Figure 5 compares the system throughput for the individual workloads in the baseline case, using DRS with the moderate threshold, and using DRS with the aggressive threshold. We normalized all throughput results to the baseline case for easier comparison and measured the results over a stable period of an hour.

In the moderate case, all individual throughputs are within 80 percent of baseline and overall throughput is within 95 percent (geometric mean of all the relative throughputs) of baseline. This is true despite the facts that DRS has an opportunity to rebalance only once every five minutes (the default) and that it is affected by VMotion overhead. When we used DRS with the moderate threshold, the database servers were the most affected because DRS placed both database virtual machines on the same host. Migrating one of the database loads to a different host was not justified under the moderate threshold because the imbalance across hosts was not great enough. Some workloads achieved better throughput than the baseline because additional resources were available on the host on which the workload was running.

In the more aggressive case, the results are different. Because each workload was fairly constant, over time the aggressive rebalancing by DRS produced better throughput.

If the loads vary, DRS with a moderate threshold may achieve better throughput in the long run because it recommends fewer migrations and hence incurs a lower VMotion cost than DRS with an aggressive threshold.

When we increased the number of hosts, as described in Scaling the Number of Hosts and Virtual Machines on page 9, DRS with a moderate threshold achieved throughput comparable to that of DRS with an aggressive threshold and did so with fewer migrations.
Figure 5. Relative Change in Throughput of Workloads
(Bar chart: throughput relative to baseline, 0.0 to 1.2, for File server 1 and 2, DB server 1 and 2, Java server 1 and 2, and Web server 1 and 2, comparing the baseline, DRS moderate (default), and DRS aggressive configurations.)

Enforcing Resource Allocation Policies


In a VMware Infrastructure 3 environment, you can allocate resources using absolute memory and CPU bounds (reservation and limit), proportional values (shares), or a combination of the two for the virtual machines and resource pools in the cluster. A reservation enforces a guaranteed minimum for resources and a limit enforces a hard maximum. Share settings can be low (1), normal (2), high (4), or custom. You can assign different share values for CPU, memory, or both to ensure proportional resource allocation. ESX provides resources while ensuring the allocation does not violate the assigned reservations and limits (in Hz or bytes).
To test the effectiveness of DRS in combination with resource allocation features, we ran various types of workload virtual machines with different share values. Figure 6 shows the nine virtual machines running on two hosts before DRS is enabled. The size of each circle in this figure reflects the number of static shares assigned to the workload, not the actual resource usage of each workload. Hence, workloads with different resource usage could have circles of the same size. The shares are assigned in the ratio 1:2:4. We located all low-share (1) virtual machines on the same host at the start of the test and located the virtual machines with normal (2) and high (4) shares on a different host.
Figure 6. Initial Distribution of Virtual Machines with High Resource Contention
(Diagram: nine virtual machines on two hosts; circle size reflects assigned shares in the ratio 1:2:4.)

We assigned the three Web server virtual machines high, normal, and low shares. We placed the two Web servers with high and normal shares on the same host, hence they shared resources from one host. The virtual machine with the low share setting had no other Web server to contend with, hence it received relatively more CPU resources because it encountered less contention.
After we enabled it, DRS balanced the workloads to give the final configuration shown in Figure 7. DRS detected that the Web server with high shares was not receiving the resources it was entitled to receive. In fact, as shown in Figure 8, it was receiving a lower allocation than the virtual machine configured with a low share was receiving. DRS swapped the Web server virtual machines with low and high shares. This allowed the Web server configured with a high share to take advantage of the resources available on the other host. The swap also freed resources for the other virtual machines with high shares. Thus, DRS balanced the Web server loads across the two hosts and satisfied the overall system shares for all nine virtual machines.
Figure 7. Balanced Distribution of Virtual Machines After DRS Swaps Two Web Servers
(Diagram: the nine virtual machines after DRS swapped the low-share and high-share Web servers between the two hosts.)

Figure 8. Web Server Throughput
(Bar chart: throughput, 0 to 1800, for Web server 1 (high), Web server 2 (normal), and Web server 3 (low), without DRS and with DRS.)

Figure 8 shows the change in Web server virtual machine throughput before and after we enabled DRS. Before we enabled DRS, the Web server virtual machine with low shares achieved twice the throughput of the Web server virtual machine with normal shares. Without DRS, the scope of resource sharing is limited to a single host. Because the Web server virtual machines with high and normal shares ran on the same host, their respective throughput numbers reflected their relative shares as managed by the local ESX resource scheduler, which cannot schedule resources across hosts. However, after we enabled DRS, the virtual machine throughput numbers reflected the relative virtual machine shares for all virtual machines. DRS enforces resource sharing across hosts by controlling the host-level resource settings.

The idle and Java application workloads did not change because they do not have any significant CPU usage and there was no memory overcommitment or imbalance. Because CPU was the resource facing contention in this scenario, the Web server and database workloads were good candidates for migration. DRS migrated the Web server with high shares to the host where virtual machines were entitled to fewer resources, because DRS estimated that the migration would bring better balance at minimal migration cost. Later, DRS balanced CPU usage by moving the low-shares Web server virtual machine to the second host to further balance the entitlements.

Resource Allocation Recommendations


The following recommendations can help you improve resource allocation in your DRS clusters.

Shares take effect only when there is contention for resources. Hence, if CPU resources on a host are not fully committed, each virtual machine or resource pool on that host gets all the CPU resources it demands. Similarly, if resources on a DRS cluster are not fully committed, virtual machines or resource pools receive the resources they demand, so long as the following conditions are met:

The DRS cluster has no unapplied manual recommendations.

No virtual machine or resource pool limits are set to values below what the virtual machine workload demands.

No virtual machine or resource pool reservations are set in such a way that they prevent any balancing migration from an overcommitted host.

No affinity and anti-affinity rules are set in such a way that they prevent any balancing migration from an overcommitted host.

No host VMotion incompatibilities (such as CPU incompatibilities, lack of a shared virtual machine or VMotion network, or lack of a shared virtual machine datastore) prevent any balancing migration from an overcommitted host.
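As a rough illustration, the checklist above can be expressed as a function that reports which conditions might keep virtual machines from receiving their full demand. Every field and name here is hypothetical; the paper describes conditions VirtualCenter evaluates internally, not a scripting API.

```python
from dataclasses import dataclass

@dataclass
class VM:
    name: str
    demand_mhz: float
    limit_mhz: float = float("inf")  # no limit by default
    pinned_host: str = ""            # "" means no affinity rule applies
    vmotion_ok: bool = True          # shared network/datastore, compatible CPU

def demand_blockers(vms, pending_manual_recs=False):
    """Return human-readable reasons a VM might not get what it demands,
    following the checklist above (illustrative only)."""
    issues = []
    if pending_manual_recs:
        issues.append("cluster: unapplied manual recommendations")
    for vm in vms:
        if vm.limit_mhz < vm.demand_mhz:
            issues.append(f"{vm.name}: limit set below demand")
        if vm.pinned_host:
            issues.append(f"{vm.name}: affinity rule prevents migration")
        if not vm.vmotion_ok:
            issues.append(f"{vm.name}: VMotion incompatibility")
    return issues

vms = [VM("web", 1000.0, limit_mhz=800.0), VM("db", 850.0)]
print(demand_blockers(vms))  # ['web: limit set below demand']
```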


You might waste idle resources if you specify a resource limit for one or more virtual machines. The system does not allow virtual machines to use more resources than the limit, even when the system is underutilized and idle resources are available. Specify a limit only if you have specific reasons for doing so.

For resource pools, resource allocations are distributed among their sibling virtual machines or resource pools based on their relative shares, reservations, and limits.

The expandable reservation setting for resource pools allows the reservation to grow above its initial value if the reservations of the child virtual machines or resource pools increase. Setting a fixed reservation prevents any increase in the child virtual machine or resource pool reservations.

Virtual machines have some overhead memory that the system must account for in addition to the memory used by the guest operating system. This memory is charged as an additional reservation on top of the user-configured virtual machine reservation. Hence, when you set the reservations and limits for resource pools, you should keep a buffer for the extra overhead reservation needed by the virtual machines. See the Resource Management Guide for estimated memory overhead. (See Resources on page 19 for a link.) The amount depends on the virtual machine memory size, the number of virtual CPUs, and whether the virtual machine is running a 32-bit or 64-bit guest operating system. For instance, a virtual machine with one virtual CPU and 1024MB of memory has a virtual machine overhead of approximately 98MB for a 32-bit guest operating system or approximately 118MB for a 64-bit guest operating system. Also, the overhead reservation may grow with time depending on the workload running in the virtual machine. This overhead growth is designed to ensure optimal performance within the guest operating system.
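Using the overhead figures quoted above, the buffer needed for a pool-level reservation can be estimated as follows. The helper is our own sketch, covering only the 1-vCPU, 1024MB case quoted in the text; consult the Resource Management Guide tables for other configurations.

```python
# Approximate per-VM overhead for a 1-vCPU, 1024MB virtual machine,
# as quoted above; other configurations differ.
OVERHEAD_MB = {"32bit": 98, "64bit": 118}

def pool_reservation_mb(vm_reservations_mb, guest="32bit"):
    """Configured VM reservations plus per-VM overhead memory. A real
    deployment should also leave headroom for overhead growth over time."""
    overhead = OVERHEAD_MB[guest]
    return sum(r + overhead for r in vm_reservations_mb)

# Two 32-bit VMs reserving 512MB each need at least (512 + 98) * 2 = 1220MB
# at the pool level before any headroom is added.
print(pool_reservation_mb([512, 512]))  # 1220
```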

Resource pools make managing resource allocations much easier by grouping virtual machines, resource pools, or both within each resource pool. This allows administrators to apply settings across many resource entities easily and to build a hierarchical structure that makes it easy to allocate their resources. This paper does not address the use of resource pool hierarchies because they do not have any impact on virtual machine performance.

Scaling the Number of Hosts and Virtual Machines


Scaling the number of hosts and virtual machines affects the resource distribution and the load balancing opportunities available to DRS. Our testing examined the impact of the size of the cluster on the throughput of the system. Increasing the number of hosts in a cluster gives DRS more locations where it can place virtual machines, and consequently affects the output of the DRS algorithms. Furthermore, a system with more hosts normally has more virtual machines as well. Increasing the number of virtual machines means DRS can move more entities around. Each of these virtual machines might be running different workloads at different times. Increasing the number of hosts and virtual machines does increase the complexity of various DRS computations, but at the same time, the increase provides opportunity for more efficient resource distribution. However, as these opportunities increase, the number of migrations may increase and as a result may incur additional overhead.

We accounted for this overhead in our experiments in order to evaluate the scaling. Resource pools can simplify the task of allocating resources across the DRS cluster, especially when dealing with a large number of virtual machines and hosts. However, scaling the number of resource pools does not affect the performance of the virtual machines so long as the resource policies are effectively allocated identically. Hence, this section does not consider resource pool scaling.

Test Methodology
The objective of the test was to measure the effectiveness of the DRS algorithms as we scaled to larger environments, while ensuring that the overall cluster utilization remained constant; that is, that the ratio of overall virtual machine demand to total cluster resources remained constant.


We set up a homogeneous cluster of hosts, each configured with two 2.8GHz Intel Xeon CPUs with hyperthreading enabled. Each host had 4GB of memory. We dedicated a Gigabit Ethernet network for VMotion and configured DRS to run at the default migration threshold (moderate). Each host had three workload virtual machines, each with different workloads.
Table 2. Test Environment for Scaling Tests

Number of Hosts    Number of Virtual Machines
2                  6
4                  12
8                  24
16                 48

Each configuration ran a mix of the constant and varying workloads described below.

We configured each virtual machine with a single virtual CPU and 1GB of RAM and installed Red Hat Enterprise Linux 3 as the guest operating system. For simplicity, we gave each virtual machine the same shares and set no reservations or limits. We ran one of the following categories of workloads inside each virtual machine:

Constant workload: used a fixed amount of CPU resources, 10 percent, 50 percent, or 90 percent of CPU (of one virtual CPU).

Varying workload: used a varying amount of CPU resources, changing with time. For easy analysis, the workload varied among three steps, using 10 percent, 50 percent, or 90 percent of CPU, with each step lasting five minutes. We used all possible orderings of the three step loads to generate six different workload types, as shown in Figure 9. Our results with these varying workloads demonstrate that DRS performs effectively even if loads constantly change over time and no stable state is reached. As our tests showed, this is one of the main advantages of using DRS.
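The six varying workload types are simply the six orderings of the three load steps, which makes them easy to reproduce. A small sketch (names are ours):

```python
from itertools import permutations

STEP_LEVELS = (10, 50, 90)   # percent of one virtual CPU
STEP_MINUTES = 5

# All orderings of the three steps give the six varying workload types.
PATTERNS = list(permutations(STEP_LEVELS))

def demand_at(pattern, minute):
    """CPU demand of a varying workload at a given minute, cycling through
    its three five-minute steps."""
    return pattern[(minute // STEP_MINUTES) % len(pattern)]

print(len(PATTERNS))               # 6
print(demand_at((10, 50, 90), 7))  # 50 (second step, minutes 5 through 9)
```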

Figure 9. Patterns of Varying Workloads During Tests
(Six step patterns, each cycling through the 10, 50, and 90 percent load levels in a different order, with each step lasting five minutes.)

We distributed various combinations of the varying-workload virtual machines across hosts to achieve different patterns that reflect three host usage scenarios.


Figure 10. Workload on a Host with Optimal Placement
(Diagram: complementary virtual machine load patterns sum to a flat host load that stays below host capacity, leaving the host slightly underutilized at all times.)

In the optimal placement scenario, shown in Figure 10, we placed compensating loads together on each host. With this placement, all peaks matched the troughs of other virtual machines on the same host. As a result, none of the hosts was overcommitted and the load was balanced across all hosts in the cluster. The throughput achieved in this scenario is used as the baseline for optimal throughput in the tests that follow.
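The difference between the placements can be seen by summing the step patterns from Figure 9 on a host. A small sketch with three virtual machines per host, as in our setup:

```python
def host_load(vm_patterns):
    """Host load at each five-minute step: the sum of its VMs' demands."""
    return tuple(sum(step) for step in zip(*vm_patterns))

# Optimal placement: compensating patterns, so peaks fill troughs and the
# host load stays flat.
print(host_load([(10, 50, 90), (50, 90, 10), (90, 10, 50)]))  # (150, 150, 150)

# Bad placement: similar patterns, so peaks coincide and the host swings
# between nearly idle and heavily overcommitted.
print(host_load([(10, 50, 90)] * 3))  # (30, 150, 270)
```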
Figure 11. Workload on a Host with Bad Placement
(Diagram: similar virtual machine load patterns stack their peaks, so the host load alternates between overcommitted, above host capacity, and underutilized.)

In the bad placement scenario, shown in Figure 11, we placed similar workloads together on each host wherever possible. As a result, peaks and valleys occur together on a host, resulting in a highly imbalanced cluster. Consequently, the hosts are either overcommitted or highly undercommitted. This results in not only major load imbalance across hosts but also poor performance in the virtual machines running on overcommitted hosts. As shown in Table 2, we used increasing numbers of virtual machines with similar workloads as we increased the size of the cluster. As a result, we saw greater imbalance for a bad placement as the cluster scaled up.
Figure 12. Workload on a Host with Symmetric Placement
(Diagram: a partial mix of load patterns leaves the host moderately overcommitted at some steps and underutilized at others.)

In the symmetric placement scenario, the cluster is moderately imbalanced, with a mix of hosts that are somewhat overcommitted and hosts that are somewhat undercommitted. We call this symmetric placement because we used the same load distribution for each pair of hosts as we scaled up. As a result, the load imbalance remained the same for all stages of our tests.

With the smaller configurations, there are fewer workloads of the same type, and hence the load distribution cannot be as flexible as it is with the larger setups.


Test Results
Our results show that DRS improves host load balance across a cluster. The load balance improvement grows as the number of hosts and virtual machines in the cluster increases. Furthermore, throughput also improves as the number of hosts and virtual machines increases. DRS performs slightly better with more hosts because there are more opportunities for balancing loads. As the size of the cluster increased, the number of migrations increased proportionally, but DRS ensured that the migrations would benefit throughput, hence the average virtual machine throughput improved. DRS did not affect throughput when the virtual machines were placed in the optimal configuration, because the cluster was already balanced. The key results presented in this section highlight these findings.

Figure 13 shows the results when virtual machines were distributed in a bad placement, that is, a highly imbalanced cluster at the beginning of the experiment, with each of the virtual machine workloads varying every five minutes. Because DRS computed loads and options for balancing the loads every five minutes, this was a worst-case workload: soon after DRS finished balancing the load, the virtual machine loads changed.

As shown in Figure 13, DRS improves virtual machine throughput even when it must balance workloads that vary greatly. At all environment sizes, the average virtual machine throughput stayed within 96 percent of the optimal throughput. DRS achieved these results despite the fact that it was using VMotion, with its attendant overhead, to balance the load. Furthermore, the throughput improvement compared to the initial condition, without DRS, increased as we scaled to larger environments. This result mainly reflects the fact that as we scaled up, the number of similar workloads increased. The higher number of similar workloads, in turn, caused greater imbalance with bad placement in larger environments.

Figure 13 shows the results when we set DRS to the default aggressiveness (moderate). When we set DRS to a more aggressive threshold, we observed many more migrations, but throughput improvements were similar to those achieved with the moderate threshold. In a separate test, in which the virtual machines were running constant loads, we observed even better throughput improvements.
Figure 13. Throughput with Varying Load and Bad Placement
(Bar chart: average virtual machine throughput, 0.82 to 1.02, for clusters of 2 hosts/6 VMs, 4 hosts/12 VMs, 8 hosts/24 VMs, and 16 hosts/48 VMs, comparing no DRS, DRS moderate, and optimal placement.)

Figure 14 shows the results when we distributed virtual machines in a symmetric placement, that is, a somewhat imbalanced cluster at the beginning of the test, with each of the virtual machine workloads varying every five minutes. As Figure 14 shows, the initial condition, without DRS, achieves the same virtual machine throughput at all environment sizes because the virtual machine distribution is symmetric. In each size environment, when we enabled DRS at the moderate threshold, it brought the average throughput to within 98 percent of optimal. Increasing the environment size brought minimal improvement in throughput, and the main reason optimal throughput was not achieved was the overhead of VMotion as DRS balanced the load. Overall, DRS achieved some performance gains by taking advantage of smaller opportunities for avoiding overcommitment.

Figure 14. Throughput with Varying Load and Symmetric Placement
(Bar chart: average virtual machine throughput, 0.82 to 1.02, for clusters of 2 hosts/6 VMs, 4 hosts/12 VMs, 8 hosts/24 VMs, and 16 hosts/48 VMs, comparing no DRS, DRS moderate, and optimal placement.)

DRS Aggressiveness
The degree of aggressiveness for DRS controls how aggressively DRS acts to balance resources across hosts in a cluster. The degree of aggressiveness is set for the entire cluster. The five options range from conservative (level one) to moderate (level three) to aggressive (level five) and control the degree to which DRS tries to balance resources across hosts within the cluster.

DRS makes VMotion recommendations based on what is referred to as their goodness value, that is, how much improvement in load balancing the action can achieve. The goodness value is translated to a rating between one star and five stars, and DRS executes migrations based on the aggressiveness threshold set for the cluster. In VMware Infrastructure 3, a migration receives a goodness value of five stars if it is recommended to resolve affinity, anti-affinity, or reservation rules. For each lower star rating, the balancing impact of the migration is less. So a migration with a rating of four stars would improve the load balance more than a migration with a rating of three stars, two stars, or one star.
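The relationship between the aggressiveness level and the star ratings can be pictured as a simple filter. The level-to-minimum-stars mapping below is our illustration of the behavior described here, not code from DRS itself.

```python
def accepted_migrations(recommendations, aggressiveness):
    """Keep recommendations whose star rating meets the cluster threshold.
    Level 1 (conservative) accepts only five-star moves, such as those that
    resolve affinity or reservation rules; level 5 (aggressive) accepts
    everything down to one star."""
    min_stars = 6 - aggressiveness
    return [(vm, stars) for vm, stars in recommendations if stars >= min_stars]

recs = [("web1", 5), ("db1", 3), ("java1", 2)]
print(accepted_migrations(recs, 3))  # [('web1', 5), ('db1', 3)]
print(accepted_migrations(recs, 1))  # [('web1', 5)]
```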
Figure 15 shows how throughput compares for the moderate and aggressive levels for the mix of constant-load virtual machines and bad placement described in Scaling the Number of Hosts and Virtual Machines on page 9. The results show that the aggressive threshold achieved slightly better throughput than the moderate setting. However, as we scaled to larger configurations, the throughput was almost the same for both settings. When we used the aggressive setting, we saw more migrations than we did with the moderate setting. When we used varying workloads, not only were there more migrations in the aggressive case, but the throughput differences between the moderate and aggressive settings were even smaller.


Figure 15. Throughput for Constant Load and Bad Placement
(Bar chart: average virtual machine throughput, 0.82 to 1.02, for clusters of 2 hosts/6 VMs, 4 hosts/12 VMs, 8 hosts/24 VMs, and 16 hosts/48 VMs, comparing no DRS, DRS moderate, DRS aggressive, and optimal placement.)

Interval Between DRS Invocations


VirtualCenter activates the DRS algorithm at a fixed interval (the default is five minutes) so DRS can make recommendations based on the past performance metrics of the virtual machines in the DRS cluster. This section discusses the impact on the DRS algorithm of changing this interval. The DRS calculations are also invoked between the regularly scheduled invocations if there are changes that require DRS decisions, for example, if you change resource allocation settings. Our discussion ignores these intermediate invocations because they depend on the frequency with which you perform administrative operations in the cluster.

Our results show that choosing the appropriate DRS invocation frequency depends on the periodicity of the workloads within the virtual machines in the cluster. Fairly constant loads may benefit from frequent DRS invocations so that immediate action is taken when the load does change. However, we do not recommend an interval of less than five minutes because of the way the algorithm uses statistics. We observed no notable benefit when we increased the frequency to more than once every five minutes, and running the DRS algorithm very often causes unnecessary overhead. In addition, an interval of less than five minutes might be too aggressive given that VMotion migrations can take on the order of tens of seconds to complete, depending on the load running in the virtual machine. (The guest operating system runs normally for most of the VMotion operation and generally incurs a downtime of less than one second.)

If you notice very frequent migrations, you might feel the need to increase the invocation interval. However, before doing so, you should check the DRS aggressiveness setting. Decreasing the aggressiveness might be the better choice. When a high number of migrations is recommended in one DRS invocation, this is usually the result of an aggressive threshold setting.

If a cluster includes virtual machines with memory-intensive workloads, an infrequent invocation setting might not allow DRS to respond to memory pressure soon enough, and the virtual machines might start ballooning or swapping unnecessarily. This can significantly degrade performance. However, because an increase in memory use is typically a more gradual process, keeping the default of five minutes does not affect performance for a majority of workloads. DRS also prioritizes memory-balancing migrations over CPU-balancing migrations when memory commitment is high, in order to prevent unnecessary ballooning or swapping.


Although we recommend keeping the default value of five minutes, you can change the frequency by adding
the following option in the vpxd.cfg file, changing 300 to the desired number of seconds:

<config>
  ...
  <drm>
    <pollPeriodSec>
      300  <!-- number of seconds, between 60 and 3600 -->
    </pollPeriodSec>
  </drm>
  ...
</config>

Cost-Benefit Analysis
Cost-benefit calculations act as a pivot in determining whether a single migration across hosts is beneficial.
Using the history of virtual machine workloads, DRS determines the expected workload of the virtual
machines on each host during the interval between its calculation and the next time DRS will be invoked. Before
migrating any virtual machine, it calculates the gain achieved by migrating the virtual machine minus the
expected migration cost (VMotion requires resources) and accounts for the potential worst-case workload of
the virtual machine after it migrates to the destination host. The migration is allowed only if the total gain is
positive.

Virtual machines with highly varying workloads can benefit from cost-benefit analysis by not migrating
often, but only when DRS calculates that the target virtual machine and other affected virtual machines will
benefit from the move before DRS would be invoked again. We tested several scenarios in which cost-benefit
analysis is either turned on or turned off while the cluster runs varying workloads. The results demonstrate
that with cost-benefit analysis enabled, DRS makes better decisions. We observed very few unnecessary
migrations (unnecessary migrations include ping-pong migrations back and forth between two hosts as load
changes) when cost-benefit analysis was enabled. Redundant migrations might occur in early rounds, as we
observed in our tests, because DRS needs to adapt to understand the migration costs, which vary for particular
virtual machines and network environments. We recommend that you leave the cost-benefit algorithm
enabled (the default setting).
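As a rough sketch of the decision rule described above (the function name and numeric values are hypothetical and not VMware's implementation), a migration passes the cost-benefit gate only when its expected gain outweighs the migration cost plus a worst-case allowance:

```python
# Illustrative sketch (assumed names and values, not the actual DRS code):
# a single proposed migration is allowed only if the expected load-balance
# gain over the next invocation interval exceeds the VMotion cost, even
# after allowing for a worst-case post-move workload.

def migration_allowed(expected_gain, migration_cost, worst_case_penalty):
    """Return True only when the net gain of the move is positive."""
    net_gain = expected_gain - migration_cost - worst_case_penalty
    return net_gain > 0

# A clearly beneficial move...
print(migration_allowed(expected_gain=0.30, migration_cost=0.05,
                        worst_case_penalty=0.10))  # True
# ...and one whose gain does not cover the migration cost.
print(migration_allowed(expected_gain=0.08, migration_cost=0.05,
                        worst_case_penalty=0.10))  # False
```

The second call illustrates the ping-pong case the text mentions: a small expected gain that cannot cover the migration cost is rejected, so the virtual machine stays put until the imbalance is large enough to be worth fixing.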

Heterogeneous Host Environments


Heterogeneous host environments introduce additional complexities that can affect the performance of a DRS
cluster. This section covers several issues to consider, such as VMotion compatibility and architectural
differences.

If you are configuring a DRS cluster with heterogeneous hosts, be sure to consider the following factors:

VMotion hardware compatibility: Virtual machines cannot use VMotion to migrate across different CPU
types (from Intel to AMD or vice versa). There may also be CPU incompatibilities across different
generations from the same manufacturer that prevent VMotion, for example, between CPUs that are
SSE-enabled and those that are not. However, in VMware Infrastructure 3 beginning with
version 3.5 Update 2, Enhanced VMotion Compatibility gives you the ability to ensure VMotion
compatibility for the hosts in the cluster by presenting the same CPU feature set across hosts, even when
the actual CPUs on the hosts differ. See "VMware VMotion and CPU Compatibility" for details. (See
"Resources" on page 19 for a link.)

Hardware differences across compatible hosts: Even if hosts are VMotion compatible, they may be
running at different clock speeds, use different hyperthreading settings, or have different memory
subsystem architectures (for example, different cache sizes, or NUMA enabled on one and disabled on
the other). As a result, the effective resource entitlements of virtual machines might be different on
different hosts, and these differences might lead to differences in performance across hosts.


Highly skewed host sizes:

Virtual machines might not fit on all hosts. For example, virtual machines with four virtual CPUs
cannot migrate to hosts with only two physical CPUs.

DRS tends to favor migrating virtual machines to a host with more memory or CPU resources to
balance memory or CPU loads. Essentially, the algorithm tries to balance the relative utilizations for
each host. So if a cluster has virtual machines with identical loads, DRS typically places more virtual
machines on a host that has more CPU or memory resources. If the virtual machine loads are
primarily CPU-intensive, the host memory sizes do not really matter. Conversely, if the loads are
memory-intensive, CPU capacities do not matter.

Host configuration: The configuration of the host network or datastore may disallow migration using
VMotion because not all hosts share the same virtual machine network, VMotion network, or datastore.

DRS does a reasonable job of balancing load across heterogeneous hosts so long as the cluster is made up of a
subset of hosts that are VMotion compatible. The factors discussed in this section may limit the benefits. We
recommend a homogeneous cluster wherever possible.
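The relative-utilization balancing described above explains why identical virtual machines end up distributed roughly in proportion to host capacity. The following Python sketch (hypothetical capacities and a naive greedy placement, not the actual DRS algorithm) illustrates the effect:

```python
# Hypothetical sketch (not the actual DRS algorithm): balancing *relative*
# utilization (demand / capacity) naturally places more identical VMs on
# the host with more capacity.

def relative_utilization(demands_mhz, capacity_mhz):
    """Total CPU demand on a host as a fraction of its capacity."""
    return sum(demands_mhz) / capacity_mhz

# Two hosts, one with twice the CPU capacity of the other (assumed MHz).
small_host = {"capacity": 8000, "vms": []}
large_host = {"capacity": 16000, "vms": []}

# Greedily place 12 identical 1000 MHz VMs on whichever host would end
# up with the lower relative utilization after the placement.
for _ in range(12):
    target = min(
        (small_host, large_host),
        key=lambda h: relative_utilization(h["vms"] + [1000], h["capacity"]),
    )
    target["vms"].append(1000)

print(len(small_host["vms"]), len(large_host["vms"]))  # 4 8
```

The larger host ends up with twice as many virtual machines, yet both hosts finish at the same relative utilization (5000/8000 versus 10000/16000), which is the balance the text describes.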

Impact of Idle Virtual Machines


Based on the tests we performed in our labs with DRS, in environments ranging from small to large, idle
virtual machines were generally not migrated because they did not use significant resources. However, in an
imbalanced DRS cluster with aggressive mode, a large number of idle virtual machines per host, or both, DRS
would migrate the idle virtual machines because they have some memory overhead. Furthermore, some
operating systems use CPU cycles for idling, for example, for delivering guest timer interrupts, and this can
also affect DRS migration recommendations.

DRS Performance Best Practices


Clustering configurations can have a significant impact on DRS performance. VMware recommends the
following DRS configurations and practices for optimal performance:

When deciding which hosts to group into a DRS cluster, try to choose hosts that are as homogeneous as
possible in CPU and memory. This ensures higher performance predictability and stability. VMotion is
not supported across hosts with incompatible CPUs. Hence, with heterogeneous systems that have
incompatible CPUs, DRS is limited in the number of opportunities for improving the load balance across
the cluster. To ensure CPU compatibility, systems should be configured with CPUs from the same vendor,
with similar CPU family and SSE3 status. However, in VMware Infrastructure 3 beginning with version
3.5 Update 2, Enhanced VMotion Compatibility gives you the ability to ensure VMotion compatibility for
the hosts in the cluster by presenting the same CPU feature set across hosts, even when the actual CPUs
on the hosts differ. See "VMware VMotion and CPU Compatibility" for details. (See "Resources" on
page 19 for a link.)

DRS performs initial placement of virtual machines across all hosts even if they are not VMotion
compatible.

If heterogeneous systems do have compatible CPUs but have different CPU frequencies, differing
amounts of memory, or both, the systems with more memory and higher CPU frequencies are generally
preferred, all other things being equal, as hosts for virtual machines, because they have more room to
accommodate peak load.

Using machines with different cache or memory architectures may cause some inconsistency in the
performance of virtual machines across those hosts.

When more ESX hosts in a DRS cluster are VMotion compatible, DRS has more choices to better balance
workloads across the cluster. We recommend clusters of up to 32 hosts.


Besides CPU incompatibilities, some misconfigurations can make two or more hosts incompatible. For
instance, if the hosts' VMotion network adapters are not connected by a Gigabit Ethernet link, the
VMotion migration might not occur between the hosts. You should also be sure the VMotion gateway is
configured correctly, the VMotion network adapters on the source and destination hosts have compatible
security policies, and the virtual machine network is available on the destination host. See the Resource
Management Guide and the ESX Server 3 Configuration Guide for details. (See "Resources" on page 19 for
links.)

The default migration threshold (moderate) works for most configurations. You can set the migration
threshold to more aggressive levels when all of the following conditions are satisfied:

The hosts in the cluster are relatively homogeneous.

The virtual machines' resource utilization remains fairly constant.

The cluster has relatively few constraints on where a virtual machine can be placed.

You should set the migration threshold to more conservative levels when the converse is true.

The default DRS frequency is once every five minutes, but you can set it to any period between one and
60 minutes. You should avoid changing the default value. If you are considering a change in the setting,
see "Interval Between DRS Invocations" on page 14.

In general, do not specify affinity rules unless you have a specific need to do so. In some cases, however,
specifying affinity rules can improve performance.

Keeping virtual machines together can improve performance if the virtual machines need to
communicate with each other, because network communication between virtual machines on the
same host enjoys lower latencies.

Separating virtual machines maintains maximal availability of the virtual machines.
For example, if two virtual machines are both Web server front ends to the same application, you
want to make sure that they do not both go down at the same time.
Another example of virtual machines that might need to be separated is virtual machines with
I/O-intensive workloads. If they share a single host, they might saturate the host's I/O capacity,
leading to performance degradation. DRS does not make virtual machine placement decisions based
on their usage of I/O resources.

Assign resource allocations to virtual machines and resource pools carefully. Be mindful of the impact of
limits, reservations, and virtual machine memory overhead. See "Resource Allocation Recommendations"
on page 8 for details.

Virtual machines with smaller memory sizes or fewer virtual CPUs provide more opportunities for DRS
to migrate them in order to improve balance across the cluster. Virtual machines with larger memory sizes
or more virtual CPUs add more constraints on migration. Hence, you should
configure only as many virtual CPUs and as much memory for a virtual machine as needed.

You can specify DRS modes of automatic, manual, or partially automated at the cluster level as well as at the
virtual machine level. We recommend that you keep the cluster in automatic mode. For virtual machines
sensitive to VMotion, you can set manual mode at the virtual machine level. This setting allows you to
decide if and when the virtual machine can be migrated. Keep virtual machines in DRS automatic mode
as much as possible, because virtual machines in automatic mode are considered for cluster load balance
migrations across ESX hosts before virtual machines that cannot be migrated without user action.

Monitoring DRS Cluster Health


In VMware Infrastructure 3, you can monitor a DRS cluster's health by tracking resource utilization of the
cluster, load distribution across hosts, and whether the virtual machines are receiving the resources to which
they are entitled. This information is displayed in two graphs that are part of the cluster summary display.


Host Utilization Percentage Graph


The host utilization graph, the top graph in the VMware DRS Resource Distribution section, is a histogram that
shows the number of hosts on the Y axis and the utilization percentage on the X axis. If the cluster is
unbalanced, you see multiple bars, corresponding to different utilization levels. The blue bars represent CPU
usage and the orange bars represent memory usage. For example, in the screen capture below, two hosts are at 0-10
percent CPU utilization and the third is at 80-90 percent CPU utilization, each represented by a blue bar. In this cluster, DRS
is in manual mode (moderate aggressiveness) and is making a recommendation. After the recommendation is
applied, the blue bars move closer to each other on the X axis. The closer the blue and orange bars are to each
other, the more balanced the cluster is.
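The histogram's bucketing can be sketched as follows (a minimal illustration with made-up utilization values, not the VirtualCenter code):

```python
# Minimal sketch (hypothetical data): group hosts into 10-percent
# utilization buckets the way the host utilization histogram does.
# A balanced cluster collapses into a few adjacent bars.

from collections import Counter

def utilization_histogram(host_utils_pct):
    """Map each host to its 10% bucket, e.g. 86 -> '80-90'."""
    buckets = Counter()
    for u in host_utils_pct:
        lo = min(int(u // 10) * 10, 90)   # clamp 100% into the 90-100 bar
        buckets[f"{lo}-{lo + 10}"] += 1
    return dict(buckets)

# Unbalanced cluster: two nearly idle hosts and one heavily loaded host.
print(utilization_histogram([4, 7, 86]))    # {'0-10': 2, '80-90': 1}
# After rebalancing, all three hosts land in the same bucket.
print(utilization_histogram([31, 34, 36]))  # {'30-40': 3}
```

The first call reproduces the situation the text describes (two hosts in the 0-10 percent bar, one in the 80-90 percent bar); after rebalancing, the bars converge.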

Percent of Entitled Resources Delivered Graph


The entitled resources delivered graph, the bottom graph in the VMware DRS Resource Distribution section,
is a histogram that shows the number of hosts on the Y axis and the percentage of entitled resources delivered
for each host on the X axis. The top graph reports raw resource utilization values. The bottom graph
incorporates additional information about resource settings for virtual machines and resource pools.

DRS computes a resource entitlement for each virtual machine, based on virtual machine and resource pool
configured shares, reservations, and limits settings, as well as the current demands of the virtual machines and
resource pools. What a virtual machine demands is not necessarily what it deserves or is entitled to, because
of the virtual machine and resource pool configurations.
After computing the entitlements, DRS computes a resource entitlement for each host by adding the resource
entitlements for all virtual machines running on that host. The percentage of entitled resources delivered is
equal to the host's capacity divided by the entitlements of the virtual machines. This means that in a cluster in
which all the entitled resources are delivered, the graph should have a single bar for each resource in the
90-100 percent histogram range. The screenshot in the previous section shows that even though the DRS
cluster was not balanced, each of the virtual machines got the resources to which it was entitled.

However, there may be cases in which hosts do not deliver all entitled resources. For example, in the screen
capture below, one of the hosts could deliver only 80-90 percent of entitled resources because CPU resources
were overcommitted on the host.

In summary, if the bars in this graph appear in the rightmost category, all virtual machines are getting the
resources to which they are entitled.
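The per-host computation described above can be sketched as follows (the MHz values are assumed, and capping the ratio at 100 percent is an assumption made for the illustration, since the graph's rightmost bucket is 90-100 percent):

```python
# Hedged sketch (assumed values, not the VirtualCenter implementation):
# a host's entitlement is the sum of its VMs' entitlements, and the
# delivered percentage compares host capacity to that sum, capped at
# 100 percent for this illustration.

def percent_entitled_delivered(host_capacity_mhz, vm_entitlements_mhz):
    host_entitlement = sum(vm_entitlements_mhz)
    if host_entitlement == 0:
        return 100.0   # nothing is entitled, so everything is delivered
    return min(100.0, 100.0 * host_capacity_mhz / host_entitlement)

# Host with headroom delivers everything its VMs are entitled to.
print(percent_entitled_delivered(8000, [2000, 2500, 1500]))  # 100.0
# Overcommitted host: 9600 MHz entitled on 8000 MHz of capacity,
# so it lands in the 80-90 percent bucket (about 83.3).
print(percent_entitled_delivered(8000, [4800, 4800]))
```

The second host is the overcommitted case the text describes: its bar would fall in the 80-90 percent range rather than the rightmost category.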


Conclusions
The effectiveness and scalability tests we performed with DRS demonstrate that resources are distributed
across hosts in a DRS cluster according to the resource allocations and that better throughputs are achieved,
especially in the presence of varying loads and random virtual machine placement on hosts. The DRS
framework also provides the ability to adjust the aggressiveness of the DRS algorithm and the interval between
invocations of the algorithm. In addition, DRS avoids wasteful migrations with cost-benefit analysis and
minimizes migration of idle virtual machines.

Based on our experience and customer experience in the field, we provided a list of best practices that you can
follow to avoid pitfalls.

Finally, DRS also provides a few mechanisms you can use to monitor DRS cluster performance and
potentially provide feedback on how the resources are being delivered and whether the DRS settings need to
be adjusted.

Resources

ESX Server 3 Configuration Guide
http://www.vmware.com/pdf/vi3_35/esx_3/r35/vi3_35_25_3_server_config.pdf

Resource Management with VMware DRS
http://www.vmware.com/resources/techresources/401

Resource Management Guide
http://www.vmware.com/pdf/vi3_35/esx_3/r35/vi3_35_25_resource_mgmt.pdf

VMware VMotion and CPU Compatibility
http://www.vmware.com/resources/techresources/1022

If you have comments about this documentation, submit your feedback to: [email protected]
VMware, Inc. 3401 Hillview Ave., Palo Alto, CA 94304 www.vmware.com
Copyright 2008 VMware, Inc. All rights reserved. Protected by one or more of U.S. Patent Nos. 6,397,242, 6,496,847, 6,704,925, 6,711,672, 6,725,289, 6,735,601, 6,785,886,
6,789,156, 6,795,966, 6,880,022, 6,944,699, 6,961,806, 6,961,941, 7,069,413, 7,082,598, 7,089,377, 7,111,086, 7,111,145, 7,117,481, 7,149,843, 7,155,558, 7,222,221, 7,260,815,
7,260,820, 7,269,683, 7,275,136, 7,277,998, 7,277,999, 7,278,030, 7,281,102, 7,290,253, and 7,356,679; patents pending. VMware, the VMware boxes logo and design,
Virtual SMP and VMotion are registered trademarks or trademarks of VMware, Inc. in the United States and/or other jurisdictions. All other marks and names mentioned
herein may be trademarks of their respective companies.
Revision 20090109 Item: PS-060-PRD-01-01

