VMware - DRS Best Practice
VMware Infrastructure 3 provides a set of distributed infrastructure services that make the entire IT environment more serviceable, available, and efficient. Working with VMware ESX 3, VMware VirtualCenter 2, and VMware VMotion, VMware Distributed Resource Scheduler (DRS) dynamically allocates resources to enforce resource management policies while balancing resource usage across multiple ESX hosts.

This paper is intended for readers who understand the architecture and basic concepts of DRS. See the paper Resource Management with VMware DRS for these details. (See Resources on page 19 for a link.) This paper identifies various scenarios in which you can benefit from DRS and explains how to configure your environment to take best advantage of DRS.
The paper covers the following topics:
- Overview of DRS Performance on page 1
- Resource Allocation for Virtual Machines and Resource Pools on page 3
- Scaling the Number of Hosts and Virtual Machines on page 9
- DRS Aggressiveness on page 13
- Interval Between DRS Invocations on page 14
- Cost-Benefit Analysis on page 15
- Heterogeneous Host Environments on page 15
- Impact of Idle Virtual Machines on page 16
- DRS Performance Best Practices on page 16
- Monitoring DRS Cluster Health on page 17
- Conclusions on page 19
- Resources on page 19
The following factors affect DRS performance:
- Resource allocation for virtual machines and resource pools
- Number of hosts and virtual machines
- Degree of aggressiveness
- DRS invocation frequency
- Heterogeneous host environments
- Cost-benefit analysis of VMotion
- Impact of idle virtual machines
Our tests address the following questions:
- Can DRS automatically reduce the load on an overcommitted host when additional resources are added or available in the cluster, and consequently also improve the throughput for the workloads?
- How does DRS respond to a highly imbalanced cluster? How does DRS aggressiveness affect the placement of virtual machines?
- Can DRS effectively enforce resource allocation policies across hosts in a DRS cluster by setting different share values for individual virtual machines?
Test Methodology
This section describes the types of benchmark workloads used, the resources they demand, and the metrics used to test the impact of resource allocation on DRS.

We set up a homogeneous cluster of hosts, each configured with two 2.8GHz Intel Xeon CPUs with hyperthreading enabled. Each host had 4GB of memory. We used a dedicated Gigabit Ethernet network for VMotion and configured DRS to run at the default migration threshold.

For this first set of tests, we used benchmarks to reflect realistic workloads. The stable period for these loads was an hour, during which we took measurements. Table 1 lists these workloads.
Table 1. Workloads for Resource Allocation Tests

Virtual Machine Workload   Benchmark      Resource Demand                      Metric
Database server            Swingbench     CPU: 850MHz, Memory: 700MB           Transactions/sec
Web server                 SPECweb2005    Network I/O: 30Mb/s, CPU: 1000MHz    Accesses/sec
File server                Dbench         Disk I/O: 120Mb/s, CPU: 550MHz       MB/sec
Java server                SPECjbb2005    Memory: 1.7GB, CPU: 550MHz           New orders/sec
Idle virtual machine       None           None                                 None
Test Results
To verify that DRS can automatically reduce the loads on an overcommitted host, we ran the five workloads shown in Table 1, each in its own virtual machine, on one host. All virtual machines had equal shares and no reservations or limits. Figure 1 shows this case. Each circle represents one virtual machine, with different letters representing the different workloads, and the boxes representing separate physical hosts. The size of each circle represents how many shares are given to that virtual machine, not the CPU or memory resource demands of the virtual machine. Details on the resource demands are listed in Table 1.

The virtual machine workloads in Figure 1 and subsequent diagrams are identified using the following key:
- D: database server
- F: file server
- I: idle
- J: Java application server
- W: Web server
When we enabled DRS, it moved some of the load on the overcommitted host to the idle host, as shown in Figure 2. Using the default migration threshold (moderate), DRS improved overall system throughput (geometric mean of the individual throughput improvements) by 27 percent. These results confirm that the simple balancing provided by DRS can automatically bring significant improvements in throughput without the need to place virtual machines manually.
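The overall improvement figure above is a geometric mean of the per-virtual-machine throughput ratios. A minimal sketch of that aggregation (the ratios below are made-up numbers for illustration, not the paper's measurements):

```python
from math import prod

def overall_improvement(ratios):
    """Geometric mean of per-VM throughput ratios (throughput with DRS
    divided by throughput without DRS)."""
    return prod(ratios) ** (1.0 / len(ratios))

# Hypothetical per-workload ratios for illustration only.
print(overall_improvement([1.25, 1.60, 1.00, 1.35]))
```

The geometric mean is the natural aggregate here because it treats a doubling and a halving of throughput as cancelling out, which an arithmetic mean would not.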
CPU-intensive loads improved significantly, because system CPU resources were overcommitted before we enabled DRS. Figure 3 shows the normalized throughput improvement for individual virtual machines used in this test. The database server throughput improved by 25 percent because more CPU resources were available after we enabled DRS. Web server throughput improved significantly after we enabled DRS. Because the Web server is a network-intensive load, it benefited from DRS because the network I/O causes increased CPU load. On the other hand, the Java application server throughput remained the same, because the Java server workload is memory intensive, and the system did not experience any memory contention during the test. We did not use a file server workload for this test, because the file server demands a relatively low level of CPU resources. Instead, we used a second instance of the Web server workload, which requires more CPU resources than the file server workload does.
[Figure 3. Bar chart: normalized throughput of the DB server, Java server, Web server 1, and Web server 2 with DRS disabled and with DRS enabled]

[Diagrams showing the virtual machine placements for Scenario 1 and Scenario 2]
Scenario 1 shows the baseline, or balanced throughput, configuration. We chose this as the balanced arrangement because putting one of each virtual machine per host balances the CPU, memory, and I/O demands evenly while ignoring locality or caching effects. We used this as our baseline, and we expected DRS to achieve a throughput close to this.

Scenario 2 depicts the worst possible configuration. We started with this configuration to see how DRS balances the load of these virtual machines.

We performed the test using the default (moderate) and aggressive migration threshold settings to test whether the resource allocation that DRS generates can match the baseline configuration. The worst possible configuration could not be sustained on one host without failures in the benchmark metrics because of resource constraints. So we inserted a delay between the start of one virtual machine and the next to allow DRS to balance resources as we added each new load to the first host.

Our results showed that the virtual machine placement as well as system throughput with DRS was very close to that of the baseline configuration. The overall system throughput was 95 percent of that of the baseline configuration for the DRS configuration with a moderate threshold and 99 percent of that of the baseline configuration with the aggressive threshold setting. This includes the impact on throughput caused by the virtual machine migrations needed to get to the final state of the cluster. Because the loads were constant, the final state was achieved quickly after the loads reached stable state, and no further migrations were made.
For the test with the default threshold setting (moderate), DRS performed four migrations; that is, it moved one Web server, one Java server, and the two database servers.

The test using the aggressive threshold yielded an almost baseline configuration. This test performed more migrations, a total of eight, to get to this state. The number of migrations reflects the fact that DRS load balancing occurs while the load is ramping up, and hence in the more aggressive case, some migrations were reverted after all loads reached steady state. The idle virtual machine was not migrated because the cost of migrating the virtual machine could not be justified, given that there was no load. Because the loads are relatively stable over time, we obtained better throughput with an aggressive threshold. However, the default setting in DRS is moderate in order to accommodate changing workloads.
Figure 5 compares the system throughput for individual workloads in the baseline case, using DRS with a moderate threshold, and using DRS with an aggressive threshold. We normalized all throughput results to the baseline case for easier comparison and measured the results over a stable period of an hour.

In the moderate case, all individual throughputs are within 80 percent of baseline and overall within 95 percent (geometric mean of all the relative throughputs) of baseline. This is true despite the facts that DRS has an opportunity to rebalance only once every five minutes (the default) and that it is affected by VMotion overhead. When we used DRS with a moderate threshold, the database servers were the most affected because DRS placed both database virtual machines on the same host. Migrating one of the database loads to a different host was not justified under the moderate threshold because the imbalance across hosts was not great enough to justify it. Some workloads achieved better throughput than the baseline because additional resources were available on the host on which the workload was running.

In the more aggressive case, the results are different. Because each workload was fairly constant, over time DRS aggressive rebalancing produced better throughput.

If the loads vary, DRS with a moderate threshold may achieve better throughput in the long run because it would recommend fewer migrations and hence incur a lower VMotion cost compared to DRS with an aggressive threshold.

When we increased the number of hosts, as described in Scaling the Number of Hosts and Virtual Machines on page 9, DRS with a moderate threshold achieved throughput comparable to that of DRS with an aggressive threshold and did so with fewer migrations.
Figure 5. Relative Change in Throughput of Workloads
[Bar chart: per-workload throughput (File server 1, File server 2, DB server 1, and the remaining workloads) normalized to the baseline, comparing the baseline, DRS moderate, and DRS aggressive configurations]

[Diagram of the initial virtual machine placement for the shares test]
We assigned the three Web server virtual machines high, normal, and low shares. We placed the two Web servers with high and normal shares on the same host, hence they shared resources from one host. The virtual machine with the low share setting had no other Web server to contend with, hence it got relatively more CPU resources because it encountered less contention.

After we enabled it, DRS balanced the workloads to give the final configuration shown in Figure 7. DRS detected that the Web server with high shares was not receiving the resources it was entitled to receive. In fact, as shown in Figure 8, it was receiving a lower allocation than the virtual machine configured for a low share was receiving. DRS swapped the Web server virtual machines with low and high shares. This allowed the Web server configured for a high share to take advantage of the resources available on the other host. The swap also freed resources for the other virtual machines with high shares. Thus, DRS balanced the Web server loads across the two hosts and satisfied the overall system shares for all nine virtual machines.
Figure 7. Balanced Distribution of Virtual Machines After DRS Swaps Two Web Servers
Figure 8 shows the change in Web server virtual machine throughput before and after we enabled DRS. Before we enabled DRS, the Web server virtual machine with low shares achieved twice the throughput of the Web server virtual machine with normal shares. Without DRS, the scope of resource sharing is limited to a single host. Because the Web server virtual machines with high and normal shares ran on the same host, their respective throughput numbers reflected their relative shares as managed by the local ESX resource scheduler, which cannot schedule resources across hosts. However, after we enabled DRS, the virtual machine throughput numbers reflected the relative virtual machine shares for all virtual machines. DRS enforces resource sharing across hosts by controlling the host-level resource settings.

The idle and Java application workloads did not change because they do not have any significant CPU usage and there was no memory overcommitment or imbalance. Because CPU was the resource facing contention in this scenario, the Web server and database workloads were good candidates for migration. DRS migrated the Web server with high shares to the host where virtual machines were entitled to fewer resources because DRS estimated that the migration would bring better balance at minimal migration cost. Later, DRS balanced CPU usage by moving the low-shares Web server virtual machine to the second host to further balance the entitlements.
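The share mechanics above can be illustrated with a simple proportional-share allocator: capacity is divided in proportion to shares, no virtual machine receives more than it demands, and the surplus is redistributed among the still-contending machines. This is a sketch of the general technique, not the ESX scheduler's actual algorithm:

```python
def entitlements(shares, demands, capacity):
    """Proportional-share allocation: each VM gets capacity in proportion
    to its shares, capped at its demand; capacity freed by satisfied VMs
    is redistributed among the still-unsatisfied ones."""
    alloc = {vm: 0.0 for vm in shares}
    unsatisfied = set(shares)
    spare = capacity
    while unsatisfied and spare > 1e-9:
        weight = sum(shares[vm] for vm in unsatisfied)
        grant, satisfied_now = {}, set()
        for vm in unsatisfied:
            fair = spare * shares[vm] / weight   # proportional slice
            need = demands[vm] - alloc[vm]       # remaining demand
            grant[vm] = min(fair, need)
            if fair >= need:
                satisfied_now.add(vm)
        for vm, g in grant.items():
            alloc[vm] += g
        if not satisfied_now:
            break                                # capacity exhausted
        spare = capacity - sum(alloc.values())
        unsatisfied -= satisfied_now
    return alloc

# Hypothetical 2000MHz host: the high-shares VM is satisfied early,
# and its leftover capacity flows to the two contending VMs.
print(entitlements({"high": 2000, "norm": 1000, "low": 1000},
                   {"high": 500, "norm": 900, "low": 900}, 2000))
```

Note how shares only matter under contention: a VM whose demand is below its proportional slice simply gets its demand, exactly as the paragraph above describes.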
Shares take effect only when there is contention for resources. Hence, if CPU resources on a host are not fully committed, each virtual machine or resource pool on that host gets all the CPU resources it demands. Similarly, if resources on a DRS cluster are not fully committed, virtual machines or resource pools receive the resources they demand, so long as the following conditions are met:
- The DRS cluster has no unapplied manual recommendations.
- No virtual machine or resource pool limits are set to values below what the virtual machine workload demands.
- No virtual machine or resource pool reservations are set in such a way that they prevent any balancing migration from an overcommitted host.
- No affinity and anti-affinity rules are set in such a way that they prevent any balancing migration from an overcommitted host.
- No host VMotion incompatibilities, such as CPU incompatibilities, lack of a shared virtual machine or VMotion network, or lack of a shared virtual machine datastore, prevent any balancing migration from an overcommitted host.
- You might waste idle resources if you specify a resource limit for one or more virtual machines. The system does not allow virtual machines to use more resources than the limit, even when the system is underutilized and idle resources are available. Specify a limit only if you have specific reasons for doing so.
- For resource pools, resource allocations are distributed among their sibling virtual machines or resource pools based on their relative shares, reservations, and limits.
- The expandable reservation setting for resource pools allows the reservation to go above its initial value if the reservations of the child virtual machines or resource pools increase. Setting a fixed reservation prevents any increase in the child virtual machine or resource pool reservations.
- Virtual machines have some overhead memory that the system must account for in addition to the memory used by the guest operating system. This memory is charged as an additional reservation on top of the user-configured virtual machine reservation. Hence, when you set the reservations and limits for resource pools, you should keep a buffer for the extra overhead reservation needed by the virtual machines. See the Resource Management Guide for estimated memory overhead. (See Resources on page 19 for a link.) The amount depends on virtual machine memory size, the number of virtual CPUs, and whether the virtual machine is running a 32-bit or 64-bit guest operating system. For instance, a virtual machine with one virtual CPU and 1024MB of memory has a virtual machine overhead of approximately 98MB for a 32-bit guest operating system or approximately 118MB for a 64-bit guest operating system. Also, the overhead reservation may grow with time depending on the workload running in the virtual machine. This overhead growth is designed to ensure optimal performance within the guest operating system.
- Resource pools make managing resource allocations much easier by grouping virtual machines, resource pools, or both within each resource pool. This allows administrators to apply settings across many resource entities easily and to build a hierarchical structure that makes it easy to allocate resources. This paper does not address the use of resource pool hierarchies because they do not have any impact on virtual machine performance.
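When sizing a resource pool reservation, the per-VM overhead described above has to be added on top of the configured memory. A small sketch using the example figures cited above (98MB/118MB for a 1-vCPU, 1024MB virtual machine; other configurations would need the tables in the Resource Management Guide):

```python
# Approximate overhead (MB) for a 1-vCPU, 1024MB VM, per the figures above.
OVERHEAD_MB = {"32bit": 98, "64bit": 118}

def pool_reservation_mb(vm_memory_sizes_mb, guest="32bit"):
    """Resource pool reservation: the VMs' configured memory plus a
    buffer for each VM's overhead memory reservation."""
    return sum(m + OVERHEAD_MB[guest] for m in vm_memory_sizes_mb)

# Three 32-bit, 1-vCPU, 1024MB VMs:
print(pool_reservation_mb([1024, 1024, 1024]))  # 3366
```

Without the 98MB-per-VM buffer, a reservation of exactly 3 x 1024MB would be too small for the pool's children to power on.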
Test Methodology
The objective of the test was to measure the effectiveness of the DRS algorithms as we scaled to larger environments, while ensuring that the overall cluster utilization remained constant; that is, that the ratio of overall virtual machine demand to total cluster resources remained constant.

We set up a homogeneous cluster of hosts, each configured with two 2.8GHz Intel Xeon CPUs with hyperthreading enabled. Each host had 4GB of memory. We dedicated a Gigabit Ethernet network for VMotion and configured DRS to run at the default migration threshold (moderate). Each host had three workload virtual machines, each with different workloads.
Table 2. Test Environment for Scaling Tests
Number of
Hosts
Number of Virtual
Machines
12
24
16
48
16
We configured each virtual machine with a single virtual CPU and 1GB of RAM and installed Red Hat Enterprise Linux 3 as the guest operating system. For simplicity, we gave each virtual machine the same shares and set no reservations or limits. We ran one of the following categories of workloads inside each virtual machine:
- Constant workload: used a fixed amount of CPU resources of 10 percent, 50 percent, or 90 percent of CPU (of one virtual CPU).
- Varying workload: used a varying amount of CPU resources, changing with time. For easy analysis, the workload varied among three steps, using 10 percent, 50 percent, or 90 percent of CPU, with each step lasting five minutes. We used all possible combinations of the three-step load to generate six different workload types, as shown in Figure 9. Our results with these varying workloads demonstrate that DRS performs effectively even if loads constantly change over time and no stable state is reached. As our tests showed, this is one of the main advantages of using DRS.
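The six varying workload types are simply the six orderings of the three load steps. A small sketch of generating them and evaluating such a stepped load over time (the function names are ours, not from the paper):

```python
from itertools import permutations

STEPS = (10, 50, 90)   # percent of one virtual CPU
STEP_MINUTES = 5

# All orderings of the three steps yield the six workload types.
WORKLOAD_TYPES = list(permutations(STEPS))

def load_at(pattern, minute):
    """CPU demand (percent) of a stepped workload at a given minute;
    the three-step pattern repeats every 15 minutes."""
    return pattern[(minute // STEP_MINUTES) % len(pattern)]

print(len(WORKLOAD_TYPES))       # 6
print(load_at((10, 50, 90), 7))  # 50: minutes 5-9 are the second step
```

Each workload type averages 50 percent of a vCPU over a full cycle, which is what keeps the overall cluster utilization constant across the scaling tests.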
We distributed various combinations of the varying workload virtual machines across hosts to achieve different patterns that reflect three host usage scenarios.
[Figure 10. Workload on a host with optimal placement: the virtual machine loads compensate for one another, so the total load stays below host capacity and the host is never overcommitted]
In the optimal placement scenario, shown in Figure 10, we placed compensating loads together on each host. With this placement, all peaks matched the troughs of other virtual machines on the same host. As a result, none of the hosts was overcommitted, and the load was balanced across all hosts in the cluster. The throughput achieved in this scenario is used as the baseline for optimal throughput in the tests that follow.
Figure 11. Workload on a Host with Bad Placement
In the bad placement scenario, shown in Figure 11, we placed similar workloads together on each host wherever possible. As a result, peaks and valleys occur together on a host, resulting in a highly imbalanced cluster. Consequently, the hosts are either overcommitted or highly undercommitted. This results in not only major load imbalance across hosts but also poor performance in the virtual machines running on overcommitted hosts. As shown in Table 2, we used increasing numbers of virtual machines with similar workloads as we increased the size of the cluster. As a result, we saw greater imbalance for a bad placement as the cluster scaled up.
Figure 12. Workload on a Host with Symmetric Placement
In the symmetric placement scenario, the cluster is moderately imbalanced, with a mix of hosts that are somewhat overcommitted and hosts that are somewhat undercommitted. We call this symmetric placement because we used the same load distribution for each pair of hosts as we scaled up. As a result, the load imbalance remained the same for all stages of our tests.

With the smaller configurations, there are fewer workloads of the same type, and hence the load distribution cannot be as flexible as it is with the larger setups.
Test Results
Our results show that DRS improves host load balance across a cluster. The load balance improvement grows as the number of hosts and virtual machines in the cluster increases. Furthermore, throughput also improves as the number of hosts and virtual machines increases. DRS performs slightly better with more hosts because there are more opportunities for balancing loads. As the size of the cluster increased, the number of migrations increased proportionally, but DRS ensured that the migrations would benefit throughput; hence the average virtual machine throughput improved. DRS did not affect throughput when the virtual machines were placed in the optimal configuration, because the cluster was already balanced. The key results presented in this section highlight these findings.
Figure 13 shows the results when virtual machines were distributed in a bad placement (that is, a highly imbalanced cluster at the beginning of the experiment) with each of the virtual machine workloads varying every five minutes. Because DRS computed loads and options for balancing the loads every five minutes, this was a worst-case workload: soon after DRS finished balancing the load, the virtual machine loads changed.

As shown in Figure 13, DRS improves virtual machine throughput even when it must balance workloads that vary greatly. At all environment sizes, the average virtual machine throughput stayed within 96 percent of the optimal throughput. DRS achieved these results despite the fact that it was using VMotion, with its attendant overhead, to balance the load. Furthermore, the throughput improvement compared to the initial condition, without DRS, increased as we scaled to larger environments. This result mainly reflects the fact that as we scaled up, the number of similar workloads increased. The higher number of similar workloads, in turn, caused greater imbalance with bad placement in larger environments.

Figure 13 shows the results when we set DRS to the default aggressiveness (moderate). When we set DRS to a more aggressive threshold, we observed many more migrations, but throughput improvements were similar to those achieved with the moderate threshold. In a separate test, in which the virtual machines were running constant loads, we observed even better throughput improvements.
Figure 13. Throughput with Varying Load and Bad Placement
[Bar chart: average virtual machine throughput relative to optimal placement for 2 hosts / 6 VMs, 4 hosts / 12 VMs, 8 hosts / 24 VMs, and 16 hosts / 48 VMs, comparing No DRS, DRS moderate, and optimal placement]
Figure 14 shows the results when we distributed virtual machines in a symmetric placement (that is, a somewhat imbalanced cluster at the beginning of the test) with each of the virtual machine workloads varying every five minutes. As Figure 14 shows, the initial condition, without DRS, achieves the same virtual machine throughput at all environment sizes because the virtual machine distribution is symmetric. In each size environment, when we enabled DRS at a moderate threshold, it brought the average throughput to within 98 percent of optimal. Increasing the environment size brought minimal improvement in throughput, but the main reason optimal throughput was not achieved was the overhead of VMotion as DRS balanced the load. Overall, DRS achieved some performance gains by taking advantage of small opportunities for avoiding overcommitment.
[Figure 14. Throughput with varying load and symmetric placement: average virtual machine throughput relative to optimal placement for 2 hosts / 6 VMs through 16 hosts / 48 VMs, comparing No DRS, DRS moderate, and optimal placement]
DRS Aggressiveness
The degree of aggressiveness for DRS controls how aggressively DRS acts to balance resources across hosts in a cluster. The degree of aggressiveness is set for the entire cluster. The five options range from conservative (level one) to moderate (level three) to aggressive (level five) and control the degree to which DRS tries to balance resources across hosts within the cluster.

DRS makes VMotion recommendations based on what is referred to as their goodness value; that is, how much improvement in load balancing the action can achieve. The goodness value is translated to a rating between one star and five stars, and DRS executes migrations based on the aggressiveness threshold set for a cluster. In VMware Infrastructure 3, a migration receives a goodness value of five stars if it is recommended to resolve affinity, anti-affinity, or reservation rules. For each lower star rating, the balancing impact of the migration is less. So a migration with a rating of four stars would improve the load balance more than a migration with a rating of three stars, two stars, or one star.
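The star-rating mechanism can be illustrated with a toy model. The numeric mapping from goodness to stars below is invented for illustration; only the qualitative behavior (five stars for mandatory moves, a lower minimum star rating at more aggressive levels) follows the description above:

```python
def stars(goodness, mandatory=False):
    """Toy rating: mandatory moves (affinity, anti-affinity, or
    reservation fixes) always get five stars; otherwise a goodness
    value in [0, 1) maps onto one to four stars. Hypothetical scale."""
    if mandatory:
        return 5
    return max(1, min(4, 1 + int(goodness * 4)))

def min_stars(level):
    """Aggressiveness level 1 (conservative) executes only five-star
    moves; level 5 (aggressive) executes every recommendation."""
    return 6 - level

def accepted(recommendations, level):
    """Keep (goodness, mandatory) recommendations meeting the threshold."""
    return [r for r in recommendations if stars(*r) >= min_stars(level)]

recs = [(0.9, False), (0.1, False), (0.0, True)]
print(len(accepted(recs, 1)))  # 1: only the mandatory five-star move
print(len(accepted(recs, 5)))  # 3: the aggressive level executes all
```

This makes the trade-off explicit: raising the level admits lower-goodness moves, trading extra VMotion traffic for finer-grained balance.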
Figure 15 shows how throughput compares for the moderate and aggressive levels for the mix of constant load virtual machines and bad placement described in Scaling the Number of Hosts and Virtual Machines on page 9. The results show that the aggressive threshold achieved slightly better throughput compared to the moderate setting. However, as we scaled to larger configurations, the throughput was almost the same for both settings. When we used the aggressive setting, we saw more migrations than we did when we used the moderate setting. When we used varying workloads, not only were there more migrations in the aggressive case but the throughput differences for the moderate and aggressive settings were even smaller.
[Figure 15. Average virtual machine throughput for 2 hosts / 6 VMs through 16 hosts / 48 VMs, comparing No DRS, DRS moderate, DRS aggressive, and optimal placement]
Interval Between DRS Invocations

Although we recommend keeping the default value of five minutes, you can change the frequency by adding the following options in the vpxd.cfg file, changing 300 to the desired number of seconds.
<config>
  ...
  <drm>
    <pollPeriodSec>
      300 <!-- number of seconds desired, between 60 and 3600 -->
    </pollPeriodSec>
  </drm>
</config>
Cost-Benefit Analysis
Cost-benefit calculations act as a pivot in determining whether a single migration across hosts is beneficial or not. Using the history of virtual machine workloads, DRS determines the expected workload of the virtual machines on each host during the interval between its calculation and the next time DRS will be invoked. Before migrating any virtual machine, it calculates the gain achieved by migrating the virtual machine minus the expected migration cost (VMotion requires resources) and accounts for the potential worst-case workload of the virtual machine after it migrates to the destination host. The migration is allowed only if the total gain is positive.
Virtual machines with highly varying workloads can benefit from cost-benefit analysis by not migrating often, but only when DRS calculates that the target virtual machine and other affected virtual machines will benefit from the move before DRS would be invoked again. We tested several scenarios in which cost-benefit analysis was either turned on or turned off while the cluster ran varying workloads. The results demonstrate that with cost-benefit analysis enabled, DRS makes better decisions. We observed very few unnecessary migrations (unnecessary migrations include ping-pong migrations back and forth between two hosts as load changes) when cost-benefit analysis was enabled. Redundant migrations might occur in early rounds, as we observed in our tests, because DRS needs to adapt to understand the migration costs, which vary for particular virtual machines and network environments. We recommend that you leave the cost-benefit algorithm enabled (the default setting).
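The decision rule described above reduces to a signed comparison. A deliberately simplified sketch (the terms and units are illustrative; the real DRS model is richer):

```python
def should_migrate(expected_gain, migration_cost, worst_case_penalty):
    """Admit a migration only if the balance improvement expected over
    the next invocation interval exceeds the VMotion cost plus an
    allowance for the VM's worst-case load on the destination host.
    All inputs are in arbitrary but comparable units."""
    return expected_gain - migration_cost - worst_case_penalty > 0

print(should_migrate(10.0, 3.0, 2.0))  # True: net gain is positive
print(should_migrate(4.0, 3.0, 2.0))   # False: cost outweighs the gain
```

The worst-case term is what suppresses ping-pong moves: a spiky workload carries a large penalty, so only a clearly positive expected gain justifies migrating it.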
Heterogeneous Host Environments

Several factors can limit the benefits of DRS in a heterogeneous cluster:
- VMotion hardware compatibility: Virtual machines cannot use VMotion to migrate across different CPU types (from Intel to AMD or vice versa). Also, there may be CPU incompatibilities across different generations from the same manufacturer that can also prevent VMotion; for example, between CPUs that are SSE enabled and those that are not. However, in VMware Infrastructure 3 beginning with version 3.5 Update 2, Enhanced VMotion Compatibility gives you the ability to ensure VMotion compatibility for the hosts in the cluster by presenting the same CPU feature set across hosts, even when the actual CPUs on the hosts differ. See VMware VMotion and CPU Compatibility for details. (See Resources on page 19 for a link.)
- Hardware differences across compatible hosts: Even if hosts are VMotion compatible, they may be running at different clock speeds, use different hyperthreading settings, or have different memory subsystem architectures; for example, different cache sizes or NUMA enabled on one and disabled on the other. As a result, the effective resource entitlements of virtual machines might be different on different hosts, and these differences might lead to differences in performance across hosts.
- Highly skewed host sizes:
  - Virtual machines might not fit on all hosts. For example, virtual machines with four virtual CPUs cannot migrate to hosts with only two physical CPUs.
  - DRS tends to favor migrating virtual machines to a host with more memory or CPU resources to balance memory or CPU loads. Essentially, the algorithm tries to balance the relative utilizations for each host. So if a cluster has virtual machines with identical loads, DRS typically places more virtual machines on a host that has more CPU or memory resources. If the virtual machine loads are primarily CPU intensive, the host memory sizes do not really matter. Conversely, if the loads are memory intensive, CPU capacities do not matter.
- Host configuration: The configuration of the host network or datastore may disallow migration using VMotion because not all hosts share the same virtual machine or VMotion network or datastore.
DRS does a reasonable job of balancing load across heterogeneous hosts so long as the cluster is made up of a subset of hosts that are VMotion compatible. The factors discussed in this section may limit the benefits. We recommend a homogeneous cluster wherever possible.
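The balancing goal mentioned above (equalizing relative, not absolute, utilization) can be made concrete with a small sketch; the standard-deviation imbalance metric here is our illustration, not DRS's published formula:

```python
def relative_utilization(load, capacity):
    """Host load as a fraction of host capacity; DRS balances these
    ratios, so a bigger host ends up carrying more absolute load."""
    return load / capacity

def imbalance(hosts):
    """Population standard deviation of the hosts' relative utilizations
    (lower is more balanced). hosts maps name -> (load, capacity)."""
    ratios = [relative_utilization(l, c) for l, c in hosts.values()]
    mean = sum(ratios) / len(ratios)
    return (sum((r - mean) ** 2 for r in ratios) / len(ratios)) ** 0.5

# An 8000MHz host carrying 4000MHz of load is as utilized as a
# 4000MHz host carrying 2000MHz, so this cluster counts as balanced:
print(imbalance({"big": (4000, 8000), "small": (2000, 4000)}))  # 0.0
```

This is why, with identical per-VM loads, a larger host receives more virtual machines: equal ratios, not equal counts, minimize the metric.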
DRS Performance Best Practices

- When deciding which hosts to group into a DRS cluster, try to choose hosts that are as homogeneous as possible in CPU and memory. This ensures higher performance predictability and stability. VMotion is not supported across hosts with incompatible CPUs. Hence, with heterogeneous systems that have incompatible CPUs, DRS is limited in the number of opportunities for improving the load balance across the cluster. To ensure CPU compatibility, systems should be configured with CPUs from the same vendor, with similar CPU family and SSE3 status. However, in VMware Infrastructure 3 beginning with version 3.5 Update 2, Enhanced VMotion Compatibility gives you the ability to ensure VMotion compatibility for the hosts in the cluster by presenting the same CPU feature set across hosts, even when the actual CPUs on the hosts differ. See VMware VMotion and CPU Compatibility for details. (See Resources on page 19 for a link.)
- DRS performs initial placement of virtual machines across all hosts even if they are not VMotion compatible.
- If heterogeneous systems do have compatible CPUs but have different CPU frequencies, differing amounts of memory, or both, the systems with more memory and higher CPU frequencies are generally preferred, all other things being equal, as hosts for virtual machines, because they have more room to accommodate peak load.
- Using machines with different cache or memory architectures may cause some inconsistency in performance of virtual machines across those hosts.
- When more ESX hosts in a DRS cluster are VMotion compatible, DRS has more choices to better balance workloads across the cluster. We recommend clusters of up to 32 hosts.
- Besides CPU incompatibilities, some misconfigurations can make two or more hosts incompatible. For instance, if the hosts' VMotion network adapters are not connected by a Gigabit Ethernet link, the VMotion migration might not occur between the hosts. You should also be sure the VMotion gateway is configured correctly, VMotion network adapters on the source and destination hosts have compatible security policies, and the virtual machine network is available on the destination host. See the Resource Management Guide and the ESX Server 3 Configuration Guide for details. (See Resources on page 19 for links.)
- The default migration threshold (moderate) works for most configurations. You can set the migration threshold to more aggressive levels when all of the following conditions are satisfied:
  - The hosts in the cluster are relatively homogeneous.
  - The virtual machines' resource utilization remains fairly constant.
  - The cluster has relatively few constraints on where a virtual machine can be placed.
  You should set the migration threshold to more conservative levels when the converse is true.
- The default DRS frequency is once every five minutes, but you can set it to any period between one and 60 minutes. You should avoid changing the default value. If you are considering a change in the setting, see Interval Between DRS Invocations on page 14.
- In general, do not specify affinity rules unless you have a specific need to do so. In some cases, however, specifying affinity rules can improve performance.
  - Keeping virtual machines together can improve performance if the virtual machines need to communicate with each other, because network communication between virtual machines on the same host enjoys lower latencies.
  - Separating virtual machines maintains maximal availability of the virtual machines. For example, if two virtual machines are both Web server front ends to the same application, you want to make sure that they do not both go down at the same time. Another example of virtual machines that might need to be separated is virtual machines with I/O-intensive workloads. If they share a single host, they might saturate the host's I/O capacity, leading to performance degradation. DRS does not make virtual machine placement decisions based on their usage of I/O resources.
- Assign resource allocations to virtual machines and resource pools carefully. Be mindful of the impact of limits, reservations, and virtual machine memory overhead. See Resource Allocation Recommendations on page 8 for details.
- Virtual machines with smaller memory sizes or fewer virtual CPUs provide more opportunities for DRS to migrate them in order to improve balance across the cluster. Virtual machines with larger memory sizes or more virtual CPUs add more constraints in migrating the virtual machines. Hence you should configure only as many virtual CPUs and as much memory for a virtual machine as needed.
- You can specify DRS modes of automatic, manual, or partially automated at the cluster level as well as the virtual machine level. We recommend that you keep the cluster in automatic mode. For virtual machines sensitive to VMotion, you can set manual mode at the virtual machine level. This setting allows you to decide if and when the virtual machine can be migrated. Keep virtual machines in DRS automatic mode as much as possible, because virtual machines in automatic mode are considered for cluster load balance migrations across ESX hosts before virtual machines that cannot be migrated without user actions.
Conclusions
The effectiveness and scalability tests we performed with DRS demonstrate that resources are distributed across hosts in a DRS cluster according to the resource allocations and that better throughputs are achieved, especially in the presence of varying loads and random virtual machine placement on hosts. The DRS framework also provides the ability to adjust the aggressiveness of the DRS algorithm and the interval between invocations of the algorithm. In addition, DRS avoids wasteful migrations with cost-benefit analysis and minimizes migration of idle virtual machines.

Based on our experience and customer experience in the field, we provided a list of best practices that you can follow to avoid pitfalls.

Finally, DRS also provides a few mechanisms you can use to monitor DRS cluster performance and potentially provide feedback on how the resources are being delivered and whether the DRS settings need to be adjusted.
Resources
ESX Server 3 Configuration Guide
http://www.vmware.com/pdf/vi3_35/esx_3/r35/vi3_35_25_3_server_config.pdf

Resource Management with VMware DRS
http://www.vmware.com/resources/techresources/401

Resource Management Guide
http://www.vmware.com/pdf/vi3_35/esx_3/r35/vi3_35_25_resource_mgmt.pdf

VMware VMotion and CPU Compatibility
http://www.vmware.com/resources/techresources/1022
If you have comments about this documentation, submit your feedback to: [email protected]
VMware, Inc. 3401 Hillview Ave., Palo Alto, CA 94304 www.vmware.com
Copyright 2008 VMware, Inc. All rights reserved. Protected by one or more of U.S. Patent Nos. 6,397,242, 6,496,847, 6,704,925, 6,711,672, 6,725,289, 6,735,601, 6,785,886,
6,789,156, 6,795,966, 6,880,022, 6,944,699, 6,961,806, 6,961,941, 7,069,413, 7,082,598, 7,089,377, 7,111,086, 7,111,145, 7,117,481, 7,149, 843, 7,155,558, 7,222,221, 7,260,815,
7,260,820, 7,269,683, 7,275,136, 7,277,998, 7,277,999, 7,278,030, 7,281,102, 7,290,253, and 7,356,679; patents pending. VMware, the VMware boxes logo and design,
Virtual SMP and VMotion are registered trademarks or trademarks of VMware, Inc. in the United States and/or other jurisdictions. All other marks and names mentioned
herein may be trademarks of their respective companies.
Revision 20090109 Item: PS-060-PRD-01-01