VMware vSphere Metrics v2.0.1

The vSphere metrics system has several layers of complexity: 1. There are many components involved in virtualization, creating more metrics than in physical systems. 2. Metrics can behave differently depending on whether they are measured at the guest OS, virtualization, or physical layers. 3. Some metrics have the same name but different formulas depending on the object they are measured in, like CPU usage. 4. Metrics naming can be inconsistent, such as memory usage having different meanings for VMs, hosts, and clusters. 5. Subtle differences exist in how metrics are calculated that are important to understand for troubleshooting. The document aims to explain these complexities.


The book is dedicated to the loving memory of Mama and Papa…

for your love and sacrifice in raising me in the old town of Suroboyo.
Foreword

Digital transformation is one of the most significant contributors to Business transformation. In this digital era, data
center modernization, application modernization, and adopting cloud is the norm. VMware vSphere is the core of
these transformations for many companies globally.
Iwan has spent 20+ years in the field working with companies of various sizes to make their "IT transformation" a
success. He is the go-to person for both vSphere and Cloud Operations product teams for delving deep into
mapping vSphere metrics into day-to-day operations. Iwan is a core member of the technical leadership team. I first
met him at VMworld 2015, and he has since become a trusted technical advisor to my product leadership team
globally.
The book is deeply technical in content. Reading this book feels like having a conversation with Iwan. He has taken
the time to explain the concepts, showing the value of each metric, and mapping them together to answer real-world
questions. Many oddities make sense and complexities become clear once you understand the underlying architecture.
I am always thankful to have met him and proud of his passion and accomplishments. His passion for helping
companies run VMware vSphere optimally has led him to open-source the book. There is still much to document in
the vast body of knowledge that makes up operations management and I hope the VMware community responds to
his call for collaboration.

Kameswaran Subramanian
Head of Product Management
Aria Cloud Operations
VMware
Reviewer

John Yani Arrasjid is currently a Field Principal at VMware, Inc. Prior to this he was CTO/CIO at Ottometric, a startup
focused on intelligent validation of systems and sensors in the automotive space using AI, Computer Vision, and
Deep Learning to increase accuracy, shorten analysis time, and reduce cost. He has spent a lifetime working as an
innovation architect and technical evangelist in his roles.
John is co-founder of the IT Architect Series. John is an author with multiple publishing houses on multiple technical
topics. He has worked on patents covering workload modeling, blockchain, and accelerator resource management.
John was previously the USENIX Association Board of Directors VP. He is currently active in both CERT (Community
Emergency Response Team) and VMware ERT (Emergency Response Teams), and is also a Disaster Service Worker.
John continues his interest in IT architecture, autonomous systems, AI, IoT/Edge, Big Data, and Quantum Computing.
Online, John can be reached at LinkedIn.com/in/johnarrasjid/ and Twitter @VCDX001.
How To Use This Book

The book is designed to be consumed as an offline Microsoft Word document on Windows. It is not designed to be
printed. Its table of contents is the side menu of Microsoft Word. Follow the steps shown in the following screenshot:

Use the navigation as a dynamic table of contents; otherwise it’s easy to get lost, even when using a 43” monitor. If you simply
read it top down, without having the navigation on the left, you will feel that each chapter ends abruptly. The reason
is that the chapters do not end with a summary, which is required in printed books but redundant in online books.
Preface

A metric is essentially an accounting of a system in operation. Understanding a counter properly hence requires
knowledge of how the system works. Without internalizing the mechanics, you will have to rely on memorizing. In
my case, memorizing is only good for exams. So grab a cup of your favourite drink, and do take time to truly
understand the reasons behind the metrics. You will appreciate a threshold better when you know how it was
calculated.
vSphere ships with many metrics and properties. If we went object by object and documented them metric by metric, the result
would be both dry and theoretical. You would be disappointed, as it would not explain how your real-world problems are
solved. This document begins with you. It focuses on the problems you are trying to solve when running your
operations. It looks at all the use cases and breaks down the metrics from there, which helps you appreciate why the
metrics are layered in such a manner.
At 300+ pages, it’s not light reading. To keep the book size manageable, I have excluded some metrics. To see the
full list, see VMware Operations Transformation, 4th Edition. This 900-page book is also open-source and free.
While version 2.0 delivers many updates, the book is far from completing its mission. You will notice that vSphere
objects such as Cluster, Datastore, and Distributed Switches are not yet documented. This book is a call for
collaboration to the VCDX, VCIX and all VMware professionals. The book is a living document, with an update every 6
months. You can find the latest version at the VMware Ops Guide website by Stellios Williams.
The book is not a product book. It does not cover how to use the vSphere Client performance tab and esxtop. There is
better documentation on that already 😊

Acknowledgement

A technical book like this takes a lot of contributions from many experts. I’m indebted
to the advice and help from folks like Kalin Tsvetkov, Valentin Bondzio, Branislav
Abadzhimarinov, Prabira Acharya, Stellios Williams, Brandon Gordon, George
Stephen Manuel, Sandeep Byreddy, Gayane Ohanyan, Hakob Arakelyan, Ming Hua
Zhou, Paul James and many others.
This page is intentionally left blank.
Why? I don’t know. Some people do it, so I just follow as IT behaves more like fashion nowadays…
VMware vSphere Metrics May 2023

Chapter 1

Introduction

Introduction: Metrics Complexity Page 1



Metrics Complexity

vSphere and vSAN counters are more complex than physical machine counters because there are many components
as well as inconsistencies that are caused by virtualization. When virtualized, the 4 elements of infrastructure (CPU,
RAM, Disk, Network) behave differently.
The complexity is created by a new layer because it impacts the adjacent layers below and above it. So the net effect
is you need to learn all 3 layers (Guest OS layer, virtualization layer and physical layer). That’s why from a monitoring
and troubleshooting viewpoint, Kubernetes and container technology require an even deeper knowledge as the
boundary is even less strict.

Nuances in Metrics
I find it useful to know the subtle differences in the behaviour of the metrics and properties. By knowing their
differences, we can then pick the correct metrics for the tasks at hand.

Naming Complexity
Same name, same object, different formula: The metrics have the same name and belong to the same object, yet they have a different formula depending on where in the object you measure them.
Example: VM CPU Used at the vCPU level does not include System time, but at the VM level it does. The reason is that System time does not exist at the vCPU level, since the accounting is charged at the VM level.
Same name, different formula: Metrics with the same name do not always have the same formula in different vSphere objects.
Memory Usage: in VM this is mapped to Active, while in ESXi Host this is mapped to Consumed. In Cluster, this is Consumed + Overhead. Technically speaking, mapping usage to active for VM and consumed for ESXi makes sense, due to the 2-level memory hierarchy in virtualization. At the VM level, we use active as it shows what the VM is actually consuming (related to performance). At the host and cluster levels, we use consumed because it is related to what the VM claimed (related to capacity management). This confusion has resulted in customers buying more RAM than they need. Aria Operations uses Guest OS data for Usage, and falls back to Active if it’s not available.
Memory Consumed: in ESXi this includes memory consumed by ESXi, while in Cluster it only includes memory consumed by VMs. In VM this does not include overhead, while in Cluster it does.
VM Used includes Hyper-Threading, but the penalty is 37.5%. ESXi Used is also aware of HT, but the penalty is 50%.
Virtual Disk: in VM this includes RDM, but in Datastore it does not. Technically, this makes sense as they have different vantage points.
Steal Time in Linux only includes CPU Ready, while stolen time in VM (CPU Latency) includes many other factors, including CPU frequency.
Same name, different meaning: Metrics with the same name, yet a different meaning. Be careful, as you may misinterpret them.
VM CPU Usage (%) shows 62.5% when ESXi CPU Usage (%) shows 100%. This happens since VM CPU Usage considers Hyper-Threading, while ESXi CPU Usage does not. It happens when the ESXi core that the VM vCPU runs on is also running another thread.
Disk Latency and Memory Latency indicate a performance problem. They are in fact the primary counters for how well the VM is being served by the underlying IaaS. But CPU Latency does not always indicate a performance problem. Its value is affected by CPU Frequency, which can go up or down. Sure, the VM is running at a higher or lower CPU speed, but it is not waiting to be served. It’s the equivalent of running on an older CPU.
Same name, different behaviour: Memory Reservation and CPU Reservation behave differently from a monitoring viewpoint.
In Microsoft Windows, the CPU queue counts only the threads waiting in the queue, while the disk queue includes the IO commands being processed.
Same purpose, different name: You would expect that if the purpose is identical, then the label or name would be identical.
Swapped Memory in VM is called Swapped, while in ESXi it is called Swap Used.
Static frequency CPU utilization in VM is called Run, while ESXi calls it Utilization.
What vCenter calls Logical Processor (in the client UI) is what ESXi calls Physical CPU (in the esxtop panel).
vCenter uses Consumed (%) and Usage (%) for the same ESXi CPU utilization.
Confusing name: The name of the counter may not be clear.
VM CPU Wait counter includes Idle time. Since many VMs do not run at 100%, you will see the CPU Wait counter being high. You may think it’s waiting for something (e.g. Disk or Memory) but it’s just idle. Seen from the viewpoint of the VMkernel scheduler, that vCPU is waiting to be used. So the name is technically correct.
The term virtual disk actually includes RDM. It’s not just VMDK. The reason is that an RDM appears as a virtual disk when you browse the directory in the datastore, even though the RDM file is just a pointer to an external LUN.
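
The Hyper-Threading figures quoted in the table above can be checked with simple arithmetic. The sketch below is only an illustration: the function name is made up, and applying the penalty as a flat discount on a fully busy thread is an assumption, not the formula vSphere uses internally.

```python
# Hypothetical helper: discount a Hyper-Threading penalty from the time a
# thread was busy while its sibling thread on the same core was also busy.
def used_with_ht_penalty(busy_pct: float, penalty_pct: float) -> float:
    """Return the accounted utilization after applying the HT penalty."""
    return busy_pct * (1 - penalty_pct / 100)

vm_view = used_with_ht_penalty(100.0, 37.5)    # VM-level penalty from the table
esxi_view = used_with_ht_penalty(100.0, 50.0)  # ESXi-level penalty from the table

print(f"VM view:   {vm_view}%")    # 62.5%
print(f"ESXi view: {esxi_view}%")  # 50.0%
```

The 62.5% result is the same number the table reports for VM CPU Usage (%) when the core is shared with another thread.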


Architecture Complexity
The 4 basic elements of infrastructure have their own unique nature. This in turn creates complexity in observability.
The following table lists some examples of the nuances. We will explain them in depth in the next chapter.

CPU: The primary speed metric (GHz) is not comparable across different hardware generations or architectures. 1 GHz in today’s CPU is faster than 1 GHz in an older CPU.
Memory: Its function is caching, so its counters tend to be near 100%, and that is what you want.
CPU and memory metrics have a different nature. 95% utilization for memory could be low, while 85% for CPU could already be high.
It’s a form of storage, so its metrics are mostly disk space.
Storage: It has 2 sides (speed and space), and both have utilization metrics.
The speed side has 2 components for utilization: IOPS and Throughput.
Network: While server and storage are nodes, the network is the interconnect. This makes it more challenging.

Their complexity results in differences in the types of metrics that apply:

Element   Utilization   Reservation      Allocation
CPU       Yes           Yes              Yes
Memory    Yes           Yes              Yes
Disk      Yes           Not Applicable   Yes
Network   Yes           Yes              Yes

And lastly, beyond metrics there are also further complications such as:

VM vs ESXi: The CPU metrics from a VM viewpoint differ from the CPU metrics from an ESXi viewpoint. A VM is a consumer. Multiple VMs can share the same physical core, albeit at the price of performance. So metrics such as Ready do not apply to ESXi. The core and the thread are always ready.

ESXi vs vCenter: While ESXi is the source of metrics, vCenter may add its own metrics, and the formulas don’t always match 100% in all scenarios, such as Used vs Usage.
ESXi provides Run (ms), Used (ms), Demand (MHz) for VM CPU. vCenter adds Usage (MHz) and Usage (%), which creates confusion as there are now 5 choices.
ESXi shows Used (%), while vCenter shows Used (ms). The first one is affected by CPU frequency and can go beyond 100%.

ESXi ≠ VMs + VMkernel: The metrics at the ESXi level are more complex than the sum of its VMs + VMkernel. We dedicate a subchapter to this within the ESXi chapter.


M:N relationship: A VM with multiple virtual disks can span multiple datastores, and even RDMs. On the other hand, a datastore typically hosts many VMs. An ESXi host may mount multiple LUNs, and a LUN is typically presented to multiple ESXi hosts or even multiple clusters. These many-to-many relationships make the metrics across VM, datastore, ESXi, Cluster, and Data Center inconsistent when viewed overall. Each of them is correct, as each looks from its own vantage point.
Windows vs Linux: The Windows CPU queue excludes the running thread, while Linux includes the threads being executed.
Windows memory metrics are different from Linux memory metrics.

There is also a scalability concern. For example, vCenter has 17 CPU metrics available at the VM level, and 12 of
them are available at the vCPU level too. In addition, each VM comes with 28 memory metrics. That means a VM with
4 vCPUs will have 93 metrics (17 + 4 x 12 + 28). A vSphere environment with 1,000 VMs, with 4 vCPUs as the average
VM size, will have to process 93K metrics each time it collects. If you do that every minute, you will collect almost 134
million metrics per day. Since many customers like to keep the data for at least 6 months, that’s 24+ billion metrics!
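
The arithmetic behind these figures is easy to reproduce. The following is a small sketch using only the counts quoted above; the constant and function names are illustrative, and "6 months" is assumed to be about 182 days.

```python
# Metric counts quoted in the text; the names are illustrative only.
VM_CPU_METRICS = 17      # CPU metrics at the VM level
VCPU_CPU_METRICS = 12    # of those, also available per vCPU
VM_MEMORY_METRICS = 28   # memory metrics per VM

def metrics_per_vm(vcpus: int) -> int:
    """CPU metrics (VM level plus per-vCPU) and memory metrics for one VM."""
    return VM_CPU_METRICS + vcpus * VCPU_CPU_METRICS + VM_MEMORY_METRICS

per_vm = metrics_per_vm(4)            # 17 + 4 x 12 + 28
per_collection = per_vm * 1_000       # 1,000 VMs of 4 vCPUs each
per_day = per_collection * 24 * 60    # one collection every minute
six_months = per_day * 182            # assumption: roughly 182 days

print(per_vm, per_collection, per_day, six_months)
# 93 93000 133920000 24373440000
```

Changing the average vCPU count or the collection interval shows quickly how the volume scales.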
With so many metrics, the amount of business value received becomes a valid concern. At the end of the day, you
are not in the business of collecting metrics.
I’m not a fan of simply regurgitating the metrics that the source system has. We should start by understanding the
unique behaviour of the system we want to manage (e.g. vSphere, Windows), then simplifying it by consolidating
and standardizing the metrics. For example, Aria Operations creates derived metrics such as KPI and capacity
metrics, then applies them to CPU, RAM, disk, and network as appropriate.

Other Nuances
Mixing terminology: Allocation and reservation are different concepts.
When you allocate something to someone, it does not mean it’s guaranteed. If you want a guarantee, then make a reservation. Allocation is a maximum (you can’t go beyond it), while Reservation is a minimum. The actual utilization can be below reservation but can’t exceed allocation.
You cannot overcommit reservation as it’s a guarantee. You can overcommit allocation as it is not a guarantee.
Avoid using metric names like these:
• Allocation Reservation. This makes no sense 😊
• Maximum Reservation. Simply use Allocation instead.
• Minimum Allocation. Simply use Reservation instead.
Confusing roll up: Why is VM CPU Ready above 100%? If you look at esxtop, many VM-level metrics are >100%.
vCenter measures every 20000 ms, but the maximum value for a completely idle thread is 10000. The reason is that 20000 is the value set at the core level. Since a core has 2 threads when HT is enabled, each is allocated 10000.
Confusing unit: Why are CPU metrics expressed in milliseconds instead of percentage or GHz? How can a time counter (milliseconds in this case) account for CPU Frequency? There is a good reason for that!
Is 1 Giga = 1000 Mega or 1024 Mega?
Esxtop and vSphere Client use different units for the same metric. For example, esxtop uses megabit while the vCenter UI uses kilobyte for networking counters.
“Missing” Metrics: You will find VM CPU Demand, but not VM Memory Demand. Demand does not apply to memory as it’s a form of storage, just as there is no such thing as a Demand metric for your laptop disk space.
Too many choices: When you have 2 watches showing different times, you become unsure which watch is the correct one.
There are 5 metrics for VM CPU “utilization”: Run, Used, Usage, Usage in MHz, and Demand. Why so many metrics just to track utilization, different from what Windows or Linux tracks?
There are 6 metrics for ESXi CPU “utilization”: Core Utilization, Utilization, Used, Usage, Usage in MHz, and Demand.
Why so many? You’ll find out in this book.
Confusing formula: ESXi CPU Idle (ms) includes CPU Frequency.
Inconsistent implementation: There is reservation for CPU, memory and network, but not for Disk.
There is a limit for disk IOPS, but not for disk throughput.
Incorrect name: Task Manager in Windows is not a correct name, as the kernel does not have such a concept. The term that Windows has is actually Job. A job is a group of processes that can be managed as one. Do you want it to be called Job Manager? 😊
I think Process Manager is better, as that’s what is running on top of the kernel.
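
To make the roll-up question in the table above concrete, here is a minimal sketch of how a millisecond counter can end up above 100%. It assumes the 20000 ms collection interval mentioned above, and that the VM-level value is the sum of the per-vCPU values. The function name and the sample values are made up for illustration.

```python
INTERVAL_MS = 20_000  # collection interval quoted in the text

def ready_pct(ready_ms: float, interval_ms: int = INTERVAL_MS) -> float:
    """Express a Ready value in milliseconds as a percentage of the interval."""
    return ready_ms * 100 / interval_ms

# Hypothetical Ready time (ms) of each vCPU of a 4-vCPU VM in one interval.
per_vcpu_ready_ms = [6_000, 5_000, 7_000, 6_000]

per_vcpu_pct = [ready_pct(ms) for ms in per_vcpu_ready_ms]
vm_level_pct = ready_pct(sum(per_vcpu_ready_ms))     # summed across vCPUs
average_pct = vm_level_pct / len(per_vcpu_ready_ms)  # normalized per vCPU

print(per_vcpu_pct)   # [30.0, 25.0, 35.0, 30.0]
print(vm_level_pct)   # 120.0
print(average_pct)    # 30.0
```

No single vCPU is above 100% here; the VM-level number exceeds 100% only because it is a sum.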

Virtualization Impact
From an observability viewpoint, a VM is not what most of us think it is. It changes the fundamentals of operations
management. It introduces a whole set of metrics and properties, and renders many known concepts irrelevant.
For example, you generally talk about these types of system-level metrics in Windows or Linux:
• Processes
• Threads
• System Calls/sec
But when it comes to a VM, you don’t. The reason these OS-level metrics are not relevant is that a VM is not an
OS.
From VMkernel’s vantage point, a VM is just a collection of processes that need to be run together. Each process is
called a World. So there is a world for each vCPU of a VM, as each can be scheduled independently. The following
screenshot shows both VM and non-VM worlds running side by side. I’ve marked the kernel modules with a red dot.
You can spot familiar processes like vpxa and hostd running alongside the VM (marked with the yellow line).


Not all VMware-specific characteristics are well understood by management tools that are not purpose-built for it.
Partial understanding can lead to misunderstanding, as a wrong interpretation of metrics can result in the wrong action
being taken.

Visibility
Guest OS and VM are closely related due to their 1:1 relationship. They are adjacent layers in the SDDC stack.
However, the two layers are distinct, and each provides unique visibility that the other layer may not be able to give.
Resource consumed by the Guest OS is not the same as resource consumed by the underlying VM. Other factors such as
power management and CPU SMT also contribute to the differences.
The different vantage points result in different metrics. This creates complexity as you size based on what happens
inside the VM, but reclaim based on what happens outside the VM (specifically, the footprint on the ESXi). In other
words, you size the Guest OS and you reclaim the VM.
The following diagram uses the English words demand and usage to explain the concept, where demand consists of
usage and unmet demand. These are not the demand and usage metrics in vSphere, so don’t assume those
metrics carry this meaning. They were created for a different purpose.

I tried adding the application into the above diagram, but that complicated the whole picture so much that I removed it. So just
take note that some applications such as Java VM and databases manage their own resources. Another virtualization
layer such as containers certainly takes the complexity to another level.
We can see from the above that Layer A is not visible to the hypervisor.

Layer A: Queues inside the Guest OS (CPU Run Queue, RAM Page File, Disk Queue Length, Driver Queue, network card ring buffer). These queues are not visible to the underlying hypervisor as they have not been sent down to the kernel. For example, if Oracle sends IO requests to Windows, and the Windows storage subsystem is full, it won’t send this IO to the hypervisor. As a result, the disk IOPS counter at the VM level will under-report as it has not received this IO request yet.
Layer B: What the Guest actually uses. This is visible to the hypervisor as a VM is basically a multi-process application. The Guest OS CPU utilization somehow translates into VM CPU Run. I added the word “somehow” as the two metrics are calculated independently of each other, and are likely taken at different sampling times and use different roll-up techniques.
Layer C: Hypervisor overhead (CPU System, CPU MKS, CPU VMX, RAM Overhead, Disk Snapshot). This overhead is obviously not visible to the Guest OS. You can get some visibility by installing Tools, as it will add new metrics into Windows/Linux. Tools do not modify existing Windows/Linux metrics, meaning they are still unaware of virtualization.
From the VMkernel viewpoint, a VM is a group of processes or user worlds that run in the VMkernel. There are 3 main types of groups:
• VM Executable (VMX) process is responsible for handling I/O to devices that are not critical to performance. The VMX is also responsible for communicating with user interfaces, snapshot managers, and remote console.
• VM Monitor (VMM) process is responsible for virtualizing the guest OS instructions, and managing memory mapping. The VMM passes storage and network I/O requests to the VMkernel, and passes all other requests to the VMX process. There is a VMM for each virtual CPU assigned to a VM.
• Mouse Keyboard Screen (MKS) process is responsible for rendering the guest video and handling guest OS user input. When you console into the VM via vCenter client, the work done is charged to this process. This in turn is charged to the VM, and not to a specific vCPU.
If you want to see examples of errors in the above processes, review this KB article.
Layer D: Unmet Demand (CPU Ready, CPU Co-Stop, CPU Overlap, CPU VM Wait, RAM Contention, VM Outstanding IO).
The Guest OS experiences frozen time or slowness. It’s unaware of what it is, meaning it can’t account for it.

I’ve covered the difference in simple terms, which does not do justice to the full picture. If you want to read a
scientific paper, I recommend this paper by Benjamin Serebrin and Daniel Hecht.

Overhead vs Not Overhead


We need to be careful not to lump every additional load together as overhead. Overhead means it’s mandatory (cannot be
avoided) and has a negative impact (such as slower performance or more resources required). It typically does not
bring additional, new capabilities.
Let’s list some examples:

Overhead: Cache. The only purpose of cache is performance. It does not increase capacity.
IO processing by hypervisor. There is additional processing done by VMkernel, which could result in the IO blender effect.
VM CPU and memory overhead for the VM Monitor layer. This is a small amount and operationally negligible.
vSAN. The actual ESXi memory consumed and CPU used by vSAN processes.
VM log files. VM is a layer on its own and the log provides necessary observability.
Not Overhead: VM snapshot. Snapshot is optional and it delivers new functionality not available in the Guest OS.
VM memory snapshot. This does not have the same purpose as the hibernation file inside Windows or Linux. This feature enables memory overcommit at the ESXi level.
vSAN Failures-to-Tolerate policy. It provides availability protection since vSAN does not use hardware-level redundancy. For workloads where the VM is transient and you have the master template, you can set this to 0 (no protection).

VMkernel
To master vSphere metrics, you need to know VMkernel. The kernel is a different type of OS as it runs virtualized
motherboards (known as virtual machines). As a result its metrics are different from those of a typical OS such as Windows and
Linux.
It consumes its own resources in all aspects (CPU, memory, disk, and network). With vSAN and NSX, the
consumption is no longer something you can ignore.
All the processes that run in VMkernel belong to one of these 4 top-level resource groups:

System: host/system resource pool for low-level kernel services and drivers. You will find worlds such as minfree, kernel, helper, ft, vmotion, and drivers.
The CPU reservation value for this world is surprisingly low. It’s below 1 GHz.
The memory reservation value for this world is high. It’s ~20 – 30 GB depending on the ESXi.
Compared with the VIM resource, it tends to have a much lower CPU reservation but a much higher memory reservation.
VIM: host/vim resource pool for host management processes such as vpxa, DCUI, and hostd.
The CPU reservation value for this world is relatively high. I notice it’s around 4 – 12 GHz depending on the ESXi.
The memory reservation value for this world is surprisingly low. It could even be 0 GB.
IO Filter: host/iofilter resource pool.
The IO Filter processes are grouped here.
vSphere Client UI does not display the CPU or memory reservation metrics.
User: host/user resource pool.
All the running VMs are children of the User resource pool. This includes the VM overhead as it’s part of the VM.
vSphere Client UI does not display the CPU or memory reservation metrics.

In the vSphere Client UI, you will see the list of resource grouping in the Target Objects section in the performance
chart. I’ve highlighted them in the following screenshot:


The VMkernel scheduler uses share, limit and minimum reservation to manage all the above. This is fairly complex,
so let’s elaborate.
There are 3 types of metrics:

Type Analysis
Utilization: This is the actual, visible consumption.
Utilization can be lower than reservation, but not higher than allocation.
Since you’ve already paid for the hardware, you want to drive ESXi utilization as high as possible so long as there is no contention. Since VMkernel has higher priority than VMs, we can safely assume we can use VM contention as the proxy for overall contention (assuming a manual VM Limit is not set).
The ESXi utilization metric considers both VMkernel and VMs. There is no need to separate VMkernel in this case. The only time we need to separate them is when we’re migrating the VMs to another architecture.
Allocation: For VM, allocation is useful as there is overcommit between virtual and physical.
For VMkernel, since there is no “virtual”, there is no overcommit.
Note: some VMkernel processes have no limit. If you plot them in vSphere Client UI, you will find their limits are either blank or 0.
Reservation: For VMkernel processes, the maximum amount is taken care of by allocation, while the minimum amount is taken care of by reservation. This is a safety mechanism to ensure VMkernel can still run when all the VMs want 100% of the resources.
Should ESXi capacity exclude this reserved component? I’m unsure, as the only reservation metric in the vSphere Client UI does not include it.
Processes that run at the kernel level do not get their reserved memory up front. It’s granted on demand. CPU, being instruction-based in nature, does not use the reserved amount unless it needs to run. If you plot it in the vSphere Client UI, you will see the value of utilization can be lower than reservation.
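
The relationship between the three types, together with the overcommit rule described under Other Nuances, can be sketched as a few checks. This is only an illustration: the class, field names, and numbers are hypothetical and do not correspond to vSphere objects or API calls.

```python
from dataclasses import dataclass

@dataclass
class Consumer:
    """Hypothetical consumer such as a VM; all values in GB."""
    name: str
    reservation: float  # guaranteed minimum
    allocation: float   # configured maximum
    utilization: float  # actual consumption

def utilization_is_valid(c: Consumer) -> bool:
    # Utilization may be below reservation, but can never exceed allocation.
    return c.allocation >= c.utilization >= 0

def reservations_fit(consumers: list[Consumer], capacity: float) -> bool:
    # Reservation is a guarantee, so the total cannot exceed physical capacity.
    return capacity >= sum(c.reservation for c in consumers)

def allocation_ratio(consumers: list[Consumer], capacity: float) -> float:
    # Allocation is not a guarantee, so this ratio is allowed to exceed 1.0.
    return sum(c.allocation for c in consumers) / capacity

vms = [
    Consumer("vm-a", reservation=4, allocation=16, utilization=2),
    Consumer("vm-b", reservation=8, allocation=32, utilization=20),
]
host_capacity = 32

print(all(utilization_is_valid(vm) for vm in vms))  # True
print(reservations_fit(vms, host_capacity))         # True
print(allocation_ratio(vms, host_capacity))         # 1.5
```

Note that vm-a runs below its reservation, which is valid, while the host is overcommitted 1.5x on allocation, which is also valid.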

Why is it hard to determine the size of the above 3 values up front? Taking from page 258 of Frank Denneman and
Niels Hagoort’s book, with my own additions:
• Some services have static values (allocation and reservation) regardless of the host configuration. Ok, this is
the easy part.
• Some services have relative values. They scale with the memory configuration of the host. Ok, that means you
need to know the percentage for each.
• Some services have relative values that are tied to the number of active VMs. Ok, that means you need to
know how many VMs are active.
• Some services consume more when they do more work. Examples are the storage and networking stacks.
• Some services consume more depending on the configuration. For example, vSAN consumes more when you
turn on dedupe and compression.
Since an ESXi host has many services, it is impossible to predict the overall values of the above 3 metrics.

Resource Management
vSphere uses the following to manage the shared resources:
• Reservation
• Limit
• Share
• Entitlement
Reservation and Limit are absolute. Share is relative to the value of other VMs on the same cluster.
Unlike a physical server, you can configure a Limit and a Reservation on a VM. This is done outside the Guest OS, so
Windows or Linux does not know. You should minimize the use of Limit and Reservation as they make SDDC operations
more complex.
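
To illustrate why Share is relative while Reservation and Limit are absolute, here is a deliberately simplified model of dividing a contended resource. It is not the VMkernel algorithm: the proportional formula, the function name, and the numbers are assumptions for illustration only, and Reservation and Limit are ignored.

```python
def share_based_split(capacity_mhz: float, shares: dict[str, int]) -> dict[str, float]:
    """Split capacity in proportion to shares (hypothetical, simplified model)."""
    total = sum(shares.values())
    return {vm: capacity_mhz * s / total for vm, s in shares.items()}

# vm-a keeps the same share value in both cases; only its neighbours change.
quiet_host = share_based_split(10_000, {"vm-a": 2_000, "vm-b": 1_000, "vm-c": 1_000})
busy_host = share_based_split(10_000, {"vm-a": 2_000, "vm-b": 4_000, "vm-c": 4_000})

print(quiet_host["vm-a"])  # 5000.0
print(busy_host["vm-a"])   # 2000.0
```

The same share value yields a different slice depending on the other VMs, whereas a Reservation or Limit in MHz would stay the same number regardless of neighbours.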


Reservation impacts the provider (e.g. ESXi) as that’s where the reservation takes place.

CPU: CPU Reservation is on demand. If the VM does not use the resource, then it does not come into play as far as the VM is concerned.
Accounting-wise, it does not impact CPU utilization metrics. Run, Used, Demand, and Usage do not include it. Their values will be 0 or near 0 if the Guest OS is not running.
RAM: Memory Reservation is permanent, hence it impacts the memory utilization metric. The Memory Consumed counter includes it even though the page is not actually consumed yet. If you power on a 16 GB RAM VM into a BIOS state, and it has a 10 GB Memory Reservation, the VM Consumed memory counter will jump to 10 GB. It has not actually consumed the 10 GB, but since ESXi has reserved the space, it is not available to other VMs.
If it’s not yet used, then it does not take effect, meaning the ESXi Host does not allocate any physical RAM to the VM. However, once a VM asks for memory and it is served, the physical RAM is reserved. From then on, ESXi continues reserving the physical RAM even though the VM is no longer using it. In a sense, the page is locked despite the VM being idle for days.

Limit should not be used as it’s not visible to the Guest OS. The result is unpredictable and could create a worse
performance problem than reducing the VM configuration. For CPU, it impacts the CPU Ready counter. For RAM, in
the VMX file, this is sched.mem.max.
Reservation, Share and Limit are relatively static. They do not fluctuate unless they are manually changed. Hence,
they behave more like a property than a metric.

Entitlement
Entitlement means what the VM is entitled to. It's a dynamic value determined by the hypervisor. It varies every
second, determined by Limit, Shares and Reservation of the VM itself and any shared allocation with other VMs
running on the same host. For Shares, it certainly must consider shares of other VMs running on the same host. A
VM can’t use more than what ESXi entitles it.

Obviously, a VM can only use what it is entitled to at any given point of time, so the Usage counter cannot go higher
than the Entitlement counter.
In a healthy environment, the ESXi host has enough resources to meet the demands of all the VMs on it with
sufficient overhead. In this case, you will see that the Entitlement and Usage metrics will be similar to one another
when the VM is highly utilized.
The numerical value may not be identical because of reporting technique. vCenter reports Usage in percentage, and
it is an average value of the sample period. vCenter reports Entitlement in MHz and it takes the latest value in the
sample period. This also explains why you may see Usage a bit higher than Entitlement in highly-utilized vCPU. If the
VM has low utilization, you will see the Entitlement counter is much higher than Usage.

Collection | Aggregation
Before we cover the metrics, you need to know the various units and how they get
 • collected within a collection period (e.g. 20 seconds)
 • rolled up across time (e.g. from 20 seconds to 5 minutes)
 • aggregated to a higher-level object (e.g. from ESXi to Cluster)

Collection
When you collect a metric you have a choice on what to collect:
1. Collect the data at that point in time.
2. Collect the average of all the data within the collection cycle.
3. Collect the maximum (or minimum) of all the data within the collection cycle.
The 1st choice is the least ideal, as you miss the majority of the data. For example, if you collect every 5 minutes, that means you collect the data of the 300th second and miss 299 seconds’ worth of data points. Unfortunately, many software products have chosen this option.
The 2nd choice gives you the complete picture, as no data is missing. The limitation is your collection interval can’t be too long for the use case you’re interested in.
The 3rd choice complements the 2nd choice by picking the worst value. That means you need 2 numbers per metric for certain use cases.
As you collect regularly, you also need to decide whether you reset to 0 or continue from the previous cycle. Most metrics reset to 0 as accumulation is less useful in operations.
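To make the 3 collection choices concrete, here is a minimal Python sketch. The per-second values and the function name are made up for illustration; they are not taken from vCenter. It reduces one 20-second cycle to a single number using each choice.

```python
from statistics import mean

def collect(samples, method):
    """Reduce the per-second samples of one collection cycle to one number."""
    if method == "latest":      # choice 1: the point in time at the end of the cycle
        return samples[-1]
    if method == "average":     # choice 2: average of all the data in the cycle
        return mean(samples)
    if method == "maximum":     # choice 3: the worst value in the cycle
        return max(samples)
    raise ValueError(f"unknown method: {method}")

# Hypothetical cycle: utilization (%) for each of 20 seconds, with one short spike.
cycle = [10] * 20
cycle[7] = 95

for method in ("latest", "average", "maximum"):
    print(method, collect(cycle, method))
```

With this data, the point-in-time value never sees the spike, the average dilutes it, and only the maximum keeps it. That is why the 2nd and 3rd choices complement each other.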
Let’s take a look at what you see at vCenter UI, when you open the performance dialog box. What do the columns
Rollups and Stat Type mean?

Stat Type explains the nature of the metric. There are 3 types:

Delta: The value is derived from a running counter that perpetually accumulates over time. What you see is the difference between 2 points in time. As a result, all the metrics with milliseconds as the unit are of delta type.
Rate: The value measures the rate of change, such as throughput per second. Rate is always the average across the 20-second period.
Absolute: The value is a standalone number, not relative to other numbers. Absolute can be the latest value at the 20th second or the average value across the 20-second period.

Some common units are milliseconds, MHz, percent, KBps, and KB.
Metrics in MHz are more complex as you need to compare them with the ESXi physical CPU static frequency. In large environments, this can be operationally difficult as you have ESXi hosts from different generations or with different clock speeds. This is one of the reasons why I see the vSphere cluster as the smallest logical building block. If your cluster has ESXi hosts with different frequencies, these MHz-based metrics can be difficult to use, as the VMs get vMotion-ed by DRS.

Why Milliseconds as Unit?


vSphere uses 3 types of units for CPU: millisecond, MHz and %.
Of the 3, the millisecond is the source. Time is the raw unit, meaning both the percentage unit and the MHz unit are derived from it, because they are expressed as the average/minimum/maximum over time. When we see that the CPU demand is 2 GHz at 9:00:00 am, what vSphere likely means is the average over the previous collection period. It is not a point in time.
Time as a unit to measure CPU utilization does not seem logical. Where does it come from and why?
Hint: the stat type is Delta.
To answer that, we need to see it from the ESXi VMkernel scheduler point of view. Think in terms of the passage of time and the amount of CPU cycles that get completed during that time. A CPU core running at 2 GHz will complete 2x the CPU cycles of a core running at 1 GHz. The same goes for Hyper-Threading. You get fewer cycles completed when there is a peer thread competing at the same time.
What you think of as utilization, usage, demand or used is easier to understand once you make the small paradigm shift of seeing them as cycles.
Let’s take VM CPU Ready. The following is taken from the ESXi vsish¹ command. It shows that the original, raw counter is actually a running number. To calculate the CPU Ready of a given time period, we need to subtract the first number from the last number. To convert to percentage, we divide by the collection interval, which is 20000 ms in the screenshot.

¹ Pronounced as V S I S H, not vSish. It stands for VMkernel System Information Shell.

In the above, the slightly different values are due to the different start and end times of the sample intervals.
I’ll take another example, to show that the original unit is time (microsecond, not millisecond).
/sched/groups/169890525/stats/cpuStatsDir/> cat /sched/groups/169890525/stats/cpuStatsDir/cpuStats
group CpuStats {
number of vsmps:7
size:19
used-time:905379300543 usec
latency-stats:latency-stats {
cpu-latency:798578245914 usec
memory-latency:memory-latency {
swap-fault-time:0 usec
swap-fault-count:0
compress-fault-time:0 usec
compress-fault-count:0
mem-fault-time:17939139 usec
mem-fault-count:3834600
}
network-latency:0 usec
storage-latency:0 usec

In the vSphere UI and API, the unit of the CPU Latency counter is percentage. But in the above, you can see that its true unit is microseconds.
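The conversion from a running counter to a percentage can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above. The function name is hypothetical, and the second reading is an assumed value, not real vsish output.

```python
def delta_to_percent(first_usec, last_usec, interval_ms=20_000):
    """Turn 2 readings of a cumulative time counter (microseconds) into a percentage of the interval."""
    delta_ms = (last_usec - first_usec) / 1000   # microseconds -> milliseconds
    return delta_ms * 100 / interval_ms

# First reading: the cpu-latency value from the listing above.
first = 798_578_245_914
# Second reading, 20 seconds later: assumed to have grown by 1,000,000 usec (1 second).
last = first + 1_000_000

print(delta_to_percent(first, last))   # 1000 ms out of 20000 ms
```

The raw numbers are huge because they accumulate over the life of the group. Only the difference between 2 readings is meaningful, which is why the stat type is Delta.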

Summation
The Rollups column tells you how the data is rolled up to a longer time period. Average means the average of 5 minutes in the case of vRealize Operations. What about Summation? Why does the number keep going up as you roll up?
It is actually an average, used for those metrics where accumulation makes more sense. Let’s take an example. CPU Ready Time gets accumulated over the sampling period. vCenter reports metrics every 20 seconds, which is 20000 milliseconds. The following table shows a VM that has a different CPU Ready Time on each second. It has 900 ms CPU Ready on the 5th and 6th second, but lower numbers on the remaining 18 seconds.

Over a period of 20 seconds, a VM may accumulate different CPU Ready Time for each second. vCenter sums all
these numbers, then divides it by 20,000. This is actually an average, as you lose the peak within the period.
Latest, on the other hand, is different. It takes the last value of the sampling period. For example, in the 20-second
sampling, it takes the value between 19th and 20th seconds. This value can be lower or higher than the average of the
entire 20 seconds period. Latest is less popular compared with average as you miss 95% of the data.
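Here is a small Python sketch of the difference between Summation and Latest. The 900 ms values on the 5th and 6th second come from the example above. The 50 ms used for the other 18 seconds is an assumption, chosen only to have something to add up.

```python
INTERVAL_MS = 20_000   # one 20-second sample

# CPU Ready (ms) accumulated in each second of the sample.
ready_ms = [50] * 20               # assumed value for the quiet seconds
ready_ms[4] = ready_ms[5] = 900    # the 5th and 6th second

summation = sum(ready_ms)                          # what a Summation rollup reports (ms)
average_pct = summation * 100 / INTERVAL_MS        # the same number as % of the sample
peak_second_pct = max(ready_ms) * 100 / 1000       # worst single second, lost in the rollup
latest_pct = ready_ms[-1] * 100 / 1000             # only the 20th second

print(summation, average_pct, peak_second_pct, latest_pct)
```

The summation of 2700 ms becomes 13.5% of the sample, while the worst second was at 90% and the latest second at 5%. Neither the average nor the latest value shows the spike.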
Rolling up from 20 seconds to 5 minutes or higher results in further averaging, regardless of whether the rollup technique is summation or average. This is the reason why it is better to use Aria Operations than vCenter for data older than 1 day, as vCenter averages the data further, into a 30-minute average.
The Collection Level in vCenter is shown in the following table.

Statistics Levels and their metrics:

Level 1
Cluster Services (VMware Distributed Resource Scheduler) – all metrics
CPU – entitlement, total MHz, usage (average), usage MHz
Disk – capacity, max Total Latency, provisioned, unshared, usage (average), used
Memory – consumed, mem entitlement, overhead, swap in Rate, swap out Rate, swap used, total MB, usage (average), balloon, total bandwidth (DRAM or PMem)
Network – usage (average), IPv6
System – heartbeat, uptime
VM Operations – num Change datastore, num Change Host, num Change Host datastore

Level 2
Level 1 metrics, plus the following:
CPU – idle, reserved Capacity
Disk – All metrics, excluding number Read and number Write.
Memory – All metrics, excluding Used, maximum and minimum rollup values, read or write latency (DRAM or PMem).
VM Operations – All metrics

Level 3
Level 2 metrics, plus the following:
All metrics, excluding minimum and maximum rollup values.
Device metrics

Level 4
All metrics, including minimum and maximum rollup values.

Real Time Collection


Do we really need real-time collection and analysis for every single metric, on every single object, 24 x 7?
We collect the metrics for a reason, such as performance and capacity. The reasons dictate the frequency for each type of metric.
Take note that how frequently you collect is not the same as how granular the data points are. For example, Aria Operations collects every 5 minutes by default from vCenter, but it grabs 15 data points in 1 collection cycle. For the majority of the data, it averages these 15 data points and stores the result as 1 number.

Use Case | Collection Point | Collection Frequency
Performance: Profiling | 1 – 20 seconds for all counters | 1 – 20 seconds
Performance: Troubleshooting | 1 – 20 seconds for raw contention, 5 minutes for everything else. More explanation after this table. | 5 minutes for both
Performance: SLA | 5 minutes. Why SLA differs from troubleshooting and why 5 minutes is the sweet spot is covered here. | 5 minutes
Capacity, Cost, Compliance, Sustainability, Inventory | 15 minutes for all. Value is the average over 15 minutes, not the peak. Functionally, you do not need 15-minute granularity. Operationally, it’s safer to do 15 minutes. If there is a collection failure, either due to the collector or the target system, you only lose 15 minutes’ worth of data.

Performance Troubleshooting
For troubleshooting, you want per-second data. Who does not want sharper visibility? However, there are potential problems:
1. It may not be possible. The system you’re monitoring may not be able to produce the data, or it comes with a capacity or performance penalty.
2. It’s expensive. Your monitoring system might grow to be as large as the systems being monitored. You could be better off spending the money on buying more hardware, preventing the problem to begin with.
3. You get diminishing returns. The first data point is the most valuable. Subsequent data points are less valuable if they are not providing new information.
4. The remediation action is likely the same, as there are only a handful of things you can do to fix the problem. The number of possible problems far exceeds the number of available solutions.
So what can you do instead?
Begin with the end in mind. Look at the solution (e.g. add hardware, change some settings) and ask what metrics are required. For each required metric, ask what granularity is required.
I find that 1 – 20 seconds is only required for the contention-type metrics. For utilization-type and contextual-type metrics, I think 5 minutes is enough. You need higher resolution when the contention-type metrics do not exist. For example, there is no metric for network latency and packet retransmit at VM level. All you have is packet dropped. To address the missing metrics, use utilization metrics such as packets per second and network throughput.
We’ve done the analysis of what metrics are required and documented them here.

Units
Before we cover aggregation, we need to clarify units, as aggregation often uses a different unit.

1000 vs 1024
There is confusion between 1024 and 1000. Is 1 gigabyte = 1024 megabytes or 1000 megabytes?
Is 1 gigabit = 1000 megabits or 1024 megabits?
The answer is 1000. Both sides use the same base unit (byte or bit), so the only change is the prefix, from giga to mega. The following screenshot is taken from Google.

However, many products from many vendors use the binary conversion instead of decimal. This is one of those issues where what’s popular in practice differs from what it should be in theory.
To add further confusion, storage and network hardware vendors are consistent in using 1000, while the operating system is not.
Microsoft Windows uses 1024 for storage. My 1,000,202,039,296-byte physical SSD is shown as 931 GB, not 1024 GB.

The disk vendor states that it is a 1 TB disk.


Another vendor, Samsung, also uses 1000. It states the SSD as 250 GB. Microsoft shows 232.87 GB, including hidden partitions.

Kilo vs Kibi
To address the confusion, the committee at International System of Quantities came up with a new set of names for the binary units. Instead of kilo, mega, giga, they use kibi, mebi and gibi.
I find it confusing to drop familiar terms like kilo, mega and giga. Personally I’d have preferred kilobi, megabi and gigabi as they show the relationship to the commonly known units. Or if you want to emphasize the binary nature, perhaps kilo2byte, mega2byte, giga2byte as the names.
Let’s take an example:
 • 1 Kibibyte = 1024 bytes. That means 1 Kibibyte = 1.024 KB.
 • 1 Gibibyte = 1024 Mebibytes = 1,073,741,824 bytes
The abbreviation also changes from K, M, G to Ki, Mi, Gi, where the letter i is lowercase.
Note the conversion from byte to bit remains the same. 1 byte = 8 bits.
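The SSD example is easy to verify with a few lines of Python. This is just the arithmetic, using the byte count quoted earlier. The function names are mine.

```python
def to_decimal_gb(n_bytes):
    """Gigabytes, where 1 GB = 1000^3 bytes (what the disk vendor uses)."""
    return n_bytes / 1000**3

def to_gib(n_bytes):
    """Gibibytes, where 1 GiB = 1024^3 bytes (what Windows labels as GB)."""
    return n_bytes / 1024**3

ssd_bytes = 1_000_202_039_296            # the "1 TB" SSD from the example above

print(f"{to_decimal_gb(ssd_bytes):.2f} GB")    # decimal view
print(f"{to_gib(ssd_bytes):.2f} GiB")          # binary view
print(1024 / 1000)                             # 1 KiB expressed in KB
```

The same disk is roughly 1000 GB in decimal and roughly 931 GiB in binary. Nothing is lost. Only the unit changes.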

Bit vs Byte
Do you use byte/second or bit/second?
To me, it depends on the context. If you talk about disk space, you should use byte. You measure the amount of disk space read or written per second. If you talk about a network line, you should use bit. You measure the amount of SCSI blocks travelling inside the Ethernet or FC cable. Pearson uses 1024 for disk space, and 1000 for transmission speed, in their certification. There are other references, such as gbmb.org, NIST, and Lyberty. In short, there is really no standard.
The following is network transmit. It’s showing 30.81 MBps. So this is a rate, showing bandwidth consumption or network speed.

What would it show if you convert into KBps?


30810, if it uses 1000.

Since vRealize treats 1 Mega = 1024 Kilo, the above is what you get.
Since it’s network, let’s convert into bits.
What do you expect to get in Mbps?

You get 31,553.13 x 8 bits / 1024 = 246.51 Mbps
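The conversions above can be checked with the following Python sketch. The function names are hypothetical. Note that 30.81 x 1024 gives 31,549.44 rather than 31,553.13. My assumption is that the 30.81 MBps shown in the UI is a rounded value, so the two do not match exactly.

```python
def mbytes_to_kbytes(mbytes_per_sec, kilo):
    """MBps -> KBps, using either 1000 or 1024 as the multiplier."""
    return mbytes_per_sec * kilo

def kbytes_to_mbits(kbytes_per_sec, kilo=1024):
    """KBps -> Mbps: multiply by 8 bits per byte, then divide by the kilo multiplier."""
    return kbytes_per_sec * 8 / kilo

print(round(mbytes_to_kbytes(30.81, 1000)))       # decimal conversion
print(round(mbytes_to_kbytes(30.81, 1024), 2))    # binary conversion of the rounded value
print(round(kbytes_to_mbits(31_553.13), 2))       # the KBps value used in the calculation above
```

If you use 1000 instead of 1024 in the last step, you get about 252.43 Mbps, which is why you need to know which multiplier a product uses.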

Aggregation
Aggregating to a higher-level object is complex as there is no lossless solution. You are trying to represent a range of values by picking 1 value among them, so you tend to lose the details. The choices of techniques are mean, median, maximum, minimum, percentile, sum and count. The default technique used is the average() function.
The problem with average is that it masks problems unless they are widespread. By the time the average performance of 1000 VMs is bad, you likely have a hundred VMs in bad shape.
Let’s take an example. The following table shows ESXi hosts. The first host has CPU Ready of 149,116.33 ms. Is that a
bad number?

It is hard to conclude. It depends on the number of running vCPU, not the number of physical cores.
That host has 67 running VMs, and each of those VMs can have multiple vCPUs. In total there are 195 vCPUs. Each vCPU could potentially experience CPU Ready of 20,000 ms (which is the worst possible scenario).
If you sum the CPU Ready of the 67 VMs, what number would you get?

You’re right, you get the same number reported by the ESXi host.
This means the ESXi CPU Ready = Sum (VM CPU Ready), and the VM CPU Ready = Sum (VM vCPU Ready).
Because it’s a summation across the VMs, converting it into % requires you to divide by the number of running VM vCPUs and by the 20,000 ms collection period.
ESXi CPU Ready (%) = ESXi CPU Ready (ms) / (Sum (vCPU of running VMs) x 20,000 ms) x 100
Is the CPU Ready equally distributed among the VMs? What do you think?
It depends on many settings, so there is a good chance you get something like the following. This heat map shows the 67 VMs on the above host, colored by CPU Ready and sized by VM CPU configuration. You can see that the larger VMs tend to have higher CPU Ready, as they have more vCPUs.
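As a quick check of the host in the example, the following Python sketch converts the summed CPU Ready into a percentage. It assumes the 20,000 ms collection period and uses the 195 running vCPUs mentioned above. The function name is mine.

```python
def esxi_ready_percent(host_ready_ms, running_vcpus, interval_ms=20_000):
    """ESXi CPU Ready (%) = summed ready time / worst possible ready time of all running vCPUs."""
    worst_case_ms = running_vcpus * interval_ms   # every vCPU in Ready for the whole period
    return host_ready_ms * 100 / worst_case_ms

print(round(esxi_ready_percent(149_116.33, 195), 2))
```

So the 149,116.33 ms that looked alarming is under 4% on average across the 195 vCPUs. Being an average, it still does not tell you how the Ready time is distributed among the VMs.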

“Peak” Utilization
One common requirement is the need to monitor for peak. Be careful in defining what peak actually is, as by default,
averages get in the way.
How do you define peak utilization or contention without being overly conservative or aggressive?
There are two dimensions of peaks. You can measure them across time or across members of the group.
Let's take a cluster with 8 ESXi hosts as an example. The following chart shows the 8 ESXi utilizations.
What’s the cluster peak utilization on that day?

The problem with this question is there are 1440 minutes in a day, so each ESXi host has 288 data points per metric (based on the 5-minute reporting period). So this cluster has 288 x 8 = 2304 data points on that day. A true peak has to be the highest among these 2304 data points.

To get this true peak, you need to measure across members of the group. For each sample data, take the utilization
from the host with the highest utilization. In our cluster example, at 9:05 am, host number 1 has the highest
utilization among all hosts. Let’s say it hit 99%. We then take it that the cluster peak utilization at 9:05 am is also
99%.
You repeat this process for each sample period (e.g. 9:10 am, 9:15 am). You may get different hosts at different
times. You will not know which host provides the peak value as that varies from time to time.
What’s the problem of this true peak?
Yup, it might be too sensitive. All it takes is 1 number out of 2304 data points. If you want to ignore the outliers, you need to use percentile. For example, if you use the 99th percentile, it will remove the highest ~23 data points.
Take note that the most common approach is to take the average utilization among all the 8 ESXi hosts in the cluster. So you lose the true peak, as each data point becomes an average. For the cluster to hit 80% average utilization, at least 1 ESXi host must have hit 80% or more. That means you can't rule out the possibility that one host might hit near 100%.
The same logic applies to a VM. If a VM with 64 vCPUs hits 90% utilization, some vCPUs probably hit 100%. This method results in under-reporting as it takes an average of the “members” at any given moment, then takes the peak across time (e.g. last 24 hours).
This “averaging issue” exists basically everywhere in monitoring, as it’s the default technique when rolling up. For a
more in-depth reading, look at this analysis by Tyler Treat.
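The 2 ways of defining peak can be compared with a small Python sketch. The utilization numbers below are invented and use 4 hosts and 3 sample periods to keep it short. The percentile helper uses a simple nearest-rank count, which is only one of several percentile definitions.

```python
import math
from statistics import mean

# Hypothetical host utilization (%) per 5-minute sample: {time: [host 1, host 2, ...]}
samples = {
    "09:00": [40, 45, 50, 42],
    "09:05": [99, 40, 41, 40],
    "09:10": [60, 62, 58, 61],
}

def true_peak(samples):
    """Highest member at each sample, then the highest across time."""
    return max(max(hosts) for hosts in samples.values())

def peak_of_average(samples):
    """Average of the members at each sample, then the highest across time."""
    return max(mean(hosts) for hosts in samples.values())

def points_above_percentile(total_points, percentile):
    """Number of data points that sit above the given percentile (nearest rank)."""
    return total_points - math.ceil(total_points * percentile / 100)

print(true_peak(samples), peak_of_average(samples))
print(points_above_percentile(2304, 99))
```

The true peak is 99%, coming from host 1 at 09:05. The peak of the averages is only 60.25%, and it comes from a different sample period. That gap is the under-reporting described above.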

Performance Metrics
Can performance of a complex system be quantified?

For example, what is the performance of vSphere? How do you define the performance of a large system such as NSX, Horizon or Kubernetes? Quantifying something complex with many components is difficult. It’s like trying to figure out the inflation rate of a country. It’s impossible to have a Consumer Price Index that properly represents the economy as different individuals have different baskets of goods. Even if we could develop the basket for each individual, that basket changes each year, rendering comparison with the previous year invalid.
With the above challenge in mind, we developed a model for a performance index.
After years of trials and improvements, I’m happy to share that we can define performance as a metric. This means you can have a performance metric for any object, such as vSphere Cluster Performance (%) and Kubernetes Node Performance (%).
For ease of use, we will simply call it Performance (%) instead of KPI.

Calculation
Performance is defined as 0 – 100%, where 100% means the best possible performance. 0% means it’s at your worst expectation, not the absolute slowest possible. For example, if 40 ms is the worst disk latency you can tolerate, then the value will turn to 0% when disk latency hits 40 ms.
We use 4 colors, so we can divide 100% into 4 equal parts. So Green is simply 75% - 100% and Red is simply 0% - 25%. This is more natural than dividing into 3, where you end up with odd numbers such as 33.33% and 66.67%.
The other advantage is it gives you a leading indicator (shown as yellow).

Why don’t we make green 95% - 100%? 75% for green sounds rather bad or low.
My answer is if you create an unequal distribution, some bands will have to be narrower than others. With uneven bands, you also need to be extra careful when defining the threshold for each metric that makes up the KPI. I made the 4 bands equal, so the thresholds are easier to set.
Making the threshold easy to set is critical. As you design your KPI, you will realize that the threshold is the hardest part. In fact, there are times when I drop a metric as I do not feel comfortable with the threshold.
The following KPI uses 4 metrics as its input. Each metric has a set of thresholds for green, yellow, orange and red.

Now that we have the threshold for each metric, we can convert each metric into Green – Red. The model is also
able to handle when the entire range is defined by a single number. This is useful when you want to define green = 0.
That means a single packet loss will put the metric into the yellow range already.
What if anything above 0 is red?
You simply set 0 for green, yellow and orange. Within the red zone, you can set 0 – 1, or 0 to something.

Translation
How do we translate a row?
Let’s use an example. Take the Disk Latency (%) metric. It has range from 0 to 40 ms, which maps into the 0 – 100%
using the following mapping table.

With the above mapping, we can be precise in assigning the value. For example:
 • 9 millisecond disk latency translates into a KPI value of 77.5%, which is green. The reason is green ranges from 75% to 100%, where 0 ms equals 100% and 10 ms equals 75%. So each millisecond is worth 2.5%.
 • 42 millisecond disk latency translates into 0%. It is above the upper threshold of 40 milliseconds. Since we do not show negative values, anything above the limit is shown as 0%.
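The translation can be sketched in Python as a piecewise linear mapping. Only the green band (0 to 10 ms) and the 40 ms upper limit come from the examples above. The 20 ms and 30 ms boundaries for yellow and orange are my assumptions, as is the function name. Treat this as an illustration, not the actual super metric.

```python
def latency_to_kpi(latency_ms, thresholds=(0, 10, 20, 30, 40)):
    """Map a latency onto 100% .. 0%, with 25 points per color band, linear inside each band."""
    if latency_ms <= thresholds[0]:
        return 100.0
    if latency_ms >= thresholds[-1]:
        return 0.0                      # no negative values: anything above the limit is 0%
    for band in range(4):               # 0 = green, 1 = yellow, 2 = orange, 3 = red
        low, high = thresholds[band], thresholds[band + 1]
        if latency_ms <= high:
            top_of_band = 100 - 25 * band
            return top_of_band - 25 * (latency_ms - low) / (high - low)

print(latency_to_kpi(9))    # inside the green band
print(latency_to_kpi(42))   # above the upper threshold
```

With these thresholds, 9 ms gives 77.5% and 42 ms gives 0%, matching the 2 examples above.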

Threshold Design
If you have many metrics that make up the KPI, and one of them is red but the rest are all green, the overall KPI value may not reveal that there is a problem. That single red does not have enough weight to bring down the rest.
So how do we solve it?
Enter progressive weightage.

We assign weightage so that Yellow is 2x Green, Orange is 2x Yellow and Red is 2x Orange. Mathematically, a single red has equal weightage with 8 greens. The following table shows that 1 perfect red and 8 perfect greens will result in a score of 50.

That also means that if you have 1 perfect red, and your greens are not perfect, you can expect your value to be in the orange category.
This relative weightage plays a key role in determining the threshold. Try to set the actual threshold values so they also correspond to 1x 2x 4x 8x. For example, set the VM disk latency so it goes up from 20 ms → 40 ms → 80 ms → 160 ms. Notice they always double.

Note that this method does not replace assigning different weightage to each metric. You can still do that.
See this for the actual implementation in super metric, as it’s pretty complex.
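A minimal Python sketch of progressive weightage follows. It assumes the KPI is a weighted average where each input is weighted 1, 2, 4 or 8 according to its own color band. That is my reading of the description above, not the actual super metric. The exact band boundaries (for example whether 75 counts as green) are also an assumption.

```python
WEIGHTS = {"green": 1, "yellow": 2, "orange": 4, "red": 8}

def color(score):
    """Color band of a 0 - 100 score, using 4 equal bands."""
    if score >= 75:
        return "green"
    if score >= 50:
        return "yellow"
    if score >= 25:
        return "orange"
    return "red"

def weighted_kpi(scores):
    """Weighted average where worse-colored inputs count progressively more."""
    weights = [WEIGHTS[color(s)] for s in scores]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

one_red_eight_greens = [0] + [100] * 8
print(weighted_kpi(one_red_eight_greens), color(weighted_kpi(one_red_eight_greens)))
```

With 1 perfect red (0%) and 8 perfect greens (100%), the red carries a weight of 8 against a combined weight of 8 for the greens, so the score lands at 50. A plain average of the same 9 inputs would be about 89, which would hide the red.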

Validation
Once you design the KPI for a specific object, always do a validation. This helps you validate whether the thresholds, weightage and metrics actually deliver the score that matches your expectation. Write down the common scenarios along with the expected values.

I got such a surprise on the result that I thought there was a bug in the formula. Remember that 1 red has the weight of 8 greens? So when I saw 3 reds and 9 greens, I expected the value to be in the red, which is below 25. But I got a low orange.
So let’s do some validation. I find testing the corner cases useful. So let’s see what value we get when we have 9 perfect greens and 3 worst reds. What value do you expect?
A simple, non-weighted average will give a value of 75. This is right on the border between green and yellow.
What color does the weighted score give us?

It gives us a low orange. It is not red, but close enough to be red. This is why the score is important too, not just the color.
What if your red is not the worst, but barely red? How many borderline reds (near 25%) are required before a perfect green (100%) shows as red?
The following table shows 1 perfect green score and 11 barely-red scores. What color do you get at the end?

Yup, you get orange, not red. It takes many red scores, which makes it practically impossible to get a red if each red is barely there. That’s why your red threshold needs to be 2x your orange threshold. If you make it too big, you will get barely-red in most cases.
In an actual environment, you certainly do not want to see red, even in a development environment. Each VM will have its own score, but overall you want to see a majority of green. Use a heat map to show them, as it will automatically order them by the value.

System Architecture

We covered in the previous chapter that system architecture contributes to metric complexity.


Throughout this book, I cover the 4 elements of infrastructure in the sequence of CPU → Memory → Storage → Network.

CPU
What used to be Windows or Linux running on a server has transformed into Guest OS → VM → ESXi. The 3 distinct layers result in the complexity documented earlier. The good part is this is not as complex as memory, where you have 4 layers, as a process running inside a Guest OS represents another layer.
The following infographic² shows how the nature of CPU metrics changes as a result of virtualization.

² If you suspect that I can’t create a professional graphic like this, you are right! That was done by Abhishek Chouksey.

Specifically for CPU, we need to be aware of dynamic metrics. This means their values fluctuate depending on CPU clock speed and the HT effect. As a result, the values are harder to interpret due to the lack of observability into the fluctuation. This would not be an issue if the range were negligible. It is not. For example, HT can increase the value of CPU Latency anywhere from 0% to 37.5%.

Guest OS vs VM
CPU metrics for a VM differ greatly from those in the Guest OS. For example, vCenter provides 5 metrics to account
for the utilization of VM CPU, yet none directly maps to Windows/Linux CPU utilization.
The following diagram shows some of the differences.

When the VMkernel de-schedules a VM vCPU to process something else (e.g. other VM, kernel interrupt) on the
same physical thread or core, the Guest OS does not know why it is interrupted. In fact, it experiences frozen time
for that particular vCPU running on the physical core. Time jumps when it’s scheduled again. Because of this unique
visibility, it’s important to use the correct metrics at the correct layers.
On the other hand, ESXi cannot see how the Guest OS schedules its processes. ESXi can only see what’s being sent
out by the Guest.
Both layers need to be monitored, as each measures different performance problems. Hence it’s imperative to install VMware Tools. It reports the statistics about the Guest OS to the ESXi host every 20 seconds by default.
The following table shows that a direct mapping between Guest OS and VM metrics is not possible.

Type | Guest OS Metric | VM Metric
Contention | Run Queue, DPC Time, Context Switch | None. All these are internal operations of Windows or Linux.
 | C1 Time, C2 Time, C3 Time | None. ESXi does not break down per VM as it focuses on the physical core.
 | None | CPU Ready, CPU Co-Stop, CPU System
Utilization | Usage | Run – Overlap, if you think the Windows/Linux counter does not consider CPU frequency. Usage, if you think otherwise. I’m not sure which one as I need to do profiling and compare.

VM vs ESXi
Just as the Guest OS and the VM have different vantage points, the same complexity exists between VM and ESXi.
For a VM, you discuss vCPUs. A VM has virtual sockets and virtual cores. Physical cores and physical sockets do not apply to it, as they refer to different things.

State of a VM vCPU
ESXi Scheduler keeps in mind the following goals:
 • To balance load across physical cores.
 • To preserve cache state and minimize migration cost.
 • To avoid contention from hardware (hyperthreading, low-level cache, etc.) and sibling vCPUs (from the same VM).
 • To keep VMs or threads that communicate frequently close to each other.
With the above understanding, now look at the life of a single vCPU of a VM.
At the most basic level, a VM CPU is either being utilized or not being utilized by the Guest OS. At any given moment, it either runs or it does not; there is no walk state 😉

Being used: The hypervisor must schedule the vCPU. A multi-vCPU VM has multiple schedules, 1 for each vCPU. For each vCPU:
 • If VMkernel has the physical CPUs to run it, then the vCPU gets to Run. The Run counter is increased to track this.
 • If VMkernel has no physical CPUs to run it, then the vCPU is placed into Ready State. The VM is ready, but the hypervisor is not. The Ready counter tracks this.
Not being used: There are 2 possible reasons why it’s not used:
 • The CPU is truly idle. It’s not doing any work. The Idle Wait counter accounts for it.
 • The CPU is waiting for IO. The CPU, being faster than RAM, waits for the data to be brought in. There are 3 sub cases here (Co-stop, Other Wait and memory wait), and they will be covered later.

With the above understanding, we’re ready to examine the following state diagram. The diagram shows a single
schedule (1 vCPU, not the whole VM). It’s showing the view from hypervisor (not from inside the Guest OS):

ESXi places each vCPU of the VM in one of the 4 above states. A vCPU cannot be in 2 states at the same time. This is fundamental to understanding the formulas behind CPU metrics.
 • Run does not check how fast it runs (frequency) or how efficiently it runs (hyperthreading). Run measures how long it runs, hence the counter is in milliseconds, not GHz.
 • Ready and Co-stop are mutually exclusive states. If a vCPU is in Co-stop, it is not in Ready state.
 • Wait handles both Idle and Wait. The reason is the hypervisor cannot tell whether the Guest OS is waiting for IO or idle. As far as the hypervisor is concerned, it’s not doing anything. This also measures the state where the wait is due to hypervisor IO.
Those of you familiar with Operating Systems3 kernel will notice that the diagram is similar with a physical OS
scheduler state diagram. In the following screenshot, I took Huawei Harmony OS as an example as it’s the newest OS
and it’s designed for a range of device4.

Init      The process is being created.
          Maps to New in VMkernel.
Ready     The process is in the ready list and waits to be scheduled by the CPU.
          Maps to Ready in VMkernel.
Running   Maps to Run in VMkernel.
Pending   The process is blocked and suspended. When all threads in a process are blocked, the process is blocked and suspended.
          Maps to Wait in VMkernel. Notice they also include Idle here in their Wait state.
Zombies   Maps to Zombies in VMkernel.
“none”    Our Co-stop is unique as a VM is a multi-process scheduled entity.

[3] Understanding how an OS works is paramount and well worth it. Here is a 3.5 hour lecture by Mike Murphy.
[4] Designing an OS for multiple hardware classes is hard. Notice Apple MacOS, iPhone OS, and iPad OS. Google has Android and ChromeOS.

Back to our VMkernel 4 possible states, you can conclude that:


Run + Ready + Co-stop + Wait = 100%

VM 2 can run when VM 1 is on Co-stop state, Ready state, or Wait state. This is because the physical thread is
available.

State across Time


The above is at any given moment. To measure over time and report it (say every 20 seconds), we need to add a
time dimension. The following example shows the above state diagram repeated over time, where each block is 1
second. In reality, the scheduler checks more frequently than this.

vCenter happens to use 20000 milliseconds as the reporting cycle, hence 20000 milliseconds = 100%.
The above visually shows why Ready (%) + Co-stop (%) needs to be seen in the context of Run. Ready at 5% is low when Run is at 95%. Ready at 2% is very high when Run is only 10%, because Ready is then 20% of Run: for every 5 seconds the vCPU ran, it spent another second waiting for a physical CPU.
The above is per vCPU. A VM with 24 vCPU will have 480,000 as the total. It matters not if the VM is configured with
1 vCPU 24 vCores or 24 vCPU with 1 vCore each.
You can prove the above by stacking up the 4 metrics over time. In this VM, the total is exactly 80000 ms as it has 4
vCPU. If you wonder why CPU Ready is so high, it’s a test VM where we artificially placed a limit.


The formula for the millisecond metrics in vRealize Operations is also not normalized by the number of vCPU. The following shows that the total adds up to 80000 as the VM has 4 vCPU.

This is why you should avoid using the millisecond counter. Use the % version instead as it has been normalized.
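The normalization described above can be sketched in a few lines. This is only an illustration, not the vCenter or vRealize Operations implementation: the function names and the sample values are made up, and the only facts taken from the text are the 20000 ms reporting cycle and the rule that the 4 states add up to 100% per vCPU.

```python
INTERVAL_MS = 20_000  # vCenter reporting cycle described above (20 seconds)


def to_percent(value_ms: float, num_vcpu: int, interval_ms: int = INTERVAL_MS) -> float:
    """Convert a VM-level millisecond counter into a percentage normalized per vCPU."""
    return value_ms / (interval_ms * num_vcpu) * 100.0


def ready_relative_to_run(run_pct: float, ready_pct: float) -> float:
    """Express Ready as a percentage of Run (run_pct must be greater than 0)."""
    return ready_pct / run_pct * 100.0


# Hypothetical 4 vCPU VM: the 4 states add up to 4 x 20000 = 80000 ms.
sample_ms = {"run": 30_000, "ready": 40_000, "costop": 2_000, "wait": 8_000}
sample_pct = {name: to_percent(ms, num_vcpu=4) for name, ms in sample_ms.items()}
total_pct = sum(sample_pct.values())

print(sample_pct)                     # run 37.5, ready 50.0, costop 2.5, wait 10.0
print(total_pct)                      # 100.0
print(ready_relative_to_run(95, 5))   # about 5.3: low
print(ready_relative_to_run(10, 2))   # 20.0: very high
```

The second function is the reason Ready should not be read in isolation: the same small Ready value means very different things depending on how much the vCPU actually ran.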

Simultaneous Multi-Threading
CPU SMT (Hyper Threading as Intel calls it) is known to deliver higher overall throughput. It increases the overall
throughput of the core, but at the expense of individual thread performance.
Accounting-wise, ESXi records this overall boost at 1.25x regardless of the actual increase, which may be less or more than 1.25x. That means if both threads are running at the same time, the core records 1.25x overall throughput but each thread only gets 62.5% of the shared physical core. This is a significant drop from the perspective of each VM.
From the perspective of each VM, it is better that the second thread is not being used, because the VM could then
get 100% performance instead of 62.5%. Because the drop could be significant, enabling the latency sensitivity
setting will result in a full core reservation. The CPU scheduler will not run any task on the second HT.
The following diagram shows 2 VMs sharing a single physical core. Each run on a thread of the shared core. There are
4 possible combinations of Run and Idle that can happen:


Each VM runs for half the time. The CPU Run counter = 50%, because it’s not aware of HT. But is that really what
each VM gets, since they have to fight for the same core?
The answer is obviously no. Hence the need for another counter that accounts for this. The diagram below shows
what VM A actually gets. The allocation is fixed.

The CPU Used counter takes this into account. In the first part, VM A only gets 62.5% as VM B is also running. In the
second part, VM A gets the full 100%. The total for the entire duration is 40.625%. CPU Used will report this number,
while CPU Run will report 50%.
If both threads are running all the time, guess what CPU Used and CPU Run will report?
62.5% and 100% respectively.
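The arithmetic behind these two answers can be reproduced with a small sketch. It models only the hyper-threading part of the accounting described above (the fixed 1.25x boost, hence 62.5% per thread when both threads run). It deliberately ignores clock frequency, System and Overlap, and the function and variable names are illustrative, not ESXi internals.

```python
HT_CORE_BOOST = 1.25                        # overall core throughput recorded when both threads run
SHARE_WHEN_CONTENDED = HT_CORE_BOOST / 2    # 0.625 of the core for each thread


def run_and_used(slots):
    """slots: list of (a_runs, b_runs) booleans, one per equal time slot.

    Returns (run_pct, used_pct) for VM A, where VM B runs on the sibling thread.
    """
    n = len(slots)
    run = sum(1 for a_runs, _ in slots if a_runs)
    used = sum(
        SHARE_WHEN_CONTENDED if b_runs else 1.0
        for a_runs, b_runs in slots
        if a_runs
    )
    return run / n * 100, used / n * 100


# The 4 combinations of Run and Idle from the diagram, each lasting a quarter of the time.
four_combinations = [(True, True), (True, False), (False, True), (False, False)]
print(run_and_used(four_combinations))      # (50.0, 40.625)

# Both threads busy all the time.
print(run_and_used([(True, True)] * 4))     # (100.0, 62.5)
```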

Quiz Time: VM vs ESXi


Review the following chart [5]. It shows a cluster with 2 metrics. The first counter sums all the ESXi CPU Usage, while the second counter sums all the VM CPU Usage.

[5] Courtesy of Hiroki Horikawa from the land of the rising sun.


Did you spot something that does not make sense?


They intertwine. How is that possible?
Clue: Notice the sum of VM is lower during lower utilization, but higher during high utilization.
During low utilization, the sum of VM is lower as it does not include VMkernel.
During high utilization, the sum of VM is higher due to hyper-threading. Each vCPU sees the full GHz, because the VM does run at that speed, albeit with less efficiency. At the physical core level, there is only 1 core running both threads, so ESXi uses a 1.25x multiplier while the VM uses a 2x multiplier.

Power Management
The 2nd factor that impacts CPU accounting is CPU clock speed. The higher the frequency (GHz), the faster the CPU runs. Ceteris paribus, a CPU that runs at 1 GHz is 50% slower than when it runs at 2 GHz. On the other hand, Turbo Mode can kick in and the CPU clock speed becomes higher than the stated frequency. Turbo Boost normally happens together with power saving on the same CPU socket. Some cores are put into sleep mode, and the power saved is used to boost other cores. The overall power envelope within the socket remains the same.
Each core can have its own frequency. This makes rolling up the number to ESXi level more complex. You can’t derive
one throughput counter from the other. Each has to be calculated independently at core level.

CPU Architecture
As CPU architecture moves towards System on a Chip design, it’s important not to assume that a CPU socket is a
simple and linear collection of cores. Take a 64-core AMD EPYC for example. It’s actually made of 8 Core Complex
Dies. From the following diagram (taken from page 5 on the AMD link above), you can see that a thread on CCD 0 is relatively close to a thread that runs on the same CCX, but far from a thread that runs on another CCD. You can see an example of the performance impact here.


Another consideration you need to be aware of is NUMA. NUMA Node = Socket / Package, as 1 socket can have >1
package (if you enable Cluster-on-Die feature of Intel Xeon).

CPU States
There are 2 types of power states as defined by ACPI standard.

C-State   When a core is idle, ESXi applies deep halt states, also known as C-states. The deeper the C-state, the less power the CPU uses, but the longer it takes for the CPU to start running again. ESXi predicts the idle state duration and chooses an appropriate C-state to enter.
          There are 3 possible sub-states in C-state:
          C0 = fully running. Within this C0 state, there is a further dimension called P-State.
          C1 = a shallow state where the clock is gated (switched off). However, all the modules remain active and powered on, and the processor can go back to the active C0 state instantaneously. In power management policies that do not use deep C-states, ESXi stops at C1.
          C2 - Cn = varying degrees of CPU sections turned off. The higher the C-state, the deeper the sleep.


P-State   There are 14 grades of CPU performance, measured by frequency. You can see all the frequencies in esxtop if your hardware supports it.
          P0 is the state where Turbo Boost happens.
          P1 is where the CPU runs at Nominal Frequency (NF).
          P13 is the lowest CPU frequency.

For details on P-State and C-State, see Valentin Bondzio and Mark Achtemichuk, VMworld 2017, Extreme
Performance Series.

Impact on Performance
How high can Turbo/Boost go?
It turns out that it is high enough that your performance and capacity need to account for it.
The following diagram is taken from page 12 of the “Host Power Management in VMware vSphere 7.0” whitepaper by Ranjan Hebbar and Praveen Yedlapalli. It shows that an Intel Xeon Platinum 8260 can increase its speed by 1.29x (from 2.4 GHz to 3.1 GHz). If it only needs to boost 1 core, that single core can go up by 1.62x. This will be noticeable to CPU-intensive applications. Consider this benefit before you decide to disable power management. The High Performance setting is static: the CPU runs at the same frequency throughout.


Viewing the Impact


Let’s say a physical chip comes with 2 GHz as its standard speed. If ESXi increases the clock speed to 3 GHz, Used
counter will be 50% higher than the Run counter. The Guest OS (e.g. Windows or Linux) is not aware of this
performance boost. It reports a value based on the standard clock speed, just like Run does. On the other hand, if
ESXi decreases the clock speed to 1.5 GHz, then Used will report a value that is 25% lower than what Run reports.
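The relationship in this paragraph can be written down as a one-line scaling. This is a simplified sketch that considers clock speed only. It assumes no hyper-threading penalty and zero System and Overlap time, and the function name is made up for illustration.

```python
def used_from_run(run_ms: float, actual_ghz: float, nominal_ghz: float) -> float:
    """Scale Run by the ratio of actual to nominal clock speed (frequency effect only)."""
    return run_ms * actual_ghz / nominal_ghz


run_ms = 10_000  # hypothetical Run value in one 20-second cycle

boosted = used_from_run(run_ms, actual_ghz=3.0, nominal_ghz=2.0)
print(boosted)   # 15000.0, which is 50% higher than Run

saving = used_from_run(run_ms, actual_ghz=1.5, nominal_ghz=2.0)
print(saving)    # 7500.0, which is 25% lower than Run
```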
Let’s take an example. What do you notice?

As you can see from the preceding chart, the impact is noticeable. The System and Overlap metrics average <10 ms (negligible as this VM is basically idle), but Used is ~20% higher than Run, likely due to Turbo Boost.
Let’s take another example, this time from a busy VM. I’ve removed System and Overlap as they are also negligible in
this example. This is a 32 vCPU VM running Splunk. Notice Used is consistently higher than Run.

Now let’s look at the opposite scenario. This VM is a 64-bit Ubuntu running 4 vCPU. Used (ms) is around 44% of Run (ms). The VM had minimal System Time (ms) and Overlap (ms), so Used is basically lowered by both power savings and CPU SMT. In this example, if Run is far from 100% and the application team wants faster performance, your answer is not to add vCPU. You should check the power management and CPU SMT, assuming the contention metrics are low.


Does it mean we should always set power management to maximum?


No. ESXi uses power management to save power without impacting performance. A VM running at a lower clock speed does not necessarily get less done. You only set it to high performance for latency-sensitive applications, where sub-second performance matters. VDI, VoIP, video calling, and Telco NFV are some examples that are best experienced with low latency.

Memory
Let's now take a trip down memory lane, pun intended.
Memory differs from CPU as it is a form of storage.
• CPU is highly transient in nature. Instructions enter and leave the execution pipelines in less than a nanosecond.
• Memory is a lot more stable. We are comparing nanoseconds to seconds (or longer, up to months, depending upon the uptime of your VM).
As a form of storage, memory is basically a collection of blocks on physical DIMMs. Information is stored in memory in standard block sizes, typically 4 KB or 2 MB. Each block is called a page. At the lowest level, the memory pages are just a series of zeroes and ones. MS Windows initializes its pages with 0, hence there is a zero-page counter in ESXi.
Keeping this concept in mind is critical as you review the memory metrics. The storage nature of memory is the
reason why memory monitoring is more challenging than CPU monitoring. Unlike CPU, memory has 2 dimensions:

Speed   Nanoseconds   The only counter ESXi has is Memory Latency. This counter increases when the time to read from the RAM is longer than usual. The counter tracks the percentage of memory space that’s taking longer than expected. It’s not tracking the actual latency in nanoseconds.
                      This is the opposite of Disk, where we track the actual latency, but not the percentage of space that is facing latency.
                      Both are storage, but “server people” and “storage people” measure them differently 😊
Space   Bytes         This is the bulk of the metrics.


Virtual Memory
Before we talk about memory counter, we need to cover virtual memory, as it’s an integral part of memory
management. The following shows how Windows or Linux masks the underlying physical memory from processes
running on the OS.

From the process’ point of view, this technique provides a contiguous address space, which makes memory
management easier. It also provides isolation, meaning process A can’t see the memory of process B. This isolation
provides some level of security. The isolation is not as good as isolation by container, which in turn is inferior to
isolation by VM.
Virtual Memory abstraction provides the possibility to overcommit. The machine may have 16 GB of physical RAM,
but by using pagefile the total memory available to its processes can exceed 16 GB. The process is unaware what is
backing its virtual address. It does not know whether a page is backed by Physical Memory or Swap File. All it
experiences is slowness, but it won’t know why as there is no counter at process level that can differentiate the
memory source.
On the other hand, some applications manage their own memory and do not expose it to the operating system. Examples of such applications are databases and the Java VM. Oleg Ulyanov shared in this blog that SQL Server has its own operating system called SQLOS. It handles memory and buffer management without communicating back to the underlying operating system.
With virtualization, VM object adds yet another layer.
If you add ESXi, we actually have 4 layers: Process → Guest OS → VM → ESXi.
The only layer that manages the actual physical memory is the last layer. IMHO, the term “Guest physical memory” is
illogical.
Each of these layers has its own address space. And that’s where the fun of performance troubleshooting begins.


From the VM’s point of view, it provides a contiguous address space and isolation (which is security). The underlying physical pages at ESXi layer may not be contiguous, as they are managed differently. The VM Monitor for each VM maps the VM pages to the ESXi pages [6]. This page mapping is not always 1:1. Multiple VM pages may point to the same ESXi page due to transparent page sharing. On the other hand, a VM page may not map to any ESXi page due to ballooning and swapping. The net effect is that the VM pages and ESXi pages (for that VM) will not be the same, hence we need two sets of metrics.

VM memory     Metrics track the VM pages. There are 2 sets: one for each VM, and one that is a summation at ESXi level for all running VMs. Do not confuse the summation with ESXi memory metrics.
              Examples: Granted or Memory Shared
ESXi memory   Metrics track the ESXi pages. There are also 2 sets, but the summation at ESXi level contains VMkernel’s own memory and VM overhead.
              Examples: Consumed or Memory Shared Common
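To see why counting from the VM side and counting from the ESXi side give different numbers, here is a toy page map. It is purely illustrative: the page numbers are invented, and the two counts are not the actual formulas behind Granted or Consumed. It only shows that sharing and ballooning or swapping make the two views diverge.

```python
# Hypothetical mapping: VM page number -> ESXi page number.
# None means the VM page has no ESXi page behind it (ballooned or swapped out).
vm_to_esxi = {
    0: 100,
    1: 101,   # pages 1, 2 and 3 hold identical content, so page sharing
    2: 101,   # collapses them onto one ESXi page
    3: 101,
    4: None,  # ballooned or swapped
    5: 102,
}

# VM viewpoint: how many VM pages are backed by an ESXi page?
backed_vm_pages = [vm_page for vm_page, esxi_page in vm_to_esxi.items() if esxi_page is not None]

# ESXi viewpoint: how many distinct ESXi pages does this VM occupy?
distinct_esxi_pages = {esxi_page for esxi_page in vm_to_esxi.values() if esxi_page is not None}

print(len(vm_to_esxi))            # 6 pages configured in this toy VM
print(len(backed_vm_pages))       # 5 pages as seen from the VM side
print(len(distinct_esxi_pages))   # 3 pages as seen from the ESXi side
```

Same VM, same moment, three different numbers. Which one is "right" depends on the vantage point.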

This abstraction provides the possibility to overcommit, because the VM is unaware what is backing the physical
address. It could be Physical Memory, Swap File, Copy On Write, zipped, or ballooned.

Take note of the position of Granted and Consumed. While both are metrics for VM, their context is different. One looks at it from the VM viewpoint, the other from ESXi.
Understanding the vantage point is required to make sense of the metrics. It will prevent you from comparing
metrics that are not comparable (e.g. granted vs consumed) as they have different context.
Further reading: vSphere Resource Management technical paper.
[6] Other documents use the terms Guest Physical Page and Machine Page. I find them unnecessarily confusing, so I just call them VM pages and ESXi pages. IMHO, physical is something you can hold in your hand.


If you need more convincing, here is an excerpt from VMware vSphere 6.5 Host Resources Deep Dive by Frank Denneman and Niels Hagoort. You will find it in Chapter 11, VMkernel Memory Management, page 243. I have highlighted in green the part you need to pay attention to 😊

Read further and you will see that VMkernel large page setting contributes more to ESXi capacity and the VM
performance.

Guest OS vs VM
Both come with dozens of metrics. Compared with Guest OS such as Windows, can you notice what’s missing and
what’s added?
The following diagram compares the memory metrics between VM and Guest OS.


Guest OS and VM metrics do not map to each other. Neither the VMkernel nor the Guest OS has full visibility into the other.
Right off the bat, you will notice that popular metrics such as Consumed, Shared, and Reservation do not even exist
in Windows.

Type          Guest OS Metric   VM Metric
Contention    Paging            None
              None              Latency
Utilization   In Use            None
              Cache             None
              Free              None
              Compressed        None
              None              Swapped or Compressed

The ESXi host cannot see how the Guest OS manages its memory pages, or how it classifies the pages as In Use, Modified, Cache and Free. ESXi also cannot see the virtual memory (page file).
ESXi can only see when the Guest OS performs reads or writes. That’s why vSphere VM main metrics are basically
what is active recently and what has been active. The first one is called Active, the second is called Consumed. All
other metrics are about ESXi memory management, and not about VM memory utilization. VM memory utilization
impacts ESXi memory management, but they are clearly not the same thing.

Example: Guest OS More Accurate


Let’s take an example with a simple Microsoft Windows server running Active Directory. It has 4 GB of memory as it’s
just serving a small number of objects in the Singapore office lab. Take a look at the following table, where I
compared the counter from inside the Guest OS and the VM memory active counter.

There are four periods above where I made changes inside Windows. Let’s step through them.

Period   What happened
A        Microsoft AD server in normal running condition. vCenter is reporting low utilization, around 15-20%.
         Note vCenter uses the Active metric, not Consumed.
B        I installed the vRealize Operations agent, which is based on the open source Telegraf. This gives the Guest OS metric, which is shown by the blue color. The agent collects data every 5 minutes, hence the regular spike. So far so good.
         Notice the value from the VM Active metric jumps to 100%. That’s fine, but then it stays at 100% for more than 12 hours. All I did was install a small collection agent and that’s it.
         I actually got an alarm in vCenter, even though the VM obviously does not need the RAM. What happened here proves that the Active counter is based on sampling, and that sampling could be wrong. More on that here.
C        The next morning, I decided to generate some load as the pattern did not change at all. Since Windows had not been patched for a long time, I started Windows patching. The entire process is mostly downloading and installing, which lasts for several hours.
         The two metrics show no correlation at all.
D        After several hours, the entire Windows update process is completed.

Example: VM More Accurate


Let’s now look inside the VM. I will use another VM to show a different example. This time around, I will take an idle
VM so we can see how the metrics behave. An idle VM will have minimal or 0 activity.
You can see that this Windows Server 2016 VM has 16 GB, but 0 GB is active. It is expected as we know the Guest OS
is idle as nothing is installed. vCenter is showing the data correctly. So far so good….

What do you think you will see inside Windows?


Will the Windows In Use counter show that it’s using 0 GB or somewhere near there? You know that it won’t show 0
GB as it’s impossible that any OS does not use any memory while it’s running. So what number will the In Use
counter show?


It’s showing 7.2 GB! That’s nowhere near 0 GB.


Look at the chart. What do you notice?
It portrays that it has been constantly or actively using that much memory. In reality, we know it’s idle because ESXi is the one doing the actual reading and writing. The other proof that it is idle is that Windows actually compressed 1.5 GB of this 7.2 GB.
One possible reason why Windows is showing high usage when there is none is applications that manage their own
memory. These applications will ask for the memory upfront in 1 contiguous block. You can see in the example
below:


You can see that java.exe takes up 26 GB.

JVM (Java Virtual Machine) manages that memory and Windows can’t see inside this block. Windows sees the entire block as used and committed, regardless of whether the application actually uses it or not.

BTW, the above is taken from an old blog article by Manny Sidhu. The blog is no longer available, hence I could not provide the link.
I hope the above simple experiments show that you should use the right counter for the right purpose.


Storage
Virtualization increases the complexity in both storage capacity and performance. Just like memory, where we have
more than one level, we have multiple layers of storage and each layer only has control over its own. In addition,
each layer may not use the same terminology.
Storage in VMware IaaS is presented as a datastore. In some situations, RDM and network file shares are also used by certain VMs.

Layer Description
Guest OS    The uppermost layer is the Guest OS. It sees virtual disks presented by the VM motherboard.
            The Guest OS typically has multiple partitions or drives. Each partition has its own filesystem, serving a different purpose such as OS drive, paging file drive, and data drive. A large database VM will have even more partitions. A partition may not map 1:1 to a virtual disk. There is no visibility into this mapping. This makes calculating unmapped blocks accurately an impossible task in the case of an RDM disk.
            To make it more complex, there are also networked drives. Windows or Linux mounts them over a protocol such as SMB. These filesystems are not visible to the hypervisor, hence they are not virtual disks. The disk IO is not visible to the VM as it goes out via the vNIC.
VM          The main files are the virtual disks. These can be RDM, VMDK, vSAN or vVOL. All are presented as local SCSI disks, so the Guest OS does not know of the underlying protocol. For example, you can actually have MS-DOS using a drive on a Fibre Channel network!
            They are identified as scsiM:N, starting with scsi0:0, where M is the adapter number and N is the disk number on that adapter.
            The discrepancy between VM layer and Guest OS utilization happens because each layer works differently.
            • If there is RDM or thick VMDK, the VM can’t see the actual usage inside the Guest OS. It simply sees 100% used, regardless of what Windows or Linux uses.
            • If there is an unmapped block, the Guest OS can’t see this overhead.
            We are interested in data both at the VM aggregate level, and at the individual virtual disk level. If you are running a VM with a large data drive (for example, Oracle database), the performance of the data drive is what the VM owner cares about the most. At the VM level, you get the average of all drives; hence, the performance issue could be obscured.
ESXi        In this layer we have the ESXi storage subsystem and the storage adapter. We do not deep dive into ESXi in our discussion of storage metrics as in general it is not a cause of storage bottleneck. Yes, the VMkernel prioritizes and queues the I/Os, but all these operations should take less than 1 millisecond. If the I/O is held at the kernel, there is a good chance that the physical device latency is more than 10 milliseconds.
            In typical shared storage, multiple VMs run on the same ESXi, and multiple VMs share a datastore. So it is common to have an I/O blender effect, where sequential writes on individual vmdk files become random writes at the datastore level. It also changes the read/write ratio. This can occur in either VMFS or NFS. This certainly increases complexity in troubleshooting. Complexity also increases when the IO needs to go over the network, especially across different physical data centers asynchronously.
Datastore   What you can see at this level, and hence how you monitor, depends on the storage architecture.
            The underlying storage protocol can be files (NFS) or blocks (VMFS). vSAN uses VMFS as its consumption layer as the underlying layer is unique to vSAN, and hence vSAN requires its own monitoring technique. Because vSAN presents itself as a VMFS datastore, you need to know that certain metrics will behave differently when the datastore type is vSAN.
            For an NFS datastore, as it is a network file share (as opposed to block), you have no visibility into the underlying storage. The type of metrics will also be more limited, and network metrics become more critical.
            A datastore is not the sum of its VMs, as VMs may span multiple datastores, or use RDM. There can also be orphaned files outside the VM folder, which are not associated with any VM.
Storage Subsystem   This can be virtual (e.g. vSAN) or physical (e.g. physical array).
            The datastore is normally backed one to one by a LUN, so what we see at the datastore level matches what we see at the LUN level. Multiple LUNs reside on a single array.

Type          Guest OS Metrics   VM Metrics
Contention    OS Queue           None
              Driver Queue       None
              Latency            Latency. They should be similar, especially when plotted over time.
Utilization   IOPS               They should match with the metrics at VM level, especially when plotted over time. If not, there is something wrong.
              Throughput

Multi-Layer Management
The layers present a challenge in management, as they limit end-to-end visibility and raise different questions. They also do not have consistent terminology. For example, the terms disk, LUN and device may mean the same thing. A device is typically physical (something you can hold, like an SSD card). A LUN is typically virtual, a striping across physical devices in a volume.


Storage metrics can be largely grouped into 2:

Speed   Performance is measured in 2 ways (contention and utilization).
        Utilization is further divided into IOPS and Throughput. Throughput = IOPS x Block Size.
        Contention can happen at all 3 stages of IO processing:
        • pre-processing: each layer has its own queue or outstanding IO.
        • processing: aborted SCSI commands, dropped frames, etc.
        • post-processing: latency.
Space   Capacity has no concept of slowness in modern, SSD-based storage as access to data no longer relies on a spinning platter. A disk that is 1% full is not slower or faster than one that is 99% full, as fragmentation no longer causes latency. However, it impacts availability. At 100% full, the storage will stop processing IO and your application will crash as a result.
        Capacity, as in disk space, is measured in bytes.
        Storage differs from compute as reality overrides projection. In compute, you use a projected capacity remaining number, which takes into account the past. In storage, if you have 0 bytes left, that number overrides whatever number is shown by the capacity engine.
        You should also focus on reclamation as the amount tends to be substantial.
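The Throughput = IOPS x Block Size relationship in the Speed row is worth internalizing, because the same IOPS number can mean very different load on the storage. A minimal sketch follows. The function name and the sample workloads are made up, and 1 MB is taken as 1024 KB.

```python
def throughput_mb_per_s(iops: float, block_size_kb: float) -> float:
    """Throughput = IOPS x Block Size, converted from KB/s to MB/s (1 MB = 1024 KB)."""
    return iops * block_size_kb / 1024


# Two hypothetical workloads with identical IOPS but different block sizes.
small_blocks = throughput_mb_per_s(iops=1000, block_size_kb=4)
large_blocks = throughput_mb_per_s(iops=1000, block_size_kb=64)

print(small_blocks)   # 3.90625 MB/s
print(large_blocks)   # 62.5 MB/s, 16x more data for the same IOPS
```

This is why IOPS and throughput should be looked at together, not as substitutes for each other.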

Performance
Latency can happen when IOPS and throughput are not high, because there are multiple stacks involved and each stack has its own queue. It begins with a process, such as a database, issuing an IO request. This gets processed by the Windows or Linux storage subsystem, and is then sent to the VM storage driver.
Ensure that you do not have packet loss for your IP Storage, dropped FC frames for FC protocol, or SCSI commands
aborted for your block storage. They are a sign of contention as the datastore (VMFS or NFS) is shared. The metrics
Bus Resets and Commands Aborted should be 0 all the time. As a result, it should be fine to track them at higher
level objects. Create a super metric that tracks the maximum or summation of both, and you should expect a flat
line.
Once you have ensured that you do not have packet loss on IP Storage or aborted commands on block storage, use
the latency counter and outstanding IO for monitoring. For troubleshooting, you will need to check both read latency
and write latency, as they tend to have different patterns and value. It’s common to only have read or write issue,
and not both.
Total Latency is not (Read Latency + Write Latency) / 2. It is not a simple summation. In a given second, a VM issues many IOs. For example, the VM issues 100 reads and 10 writes in a second. Each of these 110 commands will have its own latency. The “total” latency is the average of these 110 commands. In this example, the total latency will be more influenced by the read latency, as the workload is read dominated.
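The 100 reads and 10 writes example can be worked through numerically. The read and write latency values below are hypothetical (the text does not give any), and the function is just a weighted average written out. It is not a vCenter formula copied from anywhere.

```python
def total_latency_ms(reads: int, read_latency_ms: float,
                     writes: int, write_latency_ms: float) -> float:
    """Average latency across all commands, weighted by the number of reads and writes."""
    commands = reads + writes
    if commands == 0:
        return 0.0
    return (reads * read_latency_ms + writes * write_latency_ms) / commands


# 100 reads and 10 writes in a second, with assumed latencies of 2 ms and 20 ms.
weighted = total_latency_ms(reads=100, read_latency_ms=2.0, writes=10, write_latency_ms=20.0)
naive = (2.0 + 20.0) / 2

print(round(weighted, 2))   # 3.64 ms, pulled towards the read latency
print(naive)                # 11.0 ms, what the wrong formula would give
```

With a read-dominated workload, the weighted figure sits close to the read latency even though the write latency is 10x higher, which is exactly why you need to check read and write separately when troubleshooting.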
If you are using IP storage, take note that Read and Write do not map 1:1 to Transmit (Tx) and Receive (Rx) in
Networking metrics. Read and Write are both mapped to Transmit counter as the ESXi host is issuing commands,
hence transmitting the packets.


ESXi Layer
Storage at ESXi is a lot more complex than storage at VM level. The reason is that ESXi virtualizes the different physical storage subsystems, and the VM simply consumes all of them as local SCSI drives.
The kernel does the IO on behalf of all the VMs. It also has its own kernel modules, such as vSAN, that also need to be served. This creates what is popularly termed the “IO Blender” effect. Sequential operations from each VM and kernel module become random when combined together. The opposite happens when the kernel rearranges these independent IOs and tries to sequence them, so on average the latency is lower.

The green boxes are what you are likely to be familiar with. You have your ESXi host, and it can have NFS Datastore,
VMFS Datastore, vSAN Datastore, vVOL datastore or RDM objects. vSAN & vVOL present themselves as a VMFS
datastore, but the underlying architecture is different. The blue boxes represent the metric groups you see in
vCenter performance charts.
In the central storage architecture, NFS and VMFS datastores differ drastically in terms of metrics, as NFS is file-
based while VMFS is block-based.
• NFS uses the vmnic, and so the adapter type (FC, FCoE, or iSCSI) is not applicable. Multipathing is handled by the network, so you don't see it in the storage layer.
• For VMFS or RDM, you have more detailed visibility of the storage. To start off, each ESXi adapter is visible and you can check the metrics for each of them. In terms of relationship, one adapter can have many devices (disk or CD-ROM). One device is typically accessed via two storage adapters (for availability and load balancing), and it is also accessed via two paths per adapter, with the paths diverging at the storage switch. A single path, which will come from a specific adapter, can naturally connect one adapter to one device. The following diagram shows the four paths:


The counter at ESXi level contains data from all VMs and VMkernel overhead. There is no breakdown. For example, the counters at vmnic, storage adapter and storage path are all aggregate metrics. They are not broken down by VM. The same goes for vSAN objects (cache tier, capacity disk, disk group). None of them shows details per VM.
Can you figure out why there is no path to the vSAN Datastore?
We’ll do a comparison, and hopefully you will realize how different distributed storage and central storage are from a performance monitoring point of view. What looks like a simple change has turned the observability upside down.

Storage Adapter
The screenshot shows an ESXi host with the list of its adapters. We have selected vmhba2 adapter, which is an FC
HBA. Notice that it is connected to 5 devices. Each device has 4 paths, giving 20 paths in total.


What do you think it will look like on vSAN? The following screenshot shows the storage adapter vmhba1 being used
to connect to two vSAN devices. Both devices have names that begin with “Local”. The storage adapter has 2 targets,
2 devices and 2 paths. If you are guessing it is a 1:1 mapping among targets, devices and paths, you are right.
We know vSAN is not part of the storage fabric, so there is no need for an Identifier, which is made of WWNN and WWPN.


Let’s expand the Paths tab. We can see the LUN ID here. This is important. The fact that the hypervisor can see the
device is important. That means the VMkernel can report if there is an issue, be it performance or availability. This is
different if the disk is directly passed through to the VM. The hypervisor loses visibility.

Storage Path
Continuing our comparison, the last one is Storage Path. In a fibre channel device, you will be presented with the
information shown in the next screenshot, including whether a path is active or not.


Note that not all paths carry I/O; it depends on your configuration and multipathing software. Because each LUN
typically has four paths, path management can be complicated if you have many LUNs.
What does Path look like in vSAN? As shared earlier, there is only 1 path.

Storage Devices
The terms drive, disk, device and storage can be confusing as they are often used interchangeably in the industry. In
vSphere, a storage device means a physical disk or physical LUN partition mounted by the host. The following shows
that the ESXi host has 3 storage devices; all are flash drives with type = disk. The first two are used in the vSAN
datastore and are accessed via the adapter vmhba1.


A storage path takes data from ESXi to the LUN (the term used by vSphere is Devices), not to the datastore. So if the
datastore has multiple extents, there are four paths per extent. This is one reason why you should not use more than
one extent, as each extent adds 4 paths. If you are not familiar with VMFS Extent, Cormac Hogan explains it here.
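As a rough illustration of the arithmetic above, the following Python sketch counts paths. It assumes the typical layout described in this section (two storage adapters per device and two paths per adapter); the function names and defaults are hypothetical and your fabric may differ.

```python
def paths_per_device(adapters=2, paths_per_adapter=2):
    """Paths from one ESXi host to one device (LUN)."""
    return adapters * paths_per_adapter


def datastore_paths(extents, adapters=2, paths_per_adapter=2):
    """Paths behind a VMFS datastore; each extent is a separate device."""
    if extents < 1:
        raise ValueError("a datastore has at least one extent")
    return extents * paths_per_device(adapters, paths_per_adapter)


for extents in (1, 2, 4):
    print(extents, "extent(s):", datastore_paths(extents), "paths")
```

The path count grows linearly with the number of extents, which is why a single-extent datastore keeps path management simpler.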
For VMFS (non vSAN), you can see the same metrics at both the Datastore level and the Disk level. Their value will be
identical if you follow the recommended configuration to create a 1:1 relationship between a datastore and a LUN.
This means you present an entire LUN to a datastore (use all of its capacity). The following shows a VMFS datastore
with a NetApp LUN backing it.


VM Files
A VM does not see the underlying shared storage. It sees local SCSI disks only. So regardless of whether the
underlying storage is NFS, VMFS, vSAN or RDM, it sees all of them as virtual disks. You lose visibility into the physical
adapter (for example, you cannot tell how many IOPS on vmhba2 are coming from a particular VM) and physical
paths (for example, how many disk commands travelling on that path are coming from a particular VM).

A VM can consume storage via:

• Virtual disk. Each virtual disk has a label, a type (RDM or VMDK) and a provisioning type (thin, lazy zero, eager
zero). If it’s an RDM, you need to know additional properties such as the RDM type (physical or virtual).
• Compute virtualization. Snapshots, swap files and logs. These can be overhead or non-overhead, and none of
them is visible to the Guest OS. They are shown in blue in the following diagram.
• Storage virtualization. This includes vSAN protection, deduplication and compression. We need this
number to be reported separately as it’s not applicable to non-vSAN storage.

There are more file types than shown above. However, from monitoring and troubleshooting viewpoint, the above is
sufficient.


Files
At the end of the day, all that disk space appears as files in the VMFS filesystem. You can see them when you browse
the datastore. The following is a typical example of what vSphere Client will show.

We can categorize them into 4 from an operations viewpoint:

Disk: Virtual disk or RDM. This is typically the largest component. It can be thin provisioned, in which case the
provisioned size tends to be larger than the actual consumption, as the Guest filesystem typically is not 100% full.
Every virtual disk is made up of two files: a large data file equal to the size of the virtual disk, and a small text
descriptor file that describes the size and geometry of the virtual disk. The descriptor file also contains a pointer to
the large data file, as well as information on the virtual disk’s sectors, heads, cylinders and disk adapter type. In
most cases the descriptor has the same base name as the data file it is associated with (for example, MyVM1.vmdk
and MyVM1-flat.vmdk).
A VM can have up to 64 disks from multiple datastores.

Snapshot: A snapshot protects 3 things:
• VMDK
• Memory
• Configuration
For VMDK, the snapshot filename uses the syntax MyVM-000001.vmdk, where MyVM is the name of the VM and the
six-digit number 000001 is just a sequential number. There is 1 file for each VMDK.
Snapshot does not apply to RDM. You do that at the storage subsystem instead, transparent to ESXi.
If you take a snapshot with memory, it creates a .vmem file to store the actual image.
The .vmsn file stores the configuration of the VM. The .vmsd file is a small text file, less than 1 KB, that stores
metadata about each snapshot that is active on a VM. It is initially 0 bytes in size until a snapshot is created, and it
is updated every time snapshots are created or deleted. Only 1 file exists regardless of the number of snapshots,
as they all update this single file. This is why your IO goes up.

Swap: The memory swap file (.vswp). A VM with 64 GB of RAM will generate a 64 GB swap file (minus the size of the
memory reservation), which will be used when ESXi needs to swap the VM memory to disk. The file gets deleted
when the VM is powered off.
You can choose to store this locally on the ESXi host. That would save space on vSAN. The catch is vMotion, as the
swap file must be transferred too.

Others: All other files: log files, configuration files, and the BIOS/EFI configuration file (.nvram). They are mostly
small, in KB or MB, so if this counter is large, you’ve got unneeded files inside the VM directory.
Note that this includes any other file you put in the VM directory. So if you put a huge ISO image or any other file
there, it gets counted.
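To make the four categories concrete, here is a minimal Python sketch that sorts the files of a VM directory by name. The matching rules are assumptions drawn only from the extensions and the MyVM-000001.vmdk naming mentioned above; real directories contain more file types, and a VM whose own name ends in a dash and six digits would be misclassified. The listing is hypothetical.

```python
import re
from collections import defaultdict

# Snapshot disks follow the MyVM-000001.vmdk naming described above.
SNAPSHOT_DISK = re.compile(r"-\d{6}\.vmdk$")


def classify(filename):
    """Map a file in a VM directory to Disk, Snapshot, Swap or Others."""
    name = filename.lower()
    if name.endswith(".vswp"):
        return "Swap"
    if name.endswith((".vmsn", ".vmsd", ".vmem")) or SNAPSHOT_DISK.search(name):
        return "Snapshot"
    if name.endswith(".vmdk"):
        return "Disk"
    return "Others"


def usage_by_category(files):
    """Sum sizes (in bytes) per category from (filename, size) pairs."""
    totals = defaultdict(int)
    for filename, size in files:
        totals[classify(filename)] += size
    return dict(totals)


# Hypothetical directory listing, sizes in bytes.
listing = [
    ("MyVM1.vmdk", 500),
    ("MyVM1-flat.vmdk", 40 * 1024**3),
    ("MyVM1-000001.vmdk", 2 * 1024**3),
    ("MyVM1.vmsd", 800),
    ("MyVM1.vswp", 8 * 1024**3),
    ("MyVM1.nvram", 8684),
    ("vmware.log", 300_000),
]
print(usage_by_category(listing))
```

A large total under Others is the signal described above: something unrelated, such as an ISO image, has been left in the VM directory.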

Raw Device Mapping


RDM appears clearly as LUN in the VM Edit Settings dialog box:


But how does it appear when you browse the VM folder in the parent datastore?
RDM appears like a regular VMDK file. There is no way to distinguish it in the folder.

Multi-Writer Disk
Shared disk can be either shared RDM or VMDK. The following screenshot shows the option when creating a multi-
writer VMDK in vCenter Client.


When multiple VMs are sharing the same virtual disk or RDM, it creates additional challenges in capacity, cost and
performance management.

Network
Network monitoring is complex, especially in large data centers. Adding network virtualization makes performance
troubleshooting even more complex.
Just like CPU, Memory and Disk, there is also a new layer introduced by virtualization. There are virtual network
cards on each VM, and a software-based switch on each ESXi host bridging the VM card to the physical NIC. The
various ESXi VMkernel modules also do not “talk” directly to the physical card. Basically, what used to be the
top-of-rack switch now lives inside each ESXi host as an independent switch.

From performance and capacity management point of view, network has different fundamental characteristics to
compute or storage. The key differences are summarized below.

                                  Compute or Storage   Network
Net available resource to VM      Relatively high      Low
Resource allocation at VM level   Granular             Coarse
Hardware                          Single purpose       Multi-purpose
Nature                            A node               An interconnect
Upper Limit                       Yes                  No
Monitoring                        Simpler              Harder
Location                          Fewer                Many
Workload Type                     1                    Many

Let’s explain the preceding table, covering each row one by one.

Net Available Resource


At the end of the day, the net available resources to the VMs are what we care about. What the IaaS platform uses is
considered an overhead. The more ESXi VMkernel, NSX, vSAN and vSphere Replication use, the less you have left for
the business workload.
An ESXi host has a fixed specification (for example, 2 CPUs, 60 cores, 512 GB RAM, 2 x 25 GE NIC). This means we
know the upper physical limit. How much of that is available to the VMs? In other words, what is the usable capacity
for the business workload?
For compute, the hypervisor consumes a relatively low proportion of resources. Even if you add software-defined
storage such as vSAN, you are looking at around 10% total utilization, although it depends on many factors.
The same cannot be said about network. Mass vMotion (for example, when the host enters maintenance mode),
storage vMotion (in the IP storage case), VM provisioning or cloning (for IP storage), and vSAN all take up significant
network bandwidth. In fact, the non-VM network takes up the majority of the ESXi resources. If you have 2 x 25 GE
NIC, the majority of it is not used by VMs. The following screenshot shows that the VM traffic only gets 100 shares
out of 500 shares. So the overhead can be as high as 80%!
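The 80% figure follows directly from the share values. The short Python sketch below works it through, assuming both 25 GE uplinks are active and that the link is fully contended (shares only matter when traffic types compete). The function names are hypothetical.

```python
def guaranteed_gbps(shares, total_shares, link_gbps):
    """Bandwidth a traffic type is entitled to when the link is fully contended."""
    return shares * link_gbps / total_shares


def overhead_fraction(vm_shares, total_shares):
    """Portion of the link that non-VM traffic can claim under contention."""
    return 1 - vm_shares / total_shares


vm_shares, total_shares = 100, 500   # values from the screenshot described above
link_gbps = 2 * 25                   # assumption: both 25 GE uplinks are active

print(f"VM traffic floor: {guaranteed_gbps(vm_shares, total_shares, link_gbps)} Gbps")
print(f"Worst-case overhead: {overhead_fraction(vm_shares, total_shares):.0%}")
```

When the other traffic types are quiet, VM traffic can use more than this floor, so treat the result as a worst case rather than a steady state.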


Allocated Resource
This means the resource that is given to a single VM. For compute, we can configure a granular size of CPU and
RAM. For the CPU, we can assign one, two, three, four, etc. vCPUs.
With network, we cannot specify the vNIC speed. It takes the speed of the ESXi vmnic assigned to the VM port group.
So each VM will see either 1 GE, 10 GE or 25 GE (you need to have the right vNIC driver, obviously). You cannot
allocate another amount, such as 500 Mbps or 250 Mbps, in the Guest OS. In the physical world, we tend to assume
that each server has 10 GE and the network has sufficient bandwidth. You cannot assume this in a virtual data center,
as you no longer have 10 GE for every VM at the physical level. It is shared and typically oversubscribed.
A network-intensive VM can easily hit 1 Gbps for both egress and ingress traffic. The following chart shows a Hadoop
worker node receiving more than 5 Gbps worth of traffic multiple times. You need to be careful in sizing the
underlying ESXi if you want to run multiple VMs. While you can use Network I/O Control and vSphere Traffic Shaping,
they are not configuration properties of a VM.


Hardware
The networking hardware itself can provide different functionalities.
For compute, you have servers. While they may have different form factors or specifications, they all serve the same
purpose—to provide processing power and a set of working memory for hypervisor or VM.
For network, you have a variety of network services (firewall and load balancer) in addition to the basic network
functionalities (switch, router, and gateway). You need to monitor all of them to get a complete picture. These
functionalities can take the form of software or hardware.
Unlike storage, network has the concept of duplex. Full duplex means it has 100% in both directions. For example, an
ESXi host with a 25 Gb port can theoretically handle 25 Gb TX + 25 Gb RX, as it’s full duplex.
Blade servers and other HCI form factors blur the line between server and network.

Nature of Network
The fourth difference is the nature of network. Compute and storage are nodes. When you have a CPU or RAM
performance issue on one host, it doesn't typically impact another host on a different cluster. The same thing
happens with storage. When a physical array has a performance issue, generally speaking it does not impact other
arrays in the data center.
Network is different. A local performance issue can easily be a data center-wide problem. Here is a good read
shared by Ivan Pepelnjak. To give a recent example (H2 2021), here is one from a world-class network operator 7:

7 The name of this Internet giant is irrelevant for this purpose, as it could have happened to anyone. It happens
more often at smaller companies. BTW, notice how they made the text grey so it’s harder to read!


Being an interconnect, it also connects users and servers to the Internet. If you have global operations, you likely
have multiple entry points, provided by different providers. This connectivity needs to be secured and protected
with HA, preferably from 2 different ISPs.
There are typically many paths and routes in your network. You need to ensure they are available by testing the
connectivity from specific points.

Upper Limit
CPU or RAM workloads have a per-VM physical limit. This makes capacity management possible, and aids in
performance troubleshooting.
While network has a physical limit, it can be misleading to assume it is available to all VMs all the time. Because the
physical capacity of the network is shared, you have a dynamic upper limit for each workload. The VM Network port
group will have more bandwidth when there is no vMotion happening. Furthermore, each VM has a dynamic upper
limit as it shares the VM Network port group with other VMs.
The resource available to VM also varies from host to host. Within the same host, the limit changes as time
progresses. Unlike Storage I/O Control, Network I/O Control does not provide any metrics that tell you that it has
capped the bandwidth.
In many situations, the bandwidth within the ESXi host may not be the smallest pipe between the originating VM and
its destination. Within the data center, there could be firewalls, load balancers, routers, and other hops that the
packet has to go through. Once it leaves the data center, the WAN and Internet are likely to be a bottleneck. This
dynamic nature means every VM has its own practical limit.

Monitoring and Troubleshooting


A distributed system is harder to monitor than a single node, especially if workload varies among the components
that make up the system.


Network I/O Control (NIOC) can help to limit the network throughput for a particular workload or VM. If you are
using 10 GE, enable NIOC so that a burst in one network workload does not impact your VM. For example, a mass
vMotion operation can saturate the 10 Gb link if you do not implement NIOC. In vCenter 7, there is no counter that
tracks when NIOC caps the network throughput.
The primary contention metrics are:
• Latency.
• Dropped Packets.
• Retransmit Packets. For TCP, dropped packets will be retransmitted.
• Jitter. This measures the inconsistency of the latency. An application may tolerate poor latency better than
variable latency.
Note there are no latency or retransmit metrics in vSphere.
Remember that Storage has 2 metrics (IOPS and Throughput) for consumption? Network also has these 2 types,
except the more popular one is the throughput. PPS (packets per second) is less popular, although it is useful
in gaining insight into your network. It can take significant CPU time to process a high number of packets, as you
can see in an NSX edge VM.
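Since vSphere does not expose latency or jitter, you would have to derive them from your own measurements. The following Python sketch shows one common way to quantify jitter (the mean absolute change between consecutive latency samples) and how throughput translates into packets per second. The definition of jitter, the function names and the sample values are assumptions for illustration only.

```python
from statistics import mean


def jitter_ms(latencies_ms):
    """Mean absolute difference between consecutive latency samples."""
    if len(latencies_ms) < 2:
        return 0.0
    return mean(abs(b - a) for a, b in zip(latencies_ms, latencies_ms[1:]))


def packets_per_second(throughput_bps, avg_packet_bytes):
    """Packet rate implied by a throughput and an average packet size."""
    return throughput_bps / (avg_packet_bytes * 8)


# Hypothetical samples: both series average 20 ms, but only one is steady.
steady = [20, 20, 20, 20]
variable = [5, 35, 5, 35]
print("jitter steady  :", jitter_ms(steady))
print("jitter variable:", jitter_ms(variable))

# The same 1 Gbps costs far more packets (and CPU) with small packets.
print("1 Gbps, 1500-byte packets:", round(packets_per_second(1_000_000_000, 1500)))
print("1 Gbps,   64-byte packets:", round(packets_per_second(1_000_000_000, 64)))
```

The two latency series have the same average, which is why an average latency alone cannot reveal the inconsistency an application may be sensitive to.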

Location
Server and storage tend to be located in fewer places. Even in a ROBO site, they are typically located in a rack, with
proper cooling and physical security. Network devices, especially wireless access points, need to be placed in
multiple places within the building, if that’s required to provide enough network coverage.
Solutions such as SD-WAN even require a network device to be deployed at an employee’s home. I actually have the
Dell edge device at my home.

Workload Type
In network, not all packets are of the same type. You can have unicast, multicast and broadcast.
The majority of traffic should be unicast, as ESXi or a VM should not be broadcasting to all IP addresses in the
network or multicasting to many destinations. The challenge is that there are legitimate purposes for each type, so
you need to monitor whether broadcast and multicast happen at the wrong time or on the wrong network.
Storage and Server only have 1 type. From an operations management viewpoint, for almost all customers, a CPU
instruction is a CPU instruction. You do not care what it is. The same goes for memory access and disk IO
commands.




Chapter 2

VM & Guest OS


Microsoft Windows

We will cover Microsoft Windows only in this release of the book. The Linux version is not yet ready, but you can see
the draft in the VMware Operations Management, 4th edition book.
Both the server variants and the desktop variants of Windows use the same set of metrics.
An operating system runs processes, which in turn run 1 or more threads. The thread is what is scheduled for
execution by the CPU. This is the only way a process runs. A process with 0 threads is not doing any work. Based on
the famous book Windows Internals, 7th Edition: “If a process shows zero threads, it usually means the
process can’t be deleted for some reason—probably because of some buggy driver code”.
The majority of server programs run as background processes, meaning they have no user interaction. The status
can be running or suspended.
A thread has context, which stores private information specific to the thread. The term CPU context switch refers to
unloading the outgoing thread context and loading the incoming thread context. This work can be expensive if
it happens repeatedly. Windows has a feature called User-Mode Scheduling, which reduces the overhead of context
switching.
A thread typically opens 1 or more handles to the kernel objects.
A process can create another process, and so forth. This creates a hierarchy.
The Idle process is a special process. It’s created for accounting purposes, as the total sum of CPU cycles has to be 100%.
From what I know, Performance Monitor is still the main tool for Windows, despite showing its age and not having
been enhanced for years. Go to docs.microsoft.com and browse for Windows Server. It took me to this article, which
covers PerfMon. Many explanations of metrics at https://learn.microsoft.com/ are still based on end-of-life Windows.

CPU
PerfMon groups the CPU counters under the Processor group. However, it places the Processor Queue Length and
Context Switches metrics under the System group. The System group covers system-wide metrics, not just CPU.
The following screenshot shows the counters under the Processor group.


PerfMon UI provides a description, which I use as a reference below:

% C1 Time, % C2 Time, % C3 Time: Based on this April 2004 article, Windows can operate in 4 different power
levels. C0 is the highest, while C3 consumes the least amount of power. If you set dynamic power management,
expect the lower power levels to register higher values during idle periods. Reference: here.

C1 Transitions/sec, C2 Transitions/sec, C3 Transitions/sec: The amount of time at each power level does not tell
the full picture. You also need to know how frequently you enter and exit that level. These 3 metrics track the
number of transitions into the respective level. For example, high numbers on all 3 counters mean Windows is
fluctuating greatly, resulting in inconsistent speed.

% DPC Time: Deferred Procedure Calls (DPC). According to this, this counter is a part of the Privileged Time (%)
because DPCs are executed in privileged mode. They are counted separately and are not a component of the
interrupt counters.

% Interrupt Time: Interrupt means the processor was interrupted from executing a normal thread. This can happen
for a variety of reasons, such as system clock, incoming network packets, mouse and keyboard activity. Interrupts
can happen on a regular basis, not just ad hoc. For example, the system clock does it every 10 milliseconds in the
background. A high interrupt value can impact performance.

% Processor Time, % Idle Time: These 2 add up to 100%.

% User Time, % Privileged Time: These 2 add up to % Processor Time, that is, 100% of the non-idle time.

DPCs Queued/sec: Unlike the CPU Run Queue, this metric is captured per processor. It can be handy to compare
across processors as there can be an imbalance. Note this is a rate counter, not a count of the present queue. It
tracks the rate per second.

DPC Rate: This is an input to the above, as the above is calculated as the delta of 2 rates, divided over the
sampling period.

Interrupts/sec: As above, but for interrupts.
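The rate counters above are derived values: the difference between two readings divided by the time between them. The following Python sketch shows that calculation in its simplest form. The function name and the sample readings are hypothetical, and the sketch does not handle counter resets or wrap-around.

```python
def per_second_rate(previous_count, current_count, interval_seconds):
    """Convert two readings of a cumulative counter into a per-second rate."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (current_count - previous_count) / interval_seconds


# Hypothetical readings taken 5 seconds apart.
print(per_second_rate(previous_count=120_000, current_count=121_500, interval_seconds=5))
```

Because the result is an average over the sampling period, a short burst inside a long interval is flattened, so the interval you choose affects what you can see.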

We start with the contention type of metrics as that’s the primary metric for performance, followed by utilization
type of metrics.

CPU Run Queue


Number of threads in the processor queue. Unlike Linux, Windows excludes the threads that are running (being
executed).
Let’s take a VM configured with 8 vCPUs. The Guest OS sees 8 threads, so it will schedule up to 8 parallel processes.
If there is more demand, it will have to queue them. This means the queue needs to be accounted for in Guest OS
sizing.
Because it reports the queue, this is the primary counter to measure Guest OS performance. It tells you if the CPU is
struggling to serve the demand or not.
What is a healthy value?
Windows Performance Monitor UI description is not consistent with MSDN documentation (based on Windows
Server 2016 documentation). The description shown in Windows UI is “Processor Queue Length is the number of
threads in the processor queue. Unlike the disk counters, this counter shows ready threads only, not threads that are
running. There is a single queue for processor time even on computers with multiple processors. Therefore, if a
computer has multiple processors, you need to divide this value by the number of processors servicing the workload.
A sustained processor queue of less than 10 threads per processor is normally acceptable, dependent of the
workload.”
The MSDN document states that a sustained processor queue of greater than 2 threads generally indicates processor
congestion. The SQL Server document states 3 as the threshold. Let me know if you have seen other recommendations
from Microsoft or Linux.
Windows or Linux utilization may be 100%, but as long as the queue is low, the workload is running as fast as it can.
Adding more vCPUs will in fact slow down the performance, as you have a higher chance of context switching.
There is a single queue for processor time even on computers with multiple processors. Therefore, if a computer has
multiple processors, you need to divide this value by the number of processors servicing the workload. That’s why
Tools reports the total count of the queues. This counter should play a role in the Guest OS CPU sizing.
You should profile your environment, because the number can be high for some VMs. Just look at the numbers I got
below, where some VMs have well over 10 queued threads per vCPU. Share the findings with the VM Owner, as the
remediation to reduce the queue could mean changing the application settings.
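Because the counter reports a single system-wide queue, it has to be normalized before you compare VMs of different sizes. A minimal Python sketch of that normalization follows. The default threshold of 3 per vCPU is just one of the figures quoted above, and the function names are hypothetical; adjust the threshold to whatever baseline you establish.

```python
def queue_per_vcpu(processor_queue_length, vcpus):
    """Normalize the system-wide run queue by the number of vCPUs."""
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    return processor_queue_length / vcpus


def is_congested(processor_queue_length, vcpus, threshold_per_vcpu=3):
    """True when the sustained queue per vCPU exceeds the chosen threshold."""
    return queue_per_vcpu(processor_queue_length, vcpus) > threshold_per_vcpu


# Hypothetical sustained readings for an 8 vCPU VM.
for queue in (16, 48):
    print(queue, queue_per_vcpu(queue, 8), is_congested(queue, 8))
```

Feed it a sustained value (for example a high percentile over the period) rather than a single sample, as the counter only reports the last observed queue.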


Based on the overall guidance of 3 queues per vCPU, the first 2 VMs show a high value. Both VMs have only 4 vCPUs,
so we expect the queue value to be less than 20, preferably less than 10.
The first VM shows a sustained value, as it’s still relatively high at the worst 5th percentile. Let’s drill down to see
the first VM.

The CPU Run Queue spikes multiple times. It does not match the CPU Usage and CPU Context Switch Rate in pattern.
I’m unsure how to explain this, so if you know, drop me a note. I notice the data collection is erratic though, so let’s
look at another VM.


The following is a 2 vCPU VM running Photon OS. CPU Queue is high, even though Photon is only running at 50%.
Could it be that the application is configured with so many threads that the CPU is busy doing context switching?
Notice the CPU Queue matches the CPU Context Switch Rate and CPU Run. In this situation, you should bring it to
the application team’s attention, as it may cause performance problems and the solution is to look inside. As proof
that it’s not because of underlying contention, I added CPU Ready.

This property displays the last observed value only; it is not an average. Windows & Linux do not provide the highest
and lowest variants either.
The counter name in Tools is guest.processor.queue. It is based on Win32_PerfFormattedData_PerfOS_System =
@#ProcessorQueueLength from WMI.
Reference: Windows
I can’t find documentation that states whether CPU Hyper-Threading (HT) technology doubles the queue length.
Logically it should, as the threads are at the start of the CPU pipelines, and both threads are interspersed in
the core pipeline.

CPU Context Switch


CPU Context Switch costs performance “due to running the task scheduler, TLB flushes, and indirectly due to sharing
the CPU cache between multiple tasks”. It’s important to track this counter and at least know what’s an acceptable
behaviour for that specific application.


Context switches are considered “expensive” operations, as the CPU can complete many instructions within the time
taken to switch context from one process to another. If you are interested, I recommend reading this paper.
Based on Windows 10 Performance Monitor documentation, context switches/sec is the combined rate at which all
processors on the computer are switched from one thread to another. All else being equal, the more processors,
the higher the context switch rate. Note that thread switches can occur either inside a single multi-threaded process
or across processes. A thread switch can be caused either by one thread asking another for information, or by a
thread being pre-empted by another, higher priority thread becoming ready to run.
There are context switch metrics on the System and Thread objects.
The rate at which Windows or Linux switches CPU context per second ranges widely. The following is taken from a
Windows 10 desktop with 8 physical threads, which runs at around 10% CPU. I observe the value hovering between
10K and 50K.

The value should correlate with CPU “utilization”, since in theory the higher the utilization, the higher the chance of
CPU context switch. The following chart shows a near perfect correlation. Every time CPU Usage went up, CPU
Context Switch went up too.
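If you want to put a number on how closely the two series move together, a Pearson correlation coefficient is one option. The following Python sketch computes it from scratch so it has no dependencies. The sample values are hypothetical and are not taken from the chart; a value near 1 means the two metrics rise and fall together.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical samples collected at the same timestamps.
cpu_usage_pct = [10, 25, 40, 30, 60, 80]
context_switches = [11_000, 19_000, 33_000, 24_000, 41_000, 52_000]
print(round(pearson(cpu_usage_pct, context_switches), 2))
```

A low or negative coefficient on a busy VM is worth a closer look, since it suggests something other than load is driving the context switches.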


CPU context switch can happen even in a single-threaded application. The following shows a VDI VM with 4 vCPUs. I
plotted the CPU Usage Disparity vs CPU Context Switch. You can see the usage disparity went up to 78%, meaning
the gap between the busiest vCPU and the most idle vCPU is 78%. This was running a security agent, which is unlikely
to be designed to occupy multiple vCPUs.
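Usage disparity as described here is simply the gap between the busiest and the most idle vCPU. A minimal Python sketch, with hypothetical per-vCPU samples, is shown below; the function name is made up for this example.

```python
def usage_disparity(vcpu_usage_pct):
    """Gap between the busiest and the most idle vCPU, in percentage points."""
    if not vcpu_usage_pct:
        raise ValueError("need at least one vCPU sample")
    return max(vcpu_usage_pct) - min(vcpu_usage_pct)


# Hypothetical samples for a 4 vCPU VM.
single_threaded = [80, 2, 3, 2]   # one vCPU doing almost all the work
balanced = [22, 21, 22, 22]       # similar total work, spread evenly

print(usage_disparity(single_threaded))
print(usage_disparity(balanced))
```

Both sample sets represent a similar amount of total work, yet only the first one points to a workload that cannot spread across vCPUs.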

Let’s plot the context switch at the same period. There is a spike at the same time, indicating that the agent was busy
context switching. Note that it does not always have to be this way. The red dot shows there is no spike in context
switch even though the vCPU Usage Disparity went up.

The values of CPU Context Switch vary widely. It can go well beyond 0.5 million, as shown in the following table,
hence it’s important to profile and establish a normal baseline for that specific application. What is healthy for 1 VM
may not be healthy for another.

You can see from the table that some VMs experience prolonged CPU context switching, while others do not. VM #4
only has a short burst, as the value at the worst 5th percentile dropped to 3796. A momentary peak of context
switching may not cause a performance problem, so in general it’s wiser to take the value somewhere between the
95th and 99th percentile.
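To see why a percentile is more telling than the peak, consider the following Python sketch. It uses the nearest-rank method, which is only one of several percentile definitions, and a hypothetical series with a single burst.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical context switch rates: steady around 4K with one momentary burst.
rates = [4_000] * 19 + [500_000]

print("max :", max(rates))
print("95th:", percentile(rates, 95))
```

The maximum is dominated by the single burst, while the 95th percentile reflects what the VM experiences most of the time.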
Let’s drill down to see the first VM. This CentOS VM, sporting only 4 vCPUs, constantly hits almost 1 million context
switches. The pattern matches CPU Usage.


On the other hand, the majority of Guest OSes stay well below 10K. I profiled around 2200 production VMs and here
is the distribution of their CPU Context Switch. You can see that the values between 0 – 12000 account for 80%.

In your environment, you can profile it further. In the following example, I adjusted the bucket thresholds by
grouping all the values above 10K as one bucket, and splitting the 0 – 10K bucket into multiple buckets. You can see
more than half have less than 1K CPU Context Switch Rate.
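The same kind of profiling can be reproduced on any exported data set. The following Python sketch groups values into custom buckets with an open-ended last bucket. The bucket edges and the sample values are hypothetical; pick edges that suit your own distribution.

```python
from bisect import bisect_right


def bucket_counts(values, edges):
    """Count values per bucket; bucket i holds edges[i-1] <= value < edges[i], the last one is open-ended."""
    counts = [0] * (len(edges) + 1)
    for value in values:
        counts[bisect_right(edges, value)] += 1
    return counts


def bucket_shares(values, edges):
    """Same buckets, expressed as a fraction of all values."""
    if not values:
        raise ValueError("need at least one value")
    return [count / len(values) for count in bucket_counts(values, edges)]


# Hypothetical per-VM context switch rates and bucket edges.
rates = [500, 800, 900, 2_000, 7_000, 50_000]
edges = [1_000, 5_000, 10_000]   # <1K, 1K to <5K, 5K to <10K, 10K and above

for label, share in zip(["<1K", "1K-5K", "5K-10K", ">=10K"], bucket_shares(rates, edges)):
    print(f"{label:>7}: {share:.0%}")
```

Splitting the crowded low range into finer buckets while lumping the long tail together keeps the distribution readable.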


Thread Ping Pong


The following is a Windows Server 2019 DC edition VM with 10 vCPUs. It’s basically idle, as you can see below.

But if we zoom into each vCPU, they are taking turns to be busy.
In the span of just 1 hour, the 10 vCPUs inside Windows take turns.

This is a bit illogical. Is this a process ping pong?

We can see them more clearly if we stack them up. Notice they take turns, except the 3rd one from the top (I drew
a green line on it). That one is actually fairly stable.


It is running Horizon Connection Server. It has around 118 – 125 processes, but a much higher number of threads.

CPU Run Queue is very low, which is expected.

Context switching is fairly steady. This is expected, as it consistently runs >2K threads on >100 processes on just 10
CPUs.


CPU Usage
CPU Usage in Windows is not aware of the underlying hypervisor hyper-threading. When Windows runs a CPU at
100% flat, that CPU could be competing with another physical thread at the ESXi level. In that case, what do you
expect the value of VM CPU Usage will be, all else being equal?
62.5%.
Because that’s the hyper-threading effect.
What about VM CPU Demand? It will show 100%.
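The 62.5% figure can be reproduced with simple arithmetic if we assume that a core with both hyper-threads busy is credited with 1.25x the work of a single thread, split evenly between the two threads. That factor is an assumption of this Python sketch rather than something stated in this section, and the sketch ignores frequency scaling and contention.

```python
# Assumption: a core with both threads busy is credited with 1.25x the work of one thread.
HT_BOTH_BUSY_GAIN = 1.25


def vm_cpu_usage_pct(guest_busy_pct, sibling_thread_busy):
    """Approximate VM CPU Usage for one vCPU, ignoring frequency scaling and contention."""
    if sibling_thread_busy:
        return guest_busy_pct * HT_BOTH_BUSY_GAIN / 2
    return guest_busy_pct


print(vm_cpu_usage_pct(100, sibling_thread_busy=True))
print(vm_cpu_usage_pct(100, sibling_thread_busy=False))
```

From inside the Guest OS both cases look identical at 100%, which is why the Guest counter alone cannot tell you how much work was really done.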
However, CPU Usage is affected by power management. Windows 8 and later will report CPU usage >100% in Task
Manager and Performance Monitor when the CPU frequency is higher than the nominal speed. The reason for the
change is the same as what we have covered so far, which is the need to distinguish the amount of work being done.
More here.

What happens to CPU Usage when the VM is experiencing contention? VM Contention = Ready, Co-Stop, Overlap, Other
Wait.


Time basically stops. So there is a gap in the system time of Windows. How does it deal with the gap? Does it ignore
the gap, or artificially fill it with best-guess values? I’m not sure. If you know, let me know.
The above nature of CPU Usage raises an interesting question. Which VM counters can be used when you have no
visibility into the Guest? Let’s do a comparison:

Metric             Frequency Scaling    Hyperthreading    VM Contention
Guest CPU Usage    Yes                  No                No
VM CPU Run         No                   Yes               No
VM CPU Usage       Yes                  Yes               No
VM CPU Demand      Yes                  No                Yes

If there is slowness but utilization is low, it’s worth checking whether the utilization is coming from a lower power state. This
is important for applications that require high frequency (as opposed to just lots of light threads).
Windows provides the time the CPU spent in the C1, C2 and C3 states. The following is taken from my laptop. Notice a
dip when the total of C1 + C2 + C3 < 100%. That’s basically the time in C0.

The Idle loop is typically executed in C3. Try plotting the Idle Time (%) and C3 Time (%), and they will be similar.
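
The C0 relationship described above can be sketched in a few lines. This is a minimal illustration, assuming you have already sampled the three C-state percentages; the function name and the sample values are made up for the example and are not taken from the chart.

```python
def c0_time_pct(c1_pct, c2_pct, c3_pct):
    """Estimate the time spent in C0 as whatever is left after C1 + C2 + C3.

    The result is clamped at 0 because sampled counters can add up to
    slightly more than 100%.
    """
    return max(0.0, 100.0 - (c1_pct + c2_pct + c3_pct))


# Hypothetical sample: the CPU spent 85% of the interval in C1, C2 and C3 combined.
print(c0_time_pct(c1_pct=5.0, c2_pct=10.0, c3_pct=70.0))
```

A dip in the sum of the three C-state counters shows up here as a rise in the returned value.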


OS vs Process
CPU imbalance can happen in a large VM.
Review the following chart carefully. It’s my physical desktop running Windows 10. The CPU has 1 socket, 4 cores and 8
threads, so Windows sees 8 logical processors. You can see that Microsoft Word is not responding as its window is
greyed out. The Task Manager confirms that by showing that none of the 3 documents is responding. Word is also
consuming very high power, as shown in the power usage column.
It became unresponsive because I turned on change tracking on a 500-page document and deleted hundreds of
pages. It had to do a lot of processing and it did not like that. Unfortunately I wasn’t able to reproduce the issue after
that.
At the operating system level, Windows is responding well. I was able to close all other applications, and launch Task
Manager and the Snip program. I suspect this is because Word does not consume all CPUs. So if we track at the Windows level,
we would not be aware that there is a problem. This is why process-level monitoring is important if you want to
monitor the application. Specific to the hang state, we should monitor the state and not simply the CPU consumption.
From the Windows task bar, other than Microsoft Word and Task Manager, there are no other applications running.
Can you guess why the CPU utilization at the Windows level is higher than the sum of its processes? Why does Windows show
57% while Word shows 18.9%?

My guess is Turbo Boost. The CPU counter at the individual process level does not account for it, while the counter at the OS
level does.
I left it for 15 minutes and nothing changed. So it wasn’t that it needed more time to process the changes. I suspect it
encountered a CPU lock, so the CPU where Word is running is running at 100%. Since Windows overall only reports
57%, it’s important to track the peak among the Windows CPUs.


Memory
Windows memory management is not well documented. Ed Bott sums it up in this article by saying
“Windows memory management is rocket science”. As Ed has experienced, there is conflicting information,
including from Microsoft. Mark Russinovich, cofounder of Winternals Software, explains the situation in this
TechNet post.
Windows Performance Monitor provides many metrics; some are shown below.

Let’s start with the main metrics. As formulas, their definitions are:
• Cached = Standby + Modified
• Available = Standby + Free
Available means exactly what the word says. It is the amount of physical memory immediately available for use.
Immediately means Windows does not need to copy the existing page before it can be reused.
They are also available in Bytes and KBytes.
It is easier to visualize it, so here it is:


A popular tool for Windows monitoring is Sysinternals. In addition to the above, it shows Transition and Zeroed.

In Use
This is the main counter used by Windows, as it’s featured prominently in Task Manager.


This is often thought of as the minimum that Windows needs to operate. This is not true. If you look at the preceding
screenshot, Windows has compressed 457 MB of the 6.8 GB In Use pages, indicating they are not actively used. Windows
compresses its in-use RAM even though it has plenty of Free RAM available (8.9 GB available). This is different
behaviour from ESXi, which does not compress unless it’s running low on Free.
Look at the chart of Memory Usage above. The usage is sustained for the entire 60 seconds. We know these pages are not
all active, as the amount is too high to sustain for 60 seconds if they were truly active, let alone for hours.
Formula:
In use = Total – (Modified + Standby + Free)
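
The three formulas in this section (Cached, Available and In Use) can be checked with simple arithmetic. The following is a minimal sketch, assuming the page list sizes have already been collected in MB; the function name and the sample numbers are hypothetical and do not come from the screenshots.

```python
def windows_memory_summary(total_mb, modified_mb, standby_mb, free_mb):
    """Derive the headline figures from the page list sizes (all values in MB)."""
    cached = standby_mb + modified_mb        # Cached = Standby + Modified
    available = standby_mb + free_mb         # Available = Standby + Free
    in_use = total_mb - (modified_mb + standby_mb + free_mb)
    return {"cached": cached, "available": available, "in_use": in_use}


# Hypothetical 16 GB machine; the values are made up for illustration.
summary = windows_memory_summary(total_mb=16384, modified_mb=200,
                                 standby_mb=6000, free_mb=3000)
print(summary)
```

Note that Modified is part of Cached but not part of Available, which is why the two numbers differ.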

A problem related to the In Use counter is a memory leak. Essentially, the application or process does not release
pages that it no longer needs, so over time they accumulate. This is hard to detect as the amount varies by application.
The process will continue growing until the OS runs out of memory.

Modified
A page that was modified but is no longer used, hence it’s available for other usage but needs to be saved to disk first.
It’s not counted as part of Available, but it is counted as part of Cache.
The OS does not immediately write all inactive pages to disk, especially if the disk is in power saving mode. It will
consolidate these pages and write them in one shot, minimizing IO to the disk. In the case of SSD, writes can shorten
the life span, as SSD has physical limits on the number of writes.


Standby
Windows has 3 levels of standby. As reported by VMware Tools, their names are:
• Standby Core
• Standby Normal
• Standby Reserve
Different applications use the memory differently, resulting in different behaviour of the metrics. As a result,
determining what Windows actually uses is difficult.
The Standby Normal counter can fluctuate wildly, resulting in a wide difference if it’s included in rightsizing. The
following VM is a Microsoft Exchange 2013 server mailbox utility.

Notice that Standby Normal fluctuates wildly, reaching as high as 90%. The other 2 caches remain constantly
negligible. The chart above is based on >26000 samples, so there is plenty of chance for each of the 3 metrics to fluctuate.
Now let’s look at another example. This is a Windows Server 2016 VM. I think it was running the Business Intelligence
software Tableau.

Notice the VM usable memory was increased 2x in the last 3 months. Standby Normal hardly moved, but Standby
Reserve took advantage of the increase. It simply went up accordingly, although again it fluctuates wildly.

Cache
Cache is an integral part of memory management, as the more you cache, the lower your chance of hitting a cache
miss. This makes sense. RAM is much faster than Disk, so if you have it, why not use it? Remember when Windows
XP introduced pre-fetch, and subsequently Windows SuperFetch? It’s a clue that memory management is a complex
topic. There are many techniques involved. Unfortunately, this is simplified in the UI. All you see is something like
this:


Free
As the name implies, this is a block of pages that is immediately available for usage. This excludes the cached
memory. Low free memory does not mean a problem if the Standby value is high. This number can drop below
100 MB, and even touch 0 MB momentarily. It’s fine so long as there is plenty of cache. I’d generally keep this number >
500 MB for server VMs and >100 MB for VDI VMs. I set a lower number for VDI because they add up. If you have 10K
users, 100 MB each adds up to 1 TB of RAM.
When Windows or Linux frees up a memory page, it normally just updates its list of free memory; it does not release
it. This list is not exposed to the hypervisor, and so the physical page remains claimed by the VM. This is why the
Consumed counter in vCenter remains high when the Active counter has long dropped. Because the hypervisor has
no visibility into the Guest OS, you may need to deploy an agent to get visibility into your application. You should
monitor both at the Guest OS level (for example, Windows and Red Hat) and at the application level (for example,
MS SQL Server and Oracle). Check whether there is excessive paging or the Guest OS experiences hard page faults.
For Windows, you can use tools such as pfmon, a page fault monitor.
This is one of the 3 major metrics for capacity monitoring. The other 2 metrics are Page-in Rate and Commit Ratio.
These 3 are not contention metrics; they are utilization metrics. Bad values can contribute to bad performance, but
they can’t measure the severity of the performance problem. Windows and Linux do not have a counter that measures how
long or how often a CPU waits for memory.


In Windows, this is the Free Memory counter. This excludes the cached memory. If this number drops to a low
number, Windows is running out of Free RAM. While that number varies per application and use case, generally
keep this number > 500 MB for server VMs and >100 MB for VDI VMs. The reason you should set a lower number for
VDI is that they add up quickly. If you have 10K users, that’s 1 TB of RAM.
It’s okay for this counter to be low, so long as other memory metrics are fine. The following table shows VMs with near
0 free memory. Notice none of them needs more memory. This is the perfect situation as there is no wastage.

Page File
Memory paging is an integral part of Guest OS memory management. The OS begins using it even though it still has
plenty of physical memory. It uses both physical memory and virtual memory at the same time. Microsoft
recommends that you do not delete or disable the page file. See this for reference.

As shown on the diagram, processes see virtual memory, not physical memory. Guest OS presents this as system API
to processes. The virtual memory is backed by the page file and physical memory. Guest OS shields the physical
memory and hardware. Paging is an operation of reading/writing from the page file into the physical memory, not
from physical disk into the page file.


Let Windows manage the pagefile size. This is the default setting, so you likely have it already. By default, Windows
sets the pagefile size to the same size as the physical memory. So if the VM has 8 GB of RAM, the pagefile is an 8
GB file. Anything above 8 GB indicates that Windows is under memory pressure.
The VM metric Guest \ Swap Space Remaining tracks the amount of swap space that's free.
The size of the Page File is not a perfect indicator of RAM usage, because it contains pages that are never demanded
by the application. Windows does SuperFetch, where it predicts what pages will be used and prefetches them
in advance. Some of these pages are never demanded by the application. Coupled with the fact that the Guest OS
treats RAM as cache, including the page file will result in an oversized recommendation. Paging rate is more realistic as
it only considers the recent time period (300 seconds in the vRealize Operations case).
A page would be used as cache if it was paged out at some point due to memory pressure and it hasn’t been needed
since. The OS will reuse that page as cache. That means that at some point the OS was constrained on memory
enough to force the page-out to happen.
A page that was paged out earlier has to be brought back first before it can be used. This creates a performance issue
as the application waits longer, since disk is much slower than RAM.
There are 2 types of page operations:
• Page-in. This is a potential indicator for performance.
• Page-out. This is a potential indicator for capacity.
While paging impacts performance, the correlation between the paging metrics and performance varies per
application. You can’t set a threshold and use it to monitor different applications or VMs. The reason is that paging does not
only happen when the Guest OS runs out of memory. There are a few reasons why paging may not correlate to memory
performance:
• Application binary. The initial loading causes a page-in. Nobody will feel the performance impact as it’s not
even serving anyone.
• Memory mapped files. This is essentially a file that has a mapping to memory. Processes use this to exchange
data. It also allows the process to access a very large file (think of a database) without having to load the
entire database into memory.
• Proactive pre-fetch. The OS predicts the usage of memory and pre-emptively reads the page and brings it in. This is
no different to disk, where the storage array will read subsequent blocks even though it’s not being asked.
This especially happens when a large application starts. Page-in will go up even though there is no memory
pressure (page-out is low or 0).
• Windows performs memory capacity optimization in the background. It will move idle processes out into the
page file.
If you see both Page-in and Page-out having high values, and the disk queue is also high, there is a good chance it’s
a memory performance issue.
The rate at which pages are being brought in and out can reveal memory performance abnormalities. A sudden change,
or one that is sustained over time, can indicate page faults. Page faults indicate pages aren’t readily available and
must be brought in. If page faults occur too frequently, they can impact application performance. While there is no
concrete guidance, as it varies by application, you can judge by comparing to its past behaviour and its absolute
amount.
Operating systems typically use 4 KB or 2 MB page sizes. Larger page size will result in more cache, which translates
into more memory required.
The counter %pagefile tracks how much of the pagefile is used, meaning the value 100% indicates the pagefile is fully
utilized. While the lower the number the better, there is no universal guidance. If you know, let me know!


Reference: this is an old article as it covers 32 bit Windows. If you find a newer one, kindly let me know.

Guest OS Paging metrics


There are 2 metrics: Page-in and Page-out.
The unit is number of pages, not MB. It's not possible to convert due to the mixed use of Large Pages (2 MB) and Pages (4
KB). A process can have concurrent mixed usage of large and non-large pages in Windows. The page size isn’t a
system-wide setting that all processes use.
The page-in rate metric tracks the rate at which the OS brings memory back from disk to DIMM per second. In other words, the
rate of reads going through the paging/cache system. It includes not just swap file I/O, but cacheable reads as well (so
it’s double pages/s).
Page-out is the opposite of the above process. It is not as important as Page-in. Just because a block of memory is
moved to disk does not mean the application experiences a memory problem. In many cases, the page that was
moved out is an idle page. Windows does not page out any Large Pages.
The following shows the page out value at 99th percentile in the last 4 months. What do you observe?

There are 3325 VMs in the above chart. In the last 4 months, 97% of them have a page-out rate of less than 32000
pages, on a 5-minute average basis.
How about the remaining 3%?
Surprisingly, a few of them can be well above 500000, indicating there is a wide range. So the majority of VMs do not page out,
but those that do, do it excessively.
The block size is likely 4 KB. Some applications like Java and databases use 2 MB pages. Using 8 KB as the average,
10000 pages per second sustained over 5 minutes means 80000 KB x 300 = 24 GB worth of data.
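
The arithmetic above is easy to reuse for your own numbers. Here is a small sketch, assuming decimal units (1 GB = 1,000,000 KB) as the text does; the function name is made up for illustration, and the 8 KB average page size is the same assumption stated above.

```python
def paging_volume_gb(pages_per_sec, avg_page_kb, seconds):
    """Estimate the data moved by paging, in decimal GB (1 GB = 1,000,000 KB)."""
    return pages_per_sec * avg_page_kb * seconds / 1_000_000


# 10000 pages per second at an assumed 8 KB average, sustained for 300 seconds.
print(paging_volume_gb(10_000, 8, 300))
```

Because the real mix of 4 KB and 2 MB pages is unknown, treat the result as a rough order of magnitude rather than a measurement.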
You can profile your environment to see which VMs are experiencing high paging. Create a view with the following 6
columns:
• Highest Page-In. Color code it with 1000, 10000, and 100000 as the thresholds. That means red is 10x
orange, which in turn is 10x yellow.
• Page-In value at 99th percentile. Same thresholds as above.
• Highest Page-Out. Same thresholds as above.
• Page-Out value at 99th percentile. Same thresholds as above.
• Sum of Page-In
• Sum of Page-Out
Set the dates to the period you are interested in, but make it at least 1 week, preferably 3 months. There are 2016 data
points in a week at 5-minute intervals, so the 99th percentile ignores the highest 20 data points.
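
If you export the raw samples, the columns of this view can be reproduced outside the product. The following is a rough sketch, assuming one list of 5-minute samples per VM and per metric. The function names, the "green" band below 1000, and the nearest-rank percentile method are my assumptions; the product may compute percentiles differently.

```python
import math

# Thresholds from the text: yellow at 1000, orange at 10000, red at 100000.
# "green" for anything lower is an assumption made for this sketch.
THRESHOLDS = [(100_000, "red"), (10_000, "orange"), (1_000, "yellow")]


def colour(value):
    """Map a paging rate to a colour band."""
    for limit, name in THRESHOLDS:
        if value >= limit:
            return name
    return "green"


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def profile(samples):
    """Return the highest value, the 99th percentile and the sum for one metric of one VM."""
    highest = max(samples)
    p99 = percentile_nearest_rank(samples, 99)
    return {"highest": highest, "highest_colour": colour(highest),
            "p99": p99, "p99_colour": colour(p99), "sum": sum(samples)}


# One week of hypothetical page-in samples: 2016 points, 20 of them spikes.
week = [500] * 1996 + [200_000] * 20
print(profile(week))
```

With 2016 samples the nearest-rank 99th percentile lands on the 1996th sorted value, so the 20 highest samples are ignored, which matches the statement above.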
In the following example, I used 4 months. I listed the top VMs, sorted by the highest page-in. What
do you observe?

For a start, some of those numbers are really high!


They are above 1 million. Assuming 8 KB block size, that’s 8 GB per second, sustained for 300 seconds.
What else do you notice?
Page-In is higher than Page-Out. I averaged all the 3K VMs and got the following result:

Page-In is 4x higher in the max value. Page-In also sustains longer, while Page-Out drops significantly. At the 99th
percentile mark, Page-In is 9x higher. I suspect it is the non-modifiable pages, like binaries. Since they cannot be modified,
they do not need to be paged out. They can simply be discarded and retrieved again from disk if required.
The good news is that neither sustains, so the paging is momentary. The following shows that the value at the 99th
percentile can drop well below 5x.


To confirm the above, I downloaded the data so I could determine if the paging is indeed momentary. Using a
spreadsheet, I built a ratio between the 99th percentile value and the maximum value, where 10% means there is a
drop of 10x. I plotted around 1000 values and got the following.

As you can see, the majority of the paging drops drastically at the 99th percentile.
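
The spreadsheet ratio can also be computed in code. Below is a minimal sketch, assuming you have the 99th percentile and the maximum for each VM. The VM names and values are hypothetical, and the 10% cut-off used to flag momentary paging is my choice for the example.

```python
def p99_to_max_ratio(p99, maximum):
    """Return the 99th percentile as a fraction of the maximum (0.10 means a 10x drop)."""
    if maximum <= 0:
        return None  # the VM never paged, so the ratio is undefined
    return p99 / maximum


# Hypothetical (99th percentile, maximum) pairs per VM.
vms = {"vm-a": (5_000, 100_000), "vm-b": (45_000, 50_000), "vm-c": (0, 0)}

ratios = {name: p99_to_max_ratio(p99, mx) for name, (p99, mx) in vms.items()}

# VMs whose paging drops by 10x or more at the 99th percentile.
momentary = [name for name, r in ratios.items() if r is not None and r <= 0.10]
print(ratios, momentary)
```

A low ratio means the peak was a short burst, while a ratio near 1 means the paging was sustained.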
Let’s dive into a single VM, so we can see the pattern over time. I picked a database, as it does heavy paging. The following
is a large Oracle RAC VM. Notice this has a closer ratio between page-in and page-out, and there is a correlation
between the two.


Assuming the page size is 4 KB, that means 100,000 pages = 400 MB/sec. Since vRealize Operations averages the
value over 300 seconds, that means 400 MB x 300 = 120 GB worth of paging in 5 minutes!

Active File Cache Memory


This is the actively in-use subset of the file cache. Unused file cache and non-file backed anonymous buffers (mallocs
etc) are not included.
This is the size of the portion of the system file cache which is currently resident and active in physical memory. The
System Cache Resident Bytes and Memory \ Cache Bytes metrics are equivalent. Note that this counter displays the
last observed value only; it is not an average during the collection period.
In Linux, this is the amount of file cache memory, in kibibytes, that is in active use, or was in active use since the last
time the system reclaimed memory. This is retrieved via the command:
$ cat /proc/meminfo | grep Active
Active: 50955636 kB
Active(anon): 30148196 kB
Active(file): 20807440 kB
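
If you want to read the same value programmatically, the following is a minimal sketch of a parser for text in the /proc/meminfo format. The function name is made up, and the sample text reuses the numbers shown above. Note that the kernel writes the field names without a space, as Active(anon) and Active(file).

```python
def parse_meminfo(text):
    """Parse /proc/meminfo style text into a dict of {field name: value in kB}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


SAMPLE = """Active: 50955636 kB
Active(anon): 30148196 kB
Active(file): 20807440 kB
"""

info = parse_meminfo(SAMPLE)
print(info["Active(file)"])
```

On a Linux guest you could pass the contents of /proc/meminfo instead of SAMPLE. In this sample, Active is the sum of Active(anon) and Active(file).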

For further reading, refer to Windows

Committed
Commit sounds like a guaranteed reservation, which means it’s the minimum the process can get.
This tracks the currently committed virtual memory, although not all of it is written to the pagefile yet. It
measures the demand, so commit can go up without In Use going up, as Brandon Paddock shares here. If Committed
exceeds the available memory, paging activity will increase. This can impact performance.
Commit Limit: Commit Limit is physical RAM + size of the page file. Since the pagefile is normally configured to match
the physical RAM, the Commit Limit tends to be 2x the physical RAM. Commit Limit is important as a growing value is an early warning
sign. The reason is that Windows proactively increases its pagefile.sys if it’s under memory pressure.


The pagefile is an integral part of Windows total memory, as Mark Russinovich explains here. There is
Reserved Memory, and then there is Committed Memory. Some applications like to have their committed memory in 1
long contiguous block, so they reserve a large chunk up front. Databases and JVMs belong in this category. This
reserved memory does not actually store meaningful application data or executables. Only when the application
commits the page does it become used. Mark explains that “when a process commits a region of virtual memory,
the OS guarantees that it can maintain all the data the process stores in the memory either in physical memory or on
disk”.
Notice the words “on disk”. Yes, that’s where pagefile.sys comes in. Windows will use either the physical memory
or pagefile.sys.
So how do we track this committed memory?
The metric you need to track is Committed Bytes. The % Committed metric should not hit 80%. Performance
drops when it hits 90%, as if this is a hard threshold used by Windows. We disabled the pagefile to verify the impact
on Windows. We noticed visibly slower performance even though Windows 7 showed >1 GB of Free memory. In
fact, Windows gave an error message, and some applications crashed. If you use a pagefile, you will not hit this limit.
We have covered Free Memory and Committed Memory. Do they always move in tandem? If memory is
committed by Windows, does it mean it’s no longer free and available?
The answer is no. Brandon Paddock demonstrated here that you can increase the committed pages without
increasing the memory usage. He wrote a small program and explained how it’s done. The result is that Windows
committed memory is double the memory usage. The Free Memory & Cached Memory did not change.
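
To tie the Commit Limit and the % Committed guidance together, here is a small sketch. It assumes Commit Limit = physical RAM + pagefile size, as described in this section, and uses the 80% and 90% marks mentioned above. The function name and the labels "ok", "warning" and "critical" are mine, not Windows terminology.

```python
def commit_status(committed_gb, ram_gb, pagefile_gb):
    """Return (commit limit in GB, % committed, label) for the given sizes."""
    limit_gb = ram_gb + pagefile_gb          # Commit Limit = RAM + pagefile
    pct = committed_gb / limit_gb * 100
    if pct >= 90:
        label = "critical"   # the text notes performance drops at 90%
    elif pct >= 80:
        label = "warning"    # the text says this should not hit 80%
    else:
        label = "ok"
    return limit_gb, pct, label


# Hypothetical VM with 8 GB of RAM and an 8 GB pagefile.
print(commit_status(committed_gb=12, ram_gb=8, pagefile_gb=8))
print(commit_status(committed_gb=15, ram_gb=8, pagefile_gb=8))
```

With no pagefile (pagefile_gb=0) the limit shrinks to the physical RAM, so the same committed amount hits the thresholds much earlier.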

Guest OS Needed memory


We shared earlier that the purpose of memory is to act as disk cache. So you want to utilize all the cache given to
you. Because of the static nature of memory consumption, you can create a heat map that plots all your VMs’ memory
consumption. You want it near 100% while making sure the page-in and page-out rates are within normal expectations.

This is not a raw counter from Windows or Linux. This is a derived counter provided by VMware Tools to estimate
the memory needed to run with minimum swapping. It’s a more conservative estimate as it includes some of the
cache.
The counter Needed memory tracks the amount of memory needed by the Guest OS. It has a 5% buffer for spikes,
based on the general guidance from Microsoft. Below this amount, the Guest OS may swap.
Formula for Linux = physical memory – Maximum of (0, (Available – 5% of physical))
Formula for Windows = physical memory – Maximum of (0, (Unneeded – 5% of physical))
where Unneeded = Free + Reserve Cache + Normal Priority Cache

Example: the VM has 10 GB of RAM. So the Physical RAM = 10 GB
So 5% of physical = 0.5 GB
Situation 1: max memory utilization.
Memory Available = 0 GB.
Tools will calculate Needed memory as
= 10 GB - Maximum (0, 0 – 0.5)
= 10 - Maximum (0, -0.5)
= 10 - 0 GB
= 10 GB
Needed memory is the same as it’s already maxed.
Situation 2: high memory utilization.
Memory Available = 2 GB.
Tools will calculate Needed memory as
= 10 GB - Maximum (0, 2 – 0.5)
= 10 - Maximum (0, 1.5 GB)
= 10 - 1.5 GB
= 8.5 GB
You actually still have 2 GB available here, but Tools adds a buffer of around 5%.
Situation 3: low memory utilization.
Memory Available = 8 GB.
Tools will calculate Needed memory as
= 10 GB - Maximum (0, 8 – 0.5)
= 10 - Maximum (0, 7.5 GB)
= 10 - 7.5 GB
= 2.5 GB
Again, Tools adds around 5%.
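
The three situations can be reproduced with a short function. This is only a sketch of the formula as written in this section, not the actual VMware Tools implementation; the function and parameter names are made up.

```python
def needed_memory_gb(physical_gb, available_gb, buffer_pct=5.0):
    """Needed = physical - max(0, available - buffer), with buffer = 5% of physical.

    For Windows, pass Unneeded (Free + Reserve Cache + Normal Priority Cache)
    as available_gb, following the formula in the text.
    """
    buffer_gb = physical_gb * buffer_pct / 100
    return physical_gb - max(0.0, available_gb - buffer_gb)


# The three situations from the text, for a VM with 10 GB of RAM.
for available in (0, 2, 8):
    print(available, needed_memory_gb(10, available))
```

The max(0, ...) term is what stops Needed memory from exceeding the configured memory when less than 5% is available.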
We’ve covered that you need to look at more than 1 metric before you decide to add more memory. I’m afraid it is
case by case, as shown in the following table. All these VMs are low on free memory, but other than the VM on row
3, the rest have sufficient memory.


Storage
This is the layer that the application team cares about, as it is what is presented to them.

Questions        Description
Configuration    For each partition, we need to know the name, filesystem type (e.g. NTFS, ext4), network or local, and block size.
                 Ideally, we get the mapping between partition and virtual disk.
Capacity         For each partition, we need to know the configured space and used space. For free space, we need to know both absolute (GB) and relative (%).
                 We need to alert before running out of disk space, else the OS crashes.
                 We should not include the networked drive in Guest OS capacity, because the networked drive is typically shared by many. An exception is the VDI use case, where the user’s personal files are stored on the network.
Reclamation      This can be determined from the free space. Reclamation is tricky as it needs to shrink the partition.
Performance      Queue, Latency (read and write), IOPS, Throughput


Disk Metrics
You will find the disk metrics in Performance Monitor under Physical Disk. It’s interesting it’s still called Physical.
Does it mean it’s unaware it’s actually virtual?

Source is here.
Current Disk Queue Length       This is the primary counter for performance, hence I show it first. It is covered in-depth after this summary.
Avg. Disk Queue Length
Avg. Disk Write Queue Length

Avg. Disk Bytes/transfer        This is block size in bytes. It should be 4 KB or 2 MB.
Avg. Disk Bytes/read
Avg. Disk Bytes/write

Avg. Disk Sec/transfer          This is latency, but strangely shown in seconds instead of milliseconds.
Avg. Disk Sec/read
Avg. Disk Sec/write

Disk Bytes/sec                  This is disk throughput in bytes. So you have the total throughput, read throughput and write throughput.
Disk Read Bytes/sec             The word transfer here means being read from or written to the disk.
Disk Write Bytes/sec

Disk Transfers/sec              This is IOPS.
Disk Reads/sec
Disk Writes/sec

Free (Megabytes)

Split IO/sec                    From the manual: Shows the rate at which I/O requests to the disk were split into multiple requests. A split I/O may result from requesting data in a size that is too large to fit into a single I/O or that the disk is fragmented on single-disk systems.

Disk Queue
This counter tracks the queue inside the Linux or Windows storage subsystem. It’s not the queue at the SCSI driver level,
such as LSI Logic or PVSCSI. If this is high then the IO from applications did not reach the underlying OS SCSI driver,
let alone the VM. If you are running a VMware storage driver, such as PVSCSI, then discuss with VMware Support.

There are actually 2 metrics: one is a point in time and the other is the average across the entire collection cycle. Point in
time means the snapshot at the end of the collection period. For example, if the collection is every 5 minutes, then it’s the number
at the 300th second, not the average of 300 numbers.
Windows documentation says that “Multi-spindle disk devices can have multiple requests active at one time, but
other concurrent requests await service. Requests experience delays proportional to the length of the queue minus
the number of spindles on the disks. This difference should average < 2 for good performance.”
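
The rule of thumb in the quote can be written as a simple check. This is a sketch only: the function name is made up, and the number of spindles is an input you have to supply yourself, since a Guest OS on a virtual disk generally cannot see the physical spindles behind it.

```python
def queue_excess(avg_queue_length, spindles, limit=2.0):
    """Apply the quoted guidance: (queue length - spindles) should average < 2.

    Returns the difference and whether it is within the limit.
    """
    excess = avg_queue_length - spindles
    return excess, excess < limit


# Hypothetical values: an average queue of 5 and then 10, on a 4-spindle device.
print(queue_excess(5, 4))
print(queue_excess(10, 4))
```

Feed it the average queue length rather than the point-in-time value, since the guidance refers to the average.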

guest.disk.queue       "Win32_PerfFormattedData_PerfDisk_PhysicalDisk.Name = \"_Total\"#CurrentDiskQueueLength" from WMI
guest.disk.queueAvg    "Win32_PerfFormattedData_PerfDisk_PhysicalDisk.Name = \"_Total\"#AvgDiskQueueLength" from WMI


A high disk queue in the guest OS, accompanied by low IOPS at the VM, can indicate that the IO commands are stuck
waiting on processing by the OS. There is no concrete guidance regarding a threshold for these IO commands as it varies
for different applications. You should view this in relation to the Outstanding Disk IO at the VM layer.
Based on 3000 production VMs in the last 3 months, the values turn out to be sizeable. Almost 70% of the values are
below 10. Around 10% are more than 100 though, which I think is rather high.

Strangely, there are values that seem to be off the chart. I have noticed this in a few metrics already, including this one. Look at the
values below. Do they look like a bug in the counter, or a severe performance problem?

Unfortunately, we can’t confirm as we do not have a latency counter at the Guest OS level, or even better, at the application
level. I am unsure if the queue is above the latency, meaning the latency counter does not start counting until the IO
command is executed.
I plotted the values at the VM level, which unsurprisingly do not correlate. The VM is tracking IO that has been sent, while
Guest OS Disk Queue tracks the IO that has not been sent.


The preceding line chart also reveals an interesting pattern, which is that disk queue happens only rarely. It’s far less
frequent than latency.
Let’s find out more. From the following heat map, you can see there are occurrences where the value is >100.

However, when we compare the current value and the maximum value, they can be drastically different.


Let’s take one of the VMs and drill down. This VM has regular spikes, with the last one exceeding 1000.

Their values should correlate with disk outstanding IO. However, the values are all low. That means the queue
happens inside the Guest OS. The IO is not sent down to the VM.

Which in turn should have some correlation with IOPS, especially if the underlying storage in the Guest OS (not VM)
is unable to cope. The queue is caused by high IOPS which cannot be processed.

Finally, it would manifest in latency. Can you explain why the latency is actually still good?

It’s because that’s from the IO that reaches the hypervisor. The IO that was stuck inside Windows is not included
here.


The application feels latency is high, but the VM does not show it as the IO is stuck in between.
Can the disk queue be constantly above 100?
The following VM shows 2 counters. The 20-second Peak metric is showing ~200 – 250 queue, while the 5-minute
average shows above 125 constantly. The first counter is much more volatile, indicating the queue did not sustain.

Disk Space
Guest OS partitions and virtual disks have an M:N relationship. The following Windows 11 screenshot shows an example
where multiple partitions share a virtual disk.

You can also make a single partition span multiple virtual disks.


The Guest OS also does not have to allocate all the space in a virtual disk. For the space that is allocated, it does not have to
make it visible to users.
vRealize Operations shows the partitions as instanced metrics, under the Guest File System metric group.

For each partition, you get the configured capacity, used capacity (relative) and used capacity (absolute). You
also get the overall number, which is handy for trim/unmap calculation.
You can see if any of the Windows drives or Linux partitions are running out of storage. You can also compute the
potential savings from trim/unmap.
Note that only local disk device partitions are shown. Network mounted filesystems or drives require the Telegraf
agent.

Network
Understanding network counters at the Guest OS level is important as the data inside the guest relates better to the
application.
Windows provides visibility at multiple levels. It provides a good set of metrics at each of these levels:
• adapter
• interface
• process (only in Task Manager. There is no network metric in Performance Monitor at the process level)
• TCP and UDP connection
The following screenshot from Performance Monitor shows some of the metrics at Network Adapter and Network
Interface.


Let’s look at the metrics at the adapter level, as that’s the one closest to the VM level metric.

Contention Metrics
Packets Received Errors
Packets Outbound Errors
    Expect these to be 0 at all times? While error packets are obviously discarded, I think the value is not included
    in the discarded metric below.
Packets Received Discarded
Packets Outbound Discarded
    The packet is not an error packet but it is dropped, typically due to buffer overflow.
    The sum of all 4 metrics should be 0 at all times.
Output Queue Length
    It measures the length of the output packet queue, in packets.
    Windows manual states that saturation exists if this value is >2. If you know why the queue length is low, let me
    know. However, the value is always 0 in Windows 10 and 11 “since the requests are queued by the Network Driver
    Interface Specification (NDIS) in this implementation”.
Packets Received Unknown
    Interesting to see that Windows also has unknown protocol packet. ESXi VMkernel, being an OS, also has this
    unknown packet.
    Take note that there is no unknown packet sent, because all packets sent are of known type.

Consumption Metrics
Let’s start with the basic or common metrics first


Current Bandwidth
    This is interesting as Windows tries to determine the actual bandwidth, which is typically lower than the
    configured bandwidth.
% Usage
% Usage Peak
Bytes Sent/sec
Bytes Received/sec
Bytes Total/sec
    It measures the rate at which bytes are sent and received over each network adapter, including framing characters.
    Windows manual in this link states that the network is saturated if >70% utilization. I think this is on the low
    side, and I’d like to see proof of saturation (e.g. dropped packet, retransmit).
Packets Sent/sec
Packets Received/sec
Packets/sec
    This can be an important counter as typically there is a limit in number of packet that can be processed per second.
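To put a number against the 70% rule of thumb, the byte counters can be turned into a utilization percentage. The following is a minimal sketch, assuming Current Bandwidth is reported in bits per second while Bytes Total/sec is in bytes per second; the function name and the sample values are hypothetical and not part of any Windows API.

```python
SATURATION_THRESHOLD_PCT = 70.0  # the threshold quoted from the Windows manual


def network_utilization_pct(bytes_total_per_sec, current_bandwidth_bps):
    """Utilization of one adapter as a percentage of its Current Bandwidth.

    Assumes bytes_total_per_sec is in bytes/second and
    current_bandwidth_bps is in bits/second.
    """
    if current_bandwidth_bps <= 0:
        raise ValueError("current_bandwidth_bps must be positive")
    bits_per_sec = bytes_total_per_sec * 8
    return bits_per_sec / current_bandwidth_bps * 100.0


def above_rule_of_thumb(bytes_total_per_sec, current_bandwidth_bps):
    """True when utilization exceeds the 70% rule of thumb."""
    pct = network_utilization_pct(bytes_total_per_sec, current_bandwidth_bps)
    return pct > SATURATION_THRESHOLD_PCT


# Hypothetical sample: 900 MB/s on an adapter reporting 10 Gbps.
print(round(network_utilization_pct(900e6, 10e9), 1))
print(above_rule_of_thumb(900e6, 10e9))
```

Crossing the threshold is only a hint. As noted above, dropped packets or retransmits are better evidence of actual saturation.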


VM CPU

Take note that some metrics are for VMkernel internal consumption, and not for vSphere administrators. Just
because they are available in the UI and have names that sound useful does not mean they are meant for your operations. Their
names are written from the CPU scheduler’s viewpoint.
You get 6 metrics to track contention.

You get 9 metrics for consumption.

I group the Wait metrics separately as they mix both contention and consumption.


Contention Metrics
Let’s dive into each counter. As usual, we start with contention type of metrics, then utilization.

Ready
Ready tracks the time when a VM vCPU wants to run, but ESXi does not have a physical thread to run it. VMkernel
places the VM vCPU into Ready state. Ready also accumulates when a Limit is applied, as the impact to the vCPU is the
same (albeit for a different reason altogether). When a VM is unable to run due to Limit, it accumulates limbo time
while sitting in the limbo queue. Be careful when using a Resource Pool, as it can unintentionally cause limits.
Take note that Ready is unaware of contention due to hyperthreading. The vCPU is not placed in Ready state because
both threads can execute at the same time. The contention for shared resources happens in low-level hardware and is
essentially transparent to the ESXi scheduler. If you are concerned about the degradation in throughput when
two worlds execute at the same time on the same core, what counter should you use?
You’re right. It’s CPU Contention. Different purpose, different counter.
Take a look at the high spikes on CPU Ready value. It hits 40%!


Notice the overall pattern of the line chart correlates very well with CPU Usage and CPU Demand. The CPU Usage hit
3.95 GHz but the Demand shot to 6.6 GHz. This is a 4 vCPU VM running on a 2.7 GHz CPU, so its total capacity is
10.77 GHz. Why did Usage stop at 3.95 GHz?
What’s causing it?
If your guess is Limit you are right. This VM had a limit set at 4 GHz.
Ready also includes the CPU scheduling cost (normally completed in microseconds), hence the value is not a flat 0 on
an idle VM. You will notice a very small number. Ready is lower when the Guest OS is continually busy than when a
process keeps waking up and going to sleep, because the frequent wake-ups make the total scheduling overhead higher. The
following shows Ready is below 0.2% on an idle VM (running at only 0.8%). Notice Co-stop is basically flat 0.

CPU Ready tends to be higher in larger VMs, because Ready tends to hit all vCPU at the same time. Instead of
thinking of CPU ready in 2D (as shown in the first chart below), think in 3D where each vCPU moves across time. The
2nd chart below shows how the 8 vCPUs move across time better.

Best Practice
I sampled 3937 VMs from a production environment. For each of them, I took the 20-second peak and not the 5-minute
peak.


Why do I take the 20-second peak?


Unless the performance issue is chronic, CPU Ready tends to last seconds rather than minutes. The following is one
such example.
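To see why a 5-minute figure can hide a burst that lasts only seconds, consider a small sketch with made-up numbers. It assumes one 5-minute collection cycle is built from fifteen 20-second samples; the sample values and the function name are hypothetical.

```python
def rollup(samples_20s):
    """Return (average, peak) of the 20-second samples in one collection cycle."""
    if not samples_20s:
        raise ValueError("need at least one sample")
    return sum(samples_20s) / len(samples_20s), max(samples_20s)


# Hypothetical CPU Ready (%) values: a 40-second burst in an otherwise quiet 5 minutes.
SAMPLES = [30.0, 30.0] + [0.5] * 13  # 15 samples x 20 s = 300 s

average, peak = rollup(SAMPLES)
print(f"5-minute average: {average:.2f}%")
print(f"20-second peak:   {peak:.2f}%")
```

The average lands well below the burst value, which is why the 20-second peak is the more sensitive signal for short-lived contention.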

The following shows a different behaviour. Notice initially both metrics are bad, indicating severe CPU ready.
However, the gap is not even 2x. I think partly because the value is already very high. Going beyond 50% CPU Ready
when CPU Usage is high will result in poor performance. This VM has 16 vCPU.

Subsequently, the performance improved, and both values became very similar and remained in a healthy range.
I collected 4 months’ worth of data at 5-minute intervals, so it’s around 35040 data points per VM.
The following screenshot was my result. What do you expect to get in your environment?


The first column takes the highest value from ~35K data points. The table is sorted by this column, so you can see the
absolute worst from 35040 x 3937 = ~138 million data points. Unsurprisingly, the number is bad. Going down the
table, it’s also not surprising that the worst 10 are bad.
But notice the average of these “worst metrics”. It’s just 0.97%, which is a great number!
The 2nd column complements the first one. I eliminated the worst 1% of the data, then took the highest of what remained.
So I took out 350 data points. Since vRealize Operations collects every 5 minutes, that eliminates the worst 29 hours in
4 months.
As you can expect, for most VMs the values improve dramatically. The 2nd column is mostly green.
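The two columns can be reproduced per VM with a few lines of code. This is a minimal sketch of the idea, assuming the data points for one VM are available as a plain list of percentages; the function names are hypothetical and this is not how vRealize Operations computes it internally.

```python
def worst(samples):
    """First column: the single highest value."""
    return max(samples)


def worst_after_trimming(samples, trim_fraction=0.01):
    """Second column: highest value left after dropping the worst trim_fraction."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples, reverse=True)
    n_drop = int(len(ordered) * trim_fraction)  # 35040 points -> 350 dropped
    return ordered[n_drop]


def hours_eliminated(n_samples, trim_fraction=0.01, interval_minutes=5):
    """How much wall-clock time the dropped data points represent."""
    return int(n_samples * trim_fraction) * interval_minutes / 60


# Hypothetical VM: 350 bad collection cycles, the rest healthy.
SERIES = [40.0] * 350 + [1.0] * (35040 - 350)
print(worst(SERIES), worst_after_trimming(SERIES))
print(round(hours_eliminated(35040), 1))
```

A VM whose trimmed value is still high has a sustained problem, while a large gap between the two columns points to short-lived outliers.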

vCenter Metrics
There are 2 metrics provided: Ready (ms) and Readiness (%).
I plotted both of them. They show an identical pattern. This is a 4 vCPU VM, hence the total is 80000 ms (4 x 20000 ms).


The Readiness (%) has been normalized, taking into account the number of vCPU. Notice 80000 ms matches
100%. If it were not normalized, you would see 80000 ms as 400%.
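The relationship between the two metrics can be written down directly. The sketch below assumes the 20-second (20000 ms) real-time interval used throughout this chapter; the function name and the normalize flag are hypothetical.

```python
INTERVAL_MS = 20_000  # one 20-second sample


def readiness_pct(ready_ms, n_vcpu, interval_ms=INTERVAL_MS, normalize=True):
    """Convert Ready (ms, summed over all vCPUs) into a percentage.

    With normalize=True the result is divided by the vCPU count, so a VM whose
    every vCPU was in Ready state for the whole interval reports 100%.
    """
    pct_of_one_vcpu = ready_ms / interval_ms * 100.0
    return pct_of_one_vcpu / n_vcpu if normalize else pct_of_one_vcpu


print(readiness_pct(80_000, n_vcpu=4))                   # normalized
print(readiness_pct(80_000, n_vcpu=4, normalize=False))  # not normalized
```

For the 4 vCPU VM above, the normalized call gives the 100% shown in the chart, and the non-normalized call gives the 400% mentioned in the text.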

Co-stop
Co-stop is a different state than Ready because the cause is different.
Co-stop only happens on Symmetric Multiprocessing (SMP) VMs. SMP means that the OS kernel executes parallel
threads. This means Co-stop does not apply to 1 vCPU VMs, as there is only 1 active process at any given time. It is
always 0 on a single vCPU VM.
In a VM with multiple vCPUs, ESXi VMkernel is intelligent enough to run some of the VM vCPUs when it does not
have enough physical threads to satisfy all the vCPUs. At some point, it needs to stop a running vCPU, as it’s too far
ahead of its sibling vCPUs (which it cannot serve, meaning they were in Ready state). This prevents the Guest OS from
crashing. The Co-stop metric tracks the time when the vCPU is paused for this reason. This explains why Co-stop
tends to be higher on a VM with more vCPUs.


If only one or some vCPUs are in Ready state, then the remaining ones will soon be co-stopped, until all the vCPUs are
co-started. The preceding diagram shows vCPU 0 hit a Ready state first. Subsequently, the remaining 7 vCPUs hit a
co-stop.
Just like Ready, Co-stop happens at the vCPU and not the VM level.
One reason for Co-stop is snapshot. Refer to this KB article for details.
Guest OS is not aware of either Co-stop or Ready. The vCPU freezes. “What happens to you when time is frozen?” [8] is
a great way to put it. As far as the Guest OS is concerned, time is frozen when it is not scheduled. Time jumps when
it’s scheduled again.
The time it spends under Co-stop or Ready should be included in the Guest OS CPU sizing formula as the vCPU actually
wants to run.
By the way, there is a performance improvement in the VMkernel scheduler in handling Co-stop in ESXi 7.0 Update
1. Prior to the improvement, the application performance dropped after 384 vCPU. If you have a monster VM with >
128 vCPU, let me know.

Best Practice
The value of Co-stop should be <0.5% in a healthy situation. This is based on 63.9 million data points, as shown on the
following pie chart.

[8] Asked of me by Valentin Bondzio at one of the VMworld events where we got to meet. Those were the days!


Note that the value of Co-stop tends to be larger for large VMs. Its value also tends to be smaller than Ready, as
shown below. Ready and Co-stop may or may not correlate with Usage. In the following chart you can see both the
correlation and lack of correlation.

Overlap
When ESXi is running a VM, this activity might get interrupted by IO processing (e.g. incoming network packets). If
there are no other available cores in ESXi, VMkernel has to schedule the work on a busy core. If that core happens to
be running a VM, the work on that VM is interrupted. The Overlap counter accounts for this, hence it’s a useful metric
just like the Ready and Co-stop counters.
The interrupt is to run a system service, and it could be on behalf of the interrupted VM itself or another VM.
Notice the words system service, meaning a process that is part of VMkernel. This means it is not for non-system services,
such as a vCPU world. That’s why the value in general is lower than CPU Ready or even Co-stop. The value is generally
affected by disk or network IO.
Some documentation in VMware may refer to Overlap as Stolen. Linux Guest OS tracks this as Stolen time.
When a vCPU in a VM is interrupted, the vCPU Run counter is unaware of this and continues tracking. The Guest
OS experiences a freeze. Time stops for this vCPU, as everything is paused. The clock on the motherboard does not tick
for this vCPU. Used and Demand do account for this interruption, making them useful in accounting for the actual
demand on the hypervisor. When the VM runs again, the Guest OS experiences a time jump.
Review the following charts. It shows CPU Usage, CPU Overlap and CPU Run. See the green highlights and yellow
highlights. What do you notice?


The above proves that Run is not aware of Overlap. Notice when Overlap went up, Run did not go lower. CPU Usage,
however, did go down as it’s aware of Overlap.
The correlation is not perfect as Usage is also aware of hyperthreading and CPU frequency.
The Overlap counter is useful for troubleshooting performance problems, complementing Ready, Co-stop, Other Wait
and Swap Wait. Ready does not include Overlap as the VM remains in the Run state (see the CPU State Diagram).
The unit is millisecond, and it’s the summation over the entire 20 seconds. vRealize Operations averages the 20-second
values over 300 seconds. So the amount at 300 seconds is max 20000 (this is 100%), and must be multiplied by 15 if we
want to see the actual total in the 300-second period.
The amount is the sum of all vCPUs, so you need to divide by the number of running vCPUs if you are converting it into a
percentage, then divide by 20000 ms and multiply by 100%. When I did that and plotted the highest 5 among ~3K
production VMs, I got this.

Overlap (ms)    vCPU    Overlap (%)
6,169           30      1.03%
284             2       0.71%
509             4       0.64%
484             4       0.61%
237             2       0.59%

The above indicates the VMs only experienced minimal interruption by VMkernel.
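The percentages in the table follow from the conversion described earlier: divide the millisecond value by the number of vCPUs, then by the 20000 ms interval. The sketch below simply replays that arithmetic for the five rows; the function name is hypothetical.

```python
INTERVAL_MS = 20_000  # one 20-second sample


def overlap_pct(overlap_ms, n_vcpu, interval_ms=INTERVAL_MS):
    """Overlap as a percentage of the time available to the VM's vCPUs."""
    return overlap_ms / n_vcpu / interval_ms * 100.0


# (Overlap in ms, vCPU count) for the five VMs in the table.
ROWS = [(6169, 30), (284, 2), (509, 4), (484, 4), (237, 2)]

for overlap_ms, n_vcpu in ROWS:
    print(overlap_ms, n_vcpu, round(overlap_pct(overlap_ms, n_vcpu), 3))
```

Note that the VM with the largest raw value (6,169 ms) is only marginally worse than the others once its 30 vCPUs are taken into account.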
Let’s dive into a single VM. The following is a 68 vCPU VM running Splunk. In the last 7 days, it experienced a low but
sizeable CPU Overlap. 10K ms is relatively low for a 68 vCPU VM, but it still represents half a vCPU’s worth of interruption.

Overlap should be included in Guest OS sizing as the Guest OS actually wants to run. The effect is the same as
unmet Demand.
A high Overlap indicates the ESXi host is doing heavy IO (storage or network). Look at your NSX Edge clusters, and
you will see relatively higher Overlap values than on non-IO-intensive VMs.

Contention | Latency
This metric tracks the “stolen time”, which measures the CPU cycles that could have been given to the VM in an ideal
scenario.
The metric is called Contention in vRealize Operations, but Latency in vCenter, which in turn maps to the ESXi LAT_C
counter.
The diagram shows what it includes. LAT_C excludes Max Limited in Ready, but it includes Co-stop even if the
Co-stop was the result of Limit.
Notice that HT and CPU Frequency are effects and not metrics. You can see the impact of CPU Frequency in the esxtop
%A/MPERF counter.


It measures the full possible contention a VM may have that is not intentionally imposed on the VM by the vSphere
administrator. It considers the CPU SMT effect. In ESXi CPU accounting, Hyper-Threading is recorded as giving 1.25x
throughput. That means when both threads are running, each thread is recorded as getting only 62.5% (1.25 / 2). This
increases CPU Contention to 37.5%. All else being equal, VM CPU Contention will be 37.5% when the other hyperthread is
running. This is done so Used + Latency = 100%, as Used will report 62.5% when the vCPU has a competing thread
running.
In the above scenario, what’s the value of CPU Ready?
Yup, it’s 0%.
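The hyperthreading arithmetic above can be captured in a few lines. This is a minimal sketch of the accounting rule as described in this section (a fixed 1.25x gain split evenly between the two threads); the function names are hypothetical, and frequency effects are deliberately left out.

```python
HT_THROUGHPUT_GAIN = 1.25  # combined throughput of two busy threads on one core


def used_pct(sibling_busy):
    """Share of a core credited to a fully busy vCPU, in percent."""
    if not sibling_busy:
        return 100.0
    return HT_THROUGHPUT_GAIN / 2 * 100.0


def ht_contention_pct(sibling_busy):
    """Contention attributed to hyperthreading, so that Used + Latency = 100%."""
    return 100.0 - used_pct(sibling_busy)


for sibling_busy in (False, True):
    print(sibling_busy, used_pct(sibling_busy), ht_contention_pct(sibling_busy))
```

Ready does not appear anywhere in this calculation, which is why it stays at 0% in the scenario above.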
CPU Contention also accounts for power management. What happens to its value when frequency drops by 25%? It
can’t go negative, right? If you know the answer, let me know!
Because of these 2 factors, its value is more volatile, making it less suitable as a formal Performance SLA. Use CPU
Ready for Performance SLA, and CPU Contention for performance troubleshooting. You can profile your
environment by calculating the value of CPU Ready at the time CPU Contention hits its highest, and vice versa. The
following table only shows 5 VMs out of the 2500 that I analyzed. These 2 metrics do not have good correlation, as they
were created for different purposes.

In many cases, the impact of both threads running is not felt by the application running on each thread. If you use
CPU Contention as formal SLA, you may be spending time troubleshooting when the business does not even notice
the performance degradation.
The following screenshot shows CPU Contention went down when both Ready and Co-stop went up.


How about another scenario, where Contention is near 0% but Ready is very high? Take a look at this web server.
Both CPU Demand and CPU Usage are nearly identical. At around the 1:40 am mark, both Demand and Usage showed
72.55%, Contention at 0.29%, but Ready at above 15%. What’s causing it?

The answer is Limit. Unlike CPU Ready, Contention does not account for Limit (Max Limited) because that’s an intentional
constraint placed upon the VM. The VM is not contending with other VMs. VMware Cloud Director sets a limit on VMs,
so the Contention (%) metric will not be appropriate if you aim to track VM performance in that environment.
Here is a clearer example showing contention consistently lower than Ready due to limit.


A better and more stable metric to track the contention that a VM experiences is Ready + Co-stop + Overlap + VM
Wait + Swap Wait. Note that the raw unit for all of these is millisecond, not GHz.
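As a rough illustration, the sum can be expressed as a percentage in the same way as Readiness. The sketch below assumes all five inputs are 20-second millisecond summations across all vCPUs and that they do not double count each other; the function name, parameter names and sample values are hypothetical.

```python
INTERVAL_MS = 20_000  # one 20-second sample


def total_contention_pct(ready_ms, costop_ms, overlap_ms, vm_wait_ms,
                         swap_wait_ms, n_vcpu, interval_ms=INTERVAL_MS):
    """Ready + Co-stop + Overlap + VM Wait + Swap Wait, normalized per vCPU."""
    if n_vcpu <= 0:
        raise ValueError("n_vcpu must be positive")
    total_ms = ready_ms + costop_ms + overlap_ms + vm_wait_ms + swap_wait_ms
    return total_ms / (n_vcpu * interval_ms) * 100.0


# Hypothetical 4 vCPU VM during one 20-second sample.
print(round(total_contention_pct(400, 100, 50, 200, 50, n_vcpu=4), 2))
```

Unlike CPU Contention, this figure is not affected by hyperthreading or frequency scaling, which is what makes it more stable.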
Where do you use CPU Contention then?
Performance troubleshooting for CPU-sensitive VM.
If the value is low, then you don’t need to check CPU Ready, Co-stop, Power Management and CPU overcommit. The
reason is they are all accounted for in CPU Contention.
If the value is high (> 37.5%), then follow these steps:
• Check CPU Run Queue, CPU Context Switch, “Guest OS CPU Usage”, CPU Ready and CPU Co-stop. Ensure all
the CPU metrics are good. If they are all low, then it’s Frequency Scaling and HT. If they are not low, check
VM CPU Limit and CPU Share.
• Check ESXi power management. If it is set to Maximum correctly, then Frequency Scaling is out (you are
left with HT as the factor), else Frequency Scaling could be at play. A simple solution for applications that are
sensitive to frequency scaling is to set power management to max.
• Check CPU Overcommit at the time of the issue. If there are more vCPUs than pCores on that ESXi, then HT could be
impacting, else HT is not impacting. IMHO, it is rare that an application does not tolerate HT as it’s transparent
to it. Simplistically speaking, while HT reduces the CPU time by 37.5%, a CPU that is 37.5% faster will logically
make up for it.
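The steps above can be read as a small decision flow. The following sketch is one interpretation of them, not an official procedure: the inputs are hypothetical yes/no answers you would gather by hand from the metrics, and the returned strings are only labels for what to look at next.

```python
from typing import List


def suspects(other_cpu_metrics_low: bool,
             power_policy_is_max: bool,
             vcpu_total: int,
             pcore_total: int) -> List[str]:
    """Suggest what to investigate when CPU Contention is high."""
    if not other_cpu_metrics_low:
        # Run Queue, Context Switch, Ready or Co-stop are not healthy.
        return ["VM CPU Limit and CPU Share"]
    found = []
    if not power_policy_is_max:
        found.append("Frequency Scaling")
    if vcpu_total > pcore_total:
        # More vCPUs than physical cores: threads are likely sharing cores.
        found.append("Hyper-Threading")
    return found


# Hypothetical host: 96 vCPUs scheduled on 64 physical cores, power policy at max.
print(suspects(True, True, vcpu_total=96, pcore_total=64))
```

An empty result means none of the three checks explains the value, in which case the individual counters deserve a closer look.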
There is a corner case accounting issue in %LAT_C that was resolved in ESXi 6.7 [9]. VMs with Latency Sensitivity = High
on ESXi 6.5 or older will show any “guest idle” time of vCPUs as LAT_C; for those VMs the counter should not be
relied on. This is a corner case because the majority of VMs should not be set with this, as it impacts the performance of
other VMs.

Latency Sensitivity
You can reduce the latency and jitter caused by virtualization by essentially “reserving” the physical resource to a
VM. In the vSphere Client UI, edit VM settings, and go to VM Options tab.
[9] Both 6.5 and 6.7 have End of General Support on 15 October 2022 and End of Technical Guidance on 15 November 2023.


Scroll down until you see this option.

What happens to the metrics when you set Latency Sensitivity = High?

CPU Impact
CPU is different. That “pipeline” has to be made available. In a sense, the CPU is scheduled 100% of the time. This
prevents any wakeup or scheduling latencies that result from having to schedule a vCPU when it wakes up in the first
place. Yes, the exclusive bit of exclusive affinity is literal.
Let’s see what it looks like in esxtop. I’ve removed unnecessary information so it’s easier to see. What do you notice?
GID     NAME                            %USED    %RUN  %SYS   %WAIT  %IDLE
153670  vmx                              0.03    0.03  0.00  100.00   0.00
153670  NetWorld-VM-2127520              0.00    0.00  0.00  100.00   0.00
153670  NUMASchedRemapEpochInitialize    0.00    0.00  0.00  100.00   0.00
153670  vmast.2127520                    0.00    0.00  0.00  100.00   0.00
153670  vmx-vthread-212                  0.00    0.00  0.00  100.00   0.00
153670  vmx-filtPoll:WindowsTest         0.00    0.00  0.00  100.00   0.00
153670  vmx-mks:WindowsTest              0.00    0.00  0.00  100.00   0.00
153670  vmx-svga:WindowsTest             0.00    0.00  0.00  100.00   0.00
153670  vmx-vcpu-0:WindowsTest           0.31  100.21  0.00    0.00   0.00
153670  vmx-vcpu-1:WindowsTest           0.16  100.21  0.00    0.00   0.00
153670  vmx-vcpu-2:WindowsTest           0.15  100.21  0.00    0.00   0.00
153670  vmx-vcpu-3:WindowsTest           0.15  100.21  0.00    0.00   0.00
153670  LSI-2127520:0                    0.00    0.00  0.00  100.00   0.00
153670  vmx-vthread-212:WindowsTest      0.00    0.00  0.00  100.00   0.00

We can see Run shot up to 100%. This means Wait has to go down to 0%.
Strangely, Used remains low, so we can expect that Usage remains low too. This means the formula that connects
Run and Used does not apply in this extreme scenario. You’re basically dedicating a physical core to the VM.
But what about Demand?


Demand shot up to 100% flat out.

So you have an interesting situation here. Demand is 100%, Usage is 0%, yet Contention is 0%.
Now let’s plot what happened to Wait and Idle. Notice both went from 100% to 0%.


So if you combine Run, Demand, Wait and Usage metrics, you can see basically Run and Demand shot up to 100% as
Wait drops to 0%, while Usage is oblivious to the change.


Just for documentation purpose, System and Ready are obviously not affected.

Memory Impact
Memory is fundamentally storage. So I do not expect any of the counters to go up. They will go up when the VM
actually needs them.


The above VM has 4 GB of RAM, fully reserved. But since it’s basically idle, there is no change on the counter.

Wait
CPU is the fastest component among infrastructure resources, so there are times it must wait for data. The data
comes from memory, disk or network.
There are also times when there is nothing to do, so the CPU is idle. Whether the upper layer (Guest OS vCPU) is
truly idle or blocked by pending IO, the VMkernel does not have the visibility. It can only see that Windows or Linux
is not doing any work.
There are 3 sub-metrics that make up Wait.
• Idle. Waiting for work.
• Swap Wait. Waiting for memory.
• Other Wait. Waiting for other things.
Guest OS isn’t aware of either Other Wait or Swap Wait. Just like other types of contention, it experiences a freeze. The
time it spends under Other Wait and Swap Wait should be included in the Guest OS CPU sizing formula as the VM
actually wants to run.
The Idle counter tracks when the VM is not running. Regardless of the reason in the upper layer, VM Idle should not be
included in VM sizing, and definitely not in Guest OS sizing. The reason is the vCPU is not running and you can’t
predict what the usage would be. You should address the IO and memory bottleneck at the Guest OS level, using
Windows and Linux metrics.
Swap Wait tracks the time the CPU is waiting for a memory page to come in from ESXi swap. This metric was superseded
by the Memory Contention metric.


Other Wait tracks the time the CPU is blocked by other things, such as IO and vMotion. For example, the VMM
layer is trying to do something and it’s blocked. The reasons vary and it’s hard to pinpoint exactly which
one, as you need low-level debug logs such as stats vmx, schedtraces, and custom vprobes. You’re better off
removing the common reasons. Snapshot is such a common reason here [10] that the counter was mistakenly named IO Wait.

Other Wait
Take note of a known bug that wrongly inflates the value of Other Wait and esxtop %VMWait.
Actions you can take to reduce Other Wait:
• vMotion the VM.
• Remove Snapshot.
• Update to the latest build of ESXi (incl. physical device drivers), virtual HW and VMware Tools (virtual device
drivers).
• If this happens to multiple VMs, find commonality.
If the above is not helping in your case, file a Support Request with VMware GSS and tag me. Please mention that
you got it from here, so I have context.
I plotted Other Wait for 4000 production VMs. Surprisingly, the value is not low.

I was curious if the value correlates with CPU Ready or Co-stop. From around 4000 production VMs in the last 1
month, the answer is a no.

[10] Based on this KB article, a snapshot increases the read operations, as every snapshot has to be read to ensure you’re
fetching the correct data. Write is not impacted as you simply write a new block instead of updating an existing one.


Since snapshot is another potential culprit, let’s compare with disk latency and outstanding IO.
What do you expect?

Again, no correlation. None of the VMs with high VM Wait is experiencing latency. Notice I used the 99th percentile,
as I wanted to rule out a one-time outlier. I’m plotting the first VM as its value at the 99th percentile is very near to the max,
indicating a sustained problem.


It turned out to be true. It has sustained VM Wait value around 15% (above is zoomed into 1 week so you can see
the pattern).
I’m curious why it’s so high. First thing is to plot utilization. I checked Run, Usage and Demand. They are all low.

Using the vRealize Operations correlation feature, I checked if it correlates with any other metric. The only metric it
found is Idle, which is logical as they basically add up to 100% when Run is low.

Consumption Metrics
This covers both utilization and reservation. Allocation is a property for VM.

Run
Run is when the Guest OS gets to run and process instructions. It is the most basic counter among the 4 CPU
utilization metrics. It’s the only counter not affected by CPU frequency scaling and hyperthreading. It does not check
how fast it runs (frequency) or how efficiently it runs (SMT).
Run at VM level = Sum of Run at vCPU levels

This means the value of CPU Run at VM level can exceed 20000 ms in vCenter.
The following screenshot shows CPU Run higher than CPU Used. We can’t tell if the difference is caused by power
management or hyperthreading, or a mix of both.


If the above was all we needed to know, monitoring VMware vSphere would have been easy. You wouldn’t need a
book like this. In reality, the following factors must also be considered:
• Interrupt
• System time
• Power Management or CPU Frequency Scaling
• Simultaneous Multithreading (Hyper-Threading as Intel calls it)
Because CPU Run does not take this external work into account, and is not aware of CPU speed and HT, we will see later
in the right-sizing section that this property makes it suitable as input to size the Guest OS.

Used | Usage | Demand


As covered earlier, CPU Run does not account for the following:
• How fast is the “run”? All else being equal, a 5 GHz CPU is 5x faster than a 1 GHz CPU. Throughput impacts
utilization. The faster it can complete a task, the shorter it has to work. That’s why you see some metrics in
MHz, because they account for this speed.
• How efficient is the “run”? If there is a competing thread running in the same core, the 2 threads have to share
the core resource. ESXi accounting records this as 1.25x overall gain, hence each thread drops to 62.5% only.
This is a significant drop that should be accounted for.
• IO work. IO performed by the hypervisor has to be charged back to the VM.
This is where Used and Demand come in. vCenter then adds Usage (MHz) and Usage (%) metrics. The following table
shows the 5 VM utilization metrics.

Counter   Available at                            Unit          Source    CPU Frequency   SMT
Run       vCPU level                              Millisecond   ESXi      No              No
Used      vCPU level, VM level (include System)   Millisecond   ESXi      Yes             Yes
Usage     vCPU level                              MHz           vCenter   Yes             Yes
Usage     VM level                                %             vCenter   Yes             Yes
Demand    VM level                                MHz           ESXi      Yes             No

Used
CPU Used covers use cases that CPU Run does not.
• VM Migration. Moving a VM to another ESXi requires that you know the actual footprint of the VM, because
that’s what the destination ESXi needs to deal with.
• VM Chargeback. You should charge the full cost of the VM, and not just what’s consumed inside the VM. In
fairness, you should also charge the actual utilization, and not the rated clock speed.
Here is how Used differs from Run:

Based on the above, you can work out the formula for VM level Used, which is:
VM level Used = Run + System - Overlap + VMX +/- E

Where E is the combination of:
• efficiency gained from CPU Turbo Boost or efficiency lost from power savings. For example, if the frequency
is dropped to 40% of the nominal frequency, we consider 60% of the CPU time was stolen.
• 37.5% efficiency loss from CPU SMT.
VMX is typically negligible. It accounts for CPU cycles spent on things like consoling to the VM. In esxtop, System
time is charged to the VM VMX world.
Because Used accounts for the actual frequency, you may expect it to be measured in GHz and not millisecond. Think
of the number of cycles completed instead of simply frequency. You then convert it back to time. I know it requires a
bit of mental mathematics 😊

Take note: CPU Used has a different formula at VM level and vCPU level. At vCPU level, it does not include System
Time. At VM level, it includes the work done by VMkernel that is charged at VM level, such as System and other
worlds.


Usage
There are two metrics: Usage (MHz) and Usage (%).
These 2 metrics do not exist in ESXi, meaning they only exist in vCenter.
I’m not able to figure out if Usage (%) = Usage (MHz) / VM Static CPU Speed, as I don’t have the need yet to use both
metrics. From the chart below, it appears that they are not 100% identical, but they are very similar.

Let’s compare Usage with Used instead. We will compare Usage MHz as that’s the raw counter. The percentage
value is derived from it.


From the preceding chart, we can see they are basically the same, with the difference due to y-axis scales. Formula
wise, Usage (MHz) includes all the VM overhead, such as the time spent by VMX process.
vRealize Operations Usage (MHz) and Usage (%) metrics map 1:1 to the respective metrics from vCenter.

Usage (GHz)
We stated that CPU power management & HT impact Usage.
Review the following example. This is a single VM occupying an entire ESXi.

The ESXi has 12 cores with a nominal frequency of 2.4 GHz. The number of sockets does not matter in this case.
Since HT is enabled, the biggest VM you can run is a 24 vCPU VM. The 24 vCPUs will certainly have to share 12 cores, but
that’s not what we’re interested in here.
What do you expect the VM CPU Usage (GHz) to be when you run the VM at basically 100%?
36 GHz, if you did not enable dynamic power management.
Why not 57.6 GHz, since it’s 24 vCPU x 2.4 GHz?
Because HT does not yield 2x. It yields 1.25x only. At the end of the day, the one that does the computation is the
core, not the thread.
12 cores x 2.4 GHz x 1.25 HT = 36 GHz total capacity with hyperthreading enabled.
In the preceding example, power management was enabled. Naturally Turbo Boost kicked in, albeit not so dramatically,
as the VM already used up the entire ESXi CPU cores.
You got around 39 GHz, a small increase over 36 GHz. The formula is 2.4 GHz x 12 cores x 1.25 HT x 1.08x Turbo Boost.
Notice it is 12 not 24.
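The three numbers in this example (57.6, 36 and roughly 39 GHz) come from the same multiplication with different factors. Here is a minimal sketch of that arithmetic, using the 1.25x HT gain and the 1.08x Turbo Boost factor quoted above; the function name and parameter names are hypothetical.

```python
def usage_ceiling_ghz(cores, nominal_ghz, ht_gain=1.25, turbo=1.0):
    """Upper bound of VM CPU Usage (GHz) on a host, counted per core, not per thread."""
    return cores * nominal_ghz * ht_gain * turbo


naive = 24 * 2.4                                      # counting threads as full cores
without_turbo = usage_ceiling_ghz(12, 2.4)            # HT only
with_turbo = usage_ceiling_ghz(12, 2.4, turbo=1.08)   # HT plus Turbo Boost

print(round(naive, 2), round(without_turbo, 2), round(with_turbo, 2))
```

The Turbo Boost factor varies with load and hardware, so 1.08 should be treated as an observation from this particular host rather than a constant.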


What happens when we disable Turbo Boost? That’s what we did at point 1 in the diagram above.
CPU Usage drops to slightly below 36 GHz.

Usage (%)
The following is a single vCPU production Windows Server. Both CPU Usage (MHz) and Demand jump to over 100%.
Their values are identical for almost 30 days. The VM had near 0% Contention (not shown in chart), hence the 2
values are identical.

However, when we plot the value in %, we see a different number. Usage (%) is strangely capped at 100%.

The VM experienced some contention around May 12. That’s why Demand was higher than Usage.

Demand
Demand differs from Usage as it assumes the VM does not share the physical core. It’s unaware of the penalty caused
by hyperthreading. It’s what the VM utilization would be had it not experienced any contention.
In the event the VM vCPU is sharing the core, the value of Usage will be 37.5% lower, reflecting the fact that the VM only gets
62.5% of the core. This makes sense as the HT throughput benefit is fixed at 1.25x.
If there is no contention, Demand and Usage will be similar.
Take a look at the following screenshot from vCenter. It’s comparing Demand (bold) and Usage.
What do you notice?


How can Usage be higher than Demand at some points?


The reason is that Demand is averaged over a longer time, giving it a steadier value. That’s why the peak is shorter
but wider. Notice the average over 1 hour is higher for Demand.
Demand could be lower than Run if there are power management savings, as it accounts for the speed and efficiency of the
run.
Demand (MHz) and Usage (MHz) can exceed 100%. The following is a 32-vCPU Hadoop worker node. Notice it
exceeds the total capacity multiple times. Demand and Usage are identical as it’s the only VM running and the host has
more than 32 cores, hence there is 0 contention.


Okay, now that you have some knowledge, let’s test it 😊

Quiz Time! Looking at the chart below, what could be causing it?
Notice Demand jumped while Usage dropped. VM CPU Contention (%) jumped even more. What is going on?

And why is Contention much more than Demand – Usage?


The reason why the Demand metric jumps while Usage drops is contention. The VM experiences contention, which
includes hyperthreading sharing. I should have included the screenshot of CPU Ready, Co-stop, Overlap, VM Wait
and Swap Wait.
From the chart you can see that VM CPU Contention is greater than Demand – Usage. Contention (%) is around
20% when Demand is 25% and Usage is 15%. The reason is that Contention accounts for both CPU frequency and
hyperthreading, while the difference between Demand and Usage only accounts for hyperthreading.

Used vs Run vs Usage


By now I hope you realize that the various “utilization” metrics in the 4 key objects (Guest OS, VM, ESXi and Cluster)
vary. Each has its own unique behaviour. Because of this, you are right to assume that they do not map nicely
across the stack. Let’s test your knowledge 😊

Review the following chart carefully. Zoom in if necessary.


The vCenter chart [11] above shows VM utilization metrics from a single VM. The VM is a large VM with 24 vCPUs
running a controlled CPU test. The power management is fixed so it runs at nominal clock speed. This eliminates the CPU
frequency scaling factor.
The VM starts at 50% “utilization”, with each vCPU pinned to a different physical core. It then slowly ramps up over
time until it reaches 100%.
Can you figure out why the three metrics moved up differently? What do they measure?
Now let’s look at the impact on the parent ESXi. It only has a single VM, but the VM vCPU matches the ESXi physical
cores. The ESXi starts at 50% “utilization”, then slowly ramps up over time until it reaches 100%.

[11] Provided by Valentin Bondzio


Can you figure out why the 3 metrics moved up differently? What do they measure?
Let’s break it down…

At the start of the test


The VM runs 12 of its vCPUs, with each vCPU pinned to its own ESXi core. So all cores are 100% utilized, but each is
running only 1 thread.
VM CPU Run (ms) is 240K milliseconds, which is 20K milliseconds x 12 (half of its 24 vCPU).
VM CPU Used (ms) is also at 240K milliseconds. There is no loss from overlap, the VM does not do much IO, and there is no
efficiency loss or gain due to HT.
VM CPU Usage is 50%.
So at this point, all 3 metrics of VM CPU are at 50%.
The counters at ESXi tell a different story. ESXi Core Utilization (%) immediately went up to 100% while Utilization
went up to only 50%. The reason is that Core Utilization measures whether the core is used or not. It’s unaware of
HT.
Usage (%) is identical to Core Utilization in this case.
On the other hand, ESXi Utilization (%) looks at whether each HT thread is running or not. It does not care about the fact
that the 2 threads share a core, and simply rolls up to ESXi level directly from thread level. This is why it’s showing
50%, as it only cares whether a thread is running or not at any point in time.


During Ramp Up period


The VM is being ramped up steadily. You can see all 3 metrics went up in steps.
VM CPU Run (ms) ramps up from 240K to 480K. Each of the 24 vCPUs reaches 20K ms, which equals 100%.
VM CPU Used (ms) barely moved, from 240K to 300K. That’s 1.25x, demonstrating that Used understands HT only
delivers 1.25x throughput.
VM CPU Usage (%) ramps up from 50% to 62.5%, also demonstrating awareness of contention due to HT.
Used (ms) = Usage (%)
ESXi CPU Usage (%) counter stayed flat at 100%. The reason is all 12 cores were already busy. That means VM CPU
Usage (%) is aware of HT, but ESXi CPU Usage (%) is not.
ESXi CPU Utilization (%) matches VM Run. Both went 2x.

Towards the end of the run


VM CPU Run is at 480K ms. This counter is suitable for VM Capacity sizing, as it correctly accounts for each vCPU being
used by the Guest OS.
VM CPU Used is at 300K milliseconds, which is 62.5%.
VM CPU Usage (%) is at 62.5%. On average, each of the VM vCPUs only gets 62.5%. If you use this for your VM capacity,
you will get the wrong conclusion as the VM is already running at 100%.
ESXi CPU Usage (%) is at 100%. This makes it suitable from a Capacity viewpoint, albeit too conservative. It is not
suitable for Performance, as you cannot tell if there is still room.
ESXi CPU Utilization (%) is at 100%. Because it tracks the ramp correctly, it can be used for Performance. You can
use it for Capacity, but take note that 100% means you get a performance hit from HT. In fact, at 50% the HT effect will
kick in.
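To make the VM-side numbers in this test easier to reproduce, here is a small Python model. It is a sketch under my own assumptions, not how VMkernel computes anything: every busy vCPU is taken to be 100% busy for a 20-second (20K ms) sample, vCPUs fill empty cores first, and a core running 2 busy threads delivers 1.25x of one core in total. The function name and parameters are hypothetical.

```python
INTERVAL_MS = 20_000  # one sample period, 20K ms per vCPU equals 100%
HT_FACTOR = 1.25      # combined throughput of 2 busy threads on one core, per the text


def vm_cpu_metrics(busy_vcpus, total_vcpus, cores):
    """Model Run (ms), Used (ms) and Usage (%) for fully busy, pinned vCPUs.

    Assumes busy_vcpus <= 2 * cores and that vCPUs land on empty cores first.
    """
    run_ms = busy_vcpus * INTERVAL_MS
    shared_cores = max(0, busy_vcpus - cores)             # cores running 2 busy threads
    single_cores = min(busy_vcpus, cores) - shared_cores  # cores running 1 busy thread
    used_ms = (single_cores + shared_cores * HT_FACTOR) * INTERVAL_MS
    usage_pct = 100.0 * used_ms / (total_vcpus * INTERVAL_MS)
    return run_ms, used_ms, usage_pct


start = vm_cpu_metrics(busy_vcpus=12, total_vcpus=24, cores=12)
end = vm_cpu_metrics(busy_vcpus=24, total_vcpus=24, cores=12)
print(start)
print(end)
```

With these assumptions the model gives 240K ms Run and Used (50% Usage) at the start, and 480K ms Run versus 300K ms Used (62.5% Usage) at the end, which is the divergence discussed above.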

CPU Usage Disparity


This metric is required to convince the owners of the VM to downsize their large VMs. It’s very common for owners
to refuse sizing it down even though utilization is low, because they have already paid for it or cost is not an issue.
Let’s take an example. This VM has 104 vCPU. In the last 90 days, its utilization is consistently low. The Usage (%) counter
never touches 40%. Demand is only marginally higher. Idle (%) is consistently ~20%.

All the key performance metrics such as Guest OS CPU Run Queue are low.
Obviously the VM does not need 104 vCPU. How do you convince the owner if they are not interested in a refund? The only
angle left is performance. But then we’re faced with the following:
1. CPU Run Queue inside the Guest OS is low. Decreasing CPU will in fact increase it, which is worse for
performance.
2. CPU Context Switch is high from time to time.
3. CPU Co-Stop is very low (max of 0.006% in the last 90 days). Decreasing CPU may or may not make it lower.
Regardless, it’s irrelevant. The same goes for VM Wait and Swap Wait.
4. CPU Ready is very low (max of 0.14% in the last 90 days).

The only hope we have here to convince the VM owner is to give insight into how the 104 vCPUs are used. There are 2
ends of the spectrum:

At one end, all 104 are balanced: All are running at that low 20%. This triggers an interesting discussion on why the
application is unable to even consume a single vCPU. Is this inefficiency the reason why the application vendor is
asking for so many vCPUs? Commercially, it’s wasting a lot of software licenses.

Imbalance: Some are saturated, while others are not.
 The Peak among vCPU metric will capture if any of them is saturated. This is good
insight.
 The Min among vCPU is not useful as there is bound to be 1 vCPU among 104 that
is running near 0%.
 The delta between Max and Min will provide insight into the degree of the usage
disparity. Does it fluctuate over time? This type of analysis helps the application
team. Without it they have to plot 104 vCPUs one by one.
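The Peak, Min and delta described above are simple to compute once you have the per-vCPU usage values. The following Python sketch uses made-up sample data and a hypothetical function name; it is not the vRealize Operations implementation.

```python
def vcpu_usage_disparity(per_vcpu_usage_pct):
    """Summarise per-vCPU usage (%) as peak, minimum and the max-min delta."""
    peak = max(per_vcpu_usage_pct)
    low = min(per_vcpu_usage_pct)
    return {"peak": peak, "min": low, "disparity": peak - low}


# Hypothetical samples for a 104-vCPU VM at one point in time.
balanced = [20.0] * 104
imbalanced = [95.0] * 10 + [12.0] * 93 + [0.5]

print(vcpu_usage_disparity(balanced))
print(vcpu_usage_disparity(imbalanced))
```

Plotting the disparity value over time gives one line instead of 104, which is the whole point of the metric.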

In reality, there could be many combinations in between the 2 extremes. Other insights into the behaviour of the
104 vCPUs are:
1. Jumping process. Each vCPU takes turns being extremely high and low, as if they are taking turns to run. This
could indicate process ping pong, where processes either do frequent start/stop, or they jump around from
one vCPU to another. Each jump will certainly create a context switch, and the cache needs to be warmed up. If
the target CPU is busy, then the running process is interrupted.
2. CPU affinity. For example, the first 10 vCPUs are always much busier than the last 10 vCPUs. This makes you
wonder why, as it’s not normal.

Naming-wise, vCPU Usage Disparity is a better name than Imbalance vCPU Usage. Imbalance implies that they should
be balanced, which is not the case. It’s not an indication that there is a problem in the guest OS because vRealize
Operations lacks the necessary visibility inside the guest OS.

System
A VM may execute a privileged instruction, or issue IO commands. These 2 activities are performed by the hypervisor,
on behalf of the VM.
IO processing differs from non-IO processing as it has to be executed twice. It’s first processed inside the Guest OS, and
then in the hypervisor storage subsystem, because each OS has its own storage subsystem. For ESXi, its network
stack also has to do processing if it’s IP-based storage.


ESXi typically uses another core for this work instead of the VM vCPU, and puts that VM vCPU in a wait state. This
work has to be accounted for and then charged back to the associated VM. The System counter tracks this. The System
counter is part of the VMX counter.
The Guest OS isn’t aware of the 2nd processing. It thinks the disk is slower as it has to wait longer.
If there is a snapshot, then VMkernel has to do even more work as it has to traverse the snapshot.
The work has to be charged back to the VM since CPU Run does not account for it. Since this work is not performed
by any of the VM vCPUs, it is charged to the VM CPU 0. The system services are accounted to CPU 0. You may see
higher Used on CPU 0 than others, although CPU Run is balanced across all the vCPUs. So this is not a problem for
CPU scheduling. It’s just the way VMkernel does the CPU accounting.
The System counter is not available per vCPU. The reason is that the underlying physical core that does the IO work on
behalf of the VM may be doing it for more than 1 vCPU. There is no way to break it down for each vCPU. The following
vCenter screenshot shows the individual vCPU is not shown when the System metric is selected.


ESXi is also performing IOs on behalf of all VMs that are issuing IOs at that same time, not just VM 1. VMkernel may
serialize multiple random IOs into sequential IO for higher efficiency.
Note that I refer to CPU accounting, not Storage accounting. For example, vSphere 6.5 no longer charges the
Storage vMotion effort to the VM being vMotion-ed.
The majority of VMs will have a System value of less than 0.5 vCPU most of the time. The following is the result from 2431
VMs.

On an IO-intensive VM like NSX Edge, the System time will be noticeable, as reported by this KB article. In this case,
adding more vCPUs will make performance worse. The counter inside Linux will differ from the counter in vSphere. The
following table shows high system time.


VM Memory

I will use the vSphere Client as the source of metrics in the following screenshots as that is, well, the source 😊

Just like the case for CPU, some metrics are for VMkernel consumption, not your operations.

Overview
For the performance use case, the only counter tracking actual performance is Page-fault Latency.

Next, check for swapping as it’s slower than compressed. You get 6 metrics for it.

Next is compressed.


Host Cache should be faster than disk (at least I assume you designed it with faster SSD), so you check it last.

Lastly, there is the balloon.

Wait! Where are the Intel Optane memory metrics?


It does not exist yet, as that’s supposed to be transparent to ESXi.
Performance is essentially the only use case you have at VM level. For Capacity, you should look at Guest OS. The VM
capacity metrics serve as input to the host capacity and are used in determining the VM memory footprint (e.g.
when migrating to another ESXi).
You’ve got 5 metrics, with Consumed being the main one.

I’m going to add Active next, although I don’t see any use case for it. It’s an internal counter used by VMkernel
memory management.

Lastly, you get the shared pages and 0 pages.

Now that we’ve got the overview, let’s dive into the first counter!


“Contention” Metrics
I use quotes because the only true contention counter is latency. The second reason is that Aria Operations has a metric
called Contention, which is actually the vCenter counter called Latency.

Latency
Memory Latency, aka "Page-fault latency", tracks the amount of time a vCPU spends waiting on the completion
of a page fault. Its value is probably mostly swap wait, and probably minimally page decompression / copy-on-write
break. The counter is called %LAT_M in esxtop, while CPU Contention is called %LAT_C. When this happens, the
value of the Compressed and/or Swapped metric goes down, and the value of Consumed and Granted goes up.
This is the only performance counter for memory. Everything else does not actually measure latency. They measure
utilization, because they measure the space occupied. None captures the performance, which is how fast that
memory page is made available to the CPU.
Consider the hard disk space occupied. A 90% utilization of the space is not slower than 10%. It’s a capacity issue, not
performance.
If a page is not in the physical DIMM, the VM has to wait longer. It could be in Host Cache, Swapped or Compressed.
It will take longer than usual. vSphere tracks this in 2 metrics: CPU Swap Wait and RAM Latency.
 CPU Swap Wait tracks the time for Swapped In.
 RAM Latency tracks the percentage of time VM waiting for Decompressed and Swapped In. The RAM
Latency is a superset of CPU Swap Wait as it caters for more scenarios where CPU has to wait. vRealize
Operations VM Memory Contention metric maps to this.
Latency is >1000x lower in memory compared to disk, as memory sits basically next to the CPU on the motherboard. Time
taken to access memory on the DIMM bank is only around 200 nanoseconds. Windows/Linux does not track memory
latency. The closest counter is perhaps page fault. The question is: does page fault include prefetch? If you know, let
me know please.
Does it mean we don’t track balloon, swapped and compressed?
No.
The higher the value is for balloon, swapped, and compressed, the higher the chance of a performance hit
happening in the future if the data is requested. The severity of the impact depends on the VM memory shares,
reservation, and limit. It also depends upon the size of the VM's configured RAM. A 10-MB ballooning will likely have
more impact on a VM with 4 GB of RAM than on one with 512 GB.
Latency does not include balloon as that’s a different context. In addition, the hypervisor is not aware of the Guest
OS internal activity.
Actions you can take to address a high value:
 Store the vswp file on higher throughput, lower latency storage, such as by using Host Swap Cache.
 Increase memory shares and/or reservation to decrease the amount of swapping. If the VM belongs to a
resource pool, ensure the resource pool has sufficient resources for all its VMs.
 Reduce assigned memory. By rightsizing, you reduce the size of memory reclamation, hence minimizing the
risk.
 Remove the VM Limit.
 Unswap the swapped memory. You cannot do this via API, but you can issue the command manually. Review
this article by Duncan Epping and Valentin Bondzio.
 If possible, reboot the VM as part of regular maintenance. This will eliminate the swap file, hence avoiding
future, unexpected swap wait on that swapped page. Note this does not guarantee that the same page will not be
swapped out again.

Best Practice
In an environment where you do not overcommit memory or place limits, the chance of hitting memory
contention will be basically 0. You can plot the highest VM Memory Contention counter in all clusters and you will
basically see a flat line. That would be a lot of line charts, so I’m using a pie chart to analyze 2441 VMs in the last 4
months. For each VM, I took the highest value in the last 4 months. Only 13 VMs had their worst VM Contention above
1%.
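The analysis above (take the worst value per VM, then count how many VMs cross 1%) can be sketched in Python as follows. The VM names and readings are made up for illustration, and the function names are my own; in practice the samples would come from whatever tool exports your metrics.

```python
def worst_contention_by_vm(samples_by_vm):
    """Return the highest contention (%) seen for each VM over the period."""
    return {vm: max(values) for vm, values in samples_by_vm.items() if values}


def vms_above(samples_by_vm, threshold_pct=1.0):
    """List the VMs whose worst contention exceeded the threshold."""
    worst = worst_contention_by_vm(samples_by_vm)
    return sorted(vm for vm, peak in worst.items() if peak > threshold_pct)


# Hypothetical data: VM name -> memory contention (%) readings over 4 months.
samples = {
    "vm-a": [0.0, 0.1, 0.0],
    "vm-b": [0.2, 3.5, 0.4],
    "vm-c": [0.0, 0.0, 0.9],
}

print(vms_above(samples))
```

Using the maximum instead of the average is deliberate: a single bad episode is what you want to surface, and an average over 4 months would hide it.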


Balloon
Balloon is an application (a kernel driver, to be precise) running inside the Guest OS, but it can take instructions from
VMkernel to inflate/deflate.
When it receives an instruction to inflate, it asks the Guest OS to allocate memory to it. This memory in the Guest OS
is not backed by physical memory in ESXi, hence it is available for other VMs. When ESXi is no longer under
memory pressure, it will notify the Balloon to release its requested pages inside the Guest OS. This is a proactive
mechanism to reduce the chance of the Guest OS doing paging. Balloon will release the pages inside the Guest OS.
The Balloon counter for the VM will come down to 0.
To satisfy the balloon, the Guest OS will start allocating from the Free Pages. If insufficient, it will take from Cache,
then Modified, then In Use.
This by itself does not cause a performance problem. What will cause a performance problem is when the ballooned page is
requested by Windows or Linux. The following shows a VM that is heavily ballooned as a limit was imposed on it.
Notice the actual performance impact happens rarely.

Just because Balloon asks for 1 GB of RAM does not mean ESXi gets 1 GB of RAM freed. It can be less if there is
TPS.
To use ballooning, the Guest OS must be configured with sufficient swap space.
How much will be asked depends on Idle Memory Tax. I do not recommend playing with this setting.
The Guest OS initiates memory reallocation. Therefore, it is possible to have a balloon target value of 0 and a balloon
value greater than 0. The counter Balloon Target tracks this target, so if you see a nonzero value in this counter, it means
that the hypervisor has asked this VM to give back memory via the VM balloon driver.
Balloon is a memory request from ESXi. So it’s not part of the application. It should not be included in the Guest OS
sizing, hence it’s not part of reclamation.
Balloon impacts the accuracy of Guest OS sizing. However, there is no way to measure it.
 When the Balloon driver asks for pages, the Guest OS will allocate them, resulting in In Use going up. This is because
the balloon driver is treated like any other process.
 If the page comes from Free, then we need to deduct it from In Use.
 If the page comes from In Use, then we can’t simply deduct the value of In Use. The Guest OS pages out, so we
need to add Page Out or Cache.


Compressed or Swapped
Compressed and Swapped are different from ballooning, as the hypervisor has no knowledge of the free memory
inside the Guest OS. It will randomly compress or swap. As a result, any value in this counter indicates that the host
is unable to satisfy the VM memory requirement. This can have potential impact on performance.
You may notice that there is no compression target. Right?

The Consumed counter includes this metric. To be accurate, the Compressed counter should track the result of the
compression, as that’s the actual amount consumed by the compressed pages.
It is possible to have balloon showing a zero value while compressed or swapped are showing nonzero values, even
though in the order of ESXi memory reclamation techniques, ballooning occurs before compression. This indicates
that the VM did have memory pressure in the past that caused ballooning, compression, and swapping then, but it
no longer has the memory pressure. These events could have happened at different times. Data that was compressed
or swapped out is not retrieved unless requested, because doing so takes CPU cycles. The balloon driver, on the
other hand, will be proactively deflated when memory pressure is relieved.
There are other compression related metrics that are provided.

Metrics: Description

Average Compressed: Average amount of compressed memory in the reporting period. In vCenter’s case, this is
the average of the last 20 seconds. In vRealize Operations’ case, this is the average of the last 5 minutes.

Latest Zipped: Last amount of compressed memory in the reporting period. In vCenter’s case, this is the data
at the 20th second. vRealize Operations then averages 15 of these datapoints to make a 300-second average.

Zip Saved: The present amount of memory saved from the compression.

Compression Rate: This complements the compressed size as it covers how much memory is compressed in
any given period. 10 MB compressed in 1 second is different from 10 KB per second compressed
over 1000 seconds. Both result in the same amount, but the problem is different. One
is an acute but short fever, the other is a low-grade but persistent fever. You don’t want
either, but it’s good to know what exactly you’re dealing with.

Decompression Rate: Same as above, but for the opposite process.

Swap Target: We have a balloon target and a swap target, so we should expect a compression target
too, right?
No. Because both swap and compression work together to meet the swap target
counter, the counter should actually be called Compression or Swap Target.
This counter tracks the amount of RAM that is subjected to the compression process. It
does not track the resultant compressed amount. There are 2 levels of compression
(4:1 and 2:1), so a 4 KB page may end up as 1 KB or 2 KB. If the compression result
is less than that, the page will be swapped instead as that’s a cheaper operation. So it’s
completely possible to have 0 swapped as all the pages were compressed instead.
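The compress-or-swap decision described above can be pictured with a toy Python model. This is only my simplified reading of the text (4:1 and 2:1 levels for a 4 KB page, otherwise swap); the function name, return values and thresholds are illustrative assumptions, not the actual VMkernel logic.

```python
PAGE_BYTES = 4096  # default small page size


def place_page(compressed_bytes):
    """Decide where a 4 KB page ends up, given how small it compresses.

    Returns (outcome, bytes_occupied). Toy model: 4:1 -> 1 KB slot,
    2:1 -> 2 KB slot, anything worse -> swapped out in full.
    """
    if compressed_bytes <= PAGE_BYTES // 4:
        return ("compressed", PAGE_BYTES // 4)
    if compressed_bytes <= PAGE_BYTES // 2:
        return ("compressed", PAGE_BYTES // 2)
    return ("swapped", PAGE_BYTES)


for size in (700, 1800, 3000):
    print(size, place_page(size))
```

In this model, the memory saved by a compressed page is simply 4 KB minus the slot it occupies, which is the idea behind the Zip Saved metric.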

Which one should you pay attention to?


The answer always goes back to: when you see the value, what are you going to do about it? Starting from the purpose
or use case helps in applying the metrics in the context of operations.

Limit
Does limit result in Balloon?
The answer is no. Why not?
They operate at different levels of memory management. Limit results in swapped or compressed.
Let’s take an example with a VM that is configured with 16 GB RAM. This is a MySQL database running on RHEL. You
can see in the last 7 days, it’s using around 13.4 GB and increasing to 13.6 GB.

It’s given a bad limit of 2 GB.


In the last 7 days, we can see the limit is a perfectly flat line. It’s 2.12 GB as it includes the overhead value.

The VM, or rather the Guest OS, did ask for more. You can see the demand by looking at the Granted or Compressed
or Swapped metrics. I’m only showing Granted here:

Because of the limit, the Consumed counter did not go past 2 GB. It’s constantly hovering near it as the VM is asking
for more than that.


What do you expect the Balloon value to be?


If Balloon has something to do with it, it would not stay a perfectly flat line.
But this is what you got. A perfectly flat line, proving Limit had nothing to do with Balloon.

Consumption Metrics
This covers both utilization and reservation.

Granted
The English word granted accurately defines this metric, so I will just put a picture for you to conclude what it is.

What the Guest OS can see is what is configured by vSphere. Guest OS can’t see the hypervisor memory overhead.
Overhead is mostly negligible, as it’s just storing metadata or index information required by virtualization, such as
the shadow page tables. Overhead value goes up as you configure more vCPUs and memory.
Say you have a VM configured with 16 GB RAM. Any part of these 16 GB of memory pages can fall under one of
these:

Not touched: The VM has never used the page since it was powered on.

Ballooned: The page was reclaimed by the balloon driver. It has not been asked back by the Guest OS, hence
it’s just sitting there collecting pixie dust.
I put it in yellow as that’s not a green situation. The higher the balloon, the higher the chance
a page will be required in the future.

Compressed, Swapped: These 2 are mutually exclusive and go together. What can’t be compressed will be swapped.
Compressed is preferred as unzipping memory from DRAM is faster than bringing it from SSD
disk.

Granted: Whatever is left is called Granted 😊
The Granted metric includes the Shared memory. Shared counts the number of memory pages that
are pointing to the same underlying block. Granted does not care about underlying usage as its
vantage point is the VM, not ESXi.
Entitlement = Granted + Overhead.

Ok, the above is the theory. How do you know I’m not making this stuff up, considering I’m pretty good at it?
Let’s take a VM and plot its value over time. As you can see, the value in the last 4 weeks is a constant 16 GB.

The line is perfectly flat. Both the Highest value and Lowest value show 16,384 MB.
The VM was ballooned. 63.66% of its memory was reclaimed. That’s a whopping 10,430 MB!

Why is the Ballooned not moving at all?


Because the Guest OS never needs any of those 10+ GB.
So Guest was playing with the remaining 6 GB.
So what do you expect if we plot Granted + Swapped + Compressed?
You got it. A flat line.
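As a quick sanity check of the numbers in this example, the following Python snippet converts the ballooned percentage into MB and shows what is left for the Guest OS. The function name is hypothetical; the inputs are the values quoted above.

```python
def ballooned_mb(configured_mb, ballooned_pct):
    """Convert a ballooned percentage of configured memory into MB."""
    return configured_mb * ballooned_pct / 100


configured = 16_384  # MB, the constant value shown in the chart
ballooned = ballooned_mb(configured, 63.66)
remaining = configured - ballooned

print(round(ballooned), round(remaining), round(remaining / 1024, 1))
```

The remainder comes out a little under 6 GB, which is the "remaining 6 GB" the Guest OS was playing with.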


Looking at the diagram in the Granted metric, can you explain why Limit is not there?
You are right, it operates at a different layer. Limit is about limiting the usage at the physical layer.
Let’s take an example. The following VM is a Windows 2016 server, configured with 12 GB of RAM, but limited
to 8 GB (the flat line in cyan near the bottom). The purple line jumping up and down is Granted. Granted ignores the
limit completely and runs way above it.

Notice Consumed (KB) is consistently below Limit. Granted does not exceed 12 GB as it does not exceed configured.

Shared
There are 2 types of shared pages:
 Intra-VM sharing: sharing within the same VM. By default, each page is 4 KB. If the Guest OS uses Large Pages,
then it’s 2 MB. The chance of sharing with 4 KB pages is much higher than with 2 MB pages.
 Inter-VM sharing. Due to security concerns, this is disabled by default in vSphere.
A commonly shared page is certainly the zero page. This is a page filled with just zeroes.
For accounting purposes, the Shared page is counted in full for each VM. Example:
 VM 1: 1 GB private, 100 MB shared within itself, 10 MB shared with other VMs (it does not matter how
many and what VMs).
 The 100 MB is the amount that is being shared. If not shared, they would consume 100 MB. But how much is
actually consumed as a result of this sharing?
 The 10 MB is shared with other VMs. VM 1 could be sharing 1 MB each with 10 other VMs, or the entire 10
MB with just 1 VM. The Shared counter merely counts that this 10 MB is being shared. VM 1 definitely
consumes this 10 MB, and it’s not sharing within itself.
Shared includes zero pages. The following screenshot shows the 2 moved in tandem over several days.

The Shared Saved metric tracks the estimated amount of machine memory that is saved due to page sharing.
Because the ESXi machine page is shared by multiple Guest OS physical pages, this metric charges "1/ref" page as the
consumed machine memory for each of the guest physical pages, where "ref" is the number of references. So, the
saved machine memory will be "1 - 1/ref" page. For example, if there are 4 pages pointing to the same physical
DIMM, then the savings is 3 pages’ worth of memory.
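The "1/ref" accounting can be verified with a few lines of Python. The function names are mine and this is only a sketch of the arithmetic described above; I use exact fractions so the 4-page example comes out as a whole number.

```python
from fractions import Fraction


def charged_per_guest_page(ref):
    """Machine memory (in pages) charged to each guest page that shares one machine page."""
    return Fraction(1, ref)


def saved_pages(ref):
    """Total machine pages saved when `ref` guest pages share one machine page."""
    return ref * (1 - charged_per_guest_page(ref))


print(charged_per_guest_page(4), saved_pages(4))
```

Each of the 4 guest pages is charged a quarter of a page and saves three quarters, so together they save 3 pages.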

Consumed
Consumed = Granted – Saving from Sharing
Consumed does not include overhead memory, although this number is practically negligible. I’m not sure why it
does not, as the page is indeed for the VM.
Consumed does not include compressed memory. My guess is because the pages are not readily available for use.
Consumed includes memory that might be reserved.
Consumed tracks the ESXi memory mapped to the VM. ESXi assigns large pages (2 MB) to the VM whenever possible; it
does this even if the Guest OS doesn’t request them. The use of large pages can significantly reduce TLB misses,
improving the performance of most workloads, especially those with large active memory working sets.
The above is one reason why the Consumed metric is higher than the Guest OS In Use. The other reason is that it contains
pages that were active (and are no longer active), but are still mapped to the VM.
Here is a screenshot comparing Windows 10 Task Manager memory metrics with vRealize Operations Memory \ Non
Zero Active (KB) and Memory \ Consumed (KB). As you can see, none of the metrics match.


When a Guest OS frees up a memory page, it normally just updates its list of free memory; it does not release it. This
list is not exposed to the hypervisor, and so the physical page remains claimed by the VM. This is why the Consumed
counter in vCenter remains high when the Active counter has long dropped.
It is a common mistake to think they are calculated in a similar way, and simply differ based on aggressive vs
conservative. The following test shows Active going down while Consumed is going up!


Consumed does not include Shared pages. When you see Consumed lower than Guest OS Used, check if there are
plenty of shared pages. The following screenshot shows Guest OS Used consistently higher. It’s also constant, around
156 GB throughout. Consumed was relatively more volatile, but never exceeded 131 GB. The reason for it is Shared.
Notice the value is high, around 61 – 63 GB.

Consumed is affected by Limit. The following is a VM configured with 8 GB RAM but was limited to 2 GB.


Utilization
Utilization (KB) = Guest Needed Memory (KB) + ( Guest Page In Rate per second * Guest Page Size (KB) ) + Memory
Total Capacity (KB) – Guest Physically Usable Memory (KB).
Because of the formula, the value can exceed 100%. The following is an example:

It’s possible that vRealize Operations shows a high value when Windows or Linux does not. Here are some reasons:
 Guest metrics from VMware Tools are not being collected. The value falls back to Consumed (KB). Ensure your
collection is reliable, else the values you get over time contain mixed sources. If their values aren’t similar,
the counter values will fluctuate wildly.
 Guest Physically Usable Memory (KB) is less than your configured memory. I’ve seen one case where it’s
showing 58 GB whereas the VM is configured with 80 GB. My first guess is the type of OS licensing. However,
according to this, it should be 64 GB not 58 GB.
 Low utilization. We add 5% of Total, not Used. A 128 GB VM will show 6.4 GB extra usage.
 Excessive paging. We consider this. The tricky part is that excessive is relative.
 We include Available in Linux and cache in Windows, as we want to be conservative.
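To see how the Utilization formula at the start of this section can push the value past 100%, here is a Python sketch. The function names and the 60 GB "needed" figure are hypothetical, and dividing by Memory Total Capacity to get a percentage is my assumption; the 80 GB configured and 58 GB usable values are from the case mentioned above.

```python
KB_PER_GB = 1024 * 1024


def utilization_kb(needed_kb, page_in_rate_per_s, page_size_kb,
                   total_capacity_kb, physically_usable_kb):
    """Utilization (KB) as written in the formula for this metric."""
    paging_kb = page_in_rate_per_s * page_size_kb
    return needed_kb + paging_kb + total_capacity_kb - physically_usable_kb


def utilization_pct(util_kb, total_capacity_kb):
    """Assumed conversion to percent: utilization relative to total capacity."""
    return 100.0 * util_kb / total_capacity_kb


total = 80 * KB_PER_GB   # configured memory
usable = 58 * KB_PER_GB  # Guest Physically Usable Memory
needed = 60 * KB_PER_GB  # hypothetical Guest Needed Memory

util = utilization_kb(needed, 0, 4, total, usable)
print(util / KB_PER_GB, utilization_pct(util, total))
```

Because the gap between total capacity and physically usable memory (22 GB here) is added on top of what the guest needs, the result lands at 82 GB, or 102.5% of 80 GB.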

Demand
Can you spot a major counter that exists for CPU, but not for RAM?
That’s right. It’s Demand. There is no memory demand counter in vCenter UI.
To figure out demand, we need to figure out unmet demand, as demand is simply unmet demand + used (which is
met demand). Since the context here is VM, and not Guest OS, then unmet demand includes only VM level metrics.
The metrics are ballooned + swapped + compressed.
Do you agree with the above?
If we are being strict with the unmet demand definition, then only the memory attributed to contention should be
considered unmet demand. That means balloon, swap, or compressed memory can’t be considered unmet demand.
Swap in and decompression are the contention portion of memory. The problem then becomes the inability to
differentiate contention due to limits using host level metrics, which means we'd need to look at VM level metrics to
exclude that expected contention.
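
As a rough sketch, the looser definition discussed above (demand = used + ballooned + swapped + compressed) looks like this in Python. The function and parameter names are hypothetical, and this is the definition the text goes on to question, not a counter that vCenter provides.

```python
def unmet_demand_kb(ballooned_kb, swapped_kb, compressed_kb):
    """Loose definition: everything reclaimed from the VM counts as unmet."""
    return ballooned_kb + swapped_kb + compressed_kb


def memory_demand_kb(used_kb, ballooned_kb, swapped_kb, compressed_kb):
    """Demand = met demand (used) + unmet demand."""
    return used_kb + unmet_demand_kb(ballooned_kb, swapped_kb, compressed_kb)


# Hypothetical VM: 6 GB used, 1 GB ballooned, 0.5 GB compressed, nothing swapped.
GB = 1024 * 1024  # KB per GB
print(memory_demand_kb(6 * GB, 1 * GB, 0, GB // 2) / GB)  # 7.5
```

Under the stricter definition, the inputs would instead be the swap-in and decompression activity, which this sketch does not attempt to model.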


Active
This is a widely misunderstood counter. ESXi calls this Touch as it better represents the purpose of the metric. Note
that vCenter still calls it Active, so I will call it Active.
This counter is often used to determine the VM utilization, which is not what it was designed for. To know why, we
need to go back to fundamentals. Let's look at the word active. It is an English word that needs to be quantified
before we can use it as metric. There are 2 dimensions to consider before we apply it:
 Definition of active. In RAM context, this means read or write activity. This is similar to disk IOPS. The more
read/sec or write/sec to a page, the more active that page is. Note that the same page can be read/written
to many times in a second. Because a page may be accessed multiple times, the actual active pages could be
lower. Example: a VM does 100 reads and 100 writes on its memory. However, 50 of the writes are on
pages that were also read. In addition, there are 10 pages that were read multiple times. Because of these 2
factors, the total active pages are far fewer than 200 pages. If the page size is 4 KB, then the total active
is way less than 800 KB.
 Active is time bound. Last week is certainly not active. Is 300 seconds ago active? What exactly, is recent? 1
second can be defended as a good definition of recent. Windows shows memory utilization in 1 second
interval. IOPS is always measured per second, hence the name IOPS. So I think 1 second seems like a good
definition of recent.
Applying the above understanding, the active counter is actually a rate, not a space. However, the counter reported
by vCenter is in KB, not KB/s.
To translate from KB/s to KB, we need to aggregate based on the sampling period. Assuming ESXi samples every 2
seconds, vCenter will have 10 samples in its 20-second reporting period. The 10 samples can cover the same
identical pages, or completely different ones. So in the 20-second period, the active memory can be as small as 1
sample, or as large as the sum of all 10 samples.
Examples:
 First 2 seconds: 100 MB Active
 Next 2 seconds: 150 MB Active
In the above 4 seconds, the active page ranges from 150 MB to 250 MB.
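
The range in this example follows from two extremes: either every sample touched the same pages (so the largest sample is the answer), or every sample touched different pages (so the samples add up). A minimal sketch, with a hypothetical function name and an optional cap at configured memory that is my own addition:

```python
def active_bounds_mb(samples_mb, configured_mb=None):
    """Return (lower, upper) bounds of active memory across samples."""
    if not samples_mb:
        return (0, 0)
    lower = max(samples_mb)   # all samples hit the same pages
    upper = sum(samples_mb)   # all samples hit different pages
    if configured_mb is not None:
        upper = min(upper, configured_mb)  # cannot exceed what the VM has
    return (lower, upper)


print(active_bounds_mb([100, 150]))  # (150, 250)
```

With 10 samples in a 20-second window the gap between the two bounds can be wide, which is one reason the reported value is only an estimate.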
Each sampling is done independently, meaning you could be sampling the same block again. The value is then
averaged with previous samples. Because sampling and averaging take time, Active won't be exact, but it becomes
more accurate over time as an approximation of the amount of active memory for the VM. This is why there is actually a
longer version of Active, which you will see in esxtop (it is not available in vSphere Client).
VM Active is typically different from Guest OS working set estimate. Sometimes the difference may be big, because
Guest OS and VMkernel use different working set estimation algorithms. Also, the VM has a different view of active
memory, due to ballooning and host swapping. Logically, ballooned memory is considered inactive, so, it is excluded
from the sampling.
If you plot a vRealize Operations VM in vCenter real-time performance chart, you will see 12 peaks in that one-hour
line chart. The reason is vRealize Operations pulls, processes, and writes data every 5 minutes. The chart for CPU, disk
and network will sport the same pattern. This is expected.
But if you plot the memory metrics, be it total active, active write or consumed, you will not see the 12 peaks. This is
what I got instead.


Consumed is completely flat and high. Active (read and write) and Active Write (write only) are much lower, but again
the 12 peaks are not shown.
Can you figure it out?
My guess is the sampling size. That’s just a guess, so if you have a better answer let me know!
Now let’s go to vRealize Operations. In vRealize Operations, this metric is called Memory \ Non Zero Active (KB).
vCenter reports in 20-second intervals. vRealize Operations takes 15 of these data points and averages them into a
300-second average. In the 300-second period, the same page can be read and written multiple times. Hence the active
counter over-reports the actual count.
Quiz: now that you know Active over reports, why is it lower than Consumed? Why is it lower than Guest OS
metrics?
Active is lower than both metrics because these 2 metrics do not actually measure how actively the page is used.
They measure the space used (think disk space used rather than IOPS), so they contain a lot of inactive pages. You can
see it in the general pattern of the Consumed and Guest OS Used metrics. The following is the vRealize Operations
appliance VM. Notice how stable the metrics are, even over millions of seconds.

Neither Active nor Consumed is suitable for sizing the Guest OS. They are VM level metrics, with little correlation
to the Guest OS memory usage. See the Guest OS Used counter, which is the one we should use.


The reason is the use case. To borrow the disk analogy, sizing is not about the IOPS. It is about the disk space used.
The Guest OS expects the non-active pages to be readily available. Sizing based on Active will result in a lot of paging.
Reference: Active Memory by Mark Achtemichuk.

Usage (%)
The Usage metric in vCenter differs from the Usage metric in vRealize Operations.
What you see on the vCenter UI is Active, not Consumed.

Mapping to Active makes more sense as Consumed contains inactive pages. As covered earlier, neither Active nor
Consumed actually measures the Guest OS memory. This is why vRealize Operations maps Usage to Guest OS. The
following shows that Usage (%) = Guest OS Needed Memory divided by configured memory. The VM has 1 GB of memory,
so 757 MB / 1024 MB = 74%.
Take note that there can be situations where Guest OS metrics do not make it to vRealize Operations. In that case,
Usage (%) falls back to Active (notice the value dropped to 6.99%) whereas Workload (%) falls back to Consumed
(notice the value jumped to 98.95%).
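
The mapping and its fallback can be sketched as follows. This is only an illustration of the behaviour described above: the function name and the use of None to represent "Guest OS metrics not collected" are assumptions, and the 64 MB Active value in the second call is made up.

```python
def usage_pct(configured_mb, guest_needed_mb=None, active_mb=None):
    """Usage (%) from Guest OS Needed Memory, falling back to Active."""
    source_mb = guest_needed_mb if guest_needed_mb is not None else active_mb
    if source_mb is None:
        return None  # nothing to report
    return 100.0 * source_mb / configured_mb


# The 1 GB VM from the text: 757 MB needed by the Guest OS.
print(round(usage_pct(1024, guest_needed_mb=757)))  # 74

# Guest OS metrics missing: the value silently switches to Active.
print(usage_pct(1024, guest_needed_mb=None, active_mb=64))  # 6.25
```

The sudden change in value comes from the change of source, not from any change in what the VM is doing.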


Examples
Let’s apply the knowledge.

Balloon vs Consumed
This 64-bit CentOS VM runs MySQL and is configured with 8 GB of RAM.
Linux was heavily ballooned out (default limit is around 63%). Why is that so?

In this VM case, we set a limit to 2 GB. As a result, Consumed did not exceed 2 GB.


Did you notice the common dip in Balloon and Consumed?


Can you explain them?
Balloon dropped by 0.46 GB then went back to its limit again. This indicated the Guest OS was active.
Consumed went down from 2.09 GB to 1.6 GB, and then slowly went back up. Why did it suddenly consume 0.4 GB
less in the span of 20 minutes? Both the configured limit and the runtime limit did not change. They were constant at
2 GB. This makes sense, else the Consumed would not be able to slowly go up again.

There must be activity by the VM and pages were compressed to make room for the newly requested pages. The
Non Zero Active counter shows that there are activities.

The pages that are not used must be compressed or swapped. The Swapped value is negligible, but the Compressed
metric shows the matching spike.

So far so good. Windows or Linux was active (2.4 GB in 5 minutes at the highest point, but some pages were
probably already part of Consumed). Since Consumed was at 100%, some pages were moved out to accommodate new
pages. The compression resulted in 0.6 GB, hence the uncompressed amount was between 2x and 4x of that.
Consumed dropped by 0.4 GB as that’s the gap between what was added (new pages) and what was removed
(existing pages).

Granted vs Consumed
This is a mystery to me.
Boot a Windows VM. Windows writes zeroes to initialize the pages, but VMkernel is smart enough to do a
copy-on-write, so all the pages are pointing to the same physical page. After a while, the pages are replaced with
actual data.
If the above is true, why does Consumed shoot up ahead of Granted? It should be the other way around.


VM Storage

At the VM level, you can look at metrics at the individual virtual disk level, at the datastore level, and at the disk
level. Which ones should you use, and when?

Virtual Disk: Use the virtual disk metrics to see VMFS vmdk files, NFS vmdk files, and RDMs. However,
you don't get data below the virtual disk. For example, if the VM has a snapshot, the virtual disk
metrics do not reflect it. Also, a VM typically has multiple virtual disks (OS drive, swap drive,
data drive), so you need to add them up manually if you use vCenter.

Datastore: Use the datastore metrics to see VMFS and NFS, but not RDM. Because snapshots happen
at the datastore level, the counter will include them. Datastore figures will be higher if your VM has
a snapshot. You don't have to add the data from each virtual disk together as the data
presented is already at the VM level. It also has the Highest Latency counter, which is useful
in tracking peak latency.

Disk: Use the disk metrics to see VMFS and RDM, but not NFS. The data at this level should be the
same as at the datastore level because your blocks should be aligned; you should have a 1:1
mapping between datastore and LUN, without extents. It also has the Highest Latency
counter, which is useful in tracking peak latency.

If all the virtual disks of a VM are residing in the same datastore, and that datastore is backed by 1 LUN, then all the
metrics will be identical. The following VM has 2 virtual disks (not shown). Notice all 3 metrics are identical over
time.


Virtual Disk
Take note that vSphere Client does not provide a summary at VM level. Notice the target objects are individual
scsiN:N, and there is no VM-level aggregation option in the Target Objects column below.

Latency metrics
The main metrics are latency. They are provided in both ms and microsecond.


Outstanding IO
vSphere also provides information about the number of I/Os that have been issued, but not yet completed. They are
waiting in the queue, indicating a bottleneck.

The formula is
Outstanding IO = Latency x IOPS

You can prove the above formula by plotting the latency metric. In the following example, this Windows 10 VM has
good latency, constantly below 5 ms except for 1 occasion.

If we plot the IOPS, it reveals a different pattern. There is a regular spike, albeit the number is very low. There is a
one time spike near the start of the chart.

What do you expect the Outstanding IO chart to look like?


Well, since we know the formula, we expect the chart to “combine” both latency and IOPS. And the following is
exactly what we got:

Outstanding IO should be seen in conjunction with latency. It can be acceptable to have a high number of IOs in the
queue, so long as the actual latency is low.
Since your goal is maximum IOPS and minimum latency, the metric is less useful as its value is impacted by IOPS. See
this KB article for vSAN-specific recommendations on the expected value.
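
The formula Outstanding IO = Latency x IOPS only works if the units line up: latency must be in seconds when IOPS is per second. A minimal sketch with hypothetical numbers and a hypothetical function name:

```python
def outstanding_io(latency_ms, iops):
    """Average number of IOs in flight = latency (s) x IOPS."""
    return (latency_ms / 1000.0) * iops


# Low latency, busy VM: many IOs in flight, yet each one is fast.
print(outstanding_io(latency_ms=5, iops=2000))   # 10.0

# High latency, idle VM: few IOs in flight, yet each one is slow.
print(outstanding_io(latency_ms=50, iops=10))    # 0.5
```

The two calls illustrate why the counter should be read together with latency: the busier VM has 20x the outstanding IO while having one tenth of the latency.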
What should be the threshold value?
That depends on your storage, because the range varies widely. Use the profiling technique to establish the
threshold that is suitable for your environment.


In the following analysis, we take more than 63 million data points (2400 VMs x 3 months' worth of data). Using data
like this, discuss with the storage vendor whether that's in line with what they sold you.

Utilization metrics
A typical suspect for high latency is high utilization. As you can expect, you’re given both IOPS and throughput
metrics.

If the IOPS is low, but the throughput is high, then the block size is large. Compare this with your expected block size,
as they should not deviate greatly from plan.
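
The implied block size is simply throughput divided by IOPS. A minimal sketch, assuming throughput is in KB per second and using made-up numbers:

```python
def avg_block_size_kb(throughput_kbps, iops):
    """Average IO size in KB over the same collection period."""
    if iops == 0:
        return 0.0  # no IO issued, so no meaningful block size
    return throughput_kbps / iops


print(avg_block_size_kb(throughput_kbps=64000, iops=1000))  # 64.0
print(avg_block_size_kb(throughput_kbps=64000, iops=125))   # 512.0
```

Note this is an average over the collection period, so a mix of small and very large IOs can hide behind an ordinary-looking value.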

Real-World Profiling
VM disk IOPS and throughput vary widely among workloads. For a single workload or VM, they also depend on whether
you measure during its busy time or quiet time.
In the following example, I plotted 3500 production VMs. They are sorted by the largest IOPS in any given
5-minute period. What's your take?


I think those numbers are high. At 1000 IOPS averaged over 5 minutes, that means 300,000 total IO commands that
need to be processed. So 10K IOPS translates into 3 million commands, which must be completed within 300
seconds.
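
The arithmetic behind these figures is just the average rate multiplied by the length of the collection window. A small sketch (the function name is hypothetical):

```python
def commands_in_window(avg_iops, window_s=300):
    """Total IO commands behind an average IOPS value."""
    return avg_iops * window_s


print(commands_in_window(1_000))   # 300000
print(commands_in_window(10_000))  # 3000000
```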
High IOPS can also impact the pipe bandwidth, as it's shared by many VMs and VMkernel. If a single VM chews up
1 Gb/s, you just need a handful of them to saturate a 10 Gb Ethernet link.
There is another problem, which is sustained load. The longer the time, the higher the chance that other VMs are
affected.
In the following example, it’s a burst IOPS. Regardless, discuss with the application team if it is higher than expected.
What’s normal from one application may not be for another.

While there is no such thing as normal distribution or range, you can analyse your environment so you get a sense. I
plotted all the 3500 VMs and almost 85% did not exceed 1000 IOPS in the last 1 week. The ones hitting >5K IOPS only
form around 3%.
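
A distribution like this can be reproduced from any list of per-VM peak IOPS values. The sketch below uses a made-up list of 10 VMs purely for illustration; the function names are hypothetical and the numbers are not from the environment described above.

```python
def share_at_or_below(peaks, threshold):
    """Percentage of VMs whose peak did not exceed the threshold."""
    if not peaks:
        return 0.0
    return 100.0 * sum(1 for p in peaks if p <= threshold) / len(peaks)


def share_above(peaks, threshold):
    """Percentage of VMs whose peak exceeded the threshold."""
    if not peaks:
        return 0.0
    return 100.0 * sum(1 for p in peaks if p > threshold) / len(peaks)


# Hypothetical weekly peak IOPS, one value per VM.
peak_iops = [120, 300, 450, 800, 950, 990, 1000, 2500, 4000, 7500]
print(share_at_or_below(peak_iops, 1000))  # 70.0
print(share_above(peak_iops, 5000))        # 10.0
```

Running the same two calls against your own export gives you thresholds that reflect your environment rather than someone else's.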


Lastly, there are the Storage DRS metrics and seek size.

IOPS Limit
You can set the limit for a VM. Note that the limit is per virtual disk, not per VM.

A few rows below, and you will see the following.

The default setting is no limit, which is what I recommend.


Take note that since the limit is applied at the VM level, the metrics that will show high latency are at the Guest OS
level. The VM metrics will not show high latency, as the IOs that were allowed to pass were not affected by this limit.
This is no different from any problem at the Guest OS layer. For example, if the LSI Logic or PVSCSI driver is causing
a problem, the VM will not report anything as it's below the Guest OS driver.


Disk
I think this should be called Physical Disk or Device, as the terminology “disk” sounds like a superset of virtual disk.
Disk means device, so we’re measuring at LUN level or RDM level. It’s great to know that we can associate the
metrics back to the VM. Notice we can’t associate it to specific virtual disk. This is one benefit of keeping all the VM
files in 1 datastore.

Also, depending on the metric, the association is at the disk level. So I'm not 100% sure if the value is per VM or per
disk (which typically has many VMs).

As usual, we start with error metrics before we look at latency.

For latency, there is no breakdown. It’s also the highest among all disks. Take note the roll-up is latest.

There are 2 sets of metrics for IOPS. Both are basically the same. One is the total number of IOs in the collection
period, while the other is the per-second average.


There are the usual metrics for throughput.

It will be great to have block size, especially the maximum one during the collection period.

Datastore
Just like LUN level, we lose the breakdown at virtual disk. The metric is only available at VM level.

For contention, only latency is provided. There is no outstanding IO.

The highest latency is useful for VMs with multiple datastores. But take note the roll-up is Latest, not average.
For utilization, both IOPS and throughput are provided.


Review the following screenshot. Notice something strange among the 3 metrics?

Yes, the total IOPS at datastore level is much lower than the IOPS at physical disk and virtual disk levels. The IOPS at
physical disk and virtual disk are identical over the last 7 days. They are quite active.
The IOPS at datastore level is much lower, and only spikes once a day. This VM is an Oracle EBS VM with 26 virtual
disks. The majority of its disks are RDMs, hence the IOPS hitting the datastore is much lower.

Snapshot Impact
A snapshot requires additional read operations, as the reads have to be performed on all the snapshots. The impact on
write is less. I'm not sure why read goes up so high, but logically it should be because many files are involved. Based
on the manual, a snapshot operation creates .vmdk, -delta.vmdk, .vmsd, and .vmsn files. Read more here.
For Write, ESXi just needs to write into the newest file.


The pattern is actually identical. I took one of the VMs and show it over 7 days. Notice how similar the 2 trend charts
are in terms of pattern.

You can validate whether a snapshot causes the problem by comparing before and after the snapshot. That's exactly what
I did below. Notice initially there was no snapshot. There was a snapshot briefly and you could see the effect
immediately. When the snapshot was removed, the 2 lines overlapped 100%, hence you only see 1 line. When we took the
snapshot again, the read IOPS at datastore level was consistently higher.


How do I know that's an IOPS effect? Because the throughput is identical. The additional reads do not bring back any
data. Using the same VM but at a different time period, notice the throughput at both levels is identical.

And here is the IOPS on the same time period. Notice the value at datastore layer is consistently higher.

For further reading, Sreekanth Setty has shared best practice here.

Disk Space
I’m splitting disk space separately as operationally you manage performance and capacity differently.


Simple Example
Let’s start with a single virtual VMDK disk. Review the following diagram. What potential operational complexity do
you see?

The above disk is thin provisioned. It still has uncommitted space as it’s not yet fully used up.
There are actually 4 types of consumption in Virtual Disk.
 Actual used by Guest OS
 Unmapped block
 vSAN protection (FTT)
 vSAN savings (dedupe and compressed).

Advanced Example
Let’s take an example of a VM with 3 virtual disks, so we can cover all the combinations. We’re using vSAN so we can
show the additional disk space consumed by vSAN.


The various vRealize Operations metrics are shown in Times New Roman font.
The boxes with blue line show the actual consumption at VM layer. Let’s go through each rectangle.

vDisk 1: RDM.
That's why it's not on vSAN, as it can't be on a VMFS datastore. It's mapped to a LUN backed by
external storage.
It's always thick provisioned, regardless of what Windows or Linux uses. The LUN itself could be
thin provisioned but that's another issue and transparent to ESXi (hence to the VM).

vDisk 2: VMDK thin.
We blended vSAN protection into a single box as you can't see the breakdown. It's inside the same
file (so there is only 1 file but inside there is actual data + vSAN protection - vSAN dedupe - vSAN
compression).
A thin provisioned disk can accumulate unmapped blocks over time. You should reclaim them by running
a trim operation.
Uncommitted space is the remaining amount that the VMDK can grow into. Since it's not yet
written, it does not have vSAN overhead yet.

vDisk 3: VMDK thick.
The Used size equals the configured size as it's fully provisioned regardless of usage by the Guest OS.
I'm not sure of the final result of dedupe and compression. I expect it will be near 100% saving in both
lazy zero and eager zero.

vSAN protection (Failures To Tolerate) is shown in purple. It applies to every file in the datastore. Yes, even your
snapshot and log files are protected if you choose so.

Metrics and Properties


These are the fundamental metrics you use to see the VM disk space consumption.

Metric and description:

Disk Space | Virtual Disk Used (GB)
The actual consumed size of the VMDK files + the configured size of the RDM files. It excludes other files such as
snapshot files. Note: RDM actually appears as a VMDK in the datastore folder, when you browse files of a VM.
Note: For RDM the used space is the configured size of the RDM, unless the LUN is thin provisioned by the physical
storage array. So its disk space consumption at VM level works like a thick provisioned disk.
If this is higher than Guest OS used, and you're using thin provisioning, then run unmap to trim the unmapped blocks.

Disk Space | Virtual Machine Used (GB)
Just like above, but includes files other than virtual disks. So this metric is always larger.
The actual consumed size of the VM files + the configured size of the RDM files. It includes all files in the VM
folder in the datastore(s).
Formula:
Sum ( [layoutEx.file] uniqueSize != null ? uniqueSize : size) / (1024 * 1024 * 1024)

Disk Space | <Datastore Name> | Virtual Machine Used (GB)
Just like above, but includes only the files in that specific datastore. For a VM that resides in only 1 datastore,
the value will be identical to above.

Disk Space | Provisioned Space for VM
Just like Disk Space | Virtual Machine Used (GB), but thin provisioned disks are counted at their configured size, not
actual usage. So this metric will have a higher value if the thin provisioned disk is not fully used.
This metric is useful at the datastore level, when you overcommit the space and want to know what the total space
would be when all the VMs grow to their full size.
This metric is not useful for capacity as it mixes both allocation and utilization.
BTW, there can be cases where the number here is reported as a much higher number. See KB 83990. This is fixed in
7.0.2 P03 or 7.0 U2c, specifically in PR 2725886.
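
The Virtual Machine Used formula above can be read as: for each file, prefer uniqueSize when it is present, otherwise use size, then convert bytes to GB. A minimal sketch, assuming each file is represented as a plain dictionary carrying those two field names (the dictionaries and values are made up, and no vSphere API is called):

```python
BYTES_PER_GB = 1024 * 1024 * 1024


def virtual_machine_used_gb(files):
    """Sum uniqueSize where available, else size, and convert to GB."""
    total_bytes = 0
    for f in files:
        unique = f.get("uniqueSize")
        total_bytes += unique if unique is not None else f["size"]
    return total_bytes / BYTES_PER_GB


files = [
    {"size": 2 * BYTES_PER_GB, "uniqueSize": 1 * BYTES_PER_GB},  # uniqueSize wins
    {"size": 1 * BYTES_PER_GB, "uniqueSize": None},               # falls back to size
]
print(virtual_machine_used_gb(files))  # 2.0
```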

Snapshot
In addition to its impact on latency and IOPS, a snapshot can also consume more than the actual space consumed by the
virtual disk, especially if you are using thin provisioning and you take the snapshot early while the disk is basically
empty. The following VM has 3 virtual disks, where the snapshot file _1-00001.vmdk is much larger than the
corresponding vmdk.

Disk Space | Snapshot | Virtual Machine Used (GB)
Disk space used by all files created by snapshots (vmdk and non vmdk). This is the total space that can be reclaimed
if the snapshot is removed. Use this to quickly determine which VMs have large snapshots.
Formula:
Sum of all files size / (1024 * 1024 * 1024)
where aggregation is only done for snapshot files. A file is a snapshot file if its layoutEx file type equals
snapshotData, snapshotList or snapshotMemory.

Disk Space | Snapshot | Access Time (ms)
The date and timestamp the snapshot was taken. Note you need to format this.
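
Both snapshot entries above can be sketched together. The file dictionaries are hypothetical stand-ins carrying the layoutEx file type names from the formula. Treating Access Time (ms) as milliseconds since the Unix epoch is an assumption on my part, so verify it against a snapshot whose creation time you know.

```python
from datetime import datetime, timezone

BYTES_PER_GB = 1024 * 1024 * 1024
SNAPSHOT_TYPES = {"snapshotData", "snapshotList", "snapshotMemory"}


def snapshot_used_gb(files):
    """Sum the size of snapshot files only, in GB."""
    total = sum(f["size"] for f in files if f["type"] in SNAPSHOT_TYPES)
    return total / BYTES_PER_GB


def format_access_time(ms):
    """Assumption: the value is milliseconds since the Unix epoch (UTC)."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).isoformat()


files = [
    {"type": "diskExtent", "size": 10 * BYTES_PER_GB},    # base disk, ignored
    {"type": "snapshotData", "size": 3 * BYTES_PER_GB},
    {"type": "snapshotMemory", "size": 1 * BYTES_PER_GB},
]
print(snapshot_used_gb(files))  # 4.0
print(format_access_time(0))    # 1970-01-01T00:00:00+00:00
```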


Multi Writer Disk


Name and description:

Disk Space | Active Not Shared (GB)
The total amount of disk space from all the VMDKs and RDMs that are exclusively owned by this VM.
Shared means the virtual disk is mounted by multiple VMs. So, this metric is useful only when we have multi-writer
disks.
Active means the disks minus snapshots. A snapshot is considered non-active.
Formula: Disk Space|Not Shared (GB) - Disk Space|Snapshot Space (GB)

Virtual Disk
The following properties are available on each virtual disk:

Property name and values:

Virtual Device Node: The virtual disk's SCSI bus location. Virtual disks are enumerated starting with the first
controller and moving along the bus.

Compatibility Mode: Physical or Virtual. Virtual mode specifies full virtualization of the mapped device. Physical
mode specifies minimal SCSI virtualization of the mapped device.

Disk Mode: Dependent, Independent – Persistent, or Independent – Nonpersistent.

SCSI Bus Sharing: None, Physical, or Virtual.

SCSI Controller Type: BusLogic Parallel, LSI Logic Parallel, LSI Logic SAS, or VMware Paravirtual.

Virtual Disk Sharing: Unspecified, No Sharing, or Multi-Writer.

Encryption Status

Number of RDMs: Number of RDMs attached to the VM.
Pro Tip: sum these for all the VMs in a single physical array. Compare the result with the number of LUNs carved out
for RDM purpose.
If there are more LUNs than this number, you have unused RDMs. You can't have fewer 😊
You need to do the above per physical array, so you know which array needs attention.

Number of VMDK: This excludes RDM.

Is RDM: true or false. False means the virtual disk is a VMDK, not an RDM.
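
The pro tip for Number of RDMs can be sketched as a per-array comparison. Everything here is hypothetical: the array names, the counts, and the assumption that you already know which array backs each VM's RDMs.

```python
from collections import defaultdict


def unused_rdm_luns(vm_rdms, luns_per_array):
    """vm_rdms: (array_name, number_of_rdms) pairs, one per VM.
    luns_per_array: LUNs carved out for RDM purpose on each array.
    Returns the count of LUNs not attached to any VM, per array."""
    attached = defaultdict(int)
    for array, count in vm_rdms:
        attached[array] += count
    return {array: luns - attached[array]
            for array, luns in luns_per_array.items()}


vm_rdms = [("array-a", 2), ("array-a", 1), ("array-b", 4)]
luns = {"array-a": 5, "array-b": 4}
print(unused_rdm_luns(vm_rdms, luns))  # {'array-a': 2, 'array-b': 0}
```

A positive number points at the array with unused RDM LUNs. A negative number would mean the inventory of LUNs is incomplete.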

vSphere Client UI
Let’s start with the basics and progress quickly. In the following example, I create a small VM from scratch,
with 2 VMDK disks.

Hard disk 1 is 10 GB. Thin Provisioned. On vSAN.


The VM is powered off. All other settings follow default setting.
I created the VM with just the first disk, to validate the metrics value that will be shown upon creation. What do you
expect to see on the vCenter UI?
Here is what I got on vSphere 7.


You get 2 numbers, used and allocated, as shown in the Capacity and Usage section.
Used is only 1.9 KB. This is expected as it’s thin provisioned and the VM is powered off. This is very low, so let’s
check the next number.
Allocated is 12.22 GB. This is 10 GB configured + 2.22 GB used. The hard disk 1 size shows 10 GB not 20 GB. This is
what is configured, and what the Guest OS sees. It is not impacted by vSAN as it’s not utilization.
So you have 2 different numbers for the used portion: 1.9 KB and 2.22 GB.
Why 2 different values?
Let’s see what the files are. We can do this by browsing the datastore and find the VM folder.

The total from the files above is 36 MB. This does not explain 1.9 KB nor 2.22 GB.
Let’s continue the validation. This time I added Hard disk 2 and configure it with 20 GB. Unlike the first disk, this is
Thick Provisioned so we can see the impact. It is also on vSAN.


Used has gone up from 1.9 KB to 760 MB. As this is on vSAN, it consists of 380 MB of vSphere + 380 MB of vSAN
protection. The vSAN has no dedupe or compression, so it’s a simple 2x.
Allocated is 32.93 GB as it consists of 30 GB configured and 2.93 GB. This 2.93 GB is half vSphere overhead and half
vSAN protection on that overhead.
Looking at the datastore level, the second hard disk is showing 40.86 GB. It maps to hard disk 2.

From this simple example, you can see that Allocated in vCenter UI actually contains used and allocated. By allocated
it means the future potential used, which is up to the hard disk configured size. The used portion contains vSAN
consumption if it’s on vSAN, while the unused portion does not (obviously since vSAN has not written any block).


VM Network

A VM is not an Operating System, so it has far fewer networking metrics than Windows or Linux.

Overview
We will cover each metric in-depth, so let’s do an overview first.
As usual, we start with contention. There is no latency metric.

Next, you check if there is unusual traffic. Your network should be mostly unicast, so it’s good to track the
broadcast and multicast packets. They might explain why you have many dropped packets.

Next you check utilization. There are 6 metrics, but I think they are triplicate.


Each packet takes up CPU for processing, so it’s good to check if the packets per second become too high.

The metrics are available at each individual vNIC level and at the VM level. Most VMs should only have 1 vNIC, so the
data at VM level and vNIC level will be identical.
The vNICs are named using the convention "400x". That means the first vNIC is 4000, the second vNIC is 4001, and so
on. The following is a vCenter VM. Notice it receives a few broadcast packets, but it’s not broadcasting (which is
what you expect). It also does not participate in multicast, which is again expected.

The metrics are grouped into 2:


 Transmit for outgoing
 Receive for incoming.
For each group, the following metrics are provided:

Broadcast packets, Multicast packets, Packet dropped: Count of packets.
It is the sum during the sampling window, not the rate (which is packets/second).
Multicast packets and broadcast packets are listed separately. This is handy as they are supposed to be low for most
VMs. Understand the nature of the applications so you can check if the behaviour is normal or not.

Total packets: The total includes the broadcast and multicast, but not the dropped ones.

Throughput per second: This is measured in kilobytes, as packet length is typically measured in bytes. While
there are other packet sizes, the standard packet is 1500 bytes.
BTW, esxtop measures in megabits.
I assume this includes broadcast and multicast, but not the dropped packets.

Guess what metrics are missing?


 Retransmit. This can be useful in troubleshooting TCP packet. It naturally does not apply to UDP traffic.
 Latency. A normalized latency would help, especially if it’s broken into internal network and external
network. Network latency could be impacted by CPU. The CPU might not be fast enough to process the packets. In
a VM, this could also be due to the VM having CPU contention.
 Packets per second. This can be derived by packet count / sampling window. If you have 200 packets in 20
seconds, that means 10 packets per second.
 Packet size. This can be computed by throughput / packet count. Expect this to be around 1500 byte.
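
The last two derived values can be sketched as below. Note that throughput is a per-second rate while the packet count is a sum over the sampling window, so the window length has to be applied to one of them. The 20-second window and 1 KB = 1024 bytes are assumptions, and the function names are hypothetical.

```python
def packets_per_second(packet_count, window_s=20):
    """Convert a per-window packet count into a rate."""
    return packet_count / window_s


def avg_packet_size_bytes(throughput_kbps, packet_count,
                          window_s=20, bytes_per_kb=1024):
    """Average packet size = bytes moved in the window / packets in the window."""
    if packet_count == 0:
        return 0.0
    return throughput_kbps * bytes_per_kb * window_s / packet_count


print(packets_per_second(200))          # 10.0
print(avg_packet_size_bytes(30, 400))   # 1536.0
```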

Dropped Packet
As usual, let’s approach the metrics starting with Contention. There is no latency counter or retransmit counter, so
you cannot track how long it takes for a packet to reach its destination. There are, however, metrics that track
packet loss.
For a TCP connection, a dropped packet needs to be retransmitted and therefore increases network latency from the
application's point of view. The counter will not match the values at the Guest OS level, as packets are dropped before
they are handed to the Guest OS, or after they have left the Guest OS. ESXi drops a packet because it’s not for the
Guest OS or it violates the security setting you set.
The following summary proves that receive packet gets dropped many more times than transmit packet. This is
based on 3938 VMs. Each shows the last 1 month, so approximately 35 million data points in total. The average of 35
million data points show that dropped RX is significantly higher than dropped TX. This is why it’s not in the SLA.


The following table shows that the drop is short and spiky, which is a good thing. The value at the 99th percentile is
35x smaller than the value at the 100th percentile.

The high value in receive can impact the overall packet dropped (%) counter, as it’s based on the following formula:
dropped = Network|Received Packets Dropped + Network|Transmitted Packets Dropped
total = Network|Packets Received + Network|Packets Transmitted
Network|Packets Dropped (%) = dropped / total * 100
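
The formula can be sketched as follows, which also shows why the percentage spikes when traffic is low: the same number of dropped packets is divided by a much smaller total. The numbers are hypothetical, and returning 0 when there is no traffic at all is my assumption, not documented behaviour.

```python
def packets_dropped_pct(rx, tx, rx_dropped, tx_dropped):
    """Network|Packets Dropped (%) as defined by the formula above."""
    total = rx + tx
    if total == 0:
        return 0.0  # assumption: no traffic means nothing to report
    return 100.0 * (rx_dropped + tx_dropped) / total


# Busy period: 48 dropped packets are negligible.
print(packets_dropped_pct(1_000_000, 1_000_000, 48, 0))  # 0.0024

# Quiet period: the same 48 dropped packets dominate.
print(packets_dropped_pct(40, 10, 48, 0))                # 96.0
```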

I’ve seen multiple occurrences where the packet dropped (%) jumps to well over 95%. That’s naturally worrying.
They typically do not last beyond 15 minutes.

In this case, plot the following 4 metrics. You will likely notice that the high spike is driven by low network
throughput and a high number of received packets dropped.


Because of the above problem, profile your VM dropped packets, focusing on the transmit packets. I notice that they
exist in several customers’ production environments, yet no one seems to complain. The following is one way to do
it, giving surprising results like this:


The design of the preceding table is:


• First column calculates the percentage of packets dropped. I took the 99th percentile, else many of the results will
be 100%.
• Second column sums all the transmitted dropped packets (actual packet counts).
• Third column takes the 99th percentile maximum of dropped packets within any 300 seconds. Each network
packet is typically 1500 bytes. Using a 1.5 KB packet size, 1 thousand packets dropped = 1.5 MB worth of
packets within 300 seconds.
I don’t expect dropped packets in a data center network, so seeing millions of dropped packets over a month needs
further investigation with the network team. Moreover, those metrics are Transmit, not Received. So the VM sent them
but they got dropped.
What I typically notice is that the spike rarely happens. It looks like an outlier, especially when the number is very high.
The following is an example. I only show the last 1 month as the rest of the 6 months had a similar pattern. The
jump is well over 100 million packets, and they were all dropped. Assuming each packet is 1 KB, and since vRealize
Operations reports every 5 minutes, that’s 333 MB per second sustained for 300 seconds.
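To sanity-check the 333 MB per second figure, here is a small sketch of the arithmetic. The function name is hypothetical and the conversion assumes decimal units (1 MB = 1000 KB).

```python
def dropped_throughput_mb_per_s(dropped_packets, packet_size_kb, window_seconds=300):
    """Bandwidth represented by the dropped packets, averaged over the window."""
    total_mb = dropped_packets * packet_size_kb / 1000  # KB -> MB, decimal units
    return total_mb / window_seconds


# 100 million dropped packets of 1 KB each within one 5-minute collection cycle.
print(round(dropped_throughput_mb_per_s(100_000_000, 1.0)))  # 333
```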


I also notice a regular, predictable pattern like this. This is worth discussing with the network team. It’s around 3800
packets every 5 minutes, so it’s worth finding out.

Consumption
There are 2 main metrics to measure utilization: throughput and packets.
Both matter, as you may still have bandwidth but be unable to process that many packets per second. This outage shows
700K packets per second that only consume 800 Mbps as the packets are small. The broadcast packet is only 60 bytes
long, instead of the usual 1500 bytes.


Performance

With so many metrics, how do you monitor at scale? Say you have 1000 VMs and you want to monitor every 5
minutes and see the performance trend in the last 24 hours. That would be far too many trend charts.
Enter the Performance (%) metric.
The VM KPI includes Guest OS metrics as operationally we troubleshoot them as one, due to their 1:1 relationship.
Let’s now put together all the metrics from Guest OS and VM. For completeness, I added the utilization metrics to
act as leading indicators.

The KPI metrics may be too technical for some users. You also need to reduce them into a single metric so you can
manage at scale. As each metric has its own unit, we need to convert them into a unit-less range. I picked the 0 – 100
range as that’s easier to understand.
Pick the metrics that are relevant to your environment. Here is what I recommend, including their thresholds.


Memory Ballooned, Swapped and Compressed are added even though their presence does not indicate a real performance
problem, as they are leading indicators. Swapped and Compressed are combined as they are the result of the same action.
Together they tell the complete picture.
Do you know why we use CPU Run – Overlap as opposed to CPU Usage? Read Part 2 Chapter 2 CPU Metrics.
We can only put a metric here if it can be quantified into the 4 brackets. Else it might do a disservice. Hence the
majority of utilization metrics (e.g. disk IOPS, network throughput) are not here.
The threshold is designed to support proactive, not alert-based, operations. Hence, the red range does not mean an
emergency where you must drop everything. It means you need to take a look within the next 24 hours. This also gives
you time to evaluate how many times it falls into the red zone and the overall trend.
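To make the idea of converting a metric into a unit-less 0 – 100 range concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the bracket names (green, yellow, orange, red), the 25 points given to each bracket, the linear interpolation inside a bracket, and the CPU Ready thresholds. It is not the actual vRealize Operations formula, and how the per-metric scores are combined is left out.

```python
def performance_score(value, green, yellow, orange, red):
    """Map a contention value (lower is better) onto 0-100 (higher is better).

    green, yellow, orange and red are the upper bounds of each bracket in
    the metric's own unit, with 0 < green < yellow < orange < red.
    """
    bounds = [0.0, green, yellow, orange, red]
    scores = [100.0, 75.0, 50.0, 25.0, 0.0]  # assumed: 25 points per bracket
    if value <= 0:
        return 100.0
    for i in range(4):
        low, high = bounds[i], bounds[i + 1]
        if value <= high:
            fraction = (value - low) / (high - low)
            return scores[i] - fraction * (scores[i] - scores[i + 1])
    return 0.0  # worse than the end of the red bracket


# Hypothetical brackets for CPU Ready (%): 2.5 / 5 / 7.5 / 10.
print(performance_score(1.25, 2.5, 5, 7.5, 10))  # 87.5
print(performance_score(6.25, 2.5, 5, 7.5, 10))  # 37.5
print(performance_score(20, 2.5, 5, 7.5, 10))    # 0.0
```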

20-second Peak Metrics


A 5-minute interval is good enough for the monitoring use case, but not for troubleshooting. A 300-second average is not
granular enough, as a performance problem may not be sustained that long. Even a performance issue that lasts days
may consist of repeated microbursts. I checked whether repeated bursts exist by profiling a few thousand VMs. Here are some
of the results. I compare 3 metrics (disk latency, network throughput and CPU context switch).


The peak column is based on the 20-second average. So it’s 15x sharper than the 300-second average. It gives better
visibility into the microbursts. If the burst exists, you will see something like this, where the 20-second peak shows a much
worse value consistently.

Are you surprised to see that the 20-second peak is a lot more than 15x worse? The preceding chart shows 10370
ms latency at 20-second vs 257 ms at 300-second.
The huge gap is due to 2 things:
• There are only 1 or 2 microbursts, and they are much higher than the average. This can happen on counters such as
disk latency and CPU context switch, where the value can be astronomically high.
• There are many sets. A VM can have many disks. For example, a database VM with 20 virtual disks will have
40 sets of metrics. Each set has 15 data points, giving a total of 600 data points. The peak reports the highest
of the 600 data points. If the rest are much lower, then the gap will naturally be high.


How are they chosen?


Take a look at the table below. It shows a VM with 2 virtual disks. Each disk has its own read latency and write
latency, giving us a total of 4 metrics.

What vRealize Operations does is add a new metric (shown in red, showing a 100 ms value). It is the peak of 15 x 4 =
60 data points. It does not change the existing metrics, because both have their own purpose. The 5-minute average
is better for your SLA and performance guarantee claim. If you guarantee 10 ms disk latency for every single IO,
you’d be hard pressed to deliver that service. These new metrics act as an early warning. It’s an internal threshold that
you use to monitor if your 5-minute SLA is on the way to being breached.
vRealize Operations takes the peak of these data points, and stores it every 5 minutes. It does not store all the data
points, because that would create a lot more IOPS and consume more storage. It answers the question “Does the VM or
Guest OS experience any performance problem in any 20-second period?”
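The selection described above can be sketched as follows. This is an illustration only: the function name, the metric names and the sample latencies are hypothetical, and it assumes each metric delivers 15 twenty-second values per 5-minute cycle.

```python
from statistics import mean


def five_minute_rollup(samples_per_metric):
    """Return (per-metric 300-second averages, single peak across all data points).

    samples_per_metric maps a metric name to its 15 twenty-second values.
    """
    averages = {name: mean(values) for name, values in samples_per_metric.items()}
    peak = max(max(values) for values in samples_per_metric.values())
    return averages, peak


# A VM with 2 virtual disks: 4 latency metrics x 15 data points = 60 data points.
samples = {
    "disk0_read_ms": [2] * 14 + [100],  # one 20-second microburst
    "disk0_write_ms": [3] * 15,
    "disk1_read_ms": [2] * 15,
    "disk1_write_ms": [4] * 15,
}
averages, peak = five_minute_rollup(samples)
print(round(averages["disk0_read_ms"], 2))  # 8.53, the burst is averaged away
print(peak)                                 # 100, the burst is preserved
```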
What’s the limitation?
• You can’t see a pattern within the 300-second window as you only have 1 data point. This is largely
mitigated by also having the average counter. If the delta between the maximum and the average is high, that
means the maximum is likely a one-off occurrence. The pattern can also be seen over a longer period of time.
• The peaks can be from different time periods. That means you can’t conclude that the contention is caused
by high utilization, as the 2 metrics can come from different times.

Metrics Used
Object       5-minute Average        20-second Average
Guest OS     CPU Run Queue           Peak CPU Queue within collection cycle
             CPU Context Switch      Peak CPU Context Switch within collection cycle
             Memory Page-out Rate    Peak Guest OS Page-out Rate within collection cycle
             Disk Queue Length       Peak Disk Queue within collection cycle
VM CPU       Ready (%)               Peak vCPU Ready within collection cycle
             Co-Stop (%)             Peak vCPU Co-Stop within collection cycle
             IO Wait (%)             Peak vCPU IO Wait within collection cycle
             Swap Wait (%)           Peak vCPU Swap Wait within collection cycle
             Overlap (second)        Peak vCPU Overlap within collection cycle
             System (%)              Peak CPU System within collection cycle
VM Memory    Contention (%)          Peak Memory Contention within collection cycle
VM Disk      Read Latency (ms)       Peak Latency within collection cycle
             Write Latency (ms)
VM Network   Usage Rate (KB/s)       Peak Usage Rate within collection cycle
             Packet/sec              Peak Network Packets/sec within collection cycle

The VM network dropped packets metric is not included, as seeing the number over 20 seconds or 5 minutes does not result in
a different remediation action.
Notice all of them are VM or Guest OS metrics. No ESXi, Resource Pool, Datastore, Cluster, etc. metrics. Why?
The reason is the metrics at these “higher-level” objects are mathematically an average of the VMs in the object. A
datastore with 10 ms disk latency represents a normalized/weighted average of all the VMs in the datastore.
In other words, these metrics give less visibility than the 12 above, and they can be calculated from the 12.
And 1 more reason:

You troubleshoot VM, not infrastructure. If there is no VM, there is no problem 😊

The next question is naturally why we picked the above 12. Among the 12 metrics, you notice only 1 counter tracks
utilization. The other 11 track contention. The reason is covered here.
Why are Guest OS level metrics provided?
Because they do not have a VM equivalent, and they change the course of troubleshooting. If you have a high CPU run
queue, you look inside Windows and Linux, not at the underlying ESXi Host, as it’s transparent to the host.
For CPU, the complete set of contention is provided. There are 6 metrics tracking the different types of contention or
wait that the CPU experiences.
For Memory, popular metrics such as Consumed, Active, Balloon, Swap, Compress, Granted, etc. are not shown as
they do not indicate a performance problem. Memory Contention is the only counter tracking whether the VM has a memory
problem. VM and Guest OS can have memory problems independently. In future, we should add Guest OS memory
performance metrics, if we find a good one. Linux and Windows do not track memory latency; they only track memory
space consumption, throughput and IOPS. Unfortunately, latency is the main counter for performance.
For Network, vCenter does not have latency and re-transmit. It has dropped packets, but unfortunately this is subject
to false positives. So we have to resort to a utilization metric. In future, we should add packets per second.
Lastly, just in case you ask why we do not cover Availability (e.g. something goes down), it’s because this is better
covered by events from Log Insight.


Metrics Not Used


What metrics are missing from the tables?
The following metrics are not included, along with the reason why:
• Guest OS IOPS : VM IOPS Ratio. They should be near 1 or a stable number, as the block size should be
identical. The actual numbers may not match, as Guest OS tends to report the last value, while VM tends to
report the average value. If they fluctuate greatly, something is amiss. I do not include it as I do not have the data
yet.
• Guest OS: Number of dead processes. Not sure what value to set for each bracket, as we need to profile first.
• Guest OS: CPU Context Switch. The profiling shows this metric has a very wide band.
• Guest OS: Memory page-in. This could contain application binaries, so its value could be over-reported. Based
on our profiling of 3300 production VMs, the page-in is more volatile so I’m less confident of applying a
threshold.
• Guest OS: Swapped File remaining size. Not sure if it impacts performance.
• VM Balloon. We covered the reason here.
• Outstanding IO. Adding it would be duplication as it’s a function of IOPS x latency.
• vMotion. This is an event, not a metric. It does not happen regularly; in fact, most of the time it does not
happen.
• VM vMotion stunned time. I do not have enough data to decide the value to put for each range. It should be
within 0.2 second for Green, but what about Yellow? Typically, I use 2K – 4K VMs over 3 months to
convince myself that the thresholds represent the real world.
• Latency due to disk snapshot. The metric VM Wait already covers it, so no need to double count.
• Undesired network packets, such as broadcast and multicast. They do not actually cause performance problems.
• Network RX Dropped Packets. Too many false positives.
• VM DRS Score. Niels Hagoort states here that “a VM running a lower score is not necessarily not running
properly. It is about the execution efficiency, taking all the metrics/costs into consideration.” Reading the
blog and other material, this metric is more about the cluster performance than the individual VM
performance. Plus, it’s using metrics that are already included in the KPI, so it’s double counting.
The threshold can be argued in 2 ways:
• Scientifically
• “Practically”
Scientifically, a VM does not care what’s stopping it. Whether it’s Ready or Co-Stop or Overlap, the Guest OS does
not know. Using this logic, you should set all the thresholds the same way. On the other hand, you can follow what
happens in production, in a healthy environment. These metrics do not follow the same scale.
I take the lower of the two, as the requirement is proactive monitoring.


Chapter 3

ESXi


ESXi ≠ VM + VMkernel

In theory, we can say that the consumption type of metrics at ESXi is the sum of its VMs + VMkernel, while the
contention type of metrics is the sum of its VMs + non-vSphere kernel modules (e.g. vSAN world and NSX world). The
reason the 2 metric types are different is that the VMkernel practically does not experience contention as it gets the
highest priority.
In practice, it is often easier to measure directly at the ESXi level and avoid VM-level counters altogether.

Compute

CPU Consumption: It’s simpler and faster to directly look at the physical cores and their threads. There is no need
to view from the VM level and then sum them up. Whether a core is running the VMkernel or a VM is irrelevant.
At the physical core level, there is no such thing as Ready and Co-stop. A core either runs or is idle.
As a result, ESXi CPU consumption is the sum of its cores. The VM metrics are not involved at all.
Both types of consumption metrics are needed.
• Is the core running or not?
• When it’s running, how fast and how efficient is the run?

CPU Contention: I think this is the normalized average of all the VMs + non-vSphere kernel modules, as the
VMkernel modules do not experience ready and co-stop.

Memory Consumption: Balloon should not be included as it happens in a different realm altogether.
Just like CPU, it’s simpler to look at the physical DIMM. It’s also more accurate if there are memory savings
across more than 1 VM.

Memory Contention: Similar to CPU, I think VMkernel modules do not experience swap or compression.


Storage
Disk is a little tricky as there are space and speed.
For speed, you’re looking at IOPS and latency at the storage adapter, storage path and disk device levels.
For space, you’re looking at disk space used at the datastore level. RDM is not applicable as ESXi cannot see the used
metric.

Disk Speed Consumption: It’s simpler and faster to directly take the metrics from the physical storage adapters and LUNs.
There is no need to work at the VM level and then sum all the VMs. Whether the HBA is serving IO from the VMkernel or
a VM is irrelevant.
Overall, the total IOPS is the sum of all VM IOPS + VMkernel. The difference is that sequential IO might become random
due to the IO blending effect.
The same approach applies to Storage Path and Storage Disk Device.

Disk Speed Contention: At the storage adapter level, there is no more association between an IO and a VM. The physical
card also has its own queue depth. As a result, you do not want to compute from the VM metrics.
The same approach applies to Storage Path and Storage Disk Device.

Disk Space Consumption: This is the sum of all VMs.

Disk Space Contention: This is not applicable as it’s basically overcommit (capacity model).

Network
There are 2 levels of network due to virtualization, and their utilization logically does not match.
The virtual network consists of VM and VMkernel (e.g. vMotion). If the traffic is VM-to-VM traffic within the same
ESXi, the packets do not reach the physical network, hence the vmnic metrics do not register it. The virtual
network does not have the limit that the physical network does, if the traffic remains in the box. This makes it harder to
use this metric as the 100% is not statically defined. So instead of just monitoring the throughput metric, you should
also check the packets per second metric.
The physical network means traffic going through the physical network card. At this level it’s no longer aware of VM
and VMkernel.


CPU

Now that you’ve read the VM metrics, it’s easier to understand the ESXi metrics. Be prepared to look at the metrics
from a physical viewpoint. As usual, let’s start with CPU.
Throughout this book, I always cover the contention metrics first, then consumption. Why is it that I swap the order
for ESXi Host?
The reason is your operations can’t wait until the problem becomes serious. All the built-in metrics are averages of all the
running VMs. So by the time they are high, it’s time to prepare your resume and not start troubleshooting 😉

I’ll provide a set of leading indicators to replace them. In the meantime, let’s dive into the utilization metrics with a
quiz.

Quiz: 50% or 75% or 100%?!


Hope you like the tour of VM CPU accounting. Can you apply that knowledge into ESXi and explain the following?

The above is an ESXi host, showing 3 types of utilization metrics.


 One shows 50%, indicating you have capacity.
 The second one shows 100%, indicating you do not have capacity.
 The 3rd shows 75%.


Which metrics do you take for the ESXi CPU “consumption” then?
Since the graph is a bit small, let’s zoom in.

Notice they have a similar pattern, but their sensitivity differs.

• Why is Usage (%) = 100% when Utilization (%) is around 47%? The gap is more than double. What could be
causing it?
• Why is Utilization (%) fluctuating yet Usage (%) remains constant? Notice Utilization varies between
45% and 55% while Usage remains flat at 100%.
• Why is Core Utilization (%) in the “middle”? What does it actually measure then?
To answer the above, we need to cover some fundamentals. Note that we must take the vantage point of ESXi, not
VM. I know they are similar so it’s easy to get mixed up. From the ESXi physical threads viewpoint, things such as Ready
and Co-Stop are not applicable as the physical threads are the providers of the resource.
Unlike RAM, CPU performance varies widely among different CPU models. Speed matters in CPU, whereas in RAM
we can generally ignore it. DDR5 RAM is faster than DDR4 but for general monitoring purposes it can be ignored.
Because of this significant difference in CPU, we need to have metrics to account for:
• How often it runs. How much the CPU runs in a time period. E.g. if it runs 60% of the time in the last 100
seconds, that means it runs for 60 seconds cumulatively in that period. That’s why you see many metrics in
milliseconds. They track the consumption over time.
• How fast it runs. All else being equal, a 5 GHz CPU is 5x faster than a 1 GHz CPU. Throughput impacts
utilization. The faster it can complete a task, the shorter it has to work. That’s why you see some metrics in
MHz.
• How efficiently it runs. CPU SMT impacts the core efficiency. This is covered more here. This efficiency is then
translated into MHz, for ease of accounting. Unfortunately, this simplification creates confusion as HT and
Power Management are not the same thing.
These 3 dimensions of run are the reason why CPU consumption is hard to measure. It becomes “it depends on what
you consider”. It can’t be a single number. Insisting that the CPU has a single, static, total capacity and using this as the
only 100% for all use cases will result in confusion in the “consumption” numbers.


Utilization
Let’s dive into the first two fundamental metrics: Utilization and Core Utilization.
We need to start at the fundamental: a single physical core of a socket. The socket can have many cores, but we are
interested in just 1 core. It has 2 threads as it supports CPU SMT.
In a time period of, say, 20 seconds[12], this core had the following consumption:

Looking at esxtop, you will see near the top the PCPU Used and PCPU Utilization metrics. Note that their values are
in percentage, meaning you need to know what they use for 100%.
If you guess that they eventually map into the vSphere Client metrics Usage (%) and Utilization (%), respectively, you are
right. However, you need to know how they map.
PCPU means a physical, hardware execution context. That means it is a physical core if CPU SMT is disabled, or a
physical thread inside a core if SMT is enabled. It does not mean CPU socket. A single socket with 10 cores and 20
threads will have 20 PCPU metrics.
PCPU Utilization (%) tracks whether a physical thread is used or not over time. At any given moment, a thread is either
running (unhalted) or not (halted). So it’s binary (0% or 100%). But over the 20-second period, the value is averaged.
So when you see the number as 50%, it does not mean it’s running 100% of the time at half the “speed”. It means it’s running
half the time, for only 10 seconds. Using a human analogy, think of it as a person who is either running or standing,
and never walking. It does not consider CPU frequency.
Core Utilization (%) tracks at the core level. If one of the threads is running, then the value is 100%. At the core level,
the average utilization in that entire period is 75%. In the last portion, the core still runs at 100%. Core Utilization
(%) tracks this. As a result, Core Utilization (%) is only relevant when hyper-threading is enabled.
Going back to our example, here are metrics reported:
• PCPU Utilization (%) for HT 0 = 10 seconds / 20 seconds = 50%
• PCPU Utilization (%) for HT 1 = 10 seconds / 20 seconds = 50%
• Core Utilization (%) for entire core = 15 seconds / 20 seconds = 75%
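The three numbers above can be reproduced with a small sketch. The timeline is an assumption made for illustration (four 5-second slots, with both threads running together only in the last slot); it is chosen to be consistent with the numbers above, and the function names are hypothetical.

```python
def thread_utilization(timeline):
    """Percentage of slots in which this thread was running (unhalted)."""
    return sum(timeline) / len(timeline) * 100


def core_utilization(thread0, thread1):
    """Percentage of slots in which at least one thread of the core was running."""
    busy = [a or b for a, b in zip(thread0, thread1)]
    return sum(busy) / len(busy) * 100


# 20 seconds as four 5-second slots; 1 = running, 0 = halted (assumed layout).
ht0 = [1, 0, 0, 1]
ht1 = [0, 1, 0, 1]

print(thread_utilization(ht0))     # 50.0
print(thread_utilization(ht1))     # 50.0
print(core_utilization(ht0, ht1))  # 75.0
```

Note that the core value is not the sum of the two thread values, because the slot where both threads run is counted once.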
BTW, in vSphere Client, you can’t choose a core if you enable HT. You choose PCPU, which is a thread. So what
happens on the Core Utilization counter at thread level?

[12] I use 20 seconds as it’s a familiar number. That’s what you see in the real-time chart in the vCenter client, and
20000 ms is often used as the 100% when converting the millisecond unit to percentage.


Does it get split into half?


As you can see below, no. The value is duplicated.

Notice in the above chart, the 2 have identical value.


Utilization (%), on the other hand, will be different. Each thread has different value.

If you simply sum them up, you get more than 100%, so don’t!


Now let’s roll this up to the ESXi level. The following shows a tiny ESXi with 2 cores, where each core has 2 threads.

The metrics at the ESXi level are:


• CPU Utilization (%) = 40 seconds / 80 seconds = 50%.
• CPU Core Utilization (%) = 30 seconds / 40 seconds = 75%
Utilization = 50% because each thread is counted independently. There are 4 threads in the preceding ESXi, each
runs 50%, so the average at ESXi level is 50%. This counter basically disregards that HT does not deliver 2x the
throughput.
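The host-level roll-up can be sketched the same way. This is a hedged illustration: the per-thread timelines are assumed (four 5-second slots per thread, so 8 busy thread-slots equal 40 seconds out of 80), and the function name is hypothetical.

```python
def host_utilization(cores):
    """Return (Utilization %, Core Utilization %) for a host.

    cores is a list of (thread0, thread1) timelines, where 1 = running, 0 = halted.
    """
    thread_slots = sum(len(t) for core in cores for t in core)
    thread_busy = sum(sum(t) for core in cores for t in core)
    core_slots = sum(len(core[0]) for core in cores)
    core_busy = sum(sum(a or b for a, b in zip(*core)) for core in cores)
    return thread_busy / thread_slots * 100, core_busy / core_slots * 100


# Tiny ESXi: 2 cores x 2 threads, every thread running half of the time.
tiny_esxi = [
    ([1, 0, 0, 1], [0, 1, 0, 1]),  # core 0
    ([1, 0, 0, 1], [0, 1, 0, 1]),  # core 1
]
print(host_utilization(tiny_esxi))  # (50.0, 75.0)
```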
This is why Core Utilization (%) will tend to be consistently higher than Utilization (%). The following chart
demonstrates that.


Now let’s go back to the chart shown earlier. Can you now explain Utilization (%) and Core Utilization (%)?
Great! Let’s move to the next one.
In the following example, this ESXi has no hyper-threading. What do you notice?

Yup, the Core Utilization is identical to Utilization.


I’d use Utilization (%) but will always accompany it with the contention metrics. Since it’s about performance
troubleshooting, I’d set the threshold around 90% - 95%.


Used
Done reading the Utilization metric?
Great! You are now ready to tackle the next metrics, which are Used (%) and Used (ms). Used considers CPU
frequency (both Turbo Boost and power saving). Used also considers HT, although it assumes HT delivers no benefit and
halves the value to 50% instead of 62.5%.
Here is how Utilization (%) and Used (%) are related at PCPU level:

A physical thread is either executing (running) or halted (idle). While it’s running, it can run at a lower or higher CPU
clock speed. CPU Used accounts for this.
Its execution will be less efficient if its paired thread is also running at the same time. CPU Used accounts for both.
CPU frequency scaling is caused by power management, so let’s dive into it.

Used (%)
Now that we have covered CPU Clock Speed, we can add this dimension into the same scenario above. For that, we
will go back to our tiny ESXi:


In Core 0, the first thread was running at half the CPU frequency in the first period. While Utilization (%) records this
as 100% run, Used (%) is aware of this reduction and records 50% instead. The second thread wasn’t running so Used
is not impacted.
In the 4th period, the thread is competing with another thread. Used (%) recognises the drop in efficiency and
registers 50% instead of 100%. Personally, I’d prefer this to register 62.5% as it’s caused by HT. This would also make it
consistent with CPU Latency and VM CPU Demand, which apply 37.5% as the HT penalty.
On the other hand, when Turbo Boost increases the clock speed by 1.5x on the 2nd thread, Utilization (%) is unaware
and records 100%, but Used registers 150%.
Here are all the possible permutations of a core. Take note that the frequency can be less than 1.

Thread 1   Thread 2   Thread 1   Thread 2   Core   Frequency   Thread 1   Thread 2   Core
Run        Run        50%        50%        100%   1.3x        65%        65%        130%
Run        Not Run    100%       0%         100%   1.3x        130%       0%         130%

So what happens when the CPU frequency goes down?

Thread 1   Thread 2   Thread 1   Thread 2   Core   Frequency   Thread 1   Thread 2   Core
Run        Run        50%        50%        100%   0.5x        25%        25%        50%
Run        Not Run    100%       0%         100%   0.5x        50%        0%         50%
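The rows in the two tables above follow from two rules described in this section: a core is split evenly when both threads run, and the result is scaled by the frequency ratio. Here is a minimal sketch of that accounting for one instant in time. The function name is hypothetical and this is a model of the tables, not the VMkernel implementation.

```python
def used_pct(t1_running, t2_running, frequency_ratio):
    """Return (thread 1 Used %, thread 2 Used %, core Used %)."""
    # HT accounting: when both threads run, each is credited half of the core.
    share = 50.0 if (t1_running and t2_running) else 100.0
    t1 = share * frequency_ratio if t1_running else 0.0
    t2 = share * frequency_ratio if t2_running else 0.0
    return t1, t2, t1 + t2


print(used_pct(True, True, 1.3))   # (65.0, 65.0, 130.0)
print(used_pct(True, False, 1.3))  # (130.0, 0.0, 130.0)
print(used_pct(True, True, 0.5))   # (25.0, 25.0, 50.0)
print(used_pct(True, False, 0.5))  # (50.0, 0.0, 50.0)
```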

Used (ms)
The following is taken from an ESXi with 24 cores and 48 threads. I’m showing logical processor 46 and 47. They are
from core no 24, meaning they share a physical core.
I stack the chart. Do you see something strange?


The total is exactly 20000.


Each thread is only given 10000. This is not intuitive as the data is measured every 20000 ms.
Notice the maximum value for Idle is 10000, not 20000. You would expect the total to be 40000 as there are 2
threads.
BTW, this 10000 matches the 50% we covered earlier. So Used (ms) and Used (%) are consistent with each other.
Any other thing you notice from the chart?
The sum of the 4 metrics is a perfect line.
CPU Idle (ms) + CPU Used (ms) = 100%.
That’s a bit odd, because power saving brings down the value of Used. So Idle needs to be adjusted if the total has to
remain 20000. This is a bit odd, as by definition idle means CPU is not doing work. It’s 0. So the frequency

Thread 1   Thread 2   Frequency   T1 Used   T2 Used   Core Used   T1 Idle   T2 Idle   Core Idle
Run        Run        0.5x        5000      5000      10000       5000      5000      10000
Run        Not Run    0.5x        10000     0         10000       0         10000     10000

You can see the peak where Used (ms) shot well above 10000 multiple times.


Can you guess how many physical cores the following ESXi has?

Answer: 20 cores, 40 threads.


Notice the total sum is constant at 400K ms. Divide this by 20K ms and you get 20 cores. While the graph visually
shows the line is slightly above 400K, the sum of the 2 values shown is actually 400,000.01 ms.
If you want to verify with vCenter, the following ESXi host has 24 cores 48 threads. Notice the sum is 480,000 ms, not
960,000 ms.
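The core count can be derived from the constant sum as follows. This is a small sketch; the function name is hypothetical, and the split between Used and Idle in the examples is made up, as only their sum matters.

```python
INTERVAL_MS = 20_000  # one real-time sample in vCenter


def physical_cores(used_ms_total, idle_ms_total, interval_ms=INTERVAL_MS):
    """Each core contributes one interval to Used + Idle, regardless of HT."""
    return round((used_ms_total + idle_ms_total) / interval_ms)


print(physical_cores(120_000, 280_000))  # 20 (sum is 400,000 ms)
print(physical_cores(200_000, 280_000))  # 24 (sum is 480,000 ms)
```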


The vCenter counter Used (ms) maps to the PCPU Used (%) counter in esxtop.

Usage
vCenter adds this counter, meaning it does not exist at the ESXi level. In some parts of the UI, vCenter uses the name
Used instead of Usage. But in the metrics chart, it uses Usage. I’m going to assume that Used (MHz) = Usage (MHz)
as vCenter does not have Used in MHz.
If you look in esxtop, you will find Used (%) and Utilization (%) but not Usage. Usage basically maps to Used, but
shown in MHz and using 20000 as opposed to 10000.


This is great, as using milliseconds makes it hard to account for “how fast you run” and “how efficient you run”. With MHz,
we can plot the value across time.
With this knowledge, the screen in the vCenter client UI will now be clearer.
You see both the Capacity of 35.18 GHz and Used of 11.3 GHz. There is no concept of Usable Capacity in vSphere, so
the Free amount is basically Capacity – Used.

vCenter shows Used in GHz. The value is actually the value of the Usage metric, as the Used counter is in percentage or
milliseconds.
The Used CPU is summary.quickStats.overallCpuUsage.


The value above is likely some average of say 5 minutes as it remains static for a while and it does not exactly match
the number below as the roll up period is not the same.

Usage is capped at 100%, even at the thread level.


Let’s see if Used (ms) = Usage (MHz).


To prove it, we plot 180 data points from each, and compare the average. For completeness, let’s compare the latest
value too.


Let’s compare the above value to prove the formula. We need to translate them into a common unit for comparison.

Bingo!
Both the average values and the latest values match.
Just like Used, Usage tops out at 100% when all cores run at least one thread at nominal frequency, even if there is
still "headroom" for Turbo Boost or scheduling "capacity" on other threads. This is why its value will be lower than
Core Utilization if there is power savings, as shown below.


ESXi CPU Usage (%) = CPU Usage (MHz) / CPU Total Capacity (MHz), where Total Capacity = total cores x nominal
clock speed. It does not consider hyper-threading. This accounting technique of removing hyper-threading is
consistent with Used.
The following chart proves the above equation.
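The equation can be written out as a short sketch. The host size and usage figure below are hypothetical, the function name is an assumption, and the cap at 100% reflects the earlier observation that Usage does not go above 100%.

```python
def cpu_usage_pct(usage_mhz, cores, nominal_mhz):
    """ESXi CPU Usage (%) from Usage (MHz); hyper-threading adds no capacity."""
    total_capacity_mhz = cores * nominal_mhz
    return min(100.0, usage_mhz / total_capacity_mhz * 100)


# Hypothetical host: 20 cores at a nominal 2000 MHz = 40,000 MHz total capacity.
print(cpu_usage_pct(10_000, 20, 2000))  # 25.0
print(cpu_usage_pct(44_000, 20, 2000))  # 100.0 (Turbo Boost pushes MHz above nominal)
```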

When is Usage (%) higher than Core Utilization (%)?


The answer has to be Turbo Boost. The following shows an ESXi where Usage is consistently higher than Core
Utilization (%) in the last 24 hours. Notice that the value of Usage was capped at 100%. It did not breach 100%.


I’ve marked some areas of the above chart with red dots. Those areas are where Usage turns out to be lower than Core
Utilization.
Why?
The answer is power saving, which typically happens at low utilization. With aggressive power savings, Usage can
even be lower than Utilization, as shown below. This makes sense, as the idle cores run at a lower
frequency, hence the average at the ESXi level is low.

Demand
This is actually an internal counter. It’s for the VMkernel CPU scheduler to optimize the running of VMs, as the kernel is
aware that hyper-threading has a performance impact. So it’s not a capacity counter.
Demand looks at a different context than Utilization/Used/Usage. It looks at the VM world, not the physical cores.
That’s why it’s not available at the per-core or per-thread level. The value you see at ESXi is the summation of all the VMs,
not the logical CPUs.
Because Demand considers all types of contention, its value tends to be higher than all the other metrics.
It does not include the VMkernel load, so at low utilization, Demand will be lower than Usage.


One good thing about the Demand metric is that it can go above 100%. All the other metrics are capped at 100%. Demand
lets you see how far above 100% the demand is. A value above 100% does not necessarily mean the VM is experiencing a
performance problem, as there is Turbo Boost and hyper-threading to assist.

ESXi “Utilization” metrics


Let’s summarise the metrics we have covered so far. vCenter provides 6 metrics to account for the utilization of ESXi
CPU. Since esxtop uses the Used (%) metric but ESXi uses the Used (ms) metric in the vCenter client, I’m including
both.

Available at Unit Source HT CPU Speed


Utilization Thread level % ESXi 2x No
Used Thread level ms vCenter No
Used Thread level % esxtop only 2x Yes
Core Utilization Core level % ESXi Any No
Usage Thread level % vCenter Yes
Usage in MHz Host level MHz vCenter Yes
Demand Host level MHz ESXi Yes


The column HT indicates how the counter treats HT. 2x means it doubles the capacity. I put “Any” for
Core Utilization because if any of the threads runs, the counter goes up to 100%. If both run, it’s still 100%.
You know that only Utilization (%) and Used (%) exist at the thread level because they are the only ones you see in
esxtop [13], as shown below.

With so many metrics, which one should you choose?


Let’s now evaluate all the possible scenarios so you can compare the values returned by the metrics. We will use a
simple ESXi with 2 cores. Each core has 2 threads. In each of the scenarios, a thread is either running or not running.
There is no partial run within a thread as that’s mathematically covered in our scenarios.
I will also use 20000 ms as that’s more familiar. The following table shows an ESXi with 2 cores. There are 6 possible
permutations of their utilization.

The table shows clearly that Used splits the Utilization into 2 when both threads are running.
Look at scenario 1. While Utilization charges 20000 ms to each thread, Used charges 10000. This is not intuitive as
ESXi considers HT to deliver 1.25x. Personally I find 12500 easier to understand. The good news is this number is
normalized back when it is rolled up to the ESXi host level.
How will those scenarios roll up at the ESXi level?
The following table shows the 4 metrics (Utilization, Used, Core Utilization, Usage). I have expressed each in % so it’s
easier to compare.
There are 6 different scenarios, so logically there should be 6 different values. But there are not, so I added my
personal take on what I would like them to show. I’m keen to hear your thoughts.
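The roll-up can be sketched in a few lines. This is a simplified model built only from the rules stated in this section: Utilization charges the full 20000 ms to every running thread, Used charges 20000 ms per busy core, Core Utilization is 100% for a core if any of its threads runs, and HT delivers 1.25x. The "ht_adjusted_pct" value is one possible reading of the HT-aware number discussed here (normalized so that a fully loaded host shows 100%). All names are hypothetical, and the order of the scenarios may differ from the numbering in the table.

```python
from itertools import combinations_with_replacement

INTERVAL_MS = 20_000   # one 20-second sample, as used in the text
HT_BONUS = 1.25        # throughput of a core with both threads busy, per the text
HT_WEIGHT = {0: 0.0, 1: 1.0, 2: HT_BONUS}


def host_metrics(core_states):
    """core_states holds the number of running threads (0, 1 or 2) per core."""
    cores = len(core_states)
    threads = 2 * cores
    # Utilization charges the full interval to every running thread.
    utilization_ms = sum(n * INTERVAL_MS for n in core_states)
    # Used charges one interval per busy core, split across its running threads.
    used_ms = sum(INTERVAL_MS for n in core_states if n > 0)
    busy_cores = sum(1 for n in core_states if n > 0)
    ht_weighted = sum(HT_WEIGHT[n] for n in core_states)
    return {
        "utilization_pct": 100 * utilization_ms / (threads * INTERVAL_MS),
        "used_pct": 100 * used_ms / (cores * INTERVAL_MS),
        "core_utilization_pct": 100 * busy_cores / cores,
        # Assumption: full HT throughput (1.25x per core) is treated as 100%.
        "ht_adjusted_pct": 100 * ht_weighted / (cores * HT_BONUS),
    }


# The 6 combinations for a 2-core host: each core runs 2, 1 or 0 threads.
SCENARIOS = list(combinations_with_replacement((2, 1, 0), 2))
for states in SCENARIOS:
    print(states, host_metrics(states))
```

In this model, one running thread on each core gives Utilization 50%, Used 100% and an HT-adjusted value of 80%, which is the gap the analysis in this section points at.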

[13] Source: VMworld presentation HCP2583 by Richard Lu and Qasim Ali


What’s causing the difference?


Yup. Hyper Threading.
Why do I choose 125% instead of 100%?
To me, the 1.25x bonus factor has to be shown. Without HT, it’s 100%. HT is a bonus. While it provides 1.25x overall
throughput, each thread pays an expensive price, as each suffers a 37.5% penalty.
The other reason I choose 125% as the upper limit is that it’s easier when thinking in GHz.
Example: say the CPU specification is 3 GHz for its nominal frequency. It has HT enabled, and power management
disabled.
What’s the CPU Capacity?
To me, 3 GHz is easier to explain than 3.75 GHz. It’s also more correct, as the CPU does not actually run at 3.75 GHz.
It runs 2 threads at 3 GHz each, but at 62.5% efficiency, so accounting wise it’s
= 3 GHz x 62.5% + 3 GHz x 62.5%
= 3.75 GHz, which is 125% of 3 GHz

Scenario  Analysis
1         Do you notice something strange with the value of Used (%)?
          Yes, it’s no longer 50%. It’s 100%. Each thread reports 50%, yet the host-level value is 100%.
          The reason is the accounting does not count each thread as 20000. Each core has 20000 and not
          40000. If you say that is similar behaviour to Core Utilization, you’re right.
3         Utilization is only showing 50% when both cores are utilized. I prefer this to show 80% as HT only
          delivers 1.25x, not 2x.
          On the other hand, Usage goes up too fast. It's already showing 100% when there is still 25% room
          left.
5         Utilization is again showing too low a value, and Usage too high a value.

Now let’s add CPU clock speed. What happens when there is power management?
I’d focus on just Used and Usage to highlight the difference.
What do you notice from the table below?


Both Used and Usage are capped at 100%. I prefer this not to be capped, to distinguish it from the other 100%. The
good part is that the Demand metric is not capped.
For comparison, I put forth what I think the counter should be.
Let’s take some ESXi hosts running production workload to see how the values compare in the real world. Each row
represents an ESXi host. What’s your conclusion from reviewing the following table?

I’ve marked two of the rows with a red dot.

• The first one happens because of CPU scaling. Not all cores are busy, since Core Utilization shows 72%. The
busy ones were dynamically boosted by VMkernel by an average of 21%, hence the Usage counter registers
88%.
• The second example is the opposite. This ESXi is not even 50% utilized, as the core utilization shows 48.88%.
VMkernel decides that it could complete the job with less power, and clocks down by an average of 43%.
Notice that Usage (%) does not count hyper-threading. The Total Capacity metric is simply based on cores x
nominal speed.
Now that we know more about the metrics, which ones should we use and how?
To answer that, we need to first determine the “100%”. That’s the ceiling, the total capacity.


Consumed
vSphere Client introduces a new counter: Consumed.
What does it map to?
When vSphere UI lists ESXi Hosts, it typically includes the present utilization. It lists the metrics as Consumed CPU (%)
and Consumed Memory (%).

Consumed CPU maps to CPU Usage (%). Consumed Memory (%) maps to Memory Consumed (KB).
To confirm it, simply plot the CPU Usage value. The last value is what you see in the table.

ESXi Peak Core CPU Usage


Are any of the physical threads running hot?
An ESXi with 72 CPU cores will have 144 logical processors. Hence it’s possible that one of them is running hot, while
the rest are not. You will not be able to see that single core peak at ESXi Host level as it’s the average of 144 metrics. If
you are concerned that any of them is running hot, you need to track the peak among them.


Peak CPU Core Usage (%) tracks the highest CPU Usage among the CPU cores. A constantly high number indicates
that one or more of the physical cores has high utilization. So long as the highest among the cores at any given time is
low, it does not matter which one it is at a specific point in time. They can take turns being hot; it does not change the
conclusion of troubleshooting. Max() is used instead of the 95th percentile as both result in the same remediation action,
and Max() can give better early warning.
The imbalance value among the cores is not needed because imbalance is expected when utilization is not high. When a VM
runs, it runs on a few cores, not spread out to all ESXi cores. It’s more efficient to schedule that way, as it requires
fewer context switches.
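A small sketch shows why the host-level average hides a hot core and why the peak does not. The per-core values are invented for illustration, and the function name is hypothetical.

```python
def peak_and_average(core_usage_pct):
    """Return (peak, average) of per-core CPU Usage (%) at one point in time."""
    peak = max(core_usage_pct)
    average = sum(core_usage_pct) / len(core_usage_pct)
    return peak, average


# Hypothetical 72-core host: 71 cores nearly idle, one core saturated.
samples = [5.0] * 71 + [100.0]
peak, average = peak_and_average(samples)
print(f"peak={peak:.1f}%  average={average:.1f}%")
```

The average stays in single digits while the peak shows 100%, so only the peak tells you that one core is running hot.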

Contention Metrics
The nature of averages is also one reason why ESXi “consumption” does not correlate to ESXi “contention”. The 4
highlighted areas are examples where the metrics don’t correlate, and even go the opposite way in some of them. Can
you guess why?

These are the reasons why they don’t match:

• One looks at physical CPU, the other the virtual CPU. One looks at ESXi, while the other looks at VM.
• Hyperthreading and Power Management.
• Imbalanced utilization. There are many VMs in this host. Their experience will not be identical.
• Limit may impact the VM, either directly or via resource pool.
• CPU pinning, although this rarely happens.
So what metrics should you use?
The Performance (%) metrics. We will cover them in a later part of the book. In the meantime, here are the latency
metrics provided by vSphere Client.


VMkernel
There are 2 metrics: reservation and actual utilization.
For performance, you just base it on actual utilization. For capacity, you need to take the higher of the 2 metrics.
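As a minimal sketch of that rule, with hypothetical names and sample values in MHz:

```python
def vmkernel_cpu_for_capacity(reservation_mhz: float, utilization_mhz: float) -> float:
    """For capacity planning, count whichever is higher: reservation or actual use."""
    return max(reservation_mhz, utilization_mhz)


def vmkernel_cpu_for_performance(reservation_mhz: float, utilization_mhz: float) -> float:
    """For performance, only the actual utilization matters."""
    return utilization_mhz


# Hypothetical host: 6,600 MHz reserved, 2,100 MHz actually used.
print(vmkernel_cpu_for_capacity(6_600, 2_100))     # 6600
print(vmkernel_cpu_for_performance(6_600, 2_100))  # 2100
```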

CPU Reservation
The following screenshot shows the counter names used by vSphere Client UI

Active (1 minute)   My guess is that this maps to the Used metric. The reason is that there are only 2 basic
                    metrics at the physical layer: Utilization and Used.
                    It’s an average over the last 1 minute.
                    Take note this is the latest amount, not the average. Since the collection interval is
                    20 seconds, that means it’s the value at the 20th second, not the average of all values in
                    the entire 20-second period. Since the value itself is the average of the last 1 minute,
                    you’re looking at a 1-minute average taken every 20 seconds. So there is a 40-second
                    overlap.
Active (5 minute)   As above, but the overlap between each data point is 4:40 minutes.
Running             My guess is that this maps to the Utilization metric. The reason is that it's consistently
                    lower than Active. I also compared ESXi Usage vs ESXi Utilization, and the comparison
                    kinda matches.
Allocation maximum  Limit. That’s the upper limit, as a lower limit does not make sense in counters like
                    utilization since it’s always 0.
                    For ease of use, let’s just call this allocation.
Allocation minimum  This is reservation, which is the guaranteed minimum it will get when it asks for CPU cycles.
Shares              Relative shares of each VMkernel world.
                    This is a VMkernel internal metric, not something a vSphere Administrator should change.
Maximum Limited     The time a limit was applied to throttle, because usage wanted to exceed allocation. You
                    should expect this to be 0 all the time.
Running             I think this maps to the Utilization metric.
Usage               I think this maps to the Used metric.
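The overlap figures quoted for the Active counters follow from simple arithmetic: each data point is a trailing average over a window, and a new point is taken every 20 seconds. A short sketch, with hypothetical function names:

```python
def overlap_seconds(window_s: int, interval_s: int = 20) -> int:
    """Seconds shared by two consecutive trailing-average data points."""
    return max(window_s - interval_s, 0)


def as_min_sec(seconds: int) -> str:
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


print(overlap_seconds(60))               # 40 seconds for Active (1 minute)
print(as_min_sec(overlap_seconds(300)))  # 4:40 for Active (5 minute)
```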

Utilization is relatively much more volatile or dynamic, while allocation and reservation are more stable. The
following screenshot shows CPU Usage fluctuates every 20 seconds, while reservation remains perfectly constant.

Notice the maximum limited value is perfectly flat. That’s what you want.
The above is for host/system. The reservation is surprisingly low.
Now let’s look at host/vim. What do you notice from the following screenshot?


Surprisingly the reservation is not low. It’s around 6.6 GHz.


The above is from 1 ESXi. We need to plot for many to get a better understanding. The following diagram shows the
distribution of VMkernel overhead based on a sample of almost 400 ESXi hosts in a production environment.

By far the majority of the values lie in 6 – 10 GHz.


Their values tend to be stable over days, although from time to time I see fluctuating metrics, which is reasonable as
there are multiple factors impacting the reservation.
The following chart shows both the fluctuating pattern and steady pattern (most common). They are from 2 ESXi
hosts.


ESXi CPU reservation is tracked by the metric CPU \ Overhead (MHz). I think the formula is
CPU \ Overhead (MHz) = Number of CPU Cores * (hardware|cpuInfo GHz) – CPU |totalCapacity_average
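The suspected formula can be sketched as follows. The formula mixes GHz and MHz, so the sketch converts the nominal speed to MHz first; that conversion, the function name and the sample host are my assumptions, and the formula itself is only a guess as stated above.

```python
def cpu_overhead_mhz(cores: int, nominal_ghz: float, total_capacity_mhz: float) -> float:
    """Suspected CPU Overhead: raw core capacity minus the reported total capacity."""
    raw_capacity_mhz = cores * nominal_ghz * 1000  # GHz converted to MHz
    return raw_capacity_mhz - total_capacity_mhz


# Hypothetical host: 40 cores at 2.5 GHz, with 93,400 MHz reported as total capacity.
print(cpu_overhead_mhz(cores=40, nominal_ghz=2.5, total_capacity_mhz=93_400))  # 6600.0
```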

CPU Utilization
To see the actual usage, choose the metric Resource CPU Usage.

You need to select host/iofilters, host/system, and host/vim.


Everything else runs under one of the 3 resource pools above. You can plot their values in vCenter by stacking up
their values, as shown below.


The above is for one ESXi Host.


Here is a sample from ~400 ESXi hosts, where I sort the top 7 from highest System usage.

The bottom two rows show the summary. The first summary is the average among all the hosts, while the last row is
the highest value.


Memory

vCenter provides even more metrics at ESXi level: 38 metrics for RAM plus 11 for VMkernel RAM. VMkernel has
around 50 processes that are tracked. As a result, a cluster of 8 ESXi can have > 800 metrics just for ESXi RAM! Most
of them are not shown as a percentage, making it challenging to compare across ESXi hosts with different memory
sizes.
We will cover each metric in-depth, so let’s do an overview first.

Overview
Just like the case for VM, the primary counter for tracking performance is Page-fault Latency. Take note this is a
normalized average, so use the Max VM Memory Contention instead.

The contention could be caused by swapping in the past. You’ve got only 5, not 6 metrics for swap. Which counter is
missing?

Swap target is missing. It can be handy to see the total target at ESXi level.
Swap and Compress go hand in hand, so we should check both together. Here are the compressed metrics.


Lastly, the performance issue could be caused by memory being read from the Host Cache. While it is faster than
disk, it is still slower than physical memory.

Wait! What about Balloon?


As we will cover in-depth shortly, that’s more of a capacity than a performance metric. One can even say that other than
Page-fault Latency, the rest of the metrics are actually for capacity, not performance.
The famous balloon is a warning of capacity, assuming you do not play with limit.

When will ballooning kick in? There is a counter for that!

The memory state level shows one of the 5 possible states. You want to keep this at Clear state or High state.

For environments where performance matters more than cost, you want Balloon to be 0. That means Consumed
becomes your main counter for capacity. It is related to Granted and Shared.

Reservation plays a big part in capacity management as it cannot be overcommitted.

Active is not a counter for capacity or performance. It’s for VMkernel memory allocation.


There are a few metrics covering zero pages and overhead.

Persistent Memory

Lastly, there are a few metrics for VMFS. I think they are internal, only used by VMkernel. Let me know if you have a
real-world use case for them.

“Contention” Metrics
I put the title in quotes as none of these counters actually measures contention.
I do not cover the Latency metric as that’s basically a normalized average of all the running VMs on the host.

Balloon
Balloon is a leading indicator that an ESXi is under memory pressure, hence it’s one of the primary metrics you
should use in capacity. Assuming you’re not using Limit to artificially cap the resource, you should ensure that the
balloon amount does not cause VMs to experience contention.
We know that contention happens at hypervisor level, not at VM level. The VM is feeling the side effects of the
contention, and the degree of contention depends on each VM's shares, reservation and utilization. ESXi begins
taking action if it is running low on free memory. This is tracked by a counter called State. The State counter has five
states, corresponding to the Free Memory Minimum (%) value.


ESXi State   Threshold   1 TB ESXi
High         300%        32.4 GB
Clear        100%        10.8 GB
Soft         64%         6.9 GB. Balloon starts here
Hard         32%         3.5 GB. Compress/Swap starts here
Low          16%         1.7 GB. Block execution

Example based on an ESXi with 1 TB RAM: First, we calculate the Free Memory Minimum value. There are many websites
that help you with this. For 1 TB, the value is 10.8 GB.
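The 1 TB column of the table is just the Free Memory Minimum value multiplied by each state's threshold. A short sketch that reproduces it, taking the 10.8 GB value as given (how that value is derived is not covered here, and the names are hypothetical):

```python
# Threshold of each memory state as a percentage of Free Memory Minimum.
STATE_THRESHOLDS_PCT = {"High": 300, "Clear": 100, "Soft": 64, "Hard": 32, "Low": 16}


def state_thresholds_gb(min_free_gb: float) -> dict:
    """Free-memory level (GB) at which each state begins, rounded to 1 decimal."""
    return {
        state: round(min_free_gb * pct / 100, 1)
        for state, pct in STATE_THRESHOLDS_PCT.items()
    }


# Free Memory Minimum for the 1 TB host in the example is 10.8 GB.
for state, gb in state_thresholds_gb(10.8).items():
    print(f"{state:<6}{gb} GB")
```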

Using the example above, let’s see at which point of utilization ESXi triggers the balloon process.

ESXi State          512 GB ESXi   1 TB ESXi   1.5 TB ESXi
Balloon Threshold   3.7 GB        6.9 GB      10.2 GB
Threshold           508.3 GB      0.99 TB     1.49 TB
Threshold in %      99.3%         99.3%       99.3%

As you can see from all 3 ESXi hosts, ballooning only happens after at least 99% of the memory is utilized. It’s a very
high threshold. Unless you are deliberately aiming for high utilization, all the ESXi hosts should be in the High state.
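The 99.3% figure can be checked with a short calculation. The sketch assumes 1 TB = 1024 GB and 1.5 TB = 1536 GB, and takes the balloon thresholds from the table above; the function name is hypothetical.

```python
def balloon_trigger_pct(total_gb: float, balloon_threshold_gb: float) -> float:
    """Memory utilization (%) at which free memory falls to the balloon threshold."""
    return round(100 * (total_gb - balloon_threshold_gb) / total_gb, 1)


# (total memory in GB, balloon threshold in GB) for the 3 hosts in the table.
for total_gb, threshold_gb in [(512, 3.7), (1024, 6.9), (1536, 10.2)]:
    print(total_gb, balloon_trigger_pct(total_gb, threshold_gb))
```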
In addition, the spare host you add to cater for HA or maintenance mode will help in lowering the overall ESXi
utilization. Let’s use an example to illustrate:
• No of ESXi in a cluster = 12
• Provisioned for HA = 11
• Target ESXi memory utilization = 99% (when HA happens or planned maintenance)
• Target ESXi memory utilization = 99% x 11 / 12 = 90.75% (during normal operations)
Using the above, you will not have any VM memory swapped as you won’t even hit the ballooned stage. If you
actually see balloon, that means there is a limit imposed.
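The same arithmetic works for any cluster size. A minimal sketch with hypothetical names:

```python
def normal_operations_target_pct(hosts: int, spare_hosts: int, failure_target_pct: float) -> float:
    """Per-host utilization during normal operations, so that the remaining hosts
    sit at failure_target_pct once the spare capacity is in use."""
    return failure_target_pct * (hosts - spare_hosts) / hosts


# 12 hosts, 1 of them spare for HA, 99% target when the spare is consumed.
print(normal_operations_target_pct(hosts=12, spare_hosts=1, failure_target_pct=99))  # 90.75
```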
The Low Free Threshold (KB) counter provides information on the actual level below which ESXi will begin reclaiming
memory from VMs. This value varies in hosts with different RAM configurations. Check this value only if you suspect
ESXi triggers ballooning too early.
ESXi memory region can be divided into three: Used, Cached and Free.
• Used is tracked by Active. Active is an estimate of recently touched pages.
• Cached = Consumed - Active. Consumed contains pages that were touched in the past, but are no longer active.
I'm not sure whether Ballooned pages are accounted in Consumed, although logically they should not be. They should go to
Free so they can be reused.
• Free = Total Capacity - Consumed.
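The three regions can be derived from the counters as sketched below. The sample values are invented, and the function name is hypothetical.

```python
def memory_regions(total_gb: float, consumed_gb: float, active_gb: float) -> dict:
    """Split host memory into the Used, Cached and Free regions described above."""
    return {
        "used": active_gb,                  # Used is tracked by Active
        "cached": consumed_gb - active_gb,  # touched in the past, no longer active
        "free": total_gb - consumed_gb,
    }


# Hypothetical 1 TB host with 600 GB consumed, of which 80 GB is active.
print(memory_regions(total_gb=1024, consumed_gb=600, active_gb=80))
```

The three regions always add up to the total capacity, which is a quick sanity check when you pull the counters from a real host.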
The nature of memory as cache means the active part is far lower than the non-active part. It’s also more volatile.
The following shows an ESXi with low memory usage, both active and consumed, in the last 3 months.


Let’s look at an opposite scenario. The following ESXi is running at 100%. It has granted more memory than what it
physically has. Initially, since the pages are inactive, there is no ballooning. When the active pages rise, the consumed
counter goes up and the balloon process kicks in. When the VM is no longer using the pages, the active counter
reflects that and ESXi begins deflating the balloon and giving the pages back.

I shared in the VM memory counter section that just because a VM has balloon does not mean it experiences contention.
You can see the same situation at ESXi level. The following ESXi shows a constant and significant balloon lasting at
least 7 days. Yet the worst contention experienced by any VM is not even 1%, and the majority of its 19 VMs were not
experiencing contention at all.


Swap & Compress


The metrics are essentially the summation of running VMs and VMkernel services.

Metrics         Description
Swap Used       Sum of memory swapped of all powered-on VMs and vSphere services on the host. This
                number will reduce if pages are swapped back into the DIMM.
Swap In         The total amount of memory that has been swapped in to date.
Swap Out        As above, but for swapped out pages.
Swap In Rate    I think this includes compressed, not just swapped, but I’m not 100% sure as I can’t find
                proof yet.
Swap Out Rate

Pages can and will remain in a compressed or swapped state. The following screenshot shows compressed remains
around 5 GB for around 1 year.


The above happened because there was no need to bring back those pages. Notice ballooning was flat at 0, indicating
the ESXi host was not under memory pressure.
The following screenshot is why I think Swap Out is simply a cumulative counter. Notice the value did not reduce
even though there was swap in and there was no swap out. I suspect this is because the pages that were swapped
in were not the pages in the swap file.


Consumption Metrics
Consumption covers utilization, reservation and allocation.

Consumed

I find eliminating the components that make up a metric helps in validating their value. So let’s compare when there
is no running VM.

Consumed = VMkernel utilization (not reservation) + Granted to VM


Let’s take a look! This ESXi has 130693 MB, as vCenter uses 1024 to convert from GB to MB.

The box has 3 running VMs. One of them does not have VMware Tools.


As we’re interested in the ESXi level, look at the numbers under the Host Mem column. They add up to 27876 MB.
I’m not 100% sure what the number under Guest Mem is. I need to validate whether it’s based on Tools or not.
Now let’s see what ESXi Consumed reports.

It reports 31744 MB. That means there is 3868 MB unaccounted for (31744 – 27876). That should be VMkernel.
vCenter reports VMkernel is only consuming 2426 MB. So there is still 1442 MB unaccounted for.
Regardless, it shows that Consumed already includes VMkernel.
You can also check that esxtop matches. See the PMEM total value. VMkernel consumes 2370 MB, while VMs
consume 28245 MB. Free is 100076, which is 130692 – 2370 – 28245, give or take 1 MB of rounding.

The ESXi runs 3 VMs, shown above as the first 3 lines. If you sum them up, you get 28245.


What’s interesting is that the numbers from esxtop and vCenter UI do not perfectly tally. I need to run more tests to
figure it out.
Consumed does not include Ballooned. This makes sense as the pages are no longer backed by physical pages.

Consumed does not include swapped. This makes sense as the pages are no longer in the physical memory.

Consumed seems to include compressed. This seems logical as the pages still consume physical memory (they are in
the DIMM).


VMkernel
The other part of Consumed is non-VM. This means VMkernel, vSAN, NSX and whatever else is running on the
hypervisor. Because ESXi Consumed includes non-VM, it can be more than what’s allocated to all running VMs, as
shown below.

Take note that Consumed includes the actual consumption, not the reservation. The following ESXi has 0 running
VMs, so Consumed is made up of just VMkernel. You can see the utilization is much lower than the reservation.

If you’re wondering why it’s consuming 17 GB when there is 0 VM, the likely answer is vSAN. Just because there is no
VM does not mean vSAN should stop running.
Just like any other modern-day OS, VMkernel uses RAM as cache as it's faster than disk. So the Consumed counter
will be near 100% in an overcommitted environment. This is a healthy utilization.

Granted
The following example shows ESXi hosts with no running VM, so the Consumed counter is mostly made up of
VMkernel. From the table, you can see that Consumed = VMkernel Consumed + Granted to VM.
I’ve sorted them by the Granted counter, as I’m not expecting it to have any values. Granted at the host is the total
of the granted metrics of VMs running on the host, so it should be 0 in this case. It includes the shared memory. My
guess is that the extra memory is for non-VM user world processes.


Let’s take one of the ESXi to see the value over time. This time around, let’s use vCenter instead.

You can verify that ESXi Consumed includes its running VMs’ Consumed by taking an ESXi with a single running VM.
The ESXi below has 255 GB of total capacity but only 229 GB is consumed. The 229 GB is split into 191 GB consumed
by VM and 36 GB consumed by VMkernel.


The VMkernel consumption is the sum of the following three resource pools.

Shared
Metrics         Description
Shared          The sum of all the VM memory pages & VMkernel services that are pointing to a shared
                page. In short, it’s Sum of VM Shared + VMkernel Shared.
                If two VMs each have 500 MB of identical memory, the shared memory is 1 GB.
Shared Common   The sum of all the shared pages.
                You can determine the amount of ESXi host memory savings by calculating Shared (KB) -
                Shared Common (KB)
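The two-VM example from the table works out as follows. This is a sketch with hypothetical names, using MB instead of KB to keep the numbers readable.

```python
def sharing_savings(shared: float, shared_common: float) -> float:
    """Host memory saved by page sharing = Shared - Shared Common (same unit)."""
    return shared - shared_common


# Two VMs, each with 500 MB of identical memory.
shared_mb = 2 * 500      # every VM page that points to a shared page
shared_common_mb = 500   # the single physical copy backing those pages
print(sharing_savings(shared_mb, shared_common_mb))  # 500 MB saved
```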


Memory shared common is at most half the value of Memory shared, as sharing means at least 2 blocks are pointing
to the shared page. If the value is a lot less than half, then you are saving a lot.
I typically validate the theory with actual values. The following shows the shared common exceeding half many times
in the last 7 days.

I’m not sure why. My wild guess is large pages are involved. ESXi hosts sport the hardware-assisted memory
virtualization from Intel or AMD. With this technology, VMkernel uses large pages to back the VM memory. As a
result, the possibility of shared memory is low, unless the host memory is highly utilized. In this high consumed state,
the large pages are broken down into small, shareable pages. The smaller pages get reflected in the shared common.
Do let me know if my wild guess is correct.
You can also use the Memory shared common counter as a leading indicator of the host breaking large pages into 4K
pages. For that, you need to compare the value over time, as the absolute value may be normal for that host. The following
table shows 11 ESXi hosts with various levels of shared pages. Notice none of them is under memory pressure as
balloon is 0. That’s why you use it as a leading indicator.


With Transparent Page Sharing limited to within a VM, shared pages should become much smaller in value. I’m not
sure if salting helps address the issue. From the vSphere manual, “With the new salting settings, virtual machines can
share pages only if the salt value and contents of the pages are identical”.
I’m unsure if the above environment has salting enabled or not. Let me know what level of sharing you see in your
environment, especially after you disable TPS.

Reservation
The Reserved Capacity (MB) metric only counts reservation by powered-on VMs. It does not include powered-off
VMs and VMkernel reservation. Aim for this value to be low, as you should use reservation only if you mix VMs with
different classes of service.
The following screenshot shows an ESXi where the CPU reservation was flat 0 MHz. I then set one of its VM
reservation to 888 MHz. Notice the immediate yet constant change.

VMkernel
There are 2 metrics: reservation and actual utilization.
For performance, you just base it on actual utilization. For capacity, you need to take the higher of the 2 metrics.


Reservation
The following screenshot shows the counter names used by vSphere Client UI

The Rollups column values are all Latest, and the Stat Types column values are all Absolute.

Allocation maximum  As per CPU, this is the limit.
Allocation minimum  As per CPU, this is reservation.
Shares              Relative shares of each VMkernel world.
                    This is a VMkernel internal metric, not something a vSphere Administrator should change.
Consumed            The actual consumption. Just like CPU, this can be lower than the reservation.
                    The host/vim world has no reservation.
Mapped
Overhead
Share Saved
Shared
Swapped
Touched
Zero                The entire block contains just a series of 0.

ESXi memory overhead = Memory \ ESX System Usage (KB)


This reservation is actually a raw counter from vCenter.
ESX System Usage = Total Capacity – Capacity Available to VMs

mem|memMachineProvisioned - mem|totalCapacity_average
Where capacity available to VMs is the capacity reserved by and available for VMs.


For memory, based on 310 production ESXi, the reservation ranges from 6 GB to 88 GB. It’s a big range.
ESX System Usage =


The following is an ESXi 6.7 U3 host with 1.5 TB of memory. Notice the VMkernel values remain constant over a long
period. The number of running VMs eventually dropped to 0. While the Granted counter drops to 1.5 GB (not sure
what it is since there is no running VM), the VMkernel values did not drop. This makes sense as they are reservation
and not the actual usage.

The metric ESX System Usage measures VMkernel reservation, which varies from 2 GB to 64 GB. The following shows
the distribution of values among 185 ESXi hosts:

Utilization
Utilization is the actual usage or consumption. It is typically lower than the reserved amount. It also does not always
correspond to the reserved amount. The following chart shows the reservation remains steady when the actual
drops by 90%, from 40 GB to single digit.


To see the actual usage, choose the Resource Memory Consumed metric from vSphere Client. Stack them,
and you see something like this. The system part typically dwarfs the other 2 resources.

Do not take the value from the Memory \ VMkernel consumed counter. That’s only the system resource. You can verify
by plotting this and comparing it against the host/system resource. You will get identical charts.


This value is for vSphere kernel modules. It does not include vSAN.
Based on the preceding 185 ESXi hosts, how do you think the actual VMkernel usage compares with the
reservation?
It’s much lower. This means you should not confuse one for the other.

Validation
The following screenshot shows that ESXi had all its VM evacuated. Not a single VM left, regardless of power on/off
status.

In the preceding chart, we could see the metric Memory Allocated on All Consumers dropped from 452 GB to 0 GB,
and it remained flat after that.
Checking the Reserved Capacity metric, we can see it dropped to 0. This is expected.


How about Consumed?

Memory Consumed also dropped. The value was 400 GB, less than the 452 GB allocated to all VMs. This indicated
some VMs had not used the memory, which could happen.
The value dropped to 32 GB, not 0 GB. This is expected as Consumed includes every other process that runs. In this
case, it is mostly vSAN, which runs in the kernel.
Let’s check VMkernel utilization.

Notice it’s a bit smaller than Consumed, indicating Consumed has other things. I suspect it’s BIOS and the console in
vSphere Client UI.
How come the value didn’t change much? I kinda expect some changes, based on the theory that some kernel
modules’ memory footprint depends on the number of running VMs. If you know, let me know!
How about VMkernel reservation? How do we expect the value to change?

Well, it won’t, since the actual usage does not change.


Analysis
I compared 185 production ESXi hosts to understand the behaviour of the metrics. I averaged their results to
eliminate outliers.

On average, the 185 ESXi hosts have a total capacity of 737 GB. This is the physical configured memory.
The metric Memory \ Usable Memory is 729 GB (not shown in above table). It’s 8 GB, or about 1%, less than Total Capacity. I
suspect this maps to the Managed metric in vCenter. It is the total amount of machine memory managed by VMkernel.
VMkernel "managed" memory can be dynamically allocated for VM, VMkernel, and User Worlds. I need to check
what exactly this is as I don’t see a use case for it.
The metric Memory \ VMKernel Usage is 7.6 GB (not shown in above table). This is much lower than the reservation,
which is 51.6 GB.
Consumed is always higher than the other 3 metrics. What are these?
• Host Usage. Based on what?
• Machine Demand. Why is it the smallest? Based on what? Active? Use case?
• Utilization. Could be Guest OS | Consumed


Storage

The following screenshot shows the ESXi metric groups for storage in the vCenter performance chart.

There are 4 metric groups: datastore, disk, storage adapter and storage path.

Storage Adapter & Path


They have an identical set of counters. I was hoping the adapter would have more metrics, such as adapter buffer, queue and utilization.


What metrics are missing from the above? I’d like to see block size: average block size, largest block size and smallest block size. Those will help in troubleshooting.
The highest latency metric takes the worst value among all the adapters or the paths. This can be handy compared to tracking each of them one by one. However, it averages each adapter first, so it’s not the highest read or write. You can see from the following screenshot that its value is lower than the read latency of vmhba0. What you want is the highest read or write among all the adapters or paths.
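To make the difference concrete, here is a minimal sketch. The adapter names and latency values are made up, and I am assuming the metric takes a simple mean of read and write per adapter before picking the worst adapter; the real counter may weight the average differently.

```python
from statistics import mean

# Hypothetical per-adapter latencies in ms: (read, write).
ADAPTERS = {
    "vmhba0": (4.0, 1.0),
    "vmhba1": (0.5, 0.7),
}

def highest_of_averages(adapters):
    """Average read and write per adapter first, then take the worst adapter."""
    return max(mean(pair) for pair in adapters.values())

def highest_read_or_write(adapters):
    """Take the single worst read or write value across all adapters."""
    return max(max(pair) for pair in adapters.values())

print(highest_of_averages(ADAPTERS))    # 2.5
print(highest_read_or_write(ADAPTERS))  # 4.0
```

The first number hides the 4 ms read latency on vmhba0, which is the value you would want to alert on.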

I plotted 192 ESXi hosts and checked the highest read latency and highest write latency among all their adapters. As the data was returning mostly < 1 ms, I extended to 1 week and took the worst in that entire week. You can see that the absolute worst write latency was a staggering 250 ms. But plotting the 95th percentile value shows 0.33 ms, indicating it’s a one-off occurrence in that week. The 250 ms is also likely an outlier, as the other 191 ESXi hosts show a maximum of 5 ms, with a much lower value at the 95th percentile.
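The maximum versus percentile comparison can be sketched as follows. The sample data is invented (a week of 5-minute samples with a single spike), and the nearest-rank percentile used here is only one of several common definitions, so your monitoring tool may return slightly different numbers.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical week of 5-minute samples (7 * 288 = 2016): one 250 ms spike, the rest 0.3 ms.
week = [0.3] * 2015 + [250.0]

print(max(week))             # 250.0
print(percentile(week, 95))  # 0.3
print(percentile(week, 99))  # 0.3
```

A single spike dominates the maximum but disappears at the 95th and 99th percentile, which is why the percentile is the better indicator of a sustained problem.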


Plotting the value of the first ESXi over 7 days confirmed the theory that it’s a one-off, likely an outlier.

Does it mean there is no issue with the remaining 191 ESXi hosts?
Nope. The values at the 95th percentile are too high for some of them.


I modified the table by replacing Maximum with the 99th percentile to eliminate outliers. I also reduced the threshold so I can see better. The following table shows the values, sorted by the write latency.

The table revealed that there are indeed latency problems. I plotted one of the ESXi hosts and saw the following.

From here, you need to drill down to each adapter to find out which one is the source.


Disk or Device
Compared with Adapter or Path, you get a lot more metrics for disk or device. However, there is no breakdown, as VMkernel cannot actually see anything in between the HBA and the device. So there are no metrics such as number of hops, as it’s not even aware of the fabric topology.
From the ESXi viewpoint, there are 3 major layers in the VMFS storage stack:
• VM. This is measured by the GAVG counter.
• VMkernel. This is measured by the KAVG counter and QAVG counter.
• Device.

Contention Counters
Frank Denneman, whose blog and book are great references, shows the relationship among the counters using the
following diagram:

For further reading, review this explanation by Frank, as that’s where I got the preceding diagram from.

Guest Average (GAVG): Guest here means VM, not Guest OS, as the counter starts from the VMM layer, not Windows or Linux.
Kernel Average (KAVG): ESXi is good at optimizing the IO, so in a healthy environment, the value of Q Latency should be within 0.5 ms.
QAVG: QAVG, which is the queue in the kernel, is part of KAVG. If QAVG is high, check the queue depths at each level of the storage stack. Cody explains why QAVG can be higher than KAVG here.
Device Average (DAVG): The average time from the ESXi physical card to the array and back. Typically, there is a storage fabric in the middle. The array typically starts with its frontend ports, then CPU, then cache, backend ports, and physical spindles. So if DAVG is high, it could be the fabric or the array. If the array is reporting a low value, then it’s the fabric or the HBA configuration.
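A rough triage of these counters could look like the sketch below. It assumes the commonly documented relationship GAVG = KAVG + DAVG, uses the 0.5 ms kernel and 5 ms device expectations from this chapter as thresholds, and applies the same 0.5 ms to QAVG purely for illustration. The function name and the labels it returns are hypothetical.

```python
KERNEL_OK_MS = 0.5  # expectation for kernel latency in a healthy environment
DEVICE_OK_MS = 5.0  # expectation for device latency in a healthy environment

def triage(kavg_ms, davg_ms, qavg_ms):
    """Return (gavg_ms, suspects) for one device, given its latency counters in ms."""
    gavg_ms = kavg_ms + davg_ms  # assumption: guest latency = kernel + device
    suspects = []
    if davg_ms > DEVICE_OK_MS:
        suspects.append("device")       # fabric, HBA configuration or array
    if kavg_ms > KERNEL_OK_MS:
        suspects.append("kernel")
    if qavg_ms > KERNEL_OK_MS:
        suspects.append("queue depth")  # check queue depths along the stack
    return gavg_ms, suspects

print(triage(0.2, 1.75, 0.1)[1])  # []
print(triage(0.2, 12.0, 0.1)[1])  # ['device']
```

Treat the thresholds as starting points and tune them to your own environment.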

For each of the above 4 sets, you expect read latency, write latency and the combined latency. That means 12 counters, and here is what they are called in the vSphere Client UI:

Device

Kernel

Queue

Guest: The counters are not prefixed with Guest, so they are simply called:
• Command Latency
• Write Latency
• Read Latency

With the above understanding, let’s validate with real world values.


I chose the last ESXi since that’s the one with the worst latency.
I plotted Kernel vs Device.
What do you notice? Can you determine which is which?

They don’t correlate. This is expected since both have reasonably good values (my expectation is below 0.5 ms).
The bulk of the latency should come from the Device. In a healthy environment, it should be well within 5 ms. With SSD, it should be even lower. As you can see below, it’s below 1.75 ms. Notice the kernel latency is 0.2 ms at all times except for 1 spike.

What about the Queue latency? It’s part of the kernel latency, so it will always be within it. When the kernel latency value is in the healthy range, the 2 values should correlate, as the value is largely dominated by the Queue. Notice the pattern below is basically identical.


Other Counters
I find the values of Bus Resets and Commands Aborted are always 0.

If you’ve seen a non-zero value, let me know.

I’m not sure what Highest Latency refers to (Guest, Kernel, or Device).
Maximum Queue Depth is more of a property than a metric, as it’s a setting.


Utilization Counters
You get the standard IOPS and Throughput metrics.
IOPS

Throughput: The counter names are:
• Read Rate
• Write Rate
• Usage
All their units are KB/s.
Total IO: This is just the number of reads or writes in the time interval. The counter names are:
• Read Requests
• Write Requests
• Commands Issued

Datastore
For a shared datastore, the metrics do not show the same values as the ones at the datastore object. All these metrics only report from this ESXi host’s viewpoint, not the sum from all ESXi hosts mounting the same datastore.

General metrics
For each datastore, you get the usual IOPS, throughput and latency. They are split into read and write, so you have 3
x 2 = 6 metrics in total. These are the actual names:


You also get 2 additional counters:
• Datastore latency observed by VMs
• Highest latency
I plotted their values and, to my surprise, the metric Datastore latency observed by VMs is much higher. You can see the blue line below. It makes me wonder what the gap is, as there is only VMkernel in between.

The metric Highest Latency is a normalized average of read and write, hence it can be lower.
There is no block size metric, but you can derive it by dividing Throughput by IOPS.
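Here is a minimal sketch of that derivation. It assumes throughput is in KB/s (the unit the disk counters use) and IOPS is the number of commands per second over the same interval. The function name is hypothetical. Note this gives the average block size only; it cannot tell you the largest or smallest block.

```python
from typing import Optional

def average_block_size_kb(throughput_kbps: float, iops: float) -> Optional[float]:
    """Average IO size in KB, or None when there was no IO in the interval."""
    if iops <= 0:
        return None
    return throughput_kbps / iops

print(average_block_size_kb(8192, 256))  # 32.0
```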

Outstanding IO
You can derive the outstanding IO metric from latency and IOPS. I think the latency counter is more insightful. For example, the following screenshot shows hardly any IO in the queue:


However, if you plot latency, you get the same pattern of line chart but with higher values.

You can check whether it’s read or write by plotting each.

The following screenshot shows it’s caused by write latency. This is expected if your reads are mostly served by cache.
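One common way to relate the three numbers is Little’s Law: the average number of IOs in flight equals the arrival rate multiplied by the average time each IO spends in the system. The sketch below applies it under the assumption of a steady workload over the collection interval; it is an approximation, not necessarily how vSphere computes its own outstanding IO counter.

```python
def outstanding_io(iops: float, latency_ms: float) -> float:
    """Estimate average outstanding IO from IOPS and average latency in ms."""
    return iops * latency_ms / 1000.0

print(outstanding_io(1000, 2.0))  # 2.0
```

At 1000 IOPS and 2 ms latency there are only about 2 IOs in flight, which illustrates why a low outstanding IO chart can sit next to a latency chart with much larger numbers.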

Queue Depth
You can also see the queue depth for each datastore. Ensure that the settings match your expectation and are consistent. You can list them per cluster and see their values.


Unmapped
vSphere 7 provides observability into unmap.
The counters do not measure the potential amount to be reclaimed. They measure the unmap operations, both the IOPS generated by unmap operations and the amount being unmapped.

BTW, the counters are not available for vSAN datastores. The ESXi below is part of vSAN, but the Target Objects only list non-vSAN datastores.


Storage DRS
vSphere Client also provides 10 metrics for storage DRS.

Read latency: Datastore normalized read latency
Write latency: Datastore normalized write latency
Read IOPS: Datastore read I/O rate
Write IOPS: Datastore write I/O rate
Read OIO: Datastore outstanding read requests
Write OIO: Datastore outstanding write requests

There is no throughput metric. Notice none of the units are per-second. They are all counts of something (unitless).
You also get total bytes read and written. If you divide this by the collection period, you get the throughput metric.
I’m not sure what the workload metrics are. Are they in percentage? If yes, what is the 100% and how is it determined, as that depends on many factors?

Storage IO Control

I profiled 47 ESXi hosts and found them consistently doing ~18 IOPS when measured at 5-minute intervals.
Take note that the unit for latency is wrongly shown as millisecond.


Network

Just like the case for CPU, memory and disk, there are also 2 layers of networking. The virtual network does not have the limit that the physical network does, if the traffic remains in the box. This makes it harder to use this metric, as the 100% is not statically defined. So instead of just monitoring the throughput metric, you should also check the packet per second metric.
vSphere Client shows the 2 layers side by side (personally I prefer up and down, with the physical layer placed below).

The 2 layers are the physical network and the virtual network:
• Virtual is the port groups. There are 2 types: VMkernel and VM. They do not mix, for security reasons. The VMkernel port group runs specific traffic, such as vMotion and vSAN. The VM port group runs VM traffic.
• Physical is the physical network card, although they are called vmnic. Metrics at this level do not have a per-VM breakdown or a per-VMkernel-interface breakdown.


In vSphere Client, you can’t see the virtual network traffic. The following shows that you can only see the physical
network card.

The metrics are provided at both the physical NIC and the ESXi level. The counter at host level is basically the sum of all the vmnic instances. There could be a small variance, which should be negligible.

Bad Packets
As usual, the first thing to check is whether there is anything wrong. Compared with VM metrics, the vCenter UI provides three additional metrics for ESXi. It can track Packet receive errors, Packet transmit errors, and Unknown protocol frames.

A packet is considered unknown if ESXi is unable to decode it and hence does not know what type of packet it is.
Expect these error packets, unknown packets and dropped packets to be 0 at all times. The following shows the values from a single ESXi:

To see from all your ESXi, use the view “vSphere \ ESXi Bad Network Packets”.


The hosts with RX errors span different clusters, different hardware models and different ESXi build numbers. I can’t check if they belong to the same network.
If you see a value, drill down to see if there is any correlation with other types of packets. In the following example, I do not see any correlation.


What I do see, though, is a lot of irregular collection. I marked some of the data collection points with red dots.

You can see they are irregular. Compare it with the Error Packet Transmit counter, which shows a regular collection.

Dropped Packet

You’ve seen the dropped packet situation at the VM. That’s a virtual layer, behind the ESXi. What do you expect to see at the ESXi layer, as it’s physically cabled to the physical top of rack switches?


I plotted 319 production ESXi hosts, and here is what I got for Transmit. What do you think?

There are packet drops, although they are very minimal. Among 319 hosts, one has 362 dropped transmit packets in the last 3 months. That host was doing 0.6 Gbps on average and peaked at 8.38 Gbps.
As expected, dropped packets rarely happened. At the 99th percentile, the value is exactly 0.
I tested with another set of ESXi hosts. Out of 123 servers, none of them has any dropped TX packet in the last 6 months. That’s in line with my expectation. However, a few of them experienced rather high dropped RX packets.


The drops only happened since the ESXi had an increased load.

If you see something like this, you should investigate which physical NIC is dropping packets, and which VMK interface is experiencing it.
While the number is very low, many hosts have packet drops, so my take is I should discuss with the network team, as I expect the datacenter network to be free of dropped packets.

Received
What do you think you will see for Received?
Remember how VM RX is much worse than VM TX? Here is what I got:


Surprisingly, the situation is the same for ESXi.

Some of them have more than 1 million packets dropped in 5 minutes. Within this set of ESXi hosts, some have regular packet drops, as the value at the 99th percentile is still very high. Notice none of the ESXi hosts is dropping any TX packet.
I plotted the 2nd ESXi from the table, as it has a high value at the 99th percentile. As expected, it has sustained packet drops lasting 24 hours. I marked the time of the highest packet drop, as it mapped to the lowest packets received.

vsish
vsish provides more information that is not available in vSphere Client UI.
vsish -e get /net/portsets/DvsPortset-0/ports/67109026/clientStats
port client stats {
   pktsTxOK:154121
   bytesTxOK:63326625
   droppedTx:0
   pktsTsoTxOK:0
   bytesTsoTxOK:0
   droppedTsoTx:0
   pktsSwTsoTx:0
   droppedSwTsoTx:0
   pktsZerocopyTxOK:45817
   droppedTxExceedMTU:0
   pktsRxOK:339700
   bytesRxOK:257901191
   droppedRx:2620   ← the reason will appear in the next output below
   pktsSwTsoRx:0
   droppedSwTsoRx:0
   actions:0
   uplinkRxPkts:0
   clonedRxPkts:0
   pksBilled:0
   droppedRxDueToPageAbsent:0
   droppedTxDueToPageAbsent:0
}

We saw dropped packets, so we probe deeper for the reason:


vsish -e get /net/portsets/DvsPortset-0/ports/67109026/vmxnet3/rxSummary
stats of a vmxnet3 vNIC rx queue {
   LRO pkts rx ok:0
   LRO bytes rx ok:0
   pkts rx ok:340093
   bytes rx ok:257984247
   unicast pkts rx ok:253678
   unicast bytes rx ok:245663220
   multicast pkts rx ok:42220
   multicast bytes rx ok:7497292
   broadcast pkts rx ok:44195
   broadcast bytes rx ok:4823735
   running out of buffers:2620   ← the reason for 2620 packets dropped
   pkts receive error:0
   1st ring size:512
   2nd ring size:512   ← the ring size is on the small side. I’d say set to 2K.
   # of times the 1st ring is full:354   ← this line shows the first ring is full 354x
   # of times the 2nd ring is full:0
   fail to map a rx buffer:0   ← other reasons look good
   request to page in a buffer:0
   # of times rx queue is stopped:0   ← other reasons look good
   failed when copying into the guest buffer:0   ← other reasons look good
   # of pkts dropped due to large hdrs:0
   # of pkts dropped due to max number of SG limits:0
   pkts rx via data ring ok:0
   bytes rx via data ring ok:0
   Whether rx burst queuing is enabled:0
   current backend burst queue length:0
   maximum backend burst queue length so far:0
   aggregate number of times packets are requeued:0
   aggregate number of times packets are dropped by PktAgingList:0
   # of pkts dropped due to large inner (encap) hdrs:0
   number of times packets are dropped by burst queue:0
   number of times packets are dropped by rx try lock queueing:0
   number of packets delivered by burst queue:0
   number of packets dropped by packet steering:0
   number of memory region lookup pass in Rx.:0
   number of packets dropped due to pkt length exceeds vNic mtu:0
   number of packets dropped due to pkt truncation:0
}
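If you collect this output from many ports, a small parser makes it easier to compare. The sketch below assumes the plain "name:value" layout shown above, with any annotations removed. The function names are hypothetical, and the drop ratio assumes pktsRxOK does not already include the dropped packets, which I have not verified.

```python
def parse_vsish_stats(text):
    """Turn 'name:value' lines from a vsish stats dump into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the opening/closing lines of the block.
        if not line or line.endswith("{") or line == "}":
            continue
        key, sep, value = line.rpartition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value.strip())
    return stats

def rx_drop_ratio(stats):
    """Dropped RX packets as a fraction of all RX packets seen (assumed OK + dropped)."""
    total = stats["pktsRxOK"] + stats["droppedRx"]
    return stats["droppedRx"] / total if total else 0.0

SAMPLE = """\
port client stats {
   pktsRxOK:339700
   droppedRx:2620
}
"""

stats = parse_vsish_stats(SAMPLE)
print(stats["droppedRx"])            # 2620
print(f"{rx_drop_ratio(stats):.4%}")
```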

For networking VMs, such as firewalls and routers, or any VM expecting high packet rates, check if the VM is requesting NetQ RSS.


Unusual Packets

Your VM network should be mostly unicast traffic. So check that broadcast and multicast are within your
expectation. Your ESXi Hosts should also have minimal broadcast and multicast packets.

Consumption Metrics
The throughput (bandwidth consumption) metrics are:


I’m unsure why there are duplicate metrics.


The packet per second metrics are:


Capacity

This is often misunderstood. Both the supply side and the demand side are complex, due to the concepts of usable capacity and unmet demand. CPU and memory also have different formulas. They result in 12 different metrics.

Total Capacity for CPU. Utilization model: unit is GHz. Allocation model: unit is core (not thread).
Usable Capacity for CPU. Utilization model: use VMkernel reservation. Allocation model: manual sizing of vSAN + NSX + VMkernel.
Total Capacity for Memory. Same unit for both models, and we can and should use 1 metric only.
Usable Capacity for Memory. Utilization model: use VMkernel reservation. Allocation model: same, assuming it reflects vSAN and NSX.
Utilization for CPU. Allocation model: configured VM vCPU.
Utilization for Memory. Utilization model: complex, see details below for formula. Allocation model: configured VM memory.

CPU
When you buy a CPU, what exactly is the capacity?
It is tricky as there are 3 factors to consider:
• 2 different units, GHz and cores, where the unit core is used in the allocation model, while the unit GHz is used in the utilization model.
• Hyperthreading. This impacts the 2 models differently.
• Power Management. This should not impact any model, as you do not want variable supply in your capacity calculation.
The GHz unit brings complexity: a CPU with 28 cores at 1 GHz is not the same as a CPU with 14 cores at 2 GHz.
• You can’t run a 16 vCPU VM on a 14-core CPU (assuming it has no HT).
• On the other hand, a CPU intensive application will prefer a faster CPU.


Hyperthreading provides 2x the number of logical processors, but it comes at a high cost. As the core is split into 2, each thread only runs at 62.5% of its potential. That’s a 37.5% reduction!
Power Management can dynamically increase or decrease the speed of individual cores. It does so in basically real time, and it varies per core. I recommend you ignore it so the Total Capacity does not become a variable. By fixing your capacity, you can see if the CPU is on Turbo Boost. The limitation of this approach is your demand metric will likely exceed 100% when Turbo Boost kicks in. So where do you consider this actual speed? You should consider it in cost, performance and sustainability management.

Total Capacity
The good part is the above 3 factors do not alter the fact that the CPU comes with a certain nominal frequency. As total capacity should be a steady number, we will take this static frequency and ignore power management.
Let’s take a simple example. The CPU has 4 cores, sports hyperthreading, and has a 1 GHz nominal frequency (the rated speed as per specification).
• Each core runs at 1 GHz. If you enable HT, each core can run 2 threads, at 0.625 GHz each, accounting wise. I said accounting wise as each thread actually runs at 1 GHz but at 62.5% efficiency. So you get better throughput at the expense of single thread performance.
• Each core has 1.25 GHz total capacity if you enable HT, and 1 GHz if you disable it.
• It’s easier to express the above in percentage. Your 100% is 1.25 GHz. This is where it gets complex, as you need to decouple capacity (space) and performance (speed), because you do not have a 1.25 GHz CPU. It’s not able to run at that speed. An application expecting 1.25 GHz will not have its expectation met.
• You can run at 100% utilization, but at a high performance penalty. For workloads where performance is highly sensitive, stop at 80% (better still, track the CPU Latency counter).
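The accounting above can be written down as a short sketch. The 62.5% per-thread efficiency is the figure used in this chapter, and the function names are hypothetical.

```python
HT_THREAD_EFFICIENCY = 0.625  # each of the 2 threads counts as 62.5% of a core

def core_capacity_ghz(nominal_ghz: float, ht_enabled: bool) -> float:
    """Accounting capacity of one core: 2 threads at 62.5% with HT, else the nominal speed."""
    if ht_enabled:
        return 2 * HT_THREAD_EFFICIENCY * nominal_ghz
    return nominal_ghz

def total_capacity_ghz(cores: int, nominal_ghz: float, ht_enabled: bool) -> float:
    """Utilization model: total capacity in GHz."""
    return cores * core_capacity_ghz(nominal_ghz, ht_enabled)

def logical_cpus(cores: int, ht_enabled: bool) -> int:
    """Allocation model: number of logical CPUs (threads)."""
    return cores * (2 if ht_enabled else 1)

print(total_capacity_ghz(4, 1.0, True))   # 5.0
print(total_capacity_ghz(4, 1.0, False))  # 4.0
print(logical_cpus(4, True))              # 8
```

For the 4-core, 1 GHz example, a single fully busy thread uses 1 GHz out of the 1.25 GHz core capacity, which is where the 80% figure comes from.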
So far so good?
Great, now let’s talk about the allocation model.
• You have 4 cores, 8 threads. What’s the capacity? 4 logical CPUs or 8 logical CPUs?
• The CPU can run 8 vCPU worth of VMs concurrently. By that definition, you do not overcommit when you run 8 vCPU. This approach works with dual-threaded cores. For cores with 4 or more threads, this model will not work, as each thread becomes too small.
• The VMs won’t experience CPU Ready. Sure, they will run slower, but that’s a performance question, not a capacity question. The effect would be the same as having slower hardware, as the VMs are not put in the ready state.
• Core and thread are useful in the Allocation Model, where you do not care about clock speed. If you cared about clock speed, you would change the VM vCPU count depending on the ESXi clock speed. For example, if an 8 vCPU VM is migrated from a 1 GHz ESXi to a 4 GHz ESXi, you would change the VM to 2 vCPU. You obviously don’t do this, as that’s not the same thing.
Recall that the CPU has 4 cores, a 1 GHz nominal clock speed, and sports HT.

Capacity Model: Demand-based
100% is: 5 GHz
Things to note: Running both threads means 100% (not 125%, as that would be illogical since capacity always maxes out at 100%). Running at 80% means one of the threads is fully saturated while the other is idle.

Capacity Model: Allocation-based
100% is: 8 logical CPUs
Things to note: Running 8 vCPU means it’s 1:1 overcommit. BTW, this is what AWS uses. Yes, they use the allocation model and not the utilization model.

You might be curious. What does vSphere Client use?

If you divide 52.68 GHz by 2.2 GHz, you get approximately 24. The box has 24 cores. It consists of 2 sockets x 12 cores per socket.
The ESXi above has 48 logical processors, because HT is enabled.

In the vSphere API, the metric for CPU capacity is derived from summary.hardware.numCpuCores x summary.hardware.cpuMhz.
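As a quick sanity check, the formula can be reproduced with plain numbers. The function below is a hypothetical helper, not an API call, and the 2195 MHz value is my assumption: I picked it because 24 x 2195 MHz gives exactly 52.68 GHz, which suggests the API reports the measured frequency rather than the rounded 2.2 GHz.

```python
def cpu_capacity_ghz(num_cpu_cores: int, cpu_mhz: int) -> float:
    """Total CPU capacity in GHz = number of physical cores x per-core MHz / 1000."""
    return num_cpu_cores * cpu_mhz / 1000.0

print(cpu_capacity_ghz(24, 2195))
print(cpu_capacity_ghz(24, 2200))
```

Note that the number of threads does not appear in the formula, so enabling HT does not change this capacity figure.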

Usable Capacity
Since we have 2 different models for total capacity, we have to have 2 answers for usable capacity.

Capacity Model: Demand-based
Formula: Take out the highest of VMkernel reservation and utilization. This is consistent with memory.

Capacity Model: Allocation-based
Formula: 2 – 4 cores? It all depends on the number of cores you think VMkernel + vSAN + NSX will consume.


Utilization
Compare with the Demand (%) counter to see how far above 100% it goes. The following shows it exceeds 100%, but only marginally and momentarily. When Demand passes 100%, it means the CPU is running hot (high power consumption) and both threads are busy. Buying more cores or a higher frequency could result in the VMs running faster, assuming CPU is the gating factor.

I’d use Utilization (%) for aggressive and Core Utilization (%) for conservative. Frequency scaling is not relevant; hence I do not use Usage. Usage will also inflate the numbers as VMkernel will take advantage of turbo boost. The drawback of this approach is you may see a different number from what vCenter shows, as it uses Usage.
If Core Utilization is not yet 100% or Utilization is not yet 50%, then there are still physical cores available. You can go ahead and deploy new VMs.
If Core Utilization = 100% (meaning Utilization is at least 50%), then review Utilization and ensure it’s not passing your threshold. I’d keep it around 80% - 90% per ESXi, meaning the level at cluster level will be lower as we have an HA host.
If you want to see the number in GHz, then use Usage and Total Capacity. Just don’t be alarmed if Usage hits 100%.
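The rule of thumb above can be expressed as a small decision function. This is only a sketch: the function name and the returned labels are hypothetical, and the 80% default is the lower end of the 80% - 90% range I suggested.

```python
def cpu_headroom(core_utilization_pct: float,
                 utilization_pct: float,
                 threshold_pct: float = 80.0) -> str:
    """Classify an ESXi host using Core Utilization (%) and Utilization (%)."""
    # Physical cores are still available: safe to deploy new VMs.
    if core_utilization_pct < 100.0 or utilization_pct < 50.0:
        return "deploy"
    # All cores are busy, so HT is now doing the work: compare with the threshold.
    if utilization_pct <= threshold_pct:
        return "within threshold"
    return "over threshold"

print(cpu_headroom(90.0, 45.0))   # deploy
print(cpu_headroom(100.0, 70.0))  # within threshold
print(cpu_headroom(100.0, 95.0))  # over threshold
```

This does not replace checking the contention metrics, which remain the primary indicator.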

Check the contention metrics, as always 😊

Memory
In theory, the memory counter should be as simple as this:
Total = VMkernel + VM + Overhead + Free, where
• Total is the hardware memory as reported by BIOS to ESXi. This is basically the physical configured memory.
• VMkernel is the memory used by VMkernel and its loadable modules such as vSAN and NSX.
• VM is the memory used by VMs.
• Overhead is the hypervisor virtualization overhead on each VM. This is typically negligible.
• Free is the memory not yet used.

Capacity
The total capacity is the configured memory. You can use this number for both utilization model and allocation
model.
The usable capacity though, is tricky.
You can’t ignore VMkernel as it does consume resources. It’s also part of how you size your ESXi, especially when you plan to have vSAN and NSX.
ESXi Usable Capacity = Total physical capacity – VMkernel reservation

Using the actual metric names in Aria Operations:

Capacity Available to VMs = Total Capacity - ESX System Usage

Just in case you’re wondering, the name ESX System Usage is a legacy name 😊

Since you take the VMkernel reservation out of the usable capacity, you need to take it out of the demand side to prevent double deduction.
What if the usage exceeds the reservation? We need to account for this extra. This is a rare occurrence. Since the usable capacity metric should be stable for ease of planning, we will account for this in the demand metric. It’s also the right place for it: when usage is higher than reservation, you want to show a higher demand.

Utilization
Unlike CPU, there is no metric for memory demand from vSphere that you can use right away.
Reservation has to be considered. You can have memory that is not used, but if there are reservations from existing powered-on VMs, you should not deploy new VMs. As different VMs can have different reservation and utilization, the total demand has to be based on Sum of Max of (VM Reservation, VM Consumption).

VM Demand
First, we need to calculate the demand from each VM.
Demand = Highest of VM Reservation and VM Utilization

Do we include the powered off VM?

This is not so straightforward, as the VM is already provisioned and can be turned on anytime. However, since vSphere does not consider it, I recommend we do not, for consistency.


We then sum all the powered on VMs.

Total VM Demand = Sum of ( Max (VM Reservation, VM Consumed) )

BTW, Memory Workload (%) = sum of Memory|Utilization (KB) of all VMs / Memory|Demand|Usable Capacity after HA and Buffer (KB).

VMkernel Demand
Remember the corner case where VMkernel usage exceeds its reservation?
This is how we take care of it.
ESXi VMkernel Demand = Max (VMkernel Reservation, VMkernel Usage) – VMkernel Reservation

Unmet Demand
ESXi uses 3 levels to manage memory:

TPS: This happens automatically even if ESXi has plenty of RAM, as it makes sense to do so. It’s not indicative of unmet demand. Sharing the same page is the right thing to do, and not something that should be started only when physical pages are running low.
Balloon: The first sign of unmet demand. It happens proactively, before ESXi is unable to meet Demand. Ballooning reduces cache. It does not mean ESXi is unable to meet Demand. Demand is not met when Contention happens. That’s the only time it is not met.
Compress/Swap: This happens proactively too. It does not mean VMs were contending for RAM. It merely means ESXi Consumed is very high. That Consumed can contain a lot of cache.

Based on the above, my recommendation, if you need to calculate demand, is:

Unmet Demand = Ballooned + Compressed + Swapped + Swapped to Host Cache

Practically, I think Consumed is good enough. It’s operationally hard for it to reach 99%, so in most cases the other 4 metrics are near 0.
Why is memory latency not included? Because that’s about speed, which is not relevant in this context.

Total Demand
Total ESXi Demand = Total VM Demand + VMkernel Demand + Unmet Demand
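Putting the memory formulas together gives the sketch below. All names and sample values are hypothetical. Powered-off VMs are excluded, and VMkernel demand is taken as only the usage above its reservation, which is my reading of the double-deduction remark, so treat that part as an assumption.

```python
from dataclasses import dataclass

@dataclass
class VM:
    reservation_gb: float
    consumed_gb: float
    powered_on: bool = True

def total_vm_demand(vms):
    """Sum of Max(VM Reservation, VM Consumed) over powered-on VMs."""
    return sum(max(vm.reservation_gb, vm.consumed_gb) for vm in vms if vm.powered_on)

def vmkernel_demand(reservation_gb, usage_gb):
    """Only the VMkernel usage above its reservation counts as demand."""
    return max(reservation_gb, usage_gb) - reservation_gb

def unmet_demand(ballooned_gb, compressed_gb, swapped_gb, swapped_to_host_cache_gb):
    return ballooned_gb + compressed_gb + swapped_gb + swapped_to_host_cache_gb

def usable_capacity(total_gb, vmkernel_reservation_gb):
    return total_gb - vmkernel_reservation_gb

vms = [VM(4, 6), VM(8, 2), VM(16, 16, powered_on=False)]
demand = total_vm_demand(vms) + vmkernel_demand(50, 52) + unmet_demand(0, 0, 0, 0)
print(demand)                    # 16
print(usable_capacity(737, 50))  # 687
```

Comparing the total demand against the usable capacity, rather than the total capacity, avoids counting the VMkernel reservation twice.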


Chapter 4

esxtop


Overview

I put esxtop in a separate chapter as it covers both VM and ESXi. While the manual uses the term Guest, esxtop does not actually have any Guest OS metrics. You should distinguish between Guest OS and VM.
The view from a VM (consumer) and the view from ESXi (provider) are different. vCPU is a construct of a VM, while core and thread are constructs seen by ESXi. I hope a future version of esxtop segregates this better. You get to see both VM level and ESXi level objects at the same time. It is confusing for a newbie, but convenient for a power user, and if you’re looking at esxtop, you are a power user 😊

Now that we have covered many of the metrics, the esxtop output should be easier to understand. This documentation is not about how to use esxtop, but about what the metrics mean and their relevance in operations management.
The nature of esxtop means it is excellent for performance troubleshooting, especially real-time and live situations where you know the specific ESXi Host. The tool is not so suitable for capacity management, where you need to look at the long term (often weeks or months). As a result, I cover the contention metrics first, followed by consumption.
I have not had the need to use some of the metrics, hence I don’t have much guidance on them. If you do, let’s collaborate.

Grouping
The esxtop screen groups the metrics into 10, as shown below:

There are relationships among some of the 10 panels, but they are not obvious as the UI simply presents them as a
list. To facilitate understanding of the metrics, we need to group them differently.


So instead of documenting the 10 panels, I’d group them into 4.

Group: CPU. Consumer: Yes. Provider: Sort of.
Remarks: The CPU panel has a 4 line summary that provides the provider’s viewpoint. I moved the Power Management panel here as it only covers CPU. It does not cover memory, disk, network and other parts of the box (e.g. fan, motherboard). It complements the CPU panel as it covers the provider’s viewpoint. Take note that it does not show at socket level. And if you enable HT, it does not show at core level. I moved the interrupt panel here as it’s about CPU.

Group: Memory. Consumer and Provider: 1 shared panel for both.
Remarks: Provider and Consumer are shown in 1 panel. The panel has a summary at the top, which covers the provider’s viewpoint.

Group: Storage. Consumer: Yes. Provider: Almost.
Remarks: The Disk VM panel covers the consumer’s viewpoint. The Disk Adapter panel and Disk Device panel cover the provider’s viewpoint, and are best analyzed together. BTW, do you notice the Path panel is missing? I moved the vSAN panel here as all the metrics are disk metrics. There is no vSAN network and CPU counter, but you can see them in the respective network and CPU panels.

Group: Network. Consumer and Provider: 1 shared panel for both.
Remarks: Provider and Consumer are shown in 1 panel. I moved RDMA device here as it’s about the network card.


CPU

The CPU panel begins with a summary of the load average in the last 1 minute, 5 minute and 15 minutes,
respectively. As shared, utilization is a secondary counter, supporting contention. So focus on contention first before
looking at these 3 numbers. In addition, in a large ESXi with many cores, an imbalance can mask out a busy core.

The next 3 lines cover Used (%), Utilization (%) and Core Utilization (%). The reason why I swapped the order in the
book is that Used (%) is built upon Utilization, and it’s a more complex counter.
The white vertical line shows where I cut the screenshot, as the text became too small and unreadable if I were to
include all of it. Anyway, it just repeats for each CPU thread.
At the end of each of the 3 lines (after the white line in the preceding screenshot), there is NUMA information. It
shows the average value across each NUMA node (hence there are 2 numbers as my ESXi has 2 NUMA nodes). The number
after AVG is the whole box, system wide average. The per NUMA node metric values are useful to easily identify if a
particular NUMA node is overloaded.
Take a look at the panel below. It mixes VM and non VM processes in a single table.

If you want to only show VMs, just type the capital letter V.
• Name based filtering allows regular expression based filtering for groups and worlds.
• Type the capital letter G to only show groups that match a given string. This is useful when a host has a large
number of VMs and you want to focus on a single VM or a set of interesting VMs.
• Once a group is expanded, you can type the small letter g to show only the worlds that match the given
string. This is useful when running a VM with many vCPUs and you want to focus on specific worlds like
storage worlds or network worlds.
If you want to see all, how do you tell which ones are VMs? I use the %VMWAIT column. This tracks the various waits
that a VM world gets, so it does not apply to non-VM worlds.


Notice the red dot in the picture. Why is the Ready time so high for the system process?
Because this group includes the idle thread. Expand the GID and you will see Idle listed.
There are many columns, as shown below. The most useful one is the %State Times, which you get by pressing F.

The rest of the information is relatively static or does not require sub-20 second granularity.

CPU State
We covered earlier in the CPU Metric that there are only 4 states. But esxtop shows a lot more metrics.

So what does it mean? How come there are more than 4 states?
The answer is below. Some of these metrics are included in the other metrics.

Review the metrics below, starting with %USED. Which one does not actually belong to a CPU state, meaning it’s not
something you mix with the rest?


That’s right, it’s %USED.

%USED It should be excluded from this panel as it is influenced by power management and hyperthreading. We
explained the reason why in CPU Metric chapter. That’s why it’s necessary to review the VM CPU states before
reading each esxtop metric.
%RUN Run is covered in-depth here under VM CPU Metrics.
%SYS System time is covered in-depth here under VM CPU Metrics.
%WAIT, %VMWAIT, %SWPWT, %IDLE The wait counter and its components are covered in-depth here under VM CPU
Metrics. VMWAIT includes SWPWT. vRealize Operations does not show VM Wait and uses a new counter that excludes
Swap Wait. The reason is the remediation action is different. You’re welcome.
%RDY Ready is covered in-depth here under VM CPU Metrics. As discussed in the CPU scheduling, each vCPU has its
own ready time. In the case of esxtop, the metric is simply summed up, so it can go >100% in theory.
%CSTP Co-Stop is covered in-depth here under VM CPU Metrics. This is also 100% per vCPU.
%OVRLP Overlap is covered in-depth here under VM CPU Metrics.
%MLMTD MLMTD is Max Limited, not some Multi Level Marketing scam 😊. It measures the time the VM was halted due
to a manually set limit, as opposed to the VMkernel having no CPU resource.
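Since esxtop sums %RDY across the vCPUs of a VM, a large VM can show a scary number that is modest per vCPU. The following is a minimal sketch of normalizing the group value. The function name and the sample readings are hypothetical, and it assumes the contribution of non-vCPU worlds in the group is negligible.

```python
def ready_per_vcpu(group_ready_pct: float, vcpu_count: int) -> float:
    """Average %RDY per vCPU, given the summed group value shown by esxtop.

    Assumption: the group value is dominated by the vCPU worlds.
    """
    if vcpu_count <= 0:
        raise ValueError("vcpu_count must be positive")
    return group_ready_pct / vcpu_count


# Hypothetical readings: an 8-vCPU VM showing 120% and a 2-vCPU VM showing 40%.
print(ready_per_vcpu(120.0, 8))  # 15.0
print(ready_per_vcpu(40.0, 2))   # 20.0
```

In this made-up example the smaller VM is actually the one with more ready time per vCPU, even though its group number looks lower.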

CPU Event Count

SWTCH/s Number of world switches per second, the lower the better. I guess this number correlates with
the overcommit ratio, the number of VMs and how busy they are.
What number will be a good threshold and why?


MIG/s Number of NUMA and core migrations per second. It will be interesting to compare 2 VMs, where 1
is the size of a single socket, and the other is just a bit larger. Would the larger one experience a
lot more switches?
WAKE/s Number of times the world wakes up per second. A world wakes up when its state changes from
WAIT to READY. A high number can impact performance.

The metric QEXP/s (Quantum Expirations per second) has been deprecated from ESXi 6.5 in an effort to improve
vCPU switch time.
In the rare case where the application has a lot of micro bursts, CPU Ready can be high relative to its CPU Run. This
is due to the CPU scheduling cost. While each scheduling is negligible, having too many of them may register on the
counter. If you suspect that, check esxtop, as shown below:

Power Stats
This complements the power management panel as it lists per VM and kernel module, while the power panel lists
per ESXi logical CPU.

POWER Current CPU Power consumption in Watts. So it does not include memory, disk, etc.


Summary Stats

Other than the first 3 (which I’m unsure why they are duplicated here as they are shown in the CPU State already),
the other metrics do not exist in vSphere Client UI.

%LAT_C This is covered in-depth here: CPU Contention


%LAT_M This is covered in-depth here: Memory contention
%DMD This is covered in-depth here: CPU Demand
EMIN This is the minimum amount of CPU in MHz that the world will get when there is not
enough for everyone.
TIMER/s Timer rate for this world
AFFINITY BIT MASK Bit mask showing the current scheduling affinity for the world.
Not set for Latency Sensitive = High VMs
CPU The physical or logical processor on which the world was running when esxtop obtained this
information.
EXC_AF Yes means the VM has exclusive affinity. This happens when you enable the Latency Sensitivity
setting. Use this feature very carefully.

The column HTQ is no longer shown in ESXi 7.0. In earlier releases, this indicated whether the world is quarantined
or not. ‘N’ means no and ‘Y’ means yes.

CPU Allocation

AMIN Allocation Minimum. Basically, the reservation


AMAX Allocation Maximum. Basically, the limit.
ASHRS Allocation shares


AMLMT Max Limited. I’m unsure if this is when it’s applied or not.
AUNITS Units. For VM, this is in MHz. For VMkernel module, this is in percentage.

Power Consumption
Power management is given its own panel. This measures the power consumption of each physical thread. If you
disable hyper-threading, then it measures at the physical core.

The Power Usage line tracks the current total power usage (in Watts). Compare this with the hardware
specification. Power Cap shows the limit applied. You only apply this hard limit when there is insufficient power
supply from the rack.
The PSTATE MHZ line tracks the CPU clock frequency for each state.
Now let’s go into the table. It lists all the physical cores (or threads if you enable HT). Note it does not group
them by socket.

%USED Used (%) metric is covered in-depth here under ESXi CPU metric section.
%UTIL Utilization (%) metric is covered in-depth here under ESXi CPU metric section.
%CState, %TState Percentage of time spent in a C-State, P-State and T-State.
Power management is covered here under ESXi CPU metric section.
%A/MPERF Actual / Measured Performance, expressed in percentage. The word measured in this case
means the nominal or static value. So a value above 100% means Turbo, while a value below
100% means power saving kicked in. If this number is not what you are expecting, check the
power policy settings in BIOS and ESXi.

The following screenshot shows ESXi with 14 P-States, where P0 is represented as 2401 MHz. Each row is a physical
thread as HT is enabled.
See PCPU 10 and 11 (they share core 6). What do you notice?


Utilization (%) shows 100% for both. This means both threads run, hence competing.
The core is in Turbo Boost. The %A/MPERF shows frequency increase of 30% above nominal. The core is in C0 state
and P0 state. This counter was introduced in ESXi 6.5. No, it is not in the vSphere Client UI.
Why are Used (%) for PCPU 10 and 11 showing ~63% and 62.9%?
Unlike Utilization (%) which adds up to 200%, Used (%) adds up to 100%. So each thread is maximum 50%.
But Used (%) considers frequency scaling. So 50% x 130% = 65%. Pretty close to the numbers shown there.
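The arithmetic above can be written as a tiny helper. This is only a sketch of the reasoning in this section for the case where both hyper-threads of a core are busy. The function and constant names are mine, not an esxtop API.

```python
# Each thread is capped at 50% of the core when both hyper-threads run.
HT_SHARE_PCT = 50.0


def used_pct_when_core_is_shared(a_mperf_pct: float) -> float:
    """Expected Used (%) per thread: the thread's share scaled by %A/MPERF."""
    return HT_SHARE_PCT * a_mperf_pct / 100.0


# %A/MPERF of 130 means the core runs 30% above nominal frequency (Turbo).
print(used_pct_when_core_is_shared(130.0))  # 65.0
# At nominal frequency the same thread would show 50%.
print(used_pct_when_core_is_shared(100.0))  # 50.0
```

The screenshot values (~63% and 62.9%) sit slightly below the computed 65%, which fits the book's "pretty close" remark.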

Interrupt
This panel captures the interrupt vectors. In the following screenshot, I’ve added 2 vertical white lines to show
where I cropped the screenshot. It’s showing the value of each CPU thread, so the column became too wide.

COUNT/s Total number of interrupts per second. This value is cumulative of the count for every CPU.


COUNT_x Count 0, Count 1, etc.
Interrupts per second on CPU x. My guess is CPU 0 is the first thread in the first core in the
first socket.
TIME/int Average processing time per interrupt (in microseconds).
It will be interesting to profile this for each type of interrupt.
TIME_x Time 0, Time 1, etc.
Average processing time per interrupt on CPU x (in microseconds).
DEVICES Devices that use the interrupt vector. If the interrupt vector is not enabled for the device, its
name is enclosed in angle brackets (< and >).

To see the list of devices, issue the command at ESXi console: sched-stats -t sys-service-stats. You will get something
like this:
service count time maxElapsed maxService name
32 98973493 171.267 0.000 0.000 VMK-lsi_msgpt3_0
33 93243036 153.993 0.000 0.000 VMK-lsi_msgpt3_0
34 1783955246 1841.025 0.000 0.000 VMK-igbn-rxq0
36 4 0.000 0.000 0.000 VMK-Event
37 167025903 418.733 0.000 0.000 VMK-xhci0-intr
51 242318260 792.014 0.000 0.000 VMK-0000:19:00.1-TxRx-0
60 21281764 80.125 0.000 0.000 VMK-vmw_ahci_00003b000
244 176227 0.090 0.000 0.000 VMK-timer-ipi
245 1250405 0.163 0.000 0.000 VMK-monitor
246 1868139923 340.709 0.000 0.000 VMK-resched
248 414047027 189.255 0.000 0.000 VMK-tlb
4096 3193917027 1321.416 0.000 0.000 0_2nd-level-intr-handler
4097 304258696 193.711 0.000 0.000 1_smpcall
4099 246 0.003 0.000 0.000 3_VOB-Wakeup
4100 35706272 6.186 0.000 0.000 4_TimerBH
4101 399313616 10339.744 0.000 0.000 5_fastSlab
4104 859208 7.851 0.000 0.000 8_logEvent
4105 109560008 158.914 0.000 0.000 9_netTxComp
4106 26 0.197 0.196 0.196 10_keyboard
4107 56 0.001 0.000 0.000 11_SMIEnableCountPCPU-bh
4165 365305096 2433.530 0.001 0.001 TCPIPRX
4167 54024607 55.359 0.000 0.000 SCSI
4171 54520415 124.983 0.000 0.000 START-PATH-CMDS
4173 55109136 254.927 0.000 0.000 COMPL.-ADAPTER-CMD
4174 55102189 85.804 0.000 0.000 START-ADAPTER-CMDS
4180 5254928064 13877.461 0.001 0.001 Netpoll

BTW, some services may be combined and reported under VMK-timer. For example, IOChain from vSphere Distributed
Switch does not appear on its own.
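With this many rows, it helps to sort the sched-stats listing above rather than read it by eye. The sketch below parses a few rows copied from that output and ranks services by the time column. The parser, the class name and the column interpretation are my assumptions. The unit of the time column is not stated in the output, so treat the ranking as relative only.

```python
from typing import List, NamedTuple


class ServiceStat(NamedTuple):
    service: int
    count: int
    time: float
    name: str


# A few rows copied from the listing above (header included on purpose).
SAMPLE = """\
service count time maxElapsed maxService name
34 1783955246 1841.025 0.000 0.000 VMK-igbn-rxq0
246 1868139923 340.709 0.000 0.000 VMK-resched
4101 399313616 10339.744 0.000 0.000 5_fastSlab
4165 365305096 2433.530 0.001 0.001 TCPIPRX
4180 5254928064 13877.461 0.001 0.001 Netpoll
"""


def parse_service_stats(text: str) -> List[ServiceStat]:
    """Parse whitespace-separated rows; skip the header and malformed lines."""
    stats = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        stats.append(ServiceStat(int(fields[0]), int(fields[1]),
                                 float(fields[2]), fields[5].strip()))
    return stats


def top_by_time(stats: List[ServiceStat], n: int = 3) -> List[ServiceStat]:
    """Return the n services with the largest value in the time column."""
    return sorted(stats, key=lambda s: s.time, reverse=True)[:n]


top = top_by_time(parse_service_stats(SAMPLE))
for s in top:
    print(f"{s.name:<16} count={s.count:<12} time={s.time}")
```

On this sample, Netpoll, 5_fastSlab and TCPIPRX come out on top, which matches what you see when scanning the full listing.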


Memory

The top part of the screen provides a summary at ESXi level. It is handy for seeing the overall picture, before
diving into each VM or VMkernel module.

MEM overcommit avg Average memory overcommit level in the last 1 minute, 5 minutes, and 15 minutes,
respectively. Calculation is done with Exponentially Weighted Moving Average.
Memory overcommit is the ratio of total requested memory to the "managed memory", minus 1. According to this,
VMKernel computes the total requested memory as a sum of the following components:
1. VM configured memory (or memory limit setting if set),
2. the user world memory,
3. the reserved overhead memory.
If the value is > 0 (that is, the ratio before subtracting 1 is > 1), it means that total requested VM memory is
more than the physical memory available. This is fine, because ballooning and page sharing allow memory
overcommit.
I’m puzzled why we mix allocation and utilization. No 1 and no 3 make sense, but what exactly is no 2? My
recommendation is you simply take the configured VM memory and ignore everything else. While it’s less accurate,
since the purpose is capacity and not performance, it’s more than good enough and it’s easier to explain to
management. There is no need to get other details.
PMEM Physical Memory.
Total = vmk + Other + Free
Total is what is reported by BIOS.
vmk is ESXi VMkernel consumption. This includes kernel code section, kernel data and heap, and other VMKernel
management memory.
Other is memory consumed by VM and non VM (user-level process that runs directly on the kernel).
VMKMEM VMkernel memory. The following metrics are shown:


• Managed. The memory space that ESXi manages. Typically this is slightly smaller than the total physical memory,
as it does not contain all the components of vmk metric. It can be allocated to VM, non VM user world, or the
kernel itself.
• Minfree. The minimum amount of machine memory that VMKernel would like to keep free. VMKernel needs to keep
some amount of free memory for critical uses. Note that minfree is included in Free memory, but the value tends to
be negligible.
• Reserved. The sum of the reservation setting of the groups + the overhead reservation of the groups + minfree.
I think by group it means the world or resource pool.
• Unreserved. It is the memory available for reservation.
I have not found a practical use case for the above 4 metrics. If you do, let me know!
State is the memory state. You want this to be in the high state.
NUMA In the preceding screenshot, there are 2 NUMA nodes.
For each node there are 2 metrics: the total amount and the free amount.
Note that the sum of all NUMA nodes will again be slightly smaller than total, for the
same reason why VMkernel managed is less than total.
If you enable Cluster-on-Die feature in Intel Xeon, you will see 2x the number of nodes.
For details, see this by Frank Denneman.
PSHARE shared: the amount of VM physical memory that is being shared.
common: the amount of machine memory that is common across Worlds.
saving: the amount of machine memory that is saved due to page-sharing.
SWAP Swapped counter is covered here under VM memory. What “cannot” be zipped is
swapped. What you see on this line is sum of all the VMs.
The metric rclmtgt shows the target size in MB that ESXi aims to swap.
ZIP Zipped counter is covered here under VM memory. What you see on this line is sum of
all the VMs.
MEMCTL Memory Control, also known as ballooning is covered here under VM memory. What
you see on this line is sum of all the VMs.
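The MEM overcommit avg definition in the table above can be sketched in a few lines. This is an illustration of the stated formula, not the VMkernel implementation. The function name and all the sample sizes are hypothetical.

```python
from typing import Iterable


def memory_overcommit(vm_configured_mb: Iterable[float],
                      user_world_mb: float,
                      overhead_reserved_mb: float,
                      managed_mb: float) -> float:
    """Overcommit = total requested memory / managed memory - 1."""
    requested = sum(vm_configured_mb) + user_world_mb + overhead_reserved_mb
    return requested / managed_mb - 1.0


# Hypothetical host: 128000 MB managed, three VMs of 64 GB, 64 GB and 32 GB.
vms = [65536, 65536, 32768]
full = memory_overcommit(vms, user_world_mb=1024,
                         overhead_reserved_mb=1536, managed_mb=128000)
print(f"{full:.2f}")  # 0.30

# The simplification recommended above: configured VM memory only.
simple = memory_overcommit(vms, 0, 0, 128000)
print(f"{simple:.2f}")  # 0.28
```

In this example the two numbers differ by only 0.02, which illustrates why taking the configured VM memory alone is usually good enough for capacity discussions.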

There are a lot of metrics in many panels. It’s easier to understand if we group them functionally.

Contention
As usual, we start with the contention-type of metrics.


Balloon
I start with Balloon as this is the first level of warning. Technically, this is not a contention. Operationally, you want to
start watching as Balloon only happens at 99% utilization. So it’s high considering you have HA enabled in the cluster.

MCTL? ‘Y’ means the line is a VM, as VMkernel processes are not subject to ballooning.
MCTLSZ (MB) Memory Control Size is the present size of memory control (balloon driver). If larger than 0, the
host is forcing the VM to inflate its balloon driver to reclaim memory, as the host is overcommitted.
MCTLTGT (MB) Amount of physical memory the ESXi system attempts to reclaim from the resource pool or
VM by way of ballooning. If this is not 0 that means the VM can experience ballooning.
MCTLMAX (MB) Maximum amount of physical memory the ESXi system can reclaim from the resource pool or
VM by way of ballooning. This maximum depends on the type of Guest OS.

Compressed
I think that Swap and Compressed should be shown together as what can’t be compressed is swapped.
Why am I showing Compressed first?
Because it’s faster than swapped.

CACHESZ (MB) Compression memory cache size.


CACHEUSD (MB) Used compression memory cache
ZIP/s (MB/s) The rate at which memory pages are being zipped. Once zipped, it’s not immediately available
for the VM.
This is a capacity problem. Your ESXi needs more RAM. If the pages being zipped are unused,
the VMs will not experience memory contention.
Keep this number 0. See Capacity chapter for details.
UNZIP/s (MB/s) The rate at which memory pages are being unzipped so they can be used by the VM.
This is a performance problem. The pages are being asked for. The VM CPU is waiting for the
data. If you check the VM memory contention counter, it will not be 0%. Make sure that number
is within your SLA or KPI.


Swapped

SWCUR (MB) Swapped Current is the present size of swapped memory. It typically contains inactive
pages.
SWTGT (MB) The target size the ESXi host expects the swap usage by the resource pool or VM to be. This is
an estimate.
SWR/s (MB), SWW/s (MB) Swapped Read per second and Swapped Write per second. The amount of memory in
megabytes that is being brought back to memory or being moved to disk.
LLSWR/s (MB), LLSWW/s (MB) These are similar to SWR/s and SWW/s but are about host cache instead of disk.
LLSWR/s is the rate at which memory is read from the host cache. The reads and writes are attributed to the VMM
group only, so they are not displayed for VM.
LL stands for Low Latency as host cache is meant to be faster (lower latency) than physical disk.
Memory to host cache can be written from both the physical DIMM and disk. So the counter
LLSWW/s covers all these sources, and not just from physical DIMM.

NUMA

NHN Current home node for the resource pool or VM. This statistic is applicable only on NUMA systems.
If the VM has no home node, a dash (-) appears.
When you enable CPU Hot Add, esxtop will report multiple home nodes. It also does not distinguish
remote and local memory as memory is interleaved. For more information, see this by Frank.
NMIG Number of NUMA migrations. It gets reset upon power cycle.
Migration is costly as all pages need to be remapped. Local memory starts at 0% again and
grows over time. Copying memory pages across NUMA boundaries costs memory bandwidth.
NRMEM (MB) Current amount of remote memory allocated to the VM or resource pool. Ideally this amount
is 0. You increase the chance by making the Configured RAM small. A VM whose configured
memory is larger than the ESXi RAM attached to a socket has a higher chance of having remote
memory.


N%L Current percentage of memory allocated to the VM or resource pool that is local.
Anything less than 100% is not ideal.
GST_NDx (MB) Guest memory allocated for a resource pool on NUMA node x, where GST_ND0 means the
first node. The following screenshot shows the VMware vCenter VM runs on node 2 while
the vRealize-Operat VM runs on node 1.

OVD_NDx (MB) VMM overhead memory allocated for a resource pool on NUMA node x, where x starts with 0
for the first node.

Consumption
I group metrics such as consumed, granted, and overhead under utilization as they measure how much the VM or
VMkernel module consumes.

Consumed
What are the use cases where you actually need these metrics? I think it’s quite rare. Review this rightsizing and
reclamation, as the answers might surprise you 😉

MEMSZ (MB) Amount of physical memory allocated to a resource pool or VM. The values are the same for
the VMM and VMX groups.
MEMSZ = GRANT + MCTLSZ + SWCUR + "never touched"
GRANT (MB) Granted is covered here. Do not confuse it with Consumed.
CNSM Yup, this is that legendary Consumed metric.
SZTGT (MB) Size Target in MB.
Amount of machine memory the ESXi VMkernel wants to allocate to a resource pool or VM.
The values are the same for the VMM and VMX groups.
TCHD (MB) Amount of touched pages in MB.
Working set estimate for the resource pool or VM. The values are the same for the VMM and
VMX groups.
TCHD_W As per above, but only for the write operations. A relatively much lower value compared to
TCHD means the activities are mostly read.
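The MEMSZ identity in the table above can be rearranged to derive the "never touched" portion, which esxtop does not show as its own column. The sketch below simply applies that formula. The function name and the sample values are hypothetical.

```python
def never_touched_mb(memsz: float, grant: float,
                     mctlsz: float, swcur: float) -> float:
    """Rearranged from MEMSZ = GRANT + MCTLSZ + SWCUR + never touched."""
    return memsz - grant - mctlsz - swcur


# Hypothetical 16 GB VM: 12000 MB granted, 1024 MB ballooned, 512 MB swapped.
print(never_touched_mb(memsz=16384, grant=12000, mctlsz=1024, swcur=512))  # 2848
```

A large remainder on a long-running VM hints that the Guest OS has never needed that memory, which ties in with the rightsizing discussion.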

Overhead
I find overhead is a small amount that is practically negligible, considering ESXi nowadays sports a large amount of
RAM. Let me know the use case where you find otherwise.

OVHD (MB) Current space overhead for resource pool.


OVHDMAX (MB) Maximum space overhead that might be incurred by resource pool or VM.
OVHDUW (MB) Current space overhead for a user world. It is intended for VMware use only.

Shared
With Transparent Page Sharing limited to within a VM, I think shared becomes limited. Let me know the use case
where you see it is material in your operations.

ZERO (MB) Resource pool or VM physical pages that are zeroed.


SHRD (MB) Resource pool or VM physical pages that are shared.
SHRDSVD (MB) Machine pages that are saved because of resource pool or VM shared pages
COWH (MB) Copy on Write Hint. An estimate of the amount of Guest OS pages for TPS purpose.


Active

%ACTV Active is covered in-depth here.


%ACTVS, %ACTVF Percentage Active Slow and Percentage Active Fast.
Slow is the slow moving average, taking a longer period. Longer is more accurate.
I don’t have a use case for the fast moving average.
%ACTVN Percentage Active Next. It predicts what %ACTVF will be at the next sample estimation. It is
intended for VMware use only.

Committed
Committed page means the page has been reserved for that process. Commit is a counter for utilization but it’s not
really used, especially for VM.
Note: none of these metrics exist in vSphere Client, as they are meant for internal use.

MCMTTGT Minimum Commit Target in MB. I think this value is not 0 when there is reservation, but I’m
not sure.
CMTTGT Commit Target in MB.
CMTCHRG Commit Charged in MB. I think this is the actual committed page.
CMTPPS Commit Pages Per Share in MB

Allocation & Reservation


I’m not placing them under utilization as they are not actual consumption.


AMIN Allocation minimum.
This is the term esxtop uses for memory reservation for this resource pool or VM. A value of 0
means no reservation, which is what you should set for most VMs. Reservation for VMkernel
modules should be left as it is.
AMAX Allocation maximum.
This is the term esxtop uses for memory limit for this resource pool or VM. A value of -1
means Unlimited.
Limit for VMkernel modules should be left as it is.
AMLMT Limit. You should expect the value -1, which means no limit assigned.
I’m not sure how this differs from AMAX.
ASHRS Memory shares for this resource pool or VM.
AUNITS This displays the units of the allocation counters.

Checkpoint
Checkpoint is required in snapshot or VM suspension. You can convert a VM checkpoint into a core dump file, to
debug the Guest OS and applications.

CPTRD (MB) Checkpoint Read. Amount of data read from checkpoint file. A large amount can impact the
VM performance.
CPTTGT (MB) Checkpoint Target. The target size of checkpoint file that VMkernel is aiming for.
I’m unsure why it needs to have a target, unless this is just an estimate of the final size and
not a limit.


Storage

The Storage monitoring sports 3 panels:
• VM
• Adapter
• Device
We covered in Part 2 Chapter 4 Storage Metrics that an ESXi host has adapters, paths and devices. I’m unsure why
esxtop does not have a panel for path. It would be convenient for checking dead or inactive paths, as their values
will be all 0. If your design is active/active, it can be useful to check that the throughput is not lopsided across
paths.
Datastore is also missing. While VMFS can be covered with Device (if you do 1:1 mapping and not using extent), NFS
is not covered.
On the other hand, esxtop does provide metrics that vSphere Client does not. I will highlight those.
ESXi uses adapter to connect to device. As a result, their main contention and utilization metrics are largely similar.
I’ve put them side by side here, and highlight the similar metric groups with vertical green bar. I highlighted the word
group, as the group name may be identical, but the actual metrics within the group differ.


Disk VM panel
We begin with VM as that’s the most important one. It complements vSphere Client by providing unmap and IO
Filter metrics.
You can see at VM level, or virtual disk level. In the following screenshot, I’ve expanded one of the VMs.

Contention
There are only 2 metrics. There is no outstanding IO metric.

LAT/rd Average latency (in milliseconds) per read.


LAT/wr Average latency (in milliseconds) per write.

Utilization

CMDS/s, READS/s, WRITES/s Count of disk IO commands issued per second. This is basically IOPS.
Both the Read IOPS and Write IOPS are provided.
MBREAD/s, MBWRTN/s Total disk amount transferred per second in MB. This is basically throughput.
Both the read throughput and write throughput are provided.

Unmap
It has unmap statistics. This is useful as there is no such information at VM level in vSphere Client. In the UI,
you can only see it at ESXi level.


SC_UMP/s, FL_UMP/s, UMP/s Successful, Failed and Total Unmaps per second.
Unmap can fail for a variety of reasons. One example that was addressed in vSphere 6.7 Patch
ESXi670-202008001 and documented in KB is that the Guest OS does not refresh unmap granularities
and keeps sending unmap based on the older value. Eventually the limit is reached and the operation fails.
SC_UMP_MBS/s, FL_UMP_MBS/s As above, but in MB/second.

IO Filter
I/O Filter in ESXi enables VMkernel to manipulate the IO sent by Guest OS before processing it. This obviously opens
up many use cases, such as replication, caching, Quality of Service, encryption.
There is no such metric in vSphere Client. You will not find IO Filter metrics at either the VM object or the ESXi
object.

NUMIOFILTERS Number of IO Filters


IOFILTERCLASS Type of IO Filter Class
FAILEDIO I think Failed IO should be 0 at all times.
TOTALIO
LATENCY I’m unsure if this latency measures the additional overhead introduced by IO Filter, or the
total latency as seen by the VM.

Configuration

ID Resource pool ID or VSCSI ID of VSCSI device.


GID Resource pool ID.
VMNAME Name of the resource pool.


VSCSINAME Name of the VSCSI device.


NDK Number of VSCSI devices

Disk Adapter
ESXi uses adapter to connect to device, so let’s begin with adapter, then device.
The panel has a lot of metrics and properties, so let’s group them for ease of understanding.

Errors
Since you check availability before performance, let’s check the errors first. This type of problem is best monitored
as accumulation within the reporting period as any value other than 0 should be investigated.
BTW, none of these metrics are available at vSphere Client UI.

FCMDS/s Number of failed commands issued per second. How does this differ to Reset and Aborted?
FREAD/s Number of failed read commands issued per second.
FWRITE/s Number of failed write commands issued per second.

FMBRD/s Megabytes of failed read operations per second.


FMBWR/s Megabytes of failed write operations per second.
CONS/s Number of SCSI reservation conflicts per second. This number should stay 0?
FRESV/s Number of failed SCSI reservations per second, if the conflict can’t be solved timely.
RESV/s Number of SCSI reservations per second. This number should stay within the limit, but how to
know what the limit is?
ABRTS/s Number of commands cancelled per second.
RESETS/s Count of disk commands reset per second.

Queue
For storage, the queue gives insight into performance problem. It’s an important counter so I was hoping there will
be more, such as the actual queue.

AQLEN Current queue depth of the storage adapter. This is the maximum number of ESX Server VMkernel
active commands that the adapter driver is configured to support.
This counter is not available in vSphere Client UI.

Contention
As explained in Part 1 Chapter 2 of the book, check contention before utilization.
From esxtop context, there are 2 major layers in the VMFS storage stack:
• VMkernel. This is measured by the KAVG counter and QAVG counter.
• Device. Well, that basically means from the HBA to the device and back. VMkernel cannot actually see
anything in between, so there is no breakdown. This entire round trip is measured by the DAVG counter.
From the VM (not Guest OS), the end to end latency is represented by the metric GAVG. It’s simply KAVG + DAVG,
where Guest latency = Kernel latency + Device latency.
Frank Denneman, whose blog and book are major sources in this book, shows the relationship in the following
diagram:

DAVG measures the time from ESXi physical card to the array and back. Typically, there is a storage fabric in the
middle. The array typically starts with its frontend ports, then CPU, then cache, backend ports, and physical spindles.
So if DAVG is high, it could be the fabric or the array. If the array is reporting low value, then it’s the fabric or
the HBA configuration.
For further reading, review this explanation by Frank, as that’s where I got the diagram from.
I’m unsure what DAVG measures when it’s vSAN and the data happens to be local.
QAVG, which is queue in the kernel, is part of KAVG. If QAVG is high, check the queue depths at each level of the
storage stack. Cody explains why QAVG can be higher than KAVG here.
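The relationship GAVG = KAVG + DAVG lends itself to a quick first-pass triage. The sketch below computes the guest latency and reports which layer dominates. The "look at the larger component first" heuristic and the function names are my assumptions for illustration, not a rule from esxtop or the sources cited above.

```python
def guest_latency_ms(kavg_ms: float, davg_ms: float) -> float:
    """GAVG = KAVG + DAVG (Guest latency = Kernel latency + Device latency)."""
    return kavg_ms + davg_ms


def first_place_to_look(kavg_ms: float, davg_ms: float) -> str:
    """Hypothetical heuristic: start with whichever layer contributes more."""
    if kavg_ms + davg_ms == 0:
        return "no latency recorded"
    if davg_ms >= kavg_ms:
        return "device side: fabric, HBA configuration, or array"
    return "kernel side: check QAVG and the queue depths in the storage stack"


# Made-up samples in milliseconds.
print(guest_latency_ms(0.5, 9.5), first_place_to_look(0.5, 9.5))
print(guest_latency_ms(6.0, 2.0), first_place_to_look(6.0, 2.0))
```

The first sample points at the device side (10.0 ms total, almost all of it DAVG), while the second points at the kernel, where QAVG and queue depths are the next thing to check.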
Now that we’ve got metrics defined, you expect to get 4 sets. For each set, you expect read, write, and total. 12
metrics, and that’s exactly what you got below.


DAVG/cmd, KAVG/cmd, GAVG/cmd, QAVG/cmd Average latency per command in milliseconds.
It’s an average number, not the last number in the reporting period. If you have 1000 IOPS,
that means 5K IOs over the 5 second reporting period.
It’s a weighted average between read and write. If the IO commands are mostly read, then
high latency from write could be masked out.
DAVG/rd, KAVG/rd, GAVG/rd, QAVG/rd Average read latency per read operation in milliseconds. The same set of
metrics as above, except it only counts the reads.
It’s useful to see read and write separately as the numbers tend to be different. More importantly,
the remediation action is different.
DAVG/wr, KAVG/wr, GAVG/wr, QAVG/wr The same set of metrics as above, except it only counts the writes.
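To see how a per-command average can hide slow writes, here is a small worked example. It assumes the per-command latency is the IO-count weighted mean of the read and write latencies, as described in the table above. The function name and the numbers are made up.

```python
def blended_latency_ms(read_iops: float, read_lat_ms: float,
                       write_iops: float, write_lat_ms: float) -> float:
    """IO-weighted average latency across reads and writes."""
    total_iops = read_iops + write_iops
    if total_iops == 0:
        return 0.0
    return (read_iops * read_lat_ms + write_iops * write_lat_ms) / total_iops


# 950 reads/s at 1 ms and 50 writes/s at 40 ms.
print(blended_latency_ms(950, 1.0, 50, 40.0))  # 2.95

# Number of IOs behind that average in one 5 second reporting period.
print((950 + 50) * 5)  # 5000
```

The blended 2.95 ms looks healthy even though every write takes 40 ms, which is why the /rd and /wr columns are worth checking separately.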

Utilization
Now that we have the more important metrics (errors, queue, and contention) done, check the utilization
counters. In this way you have better context.

ACTV The definition is “Number of commands that are currently active”. I don’t know how it differs
from IOPS, and what the word “active” exactly means here.
This is worth profiling.
CMDS/s, READS/s, WRITES/s I combine these 3 metrics as they are basically IOPS. Total IOPS, read IOPS and
write IOPS.
MBREAD/s, MBWRTN/s I combine them as they measure throughput. Interestingly, there is no total throughput
metric, but you can simply sum them up.
Read the string MBWRTN as MB Written.

PAE and Split


PAECMD/s, PAECP/s PAE Command per second and PAE Copy per second.
I think PAE (Physical Address Extension) is no longer applicable in 64-bit and modern drivers/
firmware/OS, as the size is big enough. Copy operations here refer to VMkernel copying the
data from the high region (beyond what the adapter can reach) to the low region.
This statistic applies to only paths.


SPLTCMD/s Split Commands per second. Disk IO commands with a large block size have to be split by the VMkernel.
This can impact the performance as experienced by the Guest OS.

SPLTCP/s Number of split copies per second. A higher number means lower performance.

Configuration
The panel provides basic configuration. I use vSphere Client as it provides a lot more information, and I can take
action on them. The following is just some of the settings available.

Compare the above with what esxtop provides, which is the following:

NPTH Number of paths. This should match your design. An adapter typically has more than 1 path, which is why I
said it would be awesome to have a panel for paths.

Disk Device panel


The device panel has a lot of metrics and properties, so let’s group them for ease of understanding.


Errors
I’m always interested in errors first, before I check for contention and utilization.

ABRTS/s Number of commands cancelled per second. Expect this to be 0 at all times.
RESETS/s Number of commands reset per second. Expect this to be 0 at all times.

Queue
You’ve seen that there is only 1 counter for queue in Disk Adapter. How many do you expect for Disk Device?
Interestingly, there are 6 metrics for queue, as shown below.

LOAD The formula is (active commands + ESXi VMkernel queued commands) / queue depth.
If LOAD > 1, check the value of the QUED counter.

QUED Number of commands in the ESXi VMkernel that are currently queued. You want this to be as low as possible,
well below the queue depth.

%USD %USD = ACTV / QLEN, expressed as a percentage.
For world stats, QLEN is WQLEN. For LUN (aka device) stats, QLEN is DQLEN.
Percentage of the queue depth used by ESXi VMkernel active commands.
So this does not include the queued commands? Does it mean that if this number is not 100%, then there is nothing
in the queue, as a queue should only develop when it’s 100% used?
Obviously when Used = 100% it means the queue is full. That will introduce outstanding IO, which in turn will
increase latency.

DQLEN, WQLEN
I combine these together as a device can have 1 or more worlds, and there is a per-device maximum.
DQLEN is the device configured queue length. The corresponding counter for adapter is called AQLEN.
WQLEN is the world queue depth. The manual states “This is the maximum number of ESXi VMkernel active
commands that the world is allowed to have”. So it does not look like a present number. I am confused why we
show max for world, and present for device.

ACTV The definition is “Number of commands that are currently active”. I think this means the IO in flight, which
makes it an interesting counter. This is worth profiling and I expect it to be small most of the time.
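
The LOAD and %USD formulas can be sketched as follows. The function names and sample values are hypothetical, chosen only to show how a full queue (%USD = 100%) and a LOAD above 1 appear together.

```python
def queue_load(actv, qued, queue_depth):
    """LOAD = (active commands + VMkernel queued commands) / queue depth."""
    return (actv + qued) / queue_depth


def pct_used(actv, qlen):
    """%USD = ACTV / QLEN as a percentage.

    QLEN is WQLEN for world stats and DQLEN for device (LUN) stats.
    """
    return 100.0 * actv / qlen


# Hypothetical device: queue depth (DQLEN) 32, 32 commands active, 16 queued.
load = queue_load(actv=32, qued=16, queue_depth=32)
used = pct_used(actv=32, qlen=32)

print(f"LOAD = {load:.2f}, %USD = {used:.0f}%")
```

With these numbers the queue is fully used and LOAD is above 1, so the next counter to check is QUED.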


Contention
See Disk Adapter as both sport the same 12 metrics.

Utilization
See Disk Adapter as both sport the same 5 metrics.

PAE and Split


See Disk Adapter as both sport the same 4 metrics.

Configuration
As you can expect, esxtop provides minimal configuration information. They are shown below.

Path/World/Partition
They are grouped as 1 column, and you can only see one at a time.
By default, none of them is shown. To bring up one of them, type the corresponding code. In the following
screenshot, I’ve typed the letter e, which then prompted me to enter one of the devices.

Path is obviously the path name, such as vmhba0:C0:T0:L0.


A disk device can have more than 1 world, though I’m unsure why. You can see each world ID, and you get the
statistics per world.


Partition shows the partition ID. Typically this is a simple number, such as 1 for the first partition. vSphere Client
provides the following, which is more detailed yet easier to read.

Others
Let’s cover the rest of the metrics.

NPH Number of paths. This should not be 1 as that means a single point of failure.
NWD Number of worlds. If you know the significance of this in troubleshooting, let me know.
NPN Number of partitions. Expect this to be 1 for VMFS
SHARES Number of shares. This statistic is applicable only to worlds.
This is interesting, as that means each world can have its own shares? Where do we set them then?

BLKSZ Block size in bytes.
I prefer to call this sector format. International Disk Drive Equipment and Materials Association (IDEMA)
increased the sector size from 512 bytes to 4096 bytes (4 KB).
This is important, and you want them to be in 4K (Advanced Format) or at least 512e (e stands for emulation).
Microsoft provides additional information here.

NUMBLKS Number of blocks of the device. Multiply this with the block size and you get the total capacity.
In vSphere UI, you get the capacity, which I think is more relevant.
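
As a quick illustration of the capacity calculation, the sketch below multiplies NUMBLKS by BLKSZ. The function name and the sample device are hypothetical, and the result is shown in binary units (GiB).

```python
def device_capacity_bytes(numblks, blksz):
    """Total capacity in bytes = number of blocks x block size in bytes."""
    return numblks * blksz


# Hypothetical device: 2,147,483,648 blocks with a 512-byte block size.
capacity_bytes = device_capacity_bytes(2_147_483_648, 512)
capacity_gib = capacity_bytes / 2**30

print(f"{capacity_bytes} bytes = {capacity_gib:.0f} GiB")
```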

For configuration, I use vSphere Client as it provides a lot more information, and I can take action on them. The
following is just some of the settings available. More at Part 2 Chapter 4 Storage Metrics.


VAAI
VMware vSphere Storage APIs - Array Integration (VAAI) offloads storage processing to the array, hence improving
performance or reducing overhead. This is obviously vendor-dependent. There is no VAAI counter at adapter level or
path level, as the implementation is at the back-end array.
VAAI has a lot of metrics. They are grouped into 2 (non latency and latency metrics). I find it more logical to
group by function, which is also what this KB article does. It was last updated on 14 May 2017 and does not cover
vSphere 7.0, so I’m following up.
As with other metrics, check the contention type of metrics first. There are metrics that track failed operations,
such as CLONE_F, ATSF and ZERO_F.
I saw this note in the VMware vSphere Storage APIs – Array Integration (VAAI) document by Cormac Hogan, which I
think is worth mentioning. Because VAAI is an offload, you will see a higher value for the KAVG latency metric.
Other latency metrics are not affected, so there is no issue unless there are other symptoms present.
At this moment, I have not found the need to document them further. So what you get here is mostly from the KB
article above. Andreas Lesslhumer also has useful information in this blog article.

Extended Copy
Hardware Accelerated Move (the SCSI opcode for XCOPY is 0x83)

CLONE_RD RD stands for reader. The number of CLONE commands successfully completed where this device was a source.
CLONE_WR WR stands for writer. The number of CLONE commands successfully completed where this device was a destination.
CLONE_F The number of failed CLONE commands.

LCLONE_RD, LCLONE_WR, LCLONE_F
The same set of 3 metrics, except for Linked Clone.

MBC_RD/s, MBC_WR/s
MBC = megabytes of clone data. RD/s is read per second, and WR/s is written per second.

AVAG/suc The average clone latency per successful command
AVAG/f The average clone latency per failed command


Atomic Test & Set


Hardware Accelerated Locking on Single Extent Datastore or on Multi Extent Datastore (SCSI code 0x89)

ATS The number of Atomic Test & Set (ATS) commands successfully completed
ATSF The number of ATS commands failed. Expect this to be 0?
AAVG/suc The Average ATS latency per successful command
AAVG/f The Average ATS latency per failed command

Write Same
Hardware Accelerated Initialize or Zeroed out blocks (SCSI code 0x93 or 0x41)

ZERO The number of ZERO commands successfully completed


ZERO_F The number of ZERO commands failed
MBZERO/s The megabytes zeroed per second
ZAVG/suc The average zero latency per successful command
ZAVG/f The average zero latency per failed command

Unmapped
Unmapped block deletion (SCSI code 0x42).

DELETE The number of successful DELETE commands


DELETE_F The number of failed UNMAP commands. This value should be 0.
MBDEL/s The rate at which DELETE commands are processed, measured in megabytes per second (MB/s).

Others

RESSPACE, RESSPACE_F
Reservation Space. The number of commands which were successful while doing space reservation for a VMDK file in
thick provisioning format. RESSPACE_F captures the failures.

EXTSTATS, EXTSTATS_F
Extended Statistics. The number of commands which were successful in reporting extended statistics of a clone
after the cloning process had been completed. EXTSTATS_F captures the failures.

CAVG/suc, CAVG/f
The average clone latency per successful command. Unit is millisecond per clone. CAVG/f captures the failures.


LCAVG/suc, LCAVG/f
As per above, but for Linked Clone.

RAVG/suc, RAVG/f
The average latency (in ms) per successful VAAI Space Reservation command. RAVG/f captures the failures.

ESAVG/suc, ESAVG/f
As per above, but for Extended Statistics.

vSAN
I group the vSAN panel under Disk as esxtop only covers storage related information. There is no network or
compute (vSAN kernel modules).

esxtop provides visibility into 5 types of IO operations:


• Read
• Write
• Recovery Write
• Unmap
• Recovery Unmap
For each, it provides the IOPS, bandwidth, average latency (ms) and standard deviation latency (ms). Take note that
some use MB, while others use GB.

ROLE The Distributed Object Manager (DOM) role of that component, such as client, owner, and component manager.
READS/s Reads/second is the number of read operations. This is IOPS.
MBREAD/s MBReads/s is read throughput in Megabytes/second.
AVGLAT AvgLat is the average latency.
SDLAT Standard deviation of latency, when above 10ms latency.

WRITES/s, MBWRITE/s, AVGLAT, SDLAT
Same set of metrics as above, but for write.

RECOWR/s, MBRECOWR/s, AVGLAT, SDLAT
Same set of metrics as above, but for Recovery Write. Recovery covers component rebuild tasks (e.g. from disk
failure). Read the string MBRECOWR as MB Reco Wr.

UNMAPS/s, GBUNMAP/s, AVGLAT, SDLAT
Same set of metrics as above, but for unmap operations. I think this number should be within your expectation, as
excessive unmap can impact performance.
GBUNMAP/s = Unmapped rates in Gigabytes/second. Read the string GBUNMAP as GB Unmap.

RECOUN/s, GBRECOUN/s, AVGLAT, SDLAT
Same set of metrics, but for Recovery Unmap operations. Read the string GBRECOUN as GB Reco Un.
RecoUn/s is the number of recovery unmap operations per second.
GBRecoUn/s is the amount of disk space, in GB/second, unmapped by recovery.


Network

Network traffic is grouped into 2 by direction:

• TX for outgoing (sent) and
• RX for incoming (received).

Contention
As usual, we check contention first. There is no network latency or packet retransmit metric.

%DRPTX, %DRPRX
Percentage of Dropped Packets.
Expressed as a percentage, which makes it easier as you expect this not to exceed 0.x%. In a dedicated network
such as vSAN and vMotion, this should be flat 0% non stop for every single ESXi.
Transmit and Receive have a different nature. A high drop in transmit means your physical NIC card or uplink
switch is unable to cope. A high drop in receive means your ESXi or VM may not have enough CPU to process the
packets, or the ring buffer size is too small.


Consumption

Non-Unicast Packets
PKTTXMUL/s, PKTRXMUL/s
Number of multicast packets transmitted or received per second.
Read the string PKTTXMUL as Pkt Tx Mul, which is Packet TX Multicast. Same with PKTRXMUL.

PKTTXBRD/s, PKTRXBRD/s
Number of broadcast packets transmitted or received per second.
Read the string PKTTXBRD as Pkt Tx Brd, which is Packet TX Broadcast. Same with PKTRXBRD.

All Packets

PKTTX/s, PKTRX/s
This is the total packets, so it includes multicast packets and broadcast packets.
Multicast packets and broadcast packets are also listed separately. This is handy as they are supposed to be low
most of the time.

MbTX/s, MbRX/s
This is measured in bits, unlike vCenter Client UI which shows bytes.
Packet length is typically measured in bytes. A standard packet is 1500 bytes, so a 10 Gb NIC would theoretically
max out at 833,333 packets per second in each direction.
Compare this with your ESXi physical network card.

PSZTX, PSZRX
Average packet size in bytes. This is convenient. If you see a number far lower than 1500, it’s worth discussing
with the network team.
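
The 833,333 figure above can be reproduced with a short calculation. This is a rough sketch with a hypothetical function name: it assumes decimal units (1 Mb = 1,000,000 bits) and ignores Ethernet framing overhead, so the real ceiling on a given card will differ.

```python
def max_packets_per_second(speed_mbps, packet_bytes=1500):
    """Theoretical packet rate for a link, ignoring framing overhead.

    speed_mbps: link speed in megabits per second (as in the SPEED column).
    packet_bytes: assumed packet size in bytes.
    """
    bits_per_second = speed_mbps * 1_000_000
    bits_per_packet = packet_bytes * 8
    return bits_per_second / bits_per_packet


# A 10 Gb NIC carrying standard 1500-byte packets.
standard = int(max_packets_per_second(10_000))

# The same NIC carrying small 64-byte packets needs a far higher packet rate.
small = int(max_packets_per_second(10_000, packet_bytes=64))

print(f"1500-byte packets: {standard} pkt/s, 64-byte packets: {small} pkt/s")
```

The comparison shows why a low PSZTX or PSZRX matters: for the same throughput, smaller packets mean many more packets to process.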

There is another metric ACTN/s, which is the number of actions per second. The actions here are VMkernel actions.
It is an internal counter, not relevant to day to day operations.


Configuration
This panel mixes physical and virtual. For virtual, it shows both the VMkernel network and VM network. I find it
easier to use the information in vSphere Client.

PORT-ID Virtual network device port ID.
UPLINK ‘Y’ means that the corresponding port is an uplink. ‘N’ means it is not. The physical NIC cards (vmnic0,
vmnic1, etc.) serve as the uplink.
UP ‘Y’ means that the corresponding link is up. ‘N’ means it is not.
SPEED Link speed in Megabits per second.
FDUPLX ‘Y’ means the corresponding link is operating at full duplex. ‘N’ means it is not, which is a problem.
USED-BY Virtual network device port user.
DNAME Virtual network device name.

The metric DTYP (Virtual network device type, where H means Hub and S means switch) does not seem to be
available anymore.
vSphere Client separates the components. You can see the virtual switches, VMkernel network and physical cards.
The level of details is more comprehensive.


RDMA Device
Remote Direct Memory Access (RDMA) enables direct access to the physical network card, bypassing the OS
overhead. The following screenshot, taken from here, shows 2 types of access from an application (which lives
inside a VM; the VMs are not shown).

Usage
Since it’s about network, you get both the TX (transmit or sent) and RX (received or incoming).
For contention, there is only packet dropped. There is no packet retransmit or latency. The metrics are:

%PKTDTX, %PKTDRX
Percentage of packets dropped relative to the number of packets sent.

For utilization, you get them in both amount of data, and number of packets. Both are important metrics. There is no
breakdown on the type of packets (broadcast, multicast, unicast).

PKTTX/s, PKTRX/s
Packets per second. Check the limit for packets per second on your specific card.

MbTX/s, MbRX/s
Network throughput in Megabit/second.

There is no packet size metric. It would be handy for determining whether packets are much smaller or larger than
you expect. For example, if you expect jumbo frames but the actual packets are much smaller.
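
Since the panel has no packet size column, an approximate average can be derived from the throughput and packet rate. This is a minimal sketch with a hypothetical function name and sample values; it assumes 1 Mb = 1,000,000 bits and that both counters cover the same sampling interval.

```python
def average_packet_bytes(mbit_per_s, packets_per_s):
    """Approximate average packet size in bytes from MbTX/s and PKTTX/s.

    Assumption: throughput is in decimal megabits per second.
    """
    if packets_per_s == 0:
        return 0.0  # no traffic, so there is no meaningful average
    bytes_per_second = mbit_per_s * 1_000_000 / 8
    return bytes_per_second / packets_per_s


# Hypothetical sample: 960 Mb/s carried by 80,000 packets per second.
avg_size = average_packet_bytes(960.0, 80_000)

print(f"average packet size: {avg_size:.0f} bytes")
```

If you configured jumbo frames and the derived size is close to 1500 bytes or below, that is worth a closer look.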
These metrics are not available in vSphere Client UI, so you need to use esxtop to get the visibility. Just in case you’re
wondering where I got the following screenshot from, they are courtesy of Shoby Cherian and Aditya Kiran Pentyala.


You also get the queue usage information.

QP, CQ
Number of Queue Pairs allocated and Completion Queues allocated. RDMA uses these queues for communication.

SRQ Number of Shared Receive Queues allocated.
I think this is required in virtualization as the physical NIC card can be shared.

MR Memory Regions allocated.
Check that this is in line with your expectation.

For more reading on RDMA, I found this academic paper, titled “Understanding the concepts and mechanisms of
RDMA”, useful.

Configuration
vSphere Client provides the following information. You get the first 4 columns in esxtop.

The information you get in esxtop covers the first 4 columns in the preceding screenshot. They are:

NAME Name of the device


DRIVER Name of the driver
STATE Active or down
TEAM-PNIC The physical Network Interface Card that the RDMA adapter is paired with.



About the Author

Thank you for making it to the end of the book. I hope you found it valuable. Do connect with me on LinkedIn and let
me know your feedback!
Here is a bit about me. I was born on the beautiful island of Lombok (Indonesia), grew up in Surabaya (Indonesia),
studied in Australia, and since 1994 I have been living in Singapore with my wife Felicia.
We both graduated from Bond University in 1994. We flew directly to Singapore to look for a job as we did not have
enough money to go home first. We came with a few hundred dollars in our pocket, not enough to open a bank
account.
The first 9 years of my career were at the application layer, doing business process innovation and application
development. Lettuce Node, I mean Lotus Notes, was dear to my heart for many years. The views and form UI
concept in the product remain relevant until today.
I moved to the infrastructure world in 2003, focusing on UNIX by joining Sun Microsystems. I joined without knowing
what UNIX was and with basically zero knowledge of infrastructure. My previous manager Seet Pheng Kue recommended
me, together with the head-hunter FA Mok, and Kim Boo Png made the hiring decision. I’m grateful for what they
have done as that forever changed my career. Those 5 years in Sun as strategic account SE taught me what
“enterprise infrastructure” really means.
In 2008 I applied to VMware as I wanted to follow my sales Chan Seng Chye. Poh Wah Lee convinced me to join
VMware as part of his team, and until today I still see him as my elder and leader. I joined VMware as SE for global
accounts. A good chunk of my time was helping them troubleshoot performance problems, do capacity planning and
review configuration best practice. While I’m no longer an SE, I still enjoy doing this as it’s a valuable input to my
work as the domain architect in the Aria Operations product team.
I set up the VMware User Group in Singapore, back before it was called VMUG, and also the VCP Club. In 2011, I was
one of the first to pass the VCAP DCD exam globally as a beta exam participant. That knowledge proved to be critical
and set the foundation for my first book, which got published in 2014.
A lot of the analyses in this book were performed using Aria Operations. I have used it since version 1.0 back in
2011. It quickly became my favourite tool and I joined the team. Chandra Prathuri, Monica Sharma and Kameswaran
Subramanian hired me and taught me “how the sausage is made”.
You can see more of my works on the Internet. Google has somehow tracked it 😊
