Pure Storage Web Guide - FlashArray VMware Best Practices
This document is not directly updated; rather, it is updated indirectly when changes are made to the backing
documentation. While the updated date on this page may appear older, this information is actively
maintained and should be considered the most recent best practices unless otherwise noted.
Executive Summary
This document describes the best practices for using the Pure Storage® FlashArray in VMware vSphere® 5.5 and later
environments. While the Pure Storage FlashArray includes general support for a wide variety of VMware products, this
guide will focus on best practices concerning VMware ESXi® and VMware vCenter™. This paper will provide guidance
and recommendations for ESXi and vCenter settings and features that provide the best performance, value and
efficiency when used with the Pure Storage FlashArray.
Please note that this will be updated frequently, so for the most up-to-date version and other VMware information please
refer to the following page on the Pure Storage support site.
https://support.purestorage.com/Solutions/Virtualization/VMware
Audience
This document is intended for use by VMware and/or storage administrators who want to deploy the Pure Storage
FlashArray in VMware vSphere-based virtualized datacenters. A working familiarity with VMware products and concepts
is recommended.
PowerCLI scripts to check and set certain best practices can be found here:
https://github.com/codyhosterman/powercli/blob/master/bestpractices.ps1
https://github.com/codyhosterman/powercli/blob/master/bestpracticechecker.ps1
http://www.codyhosterman.com/2016/05/updated-flasharray-vmware-best-practices-powercli-scripts/
Many of the techniques and operations described here can be simplified, automated, and enhanced through Pure Storage
integrations with various VMware products. A detailed description of these integrations is beyond the scope of this
document, but further details can be found at https://support.purestorage.com/Solutions/Virtualization/VMware.
The FlashArray has two object types for volume provisioning, hosts and host groups:
• Host—a host is a collection of initiators (iSCSI IQNs or Fibre Channel WWNs) that refers to a physical host. A
FlashArray host object must have a one to one relationship with an ESXi host. Every active initiator for a given ESXi
host should be added to the respective FlashArray host object. If an initiator is not yet zoned (for instance), and not
intended to be, it can be omitted from the FlashArray host object. Furthermore, while the FlashArray supports
multiple protocols for a single host (a mixture of FC and iSCSI), ESXi does not support presenting VMFS storage
via more than one protocol. So creating a multi-protocol host object should be avoided on the FlashArray when in
use with VMware ESXi.
In the example below, the ESXi host has two online Fibre Channel HBAs with WWNs of 20:00:00:25:B5:11:11:1C
and 20:00:00:25:B5:44:44:1C.
• Host Group—a host group is a collection of host objects. Pure Storage recommends grouping your ESXi hosts into
clusters within vCenter, as this provides a variety of benefits like High Availability and Distributed Resource
Scheduling. In order to provide simple provisioning, Pure Storage also recommends creating host groups that
correspond to VMware clusters. Therefore, for every VMware cluster that will use FlashArray storage, a respective
host group should be created. Every ESXi host in the cluster should have its corresponding host object (as described
above) added to that host group. A scripted example of creating hosts and a host group is sketched after the note below.
Be aware that moving a host out of a host group will disconnect the host from any volume that is connected to
the host group. Doing so will cause a Permanent Device Loss (PDL) condition for any datastores residing on
volumes connected to that host group.
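As a scripted alternative to the GUI, host and host group objects can be created with the Pure Storage PowerShell SDK. The sketch below is an assumption based on SDK v1 cmdlet names (New-PfaArray, New-PfaHost, New-PfaHostGroup); the array address, host and host group names are placeholders, and exact parameter formats (such as WWN formatting) should be verified against your SDK version.
# Minimal sketch, assuming the Pure Storage PowerShell SDK v1 (verify cmdlet names and parameters)
Import-Module PureStoragePowerShellSDK
$fa = New-PfaArray -EndPoint <FlashArray address> -Credentials (Get-Credential) -IgnoreCertificateError
# One FlashArray host object per ESXi host, listing all of that host's active initiators
New-PfaHost -Array $fa -Name esxi-01 -WwnList @("20:00:00:25:B5:11:11:1C","20:00:00:25:B5:44:44:1C")
# One host group per vSphere cluster; add every host object belonging to that cluster
New-PfaHostGroup -Array $fa -Name Cluster-A -Hosts @("esxi-01")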
DO NOT make this change online. If an ESXi host is running VMs on the array on which you are setting the host
personality, data unavailability can occur. A fabric logout and login may occur, which can result in an accidental
permanent device loss. To avoid this possibility, only set this personality on hosts that are in maintenance
mode or are not actively using that array.
In Purity 5.1 and later, there is a new host personality type for VMware ESXi hosts. Changing a host personality on a
host object on the FlashArray causes the array to change some of its behavior for specific host types.
In general, Purity endeavors to automatically behave in the correct way without requiring specific configuration
changes. However, due to the variety of supported host types and their varying requirements (a good example is the
SCSI interaction needed for features like ActiveCluster), a manual configuration option was required.
In Purity 5.1, it is recommended to enable the “ESXi” host personality for all host objects that represent ESXi hosts.
ESXi uses peripheral LUN IDs instead of flat LUN IDs, which affects how ESXi interprets any FlashArray LUN ID
above 255. Since ESXi does not properly interpret flat LUN IDs, it sees a LUN ID higher than 255 as 16,383 higher than
it should be (256 is seen as 16,639), which is outside of the supported range of ESXi. Setting the ESXi personality on the
FlashArray for a given host switches the FlashArray LUN methodology to peripheral, allowing ESXi to see LUN IDs
higher than 255.
While this personality change is currently only relevant for specific ActiveCluster environments and/or environments that
want to use higher-than-255 LUN IDs, it is still recommended to set this on all ESXi host objects. Moving forward, other
behavior changes for ESXi might be included, and setting it now ensures it is not missed when it becomes important for
your environment.
BEST PRACTICE: Set FlashArray host objects to have the FlashArray “ESXi” host personality when using
Purity 5.1 or later. This change is NOT required for all environments (beyond the ones mentioned above), but
it is recommended to set it when you have a chance.
The ESXi host personality can be set through the FlashArray GUI, the CLI or REST. To set it via the GUI, click on
Storage, then Hosts, then the host you would like to configure:
Next, go to the Details pane and click the vertical ellipsis and choose Set Personality…:
Choose the radio button corresponding to ESXi and then click Save.
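The same change can also be scripted. The following is an assumed example of the Purity CLI syntax in 5.1 and later (verify with purehost setattr --help on your array; the host name is a placeholder):
purehost setattr --personality esxi esxi-01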
Private volumes, such as ESXi boot volumes, should not be connected to the host group as they are not (and should not be)
shared. These volumes should be connected to the host object instead.
Pure Storage has no requirement on LUN IDs for VMware ESXi environments, and users should, therefore, rely on the
automatic LUN ID selection built into Purity.
https://support.purestorage.com/Solutions/SAN/Best_Practices/SAN_Guidelines_for_Maximizing_Pure_Performance
https://support.purestorage.com/Solutions/SAN/Configuration/Introduction_to_Fibre_Channel_and_Zoning
BEST PRACTICE: Use the Round Robin Path Selection Policy for FlashArray volumes.
In addition to Round Robin, Pure Storage recommends changing the Round Robin I/O Operations Limit (the number of I/Os
sent down a single path before ESXi switches to the next path) from the default of 1,000 to 1, for three reasons:
1. Performance. Often the reason cited to change this value is performance. While this is true in certain cases, the
performance impact of changing this value is not usually profound (generally in the single digits of percentage
performance increase). While changing this value from 1,000 to 1 can improve performance, it generally will not
solve a major performance problem. Regardless, changing this value can improve performance in some use
cases, especially with iSCSI.
2. Path Failover Time. It has been noted in testing that ESXi will fail logical paths much more quickly when this value
is set to the minimum of 1. During a physical failure of the storage environment (loss of an HBA, switch, cable,
port, or controller) ESXi, after a certain period of time, will fail any logical path that relies on that failed physical
hardware and will discontinue attempting to use it for a given volume. This failure does not always happen
immediately. When the I/O Operations Limit is set to the default of 1,000 path failover time can sometimes be in
the 10s of seconds which can lead to noticeable disruption in performance during this failure. When this value is
set to the minimum of 1, path failover generally decreases to sub-ten seconds. This greatly reduces the impact of a
physical failure in the storage environment and provides greater performance resiliency and reliability.
3. FlashArray Controller I/O Balance. When Purity is upgraded on a FlashArray, the following process is observed (at
a high level): upgrade Purity on one controller, reboot it, wait for it to come back up, upgrade Purity on the other
controller, reboot it and you’re done. Due to the reboots, twice during the process half of the FlashArray front-end
ports go away. Because of this, we want to ensure that all hosts are actively using both controllers prior to
upgrade. One method that is used to confirm this is to check the I/O balance from each host across both
controllers. When volumes are configured to use Most Recently Used, an imbalance of 100% is usually observed
(ESXi tends to select paths that lead to the same front end port for all devices). This then means additional
troubleshooting to make sure that host can survive a controller reboot. When Round Robin is enabled with the
default I/O Operations Limit, port imbalance is improved to about 20-30% difference. When the I/O Operations
Limit is set to 1, this imbalance is less than 1%. This gives Pure Storage and the end user confidence that all hosts
are properly using all available front end ports.
For these three above reasons, Pure Storage highly recommends altering the I/O Operations Limit to 1.
BEST PRACTICE: Change the Round Robin I/O Operations Limit from 1,000 to 1 for FlashArray volumes.
(VMware KB regarding setting the IOPs Limit)
A default SATP rule, now provided by VMware, was built specifically for the FlashArray to align with Pure Storage's best
practices. Inside of ESXi you will see a new system rule:
The default pathing policy for the ALUA SATP is the Most Recently Used (MRU) Path Selection Policy. With this PSP,
ESXi will use the first identified path for a volume until that path goes away, or it is manually told to use a different path.
For a variety of reasons this is not the ideal configuration for FlashArray volumes—most notably of which is
performance.
Note that Round Robin is the default in 6.0 Patch 5 and later in the 6.0 code branch, and in 6.5 Update 1 and later in the
6.5 code branch.
The recommended option for configuring Round Robin and the correct I/O Operations Limit is to create a rule that will
cause any new FlashArray device that is added in the future to that host to automatically get the Round Robin PSP and
an I/O Operation Limit value of 1.
The following command creates a rule that achieves both of these for only Pure Storage FlashArray devices:
esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -V "PURE" -M "FlashArray" -P "VMW_PSP_RR" -O
"iops=1" -e "FlashArray SATP Rule"
This can also be accomplished through PowerCLI. Once connected to a vCenter Server this script will iterate through all
of the hosts in that particular vCenter and create a default rule to set Round Robin for all Pure Storage FlashArray
devices with an I/O Operation Limit set to 1
$creds = Get-Credential
Connect-VIServer -Server <vCenter> -Credential $creds
$hosts = get-vmhost
foreach ($esx in $hosts)
{
# Create the FlashArray SATP rule (Round Robin with an I/O Operations Limit of 1) on each host
$esxcli=get-esxcli -VMHost $esx -v2
$satpArgs = $esxcli.storage.nmp.satp.rule.add.createArgs()
$satpArgs.description = "Pure Storage FlashArray SATP Rule"
$satpArgs.vendor = "PURE"
$satpArgs.model = "FlashArray"
$satpArgs.satp = "VMW_SATP_ALUA"
$satpArgs.psp = "VMW_PSP_RR"
$satpArgs.pspoption = "iops=1"
$esxcli.storage.nmp.satp.rule.add.invoke($satpArgs)
}
It is important to note that existing, previously presented devices will need to be manually set to Round Robin and an I/O
Operation Limit of 1. Optionally, the ESXi host can be rebooted so that it can inherit the multipathing configuration set
forth by the new rule.
For setting a new I/O Operation Limit on an existing device, see Appendix I: Per-Device NMP Configuration.
BEST PRACTICE: Use a SATP rule to configure multipathing for FlashArray volumes.
Note that I/O Operations of 1 is the default in 6.0 Patch 5 and later in the 6.0 code branch, and in 6.5 Update 1 and later
in the 6.5 code branch.
Verifying Connectivity
It is important to verify proper connectivity prior to implementing production workloads on a host or volume. This consists
of a few steps:
This will report the path selection policy and the number of logical paths. The number of logical paths will depend on the
number of HBAs, zoning and the number of ports cabled on the FlashArray.
The I/O Operations Limit cannot be checked from the vSphere Web Client—it can only be verified or altered via
command line utilities. The following command can check a particular device for the PSP and I/O Operations Limit:
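For example (the naa identifier is a placeholder; substitute a FlashArray device from the host in question):
esxcli storage nmp device list -d naa.<FlashArray device identifier>
In the output, the Path Selection Policy should report VMW_PSP_RR and the Path Selection Policy Device Config should include iops=1.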
Please remember that each of these settings is a per-host setting, so while a volume might be configured properly on
one host, it may not be correct on another. The PowerCLI script below can help you verify this at scale in a simple way.
https://github.com/codyhosterman/powercli/blob/master/bestpracticechecker.ps1
A CLI command exists to monitor I/O balance coming into the array:
purehost monitor --balance --interval <how long to sample> --repeat <how many iterations>
A well-balanced host should be within a few percentage points across its paths. Anything more than 15% or so might be
worth investigating. Refer to this post for more information.
The GUI will also report on host connectivity in general, based on initiator logins.
This report should be listed as redundant for every host, meaning that it is connected to each controller. If this reports
something else, investigate zoning and/or host configuration to correct this. For a detailed explanation of the various
reported states, please refer to the FlashArray User Guide which can be found directly in your GUI:
ESXi has an advanced setting, Disk.DiskMaxIOSize, that controls the largest I/O ESXi will pass directly to a device
before splitting it (32 MB by default). If this is not reduced for ESXi hosts running EFI-enabled VMs, the virtual machine
will fail to properly boot. If it is not changed on hosts running VMs being replicated by vSphere Replication, replication
will fail. If it is not changed for VMs whose applications are sending requests larger than 4 MB, the larger I/O requests
will fail, which results in the application failing as well.
This should be set on every ESXi host in the cluster that VMs may have access to, in order to ensure vMotion is
successful from one ESXi host to another. If none of the above circumstances apply to your environment then this value
can remain at the default. There is no known performance effect to changing this value.
For more detail on this change, please refer to the VMware KB article here:
BEST PRACTICE: Change Disk.DiskMaxIOSize from the default of 32 MB down to 4 MB when any of the above
scenarios apply to your environment. A lower value is also acceptable.
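A hedged PowerCLI sketch for making this change across all hosts in a connected vCenter (the value is specified in KB, so 4096 corresponds to 4 MB):
# Set Disk.DiskMaxIOSize to 4 MB (4096 KB) on every host in the connected vCenter
foreach ($esx in Get-VMHost)
{
    Get-AdvancedSetting -Entity $esx -Name "Disk.DiskMaxIOSize" | Set-AdvancedSetting -Value 4096 -Confirm:$false
}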
VAAI Configuration
The VMware API for Array Integration (VAAI) primitives offer a way to offload and accelerate certain operations in a
VMware environment.
Pure Storage requires that all VAAI features be enabled on every ESXi host that is using FlashArray storage.
Disabling VAAI features can greatly reduce the efficiency and performance of FlashArray storage in ESXi environments.
All VAAI features are enabled by default (set to 1) in ESXi 5.x and later, so no action is typically required. These
settings can be verified via the vSphere Web Client or CLI tools; a PowerCLI check is sketched after the list below.
1. WRITE SAME—DataMover.HardwareAcceleratedInit
2. XCOPY—DataMover.HardwareAcceleratedMove
3. ATOMIC TEST & SET—VMFS3.HardwareAcceleratedLocking
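A simple PowerCLI sketch to report these three settings for every host (a value of 1 means the primitive is enabled); the names used are the standard ESXi advanced option names:
# Report the three VAAI-related advanced settings for each host; 1 = enabled
foreach ($esx in Get-VMHost)
{
    Get-AdvancedSetting -Entity $esx -Name "DataMover.HardwareAcceleratedInit","DataMover.HardwareAcceleratedMove","VMFS3.HardwareAcceleratedLocking" |
        Select-Object @{Name="Host";Expression={$esx.Name}}, Name, Value
}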
In order to provide a more efficient heart-beating mechanism for datastores VMware introduced a new host-wide setting
called /VMFS3/UseATSForHBOnVMFS5. In VMware’s own words:
“A change in the VMFS heartbeat update method was introduced in ESXi 5.5 Update 2, to help optimize the VMFS
heartbeat process. Whereas the legacy method involves plain SCSI reads and writes with the VMware ESXi kernel
handling validation, the new method offloads the validation step to the storage system.“
Pure Storage recommends keeping this value on whenever possible. That being said, it is a host wide setting, and it can
possibly affect storage arrays from other vendors negatively. Read the VMware KB article here:
The FlashArray is NOT susceptible to this issue, but if an affected array from another vendor is present, it might be
necessary to turn this setting off. In this case, Pure Storage supports disabling this value and reverting to the traditional
heart-beating mechanism.
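The current value can be checked, and changed if necessary, per host with esxcli. The option path shown is an assumption based on the standard ESXi advanced option naming, so verify it on your hosts (use a value of 0 only if an affected array from another vendor requires it):
esxcli system settings advanced list -o /VMFS3/useATSForHBOnVMFS5
esxcli system settings advanced set -i 1 -o /VMFS3/useATSForHBOnVMFS5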
For additional information please refer to VMware Storage APIs for Array Integration with the Pure Storage
FlashArray document.
iSCSI Configuration
Just like any other array that supports iSCSI, Pure Storage recommends the following changes to an iSCSI-based
vSphere environment for the best performance.
https://storagehub.vmware.com/t/vsphere-storage/best-practices-for-running-vmware-vsphere-on-iscsi/
1. Log in to the vSphere Web Client and select the host under Hosts and Clusters.
2. Navigate to the Manage tab.
3. Select the Storage option.
4. Under Storage Adapters, select the iSCSI vmhba to be modified.
5. Select Advanced and change the Login Timeout parameter. This can be done on the iSCSI adapter itself or on a
specific target.
The default Login Timeout value is 5 seconds and the maximum value is 60 seconds.
BEST PRACTICE: Set iSCSI Login Timeout for FlashArray targets to 30 seconds. A higher value is supported
but not necessary.
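The same change can be made from the command line; for example (the vmhba name is a placeholder; confirm it with esxcli iscsi adapter list, and verify the parameter key with esxcli iscsi adapter param get):
esxcli iscsi adapter param set -A vmhba<N> -k LoginTimeout -v 30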
Disable DelayedAck
DelayedAck is an advanced iSCSI option that allows or disallows an iSCSI initiator to delay acknowledgement of
received data packets.
1. Log in to the vSphere Web Client and select the host under Hosts and Clusters.
2. Navigate to the Manage tab.
3. Select the Storage option.
4. Under Storage Adapters, select the iSCSI vmhba to be modified.
Navigate to Advanced Options and modify the DelayedAck setting by using the option that best matches your
requirements, as follows:
Option 1: Modify the DelayedAck setting on a particular discovery address (recommended) as follows:
1. Select Targets.
2. On a discovery address, select the Dynamic Discovery tab.
3. Select the iSCSI server.
4. Click Advanced.
5. Change DelayedAck to false.
Option 2: Modify the DelayedAck setting on a specific static target as follows:
1. Select Targets.
2. Select the Static Discovery tab.
3. Select the iSCSI server and click Advanced.
4. Change DelayedAck to false.
Option 3: Modify the DelayedAck setting globally for the iSCSI adapter as follows:
1. Select Advanced Options and click Edit.
2. Change DelayedAck to false.
3. Click OK.
Pure Storage highly recommends disabling DelayedAck, but it is not absolutely required. In highly-congested
networks, if packets are lost, or simply take too long to be acknowledged due to that congestion, performance can drop.
If DelayedAck is enabled, not every packet is acknowledged immediately (instead, one acknowledgement is sent per
so many packets), so far more retransmission can occur, further exacerbating congestion. This can lead to continually
decreasing performance until the congestion clears. Since DelayedAck can contribute to this, it is recommended to disable it
in order to greatly reduce the effect of congested networks and packet retransmission.
Enabling jumbo frames can further exacerbate this issue, since retransmitted packets are far larger. If jumbo frames are
enabled, it is strongly recommended to disable DelayedAck. See the following VMware KB for more information:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002598
Configuration and detailed discussion is out of the scope of this document, but it is recommended to read through the
following VMware document that describes this and other concepts in-depth:
http://www.vmware.com/files/pdf/techpaper/vmware-multipathing-configuration-software-iSCSI-port-binding.pdf
Note that ESXi 6.5 has expanded support for port binding and features such as iSCSI routing (though the use of iSCSI
routing is not usually recommended) and multiple subnets. Refer to ESXi 6.5 release notes for more information.
Jumbo Frames
In some iSCSI environments it is required to enable jumbo frames to adhere to the network configuration between the
host and the FlashArray. Enabling jumbo frames is a cross-environment change, so careful coordination is required to
ensure proper configuration. It is important to work with your networking team and Pure Storage representatives when
enabling jumbo frames. Please note that this is not a requirement for iSCSI use on the Pure Storage FlashArray; in
general, Pure Storage recommends leaving MTU at the default setting.
That being said, altering the MTU is fully supported and is at the discretion of the user.
Configure jumbo frames on the physical network switch/infrastructure for each port using the relevant switch CLI or GUI,
and then configure jumbo frames on the ESXi VMkernel port and its vSwitch (a PowerCLI alternative is sketched after the
steps below):
1. Browse to a host in the vSphere Web Client navigator.
2. Click the Manage tab and select Networking > Virtual Switches.
3. Select the switch from the vSwitch list.
4. Click the name of the VMkernel network adapter.
5. Click the pencil icon to edit.
6. Click NIC settings and set the MTU to your desired value.
7. Click OK.
8. Click the pencil icon to edit on the top to edit the vSwitch itself.
9. Set the MTU to your desired value.
10. Click OK.
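A hedged PowerCLI alternative for a standard vSwitch (the host, switch, and VMkernel port names are examples; distributed switches are configured differently):
# Set an MTU of 9000 on the iSCSI vSwitch and its VMkernel port
$esx = Get-VMHost -Name <ESXi host>
Get-VirtualSwitch -VMHost $esx -Name vSwitch1 | Set-VirtualSwitch -Mtu 9000 -Confirm:$false
Get-VMHostNetworkAdapter -VMHost $esx -VMKernel -Name vmk1 | Set-VMHostNetworkAdapter -Mtu 9000 -Confirm:$false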
Verify the configuration by pinging the FlashArray iSCSI ports from the ESXi host with jumbo-sized packets and the
do-not-fragment flag set (for example, with vmkping and an 8,972-byte payload). If the ping operation does not return
successfully, then jumbo frames are not properly configured in ESXi, the networking devices, and/or the FlashArray port.
http://www.codyhosterman.com/2015/03/configuring-iscsi-chap-in-vmware-with-the-flasharray/
Please note that iSCSI CHAP is not currently supported with dynamic iSCSI targets on the FlashArray. If
CHAP is going to be used, please configure your iSCSI FlashArray targets as static targets.
When the default iSCSI configuration is in use with VMware ESXi, the delay for these recovery events will generally be
25-35 seconds. While the majority of environments are able to successfully recover from these events unscathed, this is
not true for all environments. On a handful of occasions there have been environments containing applications that need
faster recovery times. Without these faster recovery times, I/O failures have been noted and manual recovery efforts
were required to bring the environment back online.
To better understand how these parameters are used in iSCSI recovery efforts it is recommended you read the following
blog posts for deeper insight:
iSCSI: A 25 second pause in I/O during a single link loss? What gives?
Once a thorough review of these iSCSI options has been completed, additional testing within your own environment
is strongly recommended to ensure no additional issues are introduced as a result of these changes.
Datastore Management
Using a smaller number of large volumes is generally a better idea today. In the past, a recommendation to use a larger
number of smaller volumes was made because of performance limitations that no longer exist. This limit traditionally was
due to two reasons: VMFS scalability issues due to locking and/or per-volume queue limitations on the underlying array.
VMware resolved the first issue with the introduction of Atomic Test and Set, also called Hardware Assisted Locking.
Prior to the introduction of VAAI ATS (Atomic Test and Set), VMFS used LUN-level locking via full SCSI reservations to
acquire exclusive metadata control for a VMFS volume. In a cluster with multiple nodes, all metadata operations were
serialized and hosts had to wait until whichever host, currently holding a lock, released that lock. This behavior not only
caused metadata lock queues but also prevented standard I/O to a volume from VMs on other ESXi hosts which were
not currently holding the lock.
The introduction of ATS removed scaling limits via the removal of lock contention; thus, moving the bottleneck down to
the storage, where many traditional arrays had per-volume I/O queue limits. This limited what a single volume could do
from a performance perspective as compared to what the array could do in aggregate. This is not the case with the
FlashArray.
A FlashArray volume is not limited by an artificial performance limit or an individual queue. A single FlashArray volume
can offer the full performance of an entire FlashArray, so provisioning ten volumes instead of one is not going to empty
the HBAs out any faster. From a FlashArray perspective, there is no immediate performance benefit to using more than
one volume for your virtual machines.
The main point is that there is always a bottleneck somewhere, and when you fix that bottleneck, it is just transferred
somewhere else. ESXi was once the bottleneck due to its locking mechanism, then it fixed that with ATS. This, in turn,
moved the bottleneck down to the array volume queue depth limit. The FlashArray doesn’t have a volume queue depth
limit, so now that bottleneck has been moved back to ESXi and its internal queues.
Altering VMware queue limits is not generally needed with the exception of extraordinarily intense workloads. For high-
performance configuration, refer to the section of this document on ESXi queue configuration.
For ESXi 5.x through 6.0, use VMFS-5. For ESXi 6.5 and later it is highly recommended to use VMFS-6. It should be
noted that VMFS-6 is not the default option for ESXi 6.5, so be careful to choose the correct version when creating new
VMFS datastores in ESXi 6.5.
BEST PRACTICE: Use the latest supported VMFS version for the in-use ESXi host.
For a deeper-dive of ESXi queueing and the FlashArray, please read this post:
http://www.codyhosterman.com/2017/02/understanding-vmware-esxi-queuing-and-the-flasharray/
The default HBA device queue depth limits and their driver module parameters are:
HBA Vendor    Default Queue Depth Limit    Driver Module Parameter
QLogic        64                           qlfxmaxqdepth
Brocade       32                           bfa_lun_queue_depth
Emulex        32                           lpfc0_lun_queue_depth
Changing these settings requires a host reboot. For instructions to check and set these values, please refer to this
VMware KB article:
Changing the queue depth for QLogic, Emulex, and Brocade HBAs
There is a second per-device setting called “Disk Schedule Number Requests Outstanding” often referred to as
DSNRO. This is a hypervisor-level queue depth limit that provides a mechanism for managing the queue depth limit for
an individual device. This value is a per-device setting that defaults to 32 and can be increased to a value of 256.
It should be noted that this value only comes into play for a volume when that volume is being accessed by two or more
virtual machines on that host. If there is more than one virtual machine active on it, the lower of the two values (DSNRO
or the HBA device queue depth limit) is the value that is observed by ESXi as the actual device queue depth limit. So, in
other words, if a volume has two VMs on it, and DSNRO is set to 32 and the HBA device queue depth limit is set to 64,
the actual queue depth limit for that device is 32. For more information on DSNRO see the VMware KB here:
In general, Pure Storage does not recommend changing these values. The majority of workloads are distributed across
hosts and/or not intense enough to overwhelm the default queue depths. The FlashArray is fast enough (low enough
latency) that the workload has to be quite high in order to overwhelm the queue.
If the default queue depth is consistently overwhelmed, the simplest option is to provision a new datastore and distribute
some virtual machines to the new datastore. If a workload from a virtual machine is too great for the default queue
depth, then increasing the queue depth limit is the better option.
If a workload demands queue depths to be increased, Pure Storage recommends making both the HBA device queue
depth limit and DSNRO equal. Generally, do not change these values without direction from VMware or Pure Storage
support.
You can verify the values of both of these for a given device with the command:
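For example (the naa identifier is a placeholder):
esxcli storage core device list -d naa.<FlashArray device identifier>
In the output, "Device Max Queue Depth" reflects the HBA device queue depth limit and "No of outstanding IOs with competing worlds" reflects DSNRO. If directed by support to change DSNRO, it can be set per device with esxcli storage core device set -d naa.<device> -O <value>.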
BEST PRACTICE: Leave queue depth limits at the default. Only raise them when performance requirements
dictate it.
• Disk.QFullSampleSize—the count of QUEUE FULL or BUSY conditions it takes before ESXi will start throttling.
Default is zero (feature disabled)
• Disk.QFullThreshold—the count of good condition responses after a QUEUE FULL or BUSY required before ESXi
starts increasing the queue depth limit again
The Pure Storage FlashArray does not advertise a queue full condition for a volume. Since every volume can use the full
performance and queue of the FlashArray, this limit is impractically high and this sense code essentially will never be
issued. Therefore, there is no reason to set or alter these values for Pure Storage FlashArray volumes because QUEUE
FULL will never occur.
vSphere offers a feature called Storage I/O Control (SIOC) that throttles virtual machine I/O when datastore latency
crosses a configured threshold. When a latency threshold is entered, vCenter will aggregate a weighted average of all
disk latencies seen by all hosts that see that particular datastore. This number does not include host-side queuing; it is
only the time it takes for the I/O to be sent from the host across the SAN to the array and acknowledged back.
Knowing these factors, we can make these points about SIOC and the FlashArray:
1. SIOC is not going to be particularly helpful if there is host-side queueing since it does not take host-induced
additional latency into account. This (the ESXi device queue) is generally where most of the latency is introduced
in a FlashArray environment.
2. The FlashArray will rarely have sustained latency above 1 ms, so this threshold will not be reached for any
meaningful amount of time on a FlashArray volume, and SIOC will never kick in.
3. A single FlashArray volume does not have a queue limit, so it can handle quite a high number of outstanding I/O
and throughput (especially reads), therefore SIOC and its random-read injector cannot identify FlashArray limits in
meaningful ways.
In short, SIOC is fully supported by Pure Storage, but Pure Storage makes no specific recommendations for
configuration.
Storage DRS
VMware vCenter also offers a feature called Storage DRS (SDRS). SDRS moves virtual machines from one datastore to
another when a certain average latency threshold has been reached on the datastore or when a certain used capacity has
been reached. For this section, let's focus on the performance-based moves.
Storage DRS, like Storage I/O Control, waits for a certain latency threshold to be reached before it acts. And, also like
SIOC, the minimum is 5 ms.
While this threshold is generally too high to be useful for FlashArray-induced latency, SDRS differs from SIOC in the
latency it actually looks at. SDRS uses the "VMObservedLatency" metric (referred to as GAVG in esxtop), averaged from
the hosts accessing the datastore. Therefore, this latency includes time spent queueing in the ESXi kernel. So,
theoretically, with a high-IOPS workload and a low configured device queue depth limit, an I/O could conceivably spend
5 ms or more queuing in the kernel. In this situation Storage DRS will suggest moving a virtual machine to a datastore
which does not have an overwhelmed queue. That being said, this is unlikely for two reasons:
1. The FlashArray empties out the queue fast enough that a workload must be quite intense to fill up an ESXi queue
so much that it spends 5 ms or more in it. Usually, with a workload like that, the queueing is higher up the stack (in
the virtual machine).
2. Storage DRS samples for 16 hours before it makes a recommendation, so typically you will get one
recommendation set per day for a datastore. So this workload must be consistently and extremely high, for a long
time, before SDRS acts.
This is one of the reasons thin virtual disks are preferred—you get better insight into how much space the guests are
actually using.
Regardless of what type you choose, ESXi is going to take the sum total of the allocated space of your virtual disks and
compare that to the total capacity of the filesystem of the volume. The used space is the sum of those virtual disks
allocations. This number increases as virtual disks grow or new ones are added, and can decrease as old ones are
deleted or moved, or even shrunk.
Compare this to what the FlashArray reports for capacity. What the FlashArray reports for a volume is NOT the
amount used for that volume; it is the unique footprint of the volume on that array. Let's look at a VMFS that is on a
512 GB FlashArray volume. The VMFS is therefore 512 GB, but is using 401 GB of space on the filesystem. This means
that there are 401 GB of allocated virtual disks:
This metric can change at any time as the data set changes on that volume or any other volume on the FlashArray.
If, for instance, some other host writes 2 GB to another volume (let's call it "volume2"), and that 2 GB happens to be
identical to 2 GB of that 80 GB on "Vol04", then "Vol04" would no longer have 80 GB of unique space. It would drop
down to 78 GB, even though nothing changed on "Vol04" itself. Instead, someone else just happened to write similar
data, making the footprint of "InfrastructureDS" less unique.
For a more detailed conversation around this, refer to this blog post:
http://www.codyhosterman.com/2017/01/vmfs-capacity-monitoring-in-a-data-reducing-world/
So, why doesn't VMFS report the same used capacity that the FlashArray reports as used for the underlying
volume? Because they mean different things. VMware reports what is allocated on the VMFS, and the FlashArray
reports what is unique on the underlying volume. The FlashArray value can change constantly. The FlashArray metric is
only meant to show how reducible the data on that volume is, both internal to the volume and against the entire array.
Conversely, VMFS capacity usage is based solely on how much capacity is allocated to it by virtual machines. The
FlashArray volume space metric, on the other hand, also relates to what is being used on other volumes. In
other words, VMFS usage is only affected by data on the VMFS volume itself, while the FlashArray volume space metric
is affected by the data on the volume and also by the data on all other volumes. The two values should not be conflated.
For capacity tracking, you should refer to the VMFS usage. How do we best track VMFS usage? What do we do when it
is full?
In general, using a product like vRealize Operations Manager with the FlashArray Management Pack is a great option
here. But for the purposes of this document we will focus on what can be done inside of vCenter alone.
The first question is the easiest to answer. Choose either a percentage full, or at a certain capacity free. Do you want to
do something when, for example, a VMFS volume hits 75% full or when there is less than 50 GB left free? Choose what
makes sense to you.
Configuring a script to run, an email to be issued, or a notification trap to be sent greatly diminishes the chance of a
datastore running out of space unnoticed.
The next step is to decide what happens when a capacity warning occurs. There are a few options:
Your solution may be one of these options or a mix of all three. Let’s quickly walk through the options.
This is the simplest option. If capacity has crossed the threshold you have specified, increase the volume capacity to
clear the threshold. The process is:
Choose “Use ‘Free space xxx TB’ to expand the datastore”. There should be a note that the datastore already occupies
space on this volume. If this note does not appear, you have selected the wrong device to expand to. Pure Storage
highly recommends that you do not create VMFS datastores that span multiple volumes—a VMFS should have a one to
one relationship to a FlashArray volume.
Another option is to move one or more virtual machines from a more-full datastore to a less-full datastore. While this can
be manually achieved through case-by-case Storage vMotion, Pure Storage recommends leveraging Storage DRS to
automate this. Storage DRS provides, in addition to the performance-based moves discussed earlier in this document,
the ability to automatically Storage vMotion virtual machines based on capacity usage of VMFS datastores. If a
datastore reaches a certain percent full, SDRS can automatically move, or make recommendations for, virtual machines
to be moved to balance out space usage across volumes.
When a datastore cluster is created you can enable SDRS and choose capacity threshold settings, which can either be
a percentage or a capacity amount:
• Only include datastores on the same FlashArray in a given datastore cluster. This will allow Storage vMotion to use
the VAAI XCOPY offload to accelerate the migration process of virtual machines and greatly reduce the footprint of
the migration workload
• Include datastores with similar configurations in a datastore cluster. For example, if a datastore is replicated on the
FlashArray, only include datastores that are replicated in the same FlashArray protection group so that a SDRS
migration does not violate required protection for a virtual machine
The last option is to create an entirely new VMFS volume. You might decide to do this for a few reasons:
1. The current VMFS volumes have maxed out possible capacity (64 TB each)
2. The current VMFS volumes have overloaded the queue depth inside of every ESXi server using it. Therefore, they
can be grown in capacity, but cannot provide any more performance due to ESXi limits
In this situation follow the standard VMFS provisioning steps for a new datastore. Once the creation of volumes and
hosts/host groups and the volume connection is complete, the volumes will be accessible to the ESXi host(s)
[Presuming SAN zoning is completed]. Using the vSphere Web Client, initiate a “Rescan Storage…” to make the newly-
connected Pure Storage volume(s) fully-visible to the ESXi servers in the cluster as shown above. One can then use the
“Add Storage” wizard to format the newly added volume.
Shrinking a Volume
While it is possible to shrink a FlashArray volume non-disruptively, vSphere does not have the ability to shrink a VMFS
partition. Therefore, do not shrink FlashArray volumes that contain VMFS datastores as doing so could incur data
loss.
When a FlashArray snapshot is taken, a new volume is not created—essentially it is a metadata point-in-time reference
to the data blocks on the array that reflect that moment's version of the data. This snapshot is immutable and cannot be
directly mounted. Instead, the metadata of a snapshot has to be “copied” to an actual volume which then allows the
point-in-time, which was preserved by the snapshot metadata, to be presented to a host. This behavior allows the
snapshot to be re-used again and again without changing the data in that snapshot. If a snapshot is not needed more
than one time an alternative option is to create a direct snap copy from one volume to another—merging the snapshot
creation step with the association step.
When a volume hosting a VMFS datastore is copied via array-based snapshots, the copied VMFS datastore is now on a
volume that has a different serial number than the original source volume. Therefore, the VMFS will be reported as
having an invalid signature since the VMFS datastore signature is a hash partially based on the serial of the hosting
device. Consequently, the device will not be automatically mounted upon rescan—instead the new datastore wizard
needs to be run to find the device and resignature the VMFS datastore. Pure Storage recommends resignaturing copied
volumes rather than mounting them with an existing signature (referred to as force mounting).
BEST PRACTICE: Resignature copied VMFS volumes and do not force mount them.
For more detail on resignaturing and snapshot management, please refer to the following blog posts:
Deleting a Datastore
Prior to the deletion of a volume, ensure that all important data has been moved off or is no longer needed. From the
vSphere Web Client (or CLI) delete or unmount the VMFS volume and then detach the underlying device from the
appropriate host(s).
After a volume has been detached from the ESXi host(s) it must first be disconnected (from the FlashArray perspective)
from the host within the Purity GUI before it can be destroyed (deleted) on the FlashArray.
BEST PRACTICE: Unmount and detach FlashArray volumes from all ESXi hosts before destroying them on
the array
1. Unmount the VMFS datastore from every ESXi host that has it mounted
2. Detach the volume that hosted the datastore from every ESXi host that sees the volume
3. Disconnect the volume from the hosts or host groups on the FlashArray
By default a volume can be recovered after deletion for 24 hours to protect against accidental removal.
This entire removal and deletion process is automated through the Pure Storage Plugin for the vSphere Web Client and
its use is therefore recommended.
As always, configure guest operating systems in accordance with the corresponding vendor installation guidelines.
1. Thin—thin virtual disks only allocate what is used by the guest. Upon creation, thin virtual disks only consume one
block of space. As the guest writes data, new blocks are allocated on VMFS, then zeroed out, then the data is
committed to storage. Therefore there is some additional latency for new writes.
2. Zeroedthick (lazy)— zeroed thick virtual disks allocate all of the space on the VMFS upon creation. As soon as the
guest writes to a specific block for the first time in the virtual disk, the block is first zeroed, then the data is
committed. Therefore there is some additional latency for new writes. Though less than thin (since it only has to
zero—not also allocate), there is a negligible performance impact between zeroedthick (lazy) and thin.
3. Eagerzeroedthick—eagerzeroedthick virtual disks allocate all of their provisioned size upon creation and also zero
out the entire capacity upon creation. This type of disk cannot be used until the zeroing is complete.
Eagerzeroedthick has zero first-write latency penalty because allocation and zeroing is done in advance, and not
on-demand.
Prior to WRITE SAME support, the performance differences between these virtual disk allocation mechanisms were
distinct. This was due to the fact that before an unallocated block could be written to, zeroes would have to be written
first causing an allocate-on-first-write penalty (increased latency). Therefore, for every new block written, there were
actually two writes; the zeroes then the actual data. For thin and zeroedthick virtual disks, this zeroing was on-demand
so the penalty was seen by applications. For eagerzeroedthick, it was noticed during deployment because the entire
virtual disk had to be zeroed prior to use. This zeroing caused unnecessary I/O on the SAN fabric, subtracting available
bandwidth from “real” I/O.
To resolve this issue, VMware introduced WRITE SAME support. WRITE SAME is a SCSI command that tells a target
device (or array) to write a pattern (in this case, zeros) to a target location. ESXi utilizes this command to avoid having to
actually send a payload of zeros but instead simply communicates to any array that it needs to write zeros to a certain
location on a certain device. This not only reduces traffic on the SAN fabric, but also speeds up the overall process since
the zeros do not have to traverse the data path.
With this knowledge, choosing a virtual disk type comes down to a few different variables that need to be evaluated. In
general, Pure Storage makes the following recommendations:
• Lead with thin virtual disks. They offer the greatest flexibility and functionality and the performance difference is only
at issue with the most sensitive of applications.
• For highly-sensitive applications with high performance requirements, eagerzeroedthick is the best choice. It is
always the best-performing virtual disk type.
• In no situation does Pure Storage recommend the use of zeroedthick (thick provision lazy zeroed) virtual disks.
There is very little advantage to this format over the others, and it can also lead to stranded space as described in this
post.
With that being said, for more details on how these recommendations were decided upon, refer to the following
considerations. Note that at the end of each consideration is a recommendation but that recommendation is valid only
when only that specific consideration is important. When choosing a virtual disk type, take into account your virtual
machine business requirements and utilize these requirements to motivate your design decisions. Based on those
decisions, choose the virtual disk type that is best suitable for your virtual machine.
• Performance—with the introduction of WRITE SAME (more information on WRITE SAME can be found in the
section Block Zero or WRITE SAME) support, the performance difference between the different types of virtual
disks is dramatically reduced—almost eliminated. In lab experiments, a difference can be observed during writes to
unallocated portions of a thin or zeroedthick virtual disk. This difference is negligible but of course still non-zero.
Therefore, performance is no longer an overridingly important factor in the type of virtual disk to use as the disparity
is diminished, but for the most latency-sensitive of applications eagerzeroedthick will always be slightly better than
the others. Recommendation: eagerzeroedthick.
• Protection against space exhaustion—each virtual disk type, based on its architecture, has varying degrees of
protection against space exhaustion. Thin virtual disks do not reserve space on the VMFS datastore upon creation
and instead grow in 1 MB blocks as needed. Therefore, if unmonitored, as one or more thin virtual disks grow on
the datastore, they could exhaust the capacity of the VMFS, even if the underlying array has plenty of additional
capacity to provide. If careful monitoring is in place that provides the ability to proactively resolve capacity
exhaustion (moving the virtual machines around or growing the VMFS), thin virtual disks are a perfectly acceptable
choice. Storage DRS is an excellent solution for space exhaustion prevention. While careful monitoring can protect
against this possibility, it can still be a concern and should be contemplated upon initial provisioning. Zeroedthick
and eagerzeroedthick virtual disks are not susceptible to VMFS logical capacity exhaustion because the space is
reserved on the VMFS upon creation. Recommendation: eagerzeroedthick.
• Virtual disk density—it should be noted that while all virtual disk types take up the same amount of physical space
on the FlashArray due to data reduction, they have different requirements on the VMFS layer. Thin virtual disks can
be oversubscribed (more capacity provisioned than the VMFS reports as being available) allowing for far more
virtual disks to fit on a given volume than either of the thick formats. This provides a greater virtual machine to
VMFS datastore density and reduces the number or size of volumes that are required to store them. This, in effect,
reduces the management overhead of provisioning and managing additional volumes in a VMware environment.
Recommendation: thin.
• Time to create—the virtual disk types also vary in how long it takes to initially create them. Since thin and
zeroedthick virtual disks do not zero space until they are actually written to by a guest, they are both created in trivial
amounts of time—usually a second or two. Eagerzeroedthick disks, on the other hand, are pre-zeroed at creation
and therefore take longer to create. Recommendation: thin.
BEST PRACTICE: Use thin virtual disks for most virtual machines. Use eagerzeroedthick for virtual machines
that require very high performance levels.
No virtual disk option quite fits all possible use-cases perfectly, so choosing an allocation method should generally be
decided upon on a case-by-case basis. VMs that are intended for short term use, without extraordinarily high
performance requirements, fit nicely with thin virtual disks. For VMs that have higher performance needs
eagerzeroedthick is a good choice.
Virtual SCSI Adapter—the best performing and most efficient virtual SCSI adapter is the VMware Paravirtual SCSI
Adapter. This adapter has the best CPU efficiency at high workloads and provides the highest queue depths for a virtual
machine—starting at an adapter queue depth of 256 and a virtual disk queue depth of 64 (twice what the LSI Logic can
provide by default). The queue limits of PVSCSI can be further tuned, please refer to the Guest-level Settings section for
more information.
Virtual Hardware—it is recommended to use the latest virtual hardware version that the hosting ESXi host supports.
VMware tools—in general, it is advisable to install the latest supported version of VMware tools in all virtual machines.
CPU and Memory - provision vCPUs and memory as per the application requirements.
IOPS Limits—if you want to limit a virtual machine to a particular amount of IOPS, you can use the built-in ESXi IOPS
limits. ESXi allows you to specify the number of IOPS a given virtual machine can issue for a given virtual disk. Once the
virtual machine exceeds that number, any additional I/Os will be queued. In ESXi 6.0 and earlier this can be applied via
the “Edit Settings” option of a virtual machine. In ESXi 6.5 and later, this can also be configured via a VM Storage Policy.
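A hedged PowerCLI sketch for applying a per-virtual-disk IOPS limit (the VM name and the 5,000 IOPS value are examples; verify the cmdlet parameters against your PowerCLI version):
# Apply an IOPS limit to every virtual disk of a virtual machine
$vm = Get-VM -Name <VM name>
foreach ($disk in Get-HardDisk -VM $vm)
{
    Get-VMResourceConfiguration -VM $vm | Set-VMResourceConfiguration -Disk $disk -DiskLimitIOPerSecond 5000
}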
Template Configuration
In general, template configuration is no different from virtual machine configuration, and the standard recommendations
apply. That being said, since templates are by definition frequently copied, Pure Storage recommends putting copies of
the templates on FlashArrays that are frequent targets of virtual machines deployed from a template. If the template and
target datastore are on the same FlashArray, the copy process can take advantage of VAAI XCOPY, which greatly
accelerates the copy process while reducing the workload impact of the copy operation.
BEST PRACTICE: For the fastest and most efficient virtual machine deployments, place templates on the
same FlashArray as the target datastore.
The introduction of XCOPY support for virtual machine movement allows for this workload to be offloaded from the
virtualization stack to almost entirely onto the storage array. The ESXi kernel is no longer directly in the data copy path
and the storage array instead does all the work. XCOPY functions by having the ESXi host identify a region of a VMFS
that needs to be copied. ESXi describes this space into a series of XCOPY SCSI commands and sends them to the
array. The array then translates these block descriptors and copies/moves the data from the described source locations
to the described target locations. This architecture therefore does not require the moved data to be sent back and forth
between the host and array—the SAN fabric does not play a role in traversing the data. This vastly reduces the time to
move data. XCOPY benefits are leveraged during the following operations[1]:
• Deploying virtual machines from templates
• Cloning virtual machines
• Storage vMotion migrations where the source and target datastores are on the same FlashArray
During these offloaded operations, the throughput required on the data path is greatly reduced as well as the ESXi
hardware resources (HBAs, CPUs etc.) initiating the request. This frees up resources for more important virtual machine
operations by letting the ESXi resources do what they do best: run virtual machines, and lets the storage do what it does
best: manage the storage.
On the Pure Storage FlashArray, XCOPY sessions are exceptionally quick and efficient. Due to FlashReduce
technology (features like deduplication, pattern removal and compression) similar data is never stored on the FlashArray
more than once. Therefore, during a host-initiated copy operation such as with XCOPY, the FlashArray does not need to
copy the data—this would be wasteful. Instead, Purity simply accepts and acknowledges the XCOPY requests and
creates new (or in the case of Storage vMotion, redirects existing) metadata pointers. By not actually having to copy/
move data, the offload duration is greatly reduced. In effect, the XCOPY process is a 100% inline deduplicated
operation. A non-VAAI copy process for a virtual machine containing 50 GB of data can take on the order of multiple
minutes or more depending on the workload on the SAN. When XCOPY is enabled this time drops to a matter of a few
seconds.
Guest-level Settings
In general, standard operating system configuration best practices apply, and Pure Storage does not make any
overriding recommendations. Please refer to VMware and/or OS vendor documentation for the particulars of configuring
a guest operating system for best operation in a VMware virtualized environment.
That being said, Pure Storage does recommend two non-default options for file system configuration in a guest on a
virtual disk residing on a FlashArray volume. Both configurations provide automatic space reclamation support. While it
is highly recommended to follow this guidance, it is not absolutely required.
In short:
• For Linux guests in vSphere 6.5 or later using thin virtual disks, mount filesystems with the discard option
• For Windows 2012 R2 or later guests in vSphere 6.0 or later using thin virtual disks, use an NTFS allocation unit
size of 64K
Refer to the in-guest space reclamation section for a detailed description of enabling these options.
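For the Windows case, a hedged in-guest PowerShell sketch (the drive letter is an example; Linux guests would instead add the discard option to the filesystem's mount options, for example in /etc/fstab):
# Inside a Windows 2012 R2 (or later) guest: format the data volume with a 64K NTFS allocation unit size
Format-Volume -DriveLetter D -FileSystem NTFS -AllocationUnitSize 65536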
[1] Note that there are VMware-enforced caveats in certain situations that would prevent XCOPY behavior and
revert to legacy software copy. Refer to VMware documentation for this information at www.vmware.com.
In general, this change is not needed and therefore not recommended for most workloads. Only increase these values if
you know a virtual machine needs or will need this additional queue depth. Opening this queue for a virtual machine that
does not (or should not) need it, can expose noisy neighbor performance issues. If a virtual machine has a process that
unexpectedly becomes intense it can unfairly steal queue slots from other virtual machines sharing the underlying
datastore on that host. This can then cause the performance of other virtual machines to suffer.
If an application does need to push a high amount of IOPS to a single virtual disk these limits must be increased. See
VMware KB here for information on how to configure Paravirtual SCSI adapter queue limits. The process slightly differs
between Linux and Windows.
http://www.codyhosterman.com/2017/02/understanding-vmware-esxi-queuing-and-the-flasharray/
For data reduction all-flash-arrays like the FlashArray, it is of particular importance to make sure that this dead space is
reclaimed. If dead space is not reclaimed the array can inaccurately report how much space is being used, which can
lead to confusion (due to differing used space reporting from the hosts) and premature purchase of additional storage.
Therefore, reclaiming this space and making sure the FlashArray has an accurate reflection of what is actually used is
essential. An accurate space report has the following benefits:
• More efficient replication, since blocks that are no longer needed are not replicated
• More efficient snapshots, since blocks that are no longer needed are not protected by additional snapshots
• Better space usage trending: if space usage is updated frequently to be accurate, it is much easier to trend and project
actual space exhaustion. Otherwise, dead space can make it seem that capacity is used up far earlier than it should
be
The feature that can be used to reclaim space is called Space Reclamation, which uses the SCSI command called
UNMAP. UNMAP can be issued to the underlying device to inform the array that certain blocks are no longer needed by
the host and can be "reclaimed". The array can then return those blocks to the pool of free storage.
• VMFS—when an administrator deletes a virtual disk or an entire virtual machine (or moves it to another datastore)
the space that used to store that virtual disk or virtual machine is now dead on the array. The array does not know
that the space has been freed up, therefore, effectively turning those blocks into dead space.
• In-guest—when a file has been moved or deleted from a guest filesystem inside of a virtual machine on a virtual
disk, the underlying VMFS does not know that a block is no longer in use by the virtual disk and consequently
neither does the array. So that space is now also dead space.
So dead space can be accumulated in two ways. Fortunately, VMware has methods for dealing with both, that leverage
the UNMAP feature support of the FlashArray.
In vSphere 5.5 and 6.0, VMFS UNMAP is a manual process, executed on demand by an administrator. In vSphere 6.5,
VMFS UNMAP is an automatic process that gets executed by ESXi as needed without administrative intervention.
UNMAP with esxcli is an iterative process. The block count specifies how large each iteration is. If you do not specify a
block count, 200 blocks will be the default value (each block is 1 MB, so each iteration issues UNMAP to a 200 MB
section at a time). The operation runs UNMAP against the free space of the VMFS volume until the entirety of the free
space has been reclaimed. If the free space is not perfectly divisible by the block count, the block count will be reduced
at the final iteration to whatever amount of space is left.
While the FlashArray can handle very large values for this, ESXi does not support increasing the block count any larger
than 1% of the free capacity of the target VMFS volume. Consequently, the best practice for block count during UNMAP
is no greater than 1% of the free space. So as an example, if a VMFS volume has 1,048,576 MB free, the largest block
count supported is 10,485 (always round down). If you specify a larger value, the command will still be accepted, but
ESXi will override the value back down to the default of 200 blocks (200 MB), which will dramatically slow down the operation.
BEST PRACTICE: For shortest UNMAP duration, use a large block count.
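For example, a minimal esxcli invocation against a hypothetical datastore named MyDatastore with roughly 1,048,576 MB of free space might look like the following (the datastore name and block count are illustrative; size the block count at no more than 1% of the actual free space):
# Reclaim free space on the VMFS volume, 10,485 one-MB blocks per iteration
esxcli storage vmfs unmap -l MyDatastore -n 10485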
There are other methods to run or even schedule UNMAP, such as PowerCLI, vRealize Orchestrator and the FlashArray
vSphere Web Client Plugin. These methods are outside of the scope of this document, please refer to the respective
VMware and FlashArray integration documents for further detail.
If an UNMAP process seems to be slow, you can check whether the block count value was overridden by examining the
hostd.log file in the /var/log/ directory on the target ESXi host. For every UNMAP operation there will be a series of
messages that report the block count used for each iteration; look for the lines that reference the UUID of the VMFS
volume being reclaimed.
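One quick way to find these entries (using the log path noted above; adjust if the log has rotated) is:
# List UNMAP-related messages from the host management agent log
grep -i unmap /var/log/hostd.log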
From ESXi 5.5 Patch 3 and later, any UNMAP operation against a datastore that is 75% or more full will use a
block count of 200 regardless of any block count specified in the command. For more information refer to the
VMware KB article here.
Pure Storage recommends that the automatic UNMAP priority be configured to “low” and not disabled. In the initial release
of ESXi 6.5, VMware only offers a low priority; medium and high priorities have not yet been enabled in the ESXi kernel.
Pure Storage will re-evaluate this recommendation if and when these higher priorities become available.
BEST PRACTICE: Keep automatic UNMAP enabled on VMFS-6 volumes with the setting of “low”
Please note that VMFS-6 Automatic UNMAP will not be issued to inactive datastores. In other words, if a datastore does
not have actively running virtual machines on it, the datastore will be ignored. In those cases, the simplest option to
reclaim space is to run the traditional esxcli UNMAP command.
Pure Storage does support disabling automatic UNMAP if that is preferred by the customer, but to provide the most
efficient and accurate environment, it is highly recommended that it be left enabled.
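The automatic reclamation priority can be checked or set per datastore with esxcli in ESXi 6.5; the datastore name below is illustrative:
# Show the current automatic UNMAP (space reclamation) configuration for a VMFS-6 datastore
esxcli storage vmfs reclaim config get --volume-label=MyDatastore
# Set the reclamation priority to low (the recommended setting)
esxcli storage vmfs reclaim config set --volume-label=MyDatastore --reclaim-priority=low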
When a guest writes data to a file system on a virtual disk, the required capacity is allocated on the VMFS (if not already
allocated) by expanding the file that represents the virtual disk. The data is then committed down to the array. When that
data is deleted by the guest, the guest OS filesystem is cleared of the file, but this deletion is not reflected by the virtual
disk allocation on the VMFS, nor by the physical capacity on the array. To ensure that the layers below the guest
accurately report used space, in-guest UNMAP should be enabled.
Prior to ESXi 6.0 and virtual machine hardware version 11, guests could not leverage native UNMAP capabilities on a
virtual disk because ESXi virtualized the SCSI layer and did not report UNMAP capability up through to the guest. So
even if guest operating systems supported UNMAP natively, they could not issue UNMAP to a file system residing on a
virtual disk. Consequently, reclaiming this space was a manual and tedious process.
In ESXi 6.0, VMware has resolved this problem and streamlined the reclamation process. With in-guest UNMAP
support, guests running in a virtual machine using hardware version 11 can now issue UNMAP directly to virtual disks.
The process is as follows:
1. A guest application or user deletes a file from a file system residing on a thin virtual disk
2. The guest automatically (or manually) issues UNMAP to the guest file system on the virtual disk
3. The virtual disk is then shrunk in accordance with the amount of space reclaimed inside of it.
4. If EnableBlockDelete is enabled, UNMAP will then be issued to the VMFS volume for the space that the shrunken
virtual disk previously consumed, allowing the array to reclaim that space as well.
Prior to ESXi 6.0, the parameter EnableBlockDelete was a defunct option that had only been functional in very early
versions of ESXi 5.0, where it enabled or disabled automated VMFS UNMAP. In ESXi 6.0 the option is functional again and
has been re-purposed to allow in-guest UNMAP to be translated down to the VMFS and, accordingly, to the underlying SCSI volume.
By default, EnableBlockDelete is disabled and can be enabled via the vSphere Web Client or CLI utilities.
In-guest UNMAP support does not actually require this parameter to be enabled. Enabling it, however, allows for
end-to-end UNMAP; in other words, in-guest UNMAP commands are passed down to the VMFS layer as well. For this
reason, enabling this option is a best practice for ESXi 6.x and later.
BEST PRACTICE: Enable the option “VMFS3.EnableBlockDelete” on ESXi 6.x hosts. It is disabled by default.
For more information on EnableBlockDelete and VMFS-6, refer to this blog post:
https://www.codyhosterman.com/2017/08/in-guest-unmap-enableblockdelete-and-vmfs-6/
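The setting can be enabled per host with esxcli (or equivalently through the vSphere Web Client or PowerCLI), for example:
# Check the current value of EnableBlockDelete (0 = disabled, 1 = enabled)
esxcli system settings advanced list --option /VMFS3/EnableBlockDelete
# Enable EnableBlockDelete so in-guest UNMAP is translated down to the VMFS volume
esxcli system settings advanced set --option /VMFS3/EnableBlockDelete --int-value 1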
ESXi 6.5 expands in-guest UNMAP support to additional guest types. In ESXi 6.0, in-guest UNMAP is only supported with
Windows Server 2012 R2 (or Windows 8) and later; ESXi 6.5 introduces support for Linux operating systems. The
underlying reason is that ESXi 6.0 and earlier only supported SCSI version 2 for virtual disks. Windows uses SCSI-2
UNMAP and therefore could take advantage of this feature set, whereas Linux uses SCSI version 5 and could not. In ESXi
6.5, VMware enhanced its SCSI support to go up to SCSI-6, which allows guests like Linux to issue commands that they
could not before.
Using the built-in Linux tool sg_inq, you can see, through an excerpt of its response, the SCSI support difference
between the ESXi versions. Note the differences in the reported SCSI support level and also in the product revision of
the virtual disks themselves (version 1 to 2).
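For example, assuming the virtual disk is presented to the guest as /dev/sdb (an illustrative device name) and the sg3_utils package is installed:
# Query the SCSI INQUIRY data for the virtual disk; compare the reported SCSI
# version and product revision between a VM on ESXi 6.0 and one on ESXi 6.5
sg_inq /dev/sdb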
The following are the requirements for in-guest UNMAP to properly function:
1. The target virtual disk must be a thin virtual disk. Thick-type virtual disks do not support UNMAP.
2. For Windows In-Guest UNMAP:
1. ESXi 6.0 and later
2. VM Hardware version 11 and later
3. For Linux In-Guest UNMAP:
1. ESXi 6.5 and later
2. VM Hardware version 13 and later
4. If Change Block Tracking (CBT) is enabled for a virtual disk, In-Guest UNMAP for that virtual disk is only supported
starting with ESXi 6.5
https://kb.vmware.com/kb/2148989
Prior to ESXi 6.5 Patch 1, any UNMAP request that was even partially misaligned would fail entirely, leading to no
reclamation. In ESXi 6.5 P1, any portion of an UNMAP request that is aligned will be accepted and passed along to the
underlying array. Misaligned portions will be accepted but not passed down; instead, the affected blocks referred to by
the misaligned UNMAPs will be zeroed out with WRITE SAME. The benefit of this behavior on the FlashArray is that zeroing
is identical in behavior to UNMAP, so all of the space will be reclaimed regardless of misalignment.
BEST PRACTICE: Apply ESXi 6.5 Patch Release ESXi650-201703001 (2148989) as soon as possible to be
able to take full advantage of in-guest UNMAP.
NTFS supports automatic UNMAP by default—this means (assuming the underlying storage supports it) Windows will
issue UNMAP to the blocks a file used to consume immediately once it has been deleted or moved.
If DisableDeleteNotify is set to 0, UNMAP is ENABLED. Setting it to 1 DISABLES it. Pure Storage recommends that UNMAP
remain enabled (DisableDeleteNotify set to 0).
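To check or change the setting, use the standard Windows fsutil commands from an elevated prompt:
# Query the current setting (0 = UNMAP enabled, 1 = UNMAP disabled)
fsutil behavior query DisableDeleteNotify
# Re-enable automatic UNMAP (the default) if it has been disabled
fsutil behavior set DisableDeleteNotify 0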
Windows also supports manual UNMAP, which can be run on demand or on a schedule. This is performed using the Disk
Optimizer tool. Thin virtual disks can be identified in the tool as volumes with a media type of “thin provisioned
drive”; these are the volumes that support UNMAP.
Ordinarily, this would work with the default configuration of NTFS, but VMware enforces additional UNMAP alignment
requirements that call for a non-default NTFS configuration. In order to enable in-guest UNMAP in Windows for a given
NTFS volume, that volume must be formatted using a 32 or 64K allocation unit size. This forces far more Windows UNMAP
operations to be aligned with VMware's requirements.
BEST PRACTICE: Use the 32 or 64K Allocation Unit Size for NTFS to enable automatic UNMAP in a
Windows virtual machine.
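For example, a volume can be formatted with a 64K allocation unit from PowerShell (the drive letter and label are illustrative):
# Format the volume as NTFS with a 64K (65,536-byte) allocation unit size
Format-Volume -DriveLetter E -FileSystem NTFS -AllocationUnitSize 65536 -NewFileSystemLabel "Data"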
Due to these alignment issues, the manual UNMAP tool (Disk Optimizer) is not particularly effective, as most UNMAPs are
misaligned and will fail.
As of ESXi 6.5 Patch 1, all NTFS allocation unit sizes will work with in-guest UNMAP, so at this ESXi level no
allocation unit size change is required to enable this functionality. That being said, there is additional benefit to
using a 32 or 64K allocation unit: while all sizes will allow all space to be reclaimed on the FlashArray, a 32 or 64K
allocation unit will cause more UNMAP requests to be aligned, and therefore more of the underlying virtual disk will be
returned to the VMFS (more of it will be shrunk).
The manual tool, Disk Optimizer, now works quite well and can be used. If UNMAP is disabled in Windows (it is enabled
by default) this tool can be used to reclaim space on-demand or via a schedule. If automatic UNMAP is enabled, there is
generally no need to use this tool.
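For example, an on-demand retrim of a thin virtual disk can be triggered from PowerShell (the drive letter is illustrative); this is equivalent to running Disk Optimizer against that volume:
# Issue UNMAP (retrim) for the free space on the volume and show progress
Optimize-Volume -DriveLetter E -ReTrim -Verbose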
For more information on this, please read the following blog post:
http://www.codyhosterman.com/2017/03/in-guest-unmap-fix-in-esxi-6-5-patch-1/
Linux file systems do not support automatic UNMAP by default—this behavior needs to be enabled during the mount
operation of the file system. This is achieved by mounting the file system with the “discard” option.
When mounted with the discard option, Linux will issue UNMAP to the blocks a file used to consume immediately once it
has been deleted or moved.
Pure Storage does not require this feature to be enabled, but generally recommends doing so to keep capacity
information correct throughout the storage stack.
BEST PRACTICE: Mount Linux filesystems with the “discard” option to enable in-guest UNMAP for Linux-
based virtual machines.
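For example, for a hypothetical ext4 file system on /dev/sdb1 mounted at /mnt/data (device, file system type, and mount point are illustrative):
# Mount with the discard option so deletions immediately issue UNMAP
mount -t ext4 -o discard /dev/sdb1 /mnt/data
# Or make the setting persistent with an /etc/fstab entry
/dev/sdb1  /mnt/data  ext4  defaults,discard  0  0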
In ESXi 6.5, automatic UNMAP is supported and is able to reclaim most of the identified dead space. In general, Linux
aligns most UNMAP requests in automatic UNMAP and therefore is quite effective in reclaiming space.
The manual method, fstrim, does not align its UNMAP requests, however, and therefore fails entirely.
In ESXi 6.5 Patch 1 and later, automatic UNMAP is even more effective, now that even the small number of misaligned
UNMAPs are handled. Furthermore, the manual method via fstrim works as well. So in this ESXi version, either method
is a valid option.
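For example, to reclaim free space on demand for a mounted file system (the mount point is illustrative):
# Trim (UNMAP) all unused blocks on the file system and report how much was trimmed
fstrim -v /mnt/data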
A logical block on a FlashArray volume does not refer directly to a physical location on flash. Instead, if there is data
written to that block, there is just a reference to a metadata pointer. That pointer then refers to a physical location. If
UNMAP is executed against that block, only the metadata pointer is guaranteed to be removed. The physical data will
remain if it is deduplicated, meaning other blocks (anywhere else on the array) have metadata pointers to that data too.
A physical block is only reclaimed once the last pointer on your array to that data is removed. Therefore, UNMAP only
directly removes metadata pointers. The reclamation of physical capacity is only a possible consequential result of
UNMAP.
Herein lies the importance of UNMAP: making sure the metadata tables of the FlashArray are accurate so that space can be
reclaimed as soon as possible. Generally, some physical space will be immediately returned upon reclamation, as not
everything is dedupable. In the end, the amount of physical space reclaimed depends heavily on how dedupable the data
set is: the higher the dedupability, the lower the likelihood, amount, and immediacy of physical space being reclaimed.
The fact to remember is that UNMAP is important for the long-term “health” of space reporting and usage on the array.
In addition to using the Pure Storage vSphere Web Client Plugin, standard provisioning methods through the FlashArray
GUI or FlashArray CLI can be utilized. This section highlights the end-to-end provisioning of storage volumes on the
Pure Storage FlashArray, from creation of a volume to formatting it on an ESXi host. Management simplicity is one of the
guiding principles of the FlashArray, as just a few clicks are required to configure and provision storage to the server.
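As a rough PowerCLI sketch only (the host name, datastore name, and use of the Pure Storage NAA prefix 624a9370 to find the device are illustrative, and the FlashArray volume is assumed to already be connected to the host or host group), rescanning and formatting a newly presented volume might look like this:
# Rescan the host's storage adapters so the new FlashArray volume is visible
$esx = Get-VMHost -Name "esxi01.example.com"
Get-VMHostStorage -VMHost $esx -RescanAllHba -RescanVmfs | Out-Null
# FlashArray devices carry the Pure Storage NAA prefix 624a9370
$lun = Get-ScsiLun -VmHost $esx -LunType disk | Where-Object { $_.CanonicalName -like "naa.624a9370*" } | Select-Object -First 1
# Create a VMFS datastore on the new device
New-Datastore -VMHost $esx -Name "FlashArray-DS01" -Path $lun.CanonicalName -Vmfs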
1. Benefits of growing and shrinking dynamically: this prevents VMDK bloat as desktops rewrite and delete data.
2. Available only for Horizon View Composer based linked-clone desktops (not for persistent desktops)
1. The View Storage Accelerator (VSA) is a feature in VMware View 5.1 onwards based on VMware vSphere
Content Based Read Caching (CBRC). There are several advantages to enabling VSA, including containing
boot storms by utilizing host-side caching of commonly used blocks. It also helps the steady-state
performance of desktops that use the same applications. Because the Pure Storage FlashArray delivers a
very high number of IOPS at very low latency, this extra layer of caching at the host level is not needed.
The biggest disadvantage is the time it takes to recompose and refresh desktops, as every time the image
file is changed the disk digest file has to be rebuilt. VSA also consumes host-side memory for caching and
host CPU for building digest files. For shorter desktop recompose times, we recommend turning off VSA.
4. Tune maximum concurrent vCenter operations: the default limits for concurrent vCenter operations are defined in
the View configuration's advanced vCenter settings. These values are quite conservative and can be increased;
the Pure Storage FlashArray can comfortably withstand higher values for these operations.
The higher values will drastically cut down the amount of time needed to accomplish typical View Administrative tasks
such as recomposing or creating a new pool.
1. These settings are global and will affect all pools. Pools on other, slower disk arrays will suffer if you set these values too high.