h17386 XC Horizon Validation Guide
Validation Guide
Abstract
This validation guide describes the architecture and performance of the integration of
VMware Horizon components for virtual desktop infrastructure (VDI) on Dell EMC XC
Family devices.
Dell believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS-IS.” DELL MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH
RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE. USE, COPYING, AND DISTRIBUTION OF ANY DELL SOFTWARE DESCRIBED IN THIS PUBLICATION REQUIRES AN
APPLICABLE SOFTWARE LICENSE.
Dell Technologies, Dell, EMC, Dell EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be the property of their
respective owners. Published in the USA.
Dell EMC
Hopkinton, Massachusetts 01748-9103
1-508-435-1000 In North America 1-866-464-7381
www.DellEMC.com
Contents
Chapter 1 Introduction
  Executive summary
  Document purpose
  Audience
  We value your feedback
Chapter 4 Conclusion
  Test results and density recommendations
  Summary
Chapter 5 References
  Dell EMC documentation
  VMware documentation
  NVIDIA documentation
l Executive summary
l Document purpose
l Audience
l We value your feedback
Executive summary
Virtual desktop infrastructure (VDI) plays a crucial role in today's business transformation
initiatives. VDI is the most efficient way to present Microsoft Windows applications to users in
their digital workspaces and provides a consistent user experience across devices for the modern-
day mobile workforce. Organizations increasingly rely on VDI to provide the agility, security, and
centralized management that is so important for their workforce.
It is often challenging for organizations to set up a VDI infrastructure. This challenge is mainly
because a typical VDI infrastructure involves the integration of multiple data center components
such as storage, network, and compute. The multivendor profile of these components often
creates challenges during deployment and can also affect the system's performance if it is not
optimized for VDI.
Consistently maintaining a multicomponent, multivendor environment requires a specialized skill set and is challenging for most organizations. The effort to maintain a stable VDI infrastructure can negatively affect your total cost of ownership (TCO).
Dell EMC Ready Architectures for VDI based on Dell EMC XC Series appliances are an excellent fit for your VDI workloads. These hyperconverged appliances integrate Dell EMC PowerEdge
servers, Nutanix software, and a choice of hypervisors to run any virtualized workload you choose.
You can deploy an XC cluster in 30 minutes and manage it without specialized IT resources. XC
Series solutions eliminate the need for over-provisioning and capital expenditures that are based
on anticipated capacity and performance requirements.
System performance and capacity can be easily expanded one node at a time with zero downtime,
offering customers linear and predictable scale-out expansion and pay-as-you-grow flexibility. A
fault-tolerant architecture and self-healing capabilities provide system reliability and help ensure
data integrity. You will have an enterprise-level infrastructure with rapid deployment, less time
needed for routine management tasks, faster system restoration, and integrated enterprise class
data protection. Moreover, Dell EMC's Global Service and Support organization fully supports all
XC Series hardware, software, and deployments.
For customers who have chosen a Nutanix-based environment, Dell EMC recommends XC Family
devices that are optimized for VDI workloads to run VMware Horizon 7 VDI infrastructure. VMware
Horizon 7 has a streamlined approach to delivering and managing virtual desktops and applications,
providing a consistent user experience across devices and locations while keeping corporate data
secure and compliant. XC series appliances—the XC740xd-24 (2U) and the XC640-10 (1U)—are
designed for compute- and performance-intensive workloads in VDI. XC740xd-24 devices also
support GPU hardware for graphics-intensive desktop deployments.
The Dell EMC Ready Architectures for VDI team tests VDI solutions to ensure their validity. As part
of the testing process, engineers tune the system to maximize performance and efficiency, and
document best practices. Finally, a separate team of experts evaluates the test results to ensure
that the systems are properly configured and sized for customers. In the validation effort
described in this guide, we used Login VSI, an industry-standard tool for benchmarking VDI workloads. We tested the standard Login VSI workloads (Task worker, Knowledge worker,
Power worker, and Multimedia worker), pairing each workload with an appropriate desktop virtual machine (VM) profile (vCPU count, configured memory, and so on). This document
provides a detailed analysis based on those test results and recommends user density figures for these workloads, giving the utmost importance to the user experience.
Document purpose
This validation guide details the architecture, components, testing methods, and test results for
Dell EMC XC Family devices with VMware Horizon 7. It includes the test environment
configuration and best practices for systems that have undergone testing.
Audience
This guide is intended for architects, developers, and technical administrators of IT environments.
It provides an in-depth explanation of the testing methodology and basis for VDI densities. It also
validates the value of the Dell EMC Ready Architectures for VDI that deliver Microsoft Windows
virtual desktops to users of VMware Horizon 7 VDI components on XC Family devices.
Enterprise hardware
We used a 3-node cluster of Dell EMC XC Family XC740xd-24 devices with the component
configuration that is listed in the following table. We called this configuration "Density Optimized."
It comes with 2nd Generation Intel Xeon Scalable processors, code named Cascade Lake.
We used Dell EMC XC740xd-24 appliances to deliver performance while driving savings on power,
cooling, and data center space. These XC devices are designed to handle VDI workloads and
reduce Operational Expenditure (OPEX).
Platform | CPU | Memory | Storage controller | Drives | Network
XC740xd-24 | 2 x Intel Xeon Gold 6248 (20-core, 2.6 GHz) | 768 GB @ 2933 MT/s | HBA330 ADP | 2 x 240 GB M.2, 2 x 960 GB SSD, 6 x 1.2 TB HDD | 2 x Mellanox ConnectX-4 LX 25 GbE SFP rack NDC
Dell EMC XC740xd-24 devices are one of the most versatile and scalable hyperconverged
infrastructure platforms. They are purpose-built for performance-intensive VDI workloads and you
can use them to scale incrementally to match VDI requirements in a pay-as-you-grow model.
The Dell EMC XC740xd-24 device is a 2U platform that can be configured with 24 x 2.5-inch disks
to serve a broad range of capacity requirements. Each one comes equipped with dual CPUs, 10 to
28 cores, and up to 1.5 TB of high performance RAM. They are VDI optimized and support GPU
hardware for graphics-intensive desktop deployments. The XC740xd-24 can be configured with or
without GPUs.
Storage hardware
We used the following storage configuration for different storage tiers.
Storage hardware used per cluster node:
l 2 x Boot Optimized Storage Solution (BOSS) M.2 SATA Device for the host OS
l 2 x 960 GB SATA SSD for the Performance tier
l 6 x 1.2 TB NL SAS for the Capacity tier
The M.2-based BOSS module boots the hypervisor and the Nutanix Controller VM (CVM). PERC
HBA330 connects the CVM to the SSDs and HDDs. All HDD and SSD disks are presented to the
Nutanix CVM running locally on each host, which contributes to the clustered DSF storage pool.
Two SSDs were used for the Performance tier (Tier1) and six HDDs were used for the Capacity
tier (Tier2), per node.
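As a rough illustration of how these tiers add up, the following Python sketch computes the raw capacity that each node contributes to the DSF storage pool. The figures are raw (pre-replication); usable capacity is lower once the Nutanix replication factor and system overhead are applied.

```python
# Raw per-node and per-cluster capacity contributed to the DSF storage pool.
# Figures are raw (pre-replication); usable capacity depends on the Nutanix
# replication factor and system overhead.

SSD_COUNT, SSD_TB = 2, 0.96     # Performance tier (Tier1): 2 x 960 GB SATA SSD
HDD_COUNT, HDD_TB = 6, 1.2      # Capacity tier (Tier2): 6 x 1.2 TB NL-SAS HDD
NODES = 3                       # 3-node XC740xd-24 cluster used in this testing

tier1_per_node = SSD_COUNT * SSD_TB
tier2_per_node = HDD_COUNT * HDD_TB

print(f"Tier1 (SSD) per node : {tier1_per_node:.2f} TB")
print(f"Tier2 (HDD) per node : {tier2_per_node:.2f} TB")
print(f"Raw pool per node    : {tier1_per_node + tier2_per_node:.2f} TB")
print(f"Raw pool per cluster : {NODES * (tier1_per_node + tier2_per_node):.2f} TB")
```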
Graphics hardware
We used NVIDIA Tesla T4 GPU hardware in our tests for graphics-intensive workloads. The T4 is a single-slot, 6-inch, PCI Express Gen3 graphics card featuring a single high-end NVIDIA T4 GPU.
Based on NVIDIA's newest architecture, the T4 is positioned as a universal GPU for data center workflows. It supports GDDR6 memory, which provides improved performance and power
efficiency compared with the previous generation, GDDR5.
The T4 draws only 70 watts and does not require a supplemental power connector. It uses the NVIDIA Turing architecture, which includes Tensor Cores for accelerating deep learning
inference workflows and provides up to 40X higher performance compared with CPUs, at 60 percent of the power consumption. You can add up to six GPU cards to a Dell EMC XC740xd-24
device for up to 96 GB of frame buffer. In modernized data centers, you can use this card during off-peak hours to run inferencing workloads.
Network hardware
We used the following network hardware in our test environment:
l Dell Networking S4048 (10 GbE ToR switch)—A high-density, ultralow-latency ToR switch
that features 48 x 10 GbE SFP+, 6 x 40 GbE ports and up to 720 Gbps switch fabric capacity.
l Dell Networking S5248 (25 GbE ToR switch)—A high-density, high-performance, open networking ToR switch that features 48 x 25 GbE SFP28 ports, 4 x 100 GbE QSFP28 ports, 2 x 100
GbE QSFP28-DD ports, and up to 2.0 Tbps of switch fabric capacity.
Component | Description/Version
Management VM operating system | Microsoft Windows Server 2016 (Connection Server and DB)
Table 4 Sizing for XC Family devices, Remote Desktop Session Host (RDSH), and NVIDIA GRID license server (optional)
Role | vCPUs | Memory (GB) | vNICs | OS vDisk (GB)
Nutanix CVM | 12 | 32 | 2 | 0
RDSH VM | 8 | 32 | 1 | 80
We installed the GRID License Server software on a system running a Windows 2016 operating
system to test vGPU configurations.
We made the following changes to the GRID License Server to address licensing requirements:
l Used a reserved fixed IP address
l Configured a single MAC address
l Applied time synchronization to all hosts on the same network
DNS
DNS is the basis for Microsoft Active Directory and also controls access to various software
components for VMware services. All hosts, VMs, and consumable software components must
have a presence in DNS. We used a dynamic namespace integrated with Active Directory and
adhered to Microsoft best practices.
High availability
Although we did not enable high availability (HA) during the validation that is documented in this
guide, we strongly recommend that HA be factored into any VDI design and deployment. This
process follows the N+1 model with redundancy at both the hardware and software layers. The
design guide for this architecture provides additional recommendations for HA and is available at
the VDI Info Hub for Ready Solutions.
l Testing process
l Login VSI test analysis and results
Testing process
To ensure a good end-user experience (EUE) and cost-per-user, we conducted performance analysis and characterization (PAAC) testing on this solution using Login VSI, a load-generation
tool that monitors both hardware resource utilization parameters and EUE during load testing.
For each user scenario, we ran the tests four times, once to validate data capture and three times
to collect metrics and analyze variance.
Our EUE validation consisted of logging in to a session while the system was under the load created by the Login VSI tool and completing tasks from the workload definition. Although this
test is subjective, it provides a better understanding of the EUE in the desktop sessions, particularly under high load. It also helps to ensure reliable data gathering.
Resource monitoring
To ensure that the user experience was not compromised, we monitored the following important
resources:
l Compute host server resources—VMware vCenter (for solutions based on VMware vSphere) or Microsoft Performance Monitor (for solutions based on Hyper-V) gathered key data
(CPU, memory, disk, and network usage) from each compute host during each test run. This data was collected for each host and consolidated for reporting. We do not report any
metrics for the management host servers; however, they were monitored manually during testing to ensure that no bottlenecks affected the test.
l Utilization thresholds—Resource overutilization can cause poor EUE. We monitored the
relevant resource utilization parameters and compared them to relatively conservative
thresholds. The thresholds were selected based on industry best practices and our experience
to provide an optimal trade-off between good EUE and cost-per-user while also allowing
sufficient burst capacity for seasonal or intermittent spikes in demand.
Table 5 Parameter pass/fail thresholds
a. The Ready Solutions for VDI team recommends that steady-state average CPU utilization across the three hosts in a cluster not exceed 85 percent in a production environment.
Because of the nature of automated testing tools such as Login VSI, average CPU utilization sometimes exceeds our recommended percentage; we accept a 5 percent margin of error,
and it does not affect our sizing guidance.
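The thresholds discussed in this guide (85 percent steady-state CPU with the 5 percent Login VSI margin, 85 percent of available network bandwidth, and fewer than 2 percent failed sessions) can be expressed as a simple check, as in the following sketch. The sketch covers only those three parameters; other monitored criteria, such as memory ballooning and swapping, are not modeled.

```python
# Minimal pass/fail check against the utilization thresholds discussed in this
# guide: 85% steady-state CPU (plus a 5% Login VSI margin), 85% of available
# network bandwidth, and fewer than 2% failed sessions.

def evaluate_run(cpu_avg_pct, net_avg_mbps, net_capacity_mbps,
                 failed_sessions, total_sessions):
    results = {
        "cpu": cpu_avg_pct <= 85.0 + 5.0,                      # 85% + 5% margin
        "network": net_avg_mbps <= 0.85 * net_capacity_mbps,   # 85% of uplink
        "sessions": failed_sessions < 0.02 * total_sessions,   # < 2% failures
    }
    results["overall"] = all(results.values())
    return results

# Example: knowledge worker steady state with a 2 x 25 GbE uplink per host.
print(evaluate_run(cpu_avg_pct=85.96, net_avg_mbps=841.48,
                   net_capacity_mbps=2 * 25_000,
                   failed_sessions=0, total_sessions=405))
```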
Load generation
Login VSI from Login VSI, Inc. is the industry-standard tool for testing VDI environments and
RDSH environments.
Login VSI installs a standard collection of desktop application software (including Microsoft Office
and Adobe Acrobat Reader) on each VDI desktop testing instance. It then uses a configurable
launcher system to connect a specified number of simulated users to available desktops within the
environment. When the simulated user is connected, a logon script configures the user
environment and starts a defined workload. Each launcher system can launch connections to
several VDI desktops (target machines). A centralized management console configures and
manages the launchers and the Login VSI environment.
We used the following login and boot conditions:
l For most of our tests, new user sessions were logged in at a steady rate over a 1-hour period. During tests of low-density solutions, such as GPU-enabled and graphics-based configurations,
users were logged in every 10 seconds. (See the sketch after this list for the login interval that these rates imply.)
l All desktops were started before users logged in.
l All desktops ran an industry-standard anti-virus solution. Windows 10 machines used Windows
Defender.
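The login cadence translates into a simple per-session launch interval, assuming sessions are spread evenly across the ramp window:

```python
# Approximate interval between session launches for a steady login ramp.
# Assumes sessions are distributed evenly across the ramp window.

def login_interval_seconds(total_sessions, ramp_minutes=60):
    return ramp_minutes * 60 / total_sessions

# Standard tests: all sessions launched over a 1-hour period.
print(f"{login_interval_seconds(405):.1f} s between logins for 405 sessions")
print(f"{login_interval_seconds(318):.1f} s between logins for 318 sessions")

# Low-density (GPU/graphics) tests used a fixed 10-second interval instead.
```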
The following table summarizes the Login VSI workloads that were tested in this validation effort.
For more information, see Login VSI.
Login VSI Knowledge worker: Designed for virtual machines with 2 vCPUs. This workload includes the following activities:
l Outlook—Browse messages.
l Internet Explorer—Browse websites and open a YouTube style video (480p movie trailer)
three times in every loop.
l Word—Start one instance to measure response time and another to review and edit a
document.
l Doro PDF Printer and Acrobat Reader—Print a Word document and export it to PDF.
l Excel—Open a large randomized sheet.
l PowerPoint—Review and edit a presentation.
l FreeMind—Run a Java-based Mind Mapping application.
l Other—Perform various copy and zip actions.
Login VSI Power worker: The most intensive of the standard Login VSI workloads. The following activities are performed with this workload:
l Begin by opening four instances of Internet Explorer and two instances of Adobe Reader
that remain open throughout the workload.
l Perform more PDF printer actions than in the other workloads.
l Watch a 720p and a 1080p video.
l Reduce the idle time to two minutes.
l Perform various copy and zip actions.
Login VSI Multimedia worker (Graphics performance configuration): A workload that is designed to heavily stress the CPU when using software graphics acceleration. GPU-accelerated
computing offloads the most compute-intensive sections of an application to the GPU while the CPU processes the remaining code. This modified workload uses the following applications
for its GPU/CPU-intensive operations:
l Adobe Acrobat
l Google Chrome
l Google Earth
l Microsoft Excel
l HTML5 3D spinning balls
l Internet Explorer
l MP3
l Microsoft Outlook
l Microsoft PowerPoint
l Microsoft Word
l Streaming video
Login VSI Task worker: The least intensive of the standard workloads. It runs fewer applications and starts and stops them less frequently than the other workloads, resulting in lower
CPU, RAM, and I/O usage. The Task worker workload uses the following applications:
l Adobe Reader
l Microsoft Excel
l Internet Explorer
l Microsoft Outlook
l Microsoft Word
l Print and zip actions using Notepad and 7-zip
l For instant clones, the VMs are rebooted after the session ends because when a user logs out
of the instant clone, the clone is destroyed and re-created for the next user. CPU utilization
gradually increases during the login phase when users start logging in and then remains
constant during the steady state phase when logins have been completed, as shown in the
following figure. When the steady state period is over and users start to log off, CPU utilization
again decreases and drops to near zero when all users have logged off. After user logoff, the
instant clone pool is re-created. During this phase, there is a CPU spike which then drops to
near zero when pool re-creation is complete.
We conducted all tests on a three-node XC740xd-24 cluster using the Density Optimized configuration, details of which are described in the following table.
We deployed Horizon and vSphere management roles within the cluster on a single host that also
hosted desktops. This optional configuration is beneficial for Proof of Concepts (POCs) or small
deployments looking to maximize user density.
We allocated 12 vCPUs and 32 GB of memory to the Nutanix CVM when we configured the
Nutanix cluster. A Nutanix CVM runs on each node of the Nutanix cluster, enabling the pooling of
local storage from all nodes in the cluster.
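The following sketch illustrates the per-host budget that remains for desktop VMs after the CVM reservation, assuming the Density Optimized host (2 x 20 physical cores, 768 GB of RAM) and the CVM sizing from Table 4. The 2 vCPU desktop profile matches the Login VSI knowledge worker definition, but the 4 GB memory figure is a hypothetical value used only for illustration; hypervisor overhead is not modeled.

```python
# Per-host resources left for desktop VMs after the Nutanix CVM reservation.
# Assumes the Density Optimized XC740xd-24 host (2 x 20 physical cores, 768 GB)
# and the CVM sizing from Table 4; hypervisor overhead is not modeled.

PHYSICAL_CORES = 2 * 20          # 2 x Intel Xeon Gold 6248
HOST_MEMORY_GB = 768
CVM_VCPUS, CVM_MEMORY_GB = 12, 32

def desktop_budget(users_per_host, vcpus_per_desktop, mem_per_desktop_gb):
    desktop_vcpus = users_per_host * vcpus_per_desktop
    desktop_mem = users_per_host * mem_per_desktop_gb
    return {
        "vCPU oversubscription": round(
            (desktop_vcpus + CVM_VCPUS) / PHYSICAL_CORES, 2),
        "memory allocated (GB)": desktop_mem + CVM_MEMORY_GB,
        "memory headroom (GB)": HOST_MEMORY_GB - desktop_mem - CVM_MEMORY_GB,
    }

# Example: 135 desktops with a hypothetical 2 vCPU / 4 GB knowledge worker profile.
print(desktop_budget(users_per_host=135, vcpus_per_desktop=2, mem_per_desktop_gb=4))
```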
Server configuration | Workload | Login VSI workload | Display protocol | User density per host | Avg CPU (steady state) | Avg active memory | Avg IOPS per user
Density Optimized | Knowledge worker | Login VSI Knowledge worker | PCOIP | 135 | 85.96% | 170 GB | 2.65
Density Optimized | Power worker | Login VSI Power worker | PCOIP | 106 | 86% | 206.74 GB | 3.07
Density Optimized + 6 x NVIDIA Tesla T4 | Graphics Power worker (Virtual PC: T4-1B) | Login VSI Power worker | Blast Extreme | 96 | 95.57% | 425 GB | 2.75
Density Optimized | RDSH Task worker (Horizon Apps RDSH/Published Desktop) | Login VSI Task worker | Blast Extreme | 233 | 88.87% | 129 GB | 0.23
Knowledge Worker, 135 users per host, ESXi 6.7, Horizon 7.7
The following metrics were collected and analyzed for this test case.
CPU
The graph shows the performance data for 405 user sessions across three compute hosts when
tested with a Login VSI knowledge worker workload. Each compute host had approximately 135
virtual desktops. We used the PC-over-IP (PCOIP) display protocol for this knowledge worker
testing.
During the login phase, CPU utilization increased steadily until all logins were complete. Compute
host C recorded a peak CPU utilization of 98.21 percent during the login phase. During the steady
state phase, the CPU utilization reached a steady state average of 85.96 percent across all three
compute hosts. This value is close to the pass/fail threshold we set for average CPU utilization
(see Table 5). However, it did not exceed the threshold limit. To maintain a good EUE, it is essential that this threshold is not exceeded. You can load more user sessions by exceeding this
threshold, but this might result in a degradation in user experience.
As shown in the following figure, CPU utilization started decreasing after the steady state phase
when users began logging out of sessions. CPU utilization reached near zero when all users had
logged out. CPU utilization spiked during the instant clone re-creation phase after user log out.
CPU utilization on compute B reached a peak of 91.4 percent during this phase.
Memory
As shown in the following figure, an average consumed memory of 561 GB was recorded before
the testing started. This was because all VMs were already powered on before the loading of user
sessions. Memory consumption remained almost constant during the login phase. During the
steady state phase, consumed memory reached an average of 582 GB across the three hosts.
With a total memory of 768 GB available per compute host, memory was not a constraint during
the testing.
Figure 5 Consumed memory utilization on three hosts in the cluster
Active memory usage increased steadily during the login phase. Each host occupied about 58 GB
of active memory during the start of the test. This includes memory used by desktop VMs that
were powered on before the test and the overhead memory used by the hypervisor. During the
steady state phase, active memory remained almost constant and an average steady active
memory of 170 GB was recorded. This implies that during the steady state phase, memory was not
a concern and there was enough memory available in the ESXi host cluster to meet requirements.
Active memory utilization reduced to a minimum when users logged out of their sessions. During
the re-creation of instant clones, a peak average active memory of 502.7 GB was recorded across
all three hosts. This peak in active memory usage is expected during the instant clone re-creation
process as all VMs that were destroyed after user log out must be re-created—this is a memory
intensive task. No memory ballooning or swapping occurred on any of the hosts during the testing
process, indicating no memory constraints in the cluster.
Figure 6 Active memory utilization on three hosts in the cluster
Network usage
Network bandwidth was not an issue during testing. An average network usage of 841.48 Mbps
was recorded across the three compute hosts during the steady state operations. The busiest
period for network usage was during the re-creating of instant clones after user log out. Compute
B recorded a peak network usage of 2946 Mbps during the re-creation of instant clones. With 2 x
25 GbE NICs in an active/active team available as an uplink for hosts, network bandwidth usage
was well below the 85 percent threshold set for network throughput.
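To put these figures in context, the following sketch compares the peak observed network usage with the 85 percent threshold, assuming the full teamed bandwidth of the 2 x 25 GbE uplink is usable.

```python
# Peak network usage as a fraction of the 2 x 25 GbE uplink, compared with the
# 85% threshold. Assumes the full teamed bandwidth (50 Gbps) is usable.

UPLINK_MBPS = 2 * 25_000
peak_mbps = 2946                 # Compute host B, instant clone re-creation
steady_mbps = 841.48             # Average across hosts during steady state

for label, value in (("peak", peak_mbps), ("steady state", steady_mbps)):
    print(f"{label}: {value / UPLINK_MBPS * 100:.1f}% of uplink (threshold: 85%)")
```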
IOPS
Cluster IOPS reached a peak of 7,956 during the instant clone re-creation process. Average cluster disk IOPS during the steady state phase was 1,073. Based on these numbers, the average
disk IOPS per session during this phase was 2.65. You can select your disk specifications in
accordance with this IOPS figure in your sizing exercise. As shown in Figure 10, I/O latency during
the steady state phase was 0.37 ms for the steady state IOPS requirement. The low latency figure
indicates that storage was not a bottleneck during steady state operations.
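The per-session figure is simply the steady-state cluster IOPS divided by the number of concurrent sessions, as the following sketch shows. The same arithmetic can be reversed for a rough estimate of cluster IOPS at a different planned density; the 500-session example is hypothetical.

```python
# Steady-state disk IOPS per session = cluster IOPS / concurrent sessions.

def iops_per_session(cluster_iops, sessions):
    return cluster_iops / sessions

# Knowledge worker test: 1,073 steady-state cluster IOPS across 405 sessions.
per_session = iops_per_session(1073, 405)
print(f"{per_session:.2f} IOPS per session")

# Reverse the calculation for a hypothetical planned density of 500 sessions
# at the same per-session rate.
planned_sessions = 500
print(f"~{planned_sessions * per_session:.0f} cluster IOPS for {planned_sessions} sessions")
```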
Figure 8 Cluster IOPS utilization
Cluster controller IOPS reached a peak of 40,306 during the re-creation of instant clones. Cluster controller IOPS recorded a steady state average of 4,495. These are metrics taken directly from
the Nutanix Controller VMs and give an indication of backend operations taking place in the
storage system. The IOPS metric also includes cache hits served by the memory.
Figure 9 Cluster controller IOPS utilization
We also noted that there were no failed sessions during testing, which indicates that the login and
log out processes were smooth. When manually interacting with the sessions during the steady
state phase, the mouse and window movement was responsive and video playback was good.
Moreover, all parameters we monitored were within the pass/fail threshold described in Table 5.
This indicates that there were no resource constraints on the system and system performance was
good.
The following table explains the Login VSI metrics.
The following table shows the Login VSI score summary for the knowledge worker workload.
Table 12 Login VSI score summary for the knowledge worker workload
Power Worker, 106 users per host, ESXi 6.7, Horizon 7.7
The following metrics were collected and analyzed for this test case.
CPU usage
The following graph shows the performance data for 318 user sessions across three compute hosts
when tested with a power worker workload. Each compute host had 106 virtual desktops. We used
the PC-over-IP (PCOIP) display protocol for this power worker testing.
During the login phase, CPU utilization increased steadily until all logins were complete. During the
steady state phase, the CPU utilization reached a steady state average of 86 percent across all
three compute hosts. This value is close to the pass/fail threshold we set for average CPU
utilization.
However, we found that it did not exceed the threshold limits we set, which include a 5 percent margin (see Table 5). To maintain a good EUE, it is essential that this threshold is not exceeded.
You can load more user sessions by exceeding this CPU threshold, but this might result in a degradation in user experience. As shown in the following figure, CPU utilization started
decreasing after the steady state phase when users started logging out of sessions.
CPU utilization reached near zero when all users had logged out. CPU utilization spiked during the instant clone re-creation phase after user logout, reaching almost 100 percent on all
hosts. Because the turbo feature is enabled on the CPUs, this spike is not considered an issue. During the clone re-creation phase, all VMs that users have logged out of are deleted and
re-created. Instant clones are created by forking a parent VM, and the process involves allocating new resources for the child VM, which makes this activity resource-intensive.
Figure 12 CPU utilization on three hosts in the cluster
Memory
As shown in the following figure, an average consumed memory of 387.35 GB was recorded before
the testing started. This was because all VMs were already powered on before the loading of user
sessions. Memory consumption remained almost constant during the login phase. During the
steady state phase, consumed memory reached an average of 738.54 GB across the three hosts.
With a total memory of 768 GB available per compute host, memory was not a constraint during
the testing.
Figure 13 Consumed memory utilization on three hosts in the cluster
Active memory usage increased steadily during the login phase. Each host occupied approximately
77 GB of active memory during the start of the test. This includes memory used by desktop VMs,
which were powered on before the test and the overhead memory used by the hypervisor. During
the steady state phase, active memory remained almost constant and an average steady active
memory of 206.74 GB was recorded. This indicates that memory was not a concern during the
steady state phase and there was enough memory available in the ESXi host cluster to meet
requirements. Active memory utilization was reduced to a minimum when users logged out of their
sessions. During the re-creation of instant clones, a peak average active memory of 802.71 GB was
recorded across all three hosts. This peak in active memory usage is expected during the instant
clone re-creation process. During this process, all VMs that had been destroyed after users logged
off have to be re-created, which is a memory intensive task. No memory ballooning or swapping
occurred on any of the hosts during the testing process, indicating no memory constraints in the
cluster.
Figure 14 Active memory utilization on three hosts in the cluster
Network usage
Network bandwidth was not an issue during testing. An average network usage of 1031.61 Mbps
was recorded across the three compute hosts during the steady state operations. The busiest
period for network usage was during the re-creation of instant clones after users logged out.
Compute A recorded a peak network usage of 3668.65 Mbps during the re-creation of instant
clones. With two 25 GbE NICs in an active/active team available as an uplink for hosts, network
bandwidth usage was well under the 85 percent threshold set for network throughput.
IOPS
Cluster IOPS reached a peak of 12,522 during the instant clone re-creation process. Average
cluster disk IOPS during the steady state phase was 976. Based on these numbers, the average
disk IOPS per session during the steady state phase was 3.07. You can select your disk
specifications in accordance with this IOPS figure in your sizing exercise. As shown in the following
figure, I/O latency during the steady state phase was 0.37 ms for the steady state IOPS
requirement. The low latency figure indicates that during steady state operations, storage
resources were not a bottleneck.
Figure 16 Cluster IOPS utilization
Cluster controller IOPS reached a peak of 38,409 during the re-creation of instant clones. Cluster controller IOPS recorded a steady state average of 4,490. These metrics collected from
Nutanix Controller VMs give an indication of backend operations taking place in the storage
system. This IOPS metric also includes cache hits served by the memory.
Figure 17 Cluster controller IOPS utilization
We also noted that there was only one failed session during testing, which indicates that the login
and log out processes were smooth. When manually interacting with the sessions during the
steady state phase, the mouse and window movement was responsive and video playback was
good. Moreover, all parameters we monitored were within the pass/fail threshold set as shown in
the following table. This indicates that there were no resource constraints on the system and the
system performance was good.
The following table shows the Login VSI score summary for the power worker workload.
Table 13 Login VSI score summary for the power worker workload
Graphics Multimedia worker, 48 vGPU users per host, ESXi 6.7, Horizon 7.7
In this multimedia workload test, one of the nodes in the cluster was enabled with 48 vGPU profiles
and loaded with sessions. The other two nodes in the cluster did not host any compute VMs. We
used the VMware Horizon Blast Extreme protocol for this graphics multimedia worker testing. The
following metrics were collected and analyzed for this test case.
CPU usage
The GPU-enabled compute host in the cluster was populated with 48 vGPU-enabled VMs and used the NVIDIA Tesla T4-2B profile. With all user VMs powered on before starting the test, the CPU
usage was approximately 10 percent on the GPU-enabled compute host.
The following figure shows the CPU utilization metric data for 48 user sessions on the GPU-
enabled compute host. During the login phase, CPU utilization increased steadily until all logins
were complete. During the steady state phase, the CPU utilization reached a steady state average of 84.92 percent on the GPU-enabled compute host. This value is close to the pass/fail threshold we
set for average CPU utilization (see Table 5). However, it did not exceed the threshold limits we set. To maintain a good EUE, it is essential that this threshold is not exceeded. You can load more
user sessions by exceeding this CPU threshold, but you might experience a degradation in user experience. As shown in Figure 20, CPU utilization started decreasing after the
steady state phase when users started logging out of sessions. CPU utilization reached near zero
when all users had logged out. There was no spike in CPU utilization during the re-creation of
instant clones.
User density is also limited by the frame-buffer of GPUs. Forty-eight users with 2 GB vGPU frame
buffer profiles occupy the total 96 GB frame buffer that is provided by six NVIDIA Tesla T4 GPUs
on a server node.
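The frame-buffer limit can be derived directly from the card count and the vGPU profile size, as shown in the following sketch. It assumes the standard NVIDIA frame-buffer allocations of 1 GB for a 1B profile and 2 GB for a 2B profile, and 16 GB of frame buffer per Tesla T4.

```python
# Maximum vGPU sessions per host is bounded by the total GPU frame buffer
# divided by the frame buffer of the assigned vGPU profile (1 GB for a *-1B
# profile, 2 GB for a *-2B profile).

T4_FRAME_BUFFER_GB = 16
GPUS_PER_HOST = 6

def max_vgpu_users(profile_gb):
    return (GPUS_PER_HOST * T4_FRAME_BUFFER_GB) // profile_gb

print(f"2 GB profile: {max_vgpu_users(2)} users per host")   # multimedia worker test
print(f"1 GB profile: {max_vgpu_users(1)} users per host")   # graphics power worker test
```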
Figure 20 CPU utilization on GPU host
GPU usage
We gathered the GPU metrics from the vSphere Web Client. Six NVIDIA Tesla T4 GPUs were
configured on the GPU-enabled host. The GPU usage during the steady state phase across the six
GPUs averaged approximately 31.97 percent. GPU A had a spike of 54.66 percent GPU utilization. The GPUs were used for executing graphics-intensive tasks, thus taking load off the CPUs and
providing a better user experience.
Memory
As shown in the following figure, an active memory of 436 GB was recorded before the test
started. This was because all VMs were already powered on before user sessions were loaded.
Active memory remained constant during the login phase. With GPU enabled in the host, we noted that active memory usage at the start of the test was higher than in a comparable test on a
non-GPU host. During the steady state phase, active memory remained almost constant and recorded
an average steady state active memory of 428.24 GB. This indicates that memory was not a
concern during the steady state phase and there was enough memory available in the ESXi host
cluster to meet requirements. Active memory utilization was reduced when users logged off from
their sessions. During the re-creation of instant clones, memory again remained constant around
428 GB. No memory ballooning or swapping occurred on any of the hosts during the testing
process, indicating no memory constraints in the cluster.
As shown in the following figure, a consumed memory of 453.86 GB was recorded before the
testing started. This was because all VMs were already powered on before the loading of user
sessions. Memory consumption remained almost constant during the login phase. During the
steady state phase, consumed memory reached an average of 445.82 GB on the GPU-enabled
host. With a total memory of 768 GB available per compute host, memory was not a constraint
during the testing.
Figure 23 Consumed memory utilization on GPU host
Network usage
Network bandwidth was not an issue during testing. The GPU host recorded an average network
usage of 981.23 Mbps. The busiest period for network usage was during the logging out phase. The
host recorded peak network usage of 1335.3 Mbps. With two 25 GbE NICs in an active/active
team available as an uplink for hosts, network bandwidth usage was well under the 85 percent
threshold set for network throughput.
Figure 24 Network bandwidth usage on GPU host
IOPS
The cluster reached a maximum of 7,504 disk IOPS during the logging out phase and averaged
227.31 IOPS during the steady state phase. Based on these numbers, each user session generated
4.73 disk IOPS during the steady state phase. You can select your disk specifications in
accordance with this IOPS figure in your sizing exercise. As shown in Figure 28, I/O latency during
the steady state phase was 0.45 ms for the steady state IOPS requirement. The low latency figure
indicates that storage resources were not a bottleneck during steady state operations.
Figure 25 Cluster IOPS utilization
Cluster controller IOPS reached a peak of 21,192 disk IOPS during the logging out phase. Cluster
controller IOPS recorded a steady state average of 6,480.78. These are metrics gathered directly
from Nutanix Controller VMs and give an indication of backend operations taking place in the
storage system. These IOPS metrics also include cache hits served by the memory.
Figure 26 Cluster Controller IOPS utilization
The GPU host latency reached a maximum of 1.57 ms during the boot storm and averaged 0.42 ms
during the steady state phase.
We also noted that there were no failed sessions during testing. This indicates that the login and
logging out processes were smooth. When manually interacting with the sessions during the
steady state phase, the mouse and window movement was responsive and video playback was
good. Moreover, all the parameters we monitored were within the pass/fail threshold set out in
Table 5. This indicates that there were no resource constraints on the system and system
performance was good.
The following table shows the Login VSI score summary for the graphics multimedia worker
workload.
Table 14 Login VSI score summary for the graphics multimedia worker workload
Graphics Power Worker, 96 vGPU users per host, ESXi 6.7, Horizon 7.7
In this graphics power worker test, one of the nodes in the cluster was GPU enabled. The host was
configured with 96 vGPU profiles and loaded with sessions. The other two nodes in the cluster did
not host any compute VMs. We used the VMware Horizon Blast Extreme protocol for the testing.
The following metrics were collected and analyzed.
CPU usage
The GPU-enabled compute host in the cluster was populated with 96 vGPU-enabled VMs and used
the NVIDIA Tesla T4-1B profile. With all user VMs powered on before starting the test, the CPU
usage was approximately 15 percent on the GPU-enabled compute host.
The following figure shows the CPU utilization metric data for 96 user sessions on the GPU-
enabled compute host in the cluster. During the login phase, CPU utilization increased steadily until
all logins were complete. The CPU reached a steady state average of 95.57 percent during the test
cycle when all users were logged in to the GPU-enabled compute host. Our standard threshold of
85 percent for average CPU utilization was relaxed for this testing to demonstrate the
performance when graphics resources are fully utilized (96 profiles per host). You might get a
better user experience by managing CPU at a threshold of 85 percent, by decreasing user density,
or by using a higher-binned CPU.
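As a rough planning aid, the following sketch estimates the session count that would bring steady-state CPU utilization back to the 85 percent threshold. It assumes utilization scales linearly with session count, which is a simplification.

```python
# Estimate the session count that would bring steady-state CPU back to the 85%
# threshold, assuming utilization scales linearly with the number of sessions.
# This is a planning approximation only.

measured_users = 96
measured_cpu_pct = 95.57
target_cpu_pct = 85.0

estimated_users = int(measured_users * target_cpu_pct / measured_cpu_pct)
print(f"~{estimated_users} users per host at {target_cpu_pct}% CPU")
```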
Figure 31 CPU utilization on GPU host
GPU usage
We gathered the GPU metrics from the vSphere Web Client. Six NVIDIA Tesla T4 GPUs were
configured on the GPU-enabled host. The GPU usage during the steady state phase across the six
GPUs averaged approximately 34 percent. The GPUs were used for executing graphics-intensive
tasks in the power workload, thus taking a load off CPUs and providing a better user experience
for graphics-intensive tasks.
Memory
As shown in the following figure, an active memory of 424.72 GB was recorded before the test
started. This was because all VMs were already powered on before the loading of user sessions.
Active memory remained constant during the login phase. With GPU enabled in the host, we noted that active memory usage at the start of the test was higher than in a test where GPUs
were not used. During the steady state phase, active memory remained almost constant and
recorded an average steady active memory of 425 GB. This indicates that memory was not a
concern during the steady state phase and there was enough memory available in the ESXi host
cluster to meet requirements. Active memory utilization reduced when users logged out of their
sessions and it reached around 59 GB for the GPU host. During the re-creation of instant clones,
memory again remained constant at around 424 GB. No memory ballooning or swapping occurred
on any of the hosts during the testing process, indicating no memory constraints in the cluster.
As shown in the following figure, a consumed memory of 450.24 GB was recorded before the
testing started. This was because all VMs were already powered on before the loading of user
sessions. Memory consumption remained almost constant during the login phase. During the
steady state phase, consumed memory reached an average of 450.35 GB on the GPU-enabled
host. With a total memory of 768 GB available per compute host, memory was not a constraint
during the testing.
Figure 34 Consumed memory utilization on GPU host
Network usage
Network bandwidth was not an issue during testing. An average network usage of 654.42 Mbps
was recorded on the GPU host. The busiest period for network usage was during the re-creation of
instant clones phase. The host recorded peak network usage of 1,326.37 Mbps. With two 25 GbE
NICs in an active/active team available as an uplink for hosts, network bandwidth usage was well
under the 85 percent threshold set for network throughput.
Figure 35 Network bandwidth utilization on GPU host
IOPS
The cluster reached a maximum of 2,262 disk IOPS during the logging out phase and averaged
264.47 IOPS during the steady state phase. Based on these numbers, each user session generated
2.75 disk IOPS during the steady state phase. You can select your disk specifications in
accordance with this IOPS figure in your sizing exercise. As shown in Figure 36, I/O latency during
the steady state phase was 0.45 ms for the steady state IOPS requirement. The low latency figure
indicates that storage resources were not a bottleneck during steady state operations.
Cluster controller IOPS reached a peak of 16,382 disk IOPS during the logging out phase. Cluster
controller IOPS recorded a steady state average of 1,354. These are metrics taken directly from
Nutanix Controller VMs and give an indication of backend operations taking place in the storage
system. These IOPS metrics also include cache hits served by the memory.
The GPU host latency reached a maximum of 2.83 ms during the re-creation of instant clones and
averaged 0.38 ms during the steady state phase.
We also noted that there were no failed sessions during testing. This indicates that the login and
logging out processes were smooth. When manually interacting with the sessions during the
steady state phase, the mouse and window movement was responsive and video playback was
good. Moreover, all parameters we monitored were within the pass/fail threshold set out in Table
5. This indicates there were no resource constraints on the system and system performance was
good.
The following table shows the Login VSI score summary for the graphics power worker workload.
Table 15 Login VSI score summary for the graphics power worker workload
RDSH Task Worker, 233 users per host, ESXi 6.7, Horizon 7.7
The following metrics were collected and analyzed for this test case.
CPU
The following figure shows the performance data for 700 Remote Desktop Session Host (RDSH)
user sessions across three compute hosts in a Nutanix cluster. The test was carried out with a
Login VSI task worker workload. Each compute host in the cluster was provisioned with six RDSH
VMs, which ran the Windows Server 2016 operating system. We used the VMware
Blast Extreme display protocol for this testing.
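The published density maps onto the RDSH layer as shown in the following sketch, which assumes the Table 4 RDSH VM sizing (8 vCPUs and 32 GB per VM), the Nutanix CVM reservation, and an even spread of sessions across the six RDSH VMs on each host.

```python
# Distribution of RDSH sessions across the session host VMs on each node,
# assuming the Table 4 sizing (8 vCPU / 32 GB per RDSH VM) and an even spread.

USERS_PER_HOST = 233
RDSH_VMS_PER_HOST = 6
RDSH_VCPUS, RDSH_MEM_GB = 8, 32
CVM_VCPUS = 12
PHYSICAL_CORES = 2 * 20          # 2 x 20-core Intel Xeon Gold 6248

sessions_per_vm = USERS_PER_HOST / RDSH_VMS_PER_HOST
allocated_vcpus = RDSH_VMS_PER_HOST * RDSH_VCPUS + CVM_VCPUS

print(f"~{sessions_per_vm:.0f} sessions per RDSH VM")
print(f"{allocated_vcpus} vCPUs allocated vs {PHYSICAL_CORES} physical cores "
      f"({allocated_vcpus / PHYSICAL_CORES:.1f}:1 oversubscription)")
print(f"{RDSH_VMS_PER_HOST * RDSH_MEM_GB} GB allocated to RDSH VMs per host")
```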
During the login phase, CPU utilization increased steadily until all logons were complete. During the
steady state phase, the CPU utilization reached a steady state average of 88.87 percent across all
three compute hosts. This value is close to the pass/fail threshold we set for average CPU
utilization (see Table 5). However, it did not exceed the threshold limit, which includes a 5 percent
margin. To maintain a good EUE, it is essential that this threshold is not exceeded. You can load more user sessions by exceeding this CPU threshold, but this might result in a degradation of
the user experience.
CPU utilization started decreasing after the steady state phase when users began logging off from
sessions. CPU utilization reached near zero when all users had logged out.
Figure 42 CPU utilization on three hosts in the cluster
Memory
As shown in the following figure, an average consumed memory of 185 GB was recorded before
the testing started. This was because all VMs were already powered on before the loading of user
sessions. Memory consumption remained almost constant during the logon phase. During the
steady state phase, consumed memory reached an average of 190 GB across the three hosts.
Memory was not a constraint during the testing.
Active memory usage increased steadily during the logon phase. Around 68 GB of active memory
was occupied by each host during the start of the test. This includes memory used by server VMs
which were powered on before the test and the overhead memory used by the hypervisor. During
the steady state phase an average active memory of 129 GB was recorded. This indicates that
during the steady state phase, memory was not a concern and there was enough memory available
in the cluster to meet requirements. Active memory utilization decreased to a minimum when users
logged off from their sessions.
Figure 44 Active memory utilization on three hosts in the cluster
Network usage
Network bandwidth was not an issue during testing. An average network usage of 850 Mbps was
recorded across the three compute hosts during the steady state operations. The busiest period
for network usage was during the logoff phase. Peak network usage of 1,402 Mbps was recorded
by compute B during this phase. With two 25 GbE NICs configured as an uplink in an active/active
team, network bandwidth usage was well under the 85 percent threshold set for network
throughput.
Figure 45 Network bandwidth usage on three hosts in the cluster
IOPS
Cluster IOPS reached a peak of 1,478 during the logoff phase. Average cluster disk IOPS during
the steady state phase was 167 and peak IOPS recorded during this phase was 350. Based on
these numbers, the average disk IOPS per session during the steady state phase was 0.23 IOPS.
You can select your disk specifications in accordance with this IOPS figure in your sizing exercise.
As shown in Figure 47, I/O latency during the steady state phase was 0.43 ms for the steady state
IOPS requirement. The low latency figure indicates that storage was not a bottleneck during
steady state operations.
Figure 46 Cluster IOPS utilization
Cluster controller IOPS reached a peak of 25,172 during the re-creation of instant clones. Cluster
controller IOPS recorded a steady state average of 4,899. These metrics were taken directly from
the Nutanix Controller VMs and give an indication of backend operations taking place in the
storage system. The IOPS metric also includes cache hits served by the memory.
Figure 47 Cluster Controller IOPS utilization
We noted that there were only seven failed sessions during testing, which is below the 2 percent threshold that we set for failed sessions. When manually interacting with the sessions during the steady
state phase, the mouse and window movement was responsive and video playback was good.
Moreover, all of the parameters we monitored were within the pass/fail threshold set out in Table
5. This indicates there were no resource constraints on the system and system performance was
good.
The following table shows the Login VSI score summary for the RDSH task worker workload.
Table 16 Login VSI score summary for the RDSH task worker workload
Table 17 User density recommendations for VMware vSphere ESXi 6.7 with VMware Horizon
Server configuration | Workload | Login VSI workload | User density per host
Density Optimized | RDSH Task worker | Task worker | 233 (Horizon Apps RDSH/Published Desktop)
a. The user density of 96 users was achieved at 95% CPU utilization. The CPU utilization threshold of 85% is relaxed when testing with graphics cards. This test represents maximum utilization of the graphical resources available to
the system as well as full user concurrency. Ideally, in a production environment, you would decrease the user density slightly or use higher-binned processors to bring the CPU utilization closer to the 85% threshold. All Login VSI
tests completed successfully without reaching the Login VSI maximum, indicating that the user experience was good.
All Login VSI tests were completed successfully without reaching the Login VSI maximum,
indicating that the user experience was good. Except for the Graphics Power worker profile, the
metrics for all other workloads were well within the thresholds that we set. You can get better user
densities by increasing the thresholds that we set—however, there might be a degradation in user
experience.
For additional resources on this topic, see the VMware documentation section.
Summary
The configurations for the XC Family devices—the XC740xd-24 and the XC640-10—are optimized for performance-intensive VDI workloads. We selected the memory and CPU
configurations that provide optimal performance. You can change these configurations to meet your own requirements, but keep in mind that changing the memory and CPU configurations
from those validated in this document will affect the user density per host.
In the Density Optimized configuration used in this testing, we used 2nd Generation Intel Xeon Scalable processors (Cascade Lake), which have in-hardware mitigations for the Spectre
(variant 2), Meltdown (variant 3), and L1 Terminal Fault side-channel attack methods. With these mitigations in hardware, the new processors provide better performance and user densities
than first-generation Intel Xeon Scalable processors (Skylake) or other previous-generation processor-based VDI systems, which still require software-level fixes to protect against
side-channel vulnerabilities. Vulnerabilities for which hardware-level fixes are not available are mitigated through software-level fixes. Cascade Lake processors also have an improved
architecture and higher thermal efficiency, which boosts the performance of the VDI system.
With the introduction of the six-channels-per-CPU requirement for Skylake and Cascade Lake, the
server memory configuration recommendation has increased from the previous guidance of 512 GB
to 768 GB. This change was necessary to ensure a balanced memory configuration and optimized
performance for your VDI solution. The additional memory is also advantageous given the increased resource utilization of modern operating systems and the improved experience for
users who are given larger memory allocations.
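The following sketch shows the balanced-configuration arithmetic behind the 768 GB recommendation, assuming one 64 GB DIMM in each of the six channels per CPU; the exact DIMM population used in the tested systems is an assumption here.

```python
# Balanced memory arithmetic for a dual-socket Cascade Lake host: populating
# all six channels per CPU identically keeps memory bandwidth balanced.
# Assumes one 64 GB DIMM per channel; the tested systems' exact DIMM layout
# is an assumption.

SOCKETS = 2
CHANNELS_PER_SOCKET = 6
DIMM_SIZE_GB = 64

total_gb = SOCKETS * CHANNELS_PER_SOCKET * DIMM_SIZE_GB
print(f"{SOCKETS * CHANNELS_PER_SOCKET} x {DIMM_SIZE_GB} GB DIMMs = {total_gb} GB per host")
```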
VMware documentation
The following VMware documentation provides additional and relevant information:
l VMware vSphere documentation
l VMware Horizon 7 documentation
l VMware Compatibility Guide
l Horizon 7 Enterprise Edition Reference Architecture
l Horizon 7 Enterprise Edition Multi-Site Reference Architecture
For additional information about advanced architectural considerations (for example, NUMA-
related topics):
l Best Practices for Published Applications and Desktops in VMware Horizon Apps and VMware
Horizon 7
NVIDIA documentation
The following NVIDIA documentation provides additional and relevant information:
l NVIDIA Virtual GPU Software Quick Start Guide