
Dell EMC Ready Architectures for VDI

Designs for VMware Horizon on Dell EMC XC Family


August 2019
H17386.3

Validation Guide

Abstract
This validation guide describes the architecture and performance of the integration of
VMware Horizon components for virtual desktop infrastructure (VDI) on Dell EMC XC
Family devices.

Dell EMC Solutions


Copyright © 2018-2019 Dell Inc. or its subsidiaries. All rights reserved.

Dell believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS-IS.” DELL MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH
RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE. USE, COPYING, AND DISTRIBUTION OF ANY DELL SOFTWARE DESCRIBED IN THIS PUBLICATION REQUIRES AN
APPLICABLE SOFTWARE LICENSE.

Dell Technologies, Dell, EMC, Dell EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be the property of their
respective owners. Published in the USA.

Dell EMC
Hopkinton, Massachusetts 01748-9103
1-508-435-1000 In North America 1-866-464-7381
www.DellEMC.com



CONTENTS

Chapter 1 Introduction
    Executive summary
    Document purpose
    Audience
    We value your feedback

Chapter 2 Test Environment Configuration and Best Practices
    Validated hardware resources
        Enterprise hardware
        Storage hardware
        Graphics hardware
        Network hardware
    Validated software resources
    Validated system versions
    Virtual networking configuration
    Management server infrastructure
        NVIDIA GRID License Server
        SQL Server databases
        DNS
    High availability
    VMware Horizon 7 architecture

Chapter 3 Solution Performance and Testing
    Testing process
        Resource monitoring
        Load generation
        Profiles and workloads
        A comparison of linked clones and instant clones
        Virtual Desktop Profile
    Login VSI test analysis and results
        Login VSI test results summary
        Knowledge Worker, 135 users per host, ESXi 6.7, Horizon 7.7
        Power Worker, 106 users per host, ESXi 6.7, Horizon 7.7
        Graphics Multimedia Worker, 48 vGPU users per host, ESXi 6.7, Horizon 7.7
        Graphics Power Worker, 96 vGPU users per host, ESXi 6.7, Horizon 7.7
        RDSH Task Worker, 233 users per host, ESXi 6.7, Horizon 7.7

Chapter 4 Conclusion
    Test results and density recommendations
    Summary

Chapter 5 References
    Dell EMC documentation
    VMware documentation
    NVIDIA documentation


CHAPTER 1
Introduction

This chapter presents the following topics:

• Executive summary
• Document purpose
• Audience
• We value your feedback


Executive summary
Virtual desktop infrastructure (VDI) plays a crucial role in today's business transformation
initiatives. VDI is the most efficient way to present Microsoft Windows applications to users in
their digital workspaces and provides a consistent user experience across devices for the modern-
day mobile workforce. Organizations increasingly rely on VDI to provide the agility, security, and
centralized management that is so important for their workforce.
It is often challenging for organizations to set up a VDI infrastructure. This challenge is mainly
because a typical VDI infrastructure involves the integration of multiple data center components
such as storage, network, and compute. The multivendor profile of these components often
creates challenges during deployment and can also affect the system's performance if it is not
optimized for VDI.
To consistently maintain a multicomponent and multivendor environment with a specialized skill set
is challenging for most organizations. The effort to maintain a stable VDI infrastructure can have a
negative impact on your total cost of ownership (TCO).
Dell EMC Ready Architectures for VDI based on Dell EMC XC series appliances is a perfect
solution for your VDI workloads. These hyperconverged appliances integrate Dell EMC PowerEdge
servers, Nutanix software, and a choice of hypervisors to run any virtualized workload you choose.
You can deploy an XC cluster in 30 minutes and manage it without specialized IT resources. XC
Series solutions eliminate the need for over-provisioning and capital expenditures that are based
on anticipated capacity and performance requirements.
System performance and capacity can be easily expanded one node at a time with zero downtime,
offering customers linear and predictable scale-out expansion and pay-as-you-grow flexibility. A
fault-tolerant architecture and self-healing capabilities provide system reliability and help ensure
data integrity. You will have an enterprise-level infrastructure with rapid deployment, less time
needed for routine management tasks, faster system restoration, and integrated enterprise class
data protection. Moreover, Dell EMC's Global Service and Support organization fully supports all
XC Series hardware, software, and deployments.
For customers who have chosen a Nutanix-based environment, Dell EMC recommends XC Family
devices that are optimized for VDI workloads to run VMware Horizon 7 VDI infrastructure. VMware
Horizon 7 has a streamlined approach to delivering and managing virtual desktops and applications,
providing a consistent user experience across devices and locations while keeping corporate data
secure and compliant. XC series appliances—the XC740xd-24 (2U) and the XC640-10 (1U)—are
designed for compute- and performance-intensive workloads in VDI. XC740xd-24 devices also
support GPU hardware for graphics-intensive desktop deployments.
The Dell EMC Ready Architectures for VDI team tests VDI solutions to ensure their validity. As part
of the testing process, engineers tune the system to maximize performance and efficiency, and
document best practices. Finally, a separate team of experts evaluates the test results to ensure
that the systems are properly configured and sized for customers. In the validation effort
described in this guide, we used Login VSI, an industry-standard tool for benchmarking VDI
workloads. We tested typical Login VSI workloads (Task worker, Knowledge worker, Power
worker, and Multimedia worker), pairing each workload with an appropriate desktop virtual
machine (VM) profile (vCPU count, configured memory, and so on). This document provides a
detailed analysis based on those test results and recommends user density figures for each
workload, giving the utmost importance to the user experience.


Document purpose
This validation guide details the architecture, components, testing methods, and test results for
Dell EMC XC Family devices with VMware Horizon 7. It includes the test environment
configuration and best practices for systems that have undergone testing.

Audience
This guide is intended for architects, developers, and technical administrators of IT environments.
It provides an in-depth explanation of the testing methodology and basis for VDI densities. It also
validates the value of the Dell EMC Ready Architectures for VDI that deliver Microsoft Windows
virtual desktops to users of VMware Horizon 7 VDI components on XC Family devices.

We value your feedback


Dell EMC and the authors of this document welcome your feedback on the solution and the
solution documentation. Contact the Dell EMC Solutions team by email or provide your comments
by completing our documentation survey.
Authors: Dell EMC Ready Architectures for VDI Team.



CHAPTER 2
Test Environment Configuration and Best
Practices

This chapter presents the following topics:

• Validated hardware resources
• Validated software resources
• Validated system versions
• Virtual networking configuration
• Management server infrastructure
• High availability
• VMware Horizon 7 architecture


Validated hardware resources


Dell EMC validated the solution with the specific hardware resources listed in this section.

Enterprise hardware
We used a 3-node cluster of Dell EMC XC Family XC740xd-24 devices with the component
configuration that is listed in the following table. We called this configuration "Density Optimized."
It comes with 2nd Generation Intel Xeon Scalable processors, code named Cascade Lake.
We used Dell EMC XC740xd-24 appliances to deliver performance while driving savings on power,
cooling, and data center space. These XC devices are designed to handle VDI workloads and
reduce Operational Expenditure (OPEX).

Table 1 Validated Density Optimized hardware configuration

Enterprise platform: XC740xd-24
CPU: 2 x Intel Xeon Gold 6248 (20-core, 2.5 GHz)
Memory: 768 GB @ 2,933 MT/s
RAID controller: HBA330 ADP
HD configuration: 2 x 240 GB M.2, 2 x 960 GB SSD, 6 x 1.2 TB HDD
Network: 2 x Mellanox ConnectX-4 LX 25 GbE SFP rack NDC

Dell EMC XC740xd-24 devices are one of the most versatile and scalable hyperconverged
infrastructure platforms. They are purpose-built for performance-intensive VDI workloads and you
can use them to scale incrementally to match VDI requirements in a pay-as-you-grow model.
The Dell EMC XC740xd-24 device is a 2U platform that can be configured with 24 x 2.5-inch disks
to serve a broad range of capacity requirements. Each one comes equipped with dual CPUs, 10 to
28 cores, and up to 1.5 TB of high performance RAM. They are VDI optimized and support GPU
hardware for graphics-intensive desktop deployments. The XC740xd-24 can be configured with or
without GPUs.

Storage hardware
We used the following storage configuration for different storage tiers.
Storage hardware used per cluster node:
• 2 x Boot Optimized Storage Solution (BOSS) M.2 SATA devices for the host OS
• 2 x 960 GB SATA SSD for the Performance tier
• 6 x 1.2 TB NL-SAS HDD for the Capacity tier
The M.2-based BOSS module boots the hypervisor and the Nutanix Controller VM (CVM). The PERC
HBA330 connects the CVM to the SSDs and HDDs. All HDD and SSD disks are presented to the
Nutanix CVM running locally on each host, and they contribute to the clustered Nutanix
Distributed Storage Fabric (DSF) storage pool. Per node, two SSDs were used for the
Performance tier (Tier 1) and six HDDs were used for the Capacity tier (Tier 2).

Graphics hardware
We used NVIDIA Tesla T4 GPU hardware in our tests for graphics-intensive workloads. The T4 is a
single-slot 6-inch PCI Express Gen3 graphics card featuring a single high-end NVIDIA T4 GPU.
NVIDIA's newest architecture is available in the T4 GPU, which is considered the universal GPU for


data center workflows. The GPU architecture supports GDDR6 memory, which provides improved
performance and power efficiency when compared to the previous generation, GDDR5.
The T4 draws only 70 watts and does not require a supplemental power connector. It also uses
the NVIDIA Turing architecture, which includes Tensor Cores for accelerating deep learning
inference workflows and provides up to 40 times higher performance than CPUs at 60 percent of
the power consumption. Add up to six GPU cards to your Dell EMC XC740xd-24 device to enable up
to 96 GB of video frame buffer. In modernized data centers, you can use this card during
off-peak hours to run inferencing workloads.

Network hardware
We used the following network hardware in our test environment:
• Dell Networking S4048 (10 GbE ToR switch)—A high-density, ultralow-latency ToR switch that
features 48 x 10 GbE SFP+ ports, 6 x 40 GbE ports, and up to 720 Gbps of switch fabric capacity.
• Dell Networking S5248 (25 GbE ToR switch)—A high-density, high-performance, open-networking
ToR switch that features 48 x 25 GbE SFP28 ports, 4 x 100 GbE QSFP28 ports, 2 x 100 GbE
QSFP28-DD ports, and up to 2.0 Tbps of switch fabric capacity.


Validated software resources


Dell EMC validated this solution with the software components listed in the following table.

Table 2 Validated software component versions

Hypervisor: ESXi 6.7
Broker technology: VMware Horizon 7 version 7.7
Broker database: Microsoft SQL Server 2016
Management VM operating system: Microsoft Windows Server 2016 (Connection Server and DB)
Virtual desktop operating system: Microsoft Windows 10 Enterprise
Office application suite: Microsoft Office Professional 2016
Login VSI test suite: Version 4.1.32.1
Platform: Nutanix AOS version 5.10.3.1
NVIDIA GRID software (for graphics testing): 7.1

Validated system versions


Dell EMC validated this solution using the system versions listed in the following table.

Table 3 Version matrix for tested system

Density-Optimized:
    NVIDIA vGPU version: N/A
    Hypervisor: ESXi 6.7 (build 10302608)
    BIOS version: 1.6.11
    AOS version: 5.10.3.1
    Windows 10 version: 1803
    Windows 10 patches: KB4100347, KB4477137, KB4480979, KB4480966

Density-Optimized + 6 x T4:
    NVIDIA vGPU version: 7.1
    Hypervisor: ESXi 6.7 (build 10302608)
    BIOS version: 1.6.11
    AOS version: 5.10.3.1
    Windows 10 version: 1803
    Windows 10 patches: KB4100347, KB4477137, KB4480979, KB4480966


Virtual networking configuration


The network configuration for the Dell EMC XC Family devices uses a 25 Gb converged
infrastructure model.
All required VLANs traverse two 25 GbE network interface controllers (NICs) configured in an
active/active team. For larger scaling, we recommend that you separate the infrastructure
management VMs from the compute VMs to aid in predictable compute host scaling.
We used the following VLAN configurations for the compute hosts, management hosts, and iDRAC
in this solution model:
• Compute hosts
    – Management VLAN: Configured for hypervisor infrastructure traffic—L3 routed by using the
      spine layer
    – Live Migration VLAN: Configured for Live Migration traffic—L2 switched by using the leaf
      layer
    – VDI VLAN: Configured for VDI session traffic—L3 routed by using the spine layer
• Management hosts
    – Management VLAN: Configured for hypervisor management traffic—L3 routed by using the
      spine layer
    – Live Migration VLAN: Configured for Live Migration traffic—L2 switched by using the leaf
      layer
    – VDI Management VLAN: Configured for VDI infrastructure traffic—L3 routed by using the
      spine layer
• iDRAC VLAN: Configured for all hardware management traffic—L3 routed by using the spine
  layer

Management server infrastructure


The following table lists the sizing requirements for the management server components.

Table 4 Sizing for XC Family devices, Remote Desktop Session Host (RDSH), and NVIDIA GRID
license server (optional)

Component | vCPU | RAM (GB) | NIC | OS + data vDisk (GB) | Tier 2 volume (GB)
VMware vCenter Appliance | 2 | 16 | 1 | 290 | -
Platform Services Controller | 2 | 2 | 1 | 30 | -
Horizon Connection Server | 8 | 16 | 1 | 40 | -
SQL Server | 4 | 8 | 1 | 40 | 210 (VMDK)
File server | 1 | 4 | 1 | 40 | 2,048 (VMDK)
Nutanix CVM | 12 | 32 | 2 | 0 | -
RDSH VM | 8 | 32 | 1 | 80 | -
NVIDIA GRID License Server | 2 | 4 | 1 | 40 + 5 | -
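
For capacity planning, the per-VM figures in Table 4 can be rolled up into a total
management-layer footprint. The following minimal Python sketch illustrates that roll-up; the
values are transcribed from Table 4, and the dictionary and variable names are ours rather than
part of any Dell EMC or VMware tooling:

```python
# Roll up the management-layer sizing from Table 4 (illustrative only).
# Each entry: (vCPU, RAM GB, OS + data vDisk GB, Tier 2 volume GB).
mgmt_vms = {
    "VMware vCenter Appliance":     (2, 16, 290, 0),
    "Platform Services Controller": (2, 2, 30, 0),
    "Horizon Connection Server":    (8, 16, 40, 0),
    "SQL Server":                   (4, 8, 40, 210),
    "File server":                  (1, 4, 40, 2048),
    "Nutanix CVM":                  (12, 32, 0, 0),
    "RDSH VM":                      (8, 32, 80, 0),
    "NVIDIA GRID License Server":   (2, 4, 45, 0),  # 40 GB + 5 GB vDisks
}

total_vcpu = sum(v[0] for v in mgmt_vms.values())
total_ram_gb = sum(v[1] for v in mgmt_vms.values())
total_vdisk_gb = sum(v[2] for v in mgmt_vms.values())
total_tier2_gb = sum(v[3] for v in mgmt_vms.values())

print(f"Management layer total: {total_vcpu} vCPU, {total_ram_gb} GB RAM, "
      f"{total_vdisk_gb} GB vDisk, {total_tier2_gb} GB Tier 2")
```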


NVIDIA GRID License Server


When using NVIDIA vGPU cards, graphics-enabled VMs must obtain a license from a GRID License
Server on your network to be entitled to full vGPU capabilities.

We installed the GRID License Server software on a system running the Windows Server 2016
operating system to test vGPU configurations.
We made the following changes to the GRID License Server to address licensing requirements:
• Used a reserved fixed IP address
• Configured a single MAC address
• Applied time synchronization to all hosts on the same network

SQL Server databases


During validation, a single dedicated SQL Server 2016 VM hosted the VMware databases in the
management layer. We separated SQL data, logs, and tempdb into their respective volumes, and
created a single database for Horizon Connection Server.
We adhered to VMware best practices for this testing, including alignment of disks to be used by
SQL Server with a 1,024 KB offset and formatted with a 64 KB file allocation unit size (data, logs,
and tempdb).

DNS
DNS is the basis for Microsoft Active Directory and also controls access to various software
components for VMware services. All hosts, VMs, and consumable software components must
have a presence in DNS. We used a dynamic namespace integrated with Active Directory and
adhered to Microsoft best practices.

High availability
Although we did not enable high availability (HA) during the validation that is documented in this
guide, we strongly recommend that HA be factored into any VDI design and deployment. This
process follows the N+1 model with redundancy at both the hardware and software layers. The
design guide for this architecture provides additional recommendations for HA and is available at
the VDI Info Hub for Ready Solutions.


VMware Horizon 7 architecture


When designing and determining the architecture for a successful VDI deployment, it is important
to understand the underlying network traffic flows, ports, and components. Use Figure 1 as a
starting reference for understanding the interdependencies of the different components within a
VMware Horizon 7 architecture. The number of ports and protocols that are required will vary
depending on the size of the deployment, the external connectivity requirements, and the display
protocols in use (RDP, Blast, or PCoIP). You should undertake careful planning and design to allow
these ports and protocols in the corporate network firewall policies.
For more information about required ports and protocols, see VMware View ports and network
connectivity requirements (1027217) and the VMware Horizon Reference Architecture guide.
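
As a planning aid, the sketch below lists the well-known default ports for the Horizon display
protocols named above. Treat it as our summary of the VMware defaults, and confirm the values
against the KB article and reference architecture for your deployment and version:

```python
# Well-known default ports for Horizon display protocols (verify against
# VMware KB 1027217 for your specific deployment and version).
horizon_display_ports = {
    "RDP":   [("TCP", 3389)],
    "PCoIP": [("TCP", 4172), ("UDP", 4172)],
    "Blast": [("TCP", 22443), ("UDP", 22443)],
    "HTTPS (broker/tunnel)": [("TCP", 443)],
}

for protocol, ports in horizon_display_ports.items():
    rules = ", ".join(f"{proto}/{port}" for proto, port in ports)
    print(f"{protocol}: allow {rules}")
```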


Figure 1 VMware Horizon architecture



CHAPTER 3
Solution Performance and Testing

This chapter presents the following topics:

• Testing process
• Login VSI test analysis and results


Testing process
To ensure a good end-user experience (EUE) and cost-per-user, we conducted performance analysis
and characterization (PAAC) testing on this solution using Login VSI, a load-generation tool
that monitors both hardware resource utilization parameters and EUE during load testing.
For each user scenario, we ran the tests four times: once to validate data capture and three
times to collect metrics and analyze variance.
Our EUE validation consisted of logging in to a session while the system was under a load
created by the Login VSI tool and completing tasks from the workload definition. While this
test is subjective, it helps to provide a better understanding of the EUE in the desktop
sessions, particularly under high load. It also helps to ensure reliable data gathering.

Resource monitoring
To ensure that the user experience was not compromised, we monitored the following important
resources:
• Compute host server resources—VMware vCenter (for solutions based on VMware vSphere) or
Microsoft Performance Monitor (for solutions based on Hyper-V) gathers key data (CPU, memory,
disk, and network usage) from each of the compute hosts during each test run. This data was
collected for each host and consolidated for reporting. We do not report any metrics for the
management host servers. However, they were monitored manually during testing to ensure that
no bottlenecks impacted the test.
• Utilization thresholds—Resource overutilization can cause poor EUE. We monitored the
relevant resource utilization parameters and compared them to relatively conservative
thresholds. The thresholds were selected based on industry best practices and our experience
to provide an optimal trade-off between good EUE and cost-per-user while also allowing
sufficient burst capacity for seasonal or intermittent spikes in demand.
Table 5 Parameter pass/fail thresholds

Parameter | Pass/fail threshold
Physical host CPU utilization | 85% (a)
Physical host memory utilization | 85%
Network throughput | 85%
Storage I/O latency | 20 ms
Login VSI failed sessions | 2%

a. The Ready Solutions for VDI team recommends that steady-state average CPU utilization
across the three hosts in a cluster not exceed 85 percent in a production environment.
Average CPU utilization sometimes exceeds our recommended percentage. Because of the
nature of automated testing tools like Login VSI, a 5 percent margin of error was accepted,
and it does not impact our sizing guidance.

• GPU resources—We collected GPU utilization metrics from VMware vCenter.
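
The pass/fail evaluation described above reduces to comparing steady state averages against
the fixed limits in Table 5. The following minimal Python sketch is our illustration of that
logic; the metric names and sample values are hypothetical and are not output produced by
Login VSI or vCenter:

```python
# Illustrative pass/fail check against the Table 5 thresholds.
THRESHOLDS = {
    "cpu_pct": 85.0,             # physical host CPU utilization
    "memory_pct": 85.0,          # physical host memory utilization
    "network_pct": 85.0,         # network throughput
    "storage_latency_ms": 20.0,  # storage I/O latency
    "failed_sessions_pct": 2.0,  # Login VSI failed sessions
}

def evaluate(steady_state: dict) -> dict:
    """Return pass/fail per parameter for steady state averages."""
    return {name: steady_state[name] <= limit
            for name, limit in THRESHOLDS.items()}

# Hypothetical steady state averages for one test run.
results = evaluate({
    "cpu_pct": 84.9,
    "memory_pct": 60.2,
    "network_pct": 5.8,
    "storage_latency_ms": 0.45,
    "failed_sessions_pct": 0.0,
})
for name, passed in results.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```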

Load generation
Login VSI, from Login VSI, Inc., is the industry-standard tool for testing VDI and Remote
Desktop Session Host (RDSH) environments.
Login VSI installs a standard collection of desktop application software (including Microsoft Office
and Adobe Acrobat Reader) on each VDI desktop testing instance. It then uses a configurable


launcher system to connect a specified number of simulated users to available desktops within the
environment. When the simulated user is connected, a logon script configures the user
environment and starts a defined workload. Each launcher system can launch connections to
several VDI desktops (target machines). A centralized management console configures and
manages the launchers and the Login VSI environment.
We used the following login and boot conditions:
• For most of our tests, new user sessions were logged in at a steady rate over a 1-hour
period. During tests of low-density solutions such as GPU and graphics-based configurations,
users were logged in every 10 seconds.
• All desktops were started before users logged in.
• All desktops ran an industry-standard antivirus solution. Windows 10 machines used Windows
Defender.

Profiles and workloads


The combination of virtual desktop profiles and simulated user workloads determines the total
number of users (density) that the VDI solution can support. Specific metrics and capabilities
define each virtual desktop profile and user workload. It is important to understand these terms in
the context of this document.
Profiles and workloads are defined as follows:
• Profile—The configuration of the virtual desktop: the number of vCPUs and the amount of
RAM that is configured on the desktop and available to the user.
• Workload—The set of applications and tasks that are defined to be used by a simulated user
in the PAAC test.
Load-testing on each machine profile uses an appropriate user workload that is representative
of the relevant use case, as summarized in the following table:

Table 6 Virtual desktop profile to workload mapping

Profile name | Workload name
Knowledge worker | Login VSI Knowledge worker
Power worker | Login VSI Power worker
Graphics Power worker | Login VSI Power worker
Graphics Multimedia worker | Login VSI Multimedia worker
RDSH Task worker | Login VSI Task worker

The following table summarizes the Login VSI workloads that were tested in this validation effort.
For more information, see Login VSI.

Table 7 Login VSI tested workloads

Login VSI Knowledge worker: Designed for virtual machines with 2 vCPUs. This workload includes
the following activities:
• Outlook—Browse messages.
• Internet Explorer—Browse websites and open a YouTube-style video (480p movie trailer) three
times in every loop.
• Word—Start one instance to measure response time and another to review and edit a document.
• Doro PDF Printer and Acrobat Reader—Print a Word document and export it to PDF.
• Excel—Open a large randomized sheet.
• PowerPoint—Review and edit a presentation.
• FreeMind—Run a Java-based mind-mapping application.
• Other—Perform various copy and zip actions.

Login VSI Power worker: The most intensive of the standard Login VSI workloads. The following
activities are performed with this workload:
• Begin by opening four instances of Internet Explorer and two instances of Adobe Reader,
which remain open throughout the workload.
• Perform more PDF printer actions than in the other workloads.
• Watch a 720p and a 1080p video.
• Reduce the idle time to two minutes.
• Perform various copy and zip actions.

Login VSI Multimedia worker (Graphics performance configuration): A workload that is designed
to heavily stress the CPU when using software graphics acceleration. GPU-accelerated computing
offloads the most compute-intensive sections of an application to the GPU while the CPU
processes the remaining code. This modified workload uses the following applications for its
GPU/CPU-intensive operations:
• Adobe Acrobat
• Google Chrome
• Google Earth
• Microsoft Excel
• HTML5 3D spinning balls
• Internet Explorer
• MP3
• Microsoft Outlook
• Microsoft PowerPoint
• Microsoft Word
• Streaming video

Login VSI Task worker: The least intensive of the standard workloads. It runs fewer
applications and starts and stops them less frequently than the other workloads, resulting in
lower CPU, RAM, and I/O usage. The Task worker workload uses the following applications:
• Adobe Reader
• Microsoft Excel
• Internet Explorer
• Microsoft Outlook
• Microsoft Word
• Print and zip actions using Notepad and 7-Zip


A comparison of linked clones and instant clones


Horizon supports two provisioning methods that deliver space-optimized virtual desktop pools:
linked clones and instant clones. For this PAAC testing, we used instant clones to create VMs.
User density per host is not affected by the choice of provisioning method. The differences in
the test graphs for these two methods are a result of the following processes:
• For linked clones, all the VMs are rebooted before the test starts to simulate a boot storm.
The CPU spike during the boot storm phase is due to the CPU being used by all VMs during the
power-on process. When the VMs have booted, CPU utilization drops to near zero, as shown in
the following figure. During the login phase, CPU utilization increases again, and once all
users have logged in, it remains constant, as shown in the steady state phase in the figure.
When the steady state phase is over and users start to log out, CPU utilization decreases,
dropping to near zero when all users have logged out.

Figure 2 Host CPU utilization for linked clones

• For instant clones, the VMs are rebooted after the session ends because when a user logs out
of an instant clone, the clone is destroyed and re-created for the next user. CPU utilization
gradually increases during the login phase as users start logging in and then remains constant
during the steady state phase when logins have completed, as shown in the following figure.
When the steady state period is over and users start to log off, CPU utilization again
decreases and drops to near zero when all users have logged off. After user logoff, the
instant clone pool is re-created. During this phase, there is a CPU spike, which then drops to
near zero when pool re-creation is complete.


Figure 3 Host CPU utilization for instant clones

Virtual Desktop Profile


The following table summarizes the profile or Desktop VM configurations for the workloads that
we tested.

Table 8 Desktop VM specifications

Profile name | Workload name | vCPUs (a) | Configured memory (b) | Reserved memory (c) | Screen resolution | Operating system
Knowledge worker | Login VSI Knowledge worker | 2 | 4 GB | 2 GB | 1920 x 1080 | Windows 10 Enterprise 64-bit
Power worker | Login VSI Power worker | 4 | 8 GB | 4 GB | 1920 x 1080 | Windows 10 Enterprise 64-bit
Graphics Power worker | Login VSI Power worker | 4 | 8 GB | 8 GB | 1920 x 1080 | Windows 10 Enterprise 64-bit
Graphics Multimedia worker | Login VSI Multimedia worker | 4 | 8 GB | 8 GB | 1920 x 1080 | Windows 10 Enterprise 64-bit
RDSH Task worker (d) | Login VSI Task worker | 8 | 32 GB | 32 GB | 1280 x 720 | Windows Server 2016 64-bit

a. vCPUs—The number of virtual CPUs assigned to the desktop virtual machine.
b. Configured memory—Memory that is configured for or assigned to the desktop virtual machine.
c. Reserved memory—Amount of memory reserved for the desktop virtual machine. Reserved memory
is guaranteed.
d. Task worker is tested in a Horizon Apps RDSH (published desktop) environment.

Login VSI test analysis and results


We used the Login VSI test suite to simulate the user experience for several profile types under
the typical workload for that type. We performed the PAAC testing on the 3 x Dell EMC


XC740xd-24 cluster using the Density Optimized configuration, details of which are described in
the following table.
We deployed the Horizon and vSphere management roles within the cluster on a single host that
also hosted desktops. This optional configuration is beneficial for proofs of concept (POCs)
or small deployments that are looking to maximize user density.
We allocated 12 vCPUs and 32 GB of memory to the Nutanix CVM when we configured the
Nutanix cluster. A Nutanix CVM runs on each node of the Nutanix cluster, enabling the pooling of
local storage from all nodes in the cluster.

Table 9 Density Optimized configuration

Enterprise platform: XC740xd-24
CPU: 2 x Intel Xeon Gold 6248 (20-core, 2.5 GHz)
Memory: 768 GB @ 2,933 MT/s
RAID controller: HBA330
HD configuration: 2 x 240 GB M.2, 2 x 960 GB SSD, 4 x 1.8 TB HDD
Network: 2 x Mellanox ConnectX-4 LX 25 GbE SFP rack NDC

Login VSI test results summary


Before we investigate the detailed analysis for each virtual desktop profile or workload that was
tested, let us look at the summary of the results. The following table summarizes the test results
for the profiles or workloads.
The table headings are defined as follows:
• Density Optimized—The configuration that was used for this validation effort.
• Profile name—The configuration of the virtual desktop: the number of vCPUs and the amount of
RAM that is configured on the desktop and available to the user.
• Workload name—The set of applications and tasks that are defined to be used by a simulated
user. See Table 7 for details of the workloads tested in this PAAC testing.
• User density—The number of users per compute host that successfully completed the workload
test within the acceptable resource limits for the host. For clusters, this number reflects
the average per-server density that was achieved for all compute hosts in the cluster.
• Average CPU—The average CPU usage over the steady state period. For clusters, this number
represents the combined average CPU usage of all compute hosts. On the latest Intel
processors, the ESXi host CPU metrics exceed the rated 100 percent for the host if Turbo
Boost is enabled (the default setting). An additional 35 percent of CPU is available from the
Turbo Boost feature. However, this additional CPU headroom is not reflected in the VMware
vSphere metrics where the performance data is gathered. Therefore, CPU usage for ESXi hosts is
adjusted, and each CPU graph includes a line indicating the potential performance headroom
that Turbo Boost provides, as illustrated in the sketch after this list.
• Average active memory—For ESXi hosts, the amount of memory that is actively used, as
estimated by the VMkernel based on recently accessed memory pages. For clusters, this is the
average amount of guest physical memory that is actively used across all compute hosts over
the steady state period.
• Average IOPS per user—IOPS calculated from the average disk IOPS over the steady state
period divided by the number of users.
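
To make the Turbo Boost adjustment concrete, the following sketch shows the relationship
between the rated maximum, the 35 percent headroom quoted above, and the turbo-adjusted
ceiling drawn on the CPU graphs. It is our illustration only, and the function name is
hypothetical:

```python
# Illustrative Turbo Boost headroom adjustment for the ESXi CPU graphs.
RATED_MAX_PCT = 100.0      # vSphere reports host CPU utilization against this
TURBO_HEADROOM_PCT = 35.0  # additional capacity available with Turbo Boost

TURBO_CEILING_PCT = RATED_MAX_PCT + TURBO_HEADROOM_PCT  # the 135% line

def turbo_headroom(raw_cpu_pct: float) -> float:
    """Capacity remaining below the turbo-adjusted ceiling, in points."""
    return TURBO_CEILING_PCT - raw_cpu_pct

# Example: a raw peak of 98.21% still leaves headroom below the turbo ceiling.
print(f"Ceiling: {TURBO_CEILING_PCT}%, headroom at 98.21%: {turbo_headroom(98.21):.2f}")
```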


Table 10 Login VSI test results summary

Server configuration | Profile name | Workload name | Display protocol | User density per host | Average CPU | Average active memory | Average IOPS per user
Density Optimized | Knowledge worker | Login VSI Knowledge worker | PCoIP | 135 | 85.96% | 170 GB | 2.65
Density Optimized | Power worker | Login VSI Power worker | PCoIP | 106 | 86% | 206.74 GB | 3.07
Density Optimized + 6 x NVIDIA Tesla T4 | Graphics Multimedia worker (Virtual PC: T4-2B) | Login VSI Multimedia worker | Blast Extreme | 48 | 84.92% | 428.24 GB | 4.73
Density Optimized + 6 x NVIDIA Tesla T4 | Graphics Power worker (Virtual PC: T4-1B) | Login VSI Power worker | Blast Extreme | 96 | 95.57% | 425 GB | 2.75
Density Optimized | RDSH Task worker | Login VSI Task worker | Blast Extreme (Horizon Apps RDSH/published desktop) | 233 | 88.87% | 129 GB | 0.23

Knowledge Worker, 135 users per host, ESXi 6.7, Horizon 7.7
The following metrics were collected and analyzed for this test case.
CPU
The graph shows the performance data for 405 user sessions across three compute hosts when
tested with a Login VSI Knowledge worker workload. Each compute host had approximately 135
virtual desktops. We used the PC-over-IP (PCoIP) display protocol for this Knowledge worker
testing.
During the login phase, CPU utilization increased steadily until all logins were complete.
Compute host C recorded a peak CPU utilization of 98.21 percent during the login phase. During
the steady state phase, CPU utilization reached a steady state average of 85.96 percent across
all three compute hosts. This value is close to the pass/fail threshold that we set for
average CPU utilization (see Table 5), but it did not exceed the threshold limit. To maintain
a good EUE, it is essential that this threshold is not exceeded. You can load more user
sessions by exceeding this threshold, but doing so might degrade the user experience.
As shown in the following figure, CPU utilization started decreasing after the steady state
phase when users began logging out of sessions, reaching near zero when all users had logged
out. CPU utilization spiked during the instant clone re-creation phase after user logout, with
compute host B reaching a peak of 91.4 percent during this phase.


Figure 4 CPU utilization on three hosts in the cluster

Memory
As shown in the following figure, an average consumed memory of 561 GB was recorded before
the testing started. This was because all VMs were already powered on before the loading of user
sessions. Memory consumption remained almost constant during the login phase. During the
steady state phase, consumed memory reached an average of 582 GB across the three hosts.
With a total memory of 768 GB available per compute host, memory was not a constraint during
the testing.
Figure 5 Consumed memory utilization on three hosts in the cluster

Active memory usage increased steadily during the login phase. Each host occupied about 58 GB
of active memory during the start of the test. This includes memory used by desktop VMs that
were powered on before the test and the overhead memory used by the hypervisor. During the


steady state phase, active memory remained almost constant and an average steady active
memory of 170 GB was recorded. This implies that during the steady state phase, memory was not
a concern and there was enough memory available in the ESXi host cluster to meet requirements.
Active memory utilization dropped to a minimum when users logged out of their sessions. During
the re-creation of instant clones, a peak average active memory of 502.7 GB was recorded
across all three hosts. This peak in active memory usage is expected during the instant clone
re-creation process because all VMs that were destroyed after user logout must be re-created,
which is a memory-intensive task. No memory ballooning or swapping occurred on any of the
hosts during the testing process, indicating no memory constraints in the cluster.
Figure 6 Active memory utilization on three hosts in the cluster

Network usage
Network bandwidth was not an issue during testing. An average network usage of 841.48 Mbps
was recorded across the three compute hosts during steady state operations. The busiest
period for network usage was during the re-creation of instant clones after user logout.
Compute host B recorded a peak network usage of 2,946 Mbps during the re-creation of instant
clones. With 2 x 25 GbE NICs in an active/active team available as an uplink for each host,
network bandwidth usage was well below the 85 percent threshold set for network throughput.


Figure 7 Network bandwidth usage on three hosts in the cluster

IOPS
Cluster IOPS reached a peak of 7,956 during the instant clone re-creation process. Average
cluster disk IOPS during the steady state phase was 1,073. Based on these numbers, the average
disk IOPS per session during this phase was 2.65. You can select your disk specifications in
accordance with this IOPS figure in your sizing exercise. As shown in Figure 10, I/O latency
during the steady state phase was 0.37 ms for the steady state IOPS requirement. This low
latency indicates that storage was not a bottleneck during steady state operations.
Figure 8 Cluster IOPS utilization
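
The per-user figure quoted above is the steady state cluster average divided by the number of
sessions. The following minimal sketch shows that sizing arithmetic, using the numbers from
this test, together with a hypothetical scale-up example:

```python
# Per-user steady state IOPS, as used for disk sizing (numbers from this test).
steady_state_cluster_iops = 1073
total_sessions = 405  # 135 users x 3 hosts

iops_per_user = steady_state_cluster_iops / total_sessions
print(f"{iops_per_user:.2f} IOPS per user")  # ~2.65

# Hypothetical sizing use: required steady state IOPS for a planned 1,000 users.
planned_users = 1000
print(f"Planned steady state load: {planned_users * iops_per_user:.0f} IOPS")
```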


Cluster controller IOPS reached a peak of 40,306 during the re-creation of instant clones and
recorded a steady state average of 4,495. These metrics are taken directly from the Nutanix
Controller VMs and give an indication of back-end operations taking place in the storage
system. The IOPS metric also includes cache hits served by memory.
Figure 9 Cluster controller IOPS utilization

Storage I/O latency


The cluster latency reached a peak of 3.32 ms during the re-creation of instant clones. A peak in
latency was expected due to the creation of the new swap files and difference disks required for
the instant clone VMs—these tasks are disk I/O intensive. Average cluster latency during the
steady state phase was 0.37 ms. This value is well below the pass/fail threshold of 20 ms set for
storage I/O latency. Overall during this testing, storage resources did not appear to be a
bottleneck.


Figure 10 Cluster latency

Login VSI: User experience summary


The baseline score for the Login VSI test was 853. This score falls in the 800 through 1199
range that the Login VSI tool rates as "Good." For more information about Login VSI baseline
ratings and baseline calculations, see this Login VSImax article. The Login VSI test was run
with 405 user sessions for the Knowledge worker workload. The blue line in the following
figure indicates that the system reached a VSImax average score of 1222 when 405 sessions were
loaded. This is well below the VSImax threshold score of 1853 set by the Login VSI tool.
VSImax was never reached during the test, which normally indicates a stable system and a
better user experience. See Table 11 for an explanation of the Login VSI metrics.
Figure 11 Login VSI summary

We also noted that there were no failed sessions during testing, which indicates that the
login and logout processes were smooth. When we manually interacted with the sessions during
the steady state phase, mouse and window movement was responsive and video playback was good.
Moreover, all parameters that we monitored were within the pass/fail thresholds described in
Table 5, indicating that there were no resource constraints on the system and that system
performance was good.
The following table explains the Login VSI metrics.


Table 11 Login VSI metrics

VSImax: VSImax shows the number of sessions that can be active on a system before the system
is saturated. It is the point where the VSImax v4 average graph line meets the VSImax v4
threshold graph line. A red X indicates this intersection in the Login VSI graph. This number
gives you an indication of the scalability of the environment (higher is better).

VSIbase: VSIbase is the best performance of the system during a test (the lowest response
times). This number is used to determine what the performance threshold will be. VSIbase gives
an indication of the base performance of the environment (lower is better).

VSImax v4 average: VSImax v4 average is calculated based on the number of active users that
are logged in to the system, but removes the two highest and two lowest samples to provide a
more accurate measurement.

VSImax v4 threshold: VSImax v4 threshold indicates at which point the environment's saturation
point is reached (based on VSIbase).

The following table shows the Login VSI score summary for the knowledge worker workload.

Table 12 Login VSI score summary for the knowledge worker workload

VSIbase | VSImax average | VSImax threshold | VSImax reached
853 | 1222 | 1853 | No
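
Both test runs in this guide are consistent with the Login VSI v4 convention that the VSImax
threshold sits 1000 points above VSIbase (853 gives 1853 here, and 776 gives 1776 for the
Power Worker test). The following sketch captures that relationship as we read it from these
results; the function names are ours:

```python
# Relationship between VSIbase and the VSImax v4 threshold, as observed
# in these tests: threshold = VSIbase + 1000.
def vsimax_threshold(vsibase: int) -> int:
    return vsibase + 1000

def vsimax_reached(vsimax_average: int, vsibase: int) -> bool:
    """True when the VSImax v4 average crosses the threshold (saturation)."""
    return vsimax_average >= vsimax_threshold(vsibase)

# Knowledge worker run (Table 12): baseline 853, VSImax average 1222.
print(vsimax_threshold(853))      # -> 1853
print(vsimax_reached(1222, 853))  # -> False (VSImax not reached)
```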

Power Worker, 106 users per host, ESXi 6.7, Horizon 7.7
The following metrics were collected and analyzed for this test case.
CPU usage
The following graph shows the performance data for 318 user sessions across three compute
hosts when tested with a Power worker workload. Each compute host had 106 virtual desktops. We
used the PC-over-IP (PCoIP) display protocol for this Power worker testing.
During the login phase, CPU utilization increased steadily until all logins were complete. During the
steady state phase, the CPU utilization reached a steady state average of 86 percent across all
three compute hosts. This value is close to the pass/fail threshold we set for average CPU
utilization.
However, it did not exceed the threshold limits that we set, which include a 5 percent margin
(see Table 5). To maintain a good EUE, it is essential that this threshold is not exceeded.
You can load more user sessions by exceeding this CPU threshold, but doing so might degrade
the user experience. As shown in the following figure, CPU utilization started decreasing
after the steady state phase when users started logging out of sessions.
CPU utilization reached near zero when all users had logged out. CPU utilization spiked during
the instant clone re-creation phase after user logout, reaching almost 100 percent on all
hosts. Because the turbo feature is enabled on the CPUs, this spike is not considered to be an
issue. During the clone re-creation phase, all VMs that have been logged out of are deleted
and re-created. Instant clones are created by forking a


parent VM and the process involves allocating new resources for the child VM. This activity is
resource-intensive.
Figure 12 CPU utilization on three hosts in the cluster

Memory
As shown in the following figure, an average consumed memory of 387.35 GB was recorded before
the testing started. This was because all VMs were already powered on before the loading of user
sessions. Memory consumption remained almost constant during the login phase. During the
steady state phase, consumed memory reached an average of 738.54 GB across the three hosts.
With a total memory of 768 GB available per compute host, memory was not a constraint during
the testing.
Figure 13 Consumed memory utilization on three hosts in the cluster


Active memory usage increased steadily during the login phase. Each host occupied approximately
77 GB of active memory during the start of the test. This includes memory used by desktop VMs,
which were powered on before the test and the overhead memory used by the hypervisor. During
the steady state phase, active memory remained almost constant and an average steady active
memory of 206.74 GB was recorded. This indicates that memory was not a concern during the
steady state phase and there was enough memory available in the ESXi host cluster to meet
requirements. Active memory utilization dropped to a minimum when users logged out of their
sessions. During the re-creation of instant clones, a peak average active memory of 802.71 GB
was recorded across all three hosts. This peak in active memory usage is expected during the
instant clone re-creation process because all VMs that were destroyed after users logged off
must be re-created, which is a memory-intensive task. No memory ballooning or swapping
occurred on any of the hosts during the testing process, indicating no memory constraints in
the cluster.
Figure 14 Active memory utilization on three hosts in the cluster

Network usage
Network bandwidth was not an issue during testing. An average network usage of 1,031.61 Mbps
was recorded across the three compute hosts during steady state operations. The busiest period
for network usage was during the re-creation of instant clones after users logged out. Compute
host A recorded a peak network usage of 3,668.65 Mbps during the re-creation of instant
clones. With two 25 GbE NICs in an active/active team available as an uplink for each host,
network bandwidth usage was well under the 85 percent threshold set for network throughput.


Figure 15 Network bandwidth usage on three hosts in the cluster

IOPS
Cluster IOPS reached a peak of 12,522 during the instant clone re-creation process. Average
cluster disk IOPS during the steady state phase was 976. Based on these numbers, the average
disk IOPS per session during the steady state phase was 3.07. You can select your disk
specifications in accordance with this IOPS figure in your sizing exercise. As shown in the following
figure, I/O latency during the steady state phase was 0.37 ms for the steady state IOPS
requirement. The low latency figure indicates that during steady state operations, storage
resources were not a bottleneck.
Figure 16 Cluster IOPS utilization

Cluster controller IOPS reached a peak of 38,409 during the re-creation of instant clones and
recorded a steady state average of 4,490. These metrics, collected from the Nutanix Controller
VMs, give an indication of back-end operations taking place in the storage system. This IOPS
metric also includes cache hits served by memory.
Figure 17 Cluster controller IOPS utilization

Storage I/O latency


The cluster latency reached a peak of 3.38 ms during the re-creation of instant clones. A peak in
latency was expected due to the creation of the new swap files and difference disks required for
the instant clone VMs. These tasks are disk I/O intensive. Average cluster latency during the
steady state phase was 0.4 ms, a value well below the pass/fail threshold of 20 ms set for storage
I/O latency. Overall during this testing, storage resources did not appear to be a bottleneck.
Figure 18 Cluster latency


Login VSI: User experience summary


The baseline score for the Login VSI test was 776. This score falls in the 0 through 779 range
that the Login VSI tool rates as "Very Good." For more information about Login VSI baseline
ratings and calculations, see this Login VSImax article. The Login VSI test was run with 318
user sessions for the Power worker workload. The blue line in the following figure indicates
that the system reached a VSImax average score of 1149 when 318 sessions were loaded. This is
well below the VSImax threshold score of 1776 set by the Login VSI tool. VSImax was never
reached during the test, which normally indicates a stable system and a better user
experience. See Table 11 for an explanation of the Login VSI metrics.
Figure 19 Login VSI graph summary

We also noted that there was only one failed session during testing, which indicates that the
login and logout processes were smooth. When we manually interacted with the sessions during
the steady state phase, mouse and window movement was responsive and video playback was good.
Moreover, all parameters that we monitored were within the pass/fail thresholds that we set
(see Table 5), indicating that there were no resource constraints on the system and that
system performance was good.
The following table shows the Login VSI score summary for the power worker workload.

Table 13 Login VSI score summary for the power worker workload

VSIbase | VSImax average | VSImax threshold | VSImax reached
776 | 1149 | 1776 | No

Graphics Multimedia worker, 48 vGPU users per host, ESXi 6.7, Horizon 7.7
In this multimedia workload test, one of the nodes in the cluster was enabled with 48 vGPU profiles
and loaded with sessions. The other two nodes in the cluster did not host any compute VMs. We
used the VMware Horizon Blast Extreme protocol for this graphics multimedia worker testing. The
following metrics were collected and analyzed for this test case.


CPU usage
The GPU-enabled compute host in the cluster was populated with 48 vGPU-enabled VMs that used
the NVIDIA Tesla T4-2B profile. With all user VMs powered on before the start of the test, CPU
usage was approximately 10 percent on the GPU-enabled compute host.
The following figure shows the CPU utilization metric data for 48 user sessions on the
GPU-enabled compute host. During the login phase, CPU utilization increased steadily until all
logins were complete. During the steady state phase, CPU utilization reached a steady state
average of 84.92 percent on the GPU-enabled host. This value is close to the pass/fail
threshold that we set for average CPU utilization (see Table 5), but it did not exceed the
threshold limits. To maintain a good EUE, it is essential that this threshold is not exceeded.
You can load more user sessions by exceeding this CPU threshold, but doing so might degrade
the user experience. As shown in Figure 20, CPU utilization started decreasing after the
steady state phase when users started logging out of sessions, and it reached near zero when
all users had logged out. There was no spike in CPU utilization during the re-creation of
instant clones.
User density is also limited by the frame buffer of the GPUs. Forty-eight users with 2 GB vGPU
frame buffer profiles occupy the full 96 GB of frame buffer provided by the six NVIDIA Tesla
T4 GPUs on a server node, as the sketch after the following figure illustrates.
Figure 20 CPU utilization on GPU host
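
The frame buffer limit described before the preceding figure can be worked out directly: six
Tesla T4 cards at 16 GB each (the published T4 specification) provide 96 GB, and a 2 GB
profile therefore caps density at 48 vGPU desktops per host. A minimal sketch of that
arithmetic:

```python
# vGPU user density per host is bounded by the total GPU frame buffer.
gpus_per_host = 6
framebuffer_per_gpu_gb = 16  # NVIDIA Tesla T4 (published specification)
profile_gb = 2               # T4-2B profile used for this workload

total_framebuffer_gb = gpus_per_host * framebuffer_per_gpu_gb  # 96 GB
max_vgpu_users = total_framebuffer_gb // profile_gb            # 48 desktops
print(f"{total_framebuffer_gb} GB frame buffer -> {max_vgpu_users} vGPU users per host")
```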

GPU usage
We gathered the GPU metrics from the vSphere Web Client. Six NVIDIA Tesla T4 GPUs were
configured on the GPU-enabled host. GPU usage during the steady state phase averaged
approximately 31.97 percent across the six GPUs, with GPU A recording a peak utilization of
54.66 percent. The GPUs were used to execute graphics-intensive tasks, taking load off the
CPUs and providing a better user experience.


Figure 21 GPU utilization of GPU host

Memory
As shown in the following figure, an active memory of 436 GB was recorded before the test
started because all VMs were already powered on before user sessions were loaded. Active
memory remained constant during the login phase. With GPUs enabled in the host, we noted that
active memory usage at the start of the test was higher than in a test with a non-GPU host.
During the steady state phase, active memory remained almost constant, and an average steady
state active memory of 428.24 GB was recorded. This indicates that memory was not a concern
during the steady state phase and that there was enough memory available in the ESXi host
cluster to meet requirements. Active memory utilization dropped when users logged off from
their sessions. During the re-creation of instant clones, active memory again remained
constant at around 428 GB. No memory ballooning or swapping occurred on any of the hosts
during the testing process, indicating no memory constraints in the cluster.

Figure 22 Active memory utilization on GPU host

As shown in the following figure, a consumed memory of 453.86 GB was recorded before the
testing started. This was because all VMs were already powered on before the loading of user
sessions. Memory consumption remained almost constant during the login phase. During the
steady state phase, consumed memory reached an average of 445.82 GB on the GPU-enabled
host. With a total memory of 768 GB available per compute host, memory was not a constraint
during the testing.
Figure 23 Consumed memory utilization on GPU host

Network usage
Network bandwidth was not an issue during testing. The GPU host recorded an average network
usage of 981.23 Mbps. The busiest period for network usage was during the logging out phase. The
host recorded peak network usage of 1335.3 Mbps. With two 25 GbE NICs in an active/active
team available as an uplink for hosts, network bandwidth usage was well under the 85 percent
threshold set for network throughput.
Figure 24 Network bandwidth usage on GPU host
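A quick way to sanity-check network headroom against the 85 percent threshold is shown in the
following Python sketch. It is a minimal illustration, assuming the two-port 25 GbE active/active
team described above; the function name is our own.

def network_headroom_ok(peak_mbps, nic_count=2, nic_gbps=25, threshold=0.85):
    # The usable ceiling is 85 percent of the teamed uplink capacity.
    capacity_mbps = nic_count * nic_gbps * 1000
    return peak_mbps <= threshold * capacity_mbps

# Peak usage on the GPU host was 1,335.3 Mbps against a 50,000 Mbps team:
print(network_headroom_ok(1335.3))  # True -> well under the threshold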

IOPS
The cluster reached a maximum of 7,504 disk IOPS during the logging out phase and averaged
227.31 IOPS during the steady state phase. Based on these numbers, each user session generated
4.73 disk IOPS during the steady state phase. You can select your disk specifications in
accordance with this IOPS figure in your sizing exercise. As shown in Figure 28, I/O latency during
the steady state phase was 0.45 ms for the steady state IOPS requirement. The low latency figure
indicates that storage resources were not a bottleneck during steady state operations.
Figure 25 Cluster IOPS utilization
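For the sizing exercise mentioned above, the per-user figure is the steady state average divided
by the session count, and the aggregate requirement scales it back up for a target user
population. The following is a minimal sketch; the 20 percent burst headroom factor is our
assumption, not a tested value.

import math

def iops_per_user(steady_state_iops, sessions):
    # Average steady state disk IOPS generated by each user session.
    return steady_state_iops / sessions

def required_iops(users, per_user, headroom=1.2):
    # Scale the per-user figure to a target population, with burst headroom.
    return math.ceil(users * per_user * headroom)

per_user = iops_per_user(227.31, 48)  # ~4.7 IOPS, reported above as 4.73
print(required_iops(500, per_user))   # steady state disk IOPS to plan for 500 users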

Cluster controller IOPS reached a peak of 21,192 disk IOPS during the logging out phase and
recorded a steady state average of 6,480.78. These metrics were gathered directly from the
Nutanix Controller VMs and give an indication of the backend operations taking place in the
storage system. They also include cache hits served from memory.
Figure 26 Cluster Controller IOPS utilization

GPU host disk IOPS


The GPU host reached a maximum of 2,755 disk IOPS during the logging out phase and averaged
165 disk IOPS during the steady state phase. Based on these numbers, each user session
generated 3.43 disk IOPS during the steady state phase.

Figure 27 GPU host disk IOPS utilization

Storage I/O latency


The cluster latency reached a peak of 1.81 ms during the re-creation of instant clones. A peak in
latency was expected due to the creation of new swap files and difference disks required for the
instant clone VMs. These tasks are disk I/O intensive. Average cluster latency during the steady
state phase was 0.45 ms. This value is well below the pass/fail threshold of 20 ms set for storage
I/O latency. Overall during this testing, storage resources did not appear to be a bottleneck.
Figure 28 Cluster Latency

The GPU host latency reached a maximum of 1.57 ms during the boot storm and averaged 0.42 ms
during the steady state phase.

Figure 29 GPU host latency

Login VSI: User experience summary


The following figures show that the user experience score did not reach the Login VSI maximum
for this test. The baseline score for the Login VSI test was 1088, which falls in the 800 through
1199 range rated as "Good" by the Login VSI tool. For more information about Login VSI baseline
ratings and calculations, see this Login VSImax article. The Login VSI test was run with 48 user
sessions for the multimedia workload. The blue line in the following figure indicates that the
system reached a VSImax average score of 1473 when all 48 sessions were loaded. This is well
below the VSI threshold score of 2088 set by the Login VSI tool. VSImax was never reached
during the test, which normally indicates a stable system and a good user experience. See Table 11
for an explanation of the Login VSI metrics.

Figure 30 Login VSI graph summary
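The VSImax threshold values reported throughout this guide follow the Login VSI v4 convention
of baseline plus 1,000. The following Python check, using only the scores reported in this
document, is illustrative:

def vsimax_threshold(vsibase):
    # Login VSI v4.x derives the VSImax threshold from the baseline score.
    return vsibase + 1000

# (VSIbase, reported threshold) pairs from the tests in this guide:
for vsibase, reported in [(776, 1776), (1088, 2088), (1153, 2153), (643, 1643)]:
    assert vsimax_threshold(vsibase) == reported
print("All reported thresholds match baseline + 1000")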

We also noted that there were no failed sessions during testing. This indicates that the login and
logging out processes were smooth. When manually interacting with the sessions during the
steady state phase, the mouse and window movement was responsive and video playback was
good. Moreover, all the parameters we monitored were within the pass/fail threshold set out in
Table 5. This indicates that there were no resource constraints on the system and system
performance was good.
The following table shows the Login VSI score summary for the graphics multimedia worker
workload.

Table 14 Login VSI score summary for the graphics multimedia worker workload

VSIbase    VSImax average    VSImax threshold    VSImax reached
1088       1473              2088                No

Graphics Power Worker, 96 vGPU users per host, ESXi 6.7, Horizon 7.7
In this graphics power worker test, one of the nodes in the cluster was GPU enabled. The host was
configured with 96 vGPU profiles and loaded with sessions. The other two nodes in the cluster did
not host any compute VMs. We used the VMware Horizon Blast Extreme protocol for the testing.
The following metrics were collected and analyzed.
CPU usage
The GPU-enabled compute host in the cluster was populated with 96 vGPU-enabled VMs and used
the NVIDIA Tesla T4-1B profile. With all user VMs powered on before starting the test, the CPU
usage was approximately 15 percent on the GPU-enabled compute host.

The following figure shows the CPU utilization metric data for 96 user sessions on the GPU-
enabled compute host in the cluster. During the login phase, CPU utilization increased steadily until
all logins were complete. The CPU reached a steady state average of 95.57 percent during the test
cycle when all users were logged in to the GPU-enabled compute host. Our standard threshold of
85 percent for average CPU utilization was relaxed for this testing to demonstrate the
performance when graphics resources are fully utilized (96 profiles per host). You might get a
better user experience by managing CPU at a threshold of 85 percent, by decreasing user density,
or by using a higher-binned CPU.
Figure 31 CPU utilization on GPU host

GPU usage
We gathered the GPU metrics from the vSphere Web Client. Six NVIDIA Tesla T4 GPUs were
configured on the GPU-enabled host. The GPU usage during the steady state phase across the six
GPUs averaged approximately 34 percent. The GPUs were used for executing graphics-intensive
tasks in the power workload, thus taking a load off CPUs and providing a better user experience
for graphics-intensive tasks.

Figure 32 GPU utilization on GPU host

Memory
As shown in the following figure, an active memory of 424.72 GB was recorded before the test
started because all VMs were already powered on before the loading of user sessions. Active
memory remained constant during the login phase. With GPUs enabled in the host, active memory
usage at the start of the test was higher than in a comparable test where GPUs were not used.
During the steady state phase, active memory remained almost constant, averaging 425 GB. This
indicates that memory was not a concern during the steady state phase and there was enough
memory available in the ESXi host cluster to meet requirements. Active memory utilization
decreased when users logged out of their sessions, reaching around 59 GB on the GPU host.
During the re-creation of instant clones, memory again remained constant at around 424 GB. No
memory ballooning or swapping occurred on any of the hosts during the testing process, indicating
no memory constraints in the cluster.

Figure 33 Active memory utilization on GPU host

As shown in the following figure, a consumed memory of 450.24 GB was recorded before the
testing started. This was because all VMs were already powered on before the loading of user
sessions. Memory consumption remained almost constant during the login phase. During the
steady state phase, consumed memory reached an average of 450.35 GB on the GPU-enabled
host. With a total memory of 768 GB available per compute host, memory was not a constraint
during the testing.
Figure 34 Consumed memory utilization on GPU host

Network usage
Network bandwidth was not an issue during testing. An average network usage of 654.42 Mbps
was recorded on the GPU host. The busiest period for network usage was during the re-creation of
instant clones phase. The host recorded peak network usage of 1,326.37 Mbps. With two 25 GbE
NICs in an active/active team available as an uplink for hosts, network bandwidth usage was well
under the 85 percent threshold set for network throughput.
Figure 35 Network bandwidth utilization on GPU host

IOPS
The cluster reached a maximum of 2,262 disk IOPS during the logging out phase and averaged
264.47 IOPS during the steady state phase. Based on these numbers, each user session generated
2.75 disk IOPS during the steady state phase. You can select your disk specifications in
accordance with this IOPS figure in your sizing exercise. As shown in Figure 39, I/O latency during
the steady state phase was 0.45 ms for the steady state IOPS requirement. The low latency figure
indicates that storage resources were not a bottleneck during steady state operations.

Figure 36 Cluster IOPS utilization

Cluster controller IOPS reached a peak of 16,382 disk IOPS during the logging out phase and
recorded a steady state average of 1,354. These metrics were taken directly from the Nutanix
Controller VMs and give an indication of the backend operations taking place in the storage
system. They also include cache hits served from memory.

Figure 37 Cluster controller IOPS utilization

GPU host disk IOPS


The GPU host reached a maximum of 2,210 disk IOPS during the logging out phase and averaged
227.28 disk IOPS during the steady state phase. Based on these numbers, each user session
generated approximately 2.37 disk IOPS in the steady state phase. You can select your disk
specifications in accordance with this IOPS figure in your sizing exercise.
Figure 38 Graphics host disk IOPS utilization

Storage I/O latency


The cluster latency reached a peak of 1.9 ms during the re-creation of instant clones. A peak in
latency was expected due to the creation of the new swap files and difference disks required for
the instant clone VMs. These tasks are disk I/O intensive. Average cluster latency during the
steady state phase was 0.45 ms. This value is well below the pass/fail threshold of 20 ms set for
storage I/O latency. Overall during this testing, storage resources did not appear to be a
bottleneck.
Figure 39 Cluster latency

The GPU host latency reached a maximum of 2.83 ms during the re-creation of instant clones and
averaged 0.38 ms during the steady state phase.

Figure 40 GPU host latency

Login VSI: User experience summary


The baseline score for the Login VSI test was 1153. This score falls in the 800 through 1199 range
rated as "Good" by the Login VSI tool. For more information about Login VSI baseline ratings and
calculations, see this Login VSImax article. The Login VSI test was run with 96 user sessions for
the graphics power worker workload. The blue line in the following figure indicates that the system
reached a VSImax average score of 2041 when all 96 sessions were loaded. This is below the VSI
threshold score of 2153 set by the Login VSI tool. VSImax was never reached during the test,
which normally indicates a stable system and a good user experience. See Table 11 for an
explanation of the Login VSI metrics.

Figure 41 Login VSI graph summary

We also noted that there were no failed sessions during testing. This indicates that the login and
logging out processes were smooth. When manually interacting with the sessions during the
steady state phase, the mouse and window movement was responsive and video playback was
good. Moreover, all parameters we monitored were within the pass/fail threshold set out in Table
5. This indicates there were no resource constraints on the system and system performance was
good.
The following table shows the Login VSI score summary for the graphics power worker workload.

Table 15 Login VSI score summary for the graphics power worker workload

VSIbase    VSImax average    VSImax threshold    VSImax reached
1153       2041              2153                No

RDSH Task Worker, 233 users per host, ESXi 6.7, Horizon 7.7
The following metrics were collected and analyzed for this test case.
CPU
The following figure shows the performance data for 700 Remote Desktop Session Host (RDSH)
user sessions across three compute hosts in a Nutanix cluster. The test was carried out with a
Login VSI task worker workload. Each compute host in the cluster was provisioned with six RDSH
VMs running the Windows Server 2016 operating system. We used the VMware Blast Extreme
display protocol for this testing.
During the login phase, CPU utilization increased steadily until all logins were complete. During the
steady state phase, CPU utilization reached an average of 88.87 percent across all three compute
hosts. This value is close to the pass/fail threshold we set for average CPU utilization (see Table
5) but did not exceed the threshold limit, which includes a 5 percent margin. To maintain a good
EUE, it is essential that this threshold is not exceeded. You can load more user sessions by
exceeding this threshold, but this might degrade the user experience.
CPU utilization started decreasing after the steady state phase when users began logging off from
sessions. CPU utilization reached near zero when all users had logged out.
Figure 42 CPU utilization on three hosts in the cluster

Memory
As shown in the following figure, an average consumed memory of 185 GB was recorded before
the testing started because all VMs were already powered on before the loading of user sessions.
Memory consumption remained almost constant during the login phase. During the steady state
phase, consumed memory reached an average of 190 GB across the three hosts. Memory was not
a constraint during the testing.

Figure 43 Consumed memory utilization on three hosts in the cluster

Active memory usage increased steadily during the login phase. Around 68 GB of active memory
was occupied on each host at the start of the test; this includes the memory used by the server
VMs that were powered on before the test and the overhead memory used by the hypervisor.
During the steady state phase, an average active memory of 129 GB was recorded, indicating that
memory was not a concern and there was enough memory available in the cluster to meet
requirements. Active memory utilization decreased to a minimum when users logged off from their
sessions.
Figure 44 Active memory utilization on three hosts in the cluster

Network usage
Network bandwidth was not an issue during testing. An average network usage of 850 Mbps was
recorded across the three compute hosts during the steady state operations. The busiest period
for network usage was during the logoff phase. Peak network usage of 1,402 Mbps was recorded
by compute B during this phase. With two 25 GbE NICs configured as an uplink in an active/active
team, network bandwidth usage was well under the 85 percent threshold set for network
throughput.
Figure 45 Network bandwidth usage on three hosts in the cluster

IOPS
Cluster IOPS reached a peak of 1,478 during the logoff phase. Average cluster disk IOPS during
the steady state phase was 167, and the peak recorded during this phase was 350. Based on
these numbers, the average disk IOPS per session during the steady state phase was 0.23. You
can select your disk specifications in accordance with this IOPS figure in your sizing exercise. As
shown in Figure 48, I/O latency during the steady state phase was 0.43 ms for the steady state
IOPS requirement. The low latency figure indicates that storage was not a bottleneck during
steady state operations.
Figure 46 Cluster IOPS utilization

Cluster controller IOPS reached a peak of 25,172 during the re-creation of instant clones and
recorded a steady state average of 4,899. These metrics were taken directly from the Nutanix
Controller VMs and give an indication of the backend operations taking place in the storage
system. They also include cache hits served from memory.
Figure 47 Cluster Controller IOPS utilization

Storage I/O latency


The cluster latency reached a peak of 1.7 ms during the steady state phase. Average cluster
latency during the steady state phase was 0.43 ms. This value is well below the pass/fail threshold
of 20 ms set for storage I/O latency. Overall during this testing, storage resources did not seem to
be a bottleneck.
Figure 48 Cluster latency

Login VSI: User experience summary


The baseline score for the Login VSI test was 643. This score falls in the 0 through 799 range
rated as "Very Good" by the Login VSI tool. For more information about Login VSI baseline ratings
and calculations, see this Login VSImax article. The Login VSI test was run with 700 user sessions
for the task worker workload. The blue line in the following figure indicates that the system
reached a VSImax average score of 1114 when all 700 sessions were loaded. This is well below the
VSI threshold score of 1643 set by the Login VSI tool. VSImax was never reached during the test,
which normally indicates a stable system and a good user experience. See Table 11 for an
explanation of the Login VSI metrics.
Figure 49 Login VSI summary

We noted that there were only seven failed sessions during testing, which is below the 2 percent
threshold we set for failed sessions. When manually interacting with the sessions during the
steady state phase, the mouse and window movement was responsive and video playback was
good. Moreover, all of the parameters we monitored were within the pass/fail thresholds set out in
Table 5. This indicates that there were no resource constraints on the system and system
performance was good.
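The failed-session criterion can be expressed as a one-line check. This is a minimal sketch with
names of our own choosing:

def failed_sessions_ok(failed, launched, threshold=0.02):
    # Pass if the failure rate stays at or below the 2 percent threshold.
    return failed / launched <= threshold

print(failed_sessions_ok(7, 700))  # True: 1 percent, below the 2 percent limit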
The following table shows the Login VSI score summary for the RDSH task worker workload.

Table 16 Login VSI score summary for the RDSH task worker workload

VSIbase    VSImax average    VSImax threshold    VSImax reached
643        1114              1643                No

CHAPTER 4
Conclusion

l Test results and density recommendations
l Summary

Test results and density recommendations


The recommended user densities for this testing are shown in the following table. These densities
were achieved by following Nutanix best practices, with Redundancy Factor 2 and cache
deduplication enabled. All configurations were tested with Microsoft Windows 10 and Microsoft
Office 2016. We implemented all mitigations for the Spectre, Meltdown, and L1TF vulnerabilities at
the hardware, firmware, and software levels; the performance impact of these mitigations is
reflected in the achieved user densities.

Table 17 User density recommendations for VMware vSphere ESXi 6.7 with VMware Horizon

Server configuration                      Profile name                                     Workload name                 User density
Density Optimized                         Knowledge worker                                 Login VSI Knowledge worker    135
Density Optimized                         Power worker                                     Login VSI Power worker        106
Density Optimized + 6x NVIDIA Tesla T4    Graphics Multimedia worker (Virtual PC: T4-1B)   Login VSI Multimedia worker   48
Density Optimized + 6x NVIDIA Tesla T4    Graphics Power worker (Virtual PC: T4-1B)        Login VSI Power worker        96 a
Density Optimized                         RDSH Task worker                                 Login VSI Task worker         233 (Horizon Apps RDSH/Published Desktop)

a. The user density of 96 users was achieved at 95% CPU utilization. The CPU utilization threshold of 85% is relaxed
when testing with graphics cards. This test represents maximum utilization of the graphical resources available to
the system as well as full user concurrency. Ideally, in a production environment, you would decrease the user
density slightly or use higher-binned processors to bring the CPU utilization closer to the 85% threshold.
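To apply Table 17 in a sizing exercise, divide the target user count by the validated density and
round up, then add spare capacity for availability. The following sketch assumes an N+1 design;
the dictionary keys and helper name are our own shorthand for the table rows:

import math

DENSITY = {  # users per host, from Table 17
    "knowledge_worker": 135,
    "power_worker": 106,
    "graphics_multimedia_vgpu": 48,
    "graphics_power_vgpu": 96,
    "rdsh_task_worker": 233,
}

def hosts_required(users, workload, spare_hosts=1):
    # Capacity hosts, plus spares for N+1 availability (our assumption).
    return math.ceil(users / DENSITY[workload]) + spare_hosts

print(hosts_required(400, "knowledge_worker"))  # 3 capacity hosts + 1 spare = 4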

All Login VSI tests were completed without reaching the Login VSI maximum, indicating that the
user experience was good. Except for the Graphics Power worker profile, the metrics for all other
workloads were well within the thresholds that we set. You can achieve higher user densities by
raising these thresholds; however, the user experience might degrade.
For additional resources on this topic, see the VMware documentation section.

Summary
The configurations for the XC Family devices (the XC740xd-24 and the XC640-10) are optimized
for performance-intensive VDI workloads. We selected memory and CPU configurations that
provide optimal performance. You can change these configurations to meet your own
requirements, but keep in mind that deviating from the configurations validated in this document
will affect the user density per host.
In the Density Optimized configuration used in this testing we leveraged 2nd Generation Intel Xeon
Scalable processors (Cascade Lake) which have in-hardware mitigations for Spectre (variant 2),
Meltdown (variant 3), and L1 Terminal Fault side-channel methods. With mitigations in the
hardware, the new processors provide better performance and user densities than first-generation
Intel Xeon Scalable Processors (Skylake) or other previous generation processor-based VDI
systems, which still require software-level fixes to protect against side-channel vulnerabilities.
Vulnerabilities for which fixes are not available at the hardware level are mitigated through
software-level fixes. Cascade Lake processors also come with an improved architecture and
higher thermal
efficiency that boosts the performance of the VDI system.
With the six-memory-channels-per-CPU architecture of Skylake and Cascade Lake processors,
the recommended server memory configuration has increased from the previous guidance of
512 GB to 768 GB. This change ensures a balanced memory configuration and optimized
performance for your VDI solution. The additional memory is also advantageous given the
increased operating system resource utilization and the enhanced experience for users who have
access to larger memory allocations.


CHAPTER 5
References

This chapter presents the following topics:

l Dell EMC documentation
l VMware documentation
l NVIDIA documentation

Dell EMC documentation


The following Dell EMC documentation provides additional and relevant information. Access to
these documents depends on your login credentials. If you do not have access to a document,
contact your Dell EMC representative. Also see the VDI Info Hub for Ready Solutions for a
complete list of VDI resources.
l Dell EMC Virtual Desktop Infrastructure
l Dell EMC XC Series and XC Core Technical Resource Center
This document is part of the documentation set for this architecture, which includes the following:
l Dell EMC Ready Architectures for VDI: Designs for VMware Horizon on XC Family Design
Guide
l Dell EMC Ready Architectures for VDI: Designs for VMware Horizon on XC Family Deployment
Guide
l Dell EMC Ready Architectures for VDI: Designs for VMware Horizon on XC Family Validation
Guide

VMware documentation
The following VMware documentation provides additional and relevant information:
l VMware vSphere documentation
l VMware Horizon 7 documentation
l VMware Compatibility Guide
l Horizon 7 Enterprise Edition Reference Architecture
l Horizon 7 Enterprise Edition Multi-Site Reference Architecture
For additional information about advanced architectural considerations (for example, NUMA-
related topics):
l Best Practices for Published Applications and Desktops in VMware Horizon Apps and VMware
Horizon 7

NVIDIA documentation
The following NVIDIA documentation provides additional and relevant information:
l NVIDIA Virtual GPU Software Quick Start Guide
