
VDI Design Guide
Part II
Advanced Design Topics

The ultimate guide to help you design complex and unusual End-User Computing use cases, based on the VMware Digital Workspace solutions.



VDI Design Guide Part II
Copyright © 2021 by Johan van Amersfoort

All rights reserved. No part of this book shall be reproduced, stored in a retrieval system, or transmitted by any means, electronic, mechanical, or otherwise, without written permission from the publisher. No patent liability is assumed on the use of the information contained herein. Although every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions. Neither is any liability assumed for damages resulting from the use of the information herein.

International Standard Book Number (ISBN): 9798744069926

All terms mentioned in this book that are known to be trademarks or service marks have been appropriately capitalized. Use of a term in this book should not be regarded as affecting the validity of any trademark or service mark.

Version 1.0



DISCLAIMER
This book represents an overview of how I think Virtual Desktop
Infrastructure (VDI) and End-User Computing (EUC) solutions
can be designed, tested and built. Everything that I have written in
this book is based on my own experience as an End-User
Computing Architect. It is based on the successful outcomes of
projects I worked on, proof of concepts I have built, and research I
have conducted.

No customer is the same and no existing or new infrastructure is the same! This book and the examples in it should only be used as a guideline to an approach and strategy for a project. Never copy the metric data directly into your own projects.



ABOUT THE AUTHOR
My name is Johan van Amersfoort, and I am a passionate Dutch VDI enthusiast. Since early 2014, I have been part of the team at ITQ Consultancy (https://itq.eu).

Currently, I am working as a Technologist in the field of End-User Computing (EUC) and Artificial Intelligence (AI). In 2018, I released the first edition of the VDI Design Guide. It was the best experience ever! The process of writing was a great way of putting all I knew about VDI design on paper, but also the perfect way to help me develop my writing skills and, obviously, to gain new knowledge on topics I wanted to know more about.

Further developing myself and sharing knowledge wasn't the only reason why I started my big project back then. For many years, my wife and I had a major wish which we couldn't seem to get fulfilled. Although we were really happy and had a great life together, our family wasn't complete. Without going into a lot of detail, the last 16 months of that process were particularly rough. I wanted to be at home as much as possible, to be there for my wife, but I still wanted to work on a project which energized me as well. And that's what happened. In summer 2018, the VDI Design Guide was released, and in February 2019 our beautiful daughter, Malin, was born. ☺

Between February 2018 and summer 2021, the world didn't stop turning. I got the opportunity to work on some more great customer projects and got to meet awesome people. All the new lessons I learned have been bundled in the book you are reading right now.

If you would like to get in touch or follow me, check my Twitter profile: @vHojan.



CONTENTS

DISCLAIMER
ABOUT THE AUTHOR
FOREWORD
PREFACE
CONTRIBUTORS
    AGE ROSKAM
    ACKNOWLEDGEMENTS
WHY THIS BOOK?
THE PAST, PRESENT, AND FUTURE
    THE UNTOLD HISTORY OF VMWARE HORIZON
    INTERVIEW WITH SPENCER PITTS
    THE CURRENT STATE OF VDI
    THE FUTURE OF VDI
THE SOFT SIDE OF EUC
    OUTCOME-BASED APPROACH
    THE JOURNEY
    A SMALL STEP FOR IT…
    THE "NEW" END USER
    INTERVIEW WITH ROBERT HELLINGS
THE VMWARE END-USER COMPUTING FAMILY
    WHAT IS VMWARE WORKSPACE ONE?
    VMWARE WORKSPACE ONE ACCESS
    VMWARE WORKSPACE ONE UNIFIED ENDPOINT MANAGEMENT
    VMWARE WORKSPACE ONE INTELLIGENCE
    VMWARE CARBON BLACK
HARDWARE INNOVATIONS
    THE IMPORTANCE OF HOST DESIGN
    CPUS
    INTERVIEW WITH FRANK DENNEMAN
    GPUS
    NETWORKING
INTRODUCTION TO USE CASES AND INDUSTRIES
MEDIA DESIGNERS AND EDITORS
    PILOT AND CHANGE APPROACH
    RESULTS
    CONSIDERATIONS
    INTERVIEW WITH ANIRBAN CHAKRABORTY
THE HEALTHCARE WORKPLACE
    WHY ARE IT PROJECTS FOR HEALTHCARE DIFFERENT?
    HOW DOES HEALTHCARE FIT IN THE FULL EUC PICTURE?
    DESIGN CONSIDERATIONS
    INTERVIEW WITH HUIB DIJKSTRA
VDI BY DAY, COMPUTE BY NIGHT
    THE IMPORTANCE OF ACTUALLY TALKING TO END USERS
    NUMBERS TELL THE TALE
    DESIGNING THE ARCHITECTURE
    DESIGN CONSIDERATIONS
    INTERVIEW WITH TONY FOSTER
DATA SCIENCE ON VDI
    DATA SCIENTISTS != DEVELOPERS
    DESIGN CONSIDERATIONS
    HARDWARE CONSIDERATIONS
    INTERVIEW WITH JUSTIN MURRAY
GAMING ON VDI (REALLY?!?)
    DESIGN AND BUILD CONSIDERATIONS
    THE FUTURE OF GAMING ON VDI
    INTERVIEW WITH CHRISTIAN REILLY
    INTERVIEW WITH SCOTT FOREHAND AND JASON SOVA
VIRTUAL REALITY
    THE DIGITAL WORKSPACE VR LANDING ZONE
    UNIFIED ENDPOINT MANAGEMENT
    VR APPLICATION REMOTING
    DESIGN CONSIDERATIONS
    INTERVIEW WITH MATT COPPINGER
THE IDEAL VMWARE EUC HOME LAB
    WHY A HOME LAB?
    DESIGNING YOUR EUC LAB
    INVESTING IN HARDWARE
    LICENSES
    LAB SERVICES
    MY CURRENT LAB
    INTERVIEW WITH WILLIAM LAM
CONCLUSION
INDEX
BIO


FOREWORD
I think Johan’s first VDI book from 2018, VDI Design Guide, was
the first book on VDI I read where I said to myself, “Yes! Yes! Yes!
Awesome!” after almost every page. (The last VDI book where I
almost felt that way was my own 2012 book, The VDI Delusion,
written back when I was running BrianMadden.com and hosting
the BriForum conferences.) So, when Johan reached out to me in 2021, letting me know that he was writing a Part II, I was giddy with geeky excitement. I literally told him to send me a copy then and there; I didn't even care if it was done yet!

Even though Johan and I have both been in the EUC industry since the late nineties (as he likes to say), I didn't actually meet him until after I joined VMware's EUC Office of the CTO in 2018. I had taken an 18-month sabbatical in between BrianMadden.com and VMware, and when I first traveled to The Netherlands, I met this super tall, super awesome, bearded VDI geek who had worked with VMware and NVIDIA to build an F1 racing simulator rig that was completely running via VDI and the Blast remoting protocol! (Johan talks more about that project in this book. It's dope!)

I wondered who this guy was, why he embarked on this crazy cool-yet-pointless project, and, most importantly, if we could be friends.

It was then that he told me about his soon-to-be-released first book, which I read and loved (as mentioned), especially since I'd been living in a van for the past year and needed a quick "catch up" on all the nerdy technical things happening in the VDI industry.

Fast-forward to today. Even though both Johan and I talk about how VDI is just a small part of a complete EUC strategy, we both also recognize that a lot has changed since his first book (thanks, COVID) and one could argue that VDI is even more important today than it was in 2018. Viewed through that lens, this book you're about to read, Part II of Johan's VDI Design Guide, is even more important and timely than Part I.



To me, this book is the next-best thing to sitting with Johan in a bar
and getting a brain dump of what it takes to be successful with
VDI today. Johan claims this book is about “advanced design
topics”, which is certainly true, but don’t think that means this
book is only technical or only for geeks. In this book, Johan walks
through how VDI should be considered in today’s distributed
world, and what it takes to actually succeed with a VDI
implementation. Sure, the technical design must be solid, but most
VDI projects that fail do so due to non-technical reasons, which
Johan helps you navigate in these pages.

Johan also talks about recent hardware innovations (we're all using GPUs in VDI now, right?), as well as walking through many of the emerging, previously "no-no" use cases for VDI, like data science, artificial intelligence, gaming, and virtual reality(!).

Throughout the book, he's sprinkled in interviews with lots of industry experts—it really feels like people are sidling up to you in the bar, grabbing a beer, and sharing a cool story from the trenches. I asked why he didn't interview me. He said it's because I get to write the foreword. Fine, but I expect an interview in Part 3! Hopefully by then I will be interning for him, learning and watching him build VDI for Mars or whatever he's thinking about next.

Until then, I’m excited for you to read this book and up your VDI
game like I have!

Brian Madden, June 2021


Distinguished Technologist, EUC Office of the CTO, VMware
Author of “The VDI Delusion” and many other books
Creator of BrianMadden.com and BriForum



PREFACE
The road towards a finished End-User Computing (EUC) project
isn’t always paved. Sometimes, it might even not be paved at all.
In my 20+ years of working in EUC, the number of perfectly paved
projects can probably be counted on two fingers. A perfect EUC
project is a bit like a rainbow-shitting unicorn. The reason is
simple. EUC is probably one of the most fluctuating IT concepts.
End users or consumers of the platforms we design are constantly
changing. They’re changing at such a rapid pace that today’s
platform might be old and unusable tomorrow. Now, we know
that change is the only constant. While change itself has always been
around, the rate and speed of those changes have heavily increased. And please don't forget about the changing demands of your business.

Changes can be difficult. Not everyone copes with changes in the same way. To be honest, I love changes! Changes enable you to broaden your horizon. Changes help you to step out of your comfort zone. I strongly believe that life starts where the comfort zone ends. This book is dedicated to helping you take your own steps out of your comfort zone and showing you that change in EUC is a good thing!

As mentioned, the road towards a finished EUC project isn't always paved. The thing is, does it need to be? Why not look at projects with an open mind and regularly think out of the box? Accept the fact that things will be different than expected up front, and fully commit yourself to a famous quote by Doc Brown:

"Roads? Where we're going, we don't need roads."

Enjoy reading!

Cheers,
Johan



CONTRIBUTORS
“Talent wins games, but teamwork and intelligence win
championships.” A famous quote from Michael Jordan, in my
opinion one of the best and most inspiring professional sports
players ever. The way he played his games, and also motivated his
teams, helped the Chicago Bulls and the USA Dream Team to be
the best in what they did. I think of writing a book as a
championship as well. Whereas the first book took me over a year to finish, this book took me well over 9 months. During those 9 months, it regularly happened that I lost inspiration, got stuck in another rabbit hole, or just needed help in solving things. Having a great team of people around you who are always willing to help is one of the best feelings when working on such a big project. Whereas for the first book I basically spent all the time I could take in my office on writing, for this book that became a bit more difficult. I love spending time with my family, which is the number one priority in my private life. I like to stay healthy as well, which means I plan my workouts in the local CrossFit box two or three times a week. In the remaining time I had left during weekdays and in the weekends, I blocked a couple of 2-hour sessions to work on the book. It helped me to maintain discipline and made sure I could keep my deadlines. Now, one of the worst things that could happen when I wanted to start a writing session is being fully stuck in writer's block. A special thanks goes out to Frank Denneman, Maarten Caus, and Hans Jaspers, who helped me find inspiration, create certain angles, or take away blind spots when I got stuck.

Another special thank you goes out to Tobias Kreidl. Besides being
a great community friend, Tobias is my “grammar and spelling
conscience." I'm certainly anything but a native English speaker, and where Grammarly sometimes has a hard time recommending the right tone of voice or just the proper word, Tobias has helped me a lot!

Finally, and I would like to hand him his own podium, Age Roskam is a great colleague and someone who reminds me of myself. Age joined ITQ during the pandemic as an EUC consultant and wanted to take his career to the next level. He shares the same drive and ambition to be the best at what he does. Not only that, he also loves to share his knowledge with the community. Age is a true specialist in the field of Intrinsic Security and has covered the sections about Carbon Black and Workspace ONE Intelligence.



AGE ROSKAM
My name is Age Roskam. I work as a consultant at ITQ with a focus on End-User Computing and Security. I'm also part of the vExpert EUC and Security sub-programs. I've been working in IT since 2007, and like most of us, I started my career at a service desk. My interest in virtualization technology started when I was working on a project to set up a new service desk while the organization also did a new workspace project based on Citrix, App-V, and Immidio Flex+.

Intrigued by application virtualization, I decided it was time for me to leave the service desk. So, I joined Login Consultants and started as an application packager, working my way up to become a consultant. I came in contact with App Volumes, and that's where things started to go fast for me. Excited as I am, I made the product my own and started to specialize in the Horizon suite. At that same time, I started to write blogs and present at events to share my knowledge and experience. But there comes a moment in your career, and I think everybody can agree, when the urge for something new rises. For me, that moment was on May 1st, 2020. We were in the middle of the pandemic, and I was looking back a bit on my career, when the very next moment I saw Johan tweeting about his ITQ rollercoaster adventure and a seat ITQ had open. A couple of months later I was riding that same rollercoaster, and I can say without a doubt:

Best. Decision. Ever.

Since joining ITQ, I've shifted my focus towards Workspace ONE and Carbon Black, which you will read more about later on in this book.



To close I’d like to thank Johan for giving me the opportunity to
contribute to his book, and I hope you enjoy
reading it as much as I did writing!

You can find me on twitter: @AgeRoskam


and on my blog: https://ageroskam.nl

ACKNOWLEDGEMENTS
Chasing dreams and setting goals is what gets me energized. But dreaming and setting goals is one thing; keeping yourself motivated to make those dreams happen is another. One essential quality which I always thought you need to achieve those goals is discipline. I always thought that discipline is something you have, or you don't. And, for a long, long time, I thought I lacked a bit of discipline. What I learned from my peers at ITQ (and a little bit from Yoda) is that there's no such thing as discipline. It all has to do with planning and sticking to that planning. Prioritize the things you do and ask yourself if they are more important than locking yourself up in your office and working on your goal. Don't get me wrong, I also like to put meat on my grill, drink a nice glass of Italian wine with friends, and meet up with family. But writing this sequel sometimes had a higher priority. This lesson (among other lessons) that I learned from Francisco Perez van der Oord (one of my mentors at ITQ) is something I am forever grateful for. Another special thank you goes out to Bertwin Oudenampsen, one of the best managers I've ever had the pleasure to work with and someone who fully supported my journey to get this book done. Just another reason to join the ITQ family if you'd like to take your own career to the next level. ☺

Next to my ITQ peers, I'd like to thank some people from the EUC community, VMware, Intel, and NVIDIA as well. One thing I really liked about the VDI Design Guide was the interview section at the end of a couple of chapters. I decided for this book to end every chapter with an interview. I'd like to thank the following people for these interviews:



• Spencer Pitts
• Robert Hellings
• Frank Denneman
• Anirban Chakraborty
• Huib Dijkstra
• Tony Foster
• Christian Reilly
• Scott Forehand
• Jason Sova
• Matt Coppinger
• William Lam
• Justin Murray

Being able to read what's in a book is quite handy, but reading something which isn't technically wrong is pretty important as well. The following list of people helped a lot in getting the right content out:

• Tom Schoemaker
• Jesper Alberts
• Matt Heldstab
• Ivan de Mes
• Kees Baggerman
• Jeremy Wellner
• Chris Kerker
• Jits Langedijk

One thing which was really cool after the first book was launched was the fact that people could buy it at the VMworld bookstore during both the US and EU events. My brother, Rick van Amersfoort, designed the cover for the first book, and because of the awesome work he did, it was an absolute no-brainer to ask him to design this cover as well. I think he took the original design to the next level and hid some easter eggs in it which you might find if you look closely. Don't hesitate to drop me a message on Twitter if you find one. ☺

Finally, I would like to thank one person in particular. Someone who is an inspiration, a great friend, and never hesitates to be a bit "Dutch" and say I'm wrong. Brian Madden inspired me to write my first book and continued to do so with this book. He has the same passion for storytelling, nerding out, and EUC. There was absolutely no question who the right person was to ask to write the foreword for this book, and I am truly honored he said yes.

Now, enough with all the introductory stuff, let's dive into the world of EUC!

WHY THIS BOOK?
Let me start by saying that it’s quite hard to finish writing a book.
Not because of time, but because it’s difficult to make a cut and
decide which sections will make it to the final release and which
don’t. I had a couple of additional ideas that never made it to the
final release of this book but gave me a ton of inspiration for this
book.

The VDI Design Guide had a couple of goals. I have spent most of
my career installing, configuring, managing, designing,
developing, supporting, updating, upgrading, and quite often
breaking, solutions that were used (and sometimes still are used)
with a general purpose: empower an end-user to do their work in
the most efficient way. That's also my view on end-user computing (EUC).

The first goal of the book was to help people interested in a particular EUC solution to design that solution in the best possible way. As an architect, I quite often get to work on projects that have failed before because of the lack of a proper design. I always like to use the analogy of building a house. If you are building a new house, would you do so without properly designing it up front? Do you know what kind of soil you are building on? How many hours of sun will your house be in? From what angle? What about the wind or possible storms or worse: hurricanes? If you don't know that information up front, how are you going to prepare for those kinds of situations? Designing before building is essential on the road to a successful EUC project.

The second goal of the first book was to help increase the number
of VMware Certified Design Experts in the Desktop & Mobility
certification track (VCDX-DTM). I was the fourteenth person to achieve the certification in 2016 and wanted to increase that
number by helping architects to understand the VCDX design
methodology. Although it did help a bit (there are 19 certified
VMware EUC architects as of summer 2021), we aren’t there yet.
The first book has sold thousands of copies, which means that a lot
of people are now familiar with the design methodology and that
might even be more valuable. Of course, I still hope the number of
certified architects will increase, but time will tell.

The third and final goal was to show the maturity of the Virtual
Desktop Infrastructure technology. I believe that goal has been
achieved, as well. I have seen so many different use cases that
were able to successfully run on a VDI without any issues or
negative User Experience (UX). This made me realize that although
it shouldn’t be a goal of a project, you could design a VDI for all of
your use cases.

For you as an EUC enthusiast, I have a nice surprise. By scanning the following QR code, you will be able to download the first VDI Design Guide, completely for free! A special thanks goes out to my employer ITQ for making this possible!



This brings me to the goals of this book.

In March 2020, The Netherlands went into lockdown because of the COVID-19 pandemic. The pandemic was a horrific tragedy in which many lost their lives. People working in healthcare experienced a nightmare. There was no cure or possible treatment known, and many more patients came into the ICUs than there were beds available. Doctors and nurses felt hopeless, since no one was in control of what was happening. As a result, the entire country (like many other countries) went into what the Dutch government called "a smart lockdown." Schools closed, restaurants and bars closed, certain shops closed, and many offices closed as well. We were allowed to get essential groceries, see a physician, and go out to walk the dog. But that was it.

Working from home suddenly became the standard for anyone whose profession allowed it. And that's when the EUC principles as we knew them also changed.

In the first four weeks of the pandemic, IT teams did what they could to answer the sudden demand of remote workers. All resources, devices, old hardware, etc. were used to enable organizations to continue their critical business processes, but with a new form of flexibility in mind. IT people, and EUC specialists in particular, were the corporate heroes of a new era in IT.

The first goal of this book is to inform the business of new requirements to take into account that can have a huge impact on infrastructure design. Availability, recoverability, performance, scalability, manageability, and usability are all examples of design qualities that may get a different set of requirements because of sudden events such as a pandemic, but also to support the new remote worker (which is a great segue to the next goal).

I strongly believe the end user has changed. This "new end user" and the expected UX have probably changed, as well. While they might have been used to getting a Lamborghini to do their work with, they were forced to use a Fiat instead during the pandemic. And it probably worked well enough for the essential business processes to continue. I have conducted many interviews with people in all sorts of organizations to understand how their acceptance level has changed and how creative they became to continue their work in such a situation. Throughout the different chapters, I will share many of those findings.

The second goal is to tell you about a variety of use cases. Earlier, I explained the increased number of use cases that are capable of running successfully on a VDI. I will dive into those use cases, as well, and share my experiences, design considerations, best practices, and new/creative ideas to get the most out of your own VDI projects. Gamers, data scientists, and video editors are some of those examples. The nature of their work in many cases demanded a physical PC because of the resources, operating systems, or applications. The maturity level has again increased and enables those complex use cases to be virtualized. As you may know, organizations like Microsoft and Google have benefited hugely from the pandemic. They were practically giving away Teams and Hangouts to enable organizations to collaborate while people were working from home. And this led to another shit storm. Many companies invested in VDI but didn't see a benefit of including Graphics Processing Units (GPUs) in their infrastructure. With environments that were never designed to run video conferencing tools, and every employee heavily depending on those tools, a sudden demand for either offloading or GPUs was introduced. These "new" requirements will for sure be covered. I say new, because we as EUC specialists already knew this day was coming, just not that soon.

Also, a lot has happened since 2018. VMware Workspace ONE has been thriving and has been part of nearly every VMware Horizon project I've been involved in. VMware also acquired Carbon Black and heavily invested in its security stack. From the NVIDIA side of things, they completed their plans to acquire Mellanox Technologies in April 2020 and are busy with their intended acquisition of ARM. There's a lot more happening, but I'm quite sure we could cover the rest of the page with the stuff I'm forgetting. The third goal is to inform you about these great developments.



Last but not least, the hardware space hasn't been sleeping, either. Both Intel and AMD introduced new CPUs that bring a lot more resources to the table. Components like storage and networking also underwent a lot of development, with the goal of lowering the TCO of your platform. Intel is also developing their datacenter GPU offerings, but as of this writing they aren't available yet. AMD, on the other hand, decided to build datacenter GPUs for the public cloud only, so we'll be seeing less AMD in the on-prem offerings.

As you can read, the current world of EUC is in many ways a lot
different from what it was before. We had a lot of fun declaring
every year to be the Year of VDI, and I don’t think any year should
be a Year of VDI, but the importance of a well-designed EUC
solution like a VDI has never been greater.



THE PAST, PRESENT, AND FUTURE
As you might be able to guess, the majority of the content in this book focuses not only on the current state of VDI, but also a bit on the (near) future. In this case, future means designing an infrastructure which is capable of satisfying your business needs of today, but also those of the next four to five years. That's when the average lifecycle of an underlying infrastructure ends, and new projects will be started again. I know that it's impossible to predict what will happen over the next five years, but smart choices made now can save you a lot of money, or complaining end users, in the future. VDI is also quite old. In 2021, the VDI technology turned 14 years old (as VMware created the first VDI solution in 2007). Let's focus a bit on the history first.



THE UNTOLD HISTORY OF VMWARE HORIZON
I still remember from the early days of virtualization that we created our own hardware designs and had to incorporate things like growth. Hyperconverged Infrastructures with linear scale didn't exist, so we had to think about the ideal sizing and ratio between the different components. RAM and CPUs were quite easy to size, but when looking at networking, storage, and the scale of building blocks, it became a lot more complex. Now, I have had my fair share of PEBKAC (Problem Exists Between Keyboard And Chair) incidents and learned a lot from them. I think this is also where most of the inspiration for this book came from. Although I believe I'm starting to become a prehistoric mammal (I started my own EUC career in October 1999, which I always refer to as the late 90s), there are always people in the EUC space who can be considered a dinosaur. One of those people is my good friend Spencer Pitts. Spencer has been part of VMware for many, many years and has spent pretty much all of that time in VMware's EUC business units. He started in VMware's professional services organization and, through various other roles, is currently a Chief Technologist for the Digital Workspace at VMware, covering EMEA. Over the past years, Spence and I have had quite some good conversations on VDI, and I always enjoyed his anecdotes about the early days. I thought it was a good idea to share those anecdotes with the rest of the world, so here goes.

INTERVIEW WITH SPENCER PITTS


Me: What does end-user computing mean to you?

Spencer: Well, two things, actually. So, if I separate out work versus home, I think that's important. Because if I think of end-user computing from a work perspective, to me, it's very specific around how we speak to our businesses and customers, right? It's an end-user computing department who provides a service out to the end users. So, if I think of end-user computing from the business side, it is very much IT and a particular department who gives out devices and gives access and services out. Now, if I look at my personal side of things, I've got every device under the sun (and probably too many if you asked my wife). It's basically anything that gives me access to what I need to do (and probably spend more time on than I should): getting emails, going on the web, and buying stuff (which I seem to do a lot of at the moment). For me, end-user computing at home is my Apple devices or my Android devices, so it's different. Which, maybe you could argue, you shouldn't. But actually, for me, they are two different things. It's all of my home kit, and I also think of myself as my own end-user computing department for my family. I provide the service and am the one that fixes the Wi-Fi when it goes dead. I'm always thinking about the latest new gadget, probably because I'm a geek, but at the same time I can then make sure their services get better. And even though I don't get paid for it and I don't get any budget, what I don't get is any hassle from them when it doesn't work.

Me: I like the fact that you talk about the home side of things,
especially now that a lot of people are mainly working from home.
Like you mentioned, you don’t want any hassle when stuff breaks.
Automation became pretty important, also in end-user computing.
What have you seen changing in that space?

Spencer: Yeah, so it's interesting there, because if I go back 15 years, a lot of people liked to spend their time in front of a console, twiddling knobs and twisting things. And that was the concept of being able to see everything. IT was very focused on seeing everything. They wanted to understand everything that's going on. The result was that IT was primarily IT-centric in the early days. I think if you limited the scope, you could still look at stuff quickly and easily from a single console, but end-user computing has become so vast, and there are so many different devices, so many different user profiles and use cases. You still want to be in control and don't want to look at every single console under the sun, so you need to automate stuff.



And let's be honest, as well. There are certain things that just happen that really annoy you. When something happens regularly enough, you think, "Well, why don't I just put some automation in there to get rid of it quickly?" And I mean, that's the way that end-user computing is going: you need to be a lot better at automating stuff. When you have everyday mundane tasks, rather than going into a console and clicking on them, you should be automating them. And there's that concept of infrastructure as code as well. I think the emergence of a lot of low-code/no-code type consoles (like Freestyle Orchestrator in Workspace ONE) will surely accelerate automation in end-user computing. Which is kind of weird for us, because we're still very hands-on with servers and stuff. But actually, and we touched on this during VMworld 2020 in a session on the future of VDI, as time goes on, you're not going to be so close to that. So, you need to have more of an encompassing view. You're going to be detached from the underlying infrastructure and trust that the stuff underneath is going to give you what you need, like CPUs, storage, and memory. And you're not going to be so attached to it. You don't need to care so much about the DIMM chip, or about whether it's DDR3 or DDR4, in the same way that we did before.

Me: It's a bit like the Pets versus Cattle analogy, right?

Spencer: Yeah, you don't need to know about that so much. It isn't important anymore to name your servers. Rather, just give them numbers, like chickens, as Joe Baguley (CTO at VMware) used to say a long time ago.

Me: Now let's talk about VMware. You've been part of VMware for like a gazillion years, but what was VMware to you prior to the moment they jumped into the end-user computing space?

Spencer: My first exposure to VMware was when I used to work for another company, a little small startup called ON Technology. Although it wasn't really a startup, you could argue it was, with about 250 people. It did PC Lifecycle Management solutions. I used to use VMware Workstation back in the day, so I could demo building a VM. Before that, I used to carry around a PC, which was part of a demo kit. I had a 100 Mbit switch, a laptop running Windows 2000 Server, and the software on there. The demo would be that I'd get this brand-new machine, like a Dell OptiPlex or whatever it was, and I'd build it in front of the customer from scratch. I'd do a PXE boot, and I'd flatten it, right. That was great. But then one day I attended one, and at the start of the demo, dust must have gotten into the PSU, and it just blew up in front of the customer. Literally purple smoke. I did a remote wake-up and it went BANG! And I just was… Okay... Right... And the customer asked, "What do we do now?" Well, not that much, because that was my machine and I didn't carry two of them around…

And in the early days, people didn't trust it unless it was a PC. You'd have a laptop, but it was still at that time when there were more PCs than laptops. Laptops were still a bit of a status symbol for a lot of end users at the time, rather than the actual PC device. So, having a PC and plugging into a monitor, or a projector as well (which I also used to watch football on when I got home). So, my first kind of exposure to VMware is that I didn't want to keep carrying around the PC. So, I used VMs. I'd use VMware Workstation in a very, very, very early form, and I could use that for testing, testing my images and everything. However, when I used to show customers, they were like, "Hold on, so you're building something virtual, what's this virtual machine thing?" And some people didn't get it, right. So, it worked for me, and it was great, but it took a while until customers got virtual machines.

The company I worked for developed software used to build servers, and eventually it got bought by Symantec, and Symantec then bought Altiris. And you may know that Altiris had some server management capabilities as well. Right. So, with this ON Technology stuff, which eventually got acquired, you could build servers. And then somebody came up to me and asked me if I could also build ESX. I didn't know what he was talking about. I knew about VMware Workstation, but I didn't know about the other stuff. I took a dive into it and thought, yeah, why not? It's kind of like a hardened Linux thing. So, I gave it a go. And then I looked into it, and that's when I got interested.



A while later I left Symantec, and one of the sales guys I worked with before reached out and said, "Hey, I'm working for VMware, come work for us!" I got dragged in, because he was like, "Yeah, you already like it and we used to work together." And I'll be honest with you, I moved over originally because I knew him, and I trusted him. More so than the technology. The first thing I did when I joined VMware was go out and buy some Dell servers. I've always been a home lab person. Learning a technology for me means understanding it, so I built it. I can't talk about something if I don't understand how it works. Also, how is anybody going to understand it when I speak to them about it if I don't understand it myself? So that's what I did. That was 2007.

Me: Nice. So, you moved over to VMware in 2007, as you just mentioned. Virtual Desktop Manager (VDM, which was released prior to VMware View) was released in 2008. What was the year prior to that initial VDI release like?

Spencer: So that was interesting. You all know Matt Coppinger, obviously. He technically joined the company at that time, but I hadn't met him yet. When I first joined VMware, I was a public sector SE, and I was just going out there and getting interested in the server stuff, and I didn't really know about VDI or anything. There was this guy called Richard Stinton, and he was the original SME for what was about to become VDI. He came over and had this use case for a customer who had some remote developers sitting in Southeast Asia. And they didn't want to ship PCs over there. They already used RDP, and they said, we're going to put Windows XP onto ESX, and they came to us. They said, "Does it work, and can you support it?" Well, we didn't really know. We had never really tried that. And they said, "Well, actually, let's put it another way: we have run it on there and it works. We just want to know whether you support it." The interesting thing at that point was that somebody had written a custom broker (a guy called Spencer Critchlow). And I always used to joke at the time that Spencer was the original founder of VDI at VMware, but then I just didn't tell them that it was a different Spencer. And I just let everybody figure I was Mister VDI. Spencer Critchlow was a professional services consultant who wrote that original broker, because there was no VDM.

Me: So, I heard rumors that the original broker was kind of based on VBScript and an Excel sheet as a database. Is that rumor true?

Spencer: Right, so the short answer is yes. But actually, what happened with that use case is they started running it. And they had an Excel spreadsheet that had some IP addresses in it. They basically just assigned an IP address to a user. They'd connect to the IP address with that username and password. This worked for a while; they started getting problems when they got to about 150 users. I think it was at the time when people were forgetting IP addresses, and nobody was updating the master spreadsheet. People started connecting into the wrong machines, and they got annoyed. Remember that we didn't have any non-persistency included yet.

So, what then happened is that Spencer got involved. He wrote some custom scripting to do exactly what the customer wanted. Which is: a user enters their username and password. The scripting then recognized the username and directed the session to that particular IP address. The spreadsheet stayed, but rather than the user working in the spreadsheet, it was now just the IT department that had to maintain it. The user didn't have to get involved, and it was the very crudest form of a broker, but effective. It was like: if you are this person, you go to this machine. Which is, you know, a very early concept of a broker. So yeah, that's exactly where it came from.

So that was just before I joined. I wasn't there. At that point, we'd actually bought a company called Propero, which is where Matt Coppinger came along, but that was in stealth mode. It was one of the very earliest acquisitions, and they had their own product called Propero Workspaces. That solution was based on a Linux back-end architecture, as well. After the acquisition, the first thing we did was rewrite it all to Windows, because we didn't think customers wanted Linux-based appliances. ☺

After rebuilding it, we came up with VDM 1.1, because we had VDM 1.0, which was the custom broker thing which Spencer built. We'd done some stuff around it and supported it, but then you had VDM 1.1. This effectively was the rewritten Propero code that was actually a GA product. So, that's the early history.
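
A quick aside from my side: to make the mechanics of that first "broker" tangible, here is a minimal sketch of what such a username-to-IP mapping boils down to. It's written in Python for readability; the original was VBScript against an Excel sheet, so every name and detail below is invented for illustration only.

    # A hypothetical, minimal username-to-IP broker. It assumes a CSV file
    # ("desktop_mapping.csv") with the columns username,ip_address that IT
    # maintains by hand, just like the old master spreadsheet.
    import csv
    import subprocess

    def lookup_desktop(username, mapping_file="desktop_mapping.csv"):
        # The "database": one static row per user.
        with open(mapping_file, newline="") as f:
            for row in csv.DictReader(f):
                if row["username"].lower() == username.lower():
                    return row["ip_address"]
        return None  # no desktop assigned to this user

    if __name__ == "__main__":
        user = input("Username: ")
        ip = lookup_desktop(user)
        if ip is None:
            print("No desktop assigned. Call IT.")
        else:
            # Hand the session off to RDP; the actual username/password
            # check happens inside the Windows session itself.
            subprocess.run(["mstsc", f"/v:{ip}"])

The logic is exactly what Spencer describes: one static mapping, no pooling, no non-persistency, and the whole thing falls apart the moment nobody updates the mapping file.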



Me: Right, so let’s talk about cloning. Back in the day, you didn’t
have smart cloning technologies like Linked Clones or Instant
Clones. The virtual desktops you generated were just plain old full clones, right?

Spencer: Yeah, totally just full clones.

Me: How did customers cope with managing them, especially when talking about large numbers?

Spencer: The thing about the early days is that we didn't think that people would do it at scale. We weren't proactively saying to them that they could use this for 10,000 machines. After a while, they were asking us for those numbers. And I remember going through it at a customer in the UK, which was a government customer, and we were still in the days of Windows Vista. And they said to us: "We love this VDM! We tested it. We got 50 machines or something, and now we want to do this for 10,000 machines." And if you think about it as well, this was still back in the day when SAN was expensive and shared storage hadn't come along that well, either. And Vista wasn't very good, especially with storage. If you're taking 50 Vista machines and running them, and then scale them out to 10,000, the calculations of the storage size alone required a shit ton of hardware. It wasn't just about speed and performance, but also because there was no deduplication yet. If your machine was 250 GB, you'd have to multiply that by however many machines you needed. There was no optimization in that way. I mean, there was memory overcommit and things like that. And, you know, the early new concepts of virtualization, but from the storage side, there was nothing. So, to your point, what did we tell customers? "Well, get your checkbook out, because you need to go and buy a big fucking SAN," was pretty much what we said.

That's what happened in the early days. Storage was killing our ability to scale. And we used to lose a lot to Citrix for that very reason, because we didn't do any app publishing; we didn't use terminal services or the MetaFrame stuff or any of the RDS services as you know them today. We literally had VM/desktop brokering, using RDP for remote connectivity. And we could also use Leostream as a broker, before we bought Propero as the VDM broker.

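(To put Spencer's multiplication in perspective: at the 10,000-desktop scale he mentions, full clones of a 250 GB machine mean 10,000 × 250 GB = 2,500,000 GB, or roughly 2.5 petabytes of raw SAN capacity, before accounting for growth, redundancy, or performance headroom.)
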
Me: So, if customers wanted to scale out to like ten thousand desktops, or tens of thousands of desktops, how did they manage them? How did they deploy them, configure them, deploy applications on them, manage their profiles, etc.?

Spencer: Well, that was interesting, because this is where IT-led projects were just coming in. IT liked it because they wanted to take their physical process and put it on the virtual desktops. At that moment you're like, well, okay, why do I want to do this? And what they liked was the ability to reset a machine back to zero. That was the big thing. And it led into the ability to reset or delete a VM. This was also where pooling came in, a very early concept of a desktop pool. We understood what the peak concurrency was at a customer and then had these desktops pre-built. Because back at the time, there was no quick building; there were no Instant Clones or anything like that. And even View Composer wasn't around. So, the idea here is, you'd have to preconfigure all these VMs. It was basically cloning, and we started doing calculations about how many VMs could be cloned. When looking at the design of things, we especially looked at the rate of change. I remember going into a hospital and talking to the customer about their shift pattern. And well, I know we still do that to a degree today, to understand when people log on and log off. Back then we did it to calculate how long it was going to take to destroy and then clone a new VM. For the hospital, we needed to plan and design the process so that they could satisfy the shift.

I remember going in and doing a lot of those calculations back in the day, and doing a lot of user profiling with customers. So, I think the point here is that they managed it very cautiously. It was very, very regimented, and if anything went wrong, a lot of the time in those early days, you could have outages. Problems existed mostly around cloning issues. If cloning stopped for whatever reason and you lost time, that would be time that people were waiting for a machine to be created when they logged on, on a Monday morning.
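
As a side note from my end: the back-of-the-napkin math behind that kind of shift planning still works the same way today. Here is a minimal sketch of the calculation Spencer describes, in Python, with purely hypothetical numbers, since real clone times and clone concurrency depend entirely on your storage:

    # Hypothetical numbers for illustration only; measure your own values.
    pool_size = 500        # desktops to destroy and re-clone between shifts
    clone_minutes = 10     # time to destroy and re-clone one full-clone VM
    concurrent_clones = 8  # clone operations the storage can sustain in parallel

    refresh_minutes = (pool_size / concurrent_clones) * clone_minutes
    print(f"Full pool refresh takes ~{refresh_minutes:.0f} minutes")
    # (500 / 8) * 10 = 625 minutes, which would never fit a shift change.
    # The options: more parallelism, faster storage, or pre-built spares.

With full clones and the storage of that era, numbers like these rarely added up, which is exactly why the rate of change had to be planned so carefully.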



Me: You mentioned a lot of complexity, and you also mentioned that, for instance, storage was insanely expensive. Now let's talk about business cases. How did you see business cases become positive when looking at all of the complexity and financial impact?

Spencer: It wasn't for a lot of customers. But the other thing you need to understand here is that it wasn't like End-User Computing teams were coming to us. The people that started VDI were mostly the datacenter people that thought, "Hey, let's get the desktops on there, as well." And they didn't necessarily understand how a desktop worked. So, in the early days I was having to go into customers and explain to desktop guys why VDI could be a good thing. From a business case perspective, it was really tough to make it stack up. So, what you ended up doing is you went for some of the main use cases, like remote workers. Because if you think about what you needed to enable remote workers and introduce flexibility, you could have a conversation. We knew it was expensive, but these were the benefits you got out of it as well. So yeah, it didn't come cheap. We were waiting, waiting, and waiting until View Composer came along, and compression on the storage level, and then suddenly a company's storage costs were cut in half. It didn't sort the problem out, but it cut the storage costs in terms of physical space for sure. But yeah, the early days were really tough. The business cases were really hard to get over the line.

Me: I guess that operational management (and thus costs) reduced as well when Composer came out. The provisioning capabilities also introduced non-persistent VDI, which became a thing. How did customers handle that migration to non-persistent VDI?

Spencer: It was the biggest thing that they wanted. They wanted non-persistent VDI. So, if you take non-persistency, what that really meant was: I can reset your machine back to a golden image. What they liked about that was securing their hygiene around the virtual desktops. It's like PCs: you turn them on, and they've already gotten slower. That's kind of the way I look at it, and then you leave them on for any length of time and they get even slower. So, it's nice every now and again to reset them back. Every time somebody logs on there, it's still quick. There are fewer support calls, and if there is a problem, I can easily reset them so users will have shorter downtime. And, actually, I do subscribe to that notion. So, when I was at ON Technology in those early days, we had a big use case with a customer, a large telecom company from Germany. They had a three-minute rule on the help desk. If a user phoned up and they couldn't fix the problem in front of the user in three minutes (they used to remote control to it with PC Anywhere), they'd reset it. They'd ask the user to leave the computer and walk to another machine. If there was one available, they would reset it and rebuild the machine from scratch. That took about 30 to 40 minutes. But the argument was, it was quicker, cheaper, and cleaner to reset a machine than it was to try and individually fix it. It's kind of like easy come, easy go, and the reason why I'm saying that is, that was the big draw in the early days. And that's when non-persistent came across. Which was: if I log off at the end of my working day, my machine deletes and resets, so therefore I get a brand-new, clean, and nice machine. The concept was that this should override the complexity of support. Problems from the users should go away, unless you've got problems with the master image (which will lead into much bigger problems).

The problem, of course, that you're getting into with non-persistent is that it works if everybody's got exactly the same VM and all the same apps. I used to work for Bank of New York, and everybody had exactly the same applications. People weren't allowed to install stuff. The main reason being that mostly everybody did the same thing in an office, which is why everybody got the same stuff. I built it so that I could easily rebuild a desktop, including Office and tools like WinZip and other standard utility stuff. Next, you'd have department apps that were deployed separately. But in the early VDI days, people took their existing PCLM methodology and just put it straight on. It worked for a while, until suddenly someone came in and said, "Yeah, well, we went off and bought a new app we need." And even back then you had people that would go off and buy, you know, department apps for some project. The postal department got this one app that does postal code/zip code lookups that cost them 55 pounds a year in subscription or something. We'd have a specific install for specific users and quite often even had to run it from a CD-ROM burner sitting in the corner, and you're like, "Okay, how do we deal with this?" And this is what happened. So, you kind of did non-persistency for the 80% of your users who all used to have a single image on a PC, and you could take that quite nicely and turn it into VDI. But then, users started getting a bit more demanding, and all of these small apps came out of the woodwork that wouldn't fit into the master image concept. So that's where non-persistent became a thing. Because if everyone had the same identical desktop (they're all non-persistent), you don't really need to worry about it. It's just the way it happened. But then you suddenly started hearing about all the people that needed something slightly different (like installing their own apps). We basically needed to give them a persistent desktop, as this was a catch-all for all the things that wouldn't fit in non-persistent (and deploy it in the way you wanted to do it). All non-persistent use cases would be part of a certain group (as they were all the same), and anybody who didn't fall in that group would get a persistent desktop. And if that didn't work, they still gave them a PC. Oh, and if they got a Mac, we just put our fingers in our ears and denied their existence.

So yeah, persistent and non-persistent was drawn out of the fact that if it worked, it was non-persistent. People didn't moan, and you were able to tell them what they were going to have. Persistent was also for people that had a voice. If they were loud, they were important, and thus they were the ones that normally got persistent VDI. A bit like having a PC or a laptop in the early days: if you had a laptop, you were a little bit higher up the hierarchy. It's the same with VDI. If you were the lowest of the low, you got a non-persistent desktop because you couldn't do anything about it. And you know what I mean by that.

Me: I still remember the early days of the first tablet PCs. There
was a Compaq tablet (The TC1000), it was one of the first
Windows-based tablets with a pen. It worked really well, and I
actually owned one. All of our senior management wanted one
because of a status-thing. Managing those devices was a real shit
storm. They came with a specific tablet edition of Windows XP,
that was hard to deploy and hard to manage. But still, we needed
to manage them, as well. And I think it's the same thing with
persistent versus non-persistent at the end. It's all about delivering
a tailored (and persistent) user experience, instead of a tailored
desktop or virtual desktop. But honestly, that's becoming a bit
harder because every end user is expecting that persistent user
experience. Along the way, operating systems have changed, and
we needed to become MacGyver to create all sorts of tricks to
maintain that persistent user experience in a non-persistent VDI.
The user’s expectations have changed, as well. They want you to
offer something that feels the same as their PC at home. And that is
what I think makes everything a lot more complex. But it’s
also a great segue into moving away from VDI and starting to look
at the digital workspace as a whole.

Spencer: Yeah, but in those days it was different. When they first
saw VDI, a lot of people thought it would be the death of the PC,
which I didn't think would be the case because history tells us that
there's always something that avoids that. Mainframes are still
there, which says it all. So, you're never going to fully get rid of
them all. There's always going to be these use cases. But the
mindset of “It’s the Year of VDI” means we're going to get
everybody on to it. But until we do it won't be the Year of VDI,
which is why we joke about it, right? We didn't think about
whether VDI would be great for a certain number of use cases. We
were just hell-bent on trying to make sure there was no reason why
a user couldn't get into VDI. It's called fixation blindness. That's how
IT treated VDI back then. “You're going to be on VDI, you need to
have a valid reason why you're not going to be on that.” And nobody
would ever challenge IT. But then you had some failed projects
where it didn't work, right? And in the early days, as well. Don't
forget, you know, there were some use cases like VoIP, which
wasn't a big thing back then, but you did have it a little bit, which
was a bit problematic. And then you started getting into all of
these offloading things. And let's think about thin clients and zero
clients. Back in the day, you didn't have anything local, because
then there's nothing to go wrong. It's a disposable item; there's no
data there. But, you know, now we're in this kind of hybrid
chubby client, not a thin client. I call them chubby clients, where
there's a lot of offloading and there's a lot of local CPU and
resource capability you need, because not everything can run
nicely in VDI. It doesn't make sense to do that all the time.

I look at it like music. Persistent and non-persistent sparked a
whole industry for companies like the profile companies from the
Netherlands (like RES Software and Immidio). What I mean by the
music analogy: I used to be hot into hardcore rave. But at some
point, hardcore rave became old school, and then it was techno,
and then it was happy hardcore, and then you had drum and bass,
and then you had intelligent drum and bass. VDI started like this
as well, and now we have all of these different variations and
different technologies. It's a bit like music, where you start out with
old school and move to hardcore, and it becomes all these little
variations, and now it's dubstep. And again, it's a bit like that with
VDI: there are all these little, small technologies. I'm using this, with
this, with that, with that, and I don't like that because I use this bit.
This is also why we have VMware people and Citrix people at
customers.

Me: Something else: When looking back at the last 15 years, what's
the most memorable use case you have either seen or deployed on
VDI and why?

Spencer: I have two answers there. The first one is around
healthcare. I was a public sector SE, and VDI was getting used for
follow-me patient-care use cases (which I think was the biggest
one that really made VDI mainstream). I saw what a difference it
made in front of customers and how it helped patients get treated.
I like that. So, from that side of things, it's memorable. That's the
best use case, I would say, because you could help doctors treat
people better in terms of the time taken, not waiting to log on
to machines and all that stuff. There's a direct correlation between
what you give your customer and the impact it has. I like stuff like
that. I think the healthcare use case for VDI is always going to be
number one. My wife is a nurse, and I saw what she did. It really
brought it home, because she'd come home and tell me stories
about what was going on there. And then I knew how I could
help, so that always sticks in my mind.

The coolest one from a pure tech perspective is hard to choose. I
was involved with a couple of really cool ones. One of them was a
battlefield kind of use case. We did some early VDI stuff
with NVIDIA to build a proper battlefield simulator using
Heaven (the 3D engine for first-person shooters). It was possible to
use tanks and all. And I remember being involved in one for
Formula 1. We got some use cases for pit lane type stuff and
telemetry where you don't want to be shipping the whole kit.
What you do is, rather than having the workstations in the pit
lane, they sit in this mobile data center, and the apps and telemetry
get streamed from the workstations. That was kind of cool. Maybe
the best one was this company that I remember seeing at VMworld
that came up with a pop-up golf unit. The golf course was
simulated: you played golf and had to hit the ball. They had
a big TV screen at the side of the tee box, and they were using
VDI on the screen with a thin client using PCoIP. And the idea
was, you could go to sporting events and just quickly build these
golf simulators without bringing an entire datacenter. It was all
being done via VDI: a virtual desktop running this software, with
all the sensors and everything, but the latency and connectivity
had to be top-notch. They were able to get it working really well,
and it really proved that the protocols could work remotely. And
that's, from a pure geek perspective, probably my favorite.

Me: What’s really cool is that the golf use case will be covered later
on in the book, as well. Topgolf uses VDI for one of their latest
games. ☺

Let’s dive into the last topic. So, in your current role, you talk a lot
with C-level management and other important decision makers. If
IT admins or architects need to convince those people to innovate
and invest in their EUC platforms, what would be your main key
takeaway?

Spencer: I think the main point here is that this isn't just about IT
making their cost structure better. Because as I mentioned before,
in the early days, it was about saving money with this, even
though the business case of storage was a bit weird. This isn't just
about IT. This is about better user experience and flexibility, as
well. Take a look at what's happened in 2020, as well. I think they
need to bring in other stakeholders. They need the business to be on
board, they need HR to be on board, as well. We're starting to see
that all the time. And of course, they need to make sure that the
business case that they do isn't just about IT-centric cost saving, it
should include how it can help the business, as well.

Me: So finally, before thanking you for a really, really cool
interview, if people want to know more about you and the things
you do, where can they find you?

Spencer: Mainly on Twitter: @Spoinster, but I do a lot on LinkedIn
as well. And one day, hopefully in Barcelona or Las Vegas at
VMworld. ☺

THE CURRENT STATE OF VDI


This is going to be a relatively short section.

I mentioned in a previous section that no year should be the Year
of VDI. But imagine this: if the world is facing a global catastrophe
because angry aliens are using infinity stones to destroy the entire
planet, the Avengers are in some galaxy far, far away, and you are
the only one who can save the planet. And the only way you can is
to declare 2020 to be the Year of VDI. Imagine the pressure you are
under. Would you not declare it?

Ok, so very hypothetically, 2020 could be the year. It’s also not
weird that 2020 could have been the year. Besides the fact that the
technology has really matured, it also became very versatile. Both
those factors introduced new use cases to run on VDI. It still
doesn’t mean you need to build a VDI-for-All strategy, but you
certainly can.

The effect of the pandemic has even confirmed the maturity
and versatility of the VDI technology. Never before has there been
an event in which the scale of remoted sessions was so massive. It
was actually the first time I heard the words VMware and Horizon
being mentioned outside of work and in family/friends’ circles
because so many people suddenly depended on it.

Next to the increase in end users and popularity, a lot has
happened in the technology space, as well.

Microsoft is probably the main reason for remoting technologies to
exist. They invented Windows, which led to an explosion of
companies building apps for Windows. Those apps needed to be
remoted, and since Windows was never really built for remoting
technologies, companies like VMware and Citrix could thrive. Of
course, Microsoft released all of their Terminal Services and
Remote Desktop server operating systems, but that’s not the same.

A Windows operating system which a consumer runs at home was
never ever really suitable for remoting (and I’m not talking about
Windows NT Terminal Server Edition or Windows Server with
Remote Desktop Services). That is, until Microsoft saw the light
and released Windows Virtual Desktop (WVD), currently known
as Azure Virtual Desktop (AVD). AVD basically is a multi-user
version of Windows 10, supposedly built for VDI
environments. That's some great news, right?! Well, yes. But only
if you are consuming desktops from Microsoft’s own Azure cloud
platform. And just that. At the moment of the release of this book
(summer 2021), there are some beta projects running to bring AVD
to on-premises environments, but only on Azure Arc. Unfortunately,
no third-party VDI solutions will be able to use AVD services on-premises.
What I do hope is that a lot of cross-platform compatibility will be
created between consumer/enterprise-grade Windows 10 and
AVD. In this case, we (as in on-premises VDI architects) are
indirectly benefitting from AVD. Only time will tell, though.

Another seriously cool thing that happened is that Microsoft
acknowledged that the Roaming Profiles technology really was
created by the devil. They did that by acquiring FSLogix in 2018.
FSLogix was known for its solutions to decouple user settings and
data in what they call profile and application containers. This is
another example of Microsoft’s attempts to simplify the
management of virtual desktops and increase the UX of a
remoted desktop or application.
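
To make that a bit more tangible, here is a minimal sketch of what
such a profile container setup looks like on a (golden) image. The
registry key and value names are the documented FSLogix settings,
but the share path is a hypothetical example, so treat this as an
illustration rather than a production-ready script:

import winreg  # Python standard library, Windows only

FSLOGIX_KEY = r"SOFTWARE\FSLogix\Profiles"
VHD_SHARE = r"\\fileserver\fslogix$"  # hypothetical SMB share for the containers

with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, FSLOGIX_KEY, 0,
                        winreg.KEY_SET_VALUE) as key:
    # 1 = redirect the user profile into a VHD(X) profile container
    winreg.SetValueEx(key, "Enabled", 0, winreg.REG_DWORD, 1)
    # One or more locations where the per-user containers are stored
    winreg.SetValueEx(key, "VHDLocations", 0, winreg.REG_MULTI_SZ,
                      [VHD_SHARE])
    # Remove stale local profiles so non-persistent clones stay clean
    winreg.SetValueEx(key, "DeleteLocalProfileWhenVHDShouldApply", 0,
                      winreg.REG_DWORD, 1)

In practice, you would usually push these values through GPO or
your UEM tooling instead of a script, but the underlying settings
are the same.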

Of course, VMware hasn’t been sitting still, either. A lot of new
major versions of their solutions have been released and all of the
acquisitions have had a huge impact on the entire stack. You can
imagine that since Carbon Black has been acquired, it will be a lot
easier to secure non-persistent desktops. More about these
technologies later. ☺

Since the rest of the book will cover most of the current state of
VDI, let’s dive a bit into the future.

THE FUTURE OF VDI


I get tons of questions about the future of VDI. Many consider it to
be a legacy technology. Well, let me start by explaining how these
people are completely wrong.

The VDI technology itself is state of the art. If you don’t believe
me, you should take a look at modern companies like NVIDIA,
Microsoft, and Google, who have built gaming platforms that
stream games to an endpoint through a remoting protocol.
NVIDIA GeForce NOW, Microsoft xCloud, and Google Stadia use
the same set of solutions to bring a “traditional” application like a
game to an endpoint and offer the same advantages over installing
the application locally on your device:

• Device independence – Any device that has a client for the
platform is capable of running the remote game.
• Instant access – You don’t have to install anything. As
soon as you connect to your remote game, you can start
playing.
• Cheap endpoints – The NVIDIA Shield TV or a Google
Chromebook can perfectly run a game including real-time
ray tracing with a relatively simple GPU in the endpoint.
• Peripheral support – You can play games with weird,
redirected hardware like the Xbox controller or a Logitech
controller.

More about VDI and gaming in a later section of the book. ☺

So, if companies (like the ones mentioned) invest in the
technology, why wouldn’t you? The answer lies with the reason
why people call the technology legacy. We use a VDI to remotely
access traditional applications. Traditional apps are considered
legacy and thus VDI must be legacy too, right? Now, let’s agree
that VDI actually is a modern technology. The reason for using it
might be because of traditional applications, but that’s a whole
other ballgame. Independent Software Vendors (ISVs), and
specifically the ones that have existed for many years, have been
building their software for popular operating systems like
Windows and Linux, and never figured out that if such
applications need to be used in the enterprise, it might be a good
idea to help the IT admin out and simplify everything that has to
do with deployment and maintenance. On average, they don't care
if you have to deploy the app to 50 or 50,000 end users. It's not
their problem anymore: it's yours. That shows the importance of a
solution which is capable of doing exactly that, and whoever tells
you that the era of those apps will soon end is telling lies.

A well-respected person in the EUC industry is VMware’s EUC
CTO Shawn Bass. He is also one of those EUC dinosaurs I
mentioned earlier and is considered a visionary in this
space. In 2014, at a BriForum in London I believe, he made a quote
that will probably chase him for many years:

“After a nuclear war, only three things will still exist: cockroaches,
Twinkies and Windows apps. Denying that is simply foolish.”

I strongly believe he’s right. I have met many customers and EUC
enthusiasts around the world who quite often are still using
applications in their organization that are over 20 years old. Many
of these apps are mission critical (I shit you not). And apparently,
no real alternative is available from a cloud (like SaaS
applications). Quite often they aren’t even suitable for a different
deployment technology, and there you have it: another reason that
forces you into technology choices which might be really cool
(because let’s be honest, VDI is cool) but unnecessary because of
the rise of cloud-based services and Digital Workspace solutions
like VMware Workspace ONE that strongly focus on the
“modern” end user.

Now, some of you might have seen me present at either one of the
VMworlds or at a VMUG and talk about the Formula 1 simulator
that I’ve built together with a couple of my ITQ colleagues and
the help of VMware/NVIDIA. The idea of the project is to show
potential customers who have misconceptions about VDI, that it is
even capable of running the worst apps, with a great UX. You
might ask why I think an F1 game is the worst. The reason is fairly
simple. The game we tested was F1 2018. The behavior of the game
can be compared to a VDI’s worst nightmare:

• It will basically use all of the CPU cores it can find
• It will try to get the most out of every core (and thus
requires a high clock speed)
• It requires a GPU to run smoothly
• It requires weird peripherals for the best UX (think
about a steering wheel and gas/brake pedals)
• No latency is tolerated as this will directly impact the
UX (it’s like driving while being really, really drunk)

We needed to take all of these things into account and introduce a
virtualization layer, a remoting protocol, and the cheapest
endpoint we could find.

After weeks of building, tuning, tweaking, breaking, rebuilding,
swapping CPUs and GPUs, replacing endpoints, and again
rebuilding, we managed to get everything to work. It had a little
bit of latency but could be played really well. We even got my
buddy Brian Madden to play a couple of laps, and he wasn't even
able to tell he was driving inside a virtual desktop. We
traveled around the entire country showcasing the F1 sim on VDI,
with success. Some customers who still had doubts about VDI
were suddenly convinced.

This use case (which I’ll cover in the use cases section) led to
another really cool one, which shows the future potential
of VDI as well. VMware launched a new project called Project VXR
in 2019. They showcased it at VMworld, and it was one of the
coolest things at the conference that year. Project VXR focusses on
bringing Virtual Reality, Augmented Reality, and Mixed Reality to
the enterprise. Being able to manage VR devices (like an Oculus
Quest) with Workspace ONE and deliver all sorts of applications
to such a device is the main goal. An Oculus Quest is a relatively
cheap VR device, and even better: it’s a standalone headset. It
doesn’t rely on an expensive and powerful VR-capable PC, but
due to that fact it also doesn’t contain the same powerful
hardware. And this is where VDI comes into play. If you can run
VR apps that require a lot of horsepower in the datacenter (on a
GPU-powered virtual desktop) and just remote the app to the
standalone VR goggles, you have yet again another example of
how VDI can (and probably will) open up new use cases.
I won’t be diving too deep into the use case right now, as it will be
covered in a later section.

THE SOFT SIDE OF EUC


I mentioned earlier that the VDI technology is quite mature. In this
case, maturity means that it is stable, scalable, has a broad market
share, has a rich ecosystem around it, can relatively easily be
adopted, and has a certain demand. Most of those aspects are
indirectly or directly linked to technology. But end-user
computing also has a soft side – a side that completely focusses on
the end user without talking about technology. It’s the people and
process side. I think this is still the most underestimated
component of an EUC project and is also the biggest cause for
projects to fail. VMware acknowledged this by embracing a new
approach that focusses on the outcome. The outcome is basically
the result of Simon Sinek’s Why.

OUTCOME-BASED APPROACH
In the VDI Design Guide, I covered quite a bit involving business
cases. In my opinion, every business case should Start with “Why”.
Why do you want to change? Change isn’t done easily and doesn’t
happen in a day or week. Change takes time and depending on the
type of change, it will cost effort – an effort that should not be
underestimated, either.

In VMware’s outcome-based approach, which we at ITQ have
fully embraced, the methodology to change is strongly tied to a
model called the Digital Workspace Journey Model. The model
describes the journey from an IT Defined Workspace to a User Centric
Workspace with the ultimate goal of becoming a Digital Enterprise.
The following figure is taken from VMware’s outcome-based
delivery kit and shows the different outcome phases.

The way we used to manage the workspace in the last couple of
decades is defined as an IT Defined Workspace. IT is in control of
everything. They decide what the user is allowed to do and how
they do it. IT controls the user experience, as well. This has worked
really well for a long time, for multiple reasons:

• End users didn’t really know how a computer worked
that well.
• IT didn’t trust end users because of the previous reason.
• People worked from the office, during office hours, on a
physical PC, connected over a LAN to a managed
switch. Everyone, except for the senior management of
course, who used a fancy laptop to do the exact same.
• Operating systems were updated just every couple of
years.
• Applications were updated maybe once a year.
• Mobile working was just email on a phone.

You see where this is going. I think the main reason for an IT
Defined Workspace to be successful is that IT was pretty
predictable. Nothing really changed that much, outside of the
operating system upgrades or things like major hardware
replacements. One of the companies I worked for decided to
combine desktop replacements with operating system migrations
and because most of the aspects of such a migration were also
quite predictable, we kind-of automated most of it. There were
scripts to power off an old desktop that completely backed up
important data from the old machine, stored it on the network and

45 VDI Design Guide Part ||


imported it on the new machine as soon as it was auto configured
at first boot. That was also the life of IT, right. You were either
busy solving day-to-day issues from employees or preparing/busy
with migration projects.

Although the model describes the increase in user experience
when moving towards the Digital Enterprise, I don’t see it like
that. User experience depends on the expectations of both the
organization and the end user. If you work in an IT team of an
organization in which you strongly have to protect the intellectual
property of your business, user experience will look different from
that of, for instance, a university. The same holds true for
governmental organizations with a strong focus on security, like a
foreign intelligence department or the department of justice. These
will need to protect data even more so from hackers. In these
cases, it’s protecting from abuse. In other cases, IT was not only
protecting from abuse, but also misuse. Everyone who worked in
IT during the zeroes (2001-2010) knows someone who used to
write down their new password on a Post-it Note and pasted the
note to a monitor or on a laptop. Heck, I even had a sales colleague
who wrote his password with a pencil on his laptop and used an
eraser every 3 months to erase it and replace it with a new
password. These people didn’t have any intention to do harm to
the organization, they were just stupid as f*ck and the biggest
cause for companies to “leak” data. This is exactly what I mean
by misuse.

The next phase in the Digital Workspace Journey Model is to
move towards a User Centric Workspace. This is where most
organizations struggle. IT never trusted anyone. This led to a
massive inefficiency in labor and organizational processes. These
organizational processes were created around the IT Defined
Workspace and of course could be a lot simpler, but IT just
wouldn’t let them become simpler. If everyone in a company
had a laptop to work on, instead of a traditional desktop,
people could become more flexible in where they sit in an office.
They could collaborate better and even take a device home to
do some work. Laptops were a lot more expensive, and since IT
quite often was responsible for budget and decisions around the
budget, they couldn’t care less about collaboration or increased
efficiency.

What has changed is that senior leadership has acknowledged that
IT finally is a lot more than just the department of NO. IT can make a
difference. IT just has to acknowledge that, as well. This is where
the transition to a User Centric Workspace is essential. Employees
need to be able to get the most out of their efforts. IT needs to
facilitate that. Luckily for most organizations, this is what IT has
slowly been focusing on for the last couple of years. Not because
they wanted to, but mostly because they needed to. The increased
demand for mobility and flexibility caused many organizations to
invest in solutions that were capable of managing Apple and
Android devices. The increased demand for SaaS-based
applications and solutions has caused IT to focus on delivering
modern authentication methodologies and application portals.
And the continuous delivery of Windows 10 and application
updates and upgrades has caused IT to focus on smarter delivery
solutions for them. Still, you don’t want an
organization to focus on changes because they have to, you want
organizations to change because they want to. You want them to
see that they can win so much by embracing technology and fully
putting it to work. The pandemic has had a big impact on this, but
there is a major difference between three types of organizations that
have faced the pandemic:

• The ones that invested many years ago in a modern
way of management. These organizations had no worries
about whether their IT systems and IT teams would be
capable of handling the sudden mobility scale. The business
was already adapted to the User Centric Workspace. Start-ups
are great examples, but also call center companies.
• The ones that responded to the pandemic by enabling
mobility for anyone that normally worked from the office.
With a bit of work, those people were now able to work
from home. The Dutch education system is a great
example. Sure, most schools prefer having students in a
classroom. But the pandemic taught us that remote
schooling can work if both teachers and students use the
right tools.
• The ones that still thought that mobility was something for
start-ups and now still depend on people coming to the
office to work. As an alternative they offered VPN (which
no one really likes. No one. Like seriously. I’d rather quit
my job before I start working at a company that offers a
traditional VPN).

Of course, some organizations aren’t able to provide everyone with
remote possibilities, just due to the nature of the organization or the
work they are doing. Healthcare is a great example of this. Many
patients get treated in a hospital, and doctors and nurses can’t help
them from home (not just because most patients might be in the
hospital, but don’t forget about personal information storage
issues involved, such as HIPAA and FERPA restrictions, just to
name two that may apply within the US, for instance).
Manufacturing, retail, and the food industry are just as good
examples. But, even in those industries, not everyone needs to be
in the office 8 hours a day, 5 days a week.

I already mentioned that organizations need to be in a position
where they want to change. If they are in that position, nothing is
holding them back from fully embracing the User Centric Workspace
(except for the financial aspect). The User Centric Workspace has
the following characteristics:

• The workspace focusses primarily on user experience.
Putting the user first. This means the user is able to
customize their own workspace.
• Security is focused on the identity and can be conditional,
based on the context of the user.
• Applications will be aggregated into a single delivery
model, in which conditional security policies can be
introduced, as well.
• An employee can use a device that is tailored to the job
they do. Mobile employees can use a tiny laptop, while
designers can use a powerful workstation.
• Application delivery is done by automated distribution,
but also through on-demand requests.
• Employees can do a lot of things through self-service.
• They basically can work from any location.
• IT controls devices through compliance with security
policies.
• File sharing is done through tools like SharePoint and
OneDrive (as well as some proprietary file share solutions
like Workspace ONE Content) instead of the well-known
fs01 fileserver and F: drive mapping.

The importance here is to determine the actual outcome of the
change. Why do you want to change and move towards a User
Centric Workspace? I think the main answer lies within the fact
that organizations are starting to realize that their most important
asset isn’t their product or service, it’s their employees. Without
skilled and highly motivated employees, their products and
services won’t be that successful. There are always exceptions like
big chip builders or pharmaceutical companies who developed
something unique. But even for those companies, their products or
services have been developed by people. Which again comes back
to what it all should focus on: the employee. To keep that employee
happy, energized, optimized to get the most out of their workday,
and loyal to the organization, a User Centric Workspace is
essential. Graduates who leave university and enter the workforce
have already been embedded in this way of working. You
won’t even keep them in your organization without a people-
centric focus. IT is important, but Human Resources (HR) is just as
important. HR has the same impact on employee satisfaction as IT
does. IT just does it with tools and devices, whereas HR uses their
capabilities to coach people and enable them to fully develop
themselves (although I do understand that this is where some
companies fall way short: they do not support nor understand the
needs and desires of their employees, because they have in some
cases failed to adapt to changing times).
Development of employees is something the Digital Workspace
Journey Model doesn’t cover but is just as essential to achieve the
outcome you were hoping for. Remember that the ones that have
to change the most are the actual people working in IT. They are
the ones that have to cope with changes from the organization
(people and process changes), changes from the outside world
(increase of mobile devices, SaaS Apps, increased security
demands), and all changes dictated by the big IT companies like
Microsoft and VMware. This means that education has a major
impact on how changes are being implemented. With the right
knowledge and motivation, IT can become a true business partner
to the organization, just like HR has become in a lot of cases. When
that happens, organizations are ready to work towards the final
phase in the Digital Workspace Journey Model: Becoming a Digital
Enterprise.

In a Digital Enterprise, IT is being used to determine the future of
the organization. They are able to beat the competition, create
innovative business models, drive disruption in the market, etc.
The last one especially is one of my favorites. I had a really nice
conversation with a decision maker from a small municipality in
the Netherlands quite recently, let’s call him Luke and the city
Tatooine.

I met Luke at a birthday party, and because of the same love for
sneakers (I was wearing a pair of limited-edition Jordan 1s and he
did, too), we ended up having a nice conversation about the
organization he worked for. They were a small municipality in the
northern part of the Netherlands, and for many years were driving
their business through the same non-changing demand from their
customers: the citizens of Tatooine. What you need to know is that an
organization like that is quite unpopular with graduates. They
fully embraced bureaucracy, have a process for basically
everything and are commonly known for their slow way of
embracing changes. It’s mostly older people who work at such
organizations. Luke saw that many years ago and wanted to
change that image of his organization. His first challenge was to
reshape the outside image of the organization. This was about 10
years ago. People in the Netherlands need to pay a visit to city hall
for public services like requesting a new driver’s license or passport,
announcing the birth of a child, etc. Most of the tasks involved in
these processes can be automated and offered to citizens remotely.
That’s exactly what they did. They were one of the first in the
Netherlands to introduce a self-service portal in which their
customers could manage a lot of their own services and schedule
appointments. This gave Luke the ability to let his employees
focus on improving the business processes instead of mostly
repetitive and administrative work. The overall image of their
organization slowly started to change because the citizens of
Tatooine saw how they were innovating and impacting the user
experience of the customer. The municipality of Tatooine wanted
to change not only because of a lack of new personnel, but also
because they wanted to offer the best service possible to their
customers.

I was seriously impressed by the conversation and already
thought of him as a strong leader for their organization. But,
after a few drinks, he even mentioned a plan that he was working
on right now that is kind-of still not publicly known (which is the
main reason for not mentioning him and the municipality by their
actual names). The most important goal for his municipality is to
offer the best service for their citizens. Those citizens are spread
across multiple small towns and suburbs and are mostly working
in agriculture, farming, construction, food processing, etc. He
figured out that if he could fully empower his employees to work
from anywhere, they could also work from mobile offices, from
home, and even from the businesses of the citizens. His vision is to
become a business partner to the citizen. He wants to empower the
citizen and avoid them having to go over to city hall for all sorts of
administrative processes. Instead, Luke is going to transition his
employees to become account managers. A successful pilot has
been conducted with a beautiful outcome: customer satisfaction. In
the next phase, every employee has a fixed set of accounts they
manage. In case a new passport needs to be requested, the citizen
can easily request this. The account manager will handle the
request at the location of the citizen and is able to ship the passport
as soon as it’s ready. I would move over to Tatooine if it wasn’t
that far from the sea (I now live 5 minutes away from the beach).

I think this is a perfect example of an organization that wanted to
change and used the possibilities of technology to do so. They
invested in mobility solutions a while ago and have a consolidated
IT team that focusses on everything that has to do with employee
experience. They are in the absolute last phase of the Digital
Workspace Journey Model.

THE JOURNEY

The journey towards the Digital Enterprise can be both a hard one
and a long one. It all depends on the maturity level of the
organization and the will to change.

The following figure (which is taken from VMware’s outcome-
based delivery kit) shows the same Digital Workspace Journey
model, but with individual streams within the journey itself:

The idea behind the model is that based on the individual topics,
you basically determine what your maturity is within a topic. If
you are already running the majority of applications from a
Microsoft 365 portal, it would probably also mean you have fully
integrated into a public identity provider and thus will be very
mature in that part of the model. But, on the other hand, if you
don’t have a BYOD policy in place or have a hard time enforcing
and reporting security compliance, you will need some more effort to
move from that maturity level to the Digital Enterprise.

The idea here is not to treat everything as one massive project, but
to make the journey in smaller steps.

The first thing to do is to fully assess an organization on the
following topics:

• Identity and Access Management
• Application Delivery
• Desktop Delivery
• Mobility

Within each of these individual streams, there are multiple
maturity levels to achieve. At the end, the four streams converge
into two main streams.

Identity and Access Management and Application Delivery
converge, since everything will be aggregated into a single portal:
a portal that provides both authentication and all of the
resources an employee might need.

Desktop Delivery and Mobility converge into a single stream, since
we want to utilize Unified Endpoint Management to manage both
the endpoints and the delivery of possible virtual desktops.

In the end, all the streams converge into a Digital Workspace.
Please note that a Digital Workspace isn’t a product or solution,
it’s a strategy. It’s the goal of the roadmap and will enable you to
create those small steps from your current maturity level to that
end goal.

It’s important to not only focus on investing in solutions that will
bring you further into the Digital Enterprise, but it’s also just as
important to focus on the people and process part. This means that
things like second-day operations are essential to be able to
increase the maturity level. Without proper operational processes
in place, your IT environment will still be a mess. Fully integrating
with change management methodologies like ITIL will for sure
help to shape those second-day operations. Change management
by itself is underestimated and quite often seen as a process that
slows changes down. While that might be a bit true at first, in the
long run, organizations will heavily benefit from a proper change
management process as it will keep you in control over what’s
happening in the organization. It will keep changes as predictable
as possible and will enable you to avoid ad hoc changes as well.

A SMALL STEP FOR IT…
A big leap for Employee Experience…

To ensure that changes in the organization are not temporary and
feel organic, it’s important to apply them in small steps. The
smaller the step, the bigger the chance it will stick. I am quite
impatient by nature and had to learn this the hard way. A while
back, before I started to work at ITQ, I got offered a new
international role as a team manager for an IT team that was
spread across Northern Europe. I accepted the role and felt really
energized and full of ideas on how to manage the different
services and solutions we were dealing with. Now, that wasn’t the
only thing changing. One of the things that also changed was that
the team managers in all individual countries were fired because
of a personnel reduction. In such a short time span, so much had
changed for the different IT teams that they just didn’t accept the
fact that the organization was moving on. They didn’t accept me as
their new leader and so, after 6 months, my energy levels were
drained. I decided to move on to another role at the same company
in which I could do something that still was internationally facing,
but more focused on technology and less on people. I was quite
close to getting a burnout, just because I love working in a team,
but most people in the team just didn’t want to work with me. I
tried multiple things from regular meetups, having drinks outside
of work, trying to inspire them with really nice presentations,
going to awesome IT events, etc., and nothing helped. They
eventually ended up with a senior manager who was all about
management by fear (something I really despise). When I look back,
I think the company made a big mistake by changing all of this in
just 2 weeks’ time. If they had applied the changes in multiple steps,
spanning multiple months, it would probably have been accepted.

Another great example of a big change was the introduction of the
new Start Menu in Windows 8, without the possibility to revert
back to the old menu. I’m certainly not saying it was a shit show,
but yes, it was a shit show. The start menu was completely
different from the one they introduced in Windows 95 and had
basically kept the same until Windows 7. For almost 18 years, that
Start Menu stayed the same. This was up until the release of
Windows 8. The whole idea was to make the Start Menu a bit more
interactive and respond to whatever request you have. You could
not only include shortcuts, but also a search window, different
widgets and buttons to shut down, etc. Next to the changed
layout, it also suddenly filled the entire screen when popping up. I
wasn’t the only one that genuinely hated it and reverted back to
Windows 7. Some folks in the Microsoft community built a custom
plugin that was able to hack the old Start Menu back in Windows
8, so it was usable again. I guess Microsoft accepted the fact that it
sucked that badly and because of that, they released Windows 8.1
a year later (in 2013). They reintroduced the old Start Menu again
and ever since, it has stayed in the Windows operating system.

Apple does this a bit differently. With every new version of their
Mac Operating System, they introduce subtle changes so people
can slowly adapt to them. When I now touch an old OSX version
(like Lion or Snow Leopard) I miss a lot of features that I use on a
daily basis. So, again: applying changes in small steps will for sure
help!

For impatient people, it can be hard if you have to wait for changes
to happen. My wife, who works in HR, taught me a great lesson
about this.

If you change or improve 1% of the things you do per day, for a whole
year, you’ll end up thirty-seven times better by the time you’re done.
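
For the skeptics among you: the arithmetic behind that number is
simple compounding. A 1% improvement per day over a year
works out to

$$ (1 + 0.01)^{365} \approx 37.8 $$

so you indeed end up at roughly thirty-seven to thirty-eight times
your starting point.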

This goes for everything. Changes or improvements in exercising,
one’s diet, work-related stuff, you name it. The same goes for the
steps you need to take to get to the end goal of your roadmap. If
you put a dot on the horizon, take the time and seriously plan
your intermediate goals. Make all of those goals SMART (specific,
measurable, achievable, relevant, time-bound). This will increase
the likelihood that these goals will actually be achieved.
Ok, changes have an impact, especially for employees, whom we
consider to be either the IT admin or the end user. The pandemic
had a major impact on the end user. But why?

2020 can be seen as a very weird year. Although many consider it
to be a major shit show, it also had its advantages. First of all, I was
finally able to tell my friends and family what I do for a living
without them starting to look like they saw a ghost halfway
through my explanation. The world has finally seen where the true
potential of end-user computing lies.

Since the mid-90s, we (as in: IT people) have worked with
technologies to remote the end user into our company networks
and application servers. Those technologies always had a certain
stigma or prejudices regarding the User Experience, not from us,
but mainly from the businesses. Especially when talking about
such remoting technology for the masses. At my employer ITQ, we
have tried many things to convince customers and take away all
their prejudices. Running proofs of concept with their own
business-critical applications, showing the F1 simulator on VDI,
running remoted sessions from another continent to show how
higher latencies can still work -- we have tried it all. But nothing
has worked as well as the sudden work-from-home demand. For a
long time, I was wondering why that was, but I think I found an
answer.

As mentioned earlier, people don’t like to change. And with “don’t
like” I actually mean “hate”. Every person is different and will
cope differently with changes. Change has brought about a lot of
really nice things. Without change, we would still be going to a
library to learn about history. Or first rewind the tape in your VCR
for 10 minutes because your little brother forgot to do that when
he was finished watching Die Hard. Or what about taking photos
and having to bring the film to a local photo shop to find out after
a week that all of the holiday pictures were overexposed.

You see, change is good. Even if it’s not good, it eventually will be
good. It’s easy to find disadvantages of change. If you break up
with your spouse, it’s a massive change that might not seem right
at first. After a few weeks or months or maybe a year, you will
realize that also in that case: change is good. And trust me, I have
been in that exact position many years ago, and look at me now. I
found out that the very person who I would eventually marry was my
soulmate and a better match than my ex-girlfriend. And better yet,
she helped me become the best version of myself.

To conclude my lecture on change, I would like to share one of the
quotes I have framed in my office. Heraclitus of Ephesus, an
ancient Greek philosopher, strongly believed “Change is the only
Constant” and I also live by those words. If you give in to the fact
that change itself is inevitable, you might even be able to use it to
your advantage.

THE “NEW” END USER


There are many reasons why things change within your
organization. Changes can for instance be dictated by financial
decisions, by customer demand, or by technology or innovation
(or when the sole person who is the expert in a certain area
suddenly leaves).

In case of the 2020 work-from-home change, it was obviously
dictated by the pandemic. The thing that blew my mind is how
such an event caused not only organizations to change in a seriously
short time but, more importantly, the end user
changed, as well. I’m a regular listener of the
Digital Workspace Podcast
(https://techzone.vmware.com/podcast) and on
one of the episodes, Pat Gelsinger (VMware’s
former Chief Executive Officer) was the main
guest. In the episode he said the following
sentence, which I now have on the whiteboard behind me:

“Sometimes it takes a decade to make a week of progress, sometimes it
takes a week to make a decade of progress.”

There are multiple reasons why the world has changed so rapidly.
For instance, the technology itself is mature and capable of coping
with such an insane scale. IT departments have changed and adapted to
become more end-user focused. Senior management is changing to
outcome-based monitoring instead of micro-managing employees.
The term “work” has changed from being in an office from 9:00 to
17:00 for five days a week to what it is now. Still, the most
important asset in the whole change is, in my opinion, the end
user.

The end user always had a major impact on end-user computing
projects. They determine a successful outcome, mainly based on
user experience. User experience is sometimes hard to define, as
not everything can easily be measured. Metrics like a frame rate
and latency can be measured, but do they always reflect the end
user’s experience? I mean, I have worked on a project in which I
was able to reduce the logon time from a 12-minute average to 1.5
minutes. The customer was insanely happy because their
employees could be a lot more productive. If I manage to get an
average logon time of 1.5 minutes at a hospital where nurses and
doctors need to quickly sign into medical stations to help patients,
I’m quite sure they won’t be as happy.

What the pandemic has taught us is that user experience has also
been subjected to change. This is what has changed my own
mindset forever. I have been part of a lot of pre-2020 VDI projects
and the most complex part of those projects has always been
application related. The complexity is not only related to the best
delivery model or strategy for a certain application, but also to the
effect it has on the rest of the project. Imagine you have 500
applications that need to land on the new platform. You can
migrate 1 application per day, which leads to 1.5 years of work
until you have not only migrated all of the apps, but also the users.
Without all of their applications, end users can’t work, right?
That’s how we always ran our projects. We defined a list of
general apps, a list per department and a list per user. Based on
those lists, we migrated apps to the new platform and as soon as
all of the apps were migrated for the group of users, we migrated
the actual users, as well.

In early 2020, I was part of two major end-user computing projects
for multiple thousands of users. Both projects were basically in the
same phase. The architecture was designed, the infrastructure was
built and the first 25% of the applications were migrated. We, as a
team, were completely busy with the packaging process and our
migration team was adopting end users to the new platform. We
migrated a single department per week or couple of weeks. This
was totally dependent on the list of applications they were using.
Although these processes were slow because they take a lot of time
and effort, all went well. Until the end of March.

Both customers decided to let everyone work from home, even
though the projects weren’t finished and aside from the top 50
applications, a lot of applications were still not migrated yet. We
had to cope with a sudden scale and quite frankly were expecting
a large number of complaints, simply because most end users had
to work with limited tools and applications. The opposite was
true. Instead of a large number of complaints, compliments were
literally coming in. People were incredibly happy that they could
remotely access their email, the entire Microsoft Office Suite and
the most important business-critical applications. It is something
we never expected. Another thing that surprised me is that for
both customers, we scaled them for a certain concurrency and a
little bit of extra capacity, just in case. For both, we completely
oversubscribed the resources, which led to a slight performance
degradation. And still no complaints, just compliments. My mind
was blown. This made me realize that user experience had also
been subjected to change.

A year later, I’m wondering if this changed user acceptance level
could have an impact on VDI projects, as well. This could change
the way we migrate or adopt users to a new platform. If we don’t
have to wait until all applications are migrated before we migrate
all of the users, it could save us from a lot of complexity. And what
about business requirements? Organizations will start to look at
things like performance and scalability differently because they
have now seen what happens in cases like the pandemic. A risk
like this one is apparently quite realistic, and mitigation can be
done easily if the infrastructure is designed properly. Also,
what kind of measures will business leaders take to reduce the risk
of their business seriously losing money or even going bankrupt?
Certain types of organizations aren’t able to adapt so quickly to the
new normal, like healthcare organizations. But what will happen if
we have to deal with another pandemic in a few years? I think this
is going to be an interesting time.

INTERVIEW WITH ROBERT HELLINGS
For some of the topics in this book, I had a hard time finding a
suitable person who would be a perfect fit for an interview. With
this interview that wasn’t the case. The soft, human side of end-
user computing is getting more and more important, maybe even
becoming the dominant side. About four years ago, at my
employer ITQ, we wanted to focus on a new strategy in which
Customer Intimacy became very important. ITQ is part of a handful
of VMware partners who strongly focus on delivering solutions.
We started to notice the importance of the People and Process aspect
because some of the bigger projects became challenging. At one of
those projects, I met a person who eventually became our new
CEO: Robert Hellings. I met Robert for the first time when he was
an IT Executive Director at that specific customer’s company. He
was responsible for everything related to IT and was one of the
strongest leaders I ever met. The main reason for his strong
leadership wasn’t because of a vision he had, or because of his
great successes. No, it was because of his way of inspiring and
motivating people. He truly sees that people are the most
important aspect of a company, not a service or product. Without
inspired and motivated people, a great service or product will
probably not sell that well.

Because of the change in strategy at ITQ, the leadership team
decided that it would be best to find someone who is passionate
about the people and process aspect and knows his place in the IT
world, as well. That person was found in Robert. Since he joined
ITQ as CEO, a lot has happened, both for me personally as well as
for the company. With the vision, strategy, and passion of the
leadership team and the rest of the ITQ family, ITQ has expanded
to other countries, doubled in growth, introduced Managed
Services and Support Services and moved to a new office.

For me personally, Robert has been a great mentor and helped me
grow, as well. He taught me multiple valuable lessons that have
helped me to gain focus and improve my own leadership skills.

Because he was a leader at a customer with a major focus on
the people and process side, and is now an inspiring leader who
implemented a lot of changes at ITQ, there was no doubt he was the
right person for this interview.

Me: Why did you decide to jump into the IT world?

Robert: Well, I started like many others. In my teenage years, the IT
world was changing rapidly. These were the eighties, and the shift
to bringing computers into family homes was ongoing. My first
computer was an MSX and I loved to play, develop, and pioneer in
what was for me a new world full of challenges and creativity.

Me: For someone in IT, you have an extraordinary interest in
people. Why is that?

Robert: I think I’m built that way… It truly gives me energy and
joy if I’m able to help someone or to feel that as a team we can be
successful. Furthermore, I truly love to see differences in people.
For instance, during my military service, when I was 20, I
explicitly experienced the big differences between all the people in the
company I was part of. At that time, I learned a lot about how to take
people’s interests and preferences into account. In those days it
was purely intuitive; later, I learned more explicitly via
management trainings about the importance of diversity and how to
increase success and joy by focusing on the talents of every profile
and combining these different profiles in teams.

Me: What’s your link with End-User Computing?

Robert: From a technical side, I’m not an expert in End-User
Computing. But I’m fascinated from a change management point
of view. This already started at my former company, where the
challenge of rolling out EUC was not at all a technical challenge.
And the views I adopted there were reinforced when I joined ITQ.
In my view, the most important part of getting to success is based
on psychology and therefore on change management. To put it
briefly: when you roll out EUC, there is probably an IT manager
asking for the change (and paying for it as well), but it causes a change
in the workday of everyone in the whole company. And these users
didn’t ask for that change, were not part of the decision-making,
and are definitely cautious about changes in the working
environment they are used to. It is like saying, “As of today,
I will take you out of your house and put you in a new one
without asking.” And as already mentioned, if it is about people
and how to take people’s characteristics into account, then there is
a link to the way I enjoy my work life.

Me: Why did you move to ITQ?

Robert: My previous company was an inspiration for more than 25
years. There was the opportunity to enjoy all kinds of positions
and challenges. I started in IT and had the opportunity to do
this both in the Netherlands and internationally during my time at a
big financial company. After 18 years, I got the opportunity to
manage the change program of the pension business line,
including an operational responsibility. From that perspective I
learned even more about the value of IT for the business. And I
can tell you, that is really different when you experience it from the
business line itself, compared to having serviced the business from
the IT department for many years. Then I was asked to become the
director of Integrity, which was a big compliment and interesting
responsibility. And although it was really interesting and within a
company that takes the topic really seriously, sadly I misjudged the
level of formalities and the tsunami of regulations that
accompanied the Compliance responsibility that was part of the
job. I had to conclude that I was losing my joy and energy in my
work life, a main personal principle. So, I resigned. Life is short
and we should predominantly enjoy what we do. It caused to
reflect on my personal preferences, the outcome was IT, an
environment where people matter first, and passion is the way to
success. And that company I already got acquainted with during
my former job, it is ITQ.

At that same time, ITQ was growing and looking for someone to help in the next phase. I have great respect and admiration for Francisco Perez van der Oord, (co)founder and CEO at that time, for being able to reflect on what the company needed, to conclude that he should embrace a different responsibility, and to judge that the next phase required a change like that. In my view, a leader pur sang.

Me: You have been in the seat of both customer and supplier. Is applying changes different in those seats?

Robert: Yes, as a supplier you can learn, apply, and learn again constantly. Inside a customer organization, the same change normally doesn't occur that often. Another view, however, is ownership. Inside the customer organization you can really own the change. As a supplier, you are always positioned as an advisor, maybe an implementer, but never the one that will be the owner across the whole lifecycle of the change. Furthermore, inside the customer organization, you are part of the culture. That is both an advantage and a disadvantage: it is easier to read and feel the culture, but being part of that same culture also makes you less capable of thinking and acting out of the box or bringing previous cultural experiences into the same type of change…

Me: How did the past 4 years impact your view on the people & process aspect of End-User Computing?

Robert: Like I explained above, in my view it is a crucial part of success. EUC has as many owners as there are people in the customer's organization. They are all used to their own workplace, and someone is suddenly changing that. The consequence is an issue in the ownership. The stakeholder that owns the change during decision-making is IT management, but ownership shifts to the end user and related management during implementation. Then, if the acceptance is at risk, as an IT manager you will always get that signal too late. Dissatisfaction does not naturally move via direct feedback to the IT department; it normally starts via informal ways and reaches the management of the business departments first. This often means that an escalation via top management is the first feedback moment for the IT management involved. And then you have to fight your way back from a 3-0 down situation, sometimes even having discussions about why the change is needed anyway. So, I've learned that management of the change is crucial and needs an explicit approach that develops a pull for the change in the organization, in such a way that the end users and business departments will feel themselves the owner of the change. A crucial part of that is to understand, capture, and communicate the WHY of the change explicitly, to involve all business departments in such a way that they feel they are being listened to, and to include the implementation of so-called satisfiers by solving dissatisfiers of the current platform.

Me: It's hard to predict the future, but where do you see End-User Computing going, from a people and process perspective?

Robert: End-User Computing technology is advancing over time and includes more and more business-crucial enablements, for instance the automation of workflows that add value to onboarding new employees. Imagine a company that constantly has new employees to onboard and exiting employees to off-board. With the modern EUC tools, it is far easier to automate these processes and include both automatic changes in authorizations as well as functionalities that boost employee satisfaction. For example, sending flowers as part of the onboarding workflow -- simple but powerful at the same time. Or the integration of IoT with data analytics, which you can use to automate things for a business and increase customer satisfaction. Lately you yourself had a nice idea to build a test workplace that automatically adjusts the height of your desk, based on your login. A third example, well known to you, is the usage of End-User Computing power to enable the expansion of the research capacity at one of our customers, The Netherlands Cancer Institute, by simply utilizing the GPUs at night when hardly anyone is at the office. These sound like easy examples, but what happens is that End-User Computing is becoming increasingly important for business or non-IT managers within the customer organization, like the HR department. The consequence is that there will be non-IT owners in the customer organization, which can be used to increase the level of commitment during the implementation of End-User Computing changes. However, it also means we have to adapt our language to non-IT stakeholders and learn the outcome-based language instead of discussing IT changes.

Me: How do you handle changes in your own personal life?

Robert: Well, that is of course not an easy question. In my work life we analyze our projects and changes to learn and adapt over time. In my personal life, that is less explicit. However, it does depend on the level of change. If it concerns material changes, like changing your car or TV, I must admit I sometimes act impulsively. And even if I think I make good analyses upfront, I know I'm also a victim of my reptile brain, which has already decided far before my cortex is analyzing the change… But if it is about impactful or meaningful changes, then I do include the step of trying to find out the 'Why' behind the change and what objective to achieve. For instance, with the choice to start my Masters in Finance a few years ago, I first started analyzing what the real 'Why' was. Later in that process, when it came to discipline and perseverance, it helped me enormously to know the Why and the importance of sticking to the plan. The same accounts for the Corona time we are currently in. We can all react in really different ways; for me, taking time to understand the 'Why,' main values, and outcomes was the important rationale behind the changes in my personal and working life during this crisis.

Me: You taught me multiple valuable lessons, but if there is something you would like the person reading this to take away, what would it be?

Robert: I will keep it short and put it in three main principles, so it is easier to remember:

1. Ownership of change is crucial, but it is not a given fact.

It needs constant attention, and there is a difference between who the formal owner is and who the informal owners are.

If you think you can manage the change purely within the IT arena, you are mistaken. The end users in the business are crucial stakeholders, and you need an explicit approach to manage the change that they are going to experience.

2. Knowing the 'Why' behind the change is crucial to enable the right conversation and perseverance in difficult times.

Especially if your change conflicts with the interests of your stakeholders, it is easy to get positioned in the back seat, certainly if the technical rationale is all you have at that moment in time. A constant knowledge of the real 'Why' for the company, and also understanding the consequences for that real Why if the change is not implemented, will enable you far better to persuade and get commitment.

3. You can only implement an End-User Computing change successfully if you understand it is all about the end user!

In other words, have an explicit approach to reach end users and understand their concerns. Onboard end-user representation in the project and create ambassadors of the change inside the end-user departments. Have an eye for including satisfiers as part of the change and ensure direct communication.

Just to reinforce the above words: a few years ago, we were not successful at all in a big project. The technology was working, but we managed the change from a technical perspective. By the time we found out the end users considered this change unwanted, it was already far too late. We didn't get that feedback via direct communication. The resistance started as part of the informal talks at the coffee machine, then reached business management, who in turn talked to their management, and it found its way to the board of the company. No communication had yet been made to the IT project when the CIO of the company got the feedback from board level… You can imagine, this was a disruptive event for the image of this EUC project. It never got finished. Luckily, we learned our lessons, and since that time we include an explicit approach, like the one explained via the three main principles.

If you want to know more about Robert and his work as the CEO of ITQ, you can follow him on Twitter: @roberthellings or find him on LinkedIn.

THE VMWARE END-USER COMPUTING FAMILY
Where the VDI Design Guide primarily touches on the VMware Horizon Suite, this book covers some more components (from a VMware EUC perspective) which might be necessary when looking at advanced designs. VMware continues to innovate in the VDI space, but looking at the bigger picture, they are heavily investing in the entire end-user computing portfolio. The bigger picture here means the triangle between the identity, the device, and the apps.

Where VMware Horizon has a main focus on delivering any application to any type of device, it doesn't necessarily do that with the ultimate user experience in mind. Sure, you can run a remoted version of Adobe Photoshop on an iPhone, but it doesn't mean you should. There are specific versions of Adobe Photoshop available for an iPhone, so why wouldn't you use those? If you ask IT that question, you quite often get the answer that they are unable to manage that. This is where the VMware Workspace ONE suite comes into play. Being able to run any type of application on any type of device, but with the best user experience, is what's important here. Where it excels even more is when adding identity management and security to the suite. This section is dedicated to giving you a high-level overview of the different components in the VMware Workspace ONE Suite (from a VMware Horizon perspective). We will dive a bit deeper into some of these different components later in the use case sections. Just see this as a mild snorkel session on a late afternoon with tons of really cool things to see, like turtles taking a breath or an occasional dolphin swinging by (this is the dive instructor in me, who always thinks about diving when someone starts a deep dive on a certain topic).

WHAT IS VMWARE WORKSPACE ONE?

The short answer is that it's the Digital Workspace Suite from VMware, which includes a full set of solutions to manage the identity, devices, apps, user experience, and security of the end users. And it does so in a modern way. It's basically VMware's answer to Microsoft's Modern Management strategy -- a "new" approach to managing the workspace of an end user. The reason why "new" is in quotes is because it's only new to many organizations. Modern management itself isn't new. Windows 8.1 was introduced with a new set of APIs which made it possible to manage devices running that OS like you would manage a smartphone or tablet. Modern management strongly focuses on self-service, continuous delivery of apps, simplified update mechanisms, and security based on the identity and context of the end user.

There are a couple of reasons why most organizations haven't adopted this way of management yet, the first being that they have a shit ton of traditional apps, data, and infrastructure. Microsoft doesn't yet force customers to migrate to modern management. Windows 10 can still be managed in a traditional way, with a domain membership, device policies based on GPOs, and remote working through VPNs (Virtual Private Networks). It doesn't have the best user experience, but when an organization is still IT-defined, it doesn't really matter, right?

VMware saw a new opportunity in the market when they acquired AirWatch in 2014. They already had VMware Horizon and ThinApp as the main EUC solutions in their portfolio but recognized that EUC wasn't just about VDI and delivering virtualized applications. The acquisition of AirWatch was one of the most expensive acquisitions in the company's history; they spent 1.54 billion dollars to eventually become the global leader in Enterprise Mobility Management. Device-focused management has had multiple names over the past years: Mobile Device Management, Enterprise Mobility Management, Unified Endpoint Management, you name it. In my opinion, they mean the same thing (although some might disagree). What AirWatch did really well was that they were able to manage literally any type of device that you might want to use from a business perspective. They were even able to manage my old Psion 5MX I had lying around. ☺ AirWatch eventually became VMware Workspace ONE Unified Endpoint Management.

VMware also built a solution called Workspace Portal. It was one of the first versions of their Identity and Access Management solution: a portal that could be used for identity aggregation and that offered single sign-on capabilities to VMware Horizon, SaaS-based apps, and web-based apps. The Workspace Portal became VMware Identity Manager (vIDM) and is currently known as VMware Workspace ONE Access.

The two services combined offer a great solution (as Workspace ONE Standard), but that hasn't always been the case.

In one of my first VMware Workspace Suite projects (which was the name for the entire EUC Suite from VMware, back in 2015), I was part of a team of people who sold the vision of a user-centric workspace to a customer. The customer came from a seriously traditional workspace that had evolved over the years. They were one of the first to run a Microsoft Active Directory for thousands of users, and to implement an IT store where employees could request applications, get access to data, reset their password, etc. These were true pioneers who believed that innovative technologies could be used to stay ahead of the competition. The customer had a drive to become one of the first with a Digital Workspace. We spent many days building demos, business cases, and proofs of concept, talking to many people in their business, with the result of inspiring and convincing them that VMware's solution was the right one.

In the first phases of the project that followed, we were full of confidence about the solution and the outcome that we were going to realize. After finalizing the design and building the production environment, that confidence started to shrink. In the pilot, we ran just a small number of use cases and validated their success criteria. In the production environment, that number of use cases was obviously widely exceeded, which meant that we hadn't yet run their environment at a large scale and with a wider variety of use cases. The result was a serious disaster. Without going into too much detail, I can tell you this: the two solutions that would eventually form the first version of Workspace ONE were by far not ready for the enterprise. The integration between the two solutions (Identity Manager and AirWatch) was hard to build, and seriously buggy. Syncing the AD took ages and quite often just stopped, the integration between the two solutions sometimes just broke, and don't get me started on updating and upgrading them (they were both on-prem).

The one valuable lesson I learned from that project is to be somewhat more cautious before you introduce technology which isn't really proven yet.

Many years later, a lot has happened regarding the development of Workspace ONE. Not only did it mature into an enterprise-ready Digital Workspace solution, but it also got expanded with Workspace ONE Intelligence, Carbon Black, and various mobile applications (like the Boxer email app and VMware Verify for multi-factor authentication).

VMWARE WORKSPACE ONE ACCESS
Like I mentioned in the previous section, Workspace ONE Access has had different names prior to what it is now. Workspace ONE Access is a solution which VMware has fully developed themselves; it has evolved from a web-based portal for accessing different types of applications into an employee hub which offers a wide variety of services. Workspace ONE Access consists of the following main modules and features:

• Web-based portal for end users.
• Application aggregation from all sorts of sources (web-based, SaaS apps, traditional apps, virtual desktops, and even Citrix-based Apps and Desktops).
• Service integration to offer employee experience-related services.
• Identity federation through standards like SAML and OAuth.
• Integration with Multi-Factor Authentication services.
• Conditional Access Policies on main access level, but also on application level.
• Auditing and logging.

The idea is that every end user uses Workspace ONE Access as
their main entry point to the corporate resources, which can either
be internally hosted, cloud hosted, or a combination of both. The
main portal has a consistent user experience, independent of the
type of device the end user is running the portal from. A modern
browser which supports HTML5 should be sufficient. The
following figure shows the Workspace ONE Intelligent Hub from
a modern HTML5-based browser.

The above screenshot shows the web-based Workspace ONE
Intelligent Hub application.

While most functionality will work from a browser, Workspace ONE Access can also provide device-specific functionality such as biometric authentication (facial recognition or fingerprint recognition). To get access to those features, Workspace ONE Access has a native application called the Workspace ONE Intelligent Hub (see the following figure).

The above screenshot shows the native Workspace ONE Intelligent
Hub application on an Apple iPad.

The Workspace ONE Intelligent Hub is continuously being updated and upgraded with new features. Where the Hub started as just an improved version of the user side of Workspace ONE Access, it has evolved into a solution which brings all sorts of integrations and services together. The idea is to enable the end user with their most common IT and HR tasks: a simple password reset or submitting a helpdesk ticket, but since 2020 it's also possible to integrate with HR systems, so Workspace ONE can be used for the onboarding of new employees as well.

At my employer ITQ, we actually started using some of these features during the pandemic. We normally like to invite new employees over to the office and show them around, let them meet new colleagues, and, not unimportant, hand over their bag of toys ☺ We offer a COPE model (Corporate Owned, Personally Enabled) for hardware. It doesn't really matter what type of hardware you choose, and on your first day you would normally get an ITQ-branded backpack with the new gear. You would then start the enrollment process in Workspace ONE to get access to corporate resources. Since the pandemic, we needed to adapt to the situation. After someone signs an offer letter, the credentials for Workspace ONE are now provided right away. After signing in for the first time, they are presented with both the HR system and a portal from one of the biggest computer retailers in the Benelux region, in which they can just order whatever brand of laptop they like, and it will automatically be shipped to their home address. The Hub portal also gives access to resources such as enrollment procedures and other welcoming information. After they receive the new device, enrollment can be started and access to corporate and secure resources is provided. Sure, it works really well, but it doesn't fill the gap of meeting colleagues in real life. I think the good thing here is that IT isn't involved in any step: HR is fully in control of the onboarding, in which the Workspace ONE Intelligent Hub facilitates the whole process. If you'd like to see a demo of this, check out the following link:

https://www.youtube.com/watch?v=MReIFlS8z00

The Workspace ONE Intelligent Hub can provide integration with
major HR and helpdesk solutions like ServiceNow, DocuSign,
Workday, and many more.

Workspace ONE Access can currently be consumed as a cloud-based service as well as an on-premises appliance. Which one you should choose is covered in the considerations section on the next few pages.

Design Considerations

There are many considerations to take into account when designing Workspace ONE Access, but they are mostly focused on the different features of the service. As this book is mostly focused on the Horizon side of things, I won't cover the features individually, but instead focus on the ones that could impact the Horizon integration, both from an architecture perspective as well as from a solution perspective.

Deployment Considerations

The first consideration is regarding the deployment: are you going for cloud-based or on-prem? I have been part of projects with both cloud-based and on-prem Workspace ONE Access, and I have a strong preference. If no specific requirement dictates why you need to go for an on-prem deployment, stay away from it.

My main considerations are:

• Workspace ONE Access on-prem requires a lot more design work on things like high availability, sizing and scaling, database locations, etc.
• The Workspace ONE Access cloud service is updated frequently, and new features are sometimes added on a weekly basis. On-prem has a different update cadence and sometimes doesn't get new features, or gets them much later.
• You need to run updates and upgrades on the on-prem appliances yourself, which can be a time-consuming exercise.
• Although no one will really confirm it, I cannot imagine that the on-prem version of Workspace ONE will be available for that much longer. For VMware, it's quite intensive to maintain the on-prem branch, which might be another reason to move away from it. And since they are moving towards subscription-based licensing for basically everything, would it still make sense to pay for a subscription while not fully consuming it?

Are there any downsides to the cloud version? To be honest, the only feature I miss in the cloud version is the ability to use a custom URL for your Workspace ONE Access instance. VMware offers different types of instances (shared or dedicated), and even a dedicated instance doesn't support a custom URL. Is it a dealbreaker? I don't think so. If you use Workspace ONE Access in conjunction with Workspace ONE Unified Endpoint Management (Workspace ONE UEM), you probably won't even use the URL that much, because the enrollment process can use an auto-discovery mechanism.

One thing that you need to keep in mind is that the cloud version is hosted. In case the service becomes unavailable, users won't be able to authenticate with the platform, and thus won't be able to access applications.

Connector Considerations

In case you need to connect Workspace ONE Access to a VDI platform based on VMware Horizon 7, 8, or 2103+, or on Citrix, you will need to use a connector. The connector will enable you to provision your desktop and application pools and will synchronize your entitlements as well. There are a couple of considerations regarding the connector.

• In case your requirements dictate an availability that avoids single points of failure, you need to deploy the connector in an HA pair. Check the Workspace ONE reference architecture for more information:

https://techzone.vmware.com/resource/workspace-one-access-architecture#workspace-one-access-connector

• I'm not really sure why, but in case you need VMware Horizon integration with Workspace ONE Access, you can't use the modern connector software which is on the same version as Workspace ONE Access. As of this writing, VMware Horizon 2103 and Citrix Virtual Apps and Desktops support the legacy connector only.
• In case you are using the legacy connector for VMware Horizon, it automatically means that you will use the same connector for the on-premises AD sync as well.
• Workspace ONE Access has the ability to provision ThinApped applications directly to a Windows-based endpoint. To sync ThinApp applications and entitlements, you need a specific connector (version 3.3).

Sizing

Another consideration is related to sizing. I mentioned the dedicated and shared environments earlier. To become eligible for a dedicated environment, you need a minimum number of users. That number has been 3000 users for a while, but it might differ because of scalability improvements. Always contact your VMware sales specialist for your options. From a performance or SLA perspective, I don't think there are many differences. The only main difference I can recall is the ability to schedule your own maintenance windows for upgrades. For information about the sizing of the connectors, please check the following page:

https://docs.vmware.com/en/VMware-Workspace-ONE-Access/19.03/identitymanager-connector-win/GUID-A401F9EA-0BD5-42E3-BF62-41F278724C85.html

Identity Federation

One of the first things to configure after the initial setup is the Identity Provider (IDP). Where will your users be primarily authenticated? There are a great number of different IDPs which are fully supported. Which one to use depends on your situation. It's even possible to use multiple IDPs in the same Workspace ONE Access instance, if needed. The following list shows some of the IDPs which are supported:

• Azure AD
• ADFS
• Ping
• Okta
• Google Workspace

There are some considerations regarding the different IDPs:

• Not every IDP works in the same way. Some use SAML, some use other protocols. Check the documentation on the IDP to know how you can federate identities to Workspace ONE Access.
• If you'd like to federate an external IDP to access internal Horizon resources, be sure to include True SSO in your Horizon deployment (otherwise you need to enter your password again when accessing Horizon resources).

Conditional Access

One of the coolest features of Workspace ONE Access is the ability to build a conditional access policy based on devices, device posture, identity, user context, operating systems, location, etc. There is a wide variety of contexts which can be used to create a policy that helps you define an enterprise security model, but with a consumer-simple experience.

For more information about the design and creation of those rules,
I’d highly recommend checking the following video:

https://youtu.be/h5XIu5K6Pes
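
To make the moving parts concrete, here is a purely conceptual sketch in Python of how such a policy chain evaluates an access request. The rule names, attributes, and evaluation order are my own illustration and not the actual Workspace ONE Access policy schema (which you configure through the admin console), but the logic is the same: the first matching rule decides the authentication flow.

    # Conceptual model of a conditional access rule chain. Illustration
    # only -- not the real Workspace ONE Access policy schema.

    def evaluate_access(request):
        """Return the authentication flow for a sign-on request.

        'request' carries the context gathered at sign-on time.
        """
        # Block non-compliant (e.g. jailbroken) devices outright.
        if not request.get("compliant", False):
            return "DENY"
        # Trusted, managed devices on the internal network get SSO.
        if request.get("network") == "internal":
            return "SSO"
        # External access to sensitive apps requires a second factor.
        if request.get("app_sensitivity") == "high":
            return "PASSWORD + MFA"
        # Everything else: password only.
        return "PASSWORD"

    print(evaluate_access({"network": "external", "compliant": True,
                           "app_sensitivity": "high"}))
    # -> PASSWORD + MFA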

VMWARE WORKSPACE ONE UNIFIED ENDPOINT MANAGEMENT
When designing VDI solutions, does it really matter what endpoint is used and how you manage that endpoint? Absolutely! In the VDI Design Guide, I covered that topic quite thoroughly, but not from a Workspace ONE perspective. As mentioned in the introduction of this section, VMware acquired AirWatch, which opened up the ability to extend VMware's focus within the EUC space. Back then, AirWatch was the undisputed leader in mobile device management, mobile application management, mobile content management, etc. Their solution covered all aspects of mobility management, with a major focus on doing all of the previously named types of management for a broad set of device and OS types. AirWatch always claimed to offer day 1 support for new versions of operating systems. A new version of iOS got released? No worries, AirWatch offered same-day support. I don't know if that form of support was unique in the world of mobile device management, but I do know it helped AirWatch become the leader in this space.

After the acquisition, nothing really changed for a long time. For a
couple of years, AirWatch just remained AirWatch. In 2016, the
world saw the first integration possibilities between AirWatch and
VMware Identity Manager. The transition to VMware Workspace
ONE Unified Endpoint Management took two more years and was
finalized in 2018.

VMware Workspace ONE UEM's main focus is on the management of your endpoints: smartphones, tablets, Macs, Windows 10 machines, Chromebooks, VR goggles, Apple TVs, you name it. UEM enables you to enroll a device into the solution, which enables remote management. After enrollment, it is possible to configure the device through so-called policies, and to check for compliance with those policies. What makes the integration between UEM and Access so cool is that you can use the device status or compliance status as a conditional access method in Access. Is an iPhone jailbroken? You can automatically block access to the Digital Workspace. Does a Windows 10 laptop miss certain security updates? You can enforce the laptop to update first before you provide access to certain applications. All automatically, including automatic reporting of such incidents happening.
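
Everything UEM knows about a device is also exposed through its REST API, so you can pull that compliance state yourself, for reporting or custom automation. The following is a minimal sketch, assuming an API host, an API key passed in the aw-tenant-code header, and basic authentication; the host, credentials, and serial number are placeholders, and you should verify the exact endpoint and response fields against the UEM API documentation for your version.

    # Minimal sketch: look up a device's compliance status through the
    # Workspace ONE UEM REST API. All values below are placeholders.
    import requests

    UEM_HOST = "https://asXXXX.awmdm.com"  # your UEM API server
    API_KEY = "your-aw-tenant-code"          # REST API key from the UEM console
    AUTH = ("api-admin", "api-password")     # basic auth service account

    def get_device_compliance(serial_number):
        """Return the compliance status string for a device, by serial number."""
        response = requests.get(
            f"{UEM_HOST}/api/mdm/devices",
            params={"searchby": "Serialnumber", "id": serial_number},
            headers={"aw-tenant-code": API_KEY, "Accept": "application/json"},
            auth=AUTH,
        )
        response.raise_for_status()
        return response.json().get("ComplianceStatus")

    print(get_device_compliance("C02ABCDEFG"))  # e.g. 'Compliant' or 'NonCompliant'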

UEM fully supports company-owned devices as well as personally owned ones. Company-owned devices can be fully managed, whereas personally owned devices can be fully managed, partly managed, or just run a couple of managed applications for things like email or secure browsing.

There are several ways to enroll a device in UEM, and what's important to know is that enrollment is basically built so end users are able to run the process themselves. Where a domain join requires admin credentials to process and is only available for certain devices, enrollment doesn't, and it can run on the majority of popular endpoints. Even Linux enrollment is currently available. Enrollment is built in such a way that a single admin is able to manage many more devices compared to managing traditional domain-managed devices. Though the process can be offered through self-service, it doesn't mean that it needs to be. Automatic enrollment is possible, even at the factory where devices are built. Dell and HPE offer factory provisioning options, so that devices are already managed as soon as an employee unboxes the device. At the other end of the enrollment spectrum, you can invite end users through an email, or instruct them to run the enrollment process from the self-service portal inside the Intelligent Hub.

After enrollment, devices can be configured. What can be configured depends on the type of device and what the manufacturer has enabled for device management solutions to manipulate. Windows 8.1 introduced those mobile device management policies for the first time in a Windows OS, and with every new version of Windows 10, more features are being added. At this moment, any setting you can manage through GPOs can also be managed through MDM policies. Apple has done the same with macOS, iOS, iPadOS, tvOS, etc., as has Google with both Android and ChromeOS. The policies can be focused on configuration or security, just like traditional GPOs. The best thing about UEM is that you can manage all of the different types of devices in the same exact way. A password policy or PIN code policy is set in the exact same way for macOS, Windows 10, Android, iOS, etc. Because of that, you are able to introduce a single endpoint management strategy for all of your endpoints. Something which is worth mentioning is that companies like IGEL, who offer thin clients with a management solution, are also integrating their solution with Workspace ONE UEM (which they announced during VMworld 2020). Although I'm unable to talk about it, I know for a fact that more thin client, laptop, and barebone companies are looking to integrate with Workspace ONE UEM, which shows that VMware is on the right track here.

Next to device management, UEM also offers an application management solution. It's perfectly possible to deploy any kind of native application to an endpoint. It could be deployed from a public app store or installed from your own repository, and it can be installed automatically after enrollment or on-demand through the Workspace ONE Intelligent Hub. For Windows 10 and macOS applications, it's quite common to install the application with some command-line switches, so the installation runs fully unattended. In some cases, even iOS and Android applications can be installed using certain configuration settings which will be injected during the installation. This ensures that the user is able to use the application without any manual configuration, immediately after the deployment. Another example of offering a great user experience to the business.
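
If you want to validate such an unattended install command before publishing it in UEM, you can simply run it locally the way the Hub would. The snippet below is a hedged example for a generic MSI package: the package path and the property are placeholders, and non-MSI installers ship their own vendor-specific silent switches.

    # Test a silent install command locally (on Windows) before handing
    # it to UEM. The package path and property below are placeholders.
    import subprocess

    result = subprocess.run(
        [
            "msiexec", "/i", r"C:\packages\app.msi",
            "/qn",            # quiet: no UI at all
            "/norestart",     # suppress automatic reboots
            "ALLUSERS=1",     # per-machine install (standard MSI property)
        ],
        capture_output=True,
        text=True,
    )
    # Exit code 0 means success; 3010 means success with a reboot pending.
    print("msiexec exit code:", result.returncode)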

Modern Management & VDI

Is there any link with VDI? Actually, there is! For a long time, we have been creating as many non-persistent desktops as possible. It can save time, reduce complexity, and possibly increase the uptime of the VDI. Now, non-persistency (especially when you look back at the interview with Spencer Pitts) was primarily built to decrease the footprint of the virtual desktops on the infrastructure and also to decrease the Total Cost of Ownership (TCO). And, quite honestly, we didn't really have an alternative which could offer the same benefits from a business perspective (TCO). Windows 10 and the modern management APIs were the first signals that we could manage virtual desktops in a different way and still lower the administrative overhead. The idea is simple: after provisioning a new virtual desktop, you automatically enroll the VM in UEM. After the first logon, applications and (security) settings will be loaded and users are able to work, just like you would enroll and configure a physical machine. The big advantage here is that you can, yet again, extend your UEM strategy to another use case, such as VDI. And it doesn't really matter where the desktop resides. It could be an on-prem Horizon virtual desktop, or even running on Azure through Horizon Cloud:

https://docs.vmware.com/en/VMware-Horizon-Cloud-Service/services/hzncloudmsazure.admin15/GUID-E4675019-DA01-4173-99EE-DC6E51CCCBC9.html

When looking at the overall solution, there are some advantages:

• You can use the same application delivery methodology for physical and virtual devices.
• You won't be needing App Volumes, which saves you from migrating your applications to non-persistent ones.
• As you won't be using non-persistent desktops, you will be saving your profile locally on the virtual desktop. This saves you from creating DEM profiles.
• Microsoft Outlook and OneDrive data won't need to be redirected to a container, so you won't be needing FSLogix or Writable Volumes.

There are also some specific design considerations to take into account:

• Users will get a dedicated virtual desktop, which means that your sizing will most likely look a bit different (especially from a storage perspective).
• Although Instant Clones and Linked Clones (now deprecated) have a tiny footprint on storage platforms, full clones nowadays have a much smaller footprint than 10 years ago, because of how well compression and deduplication work. If you go this route, look at the storage platform for its deduplication and compression capabilities.
• One of the advantages of non-persistent clones is the ability to offer virtual desktop services in multiple datacenters and load balance them (think Cloud Pod Architecture). As just the DEM profile data needs to be replicated across multiple sites, it's relatively easy to make the virtual desktops highly available. With persistent clones, that's a bit more difficult. It won't be possible to offer an active/active multi-datacenter VDI with persistent clones. Active/passive is possible, however, and even with relatively low RTO/RPO times. Check the following TechZone article for more information:

https://techzone.vmware.com/resource/horizon-active-passive-service-using-stretched-vsan-cluster

The future of managing virtual desktops

With all the stuff that's going on right now with UEM, my guess is that UEM will also become the primary solution to manage virtual desktops. At this moment, you aren't able to enroll a non-persistent desktop into UEM, but my estimate is that this will come in the coming years. And if you think about it, it totally makes sense for a number of reasons:

• I'm guessing that Microsoft will eventually deprecate traditional domains, which means we need an alternative way to include a desktop in an enterprise management environment.
• VMware has built QuickPrep and ClonePrep before as technologies to add a linked clone or instant clone to the Microsoft domain. Back then, VMware fully depended on Microsoft to accept this way of adding a desktop to a domain. In case a non-persistent desktop needs to be enrolled, VMware owns everything required to develop something like ClonePrep for enrollment. Also, enrollment for shared devices is already available, so it's not like we are talking about a whole new use case here.
• In case a desktop is enrolled, you would ideally use the same way of delivering applications to a non-persistent desktop as you would with a persistent or physical one. Remember Project A2 (A squared)? During VMworld 2015, Noah Wasmer and Sanjay Poonen announced a new project which would bring AirWatch and App Volumes together. Through the management interface of AirWatch you could automatically entitle users to App Volumes AppStacks, but the best thing was that this worked for physical devices. So, after enrollment, the AppStacks you were entitled to were automatically downloaded and mounted locally on your device. The way it worked was through App Volumes 3 VHD-based AppStacks. Long story short, while this announcement got a standing ovation after the demo at the keynote, it never saw daylight. Rumor has it that the Project A2 demo was part real, part sci-fi. The biggest challenge was that downloading AppStacks could take a while, which might lead to UX issues: a nice idea, but not really viable. But the integration between UEM and App Volumes for virtual desktop management shouldn't be too hard to build.
• Where's the business value in all of this? Well, the fewer management interfaces an admin needs to work with, the lower the administrative overhead will be. Plus, I have designed a single endpoint management strategy a couple of times, for different customers. Extending that strategy towards VDI would be a big deal and, from a VMware perspective, also a major competitive edge.

Before we go back to UEM, I'd like to share a video I made with Spencer Pitts about managing persistent VDI with UEM. You can find it here:

https://www.youtube.com/watch?v=SgxEa4Wc87o

Is UEM the holy grail?

Is it all great? Unfortunately, Workspace ONE UEM does have some downsides to it. First of all, and I'm not joking, there are thousands of settings you can manipulate to configure the beast. Thousands. And quite honestly, you probably won't touch most of them. Now, a lot has happened in the last year or so, because VMware now offers a kind of quick start, which guides you through the important knobs and buttons. This helps you streamline the configuration and avoid navigating through the entire configuration menu.

Design Considerations

There isn't that much to design up front that will heavily impact your UEM deployment. There are a couple of things to take into account, though:

• Although it's primarily built for cloud consumption, UEM is still available for on-prem customers who have a specific requirement to build and manage their own UEM instance (such as specific federal/defense organizations). Like with Workspace ONE Access, I would highly recommend staying away from the on-prem deployment. Where Access has a couple of nodes and a database which need to be maintained and regularly upgraded, UEM has far more components and services, which probably need to be built redundantly because of HA. Trust me on one thing: upgrading those components is for sure something you want to stay away from. It's time-consuming, error-prone, and always stays behind the cloud version in terms of feature set.

• UEM will highly depend on an identity provider, so the enrollment will be tied to a certain user. Of course, the integration with Workspace ONE Access as IDP will enable most of the features (full functionality of the Intelligent Hub, Mobile SSO for iOS and Android, full device compliance support, etc.), but it's also possible to use others like Okta and Azure AD. You might lose some features, but it will all depend on your requirements.
• Like with Workspace ONE Access, sizing will dictate a shared or dedicated environment. Contact your VMware sales specialist to know more about these options.
• Not all features are fully available in every version. Extended Windows 10 management requires Workspace ONE Advanced Edition, for instance. Take a look at the feature comparison to know what version might suit you:

https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/products/workspace-one/workspace-one-editions-comparison.pdf

• Integrating UEM with your local AD will require a connector. It used to be a separate connector, but since the convergence of Workspace ONE Access and UEM, you can now use the same connector if needed.

• UEM can integrate with your local infrastructure for tons of things: content management, certificate services, email management, (per-app) VPN tunnels, etc. Some of the integrations are available after installing the connector, but for some of them, the Unified Access Gateway is required. For more on the integrations, check out the following VMware article:

https://docs.vmware.com/en/VMware-Workspace-ONE-UEM/2102/UEM_Recommended_Architecture/GUID-CDC49AFA-98EF-4FBD-897D-A561FACB9915.html#vmware-tunnel-and-unified-access-gateway-tunnel-9

• UEM has some new tools and features which aren't generally available at the time of this writing. One of them is Freestyle Orchestrator: a low-code orchestration platform which enables you to create workflows for tasks like application deployment, image building, and applying configuration settings. More about Freestyle Orchestrator can be found here:

https://docs.vmware.com/en/VMware-Workspace-ONE-UEM/services/GUID-AWT-FREESTYLE-ORCHESTRATOR/GUID-freestyle-orchestrator.html

Like all the other Workspace ONE solutions, UEM has a
comprehensive reference architecture, which is fully documented
on Tech Zone. Take a look at the following site for more
information:

https://techzone.vmware.com/resource/workspace-one-uem-architecture

VMWARE WORKSPACE ONE INTELLIGENCE
I originally wanted to end the End-User Computing Family section with Workspace ONE Intelligence, simply because it's the glue that sticks everything together. As you may know or have read, most of the EUC solutions from VMware were acquired. That's no problem, but it does come with some challenges. The most important one is the integration. Integrating acquired solutions might pose a challenge if the solution doesn't include a strong API (or any API for that matter). That challenge didn't really exist when VMware acquired Apteligent in 2017. The Workspace ONE portfolio with AirWatch and Identity Manager was good for most of the use cases, but it lacked analytics and automation. This is why Apteligent was brought in as the first step in building Workspace ONE Intelligence. Apteligent was already built with a strong API because of the nature of the solution. It needed to integrate easily to gather data and be able to run analytics on it for a variety of use cases such as mobile performance management, business insights, and application behavior analytics. The information gathered with the original engine from Apteligent is great, but where the magic happens is when you use Workspace ONE Intelligence for its automation engine. The most commonly presented use case is the one where an organization uses Workspace ONE Intelligence to check the battery capacity of laptops and determine if a battery is still healthy. In case it drops below 70% of the original capacity, a trigger is automatically sent to ServiceNow to create a ticket to ship a new battery. In this case, the battery can be replaced before the end user might even notice it. This form of proactive support is a great way to increase employee experience and position IT as a business partner.
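
Under the hood, an automation like that boils down to a condition on telemetry plus a REST call to the ticketing system. As a hedged illustration, the sketch below expresses that same logic in plain Python against ServiceNow's standard Table API; in production, Workspace ONE Intelligence fires this through its built-in ServiceNow connector, and the instance URL, credentials, threshold, and device data shown are placeholders.

    # Illustration of the battery automation: when reported capacity drops
    # below the threshold, raise a ServiceNow incident via the Table API.
    import requests

    SNOW_INSTANCE = "https://yourinstance.service-now.com"  # placeholder
    SNOW_AUTH = ("integration.user", "password")             # placeholder
    THRESHOLD = 0.70  # 70% of the original design capacity

    def check_battery(device):
        """Create an incident when battery health falls below the threshold."""
        health = device["full_charge_capacity"] / device["design_capacity"]
        if health >= THRESHOLD:
            return None  # battery still healthy, nothing to do
        response = requests.post(
            f"{SNOW_INSTANCE}/api/now/table/incident",
            auth=SNOW_AUTH,
            json={
                "short_description": f"Battery replacement for {device['name']}",
                "description": f"Battery health at {health:.0%}, "
                               f"below the {THRESHOLD:.0%} threshold.",
                "category": "hardware",
            },
        )
        response.raise_for_status()
        return response.json()["result"]["number"]  # e.g. 'INC0012345'

    print(check_battery({"name": "LAPTOP-042",
                         "design_capacity": 57000,         # mWh
                         "full_charge_capacity": 38500}))  # ~68% health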

In 2018, VMware took Workspace ONE Intelligence to the next level by strongly focusing on the security side of things. This is the area where the solution currently excels. In 2018, VMware announced the acquisition of E8 Security, which offered a behavior analytics engine. E8 introduced the ability to use the context of the end user and perform all sorts of analytics on it to determine if behavior is normal or deviates. In case of a deviation, proper security measures can be taken to control access to resources, or mitigation actions can be taken to, for instance, patch an endpoint. Somewhere around the acquisition, VMware also launched their so-called Trust Network, a partnership with specific security vendors who offer tight integration with Workspace ONE Intelligence. With the integration, security vendors like Lookout and Carbon Black (who eventually got acquired, but more on that later) offer their value-added services to the Trust Network. The data that is gathered from users, endpoints, and applications is analyzed in real time by these vendors, where they use their services for risk analytics. I honestly think this is one of the best features in the entire Workspace ONE Suite. The following security vendors are currently part of the Trust Network:

• Zimperium Mobile Threat Defense
• Pradeo Security Mobile Threat Defense
• Netskope Security Cloud
• Lookout Mobile Endpoint Security
• Wandera Unified Cloud Security
• Zscaler Private Access (ZPA)
• Carbon Black Predictive Security Cloud

More partners will be added in the future. If you'd like to check the latest list of partners, take a look at the VMware Marketplace:

https://marketplace.cloud.vmware.com/

You can filter for the security solutions that are part of the Trust Network.

Digital Employee Experience Management

Important in any end-user computing solution is the ability to monitor your user experience. I covered that topic quite thoroughly in the VDI Design Guide, and what I mentioned there is that VMware has a couple of tools in place to monitor the end-user part. One of them is the Horizon Performance Tracker (which can be installed during the Horizon Agent installation). A great tool, but not really scalable and mainly focused on the connection protocol. In 2020, VMware introduced Digital Employee Experience Management (DEEM). DEEM is powered by Workspace ONE Intelligence and enables you to see what's happening on the end-user side. DEEM collects a bunch of metrics and translates them into Experience Scores. The following Experience Scores and telemetry data are currently available:

• Organization Experience Score
• User Experience Score
• Desktop Apps Experience Score
• Mobile Apps Experience Score (in case the app uses the Intelligence SDK)
• Device Health
• Application Performance and Stability
• OS Crashes
• Login and Logout
• Boot and Shutdown events and duration
• Windows Services Status
• Windows Performance Monitor Data

DEEM was originally created for physical Windows 10 endpoints that are fully managed by Workspace ONE UEM. Endpoints you want to monitor and manage through DEEM still have to be Windows 10-based and also have to be managed through UEM, but they don't have to be physical endpoints. I played around with DEEM in my lab on a bunch of virtual desktops, and I was really amazed by the simplicity and the remediation options.

One of the cool things about DEEM is that it doesn't just focus on monitoring; it also enables you to automatically remediate certain issues like application crashes (based on the automation engine in Workspace ONE Intelligence). I think this is a very welcome feature in the EUC portfolio, although from a VDI perspective it does lack some telemetry data.

DEEM Considerations

The current version (mid 2021) of DEEM is really awesome, but it is primarily built for physical endpoints. It can work for virtual desktops as well, but it comes with some limitations.

• As DEEM primarily focuses on physical endpoints, telemetry data such as connection protocol information is not (yet) available. My guess is that VMware will introduce these metrics, as they also recommend using UEM and Intelligence to manage virtual desktops running in VMware Horizon Cloud on Azure.
• DEEM requires virtual desktops to be enrolled in UEM and fully managed by IT. This automatically means that non-persistent desktops are not supported.

• At this moment Windows 10 is the only supported
endpoint OS for the entire scope of DEEM telemetry.
• Automated remediation of application crashes doesn’t
work out of the box. Automation workflows need to be
created to enable this feature.

If you'd like to know more about DEEM, check out this article:

https://docs.vmware.com/en/VMware-Workspace-ONE/services/intelligence-documentation/GUID-19_intel_deem.html

More about setting up automation workflows can be found here:

https://docs.vmware.com/en/VMware-Workspace-ONE/services/intelligence-documentation/GUID-21_intel_automations.html

Workspace ONE Intelligence Design Considerations

• Workspace ONE Intelligence is relatively straightforward when talking about designing it. The solution can only be consumed as a cloud service (SaaS), so there is nothing to design there. Something which does need to be designed is the connector for on-premises collection. Every site will require a collector, preferably a pair (from an HA perspective). More about the connectors for on-premises collection can be found here:

https://docs.vmware.com/en/VMware-Workspace-ONE/services/intelligence-documentation/GUID-06_intel_install_connector.html

• Workspace ONE Intelligence doesn't require a separate agent. All you need on the virtual (or physical) desktop is the Intelligent Hub application. You can automatically install it in your base image and ensure enrollment is also automated. Check out the following link to see how you can automate the enrollment process during the cloning process (a hedged sketch of such an enrollment command follows this list):

https://techzone.vmware.com/onboarding-windows-10-using-command-line-enrollment-workspace-one-operational-tutorial#_273303

• As for all other major Workspace ONE solutions, VMware has got you covered with a reference architecture on Tech Zone. Take a look at the following site for more information:

https://techzone.vmware.com/resource/workspace-one-intelligence-architecture
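
As promised above, here is a hedged sketch of what such a command-line enrollment can look like during the cloning process, wrapped in Python so it could run as a post-clone script. The installer switches follow the Intelligent Hub's documented silent-enrollment parameters, but verify them against the tutorial linked above and your Hub version; the server, organization group, and staging account are placeholders.

    # Hedged sketch: silently install the Intelligent Hub on a freshly
    # cloned Windows 10 desktop and enroll it into UEM in one go.
    # All values below are placeholders for your own environment.
    import subprocess

    enroll_cmd = [
        "msiexec", "/i", r"C:\provisioning\AirwatchAgent.msi",
        "/quiet",
        "ENROLL=Y",                  # start enrollment right after install
        "SERVER=dsXXXX.awmdm.com",   # UEM device services server
        "LGNAME=vdi-desktops",       # organization group ID
        "USERNAME=staging-account",  # staging user for enrollment
        "PASSWORD=staging-password",
    ]
    completed = subprocess.run(enroll_cmd, capture_output=True, text=True)
    print("enrollment exit code:", completed.returncode)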

VMWARE CARBON BLACK


Security is a very hot topic right now. On a near-daily basis, I see news reports about cyberattacks on organizations. It's almost like the ordinary bank robbery no longer exists. Criminals, or cybercriminals as they are now, don't come in all guns blazing, but attack you remotely through the internet, which we have all embraced and which has become a necessity in life. From both a personal and a business perspective, we need to mitigate those possible security risks and avoid being hacked or, more realistically, becoming a victim of things like ransomware.

Back in 2019, VMware announced the acquisition of a security solution called Carbon Black. Carbon Black and VMware were already working very closely on deep integrations with AppDefense, and later with Workspace ONE. VMware loved their vision of redoing and rethinking endpoint security, so acquiring Carbon Black was a logical move (although quite honestly, it was still a big surprise when the acquisition was announced).

But what is Carbon Black?

Carbon Black is a cloud-based security solution, powered by
algorithms and machine learning. Within Carbon Black, there are a
variety of modules available like Endpoint Standard, Endpoint
Detection and Response, Audit and Remediation, and Workload
Protection.

Endpoint Standard is what we call a Next-Gen Antivirus solution. Next-Gen Antivirus, or NGAV, takes antivirus services to a new level of endpoint protection. Instead of scanning devices for known file-based malware signatures and heuristics, Carbon Black (like other NGAV solutions) uses predictive analytics driven by artificial intelligence and combines this with threat intelligence to:

• Detect and prevent malware and fileless non-malware attacks.
• Identify malicious behavior and TTPs (tactics, techniques, and procedures) from unknown sources.
• Collect and analyze comprehensive endpoint data to determine root causes.
• Respond to new and emerging threats that previously went undetected.

In addition to the next-gen antivirus solution, Carbon Black also offers Endpoint Detection and Response (EDR). EDR enables you to act during a cyberattack to, for example, place devices in quarantine and set up a live connection to the device for investigation, and I personally like this a lot. Because if I learned one thing in the last year, it's that the security of our endpoints and users is more important than ever. As the large remote workforce is here to stay, more and more devices will be used to access the corporate network through the internet. As we connect from any device, anywhere, our "new normal" has become one of the largest threats to the corporate environment.
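
That quarantine action is scriptable, too. As a hedged illustration, the Carbon Black Cloud REST API exposes device actions such as quarantine; the sketch below assumes an org key and an API token with device permissions, and the host, IDs, and payload should be verified against the current Carbon Black Cloud API documentation.

    # Hedged sketch: isolate a compromised endpoint through the Carbon
    # Black Cloud device-actions API, so that only the sensor's own
    # channel to the backend stays open. All values are placeholders.
    import requests

    CBC_HOST = "https://defense.conferdeploy.net"  # your CBC environment URL
    ORG_KEY = "YOURORGKEY"
    API_TOKEN = "API_SECRET/API_ID"  # CBC API keys are secret/id pairs

    def quarantine_device(device_id):
        """Turn on network quarantine for a single device."""
        response = requests.post(
            f"{CBC_HOST}/appservices/v6/orgs/{ORG_KEY}/device_actions",
            headers={"X-Auth-Token": API_TOKEN,
                     "Content-Type": "application/json"},
            json={
                "action_type": "QUARANTINE",
                "device_id": [device_id],
                "options": {"toggle": "ON"},
            },
        )
        response.raise_for_status()  # expect 204 No Content on success

    quarantine_device(4321098)  # placeholder device ID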

CyberArk surveyed 3000 remote workers and IT professionals, and the results show us that 77% of the remote office workers are using unmanaged, insecure, personal devices to access corporate systems. A staggering 93% say they reuse the same password across multiple applications and devices.

Of the people that are using corporate devices at home, almost 30% admit that they let family members use their devices for things like schoolwork, gaming, and online shopping.

As you can imagine, these devices are a window for cybercriminals to infiltrate the home network of the remote worker, and that can then quickly turn into a window towards the corporate environment. This is one of the reasons why cybercriminals massively started to target remote workers and their online collaboration tools like Zoom and Microsoft Teams: they all know the easiest way to intercept credentials and get into more organizational networks is through the end user.

Take a phishing email, for example. Just before Christmas in 2019, the University of Maastricht got hit by a ransomware attack, and the results were catastrophic. Through a phishing email, a group of cybercriminals got into their network and stayed there for months before they executed the ransomware attack on 269 servers.

The cybercriminals were able to work their way through the network because security was, and for a lot of companies still is, something you configure and then forget about. Before the hackers executed the ransomware attack, the administrators had received multiple alerts from the antivirus product, but those were ignored, together with other vulnerabilities like using the same password over and over again and the lack of updates on several servers. The university ended up paying almost 200,000 euros to get their apps and data decrypted.

When CyberArk asked the IT teams in their survey if they were confident in their ability to secure the new remote workforce, 94% answered YES. Yet 40% of the teams answered that they did not increase their security protocols, despite the significant changes in this new way of working.

From my personal experience, I’ve seen customers working hard
on improving their security in recent years. Network teams have
been (micro-)segmenting the network and configuring firewalls.
Datacenter teams have hardened the Windows servers that run the
company’s workloads. But when it comes to endpoint and identity
protection, I still see desktop services teams relying on the old and
trusted virus scanners that they probably configured years ago,
letting them monitor and scan the devices for known threats created
in the past 20 years. So, everybody is working on their own island,
with their own set of tools, trying to keep the company safe.

And this is exactly one of the things that needs to change, because
hackers and cybercriminals don’t care about those islands. They
start by compromising an identity or device and then move laterally
through the network to eventually compromise workloads and
data.

https://www.cyberark.com/resources/blog/remote-work-survey-
how-cyber-habits-at-home-threaten-corporate-network-security

Intrinsic Security

When we talk about intrinsic security, we speak about the
transformation of security. Security nowadays is a team sport, and
to keep the company’s apps and data safe, every team has to come
off their island and start to work proactively together, preferably
by using one tool.

Security needs to be transformed, and there are three ways to
achieve that. Security needs to be:

• Unified instead of siloed
• Proactive, not reactive
• Built-in instead of bolted-on

Unified

Unified as in: technology should work better together when one
product is used across the environment, but also as in teams. As
said earlier, I see a lot of teams working on their own island, but
another thing I notice is that the security team or officer and the IT
and operations teams are struggling to work together. Often there is
a big gap between both parties because they don’t speak the same
language in terms of IT.

I once was at a hospital working on a project to upgrade the VDI
environment to the latest version of Horizon and to migrate to
Windows 10. With the introduction of Windows 10, the hospital
wanted to minimize the number of browsers in their environment,
and because the Chromium-based Microsoft Edge for enterprises had
just been released, it was the perfect time to phase out Google Chrome
and Mozilla Firefox.

The security officer who was part of the project brought to the
team’s attention that the hospital wanted to comply with the NEN
7510 standard. He had already downloaded the corresponding ADMX
files with which the hospital would immediately comply, without
looking at any setting within that policy or understanding what
implementing it would mean.

When the IT team implemented the policy in the test environment,
the browser was as secure as possible. But besides being secure,
the browser was not functional at all, and this is where the gap
between both parties became visible. The security officer had no
understanding of how the desktop environment worked and what
it would mean if the security policy were implemented without any
changes. And the IT team thought the security policy was an
eyesore, because they wanted a functional browser for their users,
even at the cost of security in their environment.

When taking the intrinsic security approach, you bring tools like
Carbon Black into the desktop environment, and both teams will
start working closely together because all generated alerts are
shown in the same console, used by both teams. The security team
will investigate the alerts and requires knowledge from the
desktop team to fine-tune the policies in Carbon Black. This way,
both teams will learn more about each other's field of work and get
a better understanding of the things that happen in the
environment.

Proactive

Because everybody will have a better understanding of what they
are protecting, the mindset will shift from being reactive to
proactive.

Where the average administrator would start to act after the shit
has hit the fan and then try to reverse engineer the attacks of
yesterday, the administrator will now act proactively to
prevent that attack on the system.

Built-in

The greatest thing of all is that the product, Carbon Black, is built
into the technology the administrator is using, whether it is a
network admin who controls the network traffic with NSX, a
server admin in charge of the workloads, or a desktop admin
who’s responsible for the devices and identities managed with
Workspace ONE.

Rather than relying on a standalone product for each capability,
Carbon Black Cloud is integrated deeply into all layers of VMware
technology to align all teams and tools with the intrinsic security
approach.

One of the most inspiring presentations I’ve seen about intrinsic
security was at VMworld 2019. Tom Corn and Shawn Bass held a
great talk about security and how change is needed, and the live
demos made it even more powerful. You can watch the session on
YouTube by following the link below, and I’d highly recommend
doing so.

https://youtu.be/LSU9HQopuIY

Design considerations

As Carbon Black is integrated into all layers of VMware
technology, there are many considerations to take into account
when you are working on a design. As mentioned before, this
book is mostly focused on the Horizon side of things, so I won’t
cover NSX, workloads, or Kubernetes, but instead focus on the
considerations that could impact Horizon and Workspace ONE.

Deployment considerations

Let’s start with one of the most asked questions these days: am I
going to the cloud, or do I stay on-premises? In my opinion, the
answer is simple: always cloud, unless you have a very, very strict
no-cloud policy, and here is why:

• On-premises is complex and costly. It requires a massive
amount of resources and funds to maintain, while the
cloud is hosted and managed by the provider.
• Carbon Black Cloud uses a single management pane,
while the on-prem offering consists of multiple products, all
installed on different servers and using different
management panes.
• VMware is strongly focusing on delivering everything
from the cloud. Still, for certain use cases it might be
required to run everything on-prem. If that’s the case, I
would highly recommend reaching out to your local
VMware team and fully outsourcing the design and
deployment. It’s a complex solution if you are planning an
on-prem deployment.

But what if you have virtual or physical machines in your
environment that are not allowed to have any kind of internet
access? Well, there is a possibility to configure a Local Mirror
server. The concept of the Local Mirror server is self-explanatory:
the Local Mirror server is placed in the on-prem datacenter and is
connected to the Carbon Black Cloud to download and maintain
the latest signature updates. Machines not connected to the
internet are seen as on-prem and receive signature updates from
the Local Mirror server.

Although Carbon Black has written a very detailed KB article on
how to set up the Local Mirror server manually, my colleague Ivan
de Mes has created a great blog on how to automate this.

https://www.ivandemes.com/automating-the-vmware-carbon-
black-local-mirror-configuration-for-windows/

Sensor considerations

When it comes to the installation of the Carbon Black Cloud
sensor, there are many options. You can install the sensor on
Microsoft Windows, macOS, and Linux. Installation possibilities
are:

• Invitation from the management console.
• vCenter management console, if you have the Carbon
Black Workload Protection appliance implemented.
• Manually, by downloading the installation files from the
Carbon Black Cloud management console.
• Automated, by using software deployment tooling like
SCCM, MDT, or Workspace ONE UEM.

If you ask me which one to choose, I’d always go for automated,
for logical reasons such as reducing errors, achieving consistency,
and decreasing management overhead. Use the solution you
currently use to build your VDI image and add the automatic
installation of the sensor to the task sequence. Installing the sensor
through application-layering products like App Volumes is not
supported, as the sensor needs to be running as a service before any
user signs in. Installing it in the base image is the way to go.
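
To give an idea of what that task sequence step could look like, below is a
minimal sketch of an unattended sensor installation, wrapped in Python. The
installer file name, the company registration code, and the log path are
placeholders I made up for the example; always check the Carbon Black Cloud
sensor installation guide for the command-line options that your sensor
version actually supports.

    import subprocess
    from pathlib import Path

    # Placeholder values -- replace with your own installer and registration code.
    SENSOR_MSI = Path(r"C:\Temp\CBCloudSensor-x64.msi")
    COMPANY_CODE = "REPLACE-WITH-YOUR-CODE"  # issued by the Carbon Black Cloud console
    LOG_FILE = Path(r"C:\Windows\Temp\cbc_sensor_install.log")

    def install_sensor() -> None:
        """Run an unattended MSI installation of the Carbon Black Cloud sensor."""
        cmd = [
            "msiexec.exe",
            "/qn",                    # fully silent installation
            "/i", str(SENSOR_MSI),
            f"COMPANY_CODE={COMPANY_CODE}",
            "/L*v", str(LOG_FILE),    # verbose MSI log for troubleshooting
        ]
        result = subprocess.run(cmd, check=False)
        if result.returncode != 0:
            raise RuntimeError(f"Sensor install failed with exit code {result.returncode}")

    if __name__ == "__main__":
        install_sensor()

The same call can obviously be made directly from a batch or PowerShell step
in SCCM, MDT, or Workspace ONE UEM; the point is that the installation runs
silently and leaves a log behind for troubleshooting.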

When installing the sensor on physical devices, manage them with
Workspace ONE UEM and deploy the sensor over the air, whether
it is a Windows or macOS device.

VDI Considerations

Regarding VDI, there are a couple of things you have to take into
account when installing the sensor on the various types of clones.
As linked clones no longer exist in the latest versions of Horizon, I
will focus on the considerations for instant clones and full clones,
or non-persistent and persistent desktops, so to speak.

Before you start deploying the Carbon Black Cloud sensors in your
environment, you need to have a VDI policy in which you configure
some essential settings for the virtual desktops, starting with
bypass rules (exclusions).

Each organization must understand the trade-offs between
performance and security, and this can be a challenging discussion
between the security and desktop services teams. Each application
has to be reviewed to determine whether to set exclusions or not,
because when you work with filter drivers, you need exclusions to
let it all work properly. For VMware Horizon, App Volumes, and
Dynamic Environment Manager it’s recommended to implement the
following rules:

**\Program Files\VMware\**,
**\SnapVolumesTemp**,
**\SVROOT**,
**\SoftwareDistribution\DataStore**,
**\System32\Spool\Printers**,
**\ProgramData\VMware\VDM\Logs**,
**\AppData\VMware\**

Next to the above bypass rules, it’s best practice to disable the on-
access file scan mode and signature updates. The local scan
feature adds network overhead and increases resource utilization.
Because most VDI environments maintain 99% uptime, the Carbon
Black Cloud sensor can pull reputation data and enforce policy in
real time from the cloud instead.

For more information on exclusions and other antivirus best
practices, I’d like to share the following VMware Digital Workspace
Tech Zone link:

https://techzone.vmware.com/resource/antivirus-considerations-
vmware-horizon-environment

There are many more options to set, but the last one I’d like to
point out is the Auto-deregister VDI sensors that have been
inactive for setting. I would recommend only enabling this setting
in a policy for non-persistent virtual desktops. It removes any
clones from the management console that have been inactive for a
specified duration. Check the following link for more information:

https://docs.vmware.com/en/VMware-Carbon-Black-
Cloud/services/cbc-sensor-installation-guide/GUID-D2BC3455-
B8EB-414F-A5FE-31D40C193ABE.html

Workspace ONE UEM & Intelligence

Personally, I think the integration of Carbon Black with Workspace
ONE is one of the most exciting and powerful things when it comes
to endpoint and identity protection.

If you have a Workspace ONE subscription, I recommend looking
into adding Workspace ONE Intelligence to it. Workspace ONE
Intelligence is the glue that creates the perfect mix of endpoint
protection and automation capabilities that can act on threat
insights.

After integrating Carbon Black into Workspace ONE Intelligence,
you can create an automation rule based on a variety of triggers.
For every trigger, you can then add an action rule, which can vary
from quarantining a device in Carbon Black to creating a ticket in
ServiceNow.

I’ve set this up in my lab environment and the VMware TestDrive
sandbox and was amazed by the simplicity of it all. After setting
up the integration between Carbon Black and Workspace ONE
Intelligence with only an API and a SIEM key, I was able to create an
automation workflow.

So, I created an automation rule with a trigger, which basically
is an IF rule, to monitor new incoming events with a threat severity
equal to or higher than a score of 8. Carbon Black threats
are scaled from 1 to 10; the higher the score, the bigger the threat.
The action, which is the THEN rule, would quarantine my device
in Carbon Black, remove the device profile in Workspace ONE
UEM, and send an email to my administrative account.
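
Conceptually, the rule looks something like the sketch below. This is
illustrative pseudo-configuration expressed in Python, not the actual
Workspace ONE Intelligence API schema; the attribute names, action
identifiers, and email address are assumptions I made for the example.

    # Illustrative only: a schematic representation of the IF/THEN automation
    # rule described above, not a real Workspace ONE Intelligence payload.
    automation_rule = {
        "name": "Quarantine on high-severity Carbon Black threat",
        "trigger": {
            "source": "carbon_black",             # events from the CB integration
            "condition": "threat_severity >= 8",  # CB scores threats from 1 to 10
        },
        "actions": [
            {"service": "carbon_black", "action": "quarantine_device"},
            {"service": "workspace_one_uem", "action": "remove_device_profile"},
            {"service": "email", "action": "notify", "to": "[email protected]"},
        ],
    }

    # In a real workflow, this maps onto the rule you configure in the
    # Intelligence console after pairing the products with the API and SIEM key.
    for action in automation_rule["actions"]:
        print(f"THEN {action['service']}: {action['action']}")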

After that, I executed a ransomware attack on a virtual machine
managed with Workspace ONE UEM, and the magic happened.
My mind was blown by the power of these combined products.

If you want to get started or learn more about the integration of
Carbon Black and Workspace ONE, I’d recommend this VMware
Digital Workspace Tech Zone page:

https://techzone.vmware.com/integrating-workspace-one-
intelligence-and-vmware-carbon-black-workspace-one-
operational-tutorial

HARDWARE INNOVATIONS
As a nerd who spends a lot of time in his home lab, I have a big
interest in hardware and how hardware can best fit the
requirements you might have. Hardware can really make a
difference for things like user experience and total cost of ownership,
but also for security and recoverability. I’m still amazed by the
number of requests I get to run an assessment or health check on
an existing VDI platform, only to find out that the platform hasn’t been
designed correctly or optimally. And in this case, I don’t mean it
should have been designed according to my own standards; the
platform has simply been designed incorrectly according to
vendor best practices. A great example is an organization who
contacted us to talk about Microsoft Teams integration into their
new VDI platform. Like every other company, they wanted to use
Teams during the pandemic so employees could collaborate a lot
easier while working from home. They had heavily invested in a new
platform and primarily wanted to talk about offloading Teams to
the endpoint with the Media Optimization for Microsoft Teams with
VMware Horizon plugin. Since they wanted to talk about that topic
and how viable it was for them, I immediately assumed that they
didn’t have any GPUs in their VDI platform. Still, an assumption
needs to be validated, so I asked them about it. Before giving the
answer, you need to know they have around 500 end users and a
wide variety of endpoint devices. Some use Apple devices, some
have thin clients, some have zero clients, and some run Windows
10 on their device.

They had indeed invested in GPUs. And surprisingly, they had invested
in GPU capacity for all end users, including room for growth and
scale. My alarm bells went off, as in a normal situation an NVIDIA
vGPU profile with a minimum framebuffer size of 1 GB should be
enough to run Teams within the virtual desktop. Obviously, it
wasn’t, because why else would they have contacted us? When
asking a bit more and diving a bit deeper into their environment, I
found out that they were using NVIDIA vGPU 1B profiles for all of
their users. Nothing weird still, but when I found out what GPUs
they used, it all became clear.

The customer had gotten advice from a certain company that, since
they wanted to achieve the highest density possible on a single
host with mainly task users, they should go for a card
with a lot of framebuffer. The company recommended running two
NVIDIA Quadro RTX8000s in a single host, as a single card has 48
GB of framebuffer and combined they could offer 96 1B profiles to end
users. This would mean that they could run 96 users on a single
host.

First of all, while an RTX8000 is capable of running 1B profiles, that
number is incorrect. Due to technical limitations, you can offer a
maximum of 32 1B profiles per card.
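
A quick back-of-the-envelope check, sketched below, already exposes the flaw
in that advice. The 32-instance ceiling is the technical limitation mentioned
above; treat the numbers as illustrative and always verify them against
NVIDIA’s vGPU documentation for your specific card and software version.

    # Back-of-the-envelope vGPU density check (illustrative numbers).
    FRAMEBUFFER_GB = 48   # NVIDIA Quadro RTX 8000
    PROFILE_GB = 1        # framebuffer size of a 1B profile
    MAX_INSTANCES = 32    # per-card vGPU instance ceiling mentioned above

    by_framebuffer = FRAMEBUFFER_GB // PROFILE_GB    # 48 profiles, naively
    per_card = min(by_framebuffer, MAX_INSTANCES)    # capped at 32
    per_host = per_card * 2                          # two cards per host

    print(f"Naive framebuffer math: {by_framebuffer} profiles per card")
    print(f"Actual ceiling: {per_card} per card, {per_host} per host (not 96)")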

Second, the RTX8000 was, at the time of purchase, the most
powerful card there was for high-performance VDI use cases such
as graphical designers, video editors, and emerging use cases like
streaming virtual reality to a stand-alone headset. The goal of these
use cases is not consolidation, but performance: not for the masses,
but for a limited number of users on a single card. When looking at
a GPU on a very high level, it has three main components:
framebuffer (video memory), cores, and encoders. The RTX8000 has
a lot of framebuffer and a lot of cores, but relatively less powerful
encoders. This is where the customer got the wrong advice. When
sizing for density while running a large number of frames per second
per user (because of the camera support in Teams), you need to
consider a card or multiple cards that are capable of handling all of
the frames that need to be encoded.

In the end, we were able to change the encoder to a software
encoder and used the RTX8000 just for rendering instead. The user
experience was good, but the TCO of the platform was far from
good. It could have been a lot better if the company who advised
them had done their homework and known about those best
practices.

This section will cover some of the current trends in hardware
innovations, including topics like CPU choices (Intel vs AMD)
and the impact of flash storage.

THE IMPORTANCE OF HOST DESIGN

If there is one lesson regarding the design of virtualization hosts
which we learned over the last 10 years, it is that we need to stay
away from custom builds. But is that lesson still valid? I know I
mentioned in the first book that using commoditized hardware
where possible is definitely something to consider. I still value that
recommendation, but when talking about advanced design topics,
you quite often need to deviate from it. Standardized VDI hosts
are purposely built for a healthy mix of common VDI use cases
and use cases that don’t fit in a standardized T-shirt size. When
designing a platform which will be used for 95% task workers or
knowledge workers while just 5% of the platform will be consumed
by heavy power users, such a standardized host could surely offer
a perfect user experience for all of those use cases. The power of
virtualization will offer you all of the advantages we have seen in
the past 15 years (in which we built virtualization platforms to
consolidate workloads and get the most out of the hardware).

There is a different trend going on which puts the
commoditization of virtualization hosts in a different light. Major
companies like VMware, Microsoft, and AWS are offering
platforms which are capable of running basically any type of
workload. When they claim they can run ANY type of workload
on their platforms, they will surely mean it, but that’s not the
message behind the message. If I read ANY, my brain starts to
think about all sorts of complex workloads which might be
challenging to virtualize (like some of the use cases described in
this book). That’s just natural behavior. One of the reasons why
they can now claim such a thing is because they actually were able
to run those workloads on their platforms. And surely being able to
run a Docker container on a platform will help to run certain
workloads, but is that really where the innovation came from? When
you read the title of this section, I think it will give the answer away,
because the other reason those complex workloads can now run on a
virtualization platform is related to innovations in hardware.

Telcos, research facilities, gaming platforms, ISPs, and many more
businesses now have the ability to run their mission-critical
workloads on virtualization platforms, either on-prem or in the
cloud. Take one of them, a Telco (Telephone Company). They used
to run the majority (possibly all) of their workloads on a bare-metal
platform because of their intolerance of latency. When they
experience latency on their platform, you as a consumer of their
solutions will probably, too. Have you ever called someone and
experienced jitter? That could be due to a bad signal but could also
be due to infrastructure-related reasons. Virtualization resulted in
overhead and thus was avoided. A lot has happened in this space,
and I know for a fact that the major Telcos in The Netherlands are
using virtualization platforms to increase the availability of their
services. There isn’t a single secret sauce which made virtualization
viable for them; it’s a combination of factors. The first reason is
related to networks. Networks have become much, much faster.
With the introduction of network interfaces which support RDMA
(Remote Direct Memory Access), the latency generated by interaction
between the CPU, RAM, and the network interface has been reduced
to a bare minimum. Instead of latencies of milliseconds, we are now
talking about microseconds. If you combine that with the increase of
circuit speeds (hundreds of Gbps), the network doesn’t have to be a
bottleneck anymore. The same case can be made for RAM. vSphere
7.0 U2 supports a blazing 24 TB of RAM in a single host. Imagine
running all of the workloads in memory instead of partly on disk.
And what about CPUs? The same vSphere version supports an
insane number of 896 virtual CPUs per host. Processing data
becomes a lot faster with such a number of resources. In 2020 I even
helped a Telco to build a virtualization solution which included
GPUs, so they could run some of their workloads fully accelerated
on the GPUs.

The question now is: can an organization like this run those
workloads on a commoditized platform? Although some might
claim they could, I strongly believe that it just isn’t possible (yet).
Platforms like these (but also the gaming platforms, research
workloads, etc.) have such a specific demand for resources that they
bring the subtle art of host design back to the table. But what does
this have to do with VDI?

This book focuses on advanced design topics: use cases which
require a different approach, a different view, and most likely a
different hardware platform. Where virtualization companies offer
a platform to virtualize ANY type of workload, I think the
technology and underlying hardware platforms are nowadays just
as innovative, so they can offer ANY type of use case a place to
work. The secret sauce in this case isn’t just the right type of CPU
or GPU; it’s a combination of even more. What’s a VDI without the
right endpoint? And don’t forget about the ecosystem around VDI.
Operating systems that can support virtualized hardware
innovations are just as essential. Properly designing the
platform is vital, and it starts with the host.

This section dives into a variety of innovative components which
enable you to run those complex use cases on a VDI platform,
looking at components such as CPUs, GPUs, networking, and
storage.

By the way, all of the use case sections which you can find later in
this book include background information on the different
hardware choices and considerations. Those considerations are
based on the experience I gained by being able to test and validate
those use cases on the different platforms. I need to thank
companies like HPE, Supermicro, NVIDIA, Intel, AMD, and
VMware (obviously) for their support in testing and validating
those components.

CPUS
Central Processing Units (CPUs) have always been a hot topic. Am
I going to use a lot of cores or fewer cores? Do I need a high clock
speed or a lower clock speed? Do I need multiple physical CPUs in
a single system? What about Intel versus AMD? And with
VMware investing in ARM and NVIDIA acquiring ARM, is
ARM a thing? Questions, questions. The answer isn’t always that
simple, but the design methodology in the VDI Design Guide will
help you to narrow down the choices you have related to CPUs for
your specific use cases. Before we go a bit more into the
considerations, let’s dive into the trends and different choices first.

Intel

Besides being a global leader in chip building and CPUs in
general, Intel has been the undisputed market leader in datacenter
CPUs ever since VMware became a thing. Intel developed the
original x86 instruction set, which debuted in the 8086 CPU. IBM
decided that it was a good idea to include the 8088, a variant of the
8086, in their IBM PC, and the rest is history. Intel has always been
an innovator in the field of CPUs. Not just with the iconic original
Pentium CPU, but also with other products like the Intel Atom, the
Pentium II and III, and the launch of the Intel Xeon line, they set the
stage for other chip builders.

When server virtualization became a thing, the Intel Xeon could be
found in almost all datacenters. Early virtualization solutions from
VMware like VMware GSX (later VMware Server) and VMware
ESX (later VMware ESXi) were optimized to virtualize x86 and x64
workloads. Although both AMD and Intel CPUs were supported
since the x86/x64 architecture parity, AMD was rarely seen in
datacenters. It’s not that one was better or faster than the other
when virtualizing workloads, but I guess the people responsible
for partnering with major OEMs like HPE, Dell Technologies, and
IBM did a really good job in making sure the Intel Xeon was the
number one CPU when talking virtualization. The interesting
thing here is that, at first, some of the people at Intel were a bit
skeptical about virtualization. They thought it might hurt CPU
sales because everyone would be consolidating their workloads onto
fewer servers. Boy, were they wrong! The exact opposite
happened. CPU sales exploded because of virtualization and
brought fortune to Intel. VMware vMotion basically sold itself and
resulted in a disruption of the market. Everyone who saw their
first vMotion was sold and got a virtualization platform.

Intel released a number of architectures which have had a major
impact on virtualization. The Nehalem architecture was one of
them. The Nehalem-based CPUs were optimized for virtualization
with Intel’s second generation of Virtualization Technology
(VT). The Nehalem architecture was also used in the first
generation of Intel’s Core CPUs (i5 and i7). It enabled higher
consolidation ratios on physical hosts due to optimized
hyperthreading as well as integrated memory controllers.

When looking at VDI solutions, the idea is to get the best density
while retaining a good user experience for the end user. Until a
couple of years ago, VDI was mostly used for desktop workloads
with a similar footprint. The footprint was mostly tied to your task
worker use cases, as they could have a positive impact on your
density. Power users were always a challenge, especially when
looking at either single-threaded applications or requirements for a
high clock speed. When Intel released the Skylake architecture
with the 18-core Intel Xeon Gold 6154 (3 GHz / 36 threads) in 2017,
this opened up the possibility of higher densities, even with
power users. For a couple of years, this CPU and its successor, the
Intel Xeon Gold 6254 (3.1 GHz / 36 threads), have been part of
almost all major VDI projects I have worked on.

In 2021, yet another iteration of the successful Xeon line was
announced to the world. In April, Intel announced that the 3rd
generation Xeon CPUs, with code name Ice Lake, would include a
large number of new CPUs dedicated to getting the best
performance/consolidation ratio in the virtualized datacenter. The
Intel Xeon Gold 6354, the successor to the Intel Xeon Gold 6154/6254,
promises better performance and a higher consolidation ratio, but
unfortunately, I didn’t get the chance to test it. Next to the Intel Xeon
Gold 6354, they also launched a 32-core beast with a 2.8 GHz base
frequency called the Intel Xeon Platinum 8362. Imagine what two of
these will do for your high-performance workloads in terms of
consolidation.

Intel has always been a strong partner of VMware. I guess that
since Pat Gelsinger (VMware’s former CEO) announced in early 2021
that he would leave VMware to become Intel’s new CEO, that
partnership has only become stronger.

AMD

AMD is Intel’s big competitor. AMD was founded in 1969 as a
company that built CPUs for other companies. In fact, AMD and
Intel were partners in the early 1980s. Intel wanted to build the
8088 CPU which came with the IBM PC, but in order to win that
deal, IBM demanded that Intel use a second manufacturer for
those CPUs. Intel brought their partner AMD to the table. In
exchange, AMD got the plans for the x86 architecture.

Intel and AMD remained partners until the mid-1990s. The
partners ended up becoming each other’s competitors, and in a
long-lasting dispute, the US court decided that AMD could still
use the Intel x86 architecture up to the 80486 processor family.

In 1996, AMD released their first x86 CPU that was fully developed
in-house. It was called the AMD K5 and was available with clock
speeds varying from 75 MHz to 100 MHz. True story: the first PC I
bought with the money I earned delivering newspapers was
actually based on this CPU. Mine ran at 90 MHz and was fast
enough to play games like Doom, Mortal Kombat, and Command
and Conquer.

For a couple of years, AMD was really able to compete with Intel
in the CPU market. Many architectures followed the K5.
Personally, I owned a K6 and an Athlon as well, before I moved
over to Intel CPUs, even though AMD had some really nice
features in their later architectures (such as Turbo Core technology
to switch from 6 cores running at 100% speed to 3 cores running at
200% speed): technology we currently see in most CPUs as the so-
called Turbo Boost.

I stepped a bit out of the hardcore gaming scene for a while and
lost track of what happened with CPU trends. While working for
a system integrator from the mid-2000s to the early 2010s, I
literally installed tens of thousands of PCs for all sorts of
customers. One of the things that came to me while planning this
section was that I never installed a PC with an AMD CPU
during that era. Intel made a couple of really good deals with
major PC manufacturers like Dell and HP/Compaq back then,
hence the disappearance of AMD from the business PC market. Did
AMD disappear entirely? Absolutely not!

In 2014, I worked on a big Horizon 6 project and one of the first
Cloud Pod Architectures in Northern Europe. The customer
already had an existing Horizon View 5 environment in one
datacenter, based on AMD Opteron CPUs. We upgraded the
platform to Horizon 6 and added a new datacenter with a Horizon
Pod to the platform. This new Pod was based on Intel CPUs, as the
customer was bound to investment rules and RFP procedures, and
all of the suppliers who responded to the RFP offered Intel CPUs
instead of AMDs. You could see this as a challenge, but the
customer used linked clones for 100% of their use cases, so
desktops weren’t moved from one Pod to the other. Back then,
these CPU differences between Intel and AMD caused some
operability and mobility issues, as you weren’t easily able to
interchange VMs between the two CPU architectures.

For me as a consultant it was a true playground. Because we had
to work with the two architectures, it was the ideal place to run
regular performance measurements and benchmarks between the
two. I honestly don’t recall the exact CPU models from AMD and
Intel, but they were quite comparable performance-wise. After that
project, I didn’t really see any AMD CPUs in the datacenter for
a long time, until I got the opportunity to work with a potential new
VDI customer last year.

Before I tell you about that project, I want to emphasize that AMD
didn’t really leave the market. In fact, their CPU sales (and GPU
sales, since they acquired ATI in 2006) didn’t drop at all. The
acquisition of ATI brought a whole new market to the table.
Suddenly, AMD was able to deliver GPUs and, because of their
long-lasting knowledge of the CPU market, was even able to design a
hybrid CPU/GPU called the Fusion APU (Accelerated Processing
Unit). This new market was formed by major game console
manufacturers like Nintendo and Microsoft. The Xbox 360 and
Nintendo Wii used ATI-based GPUs and basically set a new
standard for game console GPUs created by AMD. Where Sony’s
PlayStation 3 used an NVIDIA-based GPU, the PlayStation 4, Xbox
One, and Nintendo Wii U all had AMD-based GPUs. In the latest
generation of Sony’s and Microsoft’s consoles (PlayStation 5 and
Xbox Series X and S), AMD was responsible for both the GPUs and
the CPUs, based on their Zen 2 architecture. In my opinion, the
latest Zen architecture from AMD can be seen as a tipping point.

AMD launched the Zen architecture in 2017 and has slowly been
gaining market share in (mainly) CPU sales. Early in 2021, I
was investigating some new hardware for my lab that was capable
of running virtual desktops for all of the use cases that I’ve
covered in this book. Because of the price tag, the performance,
and the support for large amounts of RAM, I ended up with an AMD
Ryzen 9 CPU (3900XT, based on Zen 2), and I’m really happy with
this choice (but more about that in the home lab section ☺).

In March 2021, AMD announced the 3rd generation of their EPYC
datacenter CPUs. The successor to the Rome series, which is
called Milan, promises to be the world’s most powerful datacenter
CPU and should offer the highest consolidation ratios
available while retaining great performance. As with the latest
generation of Intel CPUs, I haven’t had the opportunity to test
them, but I honestly am a bit skeptical (more about that in the next
section). The announcement event showed a Login VSI benchmark
demo, running on a single host with two AMD EPYC 7763 CPUs.
A single CPU is equipped with 64 cores (128 threads) and runs at
a base frequency of 2.45 GHz. The benchmark showed an insane
number of 509 virtual desktops running on a single host before
the performance dropped (whereas the Intel Xeon Gold 6258R
reached 240 virtual desktops before the performance dropped). I’m
assuming they used the Login VSI OSOT template (which disables
basically every usable feature that could slow down performance).
In my honest opinion, it’s a nice benchmark, but far from a
representative outcome, as no one should ever use that template
for production environments because it disables a lot of
essential stuff. Still, I’m interested to see what the future will
bring with the Zen 3 architecture, as AMD promised to solve some
of the issues such as NUMA boundaries (more about that in the
interview with Frank Denneman).

Intel vs AMD

Let’s go back to the project I worked on in 2020. The customer was
a massive VDI customer (15,000 concurrent users) and was slowly
approaching the end of their current hardware life cycle. The
environment at that stage consisted of Intel-based hardware. Their
use cases are mainly focused on consuming single-threaded
applications, which require a high clock speed. To the customer,
density was really important to keep the TCO of the platform as
low as possible. For the new environment, they wanted to see what
else was on the market (besides Intel), and with the help of their
current supplier (HPE), they got a wide variety of servers, CPUs,
SSDs and NVMe drives, GPUs, form factors, you name it. I got
hired into the project to run an objective benchmark across all of the
different configurations and to work on an early hardware design
which would fit their needs.

Of all the different hardware configurations, we ended up with
two different traditional rack models to run a benchmark
comparison on:

• HPE DL380G10 with:
o 2 x Intel Xeon Gold 6254 (72 logical cores at 3.1 GHz)
o 512 GB RAM
o Intel Optane flash drives
o 6 x NVIDIA Tesla T4

• HPE DL385G10 with:
o 2 x AMD EPYC 7542 (128 logical cores at 2.9 GHz)
o 512 GB RAM
o Intel Optane flash drives
o 6 x NVIDIA Tesla T4

We used VMware View Planner as the benchmarking tool and created
a couple of custom workloads. The benchmark was broken down
into three different profiles:

• Task workers (with mainly Microsoft Office, Adobe
Reader, and some websites).
• Power users (same apps as task workers, but also WebGL
apps, and one of their single-threaded apps).
• Heavy power users (same as power users, but also
including a single-threaded app which requires a GPU).

As already mentioned, we built a custom workload in View
Planner which includes all of the applications and, per application,
looks at a couple of things (a trivial timing illustration follows the
list):

• Application start time.
• Document or file open.
• View or edit of a document.
• Document or file close.
• Application close time.
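
As a trivial illustration of this kind of per-action timing (View Planner does
this internally; this sketch is not its actual mechanism), each action is
essentially wrapped in a wall-clock measurement:

    import time
    from typing import Callable

    def timed_action(label: str, action: Callable[[], object]) -> float:
        """Measure the wall-clock duration of a single workload action."""
        start = time.perf_counter()
        action()
        duration = time.perf_counter() - start
        print(f"{label}: {duration:.4f} s")
        return duration

    # Example with a stand-in action; in the real workload this would be an
    # application interaction such as opening or saving a document.
    timed_action("Excel Sort (stand-in)", lambda: sorted(range(1_000_000)))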

From a VMware perspective, we used vSphere 7, Horizon 7.12,
and Blast Extreme as the connection protocol.

The idea was to create an initial benchmark of a single desktop and
see what the values of the metrics are under normal circumstances.
The following table outlines those recorded metrics:

Action Duration in Seconds
Excel Close 0.2279
Excel Data Entry 0.0917
Excel Maximize 0.4116
Excel Minimize 0.3260
Excel Open 1.5107
Excel Save as 0.5919
Excel Sort 0.1130
Outlook Close 0.3748
Outlook Open 1.1353
Outlook Read PST 0.3307
Outlook Restore PST 0.0729
PDF Browse 0.0494
PDF Close 0.6747
PDF Maximize 0.7424
PDF Minimize 0.5946
PDF Open 0.6084
Remote Login Time 23.010
Video Close 0.3464
Video Open and Play 0.7516
Word Close 0.2271
Word Maximize 0.8645
Word Minimize 0.6790
Word Modify Document 0.1165
Word Open 6.5089
Word Save 2.3599
WebGL App 1 load 0.6382
WebGL App 2 load 0.7263
Single-Threaded App 1 Open 0.1254
Single-Threaded App 1 Read 0.4938
Single-Threaded App 1 Write 0.4812
Single-Threaded App 1 Close 0.2441
Single-Threaded App 2 Open 0.9272
Single-Threaded App 2 Read Code 0.6482
Single-Threaded App 2 Compile Code 0.4185
Single-Threaded App 2 Close 0.9803

We ran the benchmark on both hardware configurations per use
case (profile) to see what the maximum density per host was with
a tolerable performance (where the CPU ready times stayed
below 10% and the application metrics didn’t exceed the baseline
by more than 100%).
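
Expressed as a simple check, those acceptance criteria per run could look like
the sketch below. The metric names and numbers are illustrative (View Planner
reports these values in its own format), so treat this purely as a
formalization of the two thresholds just mentioned.

    # Illustrative formalization of the run acceptance criteria: CPU ready time
    # must stay below 10%, and no action may take more than twice (baseline +
    # 100%) its single-desktop baseline duration.
    def run_is_acceptable(cpu_ready_pct: float,
                          measured: dict[str, float],
                          baseline: dict[str, float]) -> bool:
        if cpu_ready_pct >= 10.0:
            return False
        return all(measured[action] <= 2.0 * baseline[action]
                   for action in baseline)

    # Example with two of the recorded actions (values from the tables here).
    baseline = {"Excel Open": 1.5107, "Word Open": 6.5089}
    measured = {"Excel Open": 1.9234, "Word Open": 6.7346}
    print(run_is_acceptable(6.2, measured, baseline))  # True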

Task worker

For the task worker use case, we ran the benchmark 10 times per
host and ended up with the following metrics, averaged based on
the last run in which the density reached the maximum number
per host. For the task worker persona, every user had:

• Windows 10 1809
• 4 vCPUs
• 6 GB RAM

The following table shows the average duration per action and the
differences between the two configurations.

Action DL380G10 (68 users) DL385G10 (98 users)
Excel Close 0.4326 0.3536
Excel Data Entry 0.1432 0.2234
Excel Maximize 0.7234 0.7432
Excel Minimize 0.7324 0.6234
Excel Open 1.9234 1.6344
Excel Save as 1.0232 0.9324
Excel Sort 0.2325 0.1356
Outlook Close 0.6532 0.7231
Outlook Open 1.5233 1.3235
Outlook Read PST 0.2153 0.3110
Outlook Restore PST 0.6123 0.3221
PDF Browse 0.1224 0.2351
PDF Close 0.7231 0.6246
PDF Maximize 1.2361 0.9234
PDF Minimize 0.6345 0.7234
PDF Open 0.9234 1.0014
Remote Login Time 21.036 22.246
Video Close 0.6232 0.3611
Video Open and Play 0.9243 1.0233
Word Close 0.3235 0.6523
Word Maximize 0.8234 0.9231
Word Minimize 0.8347 0.8662
Word Modify Document 0.4562 0.3523
Word Open 6.7346 7.0243
Word Save 2.0423 2.1535

As you can clearly see, the metrics are relatively similar. The thing
that completely stands out is the fact that the AMD-based
configuration reaches a much higher density (98 vs 68) before hitting
a threshold. Up to these densities, applications responded quite well
on both configurations, and the user experience was acceptable. As
soon as more virtual desktops were loaded, the CPU ready times
began to increase on the physical hosts and the performance started
to drop significantly.

Power user

Next, it was time to run the same benchmark for the power users.
Alongside the task worker applications, we ran the benchmark for
the additional applications, as well. Due to the required resources
for the power users, we used the following configuration:

• Windows 10 1809
• 6 vCPUs
• 12 GB RAM
• vGPU T4_1B profile

The result was quite interesting. Where we expected to see the
same difference in density between the AMD and the Intel system,
we saw something different. The following table shows the
application results for the power-user-specific apps (as the other
apps resulted in quite similar metrics).

Action DL380G10 (62 users) DL385G10 (51 users)
WebGL App 1 load 0.9432 1.1354
Single-Threaded App 1 Open 0.2236 0.3245
Single-Threaded App 1 Read 0.6345 0.8323
Single-Threaded App 1 Write 0.7023 0.8932
Single-Threaded App 1 Close 0.5034 0.7323

The performance on the AMD system completely dropped when I
added two more virtual desktops to it. The CPU ready times
reached averages of 14% (where 10% can be considered the maximum
threshold).

The difference between the AMD and Intel systems is significant.
The Intel host kept a relatively high density for power users, while
the AMD host was limited to 51 virtual desktops. In this case, the
VMs could offload rendering and encoding to the GPUs, so these
tasks weren’t even part of the workload.

Heavy Power Users

Lastly, we ran the benchmark for the heavy power users. These
users run two applications (next to the apps which the previous
use cases run) which are both graphically and computationally
intensive and require a lot of resources. The single-threaded
application is a research/data science application which requires a
high clock speed, a GPU with CUDA support, and 8 vCPUs to
achieve the best performance from the application. The virtual
desktops for this benchmark used the following specs:

• Windows 10 1809
• 8 vCPUs
• 16 GB RAM
• vGPU T4_2Q profile

The result was just as surprising as the result from the power user
run. Where I expected that the larger number of cores of the AMD
CPUs would have a positive impact, the opposite was true.

Action DL380G10 (28 users) DL385G10 (20 users)
WebGL App 2 load 1.1364 1.0124
Single-Threaded App 2 Open 1.0225 0.9524
Single-Threaded App 2 Read Code 0.7883 0.8423
Single-Threaded App 2 Compile Code 0.6239 0.7253
Single-Threaded App 2 Close 1.1023 1.3803

I think it’s quite logical that the number of virtual desktops
decreases due to the footprint of the virtual machines. 8 vCPUs
will have an impact on the CPU scheduler, and for sure that
impact is a lot bigger compared to the impact of a 4-vCPU
machine. What I didn’t expect at all was that the AMD CPUs
underperformed for complex use cases which require more than 4
vCPUs. After some research, I found out that this behavior is
expected, due to the architecture of the AMD CPUs. Of course, I
could try to explain in my own words why that is. But why do that
when one of my friends is an absolute expert in this area? For this
reason, I have asked Frank Denneman for an interview on CPUs,
NUMA boundaries, and the impact on virtualization.

INTERVIEW WITH FRANK DENNEMAN

If you work in virtualization and don’t know Frank, you probably
lived in a cave on Hoth. Frank has been working in virtualization
for a gazillion years. He has written a large number of books, has
created numerous patents, has one of the most technically detailed
blogs, and foremost is just an easily accessible person. Oh, and let
me mention his “Chill the f*ck out” Spotify playlist again, because
that playlist keeps me focused for hours when I need to write
books like this ☺. You can find the playlist on Spotify.

Me: You have been involved with VMware technologies in all
sorts of roles. You have been in the seat of a customer, worked as a
consultant, evangelized an innovative solution in the VMware
ecosystem, and are now responsible for R&D and strategy in
VMware’s Office of the CTO as a Chief Technologist. How have
you experienced VMware technologies in these different roles?

Frank: I was introduced to VMware technology by my great
colleague Peter Sentveld. At that time, I was managing a
comprehensive MS Exchange platform distributed across 28
countries. He shouted through the wall: hey Frank, do you want to
see something really cool? I remember saying to him: what can be
cooler than the architecture that I manage? He proceeded to show
me vMotion, and I was sold. From that moment, that was my
mission: learn VMware, know ESXi inside out. Soon I discovered
that learning ESXi comes with understanding shared storage and
networking, which I think is one of the great things about our
platform. You broaden your horizons. You learn more than only
the surface you touch and become versed in a much broader technology
stack than the traditional point-solution expert.

Due to our choice of shared storage, I was obligated to learn about
shared storage. That led to my first blog post, "Increasing the
queue depth," which kicked off the second phase in my career: a
job at VMware. I started at the Professional Services Organization,
assisting many organizations in designing their virtual
infrastructure. In this role, I learned about an interesting pattern.
For many customers, their procedures and requirements are so
evident that they expect the software to be aligned with them
automatically. But something so obvious for them might not be
evident to the engineering team responsible for the product. And
it's this gap between engineers' intention and customer use that
becomes a fertile area for innovation. It allowed me to express
these observations to the R&D organization, and soon I was a part of
the technical marketing team. In this role, I created collateral
explaining the products in depth while acting as a bridge between
engineering and customers: translating customer wishes to
engineering and engineering lingo into easily digestible stories for
customers.

For the last three years, I've been a part of the CTO office of the
cloud platform business unit, which is responsible for the core
features and multi-cloud strategies. This is where I focus on
generating strategies for modern workloads and developing new
product concepts. Understanding customer direction is essential.
That means I have to recognize how the customer uses our
products today and how our products should align with their
technology strategies for tomorrow.

Going from the moment where I saw vMotion for the first time to
being allowed to develop future product concepts has been a journey
I'm very proud of.

Me: You have an extraordinary interest in CPU architectures and
have written a great number of articles about them. How did this
topic spark your interest?

Frank: I think the interest in CPU technology started when I began
building my own machines. At that point, the CPU speed and
generation were the primary indicators of the raw performance of
the system. I'm talking about the end of the 486 era, the beginning of
the Pentium era, where I genuinely contemplated if I should buy
that 133 MHz Pentium to replace the Pentium 100. :) While being
the VI admin for a multinational in the mid-2000s, I had the
privilege to work with the first AMD Opteron systems, getting
introduced to both NUMA concepts and quad-socket systems.
This combination proved to be a steep but rewarding learning
curve. During my PSO architect and technical marketing time, I
was surprised quite a few people ignored the fundamentals and
treated compute architecture as trivial, all while still wanting the
best performance without putting in any effort to understand the
fundamental layers. It's like aspiring to be a pro athlete while
living on fast food and soda drinks. No success story will happen
if you do not get your fundamentals right. So, I took that as an
opportunity to cover the fundamentals and describe the intricacies
of modern CPU architecture. Seeing the attention that the NUMA
series got on my blog, the vSphere Host Resources Deep Dive book,
and the 60 Minutes of NUMA VMworld series, I believe a lot of
people don't treat CPU architecture design as trivial anymore.

Me: During the last couple of VMworlds, one of your sessions
always ends up in the top 10 of best attended and best evaluated
sessions. I’m talking about your NUMA deep dives. For people
who don’t know what NUMA (Non-Uniform Memory Access) is,
could you explain what it is and why it matters?

Frank: Before the AMD Opteron release (2003), multi-socket x86
server systems were Symmetric MultiProcessing (SMP) systems, where
two or more identical CPUs share common resources, such as memory
and I/O devices. All processors are connected through a central
memory controller to the shared memory pool, or through a
central PCI controller to the I/O devices. This architecture offers
uniform memory access (UMA). However, this architecture
couldn't keep up with the increase in the number of cores per
socket.

AMD changed the design paradigm of modern x86 architectures
by incorporating the memory controller into the CPU package. As
a result, a part of the memory pool directly connects to this
particular CPU package. The remaining memory within the
system connects to one of the other CPUs. Since all CPU cores can
access the complete memory address range of the system, some
memory addresses are accessed through the local memory
controller, while other memory addresses are accessed through a
"remote" memory controller. This topology creates a Non-Uniform
Memory Architecture (NUMA).

The CPUs in different sockets are connected through an
interconnect to provide access to each other's memory pool. The
interconnect is limited from a bandwidth perspective (roughly 60%
less bandwidth). On top of that, it is used for other traffic, such as
data transmitted between cores and PCI devices. The difference in
distance between memory pools impacts memory access latency
(about a 70% latency increase). For consistent application
performance, it is imperative to optimize for local memory access.
That means that, if the workload permits it, you keep the vCPU count
and memory capacity within the boundaries of a single NUMA node.

The NUMA scheduler within the ESXi hypervisor is considered
the best in the industry, but optimal performance starts with the
input for the VMkernel. As such, VM sizing remains incredibly
important. The impact might not be noticeable on a per-VM
basis, but the compounded performance regression can
dramatically reduce the efficiency and effectiveness of the virtual
infrastructure. Understanding the boundaries of the physical
layout, the behavior of the VMkernel, and the relationship
between all the layers from application to CPU package can help
you run a successful virtual infrastructure.
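
To make that sizing rule concrete, here is a minimal sketch that checks
whether a VM configuration stays within a single NUMA node. The host values
are examples, and the per-node numbers simply divide the host's cores and
memory by its number of NUMA nodes, which assumes a symmetric layout.

    # Minimal sketch: does a VM fit within one NUMA node of a given host?
    # Host values are examples; a symmetric NUMA layout is assumed.
    HOST_CORES = 36          # e.g., 2 x 18-core sockets
    HOST_MEMORY_GB = 512
    NUMA_NODES = 2           # one node per socket in this example

    cores_per_node = HOST_CORES // NUMA_NODES        # 18
    memory_per_node = HOST_MEMORY_GB // NUMA_NODES   # 256 GB

    def fits_single_numa_node(vcpus: int, vram_gb: int) -> bool:
        """True if the VM can be scheduled entirely within one NUMA node."""
        return vcpus <= cores_per_node and vram_gb <= memory_per_node

    print(fits_single_numa_node(8, 16))    # True: stays local
    print(fits_single_numa_node(24, 300))  # False: spans NUMA nodes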

Me: In the section about the Intel vs AMD benchmark, I wrote
about the significant difference in performance and density when
virtual desktops are equipped with more than 4 vCPUs on an
AMD host. How does the AMD architecture cause this density
decrease?

Frank: The common thread in any multi-CPU architecture is the
distance to shared resources and the obstacles along the way. An
increase in distance equals an increase in time, and workloads will
face a reduced amount of bandwidth outside the NUMA node
boundary. To right-size workloads to fit inside the NUMA
architecture, one has to understand the boundaries of the local
domain. This used to be delineated by the socket. However, due to
scaling requirements and production limitations, these boundaries
have moved inwards, into the CPU package, making them harder to
understand. A history overview once again helps to understand this
move.

When AMD introduced the NUMA architecture to the x86 world, its
primary purpose was to break through SMP scalability challenges
and drive more parallelism. Intel relatively quickly followed suit
(6 years later) with the Xeon Nehalem line. Both Opteron and Nehalem
followed the same principle: multi-core within a single CPU
package. With each release, the core count and cache memory
increased. But there is a limit to the number of transistors and the
amount of logic a production process can incorporate within a CPU
package. And so, AMD was again the first to move towards a Multi-Chip
Module architecture. They started this trend back in 2010 with the
release of the Opteron Magny-Cours. To the untrained eye, the Magny-
Cours was a 12-core CPU package. It offered two billion transistors
and almost 20 MB of cache on-die. Compare this to the Intel
Westmere, which at that point could only offer six cores.

What's not to like? Well... it was one CPU package, that was for
sure. However, under the hood lay two six-core Istanbul CPUs,
with a HyperTransport (AMD brand) cache-coherent connection
between the two Istanbuls and an interconnect between the two
sockets. That meant that there were now four NUMA nodes in a
system with two sockets. With this concept, the IT world departed
from the notion that a NUMA node is equal to a CPU socket.
Twelve years later, and we still need to get used to this concept. As
a result, ESXi showed four NUMA nodes, and VM sizing needed to
align with the physical layout of the internal structure of the CPU
package to get the most consistent performance. Many
applications are not adequately NUMA-optimized. As a result,
application performance boils down to VM sizing. Unfortunately,
this concept was confusing to customers who switched from Intel
and their single-socket, single-NUMA-node design.

But AMD believes this is the future, and even within the modern
EPYC architecture, they are sticking to their guns with the Multi-
Chip Module architecture. The first design was an authentic roller
coaster ride. You can compare it to taking someone who was brought
up with the most elegant system designed by humanity, the metric
system, and all of a sudden making them use that arbitrary roller
coaster ride called the imperial system. In a world where we were
used to having a single CPU package in a socket, in which all cores
can access the same cache and memory controllers (Intel), you are
now entering this wild ride, where you have a single package with
multiple Zeppelin chips. Each Zeppelin contains two compute
complex (CCX) elements, two memory controllers, and an
interconnect to other Zeppelins. These compute complex elements
contain four cores that share the cache. If you can still follow, I
salute you. The most important thing to realize is that you now
have to deal with three independent domains within a system: a
last-level cache (LLC) domain per CCX, a NUMA domain, which
contains two LLC domains, and a CPU package containing four
NUMA domains. And because you typically have two sockets in
your system, multiply this feast by two.

The world was not ready for this design. Operating systems are
designed with the notion that every CPU within the same socket
can access all the cache and memory with a uniform memory
access pattern. The VMkernel uses relational scheduling
optimizations and attempts to schedule worlds (vCPUs) on CPUs
that share the same LLC domain, to take advantage of cache
latency instead of memory latency (10 ns vs 70 ns). But now, with
the AMD EPYC, the LLC domain was microscopic compared to an
Intel system. Some customer systems were seeing massive
oversubscription on a tiny part of the CPU package. Other
operating systems had similar problems. VMware immediately
started to work on optimizations, and with each update, new
optimizations are introduced. AMD took some feedback to heart
and introduced a simpler architecture in their second generation
(Rome) EPYC. Instead of four NUMA domains per socket, there
was now only one, yet there were still 16 LLC domains, with four
cores per CCX. And it's this layout that is detrimental to the
performance of VMs that go beyond four cores.
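
The arithmetic behind that statement is simple enough to sketch, using the
Rome-era numbers Frank mentions (the 64-core count is an example for a
top-bin part):

    # Illustrative arithmetic for the Rome-era EPYC layout described above:
    # cores are grouped in CCXs of 4 that share a last-level cache (LLC).
    CORES_PER_SOCKET = 64    # example: a top-bin Rome part
    CORES_PER_CCX = 4

    llc_domains = CORES_PER_SOCKET // CORES_PER_CCX  # 16 LLC domains per socket

    def ccx_spanned(vcpus: int) -> int:
        """Minimum number of LLC domains a VM's vCPUs must spread across."""
        return -(-vcpus // CORES_PER_CCX)  # ceiling division

    for vcpus in (2, 4, 6, 8):
        print(f"{vcpus} vCPUs -> at least {ccx_spanned(vcpus)} CCX(s)")

    # Anything beyond 4 vCPUs crosses an LLC boundary, which is exactly where
    # the power user and heavy power user VMs in the benchmark started to hurt.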

One thing is for sure, it all depends on the ultimate consumption


logic produced by the application. If the application is a well-
behaved NUMA-aware application and the VM is sized correctly
with the CCX boundaries in mind, you can have consistent
performance well beyond the size of four vCPUs. But most

135 VDI Design Guide Part ||


applications aren't well behaved or NUMA-aware, so you are
stuck with applications that have spatial and temporal memory
access that goes cross-(LLC) border. Now the biggest problem: if
it's not in the local LLC, it's somewhere else in that vast system
with countless interconnects. Every cache access beyond its local
cache boundary has to travel to the central I/O chip on the CPU
before it can access memory from somewhere else in the system.
This pattern is your performance killer right there.

Now the good news: the third generation has increased the size of
the CCX; AMD bumped the core count to eight. Very interesting to
see, and hopefully, more people are getting used to this roller
coaster ride and size accordingly. I'm sure they take the
recommendations in this book to heart.
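
To make that sizing rule a bit more tangible, below is a minimal sketch in Python (my own illustration, not Frank's) that checks whether a VM's vCPU count fits neatly within the CCX boundaries of a given EPYC generation. The cores-per-CCX values are the publicly documented ones; treat the rule as a starting point for your own testing rather than an absolute truth.

# Cores per CCX (LLC domain) for the EPYC generations discussed above.
CORES_PER_CCX = {"naples": 4, "rome": 4, "milan": 8}

def fits_llc_boundary(vcpus, generation):
    # A VM size is "aligned" if it fits inside a single CCX or spans a
    # whole number of CCXs; anything else crosses an LLC boundary.
    ccx = CORES_PER_CCX[generation.lower()]
    return vcpus <= ccx or vcpus % ccx == 0

for vcpus in (4, 6, 8, 12):
    for gen in ("rome", "milan"):
        verdict = "aligned" if fits_llc_boundary(vcpus, gen) else "crosses LLC boundary"
        print(f"{vcpus} vCPUs on {gen}: {verdict}")

A six-vCPU power-user VM, for example, crosses an LLC boundary on Rome (four cores per CCX) but fits comfortably on Milan (eight cores per CCX), which is exactly the kind of mismatch Frank's shakedown tests should expose.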

Me: You are one of the subject matter experts on core vSphere
features like vSphere HA and DRS. How have you seen the
different CPU architectures and innovations impact those core
features?

Frank: Other than shifting cache and NUMA domain boundaries,
CPUs haven't changed that much. Yeah, there were a few changes
at Intel moving from ring to mesh topology internally, but that
hasn't really changed how compute services are provided. And
because of this, no immediate changes were needed for HA and
DRS. What did impact the design of the DRS algorithm is the way
we consume the virtual topology and the functional requirements
organizations have placed on the virtual infrastructure. Many
customers moved away from the initial purpose of consolidating
active workloads. When we started to consolidate production
workload back in the mid-2000s, most workloads were only
consuming 5% of their server resources. This behavior helped to
move a lot of workloads into the virtual infrastructure. With
vSphere 4.0, many organizations treated their virtual
infrastructure as the critical and de facto platform. The majority of
workloads were deployed on ESXi, which resulted in a different
resource management approach. Little to no memory
oversubscription and reduced CPU oversubscription. DRS 1.0 was
designed with the consolidation of active workload in mind. And
thus, a mismatch was introduced between the goals of the
organization and the goals of the DRS algorithm. With vSphere 7,
the algorithm is new and focuses entirely on the happiness of the
workloads, i.e., what does the workload want, and which host can
provide these resources?

Me: What trends are you currently seeing in the field of x86 CPUs
and virtualization?

Frank: A couple of exciting developments will impact the way
data centers and clusters are designed in the future. AMD
provides high-density core designs (Rome, Milan), helping
organizations move away from multi-socket systems and
simplifying system designs. The SmartNIC concept can introduce
a layered compute paradigm. The ARM-based device handles all
the device I/O of the system and acts as a management control
plane layer for the InfraOps team, while the high-density multi-
core CPU handles the compute cycles of the workload.

Another exciting development is the shift in compute paradigms.
The single-instance application scale-up model was dominant in
the era between the 1980s and the 2000s. The distributed systems x86 scale-
out model was dominant from 2000 to 2020. Don't get me wrong,
distributed scale-out systems won't go away, but it's the workload
requirements that are changing the compute stack composition.

It is expected that GPU-accelerated workloads such as ML
workloads will be displacing x86-bound applications within the
enterprise data centers. This impacts the way we are going to look
at compute and memory resource management. The introduction
of ASIC-based or GPGPU-based accelerators means we
automatically have to deal with copious amounts of data passing
through the system on their way to the accelerator. How will we
fetch that data? Do we really need to pass it through the CPU and
its memory capacity before loading it into the GPU memory? Can
we figure out direct access to data sources, taking the CPU and its
local memory capacity out of the critical path of the data flow?
How about quickly attaching and detaching these accelerator
resources? We need to think about multi-level schedulers. Which
NUMA node, which ESXi host, which Kubernetes worker node,
which device, multiple devices, local devices, remote devices? All
these questions come into play when dealing with this layered
cake of CPU and memory resources.

Me: vSphere on ARM was announced at VMworld 2019. We have
seen some really cool use cases being presented based on the ARM
architecture. Where do you see the development in ARM going?

Frank: I think ARM will be a dominant player in the management
control plane layer. Many larger organizations will likely copy the
management philosophy of big cloud providers. They will
introduce a layer that allows them to control and manage the
systems while their consumers (developers and other IT staff using
self-service Kubernetes deployments) can consume the entire
server (if needed). This layer becomes a safe and secure substrate
for InfraOps to operate on.

Me: Let’s take it back to VDI use cases. In VDI, we mostly see
homogeneous VMs running on a host (because of how we design
desktop pools). With this homogeneity in mind, what
recommendations do you have when talking about CPU choices
and architectures?

Frank: Understand the workload behavior, their patterns, and the
policy introduced by the InfraOps team. If you have well-defined
sizing standards, you can start to match that to the internal CPU
design. I.e., if your power users receive a VM with six vCPUs and
you are planning to run it on an AMD Rome, try to test it out. Not
only the InfraOps team but let users do an entire test run and a
shakedown. Also, do these test runs with the targeted
consolidation ratio fully active. Doing a shakedown with a single
active user is not a shakedown. That's kicking the tires of the
fighter jet and then believing it can take the forces of a hot landing
on an aircraft carrier. If you're going to plan to run 100 power-
users on that type of host, with that VM size, then run that test that
way. That's when you see if your VM size fits the LLC boundary
and NUMA boundary well or if you have performance regression
due to inconsistent remote capacity access. It would be best to find
out how the workloads respond when the interconnects are
saturated, especially when using GPUs. Maybe you learn that you
need an AMD Milan or an Intel Xeon.

Me: If you have one key takeaway for people who use resource
pools as folders, what would it be? ☺

Frank: The Dutch answer: Don't ☺

The simple answer: if you are running a highly consolidated
environment and applying oversubscription, you might want to
reconsider this approach. It can happen that your organizational
structure acts as a denial-of-service generator. If you have copious
amounts of resources to spare and are nowhere near the limits of
your environment, there isn't much to worry about. To understand
which position you are in, add all vCPUs and memory
configurations running within the cluster and determine if it
exceeds the physical configuration. If this is a firm no, you can use
RPs as a folder. But only if it's a firm no (for your own sake, be
honest).
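
If you want to automate that sanity check, here is a minimal sketch using Python and pyVmomi (assuming the module is installed and you can reach vCenter; the hostname, credentials, and cluster name are placeholders for your own environment). It simply adds up the configured vCPUs and memory of all powered-on VMs in a cluster and compares them to the physical capacity, as Frank suggests.

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim
import ssl

# Lab only: skip certificate verification. Use proper certificates in production.
ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                  pwd="***", sslContext=ctx)
content = si.RetrieveContent()

# Find the cluster by its (placeholder) name.
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "VDI-Cluster")
view.Destroy()

total_vcpus = 0
total_vm_mem_gb = 0.0
for host in cluster.host:
    for vm in host.vm:
        if vm.config and vm.runtime.powerState == "poweredOn":
            total_vcpus += vm.config.hardware.numCPU
            total_vm_mem_gb += vm.config.hardware.memoryMB / 1024

phys_cores = sum(h.hardware.cpuInfo.numCpuCores for h in cluster.host)
phys_mem_gb = sum(h.hardware.memorySize for h in cluster.host) / 1024 ** 3

print(f"vCPUs: {total_vcpus} configured vs. {phys_cores} physical cores")
print(f"Memory: {total_vm_mem_gb:.0f} GB configured vs. {phys_mem_gb:.0f} GB physical")
if total_vcpus <= phys_cores and total_vm_mem_gb <= phys_mem_gb:
    print("Firm no: you could get away with RPs as folders.")
else:
    print("You are oversubscribed; don't use RPs as folders.")
Disconnect(si)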

The more detailed answer: resource pools are a part of the
entitlement calculation done by DRS. It's a parent-child
relationship. The cluster is the root level and acts as a parent for its
first connected layer. Elements on this layer can be resource pools
and VMs. If you place a VM in a resource pool (RP), the RP
becomes the parent of the VM. The VM is the child of the RP. Two
VMs in the same RP are children of the RP, and thus they are a
sibling of each other. A VM placed next to an RP (which happens a
lot) is a sibling of the RP. And here is where it gets messy. Because
they are siblings, they rival each other for the attention of the
parent construct. That means that DRS will look at their share
value (beyond any reservation) and determine their relative
priority. A resource pool was designed back in 2005, where the
largest VM size was a four vCPU with 16GB. That became the
internal sizing of an RP. Nowadays, a power user might receive a
VM with the same number of resources, which leads to a situation
where a modern VM can overshadow the needs of a collection of
VMs placed inside an RP. To battle this problem, vSphere 7 has a
new feature called scalable shares that allows DRS to
automatically recalculate the share value based on the VMs inside
the RP. The scalable shares feature is a great enhancement of the
RP concept, and I'm not only saying this because I'm one of the
(proud) inventors of this feature.
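
To illustrate the sibling rivalry Frank describes, here is a toy calculation (my own simplification, not the actual DRS algorithm) of how contended resources are divided between an RP and a VM placed next to it. It uses vSphere's Normal share values: 1,000 CPU shares per vCPU for a VM and 4,000 CPU shares for a default resource pool.

# Toy model: siblings receive contended resources proportional to their shares.
def entitlement(siblings):
    total = sum(siblings.values())
    return {name: round(shares / total, 2) for name, shares in siblings.items()}

# Classic RP: a fixed 4000 shares, no matter how many VMs it contains,
# competing with a single 8-vCPU power-user VM (8 x 1000 shares).
print(entitlement({"RP with 20 x 2-vCPU VMs": 4000,
                   "power-user VM (8 vCPUs)": 8000}))

# Scalable shares: the RP's share value grows with its children
# (20 VMs x 2 vCPUs x 1000 shares = 40000).
print(entitlement({"RP with 20 x 2-vCPU VMs": 40000,
                   "power-user VM (8 vCPUs)": 8000}))

In the first case, the single sibling VM is entitled to two-thirds of the contended resources, starving the twenty VMs inside the pool; with scalable shares, the pool's entitlement reflects what it actually contains.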

Me: If people want to know more about NUMA and the impact of
AMD and Intel CPUs on virtualization, what sources would you
recommend?

Frank: If you genuinely want to understand the architecture, read
the reviews posted on AnandTech by Johan de Gelas. He really
knows his stuff. Other than that, I can point you to some of the
resources that I created. Let’s not forget the lightboard session we
created together, available on YouTube:

https://youtu.be/VnfFk1W1MqE

Another good intro is the "60 Minutes of NUMA" VMworld
sessions. In 2020, I covered AMD and Intel architecture as well as
the influence of NUMA on the data path of PCI devices. If you like
to read instead of watch, I wrote a multi-part series on CPU
architecture on my blog frankdenneman.nl and extended it to 300
pages of CPU and memory architecture in my book, the vSphere 6.5
Host Resources Deep Dive. For all NUMA-related content, I can
point you to https://numa.af.

If you want to know more about Frank and the stuff he's doing,
follow him on Twitter (@frankdenneman) or on his blog:
https://frankdenneman.nl/

GPUS
Of all hardware components, GPUs are my absolute favorite. The
impact a GPU can have on the user experience is insane, and quite
often underestimated. It’s not just that a session runs smoother, a
GPU can ensure that a session can become a bit more tolerant to
latency and can even improve your density (under the right
circumstances). I dedicated a huge section of the VDI Design
Guide to the design of acceleration and remote display protocols to
hopefully take away those possible misconceptions. What has
happened since 2018 in the GPU space? A lot! While in 2018 we
were full of excitement waiting for NVIDIA’s competitors AMD
and Intel to launch more GPUs for the datacenter, in 2021 we are
partly still waiting, and partly disappointed. Are AMD and Intel
still relevant in the GPU space? At this moment, I don’t think they
are. And, judging by your reactions, you didn’t either.

What the future will look like, no one knows, but let’s dive into the
options first.

NVIDIA

Let’s start with NVIDIA, first. I think NVIDIA did an excellent job
in continuously improving and innovating their datacenter GPUs.
They finally made a split between their compute-only accelerators
and the ones purpose-built for graphical workloads. While
accelerators can handle both of those workloads perfectly fine, an
accelerator specific to graphics can benefit from a different
architecture compared to the compute ones (and vice versa). I
guess that NVIDIA’s focus area is more and more shifting from
gaming to AI. The release of their DGX platform (monstrous
supercomputers used for computational workloads), the NVIDIA GPU
Cloud (NGC, an online repository where you can find Docker
images with entire AI applications), and the acquisition of
Mellanox are some of the examples which show that renewed
focus. I’m wondering what their idea was when they created the
CUDA framework. Was it just a fun project to run computational
instructions on a GPU? Or did they know back then that they
would shape the future of computational acceleration like it is
now? To top it all off, NVIDIA announced an extensive partnership
with VMware during VMworld 2020. In the partnership they are
developing a solution to commoditize AI platforms with
NVIDIA’s GPUs, network interfaces, NGC platform, and
VMware’s VMware Cloud Foundation infrastructure platform
(VCF) and Tanzu platform for container/application management.
I’m a strong believer in innovations like this, because you want to
move away from customized, whitebox solutions which host your
mission-critical applications. The key is running them on an
enterprise-grade platform which can be supported in the same
way as your current virtual infrastructure. What does this have to
do with VDI? Not a lot. Although I do think we might see
innovations from this front coming to the VDI stack, as well. One
of the things which NVIDIA released in early 2021 is something
they call Multi Instance GPUs (MIG). MIG enables you to divide a
physical GPU into logical instances. vGPU does this too, but vGPU
does it only from a framebuffer perspective. The cores on a GPU
are scheduled through a GPU scheduler which resides on the
hypervisor layer. In the case of MIG, cores are directly tied to a
slice of framebuffer and won’t be shared. The advantage is that with MIG,
it is possible to have heterogeneous GPU profiles on a physical
GPU. With vGPU, you are stuck with running a single profile type
on a GPU engine (and most GPUs have a single GPU engine). The
MIG technology is really cool and has been supported since vSphere 7.0 U2.
MIG isn’t suitable for VDI (yet). Since GPUs for graphical
acceleration also contain encoders, RTX cores, etc., which are
directly tied to the graphical side of things, NVIDIA might need to
reinvent the wheel to enable such a strong improvement in
graphical acceleration.
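
To make the MIG concept a bit more concrete, here is a small sketch listing the publicly documented GPU instance profiles of the A100 40 GB (the numbers are NVIDIA’s; the code is just my own illustration of how a physical GPU can be carved up).

# Publicly documented MIG GPU instance profiles on an A100 40 GB:
# profile name -> (compute slices, framebuffer in GB, max instances per GPU)
MIG_PROFILES = {
    "1g.5gb":  (1, 5, 7),
    "2g.10gb": (2, 10, 3),
    "3g.20gb": (3, 20, 2),
    "4g.20gb": (4, 20, 1),
    "7g.40gb": (7, 40, 1),
}

for name, (slices, mem_gb, max_count) in MIG_PROFILES.items():
    print(f"{name}: {slices} compute slice(s), {mem_gb} GB framebuffer, "
          f"up to {max_count} instance(s) per GPU")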

NVIDIA did release some cool new stuff for us VDI people,
though. In 2018, GPUs based on the Maxwell, Pascal, and Volta
architecture were widely available for different types of use cases.
Specifically, the Tesla M10, P4, P6, P40, P100, and V100 were
available for a wide variety of use cases. Not long after the release
of the first VDI Design Guide, NVIDIA announced their Turing
architecture which brought us the Tesla T4 (which is now called
the NVIDIA T4) and the Quadro RTX6000 and Quadro RTX8000
cards. Early 2021 brought another lineup of products, so the
RTX cards have already been replaced by cards based on the Ampere
architecture, as has the T4, which remains the go-to GPU for VDI
workloads until the new cards are generally available.

Prior to the T4, customers mostly invested in the Tesla M10
because of the large amount of framebuffer, combined with 4
GPUs and a large number of encoders. From a density perspective,
the M10 could offer 64 to 96 accelerated users on a single host
(depending on the make and model of a host). As the M10
contained 4 separate GPU engines which also allocate 4 PCIe
addresses, adding more than 3 cards to a single host created PCIe
bus issues (due to the limited number of PCIe addresses available
in a single bus). The T4 is a half-height card with 75W power
consumption (versus 240W – 300W for the M10) and a 16 GB
framebuffer (versus 32 GB for the M10). What I’m currently seeing
a lot is customers running 4 to 6 T4s in a single host and achieving
a similar density, but with less power consumption and no PCIe
issues. While this works pretty well, luckily for us VDI enthusiasts,
NVIDIA finally surprised us during the GTC Spring 2021 event
with the announcement of the A16. The NVIDIA A16 takes all
that was good about the idea of the M10, but instead of 4
Maxwell GPUs, it now contains 4 Ampere GPUs with 16 GB of
framebuffer each. While the card has been announced, it is not yet
available (even for testing). And, with all of the production issues,
it might take a while before it is. When looking at the specs, it does
look promising:

• 4 Ampere-based GPUs
• 64 GB of framebuffer
• H.264, H.265, VP9, and AV1 CODEC support
• 250 Watt power consumption
• 2x the encoder throughput of a single M10
• PCIe Gen 4.0 dual slot

3 of these will fit in a single host, which could potentially get the
theoretical density to a blazing 192 users when using a 1B or 1Q
profile! This will hopefully decrease the cost of VDI and create a
GPU-for-all situation at an investment which is a no-brainer. The
biggest competing solution for NVIDIA GPUs isn’t the GPU of a
competitor, it’s just more CPUs. Quite often customers choose to
invest in additional hosts for the overhead caused by encoding and
rendering, which will offer enough resources for sure, but will
limit the user experience, as well.
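
As a back-of-the-napkin check of that 192-user claim, the sketch below derives the theoretical density from the specs above, assuming three A16 cards per host and framebuffer as the limiting factor (real-world density also depends on CPU, encoder capacity, and scheduler headroom).

CARDS_PER_HOST = 3
GPUS_PER_CARD = 4       # the A16 carries four Ampere GPUs
FB_PER_GPU_GB = 16      # 16 GB of framebuffer per GPU (64 GB per card)

def theoretical_density(profile_gb):
    # Each physical GPU runs a single profile type; users per GPU is
    # simply framebuffer divided by the profile's framebuffer size.
    return CARDS_PER_HOST * GPUS_PER_CARD * (FB_PER_GPU_GB // profile_gb)

print(theoretical_density(1))   # 1B/1Q profile -> 192 users per host
print(theoretical_density(2))   # 2B/2Q profile -> 96 users per host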

For customers who do decide to follow the GPU route for
graphical acceleration, there are currently quite a few options. In
the list are current and previous architectures, along with
NVIDIA’s description of the use case:

• A40 (Ampere): Mid-range to high-end 3D design and creative
workflows with NVIDIA Quadro Virtual Data Center Workstation
(Quadro vDWS). Virtualized AI with NVIDIA vComputeServer
(vCS). Upgrade path for the Quadro RTX 8000 and Quadro RTX 6000.
• A10 (Ampere): Mainstream graphics, mid-range 3D design and
creative workflows with Quadro vDWS.
• A16 (Ampere): Successor to the M10. Knowledge workers using
modern productivity apps and Windows 10 requiring the best
density and total cost of ownership (TCO). Multi-monitor support.
• RTX 8000 (Turing): High-end rendering, 3D design, and creative
workflows with Quadro vDWS.
• RTX 6000 (Turing): Mid-range to high-end rendering, 3D design,
and creative workflows with Quadro vDWS.
• T4 (Turing): Entry-level 3D design and engineering workflows
with Quadro vDWS. High-density, low-power GPU acceleration for
knowledge workers with NVIDIA GRID software. AI, deep learning,
and data science with vComputeServer licenses.
• M10 (Maxwell): Knowledge workers using modern productivity
apps and Windows 10 requiring the best density and total cost of
ownership (TCO). Multi-monitor support with NVIDIA GRID vPC
and vApps.
• P6 (Pascal): For customers requiring GPUs in a blade-server form
factor. Ideal upgrade path for the M6.

Next to the hardware side of things, they also continued to
innovate in the software aspect of GPUs. Instead of letting you
check the release notes out for yourself, I created a list of
highlights, based on the new stuff which was added after the first
VDI Design Guide (please note that I didn’t mention all the new
supported hypervisors and brokers):

vGPU 7.0 - 7.5:

• vGPU vMotion was introduced. In the first book I
speculated a bit because I knew what was coming, I just
wasn’t allowed to talk about it yet ☺.
• Multiple vGPUs in a single VM.
• Extended metrics being available for monitoring.
• NGC support.
• Support for the NVIDIA T4.

vGPU 8.0 – 8.6:

• Support for the NVIDIA Quadro RTX6000 and RTX8000.
• 4K support in vGPU 1B profiles by using 1B4 profiles
(finally!).

vGPU 9.0 – 9.4:

• vComputeServer was introduced as a new license model
that supports compute-only VMs and enables you to
license everything based on physical GPUs instead of
VMs/users.
• ECC memory support.
• Configurable time slices for equal share schedulers and
fixed share schedulers (which can be seen as creating QoS
policies for your individual workloads).

vGPU 10.0 – 10.4:

• 4K support in 1B4 profiles is now deprecated (uhm, what
now?).
• 4K is now natively supported in 1B profiles (ahh, nice!).
• Increase in the maximum number of virtual display heads
supported by -1Q, -2B, and -1B vGPUs:
o All -1Q vGPUs now support 4 heads instead of 2
heads.
o All -2B vGPUs now support 4 heads instead of 2
heads.
o All -1B vGPUs now support 4 heads instead of 1
head.
• Flexible virtual display resolutions. Instead of a fixed
maximum resolution per head, vGPUs now support a
maximum combined resolution based on their frame
buffer size. This behavior allows the same number of
lower resolution displays to be used as before, but
alternatively allows a smaller number of higher resolution
displays to be used.
• Virtual display resolutions greater than 4096×2160.
• 10-bit color.
• Changes to allow cross-branch driver support in future
main release branches.

vGPU 11.0 – 11.3:

• Licensing grace period for unlicensed virtual GPUs and
physical GPUs (an unlicensed virtual GPU or physical
GPU initially operates at full capability, but its
performance is degraded over time if a license is not
obtained). Although it’s not the primary use case, this
helps a lot in DR scenarios when the license server can’t be
reached. It will give you some extra time to restore a
license server or switch over to a new one.
• NVIDIA A100 support.
• Support for Multi-Instance GPU (MIG).
• Support for GPUDirect technology remote direct memory
access (RDMA).

vGPU 12.0 – 12.2:

• Support for Unified Memory with NVIDIA vGPU.
• Support for GPUDirect technology remote direct memory
access (RDMA) on VMware vSphere.
• NVIDIA A10 support.
• NVIDIA RTX A5000 support.
• Support for NVIDIA GPU Operator (for K8s platforms).
• Support for NVIDIA NVSwitch on-chip memory fabric.
• NVIDIA A40 support.
• NVIDIA RTX A6000 support.

As of this writing (June 2021), the NVIDIA A16 isn’t supported
from a vGPU software perspective, but I’m pretty sure that
support will follow as soon as we approach the release of
the GPU.

Although I don’t really agree with the whole license structure
which NVIDIA introduced directly after GRID 2 (the vGPU
predecessor), there is an upside to this, as well. With the revenue
they get out of the licenses, you can see that NVIDIA does
continue to develop and innovate in the vGPU platform (which in
particular benefits its software and configuration options). In
case you are wondering how AMD and Intel are doing, that’s a bit
less comprehensive, I’m afraid.

AMD

At VMworld 2018, AMD announced their new flagship graphics
accelerator for the datacenter: the Radeon Pro V340. I honestly was
quite excited because it was the first real attempt by a company
other than NVIDIA to release a new GPU for VDI use cases. The
V340 supported both H.264 and H.265, had a 32 GB framebuffer,
and might have been a serious opponent for NVIDIAs offerings.
Might have been indeed… The card was announced at VMworld,
and I even saw it with my own eyes, live in action at AMD’s booth.
Unfortunately, quite soon after the release they discontinued the
V340 for regular VDI use. Instead, it’s now only available in cloud
offerings. The V340 and all other previous offerings are currently
unsupported by VMware (the VMware HCL has support up to
vSphere 6.5 for AMD GPUs). What’s in store for the future is quite
unknown at this moment.

Intel

Where AMD possibly stopped investing in GPUs for on-prem
solutions, Intel has been working on a whole new series of GPUs
for multiple use cases. In 2019, the first rumors about their Xe
GPUs came online and along the way, little by little, more news
has become available. Long story short, there isn’t a GPU for VDI
from Intel yet. The question here is whether Intel will develop one
for a niche use case such as VDI. Accelerators for AI are
increasingly becoming more popular and with the announcement
of their Ponte Vecchio GPU they are also jumping in the
computational accelerator space. If you google for Intel Arctic
Sound you will find some more articles on the GPU projects from
Intel, but no final products yet.

When looking at the graphics side of things, their first discrete
graphics card (the Intel Xe DG1) for normal PCs has been released
and is available from certain OEMs only. In 2020 they announced the
Intel SG1, which basically is a server GPU with 4 DG1s on a single
card. It might be the successor to Intel’s VCA series of datacenter
GPUs, which were capable of running 44 Full HD streams at 30
frames per second, and potentially could be used as an accelerator
for VDI. Let’s hope the SG1 will be suitable for VDI acceleration as
it might finally bring some competition to the table. As this is all
speculation at this moment, we just have to wait and see. Fingers
crossed…

NETWORKING
I’m no expert at networking, nor do I have ambitions to become
one. I always found networking a topic which you need to
understand on a conceptual and logical level, and for some aspects
(like connection protocol tuning) you need to understand a little
bit more. Networking to me was something that was offered by a
different team, just like power and cooling. When designing a
platform, on average you design it on top of an existing network.
When a networking team offers you 10 GbE from their core switch,
I always considered this to be sufficient, independent of the use
case. My interest in networking has changed. I got to spend a lot of
time with somewhat more complex use cases that really depend on
a fast network with different latency requirements. This has
opened my eyes (sorry it happened this late, but I’m a slow
learner ☺).

To emphasize this, I would like to briefly mention two different
use cases which will be covered in detail later on. The first use case
is the data scientist. When looking at the way a data scientist runs
applications and consumes data, this is quite often done in a
distributed way. Their (virtual) desktop runs applications, scripts,
or code against that data. The type and the size of the data
varies, but at most of the customers I have worked with, the data
is certainly not stored within the (virtual) desktop. At one of those
customers, they wanted to move the data scientist to a virtual
desktop. It would make much more sense to put the applications
and scripts next to the data on a faster network. Prior to the
migration, the 25 data scientists were working on their own 1 GbE
connection. They had a relatively good UX and good overall
performance within their apps. Now, I thought that if we moved
all of those 25 data scientists onto a single host with 2 x 10 GbE
fabrics, they would have better throughput and thus better performance
of their apps. When you take into account that the GPUs we used
in the VDI host had more computational power and the CPUs
were faster and had more cores, you can probably guess what
happened next. The 25 data scientists completely utilized the 10
GbE fabrics, even though Network I/O Control (NIOC) was used
on the host. Without going into complete detail about the rabbit
hole we went down, the solution to this new bottleneck I
experienced was found with 2 x 40 GbE RDMA-based network
interfaces and a seriously fast switch. After that, the data scientists
were able to work like never before (I know, it sounds like an
Amazing Discoveries commercial from Mike Levey and Tony
Little).

The other example was related to one of the latest projects I’ve
worked on. The primary goal was to move a team of video editors
to a VDI. Prior to the pandemic, the team was mainly working
from a central location at the customer’s HQ. After the pandemic,
they were spread across the country and experienced a massive
challenge in collaborating on their video projects. The customer
chose to move them to a VDI platform to enable them to
collaborate better. The first challenge was to virtualize their
workstations to the central platform. With just a crapload of
resources, that was fairly easy. The next step was to move their
project data to a so-called Media Asset Management (MAM)
solution and enable them to run their project workflows from their
virtual desktops, but with data which resides on the MAM
appliance. If I tell you that they normally work on 4K raw video
data, you can imagine what challenge we faced. Without spoiling
everything, the solution was also found in an insanely fast
network: 100 GbE network interfaces with a switch that could offer
enough bandwidth for the entire team to work simultaneously.
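
To put some rough numbers on that challenge, the sketch below estimates the bitrate of a single uncompressed 4K stream. The assumptions (10-bit 4:2:2 sampling at 25 frames per second) are illustrative; actual camera formats vary wildly, but the order of magnitude explains why 10 GbE runs out of steam so quickly.

# Uncompressed 4K (UHD) stream estimate.
width, height = 3840, 2160
bits_per_pixel = 20        # 10-bit 4:2:2 averages roughly 20 bits per pixel
fps = 25

gbps = width * height * bits_per_pixel * fps / 1e9
print(f"One raw 4K stream: ~{gbps:.1f} Gbit/s")           # ~4.1 Gbit/s
print(f"Concurrent streams on 10 GbE: {int(10 / gbps)}")  # only 2

Two editors scrubbing raw footage can already saturate a 10 GbE link, which is why the 100 GbE interfaces mentioned above weren’t a luxury.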

The price difference between 10 GbE interfaces and faster ones
isn’t necessarily an issue anymore. But switches might be a
different story. Companies like NVIDIA (Mellanox), Intel, and
Cisco have a wide variety of solutions which can satisfy basically
every need related to networking.

So, the question is, how is this innovation? Well, I don’t think it is
innovation from a datacenter perspective. But, since emerging use
cases like data scientists or solutions like VDI by Day, Compute by
Night might have a network constraint with (traditional) 10 GbE,
utilizing those technologies in a modern VDI stack can be
considered innovation.

INTRODUCTION TO USE CASES AND INDUSTRIES
The maturity of a Virtual Desktop Infrastructure platform has
reached a level in which I strongly believe it will be hard to find a
use case that can’t run on it. Sure, it will be hard to connect to a
VDI from the International Space Station or from the Mariana
Trench (although I think that if you have 11 kilometers/6.8 miles
of fiber cable attached to the sub, it should be possible). And
although some use cases might be suitable for a VDI, the outcome
is the most important thing. Choosing the right approach is
essential for a successful project. I don’t want to convince you to
run everything on VDI, but I do want to share my thoughts on
some new use cases that were developed in the last couple of
years. I’ve received some weird use case requests ever since I
released the first book.

Also, I have a never-ending urge to prove that something can be
done if I get challenged. A great example of this revolves around
a project I did a while back. The customer had a set of
applications which required a lot of CPU, RAM, and GPU
resources. The main application shows real-time images from a
wireless camera, and they use those images to control a robotic
arm, but in incredibly small steps. The controls are very sensitive
but need immediate action in case the operator detects something
in the real-time image. I know it’s a bit vague, but I’m not allowed
to go into any more details due to the nature of their business.
What you do need to know is that the operating platform was a
single point of failure. The apps ran on a physical Windows 10
box which, in case it failed, needed to be replaced by a new one.
This process could easily take 4 hours, which could potentially
cost a lot of money due to downtime.

I was part of a presales team, and our main goal was to first see if
their use case was viable for a more modern way of management,
increase the availability, and reduce the potential downtime in
case of a failure.

Although it was clear to me that we could probably solve this
through VDI, the non-IT teams from the customer weren’t that
convinced. We weren’t allowed to run a proof of concept with
their applications, either. So, we needed to figure out a way to
convince them. And so, we did. ☺

We wanted to create a similar use case that had the same
requirements:

• The solution needs to have high availability (let’s say at
least 99.9%).
• The solution must support rich graphics with a high frame
rate (at least 45 FPS).
• The solution must be capable of running with a maximum
latency of 20 ms round trip.
• The solution must be able to work with network-attached
robotic arms.

If you think of a use case that needs rich graphics with a high
frame rate and a low latency, the first thing that comes to mind is a
gamer. Without going into a lot of details (you can find these in the
Gaming on VDI section), we built the F1 simulator on VDI. From a
rich graphics perspective, we could perfectly show how smooth
the game played in the virtual desktop (with 60 frames per
second). From a latency perspective, we introduced a USB-
redirected steering wheel and gas/brake pedals. Any latency will
be noticed in a split second. So, being able to deliver a great user
experience in such a game would prove the point. We
showed the customer the F1 simulator on VDI, and after driving a
couple of laps themselves, they were convinced this would be the
right solution for the job. At first, it might not make a lot of sense
to build something like this. But it does have its use cases.

Eventually, we ran an actual proof of concept with their own
applications, which was highly successful in most parts. The only
thing that kept us from going fully into production was licensing.
Their applications simply weren’t allowed to run on a virtual
machine due to a mobility aspect and the decoupling between a
physical box and an Operating System with applications. Still, it
was one heck of a ride and gave us the opportunity to use the F1
setup in road shows and VDI events, as well.

The following sections are entirely dedicated to a couple of those
use cases and the stories around them. They also focus on industry-
specific solutions which we solved with VDI.

MEDIA DESIGNERS AND EDITORS
During the writing process of this book, I got in contact with a new
customer. They reached out because they needed help with a
specific use case for their existing VDI platform. They are a
company that produces all sorts of content. The content could be
anything, from white papers to web content. Prior to the
pandemic, most employees were already able to work from any
location, but this specific use case wasn’t able to do so yet,
although you would expect otherwise because of their work. The
title obviously gives it away; the customer reached out to bring
one of their media design and edit teams to the VDI platform,
specifically, video designers and editors. They travel all around
the country to record, design, and edit videos. They used to
collaborate a lot on video projects, but since the pandemic, they
weren’t able to do that anymore because it’s quite hard to
collaborate on 4K video projects on physical machines spread
across the country.

Imagine one person shooting video and creating hundreds of
gigabytes of raw video. The first challenge was that they needed to
share the data with the rest of the team. The second challenge was that
they wanted to increase the level of security and implement a form of
data-loss prevention (DLP). The third challenge was that they wanted
to create a so-called Media Asset Management (MAM) solution to
store all of their media centrally and be able to use the entire
database for future video projects, as well (instead of having to
invest in stock videos, as archive images might be suitable for
certain topics).

In one of the first meetings, they asked if it would be possible to
even consider bringing the media team onto the VDI platform. I
strongly believe there isn’t a use case that can’t run on VDI; the
question is how much an organization is willing to invest to make
it happen. Even organizations like Netflix enable their video
editors to work remotely (through their own NetFX VDI platform),
but everything comes with a cost.

When I think media team, I automatically think about large
amounts of RAM, fast CPUs, and obviously GPUs. In order to edit
video, user experience is extremely important. Frame drops and
too much compression can impact the quality of their work, and
quality was an essential requirement for this challenging project. I had
worked with a video editing team at another customer before, but
that was with 1080p images instead of 4K. Also, they did work
from a central location, so the connectivity between the endpoint
and the virtual desktop was really good (high bandwidth and
hardly any latency). In the case of the new customer, latencies
varied between 20 ms and 80 ms because they work from
potentially any location in the country.

PILOT AND CHANGE APPROACH
Instead of just immediately explaining what the virtual desktop
and settings looked like, I would like to show what kind of
approach I use to start with an initial setup and work towards a
concept that satisfies the customer’s requirements and user
experience.

The approach uses a change management principle but is very
pragmatic. The important thing is to create a test procedure which
always has a similar outcome. The outcome itself is impacted by
the changes you apply to the begin state of the configuration, and
it also determines which changes you apply next. It’s very
important that you change just a single thing at a time. The
following steps show the approach
which can be used for any of the use cases described in this book.

1. Success Criteria

The first thing you need to do is determine the success criteria.
When is the solution successful? Compare it a bit to requirements
you need to validate. Let’s assume we are working towards a pilot
for the media team. For this team, the following success criteria
were defined:

• Connecting to the virtual desktop and being presented with
a desktop should not take more than 45 seconds.
• Users need to be able to work without any noticeable
performance delay.
• The solution must be able to work with content which is
stored in a centrally hosted MAM solution.
• Adobe Premiere Pro and After Effects need to run without
any UX issues (such as audio/video sync problems).
• During rendering jobs in Adobe Premiere Pro, other
applications should still be able to run without
performance issues.

Again, it’s important these success criteria can be validated. Make
them SMART (Specific, Measurable, Achievable, Relevant, Time-
bound) to ensure you are able to meet them (or not, but then the
proof of concept basically failed).

2. Test procedure

Define a test procedure which you can follow every single
iteration, but which also shows the impact of the changes you will
apply.
It doesn’t make sense to run iterative performance tests if you
want to validate availability procedures or if you don’t change
things that can improve the performance. This focuses on the
Specific, Measurable, and Relevant areas of the SMART success
criteria.

In our case, we defined a workflow which touches upon all of the
performance-related aspects of the success criteria. The workflow
went as follows:

1. Connect to the virtual desktop from your home location.
2. Start Adobe Premiere Pro.
3. Start a new project.
4. Import 4K media which is stored locally in the virtual
desktop into the project.
5. Play the media in the timeline.
6. Open up the MAM solution through a plugin in Premiere
Pro.
7. Scrub through the media.
8. Import a 4K video from the MAM solution.
9. Create a transition between the local video and the remote
video.
10. Play the media in the timeline.
11. Render the video and minimize Adobe Premiere Pro.
12. Open up Adobe After Effects.
13. Open an existing project and play the animation.
14. Close everything and exit the virtual desktop.

The defined procedure is the exact same in all individual tests. To
get a reliable outcome of the changes, every test is executed three
times. The average outcome is taken from the three tests.

3. Define the begin state

It’s important to define a begin state. Quite often, the begin state is
an assumed configuration which will most likely run the use case
as expected, or nearly as expected. Obviously, this depends on
your experience with certain use cases. If you have a lot of
experience in the field of healthcare, chances are you can easily set
up a new healthcare proof of concept. If you have a lot of
experience in the security side of things, piloting a use case for a
government won’t be that different from one for a commercial
organization that needs to protect its intellectual property
through a VDI.

In this case, the begin state was a bit different. The customer
wanted to see how these media use cases differed from their other
GPU-accelerated use cases. So, we used the virtual desktop
configuration from those use cases as a starting point, which
was the following:

Physical hosts:
• 2 x Intel Xeon Gold 6248R CPUs (24 cores/48 threads, 3
GHz clock speed).
• VMware vSAN All-Flash storage with Intel Optane
o (2 x DC P4800X for cache, 6 x DC P4510 for
capacity)
• 4 x NVIDIA T4s
• Mellanox Connectx-5 NICs (dual port 25 GbE)

Management components:
• VMware vSphere 7.0
• VMware Horizon 2012
• NVIDIA vGPU 12.1

Virtual machines:
• Full clone
• Windows 10 1909
• 4 Virtual CPUs
• 8 GB RAM
• T4_2B vGPU profile
• VMXNET3 Virtual NIC (1 GbE)
• Adobe Premiere Pro and After Effects CC

Connection Protocol:
• Blast Extreme
• H.264 Encoder
• minQP: 10
• maxQP: 36
• 30 Frames per Second

4. Run the initial test to set a benchmark

Before you know where you are heading, you also need to know
where you are coming from. This is why you should run the
workflow a first time and validate if the test procedure is capable
of initially running on the begin state. Of course, running it the
first time means you go through the procedures using three
iterations and record the average of the three iterations. What I
always do is document the steps and the result per step, followed
by a general outcome of the entire run:

Test: initial benchmark run with the begin state configuration

• Connect to the virtual desktop from your home location: 35
seconds (positive).
• Start Adobe Premiere Pro: starts fine (positive).
• Start a new project: no issues (positive).
• Import 4K media which is stored locally in the virtual desktop
into the project: importing a 150 GB video takes 5.5 seconds
(positive).
• Play the media in the timeline: audio/video out of sync
(negative).
• Open up the MAM solution through a plugin in Premiere Pro:
opening the MAM solution takes 2 seconds (positive).
• Scrub through the media: impossible, a lot of frames drop
(negative).
• Import a 4K video from the MAM solution: importing a 100 GB
video takes 6 seconds (positive).
• Create a transition between the local video and the remote video:
transition created in 2 seconds, adjusting the transition time
happens in real-time (positive).
• Play the media in the timeline: audio/video out of sync
(negative).
• Render the video and minimize Adobe Premiere Pro: rendering
works, but the system is fully occupied during the render process
(negative).
• Open up Adobe After Effects: unable to, due to a lack of system
resources (negative).
• Open an existing project and play the animation: N/A.
• Close everything and exit the virtual desktop: done (positive).

Outcome: The desktop can be reached within the stated connection
time, but due to a lack of system resources, the user experience is
negatively impacted.

5. Suggest a change

It might be that the outcome of the last run doesn’t meet the
success criteria. In that case, you want to improve something that
could get you closer to that goal. There are different opinions on
change management and the application of changes. In my
approach, I’m focusing on a single change at a time, the main
reason being that I want to be in control over the impact. If you
change multiple things like the number of vCPUs, a vGPU profile,
and Blast Extreme settings before starting another run, how will
you know which of the applied changes had which impact?

If you look at the outcome of the last run, my gut feeling says the
VM has a large vCPU resource constraint. We validated that by
checking the performance statistics in vCenter and came to the
same conclusion. The suggested change in this case was to add 4
additional vCPUs to the virtual desktop.

6. Apply the change

The next step is to document the change before applying it. I
always like to keep track of the initial state and document what
changed.

Begin state → new state:

• 2 x Intel Xeon Gold 6248R CPUs → Same
• VMware vSAN All-Flash → Same
• 4 x NVIDIA T4s → Same
• Mellanox Connectx-5 NICs → Same
• VMware vSphere 7.0 → Same
• VMware Horizon 2012 → Same
• NVIDIA vGPU 12.1 → Same
• Full clone → Same
• Windows 10 1909 → Same
• 4 Virtual CPUs → 8 Virtual CPUs
• 8 GB RAM → Same
• T4_2B vGPU profile → Same
• VMXNET3 Virtual NIC (1 GbE) → Same
• Adobe Premiere Pro and After Effects CC → Same
• Blast Extreme → Same
• H.264 Encoder → Same
• minQP: 10 → Same
• maxQP: 36 → Same
• 30 Frames per Second → Same

7. Run a new test with the implemented change

After applying the change, you start a run with the new settings
included and document the outcome.

Test: second run (added 4 vCPUs to the virtual desktop)

• Connect to the virtual desktop from your home location: 34
seconds (positive).
• Start Adobe Premiere Pro: starts fine (positive).
• Start a new project: no issues (positive).
• Import 4K media which is stored locally in the virtual desktop
into the project: importing a 150 GB video takes 5.4 seconds
(positive).
• Play the media in the timeline: audio/video is nearly in sync
(negative).
• Open up the MAM solution through a plugin in Premiere Pro:
opening the MAM solution takes 2 seconds (positive).
• Scrub through the media: impossible, a lot of frames drop
(negative).
• Import a 4K video from the MAM solution: importing a 100 GB
video takes 5 seconds (positive).
• Create a transition between the local video and the remote video:
transition created in 2 seconds, adjusting the transition time
happens in real-time (positive).
• Play the media in the timeline: audio/video is nearly in sync
(negative).
• Render the video and minimize Adobe Premiere Pro: rendering
works, but the system is fully occupied during the render process
(negative).
• Open up Adobe After Effects: unable to, due to a lack of system
resources (negative).
• Open an existing project and play the animation: N/A.
• Close everything and exit the virtual desktop: done (positive).

Outcome: The desktop can be reached within the stated connection
time, playing projects has improved significantly but still isn’t fully in
sync. During render jobs, the system is still fully occupied.

8. Repeat steps 5 to 7 until you reach a positive
outcome

The success criteria weren’t met yet, so we continued to improve
the platform by adding one change at a time. By repeating steps 5
to 7, you can work in a structured way to achieve your goal.

After the second run, we noticed that although the VM did have a
vGPU profile, the render process utilized a great number of CPU
resources. This quite often means that these processes haven’t been
offloaded to the GPU. Adobe Premiere Pro does support
offloading to a GPU, but this requires Quadro features. We also
noticed in Windows Task Manager that the framebuffer of the
GPU was fully utilized. Because of this, we suggested moving
towards a T4_4Q vGPU profile, instead.

Begin state → new state:

• 2 x Intel Xeon Gold 6248R CPUs → Same
• VMware vSAN All-Flash → Same
• 4 x NVIDIA T4s → Same
• Mellanox Connectx-5 NICs → Same
• VMware vSphere 7.0 → Same
• VMware Horizon 2012 → Same
• NVIDIA vGPU 12.1 → Same
• Full clone → Same
• Windows 10 1909 → Same
• 4 Virtual CPUs → 8 Virtual CPUs
• 8 GB RAM → Same
• T4_2B vGPU profile → T4_4Q vGPU profile
• VMXNET3 Virtual NIC (1 GbE) → Same
• Adobe Premiere Pro and After Effects CC → Same
• Blast Extreme → Same
• H.264 Encoder → Same
• minQP: 10 → Same
• maxQP: 36 → Same
• 30 Frames per Second → Same

After the change was applied, the third run showed serious
improvements.

Test: third run (added a T4_4Q vGPU profile to the virtual desktop)

• Connect to the virtual desktop from your home location: 34
seconds (positive).
• Start Adobe Premiere Pro: starts fine (positive).
• Start a new project: no issues (positive).
• Import 4K media which is stored locally in the virtual desktop
into the project: importing a 150 GB video takes 5.4 seconds
(positive).
• Play the media in the timeline: audio/video is in sync, but frames
seem to drop (negative).
• Open up the MAM solution through a plugin in Premiere Pro:
opening the MAM solution takes 2 seconds (positive).
• Scrub through the media: impossible, a lot of frames drop
(negative).
• Import a 4K video from the MAM solution: importing a 100 GB
video takes 5 seconds (positive).
• Create a transition between the local video and the remote video:
transition created in 2 seconds, adjusting the transition time
happens in real-time (positive).
• Play the media in the timeline: audio/video is in sync, but frames
seem to drop (negative).
• Render the video and minimize Adobe Premiere Pro: rendering
works fine and is even faster than on an existing MacBook Pro
(positive).
• Open up Adobe After Effects: starts fine (positive).
• Open an existing project and play the animation: animation starts
fine (positive).
• Close everything and exit the virtual desktop: done (positive).

Outcome: The desktop can be reached within the stated connection
time, playing projects is in sync, but frames drop. Rendering works
fine.

The video plays at 50 frames per second, but Blast Extreme is
configured with a maximum framerate of 30 frames per second. As
a result, the next suggested change was to increase the maximum
FPS to 60.

Begin state → new state:

• 2 x Intel Xeon Gold 6248R CPUs → Same
• VMware vSAN All-Flash → Same
• 4 x NVIDIA T4s → Same
• Mellanox Connectx-5 NICs → Same
• VMware vSphere 7.0 → Same
• VMware Horizon 2012 → Same
• NVIDIA vGPU 12.1 → Same
• Full clone → Same
• Windows 10 1909 → Same
• 4 Virtual CPUs → 8 Virtual CPUs
• 8 GB RAM → Same
• T4_2B vGPU profile → T4_4Q vGPU profile
• VMXNET3 Virtual NIC (1 GbE) → Same
• Adobe Premiere Pro and After Effects CC → Same
• Blast Extreme → Same
• H.264 Encoder → Same
• minQP: 10 → Same
• maxQP: 36 → Same
• 30 Frames per Second → 60 Frames per Second

After the change was applied, we again saw a serious
improvement in the user experience.

Test: fourth run (increased the FPS limit of Blast Extreme to 60)

• Connect to the virtual desktop from your home location: 34
seconds (positive).
• Start Adobe Premiere Pro: starts fine (positive).
• Start a new project: no issues (positive).
• Import 4K media which is stored locally in the virtual desktop
into the project: importing a 150 GB video takes 5.4 seconds
(positive).
• Play the media in the timeline: audio/video is fully in sync
without frame drops (positive).
• Open up the MAM solution through a plugin in Premiere Pro:
opening the MAM solution takes 2 seconds (positive).
• Scrub through the media: impossible, a lot of frames drop
(negative).
• Import a 4K video from the MAM solution: importing a 100 GB
video takes 5 seconds (positive).
• Create a transition between the local video and the remote video:
transition created in 2 seconds, adjusting the transition time
happens in real-time (positive).
• Play the media in the timeline: local media is played well in the
timeline, but remote media has frame drops (negative).
• Render the video and minimize Adobe Premiere Pro: rendering
works fine and is even faster than on an existing MacBook Pro
(positive).
• Open up Adobe After Effects: starts fine (positive).
• Open an existing project and play the animation: animation starts
fine (positive).
• Close everything and exit the virtual desktop: done (positive).

Outcome: The desktop can be reached within the stated connection
time, playing projects is in sync, but only for local media. Remote
media from the MAM solution still has frame drops. Rendering
works fine.

We took a look at the network interfaces and quickly noticed that
the 1 GbE network interface experienced congestion because of the
size of the remote media. As the MAM solution was connected
over 25 GbE, it would make sense to increase the bandwidth of the
virtual desktop.

Begin state → new state:

• 2 x Intel Xeon Gold 6248R CPUs → Same
• VMware vSAN All-Flash → Same
• 4 x NVIDIA T4s → Same
• Mellanox Connectx-5 NICs → Same
• VMware vSphere 7.0 → Same
• VMware Horizon 2012 → Same
• NVIDIA vGPU 12.1 → Same
• Full clone → Same
• Windows 10 1909 → Same
• 4 Virtual CPUs → 8 Virtual CPUs
• 8 GB RAM → Same
• T4_2B vGPU profile → T4_4Q vGPU profile
• VMXNET3 Virtual NIC (1 GbE) → VMXNET3 Virtual NIC (10 GbE)
• Adobe Premiere Pro and After Effects CC → Same
• Blast Extreme → Same
• H.264 Encoder → Same
• minQP: 10 → Same
• maxQP: 36 → Same
• 30 Frames per Second → 60 Frames per Second

The final PoC run showed the result we were hoping for.

Test: fifth run (changed the NIC to 10 GbE)

• Connect to the virtual desktop from your home location: 34
seconds (positive).
• Start Adobe Premiere Pro: starts fine (positive).
• Start a new project: no issues (positive).
• Import 4K media which is stored locally in the virtual desktop
into the project: importing a 150 GB video takes 5.4 seconds
(positive).
• Play the media in the timeline: audio/video is fully in sync
without frame drops (positive).
• Open up the MAM solution through a plugin in Premiere Pro:
opening the MAM solution takes 2 seconds (positive).
• Scrub through the media: scrubbing through the media shows a
normal response (positive).
• Import a 4K video from the MAM solution: importing a 100 GB
video takes 5 seconds (positive).
• Create a transition between the local video and the remote video:
transition created in 2 seconds, adjusting the transition time
happens in real-time (positive).
• Play the media in the timeline: audio/video is fully in sync
without frame drops (positive).
• Render the video and minimize Adobe Premiere Pro: rendering
works fine and is even faster than on an existing MacBook Pro
(positive).
• Open up Adobe After Effects: starts fine (positive).
• Open an existing project and play the animation: animation starts
fine (positive).
• Close everything and exit the virtual desktop: done (positive).

Outcome: The desktop can be reached within the stated connection
time, and playing projects is in sync for both local and remote media.
Rendering works fine.

RESULTS
The result was what we hoped for and perfectly showed that even a
use case as complex as a video editor can work on a virtual desktop.
Of course, there are some additional things to keep in mind:

• We have tested with 8 users at a time. Increasing the
density will for sure show bottlenecks on different
resources.
• We used a single 4K monitor in the tests. Increasing the
number of monitors will require more framebuffer or
maybe even a different GPU.
• In all tests, the latency between the endpoint and the
virtual desktop was between 15 and 25 milliseconds
(round trip). When the latency increases, frame drops
might occur, and Blast Extreme will need additional
compression so that frame sizes will decrease (adjust
minQP/maxQP). Also, switching to UDP instead of TCP
can have a positive impact on the user experience. A sketch
of how to apply these tunables follows below; see more
about Blast Extreme tuning in the section about Gaming
on VDI.
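
As a reference, the minQP/maxQP and framerate values used in the runs above can be applied on the Horizon agent through the Windows registry. The sketch below uses Python’s standard winreg module (run with administrative privileges inside the virtual desktop) and the commonly documented Blast key path and value names; verify these against the documentation of the Horizon version you run before rolling anything out.

import winreg

# Commonly documented location of the Blast Extreme agent settings.
BLAST_KEY = r"SOFTWARE\VMware, Inc.\VMware Blast\Config"

settings = {
    "EncoderMaxFPS": "60",  # lift the default 30 FPS cap, as in the fourth run
    "EncoderMinQP": "10",   # lower QP = less compression, higher quality
    "EncoderMaxQP": "36",   # upper bound on compression under load
}

with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, BLAST_KEY, 0,
                        winreg.KEY_SET_VALUE) as key:
    for name, value in settings.items():
        # Blast reads these as string (REG_SZ) values.
        winreg.SetValueEx(key, name, 0, winreg.REG_SZ, value)
print("Blast Extreme encoder settings applied; reconnect to take effect.")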

CONSIDERATIONS
• Editing media, especially 4K media, requires a lot of
resources. Fast CPUs are essential, preferably CPUs with a
>3GHz clock speed. The proposed density on the physical
machine will dictate the number of cores. The higher the
density, the higher the number of cores needs to be. In our
case, we used the Intel Xeon Gold 6248R as the customer
was already using this CPU for other use cases.
• As mentioned above, the density has an impact on
resources. The same goes for GPUs. The customer already
had T4s, and I was quite surprised by the performance for
this use case. When the number of monitors increases, I’d
probably choose the NVIDIA A40 instead, since the GPU
is faster and has a larger framebuffer.
• Don’t save on storage. In this case, we had insanely fast
NVMe drives for both the cache and capacity tiers.
Because working with large media has a huge impact on
reads and writes, choose fast flash which has proven its
endurance.
• Working on remote media sounds interesting because it
can decrease the investment in storage for the virtual
desktops. While it will for sure work, the network between
the centrally stored media and the virtual desktops needs
to be FAST and preferably with low latencies. What the
network should look like depends on your situation. I
would probably use a tool like Liquidware Stratusphere to
record network-related metrics and determine what your
own network should look like.

INTERVIEW WITH ANIRBAN CHAKRABORTY

In early 2021, I got in touch with Anirban Chakraborty. Anirban is a
Senior Product Line Manager at VMware and mainly responsible
for User Experience. In one of the internal VMware EUC sessions,
he presented some information about the plans for UX in Horizon,
and since this book focuses a lot on UX in different types of use
cases, it totally made sense to ask him for this interview.

Me: You are responsible for UX at VMware. How did you end up
at VMware?

Anirban: My passion for delivering a great user experience,
combined with several years of product managing enterprise
software. I have spent most of my career in enterprise software,
initially as an engineer writing code and then as a product
manager driving cloud products. It was when I was in the
enterprise content collaboration space that I realized my love for
delighting the end user and how being able to use the product I
manage, matters to me personally. VMware has always been a top
innovator in the enterprise space and has a wonderful company
culture so when I got the opportunity to be part of the EUC team,
it felt like the perfect career move.

Me: What does UX mean to you?

Anirban: To me, user experience is synonymous with user delight. It’s not just about usability but also making the user love the product and having a great experience with it. To me it goes beyond the functionality of enabling the user to just do their job; it is about making the experience engaging and memorable – the way it’s important for a movie or novel to not just tell a good story but also tell it the right way. It is about abstracting all the essential components underlying the product to make the whole greater than the sum of the parts. I strongly believe that the user experience of the product ties into the overall brand. The more we make the user experience simple, seamless, and aesthetic, the more we are able to delight the end user and the more memorable we become. In a nutshell, UX is key to adding value in the day-to-day of a working professional.

Me: How have you seen UX and end user satisfaction change in
the past years?

Anirban: The most significant change in user experience over the past few years has probably been in the enterprise software space. Gone are the days when user experience in enterprise software was an afterthought. As an example, you will notice that most enterprise apps today have consumer-grade UIs that are simple to use, intuitive, and actually enhance the end user’s job. User experience in the enterprise software space today is all about making the user productive and the experience delightful. It has a direct connection with efficiency in the workplace.

Me: The popularity of GPUs in a VDI is increasing. Do you think GPUs are essential to achieve a great UX?

Anirban: Yes, and more so in recent times. Most of us are aware of the resource benefits GPUs provide by offloading processing, thereby freeing up the CPU to take on more tasks. But even otherwise, a great rendering experience that was considered essential only for graphics designers, game developers, and video editors is becoming increasingly important for the regular task and knowledge worker as well. With the adoption of Windows 10, most Office and other productivity applications are performing as desired with GPUs at the backend. And with multiple vendors now offering lower-cost GPUs targeted at regular office workloads, GPUs are definitely gaining more adoption. So, yes, I would agree that GPUs are fast becoming an essential rather than a nice-to-have component.

Me: In this section, the customer wanted to run their video editors
on a VDI. Have customers surprised you with comparable use
cases in which the UX was beyond expected?

Anirban: Yes, absolutely. One of the examples that comes to mind is that of the trader workstation. With 6 monitors running applications and videos in parallel, it is incredible how GPUs truly enhance the VDI experience for traders. Another amazing example I have is that of VR applications. It is truly mind-blowing how the power of GPUs can be unleashed to deliver an immersive experience with virtualized collaboration applications on some of these VR headsets.

Me: Have we gotten the most out of the potential of the connection
protocols yet?

Anirban: I feel this is an ongoing thing. As technology evolves and remote workers demand more power, connection protocols will continue to evolve to meet their needs. So there really is no end to this potential. From the early days, connection protocols have come a long way from being just about establishing the connection, and there is a lot more that protocols can continue to do.

Me: Where do you see the connection protocols going in a couple of years?

Anirban: I feel remoting protocols will have an important role to play in the coming years as AR/VR technologies get more adoption in the enterprise. That would require delivering frames at an exponentially higher rate without compromising on resource consumption. If protocols are able to nail the user experience associated with VR, while also reducing resource consumption, that would be a huge win. This is one area of innovation I’m very excited about.

If you would like to know more about Anirban Chakraborty, you can find him on Twitter: @anirbanzonly


THE HEALTHCARE WORKPLACE
In mid-2001, I started a new job at a small IT company whose primary focus was on pharmacies and outpatient pharmacies. In that job, I basically traveled all over the country to install, maintain, and replace the modems they used for communication with general practitioners and other doctors who are allowed to prescribe medication. I think I did that for two years straight and really got to experience how healthcare worked and how important small things like a modem are in a huge chain of processes. I worked for 12 years at that same company and had different roles. The cool thing about that small company was that they offered solutions which were essential to the medical processes and involved not only pharmacies, but also hospitals, medical research, insurance companies, medical wholesalers, and pharmaceutical companies like the ones who are now involved in developing the vaccines for COVID-19. If you have never worked for an organization that is either part of that chain or services an organization in that chain, it’s hard to imagine how complex and comprehensive healthcare actually is. This complexity, and the fact that you can always improve something in a healthcare-related process (which can have a huge impact on patient care), has made it my favorite industry. After I moved to ITQ, I got the opportunity to work with many (academic) hospitals, both in the Netherlands and beyond. Where the impact of the services my previous employer offered was sometimes quite significant, the impact of the projects I now get to work on can sometimes be seen as a game changer.

The healthcare system is different in most countries, but the primary services are still the same. Patients need care, and the organization offering the care needs to be paid for it. Billing runs either through an insurance company, or the patient pays for it directly. The type and the quality of care can differ a lot, and this is where IT can make a difference.

WHY ARE IT PROJECTS FOR HEALTHCARE DIFFERENT?
First and foremost, lives might heavily depend on IT systems in a
healthcare environment. Keep this in mind.

When starting a new EUC project for a hospital, expect it to be quite similar to other industries. The “assessment, design, validation, deployment, and migration” methodology is still the same. The challenge is more related to the complexity and the scale of such a project, where scale might be the least of your problems. So, what is it that makes such a project complex? It isn’t a single thing; it’s the sum of all complexities. First of all, when looking at the Dutch healthcare customers I have worked with, you can clearly see that people have different interests and expectations. Although they have the best interest of the patient in mind, their own interest is also important. Like I explained earlier, people don’t like changes. If you change the way they have to work, expect resistance. So, people might be the biggest complexity. Employees at hospitals aren’t always the most technical, which impacts adoption. Something that directly ties to healthcare is related to people’s drive and passion. People who work in healthcare are there to help others in need. They are there to make people better. They are there to cure diseases. My general recommendation when talking to people in a hospital is to keep this in mind. It will help you understand why people do what they do and why they react how they react.

Next to people, a hospital has to cope with strict security regulations. In the Netherlands, we have the NEN 7510 regulation, but in the US, HIPAA (Health Insurance Portability and Accountability Act) is used for ensuring security compliance. I found a perfect description of HIPAA on digitalguardian.com:

The Health Insurance Portability and Accountability Act (HIPAA) sets the standard for sensitive patient data protection. Companies that deal with protected health information (PHI) must have physical, network, and process security measures in place and follow them to ensure HIPAA Compliance. Covered entities (anyone providing treatment, payment, and operations in healthcare) and business associates (anyone who has access to patient information and provides support in treatment, payment, or operations) must meet HIPAA Compliance. Other entities, such as subcontractors and any other related business associates must also be in compliance.

Without going into detail on both regulations, it’s important to cover obvious things like password policies, but also the more complex things like privacy control.


The following website shows a summary of the HIPAA security rule:

https://www.hhs.gov/hipaa/for-professionals/security/laws-regulations/index.html

If you compare security in a healthcare environment to that at a “normal” customer, you could argue that the security measures are stricter in a hospital. The complexity does, however, increase because many of the individual systems and applications that a hospital uses talk to each other. And not only that, but they sometimes also need to talk to external applications or services. If you have never worked with a hospital before, it might look like a massive rabbit hole. I have the utmost respect for security officers in a hospital, simply because of this. The connectivity to the outside world is the next thing I’d like to talk about.

Most EMR (Electronic Medical Record) applications still run on-premises. Epic is probably the most commonly known one on a global scale, where on a local level, we can find others as well. These applications are built around the way a hospital needs to focus on patients, and they support extensibility to other systems for things like medication requests, lab results, communication to other healthcare providers, etc. Most of those interconnected systems don’t have modern APIs and might even use proprietary communication services to talk to the EMR applications. The data they share with each other can also heavily vary. Medication information sent to a pharmacy is quite simple and might even be an XML file. Data coming from X-Ray or MRI machines is for obvious reasons very different and can also vary in size. Microscopic images can easily be multiple TBs in size. See where this is going? Now, if you know this, do you think all of those solutions were designed and built for VDI? If you are very lucky, some of them might be; some of them might offer reference architectures which were validated, but that’s just a handful. This is again one of the challenges which makes a healthcare EUC project different. The majority of end users will consume an EMR application, so you need to take this into account in terms of sizing. And since most of those apps are single-threaded, this means you need beefy hosts with powerful, fast CPUs to get a proper density on a host. More about that later.

Ever been in line at a pharmacy or reception at a clinic? What you will see there is that multiple employees are helping patients but share computers to do so. It’s quite normal for a handful of physical PCs to be shared amongst 10-15 people, the reason being that not all of these employees are constantly at a desk. If a patient is in line, the employee hears the patient out and could easily walk away to prepare medication, or to arrange treatment, etc. While the patient waits, the physical PC can be used by another employee. Sharing a PC like this requires something we call Fast User Switching. The session of the first employee is still active, but in the background or in another user session which is disconnected. They can quickly open up the first session by using a smart card, fingerprint, or something else. The important thing is that switching between employees has to be very fast, otherwise it impacts patient care. Although EMR applications were never built for VDI or RDSH platforms, you can imagine that quickly disconnecting and reconnecting sessions is something those platforms can excel at.

More information about the deployment of EMR applications on VMware Horizon can be found here:

https://techzone.vmware.com/resource/vmware-horizon-deployment-guide-healthcare


Hospitals and clinics that provide first aid or have emergency rooms will offer 24/7 services. This also means 24/7 SLAs with strict availability requirements. In Northern Europe, 99.9% uptime is quite normal. I have even seen cases where 99.99% uptime is required. Although everything is possible, it does have an impact on the architecture you choose and implications for your design. Assessing the SLA and validating all of the individual components in the chain to make sure they can all comply is necessary. I have seen multiple customers where 99.99% uptime was required, but their datacenter network needed to be upgraded to satisfy the required uptime. One story I’ll never forget happened at one of my first projects abroad. I got the opportunity to design the Horizon part of a large multi-datacenter infrastructure. Horizon 6 had just launched, and the customer wanted to build a Cloud-Pod Architecture (CPA) because they had a 200 Mbps WAN connection between the datacenters, which wasn’t sufficient for things like async replication. Because of the size of the project, the design and implementation took us many months. As I didn’t want to stay abroad for multiple months straight, I went home for the weekends. At some point, we were busy setting up desktop pools in both datacenters and initializing the PODs. My colleague was setting up load balancing for the Connection Servers and Security Servers (this was back in 2014), and Global Site Load Balancing (GSLB) so we could present both PODs to the entire organization, which was spread across the country. In theory, this was supposed to work. We weren’t allowed to actually test a failover, and the customer assured us that everything would work as expected. They had been running DR tests every year, and Horizon should just run as expected as they were confident about their network and VDI solutions. Spoiler alert: it obviously ended in a shit storm.

I normally don’t really believe in Murphy’s Law. In our final week at the customer site, my colleague and I were finalizing and testing some GPU stuff. It was the last day, at around 3:00 PM, and our flight would leave at 6:00 PM. We were just tweaking and tuning the UX, when all of a sudden, people came running into our office claiming there was a major outage. We were still in the platform but connected directly to a couple of desktops in a pool in our local test environment. After some IT people came storming in, it was quite clear that shit had hit the fan. In one of the datacenters that housed one of the PODs, the main fabric to the outside world had failed. Apparently, it’s also quite common in places other than The Netherlands that people who dig deep holes in the ground don’t always know how to avoid fiber cables. The whole datacenter had an outage, but luckily, we designed the platform to support such an outage. The customer also had an entire datacenter network that was redundant, so no issues, right? Well, although they had everything redundant on their side, and their routers, firewalls, and load balancers automatically tried switching over to the other datacenter, the ISP didn’t switch over. Apparently, the service they consumed from the ISP only offered a single circuit. When the fiber connection that was tied to that circuit went dark, it couldn’t switch over to the other datacenter, and thus no one was able to work. In the past they had run a couple of DR tests, but they never included the ISP in those tests. The company who managed the fiber connectivity couldn’t be reached because their support team was overloaded. It took three full days before the connection was restored and the customer could finally work again. I seriously don’t want to know how much this cost them, but having approximately 1,400 employees who aren’t able to work can cause a company to go bankrupt, just because a couple of people weren’t critical enough about their SLAs and DR plans. My colleague and I nearly missed our flight out. I was honestly surprised they let us leave, because the only thing we knew at that time was that the whole network had failed. My lesson learned here was to never trust a customer who doesn’t let you completely validate a DR scenario (or any business requirement for that matter).
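
Coming back to those availability numbers: a trivial sketch like the one below translates an SLA percentage into an actual downtime budget, which is a useful sanity check in SLA discussions. The percentages used here are just the examples from this section.

# Translate an availability SLA into a yearly downtime budget.
HOURS_PER_YEAR = 365 * 24  # 8760 hours

for sla in (0.999, 0.9999):
    downtime_minutes = (1 - sla) * HOURS_PER_YEAR * 60
    print(f"{sla:.2%} uptime -> {downtime_minutes:.0f} minutes of downtime per year")

# Output:
# 99.90% uptime -> 526 minutes of downtime per year (almost 9 hours)
# 99.99% uptime -> 53 minutes of downtime per year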

HOW DOES HEALTHCARE FIT IN THE FULL EUC PICTURE?


Healthcare is more than just VDI sessions being consumed from fat or thin clients. A traditional healthcare provider might still be able to run 95% of their workloads on a VDI platform, but when looking broader, most hospitals where I worked in the past couple of years are also innovating. It’s not just because a vendor like Microsoft ends support for Windows 7; it’s because they are embracing new technology and are using this new technology to improve their business processes and the care for patients. A perfect example I recently heard is IoT-related. One of my customers is a massive academic hospital that treats tens of thousands of patients per day. The hospital is quite large and has wheelchairs for patients to use all over the place. There should be plenty, but quite often they got complaints from both patients and nurses that wheelchair parking points were empty. On average, a wheelchair makes a one-way trip from the entrance to a certain area, or the other way around. This caused much frustration and thus needed a solution. What they did was equip every wheelchair with an IoT sensor and Wi-Fi beacon connectivity. All around the hospital, you can find giant monitors that display where the nearest available wheelchair is located. If too many wheelchairs are situated in a certain location, someone from facility management is paged to redistribute the wheelchairs. I think this is a brilliant example of how innovation can be used to increase patient care, and it also shows that healthcare customers need to look beyond VDI and fully embrace the possibilities of a Digital Workspace platform. You might not agree that IoT is part of EUC, but we’re still talking about devices (in this case, things) which need to be managed.

I’d like to dive into a couple of other examples of use cases that required things like Workspace ONE Access or Workspace ONE UEM to get a similar result. The full EUC picture includes all solutions in the entire Workspace ONE and Horizon Suite, but it also heavily leans on the VMware SDDC suite. While this section is dedicated to designing the ideal healthcare platform, I think it’s nearly impossible to cover everything in this section. I could probably dedicate a whole book just to healthcare. I might even do it when this book is finished. Let’s dive into a couple of examples of healthcare-related EUC use cases first.

Unified Endpoints



If you have ever had to visit a hospital for medical reasons, you might have noticed the large number of different devices which are in use by nurses, doctors, receptionists, logistics, technical support, medical imaging (such as X-Ray and MRI), pharmacists, etc. Some devices are used by a single person; some devices are shared by multiple people. Sometimes they are shared based on a shift, and sometimes shared with fast user switching. Some of these devices came with specific equipment (like medical imaging equipment) and quite often are delivered with a specific version of Windows 10. In my honest opinion, IT admins who are responsible for managing all of these different devices with all those different OSes and different applications deserve the utmost respect. They have to deal with so many variables that it’s relatively hard to manage such a device landscape. Because of the large diversity of use cases and their devices, it has always required different management strategies, as well. Most devices could easily be managed through the Windows AD and SCCM, while other devices could be managed through a more modern approach. Luckily, what we are now seeing is that customers are slowly moving away from the traditional way of managing endpoints (through SCCM or other PC Lifecycle Management solutions like Altiris). Traditional management heavily leans on Active Directory and permanent connectivity to manage and secure these devices. When the pandemic started and people needed to work from home, this quite often meant that VPN connectivity was required to not only access company resources, but also to receive security updates and let IT admins stay in control over those devices. And to be very honest, VPNs aren’t really suitable for those purposes. By the way, this is me being politically correct. Let me rephrase this. VPNs suck. Period. VPNs just introduce latency from a connectivity perspective, and complexity from a user experience perspective. And with all the possibilities introduced in Digital Workspace platforms, traditional VPNs are “Virtually Pointless Now” (as my great friend Spencer Pitts would say).

Let’s go back to managing those devices. What you would really like to do is introduce a device management strategy which:

• is independent of the type of devices a user is consuming


• is independent of the type and version of an OS which the
device is running
• is independent of the ownership type (BYOD versus
company owned)
• is independent of the location where a user resides
• doesn’t require a permanent connection to the AD to
receive updates and security policies
• can deploy applications to the device, without the user
being close to an app repository
• supports self-service where possible
• still can use a VPN, but in a transparent and modern way
(without crappy UX issues) and depending on the context
of the end user
• can support modern devices such as wearables and
VR/AR goggles
• is capable of being used within a zero-trust strategy
• is capable of integrating into your existing traditional
management strategy
• supports phones and tablets, as well

If you think a solution like this doesn’t exist from a single vendor, I
have some news. Please don’t see this as a sales pitch, but a
solution like Workspace ONE UEM in conjunction with
Workspace ONE Access and Workspace ONE Intelligence can
seriously offer you the ability to introduce a single device
management strategy for all the different endpoints you might use
in a healthcare environment.

One of the clients I recently worked with had already moved towards Workspace ONE a couple of years ago, primarily for the reasons I mentioned above. They wanted to move towards a single management strategy for devices and also make it possible to introduce BYOD without a big impact. This all succeeded very well, which led to nearly 25% of their devices now being BYOD-based. Sure, this includes a lot of smartphones, but some doctors and researchers wanted to use their own macOS, Windows 10, or Linux-based devices, as well, to run their corporate apps on.



Early in 2021, they wanted to run an experiment with VR devices. Some patients who are undergoing chemotherapy quite often have to wait many hours during their therapy sessions. The research team of the customer wanted to use VR goggles to offer some distraction to patients, but also wanted to see if certain VR scenarios could be used for pain relief. At the time of this writing, the studies are still running. The distraction use case has been experienced as positive, and thus the customer wanted to scale up from a proof of concept to a pilot. They invested in 100 VR devices, and you don’t want to manually manage all of these devices individually. And as those devices are offered to patients, you can also imagine that a device might need to be brought back to factory defaults when it fails. They reached out to us to see if it would be possible to manage those devices through Workspace ONE UEM, as well. I remembered VMware announcing that during VMworld 2019, and so we dove into this.

For Project VXR (which you can find in a later section), I worked with the VMware team to test some things out with an Oculus Quest, but the first Quest didn’t have proper device management support yet. Everything changed when the Oculus Quest 2 was released, and specifically the Oculus Quest 2 for Business. With the normal Quest 2, you would need a Facebook account in order to use it, but with the Oculus for Business models, a device management solution can be used to manage the device, as well. Workspace ONE UEM fully supports these devices. Without going into a lot of detail (I don’t want to spoil all the fun of the Project VXR section), the customer was able to fully manage all of the VR goggles. The goggles are configured in kiosk mode, so patients can just use the VR apps which are provisioned on them, and that’s it. Like I mentioned earlier, the use case involving the distraction of a patient during chemotherapy was successful and has been fully embraced by the customer. The pain relief studies look good, but no results have been published by the customer yet.

A use case like this shows the potential of unified endpoint management in a healthcare scenario. Being able to introduce new types of devices, while retaining an existing device management strategy, is essential in reducing complexity and enabling IT to also become a business partner to the organization.


Identity Management

What we saw during the pandemic is that (especially in healthcare) organizations started working together so the brightest minds could collaborate. Although this event can be seen as an accelerator for collaboration, the idea of medical specialists and researchers working for different medical institutes who would like to collaborate isn’t new. I have worked with two different customers who offer internships for researchers spread across both their institutes. In some weeks, an intern would work from a single institute, while during other weeks they could be working from both. This collaboration program was introduced because one of the institutes is an academic hospital, while the other is a research hospital.

Both institutes have their own IT department, digital workspace solution, AD, security policy, device management strategy, etc. The academic hospital fully embraced BYOD, while the research hospital was holding back a bit on this topic. The primary working location of an intern dictated where their primary digital workspace was located, as well, which meant their device was also fully managed by the IT team from that location. Oh, and something worth mentioning is that the AD domains from both institutes weren’t trusted, the reason being the complexity of both domains and their enormous history (both started at a Windows 2000 functional level) with tens of thousands of accounts. Another reason was that some of their networks overlapped in terms of subnets. Because the domains weren’t trusted, interns got a user account in both institutes, and two email addresses, as well.

Now, imagine if you were an intern at one of those institutes. You walk into the office in the morning, fire up your device, connect to the Wi-Fi network, log on to the domain, start your research applications, and you can work. Fairly simple, right? Now you need to get some information from the other institute, so you establish a VPN tunnel to the other institute, start a remote desktop connection, open up your applications, find the information, close down the remote desktop connection, done. Working from the other institute, it would look similar. Now, all of a sudden, those interns started to work from home quite often. If they then wanted to switch over from one remote desktop session to another, they had to disconnect the remote desktop session, disconnect the VPN, establish a VPN connection to the other institute, and start the remote desktop session. That process easily takes three minutes extra, every time. If they had to switch multiple times a day, you can imagine how that must have felt. This perfect example of a shitty user experience is how end users are introducing shadow IT into an organization. To keep quoting Mike Levey, there was indeed a better way to do this.

The idea came from one of my EUC friends, Huib Dijkstra. Both customers had already invested in Workspace ONE Access but were in the phase of migrating to it so it could become their main authentication portal for people coming from the outside. The idea of Workspace ONE Access is simple: you sign into the portal, and when you open up an application from the portal, your credentials are used to authenticate into the application. The SAML standard for identity federation enables an end user to work with single sign-on (SSO).

Now, Huib’s idea was simple. What if you could use single sign-
on between the two institutes and fully federate the identity from
one institute into the other? This would enable users to just sign in
to one platform and start applications at the other institute without
having to go through all of the authentication steps. It would then
also be possible to enable them to use a per-app VPN instead of a
device VPN and establish it when needed.

The following article from Peter Björk will help you build a similar environment yourself. Although it’s already from 2017, it is still very useful!

https://blogs.vmware.com/horizontech/2017/05/using-vmware-identity-manager-transform-users-active-directory-domains.html


The wonderful world of application sprawl

Many healthcare organizations have existed for decades. Some of them are still relatively small regional hospitals, while others grew from probably a couple of different departments and specialties in a single building to massive campuses with tens of thousands of patients being treated every day. Growth doesn’t happen in a single day and is mostly organic. Growth can be dictated by the population (The Netherlands went from 14 million people in 1979 to 17.5 million in mid-2021), by possible new diseases, or, for instance, by new treatment methods for existing diseases. Growth from an IT perspective doesn’t just mean that the number of employees increases; it also means that the number of applications increases. It’s just how stuff works, especially in healthcare, because of the complexity of the entire healthcare process. There isn’t a small number of applications which covers it all (like in finance or some forms of education), so the landscape at healthcare customers is in many cases massive. The introduction of shadow IT hasn’t helped a lot, in the sense that IT isn’t always aware of all of the mission-critical applications that are in use in a hospital. Sure, Dropbox might not be considered a mission-critical application, but I have seen many customers who have built their own Microsoft Access applications, Excel macros, or other fancy apps on which they now heavily depend. But how do you find out?

Throughout the years, I have conducted many desktop and application assessments at healthcare customers. With the use of Liquidware Stratusphere, it’s relatively simple to create a comprehensive overview of the entire desktop and application landscape. And yes, this will for sure expose shadow IT.


The first time I ran such an assessment, I was surprised by the outcome. It didn’t just show an insane number of applications in use; it also showed which ones were used and how they consumed resources. Now, in the VDI Design Guide I dedicated quite a large section to both applications and shadow IT, so I won’t go into a lot of detail on the approach or methodology of such an assessment. What I would like to cover is the outcome and why it surprised me. Like I mentioned, growth can be quite significant for healthcare organizations. And using the right tool (or in this case, application) for the job is essential to provide the best care and service to patients. With all of these facts in mind, we ran the assessment for 6 to 8 weeks. During these weeks, we also took the opportunity to have a conversation with IT and some of the different departments so we could get a context that would match the outcome of the assessment. Interestingly, IT thought they were relatively in control of the application landscape. Conversations with some of the departments gave me the exact opposite idea (that IT wasn’t always able to help them, and so they took matters into their own hands). The metric side of the assessment confirmed what IT and the various departments were telling me:

• they managed approximately 300 traditional applications
• they managed approximately 30 modern and mobile applications
• some departments managed approximately 80 specific applications
• they suspected they had approximately 50 shadow IT applications

We didn’t really know what to expect from a customer with 4,500 employees, of which 1,000 were researchers. Guessing the real numbers would have made a great pub quiz question. The outcome of the assessment was as follows:

• the number of traditional apps in use and managed by IT: 290, so this was relatively close
• the number of modern and mobile apps in use and managed by IT: 30, so this was spot on
• the number of known apps that were managed by departments was indeed approximately 80
• shadow IT is called shadow IT for a reason: over 800 applications were in use without IT knowing it

I don’t have to tell you what this outcome did to IT. First, they expressed a lot of disbelief, which luckily turned into a quest for a solution. Second, it gave us the opportunity to focus on helping the customer solve this challenge.

There are multiple ways and an endless number of tools which can help you normalize application sprawl. What you want to avoid, though, is introducing point solutions to manage all of the individual applications. This is where the full VMware EUC stack steps in. All of the different types of applications can be provisioned to an end user, and in such a way that shadow IT could become history. Here is an overview showing some of the types of apps and the matching delivery method:

Type of application                                                  Solution
Web-based applications with SSO                                      Workspace ONE Access
Web-based applications without SSO                                   Workspace ONE Access
Native Appstore applications (Windows 10, Android, iOS)              Workspace ONE UEM
Native installable applications (Windows 10, macOS, Android, iOS)    Workspace ONE UEM
Traditional applications on VDI                                      App Volumes
Traditional applications with isolation requirements                 ThinApp
Traditional applications with OS dependencies                        VMware Horizon with Instant Clones (base image)
Remotely delivered applications through RDSH                         VMware Horizon Apps
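
To make that mapping concrete, below is a toy sketch of this kind of decision logic in Python. It is a deliberate oversimplification of the table above; the app properties and their precedence are my own illustrative assumptions, and the real decision tree (see the diagram referenced below) covers far more choices.

# Toy sketch of the delivery mapping from the table above.
# Illustration only; not the full Application Delivery Decision Diagram.
from typing import NamedTuple

class App(NamedTuple):
    name: str
    web_based: bool = False            # delivered through a browser
    native_store_app: bool = False     # comes from an app store
    needs_isolation: bool = False      # conflicts with other apps
    has_os_dependencies: bool = False  # needs to live in the base image
    delivered_via_rdsh: bool = False   # published application on RDSH

def delivery_solution(app: App) -> str:
    if app.web_based:
        return "Workspace ONE Access"  # with or without SSO
    if app.native_store_app:
        return "Workspace ONE UEM"
    if app.needs_isolation:
        return "ThinApp"
    if app.has_os_dependencies:
        return "VMware Horizon with Instant Clones (base image)"
    if app.delivered_via_rdsh:
        return "VMware Horizon Apps"
    return "App Volumes"               # traditional app on VDI

print(delivery_solution(App("EMR client", has_os_dependencies=True)))
# VMware Horizon with Instant Clones (base image)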

The list is obviously a lot bigger. Instead of sharing the full list and the ideal delivery methodology here, I’d rather guide you to my blog. In 2015, I developed the first version of the Application Delivery Decision Diagram. The diagram has now evolved into a comprehensive flowchart which covers all of the choices you currently have within the entire Workspace ONE suite. Please find the latest version of the diagram here:

https://vhojan.nl/application-delivery-decision-diagram/



Hardware-accelerated desktops

Having to support a crapload of applications also means that you need to work with all of those applications to ensure supportability between them on your EUC stack. Many organizations choose VDI delivery due to the availability requirements and the LCM (lifecycle management) requirements for the majority of the application landscape. Now, running those hundreds of applications inside a virtual desktop brings another challenge to the table. Those applications weren’t built for a virtual desktop. They were built for a physical one. And physical desktops nowadays are all equipped with a GPU. This means they were tested, validated, and released with GPU resources being available on a client system. Please don’t be afraid that I’m going to say that you now need to invest in a GPU for your VDI because of this reason. What I am going to say, though, is that some of those Independent Software Vendors (ISVs) are little by little noticing that they can actually use a GPU to their benefit, not just to make their application look good, but also to offload calculations from the application to the GPU to improve performance. When someone told me that for the first time, I was quite skeptical. After doing a little bit of research, my skepticism turned into disbelief. Instead of explaining it all, I’d like to challenge you to run a little bit of the same research that I did. It’s fairly simple. Go to a Windows 10 machine that has an NVIDIA GPU. Log in to Windows and start a command prompt (cmd). Now navigate to:

“C:\Program Files\NVIDIA Corporation\NVSMI”

Now run nvidia-smi.exe and check the result in the table. The table shows an overview of all of the processes which utilize the GPU. The processes can be marked with a C (compute), a G (graphics), or C+G (compute + graphics) in the Type column. In case a process is using the GPU for computational reasons, the table will show either C or C+G.
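
The same check can also be scripted; a minimal sketch is shown below. The --query-compute-apps option is a documented nvidia-smi flag, but note that the NVSMI path above can differ per driver version (newer drivers also place nvidia-smi.exe in System32), so adjust the path to your environment.

# Minimal sketch: list the processes that use the GPU for compute.
import subprocess

NVSMI = r"C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe"

result = subprocess.run(
    [NVSMI, "--query-compute-apps=pid,process_name,used_memory",
     "--format=csv"],
    capture_output=True, text=True, check=True,
)

# Every row returned here corresponds to a process of Type C or C+G:
# an application that offloads computations (not just graphics) to the GPU.
print(result.stdout)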

With such an enormous list of applications being able to run on the virtual desktop, you can imagine that the chance is quite high that an application expects a GPU, which means it will do the same on a virtual desktop (you can fully assess this with Liquidware Stratusphere for Windows, macOS, and Linux). Considering a GPU for your VDI platform, or at least for the majority of your use cases, will ensure proper supportability in the future. My prediction is that more ISVs will invest in GPU offloading, simply because their applications are becoming more complex and users consuming those applications expect a good user experience, as well. With GPUs being generally available, it’s just a matter of time before more ISVs will be utilizing them. Of course, NVIDIA currently rules the GPU world and offers an optimized computational CUDA API, but others like Intel and AMD also offer computational support with OpenCL (just like NVIDIA). My guess is that a lot will happen in the coming years, which you need to take into account if you are investing in a new EUC stack now.

One popular innovation is built around GPU and CPU resource repurposing. Through a concept we call “VDI by Day, Compute by Night”, we were able to dynamically schedule those resources for both VDI and computational workloads. The section called “VDI by Day, Compute by Night” covers this concept in detail.

Video consultations and video calls

For medical reasons, my wife and I are regular guests at one of the biggest hospitals in the Amsterdam area. Once every three months we pay a visit to a specialized doctor, but when the first lockdown was announced, our type of appointment wasn’t allowed to take place physically anymore. Instead, a video consultation was organized, so our doctor was still able to see us. In theory, this is a great solution. They used some kind of web-based solution, which made it really simple for us to open up a link in an email and establish a video connection with the doctor. Because of the simplicity and straightforward approach in facilitating remote consultation, the UX for patients is great. There was only one challenge: the hospital offered a virtual desktop to all of their employees, which caused a low camera resolution at the doctor’s side of the solution. She could see us perfectly, but she looked like Princess Zelda from the 8-bit version of the game. During the first consultation, we just accepted the crappy quality, but the one that followed three months later was established from her phone. This led to a better UX for both sides, but yet again a form of shadow IT (as the solution was never meant to be used from a phone).

At first, it sounded like a challenge which could easily be solved by enabling the right form of offloading or simply by accelerating the desktop. A little bit of research led to a simple explanation for the doctor choosing her phone over the redirected camera. The web-based solution simply didn’t offer any offloading features, and the VDI simply didn’t have any GPUs. The result is that A/V support in a remoting protocol is available, but it heavily depends on CPU resources and quality settings enforced by IT.

The only possibility (in my opinion) for the hospital to offer a better UX is to add GPU support to the VDI platform and configure the connection protocol to fully use the assigned vGPU profile. Such an application is in most cases developed for devices with a GPU, and this one was no different: it simply performed best with a modern browser that can offload certain tasks to a GPU.

Now, for more mainstream solutions like Zoom and Teams, there are other options, too. Sure, using a GPU inside the virtual desktop is probably the easiest to implement and also the one that can be used from a wide variety of endpoints. The only thing is that you surely need to optimize the connection protocol for such a workload. Optimizing unfortunately isn’t just done by selecting a certain checkbox. What I would recommend here is to download the Blast Extreme Optimization Guide. It covers most of the options which you can tune to get the most out of the connection protocol for video conferencing solutions, accelerated through GPUs. You can find the guide here:

https://techzone.vmware.com/resource/vmware-blast-extreme-optimization-guide



Another option, which is becoming increasingly popular, is to use a specific plug-in for offloading to the endpoint. VMware created a set of plug-ins which create a connection between the unified communication client (such as Teams) running inside the virtual desktop and the one that runs on the endpoint. When someone calls you or you call someone, the video stream is directed to the local endpoint, so no additional video resources are required inside the virtual desktop. Although this might sound like a great advantage, there are still some ground rules:

• These optimization packs aren’t available for every endpoint and for every unified communication solution. Windows 10 has the biggest chance of supporting all of them, but Thin Clients could be a different ball game. Be sure to reach out to your manufacturer to find out what they support.
• The optimization packs will only work with the regular VMware Horizon Client. So, the HTML client won’t work with offloading (as of this writing).

You can find the optimization plug-in for Microsoft Teams and related information here:

https://techzone.vmware.com/resource/microsoft-teams-optimization-vmware-horizon



Zoom has also provided an optimization plug-in, which can be downloaded here:

https://support.zoom.us/hc/en-us/articles/360031096531-Getting-started-with-VDI

Monitoring

Like in any other industry, monitoring the EUC infrastructure is essential to ensure a great user experience, or to act quickly in case the user experience is impacted. When looking at tools and solutions, the VMware EUC stack does have some event monitoring, auditing, and user experience tools, but none of those tools give you an overview of the entire stack. Now, using tools like vRealize Operations or ControlUp (which VMware has a partnership with and offers as a separate SKU with Horizon) can give you a good start, but what makes monitoring in healthcare different is the need to include EMR applications in your monitoring stack, as well. To most hospitals, their EMR application is the most critical application of them all. It’s quite often also the most complex of them all and of course (due to the fact they are mostly single-threaded) the one that requires a shit ton of resources. If you designed your EUC stack around such an application, you definitely want to make sure both the application and the infrastructure are running perfectly fine.

When looking at monitoring of the infrastructure, many tools are already capable of doing this. The same is the case for user experience monitoring. Including the EMR application in the monitoring stack is something different. Some solutions will allow you to build custom sensors or metrics to include the applications in your stack. While this might work well, I always try to stay away from custom-developed code for such solutions, the main reason being that if the person who built the code leaves the organization, no one is able to manage the solution anymore (because the developer left without properly documenting what he or she did). Using solutions where you can include the metrics out of the box is in my opinion the best way to approach this.

vRealize Operations has the possibility to include those metrics by integrating the Management Pack for Care Systems Analytics, but this focuses on Epic only. More about the management pack can be found here:

https://docs.vmware.com/en/Management-Packs-for-vRealize-Operations-Manager/10/care-systems-analytics.pdf

As there is more than just Epic in the world, you might also be looking at a different solution for monitoring your EMR application. Another solution which provides out-of-the-box support for EMR solutions (in this case, for many different ones) is the Performance & Application Monitor from Goliath. Goliath offers support for major systems like Epic, Cerner, and MEDITECH.

There are also application monitoring solutions, but they don’t integrate with the VMware stack, which makes it a bit more complex when trying to stay in control over the entire stack.



DESIGN CONSIDERATIONS
A lot has been covered already, but as the topic is quite complex,
there is a lot to consider. The following sections share some more
of the insights I gained in the last couple of years at several
healthcare customers.

General EUC considerations

• Don’t force everyone to work on VDI. For some use cases it’s just not the best solution. Especially when people are working with applications or peripherals which don’t really support remoting, try to manage them in a different way. Enrolling those devices into Workspace ONE so you are in control sounds like a better plan.
• When sizing for VDI, take concurrency into account. It’s always the best plan to run a full application and desktop assessment to find out what the concurrency looks like. If you aren’t able to run an assessment, be sure to at least estimate concurrency, especially when working with a 24/7 customer (see the back-of-the-envelope sketch after this list).
• If the customer has a 24/7 availability requirement, it’s very important to find out what it applies to. I worked with many bigger and smaller hospitals, and quite often they claim that they need this for everyone, but just don’t see the impact. When looking at regulations from a government, for instance, it’s mostly dictated by the EMR application, which could give you some more flexibility in terms of finding an appropriate DR solution.
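
As a back-of-the-envelope illustration of the concurrency point above, here is a small sketch. All numbers are made up for illustration; always use the concurrency ratio and per-host density from your own assessment and validation.

# Toy sizing sketch: translate concurrency into a host count.
employees = 4500          # total headcount (illustrative)
concurrency = 0.60        # peak concurrent ratio from the assessment
density_per_host = 120    # validated sessions per host for this workload
spare_hosts = 2           # extra hosts for maintenance/failures (N+2)

concurrent_sessions = int(employees * concurrency)
hosts = -(-concurrent_sessions // density_per_host) + spare_hosts  # ceiling division
print(f"{concurrent_sessions} concurrent sessions -> {hosts} hosts")
# Output: 2700 concurrent sessions -> 25 hosts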

Application considerations

• Expect traditional apps. A lot of them.
• Licensing of those apps will become your worst nightmare. You can expect that they might require hardware-based licenses, fixed MAC addresses, or unchangeable computer names. This is just because some developers suck. Like really suck.
• It would be best if a customer has access to the entire suite of application delivery solutions. It will give you the biggest chance of bringing any application to any type of device while not having to spend weeks and weeks maintaining a single application.
• I covered it already quite thoroughly (and will continue to do so in other sections), but be aware of those nasty single-threaded applications. Sometimes customers aren’t aware of their existence and thus will invest in a platform which isn’t really capable of running them with a good UX. Remember to go for fast CPUs with a lot of cores to get the best performance with the highest density.
• With such a large number of applications, it’s quite certain one will change in such a way that it will either impact another app or the OS you are running on. Implementing a proper DTA (development/test/acceptance) process with proper change management is essential to stay in control.
• Is a gazillion more than a crapload? Then expect a gazillion apps.
• Some applications will land directly on an endpoint; some will be deployed in a virtual desktop or RDSH machine. When virtualizing apps, make sure that you are actually allowed to do so. Some applications aren’t allowed to be remoted, such as some X-Ray image viewers, because of how remoting protocols could potentially impact the quality of the images. This might also be regulated by the government or medical institutes, so be sure to check it.

Security considerations

• Does your customer need to comply with certain security regulations? If the answer is yes, it’s going to be straightforward. Those regulations are completely dictated and can be translated into design constraints.
• Choosing the right security measures to apply to a security policy is really important. If a customer requires a second factor for authentication, choose one which fits the customer. You can imagine that RSA tokens or Google Authenticator don’t work if you quickly need to help a patient at a desk through fast user switching.
• Applying security policies through Workspace ONE UEM is one thing, but please don’t forget to define a compliance policy, as well, so you can actually control and validate whether security policies are applied.
• Don’t overdo security. Completely implementing a Zero Trust strategy sounds awesome, but if end users find it hard to use, they will find ways to work around your own version of Fort Knox.
• One thing that’s not really tied to security, but certainly important: you need to create awareness of the possibilities of a platform like Workspace ONE Access. I still visit customers who have been using Workspace ONE Access for years, but still see employees investing in software and services that don’t support SAML, for instance. Make sure procurement is aware of new requirements around SSO and conditional access possibilities.
• Due to the fact you have a gazillion apps, it might be a challenge to run micro-segmentation with NSX, especially if you want to apply rules for both north/south traffic as well as east/west traffic. Focusing on just disabling east/west traffic and creating rules for north/south is relatively easy to set up and a lot better to maintain. Please note that vRealize Network Insight (or vRNI) is your best friend in this case, as it will help create insights into network requirements on a per-application basis.

Hardware considerations

I already covered a little bit about hardware choices, specifically from an application perspective. Still, there are some additional considerations to take into account.


• When your environment needed to be equipped with GPUs, you basically had to go for normal rack servers because of the number of GPUs you could run in a single host. That has now changed. Companies like HPE now offer blade enclosures which can accommodate conventional GPUs. On one of the projects, the customer went for a fully composable HPE platform, based on blades which were all equipped with four NVIDIA Tesla T4s. It’s a bit more expensive when looking at the TCO of just the hardware, but it can save you a lot on networking because of the advantages of such a platform.
• Based on the size of the application platform, be sure to include enough storage in case you choose to run those applications with App Volumes. Every application will be located in a virtual disk (VMDK) on your shared storage, hence the requirement for scale.
• If you choose to go for VMware vSAN, I’d strongly recommend running multiple disk groups to spread the load. Some applications can be intensive in terms of I/O, and it will also save you from single points of failure.

Endpoint considerations

• When you include Workspace ONE in your digital workspace, you basically embrace the “any device” philosophy. There are, however, some use cases that would benefit from devices they can share. An endpoint on a counter or desk, used and shared by multiple people at the same time, might be different in terms of form factor than a device that someone would take home. Choose the right endpoint for the job.
• I’d strongly recommend using endpoints which can be enrolled in Workspace ONE in order to reduce the complexity of your device management process. The VR goggles mentioned earlier are such an example.
• If you need to work with Thin Clients, make sure the management is easy and can preferably integrate with Workspace ONE.
• If you are going to implement fast user switching with smart cards (or NFC cards), be sure that the endpoint supports the card reader plugin, as well. Imprivata is such a smartcard solution for healthcare, for which a plugin is available for both normal desktops as well as a number of thin clients.

Scalability and DR considerations

What we’ve seen during the pandemic is that the risk of suddenly having to cope with enormous growth is real. Being able to handle such an increase in the number of end users on a platform requires an EUC solution to be designed with scalability in mind. How to scale out depends on your use cases. You can imagine that cloud services like Workspace ONE Access and UEM are relatively easy to scale out. The only thing you basically need is a platinum credit card. ☺ Those services are designed and built for scalability, so that shouldn’t be an issue. I do recommend reaching out to VMware to check if the type of service/tenant you consume is capable of handling the theoretical maximum. When looking at VDI use cases, it will be a bit different, though.

Scaling out is easiest when your entire platform (hosts, storage, and networking) is capable of scaling out. I covered that topic quite thoroughly in the VDI Design Guide, so I won’t go into that much detail here. However, there are some things we learned in the past year which I would like to share.

• Scaling out a VDI sounds easy, but make sure you have the equipment ready. The pandemic taught us that during such an event, it can be hard to find hardware because big companies (such as the hyperscalers) might have contracts with suppliers which give them priority over your own organization. Which brings me to the next point.
• Hyperscalers don’t offer an indefinite amount of resources. When suddenly a lot of organizations tried to consume cloud desktops to scale out, hyperscalers couldn’t handle the scale and had to either oversubscribe the resources they had (with performance issues as a result) or just say no to requesters of cloud desktops.
• If you choose to scale your desktops in the cloud, make sure all of the supporting infrastructure (such as AD/DNS/DHCP/NTP services) and application access is available, as well. And make sure that you are also allowed to run the EMR apps from a cloud desktop.

INTERVIEW WITH HUIB DIJKSTRA

Huib Dijkstra has been a VMware EUC Solution Engineer for many years. Next to a shared passion for sneakers, we also have a common interest in healthcare customers, plus you may have seen us both presenting at VMworld 2019 on this topic. In session ADV2485BU – VMware EUC as a Force for Good, we take you through the process we followed to successfully help a healthcare customer fully embrace the Workspace ONE and Horizon solutions. By using some creative and innovative approaches, we managed to solve different challenges along the way. I’d really recommend checking out the session. You can find it on my YouTube channel, and please forgive me for my sore throat; it was a very busy week with five sessions, lots of networking, parties, and a major jetlag.

Check it out here:

https://youtu.be/nTYOPu6BShU

On the 1st of April 2021, Huib joined the VMware Carbon Black
team as a Solution Engineer. Although I was sad to see him leave
the EUC business unit, I think he made a great choice as VMware
is focusing heavily on Intrinsic Security which is one of the topics
many customers are facing challenges with.

Me: You’ve been around in the EUC space for many years, how
did you end up in EUC?

Huib: I’ve always considered my first job after school temporary.
Working at the helpdesk of a telecom company, my main focus was to
land a job where I could have a positive impact on people’s days.
After talking to several companies, I started working for a systems
integrator, designing and maintaining Citrix environments. Looking
back, I learned equally as much in both jobs: at the systems
integrator what it meant to technically build large infrastructures
to facilitate workplaces for end users, and at the helpdesk the value
of communicating clearly and efficiently.

Me: What sparked your interest in healthcare?

Huib: It’s seeing specialists, physicians, caretakers, and generally
every person who chose to make a career out of improving people’s
lives at work. Healthcare is a people industry: at a high level it
serves the public, and it’s easily relatable because we’ve all had to
receive care at one point or another in our lives. Just the thought
of supporting or improving the capabilities of the men and women in
this field still excites me to this day.

Me: I already covered some of the innovative stuff that VMware
has been doing in this space (VDI by Day/Compute by Night, VR,
Identity Federation). Do you think other industries can learn from
the way healthcare organizations use technology to improve
business processes and patient care? Could you name an example?

Huib: I think we’ve only scratched the surface of what’s technically
possible, and I’m not talking about a future that’s way far out. I
think there is tech lying around; we just have to find the tinkerers
and architects willing to think outside the box.

There is this Gartner figure from a while ago that describes the
difference between innovating and disrupting. I don’t think that
“making old things obsolete” is the thing to strive for in
healthcare; the point I’m simply trying to make is that it shouldn’t
stop us from trying new things.

Maybe it’s because healthcare is such a righteous cause, or maybe
because it sometimes requires vendors to customize their hardware and
software into dedicated healthcare formats (for example, COWs:
computers on wheels). Either way, I came across many engineers and
architects willing to search further and try new things just to
better support their colleagues in providing better care. And this
might be the thing that other industries can replicate.

And to provide some examples, I’m thinking of almost everybody who
doesn’t do data processing or doesn’t sit behind a desk. For example,
pilots or flight attendants in the airline industry, police officers
and special investigators in law enforcement, or people working in
the offshore business. The key to helping organizations innovate
should be that you’re looking for technology that is readily
available and only needs to be integrated, customized, or adapted to
the use case you are trying to innovate.

Me: When looking back at the EUC side of your career, what was
your favorite EUC project and why?

Huib: Back in the day I worked as a contractor for VMware, and I was
working on an extremely large VDI deployment for a global company.
Why this one was my favorite is partly because Windows 10 was just
released and was hitting us with endless technical difficulties. It
was really forcing us to venture into the unknown; we felt like
explorers at the time. For the other part, it was the people. We had
a really diverse group of people from all over the world. We were
stuck with each other and went through some tough times, so looking
back I have some fond memories (although maybe at the time I felt
differently).

Me: How did the pandemic impact your view on EUC?

Huib: Not that much. It accelerated or forced the adoption of
different work styles, but that’s been something we have been
promoting for many years. From an architect’s perspective the
landscape is now very dispersed, and there are some final hurdles to
conquer for the companies that excel in EUC. The main hurdles are SSO
(single sign-on) to everything while the endpoints are migrated off
the domain and network, and before I forget: totally rethinking
security (didn’t see that one coming, did you?).

Me: Hahahaha, let’s talk about your recent move towards the security
side of the business. Why this interesting move?

Huib: Remember that picture from 2017 where a guy was mowing the lawn
as a tornado approached his house? I loved the quote from his wife
(Cecilia Wessels), who reportedly said “he was keeping an eye on it.”

If you look at traditional security, it is either probing,
monitoring, and protecting the internal network and VPNs, or it is
protecting and monitoring the file system based on known threats and
malicious files. Meanwhile, cybercrime is a trillion-dollar business
and attacks have grown into chains of highly sophisticated steps
where different tactics, techniques, and procedures are being used;
most of them will probably not be detected by traditional security
solutions.

Talking to EUC experts over the last few years, I found a lot of
people mowing the lawn and “keeping an eye” on the approaching
danger. By this I mean that a lot of people are still busy trying to
roll out a modern workspace experience, focused on collaboration,
application portals, and modern management, and are “keeping an eye”
on the fast-approaching security threats, but have not yet come round
to adjusting their security solutions. This to me seemed like an
awesome challenge, so that’s why I’m jumping into this new career.

Me: At most of the customers I work with, security is always a hard
topic. Are you seeing the same thing? If so, why is it such a hard
topic?

Huib: Imagine yourself buying or renting a new house. What do you
think of first: the color of the walls and where the TV is going to
sit, or the make, model, and features of your locks and fire warning
systems?

Because of the digitalization trend that’s been going on since the
1940s, we as a society are relying more and more on IT systems. And
this trend has boomed in recent years with the introduction of fast
connections everywhere, cheaper storage, and the introduction of
public clouds. These accelerators have changed our behavior and
increased the number of digital identities, apps, and devices
(although nowadays I tend to talk and think about these as attack
surfaces).

So, a lot has changed, and the topic of prevention isn’t as sexy as
delivering a new capability to your colleagues. That, combined with
how technically advanced attacks have become, makes it a hard topic.

Me: What kind of innovations will the world be seeing in the (near)
future from a security perspective?

Huib: In my humble opinion, the future is already here.

The big debate in security is “to go to the cloud, yes or no?”. Big
data algorithms, AI, and machine learning by nature work better out
of the cloud and are disrupting the security industry. By using these
new techniques, we can look beyond what’s known to be bad and look at
suspicious behavior. This way you can be protected against
living-off-the-land attacks, file-less attacks, and unknown malicious
files. It has also made it a whole lot easier to scale and process
more events.
And the future? I think VMware is in an amazing position to build out
the open security APIs and integrate with many third-party SOAR and
SIEM solutions. However, it is also in a position to create deeper
integrations with the rest of the portfolio. We’ve already seen this
with the Tanzu integration for container behavior monitoring, with
the vSphere integration so admins can monitor and manage VM
workloads, and with Workspace ONE, where the Intelligence module can
be used as a SOAR (security orchestration, automation, and response)
platform. In the future I expect to see more VMware integrations like
the examples above.

Me: If you could share one key takeaway regarding security and
EUC, what would it be?

Huib: If you’re designing or re-designing an EUC environment, it’s
not done until you’ve thought about these three things:

• Next Generation Antivirus
• Threat Hunting capabilities
• Live assessment and remediation options

If you’d like to know more about Huib, check him out on Twitter:
@HuibDijkstra

VDI BY DAY,
COMPUTE BY NIGHT
It might not be a surprise that this is one of my favorite use cases
for VDI. I have presented about this topic since 2018 and I’m a
strong believer that for the right customer, VDI by Day/Compute
by Night can make a huge difference. This use case all started with
a conversation I had with a customer.

Like in every project, I followed the design methodology described in
the VDI Design Guide. The main phases are:

• Assess
• Design
• Validate
• Deploy
• Migrate

At the beginning of the project, I was mainly engaged as a Digital
Workspace Architect to design a VMware Workspace ONE
environment for their entire organization. The Workspace ONE
environment also contained VMware Horizon and so the
assessment phase had a major focus on use case assessments and
interviews with the different teams. In a mainstream Digital
Workspace project, the outcome of those assessments is quite
predictable. Based on the scope of the project and outcome of the
assessments, you can get a rough estimate of the use cases that will
land on VDI or will work from physical endpoints managed
through Workspace ONE (or both). In this case, one of the
conversations turned out to be a key conversation that would
shape this project like no other.

THE IMPORTANCE OF ACTUALLY TALKING TO END USERS


What I always recommend is to use a questionnaire to get as much
information as possible ahead of time, but in a structured way. In
this case, you will have structured data to use in the use case
definition models. One section in the questionnaire focuses on
application and data consumption, and this is where I got the idea to
include the VDI by Day/Compute by Night concept in the design.

I had a conversation with one of the data science departments of the
customer. During the application-related conversation I got a good
understanding of their day-to-day challenges.

First of all, they have an infinite demand for GPU power. Their
work mainly consists of analyzing data, creating deep learning
models to support the analysis and using these models to improve
certain processes within the organization. As the datasets are
exponentially growing, the resources required to train their
models are increasing, as well. The more horsepower you have in
the infrastructure, the shorter the training time becomes and the
quicker you have access to trained models to support your
applications. Shortening training time is not a matter of seconds,
but more in the sense of days or even weeks (depending on the
size of the dataset and the complexity of the model).

Before a model can be used for training purposes, it needs to be
developed. This is where the second challenge was. Most of the data
scientists invested in physical desktops with consumer-grade NVIDIA
GPUs to develop their models on. The first challenge here is that if
they requested a GPU desktop from IT, it could take months for it to
be delivered and ready to use. To give you an idea, the process
looked a bit like this. The data scientist got a grant to run
research on a specific topic and needed a GPU-enabled desktop. They
order a desktop that isn’t part of the service catalog IT could
deliver, so IT needs to find out if their main OEM can deliver
something with the requested GPU. Quite often those are different
models than the regular desktops, since they are equipped with more
memory, workstation CPUs, and different power supplies. After a few
weeks of configuring and validating that they can deliver such a
configuration, it will be shipped. This also takes weeks, but finally
it’s delivered. After a week of deploying, they get the desktop, but
it’s provisioned with Windows 10 and standard drivers. The data
science frameworks they use were purposely built for Linux, so now it
needs to be installed with a specific operating system that IT
doesn’t offer, either. They need to manage it and thus will install
it. After the installation is finished, the help of the data
scientist is required to configure it, and after again some weeks of
work, the machine is finally ready. But wait, since it is a machine
that only gets access through SSH, they demand for it to be installed
in their datacenter. That way they can ensure it will be secured in a
separate subnet, because since they can’t load GPOs on it, they don’t
trust it. Oh, and don’t get me started on keeping the Linux OS
up-to-date. Now, as you can imagine, this leads to a lot of shadow
IT. Requesting a GPU from the cloud is done in an instant, but it
also requires the data to leave the company network. Ordering a
really powerful desktop yourself and using it underneath your desk is
a great alternative, right? Now, this led to the second main
challenge. We needed to shorten the time-to-deliver of a GPU-enabled
machine from 3 months to a hyperscaler-like experience (within 15
minutes and preferably with as much self-service as possible).
The third challenge was remote access. As most of the GPU
machines were behind a firewall and no remote access was
possible, most of the data scientists quite often paid a visit to the
office during the weekends to check on the workloads. In some
cases, training can easily take weeks. Imagine if for some reason
you miss out on a whole weekend of training because of an issue
in either the infrastructure, the framework, or the application. Yep,
challenge indeed.

After a trained model is ready for application usage, it needs to be
integrated into an application. On the customer side, they use those
trained models in an inference process to run object detection in
large images. The model is pretrained to detect a certain number of
different objects in an image, and that’s what the integration in the
application will facilitate. Now, these workloads are considered to
be server workloads (as they run in a server VM), but since we wanted
to provide the applications with the power, availability, and
portability of a virtualization platform, licensed through VMware
Horizon, we needed to figure out a solution for that, as well.

NUMBERS TELL THE TALE
Interviews with end users of the proposed platform are essential but
will always be a perception and thus just one reality. Sure, you
could interview all of the people in a department to get all of their
perceptions, but I think it’s very important to support their
insights with numbers. And this is where desktop and application
assessments become your best friends.

Quite often, desktop and application assessments are only used to get
insights into the obvious metrics and resources that will shape your
VDI platform:

• The type of desktop used by what persona
• Desktop metrics (such as CPU, RAM, network, GPU, disk)
• The number of applications
• The resources used per application (with the main purpose to detect
single-threaded applications)
• Logon processes and issues
• Peripheral usage

In our case, we also used the information gathered to expose
something interesting.

The following figure is taken from a Concurrency Report from
Liquidware Stratusphere. The overview basically shows the average
resource consumption on a 24-hour basis. CPU and RAM consumption are
being shown with a user and machine count per hour.

In this graph, two things really stand out.

First of all, resource usage doesn’t have a real peak. There is a
slight CPU peak at 8:00, possibly due to a boot or login storm. RAM
usage slowly increases a bit, with a minor peak at around 12:00 and
15:00 and another one at 22:00 before it drops again.

When looking at the user/desktop count, you can see that between
8:00 and 17:00 there is a high concurrency, but it quickly drops at
18:00. Between 18:00 and 7:00 the environment had a 10%
concurrency.

Another thing we found out during the desktop and application
assessment is that most of the applications required a GPU to work
properly.

There still is a misconception about the purpose of a GPU. Many think
it’s only required to offload graphical aspects of an application or
operating system. While that is probably the biggest aspect, it isn’t
the only one. GPUs are more and more utilized from a computational
perspective as well, and not only by developers who use modern
frameworks. Nope, one of the biggest consumers of GPUs from a
computational perspective is our friend from Redmond. Microsoft
leverages the power of machine learning to speed up certain processes
in Windows 10 and probably in Microsoft Office, as well. The
following figure is a screenshot taken from a newly provisioned
Windows 10 machine, equipped with an NVIDIA vGPU profile.

If you run the nvidia-smi command on a machine with an NVIDIA GPU
installed (which could be a virtual or physical machine), you can see
all sorts of information about the GPU usage. If you look at the
table with the processes, you can see the process name and, in the
column to the left, the type. For all of the processes, that is C+G.
This means that the GPU is used for Compute + Graphics. Interestingly
enough, you can see that, for example, SearchUI.exe (used for the
search feature in Windows and Cortana) also uses the GPU. The same
goes for explorer.exe and the Photos.exe app. I don’t have any
insights into how they utilize the GPU, but it’s most likely used to
run an indexing process on the GPU to quickly show search results.
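
By the way, nvidia-smi can also be queried in a scripted way, which
is handy during an assessment. A minimal sketch, assuming a
reasonably recent NVIDIA driver (the exact query fields can differ
per driver release, so check nvidia-smi --help-query-gpu on your own
system first):

# Per-process GPU memory usage in CSV form (compute processes)
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv

# Overall utilization and framebuffer usage, refreshed every 5 seconds
nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv -l 5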

The output of the assessment showed that over 80% of the end users
could benefit from a GPU, either at the application or operating
system level.

DESIGNING THE ARCHITECTURE
This was the most difficult part. We knew that the proposed VDI
environment would be better off if we equipped it with GPUs for
every end user. We wanted to lower the complexity of the
environment, so we also wanted to lower the number of different
use cases as much as possible. This led to NVIDIA vGPU profiles
for every end user. Another luxury we had in the project was that
the customer had an availability requirement of 99.9% for all users.
So, we ended up with a multi-datacenter VDI environment for
3,000 vGPU-enabled end users. The architecture basically looked
like this:

Both sides can accommodate 100% of the users, which means that
one site can completely fail without a huge impact (other than a
failed session for half of the users which is solved by reconnecting
again).

Another thing we wanted to do is use multiple NVIDIA vGPU profiles at
the same time, on the same host. This is possible, but it will
require multiple GPUs (as a single, modern NVIDIA GPU doesn’t allow
for heterogeneous vGPU profiles on the same card). We chose NVIDIA
Tesla T4 cards and used four of them in a single host. In this case,
we could potentially run 64 users on a single host with a 1 GB vGPU
profile (sixteen 1 GB profiles per 16 GB T4, times four cards).

A logical result is that if your computational workload of a data
scientist requires more framebuffer, a 1GB profile won’t be
sufficient. Also, in order to run deep learning frameworks that
utilize the CUDA API (for computational instructions) you need a
Q-type profile (see the explanation of the different types of profiles
in the VDI Design Guide, in the section about GPUs and Remoting
Protocols).

So, mixing and matching profiles is essential. Setting priorities for
those workloads is just as essential. VDI resources always have a
higher priority to ensure a good user experience, which can be
impacted if noisy neighbor syndrome occurs.

VDI by Day

The following figure shows the high-level architecture by day. The
environment has resources available for both virtual desktops and
deep learning workloads by day, but the priority lies with the
virtual desktops.

By night, the platform has to scale down the number of virtual
desktops according to the demand and a reservation for a sudden
demand or DR scenario. This will free up resources for the deep
learning VMs to scale up and run their workloads on the same GPUs.

Compute by Night

In the following figure you can see that VDI resources are still
available and also have the highest priority (for the same user
experience reason), but the majority of resources will be allocated
to the deep learning machines.

The concept itself sounds really cool and easily doable at first, but
along the way we found out that it’s not so easy to implement. The
main reasons are:

• Heterogeneous vGPU profiles need to be run in the same cluster.
• vSphere features like DRS and HA don’t really take a vGPU profile
into account when trying to either move or restart a VM on another
host.
o The challenge here is that you need DRS initial placement to
find out if a certain vGPU profile is available in the cluster.
DRS doesn’t do that (yet). It will work for NVIDIA GPUs which
are passed through to a VM through assignable hardware, but not
yet with NVIDIA vGPU.

• The compute application you want to scale needs to be
capable of dynamic scaling. If you run a certain training
workload that needs to start over if you shut down a node,
it’s just not going to work.

In order to get the scaling to work, you need something that knows
what is happening in your vSphere cluster, in VMware Horizon, and
with your compute application. In the first version of the concept we
delivered to the customer, this was 90% done from within the ML
application. We just needed to dynamically scale the Horizon desktop
pools depending on the required workload. We quickly found out that
the ease of scaling was specific to this type of application, and we
needed to get back to the drawing board.

One of my buddies, Tony Foster (a.k.a. Wondernerd), had been working
on a similar project. I reached out to him to share some of my
experiences. He had similar challenges, but to solve them he had
built a PowerShell script. The script does a couple of things (a
rough sketch follows the list):

• It first checks how many vGPU profiles of a certain type are
available in your main cluster.
• It checks how many vGPU profiles you need for your VDI workloads,
for overhead and failsafe purposes.
• It locates a compute VM that is currently powered off.
• In case a vGPU profile is available in the cluster, it will try to
spin up the compute VM.
• If this is successful, it will do the same for the next compute VM,
and the next, etc., until you have either reached the limit of the
number of compute VMs or no vGPU profiles for compute purposes are
available anymore.
• The script will loop until a set end time.
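
To give you an idea of the flow, here is a heavily simplified sketch
of that loop. This is not Tony’s actual PowerShell script (grab that
from GitHub); it’s a bash equivalent using the govc CLI against
vCenter, and the VM naming, the simplistic headroom check, and the
interval are all assumptions:

#!/bin/bash
# Illustrative sketch only. Assumes GOVC_URL and credentials are set,
# and a same-day end time. The real script inspects actual vGPU
# profile availability; here that check is reduced to counting
# powered-on compute VMs against a static maximum.
END_TIME="23:00"
MAX_COMPUTE_VMS=8

while [[ "$(date +%H:%M)" < "$END_TIME" ]]; do
  # Count compute VMs (named compute-vm-*) that are powered on
  RUNNING=$(govc find / -type m -name 'compute-vm-*' \
            -runtime.powerState poweredOn | wc -l)
  if [ "$RUNNING" -lt "$MAX_COMPUTE_VMS" ]; then
    # Locate a powered-off compute VM and try to start it; if no
    # suitable vGPU profile is free on any host, the power-on fails
    # and we simply retry in the next iteration.
    NEXT=$(govc find / -type m -name 'compute-vm-*' \
           -runtime.powerState poweredOff | head -n 1)
    [ -n "$NEXT" ] && govc vm.power -on "$NEXT"
  fi
  sleep 300   # re-evaluate every 5 minutes
done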

I have forked the script on my own GitHub repository because of a
couple of adjustments I made. If you would like to check it out,
take a look at: https://github.com/vhojan

DESIGN CONSIDERATIONS
Because it had never been done before on a large scale, we needed to
invent the wheel ourselves. This is why I love being a VCDX. The VCDX
methodology helped me to figure out what the proposed
architecture needed to look like, based on requirements,
constraints, assumptions, and risks.

When considering a concept like this, you need to figure out if your
end users are really going to benefit from it. Not every compute
workload is suitable to be scaled. Talk with your data science team
to find out.

GPUs and CPUs

Repurposing compute resources has everything to do with hardware
resources. Most people immediately think about GPUs, but this isn’t
the only hardware resource that needs to be repurposed. An ML
application might require fast CPUs, a lot of RAM, and some seriously
fast storage and networking, as well. Please consider the following
when looking at hardware:

• When thinking about the type of compute workload, choose the right
NVIDIA GPU for the job. Yes, NVIDIA GPUs. At this moment, both AMD
and Intel don’t have an accelerator that is capable of running both
graphical workloads and supporting CUDA as the mainstream compute
API.

• Going for NVIDIA Tesla T4s might sound like the right
choice, but if your data scientists only run double
precision calculations, it might be the wrong choice.
• Consider GPUs with a sufficient amount of framebuffer
(video memory). The T4 is currently limited to only 16 GB,
so bigger training jobs might not work or negatively
impact the performance.
• Make sure to include at least two GPUs in a single host so
you are able to run desktops and compute workloads
simultaneously. More, separate GPU engines will increase
your scalability options.
• Talk with your OEM about the right type of host
hardware. Combining multiple powerful GPUs and CPUs
in a single host could require specific power supplies and
PCIe layouts on the motherboard. And surely don’t forget
about proper cooling!
• A fast GPU is great, but without a proper CPU it won’t be able to
run at full speed. I have done tests with multiple CPUs from both AMD
and Intel and found out that for compute use, the AMDs aren’t really
suitable (due to their NUMA architecture). For most compute
workloads, we needed at least 8 virtual cores per compute VM. This is
where the Intel CPUs excel. The Intel Xeon Gold 6254 is my current
preferred CPU for the job as it has a high clock speed (3.1 GHz) as
well as a great number of cores (18 cores + Hyper-Threading). I have
tried CPUs with a lower clock speed as well, but they impacted the
GPU utilization negatively.
• If you have multiple CPUs and GPUs, make sure to
configure them according to the OEM’s best practices. In
one of the first proofs of concepts I ran, I worked with
three NVIDIA Tesla P40s in a single host. With two
compute VMs, everything scaled up pretty much linearly.
As soon as I spun up a third compute VM, the
performance dropped on all three VMs. Apparently, we
oversaturated the PCIe bus because all three GPUs were
capable of running at full speed. Relocating the GPUs in
the host on different PCIe slots solved the issue.

Networking and storage

The world of data science is one that changes rapidly. Due to the
fact it has never been so easy to collect large amounts of data, the
data aspect will impact your network and storage, as well. Not
every data science team works with the same kind of data.
Financial organizations like banks and insurance companies might
work a lot with tabular data, while medical institutes might work
with medical images. The size and the amount will certainly differ
per data science team, so make sure to know what you are sizing
for. One of the teams I have worked with uses medical images
generated from so-called widefield microscopes. Every image that
is created varies between 500 GB and 2 TB in size and is stored on
an extremely fast and ridiculously large NAS. Running ML
training jobs on such data requires different networking hardware
in a host than tabular data that can easily be moved close to a
compute VM. Please consider the following when choosing the
right networking and storage hardware for the job:

• If you are working with tabular data, it might make sense to figure
out a way to bring the data close to or even inside the compute VM,
if that stays within the storage configuration maximums.
• I worked with a team that required insanely fast storage to
run their ML training jobs on. The faster, the better. We
tried different types of flash drives and ended up with
Intel Optane persistent memory. Due to the limited size of
the training data, we used the persistent memory modules
as dedicated datastores. As an alternative, you could also
use them as a caching tier for vSAN or other Hyper-
Converged Infrastructures, but make sure to size the
caching tier correctly, based on your read/write ratios.
• If the dataset is extremely big, you might want to consider
going for low-latency networks with a high throughput.
RDMA-based network cards with 25 GbE/40 GbE/50 GbE/100 GbE speeds are
widely available from companies like
NVIDIA (Mellanox) and Intel and might enable you to run
the training jobs over the network. Your compute nodes
will be able to connect to a shared storage and run their
workloads on the shared data. Please remember to talk to
your data science team to find out if their ML workloads
are capable of doing so. They could consider frameworks
like Horovod or Apache Spark to introduce distributed
training or inference over the network. The reason for a
low-latency network is because you want to avoid time-
outs in the ML workloads because the network or storage
isn’t suitable to process the required amount of data.
• The backend storage (like a NAS) needs to support the
same networking speed and latency as your compute
nodes.

VMware vSphere

I think the hardware part is the easiest in terms of considerations.
If you have the right hardware for the job, the sky should be the
limit. Unfortunately, vSphere will currently be your limit.

The main reason for this is that vSphere wasn’t built from the
ground up to support accelerators, in any form. We can assign an
accelerator to a VM, but what happens if an accelerator becomes
fully utilized? Or what happens if an accelerator fails? In 2020,
VMware announced a tight partnership with NVIDIA to work on
the accelerator support and let’s hope that this will change, indeed.
It would be great if vSphere DRS will be capable of using
accelerator metrics to declare a VM to be a noisy neighbor or to
predict a failure and run a vMotion to move a compute VM from
one host to another. vSphere vMotion currently already supports a
vGPU vMotion, so why not extend DRS support for DRS Initial
Placement and Predictive DRS, as well? We will see what happens,
only time will tell. When looking at the vSphere Layer, please
consider the following:

• Look at the right type of licensing. If you license your vSphere
hosts through VMware Horizon on a per-desktop basis, it means you can
only run desktop VMs and Horizon-supported VMs (like App Volumes
Managers) on those hosts. A compute VM basically isn’t a
Horizon-supported VM, neither is it a desktop VM. Make sure to check
with your VMware sales team what kind of licenses are required in
your situation to run mixed workloads.
• If you only have Horizon-based licenses, there is kind-of a
workaround. What you can do, is run those compute
workloads inside a desktop VM with Ubuntu and a GPU.
Also, entitle users to it through Horizon. Hypothetically
it’s a desktop now and thus is licensed. It might not be the
most ideal situation because of the lack of support for
certain features, but it will give you a great head start in
still being able to build the concept.
• As mentioned above, vSphere’s native features like DRS
and HA will need some additional help. Take a look at the
script I shared on GitHub to get started on initial
placement and resource scheduling.
• Take a look at the GPU policies that exist in vSphere for
the balancing of your VMs on the physical GPUs. vSphere
can allocate a vGPU profile in two main ways:

Option 1: Spread across all of your hosts, over all of your GPUs.
This will look a bit like this:

Option 2: Consolidate the profiles as much on hosts as
possible and fill every physical GPU until it reaches the
maximum available profiles. This will look a bit like this:

See the following VMware KB article for more information:

https://kb.vmware.com/s/article/57297
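
For reference, the same policy can also be inspected and switched
from the ESXi command line. The sketch below is from memory and the
exact option names may differ per ESXi build, so verify them against
the KB article above first:

# Show the current default graphics type and assignment policy
esxcli graphics host get
# Option 1: spread VMs across GPUs (best performance)
esxcli graphics host set --shared-passthru-assignment-policy performance
# Option 2: fill a GPU before moving to the next one (consolidation)
esxcli graphics host set --shared-passthru-assignment-policy consolidation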

• Size your clusters properly. You need to take spare
capacity into account for DR scenarios or create a magical
red panic button you can push to instantly shut down all
compute workloads and spin up additional desktops if
needed.

VMware Horizon

VMware Horizon doesn’t have a large number of settings to consider,
but still has some things to take into account:

• Make sure to use instant clones where possible. The trick is to
scale virtual desktops up and down as fast as possible, which is
easily done with instant clones. Also, since they are non-persistent,
you just aim at a lower concurrency outside of peak hours. But still,
in theory, everyone will be able to request a desktop (also outside
of office hours).
• If you are heading for the Horizon-licensing-workaround-
route, you should avoid the rage and anger management
issues I had when building my first Linux base image. I
love to use Ubuntu for compute VMs as many
containerized workloads can run on it with Docker. But,
when using a Linux OS for virtual desktops and wanting
to manually build it, be prepared for a lot of pain and
suffering. It will for sure guide you to the dark side. One
piece of advice that will help, is to download the Ubuntu
virtual desktop fling from the VMware Flings website:

https://flings.vmware.com/horizon-ova-for-ubuntu

Application considerations

Another major mistake I made was to build my application stack from
scratch. This form of self-harm is completely unnecessary and can
easily be avoided. Before we dive into that, let me share a little
story.

I was working with a team to build an ML platform that needed to be
capable of easily scaling out, based on demand, with self-service.
The idea was to use vRealize Automation and vRealize Orchestrator to
easily let a data scientist request a new vGPU-enabled VM on which
they could do a git clone and run their Python scripts. It was my job
to build a base image that could be easily cloned and configured for
such an end user.

The application needed the following specs:

• Python 3
• TensorFlow 1.15
• 16 GB of framebuffer
• 8 vCPUs
• 64 GB of RAM

I started with the build process, and these are the steps I took:

1. I created a VM with 8 vCPUs, 64 GB of RAM, a lot of disk space,
and a vGPU Q-profile with 16 GB of framebuffer.
2. As mentioned, the standard OS for running ML platforms on is
Linux. Selecting a Linux distribution of choice is the first
challenge, but as I had experience with Ubuntu, I chose that one.
3. Next up was choosing the version of Ubuntu. I chose the latest
long-term support version, which was 18.04 (LTS).
4. After selecting the version, we needed to choose either the server
version or the desktop version. I chose what seemed right, the server
version (as it contains less fluff than a desktop UI and apps like a
browser).

5. First thing after deploying: update all packages on the VM with
apt-get update && apt-get upgrade.
6. Before you can use a GPU, you need to install the NVIDIA drivers
(which come with a specific CUDA version). This process failed
because you need additional packages to do this.
7. So, I installed those packages first, which are make and gcc.
After those were installed, the driver and CUDA libraries could
successfully be installed.
8. After the driver was installed, I configured the license server
(as you need licenses for NVIDIA vGPU to work).
9. Next up, I installed Python 3.7 as this is the scripting language
they wanted to use.
10. As the app requires TensorFlow, I installed TensorFlow with pip
install tensorflow.
11. cuDNN is the CUDA Deep Neural Network library, which is required
to run deep learning libraries, so that needed to be installed, as
well.
12. After the initial setup, I was able to run a test script to
validate the frameworks. I started Python and wanted to see if I
could import TensorFlow and check how many GPUs were available
(roughly the one-liner shown after this list). Result of the check
was that I didn’t have TensorFlow-GPU installed. Sigh...
13. I installed TensorFlow-GPU and ran the test script again, this
time with success. ☺
14. Now, I wanted to test the application of the data science team. I
started the script and……….. Failed!
15. Apparently, when installing TensorFlow without specifying a
version, TensorFlow 2.x is automatically installed. In my case,
TensorFlow 1.x was required.
16. So, I installed TensorFlow 1.x and TensorFlow-GPU 1.x, as well.
17. After the installation, I started the app again. Failed!
TensorFlow version 1.x requires a different CUDA version.
18. So, I installed the different CUDA version and tested the app.
Another fail. (This started to test my patience…)
19. The cuDNN version I installed was incompatible with the right
CUDA version. So, I installed the right cuDNN version and started the
app again. Another fail.

20. Apparently, you need to modify the path in which an application
searches for the matching CUDA and cuDNN libraries. I updated the
path for the libraries and tested the app. Failed! Apparently, the
app was not tested on Ubuntu 18.04 server with all of the matching
frameworks.
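
For reference, the validation check from step 12 boils down to
something like this one-liner (assuming TensorFlow 1.x; the API moved
around between versions):

# Prints True only if TensorFlow was built with CUDA and can see a GPU
python3 -c "import tensorflow as tf; print(tf.test.is_gpu_available())"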

This three-week process of building, rebuilding, configuring,
reconfiguring, taking snapshots, reverting to snapshots, getting
coffee, getting more coffee, consulting with NVIDIA, checking Stack
Overflow, googling a bit, googling some more, and trying to get some
sleep in between almost ended with me finally giving up on trying to
build this. It is proper voodoo. But, again, like Mike in those
80s/90s Tell Sell commercials, I thought that there had to be a
better way, right? Ladies and gentlemen, there is.

In order to master the voodoo, it is important to use as many
commoditized platforms as possible. This is where NVIDIA invested a
lot of time and money into a marketplace that offers complete AI
frameworks for you to download, including all matching dependencies.
If you pay a visit to https://ngc.nvidia.com, you can download a
large number of frameworks in all sorts of versions, completely for
free!

So, I ended up with the following process to build a base image that
was capable of what the customer wanted (a condensed sketch follows
the list):

• I created the VM with the right specs.
• Next, I installed Ubuntu 18.04 LTS (yes, still that same version).
• After the OS, I installed make, gcc, and the NVIDIA driver, and
configured the licensing.
• Next, I installed Docker and NVIDIA-Docker.
• To test the app, I pulled the appropriate TensorFlow container.
• Done. Like, really done. In 4 hours.
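
Condensed into commands, the working build looks roughly like this.
Treat it as a sketch: the driver installer name, the package names,
and the NGC image tag are examples from my setup, not a definitive
recipe:

# Build tools needed by the NVIDIA driver installer
sudo apt-get update && sudo apt-get install -y make gcc
# NVIDIA vGPU guest driver (downloaded from the NVIDIA licensing
# portal); point the VM at your license server afterwards, for
# example in /etc/nvidia/gridd.conf
sudo sh ./NVIDIA-Linux-x86_64-<version>-grid.run
# Docker plus the NVIDIA container runtime (the latter comes from
# NVIDIA's own package repository, which you add first)
sudo apt-get install -y docker.io nvidia-docker2
sudo systemctl restart docker
# Pull and run a TensorFlow 1.x container from NGC (tag is an example)
sudo docker run --gpus all -it --rm nvcr.io/nvidia/tensorflow:21.03-tf1-py3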

Am I going to get those three weeks back? Nope. Did I learn something
in return? Absolutely! And now you have, too.

INTERVIEW WITH TONY FOSTER
My buddy Tony is as passionate about this concept as I am and so
I couldn’t have thought of a better person to ask for an interview
than him. You may know him as Wondernerd, sometimes even
wearing his cape at events like VMworld or a local VMUG. I met
Tony during my first VMworld in Las Vegas as a VMware EUC
Champion. I think we met two or three months later again during
the NVIDIA vGPU Community Advisors program meetup in
Santa Clara. Ever since, we have talked a lot about the topic and
shared some ideas on how to make it better or even productize it.
Tony works in Tech Marketing at Dell Technologies and regularly
shares his knowledge at (virtual) events. I talked with Tony about
the concept and its evolution.

Me: When did you first hear about the idea of repurposing
compute resources?

Tony: The idea for VDI by Day, Compute by Night really took
form for me in 2016, a few weeks after the GPU Tech Conference.
That was my first GTC and it really helped me to synthesize the
concept for repurposing of resources.

The idea was really simple, and at the core of what I’ve been
working on it remains simple. And that is, several years ago,
VMware had been championing that any workload can be virtualized.
That means workloads such as high-performance compute (HPC),
machine learning (ML), deep learning (DL), and so forth that have
continued to grow increasingly dependent on GPUs can be
virtualized. And if they can be virtualized then they must adhere
to the same rules as other VMs. This also means that they could
use the same underlying infrastructure as other workloads.

If they (AI, ML, DL, or HPC) could use the same infrastructure, as
say VDI, then all I needed to do was turn the VMs on and off at the
appropriate time. Though, in reality, we suspend and resume
them instead of turning them on and off. With this train of
thought came about the idea for what I like to call spare resource
harvesting.

Now for the history buffs, you’re probably thinking a lot of the
functionality wasn’t available for GPUs until the T4 GPU and
associated virtualization software in late 2017. And you’re right!
That is when it became possible to harvest spare resources,
because that release included the ability to suspend VMs with
vGPUs and release those resources back to the underlying host
along with some important placement and scheduling capabilities.
This allowed me to move it from a hypothesis floating around in
my head to a proof of concept (POC).

Me: Did you immediately think it was a viable concept?

Tony: I thought it had merit because everyone I talked to about it
was not able to give me a good reason not to do it. I of course got
all sorts of pushback about how no one would ever want to run their
high-end workloads, which have “always” run on dedicated hardware, on
a virtualized platform. I also had many tell me that the academic
teams that ran these large compute clusters and the VDI teams would
never get along or agree to share resources. All of this just
confirmed to me that I was tapping on the metaphorical wall of “this
is how we’ve always done it.” Which meant I needed a bigger hammer to
make this happen.

I think the biggest confirmation that this was a viable solution came
at VMworld 2018. I had just completed a vBrownBag Tech Talk on some
of the concepts related to spare resource harvesting when I had a
sales engineer call me out of the blue, asking for anything I could
provide on the topic. That cemented the idea as something beyond
viable; it just needed more people in the industry to get on board
with it.

Me: Since you work at one of the biggest OEMs in the world, is my
assumption correct that you are seeing a bigger adoption of the
concept?

Tony: Not as big of an adoption rate as I would like. I think many
are waiting on early adopters and for a more consumable version
of resource harvesting. That said, the interest in this is pretty
large. Organizations see the potential and crave the results it
promises. And I think we are getting near the point where this
will take off.
With Covid-19, both IT and research teams were upended, with everyone
having to work remotely. Now we’ve gone a year with this, the panic
buying of new systems to support remote users has worn off, and as a
new year of budgets is planned, I expect that this will help
accelerate adoption of resource harvesting.

So, at this time it has not been adopted as fast as I want, but
looking at all the indicators, hang on for a wild ride in the coming
years.

Me: Do you see it being adopted in specific industries?

Tony: Currently I see it being consumed by the ones that need it the
most: those organizations actively involved in tackling Covid-19.
Many have restricted, limited IT, research, and lab systems; resource
harvesting has allowed them to dynamically consume those resources
where and when they need them, as opposed to having silos of
resources that may be sitting idle when they could be helping another
part of the organization.

I expect to see adoption in many of the areas where users are now
remote and VDI/EUC is being used. They have all of these resources
that are being consumed for 8 to 12 hours a day and then just doing
nothing, plus they have all this new (big) data on what the “new
normal” looks like. For them to have and continue to hold a
competitive advantage, they will need additional computing, and I
think resource harvesting is the best source for those computing
needs. I think the ones I’m describing here will be larger retailers
and mid-sized enterprises.

I don’t expect the larger enterprises to come around until later,
because there are more significant silos of infrastructure that will
prevent early adoption. There may be a few industries, like large IT
organizations, that will be part of the first wave, but I definitely
don’t expect to see the Fortune 500 as early adopters of this
technology.

I expect it will be like how virtualization was adopted. Someone goes
to IT and says we need to do “X” and IT says, “we can’t do that, we
don’t have the funds.” Then someone mentions the resource harvesting
they’ve been doing for these small, third-tier, non-critical IT
applications. So, they do “X” with resource harvesting instead of
buying a whole new set of systems.

Me: Do you think other GPU vendors will jump into this space?

Tony: As of this writing, only NVIDIA has engaged in this space. The
two other big players in the GPU space, Intel and AMD, haven’t shown
much interest in resource harvesting of GPUs. I think that’s not as
significant as it may sound. I suspect that the greater significance
for those two vendors, and many others as well, will be resource
harvesting of other processing units like FPGAs.

I suspect that will open up some additional opportunities for
processing unit types and allow vendors to expand their market reach.
Though I also wonder, if this is not done soon, whether it will be
too little too late and relegate other processing unit types to
application-specific workloads.

Me: VMware fully adopting Kubernetes (K8s) into vSphere will open up
better scalability and manageability of compute workloads. Where are
you seeing that going?

Tony: I’ve worked with some others in the industry around GPUs and
K8s, specifically around scalability. While K8s exposes a lot of
potential for many workloads, there are many more that can’t easily
make the transition to containers, specifically VDI. Because of that,
I expect we will continue to see VDI as a workload that shares many
similarities with K8s but cannot necessarily be converted to a true
container-based workload.

To me, that means we will continue to have stranded resources in VDI
environments that can be captured for other purposes, provided there
is some intelligence to release those when the primary (VDI) workload
needs them.

Ultimately, I suspect it will be several years before we see VDI and
K8s come together for a containerized workspace. However, I also
fully expect to see end-user applications become containerized much
more broadly than they are now, which may bring about the end of
virtual desktops as we know them today. Instead, each application a
user consumes is just a secure container; as users consume the
applications, more application containers are spun up, much like how
Office 365 works for home users. This would really remove the need
for a true desktop platform, which could easily be replaced by a
basic OS on the endpoint.

Me: In terms of VDI by Day, Compute by Night, where are you seeing
that development going?

Tony: Right now, it’s still a science experiment that’s being done in
a few folks’ garages and home labs, with some organizations
seeing potential in it and jumping in. It’s really in its infancy right
now and it’s a fun time to be part of it.

Where I’d like to go with it, and I’m getting there slowly, is to
build a basic packaged vApp for users to download, with a basic API
and UI. This will hopefully allow it to move beyond a science project
to something more finished and usable by organizations large and
small.

Once the initial vApp is created, I then expect there to be many
rapid enhancements, including the ability to have redundant systems
and interface tools for things like SLURM and other workload
managers.

Who knows, maybe one day it will be baked into VMware Horizon,
allowing any organization to harvest those spare resources. We’ll
just have to wait and see on that, though.

Me: Do you think it eventually can become mainstream?

Tony: Yes, I do think resource harvesting will become mainstream.
If we look back to the start of the current virtualization epoch, the
major problem it solved was freeing up stranded resources. Back
around the turn of the century (wow I feel old saying that) any
time someone needed a new workload you bought a new server,
data centers were growing at an unsustainable rate, budgets and
real-estate couldn’t sustain it.

I think we are in a similar position again today; we can recover all
these resources, except we are now using bigger silos. Instead of
having a server per application, we have a platform (physical or
virtual) per workload. VDI is its own workload, HPC its own, AI its
own, business critical apps, and so on.

All of these silos will be broken down again and I think resource
harvesting is one tool in the IT admin’s arsenal to do this. To do
more with fewer resources. Just remember to hold on, as this will
be a wild ride.

Me: What plans do you have with the original script?

Tony: I hope to keep maintaining and adding functionality to the
original PowerShell / PowerCLI scripts I created for VDI by Day and
Compute by Night, and keep them free for anyone to consume as they
see fit.

Regardless of whether this ever becomes commonplace or not, I’m doing
this because I’m learning so much and I can’t keep that knowledge to
myself. The best way for me to share that knowledge is to make it
available to everyone and allow everyone to ask questions; only then
can this work reach its full potential.

Johan, thanks for the opportunity to share a little about this
project!

If you’d like to know more about Tony and his adventures as the
Wondernerd, check him out on Twitter: @wonder_nerd

DATA SCIENCE ON VDI
A use case closely related to the VDI by Day, Compute by Night
section is the data scientist. Although a lot is already written about
it, I still want to explain a bit more about this use case and how
you would be able to support them on a VDI platform.

Due to the nature of their work, a data scientist is a use case that
could really push the limits of a VDI platform. They work with
data, sometimes a serious amount data (multiple TBs). That data
quite often needs to be analyzed, which they do with applications
they either build themselves or which are slightly adjusted to their
work or industry. They utilize Artificial Intelligence frameworks
within those applications to accelerate the applications on GPU
hardware. And typically, those are consumer-grade GPUs or other
accelerators. In the same project as mentioned in the previous
section, I got to work with multiple different data science teams
and got a pretty good understanding of their goals, their
challenges, what drives them to get to work in the morning, and
how a VDI could for sure help them in getting the most out of their
workday.

AI vs ML vs DL

Before we dive into the details, I think it’s a good idea to explain a
bit about Artificial Intelligence, Machine Learning, and Deep
Learning. These are terms which are quite commonly used within
data science departments and can mean the same to certain people
and completely different things to others.

Let’s start with Artificial Intelligence (AI). AI is intelligence
demonstrated by a machine of some kind. It doesn’t have emotions or a
consciousness, like natural intelligence. There are two main forms of
AI:

General AI – A form of AI that doesn’t have a specific purpose or
goal. It’s capable of doing basically anything you would ask it to
do. A perfect example is J.A.R.V.I.S. from Tony Stark’s Iron Man.
J.A.R.V.I.S. is Tony’s personal assistant, both proactive (he warns
Tony when something is wrong or needs his attention) and reactive (he
will do something when Tony requests him to). Although a system like
this seems viable, it really isn’t yet; we simply don’t have the
computational power to pull it off. We do see tiny snippets of the
capabilities of J.A.R.V.I.S. in things we use on a daily basis.
Personally, I’m a big fan of Google Assistant and use a Sonos speaker
with the Google Assistant in my kitchen. For use cases like the
shopping list or controlling lights and music, I think it’s really
good. Which is a good segue to the other form of AI.

Narrow AI – This is a form of AI that has a specific goal or purpose.
In the case of Google Assistant, it can convert speech to text and
use the text for the examples mentioned above. Another great example
is a smart doorbell. Some of those doorbells are capable of running
object recognition in the video stream and will notify you whether
it’s your in-laws at the door or your food delivery. If you’d like to
know more about narrow AI, I highly recommend watching the Netflix
documentary about Google’s DeepMind platform, which was trained to
play the world’s most complex board game, called Go. A team built a
model and application called AlphaGo and had the goal to beat the
world’s best Go player: Lee Sedol. The documentary guides you through
the process they went through to build, validate, and run the model
in production.

AI has a subset, which is called Machine Learning (ML). ML is the
study of computer algorithms that improve automatically through
experience. It basically means an algorithm is capable of predicting
or even deciding something without being explicitly programmed to do
so. ML uses an algorithm which has been trained with structured data.
A perfect example is a spam filter. A spam filter is capable of
predicting if an email is spam or not. We as consumers of spam
filters have been training ML algorithms for decades and are still
doing that. If you have an email which you consider to be spam, you
hit the spam button in your mail client and done, you have a trained
algorithm. As soon as multiple people mark the same email as spam,
the spam filter knows about the characteristics of the email and can
use them to predict if new incoming emails are the same spam or use
the same way of spamming you. Emails are structured data, since every
email contains the same objects: a sender, recipient, subject, and
body. Based on that data, it will be capable of detecting spam. An
algorithm or model contains layers. For this type of data, in the
case of a spam filter, the number of layers (which represent a number
of conditions or possibilities) is relatively limited. There are also
types of ML that require a large number of layers, with object
recognition in video for instance. This requires a different approach
called Deep Learning (DL).

DL is a subset of ML and is capable of running a large number of


layers and more importantly, unstructured data. Images or video
are a great example of unstructured data. The data can contain
anything: cat pictures and movies, handwriting, medical images,
etc. Due to the nature of the almost endless possibilities and
conditions, a model or algorithm in DL is also known as a Deep
Neural Network. It’s a network of neurons, like your brain, with
connections between them which all have a value assigned to them
called a weight. The weight will be valued based on how the model
is trained. If you train a model to detect cats, dogs and ewoks, you
basically feed a lot of those pictures into the model which you
have labeled up front. The model now knows how to predict
which of the three it could be in case you feed a new picture
through the model to validate it. If you feed a picture of a bear

246 VDI Design Guide Part ||


through the model, it will most likely predict it’s an ewok,
although you know it’s not. This is due to the fact it simply takes
the weighted characteristics it knows are runs a prediction on a
new picture. It might predict it’s 40% ewok, instead of 99% when
running pictures of Ewoks through the model.
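
To illustrate what “feeding a picture through the model” looks like
in practice, here is a minimal TensorFlow/Keras sketch. The 64x64
input size, the layer sizes, and the random “picture” are made-up
assumptions, and the model is untrained; it only demonstrates the
shape of the workflow (one confidence value per class), not a
working Ewok detector:

import numpy as np
import tensorflow as tf

# A tiny three-class image classifier (cat, dog, Ewok).
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu",
                           input_shape=(64, 64, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(3, activation="softmax"),
])

# A real project would first train on labeled pictures; here we just
# run a forward pass on a random "picture" to show the output.
picture = np.random.rand(1, 64, 64, 3).astype("float32")
probabilities = model.predict(picture)[0]
for label, p in zip(["cat", "dog", "ewok"], probabilities):
    print(f"{label}: {p:.0%}")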

DL especially is highly suitable to run on accelerators such as a
GPU, because of the parallel computational capabilities of
such a card. GPUs have thousands of cores and have direct access
to the video memory of the card. Loading the model and dataset
into the GPU will decrease the training time, which for data
scientists is essential.
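
If you want to quickly verify that such a desktop actually exposes
its (v)GPU to the framework, a couple of lines of Python are
enough. A minimal sketch, assuming TensorFlow is installed with
GPU support:

import tensorflow as tf

# List the accelerators TensorFlow can see; on a vGPU-backed
# virtual desktop this should show at least one GPU device.
print(tf.config.list_physical_devices("GPU"))

# Work placed under this scope runs on the first GPU, so the data
# is processed by the GPU's cores directly from its video memory.
with tf.device("/GPU:0"):
    a = tf.random.normal((4096, 4096))
    b = tf.linalg.matmul(a, a)
print(b.shape)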

DATA SCIENTISTS != DEVELOPERS
The biggest misconception about data scientists is that they are the
same as developers. They write code, so they must be developers, right?
Nope! The data scientists I have worked with are able to write
code in Python, but that’s basically it. Things like containers,
CI/CD pipelines and code repositories aren’t that common in the
data science world. This is where I think we as IT people need to
help them. I strongly believe it’s not that hard to educate them in
those things, if we do it in the right way. We need to show them
what the big advantages are and instead of throwing them to the
lions, just show them a little bit at a time. First, show them what
Docker is and how easy it is to pull a deep
learning container from NVIDIA’s GPU Cloud
(https://ngc.nvidia.com) and run their own code
in it. Integrating it with a code repository like
GitHub can also be relatively easy so they can
use features like versioning and collaboration. As
a VDI nerd, I honestly never touched upon these
tools before. I got dragged into it because I wanted to know all
there is to help the customer in the best way possible. The learning
curve wasn’t that steep and if I can learn about those tools, others
should definitely be able to learn it, as well.

In a situation where data scientists need to work on a VDI
platform, they ideally create their code inside the virtual desktop,
commit the changes to GitHub and a process automatically creates
a Docker container with the deep learning frameworks and their
code, and it gets automatically deployed to a container
orchestration platform like Kubernetes (K8s). There are a couple of
reasons why I think this might pose an issue or need extra
attention.

The code that gets created is mostly executed in real-time to work
on a dataset. Quite often, a console session is used to run a Python
script and immediately check the output. Console access to a
container can be challenging if it runs on an orchestration
platform. This can be solved by using a tool like Jupyter Notebook
or Jupyter Lab. These are HTML5-based interfaces that offer access
to the code and can provide the real-time output in the same
browser window.

In general, the code requires a GPU to execute. Although Docker
does support GPUs, K8s needs some love to get this to work. If
you run the container inside a VM or K8s host that has a GPU or
vGPU profile attached, it will surely work. But, if you would like
to run the container on vSphere with K8s natively, this isn’t
supported yet. There are some alternatives like VMware Bitfusion,
which lets you offload GPU acceleration for compute workloads
over the network to a host equipped with GPUs. This does work
pretty well, but as these solutions are constantly under
development, it’s always a good idea to contact your VMware
sales rep to check what your options are.

The idea behind a Docker image is to keep it as small as possible
and let it just run the bare minimum, so it is easy and fast to
dynamically compose when needed. The bare minimum for a
TensorFlow Docker image from the NGC platform is already 6 GB.
Composing does take a bit of time, and because of the large size,
will have an impact on the storage capacity.

The container needs access to the data. As mentioned earlier, this
can be relatively small, but don’t be surprised if they need access
to multiple terabytes of data which will hopefully reside on a
network-attached storage device. In case the data is on an
external flash drive (I have experienced this multiple times) you
need to figure out a way to give them access. From the same
experience, I can share that redirecting the flash drive through
USB redirection is a bad idea. I have seen file transfers of 50 GB of
small images between an endpoint and a virtual desktop easily
take half a day (even though the end user was working locally
inside the company network). Having a central location to store
the datasets will also make it a lot easier to integrate with an
orchestration platform. As mentioned in the VDI by
Day/Compute by Night section, fast network
connectivity with low latencies is essential to get the most out of
the platform.

Of course, it could be that the organization is already working on
dedicated GPU-enabled hosts or things like an NVIDIA DGX
appliance. Or maybe they are running their acceleration in a public
cloud. In those cases, it might be relatively simple to migrate them
to a VDI. The only thing you probably need to take care of is, again,
a fast network between the GPU platform and the location they
build their code from or load the datasets from.

What if you have absolutely no GPU platforms
yet, would like to migrate everyone to VDI, and
want to use NVIDIA vGPU profiles in the virtual
desktop for ML instead?

This is the section with the longest title and also, the one that I
enjoyed a lot in terms of research. Some customers just like a lot of
simplicity or simply don’t have the funding to build massive GPU
clusters to run their data science workloads on. The phrase “but we
already have GPUs for the virtual desktops, so why would I invest in a
separate GPU cluster?” is also quite commonly heard. And what
about “Yeah, but they are now working with cheap consumer-grade
GPUs, so it will only be better if we migrate them to a VDI with GPUs,
right?” These are all great arguments, but in the end it’s very
important to understand the workload you want to migrate to the
VDI platform. ML workloads can vary a lot in terms of resource
usage, hardware constraints, deployment methodology, data,
operating system requirements, etc.
If you are starting with a greenfield environment for your data
science use cases, it’s important to assess the workloads.

• What kind of workload are they running?
• Does it require a GPU or other accelerator?
• What kind of framework (like TensorFlow or PyTorch) are
they running? What version of the framework?
• Does it support CUDA? Which version?
• Are there any specific libraries that need to be used?
• Is Docker supported for the training workloads?
• How big is the dataset?
• How big are the individual items in the dataset?
• Where is the dataset loaded from?
• What kind of network speed and maximum latency are
required?
• How often does it need to be trained?
• How do they train it?
• How long would that normally take?
• Do they need GPUs to run the trained model in an
application (inference)?
• How much framebuffer (video memory) does it require?
• Does the training app require multiple GPUs?
• How many CPUs does it need?
• How much RAM is required?
• What kind of operating system is required?
• What script language do they use?
• Is console access required to the desktop?
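
Several of these questions can be answered straight from a Python
session on an existing (physical) workstation. A minimal sketch,
assuming PyTorch happens to be the framework in use (swap in
the equivalent TensorFlow calls if it isn’t):

import platform
import torch

# Environment facts that feed the assessment questions above.
print("OS:", platform.platform())
print("Framework:", torch.__version__)
print("CUDA version built against:", torch.version.cuda)
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("GPU:", props.name)
    print("Framebuffer:", round(props.total_memory / 1024**3, 1), "GB")
    print("GPU count:", torch.cuda.device_count())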

Based on the answers, you need to figure out the following things:

• The specs of the desktop VM:
o CPU count
o RAM size
o Disk sizes
o Network speed and latency
• Can they use NVIDIA vGPU or do they need access to a
non-shared full GPU or multiple GPUs?
• In case an NVIDIA vGPU is suitable, what type of profile
can they use?
• Can the VM be non-persistent or not?
• Will they use the data science desktop you are designing
also for their email and main office use?

The gathered information would normally be used for the design
of the Horizon Desktop Pools. In this case, though, I would do it a bit
differently. My experience with different types of data science
departments from different customers is that they don’t have a
hard requirement for a certain version of a certain Linux
distribution. Nine out of ten times, they just prefer a certain
distribution because they have some experience with it. It could
become a real hassle to run all of the different flavors of Linux, so I
would just recommend that they move to a single distribution
for all of the separate users you will accommodate. This will avoid
an OS sprawl and will reduce the management overhead as much
as possible. Remember, Linux VDI is serious voodoo and will
introduce complexity. Keeping the complexity to a bare minimum
will save you from those anger management sessions with a coach.

DESIGN CONSIDERATIONS
Supportability is key. Find out which combination of Linux,
VMware hypervisor, VMware Horizon, NVIDIA drivers, and
frameworks like CUDA, TensorFlow and PyTorch are compatible.

VMware Horizon 8 (version 2006) introduced Linux Published
Desktops and Applications for Multi-Session use. This feature will
enable you to just publish a certain application from a Linux
desktop to an end user. If they just rely on a single application, it
might be an option to publish it.

Ideally, you want to run data science frameworks from a Docker
image (as this will reduce even more complexity). Docker uses its
own virtual IP address that might cause network issues for the
Horizon agent. Make sure to specify the IP range used for the
Horizon connection in the Horizon agent configuration file. See
my blog for more details:

https://vhojan.nl/mastering-voodoo-building-linux-vdi-for-deep-
learning-workloads-part-2/
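
To give you an idea of the fix: on my lab machines this came down
to a one-line change in the Linux agent’s custom configuration file.
Treat the exact key name and subnet below as assumptions to verify
against the Horizon version you are running (the blog post above
has the full story):

# /etc/vmware/viewagent-custom.conf
# Point the Horizon agent at the desktop’s real subnet, so it does
# not pick up Docker’s virtual bridge network instead.
Subnet=10.20.30.0/24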

One of the things that VMware vSphere does really well is
resource management, especially for CPUs, RAM, Disks, and
Networking. vSphere DRS, vSphere Storage DRS, Network I/O
Control and Storage I/O Control are all great features which will
help you to avoid Noisy Neighbor Syndrome. The challenge here is
with accelerators. At the time of this writing, VMware vSphere is
not yet capable of assessing the resources of an accelerator and
using it for resource management. I have worked with VMware
vRealize Operations and the NVIDIA vGPU Management Pack
before to fill in this gap, but it still might need some manual
intervention (depending on your situation). One thing you want to
consider is to run your resource management quite conservatively.
Although VMware introduced vSphere vMotion with vGPU
support, it has its challenges. Depending on the framebuffer size,
the vMotion process will cause a short stun (pause) in the VM.
The stun is caused by the absence of a precopy feature in the
vMotion process. In a normal vMotion operation, the precopy will
make sure that the end user won’t take any notice of the VM being
moved to another host. Because the vGPU vMotion isn’t capable of
running a pre-copy, the synchronization between the two hosts
will take a bit longer. The bigger the framebuffer, the longer the
stun times. For virtual desktops with a 1GB vGPU profile, I have
seen stun times of a couple of seconds. With a 16 GB vGPU profile,
this could take up to maybe 20 seconds. Of course, this depends on
the type of hardware you are using. The challenge is that some of
the ML applications that use a GPU can be intolerant to the stun
and will crash. Always validate with your data science teams how
their own applications cope with vSphere vMotions. As a final
remark, I know that VMware engineering is actively investigating
pre-copy algorithms for GPUs. Of course, it’s hard to say if and
when it will be available for customers.

HARDWARE CONSIDERATIONS
The hardware considerations for data science use cases can be
quite similar to the ones of VDI by Day, Compute by Night. Most
of the workloads I have seen so far are really resource intensive on
most of the hardware resources you might want to virtualize.

GPUs

• Make sure you get the right GPU for the job. The right
GPU should be determined based on the framebuffer size,
the number of cores, and its computational capabilities
(such as double precision calculations).
• Determine the best way of assigning a GPU to a virtual
desktop. NVIDIA vGPU will offer the best flexibility but
could cause a bit of overhead. Passing through a whole
GPU will reduce flexibility but might increase the
performance of the accelerated process. I have seen both
but would surely recommend testing the various options
with a proof of concept.
• VMware vSphere Bitfusion might be an option to
virtualize a GPU. In this case, the GPU will be virtualized
over the network. The biggest advantage is that it’s really
easy to share the GPU amongst different end users. The
downside is that Bitfusion only supports CUDA
offloading. This means that if you use a GPU for both
graphics and computations, Bitfusion won’t be an option
(unless you want to use Bitfusion just for computations
and vGPU for graphics). If you want to know more about
VMware vSphere Bitfusion, check out the following blog
post (including a demo video):
https://vhojan.nl/announcing-vsphere-bitfusion-as-gpu-
acceleration-platform/

CPUs

• Like the GPU, it’s important to choose the right CPU for
the job. Your end users will most likely run a virtual
desktop with a larger number of vCPUs than “normal”
users. 6 to 10 virtual CPUs is quite common.
• I have done some extensive testing with both AMD CPUs
(EPYC Naples and Rome) and Intel Xeon CPUs and due to
(again) the NUMA boundaries, found out that when
exceeding 4 vCPUs on an AMD-based system, the
performance will significantly drop compared to the Intel
ones as soon as you scale the number of VMs up on a
single host. More about this can be found in the blog by
Frank Denneman:

https://frankdenneman.nl/2019/10/14/amd-epyc-naples-
vs-rome-and-vsphere-cpu-scheduler-updates/

Another great resource is a blogpost from my good friend
Tobias Kreidl:

https://www.mycugc.org/blogs/tobias-
kreidl/2019/04/30/a-tale-of-two-servers-part-3

• Choosing a CPU with a lot of cores and a high clock speed is
essential. I have seen a serious positive impact on the
throughput of the GPU in case of a high clock speed. Intel
Xeon Gold 6254 (3.1 GHz / 18 Cores) is a great example.
Sure, 4 and even 5 GHz CPUs are now available, which
make 3 GHz CPUs look slow by comparison. They are of
course quite expensive. I’d recommend going for CPUs
with faster clock speeds and fewer cores if you want the
best performance. Clock speed also directly affects storage
and network I/O, context switching, and cache
performance.
• I have seen some teams who build and work on ARM-
based systems, especially when looking at embedded
platforms. Most of the frameworks and applications that
run on ARM-based systems are most likely compatible
with x64 systems, as well. Do make sure to validate this.

RAM

• It’s quite common to work with large RAM sizes. 64 GB
isn’t an exception. With the price of RAM being relatively
low, I don’t think this will lead to a lot of issues.

Storage

• Considering the type of storage can be a challenge,
especially when your end users work with larger data sets.
I would always try to convince them to run their
applications on their desktops but work with data on an
external (fast) network storage.
• In case you do need to run the data locally on the virtual
desktop, size the disks accordingly.
• VMware vSAN can be a perfect solution to run the virtual
desktops on but take configuration maximums into account
and know how vSAN will manage larger disks. I have
seen vSphere admins just shut down a vSphere host
because vSAN was still migrating multiple TBs of data
from one host to another because of a maintenance mode
operation. After the host came back online, the data was
corrupted and the VM had to be restored from the backup.
• If you are using VMware vSAN (or any other HCI for that
matter), consider choosing proper flash devices for your
virtual desktops. Not all flash devices offer the same
performance and endurance for these types of workloads.
Devices can simply fail because of the large number of
reads and writes on the flash device.
• For one customer, the data science team worked with
tabular data. The ML model they wanted to train had
massive dependency on storage. Although the CPU and
GPU were sized correctly, when running a couple of test
runs with their training script, we couldn’t maximize
throughput on the GPU. The dataset was around 60 GB in
size and placed locally on the virtual desktop. Although
everything was running on seriously fast, enterprise-grade
flash drives, we found out these were the actual
bottleneck. From an ML expert at VMware, we heard
about Intel Optane Persistent Memory modules which
basically are RAM modules which you can present to ESXi
as a datastore. After adding a set of modules to a host and
moving the VM to the datastore we ran the script again
and were blown away by the positive impact. Training
time of a single epoch decreased by almost 20%. In this
case, the importance of host design and knowing what
kind of application will run on the platform are essential
for a successful outcome.

Networking

• For some time, I was under the impression that 10 GbE
was going to be enough, even for workloads like these.
Boy, was I wrong! Before my first ML project I always
thought that low-latency, high-speed networks were
something for telcos. On that project, I worked with
microscope images from so-called wide-field microscopes.
A single image created from such a microscope could
easily be multiple TBs in size. Imagine what happens if
you’d like to run your training or inference job over the
network on such large amounts of data. Challenge indeed.
• Multiple vendors offer network cards which can deliver
such networking capabilities. I have worked with
Mellanox (which was acquired by NVIDIA) and Intel in
this space, which both offer great products. It’s no surprise
that both of them are part of a new VMware project called
Project Monterey, along with NVIDIA and a couple of other
companies. The goal of the project is to introduce smart
NICs which are capable of running part of the ESXi
hypervisor on the actual NIC. This will offload a lot of
networking operations to the NIC, and because of
dedicated hardware for those tasks, it’s very likely that
this will even increase bandwidth, decrease latency and
will open up for more use cases like network-based
security, directly on the NIC.

INTERVIEW WITH JUSTIN MURRAY
During the time I was working on one of the projects where we
needed to virtualize data scientists, I regularly jumped into a new
rabbit hole. Data science itself isn’t really new, VMware isn’t new,
Linux isn’t new, GPUs aren’t new, ML frameworks aren’t new, but
the integrating them on-premises inside a virtual desktop surely
was. The thing with the whole stack is, and you have read that in
the section about VDI by Day, Compute by Night as well, that it
can become a real shitstorm if you don’t really know what you are
doing. The first time I ran into issues, I didn’t really know who to
reach out to, until someone introduced me to Justin Murray. Justin
works in Technical Marketing at VMware and covers the data
science side-of-things. If you ever have the ability to attend a
VMworld conference, be sure to attend one of Justin’s sessions. He
talks with passion about VMware’s AI and ML solutions and will
never disappoint the audience with boring demos. Justin inspired
me a lot with his awesome demos as they are clear, show the true
value of a solution, and use state-of-the-art technologies. Another
reason for attending Justin’s presentations has to do with his voice.
If you ever watched one of the BBC’s documentaries on nature,
such as Planet Earth, you know that the narrator (David
Attenborough) truly brings value to the documentary. His British
voice takes you on the journey to Africa, or South America, or
wherever the documentary takes you. Justin is VMware’s own
David Attenborough, and offers the best narration of technical
demos, ever!

Me: You have been with VMware since 2007. How did you end up
with VMware?

Justin: Yes, I joined VMware in 2007 and it still feels like it
happened yesterday – I am still learning a lot about the
technology! Before VMware, I was working for several years at HP
in Cupertino and in the Bristol labs in the UK on HP-UX, web
services and J2EE technology with a focus in the last year or so on
financial services customers. In the early 2000’s, I was involved in
helping customers and partners to get the very best performance
out of J2EE and Java for their applications on HP servers. That
same theme became the reason I was hired back then by VMware,
that is, to explain how Java and J2EE could be done on virtual
machines with great performance – which was all pretty new stuff
at that time. I had worked on VMware platforms before leaving
HP, so I knew a little about it – but the learning curve really took
off when I joined the company. I think VMware was about 2500
people at that time, so we knew everyone in the field and in the
business units. I did meet the founders of the company in the
hallways and the memorable encounter there was running
alongside Mendel Rosenblum in one of the company fun-runs
early on. He was faster than I was, I remember. The company
culture was then and is still now very supportive of its people –
allowing me to learn a lot technically as I did my job, which is a
big motivator to me.

Me: How did you get into the ML field and develop your interest
and passion for it?

Justin: I had been working in technical marketing at VMware on
the big data/Hadoop/Spark areas on vSphere and helping
customers and partners to deploy those in a best practice fashion –
around about 2013-2016 or so. I was beginning to hear about ML
and the fact that it would need large quantities of data to feed the
training process – and need that data in a suitably cleansed and
organized form, which is what Spark really excels at, from a
performance and API point of view. So, I started working with
Spark on logistic regression and random forest tests with a
colleague in performance engineering, using just CPUs to drive it
and I became immediately fascinated with the algorithms that
were being used for doing this. Neural networks with GPUs and
other accelerators underneath them took it to a whole new level of
excitement – and even those are morphing now into
“transformers” for the NLP field especially. It was a pretty natural
transition for me from big data to get into machine learning and
spend some time getting to know the terminology used like
“gradient descent”, “loss function”, “back propagation” etc., and
that kind of subject caught my interest right away – just to
understand it a bit more and to be able to explain it to my field
colleagues and early customers, which is part of my role at
VMware. I am still learning a huge amount and I consider myself
to be a beginner in this field and just getting started on
understanding data science, really. Going to NVIDIA’s events has
really taken my education upwards by big leaps and bounds too.

Me: VMware has been focusing on AI and ML for quite a while
now. What have you seen changing in the last couple of years in
this space?

Justin: I think firstly that the interest among enterprise customers
has really grown tremendously in the last couple of years. People
see now that this ML architecture can be useful in all kinds of
applications, from medical imaging to financial fraud detection,
from inventory analysis in manufacturing to recommendation
engines in online stores. We have heard of retailers using ML for
processing the streams of data from cameras in their shops for
example. So, this is coming into the mainstream now – and people
want to deploy it in their existing data centers, without making a
specialized silo out of it and managing it in the same way. I also
think that the hyperscaler cloud companies like Google are
demonstrating to us what can be done at the high end of ML, such
as distributing a workload across many servers and many GPUs –
and some form of that will come down to the datacenter in the
next year or two. So, I think the general move is “mainstreaming
ML” in the industry.

Me: VMware announced a strong partnership with NVIDIA to
bring AI to the datacenter. Why is such a partnership essential in
commoditizing AI?

Justin: Yes, you are right, Johan. VMware has been in a much
closer partnership with NVIDIA for over a year now though we
have worked with NVIDIA on vGPUs and other areas for many
years. We recognized that we have to be partners with the biggest
player in the accelerated AI/ML business and that today is
definitely NVIDIA. They sell not only the acceleration hardware,
but they also have vast experience with customers in both HPC
and ML and have built software frameworks, APIs, libraries and
containers that really jumpstart the data scientist and developer in
their work in ML. You can find a real treasure trove of tools and
platforms for this at the NVIDIA GPU Cloud (NGC) – a repository
for containers, Helm charts, applications, pre-trained neural
networks and lots of vertical-specific solutions like Clara for
Healthcare or Jarvis for language recognition or Isaac for robotics
applications. What NVIDIA saw in VMware was a leader in
enterprise deployment of other apps. By combining the two
strengths, we wanted to make ML apps a manageable and
understandable workload to the mainstream enterprise DevOps
folks. So that they could deploy ML infrastructure for their data
science teams without having to make it a silo in hardware and
software. But to manage and deploy it like they do with thousands
of other apps. This has the advantage to the data science/ML
community that they can stop spending time on managing and
configuring their own machines and become part of the enterprise
setup that their VMware Admin folks have been dealing with for a
long time. So, it is indeed a win-win for both data scientists and IT
administrators.

Me: Why do you think the technology is ready now, and not, let’s
say, 5 years ago?

Justin: In the past 5 years, I think companies like NVIDIA and
several others have focused their efforts on ML in a big way
(along with graphics and gaming that was a stronghold area for
GPUs and accelerators in general). This focus has really sharpened
over the last few years. At the same time, massive quantities of
data are now available for training – and the labelling techniques
have really improved for providing valuable training data that has
accurate labels. You can startup businesses now whose entire focus
is ML operations or MLops, for example – like Algorithmia. So,
subsets of the ML landscape are becoming businesses of their own.
The amount of innovation that has gone into models like deep
neural networks and transformers (like GPT2 and GPT3 from
OpenAI) has shown that ML can really be very accurate and
impressive in predicting, analyzing and even synthesizing new
data. So, this all comes together to make a really energetic and
powerful move in the industry, I believe.

Me: What is the limiting factor in AI at this moment? Is it
computational power? Or something else?

Justin: At the very high end, teams that do research work on
neural network models like GPT-2 or GPT-3 have thrown a huge
amount of processing power at the problem, such as thousands of
GPUs. But that is not the common case that we see in enterprise
customers. The limiting factor at the very high end of the industry
is how much acceleration power can I afford to dedicate to these
networks. Enterprises on the other hand may have a much smaller
set of accelerators to use for their applications, such as dozens or
so. So, I think a key obstacle for enterprise folks is how do I design
an adequate ML infrastructure from what I have – and how do I
monitor it to make sure I know how it is responding to business
traffic. I think we need to master a systems design approach to
this, where we understand sizing, where bottlenecks may appear
and how we make good use of the accelerators we have in-house
before we purchase more of them. This will mean that we have to
learn from our first deployments of apps that use ML in
production and get some best practices for doing this organized. I
think that is the limiting factor today for the people who are
adopting it now – along with how to do change and release control
over multiple models, datasets and versions all at once – the field
of MLops, in other words.

Me: Do you think an AI-driven supercomputer like Iron Man’s
J.A.R.V.I.S. or Skynet from The Terminator movies, which can
basically do anything, will ever happen?

Justin: I think the intelligence is fairly basic today in AI and is way
behind human comprehension and intelligence. But work is being
done in the research labs to allow machines to assist us as humans
rather than surpass us in capability, and I think that work has great
promise. I don’t think the machines are going to be super-human
versions of us for quite some time to come, if at all.

Me: If someone is new in this space and would like to get a jump-
start, what resources would you recommend?

Justin: Well, if you are interested in the programming side of
machine learning, there is a wealth of tools and libraries out there
to download and learn about ML. You can get hold of TensorFlow
and PyTorch with all their associated libraries fairly easily and run
them on your laptop or desktop. One flavor of TensorFlow runs on
mobile phones! There are also lots of example applications around
to look at and try them out. If you are at the very beginning, I
would look at the TensorFlow Playground application online just
to get a feel for what the various parameters of a neural network are,
like the learning rate, or the activation function that fires at the tail-
end of each neuron. What I found intimidating at the beginning
was the actual terminology used in the field. So, I would search
online to find out what the term “Loss function” or “stochastic
gradient descent” mean – and it is amazing how much you can
learn just by doing that and following the links to other terms. Pick
up Python if you are interested in programming and you can find
small samples online to play with.

Me: Let’s jump over to your demos. In the introduction, I
mentioned they never disappoint. I even heard a rumor that you
were responsible for the demo that Jensen Huang and Pat
Gelsinger showed during the announcement of the partnership
during VMworld. Was it different for you to build a demo which
was going to be shown to tens of thousands of people?

Justin: Well, a whole team of people was involved in building
various demos for the VMworld event, so I was just one member
of that team. We already had powerful ML platforms like the Clara
healthcare framework from the NVIDIA GPU Cloud (NGC) to
take and re-use, as well as the infrastructure to support them like
virtualized GPUs or vGPUs. So that was a useful experience in
using other folks’ pre-trained models to do inference on medical
images, for example. The NVIDIA team was key to building the
keynote demos and they did a great job on that, as well as creating
and supporting the ideas for them. So, this was really a joint effort
on the part of both companies. There are really impressive and
visual demos from these teams including the Intelligent Video
Analytics (IVA) one from NVIDIA that shows recognition of
models and makes of cars in urban settings from cameras, as a
potential helper for smarter cities, for example. We had a lot of
good examples to work with!

Me: For anyone who would like to build their own effective
demos, what three takeaways would you have?

Justin: Just some guidelines here – there are outfits that will teach
you how to build effective demos:

1. Plan your demo out first, with a written storyboard - so
that the most effective *main point* comes early in the first
30-60 seconds. It may be a business-related point or a
technical one but get it illustrated right when you start out.
You don’t need to go through the login process and follow
a linear pattern through your software to get to the end.
You can jump straight to the exciting part and skip the
intervening steps, e.g., “here is how the GPU is helping the
workload go very fast in a moving graph form”

2. Keep the whole demo very short – stay within 2-3 minutes
if you can

3. Make it as visually appealing as possible – not just busy
user interfaces that have a lot of text on them – but
pictures of other objects

If you’d like to know more about Justin, follow him on Twitter:

@johjustinmurray

GAMING ON VDI
(REALLY?!?)
F*ck Yeah!

You must be thinking, “Gaming on VDI, seriously?!?” Yes, I’m as
serious as one can be. And I’m not the only one. As mentioned
earlier in the book, Microsoft, Google, and NVIDIA have invested
heavily in a gaming platform that runs the actual game inside a
virtual desktop. If those companies can do that, there must be a
use case, right?

Funny enough, they all claim the primary use case is user
experience (the game will run closer to gaming servers) and even
gamers with a lower bandwidth (+/- 10 Mbps) should be able to run
games in full HD with a good framerate. The reality is that the
choice to offer those platforms is really financially driven. Look at
Microsoft. Lots of people own an Xbox. They can play games on
the device locally. Microsoft already earns money on selling games
(as they claim that the consoles are sold without profit) to those
Xbox owners. The whole idea is that they can offer the same games
to other people, as well, as you don’t need an Xbox console to
stream their games from the hosted gaming platform. At the time
of this writing, an Android tablet or phone with an Xbox controller
is sufficient. In 2020, almost 75% of all people in the world who
own a smartphone have one powered by Google Android. Sure,
not all of those phones are powerful enough to run those games
with a comparable performance to an Xbox console, but you get
the point. Billions of people are potentially new customers for
Microsoft and can stream games without buying them, but rather
just pay on-demand. And it’s not just games from Microsoft. You
can play all sorts of games on the platform, just like with Google
and NVIDIA (who don’t even develop games). This brings another
interesting shift to the table, a financial one. Gaming studios
normally invest in designing games for multiple platforms: Xbox,
PlayStation, Windows 10, macOS, Android, iOS, etc. Especially for
Android and iOS, Google and Apple earn money for every app (or
game in this case) that gets sold in their app stores. With the
remoted gaming platforms, that financial transaction is moved to
the platform owner instead. This is also the reason why (as of this
writing) Apple still doesn’t allow for a game streaming app in
their app store yet.

Another advantage for gaming studios is that they don’t have to
develop different distributions for the games, as they just have to
develop it for the streaming platform. Platform independency is of
course also one of the benefits of a remoted application through
VDI. The streaming client (in our case the VDI client) is the only
component which needs to be installed on the client device, and
that’s it.

I can imagine that you might be a bit skeptical about this, but
please don’t forget that gamers are one of the pickiest and most
critical users you can imagine. They enjoy gaming and heavily
invest in gaming PCs with powerful GPUs and fast CPUs. The
smoother the game runs, the better it is for the gameplay. And the
faster it runs, the lower the latency will be. And this is key.
Latency in a first-person shooter, for instance, will probably get
you killed more easily. It’s that simple.

Now, if we can offer a game through VDI to a gamer, why
shouldn’t we be able to do the same to less-picky end users? ;)

I asked myself this exact question a couple of years ago. I was
working with a certain customer as an architect in an early stage of
a big VDI project. The customer already invested in licenses prior
to my engagement as an architect, so I needed to find a way to
satisfy the requirements of all of the individual use cases of the
customer. As the project was mainly driven from a technology
perspective and not a business perspective, there was a lot of
resistance. They really wanted to manage a non-persistent VDI for
all, while the business had the misconception that VDI
automatically means you have to take a step back in terms of user
experience, user privileges, and performance. Most of those
misconceptions are created as a result of the IT-centric workspace.
But a VDI can surely fit into the user-centric workspace or even
into the digital enterprise; you just need to design it with that
strategy in mind. It was my task to take those misconceptions
away, and so I did.

I like to take the art of the possible into projects. Not just to show
the potential of a solution, but also to see if certain aspects of such
a futuristic showcase might actually be viable for a production
version of the solution. Building awesome showcases is one of the
things I like most about my job at ITQ, especially showcases which
give the audience a smile and really show the potential of a
solution. Car manufacturers do the exact same thing with concept
cars. With their concept cars, they try to inspire potential buyers of
their production cars, but also show what their design strategy is
for the next couple of years.

In the case of this customer, the misconception mainly existed
within their core applications: single-threaded, graphically intense
applications with a large footprint and an update cycle of multiple
updates per month. We wanted to run a proof of concept with
their applications, but due to a time constraint we weren’t allowed
to do so. Instead, we tried to find an application
which had the same characteristics and quickly thought of a game.
I’m not the biggest fan of first-person shooters (just because I suck
big time at them), so we went for the F1 game I mentioned before
in earlier sections, instead. It is just as latency sensitive, and
heavily depending on graphical performance, and with the extra
complexity of a weird peripheral. To run a racing game properly, a
steering wheel and throttles are really required. It really enhances
the user experience when playing, but it also introduces challenges
as those peripherals were never created to be remoted through a
connection protocol.

I never read manuals, so we started the build by just doing what
we thought was best. And we didn’t run an assessment of some
kind to benchmark the game, we just used the defaults that came
with the game. Based on those specs and a bit of hardware we had
lying around, we built a VDI box that would probably be able to
run it. Once again, our assumptions proved us wrong. Initially, we
built a system including the following specs:

• Dual Intel Xeon E5-2660 V3
• 128 GB RAM
• Intel Optane NVMe-based storage
• 2 x NVIDIA Tesla P4

Just for the sake of the experiment, the idea was to run the game
on a fast local datastore, so we went for an NVMe drive. The Intel
Xeon E5-2660 V3 has a base frequency of 2.6 GHz, which to us
seemed enough.

For the first test, we built a VM with the following specs:
• 4 vCPUs
• 16 GB RAM
• 100 GB virtual disk
• P4_8Q vGPU profile
• Windows 10 1809
• F1 2018

Directly after the build, we installed Steam and downloaded the
game. Without any proper tuning of the connection protocol, or
tuning at all for that matter, we launched the game. It started
relatively quickly and had a decent UX… until we got into the car
at the start of a Grand Prix. As soon as the number of frames
increased, the performance and UX dropped immediately.
Another thing we noticed was that the steering wheel had a ton of
latency. The latency was so high that you had the idea of being
drunk behind the wheel. This sucked.

Right, so the first thing we did was to run some performance
monitors to collect metrics. What we noticed immediately was that the CPUs
were constantly spiking. So, we added 4 more vCPUs to the VM
and ran the game again. The performance was a lot better, but the
UX still wasn’t what we were expecting. We had another go at the
metrics. The next thing that we noticed was that the GPU was
being utilized at 70% maximum. This was weird. We were
expecting the game to fully utilize the GPU, but it didn’t. The VM
was set up with a vGPU profile, so maybe this could be due to the
scheduler of the GPU. So, we changed the scheduler to a fixed
share to make sure the VM could get all the resources it needed.
This time it had just a little impact to the performance, but it was
marginal. Could it be that vGPU introduces a bit of overhead? To
find out, we directly attached the GPU to the VM with PCI
Passthrough. In this case the entire GPU is directly passed through
to the VM without any overhead of virtualization. Again, a
marginal improvement of the performance and UX, but still the
GPU wasn’t fully utilized and kept running at 73%. What could it
be?

After some research and more troubleshooting, I found out that
the base frequency of the CPU we used might be the issue. As a
result, I replaced the CPUs with Intel Xeon E5-2667 V3s. The main
reason being that these CPUs have a base frequency of 3.2 GHz.
We noticed the difference directly after the first launch. The UX
was a lot better, the responsiveness was nearly perfect, and after
checking the metrics again, we found out that the GPU was now
being utilized at about 98%. Apparently, the game had some
single-threaded processes, and when combining that with GPU
acceleration, it can negatively impact your user experience just like
we experienced. Although it was a lot better, we weren’t done yet.
We still needed to tweak the connection protocol to get the most
out of the whole UX.

In case you are unaware of the impact of the connection protocol,
it can make or break a VDI. Connection protocols on average are
optimized for most task worker or office worker use cases: full HD
screens, average display quality, maybe a little bit of video to
display every once in a while, and enough compression to support
most types of connections. For Blast Extreme, there’s no difference.
Of course, we had the opportunity to use PCoIP as well, but if
you have the ability to use an NVIDIA GPU that in addition is
capable of offloading encoding to the card, it could have a positive
impact on the UX. PCoIP can offload encoding to hardware, but
you need specific Teradici Apex cards for that (which are
discontinued). Without such a card, vCPUs will be used for
encoding.

When looking at the optimization of the connection protocol, there
are a number of key principles you need to take into account:

• You need to offload as much of the graphical process to
dedicated hardware as possible. The more you utilize
dedicated accelerators, the lower the latency will be.
• You need to minimize compression where possible.
Compression reduces image quality and has a negative
impact on the CPUs, as well. Less compression means
fewer CPU resources are required.
• You need to make sure enough bandwidth and low
latencies are available. The lower the latency, the better it
is for the UX. <40 ms should be a maximum if you like to
play on 4K, <80 ms is a maximum in case of FullHD.
• Games easily run at 120 frames per second on a gaming
console or locally on a gaming PC. Make sure to remove a
framerate limiter.
• Make sure the endpoint is powerful enough to decode the
video stream. Sure, it sounds awesome to use a Raspberry
Pi 4 to run a VDI session over 4K and play games, but it
simply doesn’t work. Every frame that’s being decoded
has a certain size. It could easily be 20-30 MB; try to
process that on a Raspberry Pi 4 while also returning
steering wheel information to the other side without
latency. It simply won’t work.

So, how do you translate the key principles into Blast Extreme
optimizations? Blast Extreme has a large number of optimization
settings, but just some are necessary when looking at high-quality
use cases which require high FPS. The following list is a good
starting point. You can easily add the settings to the registry of the
virtual desktop or set them in a GPO with the Blast Extreme
ADMX template. The full registry path is:

HKEY_CURRENT_USER\SOFTWARE\VMware, Inc.\VMware
Blast\Config

These are the settings to start with:

• Enable H.264 as the display protocol codec. H.264 offers
the possibility to use a GPU as an accelerator to encode on
the server side, and all modern endpoints with a GPU can
offload the decode to hardware, as well. The following
settings are enabled by default, but just in case they aren’t,
these are the values:
o Registry setting: EncoderH264Enabled = 1
o Registry setting: EncoderNvidiaH264Enabled = 1

• Adjust the quality to the highest possible, with the least
compression. There are two main values that adjust the
Quantization Parameter (QP) for Blast Extreme. The QP
values determine the quality of the frames, based on a mix
of bitrate and frame size. There are two main values to
adjust: the minQP and maxQP. The minQP determines the
highest quality/least compression and the maxQP
determines the lowest quality and max compression. If
you want to play a bit with these settings, start with a
minQP of 10 (which is the highest value) and a maxQP of
51 (lowest value). Slowly adjust the maxQP to a value of
10 with steps of 5 (51 – 46 – 41 – 36 – 31 – 26 – 21 – 16) and
end with 10 instead of 11. You will for sure notice a
difference in image quality, bandwidth consumption, and
CPU utilizations. In our case, we went for a minQP of 10
and a maxQP of 21, which offered a perfect balance of
quality, compression and CPU utilization. We used the
following values:
o Registry setting: H264minQP = 10
o Registry setting: H264maxQP = 21

• Blast Extreme can encode 60 frames per second. Increasing
the FPS from 30 (default) to 60 will probably double the
bandwidth and has an impact on the endpoint (as it has to
decode more frames), but it will increase your UX. You
can change the FPS with the following value:
o Registry setting: EncoderMaxFPS = 60

• There is another setting which impacts compression and
bandwidth. The Maximum Screen Bandwidth setting
configures the maximum bandwidth which can be used to
transfer screen contents. This setting is 6,200 Kbps by
default. Increasing it to 100,000 will ensure enough
bandwidth can be consumed to run the remoted sessions
from a video perspective. You can adjust the following
setting to enable this:
o Registry setting:
MaxBandwidthKbpsPerMegaPixelSlope = 100000

• Blast Extreme can either connect over TCP or UDP.
Because of the nature of TCP (it’s basically two-way traffic
with an acknowledgement that the message has been
received), Blast Extreme uses TCP in case it detects there is
enough bandwidth and little latency. TCP will result in the
least number of dropped frames and will ensure the best
display quality. UDP works a bit like one-way traffic. Most
big video streaming companies use UDP as it just sends a
lot of frames, and it doesn’t really matter if a frame is
received or not. If they send over 120 frames per second
and you receive 118 frames, you hardly notice it. It also
has an impact on the circuits and fabrics they use: UDP
causes less resource consumption on the network level
when directly compared to TCP. We have done some
tests and using UDP instead of TCP by default can have a
positive impact on both bandwidth consumption and
resources (CPU) consumption. I would recommend testing
it for yourself. The following setting can be used to force
UDP as the network protocol (it’s enabled by default):
o Registry setting: UdpEnabled = 1

The above settings are a good starting point and optimized for
gaming and high-quality multimedia use cases, but there is
another option if you want to push it even further. You have the
option to go for a different codec called High Efficiency Video
Coding (HEVC). This codec increases the display quality and uses
H.265 instead of H.264. Switching over to H.265 can also have a
positive impact on bandwidth consumption, but it requires an
endpoint that is capable of decoding it, as well. If you have such an
endpoint, please note that decoding H.265 will probably require
more resources on the endpoint.

If you want to enable HEVC, use the following registry values:
o Registry setting: EncoderHEVCEnabled = 1
o Registry setting: EncoderNvidiaHEVCEnabled = 1
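
If you are going to iterate on these values (and you will), scripting
beats regedit. Here is a minimal Python sketch that writes the
settings from this section in one go. It assumes Blast reads them as
string (REG_SZ) values, so verify the value types against your
Horizon version before relying on it:

import winreg

BLAST_KEY = r"SOFTWARE\VMware, Inc.\VMware Blast\Config"
settings = {
    "EncoderH264Enabled": "1",
    "EncoderNvidiaH264Enabled": "1",
    "H264minQP": "10",
    "H264maxQP": "21",
    "EncoderMaxFPS": "60",
    "MaxBandwidthKbpsPerMegaPixelSlope": "100000",
    "UdpEnabled": "1",
}

# Create (or open) the per-user Blast key and write each value.
with winreg.CreateKey(winreg.HKEY_CURRENT_USER, BLAST_KEY) as key:
    for name, value in settings.items():
        winreg.SetValueEx(key, name, 0, winreg.REG_SZ, value)
        print(f"set {name} = {value}")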

All of the settings mentioned can be altered in real-time, which
means that if you adjust them in the registry while being in the
session, you should see the impact in a couple of seconds. There
are a couple of tools that could possibly help with the adjustments.
Take a look at GPUProfiler if you’d like to monitor in real time
what’s happening:

https://github.com/JeremyMain/GPUProfiler/

In case you’d like to adjust the connection protocol settings through
a UI, take a look at Remote Display Analyzer:

https://rdanalyzer.com

If you want to find out more about Blast Extreme and how to
optimize it for your use cases, check out the following guide:

https://techzone.vmware.com/resource/vmware-blast-extreme-
optimization-guide

I mentioned the endpoint a couple of times. In some of the demos
and sessions I presented about this topic, you can see me using a
really cheap thin client with a Celeron J1900 CPU from Atrust. We
capped the game to FullHD, so the thin client was capable of
decoding the video stream without any noticeable performance
penalties, but when playing in 4K at 60 FPS, you need something
different. Because we still wanted to show massive performance
with a tiny endpoint device, we used Intel NUCs, instead-- an Intel
I5-powered version with an integrated Intel HD graphics card. In
addition to the hardware, the operating system of the endpoint
also seemed to have an impact on the UX. We tried both Ubuntu and
Windows 10 and ended up with Windows 10 because of a
noticeable positive difference in the performance, but more
importantly with the responsiveness of the steering wheel. The
steering wheel responded a lot better and had almost no noticeable
latency when playing. With Ubuntu, we had a noticeable delay
while playing and apparently this is due to the fact that Linux
drivers aren’t really optimized for this. I’m unsure if this was the
only cause or whether the VMware Horizon Client could have impacted
the performance, but it killed the game play.

The steering wheel mentioned is a Logitech G920, which includes
both a steering wheel and pedals, and is connected through USB to
the endpoint.

To give you an idea, to reach the stage we were in at this point, it
took us multiple weeks of building, tweaking, testing, breaking,
rebuilding, more tweaking, more testing, more breaking, more
rebuilding, and finally having a working game that performs
seriously well.

It’s quite hard to show the result in a book, but one of the coolest
things happened a few months after we finalized the build. I’m
part of a team who organizes a yearly EUC conference called
VMware EUC TechCon (vEUC TechCon). The team thought it
would be awesome to have a Playseat at the conference so people
could sit down and experience the performance themselves. One
of those people was my friend Brian Madden, who had just moved
to VMware and was about to present the keynote at our event. He
played a couple of laps on the simulator and was amazed by the
UX. There’s a short clip of him playing on YouTube where you can
see this for yourself:

https://youtu.be/FzAwvqOM_gc

Here is another short clip of me and a colleague competing:

https://youtu.be/kd77JdfZZtY

DESIGN AND BUILD CONSIDERATIONS
Building a new VDI platform is mostly an IT-driven project. I
always like to have a little fun while working on such a platform.
One of my colleagues once built an Unreal Tournament ThinApp
package, just to test Horizon, GPUs, and Instant Clone pools in a
massive multiplayer session with the entire IT team. If that’s not
an example of having fun in such a project, I don’t know what is!

If you want to build something like this for yourself, please
consider a few things:

• Most games behave differently within the virtual desktop.
It could easily mean that you need to adjust virtual
resources which have been allocated to the VM. Use tools
like perfmon to monitor the virtual machine in real time.
• Not all graphical APIs and versions of APIs are capable of
running on datacenter-certified GPUs. I have tried games
which could run on different APIs such as OpenGL,
DirectX, and Vulkan. The game can run differently in
either of these APIs, so take that into account. One game
might run smoother in Vulkan while another runs
smoother in OpenGL. There’s a great article on the
comparison of modern graphics APIs, which can be found
here:

https://alain.xyz/blog/comparison-of-modern-graphics-apis

• vGPU-capable cards aren’t really designed for gaming, but
they can do a really good job. Keep this in mind. A
Quadro RTX8000 will for sure have a nice UX when
playing a game with real-time ray tracing, but a GeForce
RTX 2080 (which is from a similar architecture) will
outperform the Quadro card.
• Don’t forget to get the appropriate vGPU licenses from
NVIDIA. You will need Quadro support to be able to use
those graphics APIs, which is included in the NVIDIA
RTX Virtual Workstation (vDWS) license. If you aren’t
sure your game is capable of being virtualized, you can
actually request 90-day trial licenses from NVIDIA by
using the following URL:

https://www.nvidia.com/en-us/data-
center/resources/vgpu-evaluation/

• After installing the vGPU driver in the VM, you can optimize
the driver for a specific application or type of workload. I
would try to select a type of workload first (such as game
development). I have seen that the profile doesn’t always
have the expected result, so try a few different profiles to
see what the impact is.

THE FUTURE OF GAMING ON VDI

It’s hard to predict the future of gaming on VDI. As long as game
studios build traditional applications, there is a use case for
remoting such a game. I do think that the necessity of remoting
those games to tackle latency issues (like Microsoft, Google, and
NVIDIA are doing) will be history once 5G is generally
available. In early 2021, 5G is already available in most urban areas
in The Netherlands, which offers insanely fast wireless
connections with lower latencies. In most cases, it’s even faster
than regular DSL or Cable-based Internet connections. One of the
other values of remoted gaming platforms might become even
more important: instant access to games. My good friend Frank
Denneman (Chief Technologist at VMware) posted a tweet along
these lines the other day.

People don’t want to wait for services anymore. That’s why self-
service has such a big impact on society. While the elderly might have
a hard time using self-service portals from organizations like
utility companies, the government, or financial companies,
millennials basically grew up with this and can’t imagine a world
without. The same holds true for a consumption-based model. I
still own hundreds of music albums because I love to listen to
music the old-fashioned way (either on vinyl or CDs). I still do this
to support artists who spent a lot of time creating music that
inspires me. As music stores have disappeared from most
shopping centers (at least in The Netherlands), the easiest way to
listen to music is through services like Spotify, SiriusXM/Pandora,
and YouTube. Again, this is based on instant access and payable
based on consumption.

I guess that this will be extended to gaming even more. At the time
of this writing (March 2021) there is a major shortage of GPUs due
to various reasons. The price of Bitcoin and other cryptocurrencies
exploded (which made it profitable for miners to start mining
Bitcoin and others again), the pandemic has caused production
delays in China, and the introduction of raytracing in cards from
both NVIDIA and AMD has caused an explosion in demand for
GPUs. I have seen prices double on the black market and at sites
like eBay. Why would you still invest thousands of dollars if you
could just run a game as well on a cheap endpoint with fewer
resources? You then don’t have to replace your GPU every two
years because your favorite game requires it. And best of all, you
just pay for the remoted game services as soon as you start
playing.

I’m guessing this development will take another jump when
talking about Virtual Reality (VR). I have been part of a pilot
group to test the early version of VMware’s Project VXR, a
platform to remote apps to a VR headset. More about that project
in the next section. ☺

Like every section, I wanted to include an interview with someone
who is tied to such a topic. In this case, it’s not one, but two
interviews, the main reason being that I wanted to talk to someone
who has been one of the first to build a VDI platform to run games
(and inspired me to do the same). This person is Christian Reilly,
former CTO of Citrix.

The other interview focusses on the actual production use case of
gaming on VDI. In early 2020, I got an email from Scott Forehand,
who works as a Senior Infrastructure Engineer at Topgolf. He
wanted to let me know that with a bit of help I gave him earlier
(tweaking and tuning of VMware Horizon) they managed to
release a new game at Topgolf, fully accelerated by VMware
Horizon.

INTERVIEW WITH CHRISTIAN REILLY


Christian Reilly isn’t just the Vice President of Technology Strategy
(and former CTO) of one of the biggest EUC vendors in the world
(Citrix), he is also a visionary, one of the most accessible thought
leaders I know, and just a genuine, nice guy. I didn’t have the
pleasure to shake his hand in real life yet (due to the travel ban),
but I did get the opportunity to talk to him during an EUC Digest
podcast recording, which was a fun conversation.

I already mentioned that Christian was involved with building an
F1 simulator when he was working at Citrix (for the first time), but
this wasn’t the only reason for the interview. Christian and I have
an unresolved feud on Twitter over who is faster on a remoted F1
simulator!

We haven’t been able to resolve it yet, but nevertheless he is the
perfect guy to talk to about this topic.

Me: When was the first time you got introduced into remoting
technologies such as those from Citrix, VMware, and Microsoft?

Christian: I'm going to think back here and say that the first time I
encountered Citrix technologies was in the early days of
WinFrame - so probably 1996. I had joined Bechtel in 1995 and was
part of the Infrastructure Engineering team in the UK. Even back
then, Bechtel had a lot of locations due to the engineering &
construction project based nature of the company and so we used
WinFrame to centralize certain applications because even on an
office LAN, some of them didn't work very well (as they had been
designed to run literally on a local machine). Of course, like most
other enterprise companies around that time, we were big on
Microsoft technologies, so I got introduced to Windows NT and
the TSE (Terminal Server Edition) around a similar time period,
maybe a year or so later. I guess my real love of Citrix inside of
Bechtel came in around 2007/2008 when we rebuilt our entire
global infrastructure (for a 60,000 person company) and delivered
pretty much all our apps on a top-to-bottom Citrix stack. That
became a really cool story for both companies. Hard to believe it
was 14 years ago.

Me: When do you think those technologies came to a level in
which it was possible to remote games? Was that purely because
of virtualized GPUs?

Christian: I think it was a combination of things. First, yes for sure,
there was quite literally an innovation explosion in graphics
capability. Virtualized GPUs were one thing, but at the same time,
we had GPU Pass-Through which was also very important.
Second, the remoting technologies themselves continued to evolve
at a fast pace - things like HDX 3D Pro and being able to squeeze
faster and faster frame rates made the scrolling and overall fidelity
much more acceptable from the end user perspective. Third, and
finally, there's the core technology in the games themselves -
companies like Unity continued to push the art of the possible in
their 3D gaming engines and that was also a massive part of how
the opportunity evolved.

Me: Let’s talk about the F1 simulator you built at Citrix. Why did
you build it in the first place?

Christian: Why did we build it? Because we could! All joking
aside, there were a few things happening at the time. Citrix had,
back then, just signed their first Innovation Partnership with Red
Bull Racing and in the Red Bull factory in the UK there was a
simulator rig in the mezzanine area of one of the buildings that
was Red Bull Racing branded. Obviously, it wasn't a professional
sim and wasn't for the drivers to use, but it looked really cool. I
tried it out and quickly established that it was running the
Codemasters F1 game, powered by a Sony PlayStation. The rig
itself was just a branded Playseat and, of course, I just wondered if
we could build a solution out of Citrix products and remote the
Windows version of the same F1 game.

Me: Was this just a fun showcase for Citrix, or did it have a use
case, as well?

Christian: It was actually a little bit of both. One part of the
solution that was, quite literally, brand new - and, I don't know if
many people knew it was used - was the Workspace Hub. As
many people will know, the Workspace Hub was based on a
Raspberry Pi device and the performance of that little thing, along
with the back-end components, was really amazing. We were
showcasing it all for sure, but the use case was really for anything
that had a 3D workload. There wasn't much demand for gaming
back then, but there definitely was for many different kinds of 3D
workloads - including in engineering spaces like the one I had
spent 20 years in, so I suppose it was a combination of showing off
the cool gaming / F1 capabilities, but with a very serious set of
other use cases in mind, too.

Me: Did it just work out of the box, or did you have to do some
tweaking and tuning to get the best user experience?

Christian: The Dell pizza box hardware, Citrix Hypervisor, Citrix
Virtual Apps & Desktops and NVIDIA Tesla P40 were pretty
much all out of the box. The USB redirection for the connection to
the steering wheel, etc., just worked fine, too. There were some
configurations to the graphics policies, of course - as you might
expect - tweaking the display memory settings and such, but there
was really very little we had to do in terms of advanced or special
configurations to get acceptable performance.

Me: During your time at Citrix, have you seen customers build
VDI platforms to run similar applications (games, simulators) on
their platform?

Christian: Purely for games, I would have to say no - but there
were definitely many examples of service providers building
HaaS-based offerings for the kinds of 3D workloads that had
similar technical requirements. Thinking back, it was around the
same time (2016) that we also worked on a prototype codenamed
Project Raptor. The technology in use was similar to that of the
racing rig, but the concept of Raptor was to virtualize Virtual
Reality. We wanted to figure out a way to untether the VR headset
from the PC - i.e., remove the need for a long HDMI connection
between the VR headset and the PC as it was a limiting factor -
and provide remoted access to Steam-powered VR apps. We built
a really cool solution that used an Intel NUC on a belt that
strapped around the waist - and remoted the VR apps successfully
to that, using the HTC Vive headset. It was tricky to get a high
frame rate at first, but we tweaked HDX a little and got there in the
end.

Me: Companies like Microsoft, Google, and NVIDIA offer remoted
gaming platforms to offer instant gaming services. Although the
remoting protocols they use are really mature and offer a good
user experience, they do need improvement for remoted true high-
end gaming on 4K resolutions over the Internet. Do you think they
will eventually be able to pull this off?

Christian: Do I think they will? Absolutely. Right now, 1080p is
pretty common and if you have the right equipment, bandwidth
and data usage package then, for example, you can game in
HDR/4K on Google Stadia. One of the limiting factors will always
be bandwidth, but with the advent of more and more fiber-to-the-
home and, of course, 5G technologies, I think that - plus continued
innovation in the delivery CODECs - will remove those barriers
over time. I tend to think of it a little like, say, Netflix. The “app”
(for gaming in this case) will be more and more ubiquitous and
available for more and more operating systems and browsers.

Me: Do you think this way of offering games could eventually
become mainstream?

Christian: Yes, 100%. If you think about it, there’s probably more
money (margin) for the big companies in gaming subscriptions
than in selling hardware. There’s also a massive advantage in
being able to bring new games to market and update existing ones
without having to manufacture them onto DVD or, even, have
customers wait to download them in full to the storage on their
existing consoles. As we’ve seen with things like Minecraft,
Roblox, Call of Duty and tons of other titles, the world is moving
to multi-player, collaborative, immersive experiences. That’s also a
lot easier to deal with server-side.

Me: Let’s get back to F1. What’s your favorite racetrack and why?

Christian: That’s the easiest question so far. It would have to be
Singapore. I’ve been lucky enough to attend three races there over
the last few years. I think it’s because it’s a night race, a street
circuit and the fact that the weather is always warm. It really feels
like a special event every time.

Me: If you have one key takeaway to share on this topic, what
would it be?

Christian: One takeaway…I would say from what we’ve seen over
the past few years in these trends, the challenges we’ve seen from
global remote working during COVID, and the ways in which
we’ve been able to push the technology to deliver rich experiences
from “cloud” should tell us that the future of fully immersive
collaboration platforms that connect people for work and play is
pretty much here. The next two or three years will bring us even
more cool stuff, such as Microsoft Mesh, which will allow us to
virtually teleport inside a world of mixed reality. Now, that… will
be very cool.

If you’d like to know more about Christian, check him out on
Twitter: @reillyusa

INTERVIEW WITH SCOTT FOREHAND AND JASON SOVA


Like Christian Reilly mentioned in the previous section, gaming on
VDI can have its production use cases. This idea took another level
when Scott reached out to me in 2020. Scott Forehand and Jason
Sova work at Topgolf and, together with their team, are
responsible for the technology innovation side of things. Topgolf
was working on a new secret project which evolved to include
VMware Horizon, GPUs, and a guest-facing gaming experience
which needed to be virtualized without any noticeable latency or
frame drops and run at scale in Topgolf venues. Oh, and he also
mentioned some peripherals which weren’t really a natural
inhabitant on VDI. I won’t steal his thunder, but the use case was
probably one of the coolest I have ever heard of. Not just because
of what they are doing, but also being that they rolled it out in
production during the pandemic. Next to that, Topgolf is special to
me, since NVIDIA, Liquidware, and Igel gave me the opportunity
to organize a release party for the VDI Design Guide in 2018 at
Topgolf in Las Vegas during VMworld.

Me: How long have you been working with VDI solutions like
VMware Horizon?

Scott: Before joining Topgolf in 2018, I had the opportunity to
work with Citrix XenApp and XenDesktop. I’d only administered
and supported the Citrix environment, and there was a six-year
gap between working with Citrix and working with VMware
Horizon at Topgolf. I first got the opportunity to architect a VDI
environment when I joined Topgolf.

Jason: Before joining Topgolf in 2019 I had been working with
Horizon View for about 5 years in a traditional corporate
environment leveraging VDI for remote workers and overseas
contract engagements. While this introduced some interesting
challenges with dealing with latency and such, after the interview
process I saw an incredible opportunity to do something in the
enterprise space that not many have attempted before, which was
an opportunity I couldn’t pass up.

Me: Without any further ado, what awesome use case have you
built?

Scott and Jason: Right to the fun part! Topgolf built a 3D
augmented reality gaming platform into the guest experience at
our venues. We have a proprietary, state-of-the-art ball tracing
technology called Toptracer that we leverage to create unique
gaming experiences for our guests. We have several games
available including Virtual Courses which lets guests play a round
of golf on their favorite courses right from the bay at Topgolf;
Jewel Jam which turns the outfield into a field of Jewels, and
guests get points for matching three in a row; Angry Birds, yes
THAT Angry Birds, where guests use the Angry Birds characters
to smash down structures, defeat pigs, and earn as many points
and stars as possible – and much more.

The experience leverages a combination of Dell EMC
hyperconverged infrastructure, VMware virtualization and
NVIDIA GPUs to deliver an AMAZING experience to our guests
at a global enterprise scale.

Me: Have you ever come across such a use case before?

Scott and Jason: Over the past few years, we’ve been able to
confirm that our use case is entirely unique. As of this writing,
there are no other deployments on the planet with our use case,
software stack or hardware. That’s both a blessing and a curse,
because while it makes troubleshooting issues rather difficult, it
also makes creativity in design a necessity. It’s been a challenge,
but we’ve been able to partner with Dell EMC, VMware, and
NVIDIA to improve every component of our deployment and
innovate.

Me: How did you come up with the idea to remote the application
through VMware Horizon?

Scott and Jason: Topgolf acquired a tracing technology company
called Protracer in 2018, which was rebranded as Toptracer. If you
watch major golf tournaments on television, you’ve likely seen the
technology being used to trace golf balls. Immediately, leadership
at Topgolf identified opportunities to deploy Toptracer to Topgolf
venues, allowing guests to see their ball trace all the way to impact
in the outfield. Initially during testing, individual desktop
computers were leveraged at each bay, and several physical
servers were used to process tracing data.

The Topgolf Infrastructure team identified opportunities to
virtualize the entire platform end-to-end, and began to work
closely with Dell EMC, VMware, and NVIDIA to design a solution
that would perform and scale. After a year of testing, iterating and
dialing in the hardware and software we finally had a platform
that performed consistently and had most of the bugs that derived
from our unique use case worked out with our partners. Guest
experience soared.

Me: What challenges did you face while building the platform?

Scott and Jason: The platform was built with a foundation of
virtualization, on the backs of strong partnerships. With a unique
design like ours, there are a lot of different challenges. The first
hurdle was the foundational hardware platform. Traditional
infrastructure can be notoriously difficult to lifecycle manage, so
we chose to move into a hyperconverged platform. We had
compatibility issues with drivers, firmware, and the software
stack, so we took the time to test versions in our labs and
documented our results thoroughly. Typically, Horizon isn’t used
to deliver 3D gaming experiences, so we have been working with
our partners nearly constantly to improve every facet of the entire
platform to deliver the best experience possible. That effort spans
everything from optimizing the desktops and applications, to
helping NVIDIA develop their enterprise licensing and support
experience. Even once we put all of that to bed, there’s still
challenges with hardware lifecycles. Topgolf has three different
NVIDIA GPUs in service today, spanning Maxwell through
Turing GPU architectures. The architecture of the GPU behaves
differently when developing games on different game engines,
making development a challenge. We’re charged with delivering a
uniform and performant experience across our entire deployment.
Some of the issues that we faced took well over a year and
multiple engineering teams at Dell EMC, VMware, and NVIDIA to
fix because the use case is so drastically different than anything
else deployed today. If you’re facing similar challenges, the best
advice we can give is DO NOT give up. In the end, you’ll fix the
issue and probably help to improve the foundational platform for
other people in the future.

Me: Your end user isn’t a colleague, but a customer. How is this
different?

Scott and Jason: The solution being revenue-driving and guest
facing is unique, and daunting. The platform is highly visible to
Topgolf leadership, venues depend on performance driving guest
satisfaction surveys, and recently it’s been a challenge to monitor
the data and develop a more proactive approach to supporting our
guests. The stakes are certainly higher, and we have redundancies
built into the architecture, but our unique use case sometimes
means that the way a partner has designed the product to work
doesn’t always line up with how Topgolf needs the product to
work. We’re also charged with making sure that the platform is
available in the event of an internet outage. That alone means that
we have to build everything local to the venue to ensure a
consistent guest experience.

Me: Does VDI have its advantages when supporting your
customer?

Scott and Jason: Deploying the solution using VDI has incredible
advantages including supportability, scalability, and availability to
name a few. The average Topgolf venue has 102 hitting bays and
as of this writing, the Toptracer platform is deployed to over 3,000
hitting bays across the globe, and that number is constantly
growing. We currently support the entire implementation with a
team of 5 engineers. We’re able to minimize the support efforts
due to the self-healing capabilities and a few secret-sauce tweaks
that we’ve been able to design into the ever-evolving platform.

Me: What about disadvantages?

Scott and Jason: For Topgolf, the advantages have far outweighed
the disadvantages. The maturity of NVIDIA integrations with
VMware has been a major disadvantage, but we are starting to
see those integrations become more enterprise ready in recent
releases. NVIDIA’s architecture created challenges with licensing,
instant clones, DRS limitations and more. Partnering with NVIDIA
and VMware to solve those issues has proven invaluable, and
we’re starting to see a major shift towards GPU in the enterprise
because of those partnerships. Barrier to entry is also a
disadvantage. Licensing and hardware required to deploy a
solution this complex can be very expensive – and even cost
prohibitive for some customers. Finally, getting people who can
understand the complexities of our platform on support calls has
been a major disadvantage. While it’s getting better, there are still
communication challenges between partners.

Me: Where do you see the future of gaming on VDI going?

Scott and Jason: We’re already seeing GPUs being delivered with
cloud computing today. In 2019 NVIDIA built partnerships with
Microsoft to deliver AI and ML solutions that leverage NVIDIA
GPUs, and Amazon followed in 2020. Gaming is a multibillion-
dollar industry, with revenues projected to surpass 140 billion
dollars in 2021. It’s very likely that the use case for VDI, and even
containerized GPU gaming will become more common. We’re
seeing the first steps in a complete overhaul in game delivery right
now, in fact. NVIDIA launched GeForce Now, Google launched
Stadia, and Microsoft is launching Project xCloud soon.

There aren’t many companies that have a need to leverage VDI for
gaming the way that Topgolf does, delivering experiences to
guests, but there is a very compelling reason to leverage VDI for
game development and testing for 3D gaming. The most prevalent
issue with gaming on VDI becoming more accessible is the cost of
deploying the solution in the first place. As GPU in the enterprise
becomes more common and more supportable, they will become
more accessible. Currently, NVIDIA has the only real enterprise
offering that spans hardware platforms, and it is also the more
mature one. AMD will be entering the enterprise GPU market
soon, and we’ve seen how ambitious their offerings have been in
the consumer space recently. It’s an exciting time for GPUs, and
competition is always good for customers.

We think that 5G will also play a huge role at the edge and
provide new opportunities to interconnect locations around the
globe that would normally have latency issues. We’re seeing an
influx of VR experiences like Zero Latency, and as those
experiences scale and expand to new markets around the world
there would be operational advantages to virtualization, not to
mention the advantages virtualizing those 3D gaming experiences
offers instead of housing the compute on the player’s back.

Me: Do you have any plans to extend the platform with other use
cases or applications?

Scott and Jason: Absolutely! Whether we can talk about them here
is another story. We’ve got to maintain that competitive
advantage! Thankfully, Topgolf leadership fosters and develops
innovations and creativity. One of our Core Values is Edgy Spirit,
and it’s all about having curious intelligence and pushing the
limits of what we think is possible. We’re already using parts of
the technology to run AI and ML workloads, we’re integrating
other venue systems into the hardware stack, and we’re testing
several opportunities with other guest-facing systems in the venue
and new proprietary technologies that will continue to evolve the
guest experience at Topgolf. Stay tuned!

Me: If people want to see a demo or play the game by themselves,
where can they find more information?

Scott and Jason: As of this writing, Topgolf has deployed
Toptracer to around half of the venues globally. That number
grows exponentially every single year. A few key locations that
some readers may be familiar with are Las Vegas, Dallas, Denver,
Sacramento, and Dubai. You can find a full list on the Topgolf
website by selecting Play > Games from the menu on the website.

https://topgolf.com/us/play/games/

If you’d like to know more about Scott and Jason, check them out
on Twitter: @stforehand and @jasonsova

VIRTUAL REALITY

Every now and then I like to look around and see what the future
of end-user computing will bring. I strongly believe that most of
that future is determined by the soft side of end-user computing.
The end user is changing and the platforms they are consuming
need to change with them. What I’m also seeing is that the way
such a platform is being consumed is also changing. Those
changes happen in a lot of different aspects. Take Apple, for
instance. They decided to move over to the ARM architecture for
their Mac line of products (Mac Mini and MacBook). Because of
that decision, digital workspace platforms need to adapt so they
are capable of handling such devices from day one.

Now, in my opinion there are multiple ways to handle such a
demand for new devices or services in case your end users need to
use them. The first way is to simply say, “we tolerate those devices in
our digital workspace.” Users can simply run the digital workspace
services from the device, but that’s it. It’s basically a one-way
street. As an IT department, you accept the fact that those devices
will be used, but you don’t want any hassle from them or invest in
ways to actually manage them. An example could be BYOD. Some
organizations implement a “BYOD policy,” but the only way they
basically do this is to offer remote access to their VDI platform
from any device with some MFA measures and all
drive/peripheral/clipboard redirection policies fully locked
down. It will work, but it doesn’t offer the best user experience.
This way of handling those end user demands is still part of the
user-centric workspace, but at its bare minimum.

The second way to handle those new demands is to embrace
them. If a device has day zero or day one support from the digital
workspace service, why not fully accept them, manage them, and
ensure them being compliant to your security policies? In this way,
you increase the user experience of these devices, increase
employee satisfaction, and perfectly focus on the user-centric
workspace like you should. For the ARM-based Apple devices,
this could mean you offer a full BYOD support including
deployment of applications, and maybe even offer services like
Antivirus solutions, and (remote) support. This is what positioning
IT as a business partner is all about.

Now, the third way of handling those demands is to not only
embrace them but use them to your benefit. This is what the
journey towards the digital enterprise is focused on. As IT, you
ensure day zero/day one support for new demands and even offer
services to show the advantages of such a device. One of my
colleagues has an Apple MacBook with an ARM M1 CPU and
quite often I hear him claim that he is easily able to work a couple
of days straight without charging the device. A little bit of research
shows that the 2021 MacBook Pro M1 is capable of 17 hours of web
browsing or 20 hours of video playback on a single charge. Now,
this opens up new use cases that you as IT could use to your
benefit. Do you have people who travel a lot? Proactively offer
them those devices. You’ll be the hero of your company, I
promise.

Now, what does this have to do with this section?

In the section about the Healthcare Workplace, you may have read
that at one of my customers we are working with Virtual Reality
(VR). Their use case of pain relief and distraction during
chemotherapy is a perfect example of how the demand for devices
is changing. In this case, several researchers had demand for VR
devices to run certain apps on. Based on the three ways to handle
those demands, IT has multiple options:

• IT accepts them. They will probably offer access to the
corporate Wi-Fi network and let the researchers be the
owner in case such a device breaks (and they will of
course).
• IT embraces them. They will manage the VR devices,
update them in case that’s needed and support them in
case they break.
• IT delivers services for VR. They manage the devices,
make them fool-proof, update them, and even invest in
specific solutions to increase the user experience.

This is what Project VXR from VMware is all about. VMware has a
department called xLabs. It is led by the office of the CTO and
focusses on technologies before they even start emerging. ESXi
on a Raspberry Pi4, Bitfusion and native K8s support, and SD-
WAN for LTE are examples of such projects. Another one is
Project VXR. At VMworld 2019 in San Francisco, the team behind
this project demonstrated it for the first time. Project
VXR focusses on bringing technologies like virtual reality and
augmented reality (AR) to the enterprise. Where my customer use
case is just a simple example, VR and AR are being used by many
Fortune 500 companies. Several major airplane builders use AR,
for instance, for quality assurance purposes. Through AR, an
engineer can see inside a plane what screw or tiny cable should be
where and, in case it needs replacement, is quickly able to do that.
The screw or cable just pops up in the augmented display while
the engineer walks around in the plane.

This video of the Boeing AR Kit (BARK) explains it perfectly:

https://www.youtube.com/watch?v=BLnojMZ_gOo

Those AR and VR devices have a couple of challenges. The first


one is that most of the high-quality headsets, require a PC with a
powerful GPU to offer that quality. Sure, companies like Pico and
Oculus built their standalone head-mounted displays (HMD), but
those devices include a relatively tiny GPU that offers a nice UX,
but nothing quite comparable to a beast of a gaming PC with an
attached HMD. As an engineer, you don’t want to carry a massive
gaming laptop around in your backpack and constantly walk
around with a cable attached from the HMD to the laptop. So, the
first challenge is how to bring that high-performance application
to a standalone and wireless HMD.

The second challenge is that you would like to manage these
devices, as well, preferably just like you manage other devices like
tablets, smartphones, and Windows 10 PCs. Standalone HMDs
have an OS running, need to be configured, and preferably
automatically provisioned with applications.

The third challenge lies beyond management. Navigating through
the main interface of an HMD can be challenging. It’s like learning
how to use a new type of operating system and through a new
way of controlling it. From an IT perspective, you just want to
offer an environment that’s just as intuitive as the rest of your
digital workspace and preferably even has the same user interface.
And of course, you want it to be as fool-proof as possible, offer the
same security measures, and because typing is a challenge in VR,
include SSO where possible.

All of these challenges are being solved by VMware through
Project VXR. The project has mainly three different components:

• A digital workspace landing zone.
• Full device management capabilities.
• A way to deliver high-quality applications which are
processed and rendered in your datacenter, to a less
powerful HMD.

Now, let’s dive a bit deeper into these components.

THE DIGITAL WORKSPACE VR LANDING ZONE

Like on any other device, a standalone HMD has an interface
which lets you navigate through different apps, configure the
device, and browse through an app store. I’ve conducted a lot of
research with the Oculus Quest and in regard to the menu
structure, the settings, and the overall UX, I’m quite satisfied. It’s
very easy to navigate through the virtual world, but the interface is
proprietary to the Oculus Quest. The main challenge with that is
related to support for other platforms. Every type of device can
come with its own user interface. What you would ideally like to
offer to those different devices is a landing zone that is consistent
and similar for all of them. Within the landing zone, you would
like to provide (conditional) access to corporate applications.
Sounds familiar? The first use case for Project VXR is basically a
VR version of the Workspace ONE Intelligent Hub. Kind-of. After
starting the VXR application, you will be transferred to a room
with a view over a valley. The room offers access to the actual
Workspace ONE Intelligent Hub and will show specific VR-
capable applications. Those applications are the ones you can
publish through VMware Horizon and can be accessed from the
VXR applications in case you are entitled. More about those
remote apps later. You might ask yourself why the native browser
inside the VR device wasn’t used instead of a specific app. The
main reason has to do with the resolution of that browser. The
resolution is so low that you would have a bad user experience.
Plus, in the future it might also be possible to add more specific VR
features to the VXR app such as specific controller support or
support for hand gestures.

One of the best features of the VXR app is the ability to SSO
through the different apps after you have initially signed into
Workspace ONE. In my lab, I have managed to federate
Workspace ONE Access with Google Workspaces and my local
Active Directory. As a result, I am now able to use my Google
identity to sign into the Workspace ONE Intelligent Hub and SSO
into my VXR remoted applications without reentering my
password.

UNIFIED ENDPOINT MANAGEMENT


I already wrote a bit about this use case but let me dive some more
into the technology. HMDs are currently enterprise-ready, which
means that they include management capabilities as well. At the
time of this writing, there is an officially supported HMD which can
be fully enrolled in Workspace ONE UEM, which is the Oculus
Quest 2 for Business. This HMD is basically the same as the normal
Oculus Quest 2, but with one major difference. Where the normal
Quest 2 needs a Facebook account to sign into the device, the
Quest 2 for Business doesn’t. This means you are actually able to
manage the HMD with a dedicated management account. And
because you manage it in the same way you are also managing
your “normal” endpoints, you are again avoiding special
snowflakes inside your organization.

After enrollment, you now have the ability to fully control the
device. You are able to remotely configure it, push security
policies, install applications, and monitor the device.

Besides being able to control the HMDs, you also have another
feature to avoid misuse and that’s enabling the HMD in a sort-of
kiosk-mode. Powering on the device, and automatically opening
the VXR application saves you from a lot of questions, tampering,
and hours of support in case something was broken.

VR APPLICATION REMOTING
Managing an HMD, publishing apps to it, and SSO to these apps
through Workspace ONE Access is really cool. Don’t get me
wrong here. But, when I saw the first remote rendering demo from
the VXR team, I was blown away. Not because it just looked cool,
but mostly because one of the issues which always held me back
from investing in an HMD, was the fact that powerful ones require
a cable attached to a really beefy gaming PC to give the best UX.
That, quite honestly, is a bit of overkill for most of the applications.
Besides the lack of mobility, the alternative HMDs are also
relatively expensive with prices varying between $500 and $2000. I
have played around with the first Oculus Quest for over a year
now and most of the applications run really well. The native ones
that have been built for the Quest are fun and show a great
performance on an HMD which basically is equipped with a
similar CPU and GPU as a smartphone. Now, it’s the high-
performance VR apps that still require a gaming PC which can’t
natively run on a relatively cheap HMD.

Sure, the Oculus Quest comes with a USB-C cable which can set
the HMD in link mode and treat the HMD like an Oculus Rift, but
you are still losing mobility and don’t want to walk around with a
laptop in your backpack (attached with the cable to the HMD).
Was the link cable the only way to provide access to the apps on
the gaming PC? Well, this book isn’t called the VDI Design Guide
Part II for nothing! ☺

When I first met with Matt Coppinger (director of the VXR team),
they had some different options to bring those applications to the
HMD. The obvious option was to use Blast Extreme or PCoIP, but
both connection protocols are limited to only 60 frames per second
and 60 FPS will for sure give you motion sickness. The Oculus
Quest natively runs at 72 FPS, and that’s definitely a framerate you
want to keep close to. There were also a couple alternative
streaming options which could be viable. The first was ALVR, an
open-source streaming protocol. It was a bit buggy and hadn’t been
worked on for over a year. It did work, though. Duncan Epping
wrote a couple of blogs on getting ALVR to work:

http://www.yellow-bricks.com/2020/01/02/seeing-green-only-on-your-hmd-when-using-alvr-to-stream-an-app/

Next to ALVR, there were some other solutions like VR Remote
Desktop, but all of these solutions lacked a really good UX.

CloudXR

Everything around remote rendering was cast in a new light when
NVIDIA released their CloudXR solution in early 2020. NVIDIA
CloudXR basically is an optimized connection protocol for VR
application remoting. It is standardized on applications built or
supported by the OpenVR SDK which is built by Valve. Yes, the
same Valve who created SteamVR, and Steam.

CloudXR basically contains two different components: a client
which needs to be installed inside the HMD and a server SDK
which needs to be installed on the (virtual) machine which
contains the VR application. The CloudXR client automatically
detects if the server is available and if any OpenVR applications
are running. What’s really awesome here, is that the presentation
layer of the CloudXR solution is fully compatible with SteamVR.
This means that all of the integration features and settings are
already available from within the CloudXR client. Configuring
controllers, application settings, graphics settings, etc., all of it is
accessible from within the HMD and adjusted to the UI of the
HMD, as well. This means you can just use the controllers to
navigate through the SteamVR menus through the CloudXR
solution.

Sure, you would probably be able to pull the same thing off with a
traditional remoting protocol (from a displaying perspective), but
those aren’t optimized for VR remoting, which makes them pretty
worthless. But, remotely displaying a SteamVR menu which is
quite static isn’t where all the magic happens. Traditional remoting
protocols were initially built for your office workers, and as you
have read in this book, can also be tweaked and used for a wide
variety of powerful and graphically intense use cases.
Unfortunately, VR isn’t one of them. Where they lack usability for
VR is in how they are optimized to handle latency.
increases, a combination of dropping frames and reducing the
image quality occurs. And that’s perfect for most use cases.
Reducing 60 FPS to 50 FPS is fine, even for most gamers who
experience a bit of latency. And don’t forget that even with
latencies up to 100 ms round trip, 50 FPS is really good.

This is where CloudXR steps up. What CloudXR does differently,
is that the framerate always keeps to a steady 70 to 72 FPS. Even if
latency increases and the display qualities decrease, the FPS
remains steady. So, the experience inside the HMD of remoted
applications never gets you nauseous. What happens, instead, is
that a 3D model of, for instance, a plumber, suddenly starts to look
like Super Mario from the first Nintendo game launched in the
mid 80s. Textures become less detailed and might even lose some
colors. The thing is, that CloudXR still tries to stream as much of
the remoted application to the HMD as possible. And it does that
really well.

In my own lab setup, I have tested the HMD through CloudXR in
a couple of different network scenarios. The first one was my local 5
GHz Wi-Fi network. I have a couple of Ubiquiti Wi-Fi access
points in my house and set up a dedicated network for the HMD.
The latency between the HMD and my CloudXR server never
exceeds 20 ms and always has sufficient bandwidth available.
CloudXR needs around 70 to 80 Mbps for the best UX. One of the
“applications” I tested, was Star Wars Squadrons. You might have
noticed that I am a massive Star Wars fanboy, which made this the
perfect application to test in my lab. While running in my local
LAN, playing the game through CloudXR gave an excellent UX.
Rendering textures went perfectly smooth and displaying them
hardly showed any UX issues. And remember, this is all running
inside a virtual desktop. So, the overhead of virtualization doesn’t
impact the UX at all.

In the second test, I introduced a bit of latency in my local
network. As I didn’t have a hardware-based WAN simulator, I
used a virtual appliance instead. NE-ONE Professional from
ITRINEGY is free to use for 14 days as a trial and can be
downloaded here:

https://itrinegy.com/virtual-appliance-network-emulators/
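
If you don’t have a WAN emulator at hand, you can approximate the same kind of test with Linux tc/netem on a gateway VM routed between the HMD and the CloudXR server. This is a rough sketch under those assumptions (a Linux gateway, root privileges, and an interface name of eth0), not a replacement for a proper emulator:

```python
# Apply artificial egress latency with tc/netem by shelling out from
# Python. Adjust IFACE to the interface that carries the VR traffic.
import subprocess

IFACE = "eth0"  # assumption: gateway interface between HMD and server

def set_latency(delay_ms: int, jitter_ms: int = 0) -> None:
    """(Re)apply a netem qdisc that delays all egress traffic."""
    # Remove any existing qdisc first; ignore the error if none is set.
    subprocess.run(["tc", "qdisc", "del", "dev", IFACE, "root"], check=False)
    subprocess.run(["tc", "qdisc", "add", "dev", IFACE, "root", "netem",
                    "delay", f"{delay_ms}ms", f"{jitter_ms}ms"], check=True)

if __name__ == "__main__":
    # Step through the same 10 ms increments used in the tests below.
    for delay in range(10, 80, 10):
        input(f"Press Enter to apply {delay} ms of extra latency...")
        set_latency(delay)
```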

In steps of 10 ms additional latency, I wanted to see what the
difference in UX was while streaming the applications through
CloudXR. Up to 50 ms, I must say that I could hardly notice any
major difference. Sometimes a texture was a bit less detailed, but
the overall UX was still very good. Things started to get less
detailed, blurry and less usable when the latency started to reach
70 ms. But, again, even with the detail in the UI being crap, the
framerate was still stable and didn’t cause any throwing up.
Something worth mentioning about streaming VR with so much
latency, is that it’s not that common for a remoting protocol to be
optimized for troublesome connections. I tried the same with some
YouTube VR videos, and I will admit that vomiting came quite
close. Another reason why NVIDIA did such a great job with the
CloudXR SDK.

For the second WAN test, I wanted to actually see what a true
WAN did. My neighbors are connected through the same ISP and
connecting the HMD to their Wi-Fi network was a simple action to
test the impact of the latency introduced by routers and
network hops. Weirdly enough, the latency didn’t exceed 40 ms,
and bandwidth wasn’t the issue either. Unfortunately, the ISP
apparently does some weird stuff to network packets and as a
result, the UX wasn’t what I expected. Textures were less detailed,
but overall, the application was quite usable. I was hoping for a
similar UX as my tests with artificial latency around 40 ms. Still, it
shows that CloudXR is really usable as solution for streaming VR
apps in a situation like this.

In the final test, I wanted to see what CloudXR does when both
bandwidth and latency are constrained. I configured my phone for
tethering my internet connection and remoted the application
over 4G. In this case, my 4G connection had
around 30-40 ms latency, but lacked a serious amount of
bandwidth. The bandwidth varied between 6 Mbps and 15 Mbps,
which is far below the minimum requirements for a low-quality
UX. As a result, the entire application was unusable: I couldn’t
really see moving parts or control anything. What did
work however, is looking around. The virtual environment I was
in seemed to be preloaded and because of that I could look around
and saw walls, the floor and the ceiling with a reasonable amount
of detail. Again, this was something that surprised me.

CloudXR does offer something new to the market and it is actually
a solution which is completely different from other connection
protocols. At the time of this writing, CloudXR isn’t publicly
available as a download yet, but it is possible to request early
access to the beta. You can do so by checking this URL:

https://developer.nvidia.com/nvidia-cloudxr-sdk-early-access-program

Horizon integration

Now, you might ask yourself how all of this ties into VMware
Horizon. The simple answer is that as of this writing, it doesn’t.
But Project VXR is a VMware-led project and focused on
delivering VR and AR applications to HMDs. So, purely out of
speculation, I think something might be coming. But what could it
be?

First of all, one of the major goals of VMware Horizon is to offer
remote access to desktops and applications. What I can imagine is
that VR and maybe even AR support might be in the pipeline. If
Horizon could support the increased frame rate or maybe support
the integration of CloudXR into the Horizon stack, it would be a
great addition to an already rich feature set. The configuration of
the CloudXR SDK is relatively simple and basically supports a
point-to-point connection, just like the current connection
protocols do as well. If it would be possible to point the client side
to the virtual desktop running the CloudXR server through either
the FQDN or IP address, you are already halfway there.

There is one challenge at this moment which I haven’t solved yet.
In order for the CloudXR server to be accessible from outside your
network, you would ideally tunnel the connection from the HMD
to the CloudXR server. Tunneling in this case could become a
challenge. The way a tunnel is established could potentially
increase latency, which might kill the user experience. To test the
remote connection in my own environment, I configured port
forwarding in my firewall and forwarded the CloudXR ports to
my virtual desktop. In this case, it works very well, but in case you
need to do this for hundreds of HMDs and CloudXR servers, it
will be hard to oversee, although I do think that using software-
defined networking and automatically applying such firewall rules
could be a quick workaround for now. What I do hope from a
Horizon-integration perspective, is that it would be possible to
tunnel the connection through Unified Access Gateway (UAG). If
that would be possible, without a large latency penalty, it would
solve a lot of remote access issues.
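
To give an idea of what that port forwarding looked like in my lab, here’s a sketch that just prints iptables DNAT rules for a single desktop. Note that the IP, the interface, and especially the port range are placeholders; take the actual CloudXR port list from the SDK documentation of the version you deploy:

```python
# Generate iptables rules that forward CloudXR traffic from a lab
# firewall to one virtual desktop. Nothing is applied; the rules are
# printed so you can review them first.
DESKTOP_IP = "10.0.10.21"         # assumption: the CloudXR server VM
WAN_IFACE = "eth0"                # assumption: the firewall's WAN interface
PORTS = [("udp", "48000:48010")]  # placeholder range, check the SDK docs

for proto, ports in PORTS:
    print(f"iptables -t nat -A PREROUTING -i {WAN_IFACE} -p {proto} "
          f"--dport {ports} -j DNAT --to-destination {DESKTOP_IP}")
    print(f"iptables -A FORWARD -p {proto} -d {DESKTOP_IP} "
          f"--dport {ports} -j ACCEPT")
```

With software-defined networking, generating and applying rules like these per desktop could be automated, which is exactly the workaround mentioned above.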

DESIGN CONSIDERATIONS
Designing a VDI for VR is a bit different from the other high-
performance use cases. The main reason has to do with the
increased demand for GPU resources. I will cover these, but also
the ones related to other resources in this section.

General system resources

• Like any other high-performance graphical use case, VR
apps require a crapload of CPU resources. In my test
setup, I allocated 8 virtual CPUs to the virtual desktop. All
of them are running at 3.8 GHz. I tried running with fewer
cores, which will work, but not for all applications. One of
the applications I tested is called The Lab. It ran perfectly
fine with 6 vCPUs and I think it might even work with 4.
But, when looking at Star Wars Squadrons, for instance,
that immediately showed a negative impact on the overall
performance when using 6 vCPUs instead of 8. The same
was true for a lower clock speed. In earlier tests, 3 GHz
worked, as well, but it will heavily depend on the type of
application. With lower clock speeds, you will for sure
have a negative UX. Please don’t forget that when you
move your head around, even slightly, the frames have to
be rendered and encoded again before being presented to
the HMD. This is where the main challenge lies. As soon
as you are expecting a slight delay in presentation, it will
screw up your orientation and will cause nausea.
Choosing a certain CPU model will depend on your
proposed density. More sessions will demand more cores
(see the sizing sketch after this list). And don’t forget to
take NUMA into account, either. I
have tested with the Intel Xeon Gold 6246R and Intel Xeon
Platinum 8354H, both with good results. In my own lab,
I’m running an AMD Ryzen 3900XT with the 3.8 GHz base
frequency, which performs like a beast (but I just have a
single HMD here, so density testing isn’t really possible).

• RAM sizing depends on the workload. Although sizing a
host for VR won’t cause any RAM bottlenecks, make sure
you’ve got enough. My test virtual desktops (I use one for
gaming and one for apps such as The Lab) all had 16 GB,
which is well enough to run most apps. Please note that
RAM needs to be reserved on VM level because you are
using GPUs, as well.
• Networking is really important. Like the media designer
use case, the bandwidth usage for VR use cases heavily
differs from the rest. 70 Mbps per user is recommended for
a good UX. The total required bandwidth is of course
dependent on the density but be sure you will have
enough. And because of the latency intolerance, you might
consider dedicated low-latency NICs for the VR traffic. Of
course, the Wi-Fi or 4G/5G connection will be the
bottleneck but taking away any other bottleneck will
definitely help.
• From a storage perspective, you just need fast storage.
What fast means in this case is obviously dictated by the
application. I have tested on both local NVMe drives and
All-Flash vSAN, which both perform really well.
• Like any other high-performance use case, make sure your
host is also running in maximum performance mode. You
can probably set it up in the BIOS. What I’d recommend is
to configure it so that your OS can decide. In that case,
ESXi will determine the best performance mode for the
workloads you run.
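
To make the sizing math concrete, here’s a tiny calculator that uses the numbers from this section (8 vCPUs and 16 GB of reserved RAM per VM, roughly 80 Mbps per user) as defaults. Treat every input as an assumption to validate in your own tests, and note that GPU and encoder capacity (covered next) are deliberately left out:

```python
# Back-of-the-napkin VR density check: the bottleneck resource wins.
from math import floor

def vr_density(host_cores: int, host_ram_gb: int, nic_gbps: float,
               vcpus_per_vm: int = 8, ram_per_vm_gb: int = 16,
               mbps_per_user: int = 80) -> int:
    """Return the smallest per-resource session count."""
    by_cpu = floor(host_cores / vcpus_per_vm)       # no overcommit for VR
    by_ram = floor(host_ram_gb / ram_per_vm_gb)     # RAM is fully reserved
    by_net = floor(nic_gbps * 1000 / mbps_per_user)
    return min(by_cpu, by_ram, by_net)

# Example: dual-socket host with 2x 16 cores, 384 GB RAM, 10 GbE NIC.
print(vr_density(host_cores=32, host_ram_gb=384, nic_gbps=10.0))  # -> 4
```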

GPU resources

• Choose the right GPU for the job. An NVIDIA T4 might
sound nice because of the large number of GPUs you
could run in a single host, but it has some limitations. In
the tests I have run, the T4 offers a relatively good
performance for use cases such as The Lab. When looking
at Star Wars Squadrons, it just didn’t perform that well.
So, it will again depend on your use cases. I ran the other
tests on an RTX6000, which offers a perfect performance
for both of those use cases. Please note that the type of
GPU will also depend on your sizing and density. It may
be possible to add multiple high-end datacenter GPUs like
the A6000 or A4 in a single host, but it will be dependent
on the host system you choose. NVIDIA has announced a
lot of new cards during GTC in April 2021, which might
offer a good performance and high density as well, like the
NVIDIA A10 or A16. Because I wasn’t able to test these (as
the cards are not GA yet at the time of this writing), I
simply don’t know.
• You could run multiple sessions on a single card, but the
GPU will most likely not be the limiting factor. Instead, the
encoding engine will probably be. Because you are
sending a larger number of frames to the HMD, the
encoder will be hammered. What I saw during tests, is that
CloudXR will actually double the number of encoded
frames, simply because the HMD I used during my tests
has two displays (one for each eye). In case of The Lab, the
encoder was around 30% utilized for a single session. Star
Wars Squadrons hammered the encoder a bit more and
ran on 37% average.
• When using an NVIDIA vGPU for the system, you will
need to set up the GPU scheduler to use the equal share
scheduler (see the sketch after this list). This will ensure the
best performance for individual VMs.
• With use cases which are a bit more tolerant to latency, it’s
a bit easier to pick a configuration and start from there. It’s
fairly simple to scale a bit based on additional resources
you can add later. Sizing hosts for VR is a bit different.
Sure, you can start with fewer GPUs and add more when
required, but you still need to find a host which will be
able to accommodate enough of them.
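
As a reference for the equal share scheduler mentioned above: per the NVIDIA vGPU documentation, the scheduler is selected through the nvidia module parameter RmPVMRL on the ESXi host (0x01 for equal share), followed by a host reboot. Below is a minimal sketch, assuming you run it in an ESXi shell with the Python interpreter that ships with ESXi; running the esxcli command directly works just as well:

```python
# Switch the NVIDIA vGPU scheduler to equal share on an ESXi host.
import subprocess

EQUAL_SHARE = "0x01"  # per NVIDIA vGPU docs; 0x11 selects fixed share

subprocess.run([
    "esxcli", "system", "module", "parameters", "set",
    "-m", "nvidia",
    "-p", f"NVreg_RegistryDwords=RmPVMRL={EQUAL_SHARE}",
], check=True)
print("vGPU scheduler set to equal share; reboot the host to apply.")
```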

Network considerations

I already mentioned that networking is important, but for VR use
cases, the networking part has an additional challenge called
Wi-Fi.

• A standalone HMD will primarily connect to a Wi-Fi
network to use locally installed applications, or access
remote applications. This means the Wi-Fi network needs
to be capable of handling the required bandwidth and
latency restrictions.
• The closer an HMD is to the wireless access point, the
better the connection will be. Be sure to position access
points in such a way that HMDs will always have a proper
connection (a quick way to verify this is sketched after this
list).
• To boost connection speeds, it’s recommended to connect
the HMDs over 5 GHz channels. It might even be a good
idea to create a dedicated wireless network that just
accepts 5 GHz HMDs to avoid saturation of the wireless
network. Of course, make sure you implement some form
of Quality of Service on the wireless networks so
downloading Windows updates or streaming YouTube
videos won’t cause people working in VR to throw up.
• Wi-Fi 6 is slowly becoming the new standard in wireless
networking, which introduces higher connection speeds,
fewer potential congestion issues, and extended range
possibilities. It was introduced in 2019 as 802.11ax and
most big networking companies now offer it as an option
in their products.
• While connecting over a Wi-Fi connection supports a lot of
mobility, true mobility will be achieved when 5G cellular
networks will be widely available. 5G promises to offer
hundreds of Mbps connection speeds with much lower
latencies than 4G. At the time of this writing, 5G is slowly
being rolled out in the Netherlands, but isn’t that common
in the rest of the world.
• Connecting an existing HMD over 5G would simply be
done through a 5G router or tethering through a phone.

• What’s really cool, is that a former executive of HTC has
announced the first HMD which will include a built-in 5G
interface. Check it out here:

https://www.xrspace.io/us/headset
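
As referenced in the list above, a quick RTT probe can tell you whether a wireless network stays within the latency budget before you blame CloudXR. A minimal sketch, assuming a Linux or macOS client (the ping flag is -n on Windows) and a placeholder server address:

```python
# Ping-based RTT/jitter probe for the path between the HMD network
# and the CloudXR server. Run it from a laptop on the same Wi-Fi.
import re
import statistics
import subprocess

def sample_rtt(host: str, count: int = 20) -> list:
    """Ping the CloudXR server and return the individual RTTs in ms."""
    out = subprocess.run(["ping", "-c", str(count), host],
                         capture_output=True, text=True, check=True).stdout
    return [float(m) for m in re.findall(r"time=([\d.]+)", out)]

rtts = sample_rtt("192.168.1.50")  # assumption: your CloudXR server IP
print(f"avg {statistics.mean(rtts):.1f} ms | "
      f"max {max(rtts):.1f} ms | "
      f"jitter (stdev) {statistics.stdev(rtts):.1f} ms")
```

If the average creeps past roughly 20 ms on the local network, look at access point placement and channel saturation first.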

If you’d like to read more about the tests which have been conducted
for Project VXR, please take a look at the following site:

https://pathfinder.vmware.com/activity/projectvxr

On the site, you can see some videos and download a whitepaper
that contains most of the tests which have been conducted.

During an internship with the Project VXR team, Duncan Epping
has written several articles on his findings, which you can find
here:

http://www.yellow-bricks.com/tag/vxr/

Finally, Matt Coppinger presented an interesting session during
VMworld 2020, which you can find here:

https://www.vmworld.com/en/video-library/search.html#text=%22spatial%22&year=2020

Which brings me to the interview with the Project VXR guru
himself. ☺

INTERVIEW WITH MATT COPPINGER


On my whiteboard behind me, I have made a list of topics and
potential people who I would like to interview for such a topic.
Matt Coppinger was probably the first who was written down.
Not just because of Project VXR, but also because he has a long
history with VMware EUC, as well. As you may remember (or
have read in the first book in the VMware EUC history section),
Matt came into VMware after Propero was acquired. The fact he
still works at VMware, drives innovation in the EUC space, and is
a really nice guy, has earned him both my respect as well as this
spot in the book.

In 2014, we met for the first time. I just started working at ITQ as
the first EUC consultant and got invited for a VMware
Professional Services training week in the Staines, UK at the
former European HQ of VMware. That is probably the best
training course I’ve ever been in. Matt organized it, together with
Spencer Pitts. The room was filled with approximately 20 VMware
employees, and me. I still have no clue how I ended up there, but I
am forever thankful that I was able to attend it. That week sparked
my first interest in the VCDX certification because people like Ray
Heffer and Safouh Kharrat were there (existing VCDXs) and gave
some really nice insights into what it means to become a VCDX. The
primary goal of the week was to be trained on several EUC topics
and use the knowledge gained during the week to design a
solution for a fictitious company. On the last day of the week, we
had to present the solution to the whole group. Matt and Spencer
evaluated the solution and together with my partner (Reinhart
Nel, a VMware instructor with a focus on EUC) we won. We built
the most comprehensive solution which included most of the tech
we got to know that week, in such little time. Matt and I kept in
touch all that time, which led to where we are now. Matt is a
director for product management in EUC at VMware, where he
leads the development for the Spatial Computing platform called
Project VXR. So, here’s the interview.

Me: Like I mentioned in the introduction, you were in VMware
EUC in the early days. How were the early days for you?

Matt: Exciting, fast moving, challenging… VMware had just
established the Desktop BU, led by Jerry Chen and Propero was
their first acquisition. It was great to be part of something new at
VMware and pushing to release a new product. I was fortunate
enough to be part of the engineering effort, and then, as a
professional services consultant, conduct the first implementations
of VMware View. The growth of VMware and the growth of VDI
meant there was so much going on, never a dull moment and of
course competition with Citrix. We lost a fair few of those early
battles with Citrix, but kudos to the team, we stuck with it and I
believe have built something special. My fondest memories are
from the early days of building a field team with the likes of
Spencer Pitts, Tommy Walker, Peter Bjork, John Dodge and many
others. Lots of those folks are still at VMware leading the charge in
EUC.

Me: You have seen the VDI market grow to what it is now. Did
you ever expect it to become so successful?

Matt: It's the year of the desktop, right? :) Having worked at and
with various enterprises deploying Windows NT, Windows 98,
Windows 2000, XP etc. I knew the challenges they faced and the
pain. I’d worked with Citrix and on Propero’s own
application/desktop virtualization solution, so I was also aware of
how much easier it was to manage and secure a centralized
system. VMware was starting to boom, and everyone started to
really understand the benefits of virtualization. To me it was a no-
brainer, solving enterprise Windows desktops and apps with
virtualization was a great solution for lots of use cases. I was a
believer. If I wasn’t, I would have left VMware a long time ago. Of
course, we had some challenges to get over, like display protocols
and hardware redirection! However, I felt that the limiting factors
early on were primarily storage and bandwidth and that was
improving every day. I lost count of the number of VDI storage
start-ups that sprung up. As our product matured, I saw the
potential in just the sheer number of opportunities coming our
way. When enterprises talk about and implement 50,000+ users on
VDI, you know that the potential for VDI is huge. I think many
underestimate the “long tail of Windows apps” and the challenges
enterprises face in delivering them. Cloud has accelerated that
opportunity, reducing some of the technical obstacles to building
your own VDI. Was VDI going to dominate the EUC landscape?
No. With the influx of mobile devices and bandwidth so great you
can deploy GBs of apps to a laptop in minutes, the EUC landscape
has changed. VDI is just one of many solutions to meeting the IT
needs of our colleagues.

Me: You have had different EUC roles within VMware. What was
your favorite role and why?

Matt: That’s like asking which one of your children is your favorite
:) I’m fortunate to say I’ve loved every role at VMware. Each one
presented a new challenge and pushed me in many ways. VMware
has certainly given me the opportunity to develop my career in so
many ways and with interesting roles. I enjoyed working with
customers as a PSO consultant. It gives you a great understanding
of customers, their problems and the pros/cons of your products.
I think every software engineer, product manager and product
marketer where possible should take the opportunity to ride-a-
long with a PSO consultant - invaluable experience. My favorite
role though has to be my current one, but probably not for the
reasons you think. Innovation sounds like working on a science
project, shiny things and fun/interesting technology. It is but at
the same time it will be one of the hardest professional things you
can do. Innovating and building a new product, in a new
technology space, with a new and distributed team, with limited
resources, during a global pandemic is one of the toughest yet
rewarding experiences. It's certainly not for the faint hearted or
those that want a 40-hour work week. It has both been my favorite
role at VMware and the hardest.

Me: In your current role, you lead the development of state-of-the-
art EUC solutions. This is probably as innovative as it gets. Before
we dive into Spatial Computing, what is it like to be working on
the forefront of innovation?

Matt: As I said, tough! It’s a privilege to get the opportunity to
innovate at VMware. You are being trusted to invest VMware
resources in a new idea/technology and deliver value. Sounds
easy right? Got a great idea? Just build a product, right? The thing
is, there are lots of great ideas, lots of smart people with plenty of
great ideas. Innovation though is about creating something that
delivers real value. It's fine to have an idea, but can you build it?
Can you build it with the resources available? In the timeframe
you have? Even if you build it, can you get enough customers
onboard to prove out the value? Can you convince colleagues this
is the right bet? If you have something of value, there is likely a
competitor out there already. Is the product you’ve built in a short
time frame ready to compete? In fact, it's even tougher than that.
As you innovate and explore your problem domain, it's likely you
will need to pivot. The more you dive into it, the more challenges
and opportunities you find. Your original idea will no doubt need
tweaking, changing, pivoting to meet the reality you uncover.
Innovation is also about listening, learning and being agile. If the
data, the customers, the feedback is all pointing to the fact you
need to change, don’t waste time. I often hear the phrase “fail
fast”. I’m a huge fan of how SpaceX builds their rockets. Build
something quick, fly it, record the outcome, learn from the data, fix
the problem, fly it again. Repeat until you have a solid product
(that doesn’t explode!) In my mind that isn’t “failing fast”, the
explosions look like failure, but that is just an opportunity to
optimize your product. Without building fast and flying often you
don’t learn; you don’t get to optimize. Rapid iteration of your
products is critical, you need to accept a level of technical debt,
focus on the priority features and ship early and often. That is
easier said than done! Some engineers are not comfortable with
that and to them I say, you will always have technical debt! But to
get back on point, leading innovation projects is hard, challenging
work. From the outside it looks like a fun science project, but the
reality is a huge amount of effort has gone into building something
that delivers real value. My final point around innovation is about
having a passion for the fundamental problem you are solving.
You need to believe in and be passionate about the area of
innovation you are working on. Again, if I think about the
companies innovating in the 21st century, like SpaceX and Tesla,
they have a fundamental goal or guiding light. SpaceX’s is to
“make humanity a multi-planetary species” while Tesla’s is to
“accelerate the world's transition to sustainable energy”. These are
fundamental problems they are solving that drive them. Their
goals are not to make a specific product or amount of revenue.
Find your passion and the fundamental problem you want to solve
before diving into the innovation journey.

Me: What other projects is VMware EUC working on which you
are allowed to talk about?

Matt: I could tell you, but I would have to…

Me: LOL. Could you explain what Spatial Computing is and why
it matters?

Matt: Spatial Computing is the technology that enables man and
machine to work together in “3D” space. This could be delivering
high-fidelity immersive training in a virtual reality. It could be
enabling AI to teach someone how to build a complex component
or conduct a complex task - in real-time. It could also be about
giving someone remote presence via robotics, for example in tele-
medicine. Self-driving vehicles are another use case where this
technology applies. Spatial Computing is a broad area, but
ultimately this is about giving our devices spatial awareness in
order to augment us. Why does it matter? For enterprises, spatial
computing offers the opportunity to massively increase employee
productivity in a scalable, measurable and repeatable fashion. This
is already proving out, as organizations have discovered that
immersive training is 83% more effective than traditional training.
It's safer, more cost effective, has greater efficacy and has other
benefits such as being able to accurately measure physical
responses to certain situations. As the training is digital and
delivered via a small mobile device, it is extremely scalable and of
course repeatable. Immersive training is critical in areas such as
healthcare, oil & gas, manufacturing, financial services, retail,
automotive and more. Outside of immersive training, augmented
workflows are the next use case in the enterprise for spatial
computing. Boeing uses augmented workflows to reduce human
error by 90% and reduce task duration by 30%. They do this by
overlaying digital models onto real aircraft during the assembly
process. This helps factory workers complete tasks more quickly
and accurately. Other uses of spatial computing are design
visualization, live events, media production, remote
collaboration/assistance and, surprisingly, mental wellness. This
last one is interesting in that Virtual Reality can provide calming,
meditative visual and audio stimulation, great for helping people
relax. In fact, studies show that VR can be used for pain therapy
too.

Me: Do you think VMware’s part in the spatial computing market
will open up for new use cases?

Matt: One of the challenges around innovation is helping
colleagues understand the problems your product solves and why
VMware should invest. Innovation is often developing adjacent
products to your current portfolio in order to open up new
streams of revenue. Adjacent areas may not be something that
colleagues understand well, so you need to spend a good chunk of
time educating and bringing people up to speed around the
problem domain. So yes, our investments in spatial computing
will open up new opportunities for VMware. More importantly
though, VMware’s move into spatial computing actually enables
the technology in the enterprise. Why isn’t spatial computing more
mainstream? Well one of the critical challenges has been around IT
managing these devices. Spatial computing devices such as
AR/VR headsets have come from the consumer space and are
behind the curve with regards to enterprise readiness. VMware
now gives businesses the ability to deploy, manage and secure
these devices. VMware has one of the leading unified endpoint
management products and so naturally this is right in our
wheelhouse. In addition, these mobile devices lack powerful CPUs
and GPUs, something VMware can provide via its datacenter and
cloud offerings, by streaming AR/VR apps to mobile devices -
again right in VMware’s wheelhouse. We won’t stop there though,
as spatial computing matures, VMware will be in a good place to
offer additional services and products in the space.

Me: What challenges do we have to overcome before spatial
computing will become mainstream?

Matt: As I’ve said before, two key challenges at the moment are
spatial computing device management and device capabilities.
We’re helping address those challenges through Workspace ONE
and Project VXR. The other two challenges I see are enterprise
security/access on these devices and enterprise user experience.
VMware is also working towards solving these issues. VR devices
need to become a little smaller in form factor and the user
experience in VR needs to mature. On the AR front, the devices are
not as mature as VR or as widely adopted. Microsoft HoloLens 2 is
a great development, but the price point is still high. There are
some considerable near-eye optics challenges that need to be
solved for AR to become more mainstream. However, as there is
more adoption, we’ll see more content, applications and use cases.
This time reminds me very much of the early days of VDI. Plenty
of skepticism, but clear-cut use cases and demand in the
enterprise, somewhat hampered by technology maturity. Going
back to my previous answers, I believe in the fundamental benefits
Spatial Computing brings and it's only a matter of time until the
challenges highlighted here are solved.

Me: What will the future bring for Spatial Computing?

Matt: Like VDI, Spatial Computing isn’t going to dominate the
EUC landscape, however, it has an opportunity to play a role.
Huge investments by companies such as Google, Microsoft,
Facebook, Apple, Samsung, Lenovo and others are pushing the
technology forward, solving key problems. Facebook and Google
are already in early development of next generation AR devices.
Rumors suggest Apple is also developing mainstream AR/VR
devices. So, the devices will only get better, cheaper, more
accessible and find more use-cases in the enterprise. Do I see
everyone wearing head mounted displays? No, of course not. Like
VDI, headsets will be a tool used for specific use cases - training,
complex tasks, visualization, remote assistance/collaboration and
ultimately productivity. We’ll be able to ditch our 2 or 3 monitor /
laptop setups at home, on the move or in the office and just use a
lightweight headset with unlimited screen space. I’m more excited
though to see spatial computing be used in education and training
to give people more opportunities.

Me: Where do you see the future of VMware EUC going?

Matt: COVID-19 was a turning point for many in the EUC space.
Now more than ever, a digital workspace is needed by every
single organization on the planet. Any device, any app, anywhere
rings true now more than ever. VMware EUC is in a strong
position to help our customers succeed in delivering quickly on
the digital workspace, which gives us a solid foundation for the
future. Our digital workspace is also a foundation for our
customers. A foundation that we can develop on top of to deliver
not only a platform but better employee experience with the
digital workspace. Digital Employee Experience Management is
the next battleground, lots of innovators and players in the space,
helping customers deliver a consumer simple, enterprise secure
experience. DEEM enables organizations to measure, manage and
optimize the services they deliver and the user experience.
Allowing employees to seamlessly conduct business operations
across different applications and platforms helps increase
employee productivity. Now, we need to do all of this “outside the
firewall”. If organizations were not embracing Zero Trust before,
they need to now. Prior to COVID, many were aware of the risks
of just trusting devices on the corporate network. With many
working remotely, Zero Trust is even more important. Zero Trust
delivers a more secure architecture but comes with challenges
around user experience. VMware Workspace ONE and other

318 VDI Design Guide Part ||


products are in a good position to deliver Zero Trust - with a great
user experience. In addition to Zero Trust, we need to solve the
challenges of delivering security services to the remote workforce.
Giving organizations the ability to deliver security services at the
edge is a key part of Secure Access Service Edge (SASE). This helps
customers scale their security infrastructure and deliver it closer to
their users. VMware NSX and some components in the EUC
portfolio are well placed to deliver SASE. Finally, with VMware
managing devices, apps and users, organizations have access to
huge amounts of data that could be used to optimize the digital
workspace. So much data in fact you need help processing it. I
think machine learning and intelligence will play a key role in
automating that optimization and providing actionable insights.
ML can also provide a better user experience around
understanding context and user behavior. As I’ve alluded to at the
start of this interview, having felt the pain of delivering end user
services in an enterprise, I believe that striving to solve those
continuing challenges gives VMware EUC a bright future.

Me: Any key takeaway which you would like to share regarding
Spatial Computing or VMware EUC in general?

Matt: Hopefully I’ve provided some useful takeaways in all my
answers so far :)

If you like to know more about Matt or follow his adventures, you
can find him on twitter: @Mcopping.

THE IDEAL VMWARE EUC HOME LAB
In the original plan, this section wouldn’t have been part of the
book. Planning a book means that you need to work towards a
release date. A release date is the biggest enemy because it will
eventually dictate what section will make the final release and
what section won’t. During the writing process, you might find
something that wasn’t on the original plan but fits the goal of a
book so much that it would be a shame not to write about
it. That’s what I thought when I was talking about home labs with
a colleague. At first, I didn’t think designing a home lab would be
an interesting section for this book. But, as it goes, almost all
people I know who work in EUC also have some form of a lab. So,
why not?

WHY A HOME LAB?


In 1997 I left high school to start my first education in IT. It was a
4-year education in which the first 2 years were focused on all-
round sysadmin skills and the last 2 years on development skills. I
learned a lot of things about Novell NetWare, Windows NT 3.51,
and Unix. I learned mainly how to deploy and manage systems
based on these operating systems but was always tied to the
servers we had at school to play with the technologies. I didn’t
want to rebuild the gaming PC I had at home, so I decided to
save some money from my job at a local butcher shop
and selling music CDs at school to invest in my own server to run
those operating systems on. Because I didn’t want to spend too
much on it, I bought a secondhand Cyrix 6x86-based system. It
ran at 166 MHz and since it wasn’t optimized for gaming (a Pentium
with the same speed played games like Quake a lot better), it was
the perfect device to play around with and use it primarily for
studying purposes. Since then, I always had some form of a lab
which I could build, break, build over, update, break again, fix,
and play around with the next technology. Every person uses
different methods to learn new things and getting hands-on is how
I learn. Back then I used to buy those thick IT books and worked
my way through a technology until I knew everything about it.

The beast

In the past 24 years I had different labs in different form factors
and with different specs. Some of them were relatively small, some
of them were massive. I remember having an HP NetServer
LH6000 with 4 Pentium III Xeon CPUs and 8 GB of RAM. It
weighed over 60 kilograms and had to be carried by two people.
At one of my former employers, we sold these to customers as
massive database servers. The type of applications we ran,
required a lot of processing power, and fast storage (which at the
time was based on “fast” SCSI disks in RAID 10 configurations).
After four years of duty, such a machine was still very powerful,
but needed to be replaced by a new model. At one of these
replacement projects, the customer offered me his old server if I
would immediately take it with me. It was full of dust and didn’t
fit in his server cabinet, so he basically hated the thing for all the
time it stood in some kind of server/office room. It made a
crapload of noise and got replaced by a more powerful, quieter,
and a lot smaller model.

I still remember my mom’s face when I carried the new
“computer” to my room. My room was 3 meters by 2.5 meters. It
had a bed and a TV cabinet in one side, and a desk and a closet at
the other side. The thing you need to know about my mom, is that
she is really tidy. Everything in my parent’s house is clean and
very organized. Now, imagine her 20-year-old son taking a small
fridge-sized server home. Let’s say that she wasn’t happy at first,
and surely less happy 12 months later...

I used “The beast” (that’s what I called it) for little over a year. I
tweaked most of the fans, so it was less noisy. This was probably
around 2002, so silent fans like the ones from Noctua didn’t exist
yet. The beast was insanely fast, but also had a massive problem.
As I was still living in my parent’s house, I didn’t have to pay for
the utility bill. Until the moment they got the new yearly bill, that
is. The beast had around 800 watts power consumption on
average, which led to massive overconsumption and an invoice
that cost me two months of salary. Needless to say, the beast
had to go. After I sold the beast, I tried some different types of
servers and for a relatively long time used an HP ML110G1.
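
If you’re curious what an always-on machine like that does to a
utility bill, the math is simple. Below is a minimal Python sketch;
the 800-watt draw matches the beast, but the price per kWh is an
assumed example tariff, so plug in your own rate:

# Rough yearly energy cost of an always-on home lab.
avg_watts = 800          # average draw; the beast's ballpark figure
price_per_kwh = 0.22     # assumed example tariff in EUR

kwh_per_year = avg_watts / 1000 * 24 * 365    # ~7,008 kWh per year
cost_per_year = kwh_per_year * price_per_kwh  # ~EUR 1,540 per year

print(f"{kwh_per_year:.0f} kWh/year -> EUR {cost_per_year:.0f}/year")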

My job back then was focused a lot on building server-based
computing environments for customers. I mentioned already that I
used my lab a lot for studying purposes, but in this case the lab
came with another advantage. The customer environments mostly
had Windows 2000 Server running with Remote Desktop Services
and Citrix MetaFrame, alongside Citrix NFuse for web-based
portal access. While DTA environments are quite common
nowadays, they weren’t 20 years ago. So, during these
customer projects I also used my lab to basically test changes
before implementing them at customers. And frankly, I still do
that today. Especially when having to work with new
technologies, or technologies I’m not that familiar with, my lab is
essential to help me prepare for new projects.

DESIGNING YOUR EUC LAB


One of the charms of a home lab is that it can be literally
everything. One of the most common systems used is probably
the Intel NUC: a small, yet powerful and efficient system that is
fast enough to run multiple VMs. More beefy systems with
datacenter CPUs like Intel’s Xeon CPUs are common, as well. First
and foremost, the type of systems will depend on what you are
going to build it for, how much budget you have available, and
how you would like to scale it. Does this sound familiar? Let’s
switch roles for a change and be in the customer’s seat. As your
own customer, it’s important to have an idea what you would like
to run on the lab. From an EUC perspective, that can be a lot. As
you may know, I received the VMware Certified Design Expert
certification in 2016. I will cover some VCDX-related insights
about my lab later, but when talking with other VCDXs or aspiring
VCDXs I always mock them about the VCDX-DTM certification for
EUC being the hardest one. It’s not just the software-defined
datacenter (SDDC) side of things you need to worry about, it’s the
desktop, the operating system, the active directory, the connection
protocols, GPUs, endpoints, identity and access management,
endpoint management, firewalls, antivirus solutions, etc. This
perfectly reflects all of the different areas that an EUC admin
knows about. This also means that an EUC lab could potentially
run all of those things. The IT world is slowly transitioning to a
cloud-service world, which means that some of the EUC aspects I
just mentioned might not run on a physical server, or not ideally
on a physical server. Especially when thinking of solutions in the
identity and access management space, it’s quite common for
those solutions to be consumed as a service. This means that in my
opinion, an EUC lab is a mixture of components you might run on
a physical host, and components you need to consume as a cloud
service. Don’t immediately worry about a lot of monthly costs as I
will cover some tips and tricks that will save you a lot of money.

How to start?

Like any other project, it’s essential to start with your own
assessment and determine your requirements, constraints,
potential assumptions and risks. This may sound a bit like
overdoing it, but you will see that it will help you a lot when
designing your lab. I created a list of questions with some guiding
answers which will help you define your own conceptual design:

• What would you like to run on the lab?
o Base infrastructure components like
AD/DNS/DHCP/NTP
o EUC base components like brokers, SSL gateways,
cloud integration services
o Virtual desktops
o Applications
o Profile management solutions
o Graphically accelerated applications
o Identity & Access management solutions
o Endpoint management solutions
o EUC monitoring solutions
o EUC automation solutions
o Antivirus solutions

• Do you have the ability to run these services in a (free)
cloud service?

• Do you need all of these individual workloads to be
powered on all the time or just some of them?

• Is the lab going to be running 24/7 or just when you need
it?

• Do you want to remotely power it on or off?

• Does the lab need to have a low power consumption?

• Do your workloads require specific hardware?
o Fast CPUs (single threaded apps for instance)
o As many cores as possible
o GPUs
o 10 GbE vs 1 GbE (don’t forget the switches!)

• What’s your budget?

• Would you like to scale in the future?
o Scale up (add more resources)
o Scale out (add more nodes)

• What kind of storage would you like to use?
o Local (NVMe) flash
o Shared NAS storage
o A virtual SAN

• How are you going to handle DR?
o Backup to a NAS
o Backup to a cloud service
o No backup

• Where are you going to run the lab?
o In a home office next to your desk
o In a dedicated room

• Do you have enough power as well as cooling available in
the room?

• Does your current network support the lab?

• How production-like do you want your lab to be?

• Does it need to be officially supported by VMware?

• Does it need to have a WAF?

• What sorts of licenses might be required, and how much
will they cost initially and long-term?
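
To turn the answers into an actual host spec, a simple capacity
sum goes a long way. The following Python sketch is a minimal
example; the workload list and the 20% headroom factor are
assumptions, so substitute your own numbers:

# Minimal lab sizing sketch: sum up the VMs you plan to run and
# add headroom for hypervisor overhead and future growth.
workloads = {
    # name: (vCPUs, RAM in GB) -- example values, not a recommendation
    "AD/DNS/DHCP": (2, 4),
    "vCenter": (4, 12),
    "Connection Server": (4, 8),
    "Unified Access Gateway": (2, 4),
    "5 x desktop (2 vCPU/4 GB)": (10, 20),
}

total_vcpus = sum(c for c, _ in workloads.values())
total_ram_gb = sum(r for _, r in workloads.values())
headroom = 1.2  # assumed 20% extra

print(f"vCPUs: {total_vcpus} (a lab can typically overcommit 3-4:1 on cores)")
print(f"RAM:   {total_ram_gb * headroom:.0f} GB including headroom")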

Although it’s not a diversity-friendly phrase, the Wife Acceptance
Factor (WAF) is something that might even be one of the most
important requirements of all. Although your spouse might not
use the lab, they do share your financial situation, so the WAF is
really important. Remember the beast? That certainly didn’t have
a MAF (Mom Acceptance Factor).

INVESTING IN HARDWARE
Based on the answers to the questions in the previous section,
there are a couple of main decisions to be made related to
hardware:

• Are you going to invest in something new or second-
hand?

• Is it going to be custom-built or something off the shelf?

• Does it need to be production-like (such as a real virtual
datacenter with a cluster)?

I think that for basically every budget, there is an option for a
home lab. It just depends on what you need. William Lam
organized a community initiative to share some insights into what
people are using and how much they invested:

http://vmwa.re/homelab

As mentioned, there are multiple options, and let’s dive into those
options first.

Second-hand hardware

I think the first option, and one that can be very budget-friendly, is
going for second-hand hardware. When looking at William’s list of
community labs, investments vary between $500 and $150,000. In
my opinion, they don’t always show “the art of the possible”. I
know for a fact that you can get more for less if you aren’t tied to a
time constraint. Take your time to look around and keep track of
what’s happening on sites like eBay. My first VMware lab was an
HP ProLiant ML350 G6 with 64 GB of RAM which I bought in 2012
for just $450 on eBay. It was a bargain, and I think I only got it for
such a low investment because I spent a couple of months looking
for it. It was relatively quiet, had out-of-band management,
and could perfectly run VMware View with all of the supporting
components without any issues. I didn’t run it 24/7, but it worked
fine. Finding the trusted sellers on eBay might help avoid scams. It
might even be possible to spend less on a lab, but in that case, you
might need to make some concessions. In some cases, it’s also an
option to search for complete clusters of hardware. One of my
colleagues bought four Dell 1U nodes including CPUs, RAM,
disks, and networking for a little bit less than $2000. This means
you are basically up and running for that amount of money. The
only thing you might need is a network switch. Hardware that’s 4-
5 years old might be old for an organization, but as long as the
software you would like to run is supported, you are good. One
important lesson: take a look at the VMware HCL so you are
certain that you can run the lab components on the hardware.

To sum up, these are some of the pros and cons of investing in
second-hand hardware.

Pros

• Bang for the buck!
• Production-like hardware
• It’s probably on the HCL
• It’s kind-of off the shelf. This means that all of the
components are compatible with each other, which will
probably lead to fewer installation errors.
• If you go for big OEMs, features like out-of-band
management are included
• Second-hand hardware will keep its value for a relatively
longer time than new hardware. It will be a bit easier to
sell again without losing most of the financial value.

Cons:

• If your strategy is to start with a single host and slowly
scale out, this might not be your best option unless you are
patient. It might happen that someone is selling the exact
same model with the same specs as you already have. The
alternative is to build heterogeneous clusters.
• The warranty of the hardware will probably be expired,
and firmware updates may no longer be available
• The lifecycle of second-hand hardware might depend on
supportability from the hypervisor. Drivers, support for
CPUs, enough resources, etc.

New HCL-based hardware

This is probably the most expensive option but does offer the best
support and durability. It’s also the one that potentially has the
lowest WAF (due to the relatively high investment). You do have
some relatively cheap options, but they do come with some
constraints.

My previous lab was based on an all-flash vSAN cluster. I wanted
to invest in a single cluster immediately, but it needed to have a
low power consumption and out-of-band management as I
wanted to be able to remotely power it on and off (which turned
out to suck pretty hard, but more about that later). I noticed in the
VMware community that some people were looking at Intel Xeon-
D-based hosts. The Intel Xeon-D basically was the successor of the
Intel Atom CPU, but with more cores and support for larger RAM
configurations (128 GB RAM). Supermicro was one of the OEMs
who offered different types of systems based on the Xeon-D CPUs,
with integrated out-of-band management, 10GbE NICs, and a
power consumption of less than 25 Watts per host. The price point
of $600 for a system that just needs RAM and NVMe is pretty
cheap for a system that’s on the HCL. I do need to mention that a
Xeon-D isn’t the fastest CPU. It’s perfect for labs, but it will
depend on what you would like to run. Graphically intense
applications accelerated by a GPU could work, but not with the
same user experience as on a Xeon Gold-based machine. The
Supermicro E300 series, for instance, is seriously popular with
homelabbers because of its combination of price point,
performance, and rich feature set.

If you would like to spend a bit more, multiple OEMs offer Xeon-
based workstations that sometimes are supported by the
hypervisor, as well. Some of my colleagues have built their labs on
HPE or Dell workstations with full support of the hardware, and
thanks to the powerful CPUs and support for GPUs, they are able
to run an entire VMware Horizon and NVIDIA vGPU lab.

Because the hardware is new, it’s also quite simple to
expand. In my case, I started with two Supermicro E300 nodes in
direct-connect for VMware vSAN. A year later I bought a third one
and a 10 GbE switch, and I had more resources in my cluster. To
me, that was a big plus.

HPE has been offering the MicroServer for a while now as a
perfect alternative for the Supermicro barebones.

New HCL-based hardware also has its pros and cons.

Pros

• Support for the latest and greatest
• Full warranty
• Relatively easy to scale out
• No issues with drivers, supportability, etc.
• Depending on your budget, you have a wide variety of
options
• If you go for big OEMs, features like out-of-band
management are included
• Suitable for any type of use case, but heavily depending
on, again, your budget

Cons

• The investment will probably be relatively high,
compared to the other options
• Rack systems have a large form factor
• Blade systems have an even larger form factor
• Due to the fact that these systems are built for datacenters,
they aren’t really quiet

Non-HCL-based barebones/systems

As mentioned earlier, I think this is the single most popular lab
system. Take an Intel NUC or comparable system. It’s small, has a
low power consumption, a relatively high performance, and has a
relatively low price point. Some of them are even seriously quiet
and thus would perfectly fit on your desk. With the addition of
external high-speed NICs, you could even run an all-flash VMware
vSAN platform on them. The nice thing about NUCs is that you
can just start with a single one and relatively easily scale out. Just
add more NUCs. ☺

There is a great second-hand market for NUCs as well. I bought
one of my NUCs for $100 including RAM and an NVMe drive on
tweakers.net (a Dutch IT news site which has a marketplace as
well). Are there any downsides to NUCs? Well, they aren’t
officially supported. There is a big community that supports them,
though. William Lam was probably one of the people who initially
found out that the NUC7 was able to support 64 GB of RAM while
the Intel documentation clearly stated the supported maximum of
32 GB RAM. Another downside is the expandability of a single
NUC. Due to its size, it won’t fit a GPU or SmartNIC. Newer
models do have Thunderbolt 3 and can support external PCIe
cases which can include a GPU.

If you like to know more about that, you should follow Max
Abelardo on twitter. He actually built a racing simulator on
VMware vSphere, powered by an external GPU on an Intel NUC.

You can find him here:

https://twitter.com/mabelard

There are some other options, as well. Frank Denneman invested
in a new lab a while ago, which inspired me for my current lab, as
well. Frank built a new lab, based on the Intel NUC9 Pro model,
which basically is a really beefy Intel NUC with an internal PCIe
x16 and x8 expansion slot. The Intel NUC9 Pro model also has
three NVMe slots available, which I now use for an ESXi boot
drive, a cache drive and a capacity drive. Three of them are
currently running in an all-flash vSAN cluster. Although I really
love the Intel NUC9, it’s a bit noisier than the “normal” NUC
models. The reason I went for the Pro model, was the ability to
add dedicated high-speed NICs to them. All of my NUCs are
equipped with Mellanox Connect-x NICs and seriously offer a
great performance.

Now, there are some less commonly known options, as well.
Shuttle is one of those OEMs who sells barebone systems that
aren’t on the HCL but do offer a great performance with a
relatively low price point. The same for ASUS, Gigabyte, and
ASRock.

Pros

• Full warranty
• Relatively easy to scale out
• Small form factor
• Quiet!
• Low power consumption
• Relatively cheap for a complete system
• For the Intel NUC, there is a large community which can
offer support
• WAF!

Cons

• Potential issues with drivers, supportability, etc.
• Pro features like out-of-band management aren’t available
(see the Wake-on-LAN sketch below for a partial
workaround)
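
One partial workaround, if the board’s BIOS supports it, is plain
Wake-on-LAN. The Python sketch below builds the standard WoL
magic packet (six 0xFF bytes followed by the MAC address
repeated 16 times) and broadcasts it over UDP; the MAC address
is obviously a placeholder for your own host:

import socket

def wake_on_lan(mac, broadcast="255.255.255.255", port=9):
    """Send a standard Wake-on-LAN magic packet for the given MAC."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    packet = b"\xff" * 6 + mac_bytes * 16  # 6 x 0xFF, then MAC 16 times
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(packet, (broadcast, port))

wake_on_lan("aa:bb:cc:dd:ee:ff")  # placeholder MAC of the lab host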

Custom builds

Just after I signed my contract at ITQ, I wanted to invest in a new
lab with a low power consumption. Although I had a major wish
list, I also had a major budget constraint. A year earlier we bought
a house, for which most of our budget was used. Immediately after
the house was finished, we started saving money because we had
our wedding planned. You can imagine that prioritizing for a new
home lab was a challenge. So, I decided to sell some old hardware
and started with just a small lab, a lab with enough CPU power
and RAM to run multiple VMs simultaneously. With a simple
quad core Pentium CPU, 32 GB of RAM, a 128 GB SATA SSD, an
ASRock mainboard, and an existing case with power supply, I was
fully up and running for less than $600. That lab served me for
nearly 2 years. It was relatively silent and consumed less than 50
Watts. When I bought that lab, I had a couple of requirements, but
nothing really related to specific types of workloads or VDI
functionality which I wanted to test and run. That changed when I
wanted to know more about GPUs and acceleration of applications
and desktops.

In the early days of graphical acceleration, I got my hands on an
NVIDIA Grid K1 card. Being able to run such a card in a home lab
and demonstrate the power of graphical acceleration to customers,
was kind-of a dream come true. As you may have read in the VDI
Design Guide, I always had a thing for GPUs. It all started in the
Doom and Quake era with Monster 3D cards and RIVA TNT
cards. Nerd as I am, I wanted to demonstrate those workloads
running inside an accelerated desktop. In order to do so, you need
some hardware which is able to work with a datacenter GPU
without any issues. The card obviously fit on my motherboard, but
as the case wasn’t purposely built for datacenter capable GPUs, I
had some major cooling issues. That was kind-of solved by just
taping a fan to the end of the GPU (which looked pretty ugly), but
I still had the challenge of my CPU not being fast enough for the
GPU. The CPU was 1.9 GHz and although I expected CAD apps to
run OK, the limited clock speed caused a lousy user
experience. So, it was time to design a new lab which was also
custom built.

Because I had a requirement for GPUs and better airflow/cooling,
I invested in a bit more datacenter capable hardware. I also
wanted to remotely shut down and power on the lab if needed. I
ended up with a Supermicro-based system with an Intel Xeon E5
CPU with 2.8 GHz. The high-quality power supply ensured that it
could potentially run 24/7. My old RAM wasn’t supported for the
new host (I needed registered RAM with ECC), so I needed to
invest in that, as well. The system I ended up with, was perfect for
my new interest in GPUs and accelerated desktops. I was capable
of running most of the complex workloads I also had to deal with
at customers and had a perfect demo environment, as well. This
was the actual lab on which I also tested the first gaming stuff.

I think this actually shows the best pros of a custom build. You can
build something fully tailored to your needs. In the past five years,
I had four different lab configs and just added the newest custom-
built host. It’s an AMD Ryzen 9-based host with 128 GB of RAM
and NVMe storage, and it is capable of running VR and extremely
resource-intensive workloads because the CPU has 24 logical cores
with a 4.2 GHz base clock speed; the host also runs an NVIDIA
RTX6000. I ended up with this system because of the relatively low
investment for the insane performance it delivers. It did come with
some virtualization issues, though, which I needed to tweak
manually in the BIOS (which obviously is one of the cons). But it
now works like a charm!

Pros

• The choice is yours, build whatever you like!
• Depending on your requirements, a custom build can be
as cheap as possible. I have seen builds of just $400.
• Custom builds can be highly scalable. My first custom
build started with 32 GB of RAM, which I extended over
time to 128 GB, 32 GB at a time.
• Websites like https://www.servethehome.com have a lot
of information about hardware and building a lab

Cons

• Support can be an issue if you use regular desktop
hardware
• I had to tweak a lot of stuff before my latest monster host
was running stable. This is quite common when using
hardware which wasn’t designed for virtualization.

Other hardware considerations

If you are investigating what to build, there are a couple of
resources which are really helpful. One of them is Reddit. There
are a couple of subreddits which I regularly check:

• https://www.reddit.com/r/homelab/

• https://www.reddit.com/r/homelabsales/

• https://www.reddit.com/r/HomeNetworking/

If you are a VMware vExpert and would like to know more about
labs, follow the Homelab channel in the vExpert Slack. You can
find it here:

• https://vexpert.slack.com

Something really worth mentioning is the regular group buy that
William Lam organizes. He has some great contacts at multiple
suppliers he has worked with regarding Intel NUCs and
Supermicro barebone systems. Follow him on twitter or his blog to
be the first to know when he is organizing a new one:

• https://twitter.com/lamw

• https://williamlam.com/

Network

• Be sure to design your network! Don’t just build a single
subnet and have your lab, home devices, and IoT in it. Be
sure to segment your network (see the sketch after this
list). I have a dedicated storage/vMotion network, a
dedicated management network, a desktop network, an
IoT network, and a guest network. All are firewalled and,
in some cases, routed. I would recommend monitoring
what your smart devices are tracking inside your home
nowadays. You would probably come to the same
conclusion.
• Designing the network is one thing, but buying hardware
is another thing. As someone who isn’t a network
specialist, it was challenging (at first) to find the right
network equipment for the job. I didn’t have ambitions to
become a CLI specialist, and thus was looking for network
equipment with a nice graphical UI. I found that in
Ubiquiti. My entire network estate is based on the UniFi
system because it’s easy to configure, it’s very stable, and
it provides everything I need. The takeaway here is to look
for network stuff that doesn’t require you to do a course
first (unless that’s part of your plan, of course).
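
If the lab runs on vSphere, the segments end up as VLAN-tagged
port groups on your hosts. Below is a minimal pyVmomi sketch
that adds one to a standard vSwitch; treat it as an illustration
rather than a finished script, as the host name, credentials, port
group name, VLAN ID, and vSwitch name are all placeholder
assumptions:

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only; don't skip TLS in production
si = SmartConnect(host="esxi01.lab.local", user="root",
                  pwd="VMware1!", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    host = view.view[0]  # first host; fine for a single-host sketch

    # Add a VLAN-tagged port group for the desktop segment.
    spec = vim.host.PortGroup.Specification(
        name="Lab-Desktops",   # placeholder segment name
        vlanId=30,             # placeholder VLAN ID
        vswitchName="vSwitch0",
        policy=vim.host.NetworkPolicy())
    host.configManager.networkSystem.AddPortGroup(portgrp=spec)
finally:
    Disconnect(si)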

Storage

In terms of storage, you have a wide variety of options. The main
categories are:

• Local flash
• NAS/SAN
• Virtual SAN

Where to run your VMs will mainly depend on your budget. A
single flash device will offer the highest performance for the
lowest investment, but also has the lowest availability (as it might
be a single point of failure). Some considerations:

• Be sure to use NVMe drives. The price point of NVMe-
based drives is relatively low, while they offer the best
performance. The alternative is SATA-based drives,
whose performance is sometimes 10x lower.
• In order to use NVMe, you have a couple of options.
Modern mainboards offer an M2 slot, primarily used for
compatible flash drives. In case your motherboard doesn’t
have one, you can also look for a PCIe card that has M2
slots on it and basically serves as an M2 bridge. Some
NVMe drives are already based on a PCIe-compatible
interface (such as various Intel Optane and Western
Digital drives).
• Please note that M2-based NVMe devices could become
very hot when using them, maybe even overheat. If you
have the ability, use a heatsink to protect them from
overheating.
• There is a massive number of devices available. Choose a
device that’s built for endurance (see the sketch after this
list). I had several ones that served me very well (and still
do). It could be tempting to buy the fastest drive there is,
but it might not have the endurance to constantly build,
run, and delete VMs, and take snapshots,
for instance. Samsung’s Pro drives are quite commonly
used by many homelabbers without any issues. I am using
a combination of those and also have a couple of Intel
Optane H10 drives. Both perform very well and haven’t
failed me yet.
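
Endurance is usually expressed as a TBW (terabytes written)
rating, and a quick division tells you whether a drive will survive
your lab churn. The numbers below are assumed examples, not
the specs of a specific model:

# How long a flash drive's endurance rating lasts at a given churn.
tbw_rating = 600       # assumed: a 1 TB drive rated for 600 TBW
daily_writes_gb = 150  # assumed: builds, clones, and snapshots per day

years = tbw_rating * 1000 / daily_writes_gb / 365
print(f"~{years:.1f} years until the rated endurance is reached")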

The second option is to run your VMs on a NAS or SAN. Now,
SANs have been purposely built for a workload like virtualization.
I won’t go into detail on SAN setups because it is very likely to be
proprietary to the SAN make and model you might use (and due
to budget constraints/power consumption, not very common).
NAS devices, however, are very popular amongst the home lab
community. Some considerations:

• Running your VMs directly on a NAS means you need to
design or configure the NAS for that use case. First of all,
use flash drives that are supported by the NAS vendor.
Synology and QNAP both offer a list of supported flash
devices.
• While most NAS appliances are equipped with 1 GbE
interfaces, 10 GbE is available, as well, on certain models.
What interface to go for largely depends on the number of
VMs you would like to run from the NAS and the
performance you are expecting from it. Sure, you could
build an all-flash NAS with 1 GbE, but the performance
will probably be limited due to the connection speed being
a limiting factor.
• NAS devices have RAM, as well. Be sure to invest in a
device that supports enough RAM for your use case. Most
devices will not only use the RAM to run their storage
system on; it may also function as a first cache for writes.
Honestly, the more RAM, the better.

The third option is a virtual SAN. It may also be the most
expensive option since you need two nodes with a witness
appliance as a bare minimum. I’ve been using vSAN for quite a
while now and really love it. It’s fast, resilient, and offers
production-like use cases (such as VMware Lifecycle Manager for
updates). Some considerations:

• vSAN will work with 1 GbE, but using it in an all-flash
setup will require 10 GbE. General tip: if you are going to
start with a two-node setup, you could connect the two
hosts directly over 10 GbE direct connect instead of using
a switch. This will save you from investing in a 10 GbE
switch.
• In my first vSAN setup, I used NVMe drives for the cache
tier and SATA drives for the capacity tier. That was three
years ago and mainly because of a big price difference
between NVMe and SATA. That difference has decreased
and is sometimes even gone. Because of this, using NVMe
for both cache and capacity is perfectly fine.
• Sizing vSAN works a bit differently than sizing local
storage or a shared NAS/SAN. vSAN is object-based and
has its own methodologies to configure things like
performance and resilience for virtual machines. Take a
look at the HCI section in the VDI Design Guide for more
guidance on sizing, and see the capacity sketch after this
list for a quick example.
• The bare minimum for vSAN is two nodes and the witness
appliance. A pretty cool nerd fact is that the vSAN witness
appliance can run on a Raspberry Pi 4. Take a look at
another cool post from William Lam to know more about
this:

https://williamlam.com/2020/10/vsan-witness-using-
raspberry-pi-4-esxi-arm-fling.html

• In case you would like to start with two nodes but later
scale the cluster out, you can simply do so by adding more
nodes. My general recommendation is to add identical
hosts to avoid potential compatibility issues within the
cluster.
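
To make the sizing point from the list above a bit more concrete,
here is a minimal Python sketch of how raw capacity translates
into usable capacity. The 2x overhead for RAID-1 mirroring with
FTT=1 and the roughly 1.33x overhead for RAID-5 erasure coding
are standard vSAN numbers, and the 30% slack space is the
commonly recommended free-space reserve:

# Rough vSAN usable-capacity sketch for an all-flash lab cluster.
def vsan_usable_gb(nodes, capacity_per_node_gb, overhead=2.0, slack=0.30):
    """overhead: 2.0 = RAID-1/FTT=1, ~1.33 = RAID-5/FTT=1 (4+ nodes)."""
    raw_gb = nodes * capacity_per_node_gb
    return raw_gb / overhead * (1 - slack)

# Example: three nodes with a single 1 TB capacity drive each.
print(f"RAID-1: {vsan_usable_gb(3, 1000):.0f} GB usable")
# RAID-5 erasure coding requires at least four nodes.
print(f"RAID-5: {vsan_usable_gb(4, 1000, overhead=1.33):.0f} GB usable")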

LICENSES
I don’t want to focus too much on licensing because I think
everyone will know where to find licenses. There are a couple of
considerations for using licenses in labs.

VMware licenses:

• You can request 90-day trials for most of the solutions
through:

https://my.vmware.com

• In case you would like to use permanent licenses, you can
do so in a couple of ways. The cheapest way is to become a
vExpert. Becoming one isn’t a hard thing, it just needs a bit
of work. You can find more about the program here:

https://vexpert.vmware.com

If you like to have a chat about the vExpert program, you
could also reach out to a vExpert Pro who can help you in
becoming a vExpert. My colleague Kim Bottu is one of
them:

https://twitter.com/kim_bottu

• The VMware Usergroup (VMUG) offers a subscription
called VMUG Advantage. For just $200 per year, you will
get access to licenses (next to a lot of other advantages).
Check the program here:

https://www.vmug.com/membership/vmug-advantage-
membership

Microsoft licenses:

• Most of the solutions can be downloaded for free and have
a trial license included, which will vary per product but on
average is valid for 90 days.
• Getting permanent licenses for free from Microsoft is a bit
of a challenge. One of the ways is to become a Microsoft
MVP, which will offer a free Visual Studio (formerly
MSDN) subscription.
• Paying for a Visual Studio subscription is another
possibility. The prices will vary heavily but will start at
around $45 per month.

LAB SERVICES
A lab doesn’t always have to imply that it’s a home lab. A lab can
mean multiple things and with the increasing number of cloud-
based services that are part of the Digital Workspace, running
these services in a lab is essential if you want to test them. Various
SaaS companies offer sandboxes or development accounts for you
to use in a test or lab environment. The following list is based on
my latest findings and also contains great tips from the EUC
community.

VMware offers a free-to-use service called TestDrive. You can find
prebuilt services including demo scripts, but even more valuable,
you can spin up your own (sort-of permanent) instance of VMware
Workspace ONE Access and UEM.

• If you are working at a VMware partner, the service is
completely free for you to use. Find more about the sign-
up process here:

https://kb.vmtestdrive.com/hc/en-
us/articles/360001372254-Getting-Started-with-TestDrive

• If you don’t work at a partner, you can also get access
through the VMUG Advantage subscription. An
alternative is to become a vExpert EUC, as you will get the
subscription, as well.

Microsoft offers a free Microsoft 365 subscription for developers,
which is valid for at least 90 days. If you like to play around with
Azure AD, O365, Teams, etc., you can register for the service here:

https://developer.microsoft.com/en-us/microsoft-365

Playing with other IDPs for your EUC stack is something you
might like to do, as well.

• Register for a 30-day Okta trial here:

https://www.okta.com/free-trial/

• You can also sign up for an Okta developer account, but
there are some restrictions:

https://developer.okta.com/signup/

• Register for a 30-day Ping trial here:

https://www.pingidentity.com/en/trials/p14e-trial.html

If you would like to play around with application SAML
integration, there are some free options available, as well.

• Salesforce offers a free developer account:

https://developer.salesforce.com/signup

• Atlassian offers free editions of Jira and Confluence:

https://www.atlassian.com/software/jira/free

If you would like to play around with services which can integrate
with the Intelligent Hub, there are a couple available.

• If you would like to build VMware’s virtual assistant AVA,
check out the free edition of IBM Watson:

https://www.ibm.com/partners/start/watson-assistant/

• ServiceNow offers the ability to integrate service desk
workflows, which is included in a free edition:

https://developer.servicenow.com/dev.do#!/guides/quebe
c/developer-program/pdi-guide/personal-developer-
instance-guide-introduction

There’s a really great blog post from Reinhart Nel (VMware EUC
Instructor) about setting up a VMware EUC SaaS lab. You can find
it here:

https://www.livefire.solutions/euc/build-your-own-vmware-euc-
saas-lab/

MY CURRENT LAB
I already briefly mentioned my own lab in previous sections. I
don’t want to completely go through the entire build, but I do like
to explain a couple of the choices I made and why I made them.
Before I started designing my current lab, I had the following
requirements:

• Certain workloads (such as home automation appliances,


security camera appliances, and
AD/DNS/DHCP/vCenter/etc.) need to be available 24/7.
• Certain workloads are GPU-accelerated, but don’t have to
run permanently (to reduce power consumption).
• The lab needs to run both VDI workloads and AI
workloads.
• Some VDI workloads require 3.5 GHz or above and at
least 8 cores (think VR and simulators).
• The lab needs to be accessible from external locations.
• The environment needs to be integrated with Workspace
ONE for presales demos.
• Power consumption needs to be as low as possible.
• The lab needs to be able to run beta software without
breaking the management cluster.

Whoever remembers my previous lab may also know that I had
datacenter GPUs running on non-datacenter hardware. Those
GPUs (Tesla P4s and later Tesla T4s) are great GPUs, but passively
cooled. To avoid overheating, I had to design custom cooling
solutions. They worked pretty well for that lab. More about that
lab and cooling solution can be found here:

https://vhojan.nl/building-a-low-power-vsan-vgpu-homelab/

I made some improvements to the fans to reduce noise:

https://vhojan.nl/low-noise-vgpu-and-vsan-homelab-
optimizations/

For the new lab, the workloads I wanted to work with required a
different type of GPU. VR workloads and simulators can work on
T4s, but to get the most out of user experience, I went for a pair of
Quadro RTX6000s. They all support NVIDIA’s vGPU, but the
RTX6000s can deliver a better performance and even better, they
are actively cooled.

My previous lab consisted of three hosts, in a vSAN all-flash
cluster. Two of them had a GPU running. Because I had the 24/7
workloads running on it, it also meant that I couldn’t shut down
the hosts with the GPUs. To reduce power consumption, I wanted
to redesign that part. Something that’s worth mentioning is that
the previous lab had a constant power consumption averaging 350
Watts, which had a huge impact on the WAF.

I worked a couple of weeks on the design and came up with a lab
infrastructure that consists of multiple clusters and a separation of
24/7 workloads on energy-efficient hosts and VDI/AI workloads
on powerful hosts that can be shut down when they aren’t needed.

The lab now basically looks like this:

Management cluster:

• 3 x Intel NUC9 Pro with 64 GB RAM
• Mellanox ConnectX-5 NIC on 10 GbE
• VMware vSAN All-Flash

VDI Cluster for VR/Sim apps:

• Custom-built host with:
o AMD Ryzen 9 3900XT (12 cores/24 threads on 3.8
GHz)
o 128 GB RAM
o Local NVMe storage
o Mellanox ConnectX-5 on 10 GbE
o NVIDIA Quadro RTX6000

AI Cluster:

• Custom-built host with:
o Supermicro X10SDV-4C-TLN4F
o 64 GB RAM
o Local NVMe storage
o Embedded 10 GbE
o NVIDIA Quadro RTX6000

Test Cluster:

• Custom-built host with:
o Supermicro X10SDV-4C-TLN4F
o 64 GB RAM
o Local NVMe storage
o Embedded 10 GbE

Everything is connected over 10 GbE through a Ubiquiti 10 GbE
switch. To create backups, I have a Synology NAS running, which
Veeam uses as a target for the VM backups.

The result

The new lab has met most of my requirements pretty well. The
three-node vSAN cluster consumes around 100 Watts of power.
They are powerful enough to run all of the 24/7 workloads and
even some non-essential ones.

For the sections about Gaming on VDI and Project VXR, I did a lot
of research, and when running the high-performance applications,
the VDI host based on the Ryzen CPU performed as expected. Even
though I virtualized the workloads, the user experience wasn’t
impacted, thanks to the performance the system is able to deliver.

Most of the AI demos I present require a lot of GPU resources, but
not that much CPU resources. The Supermicro Xeon-D-based
system offers a good performance while having a low power
consumption.

The host for testing beta builds doesn't have to offer a great user experience; I just want to be able to test functionality, which the Xeon-D-based platform does really well.

Finally, my platform is connected to Workspace ONE, and authentication requires SAML with MFA for security purposes. I have a Google AD sync running between my local AD and Google Workspace because Google Workspace is my primary IDP. I'm able to authenticate using my Google account and consume all of the resources from my on-premises lab.

It took me around two months to fully build it, but for the first time I think it might be a lab that will stay around for multiple years. And because of the low power consumption, it has a great WAF, as well.

INTERVIEW WITH WILLIAM LAM


I mentioned William Lam quite a few times in this section. Because of that, there was absolutely no question about who would be the best-suited person to ask for an interview. William works as a Senior Staff Solution Architect at VMware and is probably best known in the VMware community for having one of the best-read VMware blogs around. On https://williamlam.com you can find hundreds and hundreds of really good articles on all sorts of topics. Some are focused on scripting, some on the VCSA (vCenter Server Appliance), and there are also articles on home labs. William has been working on several VMware Flings, some of which are also focused on lab use. The USB Network Native Driver for ESXi, ESXi ARM Edition, and Community NVMe Driver for ESXi are a few examples.

William is also a regular speaker at VMworld and, because of the technical level and quality of his sessions, a must-see! Besides that, he's just a genuinely nice guy.

Me: How long have you been involved with VMware technologies?

William: I was very fortunate to have been exposed to VMware technologies during my college years, thanks to my friend Tuan Duong, who played an instrumental role in getting me started on this path. After graduating from college in 2007, I started to seriously work with VMware technology in production, starting with Virtual Infrastructure 2.x, which was the name of vSphere (Virtual Center and ESX) at the time.

Me: What sparked your interest in virtualization in the first place?

William: I have always enjoyed hands-on learning and problem solving. With virtualization, I could build and explore complex systems without any fear or constraints about what would happen if I did something wrong. The possibilities were, and still are, endless, with your imagination being your only limit.

Me: I always enjoy it when you talk about labs. Why do you have
such an interest in labs?

William: My career and everything that I have learned thus far have been a direct result of being able to explore and experiment in a VMware-based lab environment. Automation, which is a passion of mine, has allowed me to build repeatable infrastructure scenarios and share them for others to benefit from. This ultimately allowed me to better understand how things work, which is especially important as you cannot automate something you do not fully understand.

Me: What’s your favorite lab platform?


William: I think any recent hardware system from the last 2-3 years that can run vSphere would be a good lab platform. After that, it really depends on other factors such as budget, use case, noise and, most importantly, approval from your CEO (aka significant other). I am probably a bit biased towards an Intel NUC, mainly because of how much you can accomplish with this tiny form factor system and the networking and storage capabilities the latest generations now support. Today, you can have a solid vSphere environment on just one of these systems that includes the vCenter Server Appliance (VCSA), vSAN and even a fully functional vSphere with Tanzu with as little as 32GB of memory. The Intel NUCs can also be upgraded to support up to 64GB of memory, and if you throw in a few more units, you have a very low-power and quiet setup that can run most of the typical workloads needed for a home lab.

Me: I mentioned in the intro that you've been involved with multiple VMware Flings. I'm a big fan of those Flings, but why should other people check them out, as well?

William: The VMware Flings program is pretty cool because it includes a number of useful utilities that VMware customers can benefit from immediately, such as the vSphere Mobile Client and the VMware OS Optimization Tool. However, it also contains early prototypes of ideas and concepts that VMware is thinking about and would like to get feedback on from the community. Some of these concepts could evolve into a future product and/or an enhancement to an existing product. I will give you two examples, both of which I was fortunate to have been involved with: the VCS to VCSA Converter Fling, which allowed customers to convert a Windows vCenter Server to the vCenter Server Appliance, and the Cross vCenter Workload Migration Tool, which allowed customers to easily migrate workloads across two different vCenter Servers, spanning different vSphere SSO Domains. Based on feedback from our customers, we were able to iterate and collaborate openly with our users, and this eventually led to the productization of these two solutions. The VMware Flings program has something for everyone, and I think it's a great way for customers to engage with both Product and Engineering teams to influence how and where some of these ideas could go longer term.

Me: One of the other Flings you worked on is the VMware Event Broker Appliance (VEBA). It's one of the Flings I run in my own lab, as well, for some automation purposes. I think it might even be my favorite Fling at the moment. Why do you think it's such a great Fling?

William: That is so awesome to hear that you are a consumer of VEBA! It definitely is one of my favorite Flings for so many reasons, but I think the biggest reason is that it truly enables VMware administrators to re-think some of the automation they have had to write over the years to react to certain vCenter Server events. I still remember that my first project out of college was to figure out how to generate an application impact assessment report when a vSphere HA event occurred; if I had had VEBA back then, I would have been done in an hour or less versus the weeks I spent figuring out how to talk to vCenter Server and retrieve all the information I needed. With VEBA, you simply identify the vCenter Server event, which can be done using the new vSphere UI integration, and then specify the “function” with the business logic that you would write to perform a specific operation. It can be as simple as sending a Slack/Microsoft Teams notification, integrating with your ticketing system, or performing auto-remediation, all while using the language of your choice and without really having to know how the vSphere API works or how to get basic information for a given event. The customer adoption has been phenomenal, and it feels like every week we hear about a new customer story on just how easy VEBA has been for building event-driven automation.
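
To give you an idea of what such a function can look like, here is a minimal sketch of a Python handler in the OpenFaaS style that VEBA supports; the handler signature is an assumption based on the standard OpenFaaS Python template, and the Slack webhook URL is a placeholder.

```python
# Minimal sketch of a VEBA function body. An OpenFaaS-style Python
# handler is assumed; the Slack webhook URL is a placeholder.
import json
import requests

SLACK_WEBHOOK = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

def handle(req):
    # VEBA delivers the subscribed vCenter event as a CloudEvent; the
    # event details live in the payload's data section.
    event = json.loads(req)
    message = event.get("data", {}).get("FullFormattedMessage", "unknown event")
    # Forward the event as a simple Slack notification.
    requests.post(SLACK_WEBHOOK, json={"text": f"vCenter event: {message}"})
    return "ok"
```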

Me: If people would like to work on their own Fling, where should they start?

William: Today, VMware Flings are built and released by VMware employees. With that said, customers can certainly engage with the individuals behind a particular Fling by interacting with them in the comments section of a given Fling. If you have a Fling idea, reach out to someone who works at VMware or even to your account team; it is possible that someone might like the idea and could work with customers and partners on potentially building a new Fling. Who knows, you might be able to influence a future VMware Fling!

Me: Let’s get back to labs. You were probably one of the first to
actually find out that the Intel NUC platform was able to support
64 GB of RAM, while the specs stated 32 GB as a limit. How did
you find out?

William: Honestly, it was pure luck and hope. In the fall of 2018, Apple announced the new Mac Mini, which officially supports 64GB (2x32GB DIMMs), but 32GB SODIMMs did not become available for individual purchase until the spring of 2019. At the time, I had access to an Intel Hades Canyon (8th Generation) NUC, and based on the CPU spec, it technically supported 64GB of memory, although the NUC specification stated 32GB as the limit. I figured I would give it a shot and purchased two of the 32GB DIMMs, which at the time were going for $298 per DIMM (today, they are $168), to see what would happen. I was not that surprised that it just worked on the Hades Canyon NUC, given the CPU was fairly recent, but I was totally shocked when I realized the 64GB also showed up on my 6th Gen NUC, which I bought back in 2016! This totally changed the game for home labs; memory has always been a constraint for most folks, but now you have a fairly inexpensive way of upgrading your existing hardware, going as far back as the 6th Gen NUC all the way to the latest generation, which now officially supports 64GB. I had so many folks share their success across other NUC generations and how happy they were to be able to leverage their existing investment. I think someone would have eventually figured this out, as well, but I guess it was just good timing and luck on my part.

Me: What’s on your roadmap for lab-related content?

William: I am always on the lookout for ways to make it easy for our end users to consume various VMware technologies, for both education and exploration purposes. I guess by the time this book is published, you will probably see a new automated lab deployment script that handles a basic VMware Cloud Foundation (VCF) setup. I know there are a couple of solutions out there right now, and although their use cases may be related, I have also found that the overall end-user experience can be quite daunting, especially for the initial setup. To be fair, some of the existing solutions also assume you have nothing and are starting from bare metal, ending up with a complete VCF deployment, so I can understand that the requirements will be different. I just wanted an easy way to deploy VCF, using a common pattern that I've built over the years with my various nested vSphere lab deployment scripts. As part of this work, I have also enhanced my Nested ESXi Virtual Appliance so that it works out of the box for VCF deployments, so it was also pretty exciting to get some of those constraints fixed. I think Tanzu in general is another area that I have spent some time on to make it easier for customers to quickly set up the required infrastructure to start learning about Kubernetes, whether that is using the minimum amount of resources to deploy vSphere with Tanzu or trying out my Tanzu Kubernetes Grid (TKG) Demo Appliance Fling, which is another easy way to start exploring Kubernetes. As you can see, home labs really enable continuous learning and exploration and will continue to be a theme on my personal blog.

Me: What will the future bring for home labs?

William: I think we will see a more diverse set of hardware platforms being offered by more vendors -- beyond the familiar vendors you may already know today. There is still a lot of potential in the small form factor (SFF) space that can be further innovated with technologies like USB4 and DDR5, which could further benefit home labs in the future. Arm (aarch64) is another trend that is just in its infancy, and I think it will play an interesting role in the future, especially with our ESXi-Arm efforts, which can run on a number of Arm devices, from the tiny Raspberry Pi 4b to some pretty beefy Arm servers. I certainly would welcome an Arm “NUC”-like kit that is comparable in specs and would allow folks to run many more workloads, potentially enabling things like vSphere with Tanzu in the future, as an example. I definitely will be keeping an eye on this space and will continue to share what I see happening with the community on my blog.

If you want to know more about William and the stuff he's doing, follow him on Twitter: @williamlam.



CONCLUSION
I hope this book gave you a good understanding of all of the different things you can do with an EUC platform, and more specifically with the versatility of VDI. I closed the first book with the quote: “As I am always interested in The Art of the Possible, I would love to hear what kind of other challenges you have faced when designing your VDI.” All of the great stories and use cases I heard over the years inspired me to, yet again, bundle them in a book. I hope they inspired you as well and will help you get the most out of an EUC solution.



INDEX
ADFS, 85
Age Roskam, 17
ALVR, 299
AMD, 123, 151
Anirban Chakraborty, 177
application sprawl, 193
Artificial Intelligence (AI), 244
AVD, 43
Azure AD, 85
Azure Virtual Desktop, 43
Blast Extreme, 270
Brian Madden, 12
Carbon Black, 102
Central Processing Units, 120
Christian Reilly, 280
CloudXR, 300
Compute by Night, 225
COPE, 81
Corporate Owned, Personally Enabled, 81
CPUs, 120
CUDA, 223
CuDNN, 235
Data Scientists, 246
DEEM, 98
Deep Learning, 245
Digital Employee Experience Management, 98
Digital Workspace Journey Model, 50
DirectX, 276
Electronic Medical Record, 184
Employee Experience, 59
EMR applications, 185
FPS, 272
Frank Denneman, 132
FSLogix, 43
Gaming, 265
General AI, 244
Google Workspace, 85
GPUprofiler, 273
GPUs, 144
H.264, 271
healthcare, 182
HEVC, 273
High Efficiency Video Coding, 273
HIPAA, 184
HMDs, 304
home lab, 320
Huib Dijkstra, 208
Identity Federation, 85
Identity Management, 191
Intel, 121, 151
Intel NUC, 322
Intel vs AMD, 126
Intrinsic Security, 105
Jason Sova, 286
Justin Murray, 256
Lab services, 341
Licenses, 339
Liquidware Stratusphere, 219
Machine Learning, 245
Matt Coppinger, 310
Maximum Screen Bandwidth, 272
Mellanox, 154
MIG, 145
Modern Management, 89
Multi Instance GPUs, 145
Narrow AI, 244
NEN7510, 183
NUMA, 135
NVIDIA, 145
NVIDIA’s GPU Cloud, 246
Oculus Quest, 297
Okta, 85
OpenGL, 276
optimization plug-in for Microsoft Teams, 200
outcome-based approach, 50
PEBKAC, 28
Pilot and Change Approach, 161
Ping, 85
Project VXR, 295
Quantization Parameter, 271
Remote Display Analyzer, 274
Robert Hellings, 65
Scott Forehand, 286
Spatial Computing, 313
Spencer Pitts, 28
Supermicro, 328
TensorFlow, 234
TensorFlow-GPU, 235
The “New” End User, 62
Tony Foster, 237
Unified Access Gateway, 304
Unified Endpoints, 188
User Centric Workspace, 51
VDI by Day, 224
VDI by Day, Compute by Night, 215
vGPU, 145
Virtual Reality, 293
VMware Flings, 233
VMware View Planner, 126
VMware Workspace ONE, 75
VMware Workspace ONE Access, 78
Vulkan, 276
William Lam, 348
Windows Virtual Desktop, 43
Workspace ONE Intelligence, 96
Workspace ONE Intelligent Hub, 80
Workspace ONE Unified Endpoint Management, 86
WVD, 43


BIO
I was born in 1982 (the year in which the Compact Disc and the Commodore 64 were released), and ever since I saw a computer for the first time (it was an original Pong game, which I still own), I have had a fascination for the digital world.

From a professional perspective, my fascination with remote desktops and applications started when I first touched Citrix WinFrame, somewhere in the late 1990s. Citrix WinFrame was a Microsoft Windows NT 3.51-based Terminal Server solution and the very first successful attempt to create a Windows-based terminal emulator that could publish remote Windows-based desktops and applications, very much like Microsoft RemoteApp and Remote Desktop today. Connecting to a remote application with the Citrix WinFrame client took over a minute or so because of a 56K modem. ☺

The ICA protocol that Citrix used (and still uses) was so impressive that the user experience of the remote desktop and applications blew my mind. How could it be that a desktop protocol was able to present a user with remote applications and desktops without the user even noticing they were remote? And that was over 20 years ago.

Over those 20 years, a lot has happened. I worked for various employers, but always with a focus on what became Server-Based Computing (SBC).

Sometime in 2006, I worked (as a system engineer) at a company that wanted to build a Software as a Service platform that could offer hospitals, pharmacies, and general practitioners a complete set of healthcare applications, hosted from a datacenter in Amsterdam. Those applications included a variety of office applications and in-house-built ones. It was a great ambition, but as most of you might know, managing those applications and keeping them from conflicting with each other’s middleware versions and DLLs is a real challenge. That’s when I first got introduced to Softricity SoftGrid. It was one of the first application virtualization tools; it was acquired by Microsoft in 2006 and is now known as App-V. Like seeing your first remote desktop, this was also a game-changer. Without any hassle, we could run different versions of the same application on a single machine, without any conflicts. Again, very awesome. To create those SoftGrid packages, which were called Sequences, I used a solution called VMware Workstation. With VMware Workstation, I could run a virtual desktop on top of my physical desktop and quickly restore my virtual desktop (after creating a SoftGrid Sequence) to an initial state.

At that same time, my employer was heavily investing in a new datacenter architecture on which the SaaS applications and their servers were going to run. The architecture was based on a revolutionary new platform called VMware ESX. Four VMware ESX 2.5 hosts were connected through 4 Gb Fibre Channel interfaces to an EMC CX3 shared storage array, and the beauty of it all was that this new platform could reduce the number of physical hosts in the datacenter while providing a zero-downtime feature during maintenance on the hosts -- something with a new technology called vMotion.



Every person has those moments when he or she knows exactly where they were when they heard that Michael Jackson passed away or about the tsunami disaster in Southeast Asia. I’m sure that every IT guy or girl born before 1985 also remembers where he or she was when they saw their first vMotion; vMotion was one of the biggest game changers in IT.

The cool thing was that the more I got involved in the SaaS project, the more I got to spend time with the VMware ESX architecture. That's when I knew the direction I wanted to take my career. Until 2013, I worked on the SaaS platform, which was running on VMware ESX 4, managed by vCenter Server, connected to an EMC CLARiiON CX4 shared storage. The Windows RDS machines were running on the platform in a big farm, load balanced with 2X (now known as Parallels RAS) and provisioned with applications by Microsoft App-V 4.5.

Late in 2013, it was time to make a career change. But what? It was
kind of easy. When you mix server virtualization with SBC and
application virtualization, what do you get? The answer is simple:
end-user computing (as we knew it back then).

I applied for a job at ITQ Consultancy as a Virtualization Consultant, got hired, and took my career to the next level. The seven years after that were a rollercoaster. By blogging (on my own blog and the company blog), presenting, and evangelizing VMware's products, I have been awarded the vExpert title since 2015 and was a VMware End-User Computing Champion since 2016 (that program is now embedded in the vExpert EUC subprogram). On top of that, I was recognized as an NVIDIA vGPU Community Advisor as well, which was a great honor!

By further specializing in VMware's End-User Computing products and taking exams, I achieved my VMware Certified Advanced Professional certification for both Datacenter Virtualization (DCV) and Desktop and Mobility (DTM). Those were the ideal bases to work towards my ultimate goal: becoming a VMware Certified Design Expert (VCDX) on Desktop and Mobility. The VCDX certification basically validates your skills as an architect. It’s not a course with a lab- or quiz-based exam, but a path of gaining skills as an architect, using those skills in a real-life project and, finally, validating them in front of a panel. It’s kind of how Luke became a Jedi. But instead of a panel, he had to show his skills to Yoda.

After achieving the VCDX certification, I wanted to pursue another dream: writing a book. But finding the right energy and making the final decision to start can be challenging, too. With the help of Karlijn Bruns (a personal leadership coach), I got rid of all that was holding me back and started the journey of becoming an author. My first book, the VDI Design Guide from 2018, sold thousands of copies and made the dream of becoming an author come true.

Directly after the book launch, I went to VMworld in the US. That week in 2018 was the most memorable moment of my IT career. I got the opportunity to launch the book at the Inside Track community event, followed by book signings at both NVIDIA and Liquidware. Especially at NVIDIA's booth, a massive number of people lined up, all wanting a signed copy of the book. The icing on the cake was seeing my own book in the VMworld bookstore.

If you read the book prior to reading this bio, then you know what happened between 2018 and 2021 in terms of the large number of different use cases I got to work with -- something which eventually led to the creation of this book. You might expect that writing a second book is similar to writing the first. In a lot of aspects, it was similar (creating a story line, finding the right topics, creating the interviews). Where it differed (a lot) was in the research of the topics and use cases. I obviously wanted to have my facts straight, which sometimes led to a journey through the rabbit holes of IT. But this is also the way I learn. I learn by diving into technology: building, breaking, rebuilding, breaking again, and finally working towards a solution which is viable for customer projects.

When I got my first IT job, I turned my hobby into my profession. I kind of did the same thing again. Researching tech and writing about it is one of my biggest passions, and I got the opportunity at ITQ to turn it into my profession (again). Early in 2020, I took the step to become a Technologist in the fields of End-User Computing and Artificial Intelligence. I'm working with a team of awesome people to shape our future at ITQ, something that really gets me energized, every single day.

After finishing this book, I'm not really sure what my next side project will be. I'd love to develop a VMware Fling, work on some new podcast ideas, or maybe write a children's book about VDI. ☺ We'll see. I'm going to enjoy a couple of months without a side project and focus on my family expansion first...
