Architecting Microsoft SQL Server on VMware vSphere
BEST PRACTICES GUIDE
MARCH 2017
Table of Contents
1. Introduction
1.1 Purpose
1.2 Target Audience
2. Application Requirements Considerations
2.1 Understand the Application Workload
2.2 Availability and Recovery Options
2.2.1 VMware Business Continuity Options
2.2.2 Native SQL Server Capabilities
3. Best Practices for Deploying SQL Server Using vSphere
3.1 Rightsizing
3.2 Host Configuration
3.2.1 BIOS/UEFI and Firmware Versions
3.2.2 BIOS/UEFI Settings
3.2.3 Power Management
3.3 CPU Configuration
3.3.1 Physical, Virtual, and Logical CPUs and Cores
3.3.2 Allocating vCPU to SQL Server Virtual Machines
3.3.3 Hyper-threading
3.3.4 NUMA Consideration
3.3.5 Cores per Socket
3.3.6 CPU Hot Plug
3.3.7 CPU Affinity
3.3.8 Virtual Machine Encryption
3.4 Memory Configuration
3.4.1 Memory Sizing Considerations
3.4.2 Memory Reservation
3.4.3 The Balloon Driver
3.4.4 Memory Hot Plug
3.5 Storage Configuration
3.5.1 vSphere Storage Options
3.5.2 Allocating Storage
3.5.3 Considerations for Using All-Flash Arrays
3.6 Network Configuration
3.7 Virtual Network Concepts
3.7.1 Virtual Networking Best Practices
3.7.2 Using Multi-NIC vMotion for High Memory Workloads
List of Figures
Figure 1. vSphere HA
Figure 2. vSphere FT
Figure 3. Recommended ESXi Host Power Management Setting
Figure 4. Windows Server CPU Core Parking
Figure 5. Recommended Windows Guest Power Scheme
Figure 6. An Example of a VM with NUMA Locality
Figure 7. An Example of a VM with vNUMA
Figure 8. Cores per Sockets
Figure 9. Enabling CPU Hot Plug
Figure 10. Memory Mappings Between Virtual, Guest, and Physical Memory
Figure 11. Sample Overhead Memory on Virtual Machines
Figure 12. Setting Memory Reservation
Figure 13. Setting Memory Hot Plug
Figure 14. VMware Storage Virtualization Stack
Figure 15. vSphere Virtual Volumes
Figure 16. vSphere Virtual Volumes High Level Architecture
Figure 17. VMware vSAN
Figure 18. vSAN Stretched Cluster
Figure 19. Random Mixed (50% Read/50% Write) I/O Operations per Second (Higher is Better)
Figure 20. Sequential Read I/O Operations per Second (Higher is Better)
Figure 21. XtremIO Performance with Consolidated SQL Server
Figure 22. Virtual Networking Concepts
Figure 23. NSX Distributed Firewall Capability
Figure 24. vRealize Operations
1. Introduction
Microsoft SQL Server is one of the most widely deployed database platforms in the world, with many
organizations having dozens or even hundreds of instances deployed in their environments. The flexibility
of SQL Server, with its rich application capabilities combined with the low costs of x86 computing, has led
to a wide variety of SQL Server installations ranging from large data warehouses to small, highly
specialized departmental and application databases. The flexibility at the database layer translates
directly into application flexibility, giving end users more useful application features and ultimately
improving productivity.
Application flexibility often comes at a cost to operations. As the number of applications in the enterprise
continues to grow, an increasing number of SQL Server installations are brought under lifecycle
management. Each application has its own set of requirements for the database layer, resulting in
multiple versions, patch levels, and maintenance processes. For this reason, many application owners
insist on having a SQL Server installation dedicated to an application. As application workloads vary
greatly, many SQL Server installations are allocated more hardware resources than they need, while
others are starved for compute resources.
Many organizations have recognized these challenges in recent years. These organizations are
now virtualizing their most critical applications and embracing a "virtualization first" policy. This means
applications are deployed on virtual machines (VMs) by default rather than on physical servers, and
Microsoft SQL Server has been among the most widely virtualized business-critical applications in recent years.
Virtualizing Microsoft SQL Server with VMware vSphere allows for the best of both worlds,
simultaneously optimizing compute resources through server consolidation and maintaining application
flexibility through role isolation, taking advantage of the SDDC (software-defined data center) and
capabilities such as network and storage virtualization. Microsoft SQL Server workloads can be migrated
to new sets of hardware in their current states without expensive and error-prone application remediation,
and without changing operating system or application versions or patch levels. For high performance
databases, VMware and partners have demonstrated the capabilities of vSphere to run the most
challenging Microsoft SQL Server workloads.
Virtualizing Microsoft SQL Server with vSphere enables many additional benefits. For example, VMware
vSphere vMotion enables seamless migration of virtual machines containing Microsoft SQL
Server instances between physical servers and between data centers without interrupting users or their
applications. VMware vSphere Distributed Resource Scheduler (DRS) can be used to dynamically
balance Microsoft SQL Server workloads between physical servers. VMware vSphere High Availability
(HA) and VMware vSphere Fault Tolerance (FT) provide simple and reliable protection for SQL Server
virtual machines and can be used in conjunction with SQL Server's own HA capabilities. Among other
features, VMware NSX provides network virtualization and dynamic security policy enforcement.
VMware Site Recovery Manager provides disaster recovery plan orchestration. VMware offers many
more capabilities that benefit virtualized applications.
For many organizations, the question is no longer whether to virtualize SQL Server, but rather how to
determine the best virtualization strategy that achieves the business requirements while keeping operational
overhead to a minimum for cost effectiveness.
1.1 Purpose
This document provides best practice guidelines for designing Microsoft SQL Server virtual machines to
run on vSphere. The recommendations are not specific to a particular hardware set, or to the size and
scope of a particular SQL Server implementation. The examples and considerations in this document
provide guidance only, and do not represent strict design requirements, as varying application
requirements might result in many valid configuration possibilities.
Batch, reporting services, and ETL databases are busy only during specific periods for such tasks as
reporting, batch jobs, and application integration or ETL workloads. These databases and
applications might be essential to your company's operations, but they have much less stringent
requirements for performance and availability. They may, nonetheless, have other very stringent
business requirements, such as data validation and audit trails.
Other smaller, lightly used databases typically support departmental applications that may not
adversely affect your company's real-time operations if there is an outage. Many times, you can
tolerate such databases and applications being down for extended periods.
Resource needs for SQL Server deployments are defined in terms of CPU, memory, disk and network
I/O, user connections, transaction throughput, query execution efficiency/latencies, and database size.
Some customers have established targets for system utilization on hosts running SQL Server, for
example, 80 percent CPU utilization, leaving enough headroom for any usage spikes and/or availability.
Understanding database workloads and how to allocate resources to meet service levels helps you to
define appropriate virtual machine configurations for individual SQL Server databases. Because you can
consolidate multiple workloads on a single vSphere host, this characterization also helps you to design a
vSphere and storage hardware configuration that provides the resources you need to deploy multiple
workloads successfully on vSphere.
Provides APIs for protecting against application failure, allowing third-party tools to continuously
monitor an application and reset the virtual machine if a failure is detected.
Figure 1. vSphere HA
For more details and best practices on Site Recovery Manager, see the Site Recovery Manager
documentation at https://pubs.vmware.com/srm-60/index.jsp.
For guidelines and information on the supported configuration for setting up any Microsoft clustering
technology on vSphere, including AlwaysOn Availability Groups, see the Knowledge Base article
Microsoft Clustering on VMware vSphere: Guidelines for supported configurations (1037959) at
http://kb.vmware.com/kb/1037959.
For a more detailed look at options, requirements, and how to plan mission-critical deployments, see the
following guides:
SQL Server on VMware: Availability and Recovery Options at
http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/sql-server-on-vmware-availability-and-recovery-options.pdf
Planning Highly Available, Mission Critical SQL Server Deployments with VMware vSphere
3.1 Rightsizing
Rightsizing means that, when a VM is deployed, it is allocated only the resources it actually requires,
rather than being oversized, which is a common sizing practice for physical servers. Rightsizing is
imperative when sizing virtual machines. For example, if a newly designed database server requires
eight CPUs, the DBA deploying it on a physical machine typically asks for more CPU power than is
required at that time, because it is difficult to add CPUs to a physical server after it is deployed. The
situation is similar for memory and other aspects of a physical deployment: it is easier to build in
capacity up front than to adjust it later, which often requires additional cost and downtime. This can also
be problematic if a server started off undersized and cannot handle the workload it is expected to run.
However, when sizing SQL Server deployments to run on a VM, it is important to assign that VM only the
exact amount of resources it requires at that time. This leads to optimized performance and the lowest
overhead, and is where licensing savings can be obtained with critical production SQL Server
virtualization. Subsequently, resources can be added non-disruptively, or with a short reboot of the VM.
To find out how many resources are required for the target SQL Server VM, monitor the source physical
SQL Server (if one exists) using dynamic management view (DMV)-based tools. There are two ways to
size the VM based on the requirements:
When an SQL Server is considered critical with high performance requirements, take the most
sustained peak as the sizing baseline.
With lower tier SQL Server implementations, where consolidation takes higher priority than
performance, an average can be considered for the sizing baseline.
When in doubt, start with the lower amount of resources and grow as necessary.
After the VM has been created, adjustments can be made to its resource allocation from the original
baseline. Adjustments can be based on additional monitoring using a DMV-based tool, similar to monitoring
a physical SQL Server deployment. VMware vRealize Operations Manager can perform DMV-based
monitoring with ongoing capacity management, and will alert on resource waste or contention points.
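As an illustration of DMV-based monitoring, the following Python sketch captures a basic sizing snapshot from a running instance. It is a minimal example, assuming the pyodbc package, an account with VIEW SERVER STATE, and a placeholder server name; production sizing tools sample these counters over time rather than once.

import pyodbc

# Placeholder connection string; "sql01" is a hypothetical server name.
CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=sql01;DATABASE=master;Trusted_Connection=yes;")

QUERY = """
SELECT
    si.cpu_count,                                  -- logical CPUs visible to SQL Server
    si.committed_target_kb / 1024 AS target_mb,    -- instance memory target
    pm.physical_memory_in_use_kb / 1024 AS in_use_mb,
    pc.cntr_value AS batch_requests_total          -- cumulative; sample twice and diff for a rate
FROM sys.dm_os_sys_info AS si
CROSS JOIN sys.dm_os_process_memory AS pm
CROSS JOIN sys.dm_os_performance_counters AS pc
WHERE pc.counter_name = 'Batch Requests/sec';
"""

with pyodbc.connect(CONN_STR) as conn:
    row = conn.cursor().execute(QUERY).fetchone()
    print(f"CPUs={row.cpu_count}, memory target={row.target_mb} MB, "
          f"in use={row.in_use_mb} MB, batch requests (cumulative)={row.batch_requests_total}")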
Rightsizing and not over allocating resources is important for the following reasons:
Configuring a VM with more virtual CPUs than its workload can use might cause slightly increased
resource usage, potentially impacting performance on heavily loaded systems. Common examples of
this include a single-threaded workload running in a multiple-vCPU VM, or a multithreaded workload
in a virtual machine with more vCPUs than the workload can effectively use. Even if the guest
operating system does not use some of its vCPUs, configuring VMs with those vCPUs still imposes
some small resource requirements on ESXi that translate to real CPU consumption on the host.
Over-allocating memory also unnecessarily increases the VM memory overhead. While ESXi can
typically reclaim the over-allocated memory, it cannot reclaim the overhead associated with this over-
allocated memory, thus consuming memory that could otherwise be used to support more VMs.
Be careful when measuring the amount of memory consumed by a SQL Server VM with the VMware
Active Memory counter. Applications that contain their own memory management, such as SQL
Server, can skew this counter. Consult with the database administrator to confirm memory
consumption rates before adjusting the memory allocated to a SQL Server VM.
Having more vCPUs assigned to the virtual SQL Server also has licensing implications in certain
scenarios, such as per-core licensing.
Adding resources to VMs (a click of a button) is much easier than adding resources to physical
machines.
For more information about sizing for performance, see Performance Best Practices for VMware vSphere
6.0 at http://www.vmware.com/files/pdf/techpaper/VMware-PerfBest-Practices-vSphere6-0.pdf.
In terms of network access times, SQL Server is not typically considered a latency-sensitive application.
However, given the adverse impact of incorrect power settings in a Windows Server operating system,
customers must pay special attention to power management. See Best Practices for Performance Tuning
of Latency-Sensitive Workloads in vSphere VMs (http://www.vmware.com/files/pdf/techpaper/VMW-
Tuning-Latency-Sensitive-Workloads.pdf).
Server hardware and operating systems are usually engineered to minimize power consumption for
economic reasons. Windows Server and the ESXi hypervisor both favor minimized power consumption
over performance. While previous versions of ESXi defaulted to a high performance power scheme,
vSphere 5.0 and later defaults to a balanced power scheme. For critical applications, such as SQL
Server, the default power scheme in vSphere 6.0 is not recommended.
There are three distinct areas of power management in an ESXi hypervisor virtual environment: server
hardware, hypervisor, and guest operating system. The following section provides power management
and power setting recommendations covering all of these areas.
High Performance – The VMkernel detects certain power management features, but will not
use them unless the BIOS requests them for power capping or thermal events. This is the
recommended power policy for an SQL Server running on ESXi.
Balanced (default) – The VMkernel uses the available power management features
conservatively to reduce host energy consumption with minimal compromise to performance.
Custom – The VMkernel bases its power management policy on the values of several advanced
configuration parameters. You can set these parameters in the VMware vSphere Web Client
Advanced Settings dialog box.
Not supported – The host does not support any power management features, or power
management is not enabled in the BIOS.
VMware recommends setting the high performance ESXi host power policy for critical SQL Server VMs.
You select a policy for a host using the vSphere Web Client. If you do not select a policy, ESXi uses
Balanced by default.
Figure 3. Recommended ESXi Host Power Management Setting
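The policy can also be set programmatically. The following is a minimal pyVmomi sketch, assuming placeholder vCenter credentials and that the host reports the High Performance policy under the short name "static"; verify the available policies on your hosts before applying it.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder vCenter address and credentials.
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="***", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
    for host in view.view:
        power_system = host.configManager.powerSystem
        # Look up the High Performance policy by its short name instead of hardcoding a key.
        for policy in power_system.capability.availablePolicy:
            if policy.shortName == "static":
                power_system.ConfigurePowerPolicy(key=policy.key)
                print(f"{host.name}: power policy set to {policy.name}")
    view.DestroyView()
finally:
    Disconnect(si)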
When a CPU runs at lower frequency, it can also run at lower voltage, which saves power. This type of
power management is called dynamic voltage and frequency scaling (DVFS). ESXi attempts to adjust
CPU frequencies so that VM performance is not affected.
When a CPU is idle, ESXi can take advantage of deep halt states (known as C-states). The deeper the C-
state, the less power the CPU uses, but the longer it takes for the CPU to resume running. When a CPU
becomes idle, ESXi applies an algorithm to predict how long it will be in an idle state and chooses an
appropriate C-state to enter. In power management policies that do not use deep C-states, ESXi uses
only the shallowest halt state (C1) for idle CPUs.
Microsoft recommends the high-performance power management policy for applications requiring stability
and performance. VMware supports this recommendation and encourages customers to incorporate it
into their SQL Server tuning and administration practices for virtualized deployments.
Figure 5. Recommended Windows Guest Power Scheme
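The guest-side setting can be scripted as well. The following minimal sketch, assuming it runs inside the Windows VM with administrative rights, activates the built-in High performance plan through powercfg.

import subprocess

def set_high_performance_plan() -> None:
    # SCHEME_MIN is the built-in alias for the High performance power plan.
    subprocess.run(["powercfg", "/setactive", "SCHEME_MIN"], check=True)
    subprocess.run(["powercfg", "/getactivescheme"], check=True)

if __name__ == "__main__":
    set_high_performance_plan()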
3.3.3 Hyper-threading
Hyper-threading is an Intel technology that exposes two hardware contexts (threads) from a single
physical core, also referred to as logical CPUs. This is not the same as having twice the number of CPUs
or cores. By keeping the processor pipeline busier and allowing the hypervisor to have more CPU
scheduling opportunities, Hyper-threading generally improves the overall host throughput anywhere from
10 to 30 percent.
VMware recommends enabling Hyper-threading in the BIOS/UEFI so that ESXi can take advantage of
this technology. ESXi makes conscious CPU management decisions regarding mapping vCPUs to
physical cores, taking Hyper-threading into account. An example is a VM with four virtual CPUs. Each
vCPU will be mapped to a different physical core and not to two logical threads that are part of the same
physical core.
Hyper-threading can be controlled on a per-VM basis in the Hyper-threading Sharing section on the
Properties tab of a VM. This setting controls whether a VM should be scheduled to share a
physical core when Hyper-threading is enabled on the host.
Any – This is the default setting. The vCPUs of this VM can freely share cores with other virtual CPUs of
this or other virtual machines. VMware recommends leaving this setting to allow the CPU scheduler the
maximum scheduling opportunities.
None – The vCPUs of this VM have exclusive use of a processor whenever they are scheduled to the
core. Selecting None in effect disables Hyper-threading for your VM.
Internal – This option is similar to None. vCPUs from this VM cannot share cores with vCPUs from other
VMs. They can share cores with the other vCPUs from the same VM.
See additional information about Hyper-threading on a vSphere host in VMware vSphere Resource
Management (https://pubs.vmware.com/vsphere-60/topic/com.vmware.ICbase/PDF/vsphere-esxi-vcenter-
server-60-resource-management-guide.pdf).
It is important to remember to account for the differences between a processor thread and a physical
CPU/core during capacity planning for your SQL Server deployment.
For wide SQL Server virtual machines, where the number of allocated vCPUs is greater than the number
of cores in the NUMA node, ESXi divides the CPU and memory of the VM into two or more virtual NUMA
(vNUMA) nodes and places each vNUMA node on a different physical NUMA node. The vNUMA topology is
exposed to the guest OS so that SQL Server can take advantage of memory locality.
In the following example, there is a single VM with 12 vCPUs and 128 GB of RAM residing on a
physical server that has 8 cores and 96 GB of RAM in each NUMA node, for a total of 16 CPU cores
and 192 GB of RAM. The VM will be created as a wide VM with a vNUMA topology that is exposed to the
underlying guest OS.
Note By default, vNUMA is enabled only for a VM with nine or more vCPUs.
Figure 7. An Example of a VM with vNUMA
For example, a VM can have 4 vCPUs (sockets) each with 4 vCores, or it can have 2 vCPUs each with 8
vCores. Both options result in the VM having 16 vCores that are mapped to 16 pCores or logical Hyper-
threads. This advanced setting was created to assist with licensing limitations for certain applications and
operating systems that limit the number of cores and sockets. This is very useful for virtualized SQL
Server deployments due to SQL Server's upper limits on the number of allowed CPU sockets and cores.
The limits differ between versions and editions of the software, as detailed in the Microsoft article
Compute Capacity Limits by Edition of SQL Server
(https://msdn.microsoft.com/en-us/library/ms143760(v=sql.130).aspx).
In the preceding example, if this VM runs SQL Server 2014 Standard Edition, it is limited to the lesser of 4
sockets or 16 cores. Anything more than 4 sockets and 16 cores is not recognized by SQL Server. To
maximize the compute allocation to the VM, a configuration of either 4 vSockets with 4 vCores each (4x4),
2 vSockets with 8 vCores each (2x8), or 1 vSocket with 16 vCores (1x16) gives the VM access to 16 pCores.
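The following small worked example illustrates the arithmetic above; the 4-socket/16-core ceiling is the SQL Server 2014 Standard limit cited from the Microsoft article, and other editions have different limits.

def usable_cores(v_sockets: int, cores_per_socket: int,
                 max_sockets: int = 4, max_cores: int = 16) -> int:
    """Return how many vCores this edition can use (defaults: SQL Server 2014 Standard)."""
    visible_sockets = min(v_sockets, max_sockets)
    return min(visible_sockets * cores_per_socket, max_cores)

for sockets, cores in [(16, 1), (4, 4), (2, 8), (1, 16)]:
    total = sockets * cores
    print(f"{sockets}x{cores}: SQL Server uses {usable_cores(sockets, cores)} of {total} vCores")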
Changing the number of vCores per vSocket has implications on the vNUMA topology and must be done
with care. In the following examples, it is assumed that there is a physical server with 2 physical CPU
sockets and 12 cores in each physical NUMA node:
When the VM running SQL Server is assigned fewer CPU cores than the number of cores in the
physical NUMA node, any configuration of cores per socket is acceptable because, even though
vNUMA is enabled by default with 9 or more vCPUs, only one vNUMA node will be configured
for the VM. In the preceding example of a 2x12 physical server, neither a 2x4 nor a 1x8 VM
configuration will affect vNUMA because both are smaller than the number of physical cores per physical
NUMA node (8<12). Generally, when the VM CPU count is lower than the physical NUMA node size,
try to use the fewest number of vSockets possible.
For configurations with more vCPUs than there are physical cores in the physical NUMA node, such as a
16-vCPU VM on a 2x12 host, always assign a number of vCPUs that can be divided evenly between the
physical NUMA nodes. Do not assign an odd number of vCPUs, as that can result in a sub-optimal
configuration. Also, the following considerations need to be taken into account:
o Prior to vSphere 6.5, the cores per socket configuration directly affected the vNUMA topology of
the VM. Because of that, you must be aware of the underlying physical NUMA topology when
configuring the cores per socket. For example, assuming a requirement of 16 CPUs for a SQL
Server 2014 Standard VM, it will not be able to take advantage of all the vCPUs assigned to it if it
is configured with 1 core per socket (16 vSockets > limit of 4 sockets). In the example of a
physical server with 2x12, a VM with either 4x4 or 2x8 configuration is acceptable because it
allows ESXi to place each of the vNUMA nodes within a physical NUMA node.
o Starting with vSphere 6.5, the number of cores per socket does not affect the vNUMA
configuration by default. That means that any cores per socket configuration can be set, and
ESXi always tries to create the optimal vNUMA configuration behind the scenes.
A few things to note:
A VM that was upgraded from a vSphere version earlier than 6.5 has the following advanced
setting:
numa.vcpu.followcorespersocket = 1
This setting forces the old vNUMA behavior, which derives the vNUMA topology from the cores per
socket setting, and is kept for backward compatibility. To make the VM follow the new
behavior, change this setting to 0.
Memory size is not considered for the vNUMA topology, only the number of CPUs, even if the amount of
RAM assigned to the VM is more than the physical memory in each physical NUMA node. A VM with
more memory than the physical NUMA node, but with fewer CPU cores than the physical NUMA
node, forces memory to be fetched remotely, degrading performance. If there is a need for more
memory than a physical NUMA node provides, it is best to force the hypervisor to create a vNUMA topology,
either by assigning more cores than the physical NUMA configuration (and more than 9 by default), or
by lowering the advanced setting numa.vcpu.min from 9 to the number of CPUs assigned to the
virtual machine.
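For reference, these advanced settings can be applied programmatically. The following hedged pyVmomi sketch assumes an existing service instance connection ("si") and a placeholder VM name, and should be applied while the VM is powered off.

from pyVmomi import vim

def set_numa_advanced_options(si, vm_name: str) -> None:
    # "vm_name" is a placeholder; apply the change while the VM is powered off.
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == vm_name)
    view.DestroyView()

    spec = vim.vm.ConfigSpec(extraConfig=[
        # Let ESXi 6.5+ build the vNUMA topology regardless of cores per socket.
        vim.option.OptionValue(key="numa.vcpu.followcorespersocket", value="0"),
        # Optionally expose vNUMA below the default 9-vCPU threshold.
        vim.option.OptionValue(key="numa.vcpu.min", value="8"),
    ])
    vm.ReconfigVM_Task(spec=spec)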
VM encryption is controlled on a per-VM basis and is implemented in the virtual vSCSI layer using an
IOFilter API. This framework is implemented entirely in user space, which allows the I/Os to be isolated
cleanly from the core architecture of the hypervisor.
VM encryption does not impose any specific hardware requirements, although using a processor that
supports the AES-NI instruction set speeds up the encryption/decryption operations.
Any encryption feature consumes CPU cycles, and any I/O filtering mechanism consumes at least
minimal I/O latency overhead.
The impact of such overheads largely depends on two aspects:
The efficiency of implementation of the feature/algorithm.
The capability of the underlying storage.
If the storage is slow (such as in a locally attached spinning drive), the overhead caused by I/O filtering is
minimal, and has little impact on the overall I/O latency and throughput. However, if the underlying
storage is very high performance, any overhead added by the filtering layers can have a non-trivial impact
on I/O latency and throughput. This impact can be minimized by using processors that support the AES-
NI instruction set.
For the latest performance study of VM encryption, see the following paper:
http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/vm-encryption-
vsphere65-perf.pdf.
Note Setting memory reservations might limit vSphere vMotion. A VM can be migrated only if the target
ESXi host has unreserved memory equal to or greater than the size of the reservation.
When designing SQL Server for performance, the goal is to eliminate any chance of paging from
happening. Disable the ability for the hypervisor to reclaim memory from the guest OS by setting the
memory reservation of the VM to the size of the provisioned memory. The recommendation is to leave the
balloon driver installed for corner cases where it might be needed to prevent loss of service. As an
example of when the balloon driver might be needed, assume a vSphere cluster of 16 physical hosts that
is designed for a 2-host failure. In case of a power outage that causes a failure of 4 hosts, the cluster
might not have the required resources to power on the failed VMs. In that case, the balloon driver can
reclaim memory by forcing the guest operating systems to page, allowing the important database servers
to continue running in the least disruptive way to the business.
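As an illustration, the following minimal pyVmomi sketch reserves all configured memory for a VM; it assumes an existing connection and that the VM object has already been retrieved, as in the earlier examples.

from pyVmomi import vim

def reserve_all_memory(vm: vim.VirtualMachine) -> None:
    # Reserve exactly what is provisioned so the guest never has to page.
    full_reservation = vm.config.hardware.memoryMB
    spec = vim.vm.ConfigSpec(
        memoryAllocation=vim.ResourceAllocationInfo(reservation=full_reservation))
    vm.ReconfigVM_Task(spec=spec)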
Note Ballooning is sometimes confused with Microsoft's Hyper-V dynamic memory feature. The two
are not the same, and Microsoft's recommendation to disable dynamic memory for SQL Server
deployments does not apply to the VMware balloon driver.
must provide sufficient I/O throughput as well as storage capacity to accommodate the cumulative needs
of all VMs running on your ESXi hosts.
For information about best practices for SQL Server storage configuration, refer to Microsoft's Storage
Top 10 Best Practices (http://technet.microsoft.com/en-us/library/cc966534.aspx). Follow these
recommendations along with the best practices in this guide.
The goal of vSphere Virtual Volumes is to provide a simpler operational model for managing VMs in
external storage while leveraging the rich set of capabilities available in storage arrays.
For more information about virtual volumes, see the Whats New: vSphere Virtual Volumes white paper at
https://www.vmware.com/files/pdf/products/virtualvolumes/VMware-Whats-New-vSphere-Virtual-
Volumes.pdf.
vSphere Virtual Volumes capabilities help with many of the challenges that large databases are facing:
Business critical virtualized databases need to meet strict SLAs for performance, and storage is
usually the slowest component compared to RAM and CPU and even network.
Database size is growing, while at the same time there is an increasing need to reduce backup
windows and the impact on system performance.
There is a regular need to clone and refresh databases from production to QA and other
environments. The size of modern databases makes it harder to clone and refresh data from
production to other environments.
Databases of different levels of criticality need different storage performance characteristics and
capabilities.
It is a challenge to back up multi-terabyte databases due to restricted backup windows and the data
churn, which itself can be quite large. It is often not feasible to make full backups of these multi-terabyte
databases in the allotted backup windows.
Backup solutions, such as native SQL Server backup, provide a fine level granularity for database
backups but they are not always the fastest.
A VM snapshot containing the SQL Server backup file is ideal for solving this issue. However, as indicated
in the Knowledge Base article A snapshot removal can stop a virtual machine for a long time (1002836) at
http://kb.vmware.com/kb/1002836, the brief stun moment of the VM can potentially cause performance
issues.
Storage-based snapshots would be the fastest, but unfortunately storage snapshots are taken at the
datastore and LUN levels and not at the VM level. Therefore, there is no VMDK-level granularity with
traditional storage-level snapshots.
vSphere Virtual Volumes is an ideal solution that combines snapshot capabilities at the storage level with
the granularity of a VM level snapshot.
Figure 16. vSphere Virtual Volumes High Level Architecture
With vSphere Virtual Volumes, you can also set up different storage policies for different VMs. These
policies instantiate themselves on the physical storage system, enabling VM level granularity for
performance and other data services.
When virtualizing SQL Server on a SAN using vSphere Virtual Volumes as the underlying technology, the
best practices and guidelines remain the same as when using a VMFS datastore.
Make sure that the physical storage on which the VM's virtual disks reside can accommodate the
requirements of the SQL Server implementation with regard to RAID, I/O, latency, queue depth, and so
on, as detailed in the storage best practices in this document.
3.5.1.4. vSAN
vSAN is the VMware software-defined storage solution for hyper-converged infrastructure, a software-
driven architecture that delivers tightly integrated computing, networking, and shared storage from x86
servers. vSAN delivers high performance, highly resilient shared storage. vSAN provides enterprise-class
storage services for virtualized production environments along with predictable scalability and all-flash
performance at a fraction of the price of traditional, purpose-built storage arrays. Like vSphere, vSAN
provides users the flexibility and control to choose from a wide range of hardware options and easily
deploy and manage them for a variety of IT workloads and use cases.
Figure 17. VMware vSAN
vSAN can be configured as hybrid or all-flash storage. In a hybrid disk architecture, vSAN
leverages flash-based devices for performance and magnetic disks for capacity. In an all-flash vSAN
architecture, vSAN can use flash-based devices (PCIe SSD or SAS/SATA SSD) for both the write buffer
and persistent storage. A read cache is neither available nor required in an all-flash architecture. vSAN is a
distributed object storage system that leverages the SPBM feature to deliver centrally managed,
application-centric storage services and capabilities. Administrators can specify storage attributes, such
as capacity, performance, and availability as a policy on a per-VMDK level. The policies dynamically self-
tune and load balance the system so that each VM has the appropriate level of resources.
vSAN 6.1 introduced the stretched cluster feature. vSAN stretched clusters provide customers with the
ability to deploy a single vSAN cluster across multiple data centers. vSAN stretched cluster is a specific
configuration implemented in environments where disaster or downtime avoidance is a key requirement.
vSAN stretched cluster builds on the foundation of fault domains. The fault domain feature introduced
rack awareness in vSAN 6.0. The feature allows customers to group multiple hosts into failure zones
across multiple server racks to ensure that replicas of VM objects are not provisioned on to the same
logical failure zones or server racks. vSAN stretched cluster requires three failure domains based on
three sites (two active/active sites and one witness site). The witness site is only utilized to host witness
virtual appliances that store witness objects and cluster metadata information and provide cluster quorum
services during failure events.
Figure 18. vSAN Stretched Cluster
When deploying VMs with SQL Server on a hybrid vSAN, consider the following:
Build vSAN nodes for your business requirements – vSAN is a software solution. As such, customers
can design vSAN nodes from the ground up that are customized for their own specific needs. In this
case, it is imperative to use the appropriate hardware components that fit the business requirements.
Plan for capacity – The use of multiple disk groups is strongly recommended to increase system
throughput and is best implemented in the initial stage.
Plan for performance – It is important to have sufficient space in the caching tier to accommodate the
I/O access of the OLTP application. The general recommendation is for the SSD caching tier of
each host to be at least 10 percent of the total storage capacity. However, in cases where high
performance is required for mostly random I/O access patterns, VMware recommends that the SSD
size be at least two times that of the working set.
For the SQL Server mission critical user database, use the following recommendations to design the
SSD size:
o SSD size to cache the active user database – The I/O access pattern of a TPC-E-like OLTP workload is
small (8 KB dominant), random, and read-intensive. To support the possible read-only workload
of the secondary and the log-hardening workload, VMware recommends having two times the size of
the primary and secondary databases. For example, for a 100-GB user database, design 2 x 2 x
100 GB of SSD capacity.
o Select an appropriate SSD class to support the designed IOPS – For the read-intensive OLTP workload,
the supported IOPS of the SSD depends on the class of SSD. A well-tuned TPC-E-like workload can
have a write ratio of around ten percent.
o The VMware Compatibility Guide at https://www.vmware.com/resources/compatibility/search.php
specifies the following flash device classes. For optimal performance, VMware recommends using a
flash device class that meets workload performance requirements:
Class A: 2,500–5,000 writes per second
Class B: 5,000–10,000 writes per second
Class C: 10,000–20,000 writes per second
Class D: 20,000–30,000 writes per second
Class E: 30,000+ writes per second
Plan for availability – Design more than three hosts and additional capacity so that the cluster can
automatically remediate in the event of a failure. For SQL Server mission-critical user databases,
enable AlwaysOn in synchronous mode to keep the database in a highly available state. Setting FTT
greater than 1 means more write copies to vSAN disks. Unless special data protection is required,
FTT=1 can satisfy most mission-critical SQL Server databases with AlwaysOn enabled.
Set proper SPBM policies – vSAN SPBM can set availability, capacity, and performance policies per VM:
Set object space reservation – Set to 100 percent. The capacity is allocated up front from the vSAN
datastore.
Number of disk stripes per object – The number of disk stripes per object is also referred to as stripe
width. It is the vSAN policy setting that defines the minimum number of capacity devices across which
each replica of a storage object is distributed. vSAN can create up to 12 stripes per object. Striping can
help performance if the VM is running an I/O-intensive application such as an OLTP database. In the
design of a hybrid vSAN environment for a SQL Server OLTP workload, leveraging multiple SSDs
with more backing HDDs is more important than only increasing the stripe width. Consider the
following conditions:
o If more disk groups with more SSDs can be configured, setting a large stripe width number for a
virtual disk can spread the data files to multiple disk groups and improve the disk performance.
o A larger stripe width number can split a virtual disk larger than 255 GB into more disk
components. However, vSAN cannot guarantee that the increased number of disk components will be
distributed across multiple disk groups with each component stored on one HDD. If multiple
disk components of the same VMDK are on the same disk group, the increased number of
components is spread only across more backing HDDs and not SSDs for that virtual disk, which
means that increasing the stripe width might not improve performance unless there is a de-
staging performance issue.
Depending on the database size, VMware recommends having multiple VMDKs for one VM. Multiple
VMDKs spread database components across disk groups in a vSAN cluster.
In all-flash vSAN, for read-intensive OLTP databases such as TPC-E-like databases, most of the
space requirement comes from data, including tables and indexes, while the space requirement for
the transaction log is often smaller than the data size. VMware recommends using separate vSAN policies
for the virtual disks holding the data and the transaction log of SQL Server. For data, VMware recommends
using RAID 5 to reduce space usage from 2x to 1.33x. Testing with a TPC-E-like workload confirmed
that RAID 5 achieves good disk performance. For the virtual disks holding the transaction log,
VMware recommends using RAID 1.
VMware measured the performance impact on all-flash vSAN with different stripe widths. In
summary, after leveraging multiple virtual disks for one database, which essentially distributes data in
the cluster to better utilize resources, the TPC-E-like performance showed no obvious improvement or
degradation with additional stripe width. VMware tested different stripe widths (1 to 6, and 12) for a
200 GB database in all-flash vSAN and found:
o The TPS, transaction time and response time were similar in all configurations.
o Virtual disk latency was less than 2 milliseconds in all test configurations.
VMware suggests setting stripe width as needed to split the disk object into multiple components to
distribute the object components to more disks in different disk groups. In some situations, you might
need this setting for large virtual disks.
Use Quality of Service (QoS) for database restore operations – vSAN 6.2 introduces a QoS feature that
sets a policy to limit the number of IOPS that an object can consume. The QoS feature was validated
with the sequential I/O-dominant database restore operations in this solution. Limiting the IOPS affects
the overall duration of concurrent database restore operations. Other applications on the same vSAN
that experience performance contention with I/O-intensive operations (such as database maintenance)
can benefit from QoS.
For more information about the implementation of a hybrid vSAN with SQL Server solution, see the
Microsoft SQL Server 2014 on VMware Virtual SAN 6.1 Hybrid white paper at
http://www.vmware.com/files/pdf/products/vsan/microsoft-sql-on-vrtual-san61-hybrid.pdf.
For more information about the implementation of an all-flash vSAN with SQL Server solution, see the
Microsoft SQL Server 2014 on VMware vSAN 6.2 All-Flash paper at
https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/products/vsan/vmware-microsoft-
sql-on-all-flash-virtual-san-6-2.pdf.
Figure 19. Random Mixed (50% Read/50% Write) I/O Operations per Second (Higher is Better)
Figure 20. Sequential Read I/O Operations per Second (Higher is Better)
When deploying a Tier 1 mission-critical SQL Server, placing SQL Server binary, data, transaction log,
and tempdb files on separate storage devices allows for maximum flexibility, and can improve
performance. SQL Server accesses data and transaction log files with very different I/O patterns. While
data file access is mostly random, transaction log file access is sequential only. Traditional storage built
with spinning disk media requires repositioning of the disk head for random read and write access.
Therefore, sequential data access is much more efficient than random data access. Separating files that
have random access patterns from files that have sequential access patterns helps to minimize disk head
movements, and thus optimizes storage performance.
The following guidelines can help to achieve best performance:
Place SQL Server data (system and user), transaction log, and backup files into separate VMDKs (if
not using RDMs). The SQL Server binaries are usually installed in the OS VMDK. Separating SQL
Server installation files from data and transaction logs also provides better flexibility for backup,
management, and troubleshooting.
For the most critical databases, where performance requirements supersede all other requirements,
maintain a 1:1 mapping between VMDKs and LUNs. This provides better workload isolation and
prevents any chance of storage contention at the datastore level. Of course, the underlying physical
disk configuration must accommodate the I/O and latency requirements as well. When manageability
is a concern, group VMDKs and SQL Server files with similar I/O characteristics on common LUNs,
while making sure that the underlying physical device can accommodate the aggregated I/O
requirements of all the VMDKs.
For underlying storage, where applicable, RAID 10 can provide the best performance and availability
for user data, transaction log files, and TempDB.
For lower-tier SQL Server workloads, consider the following:
Deploying multiple, lower-tier SQL Server systems on VMFS facilitates easier management and
administration of template cloning, snapshots, and storage consolidation.
Manage performance of VMFS. The aggregate IOPS demands of all VMs on the VMFS datastore should
not exceed the IOPS capability of the underlying physical disks.
Use VMware vSphere Storage DRS for automatic load balancing between datastores to provide
space and avoid I/O bottlenecks as per pre-defined rules.
Note While increasing the default queue depth of a virtual SCSI controller can be beneficial to an SQL
Server-based VM, the configuration can also introduce unintended adverse effects on overall
performance if not done properly. VMware highly recommends that customers consult and work
with the appropriate storage vendor's support personnel to evaluate the impact of such changes
and obtain recommendations for any other adjustments that may be required to support the increase
in queue depth of a virtual SCSI controller.
Use multiple vSCSI adapters. Placing the OS, data, and transaction logs onto separate vSCSI adapters
optimizes I/O by distributing load across multiple target devices and allowing for more queues at the
operating system level.
Spread the I/O load across all PVSCSI adapters to help optimize the I/O from the guest. In cases where
there is a requirement for many data and transaction log disks, it is beneficial to set the OS boot disk
to use PVSCSI as well. To do that, provide the PVSCSI adapter driver to the installer during the OS
installation.
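For reference, additional PVSCSI controllers can be added programmatically. The following hedged pyVmomi sketch assumes the VM object has already been retrieved and that the chosen bus number is free; the same change can be made in the vSphere Web Client.

from pyVmomi import vim

def add_pvscsi_controller(vm: vim.VirtualMachine, bus_number: int = 1) -> None:
    controller = vim.vm.device.ParaVirtualSCSIController(
        key=-101,  # temporary negative key for a device that does not exist yet
        busNumber=bus_number,
        sharedBus=vim.vm.device.VirtualSCSIController.Sharing.noSharing)
    device_spec = vim.vm.device.VirtualDeviceSpec(
        operation=vim.vm.device.VirtualDeviceSpec.Operation.add,
        device=controller)
    vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[device_spec]))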
For more information about the study, see the Best Practices for Running SQL Server on EMC XtremIO
document at http://www.emc.com/collateral/white-paper/h14583-wp-best-practice-sql-server-xtremio.pdf.
When designing SQL Server on an all-flash array, there are considerations for storage and file layout that
differ from traditional storage systems. This section refers to two aspects of the all-flash storage design:
RAID configuration
Separation of SQL Server files
Traditionally, the guidance has been to separate SQL Server file types onto as many different
disks, LUNs, and even physical disk groups at the array level as possible. The main rationale for this
historical recommendation is the need to parallelize the various I/O types to reduce latencies, enhance
responsiveness, and enable easier management, troubleshooting, and fault isolation.
All-flash storage arrays introduce a different dimension to this recommendation. All-flash arrays utilize
solid state disks (SSDs) which typically have no moving parts and, consequently, do not experience the
performance inefficiencies historically associated with legacy disk subsystems. The inherent optimized
data storage and retrieval algorithm of modern SSD-backed arrays makes the physical location of a given
block of data on the physical storage device of less concern than on traditional storage arrays. Allocating
different LUNs or disk groups for SQL Server data, transaction log, and TempDB files on an all-flash array
does not result in any significant performance difference on these modern arrays.
Nevertheless, VMware recommends that, unless explicitly discouraged by corporate mandates,
customers should separate the virtual disks for the TempDB volumes allocated to a high-transaction SQL
Server virtual machine on vSphere, even when using an all-flash storage array. The TempDB is a global
resource that is shared by all databases within an SQL Server instance. It is a temporary work space that
is recreated each time an SQL Server instance starts. Separating the TempDB disks from other disk
types (data or logs) allows customers to apply data services (for example, replication, disaster recovery,
and snapshots) to the database and transaction log volumes without including the TempDB files, which
are not required in such use cases.
Additional considerations for optimally designing the storage layout for a mission-critical SQL server on
an all-flash array vary among storage vendors. VMware recommends that customers consult their array
vendors for the best guidance when making their disk placement decisions.
As shown in the figure, the following components make up the virtual network:
Physical switch – vSphere host-facing edge of the physical local area network.
NIC team – Group of NICs connected to the same physical/logical networks to provide redundancy
and aggregated bandwidth.
Physical network interface (pnic/vmnic/uplink) – Provides connectivity between the ESXi host and the
local area network.
vSphere switch (standard and distributed) – The virtual switch is created in software and provides
connectivity between VMs. Virtual switches must uplink to a physical NIC (also known as a vmnic) to
provide VMs with connectivity to the LAN. Otherwise, virtual machine traffic is contained within
the VM.
Port group – Used to create a logical boundary within a virtual switch. This boundary can provide
VLAN segmentation when 802.1q trunking is passed from the physical switch, or it can create a
boundary for policy settings.
Virtual NIC (vNIC) – Provides connectivity between the VM and the virtual switch.
VMkernel (vmknic) – Interface for hypervisor functions, such as connectivity for NFS, iSCSI, vSphere
vMotion, and vSphere Fault Tolerance logging.
Virtual port – Provides connectivity between a vmknic and a virtual switch.
While this is beneficial in guaranteeing that the vMotion operation completes, the performance degradation
during the vMotion operation might not be an acceptable risk for some workloads. To get around this and
reduce the risk of SDPS activating, you can utilize multi-NIC vMotion. With multi-NIC vMotion, every
vMotion operation utilizes multiple uplinks, even a vMotion of a single VM. This speeds up the
vMotion operation and reduces the risk of SDPS affecting large, memory-intensive VMs.
For more information on how to set up multi-NIC vMotion, refer to the following Knowledge Base article:
https://kb.vmware.com/kb/2007467
For more information about vMotion architecture and SDPS, see the VMware vSphere vMotion
Architecture, Performance and Best Practices in VMware vSphere 5 paper at
https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/vmware-vmotion-
performance-vsphere5.pdf.
Figure 24. vMotion of a Large Intensive VM with SDPS Activated
o Enable RSS on the VMXNET network adapter driver. In Windows, under Network adapters, right-
click the VMXNET network adapter and click Properties. On the Advanced tab, enable the
Receive-side scaling setting.
should also be used in conjunction with the Max Server Memory setting to avoid SQL Server taking over
all memory on the VM.
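As an illustration, the following minimal sketch caps Max Server Memory through sp_configure using the same pyodbc connection pattern as the earlier examples; the 12288 MB value is a placeholder and must be sized for the specific VM.

import pyodbc

TSQL = """
EXEC sp_configure 'show advanced options', 1;  RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 12288;  RECONFIGURE;
"""

# autocommit=True because RECONFIGURE cannot run inside a user transaction.
with pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};SERVER=sql01;"
                    "DATABASE=master;Trusted_Connection=yes;", autocommit=True) as conn:
    conn.execute(TSQL)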
For lower-tiered SQL Server workloads where performance is less critical, the ability to overcommit
memory to maximize usage of the available host memory might be more important. When deploying
lower-tiered SQL Server workloads, VMware recommends that you do not enable the Lock Pages in
Memory user right, because Lock Pages in Memory conflicts with the vSphere balloon driver. For lower-tier
SQL Server workloads, it is better to have the balloon driver manage the memory dynamically for the VM
containing that instance. Having the balloon driver dynamically manage vSphere memory can help
maximize memory usage and increase the consolidation ratio.
correct large page memory is granted by checking messages in the SQL Server ERRORLOG. See the
following example:
2009-06-04 14:20:40.03 Server Using large pages for buffer pool.
2009-06-04 14:27:56.98 Server 8192 MB of large page memory allocated.
Refer to SQL Server and Large Pages Explained (http://blogs.msdn.com/b/psssql/archive/2009/06/05/sql-
server-and-large-pages-explained.aspx) for additional information on running SQL Server with large
pages.
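As an alternative to scanning the ERRORLOG, the large page and locked page allocations can also be checked from the sys.dm_os_process_memory DMV. The following minimal sketch reuses the pyodbc connection pattern from the earlier examples.

import pyodbc

QUERY = ("SELECT large_page_allocations_kb, locked_page_allocations_kb "
         "FROM sys.dm_os_process_memory;")

with pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};SERVER=sql01;"
                    "DATABASE=master;Trusted_Connection=yes;") as conn:
    row = conn.cursor().execute(QUERY).fetchone()
    print(f"Large pages: {row.large_page_allocations_kb} KB, "
          f"locked pages: {row.locked_page_allocations_kb} KB")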
VMware NSX 6.2 adds support for listening on multiple port ranges. A VIP can be associated with
multiple ports or port ranges, thereby improving the scalability and reducing the number of edges that
need to be deployed.
NSX Edge supports both Layer 7 (the recommended load-balancing option without session affinity
requirements in Exchange Server 2016) and Layer 4 load balancing of HTTP and HTTPS protocols. It
supports multiple load balancing methods, such as round-robin and least connection. Layer 7
HTTP/HTTPS VIP addresses are processed after passing the NSX Edge firewall. NSX Edge uses the
faster Layer 4 load balancer engine. The Layer 4 VIP address is processed before passing the NSX Edge
firewall.
The NSX Edge services gateway supports the following deployment models for load-balancer
functionality:
One-armed load balancer
Inline load balancer
To monitor the SQL Server application and ingest data from the MS SQL Server database into vRealize
Operations Manager dashboards, there are two options:
Utilize the EPO management pack for MS SQL Server provided by VMware – This management pack
is included with vRealize Operations Enterprise and can be implemented by the customer or
VMware services. The EPO management pack collects information from SQL Server
deployments using an agent and does not include capacity management information.
Blue Medora management pack for SQL Server – While this solution incurs additional cost, it provides
added value with agentless integration and includes capacity information from which you can build
what-if scenario analysis for SQL Server.
For more information about Blue Medora management packs, see http://www.bluemedora.com/wp-
content/uploads/2015/06/vROps-MP-MSSQL-Server-June-2015.pdf.
6. Acknowledgments
Author: Niran Even-Chen, Staff Solutions Architect, Microsoft Applications.