Performance Tuning Windows Server 2016
When you run a server system in your organization, you might have business needs not met using default server
settings. For example, you might need the lowest possible energy consumption, or the lowest possible latency, or
the maximum possible throughput on your server. This guide provides a set of guidelines that you can use to tune
the server settings in Windows Server 2016 and obtain incremental performance or energy efficiency gains,
especially when the nature of the workload varies little over time.
It is important that your tuning changes consider the hardware, the workload, the power budgets, and the
performance goals of your server. This guide describes each setting and its potential effect to help you make an
informed decision about its relevance to your system, workload, performance, and energy usage goals.
WARNING
Registry settings and tuning parameters changed significantly between versions of Windows Server. Be sure to use the latest
tuning guidelines to avoid unexpected results.
In this guide
This guide organizes performance and tuning guidance for Windows Server 2016 across the following areas:
Hardware performance considerations
Active Directory Servers
Cache and memory management
Web Servers
Server Hardware Performance Considerations
The following section lists important items that you should consider when you choose server hardware. Following these guidelines can help remove performance bottlenecks that might impede the server's performance.
Processor Recommendations
Choose 64-bit processors for servers. 64-bit processors have significantly more address space, and are required
for Windows Server 2016. No 32-bit editions of the operating system will be provided, but 32-bit applications will
run on the 64-bit Windows Server 2016 operating system.
To increase the computing resources in a server, you can use a processor with higher-frequency cores, or you can
increase the number of processor cores. If CPU is the limiting resource in the system, a core with 2x frequency
typically provides a greater performance improvement than two cores with 1x frequency.
Multiple cores are not expected to provide perfectly linear scaling, and the scaling factor can be even lower if hyper-threading is enabled, because hyper-threading relies on sharing resources of the same physical core.
IMPORTANT
Match and scale the memory and I/O subsystem with the CPU performance, and vice versa.
Do not compare CPU frequencies across manufacturers and generations of processors because the comparison
can be a misleading indicator of speed.
For Hyper-V, make sure that the processor supports SLAT (Second Level Address Translation). It is implemented as
Extended Page Tables (EPT) by Intel and Nested Page Tables (NPT) by AMD. You can verify this feature is present
by using SystemInfo.exe on your server.
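As a minimal sketch, you can filter the SystemInfo.exe output from a PowerShell or command prompt window; note that once the Hyper-V role is installed, the tool reports that a hypervisor has been detected instead of listing the individual requirements:

# Show the Hyper-V requirements section, including the SLAT line, if reported.
systeminfo | Select-String "Hyper-V Requirements", "Second Level Address Translation"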
Cache Recommendations
Choose large L2 or L3 processor caches. On newer architectures, such as Haswell or Skylake, there is a unified Last
Level Cache (LLC) or an L4. The larger caches generally provide better performance, and they often play a bigger
role than raw CPU frequency.
Disk Recommendations
Choose disks with higher rotational speeds to reduce random request service times (~2 ms on average when you
compare 7,200- and 15,000-RPM drives) and to increase sequential request bandwidth. However, there are cost,
power, and other considerations associated with disks that have high rotational speeds.
2.5-inch enterprise-class disks can service a significantly larger number of random requests per second compared
to equivalent 3.5-inch drives.
Store frequently accessed data, especially sequentially accessed data, near the beginning of a disk because this
roughly corresponds to the outermost (fastest) tracks.
Consolidating small drives into fewer high-capacity drives can reduce overall storage performance. Fewer spindles
mean reduced request service concurrency; and therefore, potentially lower throughput and longer response
times (depending on the workload intensity).
SSDs and other high-speed flash disks are useful for read-mostly workloads with high I/O rates or latency-sensitive I/O. Boot disks are also good candidates for SSDs or high-speed flash disks because they can improve boot times significantly.
NVMe SSDs offer superior performance with greater command queue depths, more efficient interrupt processing, and greater efficiency for 4 KB commands. This particularly benefits scenarios that require heavy simultaneous I/O.
See Also
Server Hardware Power Considerations
Power and Performance Tuning
Processor Power Management Tuning
Recommended Balanced Plan Parameters
Server Hardware Power Considerations
Energy efficiency has become increasingly significant in enterprise and data center environments. High performance and low energy usage are often conflicting goals, but by carefully selecting server components, you can achieve the correct balance between them. The following sections list guidelines for the power characteristics and capabilities of server hardware components.
Processor Recommendations
Frequency, operating voltage, cache size, and process technology affect the energy consumption of processors.
Processors have a thermal design point (TDP) rating that gives a basic indication of energy consumption relative
to other models.
In general, opt for the lowest TDP processor that will meet your performance goals. Also, newer generations of
processors are generally more energy efficient, and they may expose more power states for the Windows power
management algorithms, which enables better power management at all levels of performance. Or they may use
some of the new cooperative power management techniques that Microsoft has developed in partnership with
hardware manufacturers.
For more info on cooperative power management techniques, see the section named Collaborative Processor
Performance Control in the Advanced Configuration and Power Interface Specification.
Memory Recommendations
Memory accounts for an increasing fraction of the total system power. Many factors affect the energy
consumption of a memory DIMM, such as memory technology, error correction code (ECC), bus frequency,
capacity, density, and number of ranks. Therefore, it is best to compare expected power ratings before purchasing
large quantities of memory.
Low-power memory is now available, but you must consider the performance and cost trade-offs. If your server
will be paging, you should also factor in the energy cost of the paging disks.
Disk Recommendations
Higher RPM means increased energy consumption. SSD drives are more power efficient than rotational drives.
Also, 2.5-inch drives generally require less power than 3.5-inch drives.
Fan Recommendations
Fans, like power supplies, are an area where you can reduce energy consumption without affecting system
performance. Variable-speed fans can reduce RPM as the system load decreases, eliminating otherwise
unnecessary energy consumption.
Processor terminology
The processor terminology used throughout this topic reflects the hierarchy of components available in the
following figure. Terms used from largest to smallest granularity of components are the following:
Processor socket
NUMA node
Core
Logical processor
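A minimal sketch of inspecting part of this hierarchy with PowerShell, assuming the Win32_Processor CIM class is available (it reports one object per processor socket; NUMA topology is not shown here):

# List each processor socket with its core and logical processor counts.
Get-CimInstance -ClassName Win32_Processor | Select-Object SocketDesignation, NumberOfCores, NumberOfLogicalProcessors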
See Also
Server Hardware Performance Considerations
Power and Performance Tuning
Processor Power Management Tuning
Recommended Balanced Plan Parameters
Power and performance tuning
Energy efficiency is increasingly important in enterprise and data center environments, and it adds another set of
tradeoffs to the mix of configuration options.
Windows Server 2016 is optimized for excellent energy efficiency with minimum performance impact across a
wide range of customer workloads. Processor Power Management (PPM) Tuning for the Windows Server Balanced
Power Plan describes the workloads used for tuning the default parameters in Windows Server 2016, and
provides suggestions for customized tunings.
This section expands on energy-efficiency tradeoffs to help you make informed decisions if you need to adjust the
default power settings on your server. However, the majority of server hardware and workloads should not
require administrator power tuning when running Windows Server 2016.
You can use an energy efficiency metric, such as the amount of work done per unit of energy consumed, to set practical goals that respect the tradeoff between power and performance. In contrast, a goal of 10 percent energy savings across the data center fails to capture the corresponding effects on performance, and vice versa.
Similarly, if you tune your server to increase performance by 5 percent, and that results in 10 percent higher
energy consumption, the total result might or might not be acceptable for your business goals. The energy
efficiency metric allows for more informed decision making than power or performance metrics alone.
You can use load lines to evaluate and compare the performance and energy consumption of configurations at all load points. In some cases the best configuration is obvious, but there can easily be scenarios where one configuration works best for heavy workloads and another works best for light workloads.
You need to thoroughly understand your workload requirements to choose an optimal configuration. Don't assume that when you find a good configuration, it will always remain optimal. You should measure system
utilization and energy consumption on a regular basis and after changes in workloads, workload levels, or server
hardware.
IMPORTANT
To ensure an accurate analysis, make sure that all local apps are closed before you run PowerCfg.exe.
Shortened timer tick rates, drivers that lack power management support, and excessive CPU utilization are a few of
the behavioral issues that are detected by the powercfg /energy command. This tool provides a simple way to
identify and fix power management issues, potentially resulting in significant cost savings in a large datacenter.
For more info about PowerCfg.exe, see Using PowerCfg to Evaluate System Energy Efficiency.
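As a sketch, the energy report can be generated from an elevated prompt; the output path and the 60-second observation window below are just example values:

# Observe the system for 60 seconds and write an energy efficiency report.
powercfg /energy /output C:\Temp\energy-report.html /duration 60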
High Performance
Description: Increases performance at the cost of high energy consumption. Power and thermal limitations, operating expenses, and reliability considerations apply.
Applicable scenarios: Low latency apps and app code that is sensitive to processor performance changes.
Implementation highlights: Processors are always locked at the highest performance state (including turbo frequencies). All cores are unparked. Thermal output may be significant.
Power Saver
Description: Limits performance to save energy and reduce operating cost. Not recommended without thorough testing to make sure performance is adequate.
Applicable scenarios: Deployments with limited power budgets and thermal constraints.
Implementation highlights: Caps processor frequency at a percentage of maximum (if supported), and enables other energy-saving features.
These power plans exist in Windows for alternating current (AC) and direct current (DC) powered systems, but we
will assume that servers are always using an AC power source.
For more info on power plans and power policy configurations, see Power Policy Configuration and Deployment
in Windows.
NOTE
Some server manufacturers have their own power management options available through the BIOS settings. If the operating
system does not have control over the power management, changing the power plans in Windows will not affect system
power and performance.
For Intel Nehalem and AMD processors, Turbo is disabled by default on P-state-based platforms. However, if a
system supports Collaborative Processor Performance Control (CPPC), which is a new alternative mode of
performance communication between the operating system and the hardware (defined in ACPI 5.0), Turbo may be
engaged if the Windows operating system dynamically requests the hardware to deliver the highest possible
performance levels.
To enable or disable the Turbo Boost feature, the Processor Performance Boost Mode parameter must be
configured by the administrator or by the default parameter settings for the chosen power plan. Processor
Performance Boost Mode has five allowable values, as shown in Table 5.
For P-state-based control, the choices are Disabled, Enabled (Turbo is available to the hardware whenever nominal
performance is requested), and Efficient (Turbo is available only if the EPB register is implemented).
For CPPC-based control, the choices are Disabled, Efficient Enabled (Windows specifies the exact amount of Turbo
to provide), and Aggressive (Windows asks for maximum performance to enable Turbo).
In Windows Server 2016, the default value for Boost Mode is 3.
The following commands enable Processor Performance Boost Mode on the current power plan (specify the policy
by using a GUID alias):
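A sketch of those commands, following the same pattern as the Power Saver example later in this topic and assuming the SCHEME_CURRENT alias targets the currently active plan:

Powercfg -setacvalueindex scheme_current sub_processor PERFBOOSTMODE 1
Powercfg -setactive scheme_current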
IMPORTANT
You must run the powercfg -setactive command to enable the new settings. You do not need to reboot the server.
To set this value for power plans other than the currently selected plan, you can use aliases such as SCHEME_MAX
(Power Saver), SCHEME_MIN (High Performance), and SCHEME_BALANCED (Balanced) in place of
SCHEME_CURRENT. Replace scheme_current in the powercfg commands shown previously with the desired alias to apply the setting to that power plan.
For example, to adjust the Boost Mode in the Power Saver plan and make Power Saver the current plan, run the following commands:
Powercfg -setacvalueindex scheme_max sub_processor PERFBOOSTMODE 1
Powercfg -setactive scheme_max
If your server requires lower energy consumption, you might want to cap the processor performance state at a
percentage of maximum. For example, you can restrict the processor to 75 percent of its maximum frequency by
using the following commands:
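A sketch of the cap, assuming the PROCTHROTTLEMAX alias (maximum processor state) and the currently active plan as the target:

Powercfg -setacvalueindex scheme_current sub_processor PROCTHROTTLEMAX 75
Powercfg -setactive scheme_current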
NOTE
Capping processor performance at a percentage of maximum requires processor support. Check the processor
documentation to determine whether such support exists, or view the Performance Monitor counter % of maximum
frequency in the Processor group to see if any frequency caps were applied.
To reduce the number of schedulable cores to 50 percent of the maximum count, set the Processor Performance
Core Parking Maximum Cores parameter to 50 as follows:
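A sketch of that change, assuming the CPMAXCORES alias for the core parking maximum cores parameter and the currently active plan as the target:

Powercfg -setacvalueindex scheme_current sub_processor CPMAXCORES 50
Powercfg -setactive scheme_current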
See Also
Server Hardware Performance Considerations
Server Hardware Power Considerations
Processor Power Management Tuning
Recommended Balanced Plan Parameters
Processor Power Management (PPM) Tuning for the
Windows Server Balanced Power Plan
Starting with Windows Server 2008, Windows Server provides three power plans: Balanced, High Performance,
and Power Saver. The Balanced power plan is the default choice that aims to give the best energy efficiency for a
set of typical server workloads. This topic describes the workloads that have been used to determine the default
settings for the Balanced scheme for the past several releases of Windows.
If you run a server system that has dramatically different workload characteristics or performance and power
requirements than these workloads, you might want to consider tuning the default power settings (i.e., create a
custom power plan). One source of useful tuning information is the Server Hardware Power Considerations.
Alternately, you may decide that the High Performance power plan is the right choice for your environment,
recognizing that you will likely take a significant energy hit in exchange for some level of increased
responsiveness.
IMPORTANT
You should use the power plans that are included with Windows Server unless you have a specific need to create a custom one and a thorough understanding of your workload; results will vary depending on the characteristics of your workload.
IMPORTANT
Even though the system can run at its peak load, we typically optimize for lower load levels, since servers that consistently
run at their peak load levels would be well-advised to use the High Performance power plan unless energy efficiency is a
high priority.
Metrics
All of the tested benchmarks use throughput as the performance metric. Response time is treated as an SLA requirement for these workloads (except for SAP, where it is a primary metric). For example, a benchmark run is considered valid only if the mean or maximum response time is less than a certain value.
Therefore, the PPM tuning analysis also uses throughput as its performance metric. At the highest load level
(100% CPU utilization), our goal is that the throughput should not decrease more than a few percent due to power
management optimizations. But the primary consideration is to maximize the power efficiency (as defined below)
at medium and low load levels.
Running the CPU cores at lower frequencies reduces energy consumption. However, lower frequencies typically
decrease throughput and increase response time. For the Balanced power plan, there is an intentional tradeoff of
responsiveness and power efficiency. The SAP workload tests, as well as the response time SLAs on the other workloads, make sure that the response time increase doesn't exceed a certain threshold (5%, as an example) for these specific workloads.
NOTE
If the workload uses response time as the performance metric, the system should either switch to the High Performance
power plan or change Balanced power plan as suggested in Recommended Balanced Power Plan Parameters for Quick
Response Time.
Tuning results
Starting with Windows Server 2008, Microsoft worked with Intel and AMD to optimize the PPM parameters for
the most up-to-date server processors for each Windows release. A tremendous number of PPM parameter
combinations were tested on each of the previously-discussed workloads to find the best power efficiency at
different load levels. As software algorithms were refined and as hardware power architectures evolved, each new Windows Server release has had power efficiency better than or equal to that of previous versions across the range of tested workloads.
As an example of these gains, power efficiency measured under different TPC-E load levels on a 4-socket production server showed an 8% improvement at medium load levels for Windows Server 2008 R2 compared to Windows Server 2008.
See Also
Server Hardware Performance Considerations
Server Hardware Power Considerations
Power and Performance Tuning
Processor Power Management Tuning
Recommended Balanced Plan Parameters
Recommended Balanced Power Plan Parameters for
Workloads Requiring Quick Response Times
The default Balanced power plan uses throughput as the performance metric for tuning. During steady state, throughput does not change with varying utilization until the system is totally overloaded (~100% utilization). As a result, the Balanced power plan heavily favors power savings by minimizing processor frequency and maximizing utilization.
However, response time can increase exponentially as utilization increases, and the demand for quick response times has grown dramatically. Although Microsoft suggests switching to the High Performance power plan when quick response time is needed, some users do not want to lose the power benefit during light to medium load levels. Hence, Microsoft provides the following set of suggested parameter changes for workloads that require quick response times.
To set the proposed values, run the following commands in a command prompt window with administrator rights:
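The documented commands are not reproduced here. As an illustrative sketch only, PPM parameters of this kind are adjusted on the Balanced plan with commands of the following form; the parameter aliases and the example values 10 and 8 are assumptions, not the documented recommendation:

Powercfg -setacvalueindex scheme_balanced sub_processor PERFINCTHRESHOLD 10
Powercfg -setacvalueindex scheme_balanced sub_processor PERFDECTHRESHOLD 8
Powercfg -setactive scheme_balanced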
This change is based on the performance and power tradeoff analysis using the following workloads. For the
users who want to further fine tune the power efficiency with certain SLA requirements, please refer to Server
Hardware Performance Considerations.
NOTE
For additional recommendations and insight on leveraging power plans to tune virtualized workloads, read Hyper-V Configuration.
SPECpower JAVA workload
SPECpower_ssj2008, the most widely used industry-standard SPEC benchmark for server power and performance characteristics, is used to check the power impact. Because it uses only throughput as its performance metric, the default Balanced power plan provides the best power efficiency for it.
The proposed parameter change consumes slightly more power at light load levels (<= 20%). As the load level rises, the difference increases, and the proposed change starts to consume the same power as the High Performance power plan beyond the 60% load level. If you use the proposed parameter changes, account for the power cost at medium to high load levels during rack capacity planning.
GeekBench 3
GeekBench 3 is a cross-platform processor benchmark that separates the scores for single-core and multi-core
performance. It simulates a set of workloads including integer workloads (encryptions, compressions, image
processing, etc.), floating point workloads (modeling, fractal, image sharpening, image blurring, etc.) and memory
workloads (streaming).
Response time is a major measure in its score calculation. In our tested system, the default Balanced power plan
has ~18% regression in single-core tests and ~40% regression in multi-core tests compared to the High
Performance power plan. The proposed change removes these regressions.
DiskSpd
Diskspd is a command-line tool for storage benchmarking developed by Microsoft. It is widely used to generate a
variety of requests against storage systems for the storage performance analysis.
We set up a failover cluster and used DiskSpd to generate random and sequential read and write IOs, with different IO sizes, against local and remote storage systems. Our tests show that IO response time is very sensitive to processor frequency under different power plans. The Balanced power plan can double the response time seen under the High Performance power plan for certain workloads. The proposed change removes most of these regressions.
IMPORTANT
Starting with Intel Broadwell processors running Windows Server 2016, most processor power management decisions are made in the processor instead of at the OS level to achieve quicker adaptation to workload changes. The legacy PPM parameters used by the OS have minimal impact on the actual frequency decisions, beyond telling the processor whether it should favor power or performance and capping the minimum and maximum frequencies. Hence, the proposed PPM parameter change targets only pre-Broadwell systems.
See Also
Server Hardware Performance Considerations
Server Hardware Power Considerations
Power and Performance Tuning
Processor Power Management Tuning
Failover Cluster
Performance tuning Active Directory Servers
IMPORTANT
Proper configuration and sizing of Active Directory has a significant potential impact on overall system and workload
performance. Readers are highly encouraged to start by reading Capacity Planning for Active Directory Domain Services.
See also
Hardware considerations
LDAP considerations
Proper placement of domain controllers and site considerations
Troubleshooting ADDS performance
Capacity Planning for Active Directory Domain Services
Proper placement of domain controllers and site
considerations
Proper site definition is critical to performance. Clients that fall outside a defined site can experience poor performance for authentications and queries. Furthermore, with the introduction of IPv6 on clients, the request can come from either the IPv4 or the IPv6 address, and Active Directory needs to have sites properly defined for IPv6. The operating system prefers IPv6 over IPv4 when both are configured.
Starting in Windows Server 2008, the domain controller attempts to use name resolution to do a reverse lookup in
order to determine the site the client should be in. This can cause exhaustion of the ATQ Thread Pool and cause the
domain controller to become unresponsive. The appropriate resolution to this is to properly define the site
topology for IPv6. As a workaround, one can optimize the name resolution infrastructure to respond quickly to
domain controller requests. For more info see Windows Server 2008 or Windows Server 2008 R2 Domain
Controller delayed response to LDAP or Kerberos requests.
An additional area of consideration is locating read/write domain controllers for scenarios where RODCs are in use. Certain operations require access to a writable Domain Controller, or target a writable Domain Controller when a Read-Only Domain Controller would suffice. Optimizing these scenarios takes two paths:
Stop contacting writable Domain Controllers when a Read-Only Domain Controller would suffice. This requires an application code change.
Where a writable Domain Controller is necessary, place read-write Domain Controllers at central locations to minimize latency.
For further information reference:
Application Compatibility with RODCs
Active Directory Service Interface (ADSI) and the Read Only Domain Controller (RODC) Avoiding performance
issues
See also
Performance tuning Active Directory Servers
Hardware considerations
LDAP considerations
Troubleshooting ADDS performance
Capacity Planning for Active Directory Domain Services
Hardware considerations in ADDS performance
tuning
IMPORTANT
The following is a summary of the key recommendations and considerations to optimize server hardware for Active Directory
workloads covered in greater depth in the Capacity Planning for Active Directory Domain Services article. Readers are highly
encouraged to review Capacity Planning for Active Directory Domain Services for a greater technical understanding and
implications of these recommendations.
See also
Performance tuning Active Directory Servers
LDAP considerations
Proper placement of domain controllers and site considerations
Troubleshooting ADDS performance
Capacity Planning for Active Directory Domain Services
LDAP considerations in ADDS performance tuning
IMPORTANT
The following is a summary of the key recommendations and considerations to optimize server hardware for Active Directory
workloads covered in greater depth in the Capacity Planning for Active Directory Domain Services article. Readers are highly
encouraged to review Capacity Planning for Active Directory Domain Services for a greater technical understanding and
implications of these recommendations.
See also
Performance tuning Active Directory Servers
Hardware considerations
Proper placement of domain controllers and site considerations
Troubleshooting ADDS performance
Capacity Planning for Active Directory Domain Services
Troubleshooting Active Directory Domain Services
performance
For additional information on ADDS performance troubleshooting, see Monitoring Your Branch Office
Environment.
See also
Performance tuning Active Directory Servers
Hardware considerations
LDAP considerations
Proper placement of domain controllers and site considerations
Capacity Planning for Active Directory Domain Services
Performance tuning for file servers
You should select the proper hardware to satisfy the expected file server load, considering average load, peak load,
capacity, growth plans, and response times. Hardware bottlenecks limit the effectiveness of software tuning.
The following REG_DWORD registry settings can affect the performance of SMB client computers.
ConnectionCountPerNetworkInterface
HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\ConnectionCountPerNetworkInterface
Applies to Windows 10, Windows 8.1, Windows 8, Windows Server 2016, Windows Server 2012 R2, and Windows Server 2012
The default is 1, and we strongly recommend using the default. The valid range is 1-16. The maximum number of connections per interface to be established with a server for non-RSS interfaces.
ConnectionCountPerRssNetworkInterface
HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\ConnectionCountPerRssNetworkInterface
Applies to Windows 10, Windows 8.1, Windows 8, Windows Server 2016, Windows Server 2012 R2, and
Windows Server 2012
The default is 4, and we strongly recommend using the default. The valid range is 1-16. The maximum
number of connections per interface to be established with a server for RSS interfaces.
ConnectionCountPerRdmaNetworkInterface
HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\ConnectionCountPerRdmaNetworkInterface
Applies to Windows 10, Windows 8.1, Windows 8, Windows Server 2016, Windows Server 2012 R2, and
Windows Server 2012
The default is 2, and we strongly recommend using the default. The valid range is 1-16. The maximum
number of connections per interface to be established with a server for RDMA interfaces.
MaximumConnectionCountPerServer
HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\MaximumConnectionCountPerServer
Applies to Windows 10, Windows 8.1, Windows 8, Windows Server 2016, Windows Server 2012 R2, and
Windows Server 2012
The default is 32, with a valid range from 1-64. The maximum number of connections to be established with
a single server running Windows Server 2012 across all interfaces.
DormantDirectoryTimeout
HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\DormantDirectoryTimeout
Applies to Windows 10, Windows 8.1, Windows 8, Windows Server 2016, Windows Server 2012 R2, and
Windows Server 2012
The default is 600 seconds. The maximum time server directory handles held open with directory leases.
FileInfoCacheLifetime
HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\FileInfoCacheLifetime
Applies to Windows 10, Windows 8.1, Windows 8, Windows 7, Windows Vista, Windows Server 2016,
Windows Server 2012 R2, Windows Server 2012, Windows Server 2008 R2, and Windows Server 2008
The default is 10 seconds. The file information cache timeout period.
DirectoryCacheLifetime
HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\DirectoryCacheLifetime
Applies to Windows 10, Windows 8.1, Windows 8, Windows 7, Windows Vista, Windows Server 2016,
Windows Server 2012 R2, Windows Server 2012, Windows Server 2008 R2, and Windows Server 2008
The default is 10 seconds. This is the directory cache timeout.
Note
This parameter controls caching of directory metadata in the absence of directory leases.
DirectoryCacheEntrySizeMax
HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\DirectoryCacheEntrySizeMax
Applies to Windows 10, Windows 8.1, Windows 8, Windows 7, Windows Vista, Windows Server 2016,
Windows Server 2012 R2, Windows Server 2012, Windows Server 2008 R2, and Windows Server 2008
The default is 64 KB. This is the maximum size of directory cache entries.
FileNotFoundCacheLifetime
HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\FileNotFoundCacheLifetime
Applies to Windows 10, Windows 8.1, Windows 8, Windows 7, Windows Vista, Windows Server 2016,
Windows Server 2012 R2, Windows Server 2012, Windows Server 2008 R2, and Windows Server 2008
The default is 5 seconds. The file not found cache timeout period.
CacheFileTimeout
HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\CacheFileTimeout
Applies to Windows 8.1, Windows 8, Windows Server 2012, Windows Server 2012 R2, and Windows 7
The default is 10 seconds. This setting controls the length of time (in seconds) that the redirector will hold on
to cached data for a file after the last handle to the file is closed by an application.
DisableBandwidthThrottling
HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\DisableBandwidthThrottling
Applies to Windows 10, Windows 8.1, Windows 8, Windows 7, Windows Vista, Windows Server 2016,
Windows Server 2012 R2, Windows Server 2012, Windows Server 2008 R2, and Windows Server 2008
The default is 0. By default, the SMB redirector throttles throughput across high-latency network
connections, in some cases to avoid network-related timeouts. Setting this registry value to 1 disables this
throttling, enabling higher file transfer throughput over high-latency network connections.
DisableLargeMtu
HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\DisableLargeMtu
Applies to Windows 10, Windows 8.1, Windows 8, Windows 7, Windows Vista, Windows Server 2016,
Windows Server 2012 R2, Windows Server 2012, Windows Server 2008 R2, and Windows Server 2008
The default is 0 for Windows 8 only. In Windows 8, the SMB redirector transfers payloads as large as 1 MB
per request, which can improve file transfer speed. Setting this registry value to 1 limits the request size to
64 KB. You should evaluate the impact of this setting before applying it.
RequireSecuritySignature
HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\RequireSecuritySignature
Applies to Windows 10, Windows 8.1, Windows 8, Windows 7, Windows Vista, Windows Server 2016,
Windows Server 2012 R2, Windows Server 2012, Windows Server 2008 R2, and Windows Server 2008
The default is 0, disabling SMB Signing. Changing this value to 1 enables SMB signing for all SMB
communication, preventing SMB communication with computers where SMB signing is disabled. SMB
signing can increase CPU cost and network round trips, but helps block man-in-the-middle attacks. If SMB
signing is not required, ensure that this registry value is 0 on all clients and servers.
For more info, see The Basics of SMB Signing.
FileInfoCacheEntriesMax
HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\FileInfoCacheEntriesMax
Applies to Windows 10, Windows 8.1, Windows 8, Windows 7, Windows Vista, Windows Server 2016,
Windows Server 2012 R2, Windows Server 2012, Windows Server 2008 R2, and Windows Server 2008
The default is 64, with a valid range of 1 to 65536. This value is used to determine the amount of file
metadata that can be cached by the client. Increasing the value can reduce network traffic and increase
performance when a large number of files are accessed.
DirectoryCacheEntriesMax
HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\DirectoryCacheEntriesMax
Applies to Windows 10, Windows 8.1, Windows 8, Windows 7, Windows Vista, Windows Server 2016,
Windows Server 2012 R2, Windows Server 2012, Windows Server 2008 R2, and Windows Server 2008
The default is 16, with a valid range of 1 to 4096. This value is used to determine the amount of directory
information that can be cached by the client. Increasing the value can reduce network traffic and increase
performance when large directories are accessed.
FileNotFoundCacheEntriesMax
HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\FileNotFoundCacheEntriesMax
Applies to Windows 10, Windows 8.1, Windows 8, Windows 7, Windows Vista, Windows Server 2016,
Windows Server 2012 R2, Windows Server 2012, Windows Server 2008 R2, and Windows Server 2008
The default is 128, with a valid range of 1 to 65536. This value is used to determine the amount of file name
information that can be cached by the client. Increasing the value can reduce network traffic and increase
performance when a large number of file names are accessed.
MaxCmds
HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\MaxCmds
Applies to Windows 10, Windows 8.1, Windows 8, Windows 7, Windows Vista, Windows Server 2016,
Windows Server 2012 R2, Windows Server 2012, Windows Server 2008 R2, and Windows Server 2008
The default is 15. This parameter limits the number of outstanding requests on a session. Increasing the
value can use more memory, but it can improve performance by enabling a deeper request pipeline.
Increasing the value in conjunction with MaxMpxCt can also eliminate errors that are encountered due to
large numbers of outstanding long-term file requests, such as FindFirstChangeNotification calls. This
parameter does not affect connections with SMB 2.0 servers.
DormantFileLimit
HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\DormantFileLimit
Applies to Windows 10, Windows 8.1, Windows 8, Windows 7, Windows Vista, Windows Server 2016,
Windows Server 2012 R2, Windows Server 2012, Windows Server 2008 R2, and Windows Server 2008
The default is 1023. This parameter specifies the maximum number of files that should be left open on a
shared resource after the application has closed the file.
Client tuning example
The general tuning parameters for client computers can optimize a computer for accessing remote file shares,
particularly over some high-latency networks (such as branch offices, cross-datacenter communication, home
offices, and mobile broadband). The settings are not optimal or appropriate on all computers. You should evaluate
the impact of individual settings before applying them.
PARAMETER VALUE DEFAULT
DisableBandwidthThrottling 1 0
FileInfoCacheEntriesMax 32768 64
DirectoryCacheEntriesMax 4096 16
MaxCmds 32768 15
Starting in Windows 8, you can configure many of these SMB settings by using the Set-SmbClientConfiguration
and Set-SmbServerConfiguration Windows PowerShell cmdlets. Registry-only settings can be configured by
using Windows PowerShell as well.
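A hedged sketch of applying some of the client tuning example values with PowerShell; parameter availability on Set-SmbClientConfiguration varies by Windows version, and MaxCmds is shown only as an example of writing a value directly to the registry:

# Apply client tuning example values through the SMB client configuration cmdlet.
Set-SmbClientConfiguration -EnableBandwidthThrottling $false -FileInfoCacheEntriesMax 32768 -DirectoryCacheEntriesMax 4096 -Force

# Registry-only settings can be written directly (example: MaxCmds).
Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters' -Name MaxCmds -Value 32768 -Type DWord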
Smb2CreditsMin and Smb2CreditsMax
HKLM\System\CurrentControlSet\Services\LanmanServer\Parameters\Smb2CreditsMin
HKLM\System\CurrentControlSet\Services\LanmanServer\Parameters\Smb2CreditsMax
The defaults are 512 and 8192, respectively. These parameters allow the server to throttle client operation
concurrency dynamically within the specified boundaries. Some clients might achieve increased throughput
with higher concurrency limits, for example, copying files over high-bandwidth, high-latency links.
TIP
You can monitor SMB Client Shares\Credit Stalls /Sec to see if there are any issues with credits.
AdditionalCriticalWorkerThreads
HKLM\System\CurrentControlSet\Control\Session Manager\Executive\AdditionalCriticalWorkerThreads
The default is 0, which means that no additional critical kernel worker threads are added. This value affects
the number of threads that the file system cache uses for read-ahead and write-behind requests. Raising this
value can allow for more queued I/O in the storage subsystem, and it can improve I/O performance,
particularly on systems with many logical processors and powerful storage hardware.
TIP
The value may need to be increased if the amount of cache manager dirty data (performance counter Cache\Dirty
Pages) is growing to consume a large portion (over ~25%) of memory or if the system is doing lots of synchronous
read I/Os.
MaxThreadsPerQueue
HKLM\System\CurrentControlSet\Services\LanmanServer\Parameters\MaxThreadsPerQueue
The default is 20. Increasing this value raises the number of threads that the file server can use to service
concurrent requests. When a large number of active connections need to be serviced, and hardware
resources, such as storage bandwidth, are sufficient, increasing the value can improve server scalability,
performance, and response times.
TIP
An indication that the value may need to be increased is if the SMB2 work queues are growing very large (the performance counter Server Work Queues\Queue Length\SMB2 NonBlocking * is consistently above ~100).
AsynchronousCredits
HKLM\System\CurrentControlSet\Services\LanmanServer\Parameters\AsynchronousCredits
The default is 512. This parameter limits the number of concurrent asynchronous SMB commands that are
allowed on a single connection. Some cases (such as when there is a front-end server with a back-end IIS
server) require a large amount of concurrency (for file change notification requests, in particular). The value
of this entry can be increased to support these cases.
SMB server tuning example
The following settings can optimize a computer for file server performance in many cases. The settings are not
optimal or appropriate on all computers. You should evaluate the impact of individual settings before applying
them.
PARAMETER VALUE DEFAULT
AdditionalCriticalWorkerThreads 64 0
MaxThreadsPerQueue 64 20
The following REG_DWORD registry settings can affect the performance of NFS file servers.
OptimalReads
HKLM\System\CurrentControlSet\Services\NfsServer\Parameters\OptimalReads
The default is 0. This parameter determines whether files are opened for FILE_RANDOM_ACCESS or for
FILE_SEQUENTIAL_ONLY, depending on the workload I/O characteristics. Set this value to 1 to force files to
be opened for FILE_RANDOM_ACCESS. FILE_RANDOM_ACCESS prevents the file system and cache manager
from prefetching.
NOTE
This setting must be carefully evaluated because it can affect system file cache growth.
RdWrHandleLifeTime
HKLM\System\CurrentControlSet\Services\NfsServer\Parameters\RdWrHandleLifeTime
The default is 5. This parameter controls the lifetime of an NFS cache entry in the file handle cache. The
parameter refers to cache entries that have an associated open NTFS file handle. Actual lifetime is
approximately equal to RdWrHandleLifeTime multiplied by RdWrThreadSleepTime. The minimum is 1 and
the maximum is 60.
RdWrNfsHandleLifeTime
HKLM\System\CurrentControlSet\Services\NfsServer\Parameters\RdWrNfsHandleLifeTime
The default is 5. This parameter controls the lifetime of an NFS cache entry in the file handle cache. The
parameter refers to cache entries that do not have an associated open NTFS file handle. Services for NFS
uses these cache entries to store file attributes for a file without keeping an open handle with the file system.
Actual lifetime is approximately equal to RdWrNfsHandleLifeTime multiplied by RdWrThreadSleepTime. The
minimum is 1 and the maximum is 60.
RdWrNfsReadHandlesLifeTime
HKLM\System\CurrentControlSet\Services\NfsServer\Parameters\RdWrNfsReadHandlesLifeTime
The default is 5. This parameter controls the lifetime of an NFS read cache entry in the file handle cache.
Actual lifetime is approximately equal to RdWrNfsReadHandlesLifeTime multiplied by
RdWrThreadSleepTime. The minimum is 1 and the maximum is 60.
RdWrThreadSleepTime
HKLM\System\CurrentControlSet\Services\NfsServer\Parameters\RdWrThreadSleepTime
The default is 5. This parameter controls the wait interval before running the cleanup thread on the file
handle cache. The value is in ticks, and it is non-deterministic. A tick is equivalent to approximately 100
nanoseconds. The minimum is 1 and the maximum is 60.
FileHandleCacheSizeinMB
HKLM\System\CurrentControlSet\Services\NfsServer\Parameters\FileHandleCacheSizeinMB
The default is 4. This parameter specifies the maximum memory to be consumed by file handle cache entries.
The minimum is 1 and the maximum is 1*1024*1024*1024 (1073741824).
LockFileHandleCacheInMemory
HKLM\System\CurrentControlSet\Services\NfsServer\Parameters\LockFileHandleCacheInMemory
The default is 0. This parameter specifies whether the physical pages that are allocated for the cache size
specified by FileHandleCacheSizeInMB are locked in memory. Setting this value to 1 enables this activity.
Pages are locked in memory (not paged to disk), which improves the performance of resolving file handles,
but reduces the memory that is available to applications.
MaxIcbNfsReadHandlesCacheSize
HKLM\System\CurrentControlSet\Services\NfsServer\Parameters\MaxIcbNfsReadHandlesCacheSize
The default is 64. This parameter specifies the maximum number of handles per volume for the read data
cache. Read cache entries are created only on systems that have more than 1 GB of memory. The minimum
is 0 and the maximum is 0xFFFFFFFF.
HandleSigningEnabled
HKLM\System\CurrentControlSet\Services\NfsServer\Parameters\HandleSigningEnabled
The default is 1. This parameter controls whether handles that are given out by NFS File Server are signed
cryptographically. Setting it to 0 disables handle signing.
RdWrNfsDeferredWritesFlushDelay
HKLM\System\CurrentControlSet\Services\NfsServer\Parameters\RdWrNfsDeferredWritesFlushDelay
The default is 60. This parameter is a soft timeout that controls the duration of NFS V3 UNSTABLE Write data
caching. The minimum is 1, and the maximum is 600. Actual lifetime is approximately equal to
RdWrNfsDeferredWritesFlushDelay multiplied by RdWrThreadSleepTime.
CacheAddFromCreateAndMkDir
HKLM\System\CurrentControlSet\Services\NfsServer\Parameters\CacheAddFromCreateAndMkDir
The default is 1 (enabled). This parameter controls whether handles that are opened during NFS V2 and V3
CREATE and MKDIR RPC procedure handlers are retained in the file handle cache. Set this value to 0 to
disable adding entries to the cache in CREATE and MKDIR code paths.
AdditionalDelayedWorkerThreads
HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Executive\AdditionalDelayedWorkerThreads
Increases the number of delayed worker threads that are created for the specified work queue. Delayed
worker threads process work items that are not considered time-critical and that can have their memory
stack paged out while waiting for work items. An insufficient number of threads reduces the rate at which
work items are serviced; a value that is too high consumes system resources unnecessarily.
NtfsDisable8dot3NameCreation
HKLM\System\CurrentControlSet\Control\FileSystem\NtfsDisable8dot3NameCreation
The default in Windows Server 2012 and Windows Server 2012 R2 is 2. In releases prior to Windows Server
2012, the default is 0. This parameter determines whether NTFS generates a short name in the 8dot3
(MSDOS) naming convention for long file names and for file names that contain characters from the
extended character set. If the value of this entry is 0, files can have two names: the name that the user
specifies and the short name that NTFS generates. If the user-specified name follows the 8dot3 naming
convention, NTFS does not generate a short name. A value of 2 means that this parameter can be configured
per volume.
NOTE
The system volume has 8dot3 enabled by default. All other volumes in Windows Server 2012 and Windows Server
2012 R2 have 8dot3 disabled by default. Changing this value does not change the contents of a file, but it avoids the
short-name attribute creation for the file, which also changes how NTFS displays and manages the file. For most file
servers, the recommended setting is 1 (disabled).
NtfsDisableLastAccessUpdate
HKLM\System\CurrentControlSet\Control\FileSystem\NtfsDisableLastAccessUpdate
The default is 1. This system-global switch reduces disk I/O load and latencies by disabling the updating of
the date and time stamp for the last file or directory access.
MaxConcurrentConnectionsPerIp
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Rpcxdr\Parameters\MaxConcurrentConnectionsPerIp
The default value of the MaxConcurrentConnectionsPerIp parameter is 16. You can increase this value up to
a maximum of 8192 to increase the number of connections per IP address.
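A hedged sketch of raising this value with PowerShell; the value 64 is only an example, and the Rpcxdr Parameters key is present once Server for NFS is installed:

# Increase the number of NFS transport connections allowed per client IP address (example value).
Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\Rpcxdr\Parameters' -Name MaxConcurrentConnectionsPerIp -Value 64 -Type DWord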
Performance Tuning Hyper-V Servers
Hyper-V is the virtualization server role in Windows Server 2016. Virtualization servers can host multiple virtual
machines that are isolated from each other but share the underlying hardware resources by virtualizing the
processors, memory, and I/O devices. By consolidating servers onto a single machine, virtualization can improve
resource usage and energy efficiency and reduce the operational and maintenance costs of servers. In addition,
virtual machines and the management APIs offer more flexibility for managing resources, balancing load, and
provisioning systems.
See also
Hyper-V terminology
Hyper-V architecture
Hyper-V server configuration
Hyper-V processor performance
Hyper-V memory performance
Hyper-V storage I/O performance
Hyper-V network I/O performance
Detecting bottlenecks in a virtualized environment
Linux Virtual Machines
Hyper-V Terminology
This section summarizes key terminology specific to virtual machine technology that is used throughout this
performance tuning topic:
TERM DEFINITION
child partition: Any virtual machine that is created by the root partition.
hypervisor: A layer of software that sits above the hardware and below one or more operating systems. Its primary job is to provide isolated execution environments called partitions. Each partition has its own set of virtualized hardware resources (central processing unit or CPU, memory, and devices). The hypervisor controls and arbitrates access to the underlying hardware.
root partition: The partition that is created first and owns all the resources that the hypervisor does not, including most devices and system memory. The root partition hosts the virtualization stack and creates and manages the child partitions.
virtualization service client (VSC): A software module that a guest loads to consume a resource or service. For I/O devices, the virtualization service client can be a device driver that the operating system kernel loads.
virtualization service provider (VSP): A provider exposed by the virtualization stack in the root partition that provides resources or services such as I/O to a child partition.
See also
Hyper-V architecture
Hyper-V server configuration
Hyper-V processor performance
Hyper-V memory performance
Hyper-V storage I/O performance
Hyper-V network I/O performance
Detecting bottlenecks in a virtualized environment
Linux Virtual Machines
Hyper-V Architecture
Hyper-V features a Type 1 hypervisor-based architecture. The hypervisor virtualizes processors and memory and
provides mechanisms for the virtualization stack in the root partition to manage child partitions (virtual machines)
and expose services such as I/O devices to the virtual machines.
The root partition owns and has direct access to the physical I/O devices. The virtualization stack in the root
partition provides a memory manager for virtual machines, management APIs, and virtualized I/O devices. It also
implements emulated devices such as the integrated device electronics (IDE) disk controller and PS/2 input device
port, and it supports Hyper-V-specific synthetic devices for increased performance and reduced overhead.
The Hyper-V-specific I/O architecture consists of virtualization service providers (VSPs) in the root partition and
virtualization service clients (VSCs) in the child partition. Each service is exposed as a device over VMBus, which
acts as an I/O bus and enables high-performance communication between virtual machines that use mechanisms
such as shared memory. The guest operating system's Plug and Play manager enumerates these devices, including VMBus, and loads the appropriate device drivers (virtualization service clients). Services other than I/O are also
exposed through this architecture.
Starting with Windows Server 2008, the operating system features enlightenments to optimize its behavior when
it is running in virtual machines. The benefits include reducing the cost of memory virtualization, improving
multicore scalability, and decreasing the background CPU usage of the guest operating system.
The following sections suggest best practices that yield increased performance on servers running the Hyper-V role.
See also
Hyper-V terminology
Hyper-V server configuration
Hyper-V processor performance
Hyper-V memory performance
Hyper-V storage I/O performance
Hyper-V network I/O performance
Detecting bottlenecks in a virtualized environment
Linux Virtual Machines
Hyper-V Configuration
Hardware selection
The hardware considerations for servers running Hyper-V generally resemble those of non-virtualized servers,
but servers running Hyper-V can exhibit increased CPU usage, consume more memory, and need larger I/O
bandwidth because of server consolidation.
Processors
Hyper-V in Windows Server 2016 presents the logical processors as one or more virtual processors to
each active virtual machine. Hyper-V now requires processors that support Second Level Address
Translation (SLAT) technologies such as Extended Page Tables (EPT) or Nested Page Tables (NPT).
Cache
Hyper-V can benefit from larger processor caches, especially for loads that have a large working set in
memory and in virtual machine configurations in which the ratio of virtual processors to logical processors
is high.
Memory
The physical server requires sufficient memory for both the root and child partitions. The root partition
requires memory to efficiently perform I/Os on behalf of the virtual machines and operations such as a
virtual machine snapshot. Hyper-V ensures that sufficient memory is available to the root partition, and
allows remaining memory to be assigned to child partitions. Child partitions should be sized based on the
needs of the expected load for each virtual machine.
Storage
The storage hardware should have sufficient I/O bandwidth and capacity to meet the current and future
needs of the virtual machines that the physical server hosts. Consider these requirements when you select
storage controllers and disks and choose the RAID configuration. Placing virtual machines with highly disk-
intensive workloads on different physical disks will likely improve overall performance. For example, if four
virtual machines share a single disk and actively use it, each virtual machine can yield only 25 percent of
the bandwidth of that disk.
CPU statistics
Hyper-V publishes performance counters to help characterize the behavior of the virtualization server and report
the resource usage. The standard set of tools for viewing performance counters in Windows includes
Performance Monitor and Logman.exe, which can display and log the Hyper-V performance counters. The names
of the relevant counter objects are prefixed with Hyper-V.
You should always measure the CPU usage of the physical system by using the Hyper-V Hypervisor Logical
Processor performance counters. The CPU utilization counters that Task Manager and Performance Monitor
report in the root and child partitions do not reflect the actual physical CPU usage. Use the following performance
counters to monitor performance:
Hyper-V Hypervisor Logical Processor (*)\% Total Run Time: The total non-idle time of the logical processors
Hyper-V Hypervisor Logical Processor (*)\% Guest Run Time: The time spent running cycles within a guest or within the host
Hyper-V Hypervisor Logical Processor (*)\% Hypervisor Run Time: The time spent running within the hypervisor
Hyper-V Hypervisor Root Virtual Processor (*)\*: Measures the CPU usage of the root partition
Hyper-V Hypervisor Virtual Processor (*)\*: Measures the CPU usage of guest partitions
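As a sketch of how to collect these counters, the following Windows PowerShell commands sample the logical processor counters with Get-Counter and create a Logman data collector; the sample interval and collector name are illustrative:
# Sample overall hypervisor CPU usage (interval and sample count are illustrative)
Get-Counter -Counter "\Hyper-V Hypervisor Logical Processor(_Total)\% Total Run Time" -SampleInterval 5 -MaxSamples 12
# Create and start a Logman collector for all Hyper-V logical processor counters
logman.exe create counter HVCPU -c "\Hyper-V Hypervisor Logical Processor(*)\*" -si 00:00:05
logman.exe start HVCPU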
See also
Hyper-V terminology
Hyper-V architecture
Hyper-V processor performance
Hyper-V memory performance
Hyper-V storage I/O performance
Hyper-V network I/O performance
Detecting bottlenecks in a virtualized environment
Linux Virtual Machines
Hyper-V Processor Performance
Virtual processors
Hyper-V in Windows Server 2016 supports a maximum of 240 virtual processors per virtual machine. Virtual
machines that have loads that are not CPU intensive should be configured to use one virtual processor. This is
because of the additional overhead that is associated with multiple virtual processors, such as additional
synchronization costs in the guest operating system.
Increase the number of virtual processors if the virtual machine requires more than one CPU of processing under
peak load.
Background activity
Minimizing the background activity in idle virtual machines releases CPU cycles that can be used elsewhere by
other virtual machines. Windows guests typically use less than one percent of one CPU when they are idle. The
following are several best practices for minimizing the background CPU usage of a virtual machine:
Install the latest version of the Virtual Machine Integration Services.
Remove the emulated network adapter through the virtual machine settings dialog box (use the Microsoft
Hyper-V-specific adapter).
Remove unused devices such as the CD-ROM and COM port, or disconnect their media.
Keep the Windows guest operating system on the sign-in screen when it is not being used and disable the
screen saver.
Review the scheduled tasks and services that are enabled by default.
Review the ETW trace providers that are on by default by running logman.exe query -ets
Improve server applications to reduce periodic activity (such as timers).
Close Server Manager on both the host and guest operating systems.
Don't leave Hyper-V Manager running, since it constantly refreshes the virtual machine's thumbnail.
The following are additional best practices for configuring a client version of Windows in a virtual machine to
reduce the overall CPU usage:
Disable background services such as SuperFetch and Windows Search.
Disable scheduled tasks such as Scheduled Defrag.
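Several of the preceding recommendations can be applied with Windows PowerShell; a minimal sketch, assuming a virtual machine named VM01 (the name is illustrative):
# Remove any emulated (legacy) network adapters, keeping the Hyper-V-specific adapter
Get-VMNetworkAdapter -VMName "VM01" | Where-Object { $_.IsLegacy } | Remove-VMNetworkAdapter
# Remove the virtual DVD drive if it is not needed
Get-VMDvdDrive -VMName "VM01" | Remove-VMDvdDrive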
Virtual NUMA
To enable virtualizing large scale-up workloads, Hyper-V in Windows Server 2016 expanded virtual machine scale
limits. A single virtual machine can be assigned up to 240 virtual processors and 12 TB of memory. When creating
such large virtual machines, memory from multiple NUMA nodes on the host system will likely be utilized. In such a
virtual machine configuration, if virtual processors and memory are not allocated from the same NUMA node,
workloads may perform poorly because they cannot take advantage of NUMA optimizations.
In Windows Server 2016, Hyper-V presents a virtual NUMA topology to virtual machines. By default, this virtual
NUMA topology is optimized to match the NUMA topology of the underlying host computer. Exposing a virtual
NUMA topology into a virtual machine allows the guest operating system and any NUMA-aware applications
running within it to take advantage of the NUMA performance optimizations, just as they would when running on
a physical computer.
There is no distinction between a virtual and a physical NUMA topology from the workload's perspective. Inside a
virtual machine, when a workload allocates local memory for data and accesses that data in the same NUMA node,
the result is fast local memory access on the underlying physical system. Performance penalties due to remote
memory access are avoided. Only NUMA-aware applications can benefit from vNUMA.
Microsoft SQL Server is an example of a NUMA-aware application. For more info, see Understanding Non-uniform
Memory Access.
Virtual NUMA and Dynamic Memory features cannot be used at the same time. A virtual machine that has
Dynamic Memory enabled effectively has only one virtual NUMA node, and no NUMA topology is presented to
the virtual machine regardless of the virtual NUMA settings.
For more info on Virtual NUMA, see Hyper-V Virtual NUMA Overview.
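To inspect how the host NUMA topology maps to a virtual machine's processor settings, a Windows PowerShell sketch such as the following can be used; the virtual machine name is illustrative:
# Show the NUMA topology of the host
Get-VMHostNumaNode
# Show the virtual NUMA limits that apply to a virtual machine (name is illustrative)
Get-VMProcessor -VMName "VM01" | Format-List MaximumCountPerNumaNode, MaximumCountPerNumaSocket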
See also
Hyper-V terminology
Hyper-V architecture
Hyper-V server configuration
Hyper-V memory performance
Hyper-V storage I/O performance
Hyper-V network I/O performance
Detecting bottlenecks in a virtualized environment
Linux Virtual Machines
Hyper-V Memory Performance
The hypervisor virtualizes the guest physical memory to isolate virtual machines from each other and to provide a
contiguous, zero-based memory space for each guest operating system, just as on non-virtualized systems.
The following performance counters and thresholds can help verify that memory is sized appropriately:
Memory\Standby Cache Reserve Bytes: The sum of Standby Cache Reserve Bytes and Free and Zero Page List Bytes should be 200 MB or more on systems with 1 GB of visible RAM, and 300 MB or more on systems with 2 GB or more of visible RAM.
Memory\Free & Zero Page List Bytes: The sum of Standby Cache Reserve Bytes and Free and Zero Page List Bytes should be 200 MB or more on systems with 1 GB of visible RAM, and 300 MB or more on systems with 2 GB or more of visible RAM.
Memory\Pages Input/Sec: The average over a 1-hour period should be less than 10.
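As a quick way to sample these counters, a Get-Counter sketch such as the following can be used; the sample interval and count are illustrative:
Get-Counter -Counter "\Memory\Standby Cache Reserve Bytes", "\Memory\Free & Zero Page List Bytes", "\Memory\Pages Input/sec" -SampleInterval 60 -MaxSamples 60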
See also
Hyper-V terminology
Hyper-V architecture
Hyper-V server configuration
Hyper-V processor performance
Hyper-V storage I/O performance
Hyper-V network I/O performance
Detecting bottlenecks in a virtualized environment
Linux Virtual Machines
Hyper-V Storage I/O Performance
This section describes the different options and considerations for tuning storage I/O performance in a virtual
machine. The storage I/O path extends from the guest storage stack, through the host virtualization layer, to the
host storage stack, and then to the physical disk. Following are explanations about how optimizations are possible
at each of these stages.
Virtual controllers
Hyper-V offers three types of virtual controllers: IDE, SCSI, and virtual host bus adapters (HBAs).
IDE
IDE controllers expose IDE disks to the virtual machine. The IDE controller is emulated, and it is the only controller
that is available for guest VMs running older versions of Windows without the Virtual Machine Integration
Services. Disk I/O performance when using the IDE filter driver that is provided with the Virtual Machine
Integration Services is significantly better than the disk I/O performance of the emulated IDE controller. We
recommend that IDE disks be used only for the operating system disks because they have performance limitations
due to the maximum I/O size that can be issued to these devices.
Virtual disks
Disks can be exposed to the virtual machines through the virtual controllers. These disks could be virtual hard
disks that are file abstractions of a disk or a pass-through disk on the host.
VHD format
The VHD format was the only virtual hard disk format that was supported by Hyper-V in past releases. Introduced
in Windows Server 2012, the VHD format has been modified to allow better alignment, which results in
significantly better performance on new large sector disks.
Any new VHD that is created on Windows Server 2012 or newer has the optimal 4 KB alignment. This aligned
format is completely compatible with previous Windows Server operating systems. However, the alignment
property will be broken for new allocations from parsers that are not 4 KB alignment-aware (such as a VHD
parser from a previous version of Windows Server or a non-Microsoft parser).
Any VHD that is moved from a previous release does not automatically get converted to this new improved VHD
format.
To convert to the new VHD format, run the following Windows PowerShell command:
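A minimal example using the Convert-VHD cmdlet; the source and destination paths shown here match the sample output below and are illustrative:
Convert-VHD -Path E:\vms\testvhd\test.vhd -DestinationPath E:\vms\testvhd\test-converted.vhd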
You can check the alignment property for all the VHDs on the system; it should be converted to the optimal 4
KB alignment. You create a new VHD with the data from the original VHD by using the Create-from-Source
option.
To check the alignment by using Windows PowerShell, run Get-VHD against the original file and examine the
Alignment line, as shown below:
Path : E:\vms\testvhd\test.vhd
VhdFormat : VHD
VhdType : Dynamic
FileSize : 69245440
Size : 10737418240
MinimumSize : 10735321088
LogicalSectorSize : 512
PhysicalSectorSize : 512
BlockSize : 2097152
ParentPath :
FragmentationPercentage : 10
Alignment : 0
Attached : False
DiskNumber :
IsDeleted : False
Number :
To verify alignment by using Windows PowerShell, examine the Alignment line, as shown below:
Get-VHD -Path E:\vms\testvhd\test-converted.vhd
Path : E:\vms\testvhd\test-converted.vhd
VhdFormat : VHD
VhdType : Dynamic
FileSize : 69369856
Size : 10737418240
MinimumSize : 10735321088
LogicalSectorSize : 512
PhysicalSectorSize : 512
BlockSize : 2097152
ParentPath :
FragmentationPercentage : 0
Alignment : 1
Attached : False
DiskNumber :
IsDeleted : False
Number :
VHDX format
VHDX is a new virtual hard disk format introduced in Windows Server 2012, which allows you to create resilient
high-performance virtual disks up to 64 terabytes. Benefits of this format include:
Support for virtual hard disk storage capacity of up to 64 terabytes.
Protection against data corruption during power failures by logging updates to the VHDX metadata
structures.
Ability to store custom metadata about a file, which a user might want to record, such as operating system
version or patches applied.
The VHDX format also provides the following performance benefits:
Improved alignment of the virtual hard disk format to work well on large sector disks.
Larger block sizes for dynamic and differential disks, which allows these disks to attune to the needs of the
workload.
4 KB logical sector virtual disk that allows for increased performance when used by applications and
workloads that are designed for 4 KB sectors.
Efficiency in representing data, which results in smaller file size and allows the underlying physical storage
device to reclaim unused space. (Trim requires pass-through or SCSI disks and trim-compatible hardware.)
When you upgrade to Windows Server 2016, we recommend that you convert all VHD files to the VHDX format
due to these benefits. The only scenario where it would make sense to keep the files in the VHD format is when a
virtual machine has the potential to be moved to a previous release of Hyper-V that does not support the VHDX
format.
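As a sketch of such a conversion, Convert-VHD infers the destination format from the file name extension; the paths are illustrative:
Convert-VHD -Path D:\VMs\disk.vhd -DestinationPath D:\VMs\disk.vhdx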
Each 4 KB write command that is issued by the current parser to update the payload data results in two reads for
two blocks on the disk, which are then updated and subsequently written back to the two disk blocks. Hyper-V in
Windows Server 2016 mitigates some of the performance effects on 512e disks on the VHD stack by preparing
the previously mentioned structures for alignment to 4 KB boundaries in the VHD format. This avoids the
read-modify-write (RMW) effect when accessing the data within the virtual hard disk file and when updating the
virtual hard disk metadata
structures.
As mentioned earlier, VHDs that are copied from previous versions of Windows Server will not automatically be
aligned to 4 KB. You can manually convert them to optimally align by using the Copy from Source disk option
that is available in the VHD interfaces.
By default, VHDs are exposed with a physical sector size of 512 bytes. This is done to ensure that physical sector
size dependent applications are not impacted when the application and VHDs are moved from a previous version
of Windows Server.
By default, disks with the VHDX format are created with a 4 KB physical sector size to optimize their
performance profile on regular disks and on large sector disks. To make full use of 4 KB sectors, it's recommended
to use the VHDX format.
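For illustration, a new VHDX with 4 KB logical sectors can be created with New-VHD; the path, size, and sector size shown are assumptions for this example:
New-VHD -Path D:\VMs\data.vhdx -SizeBytes 100GB -Dynamic -LogicalSectorSizeBytes 4096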
Pass-through disks
The VHD in a virtual machine can be mapped directly to a physical disk or logical unit number (LUN), instead of to
a VHD file. The benefit is that this configuration bypasses the NTFS file system in the root partition, which reduces
the CPU usage of storage I/O. The risk is that physical disks or LUNs can be more difficult to move between
machines than VHD files.
Pass-through disks should be avoided due to the limitations introduced with virtual machine migration scenarios.
Hyper-V Network I/O Performance
Windows Server 2016 contains several improvements and new functionality to optimize network performance
under Hyper-V. Documentation on how to optimize network performance will be included in a future version of
this article.
Live Migration
Live Migration lets you transparently move running virtual machines from one node of a failover cluster to
another node in the same cluster without a dropped network connection or perceived downtime.
Note
Failover Clustering requires shared storage for the cluster nodes.
The process of moving a running virtual machine can be divided into two major phases. The first phase copies the
memory of the virtual machine from the current host to the new host. The second phase transfers the virtual
machine state from the current host to the new host. The duration of both phases is largely determined by the
speed at which data can be transferred from the current host to the new host.
Providing a dedicated network for live migration traffic helps minimize the time that is required to complete a live
migration, and it ensures consistent migration times.
Additionally, increasing the number of send and receive buffers on each network adapter that is involved in the
migration can improve migration performance.
Windows Server 2012 R2 introduced options to speed up Live Migration by compressing memory before
transferring it over the network, or by using Remote Direct Memory Access (RDMA) if your hardware supports it.
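A sketch of selecting these options on a host with Windows PowerShell; the subnet used for the dedicated migration network is illustrative:
# Prefer the SMB transport (which can use RDMA), or specify Compression instead
Set-VMHost -VirtualMachineMigrationPerformanceOption SMB
# Dedicate a network to live migration traffic (subnet is illustrative)
Add-VMMigrationNetwork -Subnet 192.168.10.0/24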
See also
Hyper-V terminology
Hyper-V architecture
Hyper-V server configuration
Hyper-V processor performance
Hyper-V memory performance
Hyper-V storage I/O performance
Detecting bottlenecks in a virtualized environment
Linux Virtual Machines
Detecting bottlenecks in a virtualized environment
This section gives you hints about what to monitor by using Performance Monitor, and how to identify where the
problem might be when either the host or some of the virtual machines do not perform as expected.
Processor bottlenecks
Here are some common scenarios that could cause processor bottlenecks:
One or more logical processors are loaded
One or more virtual processors are loaded
You can use the following performance counters from the host:
Logical Processor Utilization - \Hyper-V Hypervisor Logical Processor(*)\% Total Run Time
Virtual Processor Utilization - \Hyper-V Hypervisor Virtual Processor(*)\% Total Run Time
Root Virtual Processor Utilization - \Hyper-V Hypervisor Root Virtual Processor(*)\% Total Run Time
If the Hyper-V Hypervisor Logical Processor(_Total)\% Total Runtime counter is over 90%, the host is
overloaded. You should add more processing power or move some virtual machines to a different host.
If the Hyper-V Hypervisor Virtual Processor(VM Name:VP x)\% Total Runtime counter is over 90% for all
virtual processors, you should do the following:
Verify that the host is not overloaded
Find out if the workload can leverage more virtual processors
Assign more virtual processors to the virtual machine
If Hyper-V Hypervisor Virtual Processor(VM Name:VP x)\% Total Runtime counter is over 90% for some,
but not all, of the virtual processors, you should do the following:
If your workload is receive network-intensive, you should consider using vRSS.
If the virtual machines are not running Windows Server 2012 R2, you should add more network adapters.
If your workload is storage-intensive, you should enable virtual NUMA and add more virtual disks.
If the Hyper-V Hypervisor Root Virtual Processor (Root VP x)\% Total Runtime counter is over 90% for
some, but not all, virtual processors, and the Processor (x)\% Interrupt Time and Processor (x)\% DPC Time
counters approximately add up to the value of the Root Virtual Processor (Root VP x)\% Total Runtime
counter, you should ensure that VMQ is enabled on the network adapters.
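A minimal sketch for checking and enabling VMQ on the host's physical network adapters; the adapter name is illustrative:
# Check the current VMQ state on all adapters
Get-NetAdapterVmq
# Enable VMQ on a specific adapter (name is illustrative)
Enable-NetAdapterVmq -Name "Ethernet 2"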
Memory bottlenecks
Here are some common scenarios that could cause memory bottlenecks:
The host is not responsive.
Virtual machines cannot be started.
Virtual machines run out of memory.
You can use the following performance counters from the host:
Memory\Available Mbytes
Hyper-V Dynamic Memory Balancer (*)\Available Memory
You can use the following performance counters from the virtual machine:
Memory\Available Mbytes
If the Memory\Available Mbytes and Hyper-V Dynamic Memory Balancer (*)\Available Memory counters
are low on the host, you should stop non-essential services and migrate one or more virtual machines to another
host.
If the Memory\Available Mbytes counter is low in the virtual machine, you should assign more memory to the
virtual machine. If you are using Dynamic Memory, you should increase the maximum memory setting.
Network bottlenecks
Here are some common scenarios that could cause network bottlenecks:
The host is network bound.
The virtual machine is network bound.
You can use the following performance counters from the host:
Network Interface(network adapter name)\Bytes/sec
You can use the following performance counters from the virtual machine:
Hyper-V Virtual Network Adapter (virtual machine name <GUID>)\Bytes/sec
If the Physical NIC Bytes/sec counter is greater than or equal to 90% of capacity, you should add additional
network adapters, migrate virtual machines to another host, and configure Network QoS.
If the Hyper-V Virtual Network Adapter Bytes/sec counter is greater than or equal to 250 MBps, you should
add additional teamed network adapters in the virtual machine, enable vRSS, and use SR-IOV.
If your workloads can't meet their network latency requirements, enable SR-IOV to present physical network
adapter resources to the virtual machine.
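Hedged examples of the vRSS and SR-IOV settings mentioned above; the adapter and virtual machine names are illustrative, and SR-IOV also requires a virtual switch created with SR-IOV enabled:
# Inside the guest operating system: enable vRSS on the virtual adapter
Enable-NetAdapterRss -Name "Ethernet"
# On the host: give a VM network adapter an SR-IOV weight so it can use a virtual function
Set-VMNetworkAdapter -VMName "VM01" -IovWeight 100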
Storage bottlenecks
Here are some common scenarios that could cause storage bottlenecks:
The host and virtual machine operations are slow or time out.
The virtual machine is sluggish.
You can use the following performance counters from the host:
Physical Disk(disk letter)\Avg. disk sec/Read
Physical Disk(disk letter)\Avg. disk sec/Write
Physical Disk(disk letter)\Avg. disk read queue length
Physical Disk(disk letter)\Avg. disk write queue length
If latencies are consistently greater than 50 ms, you should do the following:
Spread virtual machines across additional storage
Consider purchasing faster storage
Consider Tiered Storage Spaces, which was introduced in Windows Server 2012 R2
Consider using Storage QoS, which was introduced in Windows Server 2012 R2
Use VHDX
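To watch the latency counters listed above from the host, a Get-Counter sketch such as the following can be used; the sample interval and count are illustrative:
Get-Counter -Counter "\PhysicalDisk(*)\Avg. Disk sec/Read", "\PhysicalDisk(*)\Avg. Disk sec/Write" -SampleInterval 30 -MaxSamples 20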
See also
Hyper-V terminology
Hyper-V architecture
Hyper-V server configuration
Hyper-V processor performance
Hyper-V memory performance
Hyper-V storage I/O performance
Hyper-V network I/O performance
Linux Virtual Machines
Linux Virtual Machine Considerations
Linux and BSD virtual machines have additional considerations compared to Windows virtual machines in Hyper-
V.
The first consideration is whether Integration Services are present or if the VM is running merely on emulated
hardware with no enlightenment. A table of Linux and BSD releases that have built-in or downloadable
Integration Services is available in Supported Linux and FreeBSD virtual machines for Hyper-V on Windows.
These pages have grids of the Hyper-V features available to Linux distribution releases, and notes on those
features where applicable.
Even when the guest is running Integration Services, it can be configured with legacy hardware which does not
exhibit the best performance. For example, configure and use a virtual Ethernet adapter for the guest instead of
using a legacy network adapter. With Windows Server 2016, advanced networking features like SR-IOV are
available as well.
Additional TCP tuning can be performed in the guest by increasing limits. For the best throughput, spread the
workload over multiple CPUs and keep the request queues deep, because virtualized workloads have higher
latency than "bare metal" ones.
Some example tuning parameters that have been useful in network benchmarks include:
net.core.netdev_max_backlog = 30000
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.ipv4.tcp_wmem = 4096 12582912 33554432
net.ipv4.tcp_rmem = 4096 12582912 33554432
net.ipv4.tcp_max_syn_backlog = 80960
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 10240 65535
net.ipv4.tcp_abort_on_overflow = 1
A useful tool for network microbenchmarks is ntttcp, which is available on both Linux and Windows. The Linux
version is open source and available from ntttcp-for-linux on github.com. The Windows version can be found in
the download center. When tuning workloads it is best to use as many streams as necessary to get the best
throughput. Using ntttcp to model traffic, the -P parameter sets the number of parallel connections used.
Linux Storage Performance
Some best practices, like the following, are listed in Best Practices for Running Linux on Hyper-V. The Linux
kernel has different I/O schedulers that reorder requests with different algorithms. NOOP is a first-in, first-out
queue that passes scheduling decisions to the hypervisor. It is recommended to use NOOP as the scheduler when
running a Linux virtual machine on Hyper-V. To change the scheduler for a specific device, in the boot loader's
configuration (/etc/grub.conf, for example), add elevator=noop to the kernel parameters, and then restart.
Similar to networking, Linux guest performance with storage benefits the most from multiple queues with
enough depth to keep the host busy. Microbenchmarking storage performance is probably best with the fio
benchmark tool with the libaio engine.
See also
Hyper-V terminology
Hyper-V architecture
Hyper-V server configuration
Hyper-V processor performance
Hyper-V memory performance
Hyper-V storage I/O performance
Hyper-V network I/O performance
Detecting bottlenecks in a virtualized environment
Performance tuning Windows Server containers
Introduction
Windows Server 2016 is the first version of Windows to ship support for container technology built in to the OS. In
Server 2016, two types of containers are available: Windows Server Containers and Hyper-V Containers. Each
container type supports either the Server Core or Nano Server SKU of Windows Server 2016.
These configurations have different performance implications which we detail below to help you understand which
is right for your scenarios. In addition, we detail performance impacting configurations, and describe the tradeoffs
with each of those options.
Windows Server Container and Hyper-V Containers
Windows Server Containers and Hyper-V Containers offer many of the same portability and consistency benefits
but differ in terms of their isolation guarantees and performance characteristics.
Windows Server Containers provide application isolation through process and namespace isolation technology.
A Windows Server container shares a kernel with the container host and all containers running on the host.
Hyper-V Containers expand on the isolation provided by Windows Server Containers by running each container
in a highly optimized virtual machine. In this configuration the kernel of the container host is not shared with the
Hyper-V Containers.
The additional isolation provided by Hyper-V containers is achieved in large part by a hypervisor layer of isolation
between the container and the container host. This affects container density as, unlike Windows Server Containers,
less sharing of system files and binaries can occur, resulting in an overall larger storage and memory footprint. In
addition, there is the expected additional overhead in some network, storage I/O, and CPU paths.
Nano Server and Server Core
Windows Server Containers and Hyper-V containers offer support for Server Core and for a new installation option
available in Windows Server 2016: Nano Server.
Nano Server is a remotely administered server operating system optimized for private clouds and datacenters. It is
similar to Windows Server in Server Core mode, but significantly smaller, has no local logon capability, and only
supports 64-bit applications, tools, and agents. It takes up far less disk space and starts faster.
Storage
Mounted Data Volumes
Containers offer the ability to use the container host system drive for the container scratch space. However, the
container scratch space has a life span equal to that of the container. That is, when the container is stopped, the
scratch space and all associated data goes away.
However, there are many scenarios in which having data persist independent of container lifetime is desired. In
these cases, we support mounting data volumes from the container host into the container. For Windows Server
Containers, there is negligible I/O path overhead associated with mounted data volumes (near native performance).
However, when mounting data volumes into Hyper-V containers, there is some I/O performance degradation in that
path. In addition, this impact is exaggerated when running Hyper-V containers inside of virtual machines.
Scratch Space
Both Windows Server Containers and Hyper-V containers provide a 20 GB dynamic VHD for the container scratch
space by default. For both container types, the container OS takes up a portion of that space, and this is true for
every container started. Thus, it is important to remember that every container started has some storage impact,
and depending on the workload it can write up to 20 GB to the backing storage media. Server storage
configurations should be designed with this in mind.
Networking
Windows Server Containers and Hyper-V containers offer a variety of networking modes to best suit the needs of
differing networking configurations. Each of these options present their own performance characteristics.
Windows Network Address Translation (WinNAT)
Each container will receive an IP address from an internal, private IP prefix (e.g. 172.16.0.0/12). Port forwarding /
mapping from the container host to container endpoints is supported. Docker creates a NAT network by default
when dockerd first runs.
Of these three modes, the NAT configuration is the most expensive network IO path, but has the least amount of
configuration needed.
Windows Server containers use a Host vNIC to attach to the virtual switch. Hyper-V Containers use a Synthetic VM
NIC (not exposed to the Utility VM) to attach to the virtual switch. When containers are communicating with the
external network, packets are routed through WinNAT with address translations applied, which incurs some
overhead.
Transparent
Each container endpoint is directly connected to the physical network. IP addresses from the physical network can
be assigned statically or dynamically using an external DHCP server.
Transparent mode is the least expensive in terms of the network IO path, and external packets are directly passed
through to the container virtual NIC giving direct access to the external network.
L2 Bridge
Each container endpoint will be in the same IP subnet as the container host. The IP addresses must be assigned
statically from the same prefix as the container host. All container endpoints on the host will have the same MAC
address due to Layer-2 address translation.
L2 Bridge Mode is more performant than WinNAT mode as it provides direct access to the external network, but
less performant than Transparent mode as it also introduces MAC address translation.
Performance Tuning Remote Desktop Session Hosts
This topic discusses how to select Remote Desktop Session Host (RD Session Host) hardware, tune the host, and
tune applications.
In this topic:
Selecting the proper hardware for performance
Tuning applications for Remote Desktop Session Host
Remote Desktop Session Host tuning parameters
If DLLs are relocated, it is impossible to share their code across sessions, which significantly increases
the footprint of a session. This is one of the most common memory-related performance issues on an
RD Session Host server.
For common language runtime (CLR) applications, use Native Image Generator (Ngen.exe) to increase page
sharing and reduce CPU overhead.
When possible, apply similar techniques to other similar execution engines.
Performance Tuning Remote Desktop Virtualization Hosts
Remote Desktop Virtualization Host (RD Virtualization Host) is a role service that supports Virtual Desktop
Infrastructure (VDI) scenarios and lets multiple concurrent users run Windows-based applications in virtual
machines that are hosted on a server running Windows Server 2016 and Hyper-V.
Windows Server 2016 supports two types of virtual desktops, personal virtual desktops and pooled virtual
desktops.
In this topic:
General considerations
Performance optimizations
General considerations
Storage
Storage is the most likely performance bottleneck, and it is important to size your storage to properly handle the
I/O load that is generated by virtual machine state changes. If a pilot or simulation is not feasible, a good guideline
is to provision one disk spindle for four active virtual machines. Use disk configurations that have good write
performance (such as RAID 1+0).
When appropriate, use Disk Deduplication and caching to reduce the disk read load and to enable your storage
solution to speed up performance by caching a significant portion of the image.
Data Deduplication and VDI
Introduced in Windows Server 2012 R2, Data Deduplication supports optimization of open files. In order to use
virtual machines running on a deduplicated volume, the virtual machine files need to be stored on a separate host
from the Hyper-V host. If Hyper-V and deduplication are running on the same machine, the two features will
contend for system resources and negatively impact overall performance.
The volume must also be configured to use the Virtual Desktop Infrastructure (VDI) deduplication optimization
type. You can configure this by using Server Manager (File and Storage Services -> Volumes -> Dedup
Settings) or by using the following Windows PowerShell command:
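A minimal example using the Enable-DedupVolume cmdlet; the volume letter is illustrative:
Enable-DedupVolume -Volume "D:" -UsageType HyperV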
Note
Data Deduplication optimization of open files is supported only for VDI scenarios with Hyper-V using remote
storage over SMB 3.0.
Memory
Server memory usage is driven by three main factors:
Operating system overhead
Hyper-V service overhead per virtual machine
Memory allocated to each virtual machine
For a typical knowledge worker workload, guest virtual machines running x86 Windows 8 or Windows 8.1 should be
given ~512 MB of memory as the baseline. However, Dynamic Memory will likely increase the guest virtual
machine's memory to about 800 MB, depending on the workload. For x64, we see about 800 MB starting,
increasing to 1024 MB.
Therefore, it is important to provide enough server memory to satisfy the memory that is required by the expected
number of guest virtual machines, plus allow a sufficient amount of memory for the server.
CPU
When you plan server capacity for an RD Virtualization Host server, the number of virtual machines per physical
core will depend on the nature of the workload. As a starting point, it is reasonable to plan 12 virtual machines per
physical core, and then run the appropriate scenarios to validate performance and density. Higher density may be
achievable depending on the specifics of the workload.
We recommend enabling hyper-threading, but be sure to calculate the oversubscription ratio based on the number
of physical cores and not the number of logical processors. This ensures the expected level of performance on a per
CPU basis.
Virtual GPU
Microsoft RemoteFX for RD Virtualization Host delivers a rich graphics experience for Virtual Desktop Infrastructure
(VDI) through host-side remoting, a render-capture-encode pipeline, a highly efficient GPU-based encode, throttling
based on client activity, and a DirectX-enabled virtual GPU. RemoteFX for RD Virtualization Host upgrades the
virtual GPU from DirectX9 to DirectX11. It also improves the user experience by supporting more monitors at
higher resolutions.
The RemoteFX DirectX11 experience is available without a hardware GPU, through a software-emulated driver.
Although this software GPU provides a good experience, the RemoteFX virtual graphics processing unit (VGPU)
adds a hardware accelerated experience to virtual desktops.
To take advantage of the RemoteFX VGPU experience on a server running Windows Server 2016, you need a GPU
driver (such as DirectX11.1 or WDDM 1.2) on the host server. For more information about GPU offerings to use
with RemoteFX for RD Virtualization Host, contact your GPU provider.
If you use the RemoteFX virtual GPU in your VDI deployment, the deployment capacity will vary based on usage
scenarios and hardware configuration. When you plan your deployment, consider the following:
Number of GPUs on your system
Video memory capacity on the GPUs
Processor and hardware resources on your system
RemoteFX server system memory
For every virtual desktop enabled with a virtual GPU, RemoteFX uses system memory in the guest operating system
and in the RemoteFX-enabled server. The hypervisor guarantees the availability of system memory for a guest
operating system. On the server, each virtual GPU-enabled virtual desktop needs to advertise its system memory
requirement to the hypervisor. When the virtual GPU-enabled virtual desktop is starting, the hypervisor reserves
additional system memory in the RemoteFX-enabled server for the VGPU-enabled virtual desktop.
The memory requirement for the RemoteFX-enabled server is dynamic because the amount of memory consumed
on the RemoteFX-enabled server is dependent on the number of monitors that are associated with the VGPU-
enabled virtual desktops and the maximum resolution for those monitors.
RemoteFX server GPU video memory
Every virtual GPU-enabled virtual desktop uses the video memory in the GPU hardware on the host server to
render the desktop. In addition to rendering, the video memory is used by a codec to compress the rendered screen.
The amount of memory needed is directly based on the number of monitors that are provisioned to the virtual
machine.
The video memory that is reserved varies based on the number of monitors and the system screen resolution.
Some users may require a higher screen resolution for specific tasks. There is greater scalability with lower
resolution settings if all other settings remain constant.
RemoteFX processor
The hypervisor schedules the RemoteFX-enabled server and the virtual GPU-enabled virtual desktops on the CPU.
Unlike the system memory, there isn't information that is related to additional resources that RemoteFX needs to
share with the hypervisor. The additional CPU overhead that RemoteFX brings into the virtual GPU-enabled virtual
desktop is related to running the virtual GPU driver and a user-mode Remote Desktop Protocol stack.
On the RemoteFX-enabled server, the overhead is increased, because the system runs an additional process
(rdvgm.exe) per virtual GPU-enabled virtual desktop. This process uses the graphics device driver to run commands
on the GPU. The codec also uses the CPUs for compressing the screen data that needs to be sent back to the client.
More virtual processors mean a better user experience. We recommend allocating at least two virtual CPUs per
virtual GPU-enabled virtual desktop. We also recommend using the x64 architecture for virtual GPU-enabled virtual
desktops because the performance on x64 virtual machines is better compared to x86 virtual machines.
RemoteFX GPU processing power
For every virtual GPU-enabled virtual desktop, there is a corresponding DirectX process running on the RemoteFX-
enabled server. This process replays all the graphics commands that it receives from the RemoteFX virtual desktop
onto the physical GPU. For the physical GPU, it is equivalent to simultaneously running multiple DirectX
applications.
Typically, graphics devices and drivers are tuned to run a few applications on the desktop. RemoteFX stretches the
GPUs to be used in a unique manner. To measure how the GPU is performing on a RemoteFX server, performance
counters have been added to measure the GPU response to RemoteFX requests.
Usually when a GPU is low on resources, read and write operations to the GPU take a long time to
complete. By using performance counters, administrators can take preventative action, eliminating the possibility of
any downtime for their end users.
The following performance counters are available on the RemoteFX server to measure the virtual GPU performance:
RemoteFX graphics
Frames Skipped/Second - Insufficient Client Resources Number of frames skipped per second due to
insufficient client resources
Graphics Compression Ratio Ratio of the number of bytes encoded to the number of bytes input
RemoteFX root GPU management
Resources: TDRs in Server GPUs Total number of times that the TDR times out in the GPU on the server
Resources: Virtual machines running RemoteFX Total number of virtual machines that have the
RemoteFX 3D Video Adapter installed
VRAM: Available MB per GPU Amount of dedicated video memory that is not being used
VRAM: Reserved % per GPU Percent of dedicated video memory that has been reserved for RemoteFX
RemoteFX software
Capture Rate for monitor [1-4] Displays the RemoteFX capture rate for monitors 1-4
Compression Ratio Deprecated in Windows 8 and replaced by Graphics Compression Ratio
Delayed Frames/sec Number of frames per second where graphics data was not sent within a certain
amount of time
GPU response time from Capture Latency measured within RemoteFX Capture (in microseconds) for GPU
operations to complete
GPU response time from Render Latency measured within RemoteFX Render (in microseconds) for GPU
operations to complete
Output Bytes Total number of RemoteFX output bytes
Waiting for client count/sec Deprecated in Windows 8 and replaced by Frames Skipped/Second -
Insufficient Client Resources
RemoteFX vGPU management
Resources: TDRs local to virtual machines Total number of TDRs that have occurred in this virtual
machine (TDRs that the server propagated to the virtual machines are not included)
Resources: TDRs propagated by Server Total number of TDRs that occurred on the server and that have
been propagated to the virtual machine
RemoteFX virtual machine vGPU performance
Data: Invoked presents/sec Total number (in seconds) of present operations to be rendered to the desktop
of the virtual machine per second
Data: Outgoing presents/sec Total number of present operations sent by the virtual machine to the server
GPU per second
Data: Read bytes/sec Total number of read bytes from the RemoteFX-enabled server per second
Data: Send bytes/sec Total number of bytes sent to the RemoteFX-enabled server GPU per second
DMA: Communication buffers average latency (sec) Average amount of time (in seconds) spent in the
communication buffers
DMA: DMA buffer latency (sec) Amount of time (in seconds) from when the DMA is submitted until
completed
DMA: Queue length DMA Queue length for a RemoteFX 3D Video Adapter
Resources: TDR timeouts per GPU Count of TDR timeouts that have occurred per GPU on the virtual
machine
Resources: TDR timeouts per GPU engine Count of TDR timeouts that have occurred per GPU engine on
the virtual machine
In addition to the RemoteFX virtual GPU performance counters, you can also measure the GPU utilization by using
Process Explorer, which shows video memory usage and the GPU utilization.
Performance optimizations
Dynamic Memory
Dynamic Memory enables more efficient utilization of the memory resources of the server running Hyper-V by
balancing how memory is distributed between running virtual machines. Memory can be dynamically reallocated
between virtual machines in response to their changing workloads.
Dynamic Memory enables you to increase virtual machine density with the resources you already have without
sacrificing performance or scalability. The result is more efficient use of expensive server hardware resources, which
can translate into easier management and lower costs.
On guest operating systems running Windows 8 and above with virtual processors that span multiple logical
processors, consider the tradeoff between running with Dynamic Memory to help minimize memory usage and
disabling Dynamic Memory to improve the performance of an application that is computer-topology aware. Such
an application can leverage the topology information to make scheduling and memory allocation decisions.
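For example, Dynamic Memory can be enabled or disabled per virtual machine with Set-VMMemory; the virtual machine name and memory sizes below are illustrative:
# Enable Dynamic Memory with illustrative minimum, startup, and maximum values
Set-VMMemory -VMName "Desktop01" -DynamicMemoryEnabled $true -MinimumBytes 512MB -StartupBytes 1GB -MaximumBytes 4GB
# Disable Dynamic Memory for a topology-aware workload
Set-VMMemory -VMName "Desktop01" -DynamicMemoryEnabled $false -StartupBytes 4GB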
Tiered Storage
RD Virtualization Host supports tiered storage for virtual desktop pools. The physical computer that is shared by all
pooled virtual desktops within a collection can use a small-size, high-performance storage solution, such as a
mirrored solid-state drive (SSD). The pooled virtual desktops can be placed on less expensive, traditional storage
such as RAID 1+0.
The physical computer should be placed on an SSD because most of the read I/Os from pooled virtual desktops go
to the management operating system. Therefore, the storage that is used by the physical computer must sustain
much higher read I/Os per second.
This deployment configuration assures cost effective performance where performance is needed. The SSD provides
higher performance on a smaller size disk (~20 GB per collection, depending on the configuration). Traditional
storage for pooled virtual desktops (RAID 1+0) uses about 3 GB per virtual machine.
CSV cache
Failover Clustering in Windows Server 2012 and above provides caching on Cluster Shared Volumes (CSV). This is
extremely beneficial for pooled virtual desktop collections where the majority of the read I/Os come from the
management operating system. The CSV cache provides higher performance by several orders of magnitude
because it caches blocks that are read more than once and delivers them from system memory, which reduces the
I/O. For more info on CSV cache, see How to Enable CSV Cache.
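A sketch of enabling the CSV cache through the cluster's BlockCacheSize property; the 512 MB size is illustrative:
# Reserve 512 MB of system memory per node for the CSV read cache
(Get-Cluster).BlockCacheSize = 512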
Pooled virtual desktops
By default, pooled virtual desktops are rolled back to the pristine state after a user signs out, so any changes made
to the Windows operating system since the last user sign-in are abandoned.
Although it's possible to disable the rollback, it is still a temporary condition because typically a pooled virtual
desktop collection is re-created due to various updates to the virtual desktop template.
It makes sense to turn off Windows features and services that depend on persistent state. Additionally, it makes
sense to turn off services that are primarily for non-enterprise scenarios.
Each specific service should be evaluated appropriately prior to any broad deployment. The following are some
initial things to consider:
Auto update: Pooled virtual desktops are updated by re-creating the virtual desktop template.
Offline files: Virtual desktops are always online and connected from a networking point of view.
Background defrag: File-system changes are discarded after a user signs off (due to a rollback to the pristine state or re-creation of the virtual desktop template, which results in re-creating all pooled virtual desktops).
Bug check memory dump: There is no such concept for pooled virtual desktops. A bug-checked pooled virtual desktop starts from the pristine state.
Note
This list is not meant to be a complete list, because any changes will affect the intended goals and scenarios. For
more info, see Hot off the presses, get it now, the Windows 8 VDI optimization script, courtesy of PFE!.
Note
SuperFetch in Windows 8 is enabled by default. It is VDI-aware and should not be disabled. SuperFetch can further
reduce memory consumption through memory page sharing, which is beneficial for VDI. For pooled virtual desktops
running Windows 7, SuperFetch should be disabled, but for personal virtual desktops running Windows 7, it should
be left on.
Performance Tuning Remote Desktop Gateways
Note
In Windows 8+ and Windows Server 2012 R2+, Remote Desktop Gateway (RD Gateway) supports TCP, UDP, and
the legacy RPC transports. Most of the following data is regarding the legacy RPC transport. If the legacy RPC
transport is not being used, this section is not applicable.
This topic describes the performance-related parameters that help improve the performance of a customer
deployment and the tunings that rely on the customer's network usage patterns.
At its core, RD Gateway performs many packet forwarding operations between Remote Desktop Connection
instances and the RD Session Host server instances within the customer's network.
Note
The following parameters apply to RPC transport only.
Internet Information Services (IIS) and RD Gateway export the following registry parameters to help improve
system performance in the RD Gateway.
Thread tunings
Maxiothreads
This app-specific thread pool specifies the number of threads that RD Gateway creates to handle incoming
requests. If this registry setting is present, it takes effect. The number of threads equals the number of logical
processors. If the number of logical processors is less than 5, the default is 5 threads.
MaxPoolThreads
HKLM\System\CurrentControlSet\Services\InetInfo\Parameters\MaxPoolThreads (REG_DWORD)
This parameter specifies the number of IIS pool threads to create per logical processor. The IIS pool threads
watch the network for requests and process all incoming requests. The MaxPoolThreads count does not
include threads that RD Gateway consumes. The default value is 4.
Remote procedure call tunings for RD Gateway
The following parameters can help tune the remote procedure calls (RPC) that are received by Remote Desktop
Connection and RD Gateway computers. Changing the windows helps throttle how much data is flowing through
each connection and can improve performance for RPC over HTTP v2 scenarios.
ServerReceiveWindow
HKLM\Software\Microsoft\Rpc\ServerReceiveWindow (REG_DWORD)
The default value is 64 KB. This value specifies the window that the server uses for data that is received from
the RPC proxy. The minimum value is set to 8 KB, and the maximum value is set at 1 GB. If a value is not
present, the default value is used. When changes are made to this value, IIS must be restarted for the change
to take effect.
ClientReceiveWindow
HKLM\Software\Microsoft\Rpc\ClientReceiveWindow (REG_DWORD)
The default value is 64 KB. This value specifies the window that the client uses for data that is received from
the RPC proxy. The minimum value is 8 KB, and the maximum value is 1 GB. If a value is not present, the
default value is used.
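A sketch of setting one of these registry values with Windows PowerShell; the 256 KB window size is illustrative, and IIS must be restarted afterward as noted above:
# Set the server-side RPC receive window to 256 KB (value is illustrative)
New-ItemProperty -Path "HKLM:\Software\Microsoft\Rpc" -Name ServerReceiveWindow -PropertyType DWord -Value 262144 -Force
# Restart IIS so the change takes effect
iisreset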
Performance Tuning Web Servers
This topic describes performance tuning methods and recommendations for Windows Server 2016 web servers.
WARNING
Some applications, such as incremental backup utilities, rely on this update information, and they do not function correctly
without it.
See also
IIS 10.0 performance tuning
HTTP 1.1/2 tuning
Tuning IIS 10.0
Internet Information Services (IIS) 10.0 is included with Windows Server 2016. It uses a process model similar to
that of IIS 8.5 and IIS 7.0. A kernel-mode web driver (http.sys) receives and routes HTTP requests, and satisfies
requests from its response cache. Worker processes register for URL subspaces, and http.sys routes the request to
the appropriate process (or set of processes for application pools).
HTTP.sys is responsible for connection management and request handling. The request can be served from the
HTTP.sys cache or passed to a worker process for further handling. Multiple worker processes can be configured,
which provides isolation at a reduced cost.
HTTP.sys includes a response cache. When a request matches an entry in the response cache, HTTP.sys sends the
cache response directly from kernel mode. Some web application platforms, such as ASP.NET, provide mechanisms
to enable any dynamic content to be cached in the kernel-mode cache. The static file handler in IIS 10.0
automatically caches frequently requested files in http.sys.
Because a web server has kernel-mode and user-mode components, both components must be tuned for optimal
performance. Therefore, tuning IIS 10.0 for a specific workload includes configuring the following:
HTTP.sys and the associated kernel-mode cache
Worker processes and user-mode IIS, including the application pool configuration
Certain tuning parameters that affect performance
The following sections discuss how to configure the kernel-mode and user-mode aspects of IIS 10.0.
Kernel-mode settings
Performance-related HTTP.sys settings fall into two broad categories: cache management and connection and
request management. All registry settings are stored under the following registry entry:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Http\Parameters
Note
If the HTTP service is already running, you must restart it for the changes to take effect.
MaxConnections
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Http\Parameters\MaxConnections
IdleConnectionsHighMark
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Http\Parameters\IdleConnectionsHighMark
IdleConnectionsLowMark
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Http\Parameters\IdleConnectionsLowMark
IdleListTrimmerPeriod
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Http\Parameters\IdleListTrimmerPeriod
RequestBufferLookasideDepth
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Http\Parameters\RequestBufferLookasideDepth
InternalRequestLookasideDepth
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Http\Parameters\InternalRequestLookasideDepth
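As an illustration only, a value under this key can be changed with Windows PowerShell and the HTTP service restarted afterward; the value name is taken from the list above, but the data shown is a placeholder rather than a recommendation:
# Placeholder data; review the parameter's meaning and defaults before changing it
Set-ItemProperty -Path "HKLM:\System\CurrentControlSet\Services\Http\Parameters" -Name IdleConnectionsHighMark -Value 256 -Type DWord
# Restart the HTTP service so the change takes effect
net stop http /y
net start http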
User-mode settings
The settings in this section affect the IIS 10.0 worker process behavior. Most of these settings can be found in the
following XML configuration file:
%SystemRoot%\system32\inetsrv\config\applicationHost.config
Use Appcmd.exe, the IIS 10.0 Management Console, or the WebAdministration or IISAdministration PowerShell
cmdlets to change them. Most settings are automatically detected, and they do not require a restart of the IIS 10.0
worker processes or web application server. For more info about the applicationHost.config file, see Introduction to
ApplicationHost.config.
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\InetInfo\Parameters\ThreadPoolUseIdealCpu
With this feature enabled, IIS thread manager makes its best effort to evenly distribute IIS thread pool threads
across all CPUs in all NUMA nodes based on their current loads. In general, it is recommended to keep this default
setting unchanged for NUMA hardware.
Note
The ideal CPU setting is different from the worker process NUMA node assignment settings
(numaNodeAssignment and numaNodeAffinityMode) introduced in CPU Settings for an Application Pool. The ideal
CPU setting affects how IIS distributes its thread pool threads, while the worker process NUMA node assignment
settings determine on which NUMA node a worker process starts.
staticCompressionEnableCpuUsage, staticCompressionDisableCpuUsage, dynamicCompressionEnableCpuUsage,
dynamicCompressionDisableCpuUsage: Enables or disables compression if the current percentage CPU usage goes
above or below the specified limits. The default values are 50, 100, 50, and 90, respectively.
system.webServer/urlCompression
Note
For servers running IIS 10.0 that have low average CPU usage, consider enabling compression for dynamic content,
especially if responses are large. This should first be done in a test environment to assess the effect on the CPU
usage from the baseline.
Tuning the default document list
The default document module handles HTTP requests for the root of a directory and translates them into requests
for a specific file, such as Default.htm or Index.htm. On average, around 25 percent of all requests on the Internet
go through the default document path. This varies significantly for individual sites. When an HTTP request does not
specify a file name, the default document module searches the list of allowed default documents for each name in
the file system. This can adversely affect performance, especially if reaching the content requires making a network
round trip or touching a disk.
You can avoid the overhead by selectively disabling default documents and by reducing or ordering the list of
documents. For websites that use a default document, you should reduce the list to only the default document
types that are used. Additionally, order the list so that it begins with the most frequently accessed default document
file name.
You can selectively set the default document behavior on particular URLs by customizing the configuration inside a
location tag in applicationHost.config or by inserting a web.config file directly in the content directory. This allows a
hybrid approach, which enables default documents only where they are necessary and sets the list to the correct
file name for each URL.
To disable default documents completely, remove DefaultDocumentModule from the list of modules in the
system.webServer/globalModules section in applicationHost.config.
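A sketch using the WebAdministration module to trim the default document list for a site to a single, most-used file name; the site name and file name are illustrative:
Import-Module WebAdministration
# Clear the inherited default document list for the site, then add back only the file that is used
Clear-WebConfiguration -Filter "system.webServer/defaultDocument/files" -PSPath "IIS:\Sites\Default Web Site"
Add-WebConfigurationProperty -Filter "system.webServer/defaultDocument/files" -PSPath "IIS:\Sites\Default Web Site" -Name "." -Value @{value="Default.aspx"}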
system.webServer/defaultDocument
<files> element: Specifies the file names that are configured as default documents. The default list is Default.htm,
Default.asp, Index.htm, Index.html, Iisstart.htm, and Default.aspx.
system.applicationHost/log/centralBinaryLogFile
system.applicationHost/sites/VirtualDirectoryDefault
system.webServer/asp/limits
system.webServer/asp/comPlus
<system.web>
  <applicationPool maxConcurrentRequestsPerCPU="5000"/>
</system.web>
<system.web>
  <applicationPool percentCpuLimit="90" percentCpuLimitMinActiveRequestPerCpu="100"/>
</system.web>
percentCpuLimit Default value: 90 Asynchronous requests have some scalability issues when a huge load
(beyond the hardware capabilities) is put on such a scenario. The problem is due to the nature of allocation in
asynchronous scenarios. In these conditions, allocation happens when the asynchronous operation starts, and it
is consumed when it completes. By that time, it's very possible the objects have been moved to generation 1 or 2
by the GC. When this happens, increasing the load will show an increase in requests per second (rps) up to a
point. Once we pass that point, the time spent in GC will start to become a problem and the rps will start to dip,
having a negative scaling effect. To fix the problem, when the CPU usage exceeds the percentCpuLimit setting,
requests are sent to the ASP.NET native queue.
percentCpuLimitMinActiveRequestPerCpu Default value: 100 CPU throttling (the percentCpuLimit setting) is not
based on the number of requests but on how expensive they are. As a result, there could be just a few CPU-
intensive requests causing a backup in the native queue with no way to empty it aside from incoming requests.
To solve this problem, percentCpuLimitMinActiveRequestPerCpu can be used to ensure a minimum number of
requests are being served before throttling kicks in.
REQUEST URL REQUEST TIME DELTA
1 /SourceSilverLight/Geosource.web/grosource.html 10:01
5 /SourceSilverLight/GeosourceWebService/Service.asmx 10:23 0:11
6 /SourceSilverLight/Geosource.web/GeoSearchServer. 11:50 1:27
The hard part, though, is figuring out what setting makes sense to apply. In our case, the site gets a bunch of
requests from users, and the table above shows that a total of 4 unique sessions occurred in a period of 4 hours.
With the default settings for worker process suspension of the application pool, the site would be terminated after
the default timeout of 20 minutes, which means each of these users would experience the site spin-up cycle. This
makes it an ideal candidate for worker process suspension, because for most of the time, the site is idle, and so
suspending it would conserve resources, and allow the users to reach the site almost instantly.
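A sketch of configuring an application pool to suspend, rather than terminate, its worker process after the idle time-out, using the WebAdministration provider; the application pool name is illustrative:
Import-Module WebAdministration
# Page out (suspend) the worker process after the idle time-out instead of terminating it
Set-ItemProperty "IIS:\AppPools\MyAppPool" -Name processModel.idleTimeoutAction -Value Suspend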
A final, and very important, note about this is that disk performance is crucial for this feature. Because the suspension and wake-up process involve writing and reading large amounts of data to the hard drive, we strongly recommend using a fast disk for this. Solid State Drives (SSDs) are ideal and highly recommended, and you should make sure that the Windows page file is stored on one (if the operating system itself is not installed on the SSD, configure the operating system to move the page file to it).
Whether you use an SSD or not, we also recommend fixing the size of the page file so that it can accommodate the page-out data without being resized. Page-file resizing might happen when the operating system needs to store data in the page file, because by default, Windows is configured to automatically adjust the page file's size based on need. By setting the size to a fixed value, you can prevent resizing and significantly improve performance.
To configure a pre-fixed page file size, you need to calculate its ideal size, which depends on how many sites you will be suspending and how much memory they consume. If the average is 200 MB for an active worker process and you have 500 sites on the server that will be suspending, then the page file should be at least (200 * 500) MB over the base size of the page file (so base + 100 GB in our example).
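As an illustration of fixing the page-file size, the following hedged PowerShell sketch uses the standard Win32_ComputerSystem and Win32_PageFileSetting CIM classes to disable automatic management and set an initial and maximum size. The 102400 MB figure is simply the hypothetical base + 100 GB from the example above, and the drive letter is an assumption.
```
# Sketch only: the sizes and path below are illustrative, based on the example calculation above.
# Disable automatic page-file management so a fixed size can be set.
$cs = Get-CimInstance -ClassName Win32_ComputerSystem
$cs | Set-CimInstance -Property @{ AutomaticManagedPagefile = $false }

# Set the page file on C: to a fixed size (initial size = maximum size prevents resizing).
$pf = Get-CimInstance -ClassName Win32_PageFileSetting -Filter "Name='C:\\pagefile.sys'"
$pf | Set-CimInstance -Property @{ InitialSize = 102400; MaximumSize = 102400 }

# A restart is required for the new page-file settings to take effect.
```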
Note
When sites are suspended, they consume approximately 6 MB each, so in our case, memory usage if all sites are suspended would be around 3 GB. In reality, though, you're probably never going to have them all suspended at the same time.
See also
Web Server performance tuning
HTTP 1.1/2 tuning
Performance Tuning HTTP 1.1/2
HTTP/2 is meant to improve performance on the client side (for example, page load time in a browser). On the server, it may represent a slight increase in CPU cost. While the server no longer requires a separate TCP connection for every request, some of that state is now kept in the HTTP layer. Furthermore, HTTP/2 has header compression, which represents additional CPU load.
Some situations require an HTTP/1.1 fallback (resetting the HTTP/2 connection and instead establishing a new
connection to use HTTP/1.1). In particular, TLS renegotiation and HTTP authentication (other than Basic and Digest)
require HTTP/1.1 fallback. Even though this adds overhead, these operations already imply some delay and so are
not particularly performance-sensitive.
See also
Web Server performance tuning
IIS 10.0 performance tuning
Performance Tuning Cache and Memory Manager
By default, Windows caches file data that is read from disks and written to disks. This implies that read operations
read file data from an area in system memory, known as the system file cache, rather than from the physical disk.
Correspondingly, write operations write file data to the system file cache rather than to the disk, and this type of
cache is referred to as a write-back cache. Caching is managed per file object. Caching occurs under the direction of
the Cache Manager, which operates continuously while Windows is running.
File data in the system file cache is written to the disk at intervals determined by the operating system. Flushed pages stay either in the system cache working set (when FILE_FLAG_RANDOM_ACCESS is set and the file handle wasn't closed) or on the standby list, where they become part of available memory.
The policy of delaying the writing of the data to the file and holding it in the cache until the cache is flushed is called
lazy writing, and it is triggered by the Cache Manager at a determinate time interval. The time at which a block of
file data is flushed is partially based on the amount of time it has been stored in the cache and the amount of time
since the data was last accessed in a read operation. This ensures that file data that is frequently read will stay
accessible in the system file cache for the maximum amount of time.
This file data caching process is illustrated in the following figure:
As depicted by the solid arrows in the preceding figure, a 256 KB region of data is read into a 256 KB cache slot in
system address space when it is first requested by the Cache Manager during a file read operation. A user-mode
process then copies the data in this slot to its own address space. When the process has completed its data access,
it writes the altered data back to the same slot in the system cache, as shown by the dotted arrow between the
process address space and the system cache. When the Cache Manager has determined that the data will no longer
be needed for a certain amount of time, it writes the altered data back to the file on the disk, as shown by the dotted
arrow between the system cache and the disk.
In this section:
Cache and Memory Manager Potential Performance Issues
Cache and Memory Manager Improvements in Windows Server 2016
Troubleshoot Cache and Memory Manager Performance Issues
Cache and Memory Manager Potential Performance Issues
Before Windows Server 2012, two primary potential issues caused the system file cache to grow until available memory was almost depleted under certain workloads. If this situation results in the system being sluggish, you can determine whether the server is facing one of these issues.
Counters to monitor
Memory\Long-Term Average Standby Cache Lifetime (s) < 1800 seconds
Memory\Available Mbytes is low
Memory\System Cache Resident Bytes
If Memory\Available Mbytes is low and at the same time Memory\System Cache Resident Bytes is consuming a significant part of the physical memory, you can use RAMMap to find out what the cache is being used for.
The problem used to be mitigated by the DynCache tool. In Windows Server 2012 and later, the architecture has been redesigned and this problem should no longer exist.
Cache and Memory Manager Improvements in Windows Server 2016
This topic describes Cache Manager and Memory Manager improvements in Windows Server 2012 and Windows Server 2016.
Network Subsystem Performance Tuning
You can use this topic for an overview of the network subsystem and for links to other topics in this guide.
NOTE
In addition to this topic, the following sections of this guide provide performance tuning recommendations for network
devices and the network stack.
Choosing a Network Adapter
Configure the Order of Network Interfaces
Performance Tuning Network Adapters
Network-Related Performance Counters
Performance Tools for Network Workloads
Performance tuning the network subsystem, particularly for network-intensive workloads, can involve each layer of the network architecture, which is also called the network stack. These layers are broadly divided into the following sections.
1. Network interface. This is the lowest layer in the network stack, and contains the network driver that
communicates directly with the network adapter.
2. Network Driver Interface Specification (NDIS). NDIS exposes interfaces for the driver below it and for
the layers above it, such as the Protocol Stack.
3. Protocol Stack. The protocol stack implements protocols such as TCP/IP and UDP/IP. These layers expose
the transport layer interface for layers above them.
4. System Drivers. These are typically clients that use a transport data extension (TDX) or Winsock Kernel
(WSK) interface to expose interfaces to user-mode applications. The WSK interface was introduced in
Windows Server 2008 and Windows Vista, and it is exposed by AFD.sys. The interface improves
performance by eliminating the switching between user mode and kernel mode.
5. User-Mode Applications. These are typically Microsoft solutions or custom applications.
The table below provides a vertical illustration of the layers of the network stack, including examples of items that
run in each layer.
Choosing a Network Adapter
You can use this topic to learn some of the features of network adapters that might affect your purchasing choices.
Network-intensive applications require high-performance network adapters. This section explores some
considerations for choosing network adapters, as well as how to configure different network adapter settings to
achieve the best network performance.
TIP
You can configure network adapter settings by using Windows PowerShell. For more information, see Network Adapter
Cmdlets in Windows PowerShell.
Offload Capabilities
Offloading tasks from the central processing unit (CPU) to the network adapter can reduce CPU usage on the
server, which improves the overall system performance.
The network stack in Microsoft products can offload one or more tasks to a network adapter if you select a network
adapter that has the appropriate offload capabilities. The following table provides a brief overview of different
offload capabilities that are available in Windows Server 2016.
Checksum calculation for TCP: The network stack can offload the calculation and validation of Transmission Control Protocol (TCP) checksums on send and receive code paths. It can also offload the calculation and validation of IPv4 and IPv6 checksums on send and receive code paths.
Checksum calculation for UDP: The network stack can offload the calculation and validation of User Datagram Protocol (UDP) checksums on send and receive code paths.
Checksum calculation for IPv4: The network stack can offload the calculation and validation of IPv4 checksums on send and receive code paths.
Checksum calculation for IPv6: The network stack can offload the calculation and validation of IPv6 checksums on send and receive code paths.
Segmentation of large TCP packets: The TCP/IP transport layer supports Large Send Offload v2 (LSOv2). With LSOv2, the TCP/IP transport layer can offload the segmentation of large TCP packets to the network adapter.
Receive Side Scaling (RSS): RSS is a network driver technology that enables the efficient distribution of network receive processing across multiple CPUs in multiprocessor systems. More detail about RSS is provided later in this topic.
Receive Segment Coalescing (RSC): RSC is the ability to group packets together to minimize the header processing that is necessary for the host to perform. A maximum of 64 KB of received payload can be coalesced into a single larger packet for processing. More detail about RSC is provided later in this topic.
NOTE
For a detailed command reference for each cmdlet, including syntax and parameters, you can click the following links. In
addition, you can pass the cmdlet name to Get-Help at the Windows PowerShell prompt for details on each command.
Disable-NetAdapterRss. This command disables RSS on the network adapter that you specify.
Enable-NetAdapterRss. This command enables RSS on the network adapter that you specify.
Get-NetAdapterRss. This command retrieves RSS properties of the network adapter that you specify.
Set-NetAdapterRss. This command sets the RSS properties on the network adapter that you specify.
RSS profiles
You can use the Profile parameter of the Set-NetAdapterRss cmdlet to specify which logical processors are
assigned to which network adapter. The available values for this parameter are listed below, with a short example after the list:
Closest. Logical processor numbers that are near the network adapters base RSS processor are preferred.
With this profile, the operating system might rebalance logical processors dynamically based on load.
ClosestStatic. Logical processor numbers near the network adapters base RSS processor are preferred.
With this profile, the operating system does not rebalance logical processors dynamically based on load.
NUMA. Logical processor numbers are generally selected on different NUMA nodes to distribute the load.
With this profile, the operating system might rebalance logical processors dynamically based on load.
NUMAStatic. This is the default profile. Logical processor numbers are generally selected on different
NUMA nodes to distribute the load. With this profile, the operating system will not rebalance logical
processors dynamically based on load.
Conservative. RSS uses as few processors as possible to sustain the load. This option helps reduce the
number of interrupts.
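As a quick illustration of applying one of these profiles, the following hedged example (the adapter name Ethernet is a placeholder) sets the NUMAStatic profile and then confirms the result:
```
# Sketch only: "Ethernet" is a placeholder adapter name.
Set-NetAdapterRss -Name "Ethernet" -Profile NUMAStatic

# Verify the profile and the resulting processor assignments.
Get-NetAdapterRss -Name "Ethernet"
```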
Depending on the scenario and the workload characteristics, you can also use other parameters of the Set-
NetAdapterRss Windows PowerShell cmdlet to specify the following:
On a per-network adapter basis, how many logical processors can be used for RSS.
The starting offset for the range of logical processors.
The node from which the network adapter allocates memory.
Following are the additional Set-NetAdapterRss parameters that you can use to configure RSS:
NOTE
In the example syntax for each parameter below, the network adapter name Ethernet is used as an example value for the
Name parameter of the Set-NetAdapterRss command. When you run the cmdlet, ensure that the network adapter name
that you use is appropriate for your environment.
* MaxProcessors: Sets the maximum number of RSS processors to be used. This ensures that application
traffic is bound to a maximum number of processors on a given interface. Example syntax:
Set-NetAdapterRss -Name "Ethernet" -MaxProcessors <value>
* BaseProcessorGroup: Sets the base processor group of a NUMA node. This impacts the processor array
that is used by RSS. Example syntax:
Set-NetAdapterRss -Name "Ethernet" -BaseProcessorGroup <value>
* MaxProcessorGroup: Sets the Max processor group of a NUMA node. This impacts the processor array
that is used by RSS. Setting this would restrict a maximum processor group so that load balancing is aligned
within a k-group. Example syntax:
Set-NetAdapterRss -Name "Ethernet" -MaxProcessorGroup <value>
* BaseProcessorNumber: Sets the base processor number of a NUMA node. This impacts the processor
array that is used by RSS. This allows partitioning processors across network adapters. This is the first logical
processor in the range of RSS processors that is assigned to each adapter. Example syntax:
Set-NetAdapterRss -Name "Ethernet" -BaseProcessorNumber <Byte Value>
* NumaNode: The NUMA node that each network adapter can allocate memory from. This can be within a
k-group or from different k-groups. Example syntax:
Set-NetAdapterRss -Name "Ethernet" -NumaNode <value>
* NumberofReceiveQueues: If your logical processors seem to be underutilized for receive traffic (for
example, as viewed in Task Manager), you can try increasing the number of RSS queues from the default of
2 to the maximum that is supported by your network adapter. Your network adapter may have options to
change the number of RSS queues as part of the driver. Example syntax:
Set-NetAdapterRss -Name "Ethernet" -NumberOfReceiveQueues <value>
For more information, click the following link to download Scalable Networking: Eliminating the Receive Processing Bottleneck - Introducing RSS in Word format.
Understanding RSS Performance
Tuning RSS requires understanding the configuration and the load-balancing logic. To verify that the RSS settings
have taken effect, you can review the output when you run the Get-NetAdapterRss Windows PowerShell cmdlet.
Following is example output of this cmdlet.
PS C:\Users\Administrator> get-netadapterrss
Name : testnic 2
InterfaceDescription : Broadcom BCM5708C NetXtreme II GigE (NDIS VBD Client) #66
Enabled : True
NumberOfReceiveQueues : 2
Profile : NUMAStatic
BaseProcessor: [Group:Number] : 0:0
MaxProcessor: [Group:Number] : 0:15
MaxProcessors : 8
IndirectionTable: [Group:Number]:
0:0 0:4 0:0 0:4 0:0 0:4 0:0 0:4
(# indirection table entries are a power of 2 and based on # of processors)
0:0 0:4 0:0 0:4 0:0 0:4 0:0 0:4
In addition to echoing parameters that were set, the key aspect of the output is the indirection table output. The
indirection table displays the hash table buckets that are used to distribute incoming traffic. In this example, the n:c
notation designates the Numa K-Group:CPU index pair that is used to direct incoming traffic. We see exactly 2
unique entries (0:0 and 0:4), which represent k-group 0/cpu0 and k-group 0/cpu 4, respectively.
There is only one k-group for this system (k-group 0) and n (where n <= 128) indirection table entries. Because the number of receive queues is set to 2, only 2 processors (0:0, 0:4) are chosen, even though maximum processors is set to 8. In effect, the indirection table hashes incoming traffic so that it uses only 2 CPUs out of the 8 that are available.
To fully utilize the CPUs, the number of RSS Receive Queues must be equal to or greater than Max Processors. In
the previous example, the Receive Queue should be set to 8 or greater.
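To act on that recommendation, a hedged example (the adapter name is a placeholder, and 8 queues assumes the adapter supports that many) would raise the queue count to match the processor limit:
```
# Sketch only: "Ethernet" is a placeholder; 8 queues assumes the adapter supports that many.
Set-NetAdapterRss -Name "Ethernet" -NumberOfReceiveQueues 8 -MaxProcessors 8
```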
NIC Teaming and RSS
RSS can be enabled on a network adapter that is teamed with another network interface card using NIC Teaming. In
this scenario, only the underlying physical network adapter can be configured to use RSS. A user cannot set RSS
cmdlets on the teamed network adapter.
Receive Segment Coalescing (RSC)
Receive Segment Coalescing (RSC) helps performance by reducing the number of IP headers that are processed for
a given amount of received data. It should be used to help scale the performance of received data by grouping (or
coalescing) the smaller packets into larger units.
This approach can affect latency, with benefits mostly seen in throughput gains. RSC is recommended to increase throughput for receive-heavy workloads. Consider deploying network adapters that support RSC.
On these network adapters, ensure that RSC is on (this is the default setting), unless you have specific workloads (for example, low-latency, low-throughput networking) that benefit from RSC being off.
Understanding RSC Diagnostics
You can diagnose RSC by using the Windows PowerShell cmdlets Get-NetAdapterRsc and Get-
NetAdapterStatistics.
Following is example output when you run the Get-NetAdapterRsc cmdlet.
PS C:\Users\Administrator> Get-NetAdapterRsc
The Get cmdlet shows whether RSC is enabled on the interface and whether TCP enables RSC to be in an operational state. The failure reason provides details about any failure to enable RSC on that interface.
In the previous scenario, IPv4 RSC is supported and operational on the interface. To understand diagnostic failures, you can look at the coalesced bytes or the exceptions caused, which provides an indication of coalescing issues.
Following is example output when you run the Get-NetAdapterStatistics cmdlet.
CoalescedBytes : 0
CoalescedPackets : 0
CoalescingEvents : 0
CoalescingExceptions : 0
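If the statistics show no coalescing on an adapter where you expect it, a hedged example (the adapter name is a placeholder) for checking and enabling RSC looks like this:
```
# Sketch only: "Ethernet" is a placeholder adapter name.
Get-NetAdapterRsc -Name "Ethernet"

# Enable RSC on the adapter (covers both IPv4 and IPv6 by default).
Enable-NetAdapterRsc -Name "Ethernet"
```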
Configure the Order of Network Interfaces
In Windows Server 2016 and Windows 10, you can use the interface metric to configure the order of network
interfaces.
This is different than in previous versions of Windows and Windows Server, which allowed you to configure the
binding order of network adapters by using either the user interface or the commands
INetCfgComponentBindings::MoveBefore and INetCfgComponentBindings::MoveAfter. These two
methods for ordering network interfaces are not available in Windows Server 2016 and Windows 10.
Instead, you can use the new method for setting the enumerated order of network adapters by configuring the
interface metric of each adapter. You can configure the interface metric by using the Set-NetIPInterface Windows
PowerShell command.
When network traffic routes are chosen and you have configured the InterfaceMetric parameter of the Set-
NetIPInterface command, the overall metric that is used to determine the interface preference is the sum of the
route metric and the interface metric. Typically, the interface metric gives preference to a particular interface, such
as using wired if both wired and wireless are available.
The following Windows PowerShell command example shows use of this parameter.
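The original example command is not preserved in this copy; a representative, hedged sketch of the parameter in use (the interface alias and metric value are placeholders) would be:
```
# Sketch only: the interface alias and metric value are illustrative.
Set-NetIPInterface -InterfaceAlias "Ethernet" -InterfaceMetric 15

# Review the resulting metrics for all interfaces.
Get-NetIPInterface | Sort-Object -Property InterfaceMetric
```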
The order in which adapters appear in a list is determined by the IPv4 or IPv6 interface metric. For more
information, see GetAdaptersAddresses function.
For links to all topics in this guide, see Network Subsystem Performance Tuning.
Performance Tuning Network Adapters
You can use this topic to performance tune network adapters that are installed in computers that are running
Windows Server 2016.
Determining the correct tuning settings for your network adapter depends on the following variables:
The network adapter and its feature set
The type of workload performed by the server
The server hardware and software resources
Your performance goals for the server
If your network adapter provides tuning options, you can use them to optimize network throughput and resource usage and achieve optimum performance based on the parameters described above.
The following sections describe some of your performance tuning options.
IMPORTANT
Do not use the offload features IPsec Task Offload or TCP Chimney Offload. These technologies are deprecated in
Windows Server 2016, and might adversely affect server and networking performance. In addition, these technologies might
not be supported by Microsoft in the future.
For example, enabling segmentation offload can reduce the maximum sustainable throughput on some network
adapters because of limited hardware resources. However, if the reduced throughput is not expected to be a
limitation, you should enable offload capabilities, even for this type of network adapter.
NOTE
Some network adapters require offload features to be independently enabled for send and receive paths.
NOTE
If a network adapter does not expose manual resource configuration, it either dynamically configures the resources, or the
resources are set to a fixed value that cannot be changed.
NOTE
The operating system can exert no control over SMIs because the logical processor is running in a special maintenance mode,
which prevents operating system intervention.
```
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters
```
TcpWindowSize
NumTcbTablePartitions
MaxHashTableSize
Windows Filtering Platform
The Windows Filtering Platform (WFP) that was introduced in Windows Vista and Windows Server 2008 provides
APIs to non-Microsoft independent software vendors (ISVs) to create packet processing filters. Examples include
firewall and antivirus software.
NOTE
A poorly written WFP filter can significantly decrease a server's networking performance. For more information, see Porting
Packet-Processing Drivers and Apps to WFP in the Windows Dev Center.
For links to all topics in this guide, see Network Subsystem Performance Tuning.
Network-Related Performance Counters
This topic lists the counters that are relevant to managing network performance, and contains the following
sections.
Resource Utilization
Potential Network Problems
Receive Side Coalescing (RSC) performance
Resource Utilization
The following performance counters are relevant to network resource utilization.
IPv4, IPv6
Datagrams Received/sec
Datagrams Sent/sec
TCPv4, TCPv6
Segments Received/sec
Segments Sent/sec
Segments Retransmitted/sec
Network Interface(*), Network Adapter(*)
Bytes Received/sec
Bytes Sent/sec
Packets Received/sec
Packets Sent/sec
Output Queue Length
This counter is the length of the output packet queue (in packets). If this is longer than 2, delays occur.
You should find the bottleneck and eliminate it if you can. Because NDIS queues the requests, this
length should always be 0.
Processor Information
% Processor Time
Interrupts/sec
DPCs Queued/sec
This counter is an average rate at which DPCs were added to the logical processor's DPC queue. Each
logical processor has its own DPC queue. This counter measures the rate at which DPCs are added to
the queue, not the number of DPCs in the queue. It displays the difference between the values that
were observed in the last two samples, divided by the duration of the sample interval.
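To sample a few of these counters from PowerShell, a hedged example (the counter paths are limited to ones listed above, and the sampling interval is arbitrary) might look like the following:
```
# Sketch only: sample a handful of the counters listed above every 5 seconds, 12 times.
Get-Counter -Counter "\Network Interface(*)\Bytes Received/sec",
                     "\Network Interface(*)\Bytes Sent/sec",
                     "\TCPv4\Segments Retransmitted/sec" -SampleInterval 5 -MaxSamples 12
```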
NIC Teaming
This topic provides an overview of Network Interface Card (NIC) Teaming in Windows Server 2016.
NOTE
In addition to this topic, the following NIC Teaming content is available.
NIC Teaming in Virtual Machines (VMs)
NIC Teaming and Virtual Local Area Networks (VLANs)
NIC Teaming MAC Address Use and Management
Troubleshooting NIC Teaming
Create a New NIC Team on a Host Computer or VM
NIC Teaming (NetLBFO) Cmdlets in Windows PowerShell
TechNet Gallery Download: Windows Server 2016 NIC and Switch Embedded Teaming User Guide
NOTE
A NIC team that contains only one network adapter cannot provide load balancing and failover; however with one network
adapter, you can use NIC Teaming for separation of network traffic when you are also using virtual Local Area Networks
(VLANs).
When you configure network adapters into a NIC team, they are connected into the NIC teaming solution common
core, which then presents one or more virtual adapters (also called team NICs [tNICs] or team interfaces) to the
operating system. Windows Server 2016 supports up to 32 team interfaces per team. There are a variety of
algorithms that distribute outbound traffic (load) between the NICs.
The following illustration depicts a NIC Team with multiple tNICs.
In addition, you can connect your teamed NICs to the same switch or to different switches. If you connect NICs to
different switches, both switches must be on the same subnet.
IMPORTANT
Hyper-V virtual NICs that are exposed in the host partition (vNICs) must not be placed in a team. Teaming of vNICs
inside of the host partition is not supported in any configuration or combination. Attempts to team vNICs might
cause a complete loss of communication if network failures occur.
See Also
NIC Teaming in Virtual Machines (VMs)
NIC Teaming in Virtual Machines (VMs)
This topic provides information about using NIC Teaming within Hyper-V VMs, and contains the following sections.
NIC Teaming Configuration Requirements
NIC Teaming with SR-IOV-Capable Network Adapters
Each VM can have a virtual function (VF) from one or both SR-IOV NICs and, in the event of a NIC disconnect,
failover from the primary VF to the back-up adapter (VF). Alternately, the VM may have a VF from one NIC and a
non-VF vmNIC connected to another virtual switch. If the NIC associated with the VF gets disconnected, the traffic
can failover to the other switch without loss of connectivity.
Because failover between NICs in a VM might result in traffic being sent with the MAC address of the other vmNIC,
each Hyper-V Virtual Switch port associated with a VM that is using NIC Teaming must be set to allow teaming. To
discover how to enable NIC Teaming in the VM, see Create a New NIC Team in a VM.
See Also
NIC Teaming and Virtual Local Area Networks (VLANs)
Create a New NIC Team in a VM
NIC Teaming
NIC Teaming and Virtual Local Area Networks
(VLANs)
This topic provides information about using NIC Teaming with virtual Local Area Networks (VLANs) on both host
computers and VMs, and includes the following sections.
Team interfaces and VLANs
Using VLANs with NIC Teaming in a VM
Managing network interfaces and VLANs
See Also
NIC Teaming MAC Address Use and Management
NIC Teaming
NIC Teaming MAC Address Use and Management
When you configure a NIC Team with switch independent mode and either address hash or dynamic load
distribution, the team uses the media access control (MAC) address of the primary NIC Team member on outbound
traffic. The primary NIC Team member is a network adapter selected by the operating system from the initial set of
team members.
The primary team member is the first team member to bind to the team after you create it or after the host computer is restarted. Because the primary team member can change in a non-deterministic manner at each boot, NIC disable/enable action, or other reconfiguration activity, the MAC address of the team can vary.
In most situations this doesn't cause problems, but there are a few cases where issues might arise.
If the primary team member is removed from the team and then placed into operation, there may be a MAC
address conflict. To resolve this conflict, disable and then enable the team interface. The process of disabling and
then enabling the team interface causes the interface to select a new MAC address from the remaining team
members, thereby eliminating the MAC address conflict.
You can set the MAC address of the NIC team to a specific MAC address by setting it in the primary team interface,
just as you can do when configuring the MAC address of any physical NIC.
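As a sketch of setting a static MAC address on the primary team interface, the following hedged example assumes that Set-NetAdapter applies to the team interface just as it does to a physical NIC; the team name and address are purely illustrative.
```
# Sketch only: the team interface name and MAC address are placeholders.
Set-NetAdapter -Name "Team1" -MacAddress "00-15-5D-00-AA-01"
```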
See Also
Create a New NIC Team on a Host Computer or VM
NIC Teaming
Create a New NIC Team on a Host Computer or VM
This topic provides information about NIC Teaming configuration so that you understand the selections you must
make when you are configuring a new NIC Team. This topic contains the following sections.
Choosing a Teaming Mode
Choosing a Load Balancing Mode
Choosing a Standby Adapter Setting
Using the Primary Team Interface Property
NOTE
If you already understand these configuration items, you can use the following procedures to configure NIC Teaming.
Create a New NIC Team in a VM
Create a New NIC Team
When you create a new NIC Team, you must configure the following NIC Team properties.
Team name
Member adapters
Teaming mode
Load balancing mode
Standby adapter
You can also optionally configure the primary team interface and configure a virtual LAN (VLAN) number.
These NIC Team properties are displayed in the following illustration, which contains example values for some NIC
Team properties.
Choosing a Teaming Mode
The options for Teaming mode are Switch Independent, Static Teaming, and Link Aggregation Control
Protocol (LACP). Both Static Teaming and LACP are Switch Dependent modes. For best NIC Team performance
with all three Teaming modes, it is recommended that you use a Load Balancing mode of Dynamic distribution.
Switch Independent
With Switch Independent mode, the switch or switches to which the NIC Team members are connected are
unaware of the presence of the NIC team and do not determine how to distribute network traffic to NIC Team
members - instead, the NIC Team distributes inbound network traffic across the NIC Team members.
When you use Switch Independent mode with Dynamic distribution, the network traffic load is distributed based
on the TCP Ports address hash as modified by the Dynamic load balancing algorithm. The Dynamic load balancing
algorithm redistributes flows to optimize team member bandwidth utilization so that individual flow transmissions
can move from one active team member to another. The algorithm takes into account the small possibility that
redistributing traffic could cause out-of-order delivery of packets, so it takes steps to minimize that possibility.
Switch Dependent
With Switch Dependent modes, the switch to which the NIC Team members are connected determines how to
distribute the inbound network traffic among the NIC Team members. The switch has complete independence to
determine how to distribute the network traffic across the NIC Team members.
IMPORTANT
Switch dependent teaming requires that all team members are connected to the same physical switch or a multi-chassis
switch that shares a switch ID among the multiple chassis.
Static Teaming requires you to manually configure both the switch and the host to identify which links form the
team. Because this is a statically configured solution, there is no additional protocol to assist the switch and the
host to identify incorrectly plugged cables or other errors that could cause the team to fail to perform. This mode is
typically supported by server-class switches.
Unlike Static Teaming, LACP Teaming mode dynamically identifies links that are connected between the host and
the switch. This dynamic connection enables the automatic creation of a team and, in theory but rarely in practice,
the expansion and reduction of a team simply by the transmission or receipt of LACP packets from the peer entity.
All server-class switches support LACP, and all require the network operator to administratively enable LACP on
the switch port. When you configure a Teaming mode of LACP, NIC Teaming always operates in LACP's Active
mode with a short timer. No option is presently available to modify the timer or change the LACP mode.
When you use Switch Dependent modes with Dynamic distribution, the network traffic load is distributed based on
the TransportPorts address hash as modified by the Dynamic load balancing algorithm. The Dynamic load
balancing algorithm redistributes flows to optimize team member bandwidth utilization. Individual flow
transmissions can move from one active team member to another as part of the dynamic distribution. The
algorithm takes into account the small possibility that redistributing traffic could cause out-of-order delivery of
packets, so it takes steps to minimize that possibility.
As with all switch dependent configurations, the switch determines how to distribute the inbound traffic among
the team members. The switch is expected to do a reasonable job of distributing the traffic across the team
members but it has complete independence to determine how it does so.
Hyper-V Port
Because the adjacent switch always sees a particular MAC address on one port, the switch distributes the ingress
load (the traffic from the switch to the host) on multiple links based on the destination MAC (VM MAC) address.
This is particularly useful when Virtual Machine Queues (VMQs) are used, because a queue can be placed on the
specific NIC where the traffic is expected to arrive.
However, if the host has only a few VMs, this mode might not be granular enough to achieve a well-balanced
distribution. This mode will also always limit a single VM (i.e., the traffic from a single switch port) to the
bandwidth that is available on a single interface. NIC Teaming uses the Hyper-V Virtual Switch Port as the identifier
instead of using the source MAC address because, in some instances, a VM might be configured with more than
one MAC address on a switch port.
Dynamic
This Load balancing mode utilizes the best aspects of each of the other two modes and combines them into a
single mode:
Outbound loads are distributed based on a hash of the TCP Ports and IP addresses. Dynamic mode also
rebalances loads in real time so that a given outbound flow may move back and forth between team
members.
Inbound loads are distributed in the same manner as the Hyper-V port mode.
The outbound loads in this mode are dynamically balanced based on the concept of flowlets. Just as human speech
has natural breaks at the ends of words and sentences, TCP flows (TCP communication streams) also have
naturally occurring breaks. The portion of a TCP flow between two such breaks is referred to as a flowlet.
When the dynamic mode algorithm detects that a flowlet boundary has been encountered - such as when a break
of sufficient length has occurred in the TCP flow - the algorithm automatically rebalances the flow to another team
member if appropriate. In some circumstances the algorithm might also periodically rebalance flows that do not
contain any flowlets. Because of this, the affinity between TCP flow and team member can change at any time as
the dynamic balancing algorithm works to balance the workload of the team members.
Whether the team is configured with Switch Independent or one of the Switch Dependent modes, it is
recommended that you use Dynamic distribution mode for best performance.
There is an exception to this rule when the NIC Team has just two team members, is configured in Switch
Independent mode, and has Active/Standby mode enabled, with one NIC active and the other configured for
Standby. With this NIC Team configuration, Address Hash distribution provides slightly better performance than
Dynamic distribution.
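As a hedged illustration of these choices with the built-in NetLbfo cmdlets (the team and NIC names are placeholders), the following creates a switch-independent team with Dynamic load balancing:
```
# Sketch only: team name and member NIC names are placeholders.
New-NetLbfoTeam -Name "Team1" -TeamMembers "NIC1","NIC2" `
                -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic
```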
After you click the highlighted link, the following New Team Interface dialog box opens.
If you are using VLANs, you can use this dialog box to specify a VLAN number.
Whether or not you are using VLANs, you can specify a tNIC name for the NIC Team.
See Also
Create a New NIC Team in a VM
NIC Teaming
Create a New NIC Team
You can use this topic to create a new NIC Team on a host computer or in a Hyper-V virtual machine (VM) that is
running Windows Server 2016.
IMPORTANT
If you are creating a new NIC Team in a VM, review the topic Create a New NIC Team in a VM before you perform this
procedure.
3. In Adapters and Interfaces, select the network adapters that you want to add to a NIC Team. For example,
if you want to add the adapters Ethernet 2 and Ethernet 3 to a new NIC Team, make the selection per the
illustration below.
4. Click TASKS, and then click Add to New Team.
5. The New team dialog box opens and displays network adapters and team members. In Team name, type a
name for the new NIC Team, and then click Additional properties.
6. In Additional properties, select values for Teaming mode, Load balancing mode, and Standby
adapter. In most cases, the highest performing load balancing mode is Dynamic. For more detailed
explanations of these modes, see the topic Create a New NIC Team on a Host Computer or VM.
IMPORTANT
If you are configuring a NIC Team in a virtual machine (VM), you must select a Teaming mode of Switch
Independent and a Load balancing mode of Address Hash.
7. If you want to configure the primary team interface name or assign a VLAN number to the NIC Team, click
the link to the right of Primary team interface. The New team interface dialog box opens.
8. Depending on your requirements, take one of the following actions:
To provide a tNIC interface name, type an interface name.
To configure VLAN membership, click Specific VLAN. Type the VLAN information in the first section
of the dialog box, which is highlighted in the illustration below. For example, if you want to add this
NIC Team to the accounting VLAN number 44, Type Accounting 44 - VLAN. Next, to the right of
Specific VLAN, type the VLAN number that you want to use. For example, type 44.
9. Click OK.
See Also
Create a New NIC Team on a Host Computer or VM
Create a New NIC Team in a VM
NIC Teaming
Create a New NIC Team in a VM
You can use this topic to connect a virtual machine (VM) to Hyper-V Virtual Switches in a manner that is
consistent with NIC Teaming requirements within VMs. You can also use this topic to create a new NIC team in a
VM.
This topic contains the following sections.
Network configuration requirements
Configure the physical and virtual network
Create a NIC Team
IMPORTANT
This procedure does not include instructions on how to create a VM.
Membership in Administrators, or equivalent, is the minimum required to perform this procedure.
To create a virtual switch and connect a VM
1. On the Hyper-V host, open Hyper-V Manager, and then click Virtual Switch Manager.
2. Virtual Switch Manager opens. In What type of virtual switch do you want to create?, ensure that
External is selected, and then click Create Virtual Switch.
3. The Virtual Switch Properties page opens. Type a Name for the virtual switch, and add Notes as needed.
4. In Connection type, in External network, select the physical network adapter to which you want to
attach the virtual switch.
5. Configure additional switch properties so that they are correct for your deployment, and then click OK.
6. Create a second external virtual switch by repeating the previous steps. Connect the second external switch
to a different network adapter.
7. Open Hyper-V Manager. In Virtual Machines, right-click the VM that you want to configure, and then click
Settings. The VM Settings dialog box opens.
IMPORTANT
Ensure that the VM is not started. If it is started, shut it down before configuring the VM.
NOTE
Steps 10 through 12 demonstrate how to enable NIC Teaming by using the graphical user interface. You can also
enable NIC Teaming by running the following Windows PowerShell command:
Set-VMNetworkAdapter -VMName <VMname> -AllowTeaming On
11. In Advanced Features, scroll down to NIC Teaming.
12. In NIC Teaming, click to select Enable this network adapter to be part of a team in the guest
operating system. Click OK.
13. To add a second network adapter, in Hyper-V Manager, in Virtual Machines, right-click the same VM, and
then click Settings. The VM Settings dialog box opens.
14. In Add Hardware, click Network Adapter, and then click Add.
15. In Network Adapter properties, select the second virtual switch that you created in previous steps, and
then click Apply.
16. In Hardware, click to expand the plus sign (+) next to Network Adapter. Click Advanced Features.
17. In Advanced Features, scroll down to NIC Teaming.
18. In NIC Teaming, click to select Enable this network adapter to be part of a team in the guest
operating system. Click OK.
You can now start and log on to your VM to create your new NIC Team.
See Also
Create a New NIC Team on a Host Computer or VM
NIC Teaming
Troubleshooting NIC Teaming
This topic provides information about troubleshooting NIC Teaming, and contains the following sections, which
describe possible causes of issues with NIC Teaming.
Hardware that doesn't conform to specification
Physical switch security features
Disabling and Enabling with Windows PowerShell
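The command sequence discussed in this section is not preserved in this copy; it is presumably a blanket disable of all adapters followed by a blanket enable, along the lines of this sketch:
```
# Sketch only: disable every network adapter on the host, then attempt to re-enable them all.
Disable-NetAdapter * -Confirm:$false
Enable-NetAdapter * -Confirm:$false
```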
This sequence of commands does not enable all of the NetAdapters that it disabled.
This is because disabling all of the underlying physical member NICs causes the NIC team interface to be removed
and no longer show up in Get-NetAdapter. Because of this, the Enable-NetAdapter * command does not enable
the NIC Team, because that adapter is removed.
The Enable-NetAdapter * command does, however, enable the member NICs, which then (after a short time)
causes the team interface to be recreated. In this circumstance, the team interface is still in a "disabled" state
because it has not been re-enabled. Enabling the team interface after it is recreated will allow network traffic to
begin to flow again.
See Also
NIC Teaming
Performance Tuning Software Defined Networks
Software Defined Networking (SDN) in Windows Server 2016 is made up of a combination of a Network
Controller, Hyper-V Hosts, Software Load Balancer Gateways and HNV Gateways. For tuning of each of these
components refer to the following sections:
Network Controller
The network controller is a Windows Server role which must be enabled on Virtual Machines running on hosts that
are configured to use SDN and are controlled by the network controller.
Three Network Controller enabled VMs are sufficient for high availability and maximum performance. Each VM
must be sized according to the guidelines provided in the SDN infrastructure virtual machine role requirements
section of the Plan Software Defined Networking topic.
SDN Quality of Service (QoS )
To ensure virtual machine traffic is prioritized effectively and fairly, it is recommended that you configure SDN QoS
on the workload virtual machines. For more information on configuring SDN QoS, refer to the Configure QoS for a
Tenant VM Network Adapter topic.
(Get-NetworkControllerVirtualNetworkConfiguration -ConnectionUri $uri).properties.networkvirtualizationprotocol
For best performance, if VXLAN is returned then you must make sure your physical network adapters support
VXLAN task offload. If NVGRE is returned, then your physical network adapters must support NVGRE task offload.
MTU
Encapsulation results in extra bytes being added to each packet. In order to avoid fragmentation of these packets,
the physical network must be configured to use jumbo frames. An MTU value of 9234 is the recommended size for
either VXLAN or NVGRE and must be configured on the physical switch for the physical interfaces of the host ports
(L2) and the router interfaces (L3) of the VLANs over which encapsulated packets will be sent. This includes the
Transit, HNV Provider and Management networks.
MTU on the Hyper-V host is configured through the network adapter, and the Network Controller Host Agent
running on the Hyper-V host will adjust for the encapsulation overhead automatically if supported by the network
adapter driver.
Once traffic egresses from the virtual network via a Gateway, the encapsulation is removed and the original MTU as
sent from the VM is used.
Single Root IO Virtualization (SR-IOV)
SDN is implemented on the Hyper-V host using a forwarding switch extension in the virtual switch. For this switch
extension to process packets, SR-IOV must not be used on virtual network interfaces that are configured for use
with the network controller as it causes VM traffic to bypass the virtual switch.
SR-IOV can still be enabled on the virtual switch if desired and can be used by VM network adapters that are not
controlled by the network controller. These SR-IOV VMs can coexist on the same virtual switch as network
controller controlled VMs which do not use SR-IOV.
If you are using 40Gbit network adapters it is recommended that you enable SR-IOV on the virtual switch for the
Software Load Balancing (SLB) Gateways to achieve maximum throughput. This is covered in more detail in the
Software Load Balancer Gateways section.
HNV Gateways
You can find information on tuning HNV Gateways for use with SDN in the HNV Gateways section.
This topic provides hardware specifications and configuration recommendations for servers that are running
Hyper-V and hosting Windows Server Gateway virtual machines, in addition to configuration parameters for
Windows Server Gateway virtual machines (VMs). To extract best performance from Windows Server gateway VMs,
it is expected that these guidelines will be followed. The following sections contain hardware and configuration
requirements when you deploy Windows Server Gateway.
1. Hyper-V hardware recommendations
2. Hyper-V host configuration
3. Windows Server gateway VM configuration
Network Interface Cards (NICs): Two 10 GbE NICs. The gateway performance will depend on the line rate. If the line rate is less than 10 Gbps, the gateway tunnel throughput numbers will also go down by the same factor.
Ensure that the number of virtual processors that are assigned to a Windows Server Gateway VM does not exceed
the number of processors on the NUMA node. For example, if a NUMA node has 8 cores, the number of virtual
processors should be less than or equal to 8. For best performance, it should be 8. To find out the number of NUMA
nodes and the number of cores per NUMA node, run the following Windows PowerShell script on each Hyper-V
host:
$nodes = [object[]] $(gwmi -Namespace root\virtualization\v2 -Class MSVM_NumaNode)
$cores = ($nodes | Measure-Object NumberOfProcessorCores -Sum).Sum
$lps = ($nodes | Measure-Object NumberOfLogicalProcessors -Sum).Sum
# Report the NUMA node count and the per-node core and logical processor counts.
"{0} NUMA nodes, {1} cores per node, {2} logical processors per node" -f $nodes.Count, ($cores / $nodes.Count), ($lps / $nodes.Count)
IMPORTANT
Allocating virtual processors across NUMA nodes might have a negative performance impact on Windows Server Gateway.
Running multiple VMs, each of which has virtual processors from one NUMA node, likely provides better aggregate
performance than a single VM to which all virtual processors are assigned.
One gateway VM with eight virtual processors and at least 8 GB of RAM is recommended when selecting the number
of gateway VMs to install on each Hyper-V host when each NUMA node has eight cores. In this case, one NUMA
node is dedicated to the host machine.
NOTE
To run the following Windows PowerShell commands, you must be a member of the Administrators group.
Switch Embedded Teaming: When you create a vSwitch with multiple network adapters, switch embedded teaming (SET) is automatically enabled for those adapters.
New-VMSwitch -Name TeamedvSwitch -NetAdapterName "NIC 1","NIC 2"
Traditional teaming through LBFO is not supported with SDN in Windows Server 2016. Switch Embedded Teaming allows you to use the same set of NICs for your virtual traffic and RDMA traffic. This was not supported with NIC teaming based on LBFO.
Interrupt Moderation on physical NICs: Use default settings. To check the configuration, you can use the following Windows PowerShell command:
Get-NetAdapterAdvancedProperty
Receive Buffers size on physical NICs: You can verify whether the physical NICs support the configuration of this parameter by running the command Get-NetAdapterAdvancedProperty. If they do not support this parameter, the output from the command does not include the property Receive Buffers. If the NICs do support this parameter, you can use the following Windows PowerShell command to set the Receive Buffers size:
Set-NetAdapterAdvancedProperty "NIC1" -DisplayName "Receive Buffers" -DisplayValue 3000
Send Buffers size on physical NICs: You can verify whether the physical NICs support the configuration of this parameter by running the command Get-NetAdapterAdvancedProperty. If the NICs do not support this parameter, the output from the command does not include the property Send Buffers. If the NICs do support this parameter, you can use the following Windows PowerShell command to set the Send Buffers size:
Set-NetAdapterAdvancedProperty "NIC1" -DisplayName "Transmit Buffers" -DisplayValue 3000
Receive Side Scaling (RSS) on physical NICs: You can verify whether your physical NICs have RSS enabled by running the Windows PowerShell command Get-NetAdapterRss. You can use the following Windows PowerShell commands to enable and configure RSS on your network adapters:
Enable-NetAdapterRss NIC1,NIC2
Set-NetAdapterRss NIC1,NIC2 -NumberOfReceiveQueues 16 -MaxProcessors <value>
NOTE: If VMMQ or VMQ is enabled, RSS does not have to be enabled on the physical network adapters. You can enable it on the host virtual network adapters.
Virtual Machine Queue (VMQ) on the NIC Team: You can enable VMQ on your SET team by using the following Windows PowerShell command:
Enable-NetAdapterVmq
NOTE: This should be enabled only if the HW does not support VMMQ. If supported, VMMQ should be enabled for better performance.
NOTE
VMQ and vRSS come into the picture only when the load on the VM is high and the CPU is being utilized to the maximum. Only then will at least one processor core max out. VMQ and vRSS will then be beneficial to help spread the processing load across multiple cores. This is not applicable for IPsec traffic, because IPsec traffic is confined to a single core.
Memory: 8 GB.
Number of virtual network adapters: 3 NICs with the following specific uses: 1 for Management that is used by the management operating system, 1 External that provides access to external networks, and 1 Internal that provides access to internal networks only.
Receive Side Scaling (RSS): You can keep the default RSS settings for the Management NIC. The following example configuration is for a VM that has 8 virtual processors. For the External and Internal NICs, you can enable RSS with BaseProcessorNumber set to 0 and MaxProcessorNumber set to 8 using the following Windows PowerShell command:
Set-NetAdapterRss "Internal","External" -BaseProcessorNumber 0 -MaxProcessorNumber 8
Send side buffer: You can keep the default Send Side Buffer settings for the Management NIC. For both the Internal and External NICs, you can configure the Send Side Buffer with 32 MB of RAM by using the following Windows PowerShell command:
Set-NetAdapterAdvancedProperty "Internal","External" -DisplayName "Send Buffer Size" -DisplayValue "32MB"
Receive Side buffer: You can keep the default Receive Side Buffer settings for the Management NIC. For both the Internal and External NICs, you can configure the Receive Side Buffer with 16 MB of RAM by using the following Windows PowerShell command:
Set-NetAdapterAdvancedProperty "Internal","External" -DisplayName "Receive Buffer Size" -DisplayValue "16MB"
Forward Optimization: You can keep the default Forward Optimization settings for the Management NIC. For both the Internal and External NICs, you can enable Forward Optimization by using the following Windows PowerShell command:
Set-NetAdapterAdvancedProperty "Internal","External" -DisplayName "Forward Optimization" -DisplayValue "1"
SLB Gateway Performance Tuning in Software
Defined Networks
Software load balancing is provided by a combination of a load balancer manager in the Network Controller VMs, the Hyper-V Virtual Switch, and a set of Load Balancer Multiplexer (Mux) VMs.
No additional performance tuning is required to configure the Network Controller or the Hyper-V host for load
balancing beyond what is described in the Software Defined Networking section, unless you will be using SR-IOV
for the Muxes as described below.
Then, it must be enabled on the virtual network adapters of the SLB Mux VM that process the data traffic. In this example, SR-IOV is being enabled on all adapters:
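The example itself is not preserved in this copy; a hedged sketch (the VM name is a placeholder, and IovWeight is the standard Hyper-V setting that requests SR-IOV on a vmNIC) would be:
```
# Sketch only: "SLBMUX1" is a placeholder VM name; IovWeight 100 requests SR-IOV on every adapter.
Get-VMNetworkAdapter -VMName "SLBMUX1" | Set-VMNetworkAdapter -IovWeight 100
```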
Storage Spaces Direct, a Windows Server-based software-defined storage solution, automatically tunes its
performance, obviating the need to manually specify column counts, the cache configuration of the hardware you
use, and other factors that must be set manually with shared SAS storage solutions. For background info, see
Storage Spaces Direct in Windows Server 2016.
The Storage Spaces Direct Software Storage Bus Cache is automatically configured based on the types of storage present in the system. Three types are recognized: HDD, SSD, and NVMe. The cache claims the fastest storage for read and/or write caching, as appropriate, and uses the slower storage for persistent storage of data.
The following table summarizes the defaults:
Any single type: If there is only one type of storage present, the Software Storage Bus Cache isn't configured.
SSD+HDD or NVMe+HDD: The fastest storage is configured as the cache layer and caches both reads and writes.
Note that caching over an SSD or NVMe device defaults to write caching only. The intention is that since the capacity device is fast, there is limited value in moving read content to the cache devices. There are cases where this may not hold, though care should be taken, since enabling read cache may unnecessarily consume cache device endurance for no increase in performance. Examples may include:
NVMe+SSD: Enabling read cache will allow read IO to take advantage of the PCIe connectivity and/or higher IOPS performance of the NVMe devices as compared to the aggregated SSD. This may be true for bandwidth-oriented scenarios due to the relative bandwidth capabilities of the NVMe devices vs. the HBA connecting to the SSD. It may not be true for IOPS-oriented scenarios where CPU costs of IOPS may limit systems before the increased performance can be realized.
NVMe+NVMe: Similarly, if the read capability of the cache NVMe devices is greater than that of the combined capacity NVMe devices, there may be value in enabling read cache.
Good cases for read cache in these configurations are expected to be unusual.
To view and alter the cache configuration, use the Get-ClusterStorageSpacesDirect and Set-
ClusterStorageSpacesDirect cmdlets. The CacheModeHDD and CacheModeSSD properties define how the cache
operates on capacity media of the indicated type.
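As a hedged sketch of inspecting and changing the cache behavior (the ReadWrite value shown is illustrative; confirm the accepted values for your build before applying a change):
```
# Sketch only: review the current cache configuration for the cluster.
Get-ClusterStorageSpacesDirect

# Illustrative change: cache both reads and writes for SSD capacity devices.
Set-ClusterStorageSpacesDirect -CacheModeSSD ReadWrite
```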
See also
Understanding Storage Spaces Direct
Planning Storage Spaces Direct
Performance tuning for file servers
Software-Defined Storage Design Considerations Guide (for Windows Server 2012 R2 and shared SAS storage)
Frequently Asked Questions about Storage Replica
This topic contains answers to frequently asked questions (FAQs) about Storage Replica.
NOTE
You must use the Storage Nano Server package during setup. For more information about deploying Nano Server, see
Getting Started with Nano Server.
NOTE
This step is only necessary if the computer is not a member of an Active Directory Domain Services forest or in an
untrusted forest. It adds NTLM support to PSSession remoting, which is disabled by default for security reasons. For
more information, see PowerShell Remoting Security Considerations.
2. To install the Storage Replica feature, run the following cmdlet from a management computer:
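The cmdlet itself is not preserved in this copy; run from a management computer against a remote server, the install step would typically look something like the following sketch (the server name is a placeholder):
```
# Sketch only: "sr-srv01" is a placeholder server name.
Install-WindowsFeature -ComputerName "sr-srv01" -Name Storage-Replica -IncludeManagementTools -Restart
```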
Using the Test-SRTopology cmdlet with Nano Server in Windows Server 2016 requires remote script
invocation with CredSSP. Unlike other Storage Replica cmdlets, Test-SRTopology requires running locally on
the source server.
On the Nano server (through a remote PSSession):
NOTE
CREDSSP is needed for Kerberos double-hop support in the Test-SRTopology cmdlet, and not needed by other
Storage Replica cmdlets, which handle distributed system credentials automatically. Using CREDSSP is not
recommended under typical circumstances. For an alternative to CREDSSP, review the following Microsoft blog post:
"PowerShell Remoting Kerberos Double Hop Solved Securely" -
https://blogs.technet.microsoft.com/ashleymcglone/2016/08/30/powershell-remoting-kerberos-double-hop-solved-
securely/
Enable-WSManCredSSP -role server
$CustomCred = Get-Credential
Then copy the results to your management computer or share the path. Because Nano lacks the necessary
graphical libraries, you can use Test-SRTopology to process the results and give you a report file with charts.
For example:
# List the replication groups and their replicas
Get-SRGroup

# Poll the remaining bytes for a specific replication group until it reaches the ContinuouslyReplicating state
do{
$r=(Get-SRGroup -Name "Replication 2").replicas
[System.Console]::Write("Number of remaining bytes {0}`n", $r.NumOfBytesRemaining)
Start-Sleep 10
}until($r.ReplicationStatus -eq 'ContinuouslyReplicating')
Write-Output "Replica Status: "$r.replicationstatus
Get-SRPartnership
Get-NetIPConfiguration
Note the gateway and interface information (on both servers) and the partnership directions. Then run:
Set-SRNetworkConstraint -SourceComputerName sr-srv06 -SourceRGName rg02 -SourceNWInterface 2 `
-DestinationComputerName sr-srv05 -DestinationNWInterface 3 -DestinationRGName rg01
Get-SRNetworkConstraint
Update-SmbMultichannelConnection
The cmdlet will remind you that the user needs to log off of and back on to the server they plan to administer in order for the change to take effect. You can use Get-SRDelegation and Revoke-SRDelegation to further control this.
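As a sketch, the delegation grant referenced here might look like the following; the Grant-SRDelegation cmdlet name follows from the related cmdlets mentioned above, but the -UserName parameter and the account are assumptions to verify with Get-Help Grant-SRDelegation.

# Delegate Storage Replica administration to a specific user (account name is a placeholder)
Grant-SRDelegation -UserName CONTOSO\sr-admin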
You can also schedule this tool to run periodically using a scheduled task. For more information on using VSS, review Vssadmin. There is no need or value in backing up the log volumes; attempts to do so will be ignored by VSS. Use of Windows Server Backup, Microsoft Azure Backup, Microsoft DPM, or other snapshot, VSS, virtual machine, or file-based technologies is supported by Storage Replica as long as they operate within the volume layer. Storage Replica does not support block-based backup and restore.
Related Topics
Storage Replica Overview
Stretch Cluster Replication Using Shared Storage
Server to Server Storage Replication
Cluster to Cluster Storage Replication
Storage Replica: Known Issues
See Also
Storage Overview
Storage Spaces Direct in Windows Server 2016
Advanced Data Deduplication settings
This document describes how to modify advanced Data Deduplication settings. For recommended workloads, the
default settings should be sufficient. The main reason to modify these settings is to improve Data Deduplication's
performance with other kinds of workloads.
The most common reason for changing when Data Deduplication jobs run is to ensure that jobs run during off
hours. The following step-by-step example shows how to modify the Data Deduplication schedule for a sunny day
scenario: a hyper-converged Hyper-V host that is idle on weekends and after 7:00 PM on weeknights. To change the
schedule, run the following PowerShell cmdlets in an Administrator context.
1. Disable the scheduled hourly Optimization jobs.
2. Remove the currently scheduled Garbage Collection and Integrity Scrubbing jobs.
3. Create a nightly Optimization job that runs at 7:00 PM with high priority and all the CPUs and memory
available on the system.
4. Create a weekly Garbage Collection job that runs on Saturday starting at 7:00 AM with high priority and all
the CPUs and memory available on the system.
5. Create a weekly Integrity Scrubbing job that runs on Sunday starting at 7 AM with high priority and all the
CPUs and memory available on the system.
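A minimal sketch of these five steps follows. The schedule names, start dates, and duration values are illustrative assumptions, and the sketch disables or removes whatever matching schedules exist rather than guessing default schedule names; the parameters used are the ones described in the list below.

# 1. Disable the scheduled hourly Optimization jobs
Get-DedupSchedule | Where-Object { $_.Type -eq 'Optimization' } | ForEach-Object { Set-DedupSchedule -Name $_.Name -Enabled $false }

# 2. Remove the currently scheduled Garbage Collection and Integrity Scrubbing jobs
Get-DedupSchedule | Where-Object { $_.Type -in 'GarbageCollection','Scrubbing' } | ForEach-Object { Remove-DedupSchedule -Name $_.Name }

# 3. Nightly Optimization job at 7:00 PM with high priority and all CPUs and memory (only the time of day in -Start matters)
New-DedupSchedule -Name "NightlyOptimization" -Type Optimization -Priority High -Cores 100 -Memory 100 -DurationHours 11 -Days @(1,2,3,4,5) -Start (Get-Date "2016-08-08 19:00:00")

# 4. Weekly Garbage Collection job on Saturday at 7:00 AM
New-DedupSchedule -Name "WeeklyGarbageCollection" -Type GarbageCollection -Priority High -Cores 100 -Memory 100 -DurationHours 23 -Days @(6) -Start (Get-Date "2016-08-13 07:00:00")

# 5. Weekly Integrity Scrubbing job on Sunday at 7:00 AM
New-DedupSchedule -Name "WeeklyIntegrityScrubbing" -Type Scrubbing -Priority High -Cores 100 -Memory 100 -DurationHours 23 -Days @(0) -Start (Get-Date "2016-08-14 07:00:00")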
Each entry below gives the parameter name, its definition, the accepted values, and why you would want to set it.

Type: The type of job that should be scheduled. Accepted values: Optimization, GarbageCollection, Scrubbing. This value is required because it is the type of job that you want to schedule, and it cannot be changed after the task has been scheduled.

Priority: The system priority of the scheduled job. Accepted values: High, Medium, Low. This value helps the system determine how to allocate CPU time; High uses more CPU time and Low uses less.

Days: The days that the job is scheduled. Accepted values: an array of integers 0-6 representing the days of the week (0 = Sunday, 1 = Monday, 2 = Tuesday, 3 = Wednesday, 4 = Thursday, 5 = Friday, 6 = Saturday). Scheduled tasks have to run on at least one day.

Cores: The percentage of cores on the system that a job should use. Accepted values: integers 0-100 (a percentage). Set this to control the level of impact a job will have on the compute resources of the system.

DurationHours: The maximum number of hours a job should be allowed to run. Accepted values: positive integers. Set this to prevent a job from running into a workload's non-idle hours.

Enabled: Whether the job will run. Accepted values: true/false. Set this to disable a job without removing it.

Full: Schedules a full Garbage Collection job. Accepted values: switch (true/false). By default, every fourth job is a full Garbage Collection job; with this switch, you can schedule full Garbage Collection to run more frequently.

InputOutputThrottle: Specifies the amount of input/output throttling applied to the job. Accepted values: integers 0-100 (a percentage). Throttling ensures that jobs don't interfere with other I/O-intensive processes.

Memory: The percentage of memory on the system that a job should use. Accepted values: integers 0-100 (a percentage). Set this to control the level of impact the job will have on the memory resources of the system.

Name: The name of the scheduled job. Accepted values: string. A job must have a uniquely identifiable name.

ReadOnly: Indicates that the scrubbing job processes and reports on the corruptions that it finds but does not run any repair actions. Accepted values: switch (true/false). Set this if you want to manually restore files that sit on bad sections of the disk.

Start: Specifies the time a job should start. Accepted values: System.DateTime. The date part of the System.DateTime provided to Start is irrelevant (as long as it's in the past), but the time part specifies when the job should start.

StopWhenSystemBusy: Specifies whether Data Deduplication should stop if the system is busy. Accepted values: switch (true/false). This switch gives you the ability to control the behavior of Data Deduplication; it is especially important if you want to run Data Deduplication while your workload is not idle.
The main reasons to modify the volume settings from the selected usage type are to improve read performance for
specific files (such as multimedia or other file types that are already compressed) or to fine-tune Data Deduplication
for better optimization for your specific workload. The following example shows how to modify the Data
Deduplication volume settings for a workload that most closely resembles a general purpose file server workload,
but uses large files that change frequently.
1. See the current volume settings for Cluster Shared Volume 1.
2. Enable OptimizePartialFiles on Cluster Shared Volume 1 so that the MinimumFileAge policy applies to
sections of the file rather than the whole file. This ensures that the majority of the file gets optimized even
though sections of the file change regularly.
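A minimal sketch of these two steps, assuming the volume is mounted at C:\ClusterStorage\Volume1 (the path is a placeholder):

# 1. View the current Data Deduplication settings for Cluster Shared Volume 1
Get-DedupVolume -Volume "C:\ClusterStorage\Volume1" | Format-List *

# 2. Enable OptimizePartialFiles so the MinimumFileAge policy applies to sections of a file rather than the whole file
Set-DedupVolume -Volume "C:\ClusterStorage\Volume1" -OptimizePartialFiles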
Each entry below gives the setting name, its definition, the accepted values, and why you would want to modify it.

ChunkRedundancyThreshold: The number of times that a chunk is referenced before the chunk is duplicated into the hotspot section of the Chunk Store. The value of the hotspot section is that so-called "hot" chunks that are referenced frequently have multiple access paths to improve access time. Accepted values: positive integers. The main reason to modify this number is to increase the savings rate for volumes with high duplication. In general, the default value (100) is the recommended setting, and you shouldn't need to modify this.

ExcludeFileType: File types that are excluded from optimization. Accepted values: an array of file extensions. Some file types, particularly multimedia files and files that are already compressed, do not benefit very much from being optimized. This setting allows you to configure which types are excluded.

ExcludeFolder: Specifies folder paths that should not be considered for optimization. Accepted values: an array of folder paths. If you want to improve performance or keep content in particular paths from being optimized, you can exclude certain paths on the volume from consideration for optimization.

InputOutputScale: Specifies the level of IO parallelization (IO queues) for Data Deduplication to use on a volume during a post-processing job. Accepted values: positive integers from 1 to 36. The main reason to modify this value is to decrease the impact on the performance of a high-IO workload by restricting the number of IO queues that Data Deduplication is allowed to use on a volume. Note that modifying this setting from the default may cause Data Deduplication's post-processing jobs to run slowly.

MinimumFileAgeDays: The number of days after a file is created before the file is considered to be in-policy for optimization. Accepted values: positive integers (including zero). The Default and HyperV usage types set this value to 3 to maximize performance on hot or recently created files. You may want to modify this if you want Data Deduplication to be more aggressive, or if you do not care about the extra latency associated with deduplication.

MinimumFileSize: The minimum file size that a file must have to be considered in-policy for optimization. Accepted values: positive integers (bytes) greater than 32 KB. The main reason to change this value is to exclude small files that may have limited optimization value, in order to conserve compute time.

NoCompressionFileType: File types whose chunks should not be compressed before going into the Chunk Store. Accepted values: an array of file extensions. Some types of files, particularly multimedia files and already compressed file types, may not compress well. This setting allows compression to be turned off for those files, saving CPU resources.

OptimizeInUseFiles: When enabled, files that have active handles against them are considered in-policy for optimization. Accepted values: true/false. Enable this setting if your workload keeps files open for extended periods of time. If this setting is not enabled, a file would never get optimized if the workload has an open handle to it, even if it is only occasionally appending data at the end.

The remaining two settings are system-wide registry settings rather than per-volume settings:

WlmMemoryOverPercentThreshold: This setting allows jobs to use more memory than Data Deduplication judges to actually be available. For example, a setting of 300 would mean that a job would have to use three times its assigned memory before it gets canceled. Accepted values: positive integers (a value of 300 means 300%, or 3 times). Set this value if you have another task that will stop if Data Deduplication takes too much memory.

DeepGCInterval: This setting configures the interval at which regular Garbage Collection jobs become full Garbage Collection jobs. A setting of n means that every nth job is a full Garbage Collection job. Note that full Garbage Collection is always disabled (regardless of the registry value) for volumes with the Backup usage type; Start-DedupJob -Type GarbageCollection -Full may be used if full Garbage Collection is desired on a Backup volume. Accepted values: integers (-1 indicates disabled). See this frequently asked question for more information.
This document discusses general guidelines for achieving the best possible performance of PowerShell 5.1. Some
of the issues described in this document may be addressed in future versions.
This document does not describe best practices.
The guidance in this document should be applied in a thoughtful manner.
Performance is often not an issue; saving 10 ms or 100 ms might go completely unnoticed.
Some guidance in this document describes atypical PowerShell usage that may be confusing and unfamiliar to
some PowerShell users.
The following topics provide specific guidance.
Script Authoring Considerations
Module Authoring Considerations
PowerShell scripting performance considerations
PowerShell scripts that leverage .NET directly and avoid the pipeline tend to be faster than idiomatic PowerShell.
Idiomatic PowerShell typically uses cmdlets and PowerShell functions heavily, often leveraging the pipeline, and
dropping down into .NET only when necessary.
NOTE
Many of the techniques described here are not idiomatic PowerShell and may reduce the readability of a PowerShell script.
Script authors are advised to use idiomatic PowerShell unless performance dictates otherwise.
Suppressing Output
There are many ways to avoid writing objects to the pipeline:
$null = $arrayList.Add($item)
[void]$arrayList.Add($item)
Assignment to $null or casting to [void] are roughly equivalent and should generally be preferred where
performance matters.
File redirection to $null is nearly as good as the previous alternatives; most scripts would never notice the difference. Depending on the scenario, file redirection does introduce a little overhead, though.
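The redirection example itself is not shown in this extract; using the same $arrayList as above, it would look like this:

$arrayList.Add($item) > $null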
$arrayList.Add($item) | Out-Null
Piping to Out-Null has significant overhead when compared to the alternatives. It should be avoided in performance-sensitive code.
$null = . {
$arrayList.Add($item)
$arrayList.Add(42)
}
Introducing a script block and calling it (using dot sourcing or otherwise), then assigning the result to $null, is a convenient technique for suppressing the output of a large block of script. This technique performs roughly as well as piping to Out-Null, so it should also be avoided in performance-sensitive script. The extra overhead in this example comes from creating and invoking a script block that was previously inline script.
Array Addition
Generating a list of items is often done using an array with the addition operator:
$results = @()
$results += Do-Something
$results += Do-SomethingElse
$results
This can be very inefficient because arrays are immutable. Each addition to the array actually creates a new array
big enough to hold all elements of both the left and right operands, then copies the elements of both operands into
the new array. For small collections, this overhead may not matter. For large collections, this can definitely be an
issue.
There are a couple of alternatives. If you don't actually require an array, instead consider using an ArrayList:
$results = [System.Collections.ArrayList]::new()
$results.AddRange((Do-Something))
$results.AddRange((Do-SomethingElse))
$results
If you do require an array, you can use your own ArrayList and simply call ArrayList.ToArray when you want the
array. Alternatively, you can let PowerShell create the ArrayList and Array for you:
$results = @(
Do-Something
Do-SomethingElse
)
In this example, PowerShell creates an ArrayList to hold the results written to the pipeline inside the array
expression. Just before assigning to $results , PowerShell converts the ArrayList to an object[] .
Processing a large file with the pipeline, for example by reading it with Get-Content and filtering lines with Where-Object, can be nearly an order of magnitude slower than using .NET APIs directly:
try
{
    # Read the file directly with .NET instead of Get-Content
    $stream = [System.IO.StreamReader]::new($path)

    # Compare against $null explicitly so that empty lines do not end the loop early
    while ($null -ne ($line = $stream.ReadLine()))
    {
        if ($line.Length -gt 10)
        {
            $line
        }
    }
}
finally
{
    # Always release the file handle, even if an error occurs
    $stream.Dispose()
}
Avoid Write-Host
It is generally considered poor practice to write output directly to the console, but when it makes sense, many
scripts use Write-Host .
If you must write many messages to the console, Write-Host can be an order of magnitude slower than
[Console]::WriteLine() .
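A rough, illustrative way to compare the two yourself; the numbers vary by host and hardware, and [Console]::WriteLine() assumes a console host such as powershell.exe:

# Compare the cost of writing many lines to the console
Measure-Command { 1..10000 | ForEach-Object { Write-Host $_ } }
Measure-Command { 1..10000 | ForEach-Object { [Console]::WriteLine($_) } }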
PowerShell module authoring considerations
This document includes some guidelines related to how a module is authored for best performance.
If the module does not export commands of a particular type, specify this explicitly in the manifest by
specifying @() . A missing or $null entry is equivalent to specifying the wildcard * .
The following should be avoided where possible:
@{
    FunctionsToExport = '*'
    # CmdletsToExport and AliasesToExport are omitted here; a missing entry behaves like the wildcard '*'
}
Instead, use:
@{
    FunctionsToExport = 'Format-Hex', 'Format-Octal'
    CmdletsToExport = @()   # Specify an empty array, not $null
    AliasesToExport = @()   # Also ensure all three entries are present
}
Avoid CDXML
When deciding how to implement your module, there are three primary choices:
Binary (usually C#)
Script (PowerShell)
CDXML (an XML file wrapping CIM)
If the speed of loading your module is important, note that CDXML is roughly an order of magnitude slower to load than a binary module.
A binary module loads the fastest because it is compiled ahead of time to IL, and NGen can generate a native image once per machine so the module does not need to be JIT compiled at load time.
A script module typically loads a bit more slowly than a binary module because PowerShell must parse the script
before compiling and executing it.
A CDXML module is typically much slower than a script module because it must first parse an XML file which then
generates quite a bit of PowerShell script that is then parsed and compiled.
Additional performance tuning resources
Use the links in this topic to learn more about the concepts that were discussed in this tuning guide.