Module 4 Implementing Storage Spaces and Data Deduplication
Contents:
Module Overview
Lesson 1: Implementing Storage Spaces
Lesson 2: Managing Storage Spaces
Lab A: Implementing Storage Spaces
Lesson 3: Implementing Data Deduplication
Lab B: Implementing Data Deduplication
Module Review and Takeaways
Module Overview
The Windows Server 2016 operating system introduces a number of storage
technologies and improvements to existing storage technologies. You can use Storage
Spaces, a feature of Windows Server 2016, to virtualize and provision storage based on
storage pools and virtual disks which abstract the physical storage from the operating
system. Data Deduplication is a feature that you can use to find and remove duplicate
data while maintaining your data’s integrity. This module describes how to use these
two new features within your Windows Server storage architecture.
Objectives
• Describe and implement the Storage Spaces feature in the context of enterprise storage needs.
• Manage and maintain Storage Spaces.
• Describe and implement Data Deduplication.
Lesson Objectives
After completing this lesson, you will be able to:
• Implement Storage Spaces as an enterprise storage solution.
• Describe the Storage Spaces feature and its components.
• Describe the features of Storage Spaces, including storage layout, drive allocation, and provisioning schemes such as thin provisioning.
• Describe changes to the Storage Spaces feature in Windows Server 2016.
• Describe common usage scenarios for storage spaces, and weigh their benefits and limitations.
• Compare using Storage Spaces to using other storage solutions.
In most organizations, discussions about storage needs can be difficult. This is typically
because storage costs are a major item in many information technology (IT) budgets.
Despite the decreasing cost of individual units of storage, the amount of data that
organizations produce continues to grow rapidly, so the overall cost of storage continues
to grow.
Finally, you should consider using disaggregated compute and storage deployments
when planning how to lower the costs of delivering Infrastructure as a Service (IaaS)
storage services. While many converged compute/storage solutions provide simpler
management, they also require scaling both components simultaneously. In other words,
you might have to add compute capacity in the same ratio as the existing hardware
whenever you expand storage. To achieve lower costs of delivering IaaS storage
services, you should consider independent management and independent scaling when
planning your storage solution.
While your requirements might dictate which advanced features to consider during your
storage planning, the primary drivers when assessing storage solutions are typically
capacity, performance, cost, and resiliency. Although you could have lengthy
discussions about each of these drivers separately, your storage solution needs to take a
balanced deployment approach.
When planning your balanced storage deployment approach to meet your storage needs,
you will need to assess your capacity and performance requirements in relation to your
cost. For cost efficiency, your storage environment should utilize solid-state disks
(SSDs) for highly active data (higher performance for the cost) and hard disk drives
(HDDs) for data accessed infrequently (higher capacity for the cost).
If you deploy only HDDs, your budget constraints will prevent you from meeting your
performance requirements; this is because HDDs provide higher capacity, but with
lower performance. Likewise, if you deploy only SSDs, your budget constraints will
prevent you from meeting your capacity requirements; this is because SSDs provide
higher performance, but with lower capacity. As a result, your balanced storage
deployment approach will most likely include a mix of HDDs and SSDs to achieve the
best performance and capacity at the appropriate cost.
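For example, using purely illustrative numbers: if measurements show that only about 2 TB of a 20 TB data set is hot, a deployment with roughly 2 TB of SSD capacity for the hot data and 18 TB of HDD capacity for the cold data can serve most I/O from the faster tier while costing far less than an all-SSD deployment of the same total capacity.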
Included in your storage planning, you should consider whether your storage solution
needs to support the common capabilities of most storage products, such as:
• Mirror/parity support
• Data striping
• Enclosure awareness
• Storage tiering
• Storage replication
• Data deduplication
• Data encryption
• Performance analysis
Note: This list is only meant to provide suggestions and is not an exhaustive list of the
common capabilities of most storage products. The storage requirements of your
organization might differ.
The growth in the size of data volumes, the ever-increasing cost of storage, and the need
to ensure high availability of data volumes can be difficult problems for IT departments
to solve. Windows Server 2016 provides a number of storage features that aim to
address these important facets of storage management.
Question: Which factors should you consider when planning your enterprise storage
strategy?
Storage Spaces is a storage virtualization feature built into Windows Server 2016 and
Windows 10.
• Storage pools. A storage pool is a collection of physical disks aggregated into a
single logical disk, allowing you to manage the multiple physical disks as a single
disk. You can use Storage Spaces to add physical disks of any type and size to a
storage pool.
• Storage spaces. Storage spaces are virtual disks created from free space in a storage
pool. Storage spaces have attributes such as resiliency level, storage tiers, fixed
provisioning, and precise administrative control. The primary advantage of storage
spaces is that you no longer need to manage single disks. Instead, you can manage
them as one unit. Virtual disks are the equivalent of a logical unit number (LUN) on a
SAN.
Note: The virtual disks that you create with the Storage Spaces feature are not the
same as the virtual hard disk files that have the .vhd and .vhdx file extensions.
• Physical disks. Physical disks are disks such as Serial Advanced Technology
Attachment (SATA) or serial-attached SCSI (SAS) disks. If you want to add physical
disks to a storage pool, the disks must adhere to the following requirements:
Note: When planning your Storage Spaces deployment, you need to verify whether the
storage enclosure is certified for Storage Spaces in Windows Server 2016. For Storage
Spaces to identify disks by slot and use the array’s failure and identify/locate lights, the
array must support SCSI Enclosure Services (SES) version 3.
Additional Reading: For more information, refer to: “Windows Server Catalog” at:
http://aka.ms/Rdpiy8
You can format a storage space virtual disk with a FAT32 file system, New
Technology File System (NTFS) file system, or Resilient File System (ReFS). You will
need to format the virtual disk with NTFS if you plan to use the storage space as part of
a Cluster Shared Volume (CSV), for Data Deduplication, or with File Server
Resource Manager (FSRM).
The following features describe the characteristics of a storage space:
Storage layout. Storage layout is one of the characteristics that defines the number of
disks from the storage pool that are allocated. Valid options include simple, mirror, and
parity.
Note: The number of columns for a given storage space can also impact the number of
disks.
Disk sector size. A storage pool's sector size is set the moment it is created. Its default
sizes are set as follows:
• If the list of drives being used contains only 512 and 512e drives, the pool sector
size is set to 512e. A 512 disk uses 512-byte sectors. A 512e drive is a hard disk
with 4,096-byte sectors that emulates 512-byte sectors.
• If the list contains at least one 4-kilobyte (KB) drive, the pool sector size is set to
4 KB.
Cluster disk requirement. Failover clustering prevents work interruptions if there is a
computer failure. For a pool to support failover clustering, all drives in the pool must
support SAS.
Drive allocation. Drive allocation defines how the drive is allocated to the pool. Options
are automatic (the default), manual, and hot spare.
When creating pools, Storage Spaces can use any direct-attached storage (DAS) device.
You can use SATA and SAS drives (or even older integrated drive electronics [IDE] and
SCSI drives) that are connected internally to the computer. When planning your Storage
Spaces storage subsystems, you must consider the following factors:
• Fault tolerance. Do you want data to be available in case a physical disk fails? If so,
you must use multiple physical disks and provision virtual disks by using mirroring
or parity.
• Performance. You can improve performance for read and write actions by using a
simple (striped) layout for virtual disks. You also need to consider the speed of each
individual physical disk when determining performance. Alternatively, you can use
disks of different types to provide a tiered system for storage. For example, you can
use SSDs for data to which you require fast and frequent access, and use SATA
drives for data that you do not access as frequently.
• Reliability. Virtual disks in parity layout provide some reliability. You can improve
that degree of reliability by using hot spare physical disks in case a physical disk
fails.
• Extensibility. One of the main advantages of using Storage Spaces is the ability to
expand storage in the future by adding physical disks. You can add physical disks to a
storage pool any time after you create it to expand its storage capacity or to provide
fault tolerance.
The following file and storage services features are new or improved in Windows
Server 2016:
• Storage Spaces Direct. This feature enables you to build highly available storage
systems by using storage nodes with only local storage. You will learn more about
this feature later in this module.
• Storage Replica. This new feature in Windows Server 2016 enables replication—
between servers or clusters that are in the same location or different sites—for
disaster recovery. Storage Replica includes both synchronous and asynchronous
replication for shorter or longer distances between sites. This enables you to achieve
storage replication at a lower cost.
• Storage Quality of Service (QoS). With this feature, you can create centralized QoS
policies on a Scale-Out File Server and assign them to virtual disks on Hyper-V
virtual machines. QoS ensures that performance for the storage adapts to meet
policies as the storage load changes.
• Data Deduplication. This feature was introduced in Windows Server 2012 and is
improved in Windows Server 2016 in the following areas (more information about
Data Deduplication is covered later in this module):
o Support for volume sizes up to 64 terabytes (TB). The feature has been redesigned
in Windows Server 2016 and is now multithreaded and able to utilize multiple
CPUs per volume to increase optimization throughput rates on volume sizes up to
64 TB.
o Support for file sizes up to 1 TB. With the use of new stream map structures and
other improvements to increase optimization throughput and access performance,
deduplication in Windows Server 2016 performs well on files up to 1 TB.
o Simplified deduplication configuration for virtualized backup applications. In
Windows Server 2016, the configuration of deduplication for virtualized backup
applications is simplified when enabling deduplication for a volume.
o Support for Nano Server. A new deployment option in Windows Server 2016,
Nano Server fully supports Data Deduplication.
o Support for cluster rolling upgrades. You can upgrade each node in an existing
Windows Server 2012 R2 cluster to Windows Server 2016 without incurring
downtime to upgrade all the nodes at once.
• Server Message Block (SMB) hardening improvements. In Windows Server 2016,
client connections to the Active Directory Domain Services default SYSVOL and
NETLOGON shares on domain controllers now require SMB signing and mutual
authentication (e.g., Kerberos authentication). This change reduces the likelihood of
man-in-the-middle attacks. If SMB signing and mutual authentication are
unavailable, a Windows Server 2016 computer won't process domain-based Group
Policy and scripts.
Note: The registry values for these settings aren't present by default; however, the
hardening rules still apply until Group Policy or other registry values override them.
Windows Server 2012 R2 and Windows Server 2012 offered several new and improved
file and storage services features over their predecessors, including:
When considering whether to use Storage Spaces in a given situation, you should weigh
the following benefits and limitations. The Storage Spaces feature was designed to
enable storage administrators to:
There are, however, inherent limitations in Storage Spaces. For example, in Windows
Server 2016, the following are some of the limitations that you should consider when
planning:
When planning for the reliability of a particular workload in your environment, note that
Storage Spaces provides different resiliency types. As a result, some workloads are better
suited to specific resiliency types. The following table depicts the recommended
workload types for each resiliency type.
Resiliency type | Number of data copies maintained | Workload recommendations
Mirror | 2 (two-way mirror) or 3 (three-way mirror) | All workloads
Parity | 2 (single parity) or 3 (dual parity) | Sequential workloads with large units of read/write, such as archival
Simple | 1 | Workloads that do not need resiliency, or that provide an alternate resiliency mechanism
Storage Spaces Direct removes the need for a shared SAS fabric, simplifying
deployment and configuration. Instead, it uses the existing network as a storage fabric,
leveraging SMB 3.0 and SMB Direct for high-speed, low-latency CPU efficient storage.
To scale out, you simply add more servers to increase storage capacity and I/O
performance.
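For example, assuming an existing failover cluster named S2DCluster whose nodes have eligible local disks, a minimal sketch of enabling Storage Spaces Direct and creating a volume might look like the following (the cluster and volume names are only placeholders):
# Enable Storage Spaces Direct on the cluster; this claims the eligible local disks into a pool
Enable-ClusterStorageSpacesDirect -CimSession S2DCluster
# Create a resilient volume from the automatically created pool and format it for CSV use
New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "VDisk01" -FileSystem CSVFS_ReFS -Size 1TB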
Storage Spaces Direct can be deployed to provide either primary storage for Hyper-V
virtual machine (VM) files or secondary storage for Hyper-V Replica virtual machine
files. In Windows Server 2016, both options provide storage for Hyper-V, specifically
focusing on Hyper-V IaaS (Infrastructure as a Service) for service providers and
enterprises.
You also can deploy Storage Spaces Direct in support of SQL Server 2012 or newer,
which can store both system and user database files. SQL Server is configured to store
these files on SMB 3.0 file shares for both stand-alone and clustered instances of SQL
Server. The database server accesses the Scale-Out File Server (SOFS) over the network
by using the SMB 3.0 protocol. This scenario requires Windows Server 2012 or newer
on both the file servers and the database servers.
Note: Storage Spaces does not support Exchange Server workloads currently.
You can use Storage Spaces inside an Azure virtual machine to combine multiple
virtual hard drives, creating more storage capacity or performance than is available from
a single Azure virtual hard drive. There are three supported scenarios for using Storage
Spaces in Azure virtual machines, but there are some limitations and best practices that
you should follow, as described below.
Multi-tenant scenarios
You can provide delegation of administration of storage pools through access control
lists (ACLs). You can delegate on a per-storage-pool basis, thereby supporting hosting
scenarios that require tenant isolation. Because Storage Spaces uses the Windows
security model, it can be integrated fully with Active Directory Domain Services.
Storage Spaces can be made visible only to a subset of nodes in the file cluster. This can
be used in some scenarios to leverage the cost and management advantage of larger
shared clusters and to segment those clusters for performance or access purposes.
Additionally, you can apply ACLs at various levels of the storage stack (for example,
file shares, CSV, and storage spaces). In a multitenant scenario, this means that the full
storage infrastructure can be shared and managed centrally and that you can design
dedicated and controlled access to segments of the storage infrastructure. You can
configure a particular customer to have LUNs, storage pools, storage spaces, cluster
shared volumes, and file shares dedicated to them, and ACLs can ensure only that the
tenant has access to them.
Additionally, by using SMB Encryption, you can ensure all access to the file-based
storage is encrypted to protect against tampering and eavesdropping attacks. The biggest
benefit of using SMB Encryption over more general solutions, such as IPsec, is that
there are no deployment requirements or costs beyond changing the SMB settings on
the server. The encryption algorithm used is AES-CCM, which also provides data
integrity validation.
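For example, a minimal sketch of requiring SMB Encryption with Windows PowerShell (the share name is only an illustration):
# Require encryption for a single tenant-facing share
Set-SmbShare -Name "TenantAData" -EncryptData $true -Force
# Or require encryption for every share on the server
Set-SmbServerConfiguration -EncryptData $true -Force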
Question: What are the advantages of using Storage Spaces compared to using SANs or
NAS?
Question: What are the disadvantages of using Storage Spaces compared to using
SANs or NAS?
Lesson Objectives
After completing this lesson, you will be able to:
• Describe how to manage Storage Spaces.
• Explain how to use Storage Spaces to mitigate storage failure.
• Explain how to expand your storage pool.
• Describe how to use event logs and performance counters to monitor Storage Spaces.
Storage Spaces is integrated with failover clustering for high availability, and integrated
with cluster shared volumes (CSV) for SOFS deployments. You can manage Storage
Spaces by using:
• Server Manager
• Windows PowerShell
• Failover Cluster Manager
• System Center Virtual Machine Manager
• Windows Management Instrumentation (WMI)
Server Manager provides you with the ability to perform basic management of virtual
disks and storage pools. In Server Manager, you can create storage pools; add and
remove physical disks from pools; and create, manage, and delete virtual disks. For
example, in Server Manager you can view the physical disks that are attached to a
virtual disk. If any of these disks are unhealthy, you will see an unhealthy disk icon next
to the disk name.
Windows PowerShell provides advanced management options for virtual disks and
storage pools. The following table lists some examples of management cmdlets.
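For example, the following commonly used cmdlets create and maintain pools and virtual disks; the pool, disk, and size values shown are placeholders:
# List storage pools with their health and operational status
Get-StoragePool
# Create a pool from all physical disks that are eligible for pooling
New-StoragePool -FriendlyName "StoragePool1" -StorageSubSystemFriendlyName (Get-StorageSubSystem).FriendlyName -PhysicalDisks (Get-PhysicalDisk -CanPool $true)
# Create a two-way mirrored, thinly provisioned virtual disk in the pool
New-VirtualDisk -StoragePoolFriendlyName "StoragePool1" -FriendlyName "Data1" -ResiliencySettingName Mirror -Size 100GB -ProvisioningType Thin
# Repair a degraded virtual disk
Repair-VirtualDisk -FriendlyName "Data1"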
Additional Reading: For more information, refer to: “Storage Cmdlets in Windows
PowerShell” at: http://aka.ms/po9qve
To use Storage Spaces cmdlets in Windows PowerShell, you must download the
StorageSpaces module for use in Windows Server 2016. For more information, refer to:
“Storage Spaces Cmdlets in Windows PowerShell” at: http://aka.ms/M1fccp
When planning for storage tiering, you should assess the workload characteristics of
your storage environment so that you can store your data most cost-effectively
depending on how you use it. In Windows Server 2016, the server automatically
optimizes your storage performance by transparently moving the data that's accessed
more frequently to your faster solid state drives (the SSD tier) and moving less active
data to your less expensive, but higher capacity, hard disk drives (the HDD tier).
In many environments, the most common workload characteristics include a large data
set, the majority of which is typically cold. Cold, or cool, data consists of files that you
access infrequently and that have a longer lifespan. In contrast, the most common
workload characteristics also include a smaller portion of the data that is typically hot.
Hot data, commonly referred to as the working set, consists of files that you are working
on currently; this part of the data set is highly active and changes over time.
Note: The storage tiers optimization process moves data, not files; the data is mapped
and moved at a sub-file level. For example, if only 30 percent of the data on a virtual
hard disk is hot, only that 30 percent of the data is moved to your SSD tier.
Additionally, when planning for storage tiering, you should assess if there are situations
in which a file works best when placed in a specific tier. For example, you need to place
an important file in the fast tier, or you need to place a backup file in the slow tier. For
these situations, your storage solution might have the option to assign a file to a
particular tier, also referred to as pinning the file to a tier.
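With Storage Spaces, for example, you can pin a file by using the Set-FileStorageTier cmdlet and then move the data immediately by optimizing the volume; the file path and tier name below are assumptions:
# Pin a master VDI image to the SSD tier of a tiered virtual disk
Set-FileStorageTier -FilePath "E:\VMs\GoldImage.vhdx" -DesiredStorageTierFriendlyName "TieredSpace_SSDTier"
# Run tier optimization so that the pinned data moves to its assigned tier right away
Optimize-Volume -DriveLetter E -TierOptimize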
Before you create storage spaces, plan ahead and give yourself room to fine-tune the
storage spaces after you observe your workloads in action. After observing input/output
operations per second (IOPS) and latency, you will be able to predict the storage
requirements of each workload more accurately. Here are some recommendations when
planning ahead:
• Don't allocate all available SSD capacity for your storage spaces immediately. Keep
some SSD capacity in the storage pool in reserve, so you can increase the size of an
SSD tier when a workload demands it.
• Don't pin files to storage tiers until you see how well Storage Tiers Optimization can
optimize storage performance. When a tenant or workload requires a particular level
of performance, you can pin files to a storage tier to ensure that all I/O activity is
performed on that tier.
• Do consider pinning the parent VHDX file to the SSD tier if you're providing pooled
desktops through VDI. If you have deployed a Virtual Desktop Infrastructure (VDI)
to provide pooled desktops for users, you should consider pinning the master image
that's used to clone users' desktops to the SSD tier.
You should use the Storage Tier Optimization Report when observing or monitoring
your workloads. This report is used to check the performance of your storage tiers and
identify the changes that might optimize their performance. As part of the performance
analysis, the report provides data for answering questions such as, “How large is my
working set?” and “How much do I gain by adding SSD capacity?”
Additional Reading: For more information, refer to: “Monitoring Storage Tiers
Performance” at: http://aka.ms/Sz4zfi
To help avoid problems caused by failing hardware, your storage plan should account
for the types and number of failures that might occur in your environment. You should
also plan how your solution should handle each fault without service interruption.
• Design a complete, fault-tolerant storage solution. For example, if you want your
storage solution to be able to tolerate a single fault at any level, you need this
minimum setup:
o All storage spaces in the storage pool must use fixed provisioning.
o Two-way mirror spaces must use three or more physical disks.
o Three-way mirror spaces must use five or more physical disks.
o All physical disks in a clustered pool must be connected by using SAS.
o All physical disks must support persistent reservations and pass the failover cluster
validation tests.
Note: The SAS JBOD must be physically connected to all cluster nodes that will
use the storage pool. Direct attached storage that is not connected to all cluster
nodes is not supported for clustered storage pools with Storage Spaces.
• Unless you deployed a highly available storage pool, import a storage pool on another
server if the system fails. In Windows Server 2016, Storage Spaces writes the
configuration about the storage pool directly to the disks. Therefore, if the single-
point-of-failure system fails and the server hardware requires replacement or a
complete reinstall, you can mount a storage pool on another server.
• Most problems with Storage Spaces occur because of incompatible hardware or
because of firmware issues. To reduce problems, follow these best practices:
o Use only certified SAS-connected JBODs. These enclosure models have been
tested with Storage Spaces and enable you to identify the enclosure and slot for a
physical disk easily.
o Don't mix and match disk models within a JBOD. Use one model of solid-state
drive (SSD) and one model of HDD for all disks in a JBOD (assuming that you are
using storage tiers), and make sure that the disks are fully compatible with the
JBOD model.
o Install the latest firmware and driver versions on all disks. Install the firmware
version that is listed as approved for the device in the Windows Server Catalog or
is recommended by your hardware vendor. Within a JBOD, it's important that all
disks of the same model have the same firmware version.
o Follow the vendor's recommendations for disk placement. Install disks in the slots
recommended by your hardware vendor. JBODs often have different requirements
for placement of SSDs and HDDs, for cooling and other reasons.
• Unless you enabled hot spares, retire missing disks automatically. The default policy
for handling a physical disk that goes missing from a storage pool (-
RetireMissingPhysicalDisks = Auto) simply marks the disk as missing (Lost
Communication), and no repair operation on the virtual disks takes place. This
policy avoids potentially I/O-intensive virtual disk repairs if a disk temporarily goes
offline, but the storage pool health will remain degraded, compromising resiliency if
another disk fails before an administrator takes action. Unless you are using hot
spares, we recommend that you change the RetireMissingPhysicalDisks policy to
Always, to initiate virtual disk repair operations automatically if a disk loses
communication with the system, restoring the health of the pool and the dependent
storage spaces as soon as possible.
• Always replace the physical disk before you remove the drive from the storage pool.
Changing the storage pool configuration before you replace the physical disk in the
enclosure can cause an I/O failure or initiate virtual disk repair, which can result in a
“STOP 0x50” error and potential data loss.
• As a general rule, keep unallocated disk space in the pool for virtual disk repairs
instead of using hot spares. In Windows Server 2016, you have the option to use
available capacity on existing disks in the pool for disk repair operations instead of
bringing a hot spare online. This enables Storage Spaces to automatically repair
storage spaces with failed disks by copying data to multiple disks in the pool,
significantly reducing the time it takes to recover from the failed disk when compared
with using hot spares, and it lets you use the capacity on all disks instead of setting
aside hot spares.
o To correct a failed disk in a virtual disk or storage pool, you must remove the disk
that is causing the problem. Actions such as defragmenting, scan disk, or using
chkdsk cannot repair a storage pool.
o To replace a failed disk, you must add a new disk to the pool. The new disk
resynchronizes automatically when disk maintenance occurs during daily
maintenance. Alternatively, you can trigger disk maintenance manually.
• When you configure column counts, make sure you have enough physical disks to
support automatic virtual disk repairs. Typically, you should configure the virtual
disk with 3-4 columns for a good balance of throughput and low latency. Increasing
the column count increases the number of physical disks across which a virtual disk is
striped, which increases throughput and IOPS for that virtual disk. However,
increasing the column count can also increase latency. For this reason, you should
optimize overall cluster performance by using multiple virtual disks with 3-4
columns (when using mirrors) or seven columns when using parity spaces. The
performance of the entire cluster remains high because multiple virtual disks are used
in parallel, making up for the reduced column count.
• Be prepared for multiple disk failures. If you purchased all of the disks in an
enclosure at the same time, the disks are the same age, and the failure of one disk
might be followed fairly quickly by other disk failures. Even if the storage spaces
return to health after the initial disk repairs, you should replace the failed disk as soon
as possible to avoid the risk of additional disk failures, which might compromise
storage health and availability and risk data loss. If you want to be able to delay disk
repairs safely until your next scheduled maintenance, configure your storage spaces
to tolerate two disk failures.
• Provide fault tolerance at the enclosure level. If you need to provide an added level of
fault tolerance at the enclosure level, deploy multiple, compatible JBODs that
support enclosure awareness. In an enclosure-aware storage solution, Storage Spaces
writes each copy of data to a specific JBOD enclosure. As a result, if one enclosure
fails or goes offline, the data remains available in one or more alternate enclosures.
To use enclosure awareness with Storage Spaces, your environment must meet the
following requirements:
One of the main benefits of using Storage Spaces is the ability to expand your storage
pool by adding additional storage. Occasionally, however, you must investigate the way
in which storage is being used across the disks in your pool before you are able to
extend the storage. This is because the blocks for your various virtual disks are
distributed across the physical disks in the storage pool in a configuration that is based
on the storage layout options that you selected when creating the pool. Depending upon
the specifics, you might not be able to extend the storage, even if there is available
space in the pool.
Example
In the first illustration, a storage pool consists of five disks, where disk1 is larger than
the others. Space is consumed across all five disks by vdisk1, while vdisk2 consumes
space only on disks 1 through 3.
FIGURE 4.1: A STORAGE POOL CONSISTING OF FIVE DISKS
In the second illustration, a sixth disk has been added to the storage pool.
Before you add storage to a storage pool, you must determine the current distribution of
blocks across the devices by determining column usage. To do this, you can use the
Windows PowerShell cmdlet Get-VirtualDisk.
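For example, the following command returns the layout information that is relevant when deciding whether a virtual disk can be extended:
Get-VirtualDisk | Select-Object FriendlyName, ResiliencySettingName, NumberOfColumns, NumberOfDataCopies, Size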
Note: For more information, refer to: “Storage Spaces Frequently Asked Questions
(FAQ)” at: http://aka.ms/knx5zg
After you determine column usage where necessary, you can expand your storage pool
using one of these options:
• Server Manager. Open Server Manager, select File and Storage Services, and then
click Storage Pools. You can add a physical disk by right-clicking the pool, and then
clicking Add Physical Disk.
• Windows PowerShell. You can use the Windows PowerShell cmdlet Add-PhysicalDisk
to add a physical disk to the storage pool. For example:
Add-PhysicalDisk –VirtualDiskFriendlyName UserData –PhysicalDisks (Get-PhysicalDisk -FriendlyName PhysicalDisk3, PhysicalDisk4)
a. To return a list of storage pools with their current health and operational status,
run the following command:
Get-StoragePool
b. To return more information about StoragePool1, run the following command:
Get-StoragePool StoragePool1 | fl
c. To return detailed information about your virtual disks, including provisioning
type, parity layout, and health, run the following command:
Get-VirtualDisk | fl
d. To return a list of physical disks that can be pooled, run the following command:
Get-PhysicalDisk | Where {$_.canpool –eq “true”}
When problems are identified in the storage architecture, Storage Spaces generates
errors, and then logs these errors to the Event Log. You can access these events by
using the Event Log tool, or by accessing the recorded errors by using Server Manager
or Windows PowerShell cmdlets. The following table identifies common Event IDs
associated with problematic storage.
Performance monitoring
Most decisions that you make regarding the configuration of your storage architecture
have an impact on the performance of your storage architecture. This is also true for
using Storage Spaces to implement your storage architecture. Performance is better or
worse because of the balance between multiple factors including cost, reliability,
availability, power, and ease-of-use.
There are multiple components that handle storage requests within your storage
architecture, including:
You can use Windows PowerShell and Performance Monitor to monitor the
performance of your storage pools. If you want to use Windows PowerShell, you must
install the Storage Spaces Performance Analysis module for Windows PowerShell.
Note: To download the “Storage Spaces Performance Analysis module for Windows
PowerShell” module, go to: http://aka.ms/b1d52u
Measure-StorageSpacesPhysicalDiskPerformance -StorageSpaceFriendlyName StorageSpace1 -MaxNumberOfSamples 60 -SecondsBetweenSamples 2 -ReplaceExistingResultsFile -ResultsFilePath StorageSpace1.blg -SpacetoPDMappingPath PDMap.csv
This cmdlet:
• Monitors the performance of all physical disks associated with the storage space named StorageSpace1.
• Captures performance data for 60 seconds at two-second intervals.
• Replaces the results files if they already exist.
• Stores the performance log in the file named StorageSpace1.blg.
• Stores the physical disk mapping information in a file named PDMap.csv.
You can use Performance Monitor to view the data collected in the two files specified in
the cmdlet above, named StorageSpace1.blg and PDMap.csv.
A. Datum Corporation has purchased a number of hard disk drives and SSDs, and you
have been tasked with creating a storage solution that can utilize these new devices to
the fullest. With mixed requirements at A. Datum for data access and redundancy, you
must ensure that you have a redundancy solution for critical data that does not require
fast disk read and write access. You also must create a solution for data that does
require fast read and write access.
You decide to use Storage Spaces and storage tiering to meet the requirements.
Objectives
Lab Setup
Password: Pa55w.rd
For this lab, you need to use the available virtual machine environment. Before you
begin the lab, you must complete the following steps:
Your server does not have a hardware-based RAID card, but you have been asked to
configure redundant storage. To support this feature, you must create a storage pool.
After creating the storage pool, you must create a redundant virtual disk. Because the
data is critical, the request for redundant storage specifies that you must use a three-way
mirrored volume. Shortly after the volume is in use, a disk fails, and you have to replace
it by adding another disk to the storage pool.
1. Create a storage pool from six disks that are attached to the server.
2. Create a three-way mirrored virtual disk (need at least five physical disks).
3. Copy a file to the volume, and verify it is visible in File Explorer.
4. Remove a physical drive to simulate drive failure.
5. Verify that the file is still available.
6. Add a new disk to the storage pool and remove the broken disk.
Task 1: Create a storage pool from six disks that are attached to the server
o Name: StoragePool1
o Physical disks: first 6 disks.
Task 2: Create a three-way mirrored virtual disk (need at least five physical disks)
Task 3: Copy a file to the volume, and verify it is visible in File Explorer
Task 4: Remove a physical drive to simulate drive failure
• On the host computer, in Hyper-V Manager, in the Virtual Machines pane, change
the 20740B-LON-SVR1 settings to the following:
o Remove the hard drive that begins with 20740B-LON-SVR1-Disk1.
Task 5: Verify that the file is still available
1. Switch to LON-SVR1.
2. Open File Explorer, and then go to H:\.
3. Verify that write.exe is still available.
4. In Server Manager, in the STORAGE POOLS pane, on the menu bar, click
Refresh “Storage Pools”.
Note: Notice the warning that is visible next to Mirrored Disk.
5. Open the Mirrored Disk Properties dialog box, and then access the Health pane.
Note: Notice that the Health Status indicates a warning. The Operational Status
should indicate one or more of the following: Incomplete, Unknown, or Degraded.
6. Close the Mirrored Disk Properties dialog box.
Task 6: Add a new disk to the storage pool and remove the broken disk
a. Get-PhysicalDisk
Note: Note the FriendlyName for the disk that shows an OperationalStatus of
Lost Communication. Use this disk name in the next command in place of
diskname.
b. $Disk = Get-PhysicalDisk -FriendlyName ‘diskname’
c. Remove-PhysicalDisk -PhysicalDisks $disk -StoragePoolFriendlyName StoragePool1
4. In Server Manager, refresh the storage pools view to see the warnings disappear.
Results: After completing this exercise, you should have successfully created a storage
pool and added five disks to it. Additionally, you should have created a three-way
mirrored, thinly-provisioned virtual disk from the storage pool. You also should have
copied a file to the new volume and then verified that it is accessible. Next, after
removing a physical drive, you should have verified that the virtual disk was still
available and that you could access it. Finally, you should have added another physical
disk to the storage pool.
Management wants you to implement storage tiers to take advantage of the high-
performance attributes of a number of SSDs, while utilizing less expensive hard disk
drives for less frequently accessed data.
1. Use the Get-PhysicalDisk cmdlet to view all available disks on the system.
2. Create a new storage pool.
3. View the media types.
4. Specify the media type for the sample disks and verify that the media type is changed.
5. Create pool-level storage tiers by using Windows PowerShell.
6. Create a new virtual disk with storage tiering by using the New Virtual Disk Wizard.
7. Prepare for the next lab.
Task 1: Use the Get-PhysicalDisk cmdlet to view all available disks on the system
• On LON-SVR1, in Windows PowerShell (Admin), run the following command:
Get-PhysicalDisk
Task 4: Specify the media type for the sample disks and verify that the media type
is changed
Task 5: Create pool-level storage tiers by using Windows PowerShell
• To create pool-level storage tiers, one for SSD media types and one for HDD media
types, on LON-SVR1, in Windows PowerShell, run the following commands:
Task 6: Create a new virtual disk with storage tiering by using the New Virtual
Disk Wizard
1. On LON-SVR1, in Server Manager, in Storage Pools, refresh the display.
2. In the VIRTUAL DISKS pane, create a virtual disk with the following settings:
o Storage tiers
o Capacity
o Allocated space
o Used pool space
o Storage layout
Task 7: Prepare for the next lab
• When you complete the lab, leave the virtual machines running for the next lab.
Results: After completing this exercise, you should have successfully enabled and
configured storage tiering.
Question: At a minimum, how many disks must you add to a storage pool to create a
three-way mirrored virtual disk?
Question: You have a USB-attached disk, four SAS disks, and one SATA disk that are
attached to a Windows Server 2012 server. You want to provide a single volume to your
users that they can use for file storage. What would you use?
Lesson Objectives
After completing this lesson, you will be able to:
To cope with data storage growth in the enterprise, organizations are consolidating
servers and making capacity scaling and data optimization the key goals. Data
Deduplication provides practical ways to achieve these goals, including:
• Capacity optimization. Data Deduplication stores more data in less physical space. It
achieves greater storage efficiency as compared to features such as Single Instance
Store (SIS) or NTFS compression. Data Deduplication uses subfile variable-size
chunking and compression, which deliver optimization ratios of 2:1 for general file
servers and up to 20:1 for virtualization data.
• Scale and performance. Data Deduplication is highly scalable, resource efficient, and
nonintrusive. While it can process up to 50 MB per second in Windows Server 2012
R2, and about 20 MB of data per second in Windows Server 2012, Windows Server
2016 performs significantly better through the advancements in the Deduplication
Processing Pipeline. In this latest version of Windows Server, Data Deduplication can
run multiple threads in parallel by using multiple I/O queues on multiple volumes
simultaneously without affecting other workloads on the server. The low impact on
the server workloads is maintained by throttling the CPU and memory resources that
are consumed; if the server is very busy, deduplication can stop completely. In
addition, you have the flexibility to run Data Deduplication jobs at any time, set
schedules for when data deduplication should run, and establish file selection
policies.
• Reliability and data integrity. When you apply Data Deduplication to a volume on a
server, it maintains the integrity of the data. Data Deduplication uses checksum
results, consistency, and identity validation to ensure data integrity. Data
Deduplication maintains redundancy, for all metadata and the most frequently
referenced data, to ensure that the data is repaired, or at least recoverable, in the event
of data corruption.
• Bandwidth efficiency with BranchCache. Through integration with BranchCache, the
same optimization techniques are applied to data transferred over the WAN to a
branch office. The result is faster file download times and reduced bandwidth
consumption.
• Optimization management with familiar tools. Data Deduplication has optimization
functionality built into Server Manager and Windows PowerShell. Default settings
can provide savings immediately, or you can fine-tune the settings to see more gains.
By using Windows PowerShell cmdlets, you can start an optimization job or schedule
one to run in the future. Installing the Data Deduplication feature and enabling
deduplication on selected volumes can also be accomplished by using the
Unattend.xml file that calls a Windows PowerShell script and can be used with
Sysprep to deploy deduplication when a system first boots.
The Data Deduplication process involves finding and removing duplication within data
without compromising its fidelity or integrity. The goal is to store more data in less
space by segmenting files into small variable-sized chunks (32–128 KB), identifying
duplicate chunks, and maintaining a single copy of each chunk.
After deduplication, files are no longer stored as independent streams of data, and they
are replaced with stubs that point to data blocks that are stored within a common chunk
store. Because these files share blocks, those blocks are only stored once, which reduces
the disk space needed to store all files. During file access, the correct blocks are
transparently assembled to serve the data without the application or the user having any
knowledge of the on-disk transformation to the file. This enables you to apply
deduplication to files without having to worry about any change in behavior to the
applications or impact to users who are accessing those files. Data Deduplication works
best in storage scenarios with large amounts of data that are not modified frequently.
Windows Server 2016 includes several important improvements to the way Data
Deduplication worked in Windows Server 2012 R2 and Windows Server 2012,
including:
• Support for volume sizes up to 64 TB. Data Deduplication in Windows Server 2012
R2 does not perform well on volumes greater than 10 TB in size (or less for
workloads with a high rate of data changes), so the feature has been redesigned in
Windows Server 2016. The Deduplication Processing Pipeline is now multithreaded
and able to utilize multiple CPUs per volume to increase optimization throughput
rates on volume sizes up to 64 TB. (The 64 TB maximum is a limitation of VSS, on
which Data Deduplication is dependent.)
• Support for file sizes up to 1 TB. In Windows Server 2012 R2, very large files are not
good candidates for Data Deduplication. However, with the use of the new stream
map structures and other improvements to increase the optimization throughput and
access performance, deduplication in Windows Server 2016 performs well on files up
to 1 TB.
• Simplified deduplication configuration for virtualized backup applications. Although
Windows Server 2012 R2 supports deduplication for virtualized backup applications,
it requires manually tuning the deduplication settings. In Windows Server 2016,
however, the configuration of deduplication for virtualized backup applications is
drastically simplified by a predefined usage-type option when enabling deduplication
for a volume.
• Support for Nano Server. Nano Server is a new deployment option in Windows
Server 2016 that has a smaller system resource footprint, starts up significantly faster,
and requires fewer updates and restarts than the Server Core deployment option for
Windows Server. In addition, Nano Server fully supports Data Deduplication.
• Support for cluster rolling upgrades. Windows servers in a failover cluster running
deduplication can include a mix of nodes running Windows Server 2012 R2 and
nodes running Windows Server 2016. This major enhancement provides full data
access to all of your deduplicated volumes during a cluster rolling upgrade. For
example, you can gradually upgrade each deduplication node in an existing Windows
Server 2012 R2 cluster to Windows Server 2016 without incurring downtime to
upgrade all the nodes at once.
Note: Although both Windows Server versions of deduplication can access the
optimized data, the optimization jobs run only on the Windows Server 2012 R2
deduplication nodes and are blocked from running on the Windows Server 2016
deduplication nodes until the cluster rolling upgrade is complete.
• Volumes must not be a system or boot volume. Because most files used by an
operating system are constantly open, Data Deduplication on system volumes would
negatively affect the performance, because deduplicated data would need to be
expanded again before you could use the files.
• Volumes might be partitioned by using the master boot record (MBR) or GUID
partition table (GPT) format, and must be formatted by using the NTFS or ReFS file
system.
• Volumes must be attached to the Windows Server and cannot appear as removable
drives. This means that you cannot use USB or floppy drives for Data Deduplication,
nor use remotely-mapped drives.
• Volumes can be on shared storage, such as Fibre Channel, iSCSI SAN, or SAS array.
• Files with extended attributes, encrypted files, files smaller than 32 KB, and reparse
point files will not be processed for Data Deduplication.
• Data Deduplication is not available for Windows client operating systems.
The Data Deduplication role service consists of several components. These components
include:
• Filter driver. This component monitors local or remote I/O and handles the chunks of
data on the file system by interacting with the various jobs. There is one filter driver
for every volume.
Deduplication service. This component manages the following job types:
▪ Deduplication keeps backup copies of popular chunks when they are referenced
over 100 times in an area called the hotspot. If the working copy is corrupted,
deduplication uses its redundant copy in the case of soft corruptions such as bit
flips or torn writes.
▪ If using mirrored Storage Spaces, deduplication can use the mirror image of the
redundant chunk to serve the I/O and fix the corruption.
▪ If a file is processed with a chunk that is corrupted, the corrupted chunk is
eliminated, and the new incoming chunk is used to fix the corruption.
Note: Because of the additional validations that are built into deduplication, the
deduplication subsystem is often the first system to report any early signs of
data corruption in the hardware or file system.
o Unoptimization. This job undoes deduplication on all of the optimized files on the
volume. Some of the common scenarios for using this type of job include
decommissioning a server with volumes enabled for Data Deduplication,
troubleshooting issues with deduplicated data, or migration of data to another
system that doesn't support Data Deduplication. Before you start this job, you
should use the Disable-DedupVolume Windows PowerShell cmdlet to disable
further data deduplication activity on one or more volumes. After you disable Data
Deduplication, the volume remains in the deduplicated state, and the existing
deduplicated data remains accessible; however, the server stops running
optimization jobs for the volume, and it does not deduplicate the new data.
Afterwards, you would use the unoptimization job to undo the existing
deduplicated data on a volume. At the end of a successful unoptimization job, all
of the data deduplication metadata is deleted from the volume.
Note: You should be cautious when using the unoptimization job because all the
deduplicated data will return to the original logical file size. As such, you should
verify the volume has enough free space for this activity or move/delete some of
the data to allow the job to complete successfully.
• Optimization jobs, which are background tasks, run with low priority on the server to
process the files on the volume. An optimization job:
o Segments all file data on the volume into small, variable-sized chunks that range
from 32 KB to 128 KB, by using an algorithm.
o Identifies chunks that have one or more duplicates on the volume.
o Inserts chunks into a common chunk store.
o Replaces all duplicate chunks with a reference, or stub, to a single copy of the
chunk in the chunk store.
o Replaces the original files with a reparse point, which contains references to its
data chunks.
o Compresses chunks and organizes them in container files in the System Volume
Information folder.
o Removes the primary data stream of the files.
The Data Deduplication process works through scheduled tasks on the local server, but
you can run the process interactively by using Windows PowerShell. More information
about this is discussed later in the module.
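For example, a minimal sketch of running deduplication jobs interactively (the drive letter is only an example):
# Start an optimization job on volume D and watch its progress
Start-DedupJob -Volume "D:" -Type Optimization
Get-DedupJob
# To undo deduplication, disable the volume first and then run an unoptimization job
Disable-DedupVolume -Volume "D:"
Start-DedupJob -Volume "D:" -Type Unoptimization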
Data deduplication does not have any write-performance impact because the data is not
deduplicated while the file is being written. Windows Server 2016 uses post-process
deduplication, which ensures that the deduplication potential is maximized. Another
advantage with this type of deduplication process is that your application servers and
client computers offload all processing, which means less stress on the other resources
in your environment. There is, however, a small performance impact when reading
deduplicated files.
Note: The three main types of data deduplication are source, target (or post-process
deduplication), and in-line (or transit deduplication).
Data Deduplication potentially can process all of the data on a selected volume, except
for files that are less than 32 KB in size, and files in folders that are excluded. You must
carefully determine if a server and its attached volumes are suitable candidates for
deduplication prior to enabling the feature. You should also consider backing up
important data regularly during the deduplication process.
After you enable a volume for deduplication and the data is optimized, the volume
contains the following elements:
• Unoptimized files. Includes files that do not meet the selected file-age policy setting,
system state files, alternate data streams, encrypted files, files with extended
attributes, files smaller than 32 KB, or other reparse point files.
• Optimized files. Includes files that are stored as reparse points that contain pointers to
a map of the respective chunks in the chunk store that are needed to restore the file
when it is requested.
• Chunk store. Location for the optimized file data.
• Additional free space. The optimized files and chunk store occupy much less space
than they did prior to optimization.
Prior to installing and configuring Data Deduplication in your environment, you must
plan your deployment using the following steps:
• Target deployments. Data Deduplication is designed to be applied on primary data
volumes – and not on logically extended volumes – without adding any additional
dedicated hardware. You can schedule deduplication based on the type of data that is
involved and the frequency and volume of changes that occur to the volume or
particular file types. You should consider using deduplication for the following data
types:
o General file shares. Group content publication and sharing, user home folders, and
Folder Redirection/Offline Files.
o Software deployment shares. Software binaries, images, and updates.
o VHD libraries. Virtual hard disk (VHD) file storage for provisioning to
hypervisors.
o VDI deployments. Virtual Desktop Infrastructure (VDI) deployments using Hyper-V.
o Virtualized backup. Backup applications running as Hyper-V guests saving backup
data to mounted VHDs.
• Determine which volumes are candidates for deduplication. Deduplication can be
very effective for optimizing storage and reducing the amount of disk space
consumed – saving you 50 to 90 percent of your system's storage space when applied
to the right data. Use the following considerations to evaluate which volumes are
ideal candidates for deduplication:
o File shares or servers that host user documents, software deployment binaries, or
virtual hard disk files tend to have plenty of duplication, and yield higher storage
savings from deduplication. More information on the deployment candidates for
deduplication and the supported/unsupported scenarios is discussed later in this
module.
o Does the data access pattern allow for sufficient time for deduplication? For
example, files that frequently change and are often accessed by users or
applications are not good candidates for deduplication. In these scenarios,
deduplication might not be able to process the files, as the constant access and
change to the data are likely to cancel any optimization gains made by
deduplication. On the other hand, good candidates allow time for deduplication of
the files.
o Does the server have sufficient resources and time to run deduplication?
Note: When you install the deduplication feature, the Deduplication Evaluation Tool
(DDPEval.exe) is automatically installed to the \Windows\System32\ directory.
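For example, you can run the tool against a folder or volume that you are evaluating (the path shown is only an illustration):
DDPEval.exe E:\Shares\UserDocuments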
• Plan the rollout, scalability, and deduplication policies. The default deduplication
policy settings are usually sufficient for most environments. However, if your
deployment has any of the following conditions, you might consider altering the
default settings:
o Incoming data is static or expected to be read-only, and you want to process files
on the volume sooner. In this scenario, change the MinimumFileAgeDays setting
to a smaller number of days to process files earlier.
o You have directories that you do not want to deduplicate. Add a directory to the
exclusion list.
o You have file types that you do not want to deduplicate. Add a file type to the
exclusion list.
o The server has different off-peak hours than the default and you want to change the
Garbage Collection and Scrubbing schedules. Update the schedules using
Windows PowerShell.
After completing your planning, you need to use the following steps to deploy Data
Deduplication to a server in your environment:
• Install Data Deduplication components on the server. Use the following options to
install deduplication components on the server:
Note: Replace VolumeLetter with the drive letter of the volume. Replace
StorageType with the value corresponding to the expected type of workload for the
volume. Acceptable values include:
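As a minimal sketch, the following commands install the role service and enable deduplication on a volume by using Windows PowerShell; the drive letter is a placeholder, and Default is one of the usage types the cmdlet accepts (suited to general-purpose file servers):
Install-WindowsFeature -Name FS-Data-Deduplication
Enable-DedupVolume -Volume "E:" -UsageType Default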
• Configure Data Deduplication jobs. With Data Deduplication jobs, you can run them
manually, on demand, or on a schedule. The following list shows the types of jobs
that you can perform on a volume:
Note: Data Deduplication jobs only support, at most, weekly job schedules. If you
need to create a schedule for a monthly job or for any other custom time period, use
Windows Task Scheduler. However, you will be unable to view these custom job
schedules created with Windows Task Scheduler by using the Windows PowerShell
cmdlet Get-DedupSchedule.
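For example, a weekly schedule created with Windows PowerShell might look like the following (the name, days, and times are illustrative):
# Run a throughput optimization job on weekend nights for up to six hours
New-DedupSchedule -Name "WeekendOptimization" -Type Optimization -Days Saturday,Sunday -Start "23:00" -DurationHours 6
# List the schedules that are currently defined
Get-DedupSchedule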
1. Open File Explorer and observe the available volumes and free space.
2. Return to File and Storage Services.
3. Click Disks.
4. Click the 1 disk, and then click the D volume.
5. Enable Data Deduplication, and then click the General purpose file server setting.
Configure the following settings:
2. In Windows PowerShell, run the following commands:
a. Get-DedupStatus
b. Get-DedupStatus | fl
c. Get-DedupVolume
d. Get-DedupVolume | fl
e. Start-DedupJob D: -Type Optimization –Memory 50
3. Repeat commands 2a and 2c.
Note: Because most of the files on drive D are small, you may not notice a significant
amount of saved space.
4. Close all open windows.
Your data storage savings will vary by data type, the mix of data, and the size of the
volume and the files that the volume contains. You should consider using the
Deduplication Evaluation Tool to evaluate the volumes before you enable
deduplication.
• User documents. This includes group content publication or sharing, user home
folders (or MyDocs), and profile redirection for accessing offline files. Applying
Data Deduplication to these shares might save you up to 30 to 50 percent of your
system's storage space.
• Software deployment shares. This includes software binaries, cab files, symbols files,
images, and updates. Applying Data Deduplication to these shares might be able to
save you up to 70 to 80 percent of your system's storage space.
• Virtualization libraries. This includes virtual hard disk (.vhd and .vhdx) file storage
for provisioning to hypervisors. Applying Data Deduplication to these libraries might
be able to save you up to 80 to 95 percent of your system's storage space.
• General file share. This includes a mix of all the types of data identified above.
Applying Data Deduplication to these shares might save you up to 50 to 60 percent of
your system's storage space.
• Ideal candidates for deduplication
o Line-of-business servers
o Static content providers
o Web servers
o High-performance computing (HPC)
• Not ideal candidates for deduplication
o Hyper-V hosts
o WSUS
o SQL Server and Exchange Server database volumes
In Windows Server 2016, you should consider the following related technologies and
potential issues when deploying Data Deduplication:
Note: Single Instance Storage (SIS), a file system filter driver used for NTFS file
deduplication, was deprecated in Windows Server 2012 R2 and completely removed
in Windows Server 2016.
When planning for Data Deduplication in your environment, you will inevitably ask
yourself, “What size should my configured deduplicated volumes be?” Although
Windows Server 2016 supports Data Deduplication on volumes up to 64 TB, you must
assess the appropriate size of the deduplicated volumes that your environment can
support. For many, the answer to this question is that it depends on your hardware
specifications and your unique workload. More specifically, it depends primarily on
how much and how frequently the data on the volume changes and the data access
throughput rates of the disk storage subsystem.
You should consider the following when estimating the size of your volumes enabled
for Data Deduplication:
• Deduplication optimization must be able to keep up with the daily data churn.
• The total amount of churn scales with the size of the volume.
• The speed of deduplication optimization significantly depends on the data access
throughput rates of the disk storage subsystem.
Therefore, to estimate the maximum size for a deduplicated volume, you should be
familiar with the size of the data churn and the speed of optimization processing on your
volumes. You can choose to use reference data, such as server hardware specifications,
storage drive/array speed, and deduplication speed of various usage types, for your
estimations. However, the most accurate method of assessing the appropriate volume
size is to perform the measurements directly on your deduplication system based on the
representative samples of your data, such as data churn and deduplication processing
speed.
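As a purely illustrative calculation, with every number below an assumption rather than a measurement, such an estimate might look like the following:

# Illustrative estimate only - substitute values measured on your own deduplication system.
$optimizationThroughputGBPerHour = 100   # assumed optimization speed of this hardware
$offPeakHoursPerDay = 8                  # assumed hours available each day for optimization jobs
$dailyChurnFraction = 0.05               # assumed portion of the volume's data that changes daily
# Optimization can process this much changed data per day:
$dailyCapacityGB = $optimizationThroughputGBPerHour * $offPeakHoursPerDay     # 800 GB
# Keep the volume small enough that its daily churn stays within that capacity:
$maxVolumeSizeGB = $dailyCapacityGB / $dailyChurnFraction                     # 16,000 GB (about 15.6 TB)
"Estimated maximum deduplicated volume size: {0:N0} GB" -f $maxVolumeSizeGB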
You should consider using the following options to monitor deduplication in your
environment and to report on its health:
• Windows PowerShell cmdlets. After you enable the Data Deduplication feature on a
server, you can use the following Windows PowerShell cmdlets:
o Get-DedupStatus. The most commonly used cmdlet, this cmdlet returns the
deduplication status for volumes that have data deduplication metadata, which
includes the deduplication rate, the number/sizes of optimized files, the last run-
time of the deduplication jobs, and the amount of space saved on the volume.
o Get-DedupVolume. This cmdlet returns the deduplication status for volumes that
have data deduplication metadata. The metadata includes the deduplication rate,
the number/sizes of optimized files, and deduplication settings such as minimum
file age, minimum file size, excluded files/folders, compression-excluded file
types, and the chunk redundancy threshold.
o Get-DedupMetadata. This cmdlet returns status information of the deduplicated
data store for volumes that have data deduplication metadata, which includes the
number of:
o One common scenario is to assess whether deduplication is keeping pace with the
rate of incoming data. You can use the Get-DedupStatus cmdlet to monitor the
number of optimized files compared with the number of in-policy files. This
enables you to see whether all the in-policy files are processed. If the number of
in-policy files is continuously rising faster than the number of optimized files, you
should examine your hardware utilization, or verify that the volume’s usage type
matches the data on the volume, to ensure deduplication efficiency. However, if
the output value from the cmdlet for LastOptimizationResult is 0x00000000, the
entire dataset was processed successfully during the previous optimization job. A
consolidated monitoring sketch follows at the end of this list.
Note: For more information, refer to: “Storage Cmdlets in Windows PowerShell”
at: http://aka.ms/po9qve
• Event Viewer logs. Monitoring the event log can also be helpful to understand
deduplication events and status. To view deduplication events, in Event Viewer,
navigate to Applications and Services Logs, click Microsoft, click Windows, and
then click Deduplication. For example, Event ID 6153 will provide you with the
elapsed time of a deduplication job and the throughput rate.
• Performance Monitor data. In addition to using the counters for monitoring server
performance, such as CPU and memory, you can use the typical disk counters to
monitor the throughput rates of the jobs that are currently running, such as: Disk
Read Bytes/sec, Disk Write Bytes/sec, and Average Disk sec/Transfer. Depending on
other activities on the server, you might be able to use the data results from these
counters to get a rough estimate of the saving ratio by examining how much data is
being read and how much is being written per interval. You can also use the Resource
Monitor to identify the resource usage of specific programs/services. To view disk
activity, in Windows Resource Monitor, filter the list of processes to locate
fsdmhost.exe and examine the I/O on the files under the Disk tab.
Note: Fsdmhost.exe is the executable file for the Microsoft File Server Data
Management Host process, which is used by the Data Deduplication process in
Windows Server 2016.
• File Explorer. While not the ideal choice for validating deduplication on an entire
volume, you can use File Explorer to spot check deduplication on individual files.
When viewing the properties of a file, you will notice that Size displays the logical
size of the file, and Size on Disk displays the true physical allocation of the file. For
an optimized file, Size on Disk is less than the actual file size. This is because
deduplication moves the contents of the file to a common chunk store and replaces
the original file with an NTFS reparse point stub and metadata. A scripted version of
this spot check is included in the sketch that follows this list.
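The following Windows PowerShell sketch pulls these monitoring approaches together. The volume letter and file path are placeholders, the property names are those reported by Get-DedupStatus | fl, and the event log channel name is an assumption that you should confirm (for example, with Get-WinEvent -ListLog Microsoft-Windows-Deduplication*) before relying on it:

# 1) Check whether optimization is keeping pace with incoming data.
$status = Get-DedupStatus -Volume "E:"
$status | Select-Object Volume, InPolicyFilesCount, OptimizedFilesCount, SavedSpace, LastOptimizationTime, LastOptimizationResult
# A LastOptimizationResult of 0x00000000 means the previous optimization job processed the entire dataset.

# 2) Pull recent deduplication events; Event ID 6153 reports job elapsed time and throughput.
Get-WinEvent -LogName "Microsoft-Windows-Deduplication/Operational" -MaxEvents 50 |
    Where-Object { $_.Id -eq 6153 } | Select-Object TimeCreated, Id, Message

# 3) Sample the disk counters mentioned above while a deduplication job is running.
Get-Counter -Counter @(
    "\PhysicalDisk(_Total)\Disk Read Bytes/sec",
    "\PhysicalDisk(_Total)\Disk Write Bytes/sec",
    "\PhysicalDisk(_Total)\Avg. Disk sec/Transfer"
) -SampleInterval 5 -MaxSamples 6

# 4) Spot check a single file: an optimized file carries a reparse point (path is hypothetical).
$file = Get-Item "E:\Shares\Example.docx"
($file.Attributes -band [System.IO.FileAttributes]::ReparsePoint) -ne 0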
Some of the most common causes for deduplication to report corruption are:
• Incompatible Robocopy options used when copying data. Using Robocopy with the
/MIR option on the volume root as the target wipes the deduplication store. To avoid
this problem, use the /XD option to exclude the System Volume Information folder
from the scope of the Robocopy command (see the sketch that follows this list).
Note: For more information, refer to: “FSRM and Data Deduplication may be
adversely affected when you use Robocopy /MIR in Windows Server 2012” at:
http://aka.ms/W0ux7m
• Incompatible Backup/Restore program used on a deduplicated volume. You should
verify whether your backup solution supports Data Deduplication in Windows Server
2016, as unsupported backup solutions might introduce corruptions after a restore.
More information about this is covered later in this module.
• Migrating a deduplicated volume to a down-level Windows Server version. File
corruption messages might be reported for files that are accessed from a deduplicated
volume mounted on an older version of Windows Server but that were optimized on a
later version of the operating system. In this scenario, you should verify that the
version of the server accessing the deduplicated data is the same version level or
higher than the version of the server that optimized the data on the volume. Although
deduplicated volumes can be remounted on different servers, deduplication is
backward compatible but not forward compatible; you can upgrade and migrate to a
newer version of Windows Server, but data deduplicated by a newer version of
Windows Server cannot be read on older versions of Windows Server and might be
reported as corrupted when read.
• Enabling compression on the root of a volume also enabled with deduplication.
Deduplication is not supported on volumes that have compression enabled at the root.
As a result, this might lead to the corruption and inaccessibility of deduplicated files.
Note: Deduplication of files in compressed folders is supported in Windows Server
2016 and should function normally.
• Hardware issues. Many hardware storage issues are detectable early by using the
deduplication scrubbing job. Refer to the general corruption troubleshooting steps
below for more information.
• General corruption. You can use the steps below to troubleshoot most general causes
for deduplication to report corruption:
a. Check the Event Logs for details of corruption. Check the deduplication
Scrubbing Event logs for cases of early file corruption and attempted corruption
fixes by the scrubbing job. Any corruption detected by deduplication is logged to
the event log. The Scrubbing channel lists any corruptions that were detected and
the files that the job attempted to fix. The deduplication Scrubbing Event logs are
located in Event Viewer (under Application and Services > Microsoft >
Windows > Deduplication > Scrubbing). In addition, searching for hardware
events in the System Event logs and Storage Spaces Event logs will often yield
additional information about hardware issues.
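As a sketch under assumed drive letters and paths, the following shows both the Robocopy exclusion described above and a manually started scrubbing job:

# Mirror a deduplicated volume while excluding the System Volume Information folder,
# which holds the deduplication chunk store on the target volume (paths are placeholders).
Robocopy.exe D:\ E:\ /MIR /XD "System Volume Information"

# Run a full scrubbing job to detect, and where possible repair, corruption in the deduplication
# store, and then review the Scrubbing channel in Event Viewer for its findings.
Start-DedupJob -Volume "E:" -Type Scrubbing -Full
Get-DedupJob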
When you perform an optimized backup, your backup is also smaller. This is because
the total size of the optimized files, non-optimized files, and data deduplication chunk
store files are much smaller than the logical size of the volume.
Note: Many block-based backup systems should work with data deduplication,
maintaining the optimization on the backup media. File-based backup operations that do
not use deduplication usually copy the files in their original format.
The following backup and restore scenarios are supported with deduplication in
Windows Server 2016:
On the other hand, the following backup and restore scenarios are not supported with
deduplication in Windows Server 2016:
• Back up only the changed files created, modified, or deleted since your last backup.
• Back up the changed chunk store container files.
• Perform an incremental backup at the sub-file level.
Note: New chunks are appended to the current chunk store container. When its size
reaches approximately 1 GB, that container file is sealed and a new container file is
created.
Restore operations
Restore operations also can benefit from data deduplication. Any file-level, full-volume
restore operations can benefit because they are essentially a reverse of the backup
procedure, and less data means quicker operations. The method of a full volume restore
is:
1. The complete set of data deduplication metadata and container files are restored.
2. The complete set of data deduplication reparse points are restored.
3. All non-deduplicated files are restored.
As with any product from a third-party vendor, you should verify whether the backup
solution supports Data Deduplication in Windows Server 2016, as unsupported backup
solutions might introduce corruptions after a restore. The following are common methods by
which backup solutions provide support for Data Deduplication in Windows Server 2016:
The backup vendor should be able to comment on what their product supports, the
method it uses, and with which version.
Note: For more information, refer to: “Backup and Restore of Data Deduplication-
Enabled Volumes” at: http://aka.ms/w8iows
Question: Can you enable Data Deduplication on a drive with storage tiering enabled?
After you have tested the storage redundancy and performance options, you decide that
it would also be beneficial to maximize the available disk space that you have,
especially around virtual machine storage, which is in ever-increasing demand. You
decide to test Data Deduplication solutions to maximize storage availability for virtual
machines.
Objectives
Lab Setup
Password: Pa55w.rd
For this lab, you must use the available virtual machine environment. The virtual
machines should already be running from Lab A. If they are not, before you begin this
lab, you must complete the following steps and then complete Lab A:
You decide to install the Data Deduplication role service on intensively used file servers
by using Server Manager.
Note: You will use the values returned from the previous command later in the lab.
Results: After completing this exercise, you should have successfully installed the Data
Deduplication role service and enabled it on one of your file servers.
You determine that drive E is heavily used and you suspect it contains duplicate files in
some folders. You decide to enable and configure the Data Deduplication role to reduce
the consumed space on this volume.
Note: Verify the status of the optimization job from the previous command. Repeat
the previous command until the Progress shows as 100%.
2. Get-DedupStatus -Volume D: | fl
Get-DedupVolume -Volume D: | fl
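If you prefer to watch the optimization job directly rather than re-running the status commands, a minimal sketch (assuming drive D, as in the commands above) is:

# Show the running deduplication jobs; repeat until Progress reports 100 (percent).
Get-DedupJob -Volume "D:" | Select-Object Type, State, Progress, Volume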
When you complete the lab, revert the virtual machines to their initial state.
Results: After completing this exercise, you should have successfully configured Data
Deduplication for the appropriate data volume on LON-SVR1.
Question: Your manager is worried about the impact that using data deduplication will
have on the write performance of your file servers’ volumes. Is this concern valid?
Question: You attach five 2-TB disks to your Windows Server 2012 computer. You
want to simplify the process of managing the disks. In addition, you want to ensure that
if one disk fails, the failed disk’s data is not lost. What feature can you implement to
accomplish these goals?
Question: Your manager has asked you to consider the use of Data Deduplication
within your storage architecture. In what scenarios is the Data Deduplication role
service particularly useful?