
Module 4: Implementing Storage Spaces and Data Deduplication

Contents:
Module Overview
Lesson 1: Implementing Storage Spaces
Lesson 2: Managing Storage Spaces
Lab A: Implementing Storage Spaces
Lesson 3: Implementing Data Deduplication
Lab B: Implementing Data Deduplication
Module Review and Takeaways

Module Overview
The Windows Server 2016 operating system introduces a number of storage
technologies and improvements to existing storage technologies. You can use Storage
Spaces, a feature of Windows Server 2016, to virtualize and provision storage based on
storage pools and virtual disks which abstract the physical storage from the operating
system. Data Deduplication is a feature that you can use to find and remove duplicate
data while maintaining your data’s integrity. This module describes how to use these
two new features within your Windows Server storage architecture.

Objectives

After completing this module, you will be able to:

•	Describe and implement the Storage Spaces feature in the context of enterprise storage needs.
•	Manage and maintain Storage Spaces.
•	Describe and implement Data Deduplication.

Lesson 1: Implementing Storage Spaces


Managing direct-attached storage (DAS) on a server can be a tedious task for
administrators. To overcome this problem, many organizations use storage area
networks (SANs) that group disks together. However, SANs are expensive because they
require special configuration, and sometimes special hardware. To help overcome these
storage issues, you can use Storage Spaces to pool disks together. A storage space is then presented to the operating system as a single disk that can span multiple physical disks in the pool. This lesson explains how to implement Storage Spaces.

Lesson Objectives
After completing this lesson, you will be able to:
•	Implement Storage Spaces as an enterprise storage solution.
•	Describe the Storage Spaces feature and its components.
•	Describe the features of Storage Spaces, including storage layout, drive allocation, and provisioning schemes such as thin provisioning.
•	Describe changes to the Storage Spaces feature in Windows Server 2016.
•	Describe common usage scenarios for storage spaces, and weigh their benefits and limitations.
•	Compare using Storage Spaces to using other storage solutions.

Enterprise storage needs

In most organizations, discussions about storage needs can be strained. This is typically
because storage costs are a major item on many Information Technology (IT) budgets.
Despite the decreasing cost of individual units of storage, the amount of data that
organizations produce continues to grow rapidly, so the overall cost of storage continues
to grow.

Consequently, many organizations are investigating storage solutions that provide a cost-effective alternative to their existing solution, without sacrificing performance. A typical demand from organizations during storage planning is how to lower the costs and effort of delivering infrastructure as a service (IaaS) storage services. When planning your storage solution, you need to assess how well the storage solution scales. If your storage solution does not scale well, it will cost more. Additionally, you should consider deploying inexpensive networks and storage environments. You can achieve this by deploying industry-standard server, network, and storage infrastructure to build highly available and scalable software-defined storage.

Finally, you should consider using disaggregated compute and storage deployments
when planning how to lower the costs of delivering IaaS storage services. While many
converged compute/storage solutions provide simpler management features, they also
require scaling both components simultaneously. In other words, you might have to add
compute power in the same ratio as previous hardware when expanding storage. To
achieve lower costs of delivering IaaS storage service, you should consider independent
management and independent scaling when planning your storage solution.

While your requirements might dictate which advanced features to consider during your storage planning, the primary drivers when assessing storage solutions are typically capacity, performance, cost, and resiliency. Although you could have lengthy discussions about each of these drivers separately, your storage solution needs to take a balanced deployment approach.

When planning your balanced storage deployment approach to meet your storage needs,
you will need to assess your capacity and performance requirements in relation to your
cost. For cost efficiency, your storage environment should utilize solid-state disks
(SSDs) for highly active data (higher performance for the cost) and hard disk drives
(HDDs) for data accessed infrequently (higher capacity for the cost).

If you deploy only HDDs, your budget constraints will prevent you from meeting your
performance requirements; this is because HDDs provide higher capacity, but with
lower performance. Likewise, if you deploy only SSDs, your budget constraints will
prevent you from meeting your capacity requirements; this is because SSDs provide
higher performance, but with lower capacity. As a result, your balanced storage
deployment approach will most likely include a mix of HDDs and SSDs to achieve the
best performance and capacity at the appropriate cost.

As part of your storage planning, you should consider whether your storage solution needs to support the common capabilities of most storage products, such as:

•	Mirror/parity support
•	Data striping
•	Enclosure awareness
•	Storage tiering
•	Storage replication
•	Data deduplication
•	Data encryption
•	Performance analysis

Note: This list is only meant to provide suggestions and is not an exhaustive list of the common capabilities of most storage products. The storage requirements of your organization might differ.
The growth in the size of data volumes, the ever-increasing cost of storage, and the need
to ensure high availability of data volumes can be difficult problems for IT departments
to solve. Windows Server 2016 provides a number of storage features that aim to
address these important facets of storage management.

Question: Which factors should you consider when planning your enterprise storage
strategy?

Question: What storage technologies does your organization use?

What are Storage Spaces?

Storage Spaces is a storage virtualization feature built into Windows Server 2016 and
Windows 10.

The Storage Spaces feature consists of two components:

•	Storage pools. Storage pools are a collection of physical disks aggregated into a single logical disk, allowing you to manage the multiple physical disks as a single disk. You can use Storage Spaces to add physical disks of any type and size to a storage pool.
•	Storage spaces. Storage spaces are virtual disks created from free space in a storage pool. Storage spaces have attributes such as resiliency level, storage tiers, fixed provisioning, and precise administrative control. The primary advantage of storage spaces is that you no longer need to manage single disks; instead, you manage them as one unit. Virtual disks are the equivalent of a logical unit number (LUN) on a SAN.

Note: The virtual disks that you create with the Storage Spaces feature are not the
same as the virtual hard disk files that have the .vhd and .vhdx file extensions.

To create a virtual disk, you need the following:

•	Physical disks. Physical disks are disks such as Serial Advanced Technology Attachment (SATA) or serial-attached SCSI (SAS) disks. If you want to add physical disks to a storage pool, the disks must adhere to the following requirements:
o	One physical disk is required to create a storage pool.
o	At least two physical disks are required to create a resilient, mirrored virtual disk.
o	At least three physical disks are required to create a virtual disk with resiliency through parity.
o	At least five physical disks are required for three-way mirroring.
o	Disks must be blank and unformatted, which means no volumes can exist on the disks.
o	Disks can be attached by using a variety of bus interfaces, including SAS, SATA, SCSI, Non-Volatile Memory Express (NVMe), and universal serial bus (USB). If you plan to use failover clustering with storage pools, you cannot use SATA, SCSI, or USB disks.
•	Storage pool. A storage pool is a collection of one or more physical disks that you can use to create virtual disks. You can add one or more available, unformatted physical disks to a storage pool, but you can attach a physical disk to only one storage pool.
•	Virtual disk or storage space. This is similar to a physical disk from the perspective of users and applications. However, virtual disks are more flexible because they include both fixed provisioning and thin provisioning, also known as just-in-time (JIT) allocation. They are also more resilient to physical disk failures, with built-in functionality such as mirroring and parity. These resemble Redundant Array of Independent Disks (RAID) technologies, but Storage Spaces stores the data differently than RAID does.
•	Disk drive. This is a volume that you can access from your Windows operating system, for example, by using a drive letter.

Note: When planning your Storage Spaces deployment, you need to verify whether the
storage enclosure is certified for Storage Spaces in Windows Server 2016. For Storage
Spaces to identify disks by slot and use the array’s failure and identify/locate lights, the
array must support SCSI Enclosure Services (SES) version 3.

Additional Reading: For more information, refer to: “Windows Server Catalog” at:
http://aka.ms/Rdpiy8

You can format a storage space virtual disk with the FAT32 file system, the New Technology File System (NTFS), or the Resilient File System (ReFS). You will need to format the virtual disk with NTFS if you plan to use the storage space as part of a Cluster Shared Volume (CSV), for Data Deduplication, or with File Server Resource Manager (FSRM).
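
For example, a minimal Windows PowerShell sketch of formatting an existing storage space volume with NTFS, which is required for scenarios such as Data Deduplication, CSV, and FSRM; the drive letter and label below are placeholders:

# Format an existing storage space volume with NTFS; features such as Data Deduplication,
# CSV, and FSRM require NTFS rather than ReFS or FAT32. Drive letter F: is an assumption.
Format-Volume -DriveLetter F -FileSystem NTFS -NewFileSystemLabel "DedupData"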

Components and features of Storage Spaces

An important step when configuring storage spaces is planning virtual disks. To configure storage spaces to meet your requirements, you must consider the following Storage Spaces features before you implement virtual disks:

•	Storage layout. Storage layout is one of the characteristics that defines the number of disks from the storage pool that are allocated. Valid options include:
o	Simple. A simple space has data striping but no redundancy. In data striping, logically sequential data is segmented across several disks in a way that enables different physical storage drives to access these sequential segments. Striping can improve performance because it is possible to access multiple segments of data at the same time. To enable data striping, you must deploy at least two disks. The simple storage layout does not provide any redundancy, so if one disk in the storage pool fails, you will lose all data unless you have a backup.
o	Two-way and three-way mirrors. Mirroring helps provide protection against the loss of one or more disks. Mirror spaces maintain two or three copies of the data that they host: two-way mirrors maintain two data copies, and three-way mirrors maintain three. Duplication occurs with every write to ensure that all data copies are always current. Mirror spaces also stripe the data across multiple physical drives. To implement mirroring, you must deploy at least two physical disks. Mirroring provides protection against the loss of one or more disks, so use mirroring when you are storing important data. The disadvantage of using mirroring is that the data is duplicated on multiple disks, so disk usage is inefficient.
o	Parity. A parity space resembles a simple space because data is written across multiple disks. However, parity information is also written across the disks when you use a parity storage layout. You can use the parity information to calculate data if you lose a disk. Parity enables Storage Spaces to continue to perform read and write requests even when a drive has failed. The parity information is always rotated across the available disks to enable I/O optimization. A storage space requires a minimum of three physical drives for parity spaces. Parity spaces have increased resiliency through journaling. The parity storage layout provides redundancy but is more efficient in utilizing disk space than mirroring.

Note: The number of columns for a given storage space can also impact the number of disks.

•	Disk sector size. A storage pool’s sector size is set the moment it is created. Its default sizes are set as follows:
o	If the list of drives being used contains only 512 and 512e drives, the pool sector size is set to 512e. A 512 disk uses 512-byte sectors. A 512e drive is a hard disk with 4,096-byte sectors that emulates 512-byte sectors.
o	If the list contains at least one 4-kilobyte (KB) drive, the pool sector size is set to 4 KB.
•	Cluster disk requirement. Failover clustering prevents work interruptions if there is a computer failure. For a pool to support failover clustering, all drives in the pool must support SAS.
•	Drive allocation. Drive allocation defines how the drive is allocated to the pool. Options are:
o	Data-store. This is the default allocation when any drive is added to a pool. Storage Spaces can automatically select available capacity on data-store drives for both storage space creation and JIT allocation.
o	Manual. A manual drive is not used as part of a storage space unless it is specifically selected when you create that storage space. This drive allocation property lets administrators specify particular types of drives for use only by certain storage spaces.
o	Hot spare. These are reserve drives that are not used in the creation of a storage space, but are added to a pool. If a drive that is hosting columns of a storage space fails, one of these reserve drives is called on to replace the failed drive.
•	Provisioning schemes. You can provision a virtual disk by using one of two schemes:
o	Thin provisioning space. Thin provisioning enables storage to be allocated readily on a just-enough and JIT basis. Storage capacity in the pool is organized into provisioning slabs that are not allocated until datasets require the storage. Instead of the traditional fixed storage allocation method, in which large portions of storage capacity are allocated but might remain unused, thin provisioning optimizes the use of any available storage by reclaiming storage that is no longer needed, using a process known as trim.
o	Fixed provisioning space. In Storage Spaces, fixed provisioned spaces also use flexible provisioning slabs. The difference is that the storage capacity is allocated up front, at the time that you create the space. You can create both thin and fixed provisioned virtual disks within the same storage pool. Having both provisioning types in the same storage pool is convenient, especially when they are related to the same workload. For example, you can choose to use a thin provisioning space for a shared folder containing user files, and a fixed provisioning space for a database that requires high disk I/O.
•	Stripe parameters. You can increase the performance of a virtual disk by striping data across multiple physical disks. When creating a virtual disk, you can configure the stripe by using two parameters, NumberOfColumns and Interleave:
o	A stripe represents one pass of data written to a storage space, with data written in multiple stripes, or passes.
o	Columns correlate to underlying physical disks across which one stripe of data for a storage space is written.
o	Interleave represents the amount of data written to a single column per stripe.
The NumberOfColumns and Interleave parameters determine the width of the stripe (stripe_width = NumberOfColumns * Interleave). In the case of parity spaces, the stripe width determines how much data and parity Storage Spaces writes across multiple disks to increase the performance available to apps. You can control the number of columns and the stripe interleave when creating a new virtual disk by using the Windows PowerShell cmdlet New-VirtualDisk with the NumberOfColumns and Interleave parameters, as in the example that follows.
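
For example, the following sketch creates a mirrored virtual disk with an explicit stripe configuration; the pool name, friendly name, and sizes are illustrative assumptions:

# Create a two-way mirrored virtual disk with two columns and a 64 KB interleave.
# Stripe width = NumberOfColumns * Interleave = 2 * 64 KB = 128 KB per data copy.
# A two-way mirror with two columns needs at least four physical disks in the pool.
New-VirtualDisk -StoragePoolFriendlyName "StoragePool1" -FriendlyName "Mirror vDisk" `
    -ResiliencySettingName Mirror -ProvisioningType Fixed -Size 100GB `
    -NumberOfColumns 2 -Interleave 65536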

When creating pools, Storage Spaces can use any DAS device. You can use SATA and SAS drives (or even older integrated drive electronics [IDE] and SCSI drives) that are connected internally to the computer. When planning your Storage Spaces storage subsystems, you must consider the following factors:

•	Fault tolerance. Do you want data to be available if a physical disk fails? If so, you must use multiple physical disks and provision virtual disks by using mirroring or parity.
•	Performance. You can improve performance for read and write actions by using a parity layout for virtual disks. You also need to consider the speed of each individual physical disk when determining performance. Alternatively, you can use disks of different types to provide a tiered system for storage; for example, you can use SSDs for data to which you require fast and frequent access, and SATA drives for data that you do not access as frequently. (A tiering example follows this list.)
•	Reliability. Virtual disks in a parity layout provide some reliability. You can improve that degree of reliability by using hot spare physical disks in case a physical disk fails.
•	Extensibility. One of the main advantages of using Storage Spaces is the ability to expand storage in the future by adding physical disks. You can add physical disks to a storage pool at any time after you create it to expand its storage capacity or to provide fault tolerance.
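
As mentioned in the Performance factor above, the following sketch shows one way to define SSD and HDD tiers in an existing pool and create a tiered, mirrored virtual disk; the pool name, tier names, and tier sizes are assumptions:

# Define the tiers from the media types present in the pool.
$ssdTier = New-StorageTier -StoragePoolFriendlyName "StoragePool1" -FriendlyName "SSDTier" -MediaType SSD
$hddTier = New-StorageTier -StoragePoolFriendlyName "StoragePool1" -FriendlyName "HDDTier" -MediaType HDD

# Create a mirrored virtual disk that spans both tiers (tiered disks use fixed provisioning).
New-VirtualDisk -StoragePoolFriendlyName "StoragePool1" -FriendlyName "TieredDisk" `
    -ResiliencySettingName Mirror `
    -StorageTiers $ssdTier, $hddTier -StorageTierSizes 50GB, 500GB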

Demonstration: Configuring Storage Spaces


In this demonstration, you will see how to:

•	Create a storage pool.
•	Create a virtual disk and a volume.

Demonstration Steps

Create a storage pool
1.	On LON-SVR1, in Server Manager, access File and Storage Services, and then click Storage Pools.
2.	In the STORAGE POOLS pane, create a New Storage Pool named StoragePool1, and then add some of the available disks.

Create a virtual disk and a volume
1.	In the VIRTUAL DISKS pane, create a New Virtual Disk with the following settings:
o	Storage pool: StoragePool1
o	Disk name: Simple vDisk
o	Storage layout: Simple
o	Provisioning type: Thin
o	Size: 2 GB
2.	On the View results page, wait until the task completes, and then ensure that the Create a volume when this wizard closes check box is selected.
3.	In the New Volume Wizard, create a volume with these settings:
o	Virtual disk: Simple vDisk
o	File system: ReFS
o	Volume label: Simple Volume
4.	Wait until the task completes, and then click Close.
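
The same configuration can also be scripted. The following sketch approximates the demonstration steps in Windows PowerShell; the number and names of the available disks will differ in your environment, and the friendly names simply mirror the ones used in the demonstration:

# Gather disks that are eligible for pooling and the storage subsystem that will own the pool.
$disks = Get-PhysicalDisk -CanPool $true
$subsystem = Get-StorageSubSystem -FriendlyName "Windows Storage*"

# Create the storage pool from the available disks.
New-StoragePool -FriendlyName "StoragePool1" `
    -StorageSubSystemFriendlyName $subsystem.FriendlyName -PhysicalDisks $disks

# Create a simple, thinly provisioned 2 GB virtual disk in the pool.
New-VirtualDisk -StoragePoolFriendlyName "StoragePool1" -FriendlyName "Simple vDisk" `
    -ResiliencySettingName Simple -ProvisioningType Thin -Size 2GB

# Initialize the new disk, create a partition, and format it with ReFS.
Get-VirtualDisk -FriendlyName "Simple vDisk" | Get-Disk |
    Initialize-Disk -PartitionStyle GPT -PassThru |
    New-Partition -AssignDriveLetter -UseMaximumSize |
    Format-Volume -FileSystem ReFS -NewFileSystemLabel "Simple Volume"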

Changes to file and storage services in Windows Server 2016

File and Storage Services includes technologies that help you deploy and manage one or multiple file servers.

New features in Windows Server 2016

The following file and storage services features are new or improved in Windows Server 2016:

•	Storage Spaces Direct. This feature enables you to build highly available storage systems by using storage nodes with only local storage. You will learn more about this feature later in this module.
•	Storage Replica. This new feature in Windows Server 2016 enables replication between servers or clusters that are in the same location or in different sites, for disaster recovery. Storage Replica includes both synchronous and asynchronous replication for shorter or longer distances between sites. This enables you to achieve storage replication at a lower cost.
•	Storage Quality of Service (QoS). With this feature, you can create centralized QoS policies on a Scale-Out File Server and assign them to virtual disks on Hyper-V virtual machines. QoS ensures that storage performance adapts to meet policies as the storage load changes.
•	Data Deduplication. This feature was introduced in Windows Server 2012 and is improved in Windows Server 2016 in the following areas (more information about Data Deduplication is covered later in this module; a brief example follows this list):
o	Support for volume sizes up to 64 terabytes (TB). The feature has been redesigned in Windows Server 2016 and is now multithreaded, using multiple CPUs per volume to increase optimization throughput rates on volume sizes up to 64 TB.
o	Support for file sizes up to 1 TB. With the use of new stream map structures and other improvements to increase optimization throughput and access performance, deduplication in Windows Server 2016 performs well on files up to 1 TB.
o	Simplified deduplication configuration for virtualized backup applications. In Windows Server 2016, the configuration of deduplication for virtualized backup applications is simplified when enabling deduplication for a volume.
o	Support for Nano Server. Nano Server, a new deployment option in Windows Server 2016, fully supports Data Deduplication.
•	Support for cluster rolling upgrades. You can upgrade each node in an existing Windows Server 2012 R2 cluster to Windows Server 2016 without incurring the downtime of upgrading all the nodes at once.
•	Server Message Block (SMB) hardening improvements. In Windows Server 2016, client connections to the Active Directory Domain Services default SYSVOL and NETLOGON shares on domain controllers now require SMB signing and mutual authentication (such as Kerberos authentication). This change reduces the likelihood of man-in-the-middle attacks. If SMB signing and mutual authentication are unavailable, a Windows Server 2016 computer won't process domain-based Group Policy and scripts.

Note: The registry values for these settings aren't present by default; however, the hardening rules still apply until Group Policy or other registry values override them.
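
As noted in the Data Deduplication item above, a minimal sketch of enabling and checking deduplication on a volume might look like the following; the drive letter and usage type are assumptions:

# Install the Data Deduplication role service.
Install-WindowsFeature -Name FS-Data-Deduplication

# Enable deduplication on volume E: for a virtualized backup workload, then run an optimization job.
Enable-DedupVolume -Volume "E:" -UsageType Backup
Start-DedupJob -Volume "E:" -Type Optimization

# Review the space savings.
Get-DedupStatus -Volume "E:" | Format-List Volume, SavedSpace, OptimizedFilesCount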

New features in Windows Server 2012 and Windows Server 2012 R2

Windows Server 2012 and Windows Server 2012 R2 offered several new and improved file and storage services features over previous versions, including:

•	Multiterabyte volumes. This feature deploys multiterabyte NTFS file system volumes, which support consolidation scenarios and maximize storage use. NTFS volumes on master boot record (MBR) formatted disks can be up to 2 terabytes (TB) in size. Volumes on globally unique identifier (GUID) partition table (GPT) formatted disks can be up to 18 exabytes.
•	Data deduplication. This feature saves disk space by storing a single copy of identical data on the volume.
•	iSCSI Target Server. The iSCSI Target Server provides block storage to other servers and applications on the network by using the iSCSI standard. Windows Server 2012 R2 also includes VHDX support and end-to-end management by using the Storage Management Initiative Specification.
•	Storage spaces and storage pools. This feature enables you to virtualize storage by grouping industry-standard disks into storage pools, and then creating storage spaces from the available capacity in the storage pools. Storage Spaces in Windows Server 2012 R2 enables you to create a tiered storage solution that transparently delivers an appropriate balance between capacity and performance to meet the needs of enterprise workloads.
•	Unified remote management of File and Storage Services in Server Manager. You can use Server Manager to manage multiple file servers remotely, including their role services and storage.
•	Windows PowerShell cmdlets for File and Storage Services. You can use Windows PowerShell cmdlets to perform most administration tasks for file and storage servers.
•	ReFS. The Resilient File System (ReFS), introduced in Windows Server 2012, offers enhanced integrity, availability, scalability, and error protection for file-based data storage.
•	Server Message Block (SMB) 3.0. The SMB protocol is a network file-sharing protocol that allows applications to read and write to files and request services from server programs on a network.
•	Offloaded Data Transfer (ODX). ODX functionality enables ODX-capable storage arrays to bypass the host computer and directly transfer data within or between compatible storage devices.
•	Chkdsk. The new version of Chkdsk runs automatically in the background and monitors the health of the system volume, enabling organizations to deploy multiterabyte NTFS file system volumes without concern about endangering their availability. The Chkdsk tool introduces a new approach: it prioritizes volume availability and allows for the detection of corruption while the volume remains online, with its data available to users during maintenance.
Storage Spaces usage scenarios

When considering whether to use Storage Spaces in a given situation, you should weigh the following benefits and limitations. The Storage Spaces feature was designed to enable storage administrators to:

•	Implement and easily manage scalable, reliable, and inexpensive storage.
•	Aggregate individual drives into storage pools, which are managed as a single entity.
•	Use inexpensive storage with or without external storage.
•	Use different types of storage in the same pool (for example, SATA, SAS, USB, and SCSI).
•	Grow storage pools as required.
•	Provision storage when required from previously created storage pools.
•	Designate specific drives as hot spares.
•	Automatically repair pools containing hot spares.
•	Delegate administration by pool.
•	Use the existing tools for backup and restore, and the Volume Shadow Copy Service (VSS) for snapshots.
•	Manage storage locally or remotely, by using Microsoft Management Console (MMC) or Windows PowerShell.
•	Utilize Storage Spaces with failover clusters.

Note: While the list above mentions USB as a supported storage medium, using USB in a pool might be more practical on a Windows 8 client or while developing a proof of concept. The performance of this technology also depends on the performance capabilities of the storage you choose to pool together.

There are, however, inherent limitations in Storage Spaces. For example, in Windows Server 2016, you should consider the following limitations when planning:

•	Storage Spaces volumes are not supported on boot or system volumes.
•	The contents of a drive are lost when you introduce that drive into a storage pool. You should add only unformatted, non-partitioned drives.
•	You must have at least one drive in a simple storage pool.
•	Fault-tolerant configurations have specific requirements:
o	A mirrored pool requires a minimum of two drives.
o	Three-way mirroring requires a minimum of five drives.
o	Parity requires a minimum of three drives.
•	All drives in a pool must use the same sector size.
•	Storage layers that abstract the physical disks are not compatible with Storage Spaces, including:
o	VHDs and pass-through disks in a virtual machine (VM).
o	Storage subsystems deployed in a separate RAID layer.
•	Fibre Channel and iSCSI are not supported.
•	Failover clusters are limited to SAS as a storage medium.

Note: Microsoft Support provides troubleshooting assistance only in environments where you deploy Storage Spaces on a physical machine, not a virtual machine. In addition, any just a bunch of disks (JBOD) hardware solutions that you implement must be certified by Microsoft.

When planning for the reliability of a particular workload in your environment, Storage Spaces provides different resiliency types. As a result, some workloads are better suited to specific resiliency configurations. The following list shows the recommended workload types:

•	Mirror. Maintains two data copies (two-way mirror) or three data copies (three-way mirror). Recommended for all workloads.
•	Parity. Maintains two data copies (single parity) or three data copies (dual parity). Recommended for sequential workloads with large units of read/write, such as archival data.
•	Simple. Maintains one data copy. Recommended only for workloads that do not need resiliency or that provide an alternate resiliency mechanism.
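
To illustrate these resiliency types, the following sketch creates one virtual disk of each kind; the pool name, friendly names, and sizes are placeholders:

# Two-way mirror (two data copies): general-purpose workloads.
New-VirtualDisk -StoragePoolFriendlyName "StoragePool1" -FriendlyName "GeneralData" `
    -ResiliencySettingName Mirror -PhysicalDiskRedundancy 1 -ProvisioningType Fixed -Size 200GB

# Single parity (tolerates one failed disk): sequential, archival workloads.
New-VirtualDisk -StoragePoolFriendlyName "StoragePool1" -FriendlyName "Archive" `
    -ResiliencySettingName Parity -PhysicalDiskRedundancy 1 -ProvisioningType Fixed -Size 500GB

# Simple (no resiliency): scratch data that can be recreated.
New-VirtualDisk -StoragePoolFriendlyName "StoragePool1" -FriendlyName "Scratch" `
    -ResiliencySettingName Simple -ProvisioningType Fixed -Size 100GB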

Storage Spaces Direct deployment scenarios

Storage Spaces Direct removes the need for a shared SAS fabric, simplifying
deployment and configuration. Instead, it uses the existing network as a storage fabric,
leveraging SMB 3.0 and SMB Direct for high-speed, low-latency, CPU-efficient storage.
To scale out, you simply add more servers to increase storage capacity and I/O
performance.

Storage Spaces Direct can be deployed in support of either primary storage of Hyper-V virtual machine (VM) files or secondary storage for Hyper-V Replica virtual machine files. In Windows Server 2016, both options provide storage for Hyper-V, specifically focusing on Hyper-V IaaS (Infrastructure as a Service) for service providers and enterprises.

In the disaggregated deployment scenario, the Hyper-V servers (compute component) are located in a separate cluster from the Storage Spaces Direct servers (storage component). The virtual machines are configured to store their files on the Scale-Out File Server (SOFS). The SOFS is designed for use as a file share for server application data and is accessed over the network by using the SMB 3.0 protocol. This allows for scaling the Hyper-V clusters (compute) and the SOFS cluster (storage) independently.

In the hyper-converged deployment scenario, the Hyper-V (compute) and Storage Spaces Direct (storage) components are on the same cluster. This option does not require deploying a SOFS, because the virtual machine files are stored on CSVs. This allows for scaling Hyper-V compute clusters and storage together, and it does not require configuring file server access and permissions. After you configure Storage Spaces Direct and the CSV volumes are available, configuring and provisioning Hyper-V is the same process and uses the same tools that you use with any other Hyper-V deployment on a failover cluster.

You also can deploy Storage Spaces Direct in support of SQL Server 2012 or newer,
which can store both system and user database files. SQL Server is configured to store
these files on SMB 3.0 file shares for both stand-alone and clustered instances of SQL
Server. The database server accesses the SOFS over the network using the SMB 3.0
protocol. This scenario requires Windows Server 2012 or newer on both the file servers
and the database servers.

Note: Storage Spaces does not support Exchange Server workloads currently.
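
The hyper-converged scenario described above can be sketched in two steps: enable Storage Spaces Direct on an existing failover cluster, and then create a CSV volume from the pool that the feature builds. The cluster name, volume name, size, and file system below are assumptions for illustration:

# Enable Storage Spaces Direct on an existing failover cluster (run against the cluster).
Enable-ClusterStorageSpacesDirect -CimSession "S2DCluster"

# Create a mirrored Cluster Shared Volume for virtual machine files.
New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "VMStore01" `
    -FileSystem CSVFS_ReFS -Size 2TB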

Interoperability with Azure virtual machines scenarios

You can use Storage Spaces inside an Azure virtual machine to combine multiple
virtual hard drives, creating more storage capacity or performance than is available from
a single Azure virtual hard drive. There are three supported scenarios for using Storage
Spaces in Azure virtual machines, but there are some limitations and best practices that
you should follow, as described below.

•	As high performance and/or capacity storage for a virtual machine.
•	As backup targets for System Center Data Protection Manager.
•	As storage for Azure Site Recovery.

Multi-tenant scenarios

You can provide delegation of administration of storage pools through access control
lists (ACLs). You can delegate on a per-storage-pool basis, thereby supporting hosting
scenarios that require tenant isolation. Because Storage Spaces uses the Windows
security model, it can be integrated fully with Active Directory Domain Services.

Storage Spaces can be made visible only to a subset of nodes in the file cluster. This can
be used in some scenarios to leverage the cost and management advantage of larger
shared clusters and to segment those clusters for performance or access purposes.
Additionally, you can apply ACLs at various levels of the storage stack (for example,
file shares, CSV, and storage spaces). In a multitenant scenario, this means that the full
storage infrastructure can be shared and managed centrally and that you can design
dedicated and controlled access to segments of the storage infrastructure. You can
configure a particular customer to have LUNs, storage pools, storage spaces, cluster
shared volumes, and file shares dedicated to them, and ACLs can ensure that only that tenant has access to them.

Additionally, by using SMB Encryption, you can ensure all access to the file-based
storage is encrypted to protect against tampering and eavesdropping attacks. The biggest
benefit of using SMB Encryption over more general solutions, such as IPsec, is that
there are no deployment requirements or costs beyond changing the SMB settings on
the server. The encryption algorithm used is AES-CCM, which also provides data
integrity validation.
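
A brief sketch of requiring SMB Encryption, either for an individual share or for the whole server, follows; the share name and path are placeholders:

# Require encryption on a single tenant share.
New-SmbShare -Name "TenantA-Data" -Path "D:\Shares\TenantA" -EncryptData $true

# Alternatively, require encryption for all SMB shares on this file server.
Set-SmbServerConfiguration -EncryptData $true -Force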

Discussion: Comparing Storage Spaces to other storage solutions

Storage Spaces in Windows Server 2016 provides an alternative to using more
traditional storage solutions, such as SANs and network-attached storage (NAS).

Consider the following questions to prepare for the class discussion:

Question: What are the advantages of using Storage Spaces compared to using SANs or
NAS?

Question: What are the disadvantages of using Storage Spaces compared to using
SANs or NAS?

Question: In what scenarios would you recommend each option?

Lesson 2: Managing Storage Spaces


Once you have implemented Storage Spaces, you must know how to manage and
maintain them. This lesson explores how to use Storage Spaces to mitigate disk failure,
to expand your storage pool, and to use logs and performance counters to ensure the
optimal behavior of your storage.

Lesson Objectives
After completing this lesson, you will be able to:
• Describe how to manage Storage Spaces.
• Explain how to use Storage Spaces to mitigate storage failure.
• Explain how to expand your storage pool.
• Describe how to use event logs and performance counters to monitor Storage Spaces.

Managing Storage Spaces

Storage Spaces is integrated with failover clustering for high availability, and integrated
with cluster shared volumes (CSV) for SOFS deployments. You can manage Storage
Spaces by using:

• Server Manager
• Windows PowerShell
• Failover Cluster Manager
• System Center Virtual Machine Manager
• Windows Management Instrumentation (WMI)

Manage using Server Manager

Server Manager provides you with the ability to perform basic management of virtual
disks and storage pools. In Server Manager, you can create storage pools; add and
remove physical disks from pools; and create, manage, and delete virtual disks. For
example, in Server Manager you can view the physical disks that are attached to a
virtual disk. If any of these disks are unhealthy, you will see an unhealthy disk icon next
to the disk name.

Manage using Windows PowerShell

Windows PowerShell provides advanced management options for virtual disks and storage pools. The following list shows some examples of management cmdlets:

•	Get-StoragePool. Lists storage pools.
•	Get-VirtualDisk. Lists virtual disks.
•	Repair-VirtualDisk. Repairs a virtual disk.
•	Get-PhysicalDisk | Where {$_.HealthStatus -ne "Healthy"}. Lists unhealthy physical disks.
•	Reset-PhysicalDisk. Removes a physical disk from a storage pool.
•	Get-VirtualDisk | Get-PhysicalDisk. Lists the physical disks that are used for a virtual disk.
•	Optimize-Volume. Optimizes a volume, performing such tasks on supported volumes and system SKUs as defragmentation, trim, slab consolidation, and storage tier processing.

Additional Reading: For more information, refer to: “Storage Cmdlets in Windows
PowerShell” at: http://aka.ms/po9qve

To use Storage Spaces cmdlets in Windows PowerShell, you must download the
StorageSpaces module for use in Windows Server 2016. For more information, refer to:
“Storage Spaces Cmdlets in Windows PowerShell” at: http://aka.ms/M1fccp
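
Combining these cmdlets, a short sketch for finding degraded virtual disks and unhealthy physical disks, and then starting a repair, might look like the following; the virtual disk friendly name is an assumption:

# List virtual disks that are not healthy, along with their resiliency and status.
Get-VirtualDisk | Where-Object { $_.HealthStatus -ne "Healthy" } |
    Format-Table FriendlyName, ResiliencySettingName, OperationalStatus, HealthStatus

# List the physical disks that are reporting problems.
Get-PhysicalDisk | Where-Object { $_.HealthStatus -ne "Healthy" }

# Repair a specific virtual disk.
Repair-VirtualDisk -FriendlyName "Data"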

Monitoring storage tier performance

When planning for storage tiering, you should assess the workload characteristics of
your storage environment so that you can store your data most cost-effectively
depending on how you use it. In Windows Server 2016, the server automatically
optimizes your storage performance by transparently moving the data that's accessed
more frequently to your faster solid state drives (the SSD tier) and moving less active
data to your less expensive, but higher capacity, hard disk drives (the HDD tier).

In many environments, the most common workload characteristics include a large data set in which the majority of the data is typically cold. Cold, or cool, data consists of files that you access infrequently and that have a longer lifespan. In contrast, the most common workload characteristics also include a smaller portion of the data that is typically hot. Hot data, commonly referred to as the working set, consists of files that you are working on currently; this part of the data set is highly active and changes over time.
Note: The storage tiers optimization process moves data, not files; the data is mapped
and moved at a sub-file level. For example, if only 30 percent of the data on a virtual
hard disk is hot, only that 30 percent of the data is moved to your SSD tier.

Additionally, when planning for storage tiering, you should assess whether there are situations in which a file works best when placed in a specific tier. For example, you might need to place an important file in the fast tier, or place a backup file in the slow tier. For these situations, your storage solution might have the option to assign a file to a particular tier, also referred to as pinning the file to a tier.

Before you create storage spaces, plan ahead and give yourself room to fine-tune the
storage spaces after you observe your workloads in action. After observing input/output
operations per second (IOPS) and latency, you will be able to predict the storage
requirements of each workload more accurately. Here are some recommendations when
planning ahead:

•	Don't allocate all available SSD capacity for your storage spaces immediately. Keep some SSD capacity in the storage pool in reserve, so you can increase the size of an SSD tier when a workload demands it.
•	Don't pin files to storage tiers until you see how well Storage Tiers Optimization can optimize storage performance. When a tenant or workload requires a particular level of performance, you can pin files to a storage tier to ensure that all I/O activity is performed on that tier.
•	Do consider pinning the parent VHDX file to the SSD tier if you're providing pooled desktops through VDI. If you have deployed a Virtual Desktop Infrastructure (VDI) to provide pooled desktops for users, you should consider pinning the master image that's used to clone users' desktops to the SSD tier.

You should use the Storage Tier Optimization Report when observing or monitoring
your workloads. This report is used to check the performance of your storage tiers and
identify the changes that might optimize their performance. As part of the performance
analysis, the report provides data for answering questions such as, “How large is my
working set?” and “How much do I gain by adding SSD capacity?”

Additional Reading: For more information, refer to: “Monitoring Storage Tiers
Performance” at: http://aka.ms/Sz4zfi
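
A sketch of pinning a file to a tier and then running the optimization task follows; the file path, tier name, and scheduled task path reflect a default installation and are assumptions:

# Pin a master VHDX to the SSD tier of its volume.
Set-FileStorageTier -FilePath "E:\VMs\GoldImage.vhdx" -DesiredStorageTierFriendlyName "SSDTier"

# Report the pinning and placement status for files on the volume.
Get-FileStorageTier -VolumeDriveLetter E

# Run the built-in tiers optimization task immediately instead of waiting for its schedule.
Start-ScheduledTask -TaskPath "\Microsoft\Windows\Storage Tiers Management\" -TaskName "Storage Tiers Optimization"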

Managing disk failure with Storage Spaces


Before deployment, you should plan Storage Spaces to handle disk and JBOD enclosure
failures with minimal impact on service and minimal risk of data loss. With any storage
solution, you should expect that hardware failure will occur; this is especially true in a
large-scale storage solution.

To help avoid problems caused by failing hardware, your storage plan should account for the types and number of failures that might occur in your environment. You should also plan how your solution will handle each fault without service interruption.

•	Design a complete, fault-tolerant storage solution. For example, if you want your storage solution to be able to tolerate a single fault at any level, you need this minimum setup:
o	Two-way mirror or single-parity storage spaces.
o	A clustered file server.
o	Redundant SAS connections between each file server node and each JBOD.
o	Redundant network adapters and network switches.
o	Enough JBOD enclosures to tolerate an entire JBOD failing or becoming disconnected.
•	Deploy a highly available storage pool. Using mirrored or parity virtual disks in Storage Spaces provides some fault tolerance and high availability to storage resources. However, because all physical disks connect to a single system, that system itself becomes a single point of failure. If the system to which the physical disks are connected fails, access to the storage resources ceases to exist. Storage Spaces in Windows Server 2016 supports creating a clustered storage pool when using mirror spaces, parity spaces, and simple spaces. To cluster Storage Spaces, your environment must meet the following requirements:
o	All storage spaces in the storage pool must use fixed provisioning.
o	Two-way mirror spaces must use three or more physical disks.
o	Three-way mirror spaces must use five or more physical disks.
o	All physical disks in a clustered pool must be connected by using SAS.
o	All physical disks must support persistent reservations and pass the failover cluster validation tests.

Note: The SAS JBOD must be physically connected to all cluster nodes that will use the storage pool. Direct-attached storage that is not connected to all cluster nodes is not supported for clustered storage pools with Storage Spaces.

•	Unless you deployed a highly available storage pool, import a storage pool on another server if the system fails. In Windows Server 2016, Storage Spaces writes the configuration of the storage pool directly to the disks. Therefore, if the single-point-of-failure system fails and the server hardware requires replacement or a complete reinstall, you can mount the storage pool on another server.
•	Most problems with Storage Spaces occur because of incompatible hardware or because of firmware issues. To reduce problems, follow these best practices:
o	Use only certified SAS-connected JBODs. These enclosure models have been tested with Storage Spaces and enable you to identify the enclosure and slot for a physical disk easily.
o	Don't mix and match disk models within a JBOD. Use one model of solid-state drive (SSD) and one model of HDD for all disks in a JBOD (assuming that you are using storage tiers), and make sure that the disks are fully compatible with the JBOD model.
o	Install the latest firmware and driver versions on all disks. Install the firmware version that is listed as approved for the device in the Windows Server Catalog or is recommended by your hardware vendor. Within a JBOD, it's important that all disks of the same model have the same firmware version.
o	Follow the vendor's recommendations for disk placement. Install disks in the slots recommended by your hardware vendor. JBODs often have different requirements for placement of SSDs and HDDs, for cooling and other reasons.
•	Unless you enabled hot spares, retire missing disks automatically. The default policy for handling a physical disk that goes missing from a storage pool (-RetireMissingPhysicalDisks = Auto) simply marks the disk as missing (Lost Communication), and no repair operation on the virtual disks takes place. This policy avoids potentially I/O-intensive virtual disk repairs if a disk temporarily goes offline, but the storage pool health will remain degraded, compromising resiliency if another disk fails before an administrator takes action. Unless you are using hot spares, we recommend that you change the RetireMissingPhysicalDisks policy to Always, to initiate virtual disk repair operations automatically if a disk loses communication with the system, restoring the health of the pool and the dependent storage spaces as soon as possible. (A sketch of this change follows the list.)
•	Always replace the physical disk before you remove the drive from the storage pool. Changing the storage pool configuration before you replace the physical disk in the enclosure can cause an I/O failure or initiate virtual disk repair, which can result in a “STOP 0x50” error and potential data loss.
•	As a general rule, keep unallocated disk space in the pool for virtual disk repairs instead of using hot spares. In Windows Server 2016, you have the option to use available capacity on existing disks in the pool for disk repair operations instead of bringing a hot spare online. This enables Storage Spaces to automatically repair storage spaces with failed disks by copying data to multiple disks in the pool, significantly reducing the time it takes to recover from the failed disk when compared with using hot spares, and it lets you use the capacity on all disks instead of setting aside hot spares.
o	To correct a failed disk in a virtual disk or storage pool, you must remove the disk that is causing the problem. Actions such as defragmenting, scan disk, or using chkdsk cannot repair a storage pool.
o	To replace a failed disk, you must add a new disk to the pool. The new disk resynchronizes automatically when disk maintenance occurs during daily maintenance. Alternatively, you can trigger disk maintenance manually.
•	When you configure column counts, make sure you have enough physical disks to support automatic virtual disk repairs. Typically, you should configure the virtual disk with 3-4 columns for a good balance of throughput and low latency. Increasing the column count increases the number of physical disks across which a virtual disk is striped, which increases throughput and IOPS for that virtual disk. However, increasing the column count can also increase latency. For this reason, you should optimize overall cluster performance by using multiple virtual disks with 3-4 columns (when using mirrors) or seven columns (when using parity spaces). The performance of the entire cluster remains high because multiple virtual disks are used in parallel, making up for the reduced column count.
•	Be prepared for multiple disk failures. If you purchased all of the disks in an enclosure at the same time, the disks are the same age, and the failure of one disk might be followed fairly quickly by other disk failures. Even if the storage spaces return to health after the initial disk repairs, you should replace the failed disk as soon as possible to avoid the risk of additional disk failures, which might compromise storage health and availability and risk data loss. If you want to be able to delay disk repairs safely until your next scheduled maintenance, configure your storage spaces to tolerate two disk failures.
•	Provide fault tolerance at the enclosure level. If you need to provide an added level of fault tolerance at the enclosure level, deploy multiple, compatible JBODs that support enclosure awareness. In an enclosure-aware storage solution, Storage Spaces writes each copy of data to a specific JBOD enclosure. As a result, if one enclosure fails or goes offline, the data remains available in one or more alternate enclosures. To use enclosure awareness with Storage Spaces, your environment must meet the following requirements:
o	JBOD storage enclosures must support SCSI Enclosure Services (SES).
o	Storage Spaces must be configured as a mirror.
o	To tolerate one failed enclosure with two-way mirrors, you need three compatible storage enclosures.
o	To tolerate two failed enclosures with three-way mirrors, you need five compatible storage enclosures.
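
As noted in the retire-policy recommendation above, the following sketch changes the policy and then, after a failed disk has been physically replaced, removes the failed disk object and repairs the affected virtual disks; the pool name is an assumption:

# Start virtual disk repairs automatically when a disk loses communication
# (recommended when you are not using hot spares).
Set-StoragePool -FriendlyName "StoragePool1" -RetireMissingPhysicalDisks Always

# After the failed disk has been physically replaced, remove the failed disk object and repair.
$failed = Get-PhysicalDisk | Where-Object { $_.HealthStatus -ne "Healthy" }
Remove-PhysicalDisk -PhysicalDisks $failed -StoragePoolFriendlyName "StoragePool1"
Get-VirtualDisk | Where-Object { $_.HealthStatus -ne "Healthy" } | Repair-VirtualDisk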

Storage pool expansion

One of the main benefits of using Storage Spaces is the ability to expand your storage
pool by adding additional storage. Occasionally, however, you must investigate the way
in which storage is being used across the disks in your pool before you are able to
extend the storage. This is because the blocks for your various virtual disks are
distributed across the physical disks in the storage pool in a configuration that is based
on the storage layout options that you selected when creating the pool. Depending upon
the specifics, you might not be able to extend the storage, even if there is available
space in the pool.

Example

Consider the following example:

In the first illustration, a storage pool consists of five disks, where disk1 is larger than
the others. Space is consumed across all five disks by vdisk1, while vdisk2 consumes
space only on disks 1 through 3.
FIGURE 4.1: A STORAGE POOL CONSISTING OF FIVE DISKS

In the second illustration, a sixth disk has been added to the storage pool.

FIGURE 4.2: A STORAGE POOL CONSISTING OF SIX DISKS


•	If you attempt to extend vdisk1, you will find that the maximum available space for that disk has already been used, even though more space is available within the pool on disk 6. This is because the layout that vdisk1 requires, based on the options chosen at creation (such as mirroring and parity), needs five disks. Therefore, to expand vdisk1, you would need to add four additional disks.
•	However, if you attempt to extend vdisk2, you can do so, because that disk is currently distributed across three devices and there is available space across those three devices to extend it.

Note: In Storage Spaces, block storage is arranged as columns. Therefore, in a pre-expanded state, vdisk1 uses five columns and vdisk2 uses three columns. Vdisk2 might be a virtual disk that uses two-way mirroring, which means that data on disk1 is duplicated on disk2 and disk3. If you want to expand a virtual disk that uses two-way mirroring, it must have the appropriate number of columns available to accommodate the needs of the virtual disk.

Determining Column Usage

Before you add storage to a storage pool, you must determine the current distribution of
blocks across the devices by determining column usage. To do this, you can use the
Windows PowerShell cmdlet Get-VirtualDisk.

Note: For more information, refer to: “Storage Spaces Frequently Asked Questions
(FAQ)” at: http://aka.ms/knx5zg
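
For example, a quick way to review the column count and resiliency of each virtual disk before deciding whether it can be extended:

# NumberOfColumns shows how many physical disks each virtual disk is striped across.
Get-VirtualDisk |
    Format-Table FriendlyName, ResiliencySettingName, NumberOfColumns, NumberOfDataCopies, Size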

Expanding a storage pool

After you determine column usage where necessary, you can expand your storage pool by using one of these options:

•	Server Manager. Open Server Manager, select File and Storage Services, and then click Storage Pools. You can add a physical disk by right-clicking the pool, and then clicking Add Physical Disk.
•	Windows PowerShell. You can use the Add-PhysicalDisk cmdlet to add a physical disk to the storage pool. For example:

Add-PhysicalDisk -VirtualDiskFriendlyName UserData -PhysicalDisks (Get-PhysicalDisk -FriendlyName PhysicalDisk3, PhysicalDisk4)

Demonstration: Managing Storage Spaces by using Windows PowerShell

In this demonstration, you will see how to use Windows PowerShell to:

•	View the properties of a storage pool.
•	Add physical disks to a storage pool.

Demonstration Steps

View the properties of a storage pool
1.	On LON-SVR1, open Windows PowerShell.
2.	View the current storage configuration in Server Manager.
3.	Run the following commands:
a.	To return a list of storage pools with their current health and operational status, run the following command:
Get-StoragePool
b.	To return more information about StoragePool1, run the following command:
Get-StoragePool StoragePool1 | fl
c.	To return detailed information about your virtual disks, including provisioning type, parity layout, and health, run the following command:
Get-VirtualDisk | fl
d.	To return a list of physical disks that can be pooled, run the following command:
Get-PhysicalDisk | Where {$_.CanPool -eq $true}

Add physical disks to a storage pool
1.	Run the following commands:
a.	To create a new virtual disk in StoragePool1, run the following command:
New-VirtualDisk -StoragePoolFriendlyName StoragePool1 -FriendlyName Data -Size 2GB
You can see this new virtual disk in Server Manager.
b.	To add a list of physical disks that can be pooled to a variable, run the following command:
$canpool = Get-PhysicalDisk -CanPool $true
c.	To add the physical disks in the variable to StoragePool1, run the following command:
Add-PhysicalDisk -PhysicalDisks $canpool -StoragePoolFriendlyName StoragePool1
2.	View the additional physical disks in Server Manager.

Event logs and performance counters


With any storage technology, it is important that you monitor storage behavior and
function to ensure ongoing reliability, availability, and optimal performance.

Using the Event Log

When problems are identified in the storage architecture, Storage Spaces generates
errors, and then logs these errors to the Event Log. You can access these events by
using the Event Log tool, or by accessing the recorded errors by using Server Manager
or Windows PowerShell cmdlets. The following table identifies common Event IDs
associated with problematic storage.

Event ID 100
Message: Physical drive %1 failed to read the configuration or returned corrupt data for storage pool %2. As a result, the in-memory configuration might not be the most recent copy of the configuration. Return Code: %3.
Cause: A physical drive can fail to read the configuration or return corrupt data for a storage pool for the following reasons:
• The physical drive might fail requests with device I/O errors.
• The physical drive might contain corrupted storage pool configuration data.
• The physical drive might contain insufficient memory resources.

Event ID 102
Message: Majority of the physical drives of storage pool %1 failed a configuration update, which caused the pool to go into a failed state. Return Code: %2.
Cause: A write failure might occur when writing a storage pool configuration to physical drives for the following reasons:
• Physical drives might fail requests with device I/O errors.
• An insufficient number of physical drives are online and updated with their latest configurations.
• The physical drive might contain insufficient memory resources.

Event ID 103
Message: The capacity consumption of the storage pool %1 has exceeded the threshold limit set on the pool. Return Code: %2.
Cause: The capacity consumption of the storage pool has exceeded the threshold limit set on the pool.

Event ID 104
Message: The capacity consumption of the storage pool %1 is now below the threshold limit set on the pool. Return Code: %2.
Cause: The capacity consumption of the storage pool returns to a level that is below the threshold limit set on the pool.

Event ID 200
Message: Windows was unable to read the drive header for physical drive %1. If you know the drive is still usable, then resetting the drive health by using the command line or GUI might clear this failure condition and enable you to reassign the drive to its storage pool. Return Code: %2.
Cause: Windows was unable to read the drive header for a physical drive.

Event ID 201
Message: Physical drive %1 has invalid meta-data. Resetting the health status by using the command line or GUI might bring the physical drive to the primordial pool. Return Code: %2.
Cause: The metadata on a physical drive has become corrupt.

Event ID 202
Message: Physical drive %1 has invalid meta-data. Resetting the health status by using the command line or GUI might resolve the issue. Return Code: %2.
Cause: The metadata on a physical drive has become corrupt.

Event ID 203
Message: An I/O failure has occurred on Physical drive %1. Return Code: %2.
Cause: An I/O failure has occurred on a physical drive.

Event ID 300
Message: Physical drive %1 failed to read the configuration or returned corrupt data for storage space %2. As a result, the in-memory configuration might not be the most recent copy of the configuration. Return Code: %3.
Cause: A physical drive can fail to read the configuration or return corrupt data for the following reasons:
• The physical drive might fail requests with device I/O errors.
• The physical drive might contain corrupted storage space configuration data.
• The physical drive might contain insufficient memory resources.

Event ID 301
Message: All pool drives failed to read the configuration or returned corrupt data for storage space %1. As a result, the storage space will not attach. Return Code: %2.
Cause: You can experience all physical drives failing to read their configuration or returning corrupt data for storage spaces for the following reasons:
• Physical drives might fail requests with device I/O errors.
• Physical drives might contain corrupted storage pool configuration data.
• The physical drive might contain insufficient memory resources.

Event ID 302
Message: Majority of the pool drives hosting space meta-data for storage space %1 failed a space meta-data update, which caused the storage pool to go in failed state. Return Code: %2.
Cause: The majority of the pool drives hosting space metadata for a storage space can fail a metadata update for the following reasons:
• Physical drives might fail requests with device I/O errors.
• Insufficient number of physical drives have online storage space metadata.
• The physical drive might contain insufficient memory resources.

Event ID 303
Message: Drives hosting data for storage space have failed or are missing. As a result, no copy of data is available. Return Code: %2.
Cause: This event can occur if a drive in the storage pool fails or is removed.

Event ID 304
Message: One or more drives hosting data for storage space %1 have failed or are missing. As a result, at least one copy of data is not available. However, at least one copy of data is still available. Return Code: %2.
Cause: One or more drives hosting data for a storage space have failed or are missing. As a result, at least one copy of data is not available. However, at least one copy of data is still available.

Event ID 306
Message: The attempt to map or allocate more storage for the storage space %1 has failed. This is because there was a write failure involved in the updating the storage space metadata. Return Code: %2.
Cause: The attempt to map or allocate more storage for the storage space has failed. More physical drives are needed.

Event ID 307
Message: The attempt to unmap or trim the storage space %1 has failed. Return Code: %2.
Cause: The attempt to unmap or trim the listed storage space has failed.

Event ID 308
Message: The driver initiated a repair attempt for storage space %1. Return Code: %2.
Cause: The driver initiated a repair attempt for storage space. This is a normal condition. No further action is required.
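
You can also query these events from Windows PowerShell with Get-WinEvent. The following is a minimal sketch; the Storage Spaces driver log name used here is an assumption, so verify it on your system before relying on it:

# Log name is an assumption; confirm it with: Get-WinEvent -ListLog *StorageSpaces*
$log = 'Microsoft-Windows-StorageSpaces-Driver/Operational'

# Return the most recent Storage Spaces driver events, including the IDs described above
Get-WinEvent -LogName $log -MaxEvents 100 |
    Select-Object TimeCreated, Id, LevelDisplayName, Message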

Performance monitoring

Most decisions that you make about the configuration of your storage architecture affect its performance. This is also true when you use Storage Spaces to implement your storage architecture. Performance is always balanced against multiple other factors, including cost, reliability, availability, power, and ease of use.

There are multiple components that handle storage requests within your storage
architecture, including:

• File cache management.


• File system architecture.
• Volume management.
• Physical storage hardware.
• Storage Spaces configuration options.

You can use Windows PowerShell and Performance Monitor to monitor the
performance of your storage pools. If you want to use Windows PowerShell, you must
install the Storage Spaces Performance Analysis module for Windows PowerShell.

Note: To download the “Storage Spaces Performance Analysis module for Windows
PowerShell” module, go to: http://aka.ms/b1d52u

To use Windows PowerShell to generate and collect performance data, at a Windows PowerShell prompt, run the following cmdlet:

Measure-StorageSpacesPhysicalDiskPerformance -StorageSpaceFriendlyName StorageSpace1 -MaxNumberOfSamples 60 -SecondsBetweenSamples 2 -ReplaceExistingResultsFile -ResultsFilePath StorageSpace1.blg -SpacetoPDMappingPath PDMap.csv

This cmdlet:

• Monitors the performance of all physical disks associated with the storage space named StorageSpace1.
• Captures performance data for 60 seconds at two-second intervals.
• Replaces the results files if they already exist.
• Stores the performance log in the file named StorageSpace1.blg.
• Stores the physical disk mapping information in a file named PDMap.csv.

You can use Performance Monitor to view the data collected in the two files specified in the cmdlet above: StorageSpace1.blg and PDMap.csv.
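
If you prefer to stay in Windows PowerShell rather than open the .blg file in Performance Monitor, the following is a minimal sketch that loads the log produced above with the built-in Import-Counter cmdlet and averages each counter; the file name matches the example above:

# Load the binary performance log created by Measure-StorageSpacesPhysicalDiskPerformance
$samples = Import-Counter -Path .\StorageSpace1.blg

# Average each counter across all collected samples
$samples.CounterSamples |
    Group-Object -Property Path |
    ForEach-Object {
        [pscustomobject]@{
            Counter = $_.Name
            Average = ($_.Group | Measure-Object -Property CookedValue -Average).Average
        }
    }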

Lab A: Implementing Storage Spaces


Scenario

A. Datum corporation has purchased a number of hard disk drives and SSDs and you
have been tasked

with creating a storage solution that can utilize these new devices to the fullest. With
mixed requirements in A. Datum for data access and redundancy, you must ensure that
you have a redundancy solution for critical data that does not require fast disk read and
write access. You also must create a solution for data that does require fast read and
write access.

You decide to use Storage Spaces and storage tiering to meet the requirements.

Objectives

After completing this lab, you will be able to:

• Create a storage space.


• Enable and configure storage tiering.

Lab Setup

Estimated Time: 40 minutes

Virtual machines: 20740B-LON-DC1, 20740B-LON-SVR1

User name: Adatum\Administrator

Password: Pa55w.rd

For this lab, you need to use the available virtual machine environment. Before you begin the lab, you must complete the following steps:

1. On the host computer, start Hyper-V Manager.
2. In Hyper-V Manager, click 20740B-LON-DC1, and, in the Actions pane, click Start.
3. In the Actions pane, click Connect. Wait until the virtual machine starts.
4. Sign in using the following credentials:
   o User name: Administrator
   o Password: Pa55w.rd
   o Domain: Adatum
5. Repeat steps 2 through 4 for 20740B-LON-SVR1.

Exercise 1: Creating a Storage Space


Scenario

Your server does not have a hardware-based RAID card, but you have been asked to
configure redundant storage. To support this feature, you must create a storage pool.

After creating the storage pool, you must create a redundant virtual disk. Because the
data is critical, the request for redundant storage specifies that you must use a three-way
mirrored volume. Shortly after the volume is in use, a disk fails, and you have to replace
it by adding another disk to the storage pool.

The main tasks for this exercise are as follows:

1. Create a storage pool from six disks that are attached to the server.
2. Create a three-way mirrored virtual disk (need at least five physical disks).
3. Copy a file to the volume, and verify it is visible in File Explorer.
4. Remove a physical drive to simulate drive failure.
5. Verify that the file is still available.
6. Add a new disk to the storage pool and remove the broken disk.

Task 1: Create a storage pool from six disks that are attached to the server

1. On LON-SVR1, open Server Manager.
2. In the left pane, click File and Storage Services, and then, in the Servers pane, click Storage Pools.
3. Create a storage pool with the following settings:
   o Name: StoragePool1
   o Physical disks: first 6 disks

Task 2: Create a three-way mirrored virtual disk (need at least five physical disks)

1. On LON-SVR1, in Server Manager, in the VIRTUAL DISKS pane, create a virtual disk with the following settings:
   o Storage pool: StoragePool1
   o Name: Mirrored Disk
   o Storage layout: Mirror
   o Resiliency settings: Three-way mirror
   o Provisioning type: Thin
   o Virtual disk size: 10 GB

   Note: If the three-way resiliency setting is unavailable, proceed to the next step in the lab.

2. In the New Volume Wizard, create a volume with the following settings:
   o Virtual disk: Mirrored Disk
   o Drive letter: H
   o File system: ReFS
   o Volume label: Mirrored Volume

Task 3: Copy a file to the volume, and verify it is visible in File Explorer

1. On LON-SVR1, open Command Prompt.
2. Type the following command, and then press Enter:

   Copy C:\windows\system32\write.exe H:\

3. Open File Explorer from the taskbar, and then access Mirrored Volume (H:). You should see write.exe in the file list.

Task 4: Remove a physical drive to simulate drive failure

On the host computer, in Hyper-V Manager, in the Virtual Machines pane, change
the 20740B-LON-SVR1 settings to the following:

o Remove the hard drive that begins with 20740B-LON-SVR1-Disk1.

Task 5: Verify that the file is still available

1. Switch to LON-SVR1.
2. Open File Explorer, and then go to H:\.
3. Verify that write.exe is still available.
4. In Server Manager, in the STORAGE POOLS pane, on the menu bar, click Refresh "Storage Pools".

   Note: Notice the warning that is visible next to Mirrored Disk.

5. Open the Mirrored Disk Properties dialog box, and then access the Health pane.

   Note: Notice that the Health Status indicates a warning. The Operational Status should indicate one or more of the following: Incomplete, Unknown, or Degraded.

6. Close the Mirrored Disk Properties dialog box.
Task 6: Add a new disk to the storage pool and remove the broken disk

1. On LON-SVR1, in Server Manager, in the STORAGE POOLS pane, on the menu bar, click Refresh "Storage Pools".
2. In the STORAGE POOLS pane, right-click StoragePool1, click Add Physical Disk, and then add the first disk in the list.
3. To remove the disconnected disk, open Windows PowerShell, and then run the following commands:

   a. Get-PhysicalDisk

      Note: Note the FriendlyName for the disk that shows an OperationalStatus of Lost Communication. Use this disk name in the next command in place of diskname.

   b. $Disk = Get-PhysicalDisk -FriendlyName ‘diskname’

   c. Remove-PhysicalDisk -PhysicalDisks $disk -StoragePoolFriendlyName StoragePool1

4. In Server Manager, refresh the storage pools view to see the warnings disappear.

Results: After completing this exercise, you should have successfully created a storage pool and added six disks to it. Additionally, you should have created a three-way mirrored, thinly provisioned virtual disk from the storage pool. You also should have copied a file to the new volume and then verified that it is accessible. Next, after removing a physical drive, you should have verified that the virtual disk was still available and that you could access it. Finally, you should have added another physical disk to the storage pool.

Exercise 2: Enabling and configuring storage tiering


Scenario

Management wants you to implement storage tiers to take advantage of the high-
performance attributes of a number of SSDs, while utilizing less expensive hard disk
drives for less frequently accessed data.

The main tasks for this exercise are as follows:

1. Use the Get-PhysicalDisk cmdlet to view all available disks on the system.
2. Create a new storage pool.
3. View the media types.
4. Specify the media type for the sample disks and verify that the media type is changed.
5. Create pool-level storage tiers by using Windows PowerShell.
6. Create a new virtual disk with storage tiering by using the New Virtual Disk Wizard.
7. Prepare for the next lab.

Task 1: Use the Get-PhysicalDisk cmdlet to view all available disks on the system
On LON-SVR1, in Windows PowerShell (Admin), run the following command:

Get-PhysicalDisk

Task 2: Create a new storage pool

1. In Windows PowerShell, run the following commands:

   $canpool = Get-PhysicalDisk –CanPool $true

   New-StoragePool -FriendlyName "TieredStoragePool" -StorageSubsystemFriendlyName "Windows Storage*" -PhysicalDisks $canpool

2. Open File Explorer, and then run the D:\Labfiles\Mod04\mod4.ps1 script. This configures the disk names for the next part of the exercise.

Task 3: View the media types

• To verify the media types, on LON-SVR1, in Windows PowerShell, run the following command:

  Get-StoragePool –FriendlyName TieredStoragePool | Get-PhysicalDisk | Select FriendlyName, MediaType, Usage, BusType

Task 4: Specify the media type for the sample disks and verify that the media type is changed

1. To configure the media types, on LON-SVR1, in Windows PowerShell, run the following commands:

   Set-PhysicalDisk –FriendlyName PhysicalDisk1 –MediaType SSD
   Set-PhysicalDisk –FriendlyName PhysicalDisk2 –MediaType HDD

2. To verify the media types, run the following command:

   Get-PhysicalDisk | Select FriendlyName, MediaType, Usage, BusType

Task 5: Create pool-level storage tiers by using Windows PowerShell

• To create pool-level storage tiers, one for SSD media types and one for HDD media types, on LON-SVR1, in Windows PowerShell, run the following commands:

  New-StorageTier –StoragePoolFriendlyName TieredStoragePool -FriendlyName HDD_Tier –MediaType HDD

  New-StorageTier –StoragePoolFriendlyName TieredStoragePool -FriendlyName SSD_Tier –MediaType SSD

Task 6: Create a new virtual disk with storage tiering by using the New Virtual Disk Wizard

1. On LON-SVR1, in Server Manager, in Storage Pools, refresh the display.
2. In the VIRTUAL DISKS pane, create a virtual disk with the following settings:
   o Storage pool: TieredStoragePool
   o Name: TieredVirtDisk
   o Storage layout: Simple
   o Provisioning type: Fixed
   o Virtual disk size: 4 GB (2 GB on each physical disk)
3. In the New Volume Wizard, create a volume with the following settings:
   o Virtual disk: TieredVirtDisk
   o Drive letter: R
   o File system: ReFS
   o Volume label: Tiered Volume

   Note: If ReFS is not available from the file system drop-down menu, select NTFS.

4. In the properties of TieredVirtDisk, observe:
   o Storage tiers
   o Capacity
   o Allocated space
   o Used pool space
   o Storage layout
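
If you prefer Windows PowerShell to the New Virtual Disk Wizard used in this task, the following is a minimal sketch that creates a comparable tiered virtual disk from the tiers defined in Task 5; the 2 GB-per-tier sizes mirror the wizard settings above and are placeholders you can adjust:

# Retrieve the storage tiers created in Task 5
$ssdTier = Get-StorageTier -FriendlyName SSD_Tier
$hddTier = Get-StorageTier -FriendlyName HDD_Tier

# Create a simple-layout tiered virtual disk (tiered virtual disks are always fixed provisioned)
New-VirtualDisk -StoragePoolFriendlyName TieredStoragePool -FriendlyName TieredVirtDisk `
    -StorageTiers $ssdTier, $hddTier -StorageTierSizes 2GB, 2GB `
    -ResiliencySettingName Simple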

Task 7: Prepare for the next lab

• When you complete the lab, leave the virtual machines running for the next lab.

Results: After completing this exercise, you should have successfully enabled and
configured storage tiering.

Question: At a minimum, how many disks must you add to a storage pool to create a
three-way mirrored virtual disk?

Question: You have a USB-attached disk, four SAS disks, and one SATA disk that are
attached to a Windows Server 2012 server. You want to provide a single volume to your
users that they can use for file storage. What would you use?

Lesson 3: Implementing Data Deduplication
Data Deduplication is a role service of Windows Server 2016. This service identifies
and removes duplications within data without compromising data integrity. It does this
to achieve the ultimate goals of storing more data and using less physical disk space.
This lesson explains how to implement Data Deduplication in Windows Server 2016
storage.

Lesson Objectives
After completing this lesson, you will be able to:

• Describe Data Deduplication in Windows Server 2016.


• Identify Data Deduplication components in Windows Server 2016.
• Explain how to deploy Data Deduplication.
• Describe common usage scenarios for data deduplication.
• Explain how to monitor and maintain data deduplication.
• Describe backup and restore considerations with Data Deduplication.

What is Data Deduplication?

To cope with data storage growth in the enterprise, organizations are consolidating
servers and making capacity scaling and data optimization the key goals. Data
Deduplication provides practical ways to achieve these goals, including:
• Capacity optimization. Data Deduplication stores more data in less physical space. It achieves greater storage efficiency as compared to features such as Single Instance Store (SIS) or NTFS compression. Data Deduplication uses subfile variable-size chunking and compression, which deliver optimization ratios of 2:1 for general file servers and up to 20:1 for virtualization data.
• Scale and performance. Data Deduplication is highly scalable, resource efficient, and nonintrusive. While it can process up to 50 MB per second in Windows Server 2012 R2, and about 20 MB of data per second in Windows Server 2012, Windows Server 2016 is staged to perform significantly better through advancements in the deduplication processing pipeline. In this latest version of Windows Server, Data Deduplication can run multiple threads in parallel by using multiple I/O queues on multiple volumes simultaneously without affecting other workloads on the server. CPU throttling keeps the impact on server workloads and memory consumption low; if the server is very busy, deduplication can stop completely. In addition, you have the flexibility to run Data Deduplication jobs at any time, set schedules for when Data Deduplication should run, and establish file selection policies.
• Reliability and data integrity. When you apply Data Deduplication to a volume on a server, it maintains the integrity of the data. Data Deduplication uses checksum results, consistency, and identity validation to ensure data integrity. Data Deduplication maintains redundancy for all metadata and the most frequently referenced data to ensure that the data is repaired, or at least recoverable, in the event of data corruption.
• Bandwidth efficiency with BranchCache. Through integration with BranchCache, the same optimization techniques are applied to data transferred over the WAN to a branch office. The result is faster file download times and reduced bandwidth consumption.
• Optimization management with familiar tools. Data Deduplication has optimization functionality built into Server Manager and Windows PowerShell. Default settings can provide savings immediately, or you can fine-tune the settings to see more gains. By using Windows PowerShell cmdlets, you can start an optimization job or schedule one to run in the future. Installing the Data Deduplication feature and enabling deduplication on selected volumes can also be accomplished by using an Unattend.xml file that calls a Windows PowerShell script, and can be used with Sysprep to deploy deduplication when a system first boots.

The Data Deduplication process involves finding and removing duplication within data
without compromising its fidelity or integrity. The goal is to store more data in less
space by segmenting files into small variable-sized chunks (32–128 KB), identifying
duplicate chunks, and maintaining a single copy of each chunk.

After deduplication, files are no longer stored as independent streams of data, and they
are replaced with stubs that point to data blocks that are stored within a common chunk
store. Because these files share blocks, those blocks are only stored once, which reduces
the disk space needed to store all files. During file access, the correct blocks are
transparently assembled to serve the data without the application or the user having any
knowledge of the on-disk transformation to the file. This enables you to apply
deduplication to files without having to worry about any change in behavior to the
applications or impact to users who are accessing those files. Data Deduplication works
best in storage scenarios with large amounts of data that are not modified frequently.

Enhancements to the Data Deduplication role service

Windows Server 2016 includes several important improvements to the way Data
Deduplication worked in Windows Server 2012 R2 and Windows Server 2012,
including:

• Support for volume sizes up to 64 TB. Because Data Deduplication in Windows Server 2012 R2 does not perform well on volumes greater than 10 TB in size (or less for workloads with a high rate of data changes), the feature has been redesigned in Windows Server 2016. The deduplication processing pipeline is now multithreaded and able to utilize multiple CPUs per volume to increase optimization throughput rates on volume sizes up to 64 TB. (The 64 TB ceiling is a limitation of VSS, on which Data Deduplication is dependent.)
Support for file sizes up to 1 TB. In Windows Server 2012 R2, very large files are not
good candidates for Data Deduplication. However, with the use of the new stream
• map structures and other improvements to increase the optimization throughput and
access performance, deduplication in Windows Server 2016 performs well on files up
to 1 TB.
Simplified deduplication configuration for virtualized backup applications. Although
Windows Server 2012 R2 supports deduplication for virtualized backup applications,
it requires manually tuning the deduplication settings. In Windows Server 2016,

however, the configuration of deduplication for virtualized backup applications is
drastically simplified by a predefined usage-type option when enabling deduplication
for a volume.
• Support for Nano Server. Nano Server is a new deployment option in Windows Server 2016 that has a smaller system resource footprint, starts up significantly faster, and requires fewer updates and restarts than the Server Core deployment option for Windows Server. In addition, Nano Server fully supports Data Deduplication.
Support for cluster rolling upgrades. Windows servers in a failover cluster running
deduplication can include a mix of nodes running Windows Server 2012 R2 and
nodes running Windows Server 2016. This major enhancement provides full data
access to all of your deduplicated volumes during a cluster rolling upgrade. For
example, you can gradually upgrade each deduplication node in an existing Windows
Server 2012 R2 cluster to Windows Server 2016 without incurring downtime to

upgrade all the nodes at once.

Note: Although both the Windows Server versions of deduplication can access the
optimized data, the optimization jobs run only on the Windows Server 2012 R2
deduplication nodes and are blocked from running on the Windows Server 2016
deduplication nodes until the cluster rolling upgrade is complete.

Effectively, Data Deduplication in Windows Server 2016 allows you to store, transfer, and back up fewer bits more efficiently.

Volume requirements for Data Deduplication


After you install the role service, you can enable Data Deduplication on a per-volume
basis. Data Deduplication includes the following requirements:

• Volumes must not be a system or boot volume. Because most files used by an operating system are constantly open, Data Deduplication on system volumes would negatively affect performance, because deduplicated data would need to be expanded again before you could use the files.
• Volumes can be partitioned by using the master boot record (MBR) or GUID partition table (GPT) format, and must be formatted by using the NTFS or ReFS file system.
• Volumes must be attached to the Windows Server and cannot appear as removable drives. This means that you cannot use USB or floppy drives for Data Deduplication, nor use remotely-mapped drives.
• Volumes can be on shared storage, such as a Fibre Channel, iSCSI SAN, or SAS array.
• Files with extended attributes, encrypted files, files smaller than 32 KB, and reparse point files will not be processed for Data Deduplication.
• Data Deduplication is not available for Windows client operating systems.

Data Deduplication components

The Data Deduplication role service consists of several components. These components
include:
Filter driver. This component monitors local or remote I/O and handles the chunks of
• data on the file system by interacting with the various jobs. There is one filter driver
for every volume.
Deduplication service. This component manages the following job types:

Optimization. Consisting of multiple jobs, they perform both deduplication and


compression of files according to the data deduplication policy for the volume.
o
After initial optimization of a file, if the file is then modified and meets the data
deduplication policy threshold for optimization, the file will be optimized again.
Garbage Collection. Data Deduplication includes garbage collection jobs to
process deleted or modified data on the volume so that any data chunks no longer
referenced are cleaned up. This job processes previously deleted or logically
o overwritten optimized content to create usable volume free space. When an
optimized file is deleted or overwritten by new data, the old data in the chunk store
is not deleted right away. While garbage collection is scheduled to run weekly, you
might consider running garbage collection only after large deletions have occurred.
o Scrubbing. Data Deduplication has built-in data integrity features such as checksum validation and metadata consistency checking. It also has built-in redundancy for critical metadata and the most popular data chunks. As data is accessed or deduplication jobs process data, if these features encounter corruption, they record the corruption in a log file. Scrubbing jobs use these features to analyze the chunk store corruption logs and, when possible, to make repairs. Possible repair operations include using three sources of redundant data:
  ▪ Deduplication keeps backup copies of popular chunks when they are referenced over 100 times in an area called the hotspot. If the working copy is corrupted, deduplication uses its redundant copy in the case of soft corruptions such as bit flips or torn writes.
  ▪ If using mirrored Storage Spaces, deduplication can use the mirror image of the redundant chunk to serve the I/O and fix the corruption.
  ▪ If a file is processed with a chunk that is corrupted, the corrupted chunk is eliminated, and the new incoming chunk is used to fix the corruption.

Note: Because of the additional validations that are built into deduplication, the
deduplication subsystem is often the first system to report any early signs of
data corruption in the hardware or file system.
Unoptimization. This job undoes deduplication on all of the optimized files on the
volume. Some of the common scenarios for using this type of job include
decommissioning a server with volumes enabled for Data Deduplication,
troubleshooting issues with deduplicated data, or migration of data to another
system that doesn’t support Data Deduplication. Before you start this job, you
o should use the Disable-DedupVolume Windows PowerShell cmdlet to disable
further data deduplication activity on one or more volumes. After you disable Data
Deduplication, the volume remains in the deduplicated state, and the existing
deduplicated data remains accessible; however, the server stops running
optimization jobs for the volume, and it does not deduplicate the new data.
Afterwards, you would use the unoptimization job to undo the existing
deduplicated data on a volume. At the end of a successful unoptimization job, all
of the data deduplication metadata is deleted from the volume.

Note: You should be cautious when using the unoptimization job because all the
deduplicated data will return to the original logical file size. As such, you should
verify the volume has enough free space for this activity or move/delete some of
the data to allow the job to complete successfully.

Data Deduplication process

In Windows Server 2016, Data Deduplication transparently removes duplication


without changing access semantics. When you enable Data Deduplication on a volume,
a post-process, or target, deduplication is used to optimize the file data on the volume
by performing the following actions:

• Runs optimization jobs as background tasks with low priority on the server to process the files on the volume.
• Uses an algorithm to segment all file data on the volume into small, variable-sized chunks that range from 32 KB to 128 KB.
• Identifies chunks that have one or more duplicates on the volume.
• Inserts chunks into a common chunk store.
• Replaces all duplicate chunks with a reference, or stub, to a single copy of the chunk in the chunk store.
• Replaces the original files with reparse points, which contain references to their data chunks.
• Compresses chunks and organizes them in container files in the System Volume Information folder.
• Removes the primary data stream of the files.

The Data Deduplication process works through scheduled tasks on the local server, but
you can run the process interactively by using Windows PowerShell. More information
about this is discussed later in the module.

Data deduplication does not have any write-performance impact because the data is not
deduplicated while the file is being written. Windows Server 2016 uses post-process
deduplication, which ensures that the deduplication potential is maximized. Another
advantage with this type of deduplication process is that your application servers and
client computers offload all processing, which means less stress on the other resources
in your environment. There is, however, a small performance impact when reading
deduplicated files.

Note: The three main types of data deduplication are source, target (or post-process
deduplication), and in-line (or transit deduplication).

Data Deduplication potentially can process all of the data on a selected volume, except
for files that are less than 32 KB in size, and files in folders that are excluded. You must
carefully determine if a server and its attached volumes are suitable candidates for
deduplication prior to enabling the feature. You should also consider backing up
important data regularly during the deduplication process.

After you enable a volume for deduplication and the data is optimized, the volume
contains the following elements:

Unoptimized files. Includes files that do not meet the selected file-age policy setting,
• system state files, alternate data streams, encrypted files, files with extended
attributes, files smaller than 32 KB, or other reparse point files.
Optimized files. Includes files that are stored as reparse points that contain pointers to
• a map of the respective chunks in the chunk store that are needed to restore the file
when it is requested.
• Chunk store. Location for the optimized file data.
Additional free space. The optimized files and chunk store occupy much less space

than they did prior to optimization.

Deploying Data Deduplication

Planning a Data Deduplication deployment

Prior to installing and configuring Data Deduplication in your environment, you must
plan your deployment using the following steps:
Target deployments. Data Deduplication is designed to be applied on primary – and
not to logically extended – data volumes without adding any additional dedicated
hardware.

You can schedule deduplication based on the type of data that is involved and the
frequency and volume of changes that occur to the volume or particular file types.
You should consider using deduplication for the following data types:

• o General file shares. Group content publication and sharing, user home folders, and
Folder Redirection/Offline Files.
o Software deployment shares. Software binaries, images, and updates.
VHD libraries. Virtual hard disk (VHD) file storage for provisioning to
o
hypervisors.
VDI deployments. Virtual Desktop Infrastructure (VDI) deployments using Hyper-
o
V.
Virtualized backup. Backup applications running as Hyper-V guests saving backup
o
data to mounted VHDs.
Determine which volumes are candidates for deduplication. Deduplication can be
very effective for optimizing storage and reducing the amount of disk space
consumed – saving you 50 to 90 percent of your system’s storage space when applied
to the right data. Use the following considerations to evaluate which volumes are
ideal candidates for deduplication:

Is duplicate data present?

File shares or servers which host user documents, software deployment binaries, or
o virtual hard disk files tend to have plenty of duplication, and yield higher storage
savings from deduplication. More information on the deployment candidates for
deduplication and the supported/unsupported scenarios are discussed later in this
module.
Does the data access pattern allow for sufficient time for deduplication?

For example, files that frequently change and are often accessed by users or
applications are not good candidates for deduplication. In these scenarios,
o
deduplication might not be able to process the files, as the constant access and
change to the data are likely to cancel any optimization gains made by
deduplication. On the other hand, good candidates allow time for deduplication of
the files.
Does the server have sufficient resources and time to run deduplication?

Deduplication requires reading, processing, and writing large amounts of data,


o which consumes server resources. Servers typically have periods of high activity
and times when there is low resource utilization; the deduplication jobs work more
efficiently when resources are available. However, if a server is constantly at
maximum resource capacity, it might not be an ideal candidate for deduplication.
Evaluate savings with the Deduplication Evaluation Tool. You can use the

Deduplication Evaluation Tool, DDPEval.exe, to determine the expected savings that
you would get if you enable deduplication on a particular volume. DDPEval.exe
supports evaluating local drives and mapped or unmapped remote shares.

Note: When you install the deduplication feature, the Deduplication Evaluation Tool
(DDPEval.exe) is automatically installed to the \Windows\System32\ directory.
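
For example, you might run the tool against a local volume or a remote share as in the following sketch; the E: volume and the share path are placeholders:

# Estimate deduplication savings for a local volume
DDPEval.exe E:\

# Estimate deduplication savings for a remote share (placeholder path)
DDPEval.exe \\FileServer01\Data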

For more information, refer to “Plan to Deploy Data Deduplication” at:


http://aka.ms/sxzd2l

Plan the rollout, scalability, and deduplication policies. The default deduplication
policy settings are usually sufficient for most environments. However, if your
deployment has any of the following conditions, you might consider altering the
default settings:

Incoming data is static or expected to be read-only, and you want to process files
o on the volume sooner. In this scenario, change the MinimumFileAgeDays setting
• to a smaller number of days to process files earlier.
You have directories that you do not want to deduplicate. Add a directory to the
o
exclusion list.
You have file types that you do not want to deduplicate. Add a file type to the
o
exclusion list.
The server has different off-peak hours than the default and you want to change the
o Garbage Collection and Scrubbing schedules. Update the schedules using
Windows PowerShell.

Installing and configuring Data Deduplication

After completing your planning, you need to use the following steps to deploy Data
Deduplication to a server in your environment:

• Install Data Deduplication components on the server. Use one of the following options to install the deduplication components on the server:

  o Server Manager. In Server Manager, you can install Data Deduplication by navigating to the Add Roles and Features Wizard > under Server Roles > select File and Storage Services > select the File Services check box > select the Data Deduplication check box > click Install.

  o Windows PowerShell. You can use the following commands to install Data Deduplication:

    Import-Module ServerManager
    Add-WindowsFeature -Name FS-Data-Deduplication
    Import-Module Deduplication
• Enable Data Deduplication. Use one of the following options to enable Data Deduplication on the server:

  o Server Manager. From the Server Manager dashboard:

    i. Right-click a data volume and select Configure Data Deduplication.
    ii. In the Data deduplication box, select the workload you want to host on the volume. For example, select General purpose file server for general data files, or Virtual Desktop Infrastructure (VDI) server when configuring storage for running virtual machines.
    iii. Enter the minimum number of days that should elapse from the date of file creation before files are deduplicated, enter the extensions of any file types that should not be deduplicated, and then click Add to browse to any folders with files that should not be deduplicated.
    iv. Click Apply to apply these settings and return to the Server Manager dashboard, or click the Set Deduplication Schedule button to continue to set up a schedule for deduplication.

  o Windows PowerShell. Use the following command to enable deduplication on a volume:

    Enable-DedupVolume –Volume VolumeLetter –UsageType StorageType

    Note: Replace VolumeLetter with the drive letter of the volume. Replace StorageType with the value corresponding to the expected type of workload for the volume. Acceptable values include:

    • HyperV. A volume for Hyper-V storage.
    • Backup. A volume that is optimized for virtualized backup servers.
    • Default. A general purpose volume.

Optionally, you can use the Windows PowerShell cmdlet Set-DedupVolume to


configure additional options, such as the minimum number of days that should elapse
from the date of file creation before files are deduplicated, the extensions of any file
types that should not be deduplicated, or the folders that should be excluded from
deduplication.
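
For example, the following is a minimal sketch that enables deduplication on a general purpose volume and then adjusts a few of these settings; the drive letter, file age, excluded folder, and excluded file types are placeholders:

# Enable deduplication on volume E: for a general purpose file server workload
Enable-DedupVolume -Volume E: -UsageType Default

# Process files older than three days, and exclude a folder and two file types (placeholders)
Set-DedupVolume -Volume E: -MinimumFileAgeDays 3 -ExcludeFolder "E:\Scratch" -ExcludeFileType "tmp","log"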

• Configure Data Deduplication jobs. You can run Data Deduplication jobs on demand or on a schedule. The following list describes the types of jobs that you can perform on a volume:

  o Optimization. Includes built-in jobs which are scheduled automatically for optimizing the volumes on a periodic basis. Optimization jobs deduplicate data and compress file chunks on a volume per the policy settings. You can also use the following command to trigger an optimization job on demand:

    Start-DedupJob –Volume VolumeLetter –Type Optimization

  o Data Scrubbing. Scrubbing jobs are scheduled automatically to analyze the volume on a weekly basis and produce a summary report in the Windows event log. You can also use the following command to trigger a scrubbing job on demand:

    Start-DedupJob –Volume VolumeLetter –Type Scrubbing

  o Garbage Collection. Garbage collection jobs are scheduled automatically to process data on the volume on a weekly basis. Because garbage collection is a processing-intensive operation, you might consider waiting until the deletion load reaches a threshold to run this job on demand, or scheduling the job for after hours. You can also use the following command to trigger a garbage collection job on demand:

    Start-DedupJob –Volume VolumeLetter –Type GarbageCollection

  o Unoptimization. Unoptimization jobs are available on an as-needed basis and are not scheduled automatically. However, you can use the following command to trigger an unoptimization job on demand:

    Start-DedupJob –Volume VolumeLetter –Type Unoptimization

Note: For more information, refer to “Set-DedupVolume” at:


http://aka.ms/o30xqw
• Configure Data Deduplication schedules. When you enable Data Deduplication on a server, three schedules are enabled by default: Optimization is scheduled to run every hour, and Garbage Collection and Scrubbing are scheduled to run once a week. You can view the schedules by using the Windows PowerShell cmdlet Get-DedupSchedule. These scheduled jobs run on all the volumes on the server. However, if you want to run a job only on a particular volume, you must create a new job. You can create, modify, or delete job schedules from the Deduplication Settings page in Server Manager, or by using the Windows PowerShell cmdlets New-DedupSchedule, Set-DedupSchedule, or Remove-DedupSchedule (see the sketch after the following note).

Note: Data Deduplication jobs only support, at most, weekly job schedules. If you need to create a schedule for a monthly job or for any other custom time period, use Windows Task Scheduler. However, you will be unable to view custom job schedules created with Windows Task Scheduler by using the Windows PowerShell cmdlet Get-DedupSchedule.
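
As a minimal sketch of the scheduling cmdlets, the following creates an additional optimization window on weeknights and then lists the schedules that are defined; the name, start time, duration, and days are placeholders:

# Create an optimization job that runs on weeknights at 23:00 for up to 6 hours
New-DedupSchedule -Name "NightlyOptimization" -Type Optimization `
    -Start "23:00" -DurationHours 6 -Days Monday,Tuesday,Wednesday,Thursday,Friday

# Review all deduplication schedules defined on the server
Get-DedupSchedule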

Demonstration: Implementing Data Deduplication


In this demonstration, you will see how to:

• Install the Data Deduplication role service.


• Enable Data Deduplication.
• Check the status of Data Deduplication.

Demonstration Steps

Install the Data Deduplication role service

• On LON-SVR1, in Server Manager, add the Data Deduplication role service.

Enable Data Deduplication

1. Open File Explorer and observe the available volumes and free space.
2. Return to File and Storage Services.
3. Click Disks.
4. Click the 1 disk, and then click the D volume.
5. Enable Data Deduplication, and then click the General purpose file server setting.
6. Configure the following settings:
   a. Deduplicate files older than (in days): 1
   b. Enable throughput optimization
   c. Exclude: D:\shares

Check the status of Data Deduplication

1. Switch to Windows PowerShell.
2. Execute the following commands to verify Data Deduplication status:
   a. Get-DedupStatus
   b. Get-DedupStatus | fl
   c. Get-DedupVolume
   d. Get-DedupVolume | fl
   e. Start-DedupJob D: -Type Optimization –Memory 50
3. Repeat commands 2a and 2c.

   Note: Because most of the files on drive D are small, you may not notice a significant amount of saved space.

4. Close all open windows.

Usage scenarios for Data Deduplication


The following table highlights typical deduplication savings for various content types.

Your data storage savings will vary by data type, the mix of data, and the size of the
volume and the files that the volume contains. You should consider using the
Deduplication Evaluation Tool to evaluate the volumes before you enable
deduplication.

• User documents. This includes group content publication or sharing, user home folders (or MyDocs), and profile redirection for accessing offline files. Applying Data Deduplication to these shares might save you up to 30 to 50 percent of your system's storage space.
• Software deployment shares. This includes software binaries, cab files, symbols files, images, and updates. Applying Data Deduplication to these shares might be able to save you up to 70 to 80 percent of your system's storage space.
• Virtualization libraries. This includes virtual hard disk files (i.e., .vhd and .vhdx files) storage for provisioning to hypervisors. Applying Data Deduplication to these libraries might be able to save you up to 80 to 95 percent of your system's storage space.
• General file share. This includes a mix of all the types of data identified above. Applying Data Deduplication to these shares might save you up to 50 to 60 percent of your system's storage space.

Data Deduplication deployment candidates


Based on observed savings and typical resource usage in Windows Server 2016,
deployment candidates for deduplication are ranked as follows:

• Ideal candidates for deduplication:

  o Folder redirection servers
  o Virtualization depot or provisioning library
  o Software deployment shares
  o SQL Server and Exchange Server backup volumes
  o Scale-out File Server (SoFS) CSVs
  o Virtualized backup VHDs (e.g., DPM)
  o VDI VHDs (only personal VDIs)

  Note: In most VDI deployments, special planning is required for the boot storm, which is the name given to the phenomenon of large numbers of users trying to simultaneously log in to their VDI, typically upon arriving to work in the morning. In turn, this hammers the VDI storage system and can cause long delays for VDI users. However, in Windows Server 2016, when chunks are read from the on-disk deduplication store during startup of a virtual machine, they are cached in memory. As a result, subsequent reads don't require frequent access to the chunk store because the cache intercepts them; the effects of the boot storm are minimized because memory is much faster than disk.

• Should be evaluated based on content:

  o Line-of-business servers
  o Static content providers
  o Web servers
  o High-performance computing (HPC)

• Not ideal candidates for deduplication:

  o Hyper-V hosts
  o WSUS
  o SQL Server and Exchange Server database volumes

Data Deduplication interoperability

In Windows Server 2016, you should consider the following related technologies and
potential issues when deploying Data Deduplication:

BranchCache. Access to data over the network can be optimized by enabling


BranchCache on Windows servers and clients. When a BranchCache-enabled system
communicates over a WAN with a remote file server that is enabled for Data
• Deduplication, all of the deduplicated files are already indexed and hashed, so
requests for data from a branch office are quickly computed. This is similar to
preindexing or prehashing a BranchCache-enabled server.
Note: BranchCache is a feature which can reduce wide area network (WAN)
utilization and enhance network application responsiveness when users access
content in a central office from branch office locations. When you enable
BranchCache, a copy of the content that is retrieved from the web server or file server
is cached within the branch office. If another client in the branch requests the same
content, the client can download it directly from the local branch network without
needing to retrieve the content by using the WAN.
Failover Clusters. Windows Server 2016 fully supports failover clusters, which
means deduplicated volumes will failover gracefully between nodes in the cluster.
Effectively, a deduplicated volume is a self-contained and portable unit (i.e., all of the
data and configuration information that the volume contains) but requires that each
• node in the cluster that accesses deduplicated volumes must be running the Data
Deduplication feature. When a cluster is formed, the Deduplication schedule
information is configured in the cluster. As a result, if a deduplicated volume is taken
over by another node, the scheduled jobs will be applied on the next scheduled
interval by the new node.
FSRM quotas. Although you should not create a hard quota on a volume root folder
enabled for deduplication, using File Server Resource Manager (FSRM), you can
create a soft quota on a volume root which is enabled for deduplication. When FSRM
encounters a deduplicated file, it will identify the file’s logical size for quota
calculations. Consequently, quota usage (including any quota thresholds) does not
change when deduplication processes a file. All other FSRM quota functionality,
including volume-root soft quotas and quotas on subfolders, will work as expected
when using deduplication.

Note: File Server Resource Manager (FSRM) is a suite of tools for Windows Server
2016 that allows you to identify, control, and manage the quantity and type of data
stored on your servers. FSRM enables you to configure hard or soft quotas on folders
and volumes. A hard quota prevents users from saving files after the quota limit is
reached; whereas, a soft quota does not enforce the quota limit, but generates a
notification when the data on the volume reaches a threshold. When a hard quota is
enabled on a volume root folder enabled for deduplication, the actual free space on
the volume and the quota restricted space on the volume are not the same; this might
cause deduplication optimization jobs to fail.
DFS Replication. Data Deduplication is compatible with Distributed File System
(DFS) Replication. Optimizing or unoptimizing a file will not trigger a replication
because the file does not change. DFS Replication uses Remote Differential
Compression (RDC), not the chunks in the chunk store, for over-the-wire savings. In
fact, you can optimize the files on the replica instance by using deduplication if the

replica is enabled for Data Deduplication.

Note: Single Instance Storage (SIS), a file system filter driver used for NTFS file
deduplication, was deprecated in Windows Server 2012 R2 and completely removed
in Windows Server 2016.

Monitoring and maintaining Data Deduplication


After you deploy Data Deduplication in your environment, it is important that you
monitor and maintain the systems that are enabled for Data Deduplication and the
corresponding data storage to ensure optimal performance. While Data Deduplication in
Windows Server 2016 includes a lot of automation, including optimization jobs, the
deduplication process requires that you verify the efficiency of optimization; make the
appropriate adjustments to systems, storage architecture, and volumes; and troubleshoot
any issues with Data Deduplication.

Monitoring and reporting of Data Deduplication

When planning for Data Deduplication in your environment, you will inevitably ask
yourself, “What size should my configured deduplicated volumes be?” Although
Windows Server 2016 supports Data Deduplication on volumes up to 64 TB, you must
assess the appropriate size of the deduplicated volumes that your environment can
support. For many, the answer to this question is that it depends on your hardware
specifications and your unique workload. More specifically, it depends primarily on
how much and how frequently the data on the volume changes and the data access
throughput rates of the disk storage subsystem.

Monitoring the efficiency of Data Deduplication in your environment is instrumental in


every phase of your deployment, especially during your planning phase. As detailed
earlier in the module, Data Deduplication in Windows Server 2016 performs intensive
I/O and compute operations. In most deployments, deduplication operates in the
background or on a daily schedule on each day’s new or modified data (i.e., data churn);
as long as deduplication is able to optimize all of the data churn on a daily basis, the
volume size will work for deduplication. On the other hand, some organizations simply
create a 64 TB volume, enable deduplication, and then wonder why they experience low
optimization rates. Most likely in this scenario, deduplication is not able to keep up with
the incoming churn from a dataset that is too large on a configured volume. Although
Data Deduplication in Windows Server 2016 runs multiple threads in parallel using
multiple I/O queues on multiple volumes simultaneously, the deduplication
environment might require additional computing power.

You should consider the following when estimating the size of your volumes enabled
for Data Deduplication:

• Deduplication optimization must be able to keep up with the daily data churn.
• The total amount of churn scales with the size of the volume.
The speed of deduplication optimization significantly depends on the data access

throughput rates of the disk storage subsystem.

Therefore, to estimate the maximum size for a deduplicated volume, you should be
familiar with the size of the data churn and the speed of optimization processing on your
volumes. You can choose to use reference data, such as server hardware specifications,
storage drive/array speed, and deduplication speed of various usage types, for your
estimations. However, the most accurate method of assessing the appropriate volume
size is to perform the measurements directly on your deduplication system based on the
representative samples of your data, such as data churn and deduplication processing
speed.

You should consider using the following options to monitor deduplication in your
environment and to report on its health:

Windows PowerShell cmdlets. After you enable the Data Deduplication feature on a
server, you can use the following Windows PowerShell cmdlets:

Get-DedupStatus. The most commonly used cmdlet, this cmdlet returns the
deduplication status for volumes which have data deduplication metadata, which
o
includes the deduplication rate, the number/sizes of optimized files, the last run-
time of the deduplication jobs, and the amount of space saved on the volume.
Get-DedupVolume. This cmdlet returns the deduplication status for volumes that
have data deduplication metadata. The metadata includes the deduplication rate,
o the number/sizes of optimized files, and deduplication settings such as minimum
• file age, minimum file size, excluded files/folders, compression-excluded file
types, and the chunk redundancy threshold.
Get-DedupMetadata. This cmdlet returns status information of the deduplicated
data store for volumes that have data deduplication metadata, which includes the
number of:

o ▪ Data chunks in a container.


▪ Containers in the data store.
▪ Data streams in a container.
▪ Containers in the stream map store.
▪ Hotspots in a container.
▪ Hotspots in the stream map store.
▪ Corruptions on the volume.
  o Get-DedupJob. This cmdlet returns the deduplication status and information for currently running or queued deduplication jobs.

  One common scenario is to assess whether deduplication is keeping pace with the rate of incoming data. You can use the Get-DedupStatus cmdlet to monitor the number of optimized files compared with the number of in-policy files (see the sketch after this list). This enables you to see if all the in-policy files are processed. If the number of in-policy files is continuously rising faster than the number of optimized files, you should examine your hardware specifications for appropriate utilization, or examine the type of data on the volume against its usage type, to ensure deduplication efficiency. However, if the output value from the cmdlet for LastOptimizationResult is 0x00000000, the entire dataset was processed successfully during the previous optimization job.

Note: For more information, refer to: “Storage Cmdlets in Windows PowerShell”
at: http://aka.ms/po9qve
Event Viewer logs. Monitoring the event log can also be helpful to understand
deduplication events and status. To view deduplication events, in Event Viewer,
• navigate to Applications and Services Logs, click Microsoft, click Windows, and
then click Deduplication. For example, Event ID 6153 will provide you with the
elapsed time of a deduplication job and the throughput rate.
Performance Monitor data. In addition to using the counters for monitoring server
performance, such as CPU and memory, you can use the typical disk counters to
monitor the throughput rates of the jobs that are currently running, such as: Disk
Read Bytes/sec, Disk Write Bytes/sec, and Average Disk sec/Transfer. Depending on
other activities on the server, you might be able to use the data results from these
counters to get a rough estimate of the saving ratio by examining how much data is
being read and how much is being written per interval. You can also use the Resource

Monitor to identify the resource usage of specific programs/services. To view disk
activity, in Windows Resource Monitor, filter the list of processes to locate
fsdmhost.exe and examine the I/O on the files under the Disk tab.

Note: Fsdmhost.exe is the executable file for the Microsoft File Server Data
Management Host process, which is used by the Data Deduplication process in
Windows Server 2016.
File Explorer. While not the ideal choice for validating deduplication on an entire
volume, you can use File Explorer to spot check deduplication on individual files. In
viewing the properties of a file, you notice that Size displays the logical size of the
• file, and Size on Disk displays the true physical allocation of the file. For an
optimized file, Size on Disk is less than the actual file size. This is because
deduplication moves the contents of the file to a common chunk store and replaces
the original file with an NTFS reparse point stub and metadata.
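
The following is a minimal sketch of the comparison described above; it lists the in-policy and optimized file counts, the space saved, and the result of the last optimization job for each deduplicated volume:

# Compare in-policy files with optimized files and check the last optimization result per volume
Get-DedupStatus |
    Select-Object Volume, InPolicyFilesCount, OptimizedFilesCount, SavedSpace,
        LastOptimizationTime, LastOptimizationResult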

Maintaining Data Deduplication


With the data that is collected by monitoring, you can use the following Windows
PowerShell cmdlets to ensure optimal efficiency of deduplication in your environment.

Update-DedupStatus. Some of the storage cmdlets, such as Get-DedupStatus and


Get-DedupVolume, retrieve information from the cached metadata. This cmdlet

scans volumes to compute new Data Deduplication information for updating the
metadata.
Start-DedupJob. This cmdlet is used to launch ad hoc deduplication jobs, such as
optimization, garbage collection, scrubbing, and unoptimization. For example, you

might consider launching an ad hoc optimization job if a deduplicated volume is low
on available space because of extra churn.
Measure-DedupFileMetadata. This cmdlet is used to measure potential disk space
on a volume. More specifically, this cmdlet returns how much disk space you can
reclaim on a volume if you delete a group of folders and subsequently run a garbage

collection job. Files often have chunks that are shared across other folders. The
deduplication engine calculates which chunks are unique and would be deleted after
the garbage collection job.
Expand-DedupFile. This cmdlet expands an optimized file into its original location.
You might need to expand optimized files because of compatibility with applications

or other requirements. Ensure there is enough space on the volume to store the
expanded file.
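
For example, the last two cmdlets might be used as in the following sketch; the folder and file paths are placeholders:

# Estimate how much space would be reclaimed if this folder were deleted and garbage collection ran
Measure-DedupFileMetadata -Path "D:\Shares\ProjectArchive"

# Expand a single optimized file back to a normal file (check free space on the volume first)
Expand-DedupFile -Path "D:\Shares\App\Database.mdb"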

Troubleshooting adverse effects of Data Deduplication

When Data Deduplication in Windows Server 2016 adversely impacts an application or
access to a file, several options are available, including:

• Use a different deduplication frequency by changing the schedule or opting for
manual deduplication jobs.
• Use job options such as the following (a usage sketch follows this list):
o StopWhenSystemBusy, which halts deduplication if the job interferes with the
server's workload.
o Preempt, which causes the deduplication engine to move specific deduplication
jobs to the top of the job queue and cancel the current job.
o ThrottleLimit, which sets the maximum number of concurrent operations that
specific deduplication jobs can establish.
o Priority, which sets the CPU and I/O priority for specific deduplication jobs.
o Memory, which specifies the maximum percentage of physical computer memory
that the Data Deduplication job can use.

Note: While allowing deduplication to manage memory allocation automatically is
recommended, you might need to adjust the maximum percentage in some
scenarios. For most of these scenarios, you should consider a maximum percentage
within a range of 15 to 50, and a higher memory consumption for jobs that you
schedule to run when you specify the StopWhenSystemBusy parameter. For
garbage collection and scrubbing deduplication jobs, which you typically schedule
to run after business hours, you can consider using a higher memory consumption,
such as 50.
• Use the Expand-DedupFile cmdlet to expand, or undeduplicate, specific files if
needed for compatibility or performance.
• Use the Start-DedupJob cmdlet with the Unoptimization job type to disable
deduplication on a volume.
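
For example, the following is a minimal sketch of how some of these job options might be
applied from Windows PowerShell, assuming a deduplicated volume D:; the specific values
are illustrative and not tuning recommendations.

# Run an ad hoc optimization job that yields when the server is busy,
# caps memory usage at 30 percent, and runs at low priority (values illustrative).
Start-DedupJob -Volume D: -Type Optimization -StopWhenSystemBusy -Memory 30 -Priority Low

# If deduplication must be removed from the volume entirely, run an
# Unoptimization job. Ensure that the volume has enough free space to hold
# the rehydrated files before starting this job.
Start-DedupJob -Volume D: -Type Unoptimization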

Troubleshooting Data Deduplication corruptions

Data Deduplication in Windows Server 2016 provides functionality to detect, report,
and even repair data corruptions. Data integrity is especially important with
deduplication, because a large number of deduplicated files might reference a
single popular chunk that becomes corrupted. While there are a number of features built
into deduplication to help protect against corruption, there are still some scenarios
in which deduplication might not recover from corruption automatically.

Additional Reading: For more information, refer to: “Troubleshooting Data
Deduplication Corruptions” at: http://aka.ms/Tdz13m

Some of the most common causes for deduplication to report corruption are:

• Incompatible Robocopy options used when copying data. Using Robocopy with the
/MIR option against the volume root as the target wipes the deduplication store. To
avoid this problem, use the /XD option to exclude the System Volume Information
folder from the scope of the Robocopy command (see the example after these
troubleshooting steps).

Note: For more information, refer to: “FSRM and Data Deduplication may be
adversely affected when you use Robocopy /MIR in Windows Server 2012” at:
http://aka.ms/W0ux7m
• Incompatible backup/restore program used on a deduplicated volume. You should
verify whether your backup solution supports Data Deduplication in Windows Server
2016, because unsupported backup solutions might introduce corruptions after a
restore. More information about this is covered later in this module.
• Migrating a deduplicated volume to a down-level Windows Server version. File
corruption messages might be reported for files that are accessed from a deduplicated
volume mounted on an older version of Windows Server but that were optimized on a
later version of the operating system. In this scenario, you should verify that the
version of the server accessing the deduplicated data is the same version level or
higher than the version of the server that optimized the data on the volume. Although
deduplicated volumes can be remounted on different servers, deduplication is
backward compatible but not forward compatible; you can upgrade and migrate to a
newer version of Windows Server, but data deduplicated by a newer version of
Windows Server cannot be read on older versions of Windows Server and might be
reported as corrupted when read.
• Enabling compression on the root of a volume that also has deduplication enabled.
Deduplication is not supported on volumes that have compression enabled at the root.
As a result, this might lead to the corruption and inaccessibility of deduplicated files.
Note: Deduplication of files in compressed folders is supported in Windows Server
2016 and should function normally.
• Hardware issues. Many hardware storage issues are detectable early by using the
deduplication scrubbing job. Refer to the general corruption troubleshooting steps
below for more information.
• General corruption. You can use the steps below to troubleshoot most general causes
for deduplication to report corruption:

a. Check the event logs for details of corruption. Check the deduplication
Scrubbing event logs for cases of early file corruption and attempted corruption
fixes by the scrubbing job. Any corruption detected by deduplication is logged to
the event log. The Scrubbing channel lists any corruptions that were detected and
the files that the job attempted to fix. The deduplication Scrubbing
event logs are located in Event Viewer (under Applications and Services Logs >
Microsoft > Windows > Deduplication > Scrubbing). In addition, searching for
hardware events in the System event logs and Storage Spaces event logs will
often yield additional information about hardware issues.

Note: The potentially large number of events in the deduplication Scrubbing
event log might be difficult to parse through Event Viewer. A publicly
available script generates an easy-to-read HTML report that highlights detected
corruptions and the results of any attempted corruption fixes from the Scrubbing
job. For more information, refer to: “Generate Deduplication Scrubbing Report”
at: http://aka.ms/N75avw

b. Run CHKDSK in read-only mode. While this command can repair some data
corruption on volumes, running the command without any parameters initiates
a read-only scan.

Additional Reading: For more information, refer to: “CHKDSK” at:
http://aka.ms/Nep9wf
c. Run a deep Scrubbing job to repair detected corruptions. A must for corruption
investigations, a deep Scrubbing job should be used to ensure that all corruptions
are logged in the deduplication Scrubbing channel in the event logs. The
scrubbing events provide a breakdown of the corruptions, including corrupted
chunks, the exact container offsets of the corruption, and the list of affected files
(up to 10,000 files).
You can use the following command in Windows PowerShell to initiate a deep
Scrubbing job:

Start-DedupJob VolumeLetter -Type Scrubbing -Full

Note: Replace VolumeLetter with the drive letter of the volume.
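
The following is a minimal sketch of the Robocopy exclusion referenced earlier, together
with a read-only CHKDSK scan; the source and target volumes (D: and E:) are illustrative
assumptions.

# Mirror D:\ to E:\, excluding the System Volume Information folders so that
# the deduplication chunk store on the target is not wiped (paths illustrative).
Robocopy.exe D:\ E:\ /MIR /XD "D:\System Volume Information" "E:\System Volume Information"

# Run CHKDSK without parameters for a read-only scan of the deduplicated volume.
chkdsk.exe D: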

Backup and restore considerations with Data Deduplication
One of the benefits of using Data Deduplication is that backup and restore operations
are faster. This is because you have reduced the space used on a volume, meaning there
is less data to back up.

When you perform an optimized backup, your backup is also smaller. This is because
the total size of the optimized files, non-optimized files, and Data Deduplication chunk
store files is much smaller than the logical size of the volume.

Note: Many block-based backup systems should work with data deduplication,
maintaining the optimization on the backup media. File-based backup operations that do
not use deduplication usually copy the files in their original format.

The following backup and restore scenarios are supported with deduplication in
Windows Server 2016:

• Individual file backup/restore
• Full volume backup/restore
• Optimized file-level backup/restore using the VSS writer

On the other hand, the following backup and restore scenarios are not supported with
deduplication in Windows Server 2016:

• Backup or restore of only the reparse points
• Backup or restore of only the chunk store
In addition, a backup application can perform an incrementally optimized backup as
follows:

• Back up only the files created, modified, or deleted since your last backup.
• Back up the changed chunk store container files.
• Perform an incremental backup at the sub-file level.

Note: New chunks are appended to the current chunk store container. When its size
reaches approximately 1 GB, that container file is sealed and a new container file is
created.

Restore operations

Restore operations can also benefit from Data Deduplication. Any file-level, full-volume
restore operation can benefit because it is essentially the reverse of the backup
procedure, and less data means quicker operations. A full-volume restore proceeds as
follows:

1. The complete set of Data Deduplication metadata and container files is restored.
2. The complete set of Data Deduplication reparse points is restored.
3. All non-deduplicated files are restored.

A block-level restore from an optimized backup is automatically an optimized restore,
because the restore process occurs below Data Deduplication, which works at the file
level.

As with any product from a third-party vendor, you should verify whether the backup
solution supports Data Deduplication in Windows Server 2016, because unsupported backup
solutions might introduce corruptions after a restore. The following are common
approaches among backup solutions that support Data Deduplication in Windows Server 2016:

• Some backup vendors support unoptimized backup, which rehydrates the
deduplicated files upon backup; that is, it backs up the files as normal, full-size files.
• Some backup vendors support optimized backup for a full volume backup, which
backs up the deduplicated files as-is; that is, as reparse point stubs together with the
chunk store.
• Some backup vendors support both.

The backup vendor should be able to confirm which methods their product supports,
and in which versions.

Note: For more information, refer to: “Backup and Restore of Data Deduplication-
Enabled Volumes” at: http://aka.ms/w8iows

Question: Can you enable Data Deduplication on a drive with storage tiering enabled?

Question: Can you enable Data Deduplication on ReFS-formatted drives?

Question: Can you enable Data Deduplication on volumes in which virtual machines
are running and apply it to those virtual machines?

Lab B: Implementing Data Deduplication
Scenario

After you have tested the storage redundancy and performance options, you decide that
it would also be beneficial to maximize the available disk space that you have,
especially around virtual machine storage, which is in ever-increasing demand. You
decide to test Data Deduplication solutions to maximize storage availability for
virtual machines.

Objectives

After completing this lab, you will be able to:

• Install the Data Deduplication role service.


• Enable Data Deduplication.
• Check the status of Data Deduplication.

Lab Setup

Estimated Time: 40 minutes

Virtual machines: 20740B-LON-DC1 and 20740B-LON-SVR1

User name: Adatum\Administrator

Password: Pa55w.rd

For this lab, you must use the available virtual machine environment. The virtual
machines should already be running from Lab A. If they are not, before you begin this
lab, you must complete the following steps and then complete Lab A:

1. On the host computer, start Hyper-V Manager.
2. In Hyper-V Manager, click 20740B-LON-DC1, and, in the Actions pane, click
Start.
3. In the Actions pane, click Connect. Wait until the virtual machine starts.
4. Sign in using the following credentials:
o User name: Administrator
o Password: Pa55w.rd
o Domain: Adatum
5. Repeat steps 2 through 4 for 20740B-LON-SVR1.

Exercise 1: Installing Data Deduplication


Scenario

You decide to install the Data Deduplication role service on intensively used file servers
by using Server Manager.

The main tasks for this exercise are as follows:

1. Install the Data Deduplication role service.


2. Check the status of Data Deduplication.
3. Verify the virtual machine performance.

Task 1: Install the Data Deduplication role service

• On LON-SVR1, in Server Manager, add the Data Deduplication role service.

Task 2: Check the status of Data Deduplication

1. Switch to Windows PowerShell.
2. To verify Data Deduplication status, run the following commands:

Get-DedupVolume
Get-DedupStatus

3. These commands return no results because, after installing the role service, you
still need to enable Data Deduplication on the volume.

Task 3: Verify the virtual machine performance

• On LON-SVR1, in Windows PowerShell, run the following command:

Measure-Command -Expression {Get-ChildItem -Path D:\ -Recurse}

Note: You will use the values returned from the previous command later in the lab.

Results: After completing this exercise, you should have successfully installed the Data
Deduplication role service on one of your file servers.

Exercise 2: Configuring Data Deduplication


Scenario

You determine that drive D is heavily used, and you suspect it contains duplicate files in
some folders. You decide to enable and configure Data Deduplication to reduce the space
consumed on this volume.

The main tasks for this exercise are as follows:

1. Configure Data Deduplication.


2. Configure optimization to run now and view the status.
3. Verify if the file has been optimized.
4. Verify VM performance again.
5. Prepare for the next module.

Task 1: Configure Data Deduplication

1. In Server Manager, click File and Storage Services.
2. Click Disks.
3. Click disk 1, and then click the D volume.
4. Enable Data Deduplication, and then select the General purpose file server setting.
5. Configure the following settings (a Windows PowerShell alternative is sketched
after these steps):
o Deduplicate files older than (in days): 0
o Enable throughput optimization.
o Exclude: D:\shares
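
If you prefer Windows PowerShell to Server Manager for this task, the following is a rough
sketch of an equivalent configuration. The schedule name and start time are illustrative
assumptions, and the New-DedupSchedule line is only an approximation of the GUI's
throughput optimization option, not an exact equivalent.

# Enable Data Deduplication on volume D: with the general purpose file server profile.
Enable-DedupVolume -Volume D: -UsageType Default

# Optimize files immediately (0 days old) and exclude the D:\shares folder.
Set-DedupVolume -Volume D: -MinimumFileAgeDays 0 -ExcludeFolder "D:\shares"

# Approximate the throughput optimization setting by scheduling a weekday
# optimization job (name, start time, and duration are illustrative).
New-DedupSchedule -Name "DailyThroughputOptimization" -Type Optimization -Start "21:00" -DurationHours 6 -Days Monday,Tuesday,Wednesday,Thursday,Friday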

Task 2: Configure optimization to run now and view the status

• On LON-SVR1, in Windows PowerShell, run the following commands:

Start-DedupJob D: -Type Optimization -Memory 50
Get-DedupJob -Volume D:

Note: Verify the status of the optimization job from the output of the previous
command. Repeat the Get-DedupJob command until Progress shows as 100%.

Task 3: Verify if the file has been optimized

1. On LON-SVR1, in File Explorer, navigate to the files in D:\Labfiles\Mod04 and
observe the following values in a few files' properties: Size and Size on disk.
2. In Windows PowerShell, to verify Data Deduplication status, run the following
commands:

Get-DedupStatus -Volume D: | fl
Get-DedupVolume -Volume D: | fl

Note: Observe the number of optimized files.

3. In Server Manager, click File and Storage Services, select Disk 1, and then select
Volume D.
4. Refresh the display, and observe the values for Deduplication Rate and
Deduplication Savings.

Note: Because most of the files on drive D are small, you might not notice a
significant amount of saved space.
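
Another quick way to spot check optimization is to look for the reparse point attribute
that deduplication sets on optimized files. The following is a minimal sketch that assumes
the same D:\Labfiles\Mod04 folder; treat it as a heuristic check rather than a formal
verification.

# List files in D:\Labfiles\Mod04 that carry the ReparsePoint attribute,
# which deduplication sets when it optimizes a file.
Get-ChildItem -Path D:\Labfiles\Mod04 -Recurse -File |
    Where-Object { $_.Attributes -band [System.IO.FileAttributes]::ReparsePoint } |
    Select-Object FullName, Length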

Task 4: Verify VM performance again

In Windows PowerShell, run the following command:

Measure-Command -Expression {Get-ChildItem -Path D:\ -Recurse}



Note: Compare the values returned from the previous command with the value of the
same command earlier in the lab to assess if system performance has changed.

Task 5: Prepare for the next module

When you complete the lab, revert the virtual machines to their initial state.

1. On the host computer, start Hyper-V Manager.
2. In the Virtual Machines list, right-click 20740B-LON-SVR1, and then click
Revert.
3. In the Revert Virtual Machine dialog box, click Revert.
4. Repeat steps 2 and 3 for 20740B-LON-DC1.

Results: After completing this exercise, you should have successfully configured Data
Deduplication for the appropriate data volume on LON-SVR1.

Question: Your manager is worried about the impact that using data deduplication will
have on the write performance of your file servers’ volumes. Is this concern valid?

Module Review and Takeaways


Review Questions

Question: You attach five 2-TB disks to your Windows Server 2016 computer. You
want to simplify the process of managing the disks. In addition, you want to ensure that
if one disk fails, the failed disk’s data is not lost. What feature can you implement to
accomplish these goals?

Question: Your manager has asked you to consider the use of Data Deduplication
within your storage architecture. In what scenarios is the Data Deduplication role
service particularly useful?

Common Issues and Troubleshooting Tips


Common Issue: Some files cannot be read when the free disk space on a deduplicated
volume approaches zero.
Troubleshooting Tip:
