WS-013 Azure Stack HCI
This module describes how to plan for and implement Azure Stack HCI Storage
Lessons:
o Overview of Azure Stack HCI storage core technologies
This lesson describes the three primary mechanisms Azure Stack HCI uses for storage
Topics:
Non-volatile storage technologies
File system technologies
Overview of Storage Spaces Direct
Non-volatile storage technologies
CSV:
o Serve as a general-purpose clustered file system layered above NTFS and ReFS
o Provide simultaneous read-write access to the same volume from multiple cluster
nodes:
• Each node can independently read from and write to individual files on the volume
• A single node functions as the CSV owner of the volume and handles metadata
orchestration
o Enable faster failover between cluster nodes by:
• Eliminating the need for volume dismount/mount
• Automatically balancing distribution of CSV ownership across nodes
CSV-supported workloads include:
o Clustered VMs (highly available disk files)
o Scale-out file shares for application data (Hyper-V VM disk files and SQL Server data
in disaggregated scenarios)
CSV cache:
o Provides block-level caching of read-only unbuffered I/O operations
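The CSV ownership and cache behavior described above can be inspected from any cluster node. The following is a minimal sketch, assuming the FailoverClusters PowerShell module is available; the 1024 MB cache size is only an illustrative value, not a recommendation.

```powershell
# List each Cluster Shared Volume with its current owner node and state.
Get-ClusterSharedVolume | Select-Object Name, OwnerNode, State

# Show the size (in MB) of the CSV in-memory read cache for the cluster.
(Get-Cluster).BlockCacheSize

# Assumption: 1024 MB is an example value; size the cache for your own hosts.
(Get-Cluster).BlockCacheSize = 1024
```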
File system technologies (4 of 5)
Data deduplication:
o Is available as a Windows Server role service
o Is enabled on a per-volume basis
o Provides deduplication and compression
o Increases usable capacity by:
• Scanning the file system for files meeting the optimization policy
• Breaking files into variable-size chunks
• Identifying unique chunks
• Placing chunks in the chunk store and optionally compressing them
• Replacing parts of the original files with reparse points to the chunk store
o Uses a post-processing approach that:
• Optimizes data on-disk (and does not interfere with writes)
• Runs as scheduled background jobs
• Supports configurable priority based on usage type
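As a rough illustration of the per-volume, post-processing model described above, the following sketch enables deduplication on one volume and checks the savings. The volume path is a placeholder, and the HyperV usage type is an assumption chosen for the example.

```powershell
# Install the Data Deduplication role service (repeat on every cluster node).
Install-WindowsFeature -Name FS-Data-Deduplication

# Assumption: placeholder path of a Cluster Shared Volume.
$volume = 'C:\ClusterStorage\Volume1'

# Enable deduplication on the volume; the usage type (Default, HyperV, or
# Backup) selects the optimization policy.
Enable-DedupVolume -Volume $volume -UsageType HyperV

# Optionally start an optimization job now instead of waiting for the schedule.
Start-DedupJob -Volume $volume -Type Optimization

# Review the savings once the job has completed.
Get-DedupStatus -Volume $volume | Select-Object Volume, SavedSpace, OptimizedFilesCount
```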
File system technologies (5 of 5)
o SMB networking
Overview of Storage Spaces Direct (2 of 3)
o SMB Multichannel (required to enable SMB Direct, automatically detects and uses
multiple network connections)
Lesson 1: Test your knowledge
Azure Stack HCI optimizes provisioning of hyperconverged infrastructures but still requires
significant planning to maximize the resiliency, capacity, and performance of Storage Spaces
Direct components
This lesson explains how to plan for an optimal Storage Spaces Direct configuration in Azure
Stack HCI
Topics:
o Plan for Storage Spaces Direct
o Choose drives
[Diagram: Storage Spaces Direct planning balances performance, resiliency, and capacity]
Plan for Storage Spaces Direct (2 of 3)
Drive options (1 or 2 or 3):
1. NVMe for capacity
2. NVMe for cache, SSD for capacity
3. SSD for capacity
Choose drives (slide 4 of 5)
Hybrid configuration that balances performance and capacity:
1. NVMe for cache, HDD for capacity
2. SSD for cache, HDD for capacity
3. NVMe for cache, SSD+HDD for capacity
Choose drives (slide 5 of 5)
Manual dual-flash configurations, which use slower flash drives to increase capacity:
1. NVMe1 for cache, NVMe2 for capacity
2. SSD1 for cache, SSD2 for capacity
3. SSD1 for cache, SSD2+HDD for capacity
Drive symmetry considerations (slide 1 of 3)
Storage Spaces Direct works optimally when every server has the exact same drives
When implementing and maintaining Storage Spaces Direct:
o Each cluster node should have the same types of drives (capacity and cache)
o Each cluster node should have the same number of drives of each type (capacity and
cache)
o For the individual storage type (capacity and cache), the respective drive models and
firmware versions should match, whenever possible
o For the individual storage type (capacity and cache), the respective drive sizes should
match whenever possible
Storage Spaces Direct automatically handles capacity imbalances across drives and across
servers
Mismatches in drive sizes might result in stranded capacity
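One way to check the symmetry guidance above is to compare drive inventories per server. This is a minimal sketch, assuming it runs on a node of an existing Storage Spaces Direct cluster; boot drives also appear in the output and can be ignored.

```powershell
# For each storage node, count the physically attached drives grouped by
# media type, model, firmware version, and size.
Get-StorageNode | ForEach-Object {
    $node = $_
    Get-PhysicalDisk -StorageNode $node -PhysicallyConnected |
        Group-Object -Property MediaType, Model, FirmwareVersion, Size |
        Select-Object @{ Name = 'Node'; Expression = { $node.Name } }, Count, Name
}
```

Nodes whose rows differ in count, model, firmware, or size point at a mismatch worth investigating.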
Drive symmetry considerations (slide 2 of 3)
Stranded capacity:
[Diagram: data copies A, A', A", B, B', B", C, C', C" distributed across servers; labels: "Stranded capacity", "5TB"]
Quorum determines the number of simultaneous failures a highly available entity can
survive
Quorum considers the number of votes associated with resources that form the highly available
entity:
o Cluster nodes (for cluster quorum)
o Clustered disks (for pool quorum)
Cluster quorum:
[Diagram: four server nodes and a witness, each with one vote (five votes). Server node 2 fails,
cluster stays up (four votes remain). Server node 3 fails, cluster stays up (three votes remain).]
Pool quorum:
[Diagram: a storage pool spanning four server nodes, with votes shown for each. Server nodes 3 and 4
go down, pool stays up. A disk in node 2 goes down (simultaneous failure), pool goes down.]
Plan for cluster and pool quorums (slide 4 of 5)
Cluster quorum:
Cluster nodes   Can survive 1 node   Can survive 2 subsequent   Can survive 2 concurrent
                failure              node failures              node failures
2               50/50                No                         No
2 + Witness     Yes                  No                         No
3               Yes                  50/50                      No
3 + Witness     Yes                  Yes                        No
4               Yes                  Yes                        50/50
4 + Witness     Yes                  Yes                        Yes
5 and above     Yes                  Yes                        Yes
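The majority arithmetic behind quorum can be sketched as follows. Get-QuorumMajority and Test-QuorumSurvives are hypothetical helper names introduced only for this illustration. The sketch models a static vote count with concurrent node failures and deliberately ignores dynamic quorum, so it does not reproduce the 50/50 or "subsequent failure" entries.

```powershell
# Votes needed for quorum: a strict majority of all configured votes.
function Get-QuorumMajority {
    param([Parameter(Mandatory)][int]$TotalVotes)
    return [int][math]::Floor([double]$TotalVotes / 2) + 1
}

# True if the votes left after concurrent node failures still form a majority.
# Assumption: one vote per node plus one for the witness, no dynamic quorum.
function Test-QuorumSurvives {
    param(
        [Parameter(Mandatory)][int]$Nodes,
        [Parameter(Mandatory)][bool]$HasWitness,
        [Parameter(Mandatory)][int]$ConcurrentNodeFailures
    )
    $totalVotes = $Nodes + [int]$HasWitness
    $remaining  = $totalVotes - $ConcurrentNodeFailures
    return $remaining -ge (Get-QuorumMajority -TotalVotes $totalVotes)
}

# Four nodes plus witness, two nodes fail at once: 3 of 5 votes remain.
Test-QuorumSurvives -Nodes 4 -HasWitness $true -ConcurrentNodeFailures 2

# Three nodes plus witness, two nodes fail at once: 2 of 4 votes remain.
Test-QuorumSurvives -Nodes 3 -HasWitness $true -ConcurrentNodeFailures 2
```

On a live cluster, Get-ClusterNode | Select-Object Name, NodeWeight, DynamicWeight shows the votes that dynamic quorum currently assigns.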
Plan for cluster and pool quorums (slide 5 of 5)
Pool quorum:
Cluster nodes   Can survive 1 node   Can survive 2 subsequent   Can survive 2 concurrent
                failure              node failures              node failures
2               No                   No                         No
2 + Witness     Yes                  No                         No
3               Yes                  No                         No
3 + Witness     Yes                  No                         No
4               Yes                  No                         No
4 + Witness     Yes                  Yes                        Yes
5 and above     Yes                  Yes                        Yes
Plan for volumes (slide 1 of 3)
Volume size:
o Up to 64 TB
o Corresponding footprint
o ReFS
Resiliency type
Storage pool
Plan for volumes (slide 2 of 3)
o Cache drives (data in the cache benefits from the same resiliency as capacity drives)
In this lesson, you will learn about the implementation phases for Storage Spaces Direct in
Azure Stack HCI
Topics:
o Prerequisites for Azure Stack HCI deployment
o Memory
o Boot device
o Storage
o VLAN IDs
Deploy Storage Spaces Direct in Azure Stack HCI (1 of 2)
1. Assign custom computer names to Windows Server 2019 hosts running on Azure Stack HCI
nodes
2. Join the Windows Server 2019 hosts to an AD DS domain
3. Add Windows Server 2019 roles and features to each host operating system:
o Hyper-V
o Failover Clustering
o RSAT-Clustering-PowerShell
o Hyper-V PowerShell
o Data Center Bridging (optional when using iWARP)
4. Configure network connectivity (SET and RDMA)
5. Configure Storage Spaces Direct:
o Set up Failover Clustering, including the Witness
o Enable Storage Spaces Direct
6. Optionally, deploy SDN
7. If applicable, follow OEM-specific configuration to complete the initial deployment
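Steps 3 and 5 above can be sketched in Windows PowerShell as follows. This is a minimal outline, not a complete deployment script: the node names, cluster name, and file share witness path are placeholders, and the network configuration from step 4 is assumed to be done separately.

```powershell
# Assumption: node, cluster, and witness names below are placeholders.
$nodes       = 'Node01', 'Node02'
$clusterName = 'S2DCluster'
$witnessPath = '\\fileserver\witness'

# Step 3: add the required roles and features on every node.
Invoke-Command -ComputerName $nodes -ScriptBlock {
    Install-WindowsFeature -Name Hyper-V, Failover-Clustering, Data-Center-Bridging,
        RSAT-Clustering-PowerShell, Hyper-V-PowerShell -IncludeManagementTools
}

# Restart the nodes so that the Hyper-V installation completes.
Restart-Computer -ComputerName $nodes -Wait -For PowerShell -Force

# Step 5: validate, create the cluster without shared storage, configure the
# witness, then enable Storage Spaces Direct.
Test-Cluster -Node $nodes -Include 'Storage Spaces Direct', 'Inventory', 'Network', 'System Configuration'
New-Cluster -Name $clusterName -Node $nodes -NoStorage
Set-ClusterQuorum -Cluster $clusterName -FileShareWitness $witnessPath
Enable-ClusterStorageSpacesDirect -CimSession $clusterName
```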
Deploy Storage Spaces Direct in Azure Stack HCI (2 of 2)
One of your objectives is to minimize the effort associated with deployment and management
of on-premises resources. As part of this effort, you want to test the process of implementing
a Storage Spaces Direct cluster in an automated manner by using Windows PowerShell.
Lab A: Implementing a Storage Spaces Direct cluster by using
Windows PowerShell
Exercise 1: Implementing a Storage Spaces Direct cluster by using Windows PowerShell
Lesson 4: Managing Storage Spaces Direct in Azure Stack HCI
Lesson 4 overview
This lesson covers managing the Storage Spaces Direct disk technologies that Azure Stack
HCI uses, including the use of Storage Spaces Direct volumes, and deduplication and
compression
Topics:
o Create volumes
o Extend volumes
In Windows PowerShell:
Create volumes with the default resiliency settings:
o Run New-Volume with the ResiliencySettingName parameter specifying Mirror
or Parity
New-Volume -FriendlyName "Volume1" -FileSystem CSVFS_ReFS -StoragePoolFriendlyName S2D* -Size 1TB -ResiliencySettingName Mirror
New-Volume -FriendlyName "Volume2" -FileSystem CSVFS_ReFS -StoragePoolFriendlyName S2D* -Size 1TB -ResiliencySettingName Parity
Create volumes with default storage tier templates:
o Run New-Volume with the StorageTierFriendlyNames parameter specifying
Performance and Capacity tier templates
New-Volume -FriendlyName "Volume3" -FileSystem CSVFS_ReFS -StoragePoolFriendlyName S2D* -StorageTierFriendlyNames Performance, Capacity -StorageTierSizes 300GB, 700GB
Create volumes (3 of 3)
In Windows PowerShell:
Create volumes with nested storage tier templates:
o First run New-StorageTier with the FriendlyName, ResiliencySettingName, and MediaType
parameters
New-StorageTier -StoragePoolFriendlyName S2D* -FriendlyName NestedMirror -ResiliencySettingName Mirror -MediaType HDD -NumberOfDataCopies 4
New-StorageTier -StoragePoolFriendlyName S2D* -FriendlyName NestedParity -ResiliencySettingName Parity -MediaType HDD -NumberOfDataCopies 2 -PhysicalDiskRedundancy 1 -NumberOfGroups 1 -FaultDomainAwareness StorageScaleUnit -ColumnIsolation PhysicalDisk
o Then run New-Volume with the StorageTierFriendlyNames parameter
New-Volume -StoragePoolFriendlyName S2D* -FriendlyName Volume02 -StorageTierFriendlyNames NestedMirror, NestedParity -StorageTierSizes 100GB, 400GB
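After creating tiers and volumes, the result can be checked with a short query. This is a small sketch that only reads configuration and assumes it runs on a cluster node.

```powershell
# List the tier templates in the pool with their resiliency characteristics.
Get-StorageTier |
    Select-Object FriendlyName, ResiliencySettingName, MediaType, NumberOfDataCopies, PhysicalDiskRedundancy

# Show each volume's size next to the pool capacity it consumes (its footprint).
Get-VirtualDisk |
    Select-Object FriendlyName, ResiliencySettingName, Size, FootprintOnPool
```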
Extend volumes
1. Ensure there is enough capacity in the storage pool to accommodate the extended
footprint
2. Use either of the following:
o Windows Admin Center:
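In Windows PowerShell, extending a volume that does not use storage tiers might look like the sketch below. The volume name and target size are placeholders, and the data partition is assumed to be partition number 2, which should be confirmed with Get-Partition first.

```powershell
# Assumption: 'Volume1' is a placeholder name and 2TB an example target size.
$virtualDisk = Get-VirtualDisk -FriendlyName 'Volume1'

# A volume without tiers is resized on the virtual disk itself; a tiered
# volume is resized per tier with Resize-StorageTier instead.
$virtualDisk | Resize-VirtualDisk -Size 2TB

# Grow the data partition to the new maximum so that the file system can use
# the added space.
$partition = $virtualDisk | Get-Disk | Get-Partition | Where-Object PartitionNumber -eq 2
$partition | Resize-Partition -Size ($partition | Get-PartitionSupportedSize).SizeMax
```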
1. Ensure that the Data Deduplication Windows Server role service is installed on all cluster
nodes
2. Use any of the following:
o Windows Admin Center:
Procedure:
1. Run cluster validation using:
• Windows Admin Center, Failover Cluster Manager, or Test-Cluster
2. Add cluster node using:
• Windows Admin Center, Failover Cluster Manager, or Add-ClusterNode
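With the PowerShell options named above, the procedure can be sketched as follows. The node names are placeholders, with Node04 standing in for the server being added.

```powershell
# Assumption: node names are placeholders; Node04 is the server being added.
$existingNodes = 'Node01', 'Node02', 'Node03'
$newNode       = 'Node04'

# Validate the current nodes together with the new one.
Test-Cluster -Node ($existingNodes + $newNode) -Include 'Storage Spaces Direct', 'Inventory', 'Network', 'System Configuration'

# Run on an existing cluster node to join the new server to the cluster.
Add-ClusterNode -Name $newNode
```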
Scale Storage Spaces Direct (slide 2 of 3)
Procedure:
1. Add physical disks to each server
o Retiring disks
o Restoring resiliency
o Disk pooling
o Get-ClusterPerformanceHistory
The Windows Server 2019 installable System Insights feature identifies future resource needs
by providing:
Automatic collection of metrics and events
Predictive analytics based on collected data
Integration with Azure Monitor and System Center Operations Manager
Troubleshoot Storage Spaces Direct and replace failed disks (3 of 3)
Start troubleshooting with the following steps:
1. Run cluster validation and focus on the Storage Spaces Direct section
2. Confirm that storage components are part of the Azure Stack HCI Catalog offering
3. Inspect the storage for any faulty drives and replace them
4. Update storage and drive firmware if necessary
5. Update network adapter drivers and firmware
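Step 3 above, inspecting the storage for faulty drives, can be started from PowerShell. The following sketch only reads state and assumes it runs on a node of the affected cluster.

```powershell
# Drives that are not reporting a healthy status.
Get-PhysicalDisk | Where-Object HealthStatus -ne 'Healthy' |
    Select-Object FriendlyName, SerialNumber, MediaType, HealthStatus, OperationalStatus

# Virtual disks and their current health and operational status.
Get-VirtualDisk | Select-Object FriendlyName, HealthStatus, OperationalStatus

# Repair or rebalance jobs currently running in the background.
Get-StorageJob

# Current faults reported for the clustered storage subsystem.
Get-StorageSubSystem Cluster* | Debug-StorageSubSystem
```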
Review additional troubleshooting scenarios if the problem remains unresolved:
Virtual disk resources with the No Redundancy operational status
“Unsupported media type” error message when enabling Storage Spaces Direct
Slow I/O performance
Slow file copy
Lesson 4: Test your knowledge
Now that you have provisioned a Storage Spaces Direct cluster in an automated manner by
using Windows PowerShell, you want to determine whether you can minimize administrative
effort associated with remediating disk failures within a Storage Spaces Direct cluster by
leveraging its resiliency and self-healing capabilities
Lab B: Managing storage of a Storage Spaces Direct cluster by
using Windows Admin Center and Windows PowerShell
Exercise 1: Managing storage of a Storage Spaces Direct cluster by using Windows Admin
Center and Windows PowerShell
Instructor-led lab C: Managing and monitoring resiliency of a Storage Spaces Direct cluster
Lab C scenario
You want to examine resiliency in situations when there are simultaneous cluster node and
drive failures. You want to understand how resiliency can protect cluster stability and integrity.
To start, you will create tiered volumes and test volume, disk, and cluster resiliency.
Lab C: Managing and monitoring resiliency of a Storage Spaces
Direct cluster
Exercise 1: Managing and monitoring resiliency of a Storage Spaces Direct cluster
Instructor-led lab D: Managing Storage Spaces Direct cluster tiers
Lab D scenario
Now that you know more about cluster resiliency, you want to explore additional provisions
that could help you optimize storage capacity and performance. To accomplish this, you will
configure and evaluate storage tiers, including tiers that involve nested resiliency.
Lab D: Managing Storage Spaces Direct cluster tiers
QoS is a set of networking and storage technologies that allow you to control the flow of
network traffic based on its characteristics so that you can optimize system functionality and
workload performance. This lesson covers Storage QoS in the context of Azure Stack HCI.
Topics:
o QoS
QoS helps you address performance requirements of workloads that rely on shared
infrastructure
Azure Stack HCI relies on Network QoS to provide the following functionality:
Bandwidth management
Classification and tagging
Priority-based flow control
QoS policies
In Azure Stack HCI clusters, Network QoS for Storage Spaces Direct leverages RDMA networking:
RoCE over UDP/IP with DCB providing flow control and congestion management
iWARP over TCP/IP, with TCP providing flow control and congestion management
DCB relies on PFC to prioritize traffic based on a class assigned to a payload, such as:
SMB Direct
Failover clustering traffic
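For RoCE, the DCB and PFC classification described above is typically configured per host. The sketch below uses commonly seen example values as assumptions: priority 3 and a 50 percent bandwidth reservation for SMB Direct, priority 7 and 1 percent for cluster traffic, and a placeholder adapter name. Matching settings are also needed on the physical switches.

```powershell
# Assumption: 'NIC1' is a placeholder for an RDMA-capable network adapter.
$adapterName = 'NIC1'

# Tag SMB Direct (port 445) and failover clustering traffic with 802.1p priorities.
New-NetQosPolicy 'SMB' -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
New-NetQosPolicy 'Cluster' -Cluster -PriorityValue8021Action 7

# Enable priority-based flow control only for the SMB Direct priority.
Enable-NetQosFlowControl -Priority 3
Disable-NetQosFlowControl -Priority 0, 1, 2, 4, 5, 6, 7

# Apply DCB settings on the adapter.
Enable-NetAdapterQos -Name $adapterName

# Reserve minimum bandwidth for each traffic class (example percentages).
New-NetQosTrafficClass 'SMB' -Priority 3 -BandwidthPercentage 50 -Algorithm ETS
New-NetQosTrafficClass 'Cluster' -Priority 7 -BandwidthPercentage 1 -Algorithm ETS
```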
Configure QoS policies (1 of 2)
Storage Replica is a Windows Server technology that enables replication of volumes between
servers or clusters for the purpose of disaster recovery. This lesson covers functionality,
architecture, configuration options, implementation, monitoring, and troubleshooting of
Storage Replica in the context of Azure Stack HCI.
Topics:
o Storage Replica features
o Azure Stack HCI and Storage Replica
o Implement Storage Replica
o Demonstration: Implement Storage Replica
o Monitor and troubleshoot Storage Replica
Storage Replica features (slide 1 of 2)
o Synchronous or asynchronous
Storage Replica topologies:
o Stretch cluster
o Cluster-to-cluster
o Server-to-server
o Server-to-self
[Diagram: replication between two servers or clusters (SR), with steps numbered 1 to 5]
Caching considerations:
Storage Replica in Storage Spaces Direct clusters might result in increased latency in the
following scenarios:
o HDD capacity tier with NVMe-based cache
o HDD capacity tier with SSD-based cache
To remediate this issue:
o Use a mix of NVMe and SSD drives (rather than HDD)
o Configure NVMe and SSD drives as performance and capacity tiers, respectively
o Place Storage Replica log volume on the performance tier
o Place Storage Replica data volume on the capacity tier
Deduplication considerations:
Storage Replica supports Data Deduplication
To implement it:
o Install Data Deduplication on both the source and destination servers
o Enable deduplication on the data volume on the source server only
Implement Storage Replica (1 of 2)
In Windows PowerShell:
1. Identify the source volume
2. Create the destination volume and log volumes
3. Install on the source and destination servers:
o Storage Replica role service
4. Run Test-SRTopology
5. Grant the first cluster full access to the second cluster
6. Grant the second cluster full access to the first cluster
7. Create a Storage Replica partnership
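For a cluster-to-cluster topology, steps 3 through 7 can be sketched as follows. All server names, cluster names, replication group names, volume paths, and drive letters are placeholders, and the 30-minute test duration is an example value.

```powershell
# Assumption: all names, paths, and drive letters below are placeholders.
$sourceNode  = 'SR-SRV01'
$destNode    = 'SR-SRV03'
$sourceClus  = 'SR-SRVCLUSA'
$destClus    = 'SR-SRVCLUSB'
$dataVolume  = 'C:\ClusterStorage\Volume2'
$logVolume   = 'F:'

# Step 3: install Storage Replica on the source and destination servers.
Invoke-Command -ComputerName $sourceNode, $destNode -ScriptBlock {
    Install-WindowsFeature -Name Storage-Replica, FS-FileServer -IncludeManagementTools -Restart
}

# Step 4: test whether the topology meets Storage Replica requirements.
Test-SRTopology -SourceComputerName $sourceNode -SourceVolumeName $dataVolume -SourceLogVolumeName $logVolume `
    -DestinationComputerName $destNode -DestinationVolumeName $dataVolume -DestinationLogVolumeName $logVolume `
    -DurationInMinutes 30 -ResultPath 'C:\Temp'

# Steps 5 and 6: grant each cluster full access to the other.
Grant-SRAccess -ComputerName $sourceNode -Cluster $destClus
Grant-SRAccess -ComputerName $destNode -Cluster $sourceClus

# Step 7: create the replication partnership.
New-SRPartnership -SourceComputerName $sourceClus -SourceRGName 'rg01' `
    -SourceVolumeName $dataVolume -SourceLogVolumeName $logVolume `
    -DestinationComputerName $destClus -DestinationRGName 'rg02' `
    -DestinationVolumeName $dataVolume -DestinationLogVolumeName $logVolume
```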
Demonstration: Implement Storage Replica
Create source volumes
Create destination volumes
Use Windows Admin Center to implement Storage Replica
Monitor and troubleshoot Storage Replica
Monitoring:
Performance: \Storage Replica Partition I/O Statistics(*) and \Storage Replica
Statistics(*)
Status via Get-SRGroup and Get-SRPartnership
Initial replication via Event 1237 message in Storage Replica Admin event log or via
Get-SRGroup
Replication via the Storage Replica Statistics\Total Bytes Received counter on the
destination server
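The monitoring options listed above can be combined into a short status check. This is a minimal sketch, assuming it runs on the destination server of an existing partnership; the event count of 20 is an arbitrary example.

```powershell
# Replication group and partnership status.
Get-SRGroup
Get-SRPartnership

# Bytes still to be copied during initial replication.
(Get-SRGroup).Replicas | Select-Object NumOfBytesRemaining

# Recent Storage Replica events, which include event 1237 during initial replication.
Get-WinEvent -ProviderName Microsoft-Windows-StorageReplica -MaxEvents 20 | Format-List

# Replication progress counter on the destination server.
Get-Counter -Counter '\Storage Replica Statistics(*)\Total Bytes Received'
```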
Troubleshooting:
Use Test-SRTopology whenever applicable
Investigate generic error messages displayed by Test-SRTopology, which might indicate
that:
o You are logged on to the source server as a local user (rather than a domain user)
o You specified an incorrect name or IP address of the destination server
o The destination server firewall is blocking access of Windows PowerShell cmdlets
o The destination server is not running the WMI service
Lesson 6: Test your knowledge
Another resiliency consideration you want to explore is metadata of the storage pool and its
components. You want to ensure that you understand resiliency provisions that must be taken
into account to protect cluster stability and integrity. You also want to be able to identify how
a Storage Spaces Direct cluster maintains information about its data.
Lab E: Identifying and analyzing metadata of a Storage Spaces
Direct cluster (optional)
Exercise 1: Identifying and analyzing metadata for a Storage Spaces Direct cluster
Module-review questions (slide 1 of 2)
3. Which resiliency type provides the best performance for random read/writes and highest
resiliency on a two-node Storage Spaces Direct cluster?
a. Two-way mirror
b. Nested two-way mirror
c. Mirror-accelerated parity
d. Nested mirror-accelerated parity
Module-review answers